It's astonishing how many human problems exist in neatly solvable form in the computer science world--if you just knew enough computer science to see it! And that is the profound gift of these authors: they convey ideas across domains in readable, relatable everyday language.
See for example how the authors explain "sorting" and "caching" problems by using analogies of bookshelves, libraries--even socks. Readers can then easily see where to apply the right sorting/caching algorithm in their own daily lives (I won't give away any spoilers here, but just remember the acronym LRU).
Scheduling, task-ordering and task-switching problems form yet another category of challenges in both real life and in computer science. Once again, thanks to their knack for an apt metaphor or real-world analogy, the authors explain cross-domain ideas beautifully, helping us see new ways to solve age-old problems. Here's one of my favorite examples: have you ever had that feeling that you've got too much going on, too many irons in the fire, to the point where you don't know what to do or where to start? There's actually a word for this[1] in computer science--and you'll learn what to do to fix it.
I should also mention: this is the first book I've come across that explains Bayesian statistics in a way that a layperson would find comprehensible--and the authors go further still, explaining various ways you can apply Bayesian thinking to make surprisingly accurate inferences about the world. And they do even more to assist the reader: repeating concepts and grouping previous ideas together at helpful points in the book, even using an occasional witty callback to bring back an idea, both to entertain and aid the reader's memory. Well done guys! It's difficult to overstate what an unusual gift this is.
A book like this makes you appreciate how computer scientists--at least the really good ones--must have genuine expertise across multiple domains. And of course: the best insights always seem to arrive when you translate an idea from one domain into another.
[A quick affiliate link to Amazon for those readers who would like to support my work here: if you purchase your Amazon products via any affiliate link from this site, or from my sister site Casual Kitchen, I will receive a small affiliate commission at no extra cost to you. Thank you!]
Footnote:
[1] The word is "thrashing": when a computer gets stuck in a loop of organizing and ordering tasks rather than actually performing those tasks. The concept analogizes perfectly to human experience.
[Readers, please don't read any further, unless you really, really have nothing better to do. What follows are my notes and reactions to the text: they are there to help me order and remember what I read. Life is short!]
Notes:
Introduction: Algorithms to Live By
1ff An example of finding an apartment in a major city like San Francisco where there's no time for "fact-finding and deliberation": you have a few moments to take the apartment you're viewing right now... or it's gone. The authors introduce the "37% rule": essentially, if you've given yourself a month to find an apartment, give yourself 11 days to calibrate, then after that be ready to commit to anything that beats whatever you've seen so far. [The Secretary Problem, basically: the authors will use the proper name for it on page 10]; the authors call this a "provably optimal solution" and part of the class of "optimal stopping" problems. "Mathematically, at least, these are solved problems."
3 Many problems seem unique to humans, but they're not--they've been grappled with by computer science for more than half a century: problems like task switching, using limited memory resources, whether to collect more data or take action based on the data already collected; "...there's much we can learn from how they [computers] do it."
3ff "An algorithm is just a finite sequence of steps used to solve a problem..." On the Persian mathematician al-Khwarizmi, from whose name the word algorithm comes, the earliest known mathematical algorithm comes from a Sumerian clay tablet describing a scheme for long division; on cooking bread from a recipe or knitting a sweater, in both cases you're following an algorithm.
4ff "In this book, we explore the idea of human algorithm design--searching for better solutions to the challenges people encounter every day." Addressing the criticisms that it would be "misguided" to look to computers to guide us; the authors counter this by saying that much of computer science work these days is to handle real world problems: problems with insufficient information, with time constraints, on needing to be comfortable with chance, a lack of accuracy, and approximations, problems of sorting, caching, etc.; They also cite behavioral economics, which claims we are irrational and error-prone thanks to our own brains' "buggy, idiosyncratic hardware" [note the substantial problems uncovered by various critiques of behavioral economics in recent years: see for example the work of Nassim Taleb, Ole Peterson, etc. Curious if the authors ever talk about this later in the book]; "Thinking algorithmically... helps us better understand the errors that we make." [Note in the footnotes for page 4 there's a discussion of Herbert Simon and bounded rationality and also I.J. Good and his discussion of Type 1 and Type 2 rationality, one which assumes infinite time and computational capacity to solve a problem (and where you arrive at the right answer), and the other takes into account the cost of getting that answer under limited time and resources.]
6-7 On the cross-domain nature of algorithm design for computers: a weird hybrid of math, engineering, statistics, operations research, cognitive science, psychology and economics. [Whenever you stumble into a domain that touches many other domains like this, it's typically extremely interesting.]
Chapter 1: Optimal Stopping: When to Stop Looking
9ff On the so-called "turkey drop," when high school sweethearts come home from college and dump each other; you can't really figure out how good a relationship you have when you don't have a benchmark by which to judge it; on the optimal stopping problem of applying the 37% rule to relationships. Description here of the famous Secretary Problem; supposedly this dates from a 1960 issue of Scientific American, in Martin Gardner's "beloved column on recreational mathematics." [Sometimes you can learn quite a lot about an author by paying attention to offhand parenthetical comments like these.] But the idea likely goes back further; the authors trace it to Merrill Flood, well known in computer science, who popularized the Traveling Salesman Problem as well as the Prisoner's Dilemma, and who may even have coined the term software.
12 On stopping "early" versus stopping "late": leaving the best applicant undiscovered, or holding out for an even better applicant who doesn't even exist. Finding the optimal strategy between these two undesired outcomes; note that the first applicant is by definition the best you've seen so far, but as you see more and more, the chance that the latest applicant is the best steadily falls; see other heuristics: for example, "the third time an applicant trumps everyone seen so far," or taking a "best yet" applicant after a long streak of poor applicants; the best algorithm is the "look then leap" rule: spend a predetermined time looking and then instantly commit to anyone who outshines the best applicant you saw during the look phase.
14ff As you see more applicants the solution converges on 37% (actually 1/e, about 0.368, but you can use 35-40%). Note that this algorithm actually has a 63% failure rate! But that is the rate of failing to get the very best candidate, "the one"; in practice you're going to get a very, very good candidate, possibly indistinguishable from the one. [The authors don't discuss this, but note that in relationships you are also picking the best person who is also willing to pick you; this adds a rather large nuance to the problem.] Also the 37% rule holds no matter how large the applicant pool gets; further, if you're looking at a huge haystack of applicants it's all the more valuable to know a proper stopping time.
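[Here's a minimal Python sketch I put together of the look-then-leap rule, under the standard assumptions (random arrival order, rank-only information, no recalls); the applicant pool and trial counts are my own toy numbers, not the authors':]

import random

def look_then_leap(n, look_fraction=0.37):
    applicants = list(range(n))          # higher number = better applicant
    random.shuffle(applicants)
    cutoff = int(n * look_fraction)
    best_seen = max(applicants[:cutoff], default=-1)
    for applicant in applicants[cutoff:]:
        if applicant > best_seen:        # leap at the first one who beats the look phase
            return applicant == n - 1    # did we land the very best?
    return applicants[-1] == n - 1       # otherwise we're stuck with the last one

trials = 100_000
wins = sum(look_then_leap(100) for _ in range(trials))
print(f"Got the single best applicant in {wins / trials:.1%} of trials")   # ~37%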
15 Hilarious quote here from Barbara Bush: "I married the first man I ever kissed. When I tell this to my children they just about throw up."
15ff On Carnegie Mellon professor of operations research Michael Trick, who was looking for love and found a woman to propose to using the 37% rule... but she turned him down. If there are odds that you'll be rejected you'll want to start making offers sooner, after 25% of your search. [Note that the authors use a rejection rate of 50%, likely an extremely aggressive assumption for computer science geeks.] Also if you can go back to people you rejected before (in other words, using conditions that don't exist in the original Secretary Problem as stated), you should see 61% of the applicants and then go back to the best one. As the authors phrase it, "look noncommittally for a time, then be ready to leap."
17ff On "no information games" versus "full information games" in computer science; for example: if you can actually measure the metric by which you to choose someone and your first applicant is a 95th percentile candidate, you would hire that person instantly because you know instantly the person is going to be far superior to the vast majority of the other candidates. Thus this would be a "threshold rule" where we immediately accept an applicant above a certain percentile. [Also meta-thoughts here on why you want to find a good informational yardstick that lets you switch from the "look then leap rule" to the "threshold rule" because the odds are far higher; or another way to say this is to play full information games and avoid no information games.]
20ff Applying the secretary problem to selling a house: note that this is a version of the game with much fuller information; you already have a threshold (the sale price you want), and all of this is defined in a clear metric (dollars), so it's a much less amorphous problem than hiring a person; in this case you follow a threshold rule, but the key parameter is how long you can afford to wait, because the problem assumes that if you're selling a house there's a cost to waiting (in the form of making more mortgage payments, for example). In this version, if the cost of waiting is trivial we can be as choosy as we want and wait forever, but if the cost of waiting is significant then we would take anything close to or above our threshold; the point here is "our threshold depends only on the cost of search" (for a buyer who meets our threshold). [One thing I would do if I were in this domain trying to sell a house is adjust the parameters of the game so that my costs are as low as possible and I can wait as long as I want: another meta-thought might be to make sure the parameters of the game suit you, or adjust your situation so that you can adjust the parameters of the game to your liking, and then play that game. Or: never play games with parameters you don't want to be subject to. In the case of housing, most people will have mortgage payments and won't be able to afford to wait too long to sell, they can't carry two houses at the same time, etc.; thus before you play this "game" make sure you don't have a mortgage payment, or make sure you can carry two houses at the same time, and then you can totally afford to wait.]
23ff On parking, and various parameters like how much parking costs, the time and inconvenience of walking, the time taken up by seeking out a space; also there are other people trying to park as well [and a footnote here where the authors say they will address game theory in Chapter 11]; on the most important metric, which is the occupancy rate of parking places. A quote from author Donald Shoup, in his book The High Cost of Free Parking, and (unsurprisingly?) his "solution involves installing digital parking meters that are capable of adaptive prices that rise with demand." [Sadly, San Francisco has implemented his solution: the mother of all contrary indicators]. Note certain nonlinearities involved here: for example, when parking space occupancy goes from 90% to 95%, it doubles the length of everyone's search while accommodating only 5% more cars. On parking as another form of stopping problem, where you can use the look-then-leap rule: "pass up all vacant spots occurring more than a certain distance from the destination and then take the first space that appears thereafter."
26 Funny quote here, both directly and also unintentionally: the authors ask Shoup, the alleged "world's top expert on parking" whether his research allows him to optimize his own commute in Los Angeles. His answer? "I ride my bike." [This is funny, of course, and the authors write it to be so, but if you were to look at this through the eyes of, say, Nassim Taleb and his cynicism on academics, you would see here a classic example of an alleged "world expert" who doesn't even eat his own cooking, and yet he has no problem designing policies that are adopted by municipalities on a major scale; a classic "skin in the game" problem combined with the expert problem, where "intellecshuals" are attempting to solve problems that they don't actually know anything about in reality.]
26ff On figuring out when to quit while you're ahead; citing Boris Berezovsky, who authored the first and only book entirely devoted to the secretary problem, and who ironically had to flee Russia in a sort of existential stopping problem of his own. On the burglar problem: you have a sequence of robberies you can perform, each providing some reward, but eventually you will get caught and lose all your accumulated gains; what algorithm should you use to maximize your expected take? "The number of robberies you should carry out is roughly equal to the chance you get away, divided by the chance you get caught." Thus the amateur with a 50/50 chance should rob once, and that's it. [The authors are assuming information in a total no information game here, and situations like this where the information assumptions are unrealistic are somewhat useless to the reader.]
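[A back-of-envelope check of that rule, my own sketch: if each job succeeds with probability p and getting caught forfeits everything, the expected haul from planning n jobs is n * p**n (you keep the n rewards only if all n jobs succeed). The maximizing n lands near p / (1 - p), as the authors state:]

def expected_haul(n, p):
    return n * p**n          # keep all n rewards only if every job succeeds

for p in (0.5, 0.9):
    best_n = max(range(1, 100), key=lambda n: expected_haul(n, p))
    print(f"p = {p}: rob {best_n} time(s); p/(1-p) = {p / (1 - p):.0f}")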
27ff On not giving up, ever; on situations where there is no optimal stopping rule: see for example hypothetical games like triple-or-nothing, where the positive expected return from continuing never goes away. Also on avoiding certain games entirely: not just stopping, but not even starting.
28ff "Always be stopping" [I wonder if the authors are making a reference here to the movie Glengarry Glen Ross]. The authors argue that in general humans do not follow the optimal strategies, they cite research from an optimal stopping experimenter in the laboratory: people tend to leap sooner than they should, they don't gather quite enough data, and they have an urge to commit to quickly that we should control, don't take the first apartment for example; also a discussion here of endogenous time costs of searching which are usually left out of laboratory experiments; also a discussion of how life and the flow of time "turns all decision making into optimal stopping." This gets into the human condition and choosing things like the right time to open a bottle of wine or to kiss someone; since there's an "inexorable one-way march" of time, we are forced to decide based on possibilities we haven't seen and we have to embrace high rates of failure; and although we may get similar choices again, usually we never get the exact situation again, ever, thus hesitation or inaction "is just as irrevocable as action." [Note in the footnotes to page 29 there is a reference to Herbert Simon describing human choice in terms of "satisficing." I first stumbled onto this idea in Barry Schwartz's book The Paradox of Choice, and it changed permanently how I choose most things.]
Chapter 2: Explore/Exploit: The Latest vs. the Greatest
31ff On deciding between new things or sticking with our favorite things; ordering at a restaurant, choosing a restaurant, etc. "We intuitively understand that life is a balance between novelty and tradition, between the latest and the greatest, between taking risks and savoring what we know and love." In computer science this problem is called the explore/exploit tradeoff.
32 Some definitions here: "explore" simply means gathering information; "exploit" means using the information you have to get a good result.
32 Interesting thoughts on how it's actually a special kind of hell to constantly be exploring: the authors give the example of a music journalist constantly listening to new music, which turns out to be a really sucky experience.
33ff On "multi-armed bandit" problems, like walking into a casino and seeing all kinds of different slot machines with all kinds of payoffs, but you don't know the odds of any of the games until you start playing. This is a metaphors/problem type for things like choosing a restaurant, or choosing a type of music to listen to; but also it refers to a way of thinking about how our goals in our explore/exploit decisions change as we age; also on the central heuristic that we shouldn't always try to choose the best thing. "A sobering property of trying new things is that the value of exploration, of finding a new favorite, can only go down over time, as the remaining opportunities to savor it dwindle... The flip side is that the value of exploitation can only go up over time... So explore when you will have time to use the resulting knowledge, exploit when you're ready to cash in. The interval makes the strategy."
35 Useful point here that you can infer things about a system based on the explore/exploit nature of it: see for example Hollywood and the fact that it makes tons of sequels; this might indicate something about it (probably that it is in its decline stage) because it's purely in an exploit-focused phase.
36ff Strategies: the win-stay, lose-shift algorithm; also on the Gittins index: "git while the Gittins's good"; note the Gittins index tables on pages 40 and 41 (with Gittins index values under a 90% discount rate and a 99% discount rate): there's a positive payoff to exploring, especially early on, but you can see how the trade-off changes if you more heavily discount future rewards. [Note also the Least Failures Rule, cited in the footnotes to page 41 (on page 275), which says to always choose the option that's failed the least number of times: first, stick with a restaurant until it fails, then choose another restaurant at random, and then after testing as many restaurants as you can, go back to the ones with the most nights of successful dining; note in pragmatic terms, the authors argue that in any big city where new restaurants are opening all the time, the least failures rule says don't ever go back to a restaurant that lets you down: "there's too much else out there."]
42ff On simpler and more flexible strategies for dealing with the multi-armed bandit problem: regret minimization; quoting management theorist Chester Barnard, "To try and fail is at least to learn; to fail to try is to suffer the inestimable loss of what might have been." On Jeff Bezos using a regret minimization framework to decide to start Amazon and leave his job at D.E. Shaw [I love this heuristic, and I use it too; but lately I've been wondering if you should also flip/invert this idea: if you realize that regret is a normal byproduct of a life properly lived with decisions made under uncertainty (some of your decisions will be good, some bad), does this mean if you have no regrets, then you've been living too "safe" a life? In other words, does having regrets simply mean you have been properly testing the frontiers of a wide range of choices and decisions over the course of your life?]
44ff Upper Confidence Bound algorithms: here you pick the option for which the top of the confidence interval is highest; you look for the one-armed bandit that could reasonably perform best in the future, not the one that has performed best so far; this is a sort of exploring-uncertainty heuristic, or as the authors put it, "optimism in the face of uncertainty"; also on A/B testing: the authors make the interesting point that if you've used the internet, "then you've been a part of someone else's explore/exploit problem." Also comments here on refinements to the canonical A/B testing technique itself (splitting traffic evenly between two options, testing for a period of time, and then giving all traffic to the winner).
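[A minimal sketch of the UCB1 variant (my own code; the payoff probabilities are invented): play the arm whose optimistic estimate, observed mean plus an uncertainty bonus, is highest, so poorly explored arms get the benefit of the doubt:]

import math, random

def ucb1(true_means, rounds=10_000):
    n = len(true_means)
    pulls, wins = [0] * n, [0.0] * n
    for t in range(1, rounds + 1):
        if t <= n:
            arm = t - 1                   # play each arm once to start
        else:
            arm = max(range(n), key=lambda a: wins[a] / pulls[a]
                      + math.sqrt(2 * math.log(t) / pulls[a]))
        pulls[arm] += 1
        wins[arm] += random.random() < true_means[arm]    # Bernoulli payoff
    return pulls

print(ucb1([0.30, 0.50, 0.55]))   # the lion's share of pulls goes to the 0.55 arm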
48ff On clinical trials for drugs: this is yet another form of A/B testing; on ideas in the clinical trial industry of doing adaptive trials; an interesting example here of ECMO ventilators [extracorporeal membrane oxygenation] used this way with infants. [Note that ECMO use on adults is another interesting domain which sent me down yet another rabbit hole. See this sobering forum discussion on ECMO survival rates, and why people call ECMO an “automatic death sentence.” Basically, if you're sick enough to be put on an ECMO device you are already in deep trouble.]
50ff Discussion of the ethics of these ECMO trials: are we withholding life-saving treatment from one side of the A/B test? [I think the authors here are missing one nuance which is you have to have a control to see the longer term results as well: if an infant lives or dies obviously that's critically important, but you also need to understand instances where the ECMO may harm patients severely, producing "survival," but at a level that isn't worth it; think of this as an ensemble probability problem, you need to gather more longitudinal information, the initial A/B test isn't enough.]
52-3 On experiments showing that people overexplore. [Note here on the airline switching experiment that the authors cite on page 53: even I can come up with an extremely compelling reason not to switch airlines that the authors didn't consider as part of this decision: if you really think about what you're doing when you fly (you're going up into a thin aluminum tube 30,000 feet up in the air), you're much less likely to explore untried alternatives.]
55ff Insights here on why it may be important to explore early in life and exploit late in life; comments here on how it takes us years to become confident and autonomous; the authors quote a developmental psychologist saying humans' extended dependence "gives you a developmental way of solving the exploration/exploitation tradeoff": you can just explore because your parents take care of the payoffs. The authors also consider the idea that children's typical exploring behavior (always onto the next thing, pressing random buttons on things, jumping from thing to thing, etc., all of which we consider "childlike" and even cognitively deficient) may actually be more adaptive than it appears. [Interesting insight]
56ff On the elderly: interesting comments here on how they have fewer social relationships and that this is not necessarily bad; it's a function of pruning to avoid social and emotional risks late in life, and of focusing on the relationships that are the most meaningful. Thus "how much time you have left" is an important component of the decision-making environment. "We think of the young as stereotypically fickle; the old, stereotypically set in their ways. In fact, both are behaving completely appropriately with respect to their intervals."
Chapter 3: Sorting: Making Order
59ff This chapter starts with a funny anecdote about socks and sorting [this is a wonderful example of good writing: you take what could be a dry and boring discussion of sorting theory and you start it off by talking about socks, and this draws the reader in, makes him chuckle, coats the pill with some sugar.] Note in the footnotes for page 60, on page 279, there's an actual discussion of heuristics for sock sorting, the Radix Sort, where you sort the socks into piles based on color or pattern, thus you make one major pass, leaving a set of smaller piles, and then go through the individual piles to find matches; also another sort technique based on needing one pair of socks pulled out of a full drawer of random socks: you take the number of categories (in this case different color types; let's say you have three colors in total), you take four socks out (one more than the number of categories) and you'll automatically have one match. This is based on the mathematical idea of the Pigeonhole Principle: if there are more pigeons than holes there will always be at least one hole with more than one pigeon. Finally the authors' solution is to just buy one kind of sock; this is the most elegant approach, which redefines the problem so it can be easily solved.
60ff "Sorting is at the very heart of what computers do. In fact, in many ways it was sorting that brought the computer into being." On the development of the Hollerith Machine, introduced to handle data on punchcards for the US census in the late 1800s; Herman Hollerith's firm eventually became International Business Machines. Basically it was a paper form of a database; comments here on how sorting is "key to the human experience of information"; On the sorting that happens in human hierarchies, establishing rank, etc.
62 On the lack of scale/scale problems in sorting problems, thus "violating our normal intuitions about the virtues of doing things in bulk." The authors note that cooking for two is roughly as easy as cooking for one, but sorting 100 books is much more than twice as difficult as sorting 50 books: you have twice as many things to organize, and twice as many places each one can go.
63ff [It's worth noting here that these authors are about to make a discussion on sorting unbelievably readable and easily understandable to a lay reader, using the metaphor of sorting books on a shelf. Well done guys!]
63n Note the footnote for page 63 (page 280) on the most efficient way to sort a deck of cards: deal the cards into 13 piles based on their face value: one pile for all the aces, one pile for all the twos, etc., then after gathering all the piles, deal the cards out into the four suits: there will then be one pile for each suit, ordered by card. Very creative!
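[The footnote's two-pass card sort is a textbook radix sort: bucket by the less significant digit (rank) first, gather the piles in order, then bucket by the more significant digit (suit). A quick Python sketch of my own to verify it works:]

import random

RANKS = list(range(1, 14))                 # ace=1 ... king=13
SUITS = ["clubs", "diamonds", "hearts", "spades"]

def radix_sort_deck(deck):
    rank_piles = {r: [] for r in RANKS}
    for card in deck:                      # pass 1: 13 piles by face value
        rank_piles[card[0]].append(card)
    gathered = [c for r in RANKS for c in rank_piles[r]]
    suit_piles = {s: [] for s in SUITS}
    for card in gathered:                  # pass 2: 4 piles by suit
        suit_piles[card[1]].append(card)   # each suit pile fills in rank order
    return [c for s in SUITS for c in suit_piles[s]]

deck = [(r, s) for r in RANKS for s in SUITS]
random.shuffle(deck)
assert radix_sort_deck(deck) == sorted(deck, key=lambda c: (SUITS.index(c[1]), c[0]))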
64ff On measuring things to minimize the worst-case situation: Big-O notation. "Constant time," meaning Big-O of 1, or O(1): this would be a problem like cleaning your house before n guests arrive; the work is constant no matter how many guests come. Then Big-O of n, or O(n), would be a problem like passing a roast around the table; think of this as "linear time." Then O(n^2), quadratic time, which would be when your guests arrive and each one hugs all the others in greeting. And then exponential time O(2^n) and factorial time O(n!), where the problem gets dramatically worse with each additional element.
65ff Bubble sort, simple and intuitive but inefficient: scanning a list of books on your shelf and sorting out-of-order pairs over and over again until there are no more out-of-order pairs on the entire shelf; this is a quadratic time sorting problem; see instead Insertion sort: taking all the books off the shelf and putting the first book in the middle, and taking the second book and comparing it to the first and putting it either to that book's right or left. This is actually not that much faster, it's still in quadratic time.
67ff On collating, or as it's called today mergesort: this is actually faster than quadratic time: O(n log n); on small numbers this doesn't offer much of an improvement, but at census-level problems with millions or billions of items it is tremendously important, a gigantic improvement. The analogy using the bookshelf problem is like ordering a pizza and inviting over a few friends, dividing the books evenly, having each friend sort their own stack, then pairing people up and having them merge stacks, repeating this until there are just two stacks left, which are combined in a final merge at the end.
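[The pizza-and-friends procedure in minimal Python (my own sketch): split, sort each half, then merge two already-sorted stacks in one linear pass:]

def mergesort(books):
    if len(books) <= 1:
        return books
    mid = len(books) // 2
    left, right = mergesort(books[:mid]), mergesort(books[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):    # the merge: one linear pass
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]       # one stack ran out; append the other

print(mergesort(["Woolf", "Austen", "Tolstoy", "Borges", "Eco"]))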
70ff On the Bucket Sort algorithm: if you want to group n items into m buckets, the grouping can be done in O(nm) time; the assumption here is that you choose your buckets intelligently. There's an example given here of a library that groups books by the circulation statistics of the various branches to which books will be sent; the librarian knows the size of the various buckets and chooses accordingly: say, books with call numbers in the 3500s are all grouped together, while those below 3500 are put in smaller subgroupings.
72ff On the trade-off between sorting and searching: "The basic principle is this: the effort expended on sorting materials is just a preemptive strike against the effort it'll take to search through them later." Also: "Sorting something that you will never search is a complete waste; searching something you never sorted is merely inefficient." Thus you will want to estimate ahead of time what your usage will be. See for example how Google sorts most searches ahead of time. Another analogy here: you really don't need to sort your own home bookshelves. You already know where everything's going to be anyway. Also "as the cost of searching drops, sorting becomes less valuable." See the works from Steve Whittaker [in the reading list at the end of this interminable post] on various strategies for how to handle email.
73ff On the idea that not sorting, ever, is the optimal choice: sometimes it's more efficient to leave a mess based on the search/sort trade-off. On Charles Lutwidge Dodgson (Lewis Carroll) and his discussion of the single elimination tournament, a certain form of "mess" that exists in sports. Basically the second best player could be any of the players eliminated by the best player--not just the other finalist! Discussion here on single elimination tournaments versus round robin tournaments O(n^2), versus ladder tournaments, also O(n^2), all of which are methods for sorting players in rank order.
76ff On the idea of sports leagues wanting to maintain tension throughout the season, and not sort the teams per se. For example a league may want to make sure everybody plays everybody within their division in the last five weeks of the baseball season: this delays the resolution of uncertainty for the fans.
78ff On applying sorting algorithms to biology and ecosystems where there's a need for robustness: the bubble sort algorithm, although inefficient, is much more robust against noise than faster algorithms like mergesort. Also note the interesting footnote on page 78 where the NCAA gets around the biggest problem in single elimination systems by seeding the teams so that the top-ranked teams don't meet each other until the last rounds; also on the Comparison Counting Sort algorithm, which is the best algorithm for noisy systems; it's not time efficient but it is fault tolerant; this system strongly resembles a sports team's regular season, playing every other team and building up a win-loss record by which it gets ranked.
79ff Interesting discussion here of poker, and a tremendous insight from Isaac Haxton, a top poker player: "In some ways the most important skill as a professional poker player is to be able to evaluate how good you are. If you're anything short of the very best poker player in the world you can be pretty much assured of going broke if you are endlessly willing to play people better than you." [Just like in investing, make sure you're not the patsy, make sure you're not risking your entire bankroll on a specific situation, know your counterparties, think about who might be on the other side of your trade, etc.] With poker (and specifically heads-up poker) this has an analogy in nature with animals' social hierarchies: on the concept of displacement: when an animal uses its knowledge of the hierarchy to avoid a confrontation. This means establishing an order ahead of time; it's a much less violent system, and it works with food, with mating opportunities, etc. Pecking orders are an example of the search-sort tradeoff. Another insight from nature: there are fewer confrontations when animals are better able to reason and remember what the pecking order actually is. And then back to heads-up poker: most of the top poker players have a consensus about who the best 20 or so players are, but whenever those rankings differ in the minds of certain players, a cash game (a "conflict") will ensue.
82ff Finally moving from ordinal numbers to cardinal numbers: consider a marathon where everybody runs together, there's just one race, so you don't need repeated head-to-head matchups to establish a sorted hierarchy. You just need a benchmark--in the case of the marathon a clock; see also the amusing example from the maritime world: the so-called "Law of Gross Tonnage": the smaller ship always gets out of the way of the bigger ship!
Chapter 4: Caching: Forget About It
84ff The authors begin this chapter with a metaphor of organizing your clothes, quoting Francine Jay in her book The Joy of Less, quoting Martha Stewart etc. [It's unfortunate that our authors haven't read Marie Kondo's wonderful book The Life-Changing Magic of Tidying Up.] Next a discussion of another field that thinks obsessively about storage: the computer science of memory management, "the dual problem of what to keep and how to arrange it." On the concept of caching; [I'd like to mentally send the authors a sincere thank you for quoting Lydia Davis on page 85, and sending me down a rabbit hole of her inventive and unusual ultra-short stories]; On the tradeoff of size and speed; hard disk drives versus solid state drives; on the 1946 idea from Arthur Burks, Herman Goldstine and John von Neumann of a memory pyramid with a small, fast memory and a large, slow memory; this wasn't developed in reality until the 1960s, and it eventually evolved into the idea of the cache in the IBM supercomputers of the 1960s; today hard drives, operating systems and web browsers all have caches that store information likely to be used frequently, which speeds up the operation of everything.
87ff As the cost of computing has gone down the cost of accessing memory has not improved [note that this is different from the cost of memory storage which has an even steeper declining cost curve than computing costs via Moore's Law]; worse you have an incredibly fast processing speed but your processor spends most of its time "twiddling its thumbs" waiting for information to come over the "memory wall." Thus we develop layers of caches in most laptops, tablets, smartphones, etc. to deal with this problem.
88ff On what to do when your cache fills up--or your closet for that matter; on the computer science idea of cache replacement or cache eviction; on Laszlo Belady, a Hungarian-born engineer who wrote an important paper in 1966 on caching algorithms that attempt to minimize the number of times you can't find what you're looking for in cache, and thus must go to the slower main memory to find it. These are called page faults or cache misses; the optimal (and hypothetical) rule is to evict from cache whatever we will need again the longest time from now, called Belady's Algorithm in tribute to him, a so-called clairvoyant algorithm [because in the idealized example you already know what you will need to find]; on non-clairvoyant algorithms to try to address caching problems: random eviction, first in first out (FIFO), least recently used (LRU); note that LRU performs the best. See also analogies in the real world, in, say, a library: think of where the library might store the most commonly checked-out authors and books--we can think of that as a type of cache. "Libraries in themselves, with their various sections and storage facilities, are a great example of a memory hierarchy with multiple levels. As a consequence, they face all sorts of caching problems." "...almost all use a version of LRU." Which brings us to an irony, the library assistants who put books back on the shelves actually make the system less ordered on some level [the metaphor here is they take the books out of cache (near the checkout) and put them back in memory (back up in the stacks)]. The conceptual idea here is to "turn the library inside out." [Note that in the footnotes there's a list of the most commonly checked-out books at the UC Berkeley library, and I can't help but suggest you should probably think of this as an "anti-library": a person who wants to avoid consensus thinking will make sure he reads books that are not checked out that often, not read that often--in fact if you find you're reading a book that hasn't been checked out in 10 years or more, consider it a good sign.]
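[A minimal LRU cache in Python (my own sketch, using an ordered dict): on a hit, move the item to the "most recently used" end; when full, evict from the other end. Python also ships functools.lru_cache, which wraps the same idea for function calls:]

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                        # a cache miss / "page fault"
        self.items.move_to_end(key)            # refresh recency on every hit
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)     # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2); cache.get("a"); cache.put("c", 3)
print(list(cache.items))                       # "b" was evicted: ['a', 'c']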
90 Note the footnote for page 90 (on page 283-284) where there's a list of variants of the LRU caching system: LRU-K which looks at the time elapsed since the K-th most recent use (for example, LRU-2 would focus on the penultimate use)--this is the most common version; also the 2Q system, which uses two queues, promoting something from the first to the second queue if they're referred to again while they're in the cache, and then items are evicted from the second queue back into the first queue using LRU, which also is used to evict items from the first queue; also LRFU which combines recency and frequency; finally ARC or Adaptive Replacement Cache, which uses two queues like the 2Q system, but adapts the length of the queues based on performance.
92ff On thinking about caching at different fractal levels: for example, thinking about it at the micro architecture of the computer, where you put memory closer to where it needs to be used (it makes a difference even across the distance of a few inches); and then thinking about content on the internet, see the company Akamai, which runs caching services, storing content in various locations to increase the apparent speed of the internet (for example if you're in Australia you would use websites cached in your region even though the sites can be located anywhere); and then considering caching in the physical world: for example Amazon will use caching techniques for storing merchandise in fulfillment centers near where the buyers are; see also Amazon's patent for "anticipatory package shipping" where they look at purchasing data and store products popular in a given region in warehouses close to that region; this even shows up in Netflix where Netflix will cache certain movies in places where those movies are popular. [It's interesting how many situations this idea fits to.]
94ff On applying these caching insights to daily life: typically LRU is the best system, where you dispose of the least-used thing; secondly, consider ways to exploit geography, put the things close to where they're used: the book gives an example of a woman keeping vacuum cleaner bags behind the couch; and then thinking about levels of caching; for example you'd have one cache level in your closet, another cache level in your basement, a self-storage space is a third level: think of these three levels as listed in decreasing order of access speed. Thus you put the least-used things the furthest from where you are using the LRU principle.
96ff Thoughts on putting things or documents into literal piles, quoting Yukio Noguchi and his book Super Organized Method: whatever file you're using goes into one big box, with the most recently used file or document at the front (or left-hand side) of the box (or file cabinet); whenever you're done with a file, just put it back in front: think of this as a pile of documents set on its side. It is thus an extension of the LRU principle, plus it adds the computer science concept of self-organizing lists: sort the files on your computer by date of most recent use, and whatever document you use, don't put it back where it "belongs," just put it on top. Thus the Noguchi filing system is an instantiation of the LRU principle, and it is near optimal.
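[The self-organizing list behind the Noguchi system, sketched in a few lines of my own Python: search linearly from the front, and whatever you use goes back at the front, so frequently and recently used files migrate toward the fast end on their own:]

def access(box, name):
    for i, item in enumerate(box):     # linear search, front to back
        if item == name:
            box.insert(0, box.pop(i))  # move-to-front on every use
            return item
    raise KeyError(name)

box = ["taxes", "insurance", "recipes", "manuals"]
access(box, "recipes"); access(box, "taxes"); access(box, "recipes")
print(box)   # ['recipes', 'taxes', 'insurance', 'manuals']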
99ff Very interesting thoughts here on the human mind, and how it also is a self-organizing caching system; on the Ebbinghaus forgetting curve and how it strikingly resembles the LRU caching system: stuff further off in time starts to fade in our mind, stuff that we recently used or accessed frequently stays there. The authors bring out yet another interesting point here: reality actually looks like this--our brain has a statistical structure that mimics the statistical structure of reality: it remembers things that are frequent and forgets things that aren't.
102ff Further interesting ideas on how certain types of cognitive decline as we age are basically lags and retrieval errors. It's not necessarily a slowing or deteriorating search process; rather it's possibly an unavoidable consequence of the amount of information we have to navigate getting bigger and bigger: there's a much larger store of memories we are trying to access, so it's yet another type of caching problem. "It's not that we're forgetting; it's that we're remembering. We're becoming archives... We say 'brain fart' when we should really say 'cache miss.'"
Chapter 5: Scheduling: First Things First
105ff A whirlwind three page tour here of scheduling systems, starting with Getting Things Done and Eat That Frog, etc., and then backing up to Frederick Taylor in the 1870s, then Henry Gantt in the 1910s who gave rise to the well-known Gantt charts used for large projects; then on to Selmer Johnson at the RAND corporation who opened up an inquiry into optimizing scheduling starting with a system to minimize downtime between two machines like a washer and a dryer; then the authors move to the most important machine to optimize scheduling with, ourselves.
108ff Various heuristics for ordering tasks: Earliest Due Date: basically, start with the task due soonest and work your way toward the task due last. Then, considering other strategies when "not being late" is not the only metric: for example, Moore's algorithm tells you to throw out the task that you won't get to in time, or to rank tasks by the severity/downside of throwing each one out. Also: Shortest Processing Time: always do the quickest task you can, with various versions here, like minimizing customer wait times; see also debt paydown methods like the debt snowball versus the debt avalanche: you pick your metric (the highest interest rate or the smallest amount of debt, for example), and this is like applying weighted completion times, or task density, to the shortest processing time algorithm.
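[The core single-machine heuristics as Python one-liners (task data invented by me): Earliest Due Date minimizes maximum lateness; Shortest Processing Time minimizes the sum of everyone's completion times; dividing importance by duration gives the weighted "task density" version:]

tasks = [("email", 1, 9), ("report", 5, 6), ("slides", 3, 7)]   # (name, hours, due date)

by_due_date = sorted(tasks, key=lambda t: t[2])     # Earliest Due Date
by_duration = sorted(tasks, key=lambda t: t[1])     # Shortest Processing Time

# Weighted Shortest Processing Time: importance per hour, the same logic
# as paying down debts by interest rate rather than by balance.
weighted = [("email", 1, 2), ("report", 5, 9), ("slides", 3, 3)]  # (name, hours, weight)
by_density = sorted(weighted, key=lambda t: t[2] / t[1], reverse=True)

print([t[0] for t in by_due_date])   # ['report', 'slides', 'email']
print([t[0] for t in by_duration])   # ['email', 'slides', 'report']
print([t[0] for t in by_density])    # ['email', 'report', 'slides']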
112ff On priority inversion and priority inheritance problems; see the Mars Pathfinder spacecraft, where a high-priority task was blocked by a low-priority task that held a shared resource; the fix, priority inheritance, is to let the blocking low-priority task temporarily inherit the priority of the task it's blocking; see also the joke from Mitch Hedberg, "I was at a casino, I was minding my own business, and this guy came up and said, 'You're going to have to move, you're blocking the fire exit.' As though if there was a fire, I wasn't going to run." In this case the bouncer's argument was priority inversion, Hedberg's rebuttal was priority inheritance. If there's a fire he's going to run!
115ff On Gene Lawler, an expert on precedence constraints; if you're using Earliest Due Date, you are going to have nuances and problems if some tasks cannot be started until other tasks are finished. On how some of these problems make scheduling "intractable."
118ff On task switching or preemption; examples here would be using the preemptive version of earliest due date, you switch to a job that comes up if it's due sooner than the one you're currently doing, otherwise ignore it. Likewise shortest processing time: compare the current task to the time it would take to complete the new one and switch if the new one is less. Note that preemption isn't free: there's a switching cost, "metawork" as the authors call it. In the human brain this is when delays, errors and lost context happen.
121ff On "thrashing": the authors at first describe thrashing metaphorically: a juggler can handle so many balls but then he drops all the balls if there's one too many; essentially the system crashes; computers operate like this, when scheduling theory intersects caching theory; thrashing is basically when a computer gets stuck doing metawork instead of actual work, taking a cache out of the working set of needed items and putting it back; this is basically task interruption and task switching to such an extent that the computer actually is not even working on any task at all. Does this sound familiar to a human being? It is a "very recognizable human state."
123ff We know what thrashing is, but how do you get out of a thrashing state? An easy solution is to just pick a task, any task, and work on it, regardless of where it belongs in the queue or how important it is. Just stop task switching and work on the thing in front of you. The very act of choosing what to do next is usually the biggest source of metawork: repeatedly scanning through a huge inbox to figure out which message is most important takes O(n^2) time, versus just starting at the top and working down. "In a thrashing state, you're making essentially no progress, so even doing tasks in the wrong order is better than doing nothing at all."
124ff On the concept that responsiveness and throughput are not compatible aspects of real-time scheduling. On the idea that the best strategy is, paradoxically, to slow down. [See here the short and helpful book The Practicing Mind by Thomas M. Sterner for more discussion of moving slowly and mindfully, and how paradoxically it causes you to work more efficiently.] Thus spend more time on each individual task. Note how the human brain functions better if it spends at least several minutes, if not much longer, on something before switching back to metawork/thinking about the next task. See also the idea of choosing a minimum amount of time to spend on any one task. You see this idea applied in timeboxing or pomodoros; in computers, in the idea of "you won't even notice while I'm gone" (metaphorically: "I'm going to go out and run a quick errand and I'll be back before you know it"), where the computer task-switches in such a way that the user doesn't see jittery or slow performance; the computer is "back" before we notice it was gone. Thus computer science applies psychology [or really cognitive science] to this domain.
125ff On "interrupt coalescing": the analogy here is to designate one day per month as your bill paying day, or in academia the idea of having office hours that will "coalesce" all of the interruptions from students that might happen over the course of a week. Another analogy would be to check your messages once a day, or with some level of periodicity such that you can do deep work for extended periods during the day. The authors also include a witty footnote here on page 126 about how with a standard operating system filled with dialog boxes and error messages and pop up messages of all kinds, the computer's "user interface demands the user's attention in a way that the CPU itself would rarely tolerate." Also finally on "do not disturb" buttons or other techniques for minimizing context switching or task switching: like batch processing your mail every six months or scheduling a weekly meeting to handle minor things, and thus limit spontaneous interruption at random times during the week when we're actually working.
127 Interesting thought here about the rhythm of the postal system and how it "demands minimal responsiveness from you" because it doesn't matter whether you reply 5 minutes, 5 hours or 23 hours and 59 minutes after receiving the letter. [This helps the reader think on a meta-level: what are the "response demands" of a system and how can I tweak them (or tweak myself) to adjust and function best within that system?]
Chapter 6: Bayes's Rule: Predicting the Future
128ff On making decisions based on a single data point: something that seems absurd on one level, but is necessary in many life situations; examples: deciding to wait or not wait for a bus in a foreign city; see how this chapter starts with J. Richard Gott, standing at the Berlin Wall in 1969, wondering how long it will last; these are examples of making "an inference from the smallest amount of data we could possibly have: a single observation."
129 On Thomas Bayes, an English Presbyterian minister. [Note this really good example of writing here from the authors: "For somebody who has had such an impact on the history of reasoning under uncertainty, Bayes's own history remains ironically uncertain." Good use of parallelism, good job mooring the idea and the person in the reader's mind; now we know multiple things about uncertainty as well as the guy himself, etc. Good writing!]
130ff Bayes' insight was seeing that it's backwards to reason from comparing winning and losing tickets pulled out of a pool; instead reason forward from hypotheticals. In other words, if you draw three straight winning lottery tickets, you can work out certain probabilities: let's say you assumed that half the tickets were winners, then the odds of drawing three winners in a row are 1/8 (1/2 * 1/2 * 1/2). If just one in a thousand tickets were winners then the odds of this happening are 1/1000 cubed or one in a billion. Thus you can reason about the likelihoods of the makeup of the tickets collectively. Thus you can reason about a probability of probabilities while still not actually knowing the underlying probability of the raffle itself!
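[The authors' lottery reasoning reduced to arithmetic (my own sketch; I assume just two hypotheses about the pool, equally likely beforehand): observe three straight winners, then apply Bayes's Rule:]

priors = {"half are winners": 0.5, "1-in-1000 are winners": 0.5}
likelihoods = {"half are winners": 0.5**3, "1-in-1000 are winners": 0.001**3}

evidence = sum(priors[h] * likelihoods[h] for h in priors)
posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}
print(posteriors)   # the "half are winners" hypothesis wins overwhelmingly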
131ff Pierre-Simon Laplace: On Laplace's Law ("Want to calculate the chance your bus is late? Count the number of times it has happened in the past plus one, then divide by the number of opportunities plus two."), his book Philosophical Essay on Probabilities, per the authors "arguably the first book about probability for a general audience and still one of the best."
134ff Now the authors double back to Richard Gott at the Berlin Wall (p128), making what would be a "Copernican" assumption about time (Copernicus basically claimed the Earth wasn't anyplace special, certainly not in the middle of the universe): Gott decides that, time-wise, the moment he happens to be there is likely not any special point in the Wall's total lifetime. Thus if any moment was equally likely, his arrival should fall, on average, around the halfway point; thus something will likely last about as long as it's lasted already [this sounds a lot like the Lindy effect!]. Thus we then have the Copernican Principle as an algorithm used to make predictions about all sorts of topics without any preconceived expectations. Think of the Copernican Principle as a specific instance of Bayes's Rule. "When Bayes's Rule combines all these probabilities--the more-probable short time spans pushing down the average forecast, the less-probable yet still possible long ones pushing it up--the Copernican Principle emerges: if we want to predict how long something will last, and have no other knowledge about it whatsoever, the best guess we can make is that it will continue just as long as it's gone on so far." Note that this was replicated in a few different interesting instances, including looking at serial numbers of captured tanks in World War II (this led to statistical estimates that the Germans were building 246 tanks per month; the true number turned out to be 245).
138ff On Gaussian distributions, for normally distributed things like human height or life expectancy; Power Law distributions (or scale-free distributions), where most things are below the mean with a few enormous things above the mean: thus there's no real sense of what "normal" is in such a distribution; this takes us to yet another algorithm, the Multiplicative Rule: using Bayes's Rule but with a multiplier to take into account a power law distribution; with an uninformative prior the constant factor is 2, thus the Copernican Principle, but in other power law cases you use a multiplier that depends on the distribution you're actually working with; box office receipts for movies have a multiplier of 1.4, for example.
141 On the Erlang Distribution, which gives us another prediction rule: the Additive Rule: basically, however long something has gone on, it will go on a constant amount longer; a type of "memoryless" distribution that yields the same prediction no matter the history or current state of the phenomenon. Note that Agner Krarup Erlang developed this notion while working for the Copenhagen Telephone Company, trying to estimate how much time would pass between successive calls on a phone network; this idea has been used by urban planners for pedestrian and car traffic, network architecture for the internet uses this principle too, etc. Think about your wife (not mine of course!) saying "Just five more minutes!" when you ask her when they'll be ready to leave... but she keeps saying "Just five more minutes!" five minutes later, and later, and later.
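[The chapter's three prediction rules side by side, plus Laplace's Law, in a few lines of my own Python; t is how long something has lasted so far, and each of the first three functions predicts the total lifespan (the 1.4 multiplier is the authors' movie-grosses figure; the additive constant here is invented):]

def copernican(t):             # power law, uninformative prior: double it
    return 2 * t

def multiplicative(t, k=1.4):  # power law with a known multiplier
    return k * t

def additive(t, c=5):          # Erlang/memoryless: a constant amount more
    return t + c

def laplace(wins, tries):      # Laplace's Law for estimating a probability
    return (wins + 1) / (tries + 2)

print(copernican(8), multiplicative(8), additive(8))   # 16, 11.2, 13
print(laplace(3, 4))                                   # bus late 3 of 4 times -> ~0.67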
For the geeks out there
144ff On how humans tend to be remarkably good at using Bayesian-type rules for predictions in the real world; the authors claim that it's because we carry around in our heads surprisingly accurate knowledge about distributions without perhaps realizing it.
145ff Note here a pretty interesting debunking/non-replication of the famous Stanford marshmallow experiment, where University of Rochester researchers ran a prior test before the marshmallow test where the researchers acted in an unreliable manner, and the children were then much more likely to eat the marshmallow: in other words the study didn't show anything to indicate their ability to be patient or to "wait for the second marshmallow," but rather the children acted on surprisingly accurate information about the reliability of the researcher: they (correctly) assumed that the researcher would not show up with a second marshmallow, thus it was totally rational to eat the first one. [Anyone who's read anything I've written over the years already knows that I actually think very little of "studies-show science" so it shouldn't be surprising that one of the most famous experiments in psychology can't be replicated. Eventually we all reach the ultimate truth: "studies-show science" is not science--not even close.]
147ff The chapter closes with interesting musings about language and how we talk about our experiences to others and how our talk skews reality for the people we speak to: if you want to maintain accurate priors you have to remember that people (and media) tend to tell stories that are interesting/alarming, etc., thus you will tend to hear news about outlier events. "If you want to be a good intuitive Bayesian--if you want to naturally make good predictions, without having to think about what kind of prediction rule is appropriate--you need to protect your priors. Counterintuitively, that might mean turning off the news."
Chapter 7: Overfitting: When to Think Less
151 On the risks of assuming that more information and more thinking are always better, that they will help you make a better decision; machine learning research--"the science of teaching computers to make good judgments from experience"--shows this is not so.
153 "It is indeed true that including more factors in a model will always, by definition, make it a better fit for the data we have already. But a better fit for the available data does not necessarily mean a better prediction." [This is the central problem of overfitting or backfitting.]
156ff Various examples of overfitting in the real world: see for example how fencing has changed because of the scoring mechanisms; optimizing ad impressions on websites; other examples of "juking the stats" [although the authors don't use this expression]. Also sobering examples of "training scars" where police and FBI, after grooving specific training behaviors on the range, replicate certain ancillary behaviors in the real world, like interrupting a gunfight to put spent casings in their pockets.
159 On solutions for this: cross-validation, where you hold back certain data or use out-of-sample data to test an existing model for overfitting; the authors give an interesting example in teaching, where everyone is given a standardized test but one student in 100 is given an oral exam or an essay exam to cross-validate, to make sure that the students were actually acquiring knowledge, not just acquiring test-taking skills. Also on the idea of requiring simplification or penalizing complexity: see in statistics the technique of regularization, and Robert Tibshirani's Lasso algorithm, which puts downward weight on the factors in a model, eliminating some of the [probably spurious] factors entirely. See also in nature: you have "the burden of metabolism," which imposes "a caloric penalty for overly elaborate machinery." Thus animals have a constant incentive to avoid elaborate machinery and to avoid overfitting themselves to a given environment that may change in the future. [Good way to think about it.]
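[The Lasso's complexity penalty in action (my own sketch, assuming scikit-learn is installed; the data is synthetic): ten candidate factors, only two of which actually matter; the L1 penalty pushes the spurious coefficients all the way to zero:]

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 200)   # the other 8 factors are pure noise

model = Lasso(alpha=0.1).fit(X, y)
print(np.round(model.coef_, 2))   # big weights on the first two factors, ~0 on the rest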
162 On using heuristics: see Harry Markowitz, who developed Modern Portfolio Theory but, when it came to his own money, ultimately chose a simple 50/50 stocks/bonds split! [This is one of the more disturbing examples of "I don't care what you think, just tell me what's in your portfolio" that I've ever heard; I did not know this story. Not only is MPT viewed as a joke among real-world practitioners, but even its inventor doesn't use it! Jesus.]
164ff On slowing down the adaptation process, slowing the absorption of new data: contrasting the churn of food fads with the way organisms evolve; you don't want an organism to overfit itself to an environment that may change. The authors take this insight and suggest a heuristic of having a bias in favor of history and avoiding new things. [Avoiding the bandwagon and avoiding anything "in fashion" is likely an equally useful heuristic; or as the BowTiedBull says, harshly: "do the opposite."] The computer science manifestation of this is the "regularization technique known as Early Stopping"; it shows up in human behavior as well, where giving yourself more time to decide something and considering more pros and cons may just mean more overfitting. The authors give an example here of one of them overfitting his class preparation, but then later teaching a class, with far less preparation, that the students liked far more; the latter class was less detail-focused, cutting much of the nitty-gritty from the lectures. He had been overfitting to the metric of his own taste in the class, assuming it was the students' taste too; thus you don't want to overfit to a metric that may just be a proxy metric.
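[Here's a small early-stopping sketch of my own, assuming a simple gradient-descent polynomial fit: hold out a validation set and stop training once validation error stops improving, rather than grinding all the way to the best possible fit on the training data.]

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy data plus a degree-8 polynomial model: plenty of room to overfit.
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=40)
X = np.vander(x, 9)                    # columns x^8 ... x^0

val = np.arange(40) % 4 == 0           # hold out every fourth point
Xt, yt, Xv, yv = X[~val], y[~val], X[val], y[val]

w = np.zeros(9)
best_err, best_w, patience = np.inf, w, 0
for step in range(20000):
    w = w - 0.05 * 2 * Xt.T @ (Xt @ w - yt) / len(yt)   # gradient step
    val_err = np.mean((Xv @ w - yv) ** 2)
    if val_err < best_err:
        best_err, best_w, patience = val_err, w.copy(), 0
    else:
        patience += 1
        if patience > 200:             # validation stopped improving: quit
            break
print(f"stopped after {step} steps; validation error {best_err:.4f}")
```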
168 The authors return to Darwin as he considered the pros and cons of marriage: because he confined himself to a single page of points, "he was regularizing to the page"--a form of early stopping.
Chapter 8: Relaxation: Let It Slide
169ff On "tractable" versus "intractable" problems: problems that can be solved in problems that likely cannot; on the traveling salesman problem which is in the class of intractable problems; on computer science solutions for these types of problems by using Constraint Relaxation: for example with the traveling salesman problem you eliminate the constraint that you can't go to the same town twice, this provides you with a "spanning tree" solution which is very close to the actual solution. Often computer science algorithms can relax certain constraints and yet still figure out that the solution is within a certain percentage of the true solution (which still can't be found), but this may offer information about the actual solution; it also may show you that you've solved the vast majority of the problem in a tiny fraction of the time.
176ff On Continuous Relaxation: relaxing a problem from discrete integers to continuous quantities: see a problem like calculating the fewest fire stations a community needs so that every house is reachable within five minutes: instead of requiring whole numbers, you can use "fractions (even tiny fractions) of a firehouse." "Half a fire truck" or "half a fire station" means nothing physically, but it allows you to approximate the solution, and then translate it back into reality by rounding certain fractions up.
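[Here's a toy version of the fire station problem--my own illustration using scipy's linear programming solver, not the book's numbers: solve with fractional stations allowed, then round the fractions back up into whole stations.]

```python
import math
import numpy as np
from scipy.optimize import linprog

# Five houses, five candidate sites (a toy instance of my own):
# cover[i][j] = 1 if a station at site j reaches house i in five minutes.
cover = np.array([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 1],
])

# The relaxation: allow fractional stations x_j in [0, 1]; minimize
# their sum subject to every house being covered (cover @ x >= 1).
res = linprog(c=np.ones(5), A_ub=-cover, b_ub=-np.ones(5),
              bounds=[(0, 1)] * 5)

print("fractional stations:", np.round(res.x, 2), "total:", res.fun)
# The fractional optimum bounds the real answer from below...
print("any real plan needs at least", math.ceil(res.fun - 1e-9), "stations")
# ...and rounding every fraction >= 1/2 back up to a whole station is
# guaranteed feasible here (each house is reachable from two sites),
# and provably within a factor of two of the true optimum.
print("rounded plan:", (res.x >= 0.5 - 1e-9).astype(int))
```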
178ff On Lagrangian Relaxation: named after the mathematician Joseph-Louis Lagrange. The authors give an example here of a time when, as a child, one of them was complaining about his chores, and his mother said: "you don't have to do your chores, but then you have to deal with the consequences." So the idea is to take some of the problem's hard constraints and turn them into mere costs: like allowing more than 10 people to be seated at a wedding table (relax it up to 12, say), where the "cost" is a lack of elbow room (or whatever); then you can see what this particular solution provides, and usually it's quite close to the actual solution--which you couldn't otherwise directly find. This applies to, say, league game scheduling problems: if you can relax certain constraints, you get to a schedule that's very close to the optimal one. (Note that continuous relaxation doesn't really help with league scheduling, since working with "fractional" games means little.) Also a useful comment towards the end of the chapter, again on Lagrangian relaxation: just imagining a constraint taken away can illuminate possibilities you didn't think about before.
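[A minimal sketch with my own made-up numbers: the hard 10-per-table rule becomes a mere price, and any optimizer is then free to weigh breaking the rule against the "elbow room" cost.]

```python
TABLE_CAP = 10
ELBOW_ROOM_PENALTY = 4.0   # hypothetical cost per guest over capacity

def happiness(tables):
    """Stand-in scoring function: count pairs of guests seated together."""
    return sum(len(t) * (len(t) - 1) / 2 for t in tables)

def lagrangian_score(tables, lam=ELBOW_ROOM_PENALTY):
    # The relaxed objective: the capacity rule is no longer a hard
    # constraint, just a price paid for each guest squeezed in.
    overflow = sum(max(0, len(t) - TABLE_CAP) for t in tables)
    return happiness(tables) - lam * overflow

strict = [list(range(10)), list(range(10, 20))]    # obeys the cap
relaxed = [list(range(12)), list(range(12, 20))]   # squeezes 12 at one table
print("strict :", lagrangian_score(strict))
print("relaxed:", lagrangian_score(relaxed))
print("relaxed, if elbow room is cheap:", lagrangian_score(relaxed, lam=1.0))
```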
Chapter 9: Randomness: When to Leave It to Chance
182ff On deterministic algorithms versus randomized algorithms, which use randomly generated numbers to solve a problem. On estimating values and distributions via sampling: see for example Laplace and his work on Bayes's Rule; on the Monte Carlo method of using sampled simulations instead of exhaustive probability calculations, used for example on the Manhattan Project, building on insights from Stanislaw Ulam.
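[The Monte Carlo method in one breath--my standard toy example, not the book's: estimate pi by throwing random darts at a square, rather than doing any exact geometry.]

```python
import random

random.seed(4)

# Sample a million random points in the unit square; the fraction that
# lands inside the quarter circle estimates pi/4.
n = 1_000_000
hits = sum(1 for _ in range(n)
           if random.random() ** 2 + random.random() ** 2 <= 1.0)
print("pi is approximately", 4 * hits / n)
```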
186ff Michael Rabin and his work on prime numbers, which seemed useless until cryptography became a big thing late in the 20th century; on one-way functions: multiplying primes is easy, factoring the product back apart is hard; the Miller-Rabin primality test, which "provides a way to quickly identify even gigantic prime numbers with an arbitrary degree of certainty."
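[Here's a compact sketch of the Miller-Rabin test itself; each random round that fails to find a "witness" of compositeness cuts the chance of error by at least a factor of four, so a few dozen rounds gives certainty for all practical purposes.]

```python
import random

def is_probably_prime(n, rounds=40):
    """Miller-Rabin: each round that fails to find a witness of
    compositeness cuts the chance of error by at least a factor of 4."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):   # knock out small factors first
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:                # write n - 1 as d * 2^r, d odd
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)             # fast modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False             # found a witness: definitely composite
    return True                      # no witness found: almost surely prime

print(is_probably_prime(2**89 - 1))  # a Mersenne prime: True
print(is_probably_prime(2**89 + 1))  # composite: False
```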
193ff On the three-way trade-off of time (sorting versus searching) and space (caching), now adding a third variable: certainty; trading away certainty saves you both time and space. On Bloom filters, named after Burton H. Bloom: "if you're willing to tolerate an error rate of just 1% to 2%, storing your findings in a probabilistic data structure like a Bloom filter will save you significant amounts of both time and space." An example might be a browser checking URLs against a list of malicious websites to be "mostly sure" that a site is safe. [Also, allegedly there is a use case here with Bitcoin, although the authors say nothing at all about it. If you look it up, you see that a very thin client (an "SPV" wallet, presumably) can use Bloom filters to verify a transaction without having to examine the entire blockchain; note also that these filters are apparently no longer used with Bitcoin, due to privacy and security weaknesses.]
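[A minimal Bloom filter sketch of my own--bit-array size and hash count picked arbitrarily: it can answer "definitely not present" or "probably present." False positives happen occasionally; false negatives never.]

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions each set one bit per item.
    Lookups say "definitely not present" or "probably present"."""

    def __init__(self, n_bits=1 << 20, k=7):
        self.n_bits, self.k = n_bits, k
        self.bits = bytearray(n_bits // 8)

    def _positions(self, item):
        # Derive k independent positions by salting one hash function.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bad_sites = BloomFilter()
bad_sites.add("malicious.example.com")
print(bad_sites.might_contain("malicious.example.com"))  # True
print(bad_sites.might_contain("safe.example.org"))       # almost surely False
```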
194ff Hills, valleys and traps: an interesting discussion starting with a sample problem of traveling across ten cities the cheapest possible way, and the different techniques you could use, like the greedy (or "myopic") algorithm, where at every point you simply take the best step available; or hill climbing, where you take a complete itinerary and keep trying small tweaks and permutations, keeping any that improve it. Note that these algorithms will sometimes stop you at a local maximum that might not be the global maximum (the authors describe a lobster trap as a good example of a local maximum); on augmenting hill climbing with what is known as "jitter": make a few random small changes, then go back to hill climbing and see if you end up at a higher peak; also the random-restart hill climbing algorithm, where you completely scramble your solution whenever you reach a local maximum and then start climbing again: this works well when a problem has lots of local maxima, for example in code deciphering.
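[A sketch of hill climbing with random restarts, on a made-up bumpy landscape of my own: a single climb gets stuck on whichever hill it happens to start on; scrambling the starting point and keeping the best peak found does much better.]

```python
import math
import random

random.seed(5)

def bumpy(x):
    """A made-up landscape full of local maxima."""
    return math.sin(5 * x) + math.sin(13 * x) - 0.1 * (x - 3) ** 2

def hill_climb(x, step=0.05, iters=2000):
    for _ in range(iters):
        candidate = x + random.uniform(-step, step)
        if bumpy(candidate) > bumpy(x):   # only ever move uphill
            x = candidate
    return x

# One climb gets stuck on whichever hill it starts on; random restarts
# scramble the start and keep the best peak found across all attempts.
best = max((hill_climb(random.uniform(0, 6)) for _ in range(20)), key=bumpy)
print(f"best peak found: x = {best:.3f}, height = {bumpy(best):.3f}")
```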
198ff On simulated annealing: annealing is the way materials change state as they are heated and cooled; the optimization idea was borrowed from the semiconductor industry; when you warm a system and then cool it back down, the system reorganizes itself; temperature here is really velocity/random motion at the molecular scale. This is analogous to adding jitter to the hill climbing algorithm in the prior example: you add randomness to a problem in higher or lower doses, thinking of the dose, metaphorically, as temperature.
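[Same made-up landscape as above, but with simulated annealing: early on, at high "temperature," even downhill moves are accepted freely; as the system cools, it hardens into an ordinary hill climber. The schedule and step size here are arbitrary choices of mine.]

```python
import math
import random

random.seed(6)

def bumpy(x):
    return math.sin(5 * x) + math.sin(13 * x) - 0.1 * (x - 3) ** 2

x = random.uniform(0, 6)
temperature = 2.0
while temperature > 1e-3:
    candidate = x + random.uniform(-0.5, 0.5)
    delta = bumpy(candidate) - bumpy(x)
    # Accept every uphill move; accept downhill moves with a probability
    # that shrinks as the temperature drops.
    if delta > 0 or random.random() < math.exp(delta / temperature):
        x = candidate
    temperature *= 0.999                  # cool slowly
print(f"settled at x = {x:.3f}, height = {bumpy(x):.3f}")
```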
201ff On applying randomness to human creativity: see producer/musician Brian Eno and artist Peter Schmidt and their deck of cards known as "Oblique Strategies," used for solving creative problems by giving you random new perspectives on an artistic project; this looks like an example of jitter all over again; note also the interesting idea from one of the authors of using Wikipedia's "random article" link as a default homepage to send you down different, unexpected rabbit holes. [I think it's worth thinking about reading this way too: add jitter to your reading! You will find connections and insights and fresh ideas this way.]
Chapter 10: Networking: How We Connect
205ff Discussion here of communication protocols, anything from social norms to machine connections; on machine-to-machine problems and solutions that mimic (and illuminate) human communication problems. Discussion of packet switching versus circuit switching; on the telephone industry's initial resistance to the technology, and a comment from a packet switching advocate about the traditional phone network: "It takes 35 seconds to set up a call, you charge me a minimum of 3 minutes, and I want to send 100 milliseconds of data!" Note that with circuit switching, network reliability goes down exponentially as the network gets larger, while with packet switching, reliability increases exponentially with network size. [I hadn't thought about this aspect. Of course the problem behind the phone companies' hesitance was that they had huge copper circuit-switching infrastructure already in the ground: switching over to packet-based networks would mean stranding a lot of that asset base and spending heavily on fresh capex.]
209ff On confirming your message is received, and then confirming that the confirmation is received: a form of the Byzantine generals problem. On the so-called triple handshake: message, response, then acknowledgment of the response as a third message. Also on the ACK (acknowledgment) packet, carrying a sort of serial number [this looks quite a bit like a nonce in a blockchain transaction]. Note that these generate a tremendous amount of confirmation traffic.
212ff How long a period of non-responsiveness should we consider a breakdown in communications? One aspect of this question is the length of the "round trip time" between sender and receiver. On what to do about it: "Exponential Backoff"--doubling the delay before each retransmission of a signal. This was pioneered in ALOHAnet in the 1970s, and in the 1980s it was built into the TCP protocol. Exponential backoff is also used by browsers retrying websites that might be down, and in password security, where exponentially increasing lockout periods defend accounts against dictionary attacks.
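[A generic retry-with-exponential-backoff sketch of my own; real implementations also add random jitter, as here, so that many failed senders don't all retry in lockstep. The ping() helper in the usage comment is hypothetical.]

```python
import random
import time

def with_exponential_backoff(attempt, max_tries=8, base_delay=1.0):
    """Retry a flaky operation, doubling the wait after each failure."""
    for n in range(max_tries):
        if attempt():
            return True
        # Double the delay each round, with jitter.
        time.sleep(base_delay * (2 ** n) * random.uniform(0.5, 1.5))
    return False   # give up: treat the other side as unreachable for now

# Hypothetical usage, assuming some ping() helper exists:
# with_exponential_backoff(lambda: ping("example.com"))
```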
216ff On network congestion: a circuit-switched network gets full, but a packet-switched network gets slow; on AIMD (Additive Increase, Multiplicative Decrease): ramp your transmission rate up gently, one unit at a time, but cut it sharply--in half--at any sign of a dropped packet. This produces the "TCP sawtooth" of bandwidth use and prevents rapid cycles of overload and recovery. The authors then try--in a bit of a stretch--to analogize this to a solution for the Peter Principle in the business world.
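[A toy AIMD simulation with my own numbers: the congestion window grows by one packet per round trip and is halved on any drop; printing the window over time shows the sawtooth.]

```python
CAPACITY = 50        # hypothetical link capacity, in packets

cwnd, history = 1.0, []
for rtt in range(200):
    history.append(cwnd)
    if cwnd > CAPACITY:   # congestion: a packet got dropped
        cwnd /= 2         # multiplicative decrease
    else:
        cwnd += 1         # additive increase, one packet per round trip
print(" ".join(f"{w:.0f}" for w in history[:80]))   # the sawtooth, in numbers
```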
220ff On communications feedback: on the critical role of feedback, both in networking and linguistics: on the back-channel in linguistics: see the work of Victor Yngve, on messages like "uh-huh." Also on another example of feedback: how telling a story to an uninterested listener actually ruins the teller's telling: the teller trails off, loses confidence, etc. Uh-huhs and yeahs are like ACK packets, basically.
222ff On bufferbloat and its fix, tail drop: cutting off anyone in line beyond a certain point, i.e., dropping all packets beyond a certain level of congestion. Bufferbloat is essentially a problem of too much latency in a communications network: sort of like going away on vacation while everyone who came to your house, instead of leaving, just stood in an ever-growing line at your front door. The moral here is that sometimes it's better to drop packets (or turn visitors away) than to let the queue grow without bound.
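[Tail drop in miniature--my sketch: a queue with a hard cap that simply discards arrivals when full. The dropped packet is exactly the signal that makes senders back off, whereas an "infinite" buffer never says no, and latency just grows.]

```python
from collections import deque

QUEUE_LIMIT = 100     # hypothetical buffer size, in packets
queue = deque()

def enqueue(packet):
    if len(queue) >= QUEUE_LIMIT:
        # Tail drop: discard the arriving packet. The loss itself is
        # the congestion signal that tells the sender to slow down.
        return False
    queue.append(packet)
    return True
```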
Chapter 11: Game Theory: The Minds of Others
229ff On the intersection of game theory and computer science, producing the field of algorithmic game theory; quoting John Maynard Keynes on investing: "successful investing is anticipating the anticipations of others." And then taking this not just to the second and third degree but to the 4th, 5th and higher degrees. Computer science addresses this kind of unbounded recursion with the "halting problem." See also the infinite recursion in, say, a poker hand: you want to play against what you believe your opponent has, and vice versa, and vice versa, etc. Poker players call this leveling: Level 1 is "I know," Level 2 is "you know that I know," Level 3 is "I know that you know that I know." An example would be: "This is a really silly spot to bluff--but if he knows it's a silly spot to bluff, then he won't call me, and that's exactly what makes it a clever spot to bluff." "You only want to play one level above your opponent": otherwise they won't pick up on what you want them to see, or you'll credit them with reasoning they aren't actually doing. See also expert players deliberately playing completely by the book, luring their opponents into a leveling war against themselves.
233ff On the Nash equilibrium, and on algorithmic game theory: the study of how machines (and people) come up with strategies for games. Note that using Nash equilibria to model behavior is probably suspect: finding a game's equilibrium can be computationally intractable, so we can't necessarily use it to predict how other players will behave--and sometimes you can't even know whether the players ever reach an equilibrium in the first place.
235ff Nash equilibria aren't always the best outcome for both sides: see the prisoner's dilemma, where defecting is the dominant strategy for each player even though both would be better off cooperating.
238ff On the tragedy of the commons, and some rather loosely related subjects, like vacation time: the idea of unlimited vacation at a firm, played out as a Nash equilibrium (where nobody dares take any vacation at all); also stores opening earlier and earlier on Thanksgiving Day. The discussion here is about bad equilibria.
240ff On "mechanism design": changing the game, also called reverse game theory: asking "what rules will give us the behavior we want to see?" Like making vacation compulsory, or adding a mafia don to the prisoner's dilemma problem so that the players will cooperate to their benefit rather than defect. Other examples: imposing external rules, like what a government would do or what a religion would do to enforce cooperation.
244ff Interesting example here of a man whose new vacuum cleaner breaks; he leaves a vindictive review, which is actually a form of selflessness: he stands to gain little beyond the satisfaction of writing it, but everyone else who reads the review is protected from a defective vacuum cleaner. The authors describe emotion as a driver of mechanism design: the authority outside the game here is evolution itself, which gave humans these involuntary emotions. "Morality is the herd instinct in the individual." [Nietzsche] The authors follow this quote with a paraphrase: "emotion is mechanism design in the species." Emotions are involuntary--and not just emotions like anger, but also compassion, guilt, love: see for example one's feelings of commitment to a spouse or a business partner.
247ff On information cascades: see for example financial bubbles; the authors start with auctions before getting into the mechanics of bubbles. On different types of auctions: the sealed-bid, first-price auction (where the winner likely overpays, and where there's recursion as you model what you think other people's bids will be); the descending or Dutch auction (think of a store marking down unsold items: the seller begins optimistically and nudges the price down until a buyer is found); and the English or ascending auction, the format everybody is familiar with. Note the extra layer of complexity in the Dutch and English styles because bidding behavior is public: in the Dutch auction there's the conspicuous absence of any bid; in the English auction there are other people bidding higher and higher. This presents big problems when bidders are doubtful about their own estimates of the item's value, and it brings us to the information cascade, where participants in an auction (or a market) fall prey to, effectively, compounding misinformation. The authors give examples of two books being algorithmically priced against each other on Amazon, and of a flash crash in the stock market. Information cascades offer a rational theory not only of bubbles, but of fads and herd behavior in general: in a cascade we misinterpret what others think based on what they do (bid higher and higher for an item, buy stocks at crazy valuations, etc.).
252ff On the Vickrey auction, named after the economist William Vickrey: a sealed-bid auction where the highest bidder wins but pays the bid of the second-place bidder. Supposedly this gives the participants an incentive to be honest, because there's no better strategy than bidding exactly what you think the item is worth. "Shading" your bid down doesn't help you either, because no matter what, you'll still pay only the value of the second-highest bid; this is what's called a strategy-proof or "truthful" mechanism. [This sounds like the kind of auction an economist would come up with. If you wanted to win, you would still bid a very high price, and so would the other aggressive bidders who also wanted to win, so this doesn't really change the dynamics much from a regular sealed-bid auction; the only difference here--and it's a terrible one--is that the seller gets less for what he's selling because he doesn't receive the highest bid. I highly doubt a seller would run this kind of auction in the real world.]
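[For what it's worth, the mechanics are trivial to sketch--my hypothetical bidders and values:]

```python
def vickrey(bids):
    """Second-price sealed-bid auction: highest bid wins,
    but the winner pays the runner-up's bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price

# Hypothetical bidders who simply bid their true values:
true_values = {"alice": 100, "bob": 80, "carol": 60}
winner, price = vickrey(true_values)
print(winner, "wins and pays", price)   # alice wins and pays 80
# Alice's own bid never sets her price, so shading it down can only
# cost her the item; that's the "truthful" incentive in action.
```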
Conclusion: Computational Kindness
256ff On cases where algorithmic approaches transfer easily to human problems: see the 37% Rule, the LRU criterion, and the Upper Confidence Bound. Using an algorithm can help you even if you don't get the results you're looking for; some of these algorithms actually work better than clairvoyance--see for example the LRU algorithm. The authors also distinguish between process and outcome: you can choose the best process, but you aren't guaranteed a good outcome in life; they call this "computational stoicism." Finally, on the idea of finding a good-enough solution and sticking to problems that are tractable. [These last ideas are essentially game design or problem design: set things up so your "game" is tractable; live your life/manage your expectations so you can always find "good enough" solutions.]
257ff The authors then move on to the problems we pose to each other, a domain of ethics they call "computational kindness." The idea is "to not give other people computation to do": in other words, to frame issues for others so that their underlying computational problems are easier--particularly in social situations. See for example "What do you want to do tonight?" as a form of "Here's a problem, you handle it." By not stating your preferences, you make the other person simulate or imagine them. [This is very interesting!] "Politely withholding your preferences puts the computational problem of inferring them on the rest of the group. In contrast, politely asserting your preferences ('Personally, I'm inclined toward x. What do you think?') helps shoulder the cognitive load of moving the group toward resolution." [!!!] Other examples: have a list of restaurants or choices and let everyone eliminate their least preferred option, which makes the task easier for everyone; or invite someone out while offering two or three concrete proposals they can accept or decline. See also the minimal computational load of a parking garage structured like a helix: you just drive up until you find a spot, so there are no decisions to make. "One of the chief goals of design ought to be protecting people from unnecessary tension, friction, and mental labor." [This insight could come right out of the wonderful book The Design of Everyday Things!] The authors go on to say that we could think of this as a "cognitive subsidy" for people.
To Read:
Herbert Simon: Models of Man
***Irving John Good: Good Thinking
Albert-Laszlo Barabasi: Linked: How Everything Is Connected to Everything Else and What it Means for Business, Science, and Everyday Life
Gary Marcus: Kluge: The Haphazard Evolution of the Human Mind
David Hoffman: The Oligarchs
Carola Baumgardt: Johannes Kepler: Life and Letters
Arthur Koestler: The Watershed: A Biography of Johannes Kepler
William Poundstone: Prisoner's Dilemma: John von Neumann, Game Theory, and the Puzzle of the Bomb
William Poundstone: Fortune's Formula
William J. Cook: In Pursuit of the Traveling Salesman: Mathematics at the Limit of Computation
Chester I. Barnard: The Functions of the Executive
Dan Siroker and Pete Koomen: A/B Testing [these are the founders of Optimizely]
William Daniel Hillis: The Pattern on the Stone: The Simple Ideas that Make Computers Work
Steve Whittaker: Am I Wasting My Time Organizing Email? [paper]
***Lydia Davis: Almost No Memory
***Annie Dillard: Pilgrim at Tinker Creek
Annie Dillard: The Writing Life
Francine Jay: The Joy of Less
Andrew Mellen: Unstuff Your Life! Kick the Clutter Habit and Completely Organize Your Life for Good
***Daniel Boorstin: The Discoverers: A History of Man's Search to Know His World and Himself
Charles Coulston Gillispie: Pierre-Simon Laplace, 1749-1827: A Life in Exact Science
***Pierre-Simon Laplace: A Philosophical Essay on Probabilities [link to public domain copy]
Robert Kanigel: The One Best Way: Frederick Winslow Taylor and the Enigma of Efficiency
Eugene L. Lawler: Scheduling a Single Machine to Minimize the Number of Late Jobs [1983 technical report]
Dennis Shasha and Kathy Lazere: Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists
Peter Michael Blau: The Dynamics of Bureaucracy
Luke Rhinehart: The Dice Man (cult novel)