This collection of scientific papers is a challenging but useful discussion on statistical methods, probability, randomness, logic and decision-making. Much of the book centers around Bayesian statistical methods and when and why to use them, as well as "philosophy of science"-type discussions on when a scientist should--or sometimes must--apply subjective judgments to scientific problems.
It will help enormously if you've had a semester or two of statistics to really get at the meat of this book. If not, scroll down a few paragraphs for a short list of layperson-friendly books that address many of these subjects more accessibly.
Author Irving Good worked with Alan Turing at the famous Bletchley Park project to decrypt Nazi communications during World War II. He then went on to a long and successful academic career, and the works collected here span much of Professor Good's life. Many of the papers touch on the same subjects, so there is some repetition of the author's thinking across the book. It's helpful: it assists the reader in grooving the author's ideas.
And, once again, if you have some basic understanding of statistics, you'll also see that while he teaches, Professor Good has a hilarious sense of humor. And he sneaks these things into his academic papers! He uses acronyms like SUTC to describe how statisticians "sweep under the carpet" the various subjective assumptions and human judgments that they'd prefer not to admit. He makes a Zen reference to the 11-fold path of statistics. He makes up a name for his own philosophy of statistical rigor, calling it "Doogianism" (his name spelled backwards), and then with a twinkle in his eye he calls Doogianism "the true religion" as an ironic criticism of the enforced orthodoxy of statisticians of his era.[1] And when he writes "we are all Bayesians" I can't tell if he's channeling Nixon channeling Keynes[2], or if he's just candidly pointing out that subjective judgment always shows up in statistics; it's just that statisticians (as well as scientists in general) want to be seen as more objective than they really are.[3]
It all reminds you of the kind of gifted professor you'd love to have in college who can take a boring, dry subject and, somehow, bring it to life, making it both fascinating and genuinely useful in the real world. I was lucky enough to have an accounting professor and a statistics professor with this gift. I'm still grateful to them both.
Finally, a short risk management and decision-making reading list:
Annie Duke: Thinking in Bets
Gerd Gigerenzer: Risk Savvy
Nassim Taleb: Fooled by Randomness
Darrell Huff: How to Lie with Statistics
Daniel Kahneman: Thinking, Fast and Slow
Footnotes:
[1] It's worth noting that arguments on ideology and orthodoxy occur in every domain. Even statistics.
[2] In frustration, Nixon reportedly once said, "We are all Keynesians now," more or less as an admission that government intervention in the economy was a permanent feature of modernity. Note also that Keynes wrote on probability and statistics too, and is liberally cited throughout this book.
[3] Dr. Good offers an endless range of examples here. A scientist may not want to admit he made a (fully subjective) decision to carry out an experiment because his department got a budget increase and he can now hire four more postdocs. Even pure mathematicians make subjective judgments about which problems to work on: they choose the ones they "feel" they can solve! Finally, the author also uses the wonderful phrase "all statisticians would like their models to be adopted." In other words, whenever a statistician decides to analyze a dataset, he has to choose a method of looking at that data, choose how much data to look at, choose sampling methods, and so on, all relying on subjective judgment. But he may not want to admit that he "used subjective judgment" before he chose how to look at the data; instead he just believes the methods he chose were the right ones! The point here is that the real world forces "objective" scientists to make subjective judgments all the time. Best to own up to it.
[Readers, as always, a friendly warning: the notes and quotes from the text that follow are here to help me order my thinking and better remember what I read. They are not worth reading. Feel free to stop right here and return to your lives.]
Notes:
Introduction:
x The author describes his basic philosophy of probability: "a compromise between the views of Keynes and Ramsey. In other words, I consider (i) that, even if physical probability exists, which I think is probable, it can be measured only with the aid of subjective (personal) probability and (ii) that it is not always possible to judge whether one subjective probability is greater than another, in other words, that subjective probabilities are only 'partially ordered.' This approach comes to the same thing as assuming 'upper and lower' subjective probabilities, that is, assuming that a probability is interval valued. But it is also convenient to approximate the interval by a point." Also comments on his influences: the writing of Harold Jeffreys and of Bruno de Finetti as translated by Jimmie Savage; de Finetti "expresses his position paradoxically by saying that probability does not exist." Also influenced by the writings of Bertrand Russell and H.G. Wells. Good also worked at Bletchley on cryptography during World War II as an assistant to Alan Turing.
x-xi Comments here on Bayesian analysis and Good's discovery that the philosopher of science C.S. Peirce had proposed the name "weight of evidence" in nearly the same technical sense as the author's own back in 1878.
xi On the concept of "entropy" as used in probability, see Shannon, S. Kullback and Myron Tribus; the author tells the reader this concept will show up "in several places in the following pages."
xi On neo-Bayesian statistical techniques and philosophical arguments that depend explicitly on subjective or logical probability; on the unpopularity of Bayesian approaches in the 1950s, on how the author was regarded as an extremist. Note this interesting comment here about the movement of the statistical profession's "left/right spectrum" regarding Bayesian statistics, where it shifted over the course of the author's 35-year career: Good's basic position didn't change, but the profession (in terms of its spectrum of views on Bayesian analysis) moved all the way to the "left"... meaning it adopted Bayesianism.
xi-xii On the author's idea that probability can be discussed in terms of three kinds of fundamental principles: axioms, rules, and suggestions. Axioms are the basis of the mathematical theory, rules show how the abstract theory can be applied, and suggestions are an unlimited collection of devices for eliciting your own or others' judgments.
xiiiff A rundown of the contents of the book. Part I covers rationality: Chapter 1 discusses utilities and decisions and a way of rewarding consultants who estimate probabilities [note also the comment here about this paper resembling a time guessing game that the author invented at age 13 using logarithms]. Also there will be a discussion of probability hierarchies Type 1 and Type 2: Type 2 is a probability of a statement containing probabilities of Type 1. Chapter 2 collects together 27 brief principles to codify the basic foundations of rationality; Chapter 3 classifies the variety of Bayesian philosophies; Chapter 4 describes Bayesian influence in statistics and some of the tricks used in non-Bayesian techniques to cover up subjectivity.
xiv Part II deals with probability: Chapter 5 is a chicken-and-egg question about whether probability or statistics is logically primary, and also criticizes certain conventional statistical concepts; Chapter 6 discusses historical background on kinds of probability; Chapter 7 describes black box theory and how it can be used to generate an axiomatic system for upper and lower subjective or logical probabilities; Chapter 8 is a discussion of randomness; Chapter 9 discusses the history of the hierarchical Bayesian technique; Chapter 10 discusses dynamic or evolving ("sliding") probabilities, as in chess: a game of perfect information in which one's probabilities nonetheless evolve as analysis proceeds.
xvff Part III: Corroboration, Hypothesis Testing, and Simplicity: Chapters 11 and 12 resolve Hempel's paradox of confirmation in which confirmation is corroboration [this sounds like an induction versus deduction discussion]; Chapter 13 attempts to decide whether the Titius-Bode law of planetary distances could be due to chance, a Bayesian evaluation of scientific theories; Chapter 14 discusses hypothesis testing using tail-area probabilities, Bayes factors/weights of evidence and surprise indexes; Chapter 15 surveys the topic of corroboration and checkability.
xviff Part IV: Information and Surprise: Chapter 16 covers Warren Weaver's surprise index and Shannon's entropy, also a debate with Shackle on the idea that business decisions were based on the concept of potential surprise, whereas Good considers them based on subjective probability; Chapter 17 explains in terms of rationality why it pays to acquire new evidence when it is free; however, Chapter 18 argues, from the point of view of another person who knows more than you do it can sometimes be to your disadvantage to acquire a small amount of new evidence; Chapter 19 is a survey of the author's works on information, evidence, surprise, causality, explanation, and utility; Chapter 20 points out, contrary to Eddington, we should not be surprised that our galaxy is unusually large.
xviiff Part V deals with causality and explanation (for example in order to assign blame and credit): Chapter 21 uses a "desideratum-explicatum approach" with weight of evidence for understanding when one event causes another one. Chapter 22 is a simplified discussion of Chapter 21; Chapter 23 deals with explicativity, a measure of the explanatory strength of a hypothesis in relation to given observations, a compromise or synthesis between Bayesian and Popperian views.
Part I: Bayesian Rationality
Chapter 1: Rational Decisions
3ff What rules can we lay down for making rational decisions? The author believes making rational decisions should not depend on whether we are statisticians. "In most subjects people usually try to understand what other people mean, but in philosophy and near-philosophy they do not usually try so hard." Discussion of scientific theories that are satisfactory only if they have a precise set of axioms, precise rules of application of the abstract theory, and suggestions for using the theory that are less precisely formed as the axioms and rules. "Some theoreticians formulate theories without specifying the rules of application, so that the theories cannot be understood at all without a lot of experience. Such formulations are philosophically unsatisfactory." On the idiosyncrasy of the theory of probability which occupies a position between logic and empirical science, so it straddles this problem of clear rules of application and experience required to apply them.
5ff Comments here on "degrees of belief," both subjective and objective; on how we pay more attention to some people's judgment than to others', thus it might be useful to enable others to do some of our thinking for us; also, if we want to get someone to abandon some of his beliefs, we can use promises or threats or suggestions, but we can also ask him questions to obtain information about his beliefs... and then as he explains them we may show that his beliefs are not internally consistent.
6ff On utilities and whether they belong to the theory of probability: the author says this question is linguistic unless he first defines a few terms here; of interest here are terms like "body of beliefs" versus a "reasonable body of beliefs," or a body of decisions versus a "reasonable body of decisions" which does not have contradictions with rational behavior, etc.
9 "I think that once the theory of probability is taken for granted, the principle of maximizing the expected utility per unit time (or rather its integral over the future, with the discounting factor decreasing with time, depending on life expectancy tables) is the only fundamental principle of rational behavior. It teaches us, for example, that the older we become the more important it is to use what we already know rather than to learn more." [Interesting to think through the ramifications of this statement...]
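Good's remark about age falls directly out of the discounting in that quote. Here's a toy numerical sketch of my own (all utilities, horizons, and discount rates are invented for illustration): an agent with a short remaining horizon does better exploiting what it already knows, while one with a long horizon can afford to pay an up-front cost to learn.

```python
# Toy illustration (mine, not Good's) of discounted expected utility:
# "exploit what you know" vs "pay now to learn" as the horizon shrinks.

def discounted_utility(per_period_utility, horizon, discount=0.95):
    """Sum of a constant per-period utility over the remaining horizon,
    with geometric discounting."""
    return sum(per_period_utility * discount**t for t in range(horizon))

def exploit(horizon):
    """Use current knowledge: worth 1.0 per period, starting immediately."""
    return discounted_utility(1.0, horizon)

def learn(horizon, training=5, boost=1.5):
    """Spend `training` periods earning nothing, then earn `boost` per
    period for the rest of the horizon (discounted back to the present)."""
    if horizon <= training:
        return 0.0
    return discounted_utility(boost, horizon - training) * 0.95**training

for horizon in (10, 40):
    better = "exploit" if exploit(horizon) > learn(horizon) else "learn"
    print(f"horizon {horizon}: better to {better}")
```

With these made-up numbers, a 10-period horizon favors exploiting and a 40-period horizon favors learning, which is exactly the "the older we become, the more important it is to use what we already know" point.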
9ff But then certain complications arise: we have to weigh the time spent doing mathematical and statistical calculations against the expected utility of those calculations, so a less good method may sometimes be preferred; in an emergency, a quick random decision is better than no decision at all. We also have to allow for the necessity of convincing others: see how Isaac Newton translated his calculus arguments into geometry in the Principia. The author allows for doing something irrational if we're otherwise going to worry about it irrationally: thus it can be better to purchase insurance simply to buy peace of mind, for example; or gambling has a utility of its own, which may explain why we take on bets with negative expected utility. Sometimes there's no choice because we have a lack of precision in our judgments. Also, public and private utilities don't always coincide: the author gives an example of an advisor to a firm who advises the adoption of an invention which then fails; the advisor may thus be tempted to act against the interests of the firm. The author gives an interesting solution to this, which is to ask the advisor for his own estimates of the various probabilities and then, instead of making him make the decision, have the firm decide and take the responsibility for the decision on its own shoulders. "In other words, leaders of industry should become more probability conscious." [And agency problem conscious!]
12ff On minimax solutions: "Notice how close all this is to being a classification of the decisions made in ordinary life, i.e., you often choose between (i) making up your mind, (ii) getting further evidence, (iii) deliberately making a mental or physical toss-up between alternatives. I cannot think of any other type of decision." On minimax solutions being prudent rather than rational; on using minimax in games, especially if you believe your opponent is a good player, or when you may not have time to work out anything better once a game starts, or to avoid too large a probability of a large loss, especially in a game like poker.
Chapter 2: Twenty-Seven Principles of Rationality
1) Physical probabilities probably exist but they can be measured only with the help of subjective probabilities.
2) A familiar set of axioms of subjective probability are to be used.
3) In principle these axioms should be used in conjunction with inequality judgments and therefore they often lead to inequality discernments.
4) The principle of rationality is the recommendation to maximize expected utility.
5) The input and output to abstract theories of probability and rationality are judgments: of inequalities, odds, Bayes factors, log-factors or weights of evidence, information, surprise indices, utilities, and other functions of probabilities and utilities. "It is often convenient to forget about the inequalities for the sake of simplicity and to use precise estimates."
6) When the expected time and effort taken to think and do calculations are allowed for in the costs, then one is using the principle of rationality of Type II; it contains a veiled threat to conventional logic by incorporating a time element and often justifies ad hoc procedures.
7) The purposes of the theories of probability and rationality are to enlarge bodies of beliefs and check them for consistency, and thus to improve the objectivity of subjective judgments; a reference here to Gödel's Theorem: this process can never be completed, presumably because we can never be fully consistent.
8) "For clarity in your own thinking, and especially for purposes of communication, it is important to state what judgments you have used and which parts of your argument depend on which judgments." [This is a great heuristic for any and all ethical argument and ethical persuasion. If someone is trying to persuade you of something and they do not or cannot (or will not) state these things, you probably can safely dismiss their argument.]
9) The vagueness of a probability judgment is defined either as the difference between the upper and lower probabilities or as the difference between the upper and lower log odds.
10) The distinction between Type I and Type II rationality is similar to the distinction between standard subjective probabilities and what I call evolving or sliding or dynamic probabilities. "Evolving probabilities are essential for the refutation of Popper's view on simplicity. Great men are not divine."
11) My theories of probability and rationality are theories of consistency only between judgments and the axioms and the rules of application of axioms.
12) On the device of imaginary results: you can derive information about an initial distribution by an imaginary Gedanken [thought] experiment and thus make discernments about the final distribution after a real experiment.
13) On the Bayes/non-Bayes compromise: use Bayesian methods to produce statistics, then look at their tail-area probabilities and try to relate these to Bayes factors.
14) "The weakness of Bayesian methods for significance testing is also a strength, since by trying out your assumed initial distribution on problems of significance testing, you can derive much better initial distributions and these can then be used for problems of estimation. This improves the Bayesian methods of estimation!"
15) "Compromises between subjective probabilities and credibility are also desirable because standard priors might be more general-purpose than non-standard ones. In fact it is mentally healthy to think of your subjective probabilities as estimates of credibility. Credibility is an ideal that we cannot reach."
16) The need to compromise between simplicity of hypotheses and the degree to which they explain the facts, as opposed to emphasizing simplicity alone; Ockham's razor emphasizes simplicity alone without referencing degree of explanation of the facts. [Note here another example of Stigler's Law of Eponymy: John Duns Scotus apparently originated the razor concept, not William of Ockham. We'll see repeated discussion of this little factoid in a few of the papers in this book.]
17) The relative probabilities of two hypotheses are more relevant to science than the probabilities of hypotheses tout court.
18) The objectivist or his customer reaches precise results by throwing away evidence. [!!!]
19) If the objectivist is prepared to bet then we can work backwards to infer constraints on his implicit prior beliefs. [!!!]
20) When you don't trust your estimate of the initial probability of a hypothesis you can still use the Bayes factor or tail-area probability to help you decide whether to do more experimenting.
21) Many statistical techniques are legitimate and useful but we should not knowingly be inconsistent.
22) A hierarchy of private probability distributions, corresponding in a physical model to populations, superpopulations, etc., can be helpful to the judgment even when these superpopulations are not physical.
23) Many compromises are possible: the author here suggests the generalization of the likelihood ratio which he mentions in his essay #198 [which unfortunately is not included in this collection of essays].
24) Quasi- or pseudoutilities: when your judgments of utilities are otherwise too wide, it can be useful in planning an experiment to try to maximize the expectation of something else that is of value, known as a quasiutility or pseudoutility; examples here would be weight of evidence when trying to discriminate between two hypotheses, or strong explanatory power or explicativity, or financial profit when other aims are too intangible; the point is that costs in money and effort have to be allowed for.
25) "The time to make a decision is largely determined by urgency and by the current rate of acquisition of information, evolving or otherwise. For example, consider chess timed by a clock."
26) "In logic, the probability of a hypothesis does not depend on whether it was typed accidentally by a monkey, or whether an experimenter pretends he has a train to catch when he stops a sequential experiment. But in practice we do allow for the degree of respect we have for the ability and knowledge of the person who propounds a hypothesis."
27) All scientific hypotheses are numerological but some are more numerological than others. And subjectivistic analysis of numerological laws is relevant to the philosophy of induction.
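Principle 13's Bayes/non-Bayes compromise can be made concrete. The sketch below is my own, not from the book: for k heads in n coin tosses it computes both a Bayes factor against the fair-coin hypothesis (using a uniform prior on the bias, which is one conventional choice, not necessarily Good's) and the Fisherian two-sided tail-area probability, so the two can be compared side by side.

```python
from math import comb

def bayes_factor_uniform_vs_fair(k, n):
    """Bayes factor for H1: p ~ Uniform(0,1) against H0: p = 1/2,
    having observed k heads in n tosses.
    The integral of C(n,k) p^k (1-p)^(n-k) dp over [0,1] is 1/(n+1)."""
    return (1.0 / (n + 1)) / (comb(n, k) * 0.5**n)

def two_sided_tail_area(k, n):
    """Fisherian two-sided p-value under H0: p = 1/2, computed as the
    total probability of all outcomes no more probable than the observed one."""
    probs = [comb(n, j) * 0.5**n for j in range(n + 1)]
    return sum(p for p in probs if p <= probs[k] + 1e-12)

k, n = 60, 100
print(f"Bayes factor against fairness: {bayes_factor_uniform_vs_fair(k, n):.2f}")
print(f"two-sided tail-area p-value:   {two_sided_tail_area(k, n):.4f}")
```

For 60 heads in 100 tosses the tail area hovers near the conventional 5% line while the Bayes factor stays close to 1 (barely moving the posterior odds at all), which is exactly the sort of tension the compromise asks you to confront rather than sweep under the carpet.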
Chapter 3: 46656 Varieties of Bayesians
20 The author jokes here that there's actually an infinite variety of Bayesians: "All Bayesians, as I understand the term, believe that it is usually meaningful to talk about the probability of a hypothesis and they make some attempt to be consistent in their judgments." A joke here that von Mises could also be counted as a Bayesian, and then there would be at least 46657 varieties, which happens to rhyme with the number of Heinz varieties! [Note here he's referring to Richard von Mises, the mathematician, who coincidentally was the younger brother of Ludwig von Mises the economist.]
20ff Comments here on facets of Bayesian thinking: Type II rationality, where the assumption is you maximize utility allowing for the cost of theorizing; of judgments of probabilities, utilities, weights of evidence, likelihoods, etc.; on the precision of judgments; on extremeness; on utilities which can be brought in from the start or avoided; on quasiutilities: note that objectivist statisticians would recognize only one type of utility; vs the idea quasiutilities like weights of evidence are worth using; on physical probabilities versus intuitive probabilities or subjective probabilities, etc.
21 "Thus there are at least 2^4 × 3^6 × 4 = 46656 categories. This is more than the number of professional statisticians so some of the categories must be empty." [I think this is an in-joke about contingency tables that statisticians use, but I'm not sure]. Also a joke here about how Thomas Bayes hardly wrote enough to be properly categorized; the idea here is "we are all Bayesians" on some level as we try to make practical, useful estimates of what we should do in reality.
Chapter 4: The Bayesian Influence, or How to Sweep Subjectivism under the Carpet
22ff A discussion of the historical and logical influence of Bayesian arguments; interesting comment here about heated discussions among statisticians in the years after World War II: "There is an unjustifiable convention in the writing of the history of science that science communication occurs only through the printed word." The author argues for the desirability of a Bayes/non-Bayes compromise which would be regarded as a "Type II principle of rationality," meaning the maximization of expected utility when the labor and costs of calculations and thinking are taken into account. The author also shows "how some apparently objective statistical techniques emerge logically from subjective soil." He argues for a "constant interplay between the subjective and objective points of view and not a polarization separating them," as he points out how an orthodox statistician will say one of his techniques has "intuitive appeal," which is basically a Bayesian justification. "To the Bayesian all things are Bayesian." Then a comment here on "cookbook statisticians" who give their students the impression that "cookbooks are enough," but this is obviously wrong; a subjective judgment of probabilities is always necessary, even if that judgment is later used to construct an apparently non-Bayesian process "in the approved sweeping-under-the-carpet manner."
24ff A discussion here of what gets "swept under the carpet" ("SUTC," as the author puts it); on the idea that arguments against Bayesian positions are only valid against some Bayesian positions; it depends what you mean. On his 27 principles [see the previous chapter], and the 11 facets of Bayesian varieties, many of which have been around for a long time with various degrees of "bakedness."
25 Comments here on rationality and probability and black box theory; the "rational man" concept: "Rational men do not exist, but the concept is useful in the same way as the concept of a reasonable man in legal theory." Thus it acts as a concept to hold in our mind when we ourselves should be rational. "Every decision is a bet"; even people who argue against rationality have to agree that it is sometimes desirable to be rational: they would want a rational doctor or a rational judge; also on working with judgments of inequalities between utilities or weights of evidence, etc., which involves subjectivity.
26 On respect for judgment in theory, but also respect for logic and the fact that they must be combined if we are going to use the human brain to make decisions; on the human brain's cunning and rationalization for the sake of its own equilibrium: "You can make bad judgments so you need a black box to check your subjectivism and to make it more objective. That then is the purpose of a subjective theory; to increase the objectivity of your judgments, to check them for consistency, to detect the inconsistencies and to remove them. Those who want their subjective judgments to be free and untrammeled by axioms regard themselves as objectivists; paradoxically, it is the subjectivists who are prepared to discipline their own judgments!"
27ff On consistency and the unobviousness of the obvious: the author cites a paper he was working on considering probabilities of events that had never occurred before, and a referee on the paper told him this concept was too provocative; the concept was, apparently, an example of itself. But then the author says such a so-called "pioneering" remark was actually obvious: every event in life is unique on some level. See for example the species sampling problem: the estimation of the probability that the next animal or word sampled will be one that has not previously occurred. The probability turns out to be n/N, where n is the number of species that have so far occurred just once and N is the total sample size; this idea was originally Turing's, anticipating the Bayes method in a special case. Then the author works out the probability that the next animal will be one that has so far been represented r times, a calculation of the "frequency of the frequency r."
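The Turing estimate described above (probability that the next observation is a brand-new species equals n/N, with n the number of species seen exactly once) is easy to state in code. A minimal sketch; the sample data is invented:

```python
from collections import Counter

def missing_mass_estimate(sample):
    """Turing's estimate of the probability that the next observation
    is a species not seen so far: n1/N, where n1 is the number of
    species observed exactly once and N is the sample size."""
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(sample)

sample = ["cat", "dog", "dog", "finch", "cat", "vole", "cat", "newt"]
# Species seen exactly once: finch, vole, newt -> n1 = 3, N = 8
print(missing_mass_estimate(sample))  # -> 0.375
```

Note the quietly Bayesian move: the singletons in the sample are used as evidence about events that have never occurred at all.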
29 Interesting discussion here on calculating initial probabilities that must be assumed before final probabilities can be calculated. "But there's nothing in the theory to prevent the implication being in the reverse direction: we can make judgments of initial probabilities and infer final ones, or we can equally make judgments of final ones and infer initial ones by Bayes theorem in reverse. Moreover this can be done corresponding to entirely imaginary observations." An example of a hypothetical experiment like those common in physics. "Ye priors shall be known by their posteriors." "Even a slightly more obvious technique of imaginary bets is still disdained by many decision makers who like to say 'That possibility is purely hypothetical.' Anyone who disdains the hypothetical is a philistine."
29ff On Types I and II rationality: rationality of Type I is the recommendation to maximize expected utility, Type II is the same except that it allows for the cost of theorizing. "It means that in any practical situation you have to decide when to stop thinking." Plus you may have to sacrifice strict logical consistency, at best achieving consistency as far as you have seen to date; this "resolves a great many of the controversies between the orthodox and Bayesian points of view... Another name for the principle of Type II rationality might be the Principle of Non-dogmatism." [This guy has a pretty amusing sense of humor!]
30ff Thoughts here on using as many types of judgments as are relevant; you should allow any kind of judgment that is relevant, and even play judgments off against each other: as in the case of a doctor who tells you it's better to operate; compare this to standard practice, but then play that off against separate judgments of the probabilities and outcomes of various treatments. Also comments on how many probability discussions are not numerically precise; on wanting to be as eclectic as you can, using orthodox and non-orthodox methods as long as they are good enough. "It is always an application of Type II rationality to say that a method is good enough."
31ff Comments here on utility and accepting that utility judgments can vary massively among individuals; thus the idea of quasiutilities or pseudoutilities.
32 Another interesting offhand comment here in a section on physical probabilities: "I think good terminology is important in crystallizing out ideas. Language can easily mislead, but part of the philosopher's job is to find out where it can lead." Comments here also on subjective probabilities, on arguing for subjectivism: "All statisticians would like their models to be adopted," meaning that in some sense everybody is a subjectivist.
33 On asking yourself "is that judgment probable?" when making a judgment about probabilities; how this is like a two-stage Bayesian approach, with priors of the second order and first order with parameters and hyperparameters.
33ff Comment here on Jimmie Savage; what kind of probability do you want? On choosing which axioms, or how many you use, whether they should be simple or additive, etc.; note this video explaining Savage's "subjective probability" idea.
34ff Examples of the Bayesian influence and of SUTC (sweeping under the carpet): "One aspect of utility is communicating with other people." On the idea that you should make your assumptions clear and you should separate out the part that is disputable from the part that is less so. Thus an emphasis on likelihood seen in Bayesian thinking, where you have initial probabilities, then you have likelihoods (which are probabilities of the event given the various hypotheses), and you multiply the likelihoods by the probabilities and that gives you results proportional to the final probabilities. "Of course you usually have to use subjective judgment in laying down your parametric model. Now the hidebound objectivist tends to hide that fact; he will not volunteer the information that he uses judgment at all, but if pressed he will say 'I do, in fact, have good judgment.' So there are good and bad subjectivists, the bad subjectivists are the people with bad or dishonest judgment and also the people who do not make their assumptions clear when communicating with other people. But, on the other hand, there are no good 100% (hidebound) objectivists; they are all bad because they sweep their judgments UTC [under the carpet]." [Harshly put but he's right!]
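The prior-times-likelihood recipe in that passage (multiply each hypothesis's initial probability by the probability of the evidence given that hypothesis, then renormalize) can be sketched in a few lines. The hypotheses and numbers below are my own invention, not Good's:

```python
def posterior(priors, likelihoods):
    """Bayes' theorem: P(H_i | E) is proportional to P(H_i) * P(E | H_i);
    multiply elementwise and renormalize so the results sum to 1."""
    unnorm = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two hypotheses about a coin: fair (p = 0.5) vs. biased (p = 0.8),
# with a subjective prior of 0.9 / 0.1. Evidence: three heads in a row.
priors = [0.9, 0.1]
likelihoods = [0.5**3, 0.8**3]
post = posterior(priors, likelihoods)
print(post)  # fair coin still favored, but its probability has dropped
```

The subjectivity the author is pointing at lives in the inputs: both the 0.9/0.1 prior and the decision to consider only these two parametric hypotheses are judgments, whether or not the analyst admits it.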
35 The author dates the idea of introducing likelihoods to at least 1774 with Bayes's theorem and Laplace; also on maximum likelihood as used by Daniel Bernoulli; also Gauss in 1798 used inverse probability combined with a Bayes postulate (of uniform initial distribution). [!!! It's funny thinking about applying statistical methods under an implicit--but extremely subjective--assumption that the thing you are analyzing even follows a normal-ish looking distribution subject to Gaussian probability estimates in the first place!]
35 Also comments on "trade unionism among statisticians" where the statistician wants to give his customer absolutely clear-cut results, yet he can't usually do it, so he has to pretend that he's not had to use any "judgment."
36 "Given the likelihood, the inferences that can be drawn from the observations would, for example, be unaffected if the statistician arbitrarily and falsely claimed that he had a train to catch, although he really had decided to stop sampling because his favorite hypothesis was out of the game. (This might cause you to distrust the statistician, but if you believe his observations, this distrust would be immaterial.) On the other hand, the 'Fisherian' tail-area method for significance testing violates the likelihood principle because the statistician who was prepared to pretend he has a train to catch (optional stopping of sampling) can reach arbitrarily high significance levels, given enough time, even when the null hypothesis is true."
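Good's "train to catch" point is easy to demonstrate with a simulation: test a fair coin after every toss and stop the moment the tail-area test looks "significant." The sketch below is my own, not from the book, and the sample sizes and trial counts are arbitrary:

```python
import math
import random

def reaches_significance(max_n, z_threshold=1.96, seed=None):
    """Flip a fair coin, testing after every flip; return True the moment
    the two-sided normal-approximation z-score crosses the threshold."""
    rng = random.Random(seed)
    heads = 0
    for n in range(1, max_n + 1):
        heads += rng.random() < 0.5
        if n >= 30:  # wait until the normal approximation is reasonable
            z = abs(heads - n / 2) / math.sqrt(n / 4)
            if z > z_threshold:
                return True
    return False

trials = 400
hits = sum(reaches_significance(2000, seed=i) for i in range(trials))
print(hits / trials)  # well above the nominal 5% false-positive rate
```

Even though the null hypothesis is true in every run, optional stopping lets a sizable fraction of runs "reach significance" at the 5% level; with no cap on the number of tosses, the law of the iterated logarithm guarantees every run eventually would.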
37 Another good quote here on admitting when you're making assumptions about the weight of the evidence, the Bayesian or at least the "Doogian" [this is the author's joking name for his Bayes/non-Bayes compromise philosophy, it's his name spelled backward] can argue logically "If we assume that it was sensible to start a sampling experiment in the first place, and if it has provided appreciable weight of evidence in favor of some hypothesis, and it is felt that the hypothesis is not yet convincing enough, then it is sensible to enlarge the sample since we know that the final odds of the hypothesis have increased whatever they are."
37ff "If we knew how we made judgments we would not call them judgments." What happens with a hypothesis that is very plausible or has a great weight of evidence initially is that we can sweep the initial probability under the carpet, and this leads to an apparent objectivism that is really multi-subjectivism. "The same happens in many affairs of ordinary life, in perception, in the law, and in medical diagnosis." Also comments here on the expression "weight of evidence" which captures in ordinary language the meaning intended in this process of determining likelihood. "...one of the functions and philosophy is to make such captures."
39 Comments on the relationship between Bayesian methods and maximum likelihood: "the influence of informal Bayesian thinking on apparently non-Bayesian methods has been considerable at both the conscious and less conscious level, ever since 1763, and even from 1925 to 1950 when non-Bayesian methods were at their zenith relative to Bayesian ones."
39ff Comments here on loss functions: this is when you have a predictive statistical model that is wrong, and where you quantify the error; comments here on statisticians making use of squared error loss, regarding it as "Gauss-given" when it's really Bayesian.
41 On the use of minimax procedures to minimize the maximum expected loss; note that it is easy to judge a sample size to be "large," but the author gives an example of a sample of 1,000 letters of English text that could easily not contain the letter z, thus the sample is "large" in one sense but "small" in another, and it would lead to an appallingly bad bet if you gambled here while giving large odds against this letter occurring on the next trial--or ever. [Here he's talking about a form of risk that shows up catastrophically in finance: blowup risk, basically. "Our models say a down 10% day is a four standard deviation event" and then you have what you thought would be an utterly impossible "20 sd event" happen during the GFC, because you forgot that financial assets are always autocorrelated during market crashes!]
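A quick back-of-envelope check of the "z" example, using an assumed frequency of roughly 0.074% for the letter z in English text (the exact figure varies by corpus):

```python
# How likely is a 1,000-letter English sample to contain no 'z' at all,
# if 'z' appears independently at a rate of about 0.074% per letter?
p_z = 0.00074
p_no_z = (1 - p_z) ** 1000
print(round(p_no_z, 2))  # ~0.48: a near coin-flip, despite the "large" sample
```

So a sample that feels enormous is nearly a 50/50 proposition on this question, which is exactly why giving large odds against ever seeing a z would be an appalling bet.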
45ff Discussion of estimating probability densities, done here using "window methods" to determine how many of N observations lie in some interval or region around x. The author argues here that this is a non-Bayesian method but yet it is suggested for Bayesian reasons.
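A minimal sketch of the window idea, assuming the simplest possible version (count the observations within distance h of x, then divide by sample size times window width); the data below are invented:

```python
def window_density(data, x, h):
    """Crude density estimate at x: fraction of observations falling
    within h of x, divided by the window width 2h."""
    count = sum(1 for v in data if abs(v - x) <= h)
    return count / (len(data) * 2 * h)

data = [0.1, 0.2, 0.25, 0.3, 0.9]
print(window_density(data, 0.25, 0.1))  # high estimate near the cluster
print(window_density(data, 0.6, 0.1))   # zero in the empty gap
```

The choice of h is where the Bayesian-flavored judgment sneaks in, which is presumably the author's point about a non-Bayesian method suggested for Bayesian reasons.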
47ff On non-uniqueness of utilities: for some decision problems utility functions can easily be measured in monetary terms, like in gambling or insurance problems; but in many other decision problems utility is not readily expressible in monetary terms and can vary greatly from one person to another. The "Doogian" will wish to keep utilities separate from the rest of the statistical analysis if he can. Note also an even more important nuance on utilities: they vary because "the expected benefit of a client is not necessarily the same, nor even of the same sign, as that of the statistical consultant." [Again he really does touch on very interesting agency problems that show up in statistical analysis.] See also confidence interval estimation, where the statistician can confuse a 95% confidence interval with his utility gain, and it may not bear any relation to the client's utility on a single specific occasion. [The author makes a great quote here: "especially if he learns his statistics from cookbooks"; this is turning out to be one of the author's most prickly insults. Cookbook authors hurt most!]
48 "Notice further that there are degrees of dogmatism and that greater degrees can be justified when the principles involved are the more certain. For example, it seems more reasonable to be dogmatic that 7 times 9 is 63 than that witches exist and should be caused not to exist. Similarly it is more justifiable to be dogmatic about the axioms of subjective probability than to insist that probabilities can be sharply judged or that confidence intervals should be used in preference to Bayesian posterior intervals. (Please don't call them 'Bayesian confidence intervals,' which is a contradiction in terms.)"
49ff Comments here on tail-area probabilities; note the nuances on a bimodal distribution where you actually have three or even four tails, thus what is "more extreme" in these cases?
51ff Also note instances where tail events can be crucial, like in a medical trial or in ordinary life. "The more important a decision the more 'Bayesian' it is apt to be." [Again, he's talking about risk of ruin here.] Also interesting comments here on test criteria for tail-area probabilities. "Many elementary textbooks recommend that test criteria should be chosen before observations are made. Unfortunately this could lead to a data analyst's [sic] missing some unexpected and therefore probably important features of the data. There is no existing substitute for examining the original observations with care. This is often more valuable than the application of formal significance tests... The point of the usual advice is to protect the statistician against his own poor judgment."
55 Interesting quote here: "Fisher (1959, p. 47) argued that, at the time he wrote, there was little evidence that smoking was a cause of lung cancer. The research by Doll et al. had shown a tail-area probability of about 1/80 suggesting such a relationship, but had also shown a similar level of significance suggesting that inhalers less often get lung cancer than non-inhalers. Fisher's ironical punchline then was that therefore the investigators should have recommended smokers to inhale!" The idea here is the original probability was probably too low and likely spurious; you have to use a certain preliminary judgment to think about initial probabilities before deciding which work to follow up on and which hypotheses to analyze. [I wonder if another important rabbit hole to explore is whether it's really accurate or not to hold with great confidence the claim "smoking causes cancer"... the more I've learned about the various corruptions across centralized medical research, across pharma trials, hell, across all of "studies show science" in general... maybe this claim isn't the universal truth I thought it was.]
Part Two: Probability
Chapter 5: Which Comes First, Probability or Statistics?
59ff Which came first, probability or statistics? It's like a chicken-or-egg problem; the author examines some examples of statistical principles like maximum likelihood, or tail-area probabilities, where even in the 17th century gamblers had to have a rough-and-ready way to assess things before claiming cheating on their opponents--and then drawing swords! Also on large sample theory: deciding on significance tests before taking a sample might be good advice to use with those statisticians whose judgment you do not trust, but the author gives an example here: can you assume a normal distribution when a sample of 100 readings contains one reading that is 20 standard deviations above the mean?
61 Funny comment here about confidence intervals used to protect the reputation of the statistician by being right in a certain proportion of cases in the long run; but "it sometimes leads to such absurd statements that if one of them were made there would not be a long run." [!!] "...the confidence method is a confidence trick, at least if used too dogmatically."
61 Also an interesting comment here about Type 1 and Type 2 errors [false positive, false negative] and a discussion of what "robustness" means in a significance test: it depends on what kind of sensitivity you want and what type of departure from the null hypothesis you care about. [The author doesn't use these examples, but the idea here is: do you want to reduce the chance of missing a cancer or missing a pregnancy, or do you want the opposite, and thus accept a ton of false positives? This stuff obviously shows up in medicine, and the false positive/test specificity problem is a very squishy, difficult-to-grasp problem, even for medical professionals.]
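My bracketed note above is the classic base-rate calculation, which is worth seeing once with numbers. All the figures below are invented for illustration (99% sensitivity, 95% specificity, 1% disease base rate):

```python
# Positive predictive value: of the people who test positive,
# what fraction actually have the disease?
base_rate = 0.01
sensitivity = 0.99   # P(test positive | disease)
specificity = 0.95   # P(test negative | no disease)

p_positive = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)
ppv = sensitivity * base_rate / p_positive
print(round(ppv, 2))  # ~0.17: most positives are false positives
```

An excellent-sounding test still produces mostly false positives when the condition is rare, which is exactly the squishiness the note is pointing at.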
Chapter 6: Kinds of Probability
63 "The mathematician, the statistician, and the philosopher do different things with a theory of probability... Each does his job better if he knows something about the work of the other two." Comments here on different kinds of probability being like different kinds of life on different layers; sure, there are animal and vegetable life forms, but there are also genera and species-level life forms, but then again there really is only one kind of life! The author says probability is like this on some level, there are five kinds of probability at least, but from another point of view they can all be seen as one kind. "I shall elaborate this remark and begin by describing some different kinds of probability. Classification of different kinds of probability is half the problem of the philosophy of probability."
64ff Comments here from Aristotle in 300 BC: "The probable is what usually happens"; on Cicero in 60 BC, describing probability as the "guide of life"; on both of these as primitive theories of probability and rational behavior; the ancient Romans practiced insurance and Domitius Ulpianus drew up a table of life expectancies in 200 AD; commentary on Dante's Purgatorio in 1477 giving the probabilities of various totals when three dice are thrown; Jerome Cardan [Gerolamo Cardano], "an inveterate gambler," made several simple probability calculations of use to gamblers. Pascal, to whom the origin of the mathematical theory of probability is usually ascribed; in correspondence with Fermat around 1654 Pascal "solved the first mathematical non-trivial [probability] problems." The first book on the subject of any depth was published soon afterwards by Huygens. All of these authors were trying to explain one kind of probability in terms of another kind without being explicit about it. Then onto James Bernoulli and his work Ars Conjectandi, 1713, arriving at ideas like "causally independent" [like with coin tosses for example, events that are discrete]; the problem here is that Bernoulli tried to apply his theorem to social affairs when those probabilities are likely to be variable [or autocorrelated, or variably autocorrelated!].
65ff On subjective probability: the author gives an example of a deck of cards where the red cards are sticky and so the odds of a black card brought to the bottom would be higher--but only for those who knew that the red cards were sticky. Thus the probability is not the same for everyone depending on information as well as the specific event estimated. Plus we have to think about "propositions" or assumed information; also on the idea of talking about hypothetical probabilities as well as true probabilities.
66ff On inverse probability, inverting Bernoulli's Theorem: say we send a questionnaire to a group of smokers and some of them refuse to fill it out; what is the probability that the next smoker selected will refuse to fill out the questionnaire, and what is the proportion of all smokers who will refuse? This is basically estimating a probability from "a very small sample so far" of something that will happen next as part of that sample. This was a process arrived at by Thomas Bayes in 1763, discussed in Laplace's 1812 book Théorie analytique des probabilités, thus it is called the Bayes-Laplace method of statistical inference; today we discuss it in terms of initial probabilities, final probabilities and likelihoods. "Bayes's theorem is, in effect, that the final probability of a hypothesis is proportional to its initial probability times its likelihood."
67 On Laplace's formula: after r successes in n trials, probability p can be estimated as (r+1)/(n+2). "This formula is open to dispute and has often been disputed. It leads, for example, to the conclusion that anything that has been going on for a given length of time has a probability [close to] 1/2 of going on for the same length of time again. This does not seem to me to be too bad a rule of thumb if it is applied with common sense." [This sounds exactly like the Lindy Effect right here.]
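Laplace's rule of succession is a one-liner, and the edge cases show what the "+1" and "+2" buy you:

```python
def rule_of_succession(r, n):
    """Laplace's estimate of a success probability after r successes
    in n trials: (r + 1) / (n + 2)."""
    return (r + 1) / (n + 2)

print(rule_of_succession(0, 0))    # 0.5 -- no data at all, even odds
print(rule_of_succession(9, 10))   # ~0.83, not the naive 0.9
print(rule_of_succession(10, 10))  # ~0.92 -- never a dogmatic 1.0
```

Unlike the raw frequency r/n, the formula never returns a dogmatic 0 or 1 from a finite sample, which fits Good's earlier point about degrees of dogmatism.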
68 On other inverse probability methods of statistical inference: for example the idea of maximum likelihood addressed by Bernoulli in 1777, Gauss in 1823, and "especially by Fisher in 1912." In both of these cases the typical objection is that the initial probabilities cannot be determined by clear-cut rules, but the method itself is clear-cut, and "does not lend itself so easily used by conscious or unconscious cheating." Note also with small samples this process can lead to "absurd conclusions."
68 On using long run frequency versus naive frequentism: the author gives an example of a coin tossing machine that produces HTHTHTHTHT; the proportion is 1/2, but this is clearly not a fair system. On Richard von Mises in 1919 on "irregular collectives"; that the proportion of "successes" [in this case "heads"] is the same for every sub-sequence selected in advance; note that von Mises left out how to apply this idea in the real world: he said that "the sequences must be long, but he did not say how long"; the author likens it to the idea in geometry that dots must be small [but how small?] before they are called points.
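Good's HTHTHT machine makes von Mises's sub-sequence requirement concrete; a toy check (my own sketch, not from the book):

```python
# 1,000 tosses of the suspicious machine: HTHTHT...
seq = "HT" * 500

overall = seq.count("H") / len(seq)
odd_tosses = seq[0::2]   # tosses 1, 3, 5, ... (a rule selected in advance)
even_tosses = seq[1::2]  # tosses 2, 4, 6, ...

print(overall)                           # 0.5 -- looks fair overall
print(odd_tosses.count("H") / 500)       # 1.0 -- every odd toss is heads
print(even_tosses.count("H") / 500)      # 0.0 -- every even toss is tails
```

The overall frequency passes, but any pre-selected sub-sequence exposes the regularity, so the sequence fails to be an "irregular collective."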
69ff The author discusses his own views on probability, calling them neoclassical or neo-Bayesian; he discusses here the mid-20th-century orthodox statisticians, led by Fisher, who opposed his views. On the theory of subjective probability as an attempt to introduce as much objectivity as possible into your subjective body of beliefs: probability judgments are fed into a sort of black box and discernments are fed out; also that "many orthodox statistical techniques achieve objectivity only by throwing away information, sometimes too much." The author considers his framework important for statistical practice and for making real-world decisions.
70ff The author distinguishes six kinds of probability:
* Degrees of belief (which he says "hardly deserves to be called a probability")
* Subjective probability
* Multi-subjective probability
* Credibility
* Physical probability
* Tautological probability
"I shall merely repeat dogmatically my opinion that although there are at least five different kinds of probability we can get along with just one kind, namely, subjective probability. This opinion is analogous to the one that we can know the world only through our own sensations, an opinion that does not necessarily make us solipsists, nor does it prevent us from talking about the outside world. Likewise, the subjectivist can be quite happy talking about physical probability, although he can measure it only with the help of subjective probability."
71-2 Discussion here of physical probability and metaphysical theories on determinism or indeterminism of the universe; "Whether or not we assume determinism, every physical probability can be interpreted as a subjective probability or as a credibility. If we do assume determinism, then such an interpretation is forced upon us. Those philosophers who believed that the only kind of probability is physical must be indeterminists. It was for this reason that von Mises asserted indeterminism before it became fashionable. He was lucky."
Chapter 7: Subjective Probability as the Measure of a Non-measurable Set
73ff This is a discussion of "aspects of axiom systems for subjective and other kinds of probability." First there are some definitions given, distinguishing between various kinds of probability: like physical probability (like the unknown probability of a loaded die coming up six; these probabilities have nothing to do with other minds); psychological probability (extrapolating from your behavior for example); subjective probability (psychological probability modified by an attempt to achieve consistency with a theory using mature judgment); logical probability (assuming the ideal of infinite samples and perfect rationality, these are likely to be unknown in practice); all of these definitions are here in order that the author can define how he uses the expression "subjective probability." "Each application of a theory of probability is made by a communication system that has apparently purposive behavior. I designate it as 'you.' ... 'You' may be one person, or an android, or a group of people, machines, neural circuits, telepathic fields, spirits, Martians and other beings. One point of the reference to machines is to emphasize that subjective probability need not be associated with metaphysical problems concerning mind."
74 On the idea that logical probability cannot really be achieved and there will always be domains where subjective probability will have to be used instead; on the idea that physical probability automatically obeys axioms, while subjective probability depends on axioms, and psychological probability neither obeys axioms nor depends very much on them.
75 Here the author offers a black box description of the application of formalized theories, see photo:
75ff Observations and experiments have been omitted, this is a closed loop: you feed judgments into a black box and feed discernments out of it; the discernments are made in the black box as deductions from judgments and axioms, and the totality of judgments at any time is called "a body of beliefs"; you examine each discernment, and if it seems reasonable, you transfer it to the body of beliefs; the purpose of deductions is to enlarge the body of beliefs and to detect inconsistencies in it, after which you will remove them by means of more mature judgment. "The black box may be entirely outside you, and used like a tame mathematician, or it may be partially or entirely inside you, but in many cases you do not know the probabilities precisely." Also comments here on "deluxe black boxes" with extra equipment so that additional types of judgment and discernment can be used, like weights of evidence, judgments of normality, theories of rational behavior, judgments of odds or log odds, judgments of other people or organizations, comparisons of utilities, etc.; these are all types of "suggestions" that can add to your subjective probability system.
77ff The rest of this paper discusses certain axiom systems--and it is technical and over my head--and then the paper closes with a discussion of higher types of probability: the author's "Type II" probability, where we would use a more pragmatic expression like 0.5 or "more likely" as opposed to saying "the probability lies between 0.2 and 0.8." [Basically here you just call something "a 50/50 likelihood" instead of giving an overly precise range that is more or less useless to a person trying to make a decision.] Finally there's a discussion of continuity as a type of probability function.
Chapter 8: Random Thoughts about Randomness
83ff "When philosophers define terms, they try to go beyond the dictionary, but the dictionary is a good place to start and one dictionary definition of 'random' is 'having no pattern or regularity.'" Comments here on finite and infinite random sequences; on what it means to select a sequence of digits or numbers "at random"; on the three interrelated concepts of randomness, regularity and probability; also on the paradox that there can occur regularity inside a random sequence, thus the definition of randomness as having no regularity is not rigorous enough.
85ff The author hypothesizes what it would be like as a child or a baby psychologically as it forms impressions of reality: whether it would notice regularities and find new regularities and patterns and then arrive at the concept of randomness; he argues that we form the concepts of regularity, randomness and probability in an intertwined manner; note this quote here where he quotes Samuel Butler: "To ask which of the concepts of regularity, randomness, and probability is psychologically prior might be like asking whether 'the chicken is the egg's way of making another egg.'"
86 On the place of randomness in statistics; random sampling and random designs of experiments were introduced into statistics to achieve apparent precision and objectivity: he uses an anecdote about a lady in an experiment to find out whether she could tell whether milk was put into her tea first; and of course we don't know what information she is drawing on to know this: was it based on her ability to taste? Or was it based on some suspiciously regular looking pattern? The conclusion here is we still have to make subjective judgments about what is actually happening in this experiment, not just look naively at the odds of her getting all of the tests correct, and thus concluding a certain probability from that. There's always some subjective aspect: either the judgments going in, the type of patterns or randomness used to make a sample, or some other factor we're not aware of.
87 "A good Bayesian does better than a non-Bayesian, but a bad Bayesian gets clobbered." --Herman Rubin, 1970.
88 "The problem of which finite random sequences are satisfactory occurs in a dramatic form in the science of cryptology. It might be supposed that there is no better method of enciphering a secret message than a random method. Yet suppose that, by extraordinarily bad luck, the enciphered message came out exactly the same as the original one: would you use it? Would you use it if you knew that your opponent knew you were using a random method?"
88 On the idea of the "statistician's stooge" who selects the random numbers but does not reveal them to the experimenter who then claims to maintain objectivity.
89ff More comments here on randomness: on how we know whether a sequence is random and how this interferes with an objective analysis of random sampling of data, for example in a statistical experiment. Also a discussion here about using judgment in evolving or shifting probabilities: for example if you're estimating the probability of a mathematical theorem before deciding whether to prove it or not, you have to make a subjective guesstimate at the likelihood that you will succeed. You have to do this at least informally. Also see an example here where if you are betting that the millionth digit of pi is a 7, you can use the estimated probability of 0.1, but strict logic says the probability is either 0 or 1!
90 Back to the lady tea taster example: if we use a pseudorandom number sequence, say the square root of 2 starting at the 777th digit, it is extremely unlikely this lady will know these digits (we are subjectively sure she cannot compute this mentally), thus pseudorandom numbers are just as useful here as strictly random ones.
91 The discussion gets a little bit abstruse here as the author begins comparing and contrasting mathematical conceptions of randomness with physical randomness; then a discussion that we can never prove that a physical system is deterministic so we lose nothing practically by displacing the indeterminism from our shoulders to those of the physical system, and we can say we are just describing properties of the system; the author argues that this could be thought of as solipsistic; then he gets into the mind-body problem and quantum mechanics.
92ff Discussion here of a real world probability problem like a contingency table of two-way classifications of people classified by their occupations and causes of death, where some tables might be empty (like the cell corresponding to the number of professional chess players kicked to death by a horse, heh); on making an estimate by lumping together some of the rows or some of the columns; on actuaries grappling with this problem for more than 100 years, "they usually select some reference class that favors their insurance company at the expense of the customer." [!!]
94 Finally the author quotes himself in a book review he wrote in the Times Literary Supplement addressing whether randomness is responsible for the emergence of all new ideas and hypotheses and sciences about the world; he argues that we could study all kinds of biology and all kinds of life, not just life that exists but life that could exist; or likewise all machines that could exist, not just ones that exist; but this is impossible in practice (although it is logically possible in principle), because we have a limit on time: we can only study the life forms we "meet"; this is why reductionism appears to be false, but the author concludes the trouble with reductionism might be merely a lack of time. [What's refreshing about this author is how he promptly dispenses with a lot of frou-frou science and masturbatory "how many angels can dance on the head of a pin"-type philosophical problems, and sticks to practical problems that occur in the real world--along with the practical constraints that come along with those practical problems. Refreshing and realistic.]
Chapter 9: Some History of the Hierarchical Bayesian Methodology
95 The summary here is typical of the author's sense of humor: "A standard technique in subjective Bayesian methodology is for a subject ('you') to make judgments of the probabilities that a physical probability lies in various intervals. In the hierarchical Bayesian technique you make probability judgments (of a higher type, order, level, or stage) concerning the judgments of lower type. The paper will outline some of the history of this hierarchical technique with emphasis on the contributions by I.J. Good because I have read every word written by him."
95ff Discussion of how in 1947 he made a bet that the predominant philosophy of statistics in the century ahead would be Bayesian, he now thinks that there is a Bayes/non-Bayes compromise; discussion of how we have to make probability judgments, and subjective probability is more directly involved in your thinking than physical probabilities; they are required for reasoning and the probabilities cannot be sharp in general. "...a theory of partially ordered subjective probabilities is a necessary ingredient of rationality."
96 Three categories of hierarchies of different types or orders of probability:
1) Hierarchies of physical probabilities: populations, superpopulations, super-duperpopulations, etc.; or random sequences, random sequences of random sequences, etc.
2) Hierarchies arising in a subjective theory: basically trying to cope with any vagueness or to allow for confidence you feel in judgments for a decision.
3) Mixed hierarchies: say, for example two levels where a subjective or logical distribution is assumed for a physical probability.
99 Discussing small probabilities in large contingency tables; Good gives an example of a hierarchical Bayesian argument by looking at a set of contingency tables and assuming that the table as a whole had an approximately normal distribution, then examining it and finding it was so.
99ff Maximum likelihood/entropy for estimation in contingency tables: On the principle of maximum entropy as a method for selecting prior distributions or a method for formulating hypotheses.
100ff On multinomial distributions: "What can be done with fewer parameters is done in vain with more." Also a little blurb here that "Ockham's razor" was emphasized 20 years before Ockham by the famous medieval philosopher John Duns Scotus; this section is rather technical here and above my pay grade; in fact it's so technical it doesn't seem like it even fits stylewise with anything in the book so far.
104 On probability density estimation and "bump hunting": this just means looking for points of inflection on a curve. "The number of bumps was proposed as one measure of complexity, and the greater the number the smaller the initial probability of the density curve ceteris paribus." This was apparently used in experimental physics as well as in analyzing chondrites, a common type of meteorite.
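A crude sketch of what counting bumps via inflection points might look like; the bimodal curve below is invented, and a real analysis would of course work on an estimated density rather than an exact formula:

```python
import math

def count_inflections(ys):
    """Count sign changes in the discrete second difference of a sampled
    curve; each bump contributes a pair of inflection points where the
    curvature flips sign."""
    second = [ys[i + 1] - 2 * ys[i] + ys[i - 1] for i in range(1, len(ys) - 1)]
    flips = 0
    for a, b in zip(second, second[1:]):
        if a * b < 0:
            flips += 1
    return flips

# A two-bump curve: the sum of two Gaussian humps centered at -2 and +2.
xs = [i / 10 for i in range(-60, 61)]
ys = [math.exp(-(x - 2) ** 2) + math.exp(-(x + 2) ** 2) for x in xs]
print(count_inflections(ys))  # 4: two bumps give four inflection points
```

By this complexity measure a density curve with more bumps gets a smaller initial probability, all else equal, which is the Ockham-flavored idea in the quote.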
Chapter 10: Dynamic Probability, Computer Chess, and the Measurement of Knowledge
106ff On convincing people that the philosophy of probability is relevant. Also some interesting statements here about formal systems that could have come right out of Godel, Escher, Bach: "Formal systems, such as those used in mathematics, logic, and computer programming, can lead to deductions outside the system only when there is an input of assumptions. For example, no probability can be numerically inferred from the axioms of probability unless some probabilities are assumed without using the axioms: ex nihilo nihil fit [nothing comes from nothing]. This leads to the main controversies in the foundations of statistics: the controversies of whether intuitive probability should be used in statistics and, if so, whether it should be logical probability (credibility) or subjective (personal)."
107 Another brief review of the theory of comparative subjective probability, framing the theory as a black box whose user is you: the black box is a formal system (in this case the axioms of statistics), and its input consists of your collection of judgments, "many of which are of the form that one probability is not less than another one, and the output consists of similar inequalities better called 'discernments.' The collection of input judgments is your initial body of beliefs, B, but the output can be led back into the input, so that the body of beliefs (usually) grows larger as time elapses. The purpose of the theory is to enlarge the body of beliefs and to detect inconsistencies in it. It then becomes your responsibility to resolve the inconsistencies by means of more mature judgment... This theory is not restricted to rationality but is put forward as a model of all completed scientific theories." More comments here on dynamic situations like chess with sliding or evolving probabilities; the author now prefers the expression "dynamic probability." "It is difficult to see how a subjective probability, whether of a man or a machine, can be anything other than a dynamic one." Again comments here on how even mathematicians make subjective judgments about the truth or falsity of a mathematical theorem (albeit informally) before they begin to work on it: "in the process of finding and conjecturing theorems every real mathematician is guided by his judgments of what is probably true." [Note a good footnote here to this comment: "One man's routine is another man's creativity." Of course the first time you figure out a way to solve a problem it involves creativity; once you know how to do it, it's routine.]
108 Another interesting footnote here where the author says that he thinks there is a probability exceeding one half that, within the present century, a machine will exist that can simulate all the intellectual activities of any man. [Very interesting, albeit not quite right. Granted, he may have written that in the 1950s, and back then when people thought about the 2000s they all assumed it would be flying cars and the Jetsons.]
108 "...it used to puzzle me how a machine could make probability judgments. I realized later that this is no more and no less puzzling than the same question posed for a man instead of a machine. We ought to be puzzled by how judgments are made, for when we know how they are made we don't call them judgments. If judgments ever cease then there will be nothing left for philosophers to do."
108 "If analysis were free, it would pay you in expectation to go on analyzing until you were blue in the face, for it is known that free evidence is always of non-negative expected utility. But of course analysis is not free, for it costs effort, time on your chess clock, and possibly facial blueness. In deciding formally how much analysis to do, these costs will need to be quantified."
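Good's claim that free evidence always has non-negative expected utility can be checked directly in a toy decision problem. The numbers below are my own illustration, not from the book: two states, two actions, one free binary signal. Acting after seeing the signal can never be worse in expectation than acting before, because at worst you can simply ignore it.

```python
# Toy decision problem (my own illustrative numbers, not the book's):
# acting after a free signal is never worse in expectation.

prior = {"H": 0.5, "not_H": 0.5}             # P(state)
likelihood = {                                # P(signal | state)
    "H":     {"s1": 0.8, "s2": 0.2},
    "not_H": {"s1": 0.3, "s2": 0.7},
}
utility = {                                   # utility(action, state)
    ("a1", "H"): 1.0, ("a1", "not_H"): 0.0,
    ("a2", "H"): 0.0, ("a2", "not_H"): 1.0,
}

def best_eu(p_H):
    """Best achievable expected utility given belief P(H) = p_H."""
    return max(utility[(a, "H")] * p_H + utility[(a, "not_H")] * (1 - p_H)
               for a in ("a1", "a2"))

# Decide now, ignoring the free signal:
eu_without = best_eu(prior["H"])

# Observe the signal first, then act on the posterior:
eu_with = 0.0
for s in ("s1", "s2"):
    p_s = sum(prior[h] * likelihood[h][s] for h in prior)
    posterior_H = prior["H"] * likelihood["H"][s] / p_s
    eu_with += p_s * best_eu(posterior_H)

print(eu_without, eu_with)   # the second number is never smaller
```

With these numbers the prior leaves you indifferent (expected utility 0.5), while seeing the signal first raises the expectation to 0.75; the inequality holds for any choice of priors, likelihoods, and utilities.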
109ff A discussion of dynamic probability in the context of a chess game; the different pieces have weighted values and the probability of winning changes dynamically based on the number of pieces each player has on the board, etc.; also the cost of calculation shows up in the case of a timed chess match, or you may want to preserve your energy for the next game, "this accounts for many 'grandmaster draws.'" "Current chess programs all depend on tree analysis, with backtracking, and the truncation of the tree in certain positions." On superficial or surface probabilities that do not depend on analysis in depth; also on whether to truncate an analysis tree at a fixed depth; the earliest chess programmers recognized that an important criterion for a chess position to be regarded as an endpoint of an analysis tree was "quiescence": "A quiescent position can be defined as one where the player with the move is neither threatened with immediate loss, nor can threaten his opponent with immediate loss." On turbulence in a chess game; also thinking about dynamic probability with the state of the advantage of one player over the other: "With best play on both sides we would expect the rate of increase of advantage to be some increasing function of the advantage."
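The quiescence criterion those early programmers used can be sketched in a few lines. This is my own minimal toy (abstract game-tree nodes, a negamax sign convention, a "stand pat" option), not code from the book: a fixed-depth search that stops in a turbulent position trusts an illusory static evaluation, while extending the search through forcing ("noisy") moves until the position is quiet corrects it.

```python
# Minimal sketch of quiescence search on an abstract game tree.
# Each node's "eval" is a static score from the side to move;
# children flagged "noisy" are forcing moves (e.g. captures).

def quiesce(node):
    """Extend the search through forcing moves until the position is quiet."""
    stand_pat = node["eval"]                       # score of doing nothing
    noisy = [c for c in node.get("children", []) if c.get("noisy")]
    if not noisy:                                  # quiescent: trust the eval
        return stand_pat
    return max(stand_pat, max(-quiesce(c) for c in noisy))

def search(node, depth):
    """Fixed-depth negamax that never evaluates a turbulent leaf directly."""
    children = node.get("children", [])
    if depth == 0 or not children:
        return quiesce(node)
    return max(-search(c, depth - 1) for c in children)

# We capture a piece (statically the position looks +5 for us, so -5
# from the opponent's side to move), but a forced recapture leaves us
# down 3:
recapture = {"eval": -3, "noisy": True}                # our view afterwards
after_capture = {"eval": -5, "children": [recapture]}  # opponent to move
root = {"eval": 0, "children": [after_capture]}

print(search(root, 1))   # -3: quiescence sees through the illusory +5
```

A naive depth-1 search would stop at the turbulent capture position and report +5; extending through the noisy recapture yields the true -3.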
111ff Comments here on the relevance of dynamic probability to the quantification of knowledge; an amount of information can be a quasiutility for cutting down a search in problems like theorem-proving, medical diagnosis, and even in chess; also on adaptive programs where "the cost of calculation tends to decrease when the program is used repeatedly."
113 Interesting point here: "It should now be clear that dynamic probability is fundamental for a theory of practical chess, and has wider applicability. Any search procedure, such as is definitely required in non-routine mathematical search, whether by humans or by machines, must make use of subgoals to fight the combinatorial explosion. Dynamic utilities are required in such work because, when you set up subgoals, you should estimate their expected utility as an aid to the main goal before you bother your pretty head in trying to attain the subgoals... Both human problem-solvers and pseudognostical machines must use dynamic probability." [This sounds a little like the search/sort tradeoff, as well as the optimal stopping heuristics, both discussed in Algorithms to Live By.]
114 Comments here on looking at dynamic probability in connection with the principle of rationality, in the context of the amount of thinking or calculation required to solve a problem; these are probabilities and utilities that are dynamic: "When a conscious attempt is made to allow for the costs we may say that we are obeying the principle of rationality of Type II." The author claims that this is Bayesian thinking and even non-Bayesians are forced to use it, and thus dynamic probability and dynamic utility help us to achieve a Bayes/non-Bayes synthesis. Finally, "Inequality judgments rather than sharp probability judgments also contribute to the synthesis: a strict non-Bayesian should choose the interval (0,1) for all his subjective probabilities!"
Part III: Corroboration, Hypothesis Testing, Induction, and Simplicity
Chapter 11: The White Shoe Is a Red Herring
119 Hempel's paradox of confirmation can be worded thus: "A case of a hypothesis supports the hypothesis. Now the hypothesis that all crows are black is logically equivalent to the contrapositive that all non-black things are non-crows, and this is supported by the observation of a white shoe."
119 This example shows that there are instances where a case of a hypothesis does not necessarily support that hypothesis, even though the principle seems as if it ought to be true. And then the author gives another example of two hypothetical worlds:
Suppose we know we are in one of two worlds and we are considering hypothesis H that all crows in our world are black. The two worlds are:
1) a world in which there are 100 black crows, no crows that are not black, and a million other birds.
2) a world in which there are 1000 black crows, one white one, and a million other birds.
A bird is selected randomly, it turns out to be a black crow.
"This is strong evidence (a Bayes-Jeffreys-Turing factor of about 10) that we are in the second world, wherein not all crows are black. Thus the observation of a black crow, in the circumstances described, undermines the hypothesis that all the crows in our world are black. Thus the initial premise of the paradox of confirmation is false, and no reference to the contrapositive is required."
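The "factor of about 10" is just the likelihood ratio for the observation in each world, easy to verify with the numbers given:

```python
# The two-worlds arithmetic, done explicitly. A bird is drawn at
# random and turns out to be a black crow; which world are we in?

birds_world1 = 100 + 1_000_000        # 100 black crows + a million other birds
birds_world2 = 1000 + 1 + 1_000_000   # 1000 black + 1 white crow + the rest

p_obs_w1 = 100 / birds_world1         # P(draw a black crow | world 1)
p_obs_w2 = 1000 / birds_world2        # P(draw a black crow | world 2)

# Bayes(-Jeffreys-Turing) factor in favor of world 2, where
# NOT all crows are black:
factor = p_obs_w2 / p_obs_w1
print(round(factor, 2))               # ≈ 9.99, i.e. "about 10"
```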
Chapter 12: The White Shoe qua Herring Is Pink
121 This is a response to Hempel, who had replied to the prior brief paper arguing that a white shoe is not a red herring; the author goes on to give another example: if there are crows, then there is a reasonable chance that they are of a variety of colors. Even if I were to discover that a black crow exists, I would consider "all crows are black" to be less probable than it was initially.
Chapter 13: A Subjective Evaluation of Bode's Law and an "Objective" Test for Approximate Numerical Rationality
122 "This paper is intended in part to be a contribution to the Bayesian evaluation of physical theories, although the main law discussed, Bode's law or the Bode-Titius law, is not quite a theory in the usual sense of the term." First the author briefly discusses the foundations of probability, statistics, and induction, "to make the paper more self-contained."
122ff On the seven kinds of probability:
1) Tautological/mathematical
2) Physical
3) Intuitive, and intuitive splits up into:
4) Logical subjective
5) Multisubjective
6) Evolving/dynamic, and
7) Psychological.
"Subjective probability is psychological probability to which some canons of consistency have been applied. Logical probability is the subjective probability in the mind of a hypothetical perfectly rational man, and is often called credibility." On evolving or dynamic probabilities, which change in the light of reasoning alone without the intervention of strictly new evidence [see, again, for example the game of chess or estimating the probability of a mathematical theorem]; these types of evolving probabilities are the most important kind in practical affairs and in scientific induction.
123ff Interesting paragraphs here on terminology, and how the author tried for decades to use terminology close to ordinary English uses of the terms "as an aid to intuition, instruction and the unity of knowledge..." See for example "initial" instead of "prior," "final" instead of "posterior," and "intermediate" instead of "preposterior"; although he has followed the masses in the use of the term "prior" more recently; on the Bayesian term "weight of evidence"; "Logically, terminology should not matter; but, in practice, it does." [Reminds me of the Yogi Berra quote: "In theory, there is no difference between theory and practice. But in practice, there is."]
125 On the necessity of judging whether a simple statistical hypothesis should be entertained at all, whether the complexities of the hypothesis are reasonable, and whether the hypothesis could be rejected out of hand by the evidence; thus a Bayesian will have to look at the observations before deciding on a full specification of a theory or hypothesis, which may include further simplifying assumptions; both Bayesians and non-Bayesians have to guess that a certain mathematical model of reality may or may not be adequate. "The main difference is that in a non-Bayesian analysis more is swept under the carpet... The Bayesian is forced to put his cards on the table instead of up his sleeve. He thus helps others to improve his analysis, and this is always possible in a problem concerning the real world."
125 "Deep down inside we are all subjectivists..." "Judgment is necessary when deciding whether a set of judgments is sufficient."
126 On scientific induction: on Karl Popper in 1959 who believed the logical probability of any general scientific theory is zero; Good's response is this is "a view that if adopted will kill induction stone dead and is intended to do so." Discussion of "weight of evidence" between competing theories to determine which theory is preferable.
127 on the idea that magistrates and medical diagnosticians make judgments using the weight of evidence all the time "without quite knowing it. Statisticians should be more like magistrates and magistrates more like Bayesian statisticians."
127ff A subjective analysis of Bode's Law; now we get to the meat of this paper; on the idea that there is a sociological element in evaluating scientific theories, there's judgment involved: for economic reasons, for courtesy, on the authority of other more powerful scientists, etc. "All judgments are subjective but some are more subjective than others." [Love this callout to Animal Farm.] On evaluating Bode's Law concerning the mean distances of the planets from the sun. On Bode's Law as an example of physical numerology in that it provides a formula for the calculation of physical quantities without an attempt to explain the formula. "If we were convinced that Bode's Law could not be ascribed to chance, its importance for theories of the origin of the solar system would be increased."
128 Interesting little blurb here on the astronomer Cecilia Payne-Gaposchkin who "even apparently contradicts herself... when she says Bode's Law is 'probably an empirical accident' and that 'the fact that the asteroids move in the zone near to which Bode's Law predicts a planet suggests that they represent such a planet.'" Additional comments on this law from different scientists, ranging from "it cannot be considered a mere coincidence" to "there is a certain measure of regularity in the spacing of planetary orbits, and this regularity cannot be entirely without significance"; and here's a funny one where the author says "Young (1902) manages to contradict himself in a single sentence: 'For the present, at least, it must therefore be regarded as a mere coincidence rather than a real law, but it is not unlikely that its explanation may ultimately be found...' He wins, or loses, both ways." [The best kind of prediction right there!]
128 Finally a few comments on Bode's Law itself: it gives an approximate empirical law for the relative mean distances from the sun of all the planets that were known at the time; this was 1772 in the case of Bode and 1766 in the case of Titius [Bode's Law is yet another example of Stigler's Law of Eponymy!]; the law was later found to fit Uranus [ohhh the jokes write themselves here], but it failed for Neptune and Pluto, and also fails on some level for Mercury: the formula in its usual expression is that the distances are proportional to 4 + 3*2^n. And then a debate here about what value of n to associate with Mercury: if you use -1 and then increase from there, this simplifies the law but does not give good agreement with Mercury's actual mean distance from the Sun. [This would also be an example of datamining! You look at the planets' arrangement, and then tweak the formula to either 1) fit Mercury, or 2) be the most simplified formula.]
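For the curious, here's the law spelled out numerically. The predictions use the 4 + 3*2^n form from the text, with Mercury conventionally assigned n = -infinity so the 2^n term vanishes; the actual mean distances in AU are standard rounded values I'm supplying for comparison.

```python
# Bode's law in its usual form: distance ≈ (4 + 3 * 2**n) / 10,
# in astronomical units. Actual mean distances (AU, rounded
# standard values) shown for comparison.

actual = {
    "Mercury": 0.39, "Venus": 0.72, "Earth": 1.00, "Mars": 1.52,
    "Ceres": 2.77, "Jupiter": 5.20, "Saturn": 9.55,
    "Uranus": 19.19, "Neptune": 30.07,
}
n_for = {"Mercury": None, "Venus": 0, "Earth": 1, "Mars": 2, "Ceres": 3,
         "Jupiter": 4, "Saturn": 5, "Uranus": 6, "Neptune": 7}

def bode(n):
    term = 0 if n is None else 3 * 2**n   # None stands in for -infinity
    return (4 + term) / 10

for planet, n in n_for.items():
    print(f"{planet:8s} predicted {bode(n):6.2f}   actual {actual[planet]:6.2f}")

# The fit is striking through Uranus (found after the law was stated)
# and fails badly at Neptune: 38.8 predicted vs. roughly 30.1 actual.
```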
Chapter 14: Some Logic and History of Hypothesis Testing
129ff The author [again!] argues here for a synthesis or compromise philosophy of statistics using both Bayesian and non-Bayesian methods. Discussion of the controversy between neo-Bayesian methods and orthodox sampling-theory methods in statistics [note this quote in the footnote here: "...for some years after World War II I was almost the only person at meetings at the Royal Statistical Society to defend the use of Bayesian ideas. Since such ideas are now much more popular it might be better to call non-Bayesian methods 'sampling theory' methods and this is often done in current statistical publications."] On the idea that Bayesian methods are based on the assumption that you should make your subjective or personal probabilities more objective, "whereas anti-Bayesians act as if they wish to sweep their subjective probabilities under the carpet." Once again the author stresses that everyone has to use their best judgment when they decide on a specific statistical method; you can give your work an air of objectivity by suppressing some of the background judgments that you make, whereas the author argues that these judgments are of potential importance.
130 "My own philosophy of probability and statistics is a Bayes/non-Bayes compromise. I prefer to call it the Doogian philosophy rather than the Good philosophy because the latter expression might appear self-righteous."
130 "As a psychological aid to introspection, for eliciting your own probabilities, I advocate the use of probability distributions of probabilities. This gives rise to a hierarchical theory... It shows how a good philosophical point of view leads to practical statistical procedures, a possibility that might surprise many philosophers and statisticians. Like mathematics, philosophy can be either pure, applied, or applicable."
131ff On introducing the term "likelihood": to consider the likelihood of hypothesis H given circumstances E; the author considers a set of mutually exclusive hypotheses, each of which have their own likelihoods, this is thus a set of likelihoods; and we can look at the odds of each hypothesis and ratio them, or look at the log of the Bayes factor and call that the weight of evidence.
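The definitions in this passage fit in a few lines. With illustrative likelihoods of my own choosing (not the book's), the Bayes factor is the likelihood ratio, and the weight of evidence is its logarithm; using log base 10, the unit is the "ban," with "decibans" for tenths, as Turing and Good used them.

```python
import math

# Illustrative numbers (mine, not the book's): two rival hypotheses
# and the likelihood each assigns to the evidence E.
p_E_given = {"H1": 0.30, "H2": 0.03}

# Bayes factor = ratio of likelihoods; weight of evidence = its log.
bayes_factor = p_E_given["H1"] / p_E_given["H2"]
weight_of_evidence = math.log10(bayes_factor)     # 1 ban = 10 decibans

# Bayes' theorem in odds form: posterior odds = prior odds * factor.
prior_odds = 1.0                          # H1, H2 initially equally likely
posterior_odds = prior_odds * bayes_factor

print(bayes_factor, weight_of_evidence, posterior_odds)
```

The odds form makes the bookkeeping additive in log space: successive independent pieces of evidence simply add their weights.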
132 The author contrasts a person who guesses one digit correctly (you would not hypothesize he could always do so) with a typewriter that prints the digit corresponding to the key you press (you would assume it always will, until it breaks); thus you can only modify the beliefs you had before you made the relevant observation; also comments here that non-Bayesians give the impression that you can look at likelihoods by themselves to choose between hypotheses, but then the author adds an amusing footnote: "By the time a pure likelihood man recognizes that initial probabilities must be allowed for, he calls them initial likelihoods or initial supports instead so as to avoid admitting that he has become a Bayesian!"
133 "If we want to be able to say that H should be rejected because the observation is too improbable given H we have to do more than compute P(E|H) even when this probability can be computed. Let us consider various approaches to this problem." A Bayesian believes it is meaningful to talk about a Bayes factor, or its logarithm the weight of evidence, at least in approximation; the main objection here is that it is difficult to specify the probabilities with much precision; the response is that non-Bayesian methods also need Bayesian judgments, but they just sweep them under the carpet; on the use of tail-area probabilities or p-values: a tail-area probability is the probability that the outcome of a given observation would have been at least as extreme as the actual outcome; if p is small enough then the null hypothesis should be rejected--in other words, the evidence counts against the null hypothesis. Typical values of p are given here (0.05, 0.02, etc.); comments on how Fisher had issued tables corresponding to these thresholds. "Some statisticians choose a threshold without being fully conscious of why they chose it." [You have to admit, he's showing all these pseudorigorous statistical techniques and revealing how much inexact judgment goes into them all, proving everybody is a Bayesian. It's convincing and it's good rhetoric too.]
135 "Many Fisherians (and Popperians) say that 'you can't get (much) evidence in favor of a null hypothesis but can only refute it.'" But the statement itself is a kind of null hypothesis [!!], and "The Fisherian's experience tends to support it, as an approximation, Doogianwise, so the Fisherian (and Popperian) comes to believe it, because he is to some extent a Doogian without knowing it."
135 On optional stopping: a Fisherian who keeps sampling until he hits an arbitrarily high sigma or a small tail-area probability can be sure of rejecting a true null hypothesis if he's prepared to go on sampling for a long time. "The way I usually express this 'paradox' is that a Fisherian (but not a Bayesian) can cheat by pretending he has a train to catch, like a gambler who leaves the table when he is ahead." [The idea here is that you're making certain judgments as to whether to continue with a hypothesis, or an experiment, or a process, and this is by definition Bayesian thinking, even if you'd like to frame the entire thing as completely quantitative statistics and totally objective; it is obvious that certain subjective elements come into play, and the author would argue that you should at least admit this, and use these subjective elements with as much objectivity as you can.]
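The "train to catch" cheat is easy to demonstrate by simulation (my own sketch, not from the book): flip a fair coin, test after every flip, and stop the moment the result looks significant at the 5% level. Even though the null hypothesis is true by construction, the rejection rate comes out far above 5%.

```python
import random

# Simulation of optional stopping inflating the Type I error rate.
# The coin is fair, so every 'significant' result is a false alarm.

random.seed(1)

def optional_stopping_rejects(max_flips=2000, z_crit=1.96):
    """Flip a fair coin, peeking at the z-statistic after every flip."""
    heads = 0
    for n in range(1, max_flips + 1):
        heads += random.random() < 0.5
        z = (heads - n / 2) / (n / 4) ** 0.5   # normal approx to binomial
        if n >= 30 and abs(z) >= z_crit:       # 'significant' -- stop here!
            return True
    return False                               # patience ran out: no rejection

trials = 2000
rejections = sum(optional_stopping_rejects() for _ in range(trials))
print(f"rejection rate: {rejections / trials:.2f}")   # well above 0.05
```

A Bayesian analysis is immune to this: the Bayes factor doesn't care why you stopped, only what you observed.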
136 On what "more extreme" and "further away" mean: "The idea that one outcome is more extreme than another one depends on whether it seems to be 'further away' from the null hypothesis... The statistic chosen for testing the null hypothesis is chosen to reflect this distance."
137 "Notice how the likelihood ratio is analogous to a Bayes factor which would be defined as a ratio of the weighted averages of the probability of E given each hypothesis, but these weighting probabilities are verboten to the non-Bayesian."
138 Back to the discussion about planetary arrangement: for example, if Bernoulli had included Pluto in his thinking about the uniform distribution of planetary angles, Pluto would actually look like an outlier; we could delete one planet and still have a rule, although you have to offer some "payment" to allow for the artificiality; and it would still be true that the other planets are arranged such that there must be some physical reason for it, even if some of the observations deviate; and the method of throwing out one of the planets "provides an example of selecting a significance test after looking at the data. Many textbooks forbid this. Personally I think that Rule 1 in the analysis of data is 'look at the data.'" [!!!]
139 "...it is sometimes sensible to decide on a significance test after looking at a sample. As I've said elsewhere this practice is dangerous, useful, and often done. It is especially useful in cryptanalysis, but one needs good detached judgment to estimate the initial probability of a hypothesis that is suggested by the data. Cryptanalysts even invented a special name for a very far-fetched hypothesis formulated after looking at the data, namely a 'kinkus' (plural: 'kinkera'). It is not easy to judge the prior probability of a kinkus after it has been observed."
139 "Fisher once said privately that many of his clients were not especially intelligent, and this might have been part of his reason for avoiding Bayesian methods." [Ouch]
139 "It is curious that Fisher introduced general features of statistics for estimation purposes, but not for significance tests. He seemed to select his significance-test criteria by common sense unaided by explicit general principles." [Ouch again!]
140 "It is a common fallacy that if a concept is not precise then it does not exist at all. I call this the 'precision fallacy.' If it were true then it is doubtful whether any concepts would have any validity, because language is not entirely precise though it is often clear enough."
140ff On Bayes factors and tail-area probabilities: comment here on parapsychology experiments where the experiments are large but the proportional bulges are very small, see for example Helmut Schmidt's experiments which had a 52.5% success rate in 6,400 trials; "I have found that, for sample sizes that are not extremely large, there is usually an approximate relationship between a Bayes factor F and a tail-area probability P. I shall now discuss this relationship." [A technical discussion follows for about a page and a half.] "I personally am in favor of a Bayes/non-Bayes compromise or synthesis. Partly for the sake of communication with other statisticians who are in the habit of using tail-area probabilities, I believe it is often convenient to use them especially when it is difficult to estimate a Bayes factor. But caution should be expressed when the samples are very large or if the tail-area probability is not extremely small."
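Using Schmidt's figures from the text (52.5% successes in 6,400 trials), we can see the kind of divergence Good is pointing at. The p-value below follows from the standard normal approximation; the Bayes factor uses a uniform prior on the success rate under the alternative, which is my choice for illustration, not the book's calculation.

```python
import math

# Figures from the text: 52.5% successes in 6,400 binary trials.
n, k = 6400, 3360

# One-tailed tail-area probability via the normal approximation:
z = (k - n / 2) / math.sqrt(n / 4)            # z = 4.0
p_value = 0.5 * math.erfc(z / math.sqrt(2))   # ~ 3e-5: looks overwhelming

# Bayes factor for H1 (success rate uniform on [0,1]) against
# H0 (success rate exactly 1/2). Under H1 the marginal probability
# of any particular k is 1/(n+1); under H0 it is C(n,k) * 0.5^n.
log_p_k_H0 = (math.lgamma(n + 1) - math.lgamma(k + 1)
              - math.lgamma(n - k + 1) + n * math.log(0.5))
bayes_factor = (1 / (n + 1)) / math.exp(log_p_k_H0)

print(f"p = {p_value:.1e}, Bayes factor = {bayes_factor:.0f}")
```

The tail area is about 3 in 100,000, but the Bayes factor comes out only in the tens: substantial, yet orders of magnitude less dramatic, which is exactly Good's caution about very large samples.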
144ff Discussion here of the Neyman-Pearson approach, "which is to consider the probabilities of errors of the first and second kinds." [Again, think of it like a specificity test.] "An error of the first kind is defined as the rejection of the null hypothesis H when it is true, and an error of the second kind is the acceptance of H when it is false." An example given here from clinical trials [note this is long before the corruption in pharmaceutical trials that we have today], whereby overemphasizing Type I errors and neglecting Type II errors led to many ineffective clinical trials; for example, there is an analysis of 71 negative randomized controlled trials where 50 of the trials had a 10% risk of missing a 50% therapeutic improvement. "This poor performance might have been avoided if the experimenters had allowed for errors of the second kind when planning their experiments. They would have realized that their samples were too small." Also a discussion of ethics in medical trials--which is one reason samples are small: there are risks to the volunteers--the author argues that this ethical difficulty can be overcome either by social contract or if the patients voluntarily accept compensation. [Interesting sidebar right there.]
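The sample-size point can be made concrete with the standard two-proportion power formula. The response rates and design below are my own illustrative choices, not the figures from the cited analysis: how many patients per arm to have only a 10% risk (90% power) of missing a real improvement from a 40% to a 60% response rate, at a one-sided 5% level?

```python
import math

# Standard normal-approximation sample size for comparing two
# proportions (illustrative numbers, not the cited study's).
p1, p2 = 0.40, 0.60           # control vs. treatment response rates
z_alpha, z_beta = 1.645, 1.282   # one-sided 5% level, 90% power

p_bar = (p1 + p2) / 2
n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
      + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
     / (p2 - p1) ** 2)

print(math.ceil(n), "patients per arm")   # roughly a hundred per arm
```

Run the same arithmetic for a smaller true improvement and the required n balloons, which is why trials planned with no thought for Type II error so often come out "negative."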
145 "The precise significance test and the value of α are supposed to be determined in advance of the experiment or observation. I have already argued that one cannot always sensibly determine a significance test in advance, because, heretical though it may be in some quarters, sometimes the data overwhelmingly suggest a sensible theory only after they are examined."
145ff On surprise: "The evolutionary value of surprise is that it causes us to check our assumptions. Hence if an experiment gives rise to a surprising result, given some null hypothesis H, it might cause us to wonder whether H is true even in the absence of a vague alternative to H. It is therefore natural to consider whether a statistical test of H might be made to depend upon some 'index of surprise.'"
146 "...sometimes a surprising event is regarded as 'merely a coincidence' because we cannot think of any reasonable alternative to the null hypothesis." [Hmmm, one can't help but think about medical or vaccine injuries, where we call them "coincidences" because we don't want it to be true that they were caused by the medical intervention...]
147 "So far in this article the emphasis has been on whether a hypothesis is probable, but the selection of a hypothesis depends also on its utility or on a quasiutility such as its power to predict or to explain. If this were not so we would always prefer a tautology such as 2 = 2 to any more informative hypothesis."
Chapter 15: Explicativity, Corroboration, and the Relative Odds of Hypotheses
149 [This paper contains a fair amount of criticism of Karl Popper, showing that induction and subjective estimates of likelihood are not only useful but absolutely necessary, even in areas where Popper denies they exist!] "In this paper I shall discuss probability, rationality, induction and the relative odds of theories, weight of evidence and corroboration, complexity and simplicity (with a partial recantation), explicativity, predictivity, the sharpened razor, testability and metaphysicality, and gruesomeness."
149ff On the shifting meaning of "Bayesian": "in 1950 there were so few Bayesians that they clearly formed a cluster including myself." [See below, where the author offers a witty dot plot grouping statisticians by name: the author argues that he used to be a Bayesian but now looks like he's representing a Bayes/non-Bayes compromise. "The meanings of words are often determined by clusters in abstract spaces."]
150ff Subjective probability: to the author this means his personal probability if he makes an attempt at coherence or consistency with statistical axioms; however for a purely snap judgment he uses the expression "psychological probability." And then the term "logical probability" or "credibility" where he means the unique rational belief that something is probable. "Whether logical probability exists or not it is an ideal to hold in mind like absolute truth."
151ff On Popper's comment that it is impossible to generate knowledge from ignorance. The author says, "It seems to me that this is precisely what mammals have been doing for the last billion years, and I even once heard Popper state that science is based on a swamp." Just as language develops adaptively, so can probability judgments. Also on the fact that we do not know how judgments are made: if we did we would call them inferences; also this quote: "Honest objectivism leads inevitably to subjectivism... It is the denial of the need of subjectivism, not its acceptance, that is the chronic illness."
152ff The author outlines his philosophy here: "I believe that, of the various interpretations of probability, the most operational, the one closest to action, is subjective or personal probability, for it enables us to extend ordinary logic into a useful general-purpose system of reasoning and decision-making."
153 "To be more precise it is better to describe my philosophy as an adherence to the 'black box theory of probability and rationality.' The black box is supposed to contain the axioms of the subject; it has an input that consists of inequalities between probabilities, probability ratios, utilities, expected utilities, etc. These inequalities constitute a 'body of beliefs.' The output consists of discernments. By a 'discernment' I mean a judgment that becomes compulsory once it has been deduced... The purpose of the theory is to enlarge the body of beliefs and to detect inconsistencies in it, whereupon the judgments need revision. In this respect it resembles Aristotelian logic."
154 Discussion of using physical probabilities as if they existed [and then the discussion begins to get a little bit metaphysical here]; this is what a consistent subjectivist would do; and then a discussion of what "as if" means; and if all our concepts are no more than "as if," you might as well just drop the "as if" from the discussion: if a concept can always be used in its "as if" way then we might as well just say that the concept is real. "Perhaps this is an adequate definition of reality."
156ff Discussion here of Godel showing that even arithmetic is infinitely complex; the author argues that it is possible the laws of physics are also infinitely complex; it is uncertain that even physics can be described without reference to the Godel problem; then a discussion of the summation of the probabilities of all the mutually exclusive theories that are not infinitely complex; the author argues this is non-zero and says this refutes Popper's claim that in an infinite universe the probability of a fundamental law is always zero. "The only sensible conclusion is that all those laws that are not self-contradictory have positive probability."
158 "...we often look at the experimental results first, and then formulate hypotheses. We usually test them by further experiments, but far-fetched hypotheses would need extra strong corroboration, because to say that they are far-fetched is to say that their initial probabilities are low."
159ff Discussion of the weight of evidence, beginning with Peirce (1878); on weighing the factors of the evidence using a Bayes factor. "The concept of weight of evidence was central to my first book and occurred also in at least 32 other publications. What I say thirty-three times is true."
162ff On explicativity and predictivity; on at least three kinds of explanation:
1) semantic explanation or elucidation
2) informative explanation
3) purely theoretical explanation
162-3 Extremely interesting example here about planetary motion when humanity moved from one view (geocentric) to another (heliocentric); showing an instance where we are forced to the view that explanations depend on evolving probabilities when the explanation does not involve any new empirical observations. This is a good example of a purely theoretical explanation from above. Also, instead of the clumsy expression "strong explanatory power," the author uses the single word "explicativity."
163ff On induction and the truth of theories: he starts off with a witty comment: "When reading Popper's work I assume, by the principle of induction, that the words he uses have much the same meanings as they seem to have had in the past in other writings... Then again, when he says that induction is not used for the acceptance of scientific theories but by their 'proving their mettle' in virtue of our honest attempts to refute them, I assume that he has noticed this happening in the past and, by scientific induction, he expects this to continue in the future." [!!!! He traps Popper inside his own argument. Hilarious.] "He has formulated a scientific hypothesis here, belonging to the area of the sociology of scientists, a hypothesis that, if it can be accepted at all, will either have to be accepted by scientific induction, or because it proves its mettle by surviving our honest attempts to refute it."
164 "Hume argued that induction cannot be logically justified because induction is needed to justify it. Equally we could argue that mettle-proving cannot be logically justified either by induction or by another mettle-proving operation, so we are once again in infinite regress, or we hit metaphysics. Similarly Popper often makes statements in the present tense of the form that 'we can learn from experience' (Popper, 1962, p. 291). I think what he means is that in the past we have learned from experience, but there is presumably an implication that we shall go on doing so. If he means that then he seems to have accepted a principle of induction as applied here to a hypothesis in psychology." [Again, well done.]
166 "If Newtonian mechanics were stated with appropriate limitations, including upper bounds on all relative velocities and lower bounds on the accuracies of the observations, I would expect its discrepancies from the Special Theory of Relativity to be negligible. In this modified form, Newtonian mechanics would not be refuted as compared with Special Relativity, but would merely explain a smaller collection of observations and would have less explicativity and less predictivity. Such a modified form of Newtonian mechanics would perhaps be strictly true, and a subjectivist might be able to say that its probable truth had been established inductively. Such an approximative form of Newtonian mechanics is of course still extremely useful and cannot reasonably be said to have been refuted."
167ff On Popper's idea of testability and refutability/falsifiability as a criterion demarcating science from non-science. The author adds "checkability" as a measure which embraces processes that are both confirming and disconfirming; the author gives examples such as the theory that horses still exist on Earth, asking: are we then to say that the nonexistence of horses is more scientific than their existence? [Here referring to Popper's falsifiability idea]; also "It has always struck me as surprising that Popper does not discuss statistical significance tests in his books... It is not usually pointed out that many scientific theories satisfy this definition. For example, provided a precise assumption is made for the law of error of observation, the following theories are simple statistical hypotheses: Newtonian Mechanics, Special Relativity, General Relativity, Quantum Mechanics, and Classical Statistical Mechanics."
168ff Discussion here of how corroboration can be just as useful as refutation, by which we can obtain a large positive or large negative weight of evidence. "...a modern Bayesian philosophy shows why refutability is usually more important than corroborability but not always. In other words the Bayesian philosophy explains how it is possible for a philosopher of science to fall into a dogmatic position and put all the emphasis on refutation: it explains the existence of a Popperian philosophy and at the same time improves it."
Part IV: Information and Surprise
Chapter 16: The Appropriate Mathematical Tools for Describing and Measuring Uncertainty
173 "In this paper I shall be concerned less with decisions that are made than with those that are rational... The function of the theory is to introduce a certain amount of objectivity into your subjective body of judgments, to act as shackles on it, to detect inconsistencies in it, and to increase its size by the addition of discernments. It is not misleading to describe the discernments as implied judgments. We do not yet know precisely how the mind makes judgments: if we did we could build a machine to do all our thinking for us. Until we can do this it will be necessary to describe scientific techniques with the help of suggestions as well as axioms and rules."
173ff Comments here on degrees of belief: we can order our degrees of belief by the potential degrees of surprise associated with them; discussion of a few different surprise indexes and their formulas.
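The notes don't reproduce the formulas, but one classic example of such a measure is Weaver's surprise index (the chapter's own indexes may differ from this); a minimal Python sketch:

```python
def weaver_surprise_index(probs, observed):
    """Weaver's surprise index: the expected probability of the outcome,
    sum(p_i^2), divided by the probability of the outcome actually
    observed. A value well above 1 marks a surprising observation."""
    expected_p = sum(p * p for p in probs)
    return expected_p / probs[observed]

# A fair coin surprises no one: the index is exactly 1.
print(weaver_surprise_index([0.5, 0.5], 0))
# Seeing a rare face of a heavily loaded die is very surprising.
print(weaver_surprise_index([0.9, 0.02, 0.02, 0.02, 0.02, 0.02], 1))
```

The helper name is mine; the point is only that "degree of surprise" can be given an ordering, which is what lets the author order degrees of belief by it.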
174-5 Subtly amusing comment here on utilities: "It is possible that infinite utilities could occur in questions of salvation and damnation, as suggested by Pascal (1670)..."
175 "Some philosophers regard the phrase 'degree of belief' as metaphysical. They would presumably prefer to use the theory with a body of decisions rather than a body of beliefs."
176 On fair fees: "When we engage a professional expert to make probability estimates... we may have already formed our own 'amateur' estimates of the probabilities." And then a joke about how these fair fees as calculated can be used as a method of introducing piecework into the Meteorological Office. And then: "When making probability estimates it may help to imagine that you are to be paid in accordance with the above scheme."
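As a sketch of how such a "fair fee" can work, here is a logarithmic scoring scheme (my own illustration; the paper's exact payment formula may differ), under which the expert's expected fee is positive exactly when his probabilities improve on the client's amateur estimate:

```python
import math

def fair_fee(expert_p, amateur_p, occurred):
    """Fee paid to the expert once the event resolves: log(q/p) if the
    event occurred, log((1-q)/(1-p)) if it did not, where q is the
    expert's probability and p the client's amateur estimate."""
    if occurred:
        return math.log(expert_p / amateur_p)
    return math.log((1 - expert_p) / (1 - amateur_p))

# If the expert reports honestly and is better informed (true prob = q),
# his expected fee is the Kullback-Leibler divergence KL(q || p) >= 0.
q, p = 0.8, 0.5
expected_fee = q * fair_fee(q, p, True) + (1 - q) * fair_fee(q, p, False)
print(expected_fee)  # positive
```

Because the log score is a proper scoring rule, the expert maximizes his expected fee by reporting his true probabilities, which is the incentive the "piecework" joke is gesturing at.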
Chapter 17: On the Principle of Total Evidence
178ff Carnap and his "principle of total evidence" which is a recommendation to use all the available evidence when estimating a probability; on the idea that it does pay to take into account further evidence provided the cost of doing so can be ignored.
Chapter 18: A Little Learning Can Be Dangerous
181ff On the idea that it pays you to acquire information when that information is free, but this idea may break down when the expectation is computed by someone else; it depends on the other person's level of informedness, and on whether they have better judgment.
Chapter 19: The Probabilistic Explication of Information, Evidence, Surprise, Causality, Explanation, and Utility
184 This paper is a review of some of Irving Good's life's work in the mathematics of philosophy; he spares the reader from re-arguing the case for subjective probability and instead just quotes a remark from Henry Daniels that makes subjectivity clearly self-evident throughout statistics--whether objectivist statisticians want it there or not: "Each statistician wants his own methods to be adopted by everybody." --Henry E. Daniels
184ff On the weight of evidence approach; also "what Carnap calls the desideratum explicatum approach to the analysis of linguistic terms." Basically this means first figuring out what it is that you're looking for, and then sharpening the language of an originally vaguely defined term. Again comments on Type I rationality (logical consistency with the axioms of rationality) but then rationality of Type II, which is that which you can adopt in practice, say for example, if a decision is extremely urgent, or in a situation of evolving or dynamic probabilities and the intervention of new empirical information.
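Good's weight of evidence has a simple closed form: the log of the Bayes factor P(E|H)/P(E|not-H). A minimal sketch (decibans are Good's own preferred unit; the helper names are mine):

```python
import math

def weight_of_evidence_db(p_e_given_h, p_e_given_not_h):
    """W(H:E) = 10 * log10( P(E|H) / P(E|not-H) ), in decibans."""
    return 10 * math.log10(p_e_given_h / p_e_given_not_h)

def posterior_odds(prior_odds, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem in odds form: posterior odds = prior odds * Bayes factor."""
    return prior_odds * p_e_given_h / p_e_given_not_h

# Evidence ten times likelier under H than under not-H adds about 10
# decibans and multiplies the odds by ten.
print(weight_of_evidence_db(0.9, 0.09))
print(posterior_odds(1.0, 0.9, 0.09))
```

Weights of evidence from independent pieces of evidence simply add, which is what makes the deciban a convenient bookkeeping unit.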
186ff On the difference between evidence and information: see for example the evidence for or against a hypothesis in legal circumstances, versus information relevant to or concerning a hypothesis. Or information discriminating a hypothesis from its negation.
187-8 Another discussion of Hempel's paradox of confirmation: all crows are black; all non-black objects are non crows; therefore a white shoe supports the hypothesis that all crows are black. The point of this paradox is that "if left unresolved it would undermine the whole of statistical inference."
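The Bayesian resolution can be made concrete with a toy model (my own numbers, not the book's): sample a random non-black object; under H (all crows are black) it is certainly a non-crow, while under a rival hypothesis H' with exactly one white crow it is a non-crow with probability M/(M+1). The white shoe therefore does support H, but negligibly:

```python
import math

M = 10**6                        # non-black, non-crow objects (shoes, etc.)
p_noncrow_given_H = 1.0          # under H every non-black object is a non-crow
p_noncrow_given_Hprime = M / (M + 1)  # under H' one non-black object is a crow

bayes_factor = p_noncrow_given_H / p_noncrow_given_Hprime
woe_decibans = 10 * math.log10(bayes_factor)
print(bayes_factor, woe_decibans)  # barely above 1: a few millionths of a deciban
```

So the paradox dissolves: the white shoe confirms "all crows are black," but with a weight of evidence so tiny that statistical inference survives intact.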
189ff Finally a discussion of utility, for example the loss incurred when a statistician describes a distribution incorrectly, asserting "variable x has the distribution G" when the true distribution is F.
191ff Comments here on Markov chains, which are memoryless processes: the next state depends only on the current state, not on the earlier history. Note that the author then mentions a medical diagnostic search tree, where you want to choose tests that minimize the entropy over a set of mutually exclusive diseases.
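The entropy-minimizing test selection can be sketched as follows (the toy model and function names are mine, not the book's): for each candidate binary test, compute the expected posterior entropy over the diseases, and ask the test that drives it lowest.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def expected_entropy_after_test(prior, likelihoods):
    """Expected posterior entropy over diseases after a binary test,
    where likelihoods[d] = P(test positive | disease d)."""
    p_pos = sum(pr * lk for pr, lk in zip(prior, likelihoods))
    total = 0.0
    for p_result, lks in ((p_pos, likelihoods),
                          (1 - p_pos, [1 - lk for lk in likelihoods])):
        if p_result > 0:
            posterior = [pr * lk / p_result for pr, lk in zip(prior, lks)]
            total += p_result * entropy(posterior)
    return total

# A perfectly discriminating test drives the expected entropy to zero;
# an uninformative one leaves the 1-bit uncertainty unchanged.
print(expected_entropy_after_test([0.5, 0.5], [1.0, 0.0]))
print(expected_entropy_after_test([0.5, 0.5], [0.5, 0.5]))
```

Greedily repeating this choice at each node yields a diagnostic search tree of roughly the kind the author describes.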
Chapter 20: Is the Size of Our Galaxy Surprising?
193ff This is just a brief excerpt from the paper, asking "Is the expected size of our galaxy larger than the average size of a galaxy? The answer is yes..." This is basically a Bayesian analysis based on only one observation: the galaxy we happen to be in. The relevant quantity is the expected size of our galaxy conditional on our being in it, divided by the average size of a galaxy; this ratio is greater than one, so yes, the size is surprising! [Note: I recommend taking a look at Chapter 6 of Algorithms to Live By for a better, more layperson-friendly discussion of single-observation inferences like this.]
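The size-biased-sampling arithmetic behind this can be illustrated with toy numbers (the galaxy sizes below are made up): a randomly chosen observer lands in a galaxy with probability proportional to its size, so the expected size of "our" galaxy is E[S^2]/E[S], which is always at least E[S].

```python
# Hypothetical galaxy sizes (say, in billions of stars) -- toy values only.
sizes = [1, 2, 3, 10, 100]
average_size = sum(sizes) / len(sizes)
# Observer-weighted (size-biased) average: E[S^2] / E[S].
observer_weighted_size = sum(s * s for s in sizes) / sum(sizes)
print(average_size, observer_weighted_size)  # second value is larger
```

The inequality E[S^2]/E[S] >= E[S] is just the statement that variance is nonnegative, so the conclusion holds for any size distribution, not only these toy numbers.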
Part V: Causality and Explanation
Chapter 21: A Causal Calculus
197ff The author uses the phrase "the tendency of one event to cause another one" and looks for a way to explain the phrase quantitatively; on statistics contributing to the philosophy of science and adding to an improved understanding of the function of randomization; on the temporal relationship between event F and another event E, on the idea of F happening before E; also side thoughts on definitions of terms that he leaves out of this paper, and then also on the idea that the future may affect the past and how to cope with that problem--but these were left out of the final draft of this paper.
197 On various factors playing into the weight of evidence of F to cause E: the strength of the causal chain, the causal net joining F to E; causal chains and causal nets will be defined in later sections.
199ff [Long section of notation here, which the author says the reader can skip!] A discussion of various notations and ideas: the strength of a causal chain joining event F to event E is only as strong as its weakest link, and if any part of the chain is cut the causation goes away; the chain is also already cut if any of the links is of zero strength; he goes over a list of axioms like this to create a sort of formal language, and then derives various theorems, once again such as "a chain is only as strong as its weakest link" or "the resistance of a chain is equal to the sum of the resistances of its links"; a definition of the strength of a causal net consisting of a certain number of chains in parallel; the strengths of a set of chains in parallel can be combined into the strength of the net, etc.
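One way to encode this chain/net arithmetic (my own toy encoding, not Good's exact axioms): take resistance r = -log(strength), so "the resistance of a chain is the sum of the resistances of its links" becomes multiplication of link strengths, and combine parallel chains by assuming they fail independently.

```python
def chain_strength(link_strengths):
    """Serial links: product of strengths (equivalently, resistances add).
    The chain is never stronger than its weakest link, and has zero
    strength ("the chain is cut") if any link has zero strength."""
    s = 1.0
    for x in link_strengths:
        s *= x
    return s

def net_strength(chain_strengths):
    """Parallel chains (assumed independent): the net fails only if
    every chain fails."""
    fail = 1.0
    for s in chain_strengths:
        fail *= 1.0 - s
    return 1.0 - fail

print(chain_strength([0.9, 0.5]))  # weaker than the weakest link
print(net_strength([0.5, 0.5]))    # parallel chains reinforce each other
```

This also shows why the electric-network analogy the author mentions is natural: series resistances add, and parallel paths make the whole net more robust than any single chain.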
205 Comments here on how an analogy to electric networks is useful here.
209 "A well known pitfall in statistics is to imagine that a statistically significant correlation or association is necessarily indicative of a causal relationship. The seeing of lightning is not usually a cause of the hearing of thunder, though the two are strongly associated." A discussion of both spurious or illusory correlations, also partially spurious correlations, the author gives an example of how smoking and dust exposure might be a strong cause of lung cancer, but smoking by itself is only a weak cause. Thus we can find a (possibly spurious) correlation between smoking and lung cancer to be high in areas where there's pollution or high dust levels. [The discussion throughout this book where he uses examples of smoking and lung cancer are very interesting because it does indicate that the connection between these two is likely lower, possibly much, much lower than we have been brainwashed to believe.]
212 Further discussion on causal nets and networks that work in series or in parallel, or even causal nets that have aspects of independence; on a causal net leading to event E beginning with event F. The author's language here is the causal net "leads from" F and "leads to" E.
215 "... whether degrees of causality exist is a matter of physics, even if we take for granted that physical probabilities exist."
215 "There is always the possibility that something has been overlooked. Even in a statistical experiment involving randomization, from which we can apparently deduce that some x(E:F) is large, in fact E and F may both have been caused by some preceding events... We are always thrown back on judgment."
216 Interesting example here of where both Q (the causal support or the tendency of F to cause E) and X (its likelihood) cannot be identified: the author describes Sherlock Holmes at the foot of a cliff, where at the top of the cliff are Dr. Watson, Professor Moriarty and a large boulder: Watson knows that Moriarty intends to push the boulder onto Holmes's head, killing him, and so Watson decides he'll instead push the boulder, but at an angle so that it will miss Holmes. "Watson then makes the decision (event F) to push the boulder, but his skill fails him and the boulder falls on Holmes and kills him (event E)." This example shows that the tendency of F to cause E and the likelihood of F to cause E cannot be identified since F had a tendency to prevent E and yet caused it. "We say that F was a cause of E because there was a chain of events connecting F to E, each of which was strongly caused by the preceding one."
Chapter 22: A Simplification in the Causal Calculus
218 [This just simplifies certain assumptions in the prior paper and eliminates some of the calculations.]
Chapter 23: Explicativity: A Mathematical Theory of Explanation with Statistical Applications
219ff "By explicativity is meant the extent to which one proposition or event explains why another one should be believed. Detailed mathematical and philosophical arguments are given for accepting a specific formula for explicativity that was previously proposed by the author with a much less complete discussion. Some implications of the formula are discussed, and it is applied to several problems of statistical estimation and significance testing with intuitively appealing results. The work is intended to be a contribution to both philosophy and statistics."
219 On the word choice "explicativity" versus explanatoriness; the author's word choice is intended to be more quantitative, and it also "has a more euphonic plural" [one thing I love about this author, he has a sense of elegance]. On explicativity thought of as a quasi-utility when ordinary utility is difficult to judge. "The need for at least a rough measure of explicativity arises in pure science more obviously than in commerce where utilities can often be judged in financial terms."
220 "The advantage of the mathematics of philosophy over classical philosophy is that a formula can be worth many words. The topic is mathematical because it depends on probability. In this respect explicativity resembles some explications for information, weight of evidence, and causal propensity, and it will be convenient to list these explications first, without details of their derivations."
220 "It may be possible sometimes to invert our approach, and to use explicativity inequalities to aid us in our probability judgments."
220ff Discussion of the various notation the author is going to use here.
221 Interesting comment here in the context of using the letter U as a notation of a "screening off" of causality under the usual assumptions about the nature of time; the author returns to the example of lightning and thunder: "...seeing a flash of lightning is not an important cause of hearing loud thunder soon afterwards. Both events were caused by a certain electrical discharge. Equally, the thunder is not explained by the visual experience of lightning. On the other hand seeing the lightning does explain why one believes that thunder will soon occur; whereas hearing thunder is a good reason for believing that the lightning flash previously occurred. The experiences are thus valid reasons for prediction and retrodiction respectively."
222 Discussions of the "large and interesting literature on the philosophy of explanation" including Mill 1843; Hempel, 1948; Braithwaite, 1953; Popper, 1959; Nagel, 1961; Scheffler, 1963; Kim, 1967; Rescher, 1970; Salmon, 1971.
223 More terminology: the explanandum which is the thing to be explained and the explanans which is what explains it [this reminds me a bit of Julian Jaynes and the words he builds around metaphors: metaphrand (the thing the metaphor describes) and metaphier (the thing or relation used to elucidate the metaphor) in his wonderful The Origin of Consciousness and the Breakdown of the Bicameral Mind]; on the categories of explanation basically corresponding to what, how, and why, with some nuances: for example, in explaining what, there's a philosophical level of explanation with precise meanings and analytic consideration of the usage of words; also, with explaining why, it could be an event, a class of events or a scientific law, and then there are subtleties on to what extent it should be believed. Also a nuance here on a phenomenon E that we may know is true and yet still demand an explanation, and also on differentiating between a belief in E and a belief in the cause of E, but in some cases it can be both; see for example a circumstance where the author explains "Observing the shadow of an elephant can explain why we believe an elephant is present; whereas observing an elephant can explain both why the shadow is there and why we believe the shadows should be there."
224ff Various nuances with explanations: there could be physical laws that serve in multiple contexts, like gravity which explains why apples fall as well as why the planets have specific motions; sometimes laws of nature that form part of an explanation are ignored or taken for granted because of their familiarity; "the window broke because Tom threw a stone at it" lets us break things down to specific physical laws; also discussion of various boundary conditions that may happen in say deterministic physics; also on the "intimate relationship between explanation and causation" for example "the broken window was both caused and explained by Tom's naughty behavior."
226 "We regard explanations as good or bad depending in part on whether the probability of the explanans is high or low."
226 Comments here on levels of explicativity: see for example the hypothesis that Tom threw a stone at the window has more explicativity than the Mother Superior did it, because we believe Tom is naughtier as well as a better shot than the Mother Superior; but this of course would change if we saw the Mother Superior throw a stone vigorously! Thus the latter would be an explanans that has very high informed explicativity. Both putative explicativity [Tom is naughtier and a better shot than the Mother Superior] and informed explicativity [we actually saw the Mother Superior throw a stone] are measures of explanatory power of F with respect to E and will seldom include certainty of F.
227ff Discussion of what the author calls an early historical approach to explanation: the most naive is that E is explained by H if H logically implies E. The author says this is neither a necessary nor sufficient condition for H to be a good explanation of E. On William of Ockham and Aristotle both recommending "simpler" explanations: in other words, if there are two competing hypotheses explaining E, the simpler is to be preferred; though the author cautions that while the initial probability of a hypothesis has something to do with its simplicity, "the relationship is not obvious." The author argues that we need not refer explicitly to simplicity or complexity, and then he takes up the matter further in Appendix A at the end of this paper.
230ff On the author's "sharpened razor" concept: choosing the hypothesis that maximizes explicativity with respect to E for all known evidence; also on the idea that probabilities are dynamic on some level because you might test a hypothesis that has low initial probability but you use it because it is testable; also a theory with low initial probability but with high final probability means that the theory was informative.
231 On repeated trials: it can be a compound event or a time series which describes a probabilistic outcome of an experiment performed independently N times. With a large N the frequencies of various outcomes "settle down" to a distribution.
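The "settling down" of frequencies is just the law of large numbers in action; a quick simulation (the probability below is a toy value of my choosing):

```python
import random

random.seed(0)
p = 0.3  # toy "success" probability for each independent trial
# As N grows, the empirical frequency of successes approaches p.
for n in (100, 10_000, 100_000):
    freq = sum(random.random() < p for _ in range(n)) / n
    print(n, round(freq, 4))
```

Running this shows the frequency wandering noticeably at N = 100 and hugging 0.3 closely by N = 100,000, which is the "settling down" the notes describe.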
232 On predictivity: a necessarily vague notion, a special case of putative explanation made before the experimental result occurs, and thus it is natural to measure the predictivity of a hypothesis. "The notion of predictivity is necessarily vague; but it might be defined as the expected explicativity over all future observations with a discounting of the future at some rate. The concept is important in spite of its vagueness." [In investing, it is critical to focus on the future and on what's going to happen, so you always traffic in predictiveness and you always want to make sure your insights (or the insights of the people whose advice you follow) are predictive.]
233ff On collateral information versus background information: see the example here:
E: Jones won the Irish sweepstake
H: Jones bought a ticket in this lottery
In this case if we know that H is true it makes the probability of the explanans H equal to 1. The author contrasts this with background information: for example Tom was at the scene of the crime, which thus improves the probability that he threw a stone at the window. Note also that with the lottery ticket the purchase of the ticket did not do much to cause E although it was a necessary condition for it. "If Jones had not won the sweepstake, it would have been negligible evidence against his having bought a ticket, so... the causal propensity of the purchase is small." Another example: if someone is hit by a meteorite while out walking we would not blame her for suicidal tendencies; her decision to go out for a walk was a necessary condition, but if she were never hit by a meteorite that would have been negligible evidence that she was indoors when the meteorite fell.
234 Interesting comments here on simplicity versus elegance or aesthetic appeal of a hypothesis, and on how physicists in particular love a truly elegant theory: Good quotes a scientist [Paul] Dirac, from 1963, saying "...it is more important to have beauty in one's equations than to have them fit experiment." Then the author himself writes "I believe the beauty is often a matter of simplicity arising out of complexity arising out of simplicity."
235ff The final comment here in an appendix on complexity, specifically complexity in the linguistic string of an explanation; the author suggests the idea of defining complexity as the weighted length of the shortest way of expressing it; also another measure of the complexity of a scientific theory is the number of independent axioms in it; the author considers this also a useful rule of thumb but it does not allow for the relative complexities of those axioms. "In practice, the beauty of a theory, rather than its simplicity, might be more important when estimating initial probabilities... To fall back on beauty as a criterion is presumably to admit that the left hemispheres of the brains of philosophers of science have not yet formalized the intuitive activities of the right hemispheres."
Vocab:
Hypallagous: Hypallage is a rhetorical device where a word is grammatically associated with another word that it doesn't logically modify, essentially swapping the two words in a phrase. (examples: A careless remark, a sleepless night, her beauty's face, the happy road home, etc.)
To Read:
J.M. Keynes: A Treatise on Probability
J.M. Keynes: Essays in Biography
Karl Popper: The Logic of Scientific Discovery
Karl Popper: Conjectures and Refutations
Rudolf Carnap and Richard C. Jeffrey: Studies in Inductive Logic and Probability
J.K. Feibleman: An Introduction to the Philosophy of Charles S. Peirce
R.A. Fisher: Smoking: The Cancer Controversy
H. Reichenbach: The Direction of Time
H. Reichenbach: Modern Philosophy of Science
Herbert Simon: Models of Man


