In the expanded form of Bayes’ rule, why did you write “$P(E|\neg H) - P(E|\neg H)P(H)$” in the denominator rather than “$P(E|\neg H)(1-P(H))$” or “$P(E|\neg H)P(\neg H)$”? It seems like a weird choice to me because it disguises the symmetry between $H$ and $\neg H$.
Great that you added something to the article! Not that it really matters, but surely the default name should be Bayes’ rule, with the apostrophe at the end, with a redirect from Bayes rule without the apostrophe?
I would second that point about Bayes’. Punctuation that is not standard should be avoided (when it does not interfere with functionality).
Bayes rules!
I left the apostrophe out of the page name for the same reason that it's left out of Stokes theorem. It may be ‘Bayes's Rule’ or ‘Bayes' Rule’, but it's also ‘the Bayes Rule’. So rather than argue about whether to use the old rule (singular nouns and names ending in an /s/ sound (or other sibilant) take only an apostrophe in the possessive, just like plurals formed by adding the letter ⟨s⟩ (and possibly other changes) do) or to use the new rule (all singular nouns and names take an apostrophe and an ⟨s⟩ in the possessive, no exceptions), we don't use a possessive at all, but use the name as an attributive noun.
The practice of using attributives rather than possessives is catching on mostly due to naming things after multiple people (compare ‘the Kelvin–Stokes theorem’ to ‘Kelvin's and Stokes's theorem’, which would be ‘Kelvin and Stokes's theorem’ if they had worked together), but not having to argue about possessives is another good reason. (But if you want to argue about possessives, then I say that it's ‘Stokes's’. Way back in 1926, Fowler wrote
It was formerly customary, when a word ended in -s, to write its possessive with an apostrophe but no additional s, e.g. Mars’ hill, Venus’ Bath, Achilles’ thews. In verse, & in poetic or reverential contexts, this custom is retained, & the number of syllables is the same as in the subjective case, e.g. Achilles’ has three, not four; Jesus’ or of Jesus, not Jesus’s. But elsewhere we now add the s & the syllable, Charles’s Wain, St James’s not St James’, Jones’s children, the Rev. Septimus’s surplice, Pythagoras’s doctrines.
and while we may revere George Stokes and Thomas Bayes and find poetic beauty in their mathematics, I still advocate using modern grammar when talking about them.)
@Oscar
I didn't write $P(\neg{H})$, because I wanted it broken down into the simplest concepts. The same goes for expanding rather than factoring subexpressions, although that's not as significant. There are certainly intermediate forms of the rule, but I just wrote out the two most extreme forms.
Although come to think of it, $P(E|H) P(H) - P(E|\neg{H}) P(H) + P(E|\neg{H})$ might be an even better way to put the denominator than $P(E|H) P(H) + P(E|\neg{H}) - P(E|\neg{H}) P(H)$ (as I put it). That way, it's clearer that there are two ways to write it with factored subexpressions: $P(E|H) P(H) + P(E|\neg{H}) \big(1 - P(H)\big)$ (as you suggested) or $\big(P(E|H) - P(E|\neg{H})\big) P(H) + P(E|\neg{H})$.
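As a quick sanity check (mine, not part of the article), one can confirm numerically that all of these forms of the denominator agree:

```python
import random

# Check, for random inputs, that the various algebraic forms of the
# denominator P(E) discussed above all give the same value.
random.seed(0)
for _ in range(1000):
    pH = random.random()     # P(H)
    pEH = random.random()    # P(E|H)
    pEnH = random.random()   # P(E|not-H)
    forms = [
        pEH * pH + pEnH - pEnH * pH,   # as in the article
        pEH * pH - pEnH * pH + pEnH,   # reordered version suggested above
        pEH * pH + pEnH * (1 - pH),    # factored, as Oscar suggested
        (pEH - pEnH) * pH + pEnH,      # the other factoring
    ]
    assert all(abs(f - forms[0]) < 1e-12 for f in forms)
```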
@Toby: I agree with you. In fact I had been idly looking up Stokes and his history, noted ’the Kelvin–Stokes theorem’, and realised that one can argue (as you did) that ’Stokes theorem’ is related to the use of ’the Stokes theorem’, i.e. as an attributive noun rather than a possessive. Yes, I like that ;-)
Hehe, I have just never heard anyone say anything other than Bayes’ theorem or Stokes’ theorem (’the Bayes theorem’ actually sounds to me as though the definite article has been added for comical effect!), but it’s not important; I don’t mind being outvoted :-).
I’ve never heard anyone say something like “the Bayes theorem” when there is only one person named. It sounds weird to me, but I could live with it as a back-formation from the multi-person case. But I don’t see any grammatical justification for dropping the article.
We don't put ‘the’ in page titles (except for generalized the, which is about the word). That's not an nLab thing or even a math thing; almost nobody does that. But you're right that one couldn't drop it in running text without bringing in the possessives. (English needs a definite determiner here, which could be the definite article or could be a possessive but cannot be an attributive noun.)
As for ‘Stokes theorem’, sometimes I use that in the plural! (See the title of Chapter 7 in these notes, for example.) The idea is that the specific theorems taught in Vector Calculus, such as the Kelvin–Stokes Theorem and the Ostrogradsky–Gauss Theorem, besides being special cases of the one overarching Stokes Theorem, can also be considered as separate Stokes theorems when treated individually.
For what it’s worth, my feeling would be that we should try, on the main nLab, to be as unobtrusive as possible in our choices, irrespective of our personal preferences. Thus for example I would probably not look at the title at all if I went to look at Bayes’ theorem, which is why I don’t feel it’s important, but if I did I would certainly react a bit upon what I would interpret as careless grammar (remembering that people are not typically going to look here for an explanation). If that is the way that the majority of people would react as well, my feeling would be to choose something more conventional.
to be as unobtrusive as possible in our choices
but if I did I would certainly react a bit
I think that’s a basic principle of good mathematical writing: not to distract the reader by calling attention to itself.
Maybe I've just gotten so used to leaving off possessive suffixes in the names of theorems and the like (even though I'm not consistent about it), but nothing about ‘the Bayes Rule’ (or ‘Bayes rule’ as a title) would distract me. Do people feel the same about ‘Stokes theorem’ (which is how it first appeared here, nearly 8 years ago)?
For what it's worth, there is a little more explanation of the name in the article now: no treatises on grammar, but enough variety to get across the idea. In particular, the exact phrase ‘the Bayes Rule’ now appears.
Yes; I would also prefer “Stokes’ theorem”.
I too would prefer Stokes’ theorem, yes. But I do think what you have is fine really, I don’t feel strongly about it.
If in HoTT, $\sum_{x:A} B(x)$ for propositions $A$ and $B$ is a type that can represent the more typical conjunction involving dependency, “It’s raining, and doing so heavily”, rather than the kind we teach our logic students, such as “$2+2=4$ and London is the capital of the UK”, perhaps we can see Bayes Rule as resting on the relation between two ways to factor a dependent sum/pair:
$\sum_{x:A} B(x) \simeq \sum_{y:B} A(y).$

It has often been observed that probability is an extension of logic. I wonder if an intuitionistic Bayesianism would be the path to pursue here.
@David I don’t quite follow how you factored that. $A$ is a type and $B$ is a type family, so the second sigma doesn’t make sense to me.
Sorry, I wasn’t being precise. I was imagining something like a subset/relation $R$ of $A \times B$ for two sets $A$ and $B$, and then thinking of $B(a)$ as the subset of $B$ which are $R$-related to $a$ in $A$. Then likewise for $A(b)$.
I can’t see that much has been written on extending Martin-Löf type theory with probabilities. I started compiling a list of where probability theory meets category theory/type theory. This includes
Crane considers a monad, $P$, acting on types-as-propositions to convert them into conjectures whose elements are corresponding pieces of evidence. Then we might have a proof of inference between two propositions, $f: A \to B$, acting on evidence for $P A$ to give evidence for $P B$. Plausibility comes before probability here.
Are there any other options? What precisely is one to assign a probability to?
I was imagining something like a subset/relation $R$ of $A \times B$ for two sets $A$ and $B$, and then thinking of $B(a)$ as the subset of $B$ which are $R$-related to $a$ in $A$. Then likewise for $A(b)$.
So then $R$ is a dependent type on $A \times B$ and the rule you want is the functoriality of $\Sigma$ along $A \times B \to B \to \ast$ which equals $A \times B \to A \to \ast$:
$\underset{a \colon A}{\sum} \left( \underset{b \colon B}{\sum} R(a,b) \right) \;\simeq\; \underset{b \colon B}{\sum} \left( \underset{a \colon A}{\sum} R(a,b) \right)$

Yes, good. And, of course, as you have it, $R$ could be a dependent set, so a span. Then it’s easy to see how something like $P(a) P(b|a) = P(b) P(a|b)$ could arise, say, from a uniform distribution over the elements of $R$.
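A toy finite check of that last point (the relation and numbers here are made up): put the uniform distribution on the elements of a relation $R \subseteq A \times B$ and verify $P(a)\,P(b|a) = P(b)\,P(a|b)$ with exact arithmetic:

```python
from fractions import Fraction

# A small made-up relation R between A and B, with the uniform
# distribution on its elements.
A = ['a1', 'a2']
B = ['b1', 'b2', 'b3']
R = {('a1', 'b1'), ('a1', 'b2'), ('a2', 'b2'), ('a2', 'b3')}

n = Fraction(len(R))
def P_a(a):  # marginal probability of a
    return Fraction(sum(1 for (x, y) in R if x == a)) / n
def P_b(b):  # marginal probability of b
    return Fraction(sum(1 for (x, y) in R if y == b)) / n
def P_b_given_a(b, a):  # conditional probability, assuming P(a) > 0
    return Fraction(sum(1 for (x, y) in R if x == a and y == b)) \
           / sum(1 for (x, y) in R if x == a)
def P_a_given_b(a, b):  # conditional probability, assuming P(b) > 0
    return Fraction(sum(1 for (x, y) in R if x == a and y == b)) \
           / sum(1 for (x, y) in R if y == b)

# Both sides compute the joint probability of (a, b), so they agree:
for a in A:
    for b in B:
        assert P_a(a) * P_b_given_a(b, a) == P_b(b) * P_a_given_b(a, b)
```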
But how to phrase things in terms either of the ’$\Gamma \vdash A$ is true’ or ’$\Gamma \vdash a: A$’ styles of presenting type theory?
I guess you can see why the Giry monad gets introduced, where $p: A \to Dist(B)$ corresponds to a conditional distribution. Then a probability for a proposition can appear as the value corresponding to an element of $Dist(\mathbf{2})$.
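To make that concrete, here is a minimal finite-support version of the Giry monad in Python (a sketch; the encoding and names are my own): `unit` is the Dirac distribution, and `bind` pushes a distribution through a conditional distribution $p \colon A \to Dist(B)$. A proposition's probability then shows up as one component of a distribution on a two-element type.

```python
def unit(x):
    """Dirac distribution: the monadic return."""
    return {x: 1.0}

def bind(dist, kernel):
    """Push a distribution through a conditional distribution A -> Dist(B)."""
    out = {}
    for a, p in dist.items():
        for b, q in kernel(a).items():
            out[b] = out.get(b, 0.0) + p * q
    return out

# A probability for a proposition E as an element of Dist(2):
# push a (made-up) prior over hypotheses through the likelihood kernel.
prior = {'H': 0.3, 'not_H': 0.7}
def likelihood(h):
    return {True: 0.9, False: 0.1} if h == 'H' else {True: 0.2, False: 0.8}

pE = bind(prior, likelihood)   # pE[True] = 0.3*0.9 + 0.7*0.2 ≈ 0.41
```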
Have people mixed probability theory with dependent type theory or even HoTT? I suppose you could have $p: A \to Dist(B(x))$, where $B(x)$ depends on $x: A$. Have people looked for a version of the Giry monad for higher groupoids? At the very least one might want equivariant distributions.
We were considering the reader monad as a means of expressing random variables.
Perhaps probabilistic HoTT will just be a simple exercise in the adjoint logic program, when they finish the dependent type version.
A jumble of thoughts for the new year.
Hi, Thanks for setting up the list and all the interesting links. Regarding dependent types, several things in probability do have a dependent flavour. For example, disintegration as often phrased looks a bit like it is something of the type
Dist (Sigma (a : A) (B a)) -> Pi (a : A) (Dist (B a))
(Quasi-Borel spaces are locally cartesian closed, but we haven’t investigated that properly yet.)
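A finite toy version of that disintegration map, under my own ad-hoc encoding (nothing specific to quasi-Borel spaces): represent a distribution on dependent pairs as a dict keyed by $(a, b)$ with $b : B(a)$, and extract, for each $a$ in the support, the conditional distribution on $B(a)$:

```python
def disintegrate(joint):
    """Dist(Sigma (a : A) (B a)) -> Pi (a : A) (Dist (B a)),
    defined here only on a with positive marginal probability."""
    marginal = {}
    for (a, b), p in joint.items():
        marginal[a] = marginal.get(a, 0.0) + p
    return lambda a: {b: p / marginal[a]
                      for (x, b), p in joint.items() if x == a}

# The fibre B(a) may vary with a, as the dependent type suggests:
joint = {('a1', 0): 0.1, ('a1', 1): 0.3, ('a2', 'x'): 0.6}
cond = disintegrate(joint)
# cond('a1') gives {0: 0.25, 1: 0.75} and cond('a2') gives {'x': 1.0},
# up to floating-point rounding.
```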
Re #23, this would seem to bring us close to the dependent linear De Morgan duality (Prop. 3.18, p. 43) of Urs’s paper
Perhaps that’s not surprising if we take $Dist(X)$ as some subobject of $[X, \mathbb{R}^{\geq 0}]$.
People are certainly thinking of $[0,1]$ playing a dualizing role:

The role played by the two-element set $\{0,1\}$ in these classical results—e.g. as “schizophrenic” object—is played in our probabilistic analogues by the unit interval $[0,1]$ (The Expectation Monad in Quantum Foundations, p. 2)