Thanks. That should be added to *statistical significance*. (I am on my phone now, myself.)

$2\sigma$ is the de facto standard because of an almost throwaway line by RA Fisher. He was coming from (mostly) agricultural statistics, where the number of data points and variables measured were not very large.

… it is convenient to draw the line at about the level at which we can say: “Either there is something in the treatment, or a coincidence has occurred such as does not occur more than once in twenty trials.”…

If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty (the 2 per cent point), or one in a hundred (the 1 per cent point). Personally, the writer prefers to set a low standard of significance at the 5 per cent point, and ignore entirely all results which fail to reach this level. A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance. (Fisher RA (1926), “The Arrangement of Field Experiments,” Journal of the Ministry of Agriculture of Great Britain, 33, 503-513.)

Amusingly, he is implicitly saying at the end that what counts as a discovery is *repeated* (designed) experiments achieving $p \lt 0.05$ almost all the time, not a single such experiment, as is taken to be the case in, e.g., psychology.

Thanks for these references! So it seems that we agree now that $p$-values are neither “fetish” nor “cult”, but that one’s confidence of course increases with the threshold. And I keep thinking that the whole problem the non-physics communities have with it is that $p = 0.05$ – which is $\lt 2 \sigma$ – is just not a good threshold.
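To make the correspondence between $p$-values and numbers of $\sigma$ concrete, here is a minimal sketch. It assumes a Gaussian test statistic and the two-sided convention (which is what makes Fisher's 5 per cent point come out just under $2\sigma$); the function name is just for illustration.

```python
from statistics import NormalDist


def two_sided_p_to_sigma(p: float) -> float:
    """Return z such that P(|Z| > z) = p for a standard normal Z.

    Assumption: Gaussian statistic, two-sided tail convention.
    """
    return NormalDist().inv_cdf(1.0 - p / 2.0)


# Fisher's three suggested levels: 1 in 20, 1 in 50, 1 in 100
for p in (0.05, 0.02, 0.01):
    print(f"p = {p}: {two_sided_p_to_sigma(p):.2f} sigma")
```

With this convention $p = 0.05$ lands at about $1.96\sigma$, so it is indeed slightly below $2\sigma$.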

With regards to the topic of this thread on leptoquarks suggested by flavour anomalies in $b$-quark decays, I find it striking to see that both Lyons and Dorigo argued that the threshold for these should be at $3 \sigma$ significance, much less than the generally accepted $4.1\sigma$ signal already seen in each channel separately.

Will add that to the entry now.

Three of many interesting points by Dorigo:

In particle physics you have an advantage – “It is quite special to physics that we do believe in our ’point nulls’”. Elsewhere, looking at the effect of a drug, it’s highly implausible that it has no effect.

Several $6 \sigma$ signals have been spurious, Table 1.

$5 \sigma$ wouldn’t have been enough to persuade him of superluminal neutrinos. His required level is THTQ (too high to quote), which shows the role of background knowledge.

This is not to do with what philosophers think (although credit to them when they’re right). As on Twitter, I’ve now pointed you to two pieces of writing by people working directly with the statistics of particle physics:

- Louis Lyons,
*Discovering the Significance of 5 sigma*, (arXiv:1310.1284)

Author of textbooks on the subject, worked for years at CERN

It would be very useful if we could distance ourselves from the attitude of ‘Require 5σ for all discovery claims’. This is far too blunt a tool for dealing with issues such as the Look Elsewhere Effect, the plausibility of the searched-for effect, the role of systematics, etc., which vary so much from experiment to experiment.

- Tommaso Dorigo,
*Extraordinary claims: the 0.000029% solution*, (pdf)

CERN physicist

Forty-six years after the first suggestion of a 5σ threshold for discovery claims, and 20 years after the start of its consistent application, the criterion appears inadequate to address the experimental situation in particle and astro-particle physics.

Hi David,

only now see your #25.

I think it’s evidently hopeless for anyone not directly involved in the data analysis to judge the quality of the statistical analysis. But I also see no indication of sloppy statistical analysis in particle physics, quite on the contrary.

I am proceeding here on a very simple principle: I go and check what the experimentalists say they think they measured.

I don’t really care so much about the details of the process that makes them conclude this, since it would need my lifetime energy to dive into the matter and do it justice, I would need to become an experimentalist myself.

I don’t really care what the philosophers think that “$5 \sigma$” really means, what I care about is that the experimentalists have agreed that they will call this number to indicate that they are sure that their experiment is seeing a signal.

The literature published by experimentalists themselves is quite unambiguous about them seeing signals in flavour violating decays and in the muon anomalous magnetic moment. I think that’s interesting, and I am recording that.

added the relevant quote from Crivellin 18, p. 2 (for my anonymous correspondent on Dorigo’s site here):

the global fit $[$to flavour anomalies$]$ even shows compelling evidence for New Physics $[$…$]$ The vector leptoquark (LQ) $SU(2)_L$ singlet with hypercharge $-4/3$ arising in the famous Pati-Salam model is capable of explaining all the $[$flavour$]$ anomalies and therefore several attempts to construct a UV completion for this LQ to address the anomalies have been made. It can give a sizeable effect in $b \to c(u)\tau \nu$ data without violating bounds from $b \to s(d)\nu \bar \nu$ and/or direct searches, provides (at tree level) a $C_9 = - C_{10}$ solution to $b \to s \ell^+ \ell^-$ data and does not lead to proton decay at any order in perturbation theory.

Let’s see if I can convey my perplexity about what you’ve been writing recently.

You appear to be taking my comments as arising from the “fundamental physics in crisis” viewpoint of Sabine Hossenfelder and others. This isn’t the case, as I should have thought was evident. I can’t see that there’s anything better to do than push on theoretically with string/M-theory, and experimentally to build more powerful colliders, telescopes, detectors, etc.

All I’ve been trying to do is to raise the issue of the subtleties of data analysis. You seem to be aware that there’s more to doing data analysis than citing sigma values (#17), and yet that’s all you ever mention. Have you given a serious look at how they move from data to conclusion? Do you know about issues such as ‘coverage’, the ‘Look Elsewhere Effect’, treatment of ‘systematic errors’, difficulties with the Neyman frequentist construction with several parameters? A list which goes on and on.
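As a rough illustration of why the Look Elsewhere Effect matters, here is a minimal sketch of the simplest textbook correction. It assumes the search consists of `n_trials` independent places where a fluctuation could appear, which is an idealisation; real analyses estimate an effective trials factor, and the function name and numbers here are only illustrative.

```python
import math


def global_p_value(local_p: float, n_trials: int) -> float:
    """Chance of at least one fluctuation with p <= local_p among
    n_trials independent looks: 1 - (1 - local_p)**n_trials.

    Assumption: the looks are independent (an idealisation).
    Written with log1p/expm1 to stay accurate for very small local_p.
    """
    return -math.expm1(n_trials * math.log1p(-local_p))


# A local p-value of about 1.35e-3 (roughly 3 sigma, one-sided)
# searched for in 100 independent mass bins
print(global_p_value(1.35e-3, 100))
```

A locally impressive excess becomes far less striking once one accounts for all the places it could have shown up.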

I have no reason to doubt van Dyk when he says that data analysis for the Higgs was “Probably the most careful frequentist analysis ever conducted”. Techniques from years of work are deployed, and I have no doubt about their conclusions. I was merely raising the question in #23 of whether people yet feel as safe applying analysis techniques in a new setting. But maybe you know more and can assure us there’s nothing new introduced in the data analysis of loop effect precision measurements.

I am perplexed where you are coming from. I think it’s the exact opposite. Carelessness was the 750 GeV hunt.

It must be frustrating these days to be an experimentalist at CERN, with nobody looking at results that don’t fit the prejudice, every armchair theoretician going on about how there are no experimental results. Strange times. At least Dorigo “recently” changed his mind… :-)

Given the often expressed point, made for example here by Sinervo

- Pekka K. Sinervo,
*Signal Significance in Particle Physics*, in Proceedings of *Advanced Statistical Techniques in Particle Physics*, Durham, UK, March 18-22, 2002 (arXiv:hep-ex/0208005, spire:601052)

what is an appropriate criteria for claiming a discovery on the basis of the $p$-value of the null hypothesis? The recent literature would suggest a $p$-value in the range of $10^{-6}$, comparable to a “$5 \sigma$” observation, provides convincing evidence. However, the credibility of such a claim relies on the care taken to avoid unconscious bias in the selection of the data and the techniques chosen to calculate the $p$-value,

wouldn’t a simple explanation of delay in accepting novel indirect detections be that it takes time to build up confidence in that “care”?
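For reference, the conversion from a number of $\sigma$ to a $p$-value can be sketched as follows. This assumes a Gaussian statistic and the one-sided tail convention commonly used in particle physics; the function name is only illustrative.

```python
import math


def sigma_to_one_sided_p(n_sigma: float) -> float:
    """Upper-tail probability P(Z > n_sigma) for a standard normal Z.

    Assumption: Gaussian statistic, one-sided convention.
    erfc keeps precision far out in the tail.
    """
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


for n in (3, 4, 5):
    print(f"{n} sigma: p = {sigma_to_one_sided_p(n):.3g}")
```

Under these assumptions $5\sigma$ corresponds to $p \approx 2.9 \times 10^{-7}$, i.e. about 0.000029%, which is the number in the title of Dorigo's slides.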

The point about perception of “direct detection events” over “loop effect precision measurement” which I made in #8 and #14 I have forwarded to the experts here. Particle physicist Tommaso Dorigo seems to quite agree with it.

Maybe this is a can of worms not worth opening without a lot of time. The Bayesian/frequentist debates have rumbled on for many decades now. The machine learning group I belonged to in 2005-7 divided along those lines. Further back, I hosted a particularly ill-tempered meeting on the division back in 2001, which had one side accuse the other of having a mother unjustly sent to jail for not understanding the dependence of the chances of sudden death in her two children.

The accusations in their simplest form amount to something like:

Bayesians: You unthinkingly use your techniques without proper consideration of the context. You present your results as objectively determined by hiding your reliance on subjective appraisal.

Frequentists: You make statistics a subjective process. You fail to test as severely as possible by your reliance on background knowledge.

It would be interesting to delve further into the case of particle physics where there is so much background knowledge and so much data. Perhaps things “wash out” in this case, so that how to act is the same on either approach.

When I have a moment I’ll revise statistical significance.

A flavour of his views concerning significance levels can be gained from here:

The reason why practically every particle physicist is highly confident that the Higgs is in the region indicated by LHC has little to do with the number of sigma’s (I hope the reader understands now that the mythical value of 5 for a ‘true discovery’ is by itself pure nonsense…)

I find this argument dangerous. Just recall that many of the arguments that said “the Higgs now *has* to be in this low mass window” continued with “and therefore its large quantum corrections necessarily *have* to be canceled by weak-scale supersymmetry”.

The theoretical argument worked in the first case, but it failed in the second. Which means it cannot universally be relied on. One needs/wants an actual experimental test to support the theoretical arguments.

The insistence on objective statistical significances is precisely to rule out such mistakes of theory and see what the experiment actually gives.

Of course the theoretical arguments are needed to decide where to look and which experiment to do, which in the case of the Higgs is how they helped to find it. I’d disagree with D’Agostini in his ridiculing of how the Higgs detection results were announced as if this were a signal showing up anywhere, independent of the strong expectation to find it right there. This should be the established practice of good statistics, to avoid bias in interpreting data.

Theoretical bias is, incidentally, where these leptoquarks come in: Leptoquarks are currently a good (maybe the best) potential theoretical/conceptual explanation of the joint effects seen with high statistical significance both in B-meson decays and in the muon magnetic moment.

Such potential theoretical explanations are necessary to decide where to look next. For instance there is a good argument that the anomalies seen both in the B-meson decays as well as the muon anomalous magnetic moment are sensitive to meson loop contributions, which would prefer the hadron version of the next collider (say FCC) over the lepton version, a question under lively debate as we speak.

You seem to have already started to immerse yourself into the topic. Why not start some paragraphs on some dedicated page?

Yes, not something for now, but it’s an odd situation where a calculation to a number of sigma leads to a probability based on the frequency with which signals that achieved that sigma have survived, which has nothing to do with what one normally understands by $n \sigma$. So $p_n$ = proportion of signals surviving out of signals achieving $n$ sigma. Then $p_5$ is deemed small enough.

Of course, you’d expect the kind of situation you’re studying to have a bearing on expectation of signal survival, contrast being near certain a priori that the Higgs is somewhere in a vicinity to being very uncertain whether there are leptoquarks.
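One hedged way to make this dependence on prior expectation concrete is Bayes' rule. The sketch below is not how any experiment sets its discovery threshold. Both `prior_real` (the a priori probability that the effect exists) and `power` (the chance that a real effect reaches the threshold) are purely illustrative assumptions, and the null tail probability uses the one-sided Gaussian convention.

```python
import math


def fraction_of_signals_real(n_sigma: float, prior_real: float, power: float = 1.0) -> float:
    """Expected fraction of signals reaching n_sigma that are genuine.

    Assumptions (all illustrative): a one-sided Gaussian false-positive rate
    under the null, a prior probability prior_real that the effect exists,
    and a probability `power` that a real effect reaches n_sigma.
    """
    alpha = 0.5 * math.erfc(n_sigma / math.sqrt(2.0))  # null tail probability
    true_hits = prior_real * power
    false_hits = (1.0 - prior_real) * alpha
    return true_hits / (true_hits + false_hits)


# Same 5 sigma, very different prior expectations (numbers are made up)
print(fraction_of_signals_real(5, prior_real=0.5))   # effect strongly expected
print(fraction_of_signals_real(5, prior_real=1e-6))  # effect a priori very unlikely
```

On this toy reading the same number of sigma supports quite different degrees of belief, which is one way of phrasing the contrast between the Higgs and leptoquark cases.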

These statistical errors are a thorny topic in themselves. There is plenty of discussion of the pros and cons out there; it would be good to produce some digest of that, maybe at *standard deviation*.

There are loads of assumptions that go into naming the number of $\sigma$-s, notably regarding systematic errors. Less than a mathematical fact, it’s a matter of experience in the particle physics community over the decades that whenever the $\sigma$-number they assigned to some observation (whatever it really is or means) was less than four, they experienced cases where this apparent signal later disappeared, less so when the number was higher than four. That’s why at some point they declared it must be 5 $\sigma$ to be on the safe side.

But since the statistics is all available, with enough energy, we could dig into more details as much as desired. But as my personal time is better used elsewhere, I’ll stick with those $\sigma$-s for the time being.

A flavour of his views concerning significance levels can be gained from here:

The reason why practically every particle physicist is highly confident that the Higgs is in the region indicated by LHC has little to do with the number of sigma’s (I hope the reader understands now that the mythical value of 5 for a ‘true discovery’ is by itself pure nonsense…)

Yes, I agree. I’d be interested in seeing something on the statistical reasoning going on in these two cases.

Back in my Bayesian phase I remember enjoying the writings of Giulio D’Agostini who was a Bayesian at CERN. He took issue with frequentist ways of framing some of the conclusions there, especially confidence intervals and p-values. I wonder if that’s still debated.

@David C., if one remembers how mind-bogglingly indirect all “observations” in a modern accelerator are in any case (it’s not like anyone really “directly saw” a Higgs particle even in extremely generous readings of these words) I wouldn’t think that the distinction between what the particle physics community distinguishes as “direct detection events” and “loop effects in precision measurements” is easily matched to general considerations made elsewhere. Let me know if I am wrong here.

I am thinking a much more elementary effect/fallacy is at work in the community (at least in that part which likes to vent its opinions on social media): The nature of the next-relevant experiment may change over time.

@Alizter, after The Big Move it took Fermilab 6 years to set up their $g-2$ experiment, and they started taking data in February 2018. So they have only just begun.

You can read about what the expected prospects of their eventual results should be in reviews such as Jegerlehner 18a (who says that if the effect is there, Fermilab should be able to push its statistical significance to between $6\sigma$ and $10 \sigma$, hence beyond doubt).

David R., right, but what I was trying to highlight in #6 is that already right now there is a real case of “$SU(5)$-signature”. Even if in the end it’s not $SU(5)$-GUT but who knows what, maybe some variant of it, it seems that the punchline of the LHC results at this point really is: a consistent signature of $SU(5)$-GUT, at over $4.1 \sigma$. That in itself is just plain remarkable.

(Given the discussion in the public domain, I think it is crucially important to recognize bits of progress and not dismiss all insight just on the basis of ever more amazing claims that remain open.)

And if I am allowed to add some theoretical considerations to these experimental findings: Apart from the well known but still striking inner working of $SU(5)$-GUT in itself, there are these two curious data points:

- Schellekens et al.’s computer scan of heterotic Gepner model compactifications. In their graphics here, the two faint lines that intersect at $SU(5)$-GUT theory are *not* drawn by hand (only the blue circle around their intersection point is), but are due to the density of the models as found by the computer scan.

- our computer scan of the image of equivariant stable cohomotopy inside equivariant K-theory which (slide 17 here) breaks type II $E_8$-GUT to $SU(5)$-GUT.

Something is going on here.

I meant it would be *very* cool if the SM gauge group really was a broken symmetry from $SU(5)$. That we are seeing hints of such a thing is still on its own amazing.

I was wondering if the direct/indirect detection distinction related to what Peter Galison described in Image and Logic as the difference between traditions: the logic tradition uses statistical arguments, while the image tradition looks for telling images, “golden” single events. But then, I guess all recent detection is statistical. One critic, I see, claims there never was such a difference as Galison proposes, see Staley’s Golden Events and Statistics: What’s Wrong with Galison’s Image/Logic Distinction?.

Has Fermilab finished its $g-2$ experiment yet? I remember they moved some equipment last year.
