added pointer to the recent

- Roger John Barlow,
*Practical Statistics for Particle Physics* (arXiv:1905.12362)

At the point where it said that high statistical significance suggests that the null hypothesis is wrong, I added the complementary pointer to *null result* for low statistical significance.

Added quote from Fisher 1926.

Removed one $0$ so that it’s $0.000029\%$ in Dorigo’s paper, here and on 2 other pages.

Added here the tables by Lyons and by Dorigo with suggestions for detection-threshold significances of various particle physics effects, which David kindly pointed out.

Also added a note that, with these modified suggestions, the currently measured significance of both the $g-2$ anomaly and the flavour anomaly is already well beyond the detection threshold.

I’m just back from a walk in the winter sunshine, so have had a chance for my levels of annoyance to drop. For heaven’s sake let us try on this forum to understand each other as charitably as possible. We’ve already had one unfortunate spat in the past few weeks. If we can’t make it happen here amongst friends, then what hope is there for online communication about substantial matters?

It must be my poor expression that makes you misunderstand me. I’m not in the slightest belittling the achievements of LHC people. I can see that my parenthetic comment about going to the press was ill-judged, but it arose not to criticise a specific group of particle physicists, but as a caricature of a way of acting. Had I said something about psychologists rushing to publish on achieving $2 \sigma$, it would have been to make the same point. Even then, no doubt many psychologists behave thoughtfully. That some researchers in some disciplines behave somewhat unthinkingly in terms of significance levels is a charge made by many parties, of all statistical persuasions.

So I do think there are people placing improper reliance on significance levels, but as far as I know, LHC physicists are the most careful and subtle of data analysts. If I hear a high $\sigma$ value from them, I assume that there’s much more going on in the background of their understanding of the situation. I think what you wrote at the end of section 1 is fine to make this point, and I’m glad it’s there so that entries that do cite sigma values can link to it.

What I’ve also been trying to convey is a Bayesian perspective. Here I find it quite remarkable that there exist two radically different understandings of probability, different to the extent that even the kind of thing that can be said to have a probability differs. E.g., the Bayesian is happy to speak of the probability of an observed exoplanet having a mass in a given range, or of a constant of nature taking a value in a certain range. You would expect this to lead to a great difference in the treatment of data. It turns out, however, that in many situations the thoughtful frequentist and the thoughtful Bayesian end up with very similar conclusions. The Bayesian may not like the forms of expression of the frequentist, e.g., how to understand a confidence interval, but in terms of what it’s reasonable to believe and do, they often come to the same conclusions. Indeed there are plenty of cases where you can prove that this is so: if a Bayesian chooses a certain kind of prior, their decision procedure coincides with some standard frequentist technique. There seems to be a case of this in another paper by Lyons on particle physics:

- Luc Demortier, Louis Lyons,
*Testing Hypotheses in Particle Physics: Plots of $p_0$ Versus $p_1$*, (arXiv:1408.6123), appendix.

I’ve no idea if there are cases at the LHC where such differences in the understanding of statistics could lead two parties to some important difference in conclusion. With the overwhelming amounts of data available, perhaps not.

But even if we find that it makes little difference in many situations, I still find it enormously interesting that these two understandings of probability exist. I have been persuaded over the years that the Bayesian one is more coherent, so wherever this ever translates into different practices, I’m intrigued to see which are preferable.

It seems that in some parts of physics there is a difference to be found. E.g., in lattice QCD, there is

- Joshua Landon, Frank X. Lee, Nozer D. Singpurwalla,
*A Problem in Particle Physics and Its Bayesian Analysis*, (arXiv:1201.1141)

which concludes

In this paper we have proposed and developed a statistical approach for addressing a much discussed problem in particle physics. Indeed, a problem that has spawned several Nobel prizes in Physics. The essence of the problem boils down to estimating a large (conceptually infinite) number of unknown parameters based on a finite number of non-linear equations. Statisticians refer to such problems as large p–small n. Each equation in our problem comprises of the sum of several exponential functions. Previous approaches for addressing this problem have been physics based–such as perturbation methods–and statistics based–such as chi-squared goodness of fit, and Empirical Bayes. Physicists have found such approaches unsatisfactory, and have called for a use of proper Bayesian approaches, thus this paper.

There are plenty of similar papers about. I wonder what a frequentist makes of such work.

A $p$-value by itself can’t tell you what to do, and yet it is often presented (especially in softer sciences) as though it does warrant a certain action or belief.

Observation of the stars also cannot tell you what to do or believe, and the fact that astrologers claim it does in no way invalidates the good work astronomers are doing.

At the LHC people doing hard and accurate and highly cross-checked work are consistently seeing an effect which is extremely unlikely to be seen if there is no new physics. That’s not a certainty, since nothing outside pure maths is ever certain, but to brush this off on vague grounds that statistics can’t tell us what to “do or believe” seems intellectually sad.

Go to the press to announce discovery when you achieve $5 \sigma$ would be similarly wrong.

Would it? But why mention the press? Entities outside of the scientific endeavour should play no role in this discussion. How about the scientist inside you? Do you just shrug it off, just because nothing in experimental science is ever certain? It seems to me a dark alley you are pointing down here.

I have read through your reference

- Ronald Wasserstein, Nicole Lazar,
*The ASA’s Statement on $p$-Values: Context, Process, and Purpose*, The American Statistician 70(2), 2016, pp. 129-133 (doi:10.1080/00031305.2016.1154108)

That was an unexpected read. This is a kind of political party-line manifesto, not a scientific document. But it points to

- Greenland, Senn, Rothman, Carlin, Poole, Goodman, Altman,
*Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations*, Eur J Epidemiol. 2016 Apr;31(4):337-50. (doi:10.1007/s10654-016-0149-3)

which seems to be the kind of substantial analysis one would hope for (but I haven’t found a pdf version yet).

(Did you notice the ordering of their author names? How on earth did they come up with it? Probably some sophisticated statistics of author contributions at work here ;-)

I’ll have to look into something else now. But I am coming away with the impression that much of the fuss is based on the mundane fact that the usefulness of $p$-values evidently depends on how small one forces $p$ to be, that $p = 0.05$ isn’t terribly small, that $5 \sigma \leftrightarrow p \sim 0.00000028$ is considerably smaller, and that this easily explains why the communities using $p = 0.05$ are all in a rage about $p$-values, while the community that uses $0.00000028$ is the one that is consistently producing the deepest insights into nature that humanity knows of.
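To put numbers on the comparison between $p = 0.05$ and $5 \sigma$, here is a minimal sketch. It assumes the one-sided Gaussian-tail convention (the one that gives roughly $2.9 \times 10^{-7}$ for $5 \sigma$); the function name is only illustrative and not taken from any of the cited papers.

```python
import math


def one_sided_p_value(n_sigma: float) -> float:
    """Upper-tail probability of a standard normal beyond n_sigma.

    Assumption: one-sided convention, p = P(Z >= n_sigma) for Z ~ N(0, 1).
    """
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


if __name__ == "__main__":
    # Compare the thresholds discussed above.
    for n_sigma in (1.645, 2.0, 3.0, 5.0):
        print(f"{n_sigma} sigma -> one-sided p = {one_sided_p_value(n_sigma):.3g}")
```

Under this convention $p = 0.05$ corresponds to only about $1.6 \sigma$, which makes the gap between the two communities' thresholds rather concrete.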

The complaint is that statistical significance of data relative to a chosen null hypothesis is not the quantity you should be interested in. You’re trying to decide what to believe and how to act on the basis of the information you have. A $p$-value by itself can’t tell you what to do, and yet it is often presented (especially in softer sciences) as though it does warrant a certain action or belief. (Go to the press to announce discovery when you achieve $5 \sigma$ would be similarly wrong.) Bayesians believe that their broader understanding of probability as applicable to hypotheses, propositions about parameter values, etc. allows them to reason openly (and rationally) about how to respond to data. This may rely on proposals of prior distributions. Frequentists worry that this reliance introduces subjectivity into what should be an objective process. Bayesians consider understanding of the context of information to be inescapable and worry that frequentist recipes for proceeding hide this subjective appraisal under a mask of objectivity. They also worry that often these recipes make no sense. A classic case is where it is known that a parameter value must be non-negative, e.g., since it’s a squared quantity, and yet the frequentist calculation yields an upper limit which is negative (see Lyons sec 5.1). The Bayesian would have a prior with support only on non-negative values.
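The negative-upper-limit case can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not necessarily the example Lyons uses: a single Gaussian measurement $x$ with known resolution $\sigma$ of a parameter $\mu \geq 0$, a naive one-sided classical limit $x + z\,\sigma$, and a Bayesian limit from a flat prior restricted to $\mu \geq 0$. All function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sigma 1


def classical_upper_limit(x: float, sigma: float, cl: float = 0.95) -> float:
    """Naive one-sided limit x + z*sigma; nothing stops it going negative."""
    return x + sigma * STD_NORMAL.inv_cdf(cl)


def bayesian_upper_limit(x: float, sigma: float, cl: float = 0.95) -> float:
    """Credible upper limit for mu >= 0, assuming a flat prior on mu >= 0.

    The posterior is then N(x, sigma^2) truncated to mu >= 0, and the limit
    solves P(mu <= limit | x) = cl. For x far below zero the target
    probability rounds to 1.0 and inv_cdf raises an error, so this toy
    version is only meant for moderately negative x.
    """
    excluded = STD_NORMAL.cdf(-x / sigma)  # untruncated mass at mu < 0
    target = excluded + cl * (1.0 - excluded)
    return x + sigma * STD_NORMAL.inv_cdf(target)


if __name__ == "__main__":
    x, sigma = -2.0, 1.0  # a downward fluctuation of two standard deviations
    print("classical:", classical_upper_limit(x, sigma))
    print("bayesian: ", bayesian_upper_limit(x, sigma))
```

For a measurement far above zero the two limits agree, since the prior restriction then removes essentially no probability; they only part ways when the data fluctuate towards or below the physical boundary.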

I guess what would make sense is a top page which compares Frequentist and Bayesian approaches, rather than topics in each with sections of criticism from the other.

Ah, maybe I get the idea. You don’t want to say that there is anything evil about stating a statistical significance, but you want to point out that this does not exhaust the topic of “hypothesis testing”.

Maybe what you’d rather want to do is start a page on “hypothesis testing”.

To be frank (not having read all the links you gave yet) I still don’t know what the issue is that is being debated. Could you explain it?

So far we have a definition of statistical significance, and a practice of stating such for a given observation, both of which seem rather elementary, straightforward and innocent.

Now what next? Where does it start mattering which -isms we subscribe to, or not?

I am seriously asking, this is not a rhetorical question.

The following paper seems to offer a nicely balanced appraisal, so I’ve added it:

- Louis Lyons,
*Bayes and Frequentism: a Particle Physicist’s perspective*, (arXiv:1301.1273)

He observes that in particle physics it’s much more common to use a frequentist method than a Bayesian one, but “In other fields, Bayesian approaches tend to be favoured”.

Having assessed both, he ends

A cynic’s view of the two techniques is provided by the quotation:

“Bayesians address the question everyone is interested in by using assumptions no-one believes, while Frequentists use impeccable logic to deal with an issue of no interest to anyone.”

However, it is not necessary to be so negative, and for physics analyses at the CERN’s LHC, the aim is, at least for determining parameters and setting upper limits in searches for various new phenomena, to use both approaches; similar answers would strengthen confidence in the results, while differences suggest the need to understand them in terms of the somewhat different questions that the two approaches are asking. It thus seems that the old war between the two methodologies is subsiding, and that they can hopefully live together in fruitful cooperation.

I have added a bit more on the definition, and on the passage from $p$-values to standard deviations.
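For reference, the passage from a $p$-value to a number of standard deviations can be sketched as follows. This assumes the one-sided Gaussian-tail convention, which may or may not be the one used in every reference cited in the entry; the function name is illustrative.

```python
from statistics import NormalDist


def p_value_to_sigma(p: float) -> float:
    """Return Z such that a standard normal has upper-tail probability p.

    Assumption: one-sided convention, p = P(Z' >= Z) for Z' ~ N(0, 1).
    By symmetry Z = -inv_cdf(p), which stays accurate for very small p.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -NormalDist().inv_cdf(p)


if __name__ == "__main__":
    for p in (0.05, 0.00135, 2.87e-7):
        print(f"p = {p:g} -> {p_value_to_sigma(p):.2f} sigma")
```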

I have added a pointer to hep-ex/0208005, which is informative and sober. Also rearranged text and references slightly, to disentangle the definition of the concept from discussion of its possible misuse.

Will do something here when I have a moment.

Am on my phone, not in a position to follow your pointers to textbooks. But as currently stated, it still looks a little odd: an elementary and straightforward definition being followed by some vague, mysterious suggestions of its alleged illnesses.

I did go to look at D’Agostini, though. He quotes from Wikipedia a warning on five ways to misread the definition. That’s good practice for a Wiki, but hardly a sign of deficiency of the concept being explained. That kind of warning applies to every step in life.

Following D’Agostini in quoting from Wikipedia, the following is what it says further down its entry:

“There is nothing wrong with hypothesis testing and p-values per se as long as authors, reviewers, and action editors use them correctly.” [53] Using Bayesian statistics can improve confidence levels but also requires making additional assumptions,[54] and may not necessarily improve practice regarding statistical testing.[55]

Ok, made some changes.

Thanks for starting this.

At the point where you mention criticism of “such practices” it is not clear which practices are meant, since the preceding paragraph just tries to explain the definition of “statistical significance”.

As suggested on another thread, I added something here.