In [[particle physics]], '''CLs'''<ref name="Read">{{cite journal |last=Read |first=A. L. |title=Presentation of search results: The CL(s) technique |journal=Journal of Physics G: Nuclear and Particle Physics |year=2002 |volume=28 |issue=10 |pages=2693–2704 |doi=10.1088/0954-3899/28/10/313 |bibcode=2002JPhG...28.2693R}}</ref> represents a [[statistics|statistical]] method for setting ''upper limits'' (also called ''exclusion limits''<ref>{{Google books |id=I9lfo-g_WIoC |page=13 |title=Particle Physics at the Tercentenary of Mikhail Lomonosov }}</ref>) on model [[parameter]]s, a particular form of [[interval estimation]] used for parameters that can take only non-negative values. Although CLs is said to refer to [[confidence level|Confidence Levels]], "The method's name is ... misleading, as the CLs exclusion region is not a [[confidence interval]]."<ref name="cern">{{cite web |url=http://indico.cern.ch/event/107747/contribution/47/material/paper/0.pdf |author=Amnon Harel |title=Statistical methods in CMS searches |publisher=indico.cern.ch |accessdate=2015-04-10}}</ref> It was first introduced by physicists working at the [[LEP]] experiment at [[CERN]] and has since been used by many [[high energy physics]] experiments. It is a [[frequentist]] method in the sense that the properties of the limit are defined by means of [[probability of error|error probabilities]]; however, it differs from standard confidence intervals in that the stated confidence level of the interval is not equal to its [[coverage probability]]. The reason for this deviation is that standard upper limits based on a [[uniformly most powerful test|most powerful test]] necessarily produce empty intervals with some fixed probability when the parameter value is zero, and this property is considered undesirable by most physicists and statisticians.<ref>{{cite journal |author=Mark Mandelkern |title=Setting Confidence Intervals for Bounded Parameters |journal=Statistical Science |volume=17 |number=2 |pages=149–159 |year=2002 |jstor=3182816 |doi=10.1214/ss/1030550859 |doi-access=free}}</ref>

Upper limits derived with the CLs method always contain the zero value of the parameter, and hence the coverage probability at this point is always 100%. The definition of CLs does not follow from any precise theoretical framework of [[statistical inference]] and is therefore sometimes described as ''ad hoc''. It has, however, a close resemblance to concepts of ''statistical evidence''<ref name="Giere">{{cite journal |volume=36 |number=1 |author=Ronald N. Giere |title=Allan Birnbaum's Conception of Statistical Evidence |journal=Synthese |year=1977 |pages=5–13 |url=http://philpapers.org/rec/GIEABC |doi=10.1007/bf00485688 |s2cid=46973213}}</ref> proposed by the statistician [[Allan Birnbaum]].


== Definition ==


Let ''X'' be a [[random sample]] from a [[probability distribution]] with a real non-negative [[parameter]] <math>\theta \in [0,\infty)</math>. A ''CLs'' upper limit for the parameter ''θ'', with confidence level <math>1-\alpha'</math>, is a statistic (i.e., observable [[random variable]]) <math>\theta_{up}(X)</math> which has the property:


{{NumBlk|:|<math> \frac{\mathbb{P}( \theta_{up}(X) < \theta |\theta) }{ \mathbb{P}( \theta_{up}(X) < \theta |0 ) } \leq \alpha' \text{ for all } \theta.</math>|{{EquationRef|1}}}}


The inequality is used in the definition to account for cases where the distribution of ''X'' is discrete and an equality can not be achieved precisely. If the distribution of ''X'' is [[Continuous probability distribution|continuous]] then this should be replaced by an equality. Note that the definition implies that the [[coverage probability]] <math>\mathbb{P}( \theta_{up}(X) \geq \theta |\theta)</math> is always larger than <math>1-\alpha'</math>.


An equivalent definition can be made by considering a [[hypothesis test]] of the null hypothesis <math>H_0:\theta=\theta_0</math> against the alternative <math>H_1:\theta=0</math>. Then the numerator in ({{EquationNote|1}}), when evaluated at <math>\theta_0</math>, corresponds to the [[type I and type II errors|type-I error probability]] (<math>\alpha</math>) of the test (i.e., <math>\theta_0</math> is rejected when <math>\theta_{up}(X) < \theta_0</math>) and the denominator to the [[statistical power|power]] (<math>1-\beta</math>). The criterion for rejecting <math>H_0</math> thus requires that the ratio <math>\alpha/(1-\beta)</math> be smaller than <math>\alpha'</math>. This can be interpreted intuitively as saying that <math>\theta_0</math> is excluded because observing an outcome as extreme as ''X'' is <math>\alpha'</math> times less likely when <math>\theta_0</math> is true than when the alternative <math>\theta=0</math> is true.


The calculation of the upper limit is usually done by constructing a [[test statistic]] <math>q_\theta(X)</math> and finding the value of <math>\theta</math> for which


:<math> \frac{\mathbb{P}(q_\theta(X) \geq q_\theta^* |\theta)}{\mathbb{P}( q_\theta(X) \geq q_\theta^* |0 )} = \alpha' .</math>


where <math>q_\theta^*</math> is the observed outcome of the experiment.
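This procedure can be sketched for a simple counting experiment, <math>n \sim \text{Poiss}(s+b)</math> with known background <math>b</math> (the model discussed in the Origin section below), taking the observed count as the test statistic. The following is a minimal illustration only, not code from any experiment's software; the function names are hypothetical:

```python
import math

def poisson_cdf(n, mu):
    # P(N <= n) for N ~ Poisson(mu), summed directly
    return math.exp(-mu) * sum(mu**k / math.factorial(k) for k in range(n + 1))

def cls_ratio(s, b, n_obs):
    # The ratio in the definition: small counts are "extreme" for an upper
    # limit, so P(q >= q*) becomes P(n <= n_obs) here.
    return poisson_cdf(n_obs, s + b) / poisson_cdf(n_obs, b)

def cls_upper_limit(b, n_obs, alpha=0.05, hi=100.0):
    # Bisect for the s at which the (monotonically decreasing) ratio
    # crosses alpha.
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cls_ratio(mid, b, n_obs) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(cls_upper_limit(b=3.0, n_obs=5))
```

At the returned limit the ratio equals <math>\alpha'</math> by construction; for <math>n^*=0</math> the ratio reduces to <math>e^{-s}</math>, so the 95% limit is <math>-\ln(0.05)\approx 3</math> independently of <math>b</math>.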
== Usage in high energy physics ==


Upper limits based on the CLs method were used in numerous publications of experimental results obtained at particle accelerator experiments such as [[LEP]], the [[Tevatron]] and the [[LHC]], most notably in searches for new particles.


== Origin ==


The original motivation for CLs was based on a conditional probability calculation suggested by physicist G. Zech<ref name="zech">{{cite journal |title=Upper limits in experiments with background or measurement errors |journal=Nucl. Instrum. Methods Phys. Res. A |volume=277 |number=2–3 |pages=608–610 |year=1989 |doi=10.1016/0168-9002(89)90795-X |author=G. Zech |bibcode=1989NIMPA.277..608Z |url=https://cds.cern.ch/record/193135/files/198812358.pdf}}</ref> for an event counting experiment. Suppose an experiment consists of measuring <math>n</math> events coming from signal and background processes, both described by [[Poisson distribution]]s with respective rates <math>s</math> and <math>b</math>, namely <math>n \sim \text{Poiss}(s+b)</math>. <math>b</math> is assumed to be known and <math>s</math> is the parameter to be estimated by the experiment. The standard procedure for setting an upper limit on <math>s</math> given an experimental outcome <math>n^*</math> consists of excluding values of <math>s</math> for which <math>\mathbb{P}(n \leq n^*|s+b) \leq \alpha</math>, which guarantees at least <math>1-\alpha</math> coverage. Consider, for example, a case where <math>b=3</math> and <math>n^*=0</math> events are observed; then one finds that <math>s+b \geq 3</math> is excluded at 95% confidence level. But this implies that <math>s \geq 0</math> is excluded, namely all possible values of <math>s</math>. Such a result is difficult to interpret because the experiment essentially cannot distinguish very small values of <math>s</math> from the background-only hypothesis, and thus declaring that such small values are excluded (in favor of the background-only hypothesis) seems inappropriate. To overcome this difficulty Zech suggested conditioning the probability that <math>n \leq n^*</math> on the observation that <math>n_b \leq n^*</math>, where <math>n_b</math> is the (unmeasurable) number of background events. The reasoning behind this is that when <math>n_b</math> is small the procedure is more likely to produce an error (i.e., an interval that does not cover the true value) than when <math>n_b</math> is large, and the distribution of <math>n_b</math> itself is independent of <math>s</math>. That is, not the overall error probability should be reported but the conditional probability, given the knowledge one has on the number of background events in the sample. This conditional probability is


:<math>\mathbb{P}(n \leq n^* |n_b \leq n^* , s+b) = \frac{\mathbb{P}(n \leq n^*, n_b \leq n^* |s+b)}{\mathbb{P}(n_b \leq n^* |s+b)}
= \frac{\mathbb{P}(n \leq n^* |s+b)}{\mathbb{P}(n \leq n^* |b)},</math>


which corresponds to the above definition of CLs. The first equality just uses the definition of [[conditional probability]], and the second comes from the facts that <math>n \leq n^* \Rightarrow n_b \leq n^*</math> and that the number of background events is by definition independent of the signal strength.
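For the numbers in the example above (<math>b=3</math>, <math>n^*=0</math>) both tail probabilities are Poisson zero-count probabilities, so the standard and conditioned limits can be compared in closed form. A short sketch (illustrative variable names, standard library only):

```python
import math

b, alpha = 3.0, 0.05

# With n* = 0, P(n <= 0 | mu) = exp(-mu), so:
#   standard:  exp(-(s+b)) = alpha              ->  s_up = -ln(alpha) - b
#   CLs:       exp(-(s+b)) / exp(-b) = exp(-s)  ->  s_up = -ln(alpha)
s_up_standard = -math.log(alpha) - b
s_up_cls = -math.log(alpha)

print(round(s_up_standard, 3))  # -0.004: all s >= 0 excluded, the pathology above
print(round(s_up_cls, 3))       # 2.996: the CLs limit never excludes s = 0
```

The standard limit excludes the entire physical range, while the CLs limit stays strictly positive; note that the background rate drops out of the CLs ratio entirely when zero events are observed.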


=== Generalization of the conditional argument ===
Zech's conditional argument can be formally extended to the general case. Suppose that <math>q(X)</math> is a [[test statistic]] from which the confidence interval is derived, and let


:<math> p_{\theta} = \mathbb{P}( q(X) > q^* |\theta) </math>


where <math>q^*</math> is the outcome observed by the experiment. Then <math>p_{\theta}</math> can be regarded as an unmeasurable (since <math>\theta</math> is unknown) random variable, whose distribution is uniform between 0 and 1 independent of <math>\theta</math>. If the test is unbiased then the outcome <math>q^*</math> implies


:<math> p_{\theta} \leq \mathbb{P}( q(X) > q^* |0 ) \equiv p_0^*</math>


from which, similarly to conditioning on <math>n_b</math> in the previous case, one obtains


:<math>\mathbb{P}(q(X) \geq q^* |p_\theta \leq p_0^* , \theta) = \frac{\mathbb{P}(q(X) \geq q^* |\theta)}{\mathbb{P}(p_\theta \leq p_0^* |\theta)}
= \frac{\mathbb{P}(q(X) \geq q^* |\theta)}{p_0^*} = \frac{\mathbb{P}(q(X) \geq q^* |\theta)}{\mathbb{P}( q(X) > q^* |0 )}.</math>


== Relation to foundational principles ==
{{originalsyn|section|date=April 2016}}
The arguments given above can be viewed as following the spirit of the [[conditionality principle]] of statistical inference, although they express a more generalized notion of conditionality which does not require the existence of an [[ancillary statistic]]. The conditionality principle, however, already in its original more restricted version, formally implies the [[likelihood principle]], a result famously shown by [[Allan Birnbaum|Birnbaum]].<ref>{{cite journal |last=Birnbaum |first=Allan |authorlink=Allan Birnbaum |year=1962 |title=On the foundations of statistical inference |journal=[[Journal of the American Statistical Association]] |volume=57 |issue=298 |pages=269–326 |doi=10.2307/2281640 |mr=0138176 |jstor=2281640}} ''(With discussion.)''</ref> CLs does not obey the likelihood principle, and thus such considerations may only be used to suggest plausibility, but not theoretical completeness from the foundational point of view. (The same, however, can be said about any frequentist method if the conditionality principle is regarded as necessary.)


{{anchor|Allan Birnbaum}}Birnbaum himself suggested in his 1962 paper that the CLs ratio <math>\alpha/(1-\beta)</math> should be used as a measure of the strength of ''statistical evidence'' provided by significance tests, rather than <math>\alpha</math> alone. This followed from a simple application of the [[likelihood principle]]: if the outcome of an experiment is to be reported only in the form of an "accept"/"reject" decision, then the overall procedure is equivalent to an experiment that has only two possible outcomes, with probabilities <math>\alpha</math> and <math>1-\alpha</math> under <math>H_1</math>, and <math>1-\beta</math> and <math>\beta</math> under <math>H_2</math>. The [[Likelihood function|likelihood ratio]] associated with the outcome "reject <math>H_1</math>" is therefore <math>\alpha/(1-\beta)</math> and hence should determine the evidential interpretation of this result. (For a test of two simple hypotheses, the likelihood ratio is a compact representation of the [[likelihood function]].) On the other hand, if the likelihood principle is to be followed consistently, then the likelihood ratio of the original outcome should be used and not <math>\alpha/(1-\beta)</math>, making the basis of such an interpretation questionable. Birnbaum later described this as having "at most heuristic, but not substantial, value for evidential interpretation".


A more direct approach leading to a similar conclusion can be found in Birnbaum's formulation of the ''Confidence principle'', which, unlike the more common version, refers to error probabilities of both kinds. This is stated as follows:<ref>{{cite journal |volume=36 |number=1 |last=Birnbaum |first=Allan |authorlink=Allan Birnbaum |title=The Neyman-Pearson Theory as Decision Theory, and as Inference Theory; with a Criticism of the Lindley-Savage Argument for Bayesian Theory |journal=Synthese |year=1977 |pages=19–49 |url=http://philpapers.org/rec/BIRTNT |doi=10.1007/bf00485690 |s2cid=35027844}}</ref>


<blockquote> "A concept of statistical evidence is not plausible unless it finds 'strong evidence for <math>H_2</math> as against <math>H_1</math>' with small probability <math>(\alpha)</math> when <math>H_1</math> is true, and with much larger probability <math>\ (1 - \beta)\ </math> when <math>H_2</math> is true." </blockquote>


Such a definition of confidence can naturally seem to be satisfied by the definition of CLs. It remains true that both this and the more common (as associated with the [[Jerzy Neyman|Neyman]]–[[Egon Pearson|Pearson]] theory) versions of the confidence principle are incompatible with the likelihood principle, and therefore no frequentist method can be regarded as a truly complete solution to the problems raised by considering conditional properties of confidence intervals.


== Calculation in the large sample limit ==

If certain regularity conditions are met, then a general likelihood function will become a [[Gaussian function]] in the large sample limit. In such a case the CLs upper limit at confidence level <math>1-\alpha'</math> (derived from the [[uniformly most powerful test]]) is given by<ref name="asimov">{{cite journal |author1=G. Cowan |author2=K. Cranmer |author3=E. Gross |author4=O. Vitells |title=Asymptotic formulae for likelihood-based tests of new physics |journal=Eur. Phys. J. C |volume=71 |issue=2 |pages=1554 |doi=10.1140/epjc/s10052-011-1554-0 |year=2011 |arxiv=1007.1727 |bibcode=2011EPJC...71.1554C}}</ref>


:<math> \theta_{up} = \hat\theta + \sigma\Phi^{-1}(1 - \alpha'\Phi(\hat\theta / \sigma ) ) ,</math>


where <math>\Phi</math> is the [[Normal distribution#Cumulative distribution function|standard normal cumulative distribution]], <math>\hat\theta</math> is the [[maximum likelihood]] estimator of <math>\theta</math> and <math>\sigma</math> is its [[standard deviation]]; the latter might be estimated from the inverse of the [[Fisher information]] matrix or by using the "Asimov"<ref name="asimov"/> data set. This result happens to be equivalent to a [[Bayesian inference|Bayesian]] [[credible interval]] if a uniform [[Prior probability|prior]] for <math>\theta</math> is used.
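The asymptotic formula can be sketched with the Python standard library (the function name here is illustrative, not from any analysis framework):

```python
from statistics import NormalDist

def cls_upper_limit_asymptotic(theta_hat, sigma, alpha=0.05):
    # theta_up = theta_hat + sigma * Phi^{-1}(1 - alpha' * Phi(theta_hat / sigma))
    nd = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}
    return theta_hat + sigma * nd.inv_cdf(1.0 - alpha * nd.cdf(theta_hat / sigma))

# For theta_hat = 0 the formula gives sigma * Phi^{-1}(1 - alpha/2),
# the familiar two-sided Gaussian quantile (about 1.96 sigma at 95% CL):
print(round(cls_upper_limit_asymptotic(0.0, 1.0), 2))  # 1.96
```

For large positive <math>\hat\theta/\sigma</math> the factor <math>\Phi(\hat\theta/\sigma)</math> approaches 1, and the limit approaches the usual one-sided <math>\hat\theta + \sigma\Phi^{-1}(1-\alpha')</math>.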

<!--== Criticism ==-->


== References ==
{{Reflist}}


== Further reading ==
* {{cite journal |author=Leon Jay Gleser |title=[Setting Confidence Intervals for Bounded Parameters]: Comment |journal=Statistical Science |volume=17 |number=2 |pages=161–163 |year=2002 |jstor=3182818 |doi=10.1214/ss/1030550859 |doi-access=free}}
* {{cite journal |journal=Phys. Rev. D |doi=10.1103/PhysRevD.69.033002 |issue=3 |author1=Fraser, D. A. S. |author2=Reid, N. |author3=Wong, A. C. M. |title=Inference for bounded parameters |year=2004 |pages=033002 |volume=69 |arxiv=physics/0303111 |s2cid=18947032}}
* {{cite arXiv |author=Robert D. Cousins |title=Negatively Biased Relevant Subsets Induced by the Most-Powerful One-Sided Upper Confidence Limits for a Bounded Physical Parameter |eprint=1109.2023 |year=2011 |class=physics.data-an}}


== External links ==


<!--- Categories --->

[[Category:Statistical inference]]
[[Category:Statistical terminology]]
[[Category:Measurement]]
[[Category:Statistical intervals]]
[[Category:Experimental particle physics]]

Latest revision as of 21:02, 21 June 2023

In particle physics, CLs[1] represents a statistical method for setting upper limits (also called exclusion limits[2]) on model parameters, a particular form of interval estimation used for parameters that can take only non-negative values. Although CLs are said to refer to Confidence Levels, "The method's name is ... misleading, as the CLs exclusion region is not a confidence interval."[3] It was first introduced by physicists working at the LEP experiment at CERN and has since been used by many high energy physics experiments. It is a frequentist method in the sense that the properties of the limit are defined by means of error probabilities, however it differs from standard confidence intervals in that the stated confidence level of the interval is not equal to its coverage probability. The reason for this deviation is that standard upper limits based on a most powerful test necessarily produce empty intervals with some fixed probability when the parameter value is zero, and this property is considered undesirable by most physicists and statisticians.[4]

Upper limits derived with the CLs method always contain the zero value of the parameter and hence the coverage probability at this point is always 100%. The definition of CLs does not follow from any precise theoretical framework of statistical inference and is therefore described sometimes as ad hoc. It has however close resemblance to concepts of statistical evidence[5] proposed by the statistician Allan Birnbaum.

Definition


Let X be a random sample from a probability distribution with a real non-negative parameter θ ∈ [0, ∞). A CLs upper limit for the parameter θ, with confidence level 1 − α′, is a statistic (i.e., observable random variable) θ_up(X) which has the property:

    P(θ_up(X) < θ | θ) / P(θ_up(X) < θ | 0) ≤ α′    (1)

The inequality is used in the definition to account for cases where the distribution of X is discrete and an equality cannot be achieved precisely. If the distribution of X is continuous then this should be replaced by an equality. Note that the definition implies that the coverage probability P(θ ≤ θ_up(X) | θ) is always larger than 1 − α′.

An equivalent definition can be made by considering a hypothesis test of the null hypothesis H_θ : θ′ = θ against the alternative H_0 : θ′ = 0. Then the numerator in (1), when evaluated at θ_up = θ, corresponds to the type-I error probability (α) of the test (i.e., H_θ is rejected when θ_up < θ) and the denominator to the power (1 − β). The criterion for rejecting H_θ thus requires that the ratio α/(1 − β) be smaller than α′. This can be interpreted intuitively as saying that θ is excluded because it is less likely to observe such an extreme outcome as X when θ is true than it is when the alternative θ′ = 0 is true.

The calculation of the upper limit is usually done by constructing a test statistic q_θ(X) and finding the value of θ for which

    P(q_θ ≥ q_θ^obs | θ) / P(q_θ ≥ q_θ^obs | 0) = α′,

where q_θ^obs is the observed outcome of the experiment.
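For a simple counting experiment this construction can be carried out numerically. The sketch below uses hypothetical numbers (known background b = 4.5, observed count n = 3, α′ = 0.05), takes P(k ≤ n | s + b) as the p-value of the counting test, and scans the signal rate s upward until the CLs ratio falls below α′:

```python
# Numerical sketch of the CLs upper-limit scan for a Poisson counting
# experiment. The numbers (b = 4.5, n = 3) are hypothetical illustrations.
from scipy.stats import poisson

b, n, alpha_prime = 4.5, 3, 0.05

def cls_ratio(s):
    """CLs = P(k <= n | s + b) / P(k <= n | b) for the observed count n."""
    return poisson.cdf(n, s + b) / poisson.cdf(n, b)

# scan s upward until the ratio drops below alpha'; the crossing point
# is the CLs upper limit s_up
s = 0.0
while cls_ratio(s) > alpha_prime:
    s += 0.001
s_up = s
print(f"95% CLs upper limit: s_up = {s_up:.2f}")
```

By construction the ratio equals 1 at s = 0, so the reported interval always contains zero, in line with the coverage discussion above.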

Usage in high energy physics


Upper limits based on the CLs method were used in numerous publications of experimental results obtained at particle accelerator experiments such as LEP, the Tevatron and the LHC, most notably in searches for new particles.

Origin


The original motivation for CLs was based on a conditional probability calculation suggested by physicist G. Zech[6] for an event counting experiment. Suppose an experiment consists of measuring n events coming from signal and background processes, both described by Poisson distributions with respective rates s and b, namely n ~ Poiss(s + b). b is assumed to be known and s is the parameter to be estimated by the experiment. The standard procedure for setting an upper limit on s given an experimental outcome n consists of excluding the values of s for which P(k ≤ n | s + b) ≤ α, which guarantees at least 1 − α coverage. Consider, for example, a case where b = 3 and n = 0 events are observed; then one finds that s + b ≥ 3 is excluded at 95% confidence level. But this implies that s ≥ 0 is excluded, namely all possible values of s. Such a result is difficult to interpret because the experiment cannot essentially distinguish very small values of s from the background-only hypothesis, and thus declaring that such small values are excluded (in favor of the background-only hypothesis) seems inappropriate. To overcome this difficulty, Zech suggested conditioning the probability that k ≤ n on the observation that n_b ≤ n, where n_b is the (unmeasurable) number of background events. The reasoning behind this is that when n_b is small the procedure is more likely to produce an error (i.e., an interval that does not cover the true value) than when n_b is large, and the distribution of n_b itself is independent of s. That is, not the overall error probability should be reported but the conditional probability given the knowledge one has on the number of background events in the sample. This conditional probability is

    P(k ≤ n | n_b ≤ n) = P(k ≤ n, n_b ≤ n) / P(n_b ≤ n) = P(k ≤ n | s + b) / P(n_b ≤ n | b),

which corresponds to the above definition of CLs. The first equality just uses the definition of conditional probability, and the second equality comes from the fact that k ≤ n implies n_b ≤ n, and that the number of background events is by definition independent of the signal strength.
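The pathological example above can be checked directly: with n = 0 the Poisson CDF is P(k ≤ 0 | s + b) = e^−(s+b), so the classical criterion excludes even s = 0, while the CLs ratio reduces to e^−s. A minimal sketch:

```python
# Sketch of the b = 3, n = 0 example. With zero observed events the Poisson
# CDF is exp(-(s + b)), so both criteria can be written in closed form.
import math
from scipy.stats import poisson

b, n, alpha_prime = 3.0, 0, 0.05

# classical criterion at s = 0: P(k <= 0 | b) = exp(-3) ~ 0.0498 <= 0.05,
# so every non-negative s is "excluded" by the standard 95% procedure
p_at_zero = poisson.cdf(n, 0.0 + b)

# CLs criterion: exp(-(s + b)) / exp(-b) = exp(-s), giving s_up = -ln(alpha')
cls_at_zero = poisson.cdf(n, 0.0 + b) / poisson.cdf(n, b)  # = 1, never excluded
s_up = -math.log(alpha_prime)                              # ~ 3.0
print(p_at_zero, cls_at_zero, s_up)
```

The CLs limit s_up ≈ 3 is insensitive to the (here unhelpful) background expectation, which is exactly the behaviour Zech's conditioning was designed to produce.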

Generalization of the conditional argument


Zech's conditional argument can be formally extended to the general case. Suppose that q(X) is a test statistic from which the confidence interval is derived, and let

    p_θ = P(q(X) ≥ q* | θ),

where q* is the outcome observed by the experiment. Then p_θ can be regarded as an unmeasurable (since θ is unknown) random variable, obtained by replacing the fixed observed value q* by the random q(X), whose distribution is uniform between 0 and 1 independent of θ. If the test is unbiased then the outcome q* implies

    p_θ^obs ≤ p_0^obs ≡ P(q(X) ≥ q* | 0),

from which, similarly to conditioning on n_b in the previous case, one obtains

    P(p_θ ≤ p_θ^obs | p_θ ≤ p_0^obs) = P(p_θ ≤ p_θ^obs) / P(p_θ ≤ p_0^obs) = p_θ^obs / p_0^obs,

which is exactly the CLs ratio.
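The conditional identity can be illustrated by Monte Carlo. Assume, purely for illustration, a Gaussian measurement X ~ N(θ, 1) with test statistic q(X) = −X, so that p_θ = Φ(x − θ); the conditional frequency of {p_θ ≤ p_θ^obs} given {p_θ ≤ p_0^obs} then approaches the ratio p_θ^obs/p_0^obs:

```python
# Monte Carlo sketch of the conditional argument (hypothetical setup:
# X ~ N(theta, 1), test statistic q(X) = -X, so p_theta = Phi(x - theta)).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
theta = 1.0        # hypothetical true parameter value
x_obs = -0.5       # hypothetical observed outcome

p_theta_obs = norm.cdf(x_obs - theta)   # observed p-value under theta
p_0_obs = norm.cdf(x_obs)               # observed p-value under theta = 0

# p_theta as a random variable: replace x_obs by fresh draws X ~ N(theta, 1);
# by construction it is uniform on (0, 1)
p_theta = norm.cdf(rng.normal(theta, 1.0, size=1_000_000) - theta)

# conditional relative frequency P(p_theta <= p_theta_obs | p_theta <= p_0_obs)
cond = p_theta[p_theta <= p_0_obs]
cond_freq = float(np.mean(cond <= p_theta_obs))
print(cond_freq, p_theta_obs / p_0_obs)  # the two numbers agree closely
```

The agreement follows directly from the uniformity of p_θ; the simulation merely makes the conditioning step concrete.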

Relation to foundational principles


The arguments given above can be viewed as following the spirit of the conditionality principle of statistical inference, although they express a more generalized notion of conditionality which does not require the existence of an ancillary statistic. The conditionality principle, however, already in its original more restricted version, formally implies the likelihood principle, a result famously shown by Birnbaum.[7] CLs does not obey the likelihood principle, and thus such considerations may only be used to suggest plausibility, but not theoretical completeness from the foundational point of view. (The same, however, can be said of any frequentist method if the conditionality principle is regarded as necessary.)

Birnbaum himself suggested in his 1962 paper that the ratio α/(1 − β) should be used as a measure of the strength of statistical evidence provided by significance tests, rather than α alone. This followed from a simple application of the likelihood principle: if the outcome of an experiment is to be reported only in the form of an "accept"/"reject" decision, then the overall procedure is equivalent to an experiment that has only two possible outcomes, whose probabilities under the null hypothesis H_1 and the alternative H_2 are α and 1 − β for the outcome "reject H_1". The likelihood ratio associated with the outcome "reject H_1" is therefore α/(1 − β) and hence should determine the evidential interpretation of this result. (Since, for a test of two simple hypotheses, the likelihood ratio is a compact representation of the likelihood function.) On the other hand, if the likelihood principle is to be followed consistently, then the likelihood ratio of the original outcome should be used and not α/(1 − β), making the basis of such an interpretation questionable. Birnbaum later described this as having "at most heuristic, but not substantial, value for evidential interpretation".

A more direct approach leading to a similar conclusion can be found in Birnbaum's formulation of the confidence principle, which, unlike the more common version, refers to error probabilities of both kinds. This is stated as follows:[8]

"A concept of statistical evidence is not plausible unless it finds 'strong evidence for H_2 as against H_1' with small probability (α) when H_1 is true, and with much larger probability (1 − β) when H_2 is true."

Such a definition of confidence can naturally seem to be satisfied by the definition of CLs. It remains true that both this and the more common version of the confidence principle (as associated with the Neyman–Pearson theory) are incompatible with the likelihood principle, and therefore no frequentist method can be regarded as a truly complete solution to the problems raised by considering conditional properties of confidence intervals.

Calculation in the large sample limit


If certain regularity conditions are met, then a general likelihood function will become a Gaussian function in the large sample limit. In such a case the CLs upper limit at confidence level 1 − α′ (derived from the uniformly most powerful test) is given by[9]

    θ_up = θ̂ + σ Φ⁻¹(1 − α′ Φ(θ̂/σ)),

where Φ is the standard normal cumulative distribution function, θ̂ is the maximum likelihood estimator of θ and σ is its standard deviation; the latter might be estimated from the inverse of the Fisher information matrix or by using the "Asimov"[9] data set. This result happens to be equivalent to a Bayesian credible interval if a uniform prior for θ is used.
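The closed-form limit is straightforward to implement; the sketch below (with hypothetical input values) also illustrates that θ_up is strictly positive for any θ̂, so the interval always contains zero:

```python
# Sketch of the asymptotic CLs upper limit
#   theta_up = theta_hat + sigma * Phi^{-1}(1 - alpha' * Phi(theta_hat / sigma)).
# The input values below are hypothetical illustrations.
from scipy.stats import norm

def cls_upper_limit(theta_hat, sigma, alpha_prime=0.05):
    """Asymptotic (Gaussian-limit) CLs upper limit at confidence level 1 - alpha'."""
    return theta_hat + sigma * norm.ppf(1.0 - alpha_prime * norm.cdf(theta_hat / sigma))

print(cls_upper_limit(0.0, 1.0))   # ~1.96 when the estimate sits on the boundary
print(cls_upper_limit(-3.0, 1.0))  # still positive: the limit contains zero
```

Note that as θ̂ becomes large and positive, α′Φ(θ̂/σ) → α′ and the CLs limit approaches the standard 1 − α′ upper limit θ̂ + σΦ⁻¹(1 − α′).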

References

  1. ^ Read, A. L. (2002). "Presentation of search results: The CL(s) technique". Journal of Physics G: Nuclear and Particle Physics. 28 (10): 2693–2704. Bibcode:2002JPhG...28.2693R. doi:10.1088/0954-3899/28/10/313.
  2. ^ Particle Physics at the Tercentenary of Mikhail Lomonosov, p. 13, at Google Books
  3. ^ Amnon Harel. "Statistical methods in CMS searches" (PDF). indico.cern.ch. Retrieved 2015-04-10.
  4. ^ Mark Mandelkern (2002). "Setting Confidence Intervals for Bounded Parameters". Statistical Science. 17 (2): 149–159. doi:10.1214/ss/1030550859. JSTOR 3182816.
  5. ^ Ronald N. Giere (1977). "Allan Birnbaum's Conception of Statistical Evidence". Synthese. 36 (1): 5–13. doi:10.1007/bf00485688. S2CID 46973213.
  6. ^ G. Zech (1989). "Upper limits in experiments with background or measurement errors" (PDF). Nucl. Instrum. Methods Phys. Res. A. 277 (2–3): 608–610. Bibcode:1989NIMPA.277..608Z. doi:10.1016/0168-9002(89)90795-X.
  7. ^ Birnbaum, Allan (1962). "On the foundations of statistical inference". Journal of the American Statistical Association. 57 (298): 269–326. doi:10.2307/2281640. JSTOR 2281640. MR 0138176. (With discussion.)
  8. ^ Birnbaum, Allan (1977). "The Neyman-Pearson Theory as Decision Theory, and as Inference Theory; with a Criticism of the Lindley-Savage Argument for Bayesian Theory". Synthese. 36 (1): 19–49. doi:10.1007/bf00485690. S2CID 35027844.
  9. ^ a b G. Cowan; K. Cranmer; E. Gross; O. Vitells (2011). "Asymptotic formulae for likelihood-based tests of new physics". Eur. Phys. J. C. 71 (2): 1554. arXiv:1007.1727. Bibcode:2011EPJC...71.1554C. doi:10.1140/epjc/s10052-011-1554-0.

Further reading

  * Fraser, D. A. S.; Reid, N.; Wong, A. C. M. (2004). "Inference for bounded parameters". Phys. Rev. D. 69 (3): 033002. doi:10.1103/PhysRevD.69.033002.
  * Cousins, Robert D. (2011). "Negatively Biased Relevant Subsets Induced by the Most-Powerful One-Sided Upper Confidence Limits for a Bounded Physical Parameter". arXiv:1109.2023 [physics.data-an].