Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget

Florian E. Dorner; Moritz Hardt

arxiv: 2402.02249 · v3 · submitted 2024-02-03 · 💻 cs.LG

Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget

Florian E. Dorner , Moritz Hardt This is my paper

Pith reviewed 2026-05-24 03:48 UTC · model grok-4.3

classification 💻 cs.LG

keywords noisy labelsclassifier comparisonlabeling budgetlarge deviationsCramér's theoremmachine learning evaluationbinary classificationbenchmark design

0 comments

The pith

To identify the better of two binary classifiers, spending a labeling budget on one noisy label per sample beats collecting multiple labels per sample.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how to allocate a fixed budget of noisy labels most effectively when the task is to decide which of two binary classifiers has higher accuracy. Standard practice aggregates several noisy labels on the same point via majority vote to reduce noise, yet the authors prove the opposite strategy wins: label more distinct points once each. The proof applies Cramér's theorem from large deviations theory to the error rates of the two accuracy estimators under an independent label noise model. The result supplies tighter sample-size guarantees than Hoeffding bounds and directly questions common design choices for machine-learning evaluation sets.

Core claim

Under a budget of noisy labels and an independent noise model, the probability of correctly ranking two binary classifiers is strictly higher when the budget is used to obtain one label on each of many samples than when it is used to obtain multiple labels on fewer samples, as established by a large-deviations analysis via Cramér's theorem.

What carries the argument

Cramér's theorem applied to the large-deviation rate functions of the empirical accuracy estimators for the two classifiers under the independent label-noise model.

If this is right

Benchmark construction should favor labeling more unique examples rather than repeated labeling of the same examples.
Sample-size calculations for pairwise classifier comparisons can use the new bounds, which improve on those derived from Hoeffding's inequality.
Label aggregation via majority vote is not optimal when the sole goal is to rank two classifiers.
The ordering of labeling strategies reverses if the objective shifts from ranking to estimating absolute accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same quantity-over-quality logic may apply to other pairwise ranking tasks that rely on noisy observations, such as A/B testing with noisy user feedback.
Crowdsourcing platforms could redesign task allocation to prioritize breadth over depth when the downstream use is model comparison.
If label noise turns out to be correlated across samples, the single-label strategy could lose its advantage and repeated labeling might regain value.

Load-bearing premise

The noisy labels for different samples are independent and follow the paper's specific noise model so that the large-deviation rates govern the comparison error.

What would settle it

A simulation or crowdsourced experiment, under the paper's noise model, in which the probability of correctly identifying the superior classifier is higher when the budget is spent on repeated labels than on single labels across more samples.

Figures

Figures reproduced from arXiv: 2402.02249 by Florian E. Dorner, Moritz Hardt.

**Figure 2.** Figure 2: Probability of identifying cb for accuracy p = 0.8, margin ϵ = 0.01, budget k = 1500 (a), label accuracy q = 0.8 (b). 3.1 Hoeffding Bounds Hoeffding’s inequality yields the following lemma proven in Appendix C that allows us to lower bound the probability that cb beats cw on a test set: 5 [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: a): Convergence of normalized log error rates to the values implied by Cram´er’s Theorem for [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

read the original abstract

We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It's common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it's best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cram\'er's theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where they overturn some time-honored recommendations. In addition, our results provide sample size bounds superior to what follows from Hoeffding's bound.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper proves that under independent label noise, one label per example beats multiple labels per example for identifying the better of two classifiers, via a Cramér rate comparison.

read the letter

The key result is that with a fixed labeling budget, collecting single noisy labels on more points beats aggregating multiple labels on fewer points when the task is to decide which of two binary classifiers is more accurate. The proof applies Cramér's theorem to the large-deviation rate of the probability that the empirical accuracy estimate for the worse classifier exceeds the one for the better one. This is new for the classifier-comparison setting and directly challenges the usual recommendation to majority-vote labels for cleaner signals. The paper also supplies sample-size bounds that improve on what Hoeffding gives. The derivation itself looks clean and uses only standard large-deviations machinery under the stated model. Credit is due for spelling out the practical implication for benchmark construction. The central argument holds up inside the independent-flip noise model the authors use. The soft spot is exactly the one flagged in the stress-test note: the per-sample random variable changes if labels on the same point are correlated, which is common when the same annotator labels multiple times. The paper states its independence assumption, but does not appear to test sensitivity to correlation or give conditions under which the k=1 optimum survives. That is a real but contained limitation rather than a load-bearing flaw. The work is for researchers who design or analyze evaluation budgets and benchmarks. A reader who wants a crisp theoretical counter-example to label-aggregation heuristics will get something usable. It deserves peer review because the formal claim is derived from first principles and the practical stakes are clear, even though the noise model is stylized.

Referee Report

2 major / 2 minor

Summary. The paper claims that, when allocating a fixed budget of noisy labels to identify which of two binary classifiers has higher accuracy, it is optimal to collect one label per sample across as many samples as possible rather than obtaining multiple labels per sample and aggregating them (e.g., by majority vote). The result is derived from a non-trivial application of Cramér's theorem to the large-deviation rate of the event that the empirical accuracy of the worse classifier exceeds that of the better one; the per-sample random variable depends on the number k of labels collected per point, and the rate is shown to be maximized at k=1 under the paper's noise model.

Significance. If the central theorem holds under the stated assumptions, the result supplies sample-size bounds tighter than those obtained from Hoeffding's inequality and directly challenges conventional practice in benchmark construction. It also supplies a concrete, falsifiable prediction about the relative value of label quantity versus label quality that can be tested on real annotation pipelines.

major comments (2)

[Theorem and proof (likely §3–4)] The application of Cramér's theorem (presumably in the proof of the main result) is performed on the empirical means of per-sample random vectors whose distribution depends on k. The manuscript must explicitly state the noise model (independence of labels, fixed flip probability, etc.) in the section containing the theorem statement and verify that the resulting rate function I_k satisfies I_1 > I_k for k>1; without these details the inequality cannot be checked.
[Noise model and discussion of assumptions] The skeptic concern is load-bearing: if labels for the same point are correlated (same annotator, batch effects, etc.), the effective per-sample law changes and the ordering of the rates may reverse. The paper should either prove robustness or add a remark quantifying how much correlation is tolerable before the k=1 optimum fails.

minor comments (2)

[Introduction or final discussion] Add a short paragraph contrasting the new bounds with the Hoeffding-derived bounds mentioned in the abstract, including the explicit functional form of the improvement.
[Problem formulation] Clarify the precise budget constraint (total labels B fixed, or total samples n fixed?) and how the comparison is performed when the two classifiers are evaluated on the same labeled set.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments. We address the major points below.

read point-by-point responses

Referee: [Theorem and proof (likely §3–4)] The application of Cramér's theorem (presumably in the proof of the main result) is performed on the empirical means of per-sample random vectors whose distribution depends on k. The manuscript must explicitly state the noise model (independence of labels, fixed flip probability, etc.) in the section containing the theorem statement and verify that the resulting rate function I_k satisfies I_1 > I_k for k>1; without these details the inequality cannot be checked.

Authors: We agree. In the revised manuscript we will state the noise model (independent labels, each with fixed flip probability) explicitly in the theorem section and include a verification (or appendix reference) that the rate function satisfies I_1 > I_k for k>1. revision: yes
Referee: [Noise model and discussion of assumptions] The skeptic concern is load-bearing: if labels for the same point are correlated (same annotator, batch effects, etc.), the effective per-sample law changes and the ordering of the rates may reverse. The paper should either prove robustness or add a remark quantifying how much correlation is tolerable before the k=1 optimum fails.

Authors: We will add a remark in the discussion section acknowledging the independence assumption and noting that correlations could reverse the ordering. A precise quantification of the tolerable correlation threshold lies outside the scope of the present analysis. revision: partial

standing simulated objections not resolved

Precise quantification of the correlation level at which the k=1 optimum fails.

Circularity Check

0 steps flagged

No significant circularity; derivation applies external Cramér theorem

full rationale

The paper derives its central comparison of large-deviation rates by applying Cramér's theorem (a standard, externally established result in large deviations theory) to the empirical means of per-sample random vectors whose distribution is fixed by the stated i.i.d. label-noise model. This application yields the rate function I_k and the inequality rate(k=1) > rate(k>1) as a mathematical consequence rather than by redefinition, fitting, or self-citation. No load-bearing step reduces to the paper's own inputs by construction; the noise-model assumptions are explicit and the theorem is independent. The result is therefore self-contained against the external benchmark of Cramér's theorem.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no details on free parameters, axioms, or invented entities; the result relies on Cramér's theorem, a standard result in large deviations theory.

pith-pipeline@v0.9.0 · 5656 in / 968 out tokens · 25159 ms · 2026-05-24T03:48:57.541167+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 3 internal anchors

[1]

and Welty, C

Aroyo, L. and Welty, C. Crowd truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. WebSci2013. ACM, 2013(2013),

work page 2013
[2]

A large annotated corpus for learning natural language inference

Bowman, S. R., Angeli, G., Potts, C., and Manning, C. D. A large annotated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326 ,

work page internal anchor Pith review Pith/arXiv arXiv
[3]

Chen, D., Yu, Z., and Bowman, S. R. Clean or annotate: How to spend a limited data collection budget. arXiv preprint arXiv:2110.08355 ,

work page arXiv
[4]

and Pluim, J

Cheplygina, V. and Pluim, J. P. Crowd disagreement about medical images is informative. In Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis: 7th Joint International Workshop, CVII-STENT 2018 and Third International Workshop, LABELS 2018, Held in Conjunction with MICCAI 2018, Gran...

work page 2018
[5]

Whose ground truth? accounting for individual and collective identities underlying dataset annotation

Denton, E., D´ ıaz, M., Kivlichan, I., Prabhakaran, V., and Rosen, R. Whose ground truth? accounting for individual and collective identities underlying dataset annotation. arXiv preprint arXiv:2112.04554,

work page arXiv
[6]

E., Peychev, M., Konstantinov, N., Goel, N., Ash, E., and Vechev, M

Dorner, F. E., Peychev, M., Konstantinov, N., Goel, N., Ash, E., and Vechev, M. Human-guided fair classification for natural language processing. arXiv preprint arXiv:2212.10154 ,

work page arXiv
[7]

L., Lam, M

Gordon, M. L., Lam, M. S., Park, J. S., Patel, K., Hancock, J., Hashimoto, T., and Bernstein, M. S. Jury learning: Integrating dissenting voices into machine learning models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems , pp. 1–19,

work page 2022
[8]

Klenke, A

Accessed: 2024-01-24. Klenke, A. Probability theory: a comprehensive course . Springer Science & Business Media,

work page 2024
[9]

R., Stevens, K., Barhoum, A., Duc, N

K¨ opf, A., Kilcher, Y., von R¨ utte, D., Anagnostidis, S., Tam, Z.-R., Stevens, K., Barhoum, A., Duc, N. M., Stanley, O., Nagyfi, R., et al. Openassistant conversations–democratizing large language model alignment. arXiv preprint arXiv:2304.07327 ,

work page arXiv
[10]

and Sra, S

Mania, H. and Sra, S. Why do classifier accuracies show linear trends under distribution shift? arXiv preprint arXiv:2012.15483,

work page arXiv 2012
[11]

Semeval-2014 task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment

Marelli, M., Bentivogli, L., Baroni, M., Bernardi, R., Menini, S., and Zamparelli, R. Semeval-2014 task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. In Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014), pp. 1–8,

work page 2014
[12]

On Releasing Annotator-Level Labels and Information in Datasets

Prabhakaran, V., Davani, A. M., and Diaz, M. On releasing annotator-level labels and information in datasets. arXiv preprint arXiv:2110.05699 ,

work page arXiv
[13]

and Leonardelli, E

Ramponi, A. and Leonardelli, E. Dh-fbk at semeval-2022 task 4: leveraging annotators’ disagreement and multiple data views for patronizing language detection. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) , pp. 324–334,

work page 2022
[14]

D., Ng, A

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A. Y., and Potts, C. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing , pp. 1631–1642,

work page 2013
[15]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhar- gava, P., Bhosale, S., et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288,

work page internal anchor Pith review Pith/arXiv arXiv
[16]

Williams, A., Nangia, N., and Bowman, S. R. A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426 ,

work page internal anchor Pith review Pith/arXiv arXiv
[17]

Proof. We use an upper bound version of Stirling’s approximation based on Theorem 2.6 in (Stanica, 2001): m − 1 m−1 2 < 4 m−1 2 q π m−1 2 , 18 the fact that q(1 − q) is maximized at q = 0.5 and the monontonicity of 1√ k to estimate 2σ(m, q) = 2 m−2X k uneven k ⌈ k 2 ⌉ q⌈ k 2 ⌉(1 − q)⌈ k 2 ⌉ = 2 m−2X k uneven k k+1 2 q k+1 2 (1 − q) k+1 2 = 2 m−2X k uneven...

work page 2001
[18]

Under assumption 2, this implies that the single label strategy outperforms the m−label strategy for these n

For G as defined as in Section 3.2, m > 1 and qb, qw, pw, p0 b, p1 b fixed such that assumption 1 holds, there exist an N ∈ N such that for n > N P nX i Gi(Mm(q), p) > 0 ! < P mnX i Gi(q, p) > 0 ! . Under assumption 2, this implies that the single label strategy outperforms the m−label strategy for these n. The proof of theorem 2 is based on Cram´ er’s Th...

work page 2013
[19]

Correspondingly, (11) holds if and only if (1 − 2Mm(q))m m − 1 m−1 2 q m−1 2 (1 − q) m−1 2 ≥ m(1 − 2q) p f ∗(q) 2 p f ∗(q) + c p g∗(q) 2 p g∗(q) + c

2n n Z q 0 xn(1 − x)ndx (Boland et al., 1989). Correspondingly, (11) holds if and only if (1 − 2Mm(q))m m − 1 m−1 2 q m−1 2 (1 − q) m−1 2 ≥ m(1 − 2q) p f ∗(q) 2 p f ∗(q) + c p g∗(q) 2 p g∗(q) + c . 24 For 0.5 < q < 1, this is equivalent to 2Mm(q) − 1 2q − 1 m − 1 m−1 2 q m−1 2 (1 − q) m−1 2 ≤ p f ∗(q) 2 p f ∗(q) + c p g∗(q) 2 p g∗(q) + c . (12) Lemma D.1....

work page 1989
[20]

29 Proof

Then c1A−c2B c1C−c2D ≤ A C is true if and only if CB ≥ DA. 29 Proof. c1A − c2B c1C − c2D ≤ A C ⇐ ⇒ c1A − c2B ≥ A(c1C − c2D) C ⇐ ⇒ C(c1A − c2B) ≤ A(c1C − c2D) ⇐ ⇒ c1CA − c2CB ≤ c1CA − c2DA ⇐ ⇒ −c2CB ≤ −c2DA ⇐ ⇒ CB ≥ DA We set A = (1 − 2Mm(q + δ)), B = (1 − 2Mm(q)), C = (1 − 2(q + δ)), D = (1 − 2q), such that CB ≥ DA is equivalent to (1 − 2(q + δ))(1 − 2Mm(...

work page 1989

[1] [1]

and Welty, C

Aroyo, L. and Welty, C. Crowd truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. WebSci2013. ACM, 2013(2013),

work page 2013

[2] [2]

A large annotated corpus for learning natural language inference

Bowman, S. R., Angeli, G., Potts, C., and Manning, C. D. A large annotated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326 ,

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

Chen, D., Yu, Z., and Bowman, S. R. Clean or annotate: How to spend a limited data collection budget. arXiv preprint arXiv:2110.08355 ,

work page arXiv

[4] [4]

and Pluim, J

Cheplygina, V. and Pluim, J. P. Crowd disagreement about medical images is informative. In Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis: 7th Joint International Workshop, CVII-STENT 2018 and Third International Workshop, LABELS 2018, Held in Conjunction with MICCAI 2018, Gran...

work page 2018

[5] [5]

Whose ground truth? accounting for individual and collective identities underlying dataset annotation

Denton, E., D´ ıaz, M., Kivlichan, I., Prabhakaran, V., and Rosen, R. Whose ground truth? accounting for individual and collective identities underlying dataset annotation. arXiv preprint arXiv:2112.04554,

work page arXiv

[6] [6]

E., Peychev, M., Konstantinov, N., Goel, N., Ash, E., and Vechev, M

Dorner, F. E., Peychev, M., Konstantinov, N., Goel, N., Ash, E., and Vechev, M. Human-guided fair classification for natural language processing. arXiv preprint arXiv:2212.10154 ,

work page arXiv

[7] [7]

L., Lam, M

Gordon, M. L., Lam, M. S., Park, J. S., Patel, K., Hancock, J., Hashimoto, T., and Bernstein, M. S. Jury learning: Integrating dissenting voices into machine learning models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems , pp. 1–19,

work page 2022

[8] [8]

Klenke, A

Accessed: 2024-01-24. Klenke, A. Probability theory: a comprehensive course . Springer Science & Business Media,

work page 2024

[9] [9]

R., Stevens, K., Barhoum, A., Duc, N

K¨ opf, A., Kilcher, Y., von R¨ utte, D., Anagnostidis, S., Tam, Z.-R., Stevens, K., Barhoum, A., Duc, N. M., Stanley, O., Nagyfi, R., et al. Openassistant conversations–democratizing large language model alignment. arXiv preprint arXiv:2304.07327 ,

work page arXiv

[10] [10]

and Sra, S

Mania, H. and Sra, S. Why do classifier accuracies show linear trends under distribution shift? arXiv preprint arXiv:2012.15483,

work page arXiv 2012

[11] [11]

Semeval-2014 task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment

Marelli, M., Bentivogli, L., Baroni, M., Bernardi, R., Menini, S., and Zamparelli, R. Semeval-2014 task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. In Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014), pp. 1–8,

work page 2014

[12] [12]

On Releasing Annotator-Level Labels and Information in Datasets

Prabhakaran, V., Davani, A. M., and Diaz, M. On releasing annotator-level labels and information in datasets. arXiv preprint arXiv:2110.05699 ,

work page arXiv

[13] [13]

and Leonardelli, E

Ramponi, A. and Leonardelli, E. Dh-fbk at semeval-2022 task 4: leveraging annotators’ disagreement and multiple data views for patronizing language detection. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) , pp. 324–334,

work page 2022

[14] [14]

D., Ng, A

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A. Y., and Potts, C. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing , pp. 1631–1642,

work page 2013

[15] [15]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhar- gava, P., Bhosale, S., et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288,

work page internal anchor Pith review Pith/arXiv arXiv

[16] [16]

Williams, A., Nangia, N., and Bowman, S. R. A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426 ,

work page internal anchor Pith review Pith/arXiv arXiv

[17] [17]

Proof. We use an upper bound version of Stirling’s approximation based on Theorem 2.6 in (Stanica, 2001): m − 1 m−1 2 < 4 m−1 2 q π m−1 2 , 18 the fact that q(1 − q) is maximized at q = 0.5 and the monontonicity of 1√ k to estimate 2σ(m, q) = 2 m−2X k uneven k ⌈ k 2 ⌉ q⌈ k 2 ⌉(1 − q)⌈ k 2 ⌉ = 2 m−2X k uneven k k+1 2 q k+1 2 (1 − q) k+1 2 = 2 m−2X k uneven...

work page 2001

[18] [18]

Under assumption 2, this implies that the single label strategy outperforms the m−label strategy for these n

For G as defined as in Section 3.2, m > 1 and qb, qw, pw, p0 b, p1 b fixed such that assumption 1 holds, there exist an N ∈ N such that for n > N P nX i Gi(Mm(q), p) > 0 ! < P mnX i Gi(q, p) > 0 ! . Under assumption 2, this implies that the single label strategy outperforms the m−label strategy for these n. The proof of theorem 2 is based on Cram´ er’s Th...

work page 2013

[19] [19]

Correspondingly, (11) holds if and only if (1 − 2Mm(q))m m − 1 m−1 2 q m−1 2 (1 − q) m−1 2 ≥ m(1 − 2q) p f ∗(q) 2 p f ∗(q) + c p g∗(q) 2 p g∗(q) + c

2n n Z q 0 xn(1 − x)ndx (Boland et al., 1989). Correspondingly, (11) holds if and only if (1 − 2Mm(q))m m − 1 m−1 2 q m−1 2 (1 − q) m−1 2 ≥ m(1 − 2q) p f ∗(q) 2 p f ∗(q) + c p g∗(q) 2 p g∗(q) + c . 24 For 0.5 < q < 1, this is equivalent to 2Mm(q) − 1 2q − 1 m − 1 m−1 2 q m−1 2 (1 − q) m−1 2 ≤ p f ∗(q) 2 p f ∗(q) + c p g∗(q) 2 p g∗(q) + c . (12) Lemma D.1....

work page 1989

[20] [20]

29 Proof

Then c1A−c2B c1C−c2D ≤ A C is true if and only if CB ≥ DA. 29 Proof. c1A − c2B c1C − c2D ≤ A C ⇐ ⇒ c1A − c2B ≥ A(c1C − c2D) C ⇐ ⇒ C(c1A − c2B) ≤ A(c1C − c2D) ⇐ ⇒ c1CA − c2CB ≤ c1CA − c2DA ⇐ ⇒ −c2CB ≤ −c2DA ⇐ ⇒ CB ≥ DA We set A = (1 − 2Mm(q + δ)), B = (1 − 2Mm(q)), C = (1 − 2(q + δ)), D = (1 − 2q), such that CB ≥ DA is equivalent to (1 − 2(q + δ))(1 − 2Mm(...

work page 1989