arxiv: 2605.07091 · v1 · submitted 2026-05-08 · 💻 cs.DS

Recognition: 2 theorem links

· Lean Theorem

Estimating Correlation Clustering Cost in Node-Arrival Stream

Kaiwen Liu , Seba Daniela Villalobos , Qin Zhang

Authors on Pith no claims yet

Pith reviewed 2026-05-11 01:07 UTC · model grok-4.3

classification 💻 cs.DS

keywords correlation clusteringstreaming algorithmsnode-arrival modelsublinear spaceapproximation algorithmscost estimationsampling

0 comments

The pith

A streaming algorithm approximates the cost of correlation clustering from a node-arrival stream using sublinear space and constant passes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper focuses on correlation clustering in a model where only nodes arrive in the stream and pairwise similarities or dissimilarities are computed on demand from a function rather than being given as edges. It develops C^4Approx, which estimates the minimum cost of partitioning the nodes to minimize disagreements with the derived labels. The algorithm achieves this with space that grows much slower than the total number of nodes and with only a fixed number of passes. Matching lower bounds show that sublinear space is necessary in the worst case. Experiments confirm that retaining roughly 2 percent of the nodes produces cost estimates close to those of standard offline methods even when the underlying graph is sparse.

Core claim

In the node-arrival streaming model, where nodes stream in and edge relations are obtained via an on-demand similarity function, C^4Approx approximates the optimal correlation clustering cost to within a constant factor using space sublinear in the number of nodes and a constant number of passes over the stream, with complementary lower bounds establishing the necessity of this space usage.

What carries the argument

C^4Approx, a constant-pass streaming procedure that maintains a small sample of nodes and uses them to estimate the global clustering cost by simulating the effect of local pivot decisions on the derived similarity labels.

If this is right

The full set of nodes need not be stored to obtain a reliable estimate of the minimum clustering cost.
Clustering tasks on large raw-object streams become feasible when similarity checks are treated as an oracle rather than precomputed edges.
Constant-pass algorithms suffice for cost estimation even though the complete graph is never materialized.
The same sublinear-space bound applies across both dense and sparse underlying similarity graphs in practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The node-arrival model with on-demand oracles may extend to other graph partitioning objectives where edge materialization is expensive.
Similar sampling ideas could be tested for dynamic streams in which nodes both arrive and depart.
The gap between the achieved approximation and the lower bound leaves room for tighter constants or fewer passes on restricted similarity functions.

Load-bearing premise

The similarity function that labels pairs as similar or dissimilar can be evaluated quickly enough on demand that the overall procedure remains efficient when only a small fraction of nodes is retained.

What would settle it

A family of node-arrival instances on which C^4Approx requires linear space or more than a constant number of passes to achieve constant-factor cost approximation, or on which a sublinear-space constant-pass algorithm beats the stated lower bound.

Figures

Figures reproduced from arXiv: 2605.07091 by Kaiwen Liu, Qin Zhang, Seba Daniela Villalobos.

**Figure 1.** Figure 1: Relative error of PrunedPivot and C 4Approx w.r.t. Pivot for varying space (k = 15). Shaded regions show ±1 SD. 2 5 10 15 20 -10% -5% 0% 5% 10% Relative error vs. Pivot Wikipedia 2 5 10 15 20 LiveJournal 2 5 10 15 20 ImageNet PrunedPivot C 4Approx k [PITH_FULL_IMAGE:figures/full_fig_p019_1.png] view at source ↗

**Figure 2.** Figure 2: Relative error of PrunedPivot and C 4Approx w.r.t. Pivot for varying k (space fixed at 4%). Shaded regions show ±1 SD. C 4Approx. This is largely due to the sparsity of the graph and the scarcity of high-degree nodes, causing the additive term of ASW23-SDD to dominate its output. Pass-Accuracy Tradeoff. We also investigate the trade-off between the accuracy of C 4Approx and PrunedPivot (compared with Pivot… view at source ↗

**Figure 4.** Figure 4: Absolute cost of C 4Approx and ASW23-SDD for varying space and k = 15. Shaded regions show ±1 SD. References [1] Arvind Arasu, Christopher R´e, and Dan Suciu. Large-scale deduplication with constraints using dedupalog. In Yannis E. Ioannidis, Dik Lun Lee, and Raymond T. Ng, editors, ICDE, pages 952–963. IEEE Computer Society, 2009. [2] Jessica Shi, Laxman Dhulipala, David Eisenstat, Jakub Lacki, and Vahab … view at source ↗

read the original abstract

We study the correlation clustering problem in the node-arrival data stream model. Unlike previous work, where the stream consists of the graph's edges, we focus on the setting in which the stream contains only the nodes. This model better reflects many real-world scenarios in which the data stream naturally consists of raw objects (e.g., images, tweets), and the similar/dissimilar edges are derived through a similarity function. We present C$^4$Approx, a streaming algorithm that approximates the cost of correlation clustering using sublinear space in the number of nodes and a constant number of passes. We further complement this result with lower bounds. Experiments on real-world datasets show that by storing only 2% of the nodes, our algorithm achieves performance comparable to the classic Pivot algorithm and the more recent PrunedPivot algorithm, even on sparse graphs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper claims a sublinear-space streaming algorithm for correlation clustering cost under a node-arrival model, but the experiments only test a linear-space regime and provide no evidence for the main theoretical guarantee.

read the letter

The main thing to know is that this paper shifts correlation clustering to a node-arrival stream where nodes come in and similar/dissimilar pairs are computed on demand via a similarity function. They give an algorithm C^4Approx that they say approximates the clustering cost with sublinear space in the number of nodes and constant passes, plus some lower bounds. Experiments on real data show that keeping 2% of the nodes produces results close to the classic Pivot and PrunedPivot methods, even on sparse graphs. That model choice is reasonable for pipelines that start with raw objects rather than precomputed edges, and the empirical comparison is straightforward and uses actual datasets. The work is honest about contrasting the setting with prior edge-stream papers. The soft spot is exactly the one flagged in the stress test. Storing 2% of nodes is still linear in n, so the runs never enter the o(n) regime the theorem requires. If the approximation depends on a sample size that must grow with n to control error, the reported numbers say nothing about whether the guarantee survives at n^0.9 or polylog space. The abstract gives no approximation ratio, no proof sketch, and no detail on how sublinearity is achieved, which makes it impossible to judge the central claim from the given text. This is for streaming-algorithms people who care about clustering under realistic arrival models. A reader in that niche would get value from the model discussion and the practical numbers, but would need the full proofs and better experiments to decide if the sublinear part works. It deserves peer review because the problem is well-posed and the claims are concrete enough for referees to check, even though the current experiments need strengthening to match the theory.

Referee Report

2 major / 2 minor

Summary. The manuscript studies correlation clustering in the node-arrival streaming model, where the input stream consists solely of nodes and similar/dissimilar relations are derived on demand via a similarity function. It presents the C^4Approx algorithm, a constant-pass streaming procedure that approximates the correlation-clustering cost using sublinear space in the number of nodes n, together with matching lower bounds. Experiments on real-world datasets are reported to show that storing 2% of the nodes yields performance comparable to the Pivot and PrunedPivot baselines even on sparse graphs.

Significance. If the sublinear-space guarantee and approximation ratio are correctly established, the result would be a meaningful contribution to streaming algorithms for clustering, because the node-arrival model matches many practical scenarios (images, tweets, etc.) better than the classic edge-arrival model. The addition of lower bounds and real-data experiments strengthens the paper; however, the current experimental regime does not yet demonstrate that the claimed sublinear space bound is practically attainable.

major comments (2)

[Abstract / Experimental Evaluation] Abstract and Experimental Evaluation: the central claim is a sublinear-space (o(n)) constant-pass algorithm, yet the reported experiments store 2% of the nodes (explicitly Θ(n) space). No results, tables, or analysis are supplied for any o(n) space bound (e.g., n^{0.9}, n / log n, or polylog n), so the experiments do not validate the space complexity that the theorem asserts.
[Theorem / Analysis] Theorem statement (presumably §3 or §4): the approximation ratio and the precise dependence of space on the error parameter are not visible in the provided abstract or experimental summary; without an explicit ratio or a proof sketch that isolates the space term, it is impossible to verify whether the sublinear bound is load-bearing or merely asymptotic.

minor comments (2)

[Abstract] The abstract does not state the achieved approximation ratio; adding it would allow readers to assess the result at a glance.
[Experimental Evaluation] The experimental protocol (number of passes, exact sampling procedure, how the similarity function is evaluated in the stream) should be described with sufficient detail to permit reproduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each point below and will revise the manuscript to better highlight the theoretical guarantees and strengthen the connection between theory and experiments.

read point-by-point responses

Referee: [Abstract / Experimental Evaluation] Abstract and Experimental Evaluation: the central claim is a sublinear-space (o(n)) constant-pass algorithm, yet the reported experiments store 2% of the nodes (explicitly Θ(n) space). No results, tables, or analysis are supplied for any o(n) space bound (e.g., n^{0.9}, n / log n, or polylog n), so the experiments do not validate the space complexity that the theorem asserts.

Authors: We agree that the reported experiments use a fixed 2% sampling rate, which yields linear (though small-constant) space for the finite datasets tested. The theoretical analysis allows the sampling rate to be parameterized as a function of n and the error parameter to achieve o(n) space. In the revision we will add (i) a dedicated paragraph in the experimental section explaining the parameter choices that yield sublinear space, (ii) additional runs on larger synthetic instances with space budgets scaling as n^{0.9} and n / log n, and (iii) a plot of approximation quality versus normalized space usage. These additions will directly illustrate the sublinear regime. revision: yes
Referee: [Theorem / Analysis] Theorem statement (presumably §3 or §4): the approximation ratio and the precise dependence of space on the error parameter are not visible in the provided abstract or experimental summary; without an explicit ratio or a proof sketch that isolates the space term, it is impossible to verify whether the sublinear bound is load-bearing or merely asymptotic.

Authors: The full manuscript states the theorem (with explicit approximation ratio for the clustering cost and the space bound's dependence on the error parameter) and its proof in Section 3. The sublinear space term arises directly from the sampling and sketching primitives used in the analysis and is therefore load-bearing. We will revise the abstract and introduction to quote the precise ratio and space expression, and we will include a short proof sketch in the introduction that isolates the space dependence on n and the error parameter. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation; algorithm and bounds are self-contained

full rationale

The paper introduces C^4Approx as a new streaming algorithm for correlation clustering cost estimation in the node-arrival model, claiming sublinear space and constant passes, complemented by lower bounds. The abstract and summary contain no self-definitional steps, no fitted parameters renamed as predictions, and no load-bearing self-citations that reduce the central claims to prior unverified inputs. The derivation appears independent, with experiments serving as empirical validation rather than part of the mathematical chain. The noted experimental space usage (2% of nodes) raises a separate question about regime coverage but does not create circularity in the stated theoretical results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities. The algorithm name C^4Approx is introduced but its internal construction is not described.

pith-pipeline@v0.9.0 · 5438 in / 1121 out tokens · 40833 ms · 2026-05-11T01:07:52.596263+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

C^4Approx ... approximates the cost of correlation clustering using sublinear space ... high-low degree decomposition to control variance of the sampling-based estimations
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

reference set R ... highest ranked nodes ... node partition V=A⊎B ... each cluster lies entirely within either A or B

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

[1]

Large-scale deduplication with constraints using dedupalog

Arvind Arasu, Christopher R´ e, and Dan Suciu. Large-scale deduplication with constraints using dedupalog. In Yannis E. Ioannidis, Dik Lun Lee, and Raymond T. Ng, editors, ICDE, pages 952–963. IEEE Computer Society, 2009

work page 2009
[2]

Mirrokni

Jessica Shi, Laxman Dhulipala, David Eisenstat, Jakub Lacki, and Vahab S. Mirrokni. Scalable community detection via parallel correlation clustering.Proc. VLDB Endow., 14(11):2305–2313, 2021

work page 2021
[3]

Nate Veldt, Anthony Wirth, and David F. Gleich. Parameterized correlation clustering in hypergraphs and bipartite graphs. In Rajesh Gupta, Yan Liu, Jiliang Tang, and B. Aditya Prakash, editors,KDD, pages 1868–1876. ACM, 2020

work page 2020
[4]

Higher- order correlation clustering for image segmentation

Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, and Chang Dong Yoo. Higher- order correlation clustering for image segmentation. In John Shawe-Taylor, Richard S. Zemel, Peter L. Bartlett, Fernando C. N. Pereira, and Kilian Q. Weinberger, editors, NeurIPS, pages 1530–1538, 2011

work page 2011
[5]

Bounding and comparing methods for correlation clustering beyond ilp

Micha Elsner and Warren Schudy. Bounding and comparing methods for correlation clustering beyond ilp. InProceedings of the Workshop on Integer Linear Programming for Natural Language Processing, pages 19–27, 2009

work page 2009
[6]

Anindya Bhattacharya and Rajat K. De. Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values.J. Biomed. Informatics, 43(4):560–568, 2010. 20

work page 2010
[7]

Puleo, and Olgica Milenkovic

Pan Li, Hoang Dau, Gregory J. Puleo, and Olgica Milenkovic. Motif clustering and overlapping clustering for social network analysis. InINFOCOM, pages 1–9. IEEE, 2017

work page 2017
[8]

An application of correlation clustering to portfolio diversification, 2015

Hannah Cheng, Juan Zhan, William Rea, and Alethea Rea. An application of correlation clustering to portfolio diversification, 2015. URL https://arxiv.org/abs/1511.07945

work page arXiv 2015
[9]

Correlation clustering.Mach

Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering.Mach. Learn., 56(1-3):89–113, 2004

work page 2004
[10]

Clustering with qualitative information.J

Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. Clustering with qualitative information.J. Comput. Syst. Sci., 71(3):360–383, 2005

work page 2005
[11]

Aggregating inconsistent information: Ranking and clustering.J

Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: Ranking and clustering.J. ACM, 55(5):23:1–23:27, 2008

work page 2008
[12]

Near optimal LP rounding algorithm for correlation clustering on complete and complete k-partite graphs

Shuchi Chawla, Konstantin Makarychev, Tselil Schramm, and Grigory Yaroslavtsev. Near optimal LP rounding algorithm for correlation clustering on complete and complete k-partite graphs. InSTOC, pages 219–228. ACM, 2015

work page 2015
[13]

Correlation clustering with sherali-adams

Vincent Cohen-Addad, Euiwoong Lee, and Alantha Newman. Correlation clustering with sherali-adams. InFOCS, pages 651–661. IEEE, 2022

work page 2022
[14]

Handling correlated rounding error via preclustering: A 1.73-approximation for correlation clustering

Vincent Cohen-Addad, Euiwoong Lee, Shi Li, and Alantha Newman. Handling correlated rounding error via preclustering: A 1.73-approximation for correlation clustering. In FOCS, pages 1082–1104. IEEE, 2023

work page 2023
[15]

Understanding the cluster linear program for correlation clustering

Nairen Cao, Vincent Cohen-Addad, Euiwoong Lee, Shi Li, Alantha Newman, and Lukas Vogl. Understanding the cluster linear program for correlation clustering. InSTOC, pages 1617–1628. ACM, 2024

work page 2024
[16]

Solving the correlation cluster LP in sublinear time

Nairen Cao, Vincent Cohen-Addad, Euiwoong Lee, Shi Li, David Rasmussen Lolck, Alantha Newman, Mikkel Thorup, Lukas Vogl, Shuyi Yan, and Hanwen Zhang. Solving the correlation cluster LP in sublinear time. In Michal Kouck´ y and Nikhil Bansal, editors, STOC, pages 1154–1165. ACM, 2025

work page 2025
[17]

Dalvi, and Ravi Kumar

Flavio Chierichetti, Nilesh N. Dalvi, and Ravi Kumar. Correlation clustering in mapre- duce. InKDD, pages 641–650. ACM, 2014

work page 2014
[18]

Correlation clustering in data streams

Kook Jin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, and Anthony Wirth. Correlation clustering in data streams. InICML, pages 2237–2246. JMLR.org, 2015

work page 2015
[19]

Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, and Michael I

Xinghao Pan, Dimitris S. Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, and Michael I. Jordan. Parallel correlation clustering on big graphs. In NeurIPS, pages 3357–3365, 2015

work page 2015
[20]

Correlation clustering in constant many parallel rounds

Vincent Cohen-Addad, Silvio Lattanzi, Slobodan Mitrovic, Ashkan Norouzi-Fard, Nikos Parotsidis, and Jakub Tarnawski. Correlation clustering in constant many parallel rounds. InICML, pages 2069–2078. PMLR, 2021. 21

work page 2069
[21]

Almost 3-approximate correlation clustering in constant rounds

Soheil Behnezhad, Moses Charikar, Weiyun Ma, and Li-Yang Tan. Almost 3-approximate correlation clustering in constant rounds. InFOCS, pages 720–731. IEEE, 2022

work page 2022
[22]

Sublinear time and space algorithms for correlation clustering via sparse-dense decompositions

Sepehr Assadi and Chen Wang. Sublinear time and space algorithms for correlation clustering via sparse-dense decompositions. In Mark Braverman, editor,ITCS, volume 215 ofLIPIcs, pages 10:1–10:20. Schloss Dagstuhl - Leibniz-Zentrum f¨ ur Informatik, 2022

work page 2022
[23]

Single-pass streaming algorithms for correlation clustering

Soheil Behnezhad, Moses Charikar, Weiyun Ma, and Li-Yang Tan. Single-pass streaming algorithms for correlation clustering. InSODA, pages 818–853. SIAM, 2023

work page 2023
[24]

Single-pass pivot algorithm for correla- tion clustering

Konstantin Makarychev and Sayak Chakrabarty. Single-pass pivot algorithm for correla- tion clustering. keep it simple! InNeurIPS, 2023

work page 2023
[25]

Streaming algorithms and lower bounds for estimating correlation clustering cost

Sepehr Assadi, Vihan Shah, and Chen Wang. Streaming algorithms and lower bounds for estimating correlation clustering cost. InNeurIPS, 2023

work page 2023
[26]

Breaking 3-factor approximation for correlation clustering in polylogarithmic rounds

Nairen Cao, Shang-En Huang, and Hsin-Hao Su. Breaking 3-factor approximation for correlation clustering in polylogarithmic rounds. InSODA, pages 4124–4154. SIAM, 2024

work page 2024
[27]

A (3 + ε)-approximate correlation clustering algorithm in dynamic streams

M´ elanie Cambus, Fabian Kuhn, Etna Lindy, Shreyas Pai, and Jara Uitto. A (3 + ε)-approximate correlation clustering algorithm in dynamic streams. InSODA, pages 2861–2880. SIAM, 2024

work page 2024
[28]

Combinatorial correlation clustering

Vincent Cohen-Addad, David Rasmussen Lolck, Marcin Pilipczuk, Mikkel Thorup, Shuyi Yan, and Hanwen Zhang. Combinatorial correlation clustering. InSTOC, pages 1617–1628. ACM, 2024

work page 2024
[29]

Pruned pivot: Correlation clustering algorithm for dynamic, parallel, and local computation models

Mina Dalirrooyfard, Konstantin Makarychev, and Slobodan Mitrovic. Pruned pivot: Correlation clustering algorithm for dynamic, parallel, and local computation models. InICML. OpenReview.net, 2024

work page 2024
[30]

The space complexity of approximating the frequency moments.J

Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments.J. Comput. Syst. Sci., 58(1):137–147, 1999

work page 1999
[31]

Frequency moments in noisy streaming and distributed data under mismatch ambiguity, 2026

Kaiwen Liu and Qin Zhang. Frequency moments in noisy streaming and distributed data under mismatch ambiguity, 2026. URLhttps://arxiv.org/abs/2603.11216

work page arXiv 2026
[32]

Correlation clustering beyond the pivot algorithm

Soheil Behnezhad, Moses Charikar, Vincent Cohen-Addad, Alma Ghafari, and Weiyun Ma. Correlation clustering beyond the pivot algorithm. In Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste-Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff, and Jerry Zhu, editors,ICML, Proceedings of Machine Learning Research. PMLR / OpenReview.net, 2025

work page 2025
[33]

Fully dynamic maximal independent set with polylogarithmic update time

Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Cliff Stein, and Madhu Sudan. Fully dynamic maximal independent set with polylogarithmic update time. In David Zuckerman, editor,FOCS, pages 382–405. IEEE Computer Society, 2019. 22

work page 2019
[34]

Fully dynamic maximal independent set in expected poly-log update time

Shiri Chechik and Tianyi Zhang. Fully dynamic maximal independent set in expected poly-log update time. In David Zuckerman, editor,FOCS, pages 370–381. IEEE Computer Society, 2019

work page 2019
[35]

Dy- namic correlation clustering in sublinear update time

Vincent Cohen-Addad, Silvio Lattanzi, Andreas Maggiori, and Nikos Parotsidis. Dy- namic correlation clustering in sublinear update time. In Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,ICML, Proceedings of Machine Learning Research, pages 9230–9274. PMLR / OpenReview....

work page 2024
[36]

Sparse-pivot: Dynamic correlation clustering for node insertions

Mina Dalirrooyfard, Konstantin Makarychev, and Slobodan Mitrovic. Sparse-pivot: Dynamic correlation clustering for node insertions. In Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste-Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff, and Jerry Zhu, editors,ICML. PMLR / OpenReview.net, 2025

work page 2025
[37]

Cambridge University Press, Cambridge, 1996

Eyal Kushilevitz and Noam Nisan.Communication Complexity. Cambridge University Press, Cambridge, 1996. ISBN 9780521560672. doi: 10.1017/CBO9780511574948

work page doi:10.1017/cbo9780511574948 1996
[38]

Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity.J. Comput. Syst. Sci., 68(4): 702–732, 2004

work page 2004
[39]

The webgraph framework I: compression techniques

Paolo Boldi and Sebastiano Vigna. The webgraph framework I: compression techniques. In Stuart I. Feldman, Mike Uretsky, Marc Najork, and Craig E. Wills, editors,WWW, pages 595–602. ACM, 2004

work page 2004
[40]

Huttenlocher, Jon M

Lars Backstrom, Daniel P. Huttenlocher, Jon M. Kleinberg, and Xiangyang Lan. Group formation in large social networks: membership, growth, and evolution. In Tina Eliassi- Rad, Lyle H. Ungar, Mark Craven, and Dimitrios Gunopulos, editors,SIGKDD, pages 44–54. ACM, 2006

work page 2006
[41]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. InCVPR, pages 248–255. IEEE Computer Society, 2009

work page 2009
[42]

Openclip, July 2021

Gabriel Ilharco, Mitchell Wortsman, Ross Wightman, Cade Gordon, Nicholas Carlini, Ro- han Taori, Achal Dave, Vaishaal Shankar, Hongseok Namkoong, John Miller, Hannaneh Hajishirzi, Ali Farhadi, and Ludwig Schmidt. Openclip, July 2021. 23 A Math Tools Lemma A.1(Chernoff Bound).Let X = {x1, . . . , xN } be a finite population of N points and X1, . . . , Xt b...

work page 2021