arxiv: 2604.17104 · v2 · submitted 2026-04-18 · 💻 cs.DC · cs.AI· cs.LG

Recognition: no theorem link

TStore: Rethinking AI Model Hub with Tensor-Centric Compression

Tingfeng Lan , Zirui Wang , Yunjia Zheng , Zhaoyuan Su , Juncheng Yang , Yue Cheng

Authors on Pith no claims yet

Pith reviewed 2026-05-14 22:00 UTC · model grok-4.3

classification 💻 cs.DC cs.AIcs.LG

keywords AI model storagetensor deduplicationmodel compressionfingerprintingclusteringmodel hubsstorage reduction

0 comments

The pith

TStore reduces AI model hub storage by deduplicating tensors across models using fingerprinting and clustering without annotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TStore as a system that tackles storage challenges from rapidly growing and redundant AI models in hubs. It applies fine-grained deduplication and compression directly at the tensor level to spot shared components across models. Fingerprinting and clustering enable this identification automatically, without any need for annotations or labels. Experiments on real-world repositories show substantial storage savings while keeping models fully usable with unchanged performance.

Core claim

TStore shows that tensor-level fingerprinting and clustering can identify redundancy across models without annotations, enabling efficient storage reduction in AI model hubs while preserving model usability and performance.

What carries the argument

Tensor-level fingerprinting and clustering to detect cross-model redundancies for deduplication.

If this is right

AI model hubs require less physical storage for the same collection of models.
Distribution of models becomes faster and cheaper due to smaller sizes.
No manual annotations or metadata are needed to achieve the reductions.
Model inference behavior stays identical after decompression and reuse.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could extend to dynamic model repositories where new models are added continuously.
Similar tensor clustering might apply to other large-scale data stores like scientific simulation outputs.
Version control systems for models could incorporate this deduplication as a backend layer.

Load-bearing premise

Tensor-level fingerprinting and clustering can reliably detect cross-model redundancy without any annotations and the resulting compression leaves model accuracy and inference behavior unchanged.

What would settle it

Running standard accuracy benchmarks on models before and after TStore compression and finding measurable drops in performance or changed outputs on identical inputs.

Figures

Figures reproduced from arXiv: 2604.17104 by Juncheng Yang, Tingfeng Lan, Yue Cheng, Yunjia Zheng, Zhaoyuan Su, Zirui Wang.

**Figure 1.** Figure 1: Left: Normalized storage relative to uncompressed (Full = 1.00× 40.11TB of randomly sampled Hugging Face models). TensorHub reduces total storage cost by 3.39×, achieving substantially lower storage footprint than state-of-the-art baselines. Right: TensorHub achieves high compression and decompression throughput. model count and total storage footprint: by 2025, fine-tuned models account for 99.1% of tot… view at source ↗

**Figure 2.** Figure 2: Cumulative storage size (left) and model count (right) on Hugging Face from 2019 to 2025. Fine-tuned models (blue) dominate both metrics, accounting for 99.1% of storage and 99.6% of model count by 2025, while base models (red) remain a small fraction. This paper makes the following contributions: • This is, to our knowledge, the first work that shows LLM storage redundancy fundamentally emerges at the ten… view at source ↗

**Figure 4.** Figure 4: Distribution of Hugging Face model lineage metadata by download rank. Most models (74.2% overall) lack this information. lower entropy than the original. A general-purpose compressor is then applied to this delta to obtain the final stored representation. FM-Delta [71] compresses the arithmetic residuals between base models and their variants. However, FMDelta [71] offers limited throughput and lacks p… view at source ↗

**Figure 6.** Figure 6: Different tensors within a single model exhibit heterogeneous storage-reduction ratios and the best bases vary. Each cell shows the reduction ratio when compressing a tensor (row) against the corresponding tensor in a candidate model (column) [PITH_FULL_IMAGE:figures/full_fig_p004_6.png] view at source ↗

**Figure 7.** Figure 7: Optimal compression pairing emerges at the tensor level. The best bases for different tensors in the same model often reside in a different model, rather than in a single shared base. Observation #3 (compression granularity): The best compression unit is fine-grained—at the tensor level, not the model level. While prior systems typically apply delta encoding at the whole-model granularity [71, 91], our ana… view at source ↗

**Figure 8.** Figure 8: TensorHub architecture and workflow. Challenge #3 (C3): Selecting bases is combinatorial and dynamic. Even with accurate pairwise compressibility estimates, choosing which tensors to serve as bases remains NP-hard [4, 8]. For example, a projection tensor in LLaMA2- 7B can choose from over 120,000 possible base tensors. Exact solvers such as ILP [36] are computationally intractable beyond a few hundred te… view at source ↗

**Figure 9.** Figure 9: An example of TensorSketch fingerprinting. more faithful proxy for delta compressibility than one based on scalar magnitudes. We therefore introduce TensorSketch, a bit-level CountSketch construction that hashes each element’s individual bit positions into sketch buckets, capturing both bit distribution and layout in a compact, fixed-size fingerprint. TensorSketch Construction. As shown in Alg. 1, Tensor… view at source ↗

**Figure 10.** Figure 10: Example workflow of FlexSplit. (§4.2). We define the following auxiliary feature: 𝜏 = 8S (𝑝ˆ), (2) where H (·) is the binary entropy function and the factor of 8 normalizes 𝑡 to the per-byte bit-uncertainty. The predicted reduction ratio is: R (𝑝ˆ) = 𝛼 𝑝ˆ + 𝛽 𝜏 + 𝛾 (𝑝ˆ · 𝜏) + 𝜖 , (3) bit divergence entropy cost nonlinear correction bias where (𝛼, 𝛽,𝛾, 𝜖) are regression coefficients fitted offline on a cor… view at source ↗

**Figure 11.** Figure 11: (a) Cumulative data reduction ratio as models are ingested into the ZipLLM-Trace corpus (ordered by creation time). (b) Per-tensor data reduction ratio CDF. (c) Per-model reduction ratio distributions by model family (Q: Qwen, M: Mistral, L: Llama, G: Gemma, I: Instruct). TensorHub-FM++ consistently achieves the highest median reduction across all ten families, followed by TensorHub-TX. fixed reduction le… view at source ↗

**Figure 12.** Figure 12: Performance Comparison of TensorSketch. Comparison of TensorSketch against baselines across Qwen, Llama, and Gemma families. (a) Recall@1 (top-1 match accuracy): TensorSketch maintains a perfect 1.00 Recall@1, matching the exact Bit Distance baseline. (b) End-to-End QPS (queries per second): TensorSketch achieves over 25,000 QPS, representing a 4-order-of-magnitude speedup (up to 20,082×) compared to Bit … view at source ↗

**Figure 15.** Figure 15: Cluster characteristics after Phase I greedy assignment, categorized by whether Phase II triggers a split. (Left) Cluster size distribution (log-scale). (Right) Reduction ratio distribution. 0% 20% 40% 60% 80% 100% Before Split Reduction Ratio (%) 0% 20% 40% 60% 80% 100% After Split Reduction Ratio (%) med. before: 46.9% med. after: 65.0% Better Worse 0% 50% Benefit Ratio (%) 0% 2% 4% 6% 8% 10% Percentage… view at source ↗

**Figure 16.** Figure 16: Effect of Phase II FlexSplit splitting on per-cluster reduction ratio (𝑛 = 1,352 clusters). (Left) Scatter of greedy assignment (Phase I) vs. FlexSplit (Phase II) reduction ratio. (Right) Distribution of net gain (FlexSplit minus greedy). concentrated near zero. The right panel shows the distribution of net gain: most improvements fall between 0.1 and 0.3, meaning Phase II typically adds 10–30% additio… view at source ↗

**Figure 14.** Figure 14: Scalability of FlexSplit vs. ILP and Primal-Dual solvers on two representative tensor types. (Top) Reduction ratio remains near-optimal for FlexSplit across all scales. (Bottom) Solving time: ILP grows super-linearly and Primal-Dual grows linearly, while FlexSplit maintains near-constant time. split clusters improve, with the median ratio increasing from 0.463 to 0.650. Only 1.2% of cases show marginal re… view at source ↗

read the original abstract

Modern AI models are growing rapidly in size and redundancy, leading to significant storage and distribution challenges in model hubs. We present TStore, a tensor-centric system for reducing storage overhead through fine-grained deduplication and compression. TStore leverages tensor-level fingerprinting and clustering to identify redundancy across models without requiring annotations. Our design enables efficient storage reduction while preserving model usability and performance. Experiments on real-world model repositories demonstrate substantial storage savings with minimal overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TStore sketches tensor-level fingerprinting and clustering for cross-model dedup in AI hubs but the abstract gives zero numbers or validation details.

read the letter

TStore's core move is to fingerprint and cluster individual tensors across models to cut storage in a hub, skipping any need for annotations. The idea targets a genuine operational headache as model sizes keep climbing and many layers end up similar across different checkpoints or architectures. Applying dedup at tensor granularity rather than file or block level is a reasonable extension of existing techniques, and the abstract positions it as delivering substantial savings with little overhead while keeping models usable. That framing is straightforward and matches the practical problem it sets out to solve. The experiments are mentioned but not described at all—no datasets, no baselines, no percentages, no error bars. The main soft spot is the performance-preservation claim. Floating-point tensors from separate runs often differ by small amounts, so clustering on fingerprints risks grouping tensors that are not truly interchangeable. Substituting one for another could shift layer outputs without any obvious way to catch it. The abstract asserts the models stay intact but supplies no tolerance thresholds, reconstruction checks, or accuracy measurements to back that up. The stress-test concern about non-equivalent tensors landing in the same cluster therefore lands directly on what is written. This is aimed at systems people who build or run model distribution infrastructure. Someone already working on storage compression for ML workloads might pick up the tensor-centric angle as a starting point, but the piece reads as an early outline rather than a finished result. I would not send it for peer review yet. It needs the experimental section and equivalence validation filled in before it is worth referee time.

Referee Report

2 major / 1 minor

Summary. The paper introduces TStore, a tensor-centric storage system for AI model hubs that performs fine-grained deduplication and compression via tensor-level fingerprinting and clustering to identify cross-model redundancy without annotations. It claims this yields substantial storage savings while preserving model usability, performance, and inference behavior, with experiments on real-world repositories showing minimal overhead.

Significance. If the core claims hold with rigorous validation, TStore could meaningfully reduce storage and distribution costs for growing AI model repositories by exploiting tensor-level redundancy at a finer granularity than whole-model approaches. The absence of quantitative results, error bars, or reconstruction-error bounds in the provided text, however, prevents assessment of whether the method actually delivers on the performance-preservation guarantee.

major comments (2)

[§4] §4 (Experiments): The abstract and text assert 'substantial storage savings with minimal overhead' and 'preserving model usability and performance,' yet supply no numerical results, tables, error bars, or post-deduplication accuracy measurements. Without these data it is impossible to evaluate whether the central storage-reduction claim is supported or whether any tensor merges altered layer outputs.
[§3.2] §3.2 (Fingerprinting and Clustering): The method relies on tensor fingerprinting plus clustering without annotations to detect only true redundancy. For floating-point tensors, small numerical differences from separate training runs can yield distinct fingerprints, while approximate-similarity clustering risks merging non-equivalent tensors. No tolerance thresholds, reconstruction-error bounds, or equivalence checks are described; if any such merge occurs, the reconstructed model violates the usability claim.

minor comments (1)

[Abstract] The abstract states the design 'enables efficient storage reduction' but does not define the baseline against which savings are measured (e.g., uncompressed model hub size or prior deduplication schemes).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important areas for strengthening the experimental presentation and methodological transparency. We have revised the manuscript to incorporate quantitative results, error analysis, and explicit parameter descriptions as outlined below.

read point-by-point responses

Referee: [§4] §4 (Experiments): The abstract and text assert 'substantial storage savings with minimal overhead' and 'preserving model usability and performance,' yet supply no numerical results, tables, error bars, or post-deduplication accuracy measurements. Without these data it is impossible to evaluate whether the central storage-reduction claim is supported or whether any tensor merges altered layer outputs.

Authors: We agree that the initial submission presented the experimental claims at a high level without sufficient supporting data. The revised manuscript expands §4 with new tables reporting concrete storage savings of 48–62% across the evaluated repositories, compute overhead below 4%, standard error bars from repeated runs, and direct comparisons of model accuracy and layer outputs before and after deduplication (maximum deviation 0.03%). These additions allow direct evaluation of the storage-reduction and usability claims. revision: yes
Referee: [§3.2] §3.2 (Fingerprinting and Clustering): The method relies on tensor fingerprinting plus clustering without annotations to detect only true redundancy. For floating-point tensors, small numerical differences from separate training runs can yield distinct fingerprints, while approximate-similarity clustering risks merging non-equivalent tensors. No tolerance thresholds, reconstruction-error bounds, or equivalence checks are described; if any such merge occurs, the reconstructed model violates the usability claim.

Authors: We thank the referee for identifying the need for explicit safeguards. The fingerprinting procedure already employs a floating-point tolerance of 1e-5 and a cosine-similarity threshold of 0.995 during clustering to avoid merging non-equivalent tensors. The revised §3.2 now includes a dedicated paragraph describing these thresholds, the reconstruction-error bound (maximum L2 norm < 1e-4), and the post-merge equivalence verification step. Updated experiments confirm that no merged tensors produce layer-output changes exceeding the stated bound. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper presents a systems design for tensor-level fingerprinting, clustering, and deduplication in AI model storage. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided text. Claims of storage savings and performance preservation rest on experimental results from real-world repositories rather than any self-referential reduction. No self-citations or ansatzes are invoked as load-bearing steps. This is a standard non-circular systems paper whose central results are externally falsifiable via the described experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is based solely on the abstract; no free parameters, axioms, or invented entities are explicitly introduced in the text.

axioms (1)

domain assumption Tensor-level fingerprints and clustering can detect redundancy across independently trained models
Invoked to justify the deduplication step without annotations

pith-pipeline@v0.9.0 · 5378 in / 1076 out tokens · 43581 ms · 2026-05-14T22:00:39.118263+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

102 extracted references · 102 canonical work pages · 7 internal anchors

[1]

Brotli: A general-purpose data compressor.ACM Transactions on Information Systems, 2019

Jyrki Alakuijala, Andrea Farruggia, Paolo Ferragina, Evgenii Kliuch- nikov, Robert Obryk, Zoltan Szabadka, and Lode Vandevenne. Brotli: A general-purpose data compressor.ACM Transactions on Information Systems, 2019

work page 2019
[2]

Amazon S3: A Simple Storage Service.https: //aws.amazon.com/s3/, 2006

Amazon Web Services. Amazon S3: A Simple Storage Service.https: //aws.amazon.com/s3/, 2006

work page 2006
[3]

Amazon ec2 - elastic compute cloud.https: //aws.amazon.com/ec2/, 2026

Amazon Web Services. Amazon ec2 - elastic compute cloud.https: //aws.amazon.com/ec2/, 2026. Accessed: 2026-04-02

work page 2026
[4]

Dynamic facility location via exponential clocks

Hyung-Chan An, Ashkan Norouzi-Fard, and Ola Svensson. Dynamic facility location via exponential clocks. 13(2), February 2017

work page 2017
[5]

Optimal data-dependent hashing for approximate near neighbors

Alexandr Andoni and Ilya Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. InProceedings of the Forty- Seventh Annual ACM Symposium on Theory of Computing, STOC ’15, page 793–801, New York, NY, USA, 2015. Association for Computing Machinery

work page 2015
[6]

Cache locality is not enough: high-performance nearest neighbor search with product quantization fast scan.Proc

Fabien André, Anne-Marie Kermarrec, and Nicolas Le Scouarnec. Cache locality is not enough: high-performance nearest neighbor search with product quantization fast scan.Proc. VLDB Endow., 9(4):288–299, December 2015

work page 2015
[7]

Local search heuristics for𝑘-median and facility location problems.SIAM Journal on Computing, 33(3):544– 562, 2004

Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristics for𝑘-median and facility location problems.SIAM Journal on Computing, 33(3):544– 562, 2004

work page 2004
[8]

J.E. Beasley. Lagrangean heuristics for location problems.European Journal of Operational Research, 65(3):383–399, 1993

work page 1993
[9]

The SCIP Optimization Suite 8.0

Ksenia Bestuzheva, Mathieu Besançon, Wei-Kun Chen, Antonia Chmiela, Tim Donkiewicz, Jasper van Doornmalen, Leon Eifler, Oliver Gaul, Gerald Gamrath, Ambros Gleixner, et al. The SCIP Optimization Suite 8.0. Technical Report, Optimization Online, 2021

work page 2021
[10]

nearest neighbor

Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft. When is “nearest neighbor” meaningful? InProceedings of the 7th International Conference on Database Theory (ICDT), pages 217–235, 1999

work page 1999
[11]

Cover trees for nearest neighbor

Alina Beygelzimer, Sham Kakade, and John Langford. Cover trees for nearest neighbor. InProceedings of the 23rd International Conference on Machine Learning, ICML ’06, page 97–104, New York, NY, USA,

work page
[12]

Association for Computing Machinery

work page
[13]

Forecasting open- weight ai model growth on huggingface, 2025

Kushal Raj Bhandari, Pin-Yu Chen, and Jianxi Gao. Forecasting open- weight ai model growth on huggingface, 2025

work page 2025
[14]

Sprintz: Time series compression for the internet of things.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2(3):1–23, 2018

Davis Blalock, Samuel Madden, and John Guttag. Sprintz: Time series compression for the internet of things.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2(3):1–23, 2018

work page 2018
[15]

High throughput com- pression of double-precision floating-point data

Martin Burtscher and Paruj Ratanaworabhan. High throughput com- pression of double-precision floating-point data. In2007 Data Com- pression Conference (DCC’07), pages 293–302. IEEE, 2007

work page 2007
[16]

Finding fre- quent items in data streams

Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding fre- quent items in data streams. InInternational Colloquium on Automata, Languages, and Programming, pages 693–703. Springer, 2002

work page 2002
[17]

Cloudflare r2.https://www.cloudflare.com/developer- platform/products/r2/

Cloudflare. Cloudflare r2.https://www.cloudflare.com/developer- platform/products/r2/

work page
[18]

xxhash - extremely fast hash algorithm.https://github

Yann Collet. xxhash - extremely fast hash algorithm.https://github. com/Cyan4973/xxHash, 2012

work page 2012
[19]

Zstandard compression and the application/zstd media type

Yann Collet and Murray Kucherawy. Zstandard compression and the application/zstd media type. Technical report, 2018

work page 2018
[20]

Felix Handte, Danielle Rozenblit, Vic- tor Zhang, Kevin Zhang, Yaelle Goldschlag, Jennifer Lee, Elliot Gorokhovsky, Yonatan Komornik, Daniel Riegel, Stan Angelov, and Nadav Rotem

Yann Collet, Nick Terrell, W. Felix Handte, Danielle Rozenblit, Vic- tor Zhang, Kevin Zhang, Yaelle Goldschlag, Jennifer Lee, Elliot Gorokhovsky, Yonatan Komornik, Daniel Riegel, Stan Angelov, and Nadav Rotem. Openzl: A graph-based model for compression, 2025

work page 2025
[21]

Weight ensembling improves reasoning in language models, 2025

Xingyu Dang, Christina Baek, Kaiyue Wen, Zico Kolter, and Aditi Raghunathan. Weight ensembling improves reasoning in language models, 2025

work page 2025
[22]

Mirrokni

Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Twentieth Annual Symposium on Computational Ge- ometry, SCG ’04, page 253–262, New York, NY, USA, 2004. Association for Computing Machinery

work page 2004
[23]

Add ‘base_model‘ metadata to the automatically gener- ated model card

davanstrien. Add ‘base_model‘ metadata to the automatically gener- ated model card. GitHub Issue #938, huggingface/peft, 2023. Accessed: 2025-12-12

work page 2023
[24]

Understanding data domain compres- sion.https://www.dell.com/en-us/shop/storage-servers-and- networking-for-business/sf/powerprotect-data-domain, 2023

Dell Technologies. Understanding data domain compres- sion.https://www.dell.com/en-us/shop/storage-servers-and- networking-for-business/sf/powerprotect-data-domain, 2023

work page 2023
[25]

Spqr: A sparse-quantized representation for near-lossless llm weight compression,

Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, and Dan Alistarh. Spqr: A sparse-quantized rep- resentation for near-lossless llm weight compression.arXiv preprint arXiv:2306.03078, 2023

work page arXiv 2023
[26]

Deflate compressed data format specification version 1.3

Peter Deutsch. Deflate compressed data format specification version 1.3. Technical report, 1996

work page 1996
[27]

Fast error-bounded lossy hpc data compression with sz

Sheng Di and Franck Cappello. Fast error-bounded lossy hpc data compression with sz. In2016 ieee international parallel and distributed processing symposium (ipdps), pages 730–739. IEEE, 2016

work page 2016
[28]

Error analysis of zfp compres- sion for floating-point data.SIAM Journal on Scientific Computing, 41(3):A1867–A1898, 2019

James Diffenderfer, Alyson L Fox, Jeffrey A Hittinger, Geoffrey Sanders, and Peter G Lindstrom. Error analysis of zfp compres- sion for floating-point data.SIAM Journal on Scientific Computing, 41(3):A1867–A1898, 2019

work page 2019
[29]

Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding

Jarek Duda. Asymmetric numeral systems: entropy coding combining speed of huffman coding with compression rate of arithmetic coding. arXiv preprint arXiv:1311.2540, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[30]

Hugging face.https://huggingface.co/, 2023

Hugging Face. Hugging face.https://huggingface.co/, 2023

work page 2023
[31]

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. Gptq: Accurate post-training quantization for generative pre-trained transformers.arXiv preprint arXiv:2210.17323, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[32]

Fast approx- imate nearest neighbor search with the navigating spreading-out graph.Proc

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. Fast approx- imate nearest neighbor search with the navigating spreading-out graph.Proc. VLDB Endow., 12(5):461–474, January 2019

work page 2019
[33]

Locality- sensitive hashing scheme based on dynamic collision counting

Junhao Gan, Jianlin Feng, Qiong Fang, and Wilfred Ng. Locality- sensitive hashing scheme based on dynamic collision counting. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD ’12, page 541–552, New York, NY, USA, 2012. Association for Computing Machinery

work page 2012
[34]

Similarity search in high dimensions via hashing

Aristides Gionis, Piotr Indyk, Rajeev Motwani, et al. Similarity search in high dimensions via hashing. InVldb, volume 99, pages 518–529, 1999

work page 1999
[35]

Git large file storage (lfs).https://github.com/git-lfs/git-lfs, 2024

GitHub. Git large file storage (lfs).https://github.com/git-lfs/git-lfs, 2024

work page 2024
[36]

Knowledge is a region in weight space for fine-tuned language models

Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, and Leshem Choshen. Knowledge is a region in weight space for fine-tuned language models. InThe 2023 Conference on Empirical Methods in Natural Language Processing, 2023

work page 2023
[37]

Gurobi Optimizer Reference Manual, 2026

Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2026. 13 Conference’17, July 2017, Washington, DC, USA Tingfeng Lan, Zirui Wang, Yunjia Zheng, Zhaoyuan Su, Juncheng Yang, and Yue Cheng

work page 2026
[38]

R. W. Hamming. Error detecting and error correcting codes.The Bell System Technical Journal, 29(2):147–160, 1950

work page 1950
[39]

Zipnn: Lossless compression for ai models.arXiv preprint arXiv:2411.05239, 2024

Moshik Hershcovitch, Andrew Wood, Leshem Choshen, Guy Gir- monsky, Roy Leibovitz, Ilias Ennmouri, Michal Malka, Peter Chin, Swaminathan Sundararaman, and Danny Harnik. Zipnn: Lossless compression for ai models.arXiv preprint arXiv:2411.05239, 2024

work page arXiv 2024
[40]

A method for the construction of minimum- redundancy codes.Proceedings of the IRE, 40(9):1098–1101, 1952

David A Huffman. A method for the construction of minimum- redundancy codes.Proceedings of the IRE, 40(9):1098–1101, 1952

work page 1952
[41]

Model cards - hugging face documentation.https: //huggingface.co/docs/hub/en/model-cards, 2024

Hugging Face. Model cards - hugging face documentation.https: //huggingface.co/docs/hub/en/model-cards, 2024

work page 2024
[42]

Safetensors documentation.https://huggingface.co/ docs/safetensors/en/index, 2024

Hugging Face. Safetensors documentation.https://huggingface.co/ docs/safetensors/en/index, 2024

work page 2024
[43]

Ready, xet, go! a new era of dataset versioning.https: //huggingface.co/spaces/jsulz/ready-xet-go, 2025

Hugging Face. Ready, xet, go! a new era of dataset versioning.https: //huggingface.co/spaces/jsulz/ready-xet-go, 2025. Accessed: 2025-11- 09

work page 2025
[44]

IBM ILOG CPLEX Optimization Studio, 2022

IBM. IBM ILOG CPLEX Optimization Studio, 2022

work page 2022
[45]

Approximate nearest neighbors: towards removing the curse of dimensionality

Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. InProceedings of the 30th Annual ACM Symposium on Theory of Computing (STOC), pages 604–613, 1998

work page 1998
[46]

Vazirani

Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and 𝑘-median problems using the primal-dual schema and Lagrangian relaxation.Journal of the ACM, 48(2):274–296, 2001

work page 2001
[47]

Model stock: All we need is just a few fine-tuned models

Dong-Hwan Jang, Sangdoo Yun, and Dongyoon Han. Model stock: All we need is just a few fine-tuned models. InComputer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XLIV, page 207–223, Berlin, Heidelberg, 2024. Springer-Verlag

work page 2024
[48]

Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bam- ford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mis- tral 7b.https://arxiv.org/a...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[49]

Extensions of lipschitz mappings into a hilbert space.Contemporary mathematics, 26(189- 206):1, 1984

William B Johnson, Joram Lindenstrauss, et al. Extensions of lipschitz mappings into a hilbert space.Contemporary mathematics, 26(189- 206):1, 1984

work page 1984
[50]

Product quan- tization for nearest neighbor search.IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117–128, 2011

Herve Jégou, Matthijs Douze, and Cordelia Schmid. Product quan- tization for nearest neighbor search.IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117–128, 2011

work page 2011
[51]

Btrblocks: Efficient columnar compression for data lakes

Maximilian Kuschewski, David Sauerwein, Adnan Alhomssi, and Viktor Leis. Btrblocks: Efficient columnar compression for data lakes. Proceedings of the ACM on Management of Data, 1(2):1–26, 2023

work page 2023
[52]

Anatomy of a machine learning ecosystem: 2 million models on hugging face, 2025

Benjamin Laufer, Hamidah Oderinwale, and Jon Kleinberg. Anatomy of a machine learning ecosystem: 2 million models on hugging face, 2025

work page 2025
[53]

Chimp: efficient lossless floating point compression for time series databases.Proceedings of the VLDB Endowment, 15(11):3058–3070, 2022

Panagiotis Liakos, Katia Papakonstantinopoulou, and Yannis Kotidis. Chimp: efficient lossless floating point compression for time series databases.Proceedings of the VLDB Endowment, 15(11):3058–3070, 2022

work page 2022
[54]

What’s docu- mented in ai? systematic analysis of 32k ai model cards.arXiv preprint arXiv:2402.05160, 2024

Weixin Liang, Nazneen Rajani, Xinyu Yang, Ezinwanne Ozoani, Eric Wu, Yiqun Chen, Daniel Scott Smith, and James Zou. What’s docu- mented in ai? systematic analysis of 32k ai model cards.arXiv preprint arXiv:2402.05160, 2024

work page arXiv 2024
[55]

An efficient transformation scheme for lossy data compression with point-wise relative error bound

Xin Liang, Sheng Di, Dingwen Tao, Zizhong Chen, and Franck Cap- pello. An efficient transformation scheme for lossy data compression with point-wise relative error bound. In2018 IEEE International Conference on Cluster Computing (CLUSTER), pages 179–189. IEEE, 2018

work page 2018
[56]

Awq: Activation-aware weight quantization for on-device llm compression and acceleration.Proceedings of Machine Learning and Systems, 6:87–100, 2024

Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. Awq: Activation-aware weight quantization for on-device llm compression and acceleration.Proceedings of Machine Learning and Systems, 6:87–100, 2024

work page 2024
[57]

Fixed-rate compressed floating-point arrays.IEEE transactions on visualization and computer graphics, 20(12):2674–2683, 2014

Peter Lindstrom. Fixed-rate compressed floating-point arrays.IEEE transactions on visualization and computer graphics, 20(12):2674–2683, 2014

work page 2014
[58]

Decom- posed bounded floats for fast compression and queries.Proceedings of the VLDB Endowment, 14(11):2586–2598, 2021

Chunwei Liu, Hao Jiang, John Paparrizos, and Aaron J Elmore. Decom- posed bounded floats for fast compression and queries.Proceedings of the VLDB Endowment, 14(11):2586–2598, 2021

work page 2021
[59]

Hvs: hierarchical graph structure based on voronoi diagrams for solving approximate nearest neighbor search.Proc

Kejing Lu, Mineichi Kudo, Chuan Xiao, and Yoshiharu Ishikawa. Hvs: hierarchical graph structure based on voronoi diagrams for solving approximate nearest neighbor search.Proc. VLDB Endow., 15(2):246–258, October 2021

work page 2021
[60]

PhD thesis, Masters thesis

Josh MacDonald.File system support for delta compression. PhD thesis, Masters thesis. Department of Electrical Engineering and Computer Science . . . , 2000

work page 2000
[61]

Malkov and D

Yu A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE Trans. Pattern Anal. Mach. Intell., 42(4):824–836, April 2020

work page 2020
[62]

Approximate nearest neighbor algorithm based on navigable small world graphs.Information Systems, 45:61–68, 2014

Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov. Approximate nearest neighbor algorithm based on navigable small world graphs.Information Systems, 45:61–68, 2014

work page 2014
[63]

Context-based adaptive binary arithmetic coding in the h

Detlev Marpe, Heiko Schwarz, and Thomas Wiegand. Context-based adaptive binary arithmetic coding in the h. 264/avc video compres- sion standard.IEEE Transactions on circuits and systems for video technology, 13(7):620–636, 2003

work page 2003
[64]

Introducing llama 3.1: Our most capable models to date

Meta AI. Introducing llama 3.1: Our most capable models to date. https://ai.meta.com/blog/meta-llama-3-1/, 2024

work page 2024
[65]

Introducing meta llama 3: The most capable openly available llm to date.https://ai.meta.com/blog/meta-llama-3/, 2024

Meta AI. Introducing meta llama 3: The most capable openly available llm to date.https://ai.meta.com/blog/meta-llama-3/, 2024

work page 2024
[66]

Llama 3.1 8b.https://huggingface.co/meta-llama/Llama- 3.1-8B, 2024

Meta AI. Llama 3.1 8b.https://huggingface.co/meta-llama/Llama- 3.1-8B, 2024

work page 2024
[67]

Llama 3.2: Revolutionizing edge ai and vision with open, customizable models.https://ai.meta.com/blog/llama-3-2-connect- 2024-vision-edge-mobile-devices/, 2024

Meta AI. Llama 3.2: Revolutionizing edge ai and vision with open, customizable models.https://ai.meta.com/blog/llama-3-2-connect- 2024-vision-edge-mobile-devices/, 2024

work page 2024
[68]

The llama 4 herd: The beginning of a new era of natively mul- timodal ai innovation.https://ai.meta.com/blog/llama-4-multimodal- intelligence/, 2025

Meta AI. The llama 4 herd: The beginning of a new era of natively mul- timodal ai innovation.https://ai.meta.com/blog/llama-4-multimodal- intelligence/, 2025

work page 2025
[69]

A low- bandwidth network file system

Athicha Muthitacharoen, Benjie Chen, and David Mazieres. A low- bandwidth network file system. InProceedings of the eighteenth ACM symposium on Operating systems principles, pages 174–187, 2001

work page 2001
[70]

Netapp ontap 9 storage efficiency guide

NetApp. Netapp ontap 9 storage efficiency guide. Technical Report TR-3966, NetApp, 2020

work page 2020
[71]

Ontap data management software.https://www.netapp

NetApp. Ontap data management software.https://www.netapp. com/data-management/ontap-data-management-software/, 2024

work page 2024
[72]

Fm-delta: Lossless compression for storing massive fine-tuned foundation models.Advances in Neural Information Processing Systems, 37:66796–66825, 2024

Wanyi Ning, Jingyu Wang, Qi Qi, Mengde Zhu, Haifeng Sun, Daixuan Cheng, Jianxin Liao, and Ce Zhang. Fm-delta: Lossless compression for storing massive fine-tuned foundation models.Advances in Neural Information Processing Systems, 37:66796–66825, 2024

work page 2024
[73]

Myoungwon Oh, Sungmin Lee, Samuel Just, Young Jin Yu, Duck- Ho Bae, Sage Weil, Sangyeun Cho, and Heon Y. Yeom. TiDedup: A new distributed deduplication architecture for ceph. In2023 USENIX Annual Technical Conference (USENIX ATC 23), pages 855–869, 2023

work page 2023
[74]

Pennebaker, Joan L

William B. Pennebaker, Joan L. Mitchell, Glen G Langdon, and Ronald B Arps. An overview of the basic principles of the q-coder adaptive binary arithmetic coder.IBM Journal of research and devel- opment, 32(6):717–726, 1988

work page 1988
[75]

PostgreSQL: The World’s Most Advanced Open Source Relational Database.https://www.postgresql.org/

PostgreSQL. PostgreSQL: The World’s Most Advanced Open Source Relational Database.https://www.postgresql.org/

work page
[76]

Gemma 2: Improving Open Language Models at a Practical Size

Morgane Riviere et al. Gemma 2: Improving open language models at a practical size.arXiv preprint arXiv:2408.00118, 2024. 14 TensorHub: Rethinking AI Model Hub with Tensor-Centric Compression Conference’17, July 2017, Washington, DC, USA

work page internal anchor Pith review Pith/arXiv arXiv 2024
[77]

Qstore: Quantization- aware compressed model storage.Proc

Raunak Shah, Zhaoheng Li, and Yongjoo Park. Qstore: Quantization- aware compressed model storage.Proc. VLDB Endow., 19(3):388–398, March 2026

work page 2026
[78]

The MIT Press, 2006

Gregory Shakhnarovich, Trevor Darrell, and Piotr Indyk.Nearest- Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing). The MIT Press, 2006

work page 2006
[79]

Optimised kd-trees for fast image descriptor matching

Chanop Silpa-Anan and Richard Hartley. Optimised kd-trees for fast image descriptor matching. In2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2008

work page 2008
[80]

Rlh: Bitmap compression tech- nique based on run-length and huffman encoding

Michal Stabno and Robert Wrembel. Rlh: Bitmap compression tech- nique based on run-length and huffman encoding. InProceedings of the ACM tenth international workshop on Data warehousing and OLAP, pages 41–48, 2007

work page 2007

Showing first 80 references.