
arxiv: 2607.01686 · v1 · pith:U6TUWQUMnew · submitted 2026-07-02 · 💻 cs.LG

WARP: Weight-Space Analysis for Recovering Training Data Portfolios

Pith reviewed 2026-07-03 17:52 UTC · model grok-4.3

classification 💻 cs.LG
keywords training, warp, data, domain, mixtures, models, them, access

The pith

WARP recovers domain mixture weights of fine-tuned models from their released weights alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents WARP, a method for recovering the proportions of the data domains used to train a fine-tuned model from only the model's final weights and its base model. It targets a common gap: foundation models are often released without their training data recipes, which limits what researchers can learn about how the models were shaped. WARP generates simulated training checkpoints by merging the base and fine-tuned models in different proportions, then extracts geometric features from these points in weight space to estimate the domain proportions. In controlled experiments it recovers the mixtures with an average mean absolute error as low as 0.046 on BERT and 0.104 on GPT-2, outperforming membership inference and a variant with access to the true training trajectory.

Core claim

WARP recovers a fine-tuned model's training mixtures directly from its released weights. It does so by interpolating between the base and fine-tuned models using model merging, generating pseudo-checkpoints that approximate the missing training trajectory and expose a geometric footprint of the training data in the weight space. From these simulated footprints, WARP extracts geometric features and maps them to domain proportions using either a parameter-free softmax readout or an MLP projector trained on synthetic mixtures.
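The two ends of this pipeline can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes plain linear interpolation as the merging operator, represents weights as flat lists of floats, and takes the per-domain scores as given. The function names, the number of steps, and the temperature parameter are all hypothetical.

```python
import math


def interpolate(theta_base, theta_ref, alpha):
    """Return (1 - alpha) * theta_base + alpha * theta_ref, parameter by parameter.

    Both arguments map parameter names to flat lists of floats with matching
    keys and lengths (a stand-in for real weight tensors).
    """
    return {
        name: [(1.0 - alpha) * b + alpha * r
               for b, r in zip(theta_base[name], theta_ref[name])]
        for name in theta_base
    }


def make_pseudo_checkpoints(theta_base, theta_ref, num_steps=4):
    """Evenly spaced merges; the last one (alpha = 1) equals the fine-tuned model."""
    alphas = [k / num_steps for k in range(1, num_steps + 1)]
    return [(a, interpolate(theta_base, theta_ref, a)) for a in alphas]


def softmax_readout(domain_scores, temperature=1.0):
    """Turn one score per domain into proportions that sum to 1 (no trained parameters)."""
    scaled = [s / temperature for s in domain_scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

In this sketch the readout is monotone: a domain with a larger score always receives a larger estimated proportion. How the paper aggregates scores across pseudo-checkpoints before the readout is not specified in the text above.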

What carries the argument

Model merging to create pseudo-checkpoints that simulate the training trajectory and reveal geometric features in weight space encoding domain mixtures.

Load-bearing premise

Interpolating base and fine-tuned models with merging produces pseudo-checkpoints that closely approximate the actual training path and carry a detectable geometric signature of the data domains.

What would settle it

Measure the correlation between the geometric features extracted from the merged models and the known domain proportions. A correlation near zero would undermine the central claim.
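One way to run such a check is a Pearson correlation between a feature and the known proportion of a domain across several fine-tuning runs. The helper below is a generic sketch; the function name and the choice of Pearson over a rank correlation are assumptions, not something the paper prescribes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of floats.

    Returns NaN when either sequence is constant, since the correlation
    is undefined in that case.
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold one feature value per run and `ys` the matching ground-truth proportion for the same domain.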

Figures

Figures reproduced from arXiv: 2607.01686 by Aditya Goyal, Frederic Sala, John Cooper, Tzu-Heng Huang.

Figure 1: Two directions of the domain mixture problem. Left: the well-studied forward problem -- given a corpus, search for an optimized mixture $\pi^\star$ over domains (e.g., web, math, code, agentic data) and train a model $\theta^\star$ under it, whose final weights are then publicly released. Right: the inverse problem we study -- given only the released endpoints $(\theta^\star_{\mathrm{base}}, \theta^\star_{\mathrm{ref}})$ and a small probing dataset, with the corpus and true…
Figure 2: Overview of WARP. (1) Simulate the Training Trajectory: since the true fine-tuning path is unobserved, we first approximate it with a sequence of pseudo-checkpoints obtained via model merging. (2) Distill Geometric Footprint: at each pseudo-checkpoint, we compute the Mimic Score by projecting the gradient of probing samples from each domain onto the direction pointing to reference model. This measures how …
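The projection described in step (2) can be sketched as follows. The caption is truncated, so the details are assumptions: this version projects the negative gradient onto the unit vector pointing from the current weights to the reference model, works on flat lists of floats, and uses hypothetical function names.

```python
import math


def mimic_score(grad, theta, theta_ref):
    """Project the negative gradient onto the unit vector from theta to theta_ref.

    grad, theta, and theta_ref are flat lists of floats of equal length.
    A positive value means a gradient-descent step on these samples would move
    the weights toward the reference model. The sign convention and the
    normalisation are assumptions made for this sketch.
    """
    direction = [r - t for r, t in zip(theta_ref, theta)]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        return 0.0  # already at the reference model; nothing to project onto
    return sum(-g * d for g, d in zip(grad, direction)) / norm


def domain_scores(domain_grads, theta, theta_ref):
    """One score per domain, computed from that domain's averaged probing gradient."""
    return {name: mimic_score(g, theta, theta_ref)
            for name, g in domain_grads.items()}
```

A gradient orthogonal to the direction toward the reference model scores zero under this definition, which is what makes the score a directional rather than a magnitude-only signal.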
Original abstract

Foundation models are routinely released to the public, yet the data recipes used to train them -- such as domain mixture weights that determine how different sources are sampled -- are rarely disclosed. This creates an access asymmetry: researchers study the resulting models but lack visibility into the training distribution that produces them. Prior works for inferring training data, such as membership inference, detect at the level of individual samples and thus cannot characterize the global composition of the training corpus. We introduce WARP, a framework that recovers a fine-tuned model's training mixtures directly from its released weights. WARP interpolates between the base and fine-tuned models using model merging, generating pseudo-checkpoints that approximate the missing training trajectory and expose a geometric footprint of the training data in the weight space. From these simulated footprints, WARP extracts geometric features and maps them to domain proportions using either a parameter-free softmax readout or an MLP projector trained on synthetic mixtures. In controlled experiments with BERT and GPT-2, WARP recovers domain mixtures with an average MAE as low as 0.046 and 0.104 respectively, outperforming membership inference and a variant with access to the true training trajectory.
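For readers unfamiliar with the metric, the MAE between a recovered mixture and the true one is the mean of the per-domain absolute differences. The helper below is a generic sketch with a hypothetical name; how the paper averages across runs and domains to obtain its reported "average MAE" is not stated here.

```python
def mean_absolute_error(predicted, true):
    """Mean of |predicted_i - true_i| over domains for one mixture."""
    if len(predicted) != len(true):
        raise ValueError("mixtures must cover the same domains")
    if not true:
        raise ValueError("mixtures must not be empty")
    return sum(abs(p - t) for p, t in zip(predicted, true)) / len(true)
```

Because both mixtures sum to 1, an MAE of 0.046 over a handful of domains means the estimate is off by a few percentage points per domain on average.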

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces WARP, a framework for recovering the domain mixture proportions used to train fine-tuned foundation models directly from their weights. The approach generates pseudo-checkpoints by linearly interpolating between a base model and the fine-tuned model via model merging, extracts geometric features from the weight space along this path, and then uses either a parameter-free softmax or an MLP projector (trained on synthetic mixtures) to map these features to the domain proportions. Controlled experiments on BERT and GPT-2 report average mean absolute errors (MAE) of 0.046 and 0.104 respectively, claiming to outperform both membership inference attacks and a variant that has access to the true training trajectory.

Significance. If the central claims hold, WARP would offer a practical method to infer training data portfolios from publicly released model weights, which has significant implications for transparency and reproducibility in foundation model research. The idea of using model merging to simulate training trajectories is creative, and the reported outperformance, including over a true-trajectory baseline, is noteworthy. The parameter-free readout option is particularly appealing as it avoids additional training dependencies.

major comments (3)
  1. [Abstract and §3] The core assumption that linear interpolation between base and fine-tuned models approximates the nonlinear SGD training trajectory and thereby exposes a geometric footprint determined by domain proportions (rather than by the interpolation operator) is load-bearing but lacks direct validation. The manuscript should include comparisons between features extracted from the linear path and those from actual intermediate checkpoints during fine-tuning to rule out artifacts.
  2. [Abstract] The claim of outperformance over 'a variant with access to the true training trajectory' is surprising and requires clarification on how the true trajectory variant is implemented and why the pseudo-checkpoint approach yields lower MAE; this could indicate either a strength of the method or an issue in the baseline construction.
  3. [Abstract] The MLP projector is trained on synthetic mixtures; if the synthetic generation process does not match the statistical properties of real domain mixtures, the mapping may reduce to a fitted quantity by construction rather than recovering an independent geometric signal (this is especially relevant given the reported outperformance over the true-trajectory baseline).
minor comments (1)
  1. The abstract would benefit from a one-sentence definition of the specific geometric features (e.g., which layer-wise statistics or directions are extracted) to improve readability before the full methods section.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the positive assessment of WARP's significance and for the constructive major comments. We address each point below with clarifications and have revised the manuscript accordingly to strengthen the presentation and validation of our claims.

  1. Referee: [Abstract and §3] The core assumption that linear interpolation between base and fine-tuned models approximates the nonlinear SGD training trajectory and thereby exposes a geometric footprint determined by domain proportions (rather than by the interpolation operator) is load-bearing but lacks direct validation. The manuscript should include comparisons between features extracted from the linear path and those from actual intermediate checkpoints during fine-tuning to rule out artifacts.

    Authors: We agree that direct validation against actual training checkpoints would provide stronger evidence for the core assumption. In the controlled experiments, we trained the BERT and GPT-2 models ourselves and saved intermediate checkpoints, allowing us to perform such comparisons. We have added a new subsection in §3 (and corresponding results in the experiments) that extracts geometric features from both the linear interpolation path and the true training trajectory checkpoints, showing that the domain-proportion signal is preserved and not an artifact of the interpolation operator. This revision directly addresses the concern.

    Revision: yes

  2. Referee: [Abstract] The claim of outperformance over 'a variant with access to the true training trajectory' is surprising and requires clarification on how the true trajectory variant is implemented and why the pseudo-checkpoint approach yields lower MAE; this could indicate either a strength of the method or an issue in the baseline construction.

    Authors: We thank the referee for highlighting the need for clarification. The true-trajectory variant is implemented by extracting the identical geometric features from the actual saved intermediate checkpoints along the real SGD fine-tuning path (rather than the linear merge path) and feeding them into the same readout (parameter-free softmax or MLP). We have expanded the abstract and added a detailed paragraph in §4.2 explaining the implementation. The lower MAE for the pseudo-checkpoint approach appears to stem from the linear path providing a smoother, more consistent trajectory that reduces the impact of SGD noise and optimizer-specific artifacts present in the real checkpoints; we include additional analysis plots in the revision to illustrate this difference.

    Revision: yes

  3. Referee: [Abstract] The MLP projector is trained on synthetic mixtures; if the synthetic generation process does not match the statistical properties of real domain mixtures, the mapping may reduce to a fitted quantity by construction rather than recovering an independent geometric signal (this is especially relevant given the reported outperformance over the true-trajectory baseline).

    Authors: We acknowledge the importance of ensuring the synthetic mixtures reflect real statistical properties. The synthetic data are generated by fine-tuning on the exact same domain corpora and mixture proportions as the real experiments (using the same data loaders and preprocessing), with the MLP trained only on synthetic mixtures and evaluated on held-out real mixtures. We have added a new appendix section detailing the synthetic generation procedure, including statistics confirming distributional match, and additional ablation results showing that the geometric signal remains predictive even when the MLP is replaced by the parameter-free softmax. This supports that the mapping recovers an independent signal rather than fitting by construction.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity: recovery derives from independent interpolation and mapping steps.


The derivation chain begins with linear interpolation via model merging to create pseudo-checkpoints, followed by extraction of geometric features at those pseudo-checkpoints (per-domain Mimic Scores, according to the Figure 2 caption), then mapping to domain proportions via either a fixed softmax readout or an MLP trained on separately generated synthetic mixtures with known labels. Neither the interpolation operator nor the feature-to-proportion mapping reduces the output proportions to the input weights by algebraic construction or self-referential fitting; the MLP is a standard supervised model on external synthetic data, and the parameter-free option requires no training. No self-citations, uniqueness theorems, or ansatzes from prior author work are invoked as load-bearing. The controlled experiments measure MAE against held-out known mixtures, confirming the pipeline has independent content rather than tautological equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are described in sufficient detail to populate the ledger.

pith-pipeline@v0.9.1-grok · 5738 in / 1122 out tokens · 27547 ms · 2026-07-03T17:52:22.717049+00:00 · methodology

