pith. machine review for the scientific record.

arxiv: 2305.15717 · v1 · submitted 2023-05-25 · 💻 cs.CL

The False Promise of Imitating Proprietary LLMs

Pith reviewed 2026-05-18 06:49 UTC · model grok-4.3

classification 💻 cs.CL
keywords: language model imitation · finetuning · ChatGPT · open-source LLMs · capabilities gap · instruction following · factuality

The pith

Finetuning open models on proprietary LLM outputs like ChatGPT fails to close the capabilities gap.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether finetuning weaker open language models on outputs from systems like ChatGPT can cheaply close the performance gap to those systems. Base models ranging from 1.5B to 13B parameters were finetuned on imitation datasets of 0.3M to 150M tokens. Human crowd raters judged the resulting outputs as competitive in instruction following, yet targeted automatic benchmarks revealed almost no reduction in the gap to ChatGPT on tasks outside the imitation data distribution. The authors conclude that imitation primarily reproduces surface style rather than underlying capabilities, making it an ineffective route compared with building stronger base models.

Core claim

Model imitation is a false promise: there exists a substantial capabilities gap between open and closed LMs that, with current methods, can only be bridged using an unwieldy amount of imitation data or by using more capable base LMs.

What carries the argument

Finetuning runs that generate imitation models from base sizes 1.5B–13B on varying volumes of ChatGPT outputs, followed by side-by-side comparison of crowd ratings against automatic evaluations on held-out NLP tasks.
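
One way to make "closing the gap" concrete is to express an imitation model's score as a fraction of the distance from the base LM to the target model. The sketch below is illustrative only: the function name `gap_closed` and the example scores are hypothetical and are not taken from the paper.

```python
def gap_closed(base: float, imitation: float, target: float) -> float:
    """Fraction of the base-to-target score gap recovered by the imitation model.

    0.0 means no improvement over the base LM; 1.0 means parity with the target.
    Values can fall outside [0, 1] if the imitation model regresses or overshoots.
    """
    gap = target - base
    if gap == 0:
        raise ValueError("base and target scores are equal; the gap is undefined")
    return (imitation - base) / gap


if __name__ == "__main__":
    # Hypothetical benchmark accuracies, chosen only to illustrate the calculation.
    fraction = gap_closed(base=0.20, imitation=0.30, target=0.60)
    print(f"fraction of gap closed: {fraction:.2f}")
```

Under this reading, the paper's claim is that the fraction stays near zero on tasks that the imitation data barely covers, even as crowd ratings improve.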

Load-bearing premise

That the targeted automatic evaluations on tasks not heavily supported in the imitation data accurately capture the meaningful capabilities gap, rather than reflecting only the distribution of the collected imitation data itself.

What would settle it

An imitation model trained on a moderate data volume that matches ChatGPT accuracy on a task whose required skills are absent from the imitation set would refute the claim of a persistent gap.

Original abstract

An emerging method to cheaply improve a weaker language model is to finetune it on outputs from a stronger model, such as a proprietary system like ChatGPT (e.g., Alpaca, Self-Instruct, and others). This approach looks to cheaply imitate the proprietary model's capabilities using a weaker open-source model. In this work, we critically analyze this approach. We first finetune a series of LMs that imitate ChatGPT using varying base model sizes (1.5B--13B), data sources, and imitation data amounts (0.3M--150M tokens). We then evaluate the models using crowd raters and canonical NLP benchmarks. Initially, we were surprised by the output quality of our imitation models -- they appear far better at following instructions, and crowd workers rate their outputs as competitive with ChatGPT. However, when conducting more targeted automatic evaluations, we find that imitation models close little to none of the gap from the base LM to ChatGPT on tasks that are not heavily supported in the imitation data. We show that these performance discrepancies may slip past human raters because imitation models are adept at mimicking ChatGPT's style but not its factuality. Overall, we conclude that model imitation is a false promise: there exists a substantial capabilities gap between open and closed LMs that, with current methods, can only be bridged using an unwieldy amount of imitation data or by using more capable base LMs. In turn, we argue that the highest leverage action for improving open-source models is to tackle the difficult challenge of developing better base LMs, rather than taking the shortcut of imitating proprietary systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that finetuning open-source LMs (1.5B–13B) on ChatGPT outputs using 0.3M–150M tokens of imitation data produces models that human raters find competitive with ChatGPT on instruction following, yet targeted automatic evaluations show these imitation models close little to none of the capabilities gap to ChatGPT on tasks not heavily supported in the imitation data. The authors attribute the human-automatic discrepancy to stylistic mimicry without corresponding gains in factuality, concluding that imitation is a false promise that can only be overcome with impractically large data volumes or stronger base models, and that open-source progress should instead prioritize better base LMs.

Significance. If the central empirical findings hold, the work provides a timely cautionary result for the open-source LLM community by demonstrating that current imitation pipelines do not substitute for stronger base models. The systematic variation across base-model sizes, data sources, and data volumes, combined with dual human and automatic evaluation protocols, supplies concrete evidence that stylistic fluency can mask persistent factual and reasoning shortfalls. This strengthens the case for redirecting effort toward pretraining improvements rather than post-hoc distillation.

major comments (2)
  1. [Evaluation and Results sections] The load-bearing claim that imitation closes little of the gap specifically on tasks 'not heavily supported in the imitation data' (abstract and results) lacks explicit quantification. No per-task coverage statistics, token-overlap metrics, or ablations that increase support while holding the base model fixed are reported; without these, the observed discrepancies risk being partly tautological with the data-collection process rather than evidence of an intrinsic capabilities ceiling.
  2. [Human vs. Automatic Evaluation Comparison] The interpretation that human raters are fooled by style while automatic metrics reveal factuality gaps (results) is plausible but under-supported. Additional controls—such as factuality-specific probes on high- versus low-coverage tasks or inter-rater reliability broken down by factual accuracy—would be needed to rule out that the automatic benchmarks simply penalize distribution shift.
minor comments (2)
  1. [Experimental Setup] Clarify the precise list of canonical NLP benchmarks, any data-filtering rules applied to the imitation sets, and whether statistical significance was assessed across the multiple experimental configurations.
  2. [Figures] Label all curves and bars in the performance plots with exact base-model sizes and token counts so that trends across the 0.3M–150M range are immediately readable.
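
The coverage statistic requested in major comment 1 could take many forms. The sketch below shows one simple possibility: the share of distinct benchmark n-grams that also appear in the imitation data. It is an assumption about what such a metric might look like, not the paper's method; the helper names `ngrams` and `coverage`, the whitespace tokenization, and the default `n=2` are all illustrative choices.

```python
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Distinct lowercase word n-grams of a text (whitespace tokenization)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def coverage(benchmark: Iterable[str], imitation: Iterable[str], n: int = 2) -> float:
    """Share of distinct benchmark n-grams that also occur in the imitation data."""
    bench: Set[Tuple[str, ...]] = set()
    for example in benchmark:
        bench |= ngrams(example, n)
    imit: Set[Tuple[str, ...]] = set()
    for example in imitation:
        imit |= ngrams(example, n)
    if not bench:
        return 0.0  # no n-grams to cover (empty or very short benchmark texts)
    return len(bench & imit) / len(bench)


if __name__ == "__main__":
    # Toy strings standing in for benchmark questions and imitation examples.
    print(coverage(["the cat sat"], ["the cat ran"], n=2))
```

A per-task table of such scores, set beside the per-task performance gap, would show whether low-coverage tasks are the ones where imitation fails to help.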

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each of the major comments below, providing clarifications and indicating where revisions will be made to improve the paper.

Point-by-point responses
  1. Referee: [Evaluation and Results sections] The load-bearing claim that imitation closes little of the gap specifically on tasks 'not heavily supported in the imitation data' (abstract and results) lacks explicit quantification. No per-task coverage statistics, token-overlap metrics, or ablations that increase support while holding the base model fixed are reported; without these, the observed discrepancies risk being partly tautological with the data-collection process rather than evidence of an intrinsic capabilities ceiling.

    Authors: We appreciate this point and agree that more explicit quantification would help substantiate the claim. Our imitation data consists of general instruction-following examples generated via Self-Instruct, which by design cover a broad range of tasks, but we acknowledge the value of direct metrics. In the revised version, we will add token-overlap statistics between the imitation dataset and each evaluation benchmark to quantify support levels, along with a per-task analysis of performance gaps on low-overlap tasks. Ablations that increase support while holding the base model fixed would require collecting additional targeted data for specific benchmarks, which is computationally intensive. We will discuss this as a direction for future work and may include a small-scale experiment if space permits. revision: partial

  2. Referee: [Human vs. Automatic Evaluation Comparison] The interpretation that human raters are fooled by style while automatic metrics reveal factuality gaps (results) is plausible but under-supported. Additional controls—such as factuality-specific probes on high- versus low-coverage tasks or inter-rater reliability broken down by factual accuracy—would be needed to rule out that the automatic benchmarks simply penalize distribution shift.

    Authors: We agree that additional controls would strengthen the interpretation. The current evidence is a consistent pattern: imitation models match ChatGPT on human ratings for instruction following but lag on automatic metrics for factual and reasoning tasks. To address this, we will incorporate factuality-specific probes (e.g., using datasets like TruthfulQA) and analyze performance on high- versus low-coverage tasks in the revision. For inter-rater reliability, we will report breakdowns by task type if the data permits, to show that raters are consistent on stylistic aspects while the gaps appear in objective measures. We maintain that the automatic benchmarks are standard and do not merely penalize distribution shift, as the base models and imitation models are evaluated under the same conditions. revision: yes
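
The rebuttal does not say which inter-rater reliability statistic would be reported. As one common option for two raters, the sketch below computes Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The function name `cohens_kappa` and the example labels are hypothetical, and the choice of statistic is an assumption rather than something stated in the paper or the rebuttal.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single identical label")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical pairwise preferences ("which output is better?") from two raters.
    a = ["imitation", "chatgpt", "imitation", "chatgpt"]
    b = ["imitation", "chatgpt", "chatgpt", "chatgpt"]
    print(f"kappa = {cohens_kappa(a, b):.3f}")
```

Computing the statistic separately for factual and non-factual prompts would give the breakdown the referee asks for.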

Circularity Check

0 steps flagged

No circularity: empirical comparisons on held-out tasks are self-contained

Full rationale

The paper conducts direct experimental finetuning of open LMs on imitation data from ChatGPT and evaluates performance gaps using crowd ratings and canonical NLP benchmarks on tasks not heavily supported in the data. No mathematical derivations, equations, or first-principles predictions are present that could reduce to fitted inputs by construction. The central claim rests on observable discrepancies between base models, imitation models, and the target, with no self-citation chains or ansatzes invoked to justify uniqueness or force results. This is a standard empirical study self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the validity of crowd-sourced ratings as an initial quality signal and on the assumption that tasks outside the imitation data distribution are representative of general capabilities.

free parameters (1)
  • imitation data volume
    Varied experimentally from 0.3M to 150M tokens to test scaling behavior.
axioms (1)
  • domain assumption: Crowd-worker ratings on instruction following provide a meaningful initial signal of model quality
    The paper first reports positive crowd ratings before introducing automatic evaluations.

pith-pipeline@v0.9.0 · 5843 in / 1254 out tokens · 63564 ms · 2026-05-18T06:49:53.207839+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Foundation.DimensionForcing dimension_forced
    Relation between the paper passage and the cited Recognition theorem: unclear

    model imitation is a false promise: there exists a substantial capabilities gap between open and closed LMs that, with current methods, can only be bridged using an unwieldy amount of imitation data or by using more capable base LMs

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Multi-Rollout On-Policy Distillation via Peer Successes and Failures

    cs.LG 2026-05 unverdicted novelty 7.0

    MOPD improves on-policy distillation for LLMs by using peer successes for positive patterns and failures for negative examples to create more informative teacher signals.

  2. GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification

    cs.AI 2026-04 unverdicted novelty 7.0

    GFT uses group advantage learning and dynamic coefficient rectification to fix reward sparsity and optimization instability in SFT for LLMs, yielding better policies than standard SFT.

  3. CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment

    cs.SE 2025-10 conditional novelty 7.0

    CodeRL+ integrates variable-level execution trajectory inference into RLVR training to align textual code representations with execution semantics, delivering 4.6% relative pass@1 gains and generalization to code-reas...

  4. Self-Rewarding Language Models

    cs.CL 2024-01 conditional novelty 7.0

    Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.

  5. SOD: Step-wise On-policy Distillation for Small Language Model Agents

    cs.CL 2026-05 unverdicted novelty 6.0

    SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.

  6. Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora

    cs.SE 2026-04 unverdicted novelty 6.0

    Structured knowledge extracted from corpora enables test-driven data engineering for LLMs by mapping training data to source code, model training to compilation, benchmarking to unit testing, and failures to targeted ...

  7. Hybrid Policy Distillation for LLMs

    cs.CL 2026-04 unverdicted novelty 6.0

    Hybrid Policy Distillation unifies existing knowledge distillation methods for LLMs into a reweighted log-likelihood objective and introduces a hybrid forward-reverse KL approach with mixed data sampling to improve st...

  8. Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains

    cs.LG 2025-07 unverdicted novelty 6.0

    RaR uses aggregated rubric feedback as rewards in on-policy RL, delivering up to 31% relative gains on HealthBench and 7% on GPQA-Diamond versus direct Likert LLM-as-judge baselines.

  9. Training Language Models to Self-Correct via Reinforcement Learning

    cs.LG 2024-09 unverdicted novelty 6.0

    SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.

  10. Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models

    cs.AI 2024-08 conditional novelty 6.0

    Empirical analysis shows scaling inference compute via strategies like tree search can be more efficient than scaling model parameters, with 7B models plus novel search outperforming 34B models.

  11. LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

    cs.SE 2024-03 unverdicted novelty 6.0

    LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.

  12. Zephyr: Direct Distillation of LM Alignment

    cs.LG 2023-10 accept novelty 6.0

    Zephyr-7B achieves state-of-the-art chat benchmark results among 7B models by distilling alignment via dDPO on AI feedback preferences, surpassing the 70B Llama-2-Chat model on MT-Bench with no human data required.

  13. Towards Understanding Sycophancy in Language Models

    cs.CL 2023-10 conditional novelty 6.0

    Sycophancy is prevalent in state-of-the-art AI assistants and is likely driven in part by human preferences that favor agreement over truthfulness.

  14. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

    cs.CL 2023-10 unverdicted novelty 6.0

    Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.

  15. Textbooks Are All You Need

    cs.CL 2023-06 unverdicted novelty 6.0

    A 1.3B-parameter code model trained on 7B tokens of curated textbook and synthetic data achieves 50.6% on HumanEval, indicating data quality can enable strong performance at small scale.

  16. MiniLLM: On-Policy Distillation of Large Language Models

    cs.CL 2023-06 conditional novelty 6.0

    MiniLLM distills large language models into smaller ones via reverse KL divergence and on-policy optimization, yielding higher-quality responses with lower exposure bias than standard KD baselines.

  17. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

    cs.CL 2023-06 accept novelty 6.0

    GPT-4 as an LLM judge achieves over 80% agreement with human preferences on MT-Bench and Chatbot Arena, matching human agreement levels and providing a scalable evaluation method.

  18. Agent AI: Surveying the Horizons of Multimodal Interaction

    cs.AI 2024-01 unverdicted novelty 4.0

    The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.

  19. A Survey on Knowledge Distillation of Large Language Models

    cs.CL 2024-02 accept novelty 3.0

    A comprehensive survey of knowledge distillation for LLMs structured around algorithms, skill enhancement, and vertical applications, highlighting data augmentation as a key enabler.
