Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Liliana Hotsko; Pengyu Nie; Stuart Shieber; Wentao Zhang; Woojeong Kim; Yuntian Deng

arxiv: 2607.02512 · v1 · pith:EMOUFTYGnew · submitted 2026-07-02 · 💻 cs.LG · cs.AI· cs.CL

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Wentao Zhang , Liliana Hotsko , Woojeong Kim , Pengyu Nie , Stuart Shieber , Yuntian Deng This is my paper

Pith reviewed 2026-07-03 16:15 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.CL

keywords fuzzy functionsprogram as weightsparameter-efficient adaptersnatural language to program compilationlightweight model executionFuzzyBench datasetlocal offline inference

0 comments

The pith

Natural-language specs for fuzzy tasks compile into small neural adapters that let a 0.6B model match a 32B model's accuracy at 1/50th the memory cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces fuzzy-function programming as a way to turn imprecise everyday tasks, such as log filtering or intent-based ranking, into locally executable programs instead of repeated calls to large language model APIs. It realizes this idea through Program-as-Weights, where a 4B compiler trained on the released FuzzyBench dataset produces parameter-efficient adapters for a frozen 0.6B interpreter. The resulting adapters allow the small interpreter to reach the accuracy of direct prompting on a 32B model while running at 30 tokens per second on consumer hardware and using roughly one-fiftieth the inference memory. The central shift is that the large model is called only once, when the function is first defined, after which the compiled artifact handles all subsequent inputs offline and cheaply.

Core claim

PAW reframes the foundation model from a per-input problem solver into a tool builder: invoked once per function definition, it produces a small reusable artifact whose subsequent calls per function application are cheap and offline. A 4B compiler trained on FuzzyBench emits parameter-efficient adapters for a frozen 0.6B Qwen3 interpreter; programs executed this way match the performance of direct prompting of Qwen3-32B while using roughly one fiftieth of the inference memory and running at 30 tokens per second on a MacBook M3.

What carries the argument

Program-as-Weights (PAW), the mechanism in which a compiler emits parameter-efficient adapters that are loaded into a fixed lightweight interpreter to execute the compiled fuzzy function.

If this is right

Common fuzzy tasks can be defined once in natural language and then executed repeatedly with low memory and no API dependency.
The large model is invoked only at function-definition time rather than at every input.
Fuzzy functions become reproducible local artifacts rather than black-box API responses.
A single 10M-example dataset suffices to train a compiler that produces usable adapters across multiple task types.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same compiler-plus-interpreter split could be applied to other base models or adapter families if the training distribution is broadened.
Multiple PAW artifacts could be composed at runtime to build higher-level fuzzy pipelines without retraining the compiler.
Production systems that currently route every fuzzy decision through an API might replace those routes with one-time compilation plus local execution.

Load-bearing premise

A compiler trained on the FuzzyBench dataset can reliably produce adapters that generalize to arbitrary natural-language specifications of fuzzy functions.

What would settle it

Run the PAW compiler on a new fuzzy-function specification absent from FuzzyBench, execute the resulting adapter on the 0.6B interpreter, and check whether accuracy drops substantially below that of direct prompting on the 32B model for the same inputs.

Figures

Figures reproduced from arXiv: 2607.02512 by Liliana Hotsko, Pengyu Nie, Stuart Shieber, Wentao Zhang, Woojeong Kim, Yuntian Deng.

**Figure 1.** Figure 1: Overview of the Program-as-Weights paradigm. Top: compile once in the cloud. A natural-language description of a fuzzy function (here, “classify if this is urgent”) is fed to a neural compiler, which produces a neural program. Bottom: run locally. A small frozen neural interpreter loads the compiled program and runs the user’s input (“Need your signature by EOD!”) to produce the output (“urgent”). The comp… view at source ↗

**Figure 2.** Figure 2: Text-to-LoRA instantiation of PAW (Section 3.2). Left. The trained LoRA compiler reads the function specification, the pseudo-program produced by an off-the-shelf prompted pseudo compiler Cp (not depicted), and a fixed sequence of learned prefix tokens; it emits prefix-position hidden states H. Middle. The LoRA mapper mean-pools H, passes it through an MLP, and projects into mixing coefficients that compos… view at source ↗

**Figure 3.** Figure 3: FuzzyBench-10M task-family distribution. 29 incremental thematic versions are mapped to 7 high-level families. “Core text processing & NLP” is the largest family because the v1 base layer (2.5M examples; 277 base categories) covers parsing, classification, NER, coreference, and sentiment; the remaining 7.5M examples spread across the other six families. safety/verification. The full per-version timeline (2… view at source ↗

**Figure 4.** Figure 4: Developer interface. Left: the compiler translates a natural-language specification into a neural program. Right: the interpreter loads this program and exposes it as a local function [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Step 1: Compile a program from natural language. The user specifies a fuzzy function in natural language. Image inputs are also supported [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗

**Figure 6.** Figure 6: Step 2: Interactively test the compiled program. Users can provide test inputs and inspect the corresponding outputs, enabling rapid validation and refinement before download. 16 [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗

**Figure 7.** Figure 7: Step 3: Execute the program locally via Python. Once compiled, the program can be loaded and invoked through a simple Python API; subsequent execution requires no internet access. B FuzzyBench Construction Prompts Figures 8 to 10 show the prompts used to generate the natural-language specifications. Half of the specifications are generated without exemplar examples ( [PITH_FULL_IMAGE:figures/full_fig_p017… view at source ↗

**Figure 8.** Figure 8: System prompt for generating specifications. [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗

**Figure 9.** Figure 9: User prompt for generating specifications (no exemplar examples). [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗

**Figure 10.** Figure 10: User prompt for generating specifications, with exemplar input/output pairs. [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗

**Figure 12.** Figure 12: User prompt for generating input/output examples given a specification. [PITH_FULL_IMAGE:figures/full_fig_p018_12.png] view at source ↗

**Figure 14.** Figure 14: Compiler prompt, examples style. Used by the off-the-shelf reference compiler (Qwen3- 4B-Instruct-2507) to generate the rollouts used during training. {pseudo_program} [INPUT] {task_input} [END_INPUT] [PITH_FULL_IMAGE:figures/full_fig_p019_14.png] view at source ↗

**Figure 15.** Figure 15: Interpreter prompt, minimal style. Role: PAW-Compiler. You will see an image. Produce a self-contained text representation of the image that a *blind* interpreter can later use to answer arbitrary questions about it. The interpreter sees only your output and never the image. Coverage requirements (apply all that exist in the image): - Transcribe every piece of legible text verbatim, in quotes, with its lo… view at source ↗

**Figure 16.** Figure 16: Compiler prompt for image-conditioned specifications. [PITH_FULL_IMAGE:figures/full_fig_p019_16.png] view at source ↗

**Figure 17.** Figure 17: Interpreter prompt for image-conditioned specifications. [PITH_FULL_IMAGE:figures/full_fig_p020_17.png] view at source ↗

**Figure 18.** Figure 18: Prefix-tuning precursor architecture (Section 3.3). (a) Compile. The user describes a fuzzy function (e.g., “extract the final answer”); the trained prefix compiler reads the description plus a handful of representative I/O examples and produces a per-example KV prefix — the “neural binary” that constitutes the compiled program. (b) Interpret. A small frozen interpreter loads the compiled KV prefix into i… view at source ↗

**Figure 19.** Figure 19: A library of compiled PAW programs. Three example natural-language function specifications (“Classify message urgency”, “Fix malformed JSON”, “Remove personal information”; left) are each compiled into a separate neural program (middle): a discrete pseudo-program in a fixed format plus a continuous per-example LoRA (depicted as red, blue, green adapters). At deployment time (right), all three programs are… view at source ↗

**Figure 20.** Figure 20: The Alien-Taboo case-study UI. The player describes the secret word (here, “moon”) in free text without using any of the listed taboo words (night, orbit, lunar, full); the alien “Zog” — a one-PAW-function compiled program — must guess the word from the description. Each player turn is served by a 0.6B Qwen3 PAW interpreter on a small server, with one PAW program (and per-program LoRA adapter) per languag… view at source ↗

read the original abstract

Many everyday programming tasks resist clean rule-based implementation, such as alerting on important log lines, repairing malformed JSON, or ranking search results by intent, and are increasingly outsourced to large language model APIs at the cost of locality, reproducibility, and price. We propose fuzzy-function programming: compiling such a function from a natural-language specification into a compact, locally-executable neural artifact. We instantiate this paradigm with Program-as-Weights (PAW), in which a 4B compiler trained on FuzzyBench, a 10M-example dataset we release, emits parameter-efficient adapters for a frozen, lightweight interpreter. A 0.6B Qwen3 interpreter executing PAW programs matches the performance of direct prompting of Qwen3-32B, while using roughly one fiftieth of the inference memory and running at 30 tokens/s on a MacBook M3. PAW reframes the foundation model from a per-input problem solver into a tool builder: invoked once per function definition, it produces a small reusable artifact whose subsequent calls per function application are cheap and offline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PAW introduces a compiler that turns NL fuzzy specs into small reusable adapters, but the headline performance match lacks any evaluation details in the abstract.

read the letter

The main point is that this paper describes a compiler trained on FuzzyBench that emits adapters for a frozen 0.6B interpreter, claiming it matches direct use of a 32B model on fuzzy tasks like log alerting or JSON repair while using far less memory and running locally at decent speed.

What is new is the specific PAW setup: a dedicated 4B compiler that produces parameter-efficient adapters from natural-language function definitions, plus the release of the 10M-example FuzzyBench dataset. The framing of the foundation model as a one-time tool builder rather than a repeated per-input solver is a clean way to present the idea.

The paper does a reasonable job motivating the problem of fuzzy everyday tasks that resist pure rules and showing the potential for cheaper, private, offline execution.

The soft spots are clear from the abstract alone. It states performance equivalence but supplies no information on dataset construction, baselines, metrics, or how success was measured. The stress-test concern lands directly: the whole approach depends on the compiler generalizing to arbitrary new natural-language specs outside the training distribution, and nothing in the text shows that this holds. If the adapters only work on things close to FuzzyBench, the memory and speed advantages do not deliver on the promise.

This is for researchers working on parameter-efficient methods, local deployment, and hybrid rule-plus-learning programming. A reader focused on practical efficiency gains would get value from the paradigm even before the numbers are verified.

It deserves a serious referee to check whether the full experiments actually support the generalization claim and the equivalence result.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes Program-as-Weights (PAW), a paradigm for 'fuzzy functions' (e.g., log alerting, JSON repair, intent ranking) that resist clean rule-based implementation. A 4B compiler trained on the released FuzzyBench dataset (10M examples) emits parameter-efficient adapters for a frozen 0.6B Qwen3 interpreter. The central claim is that this 0.6B setup matches the performance of direct prompting on Qwen3-32B while using ~1/50th the inference memory and running at 30 tokens/s on a MacBook M3, reframing foundation models as one-time tool builders that produce small reusable offline artifacts.

Significance. If the performance equivalence and generalization hold, the work could enable efficient local execution of tasks currently routed to large APIs, improving locality, reproducibility, and cost. The release of FuzzyBench is a concrete positive contribution. The paradigm shift from per-input solving to artifact generation is conceptually interesting for the ML systems community.

major comments (2)

[Abstract] Abstract: the headline claim of performance equivalence between the 0.6B Qwen3+PAW interpreter and direct Qwen3-32B prompting is stated without any evaluation details, dataset construction rules, baseline comparisons, or metrics, so the central claim cannot be assessed from the provided text.
[Evaluation (implied)] The manuscript supplies no experiments or ablations testing whether the 4B compiler produces effective adapters for natural-language fuzzy-function specifications that lie outside the FuzzyBench training distribution; this generalization is load-bearing for the claim that PAW yields reusable artifacts for arbitrary specifications.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for highlighting the potential impact of the work and the value of the FuzzyBench release. We respond to each major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the headline claim of performance equivalence between the 0.6B Qwen3+PAW interpreter and direct Qwen3-32B prompting is stated without any evaluation details, dataset construction rules, baseline comparisons, or metrics, so the central claim cannot be assessed from the provided text.

Authors: We agree that the abstract, constrained by length, does not include evaluation specifics. The manuscript body details the 10M-example FuzzyBench dataset, the metrics used, and the direct comparison to Qwen3-32B prompting. We will revise the abstract to incorporate a concise reference to the key metrics and dataset scale so the central claim is more readily assessable. revision: yes
Referee: [Evaluation (implied)] The manuscript supplies no experiments or ablations testing whether the 4B compiler produces effective adapters for natural-language fuzzy-function specifications that lie outside the FuzzyBench training distribution; this generalization is load-bearing for the claim that PAW yields reusable artifacts for arbitrary specifications.

Authors: The reported results use the held-out test split of FuzzyBench. We acknowledge that explicit evaluation on specifications outside this distribution would strengthen the generalization claim. We will add targeted experiments and ablations on novel out-of-distribution natural-language specifications in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical result on new dataset

full rationale

The paper introduces FuzzyBench (10M examples) and trains a 4B compiler to emit adapters for a frozen 0.6B interpreter. The headline performance equivalence (matching Qwen3-32B direct prompting) is presented as an empirical outcome of this training and evaluation procedure, not as a quantity derived by definition, by renaming a fitted parameter, or by a self-citation chain. No equations or steps reduce the claimed result to its own inputs; the derivation chain is self-contained against the external benchmark of direct prompting.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 2 invented entities

Review performed on abstract only; ledger reflects elements explicitly named in the abstract. Full paper may contain additional fitted values or assumptions.

free parameters (2)

Compiler parameter count (4B)
Model size chosen for the compiler component.
Interpreter parameter count (0.6B)
Model size chosen for the frozen interpreter.

axioms (1)

domain assumption Adapter modules can capture the behavior of fuzzy functions when emitted by a trained compiler
Central to the claim that the emitted adapters enable the interpreter to match large-model performance.

invented entities (2)

Fuzzy function no independent evidence
purpose: Category of tasks that resist clean rule-based code
Conceptual framing used to motivate the paradigm.
PAW compiler no independent evidence
purpose: Component that translates natural-language specs into adapters
New architectural element introduced by the work.

pith-pipeline@v0.9.1-grok · 5734 in / 1318 out tokens · 41615 ms · 2026-07-03T16:15:47.685359+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

94 extracted references · 16 canonical work pages · 3 internal anchors

[1]

Langley , title =

P. Langley , title =. Proceedings of the 17th International Conference on Machine Learning (ICML 2000) , address =. 2000 , pages =

2000
[2]

T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980

1980
[3]

M. J. Kearns , title =
[4]

Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983

1983
[5]

R. O. Duda and P. E. Hart and D. G. Stork. Pattern Classification. 2000

2000
[6]

Suppressed for Anonymity , author=
[7]

Newell and P

A. Newell and P. S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. Cognitive Skills and Their Acquisition. 1981

1981
[8]

A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development. 1959

1959
[9]

FANT o M : A Benchmark for Stress-testing Machine Theory of Mind in Interactions

Kim, Hyunwoo and Sclar, Melanie and Zhou, Xuhui and Bras, Ronan and Kim, Gunhee and Choi, Yejin and Sap, Maarten. FANT o M : A Benchmark for Stress-testing Machine Theory of Mind in Interactions. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.890

work page doi:10.18653/v1/2023.emnlp-main.890 2023
[10]

Prefix-Tuning: Optimizing Continuous Prompts for Generation

Li, Xiang Lisa and Liang, Percy. Prefix-Tuning: Optimizing Continuous Prompts for Generation. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021. doi:10.18653/v1/2021.acl-long.353

work page doi:10.18653/v1/2021.acl-long.353 2021
[11]

Proceedings of the 38th International Conference on Machine Learning , pages =

Learning Transferable Visual Models From Natural Language Supervision , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , editor =

2021
[12]

2021 , eprint=

Evaluating Large Language Models Trained on Code , author=. 2021 , eprint=

2021
[13]

Mankowitz and Esme Sutherland Robson and Pushmeet Kohli and Nando de Freitas and Koray Kavukcuoglu and Oriol Vinyals , title =

Yujia Li and David Choi and Junyoung Chung and Nate Kushman and Julian Schrittwieser and Rémi Leblond and Tom Eccles and James Keeling and Felix Gimeno and Agustin Dal Lago and Thomas Hubert and Peter Choy and Cyprien de Masson d’Autume and Igor Babuschkin and Xinyun Chen and Po-Sen Huang and Johannes Welbl and Sven Gowal and Alexey Cherepanov and James M...

work page doi:10.1126/science.abq1158 2022
[14]

Locating and Editing Factual Associations in

Kevin Meng and David Bau and Alex J Andonian and Yonatan Belinkov , booktitle=. Locating and Editing Factual Associations in. 2022 , url=

2022
[15]

Edward J Hu and yelong shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen , booktitle=. Lo. 2022 , url=

2022
[16]

Text-to-Lo

Rujikorn Charakorn and Edoardo Cetin and Yujin Tang and Robert Tjarko Lange , booktitle=. Text-to-Lo. 2025 , url=

2025
[17]

The Thirteenth International Conference on Learning Representations , year=

Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass , author=. The Thirteenth International Conference on Learning Representations , year=
[18]

International Conference on Learning Representations , year=

Continual learning with hypernetworks , author=. International Conference on Learning Representations , year=
[19]

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

Karimi Mahabadi, Rabeeh and Ruder, Sebastian and Dehghani, Mostafa and Henderson, James. Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Pap...

work page doi:10.18653/v1/2021.acl-long.47 2021
[20]

2020 , eprint=

Language Models are Few-Shot Learners , author=. 2020 , eprint=

2020
[21]

2014 , eprint=

Neural Turing Machines , author=. 2014 , eprint=

2014
[22]

2016 , eprint=

Neural Programmer-Interpreters , author=. 2016 , eprint=

2016
[23]

2021 , eprint=

Thinking Like Transformers , author=. 2021 , eprint=

2021
[24]

A neural compiler , journal =

Frédéric Gruau and Jean-Yves Ratajszczak and Gilles Wiber , abstract =. A neural compiler , journal =. 1995 , issn =. doi:https://doi.org/10.1016/0304-3975(94)00200-3 , url =

work page doi:10.1016/0304-3975(94)00200-3 1995
[25]

2025 , eprint=

Small Language Models are the Future of Agentic AI , author=. 2025 , eprint=

2025
[26]

AI Commun

Rubio Manzano, Clemente , title =. AI Commun. , month = oct, pages =. 2012 , issue_date =

2012
[27]

The ALCHEmist: Automated Labeling 500x CHEaper than LLM Data Annotators , url =

Huang, Tzu-Heng and Cao, Catherine and Bhargava, Vaishnavi and Sala, Frederic , booktitle =. The ALCHEmist: Automated Labeling 500x CHEaper than LLM Data Annotators , url =. doi:10.52202/079017-2003 , editor =

work page doi:10.52202/079017-2003 2003
[28]

, title =

Deng, Yuntian and Kanervisto, Anssi and Ling, Jeffrey and Rush, Alexander M. , title =. Proceedings of the 34th International Conference on Machine Learning - Volume 70 , pages =. 2017 , publisher =

2017
[29]

and Hajishirzi, Hannaneh and Girshick, Ross and Farhadi, Ali and Kembhavi, Aniruddha , title =

Deitke, Matt and Clark, Christopher and Lee, Sangho and Tripathi, Rohun and Yang, Yue and Park, Jae Sung and Salehi, Mohammadreza and Muennighoff, Niklas and Lo, Kyle and Soldaini, Luca and Lu, Jiasen and Anderson, Taira and Bransom, Erin and Ehsani, Kiana and Ngo, Huong and Chen, YenSung and Patel, Ajay and Yatskar, Mark and Callison-Burch, Chris and Hea...

2025
[30]

2025 , eprint=

MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks , author=. 2025 , eprint=

2025
[31]

2025 , eprint=

Qwen3 Technical Report , author=. 2025 , eprint=

2025
[32]

2025 , eprint=

AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model , author=. 2025 , eprint=

2025
[33]

2025 , eprint=

Qwen2.5 Technical Report , author=. 2025 , eprint=

2025
[34]

2025 , eprint=

Teach2Eval: An Indirect Evaluation Method for LLM by Judging How It Teaches , author=. 2025 , eprint=

2025
[35]

The Eleventh International Conference on Learning Representations , year=

Large Language Models are Human-Level Prompt Engineers , author=. The Eleventh International Conference on Learning Representations , year=
[36]

Guo, Daya and Yang, Dejian and Zhang, Haowei and Song, Junxiao and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Zhang, Ruoyu and Ma, Shirong and Bi, Xiao and Zhang, Xiaokang and Yu, Xingkai and Wu, Yu and Wu, Z. F. and Gou, Zhibin and Shao, Zhihong and Li, Zhuoshu and Gao, Ziyi and Liu, Aixin and Xue, Bing and Wang, Bingxuan and Wu, Bochao and Feng, Bei ...

work page doi:10.1038/s41586-025-09422-z
[37]

2024 , eprint=

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models , author=. 2024 , eprint=

2024
[38]

Williams

Williams, Ronald J. , title =. Mach. Learn. , month = may, pages =. 1992 , issue_date =. doi:10.1007/BF00992696 , abstract =

work page doi:10.1007/bf00992696 1992
[39]

Policy Gradient Methods for Reinforcement Learning with Function Approximation , url =

Sutton, Richard S and McAllester, David and Singh, Satinder and Mansour, Yishay , booktitle =. Policy Gradient Methods for Reinforcement Learning with Function Approximation , url =
[40]

2025 , eprint=

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild , author=. 2025 , eprint=

2025
[41]

2021 , url=

Jieyu Zhang and Yue Yu and Yinghao Li and Yujing Wang and Yaming Yang and Mao Yang and Alexander Ratner , booktitle=. 2021 , url=

2021
[42]

Scaling text-rich image understanding via code-guided synthetic multimodal data generation.arXiv preprint arXiv:2502.14846, 2025

Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation , author=. arXiv preprint arXiv:2502.14846 , year=

work page arXiv
[43]

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

Molmo and PixMo: Open weights and open data for state-of-the-art vision-language models , author=. arXiv preprint arXiv:2409.17146 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[44]

2025 , eprint=

Olmo 3 , author=. 2025 , eprint=

2025
[45]

Python 3.14.2 , howpublished =
[46]

Proceedings of the 34th International Conference on Machine Learning , pages =

Image-to-Markup Generation with Coarse-to-Fine Attention , author =. Proceedings of the 34th International Conference on Machine Learning , pages =. 2017 , editor =

2017
[47]

The Eleventh International Conference on Learning Representations , year=

Markup-to-Image Diffusion Models with Scheduled Sampling , author=. The Eleventh International Conference on Learning Representations , year=
[48]

2024 , journal =

HybridFlow: A Flexible and Efficient RLHF Framework , author =. 2024 , journal =

2024
[49]

2025 , eprint=

HyperSteer: Activation Steering at Scale with Hypernetworks , author=. 2025 , eprint=

2025
[50]

The Eleventh International Conference on Learning Representations (ICLR) , year =

Binding Language Models in Symbolic Languages , author =. The Eleventh International Conference on Learning Representations (ICLR) , year =
[51]

Learning to Generate Task-Specific Adapters from Task Description

Ye, Qinyuan and Ren, Xiang. Learning to Generate Task-Specific Adapters from Task Description. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2021. doi:10.18653/v1/2021.acl-short.82

work page doi:10.18653/v1/2021.acl-short.82 2021
[52]

HINT : Hypernetwork Instruction Tuning for Efficient Zero- and Few-Shot Generalisation

Ivison, Hamish and Bhagia, Akshita and Wang, Yizhong and Hajishirzi, Hannaneh and Peters, Matthew. HINT : Hypernetwork Instruction Tuning for Efficient Zero- and Few-Shot Generalisation. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023. doi:10.18653/v1/2023.acl-long.631

work page doi:10.18653/v1/2023.acl-long.631 2023
[53]

2023 , volume =

Phang, Jason and Mao, Yi and He, Pengcheng and Chen, Weizhu , booktitle =. 2023 , volume =

2023
[54]

Advances in Neural Information Processing Systems , year =

Learning to Compress Prompts with Gist Tokens , author =. Advances in Neural Information Processing Systems , year =
[55]

2024 , url =

Li, Yichuan and Ma, Xiyao and Lu, Sixing and Lee, Kyumin and Liu, Xiaohu and Guo, Chenlei , booktitle =. 2024 , url =

2024
[56]

2019 , publisher =

Language Models are Unsupervised Multitask Learners , author =. 2019 , publisher =

2019
[57]

gpt-oss-120b & gpt-oss-20b Model Card

OpenAI , year =. 2508.10925 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[58]

2023 , url =

Gerganov, Georgi and. 2023 , url =

2023
[59]

2025 , eprint=

Qwen3-VL Technical Report , author=. 2025 , eprint=

2025
[60]

Singh, Amanpreet and Natarajan, Vivek and Shah, Meet and Jiang, Yu and Chen, Xinlei and Batra, Dhruv and Parikh, Devi and Rohrbach, Marcus , booktitle =. Towards
[61]

and Le, Quoc V

Ha, David and Dai, Andrew M. and Le, Quoc V. , booktitle =. 2017 , url =

2017
[62]

Parameter-Efficient Transfer Learning for

Houlsby, Neil and Giurgiu, Andrei and Jastrzebski, Stanislaw and Morrone, Bruna and de Laroussilhe, Quentin and Gesmundo, Andrea and Attariyan, Mona and Gelly, Sylvain , booktitle =. Parameter-Efficient Transfer Learning for. 2019 , url =

2019
[63]

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

The Power of Scale for Parameter-Efficient Prompt Tuning , author =. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

2021
[64]

2022 , url =

Liu, Xiao and Ji, Kaixuan and Fu, Yicheng and Tam, Weng Lam and Du, Zhengxiao and Yang, Zhilin and Tang, Jie , booktitle =. 2022 , url =

2022
[65]

Advances in Neural Information Processing Systems , year =

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning , author =. Advances in Neural Information Processing Systems , year =
[66]

Advances in Neural Information Processing Systems , year =

Compacter: Efficient Low-Rank Hypercomplex Adapter Layers , author =. Advances in Neural Information Processing Systems , year =
[67]

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL) , year =

Pfeiffer, Jonas and Kamath, Aishwarya and R. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL) , year =
[68]

2024 , url =

Liu, Shih-Yang and Wang, Chien-Yi and Yin, Hongxu and Molchanov, Pavlo and Wang, Yu-Chiang Frank and Cheng, Kwang-Ting and Chen, Min-Hung , booktitle =. 2024 , url =

2024
[69]

2024 , url =

Huang, Chengsong and Liu, Qian and Lin, Bill Yuchen and Pang, Tianyu and Du, Chao and Lin, Min , booktitle =. 2024 , url =

2024
[70]

2023 , url =

Zhang, Qingru and Chen, Minshuo and Bukharin, Alexander and Karampatziakis, Nikos and He, Pengcheng and Cheng, Yu and Chen, Weizhu and Zhao, Tuo , booktitle =. 2023 , url =

2023
[71]

2023 , url =

Dettmers, Tim and Pagnoni, Artidoro and Holtzman, Ari and Zettlemoyer, Luke , booktitle =. 2023 , url =

2023
[72]

2023 , url =

Frantar, Elias and Ashkboos, Saleh and Hoefler, Torsten and Alistarh, Dan , booktitle =. 2023 , url =

2023
[73]

2024 , url =

Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Chen, Wei-Ming and Wang, Wei-Chen and Xiao, Guangxuan and Dang, Xingyu and Gan, Chuang and Han, Song , booktitle =. 2024 , url =

2024
[74]

2602.15902 , archivePrefix =

Charakorn, Rujikorn and Cetin, Edoardo and Uesaka, Shinnosuke and Lange, Robert Tjarko , year =. 2602.15902 , archivePrefix =

work page arXiv
[75]

2026 , eprint =

Latent Context Compilation: Distilling Long Context into Compact Portable Memory , author =. 2026 , eprint =

2026
[76]

2026 , eprint =

Trojan, Bartosz and G. 2026 , eprint =

2026
[77]

SHINE: A Scalable In-Context Hypernetwork for Mapping Context to LoRA in a Single Pass

Liu, Yewei and Wang, Xiyuan and Mao, Yansheng and Gelberg, Yoav and Maron, Haggai and Zhang, Muhan , year =. 2602.06358 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[78]

The Tenth International Conference on Learning Representations (ICLR) , year =

Finetuned Language Models are Zero-Shot Learners , author =. The Tenth International Conference on Learning Representations (ICLR) , year =
[79]

The Tenth International Conference on Learning Representations (ICLR) , year =

Multitask Prompted Training Enables Zero-Shot Task Generalization , author =. The Tenth International Conference on Learning Representations (ICLR) , year =
[80]

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL) , pages =

Cross-Task Generalization via Natural Language Crowdsourcing Instructions , author =. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL) , pages =. 2022 , url =

2022

Showing first 80 references.

[1] [1]

Langley , title =

P. Langley , title =. Proceedings of the 17th International Conference on Machine Learning (ICML 2000) , address =. 2000 , pages =

2000

[2] [2]

T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980

1980

[3] [3]

M. J. Kearns , title =

[4] [4]

Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983

1983

[5] [5]

R. O. Duda and P. E. Hart and D. G. Stork. Pattern Classification. 2000

2000

[6] [6]

Suppressed for Anonymity , author=

[7] [7]

Newell and P

A. Newell and P. S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. Cognitive Skills and Their Acquisition. 1981

1981

[8] [8]

A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development. 1959

1959

[9] [9]

FANT o M : A Benchmark for Stress-testing Machine Theory of Mind in Interactions

Kim, Hyunwoo and Sclar, Melanie and Zhou, Xuhui and Bras, Ronan and Kim, Gunhee and Choi, Yejin and Sap, Maarten. FANT o M : A Benchmark for Stress-testing Machine Theory of Mind in Interactions. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.890

work page doi:10.18653/v1/2023.emnlp-main.890 2023

[10] [10]

Prefix-Tuning: Optimizing Continuous Prompts for Generation

Li, Xiang Lisa and Liang, Percy. Prefix-Tuning: Optimizing Continuous Prompts for Generation. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021. doi:10.18653/v1/2021.acl-long.353

work page doi:10.18653/v1/2021.acl-long.353 2021

[11] [11]

Proceedings of the 38th International Conference on Machine Learning , pages =

Learning Transferable Visual Models From Natural Language Supervision , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , editor =

2021

[12] [12]

2021 , eprint=

Evaluating Large Language Models Trained on Code , author=. 2021 , eprint=

2021

[13] [13]

Mankowitz and Esme Sutherland Robson and Pushmeet Kohli and Nando de Freitas and Koray Kavukcuoglu and Oriol Vinyals , title =

Yujia Li and David Choi and Junyoung Chung and Nate Kushman and Julian Schrittwieser and Rémi Leblond and Tom Eccles and James Keeling and Felix Gimeno and Agustin Dal Lago and Thomas Hubert and Peter Choy and Cyprien de Masson d’Autume and Igor Babuschkin and Xinyun Chen and Po-Sen Huang and Johannes Welbl and Sven Gowal and Alexey Cherepanov and James M...

work page doi:10.1126/science.abq1158 2022

[14] [14]

Locating and Editing Factual Associations in

Kevin Meng and David Bau and Alex J Andonian and Yonatan Belinkov , booktitle=. Locating and Editing Factual Associations in. 2022 , url=

2022

[15] [15]

Edward J Hu and yelong shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen , booktitle=. Lo. 2022 , url=

2022

[16] [16]

Text-to-Lo

Rujikorn Charakorn and Edoardo Cetin and Yujin Tang and Robert Tjarko Lange , booktitle=. Text-to-Lo. 2025 , url=

2025

[17] [17]

The Thirteenth International Conference on Learning Representations , year=

Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass , author=. The Thirteenth International Conference on Learning Representations , year=

[18] [18]

International Conference on Learning Representations , year=

Continual learning with hypernetworks , author=. International Conference on Learning Representations , year=

[19] [19]

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

Karimi Mahabadi, Rabeeh and Ruder, Sebastian and Dehghani, Mostafa and Henderson, James. Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Pap...

work page doi:10.18653/v1/2021.acl-long.47 2021

[20] [20]

2020 , eprint=

Language Models are Few-Shot Learners , author=. 2020 , eprint=

2020

[21] [21]

2014 , eprint=

Neural Turing Machines , author=. 2014 , eprint=

2014

[22] [22]

2016 , eprint=

Neural Programmer-Interpreters , author=. 2016 , eprint=

2016

[23] [23]

2021 , eprint=

Thinking Like Transformers , author=. 2021 , eprint=

2021

[24] [24]

A neural compiler , journal =

Frédéric Gruau and Jean-Yves Ratajszczak and Gilles Wiber , abstract =. A neural compiler , journal =. 1995 , issn =. doi:https://doi.org/10.1016/0304-3975(94)00200-3 , url =

work page doi:10.1016/0304-3975(94)00200-3 1995

[25] [25]

2025 , eprint=

Small Language Models are the Future of Agentic AI , author=. 2025 , eprint=

2025

[26] [26]

AI Commun

Rubio Manzano, Clemente , title =. AI Commun. , month = oct, pages =. 2012 , issue_date =

2012

[27] [27]

The ALCHEmist: Automated Labeling 500x CHEaper than LLM Data Annotators , url =

Huang, Tzu-Heng and Cao, Catherine and Bhargava, Vaishnavi and Sala, Frederic , booktitle =. The ALCHEmist: Automated Labeling 500x CHEaper than LLM Data Annotators , url =. doi:10.52202/079017-2003 , editor =

work page doi:10.52202/079017-2003 2003

[28] [28]

, title =

Deng, Yuntian and Kanervisto, Anssi and Ling, Jeffrey and Rush, Alexander M. , title =. Proceedings of the 34th International Conference on Machine Learning - Volume 70 , pages =. 2017 , publisher =

2017

[29] [29]

and Hajishirzi, Hannaneh and Girshick, Ross and Farhadi, Ali and Kembhavi, Aniruddha , title =

Deitke, Matt and Clark, Christopher and Lee, Sangho and Tripathi, Rohun and Yang, Yue and Park, Jae Sung and Salehi, Mohammadreza and Muennighoff, Niklas and Lo, Kyle and Soldaini, Luca and Lu, Jiasen and Anderson, Taira and Bransom, Erin and Ehsani, Kiana and Ngo, Huong and Chen, YenSung and Patel, Ajay and Yatskar, Mark and Callison-Burch, Chris and Hea...

2025

[30] [30]

2025 , eprint=

MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks , author=. 2025 , eprint=

2025

[31] [31]

2025 , eprint=

Qwen3 Technical Report , author=. 2025 , eprint=

2025

[32] [32]

2025 , eprint=

AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model , author=. 2025 , eprint=

2025

[33] [33]

2025 , eprint=

Qwen2.5 Technical Report , author=. 2025 , eprint=

2025

[34] [34]

2025 , eprint=

Teach2Eval: An Indirect Evaluation Method for LLM by Judging How It Teaches , author=. 2025 , eprint=

2025

[35] [35]

The Eleventh International Conference on Learning Representations , year=

Large Language Models are Human-Level Prompt Engineers , author=. The Eleventh International Conference on Learning Representations , year=

[36] [36]

Guo, Daya and Yang, Dejian and Zhang, Haowei and Song, Junxiao and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Zhang, Ruoyu and Ma, Shirong and Bi, Xiao and Zhang, Xiaokang and Yu, Xingkai and Wu, Yu and Wu, Z. F. and Gou, Zhibin and Shao, Zhihong and Li, Zhuoshu and Gao, Ziyi and Liu, Aixin and Xue, Bing and Wang, Bingxuan and Wu, Bochao and Feng, Bei ...

work page doi:10.1038/s41586-025-09422-z

[37] [37]

2024 , eprint=

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models , author=. 2024 , eprint=

2024

[38] [38]

Williams

Williams, Ronald J. , title =. Mach. Learn. , month = may, pages =. 1992 , issue_date =. doi:10.1007/BF00992696 , abstract =

work page doi:10.1007/bf00992696 1992

[39] [39]

Policy Gradient Methods for Reinforcement Learning with Function Approximation , url =

Sutton, Richard S and McAllester, David and Singh, Satinder and Mansour, Yishay , booktitle =. Policy Gradient Methods for Reinforcement Learning with Function Approximation , url =

[40] [40]

2025 , eprint=

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild , author=. 2025 , eprint=

2025

[41] [41]

2021 , url=

Jieyu Zhang and Yue Yu and Yinghao Li and Yujing Wang and Yaming Yang and Mao Yang and Alexander Ratner , booktitle=. 2021 , url=

2021

[42] [42]

Scaling text-rich image understanding via code-guided synthetic multimodal data generation.arXiv preprint arXiv:2502.14846, 2025

Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation , author=. arXiv preprint arXiv:2502.14846 , year=

work page arXiv

[43] [43]

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

Molmo and PixMo: Open weights and open data for state-of-the-art vision-language models , author=. arXiv preprint arXiv:2409.17146 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[44] [44]

2025 , eprint=

Olmo 3 , author=. 2025 , eprint=

2025

[45] [45]

Python 3.14.2 , howpublished =

[46] [46]

Proceedings of the 34th International Conference on Machine Learning , pages =

Image-to-Markup Generation with Coarse-to-Fine Attention , author =. Proceedings of the 34th International Conference on Machine Learning , pages =. 2017 , editor =

2017

[47] [47]

The Eleventh International Conference on Learning Representations , year=

Markup-to-Image Diffusion Models with Scheduled Sampling , author=. The Eleventh International Conference on Learning Representations , year=

[48] [48]

2024 , journal =

HybridFlow: A Flexible and Efficient RLHF Framework , author =. 2024 , journal =

2024

[49] [49]

2025 , eprint=

HyperSteer: Activation Steering at Scale with Hypernetworks , author=. 2025 , eprint=

2025

[50] [50]

The Eleventh International Conference on Learning Representations (ICLR) , year =

Binding Language Models in Symbolic Languages , author =. The Eleventh International Conference on Learning Representations (ICLR) , year =

[51] [51]

Learning to Generate Task-Specific Adapters from Task Description

Ye, Qinyuan and Ren, Xiang. Learning to Generate Task-Specific Adapters from Task Description. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2021. doi:10.18653/v1/2021.acl-short.82

work page doi:10.18653/v1/2021.acl-short.82 2021

[52] [52]

HINT : Hypernetwork Instruction Tuning for Efficient Zero- and Few-Shot Generalisation

Ivison, Hamish and Bhagia, Akshita and Wang, Yizhong and Hajishirzi, Hannaneh and Peters, Matthew. HINT : Hypernetwork Instruction Tuning for Efficient Zero- and Few-Shot Generalisation. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023. doi:10.18653/v1/2023.acl-long.631

work page doi:10.18653/v1/2023.acl-long.631 2023

[53] [53]

2023 , volume =

Phang, Jason and Mao, Yi and He, Pengcheng and Chen, Weizhu , booktitle =. 2023 , volume =

2023

[54] [54]

Advances in Neural Information Processing Systems , year =

Learning to Compress Prompts with Gist Tokens , author =. Advances in Neural Information Processing Systems , year =

[55] [55]

2024 , url =

Li, Yichuan and Ma, Xiyao and Lu, Sixing and Lee, Kyumin and Liu, Xiaohu and Guo, Chenlei , booktitle =. 2024 , url =

2024

[56] [56]

2019 , publisher =

Language Models are Unsupervised Multitask Learners , author =. 2019 , publisher =

2019

[57] [57]

gpt-oss-120b & gpt-oss-20b Model Card

OpenAI , year =. 2508.10925 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv

[58] [58]

2023 , url =

Gerganov, Georgi and. 2023 , url =

2023

[59] [59]

2025 , eprint=

Qwen3-VL Technical Report , author=. 2025 , eprint=

2025

[60] [60]

Singh, Amanpreet and Natarajan, Vivek and Shah, Meet and Jiang, Yu and Chen, Xinlei and Batra, Dhruv and Parikh, Devi and Rohrbach, Marcus , booktitle =. Towards

[61] [61]

and Le, Quoc V

Ha, David and Dai, Andrew M. and Le, Quoc V. , booktitle =. 2017 , url =

2017

[62] [62]

Parameter-Efficient Transfer Learning for

Houlsby, Neil and Giurgiu, Andrei and Jastrzebski, Stanislaw and Morrone, Bruna and de Laroussilhe, Quentin and Gesmundo, Andrea and Attariyan, Mona and Gelly, Sylvain , booktitle =. Parameter-Efficient Transfer Learning for. 2019 , url =

2019

[63] [63]

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

The Power of Scale for Parameter-Efficient Prompt Tuning , author =. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

2021

[64] [64]

2022 , url =

Liu, Xiao and Ji, Kaixuan and Fu, Yicheng and Tam, Weng Lam and Du, Zhengxiao and Yang, Zhilin and Tang, Jie , booktitle =. 2022 , url =

2022

[65] [65]

Advances in Neural Information Processing Systems , year =

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning , author =. Advances in Neural Information Processing Systems , year =

[66] [66]

Advances in Neural Information Processing Systems , year =

Compacter: Efficient Low-Rank Hypercomplex Adapter Layers , author =. Advances in Neural Information Processing Systems , year =

[67] [67]

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL) , year =

Pfeiffer, Jonas and Kamath, Aishwarya and R. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL) , year =

[68] [68]

2024 , url =

Liu, Shih-Yang and Wang, Chien-Yi and Yin, Hongxu and Molchanov, Pavlo and Wang, Yu-Chiang Frank and Cheng, Kwang-Ting and Chen, Min-Hung , booktitle =. 2024 , url =

2024

[69] [69]

2024 , url =

Huang, Chengsong and Liu, Qian and Lin, Bill Yuchen and Pang, Tianyu and Du, Chao and Lin, Min , booktitle =. 2024 , url =

2024

[70] [70]

2023 , url =

Zhang, Qingru and Chen, Minshuo and Bukharin, Alexander and Karampatziakis, Nikos and He, Pengcheng and Cheng, Yu and Chen, Weizhu and Zhao, Tuo , booktitle =. 2023 , url =

2023

[71] [71]

2023 , url =

Dettmers, Tim and Pagnoni, Artidoro and Holtzman, Ari and Zettlemoyer, Luke , booktitle =. 2023 , url =

2023

[72] [72]

2023 , url =

Frantar, Elias and Ashkboos, Saleh and Hoefler, Torsten and Alistarh, Dan , booktitle =. 2023 , url =

2023

[73] [73]

2024 , url =

Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Chen, Wei-Ming and Wang, Wei-Chen and Xiao, Guangxuan and Dang, Xingyu and Gan, Chuang and Han, Song , booktitle =. 2024 , url =

2024

[74] [74]

2602.15902 , archivePrefix =

Charakorn, Rujikorn and Cetin, Edoardo and Uesaka, Shinnosuke and Lange, Robert Tjarko , year =. 2602.15902 , archivePrefix =

work page arXiv

[75] [75]

2026 , eprint =

Latent Context Compilation: Distilling Long Context into Compact Portable Memory , author =. 2026 , eprint =

2026

[76] [76]

2026 , eprint =

Trojan, Bartosz and G. 2026 , eprint =

2026

[77] [77]

SHINE: A Scalable In-Context Hypernetwork for Mapping Context to LoRA in a Single Pass

Liu, Yewei and Wang, Xiyuan and Mao, Yansheng and Gelberg, Yoav and Maron, Haggai and Zhang, Muhan , year =. 2602.06358 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv

[78] [78]

The Tenth International Conference on Learning Representations (ICLR) , year =

Finetuned Language Models are Zero-Shot Learners , author =. The Tenth International Conference on Learning Representations (ICLR) , year =

[79] [79]

The Tenth International Conference on Learning Representations (ICLR) , year =

Multitask Prompted Training Enables Zero-Shot Task Generalization , author =. The Tenth International Conference on Learning Representations (ICLR) , year =

[80] [80]

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL) , pages =

Cross-Task Generalization via Natural Language Crowdsourcing Instructions , author =. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL) , pages =. 2022 , url =

2022