Towards the Readability of LLM-Generated Codes through Multitask Representation Engineering

Huifan Gao; Liuhua He; Shenbao Yu; Shengchao Qin; Weidi Sun; Yifeng Zeng; Yinghui Pan

arxiv: 2606.06214 · v1 · pith:PZCG2XITnew · submitted 2026-06-04 · 💻 cs.SE · cs.AI

Towards the Readability of LLM-Generated Codes through Multitask Representation Engineering

Huifan Gao , Liuhua He , Yinghui Pan , Shenbao Yu , Yifeng Zeng , Shengchao Qin , Weidi Sun This is my paper

Pith reviewed 2026-06-28 00:16 UTC · model grok-4.3

classification 💻 cs.SE cs.AI

keywords LLM code generationrepresentation engineeringcode readabilitymultitask steeringcode correctnesssteering vectors

0 comments

The pith

Multitask RepE framework steers readability in LLM-generated code and analyzes its tradeoff with correctness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes extending representation engineering to a multitask setting so that large language models can be steered to produce more readable code with minimal data and compute. Readability has been hard to target directly because it is subjective, while most prior efforts on LLM code have prioritized functional correctness instead. The authors introduce the multitask RepE framework and supply a theoretical account of how steering across tasks shifts the balance between readability and correctness. Experiments are presented to support the approach. If the method works as described, developers could obtain more comprehensible code from LLMs without separate fine-tuning runs for each quality dimension.

Core claim

The central claim is that the multitask RepE framework enables control of code readability across multiple tasks, and that the multitask steering method impacts the tradeoff between readability and correctness in LLM-generated codes.

What carries the argument

The multitask RepE framework, which applies representation engineering across multiple tasks to steer code readability.

If this is right

Readability becomes steerable at low data and computational cost compared with task-specific fine-tuning.
A quantifiable tradeoff arises between readability gains and any loss in functional correctness under multitask steering.
The same steering vectors can be reused across related code-generation tasks without retraining.
Open-source implementations allow direct replication and extension to new code properties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same multitask steering approach could be tested on other subjective code attributes such as maintainability or security posture.
Integration with existing LLM inference pipelines might reduce reliance on prompt engineering for style control.
The tradeoff analysis could inform decisions about when to apply readability steering versus accepting default model output.

Load-bearing premise

Readability is a controllable property via representation engineering in a multitask setting and the theoretical tradeoff analysis holds without post-hoc adjustments.

What would settle it

An experiment in which the multitask steering vectors are applied and neither readability metrics improve nor the predicted tradeoff between readability and correctness appears.

Figures

Figures reproduced from arXiv: 2606.06214 by Huifan Gao, Liuhua He, Shenbao Yu, Shengchao Qin, Weidi Sun, Yifeng Zeng, Yinghui Pan.

**Figure 1.** Figure 1: For different users with varying experience levels, the format of comments, naming conventions, and cyclomatic complexity can lead to differences in [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: The MRepE framework includes three stages. (Top) Datasets with diverse code readability metrics are provided for LLM to compute differences in hidden state representations, followed by a centering procedure. (Bottom left) Orthogonal steering vectors are extracted using MOC-JPCA. (Bottom right) The orthogonalized vectors are injected into selected layers of LLM to modulate and control its outputs. Within th… view at source ↗

**Figure 3.** Figure 3: Injecting steering vectors for different readability metrics into [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: The readability performance for the models with the single-metric [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: The surface plots illustrate how the code readability performance of [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: The surface plots illustrate how the code correctness performance of [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 8.** Figure 8: The 3D stacked heatmaps illustrate the probability (blue for [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗

read the original abstract

Correctness and readability are key measures of code quality, respectively ensuring functional fidelity and ease of comprehension. While most existing research focuses on improving the correctness of large language models~(LLMs) generated codes, readability remains under-addressed. Enhancing readability through targeted control is challenging due to its subjective nature. In this article, we employ representation engineering~(RepE) as the targeted control method given its characteristics of low data dependency and low computational cost. Prior work on RepE has primarily focused on the targeted control for a single task, but improving the code readability requires the control across multiple tasks. Accordingly we proposes the multitask RepE framework and theoretically discuss the impact of the multitask steering method on the tradeoff between the code readability and correctness. We further provide comprehensive experiments in support. All the relevant implementations are open-source and available upon request.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper extends RepE to multiple tasks for steering LLM code readability and runs experiments, but the theoretical tradeoff discussion stays qualitative without a clear derivation.

read the letter

The core idea is to take representation engineering, which works with little data and compute, and run it across several tasks at once so that LLM-generated code becomes more readable. The authors note that readability has been neglected compared to correctness, and single-task steering falls short when multiple properties need control at the same time. They outline a multitask framework, talk through how steering might shift the readability-correctness balance, and report experiments. The code is available on request.

The practical focus is the strongest part. Readability is genuinely harder to pin down than functional correctness, and RepE's low overhead makes the approach usable in real workflows. Providing experiments plus open implementations gives others a concrete starting point to test or adapt the method.

The weaker element is the theoretical discussion of the tradeoff. The abstract says they examine the impact of multitask steering, yet nothing indicates an explicit construction such as a combined vector formula, an orthogonality condition, or a bound on interference between directions. Without that, the analysis remains descriptive rather than predictive, and any claim of controlled tradeoff rests on the assumption that the representations stay separable. The experiments may demonstrate the outcome, but they do not substitute for the missing model.

This is for researchers already working on steering or control techniques in LLM code generation. Someone looking for a low-cost way to adjust readability would find the experiments and released code useful.

It should go to peer review. The experiments and open implementations give referees something concrete to evaluate, and the multitask extension is a straightforward enough idea that feedback will clarify whether the theoretical gap needs tightening or whether the empirical results carry the contribution.

Referee Report

1 major / 2 minor

Summary. The paper proposes a multitask Representation Engineering (RepE) framework to control readability of LLM-generated code across multiple tasks, theoretically discusses the impact of multitask steering on the readability-correctness tradeoff, and reports supporting experiments. Implementations are stated to be open-source.

Significance. If the multitask RepE enables controllable readability steering with a formally derived and empirically validated tradeoff against correctness, the approach could supply a low-data, low-compute method for jointly optimizing multiple code-quality dimensions in LLMs.

major comments (1)

[Abstract / theoretical discussion] Abstract and theoretical discussion section: the manuscript asserts a 'theoretical discussion' of the multitask steering method's impact on the readability-correctness tradeoff, yet no explicit construction (combined steering vector formula, orthogonality condition, or bounded interference term between task-specific directions) is supplied. Without such a derivation the tradeoff claim remains qualitative rather than predictive, which is load-bearing for the central contribution.

minor comments (2)

[Abstract] Abstract: 'we proposes the multitask RepE framework' is a subject-verb agreement error and should read 'we propose'.
[Abstract] Abstract: the claim that 'all the relevant implementations are open-source and available upon request' should be accompanied by a concrete repository URL or DOI to enable immediate verification and reuse.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the single major comment below and will incorporate the requested formalization in the revision.

read point-by-point responses

Referee: [Abstract / theoretical discussion] Abstract and theoretical discussion section: the manuscript asserts a 'theoretical discussion' of the multitask steering method's impact on the readability-correctness tradeoff, yet no explicit construction (combined steering vector formula, orthogonality condition, or bounded interference term between task-specific directions) is supplied. Without such a derivation the tradeoff claim remains qualitative rather than predictive, which is load-bearing for the central contribution.

Authors: We agree that the current theoretical discussion is qualitative and does not supply an explicit construction. In the revised manuscript we will add a formal derivation in the theoretical section: the multitask steering vector is defined as a convex combination of task-specific RepE directions, we state the orthogonality condition required for minimal interference, and we derive a simple bound on the interference term that quantifies the expected degradation in correctness. The abstract will be updated to reflect this explicit treatment. These additions will make the readability-correctness tradeoff predictive rather than descriptive. revision: yes

Circularity Check

0 steps flagged

No circularity; no equations, fits, or self-citation chains exhibited

full rationale

The provided abstract and description contain no equations, derivations, fitted parameters, or self-citations. The claim of a 'theoretical discussion' of the readability-correctness tradeoff is stated but not formalized with any model, vector formula, or reduction that could be inspected for equivalence to inputs. Per rules, circularity requires explicit quotes showing a step that reduces by construction; none exist here. The derivation chain is therefore not shown to be self-referential.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract; the central claim rests on the unstated premise that readability is steerable via RepE and that multitask extension preserves this property.

pith-pipeline@v0.9.1-grok · 5689 in / 1044 out tokens · 18866 ms · 2026-06-28T00:16:16.151685+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 9 canonical work pages · 6 internal anchors

[1]

Spinellis,Code quality: the open source perspective

D. Spinellis,Code quality: the open source perspective. Adobe Press, 2006

2006
[2]

Improving source code readability: Theory and practice,

S. Fakhoury, D. Roy, A. Hassan, and V . Arnaoudova, “Improving source code readability: Theory and practice,” in2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC). IEEE, 2019, pp. 2–12

2019
[3]

Learning a metric for code readability,

R. P. Buse and W. R. Weimer, “Learning a metric for code readability,” IEEE Transactions on software engineering, vol. 36, no. 4, pp. 546–558, 2009

2009
[4]

A simpler model of software readability,

D. Posnett, A. Hindle, and P. Devanbu, “A simpler model of software readability,” inProceedings of the 8th working conference on mining software repositories, 2011, pp. 73–82

2011
[5]

Automatically assessing code understandability: How far are we?

S. Scalabrino, G. Bavota, C. Vendome, M. Linares-V ´asquez, D. Poshy- vanyk, and R. Oliveto, “Automatically assessing code understandability: How far are we?” in2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2017, pp. 417–427

2017
[6]

Krzysztof and U

C. Krzysztof and U. W. Eisenecker,Generative Programming: Methods, Tools and Applications. Addison-Wesley, 2000

2000
[7]

A neural proba- bilistic language model,

Y . Bengio, R. Ducharme, P. Vincent, and C. Jauvin, “A neural proba- bilistic language model,”Journal of machine learning research, vol. 3, no. Feb, pp. 1137–1155, 2003

2003
[8]

code2vec: Learning distributed representations of code,

U. Alon, M. Zilberstein, O. Levy, and E. Yahav, “code2vec: Learning distributed representations of code,”Proceedings of the ACM on Pro- gramming Languages, vol. 3, no. POPL, pp. 1–29, 2019

2019
[9]

A review on code generation with llms: Application and evaluation,

J. Wang and Y . Chen, “A review on code generation with llms: Application and evaluation,” in2023 IEEE International Conference on Medical Artificial Intelligence (MedAI). IEEE, 2023, pp. 284–289

2023
[10]

A Survey on Large Language Models for Code Generation

J. Jiang, F. Wang, J. Shen, S. Kim, and S. Kim, “A survey on large language models for code generation,”arXiv preprint arXiv:2406.00515, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[11]

Code readability in the age of large language models: An industrial case study from atlassian,

W. Takerngsaksiri, C. Tantithamthavorn, M. Fu, J. Pasuksmit, K. Chen, and M. Wu, “Code readability in the age of large language models: An industrial case study from atlassian,” in2025 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2025, pp. 732–742

2025
[12]

Style2code: A style-controllable code generation framework with dual-modal con- trastive representation learning,

D. Zhang, N. R. A. Arias, Y . He, and S. Kovalchuk, “Style2code: A style-controllable code generation framework with dual-modal con- trastive representation learning,”arXiv preprint arXiv:2505.19442, 2025

work page arXiv 2025
[13]

Representation Engineering: A Top-Down Approach to AI Transparency

A. Zou, L. Phan, S. Chen, J. Campbell, P. Guo, R. Ren, A. Pan, X. Yin, M. Mazeika, A.-K. Dombrowskiet al., “Representation en- gineering: a top-down approach to ai transparency,”ArXiv Preprint arXiv:2310.01405, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[14]

Tradeoffs between alignment and helpfulness in language models with steering methods,

Y . Wolf, N. Wies, D. Shteyman, B. Rothberg, Y . Levine, and A. Shashua, “Tradeoffs between alignment and helpfulness in language models with steering methods,” inICLR 2025 Workshop on Foundation Models in the Wild, 2025

2025
[15]

Badam: A memory efficient full parameter optimization method for large language models,

Q. Luo, H. Yu, and X. Li, “Badam: A memory efficient full parameter optimization method for large language models,”Advances in Neural Information Processing Systems, vol. 37, pp. 24 926–24 958, 2024

2024
[16]

Federated full- parameter tuning of billion-sized language models with communication cost under 18 kilobytes,

Z. Qin, D. Chen, B. Qian, B. Ding, Y . Li, and S. Deng, “Federated full- parameter tuning of billion-sized language models with communication cost under 18 kilobytes,” inInternational Conference on Machine Learning. PMLR, 2024, pp. 41 473–41 497

2024
[17]

Full parameter fine-tuning for large language models with limited resources,

K. Lv, Y . Yang, T. Liu, Q. Guo, and X. Qiu, “Full parameter fine-tuning for large language models with limited resources,” inProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 8187–8198

2024
[18]

Qlora: Efficient finetuning of quantized llms,

T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, “Qlora: Efficient finetuning of quantized llms,”Advances in neural information processing systems, vol. 36, pp. 10 088–10 115, 2023

2023
[19]

Adaptive budget allocation for parameter-efficient fine-tuning,

Q. Zhang, M. Chen, A. Bukharin, P. He, Y . Cheng, W. Chen, and T. Zhao, “Adaptive budget allocation for parameter-efficient fine-tuning,” inInternational Conference on Learning Representations. Openreview, 2023

2023
[20]

Llama-adapter: Efficient fine-tuning of large language models with zero-initialized attention,

R. Zhang, J. Han, C. Liu, A. Zhou, P. Lu, Y . Qiao, H. Li, and P. Gao, “Llama-adapter: Efficient fine-tuning of large language models with zero-initialized attention,” inThe Twelfth International Conference on Learning Representations, 2024. IEEE JOURNAL SUBMISSION, VOL. XX, NO. XX, MONTH 2026 12

2024
[21]

Training language models to follow instructions with human feedback,

L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Rayet al., “Training language models to follow instructions with human feedback,”Advances in neural information processing systems, vol. 35, pp. 27 730–27 744, 2022

2022
[22]

Rrhf: Rank responses to align language models with human feedback,

H. Yuan, Z. Yuan, C. Tan, W. Wang, S. Huang, and F. Huang, “Rrhf: Rank responses to align language models with human feedback,”Ad- vances in Neural Information Processing Systems, vol. 36, pp. 10 935– 10 950, 2023

2023
[23]

Orpo: Monolithic preference optimiza- tion without reference model,

J. Hong, N. Lee, and J. Thorne, “Orpo: Monolithic preference optimiza- tion without reference model,” inProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024, pp. 11 170– 11 189

2024
[24]

Richtungsfelder und fernparallelismus in n-dimensionalen mannigfaltigkeiten,

E. Stiefel, “Richtungsfelder und fernparallelismus in n-dimensionalen mannigfaltigkeiten,” Ph.D. dissertation, ETH Zurich, 1935

1935
[25]

Correctness assessment of code generated by large language models using internal representations,

T.-D. Bui, T. T. Vu, T.-T. Nguyen, S. Nguyen, and H. D. V o, “Correctness assessment of code generated by large language models using internal representations,”arXiv preprint arXiv:2501.12934, 2025

work page arXiv 2025
[26]

Towards understanding code readability and its impact on design quality,

U. A. Mannan, I. Ahmed, and A. Sarma, “Towards understanding code readability and its impact on design quality,” inProceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering, 2018, pp. 18–21

2018
[27]

Extending activa- tion steering to broad skills and multiple behaviours,

T. van der Weij, M. Poesio, and N. Schoots, “Extending activa- tion steering to broad skills and multiple behaviours,”arXiv preprint arXiv:2403.05767, 2024

work page arXiv 2024
[28]

Program Synthesis with Large Language Models

J. Austin, A. Odena, M. Nye, M. Bosma, H. Michalewski, D. Dohan, E. Jiang, C. Cai, M. Terry, Q. Leet al., “Program synthesis with large language models,”arXiv preprint arXiv:2108.07732, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[29]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Biet al., “Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,”arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[30]

Qwen2.5-Coder Technical Report

B. Hui, J. Yang, Z. Cui, J. Yang, D. Liu, L. Zhang, T. Liu, J. Zhang, B. Yu, K. Luet al., “Qwen2. 5-coder technical report,”arXiv preprint arXiv:2409.12186, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[31]

Code Llama: Open Foundation Models for Code

B. Roziere, J. Gehring, F. Gloeckle, S. Sootla, I. Gat, X. E. Tan, Y . Adi, J. Liu, R. Sauvestre, T. Remezet al., “Code llama: Open foundation models for code,”arXiv preprint arXiv:2308.12950, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[1] [1]

Spinellis,Code quality: the open source perspective

D. Spinellis,Code quality: the open source perspective. Adobe Press, 2006

2006

[2] [2]

Improving source code readability: Theory and practice,

S. Fakhoury, D. Roy, A. Hassan, and V . Arnaoudova, “Improving source code readability: Theory and practice,” in2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC). IEEE, 2019, pp. 2–12

2019

[3] [3]

Learning a metric for code readability,

R. P. Buse and W. R. Weimer, “Learning a metric for code readability,” IEEE Transactions on software engineering, vol. 36, no. 4, pp. 546–558, 2009

2009

[4] [4]

A simpler model of software readability,

D. Posnett, A. Hindle, and P. Devanbu, “A simpler model of software readability,” inProceedings of the 8th working conference on mining software repositories, 2011, pp. 73–82

2011

[5] [5]

Automatically assessing code understandability: How far are we?

S. Scalabrino, G. Bavota, C. Vendome, M. Linares-V ´asquez, D. Poshy- vanyk, and R. Oliveto, “Automatically assessing code understandability: How far are we?” in2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2017, pp. 417–427

2017

[6] [6]

Krzysztof and U

C. Krzysztof and U. W. Eisenecker,Generative Programming: Methods, Tools and Applications. Addison-Wesley, 2000

2000

[7] [7]

A neural proba- bilistic language model,

Y . Bengio, R. Ducharme, P. Vincent, and C. Jauvin, “A neural proba- bilistic language model,”Journal of machine learning research, vol. 3, no. Feb, pp. 1137–1155, 2003

2003

[8] [8]

code2vec: Learning distributed representations of code,

U. Alon, M. Zilberstein, O. Levy, and E. Yahav, “code2vec: Learning distributed representations of code,”Proceedings of the ACM on Pro- gramming Languages, vol. 3, no. POPL, pp. 1–29, 2019

2019

[9] [9]

A review on code generation with llms: Application and evaluation,

J. Wang and Y . Chen, “A review on code generation with llms: Application and evaluation,” in2023 IEEE International Conference on Medical Artificial Intelligence (MedAI). IEEE, 2023, pp. 284–289

2023

[10] [10]

A Survey on Large Language Models for Code Generation

J. Jiang, F. Wang, J. Shen, S. Kim, and S. Kim, “A survey on large language models for code generation,”arXiv preprint arXiv:2406.00515, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[11] [11]

Code readability in the age of large language models: An industrial case study from atlassian,

W. Takerngsaksiri, C. Tantithamthavorn, M. Fu, J. Pasuksmit, K. Chen, and M. Wu, “Code readability in the age of large language models: An industrial case study from atlassian,” in2025 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2025, pp. 732–742

2025

[12] [12]

Style2code: A style-controllable code generation framework with dual-modal con- trastive representation learning,

D. Zhang, N. R. A. Arias, Y . He, and S. Kovalchuk, “Style2code: A style-controllable code generation framework with dual-modal con- trastive representation learning,”arXiv preprint arXiv:2505.19442, 2025

work page arXiv 2025

[13] [13]

Representation Engineering: A Top-Down Approach to AI Transparency

A. Zou, L. Phan, S. Chen, J. Campbell, P. Guo, R. Ren, A. Pan, X. Yin, M. Mazeika, A.-K. Dombrowskiet al., “Representation en- gineering: a top-down approach to ai transparency,”ArXiv Preprint arXiv:2310.01405, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[14] [14]

Tradeoffs between alignment and helpfulness in language models with steering methods,

Y . Wolf, N. Wies, D. Shteyman, B. Rothberg, Y . Levine, and A. Shashua, “Tradeoffs between alignment and helpfulness in language models with steering methods,” inICLR 2025 Workshop on Foundation Models in the Wild, 2025

2025

[15] [15]

Badam: A memory efficient full parameter optimization method for large language models,

Q. Luo, H. Yu, and X. Li, “Badam: A memory efficient full parameter optimization method for large language models,”Advances in Neural Information Processing Systems, vol. 37, pp. 24 926–24 958, 2024

2024

[16] [16]

Federated full- parameter tuning of billion-sized language models with communication cost under 18 kilobytes,

Z. Qin, D. Chen, B. Qian, B. Ding, Y . Li, and S. Deng, “Federated full- parameter tuning of billion-sized language models with communication cost under 18 kilobytes,” inInternational Conference on Machine Learning. PMLR, 2024, pp. 41 473–41 497

2024

[17] [17]

Full parameter fine-tuning for large language models with limited resources,

K. Lv, Y . Yang, T. Liu, Q. Guo, and X. Qiu, “Full parameter fine-tuning for large language models with limited resources,” inProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 8187–8198

2024

[18] [18]

Qlora: Efficient finetuning of quantized llms,

T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, “Qlora: Efficient finetuning of quantized llms,”Advances in neural information processing systems, vol. 36, pp. 10 088–10 115, 2023

2023

[19] [19]

Adaptive budget allocation for parameter-efficient fine-tuning,

Q. Zhang, M. Chen, A. Bukharin, P. He, Y . Cheng, W. Chen, and T. Zhao, “Adaptive budget allocation for parameter-efficient fine-tuning,” inInternational Conference on Learning Representations. Openreview, 2023

2023

[20] [20]

Llama-adapter: Efficient fine-tuning of large language models with zero-initialized attention,

R. Zhang, J. Han, C. Liu, A. Zhou, P. Lu, Y . Qiao, H. Li, and P. Gao, “Llama-adapter: Efficient fine-tuning of large language models with zero-initialized attention,” inThe Twelfth International Conference on Learning Representations, 2024. IEEE JOURNAL SUBMISSION, VOL. XX, NO. XX, MONTH 2026 12

2024

[21] [21]

Training language models to follow instructions with human feedback,

L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Rayet al., “Training language models to follow instructions with human feedback,”Advances in neural information processing systems, vol. 35, pp. 27 730–27 744, 2022

2022

[22] [22]

Rrhf: Rank responses to align language models with human feedback,

H. Yuan, Z. Yuan, C. Tan, W. Wang, S. Huang, and F. Huang, “Rrhf: Rank responses to align language models with human feedback,”Ad- vances in Neural Information Processing Systems, vol. 36, pp. 10 935– 10 950, 2023

2023

[23] [23]

Orpo: Monolithic preference optimiza- tion without reference model,

J. Hong, N. Lee, and J. Thorne, “Orpo: Monolithic preference optimiza- tion without reference model,” inProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024, pp. 11 170– 11 189

2024

[24] [24]

Richtungsfelder und fernparallelismus in n-dimensionalen mannigfaltigkeiten,

E. Stiefel, “Richtungsfelder und fernparallelismus in n-dimensionalen mannigfaltigkeiten,” Ph.D. dissertation, ETH Zurich, 1935

1935

[25] [25]

Correctness assessment of code generated by large language models using internal representations,

T.-D. Bui, T. T. Vu, T.-T. Nguyen, S. Nguyen, and H. D. V o, “Correctness assessment of code generated by large language models using internal representations,”arXiv preprint arXiv:2501.12934, 2025

work page arXiv 2025

[26] [26]

Towards understanding code readability and its impact on design quality,

U. A. Mannan, I. Ahmed, and A. Sarma, “Towards understanding code readability and its impact on design quality,” inProceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering, 2018, pp. 18–21

2018

[27] [27]

Extending activa- tion steering to broad skills and multiple behaviours,

T. van der Weij, M. Poesio, and N. Schoots, “Extending activa- tion steering to broad skills and multiple behaviours,”arXiv preprint arXiv:2403.05767, 2024

work page arXiv 2024

[28] [28]

Program Synthesis with Large Language Models

J. Austin, A. Odena, M. Nye, M. Bosma, H. Michalewski, D. Dohan, E. Jiang, C. Cai, M. Terry, Q. Leet al., “Program synthesis with large language models,”arXiv preprint arXiv:2108.07732, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[29] [29]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Biet al., “Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,”arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[30] [30]

Qwen2.5-Coder Technical Report

B. Hui, J. Yang, Z. Cui, J. Yang, D. Liu, L. Zhang, T. Liu, J. Zhang, B. Yu, K. Luet al., “Qwen2. 5-coder technical report,”arXiv preprint arXiv:2409.12186, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[31] [31]

Code Llama: Open Foundation Models for Code

B. Roziere, J. Gehring, F. Gloeckle, S. Sootla, I. Gat, X. E. Tan, Y . Adi, J. Liu, R. Sauvestre, T. Remezet al., “Code llama: Open foundation models for code,”arXiv preprint arXiv:2308.12950, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023