Forecasting With LLMs: Improved Generalization Through Feature Steering

Bradford Levy; Humzah Merchant

arxiv: 2606.27199 · v1 · pith:G6LNC2HTnew · submitted 2026-06-25 · 💻 cs.CL · cs.LG

Pith reviewed 2026-06-26 04:39 UTC · model grok-4.3

keywords: LLM interpretability, sparse autoencoders, forecasting, feature steering, look-ahead bias, time-aware reasoning, generalization

The pith

Amplifying time-awareness features in LLMs reduces look-ahead bias while preserving general reasoning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies sparse autoencoders to examine LLM internal states on forecasting tasks and identifies features tied to time-aware reasoning and look-ahead-biased reasoning. When these time-awareness features are amplified during application to a different domain, look-ahead bias on forecasting prompts decreases substantially and general reasoning stays intact. Steering the look-ahead-bias features shows no such effect. A reader would care because this offers an interpretable way to improve how models generalize from historical data to future predictions.

Core claim

The authors use sparse autoencoders to find features associated with time-aware reasoning and look-ahead-biased reasoning in LLMs on forecasting tasks. Intervening by amplifying the time-awareness features in an entirely different domain reduces look-ahead bias on forecasting prompts while preserving general reasoning performance. In contrast, steering the candidate look-ahead-bias features does not produce an effect. This suggests that interpretable temporal features can be used to causally shift LLMs toward more historically grounded reasoning.

What carries the argument

Sparse autoencoders that identify and allow intervention on features for time-aware reasoning versus look-ahead-biased reasoning within LLM activations.
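
A minimal sketch of how an SAE feature can be read and amplified, assuming the encoder/decoder form quoted in the figure text on this page (z = ReLU(W_enc x + b_enc), x̂ = W_dec z + b_dec). The steering rule (adding alpha times a feature's decoder column to the activation), the toy weights, and all function names are illustrative assumptions; this page does not say how the authors implemented the intervention.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def sae_encode(x, W_enc, b_enc):
    """Feature activations z = ReLU(W_enc x + b_enc)."""
    return [max(0.0, h + b) for h, b in zip(matvec(W_enc, x), b_enc)]

def sae_decode(z, W_dec, b_dec):
    """Reconstruction x_hat = W_dec z + b_dec."""
    return [h + b for h, b in zip(matvec(W_dec, z), b_dec)]

def steer(x, W_dec, feature_idx, alpha):
    """Assumed steering rule: x + alpha * (decoder column of the feature)."""
    direction = [row[feature_idx] for row in W_dec]
    return [a + alpha * d for a, d in zip(x, direction)]

# Toy weights (hypothetical): 2-dimensional activation, 3 SAE features.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B_ENC = [0.0, 0.0, -1.0]
W_DEC = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
B_DEC = [0.0, 0.0]

x = [0.2, 0.3]
z_before = sae_encode(x, W_ENC, B_ENC)
x_hat = sae_decode(z_before, W_DEC, B_DEC)
x_steered = steer(x, W_DEC, feature_idx=2, alpha=1.0)
z_after = sae_encode(x_steered, W_ENC, B_ENC)
print("feature 2 before:", z_before[2], "after:", z_after[2])
```

In this toy setup feature 2 is inactive before steering and active afterward. In a real model the same addition would presumably be applied to a chosen layer's hidden state during the forward pass, with alpha tuned on held-out prompts.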

If this is right

Amplifying time-awareness features reduces look-ahead bias on forecasting prompts.
General reasoning performance is preserved after the intervention.
Steering look-ahead-bias features does not reduce the bias.
The effect transfers when applied to an entirely different domain.
Interpretable features enable causal shifts in LLM reasoning toward historical grounding.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If feature steering works for temporal bias, it may apply to other biases like factual or logical ones by targeting their corresponding features.
Models could be made more reliable for real-world forecasting applications by routinely amplifying such time-awareness features.
The asymmetry in effects suggests look-ahead bias may arise from more distributed mechanisms than time-awareness.
Further tests could check if the same features work across different model architectures.

Load-bearing premise

The features extracted by sparse autoencoders causally correspond to time-aware versus look-ahead-biased reasoning and interventions on them transfer across domains.

What would settle it

If amplifying the time-awareness features fails to reduce look-ahead bias or harms general reasoning on the new domain, the central claim would not hold.
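
One way such a check could be scored is sketched below. It assumes each forecasting prompt yields a binary flag (1 if the response relied on post-cutoff knowledge, 0 otherwise) and compares the look-ahead-bias rate before and after steering. The flag definition, the pooled two-proportion z-statistic, and the toy counts are assumptions for illustration; this page does not report which metric or test the paper uses.

```python
import math

def rate_with_se(flags):
    """Return (rate, standard error) for a list of 0/1 flags."""
    n = len(flags)
    p = sum(flags) / n
    return p, math.sqrt(p * (1.0 - p) / n)

def two_proportion_z(flags_a, flags_b):
    """Pooled z-statistic for the difference in rates (a minus b)."""
    n_a, n_b = len(flags_a), len(flags_b)
    p_a, p_b = sum(flags_a) / n_a, sum(flags_b) / n_b
    pooled = (sum(flags_a) + sum(flags_b)) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        raise ValueError("z-statistic undefined when all flags are identical")
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    return (p_a - p_b) / se

# Hypothetical outcomes: 100 prompts per condition.
baseline = [1] * 30 + [0] * 70   # unsteered model
steered = [1] * 10 + [0] * 90    # time-awareness features amplified

p_base, se_base = rate_with_se(baseline)
p_steer, se_steer = rate_with_se(steered)
z = two_proportion_z(baseline, steered)
print(f"baseline {p_base:.2f} ± {se_base:.3f}, steered {p_steer:.2f} ± {se_steer:.3f}, z = {z:.2f}")
```

The same two functions could be run on a general-reasoning benchmark (flag = answer correct) to check that accuracy does not drop under the intervention.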

Figures

Figures reproduced from arXiv: 2606.27199 by Bradford Levy, Humzah Merchant.

**Figure 1.** Amplifying time-awareness in models reduces reliance on knowledge from after an inference-time specified knowledge cutoff while maintaining utility. Error bars denote ±1 SE.

**Figure 2.** Overview of method. Sparse autoencoders (SAEs) decompose dense transformer activations into a much larger set of sparse, learned features. Given a hidden activation $x$ from a language model layer, an SAE produces feature activations $z$ and reconstructs the original activation: $z = \mathrm{ReLU}(W_{\mathrm{enc}} x + b_{\mathrm{enc}})$, $\hat{x} = W_{\mathrm{dec}} z + b_{\mathrm{dec}}$. The model is trained to preserve the original activation while allowing only a small number of features to activate at once.

**Figure 3.** Knowledge of M&A activity and look-ahead bias vary independently across model families and do not have positive relationships even within all families. Dot size ∝ log10 number of parameters; error bars denote ±1 SE. Plot axes: memorization (mean target-token logprob) and LAB rate (%). Legend: GPT-OSS, Gemma 3, Qwen 3.5, Llama 3.1/2.

**Figure 4.** Additional time-aware features as
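
The SAE text accompanying the figures gives the encoder and decoder and says the model is trained to preserve the original activation with only a few features active, but it does not state the loss. As a hedged sketch, one commonly used objective with that property combines squared reconstruction error with an L1 sparsity penalty. The penalty form and the coefficient $\lambda$ are assumptions here, not something this page confirms for the paper:

```latex
\mathcal{L}(x) = \lVert x - \hat{x} \rVert_2^2 + \lambda \lVert z \rVert_1,
\qquad
z = \mathrm{ReLU}(W_{\mathrm{enc}} x + b_{\mathrm{enc}}),
\quad
\hat{x} = W_{\mathrm{dec}} z + b_{\mathrm{dec}}
```

A larger $\lambda$ pushes more entries of $z$ to zero at the cost of reconstruction fidelity; other sparsity mechanisms (for example, keeping only the top-k activations) would serve the same role.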

Original abstract

Successful forecasting involves identifying patterns between historical and future states of the world which generalize to future observations. We apply LLMs to a variety of forecasting tasks and inspect their internal states using sparse autoencoders to understand whether they appear to rely on time-specific pieces of knowledge versus generalizable patterns. Our analyses identify features associated with both time-aware reasoning and look-ahead-biased reasoning. We then apply the LLMs to an entirely different domain and intervene on these features. We find that amplifying time-awareness features substantially reduces look-ahead bias on forecasting prompts while preserving general reasoning performance. In contrast, steering the candidate look-ahead-bias features does not produce an effect. These results suggest that interpretable temporal features can be used to causally shift LLMs toward more historically grounded reasoning.
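
The abstract says features associated with time-aware and look-ahead-biased reasoning were identified, without describing the selection procedure. The sketch below shows one simple contrastive approach, stated as an assumption: rank SAE features by the difference in mean activation between two prompt sets. The function names and the toy activations are hypothetical, and the authors may have used a different criterion.

```python
def mean_activation(feature_rows):
    """Column-wise mean over per-prompt feature activation vectors."""
    n = len(feature_rows)
    return [sum(col) / n for col in zip(*feature_rows)]

def rank_features_by_contrast(acts_a, acts_b, top_k=3):
    """Return (feature index, mean difference) pairs where set A exceeds set B most."""
    mean_a = mean_activation(acts_a)
    mean_b = mean_activation(acts_b)
    diffs = [(a - b, i) for i, (a, b) in enumerate(zip(mean_a, mean_b))]
    diffs.sort(reverse=True)  # largest positive difference first
    return [(i, d) for d, i in diffs[:top_k]]

# Hypothetical SAE activations: rows are prompts, columns are features.
acts_time_aware = [[0.9, 0.0, 0.1], [0.7, 0.1, 0.0]]
acts_biased = [[0.1, 0.8, 0.1], [0.0, 0.6, 0.0]]

time_candidates = rank_features_by_contrast(acts_time_aware, acts_biased, top_k=1)
bias_candidates = rank_features_by_contrast(acts_biased, acts_time_aware, top_k=1)
print("time-awareness candidate:", time_candidates)
print("look-ahead-bias candidate:", bias_candidates)
```

A ranking like this only establishes correlation with the prompt sets. Whether a candidate feature matters causally is what the steering intervention is meant to test.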

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper finds that amplifying SAE-identified time-awareness features cuts look-ahead bias in LLM forecasting while bias features do not, with the effect holding in a new domain.

The main point is that this work identifies features in LLMs linked to time-aware reasoning versus look-ahead bias using sparse autoencoders, then shows that boosting the time features reduces bias on forecasting tasks in a held-out domain while the bias features produce no change and general performance stays stable.

It does a solid job applying existing SAE techniques to a concrete forecasting problem and getting a clean dissociation in the intervention results. That selective effect gives some reason to think the features track something real rather than random directions in activation space.

The soft spots are all in the execution details. The abstract supplies no numbers on dataset size, number of features extracted, how the interventions were implemented, what statistical tests were used, or how they confirmed the features actually correspond to the labeled concepts. Without those, it is difficult to judge whether the dissociation is robust or whether the transfer claim holds up under closer inspection. The paper also does not address whether the same features appear across different model scales or architectures.

This is aimed at people working on mechanistic interpretability and bias reduction in applied LLM settings. A reader already following SAE work on reasoning patterns would get the most out of it and could use the intervention approach as a starting point.

I would send it to peer review once the methods and results sections are filled in, because the core intervention result is the sort of thing that can be replicated and extended even if the current version is light on evidence.

Referee Report

0 major / 2 minor

Summary. The paper applies LLMs to forecasting tasks and uses sparse autoencoders to identify internal features associated with time-aware reasoning versus look-ahead-biased reasoning. It then performs interventions on these features in a held-out domain, reporting that amplification of time-awareness features reduces look-ahead bias on forecasting prompts while preserving general reasoning performance, whereas steering the candidate look-ahead-bias features produces no effect.

Significance. If the empirical dissociation holds under rigorous controls, the work supplies a concrete demonstration that SAE-derived features can be causally manipulated to shift LLM behavior toward more historically grounded forecasting without collateral damage to general capabilities. This is a useful data point for mechanistic interpretability research and for practical feature-steering techniques.

minor comments (2)

[Abstract] The summary of results would be strengthened by a brief mention of the number of forecasting tasks, model scale, or quantitative effect sizes (e.g., accuracy deltas or bias metrics) so readers can immediately gauge the magnitude of the reported dissociation.
[Methods] The manuscript should clarify in the methods section how the held-out domain was chosen and whether any domain-specific adaptation of the SAE or steering vectors was performed.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the work and for recommending minor revision. The referee's description of the paper is accurate. No specific major comments were listed in the report.

Circularity Check

0 steps flagged

No significant circularity identified

The paper's central result derives from empirical SAE feature extraction on internal activations followed by targeted steering interventions on a held-out domain, producing a selective behavioral dissociation (time-awareness features reduce look-ahead bias; candidate bias features do not). No equations, parameter fits presented as predictions, self-definitional loops, or load-bearing self-citation chains appear in the described protocol. The claims rest on observable intervention outcomes rather than reducing to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Abstract provides no explicit free parameters, mathematical axioms, or derivations; the work rests on the empirical identification and causal efficacy of two invented feature types via sparse autoencoders.

invented entities (2)

time-awareness features (no independent evidence)
purpose: Represent generalizable, historically grounded temporal reasoning patterns in LLM activations
Identified via sparse autoencoders; intervention on them produced the reported effect.

look-ahead-bias features (no independent evidence)
purpose: Represent biased reasoning that inappropriately incorporates future information
Identified via sparse autoencoders; intervention on them produced no reported effect.

pith-pipeline@v0.9.1-grok · 5650 in / 1216 out tokens · 50743 ms · 2026-06-26T04:39:02.777698+00:00 · methodology
