ToxiREX: A Dataset on Toxic REasoning in ConteXt

Ilia Markov; Piek Vossen; Stefan F. Schouten

arxiv: 2606.27981 · v1 · pith:GWMSS7S2new · submitted 2026-06-26 · 💻 cs.CL

Pith reviewed 2026-06-29 04:30 UTC · model grok-4.3

keywords: toxicity detection, contextual reasoning, implicit toxicity, multilingual dataset, reddit threads, structured annotations, conversational context

The pith

ToxiREX supplies a multilingual dataset of Reddit threads annotated for implicit toxicity through a structured reasoning schema.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ToxiREX as a collection of comment threads drawn from event-related posts in six languages, each annotated with structured characterizations of implied toxicity. The annotations follow a schema that explains context-dependent and implicit toxicity while allowing mappings to existing taxonomies. A large portion of the data receives LLM labels for training and a smaller set receives native-speaker labels for testing, with baselines showing performance above random but with clear remaining challenges. This construction matters because prior toxicity resources typically omit conversational context and implicit meanings that affect real-world interpretation.

Core claim

ToxiREX is the first dataset to simultaneously incorporate multiple languages, conversational context, and implicit toxicity, while using the toxic reasoning schema for rich, structured annotations. The resource contains 125 thousand LLM-annotated comments for training and just under three thousand native-speaker-annotated comments for testing, drawn from threads linked to major events and preprocessed to preserve context. Models prompted or fine-tuned on the data exceed random baselines on hierarchical schema predictions, yet leave substantial room for improvement.

What carries the argument

The toxic reasoning schema that produces structured characterizations of what comments imply in their conversational context.

If this is right

Structured schema predictions allow models to output explanations of toxicity rather than binary flags alone.
Context-preserving thread preprocessing supports training that respects conversational dependencies.
Annotation disagreements in the test set frequently represent valid alternative readings rather than label noise.
Event-linked collection enables targeted study of toxicity patterns tied to specific real-world situations.
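
As a rough illustration of what context-preserving thread handling can look like, the sketch below rebuilds the chain of ancestor comments for one target comment. The record fields (`id`, `parent_id`, `body`), the function name `ancestor_chain`, and the `max_depth` cap are all assumptions made for this example; the paper's actual preprocessing rules are not given on this page.

```python
from typing import Dict, List, Optional

# Hypothetical comment record: "id", "parent_id" (None for a top-level comment), "body".
Comment = Dict[str, Optional[str]]


def ancestor_chain(comments: List[Comment], target_id: str, max_depth: int = 10) -> List[Comment]:
    """Return the target comment preceded by its ancestors, oldest first.

    max_depth is an illustrative cap on how many comments are kept,
    counted from the target upwards.
    """
    by_id = {c["id"]: c for c in comments}
    chain: List[Comment] = []
    current = by_id.get(target_id)
    while current is not None and len(chain) < max_depth:
        chain.append(current)
        parent_id = current["parent_id"]
        current = by_id.get(parent_id) if parent_id is not None else None
    chain.reverse()  # oldest ancestor first, target last
    return chain


# Toy thread, invented for illustration only.
thread = [
    {"id": "a", "parent_id": None, "body": "Top-level comment"},
    {"id": "b", "parent_id": "a", "body": "Reply to a"},
    {"id": "c", "parent_id": "b", "body": "Reply to b"},
    {"id": "d", "parent_id": "a", "body": "Another reply to a"},
]
context = ancestor_chain(thread, "c")
print([c["id"] for c in context])  # ['a', 'b', 'c']
```

Keeping the target comment last means a model sees the conversation in reading order, which is one plausible way to respect the conversational dependencies mentioned above.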

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Models trained on this data could support moderation systems that surface the implied reasoning behind flagged content.
The multilingual construction may help test whether schema-based labels transfer across languages with different cultural norms around toxicity.
Native-speaker validation of LLM labels points to a general method for auditing automated annotations on subjective tasks.

Load-bearing premise

The toxic reasoning schema captures what comments imply in context accurately and without systematic bias, so that both LLM and native-speaker labels remain faithful.

What would settle it

A controlled comparison in which native speakers produce characterizations that diverge from the schema on the majority of test items in ways that cannot be explained as defensible alternative interpretations.

Original abstract

We introduce a new, contextual, multilingual dataset called ToxiREX: Toxic REasoning in ConteXt. The dataset consists of threads of Reddit comments and structured characterizations of what the comments imply, following a systematic toxic reasoning schema developed in a previous paper. Using the schema allows us to capture and explain implicit and context-dependent toxicity, while supporting mappings to existing toxicity taxonomies. The dataset includes comments in six languages (English, Arabic, Turkish, Spanish, German, and Dutch), collected from posts connected to specific major events (e.g. the 2023 Turkey earthquakes; the Russian invasion of Ukraine). We describe the context-preserving preprocessing of the threads. We create a training set of 125 thousand comments which is annotated by a commercially available LLM, and a test set of just under three thousand comments that is annotated by native speakers. We show that apparent disagreements in the test set annotations often reflect defensible alternative interpretations rather than noise. Finally, we provide baseline results by prompting and fine-tuning language models. To produce these results, we develop evaluation strategies for our hierarchical, schema-based predictions. While models perform better than random, there remains a lot of room for improvement, showing the task to be challenging. ToxiREX is the first dataset to simultaneously incorporate multiple languages, conversational context, and implicit toxicity, while using the toxic reasoning schema for rich, structured annotations. Dataset available at: https://github.com/cltl/toxirex

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ToxiREX is a clean dataset release that combines multilingual Reddit threads with a prior toxic-reasoning schema, but the small human test set and LLM-heavy training labels are the main practical limits.

The main thing to know is that this paper releases ToxiREX, a collection of Reddit comment threads in six languages annotated for what the comments imply using an existing toxic reasoning schema. The threads come from event-linked posts, context is kept during preprocessing, the 125k training comments are labeled by a commercial LLM, and the roughly 3k test comments are labeled by native speakers. They also run prompting and fine-tuning baselines and note that test-set disagreements often look like defensible alternative readings.

What stands out is the deliberate focus on conversational context and implicit toxicity in a multilingual setting, plus the schema's ability to map onto other taxonomies. The event-based sampling and the decision to surface interpretive differences rather than treat all disagreement as noise are practical choices that make the resource more usable than many toxicity datasets.

The soft spots are straightforward. The human-labeled test set is modest in size, which limits how firmly one can treat it as a stable benchmark. Most of the scale comes from LLM labels, so any systematic quirks in how that model applies the schema will carry through to the training data. The baseline results show the task is hard, but the paper would be stronger with more concrete numbers on the hierarchical evaluation metrics and on how well the schema actually reduces noise compared with simpler toxicity labels.

This is for people working on context-aware toxicity detection or content moderation tools that need to handle multiple languages and implicit cases. A reader who needs a ready-made resource with structured annotations will get direct value from the released data.

I would send it to peer review. The construction is described clearly enough to assess, the data is public, and the contribution is a standard but useful step in dataset work.

Referee Report

2 major / 2 minor

Summary. The paper introduces ToxiREX, a multilingual dataset of Reddit comment threads from event-related posts, annotated with a prior toxic reasoning schema to capture implicit and context-dependent toxicity across six languages. It includes a 125k-comment training split labeled by a commercial LLM and a ~3k-comment test split labeled by native speakers, along with preprocessing to preserve conversational context, an analysis of annotation disagreements as often defensible alternatives, and baseline results from prompting and fine-tuning models using custom evaluation strategies for the hierarchical schema-based predictions.

Significance. If the annotations hold, the dataset would provide a useful resource for research on nuanced, context-aware toxicity detection in multilingual conversational settings by supplying structured reasoning annotations that map to existing taxonomies, going beyond binary labels; the reported baselines establish that the task remains challenging and the disagreement analysis supports annotation quality.

major comments (2)

[Dataset construction] Dataset construction section: the context-preserving preprocessing of threads is described at a high level without specific quantitative criteria (e.g., maximum thread length, selection rules for comments tied to events like the 2023 Turkey earthquakes), which is load-bearing for reproducibility of the 125k training set and for verifying that context is consistently retained.
[Annotation process and evaluation] Annotation and evaluation sections: while the abstract notes that test-set disagreements often reflect defensible alternatives, the manuscript lacks reported inter-annotator agreement statistics or quantitative breakdown of disagreement types for the native-speaker test set, undermining assessment of whether the ~3k test annotations reliably support the baseline comparisons.
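
To make the request for agreement statistics concrete, here is a minimal sketch of Cohen's kappa for two annotators assigning one flat label per comment. This is an assumption about what such a statistic could look like, not the paper's method: the page does not say which agreement measure the authors would use, and the hierarchical schema would likely need a per-level or otherwise adapted measure. The label names are invented.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four comments.
annotator_1 = ["toxic", "toxic", "none", "none"]
annotator_2 = ["toxic", "none", "none", "none"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.5
```

A chance-corrected score like this would complement, not replace, the qualitative reading of disagreements as defensible alternative interpretations.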

minor comments (2)

[Introduction] The claim that ToxiREX is 'the first' to combine multiple languages, context, implicit toxicity, and the schema should include explicit comparison to prior datasets in a related-work section to strengthen the positioning.
[Experiments] Baseline results would benefit from a table reporting exact metrics (e.g., precision/recall per schema category) rather than the high-level statement that models perform 'better than random' with 'room for improvement'.
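
The per-category table requested above could be computed along the following lines. This is a sketch under stated assumptions: each comment is treated as carrying a set of category labels, and the category names (`implicit`, `explicit`) and the function name `per_category_pr` are placeholders rather than the schema's real categories or the paper's evaluation code.

```python
from typing import Dict, List, Set, Tuple


def per_category_pr(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, Tuple[float, float]]:
    """Return {category: (precision, recall)} over aligned gold/predicted label sets."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of items.")
    categories = set().union(*gold, *pred)
    scores: Dict[str, Tuple[float, float]] = {}
    for cat in sorted(categories):
        tp = sum(1 for g, p in zip(gold, pred) if cat in g and cat in p)
        fp = sum(1 for g, p in zip(gold, pred) if cat not in g and cat in p)
        fn = sum(1 for g, p in zip(gold, pred) if cat in g and cat not in p)
        # Report 0.0 when a denominator is empty instead of dividing by zero.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        scores[cat] = (precision, recall)
    return scores


# Hypothetical gold and predicted label sets for four comments.
gold = [{"implicit"}, {"explicit"}, {"implicit", "explicit"}, set()]
pred = [{"implicit"}, {"implicit"}, {"explicit"}, set()]
scores = per_category_pr(gold, pred)
print(scores)  # {'explicit': (1.0, 0.5), 'implicit': (0.5, 0.5)}
```

For a hierarchical schema, the same computation could be repeated at each level of the hierarchy, though how levels should be weighted is a design decision this sketch leaves open.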

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and positive overall assessment. We address each major comment point by point below.

Referee: [Dataset construction] Dataset construction section: the context-preserving preprocessing of threads is described at a high level without specific quantitative criteria (e.g., maximum thread length, selection rules for comments tied to events like the 2023 Turkey earthquakes), which is load-bearing for reproducibility of the 125k training set and for verifying that context is consistently retained.

Authors: We agree that the current description is at a high level and that explicit quantitative criteria would strengthen reproducibility. In the revised manuscript we will add the specific rules used, including maximum thread length, minimum comments per thread, and the precise selection and filtering criteria applied to event-related posts.
Revision: yes

Referee: [Annotation process and evaluation] Annotation and evaluation sections: while the abstract notes that test-set disagreements often reflect defensible alternatives, the manuscript lacks reported inter-annotator agreement statistics or quantitative breakdown of disagreement types for the native-speaker test set, undermining assessment of whether the ~3k test annotations reliably support the baseline comparisons.

Authors: The manuscript already contains a qualitative analysis demonstrating that many disagreements represent defensible alternative interpretations. We acknowledge, however, that quantitative inter-annotator agreement statistics and a systematic breakdown of disagreement categories are not reported. We will incorporate these metrics and a quantitative summary in the revised version to the extent the annotation data permit.
Revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical dataset paper with no derivations

The paper constructs and releases a new multilingual dataset of Reddit threads annotated according to a toxic-reasoning schema introduced in prior work. No equations, fitted parameters, predictions, or derivations appear anywhere in the described pipeline; the central contribution is the collection, annotation (LLM for training split, native speakers for test split), and baseline evaluation of the dataset itself. Self-citation of the schema exists but is not load-bearing for any claim that reduces to the authors' own outputs by construction. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper introduces no free parameters or invented entities. It applies an existing schema to new data and relies on one domain assumption about the schema's validity.

axioms (1)

Domain assumption: The toxic reasoning schema developed in a previous paper accurately captures implicit and context-dependent toxicity across languages and events.
Rationale: All annotations and evaluations rest on this schema; the abstract invokes it as the basis for structured characterizations.
