Graph-Native Reinforcement Learning Enables Traceable Scientific Hypothesis Generation through Conceptual Recombination

Markus J. Buehler; Shashwat Sourav; Subhadeep Pal; Tirthankar Ghosal

arXiv: 2607.00924 · v1 · pith:EJ4DZLSBnew · submitted 2026-07-01 · 💻 cs.AI · cond-mat.mtrl-sci · cs.CL · cs.LG

Subhadeep Pal, Shashwat Sourav, Tirthankar Ghosal, Markus J. Buehler

Pith reviewed 2026-07-02 12:25 UTC · model grok-4.3

classification 💻 cs.AI · cond-mat.mtrl-sci · cs.CL · cs.LG

keywords graph-native reinforcement learning · scientific hypothesis generation · materials science · reasoning traceability · conceptual recombination · Group Relative Policy Optimization · Graph-PRefLexOR · semantic diversity

The pith

Graph-PRefLexOR organizes reasoning into explicit phases via graph-native reinforcement learning, producing more traceable hypotheses than base models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Graph-PRefLexOR, a family of models fine-tuned with Group Relative Policy Optimization to structure scientific reasoning into four explicit phases: mechanism exploration, graph construction, pattern extraction, and hypothesis synthesis. This links neural language generation directly to symbolic relational graphs so that causal connections can be built, inspected, and reused during open-ended materials design tasks. Tested on 100 questions drawn from materials science and mechanics literature, the approach delivers 40-65 percent gains over base models, with the biggest lifts in reasoning traceability and roughly two to three times greater semantic diversity. Embedding and hidden-state analyses confirm tighter alignment between the structured steps and the final answers, while test-time graph expansion shows that extra compute mainly drives long-range conceptual recombination inside a bounded semantic space.
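The test-time expansion loop is described in the Figure 13 caption as merging each small graph into a growing memory graph with embedding-based de-duplication, then picking concepts (for example low-degree leaves) to expand next. A rough sketch of that merge step follows. It assumes cosine similarity with a hypothetical cutoff of 0.9 and toy 2-d embeddings; `MemoryGraph` and its methods are illustrative names, not the authors' code.

```python
import math
from dataclasses import dataclass, field

def cosine_similarity(u, v):
    # Assumes both vectors are non-zero.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

@dataclass
class MemoryGraph:
    threshold: float = 0.9                           # assumed de-duplication cutoff
    embeddings: dict = field(default_factory=dict)   # concept -> embedding vector
    edges: set = field(default_factory=set)          # (source, relation, target)

    def canonical(self, concept, vec):
        """Map a concept onto an existing near-duplicate, or register it as new."""
        for known, known_vec in self.embeddings.items():
            if cosine_similarity(vec, known_vec) >= self.threshold:
                return known
        self.embeddings[concept] = vec
        return concept

    def merge(self, triples, vectors):
        """Merge one small graph (a list of triples) into the memory graph."""
        for src, rel, dst in triples:
            s = self.canonical(src, vectors[src])
            d = self.canonical(dst, vectors[dst])
            self.edges.add((s, rel, d))

    def degree(self, concept):
        return sum(concept in (s, d) for s, _, d in self.edges)

# Toy data: "spider silk" embeds close to "silk", so the two are merged.
memory = MemoryGraph()
memory.merge([("silk", "contains", "beta-sheet")],
             {"silk": [1.0, 0.0], "beta-sheet": [0.0, 1.0]})
memory.merge([("spider silk", "exhibits", "toughness")],
             {"spider silk": [0.99, 0.05], "toughness": [0.7, -0.7]})

# Low-degree leaves are candidates for a frontier-style expansion.
leaves = sorted(c for c in memory.embeddings if memory.degree(c) <= 1)
```

In this toy run the memory graph ends with three concepts rather than four, which is the effect de-duplication is meant to have: new questions add edges to existing nodes instead of spawning near-synonyms.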

Core claim

Graph-PRefLexOR links neural language generation with symbolic relational structure by organizing reasoning into explicit phases for mechanism exploration, graph construction, pattern extraction, and hypothesis synthesis. This design enables causal connections to be constructed, inspected, and reused, resulting in 40-65% improvements over base models on 100 open-ended materials questions, with the largest gains in reasoning traceability, broader semantic exploration, and stronger alignment between intermediate reasoning and final answers.
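To make the four-phase layout concrete, here is a minimal Python sketch that splits a response into its phases. The tag names `<brainstorm>`, `<graph>`, `<patterns>`, and `<synthesis>` follow the figure captions; the helper `split_phases`, the exact closing-tag format, and the sample text are hypothetical illustrations rather than the authors' code.

```python
import re

# Phase tags as named in the figure captions; the precise output format is an assumption.
PHASES = ("brainstorm", "graph", "patterns", "synthesis")

def split_phases(response: str) -> dict[str, str]:
    """Return the text of each reasoning phase found in `response`."""
    phases = {}
    for name in PHASES:
        # Non-greedy match so each phase stops at its own closing tag.
        match = re.search(rf"<{name}>(.*?)</{name}>", response, flags=re.DOTALL)
        if match:
            phases[name] = match.group(1).strip()
    return phases

# Made-up response used only to exercise the parser.
sample = (
    "<think><brainstorm>silk beta-sheets dissipate energy</brainstorm>"
    "<graph>beta-sheet -> toughness</graph>"
    "<patterns>nanoconfinement raises toughness</patterns>"
    "<synthesis>confine crystals below a critical size</synthesis></think>"
    "Final answer text."
)
parsed = split_phases(sample)
missing = [name for name in PHASES if name not in parsed]
```

A check like `missing` is the kind of structural test that makes a trace inspectable: a response lacking a phase can be flagged before anyone reads its content.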

What carries the argument

Graph-PRefLexOR, the graph-native reasoning model fine-tuned with Group Relative Policy Optimization (GRPO) to enforce phased reasoning that connects language outputs to symbolic graphs for inspection and recombination.
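The group-relative part of GRPO can be illustrated with a short sketch. In the standard formulation, several completions are sampled for the same prompt and each reward is normalized by that group's mean and standard deviation. The reward values below are made up, and the function name, the use of the population standard deviation, and the `eps` guard are assumptions rather than the paper's implementation.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical composite rewards for four completions of the same prompt.
advantages = group_relative_advantages([0.2, 0.5, 0.5, 0.8])

# A group with identical rewards carries no learning signal.
flat = group_relative_advantages([1.0, 1.0])
```

The second call shows why within-group reward spread matters as a training diagnostic: when every completion scores the same, all advantages are zero.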

If this is right

Reasoning steps become inspectable, so users can trace how intermediate graphs support or contradict the final hypothesis.
Semantic diversity rises roughly two- to threefold, allowing the model to explore a wider set of conceptual combinations within the same domain.
Additional test-time compute increases long-range recombination rather than simply widening the covered semantic space.
Layer-wise hidden-state analyses show tighter coupling between the phased reasoning and the generated answer.
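One way to picture a layer-wise comparison of this kind is sketched below: for each layer, take one mean hidden-state vector for the reasoning span and one for the answer span, then measure how far apart they are. The paper's exact divergence measure is not given in this summary, so cosine distance, the function names, and the toy 2-d vectors are all assumptions.

```python
import math

def cosine_distance(u: list[float], v: list[float]) -> float:
    # Assumes both vectors are non-zero.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def layerwise_divergence(reasoning, answer):
    """Each argument holds one mean hidden-state vector per layer."""
    return [cosine_distance(r, a) for r, a in zip(reasoning, answer)]

# Toy "hidden states" for three layers (hypothetical values).
reasoning_states = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
answer_states = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
divergence = layerwise_divergence(reasoning_states, answer_states)
```

In the toy data the divergence grows with depth; a curve that stays low across layers is what "tighter coupling" would look like under this assumed measure.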

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same phased graph structure could be adapted to hypothesis generation in chemistry or biology where causal mechanisms are also central.
If the graph-construction phase can be made fully automatic from raw text, the method might scale to larger corpora without extra human annotation.
The bounded semantic space implies the model excels at recombining known concepts but may still require external novelty injection to propose truly paradigm-shifting ideas.

Load-bearing premise

That the 100 questions drawn from existing literature are enough to measure scientific validity, and that the reported gains in traceability stem specifically from the graph-native phased structure.

What would settle it

An expert panel rates traceability and scientific validity on the same 100 questions for both Graph-PRefLexOR and base models trained to the same compute budget but without the explicit graph-phased structure; if the gap disappears, the central claim is false.
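One hedged way to operationalize "the gap disappears" is a paired bootstrap over per-question score differences. The sketch below assumes each system receives one expert score per question on a shared scale. The scores, the resample count, and the function name are hypothetical and do not come from the paper.

```python
import random

def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for the mean per-question difference (a - b)."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample questions with replacement, keeping each pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Made-up expert scores for eight questions.
graph_model = [7.5, 8.0, 6.5, 7.0, 8.5, 7.0, 6.0, 8.0]
control = [6.0, 6.5, 6.0, 5.5, 7.0, 6.5, 5.0, 6.5]
low, high = paired_bootstrap_ci(graph_model, control)
```

If the resulting interval contains zero, the data do not show a gap between the two systems; if it sits clearly above zero, the gap survives the control.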

Figures

Figures reproduced from arXiv: 2607.00924 by Markus J. Buehler, Shashwat Sourav, Subhadeep Pal, Tirthankar Ghosal.

**Figure 1.** (a) Scientific discovery often proceeds through iterative hypothesis generation, validation, re-ideation and refinement. (b) Standard LLM responses to scientific queries can be difficult to trace, leading to untraceability, hallucination, or contradiction. (c) Graph-PRefLexOR addresses this limitation by organizing the <think> section into explicit reasoning phases: <brainstorm> for mechanism exploration, …

**Figure 2.** Evaluation of structured reasoning across model scales on open-ended scientific questions (N = 100), assessed using Claude Opus-4.7. Metrics include Reasoning Quality, Intellectual Depth, Reasoning Traceability, and Overall score (0–10). (a) Graph-PRefLexOR-8B vs. Qwen3-8B (with no-thinking variant), (b) Graph-PRefLexOR-3B vs. Llama-3.2-3B-Instruct, and (c) Graph-PRefLexOR-1.7B vs. Qwen3-1.7B (with no-thin…

**Figure 3.** Representative cross-disciplinary hypothesis-generation question used to evaluate Graph-PRefLexOR. The question is derived from Ref. [10] and probes analogical mapping, mechanistic breakdown, and long-horizon adaptive reasoning. … models with reasoning disabled (no-thinking setting). The resulting performance degradation closely mirrors that observed for the Llama baseline, with overall reductions on the ord…

**Figure 4.** Representative Graph-PRefLexOR-8B reasoning to the benchmark question in …

**Figure 5.** Graph and pattern representation extracted from the Graph-PRefLexOR-8B response. (a) Directed graph linking biological immune-system concepts, multi-agent AI components, and the proposed bridging mechanism. (b) Higher-order reasoning patterns extracted from the graph, summarizing the main causal motifs used for hypothesis synthesis. This qualitative comparison motivates the embedding-based analyses that fo…

**Figure 6.** PCA projection of reasoning traces and final answers comparing Graph-PRefLexOR and base models across scales. (a) Graph-PRefLexOR-8B vs. Qwen3-8B reasoning traces, (b) Graph-PRefLexOR-1.7B vs. Qwen3-1.7B reasoning traces, and (c–e) corresponding comparisons of final answers for 8B, 3B, and 1.7B models, respectively. Reasoning traces are decomposed into structured components (<brainstorm>, <graph>, <pattern…

**Figure 7.** PCA projection of directed (a) reasoning, and (b) answer trajectories between Graph-PRefLexOR-8B and Qwen3-8B. For Graph-PRefLexOR, trajectories explicitly follow structured stages (<brainstorm>, <graph>, <patterns>, and <synthesis>), forming coherent, directional transitions in latent space. In contrast, base model trajectories (shown as sequential chunks) remain more localized and less structured. For an…

**Figure 8.** Semantic diversity measured via inter-phase centroid distance for (a) reasoning traces and (b) final answers across model scales. Violin plots show the distribution of sample-level semantic diversity scores for Graph-PRefLexOR and the corresponding base models. Individual points denote responses, horizontal black lines indicate medians, black diamonds denote means, and vertical error bars represent one sta…

**Figure 9.** Semantic backtracking analysis of final answer alignment for Qwen3-8B and Graph-PRefLexOR-8B across 100 open-ended scientific questions. (a) Binary split showing whether each final answer is closest to its own reasoning trace or to the other model’s outputs. (b) Source distribution for Qwen3-8B final answers, which align with its own <think> trace in only 16/100 cases and more often align with Graph-PRefLe…

**Figure 10.** Internal semantic backtracking of Graph-PRefLexOR-8B final answers. (a) Closest structured reasoning stage for each final answer across 100 benchmark questions. (b) Mean cosine similarity between the final answer and each reasoning phase. Final answers align most frequently and most strongly with the <synthesis> stage, indicating that response generation is primarily grounded in the final integrative reas…

**Figure 11.** Layer-wise hidden-state divergence between reasoning and final-answer representations for Qwen3-8B and Graph-PRefLexOR-8B. Qwen3-8B exhibits a larger reasoning-answer separation, with a pronounced increase around layers 7-10 and a final-layer spike. In contrast, Graph-PRefLexOR-8B maintains lower divergence across most layers, indicating a more continuous transition from structured reasoning to final-answ…

**Figure 12.** Backtracking-conditioned layer-wise hidden-state divergence for Qwen3-8B and Graph-PRefLexOR-8B. (a) Qwen3-8B divergence between thinking and final-answer, separated by whether the final answer backtracks to the model’s own thinking trace or to another source. Non-backtracking cases show larger divergence, particularly around layers 7-10 and at the final layer. (b) Graph-PRefLexOR-8B divergence between st…

**Figure 13.** Graph-native ideation loop for test-time graph expansion. At each iteration, the reasoner answers a question, emits a small ontological graph, and merges it into a growing memory graph Gt using embedding-based de-duplication. An expansion strategy then selects concepts or concept pairs from Gt to generate the next question. The four strategies are frontier, which expands low-degree leaves and central hubs…

**Figure 14.** Test-time compute expands a bounded idea space through recombination. Four size-robust metrics are shown as a function of reasoning iteration up to 2,000 iterations for the four expansion strategies. The number of distinct concepts continues to increase (a), whereas the explored embedding volume (b) and maximum distance from the seed (c) saturate within a few hundred iterations, indicating that the semant…

**Figure 15.** Semantic organization and broker concepts in the leap run. (a) Principal-component projection of concept embeddings, with colors indicating greedy-modularity communities and marker size indicating PageRank. The fifteen highest-PageRank concepts are numbered and listed below the map. (b) Broker concepts plotted by degree and betweenness. A small set of high-betweenness concepts mediates most cross-communit…

**Figure 16.** Growth dynamics of the leap run. The final graph is replayed in birth-iteration order, and each panel is evaluated using embedding geometry or mesoscale community structure rather than raw graph distance. (a) New concepts per iteration bin, separated into novel concepts and consolidating in-fill concepts; the black line shows the fraction of novel concepts. (b) Number of greedy-modularity communities and …

**Figure 17.** Statistical novelty of mined connections in the leap run. (a) Relational-motif significance relative to a label-shuffled null model. The ten most over-represented relation-typed two-step motifs reach z ≈ 100–160, far exceeding the ordinary significance threshold (z = 1.96), indicating that the graph follows consistent relational templates rather than random associations. The graph is also more modular tha…

**Figure 18.** ORPO cold start for all three models. Top row: total ORPO loss and its NLL component; bottom row: preference accuracy and reward margin (both in [0, 1]); columns are (a) 1.7B, (b) 3B, (c) 8B, with y shared per row (note the differing ORPO durations, ∼480/480/240 steps). Loss falls and preference accuracy saturates near 1.0 for all backbones; the 1.7B’s much larger reward margin reflects its higher learnin…

**Figure 19.** Graph-GRPO reward for all three models. Top row: total composite reward; bottom row: the six reward components; columns are (a) 1.7B, (b) 3B, (c) 8B (differing GRPO durations, ∼970/1970/1260 steps), with y shared per row for direct comparison. The 8B starts highest and the 3B climbs most, while graph utility (green) is the lowest component at every scale. Rationale for This Reasoning Structure Each of the…

**Figure 20.** Graph-GRPO dynamics across the three models, with each run rescaled to [0, 1] training progress so the durations align. (a) Reasoning-trace length: mean terminated completion length (top) and fraction of completions truncated at the token budget (bottom). (b) Optimization diagnostics: within-group reward standard deviation, i.e. the scale of the group-normalized advantage of Eq. (2) (top), and policy entr…

**Figure 21.** Workflow for constructing the open-ended scientific reasoning benchmark from research papers. For each paper, OpenAI gpt-5.4 with high reasoning effort generates one self-contained, research-level evaluation question. The resulting benchmark contains 100 open-ended questions. Each question is assigned to one of five predefined reasoning categories: causal_multiscale_reasoning, tradeoff_and_non_monotonici…

Original abstract

Accelerating materials discovery requires AI systems that can generate scientifically valid hypotheses through multi-step, domain-grounded reasoning. Standard large language models often produce fluent but weakly traceable responses to open-ended materials design problems, making it difficult to determine whether final answers are supported by coherent intermediate reasoning. We develop Graph-PRefLexOR, a family of graph-native reasoning models fine-tuned with Group Relative Policy Optimization (GRPO) to organize reasoning into explicit phases for mechanism exploration, graph construction, pattern extraction, and hypothesis synthesis. This design links neural language generation with symbolic relational structure, enabling causal connections to be constructed, inspected, and reused. On 100 open-ended questions from materials science and mechanics literature, Graph-PRefLexOR achieves 40-65% improvements over corresponding base models, with the largest gains in reasoning traceability. Embedding analyses show broader semantic exploration and approximately 2-3 times greater semantic diversity than baselines. Semantic backtracking and layer-wise hidden-state analyses further show stronger alignment between structured reasoning and final answers. Finally, test-time graph expansion reveals that additional compute primarily increases long-range conceptual recombination within a bounded semantic space, rather than simply expanding semantic coverage. These results establish graph-native reinforcement learning as a pathway toward interpretable AI systems for scientific hypothesis generation in materials design and other scientific applications.
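Semantic backtracking, as the Figure 9 and Figure 10 captions describe it, asks which candidate text a final answer sits closest to in embedding space. Below is a minimal sketch, assuming the embeddings are already computed and that cosine similarity is the comparison (Figure 10 reports mean cosine similarity). The labels, vectors, and function names are hypothetical.

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    # Assumes both vectors are non-zero.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def closest_source(answer_vec: list[float], sources: dict[str, list[float]]) -> str:
    """Return the label of the source embedding most similar to the answer."""
    return max(sources, key=lambda name: cosine_similarity(answer_vec, sources[name]))

# Toy 3-d embeddings for the four reasoning phases of one response.
sources = {
    "brainstorm": [1.0, 0.0, 0.0],
    "graph": [0.0, 1.0, 0.0],
    "patterns": [0.0, 0.7, 0.7],
    "synthesis": [0.1, 0.2, 0.9],
}
answer = [0.0, 0.1, 1.0]
best = closest_source(answer, sources)
```

Counting how often `best` lands on each label across a benchmark gives the kind of per-source distribution the backtracking figures report.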

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Graph-PRefLexOR tries to make hypothesis generation traceable by adding explicit graph phases to GRPO training, but the 40-65% gains have not been isolated to the graph structure itself.

The paper's core move is to structure reasoning into four phases (mechanism exploration, graph construction, pattern extraction, and hypothesis synthesis), then train with GRPO so the model produces inspectable relational graphs alongside text. On 100 open-ended materials questions it reports 40-65% improvements over base models, with the largest gains in reasoning traceability, and 2-3x greater semantic diversity, plus semantic backtracking and hidden-state checks.

What stands out is the attempt to tie language output to reusable symbolic links rather than just prompting or standard RL. That direction is worth tracking for anyone building AI tools that need to show their work in science.

The main weakness is exactly the one the stress-test flags: no ablations that keep the base model, data, and GRPO fixed while dropping only the graph-construction step. Without those controls it is impossible to know whether the gains come from the graph-native design or from longer context, dataset effects, or GRPO itself. The evaluation set is drawn from existing literature, which also leaves open whether the outputs are genuinely new or just recombining seen patterns. No statistical tests or baseline definitions appear in the abstract, and the full text does not appear to close that gap either.

This is for groups already working on traceable or graph-augmented reasoning in materials or similar domains. A reader who wants concrete evidence that the graph component drives the result will not get it here.

I would send it to peer review only if the authors add the missing ablations and clearer baseline reporting; otherwise the central claim stays under-supported.

Referee Report

3 major / 2 minor

Summary. The paper introduces Graph-PRefLexOR, a family of graph-native reasoning models fine-tuned with Group Relative Policy Optimization (GRPO) that structures reasoning into explicit phases (mechanism exploration, graph construction, pattern extraction, hypothesis synthesis) to link neural generation with symbolic relational structure for traceable hypothesis generation in materials science. On 100 open-ended questions from the literature, it claims 40-65% improvements over base models (largest in traceability), ~2-3x greater semantic diversity, stronger reasoning-answer alignment via semantic backtracking and hidden-state analyses, and that test-time graph expansion increases long-range recombination within a bounded space.

Significance. If the performance gains can be shown to arise specifically from the graph-native phased structure (rather than GRPO, data curation, or context length), the approach would offer a concrete mechanism for improving interpretability and traceability in LLM-based scientific reasoning, with potential applicability beyond materials design.

major comments (3)

[Evaluation / Results] Evaluation section (100-question benchmark): the headline 40-65% gains and traceability improvements are reported without ablations that hold training data, compute budget, and base model fixed while removing only the graph-construction / symbolic-recombination components; comparisons appear limited to 'corresponding base models' without non-graph GRPO or standard SFT controls, so it is impossible to attribute gains to the graph-native structure as claimed.
[Methods] Methods / Experimental setup: no definition is provided for how 'reasoning traceability' was quantified (e.g., the precise metric, inter-annotator protocol, or automated proxy used for the largest reported gains), nor are statistical tests or question-selection criteria described, leaving the central performance claims without visible supporting evidence.
[Semantic analyses] § on semantic analyses: the claims of 'broader semantic exploration' and 'approximately 2-3 times greater semantic diversity' rest on embedding analyses whose construction (distance metric, embedding model, normalization) is not specified, preventing verification that these quantities are independent of the training process itself.
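For concreteness, one possible construction of the diversity score is sketched here. The Figure 8 caption names inter-phase centroid distance as the measure. Everything else (Euclidean distance, no normalization, the toy 2-d embeddings, the function names) is an assumption chosen only to show what a fully specified version would have to pin down.

```python
import math
from itertools import combinations

def centroid(vectors: list[list[float]]) -> list[float]:
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def inter_phase_diversity(phase_embeddings: dict[str, list[list[float]]]) -> float:
    """Mean pairwise Euclidean distance between phase centroids (assumed metric)."""
    centroids = [centroid(vectors) for vectors in phase_embeddings.values()]
    pairs = list(combinations(centroids, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Toy chunk embeddings for three phases of one response.
toy = {
    "brainstorm": [[0.0, 0.0], [2.0, 0.0]],  # centroid (1, 0)
    "graph": [[1.0, 3.0], [1.0, 5.0]],       # centroid (1, 4)
    "synthesis": [[4.0, 0.0], [4.0, 0.0]],   # centroid (4, 0)
}
score = inter_phase_diversity(toy)  # centroid distances 4, 3, 5 -> mean 4.0
```

Swapping Euclidean for cosine distance, or normalizing embeddings first, would change the numbers, which is why the referee asks for these choices to be stated.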

minor comments (2)

[Abstract] Abstract and introduction use 'traceable' and 'interpretable' interchangeably without a crisp operational distinction.
[Figures] Figure captions for the layer-wise hidden-state and test-time expansion plots should explicitly state the number of runs and error bars.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve clarity and empirical rigor.

Referee: [Evaluation / Results] Evaluation section (100-question benchmark): the headline 40-65% gains and traceability improvements are reported without ablations that hold training data, compute budget, and base model fixed while removing only the graph-construction / symbolic-recombination components; comparisons appear limited to 'corresponding base models' without non-graph GRPO or standard SFT controls, so it is impossible to attribute gains to the graph-native structure as claimed.

Authors: We agree that the current comparisons to base models do not fully isolate the graph-construction and symbolic-recombination components from GRPO or data effects. To strengthen attribution, we will add the requested ablations in the revision, holding training data, compute budget, and base model fixed while including non-graph GRPO and standard SFT controls.
revision: yes

Referee: [Methods] Methods / Experimental setup: no definition is provided for how 'reasoning traceability' was quantified (e.g., the precise metric, inter-annotator protocol, or automated proxy used for the largest reported gains), nor are statistical tests or question-selection criteria described, leaving the central performance claims without visible supporting evidence.

Authors: We will expand the Methods section in the revision to define the reasoning traceability metric (including the automated proxy and human validation protocol with inter-annotator agreement), report the statistical tests used, and detail the question-selection criteria from the literature.
revision: yes

Referee: [Semantic analyses] § on semantic analyses: the claims of 'broader semantic exploration' and 'approximately 2-3 times greater semantic diversity' rest on embedding analyses whose construction (distance metric, embedding model, normalization) is not specified, preventing verification that these quantities are independent of the training process itself.

Authors: We will specify the full construction of the embedding analyses in the revision, including the embedding model, distance metric, normalization steps, and controls to confirm independence from the training process.
revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

The abstract and provided text describe an empirical method (Graph-PRefLexOR with GRPO) and report performance gains on an external benchmark of 100 literature questions. No equations, fitted parameters renamed as predictions, self-definitional steps, or load-bearing self-citations are present that would reduce the claimed improvements to quantities defined by the training process itself. The evaluation is presented as an independent test set, making the result self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the phases and GRPO are introduced but not formalized enough to audit.

pith-pipeline@v0.9.1-grok · 5785 in / 1136 out tokens · 26159 ms · 2026-07-02T12:25:03.909489+00:00 · methodology

Reference graph

Works this paper leans on

66 extracted references · 51 canonical work pages · 21 internal anchors

[1]

Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on , pages=

Real-time segmentation of on-line handwritten arabic script , author=. Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on , pages=. 2014 , organization=

2014
[2]

Soft Computing and Pattern Recognition (SoCPaR), 2014 6th International Conference of , pages=

Fast classification of handwritten on-line Arabic characters , author=. Soft Computing and Pattern Recognition (SoCPaR), 2014 6th International Conference of , pages=. 2014 , organization=

2014
[3]

Estimate and Replace: A Novel Approach to Integrating Deep Neural Networks with Existing Applications

Estimate and Replace: A Novel Approach to Integrating Deep Neural Networks with Existing Applications , author=. arXiv preprint arXiv:1804.09028 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[4]

Advanced Intelligent Discovery , author =

In. Advanced Intelligent Discovery , author =. 2025 , pages =. doi:10.1002/aidi.202500006 , abstract =

work page doi:10.1002/aidi.202500006 2025
[5]

, month = may, year =

Buehler, Markus J. , month = may, year =. npj Artificial Intelligence , publisher =. doi:10.1038/s44387-025-00003-z , abstract =

work page doi:10.1038/s44387-025-00003-z
[6]

Yang, An and Li, Anfeng and Yang, Baosong and Zhang, Beichen and Hui, Binyuan and Zheng, Bo and Yu, Bowen and Gao, Chang and Huang, Chengen and Lv, Chenxu and Zheng, Chujie and Liu, Dayiheng and Zhou, Fan and Huang, Fei and Hu, Feng and Ge, Hao and Wei, Haoran and Lin, Huan and Tang, Jialong and Yang, Jian and Tu, Jianhong and Zhang, Jianwei and Yang, Jia...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2505.09388
[7]

Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and Zhang, Haowei and Zhang, Mingchuan and Li, Y. K. and Wu, Y. and Guo, Daya , month = apr, year =. doi:10.48550/arXiv.2402.03300 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2402.03300
[8]

ACM Computing Surveys , author =

Knowledge. ACM Computing Surveys , author =. 2022 , note =. doi:10.1145/3447772 , abstract =

work page doi:10.1145/3447772 2022
[9]

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

Zheng, Lianmin and Chiang, Wei-Lin and Sheng, Ying and Zhuang, Siyuan and Wu, Zhanghao and Zhuang, Yonghao and Lin, Zi and Li, Zhuohan and Li, Dacheng and Xing, Eric P. and Zhang, Hao and Gonzalez, Joseph E. and Stoica, Ion , month = dec, year =. Judging. doi:10.48550/arXiv.2306.05685 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2306.05685
[10]

Vera, Henrique Schechter and Dua, Sahil and Zhang, Biao and Salz, Daniel and Mullins, Ryan and Panyam, Sindhu Raghuram and Smoot, Sara and Naim, Iftekhar and Zou, Joe and Chen, Feiyang and Cer, Daniel and Lisak, Alice and Choi, Min and Gonzalez, Lucas and Sanseviero, Omar and Cameron, Glenn and Ballantyne, Ian and Black, Kat and Chen, Kaifeng and Wang, We...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2509.20354
[11]

WIREs Computational Statistics , author =

Principal component analysis , volume =. WIREs Computational Statistics , author =. 2010 , note =. doi:10.1002/wics.101 , abstract =

work page doi:10.1002/wics.101 2010
[12]

, editor =

Scott, David W. , editor =. Multivariate. Handbook of. 2012 , keywords =. doi:10.1007/978-3-642-21551-3_19 , abstract =

work page doi:10.1007/978-3-642-21551-3_19 2012
[13]

2025 , howpublished =

Marker: Convert PDF to Markdown, JSON, and HTML , author =. 2025 , howpublished =

2025
[14]

2024 , howpublished =

GPT-4o mini Model , author =. 2024 , howpublished =

2024
[15]

2026 , howpublished =

GPT-5.5 Model , author =. 2026 , howpublished =

2026
[16]

Wang, Hanchen and Fu, Tianfan and Du, Yuanqi and Gao, Wenhao and Huang, Kexin and Liu, Ziming and Chandak, Payal and Liu, Shengchao and Van Katwyk, Peter and Deac, Andreea and Anandkumar, Anima and Bergen, Karianne and Gomes, Carla P. and Ho, Shirley and Kohli, Pushmeet and Lasenby, Joan and Leskovec, Jure and Liu, Tie-Yan and Manrai, Arjun and Marks, Deb...

work page doi:10.1038/s41586-023-06221-2
[17]

Accelerating scientific discovery with generative knowledge extraction, graph-based representation, and multimodal intelligent graph reasoning , volume =

Buehler, Markus J , month = sep, year =. Accelerating scientific discovery with generative knowledge extraction, graph-based representation, and multimodal intelligent graph reasoning , volume =. Machine Learning: Science and Technology , publisher =. doi:10.1088/2632-2153/ad7228 , abstract =

work page doi:10.1088/2632-2153/ad7228
[18]

and Kanhaiya, Krishan and Bockstaller, Michael R

Nepal, Dhriti and Kang, Saewon and Adstedt, Katarina M. and Kanhaiya, Krishan and Bockstaller, Michael R. and Brinson, L. Catherine and Buehler, Markus J. and Coveney, Peter V. and Dayal, Kaushik and El-Awady, Jaafar A. and Henderson, Luke C. and Kaplan, David L. and Keten, Sinan and Kotov, Nicholas A. and Schatz, George C. and Vignolini, Silvia and Vollr...

work page doi:10.1038/s41563-022-01384-1
[19]

Wegst, Ulrike G. K. and Bai, Hao and Saiz, Eduardo and Tomsia, Antoni P. and Ritchie, Robert O. , month = jan, year =. Bioinspired structural materials , volume =. Nature Materials , publisher =. doi:10.1038/nmat4089 , abstract =

work page doi:10.1038/nmat4089
[20]

, year =

Swanson, Don R. , year =. Undiscovered. The Library Quarterly: Information, Community, Policy , publisher =
[21]

Attention is

Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, Ł ukasz and Polosukhin, Illia , year =. Attention is. Advances in
[22]

Advances in Neural Information Processing Systems , author =

Language. Advances in Neural Information Processing Systems , author =. 2020 , pages =

2020
[23]

Zhao, Wayne Xin and Zhou, Kun and Li, Junyi and Tang, Tianyi and Wang, Xiaolei and Hou, Yupeng and Min, Yingqian and Zhang, Beichen and Zhang, Junjie and Dong, Zican and Du, Yifan and Yang, Chen and Chen, Yushuo and Chen, Zhipeng and Jiang, Jinhao and Ren, Ruiyang and Li, Yifan and Tang, Xinyu and Liu, Zikang and Liu, Peiyu and Nie, Jian-Yun and Wen, Ji-R...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2303.18223
[24]

Zhang, Yanbo and Khan, Sumeer A. and Mahmud, Adnan and Yang, Huck and Lavin, Alexander and Levin, Michael and Frey, Jeremy and Dunnmon, Jared and Evans, James and Bundy, Alan and Dzeroski, Saso and Tegner, Jesper and Zenil, Hector , month = aug, year =. Exploring the role of large language models in the scientific method: from hypothesis to discovery , vo...

work page doi:10.1038/s44387-025-00019-5
[25]

2024 , pages =

Advanced Science , author =. 2024 , pages =. doi:10.1002/advs.202306724 , abstract =

work page doi:10.1002/advs.202306724 2024
[26]

and Buehler, M

Ghafarollahi, A. and Buehler, M. J. , month = jan, year =. doi:10.48550/arXiv.2402.04268 , abstract =

work page doi:10.48550/arxiv.2402.04268
[27]

Lu, Chris and Lu, Cong and Lange, Robert Tjarko and Foerster, Jakob and Clune, Jeff and Ha, David , month = sep, year =. The. doi:10.48550/arXiv.2408.06292 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2408.06292
[28]

, month = mar, year =

Hage, Tarjei Paule and Buehler, Markus J. , month = mar, year =. doi:10.48550/arXiv.2603.04124 , abstract =

work page doi:10.48550/arxiv.2603.04124
[29]

doi:10.1115/1.4063843 , abstract =

Applied Mechanics Reviews , author =. doi:10.1115/1.4063843 , abstract =

work page doi:10.1115/1.4063843
[30]

2023 , keywords =

Journal of the Mechanics and Physics of Solids , author =. 2023 , keywords =. doi:10.1016/j.jmps.2023.105454 , abstract =

work page doi:10.1016/j.jmps.2023.105454 2023
[31]

2025 , pages =

Advanced Materials , author =. 2025 , pages =. doi:10.1002/adma.202413523 , abstract =

work page doi:10.1002/adma.202413523 2025
[32]

Ghafarollahi, Alireza; Buehler, Markus J. (April). Sparks: ... doi:10.48550/arXiv.2504.19017
[33]

Chain-of-... Advances in Neural Information Processing Systems, 2022.
[34]

AI for Science. doi:10.1088/3050-287X/ae61d1
[36]

Stewart, Isabella A.; Hage, Tarjei Paule; Hsu, Yu-Chuan; Buehler, Markus J. (February). doi:10.48550/arXiv.2602.07491
[37]

Lewis, Patrick; Perez, Ethan; Piktus, Aleksandra; Petroni, Fabio; Karpukhin, Vladimir; Goyal, Naman; Küttler, Heinrich; Lewis, Mike; Yih, Wen-tau; Rocktäschel, Tim; Riedel, Sebastian; Kiela, Douwe. Retrieval-... Advances in ...
[38]

Wang, Fiona Y.; Marom, Lee; Pal, Subhadeep; Luu, Rachel K.; Lu, Wei; Berkovich, Jaime A.; Buehler, Markus J. (March). Autonomous ... doi:10.48550/arXiv.2603.14312
[39]

Venugopal, Vineeth; Olivetti, Elsa (February). Scientific Data. doi:10.1038/s41597-024-03039-z
[40]

Ghafarollahi, Alireza; Buehler, Markus J. (January). Automating alloy design and discovery with physics-aware multimodal multiagent ... Proceedings of the National Academy of Sciences. doi:10.1073/pnas.2414074122
[41]

Rapid and automated alloy design with graph neural network-powered large language model-driven multi-agent ... MRS Bulletin, 2025. doi:10.1557/s43577-025-00953-4
[42]

Gao, Yunfan; Xiong, Yun; Gao, Xinyu; Jia, Kangxiang; Pan, Jinliu; Bi, Yuxi; Dai, Yi; Sun, Jiawei; Wang, Meng; Wang, Haofen (March). Retrieval-Augmented Generation for Large Language Models: A Survey. doi:10.48550/arXiv.2312.10997
[43]

Unifying ... IEEE Transactions on Knowledge and Data Engineering, 2024. doi:10.1109/TKDE.2024.3352100
[44]

Eliciting Latent Predictions from Transformers with the Tuned Lens. arXiv e-prints, arXiv:2303.08112. doi:10.48550/arXiv.2303.08112
[45]

How to use and interpret activation patching. arXiv e-prints, arXiv:2404.15255. doi:10.48550/arXiv.2404.15255
[46]

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv e-prints, arXiv:2201.11903. doi:10.48550/arXiv.2201.11903
[47]

Measuring Faithfulness in Chain-of-Thought Reasoning. arXiv e-prints, arXiv:2307.13702. doi:10.48550/arXiv.2307.13702
[48]

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv e-prints, arXiv:1908.10084. doi:10.48550/arXiv.1908.10084
[49]

MTEB: Massive Text Embedding Benchmark. arXiv e-prints, arXiv:2210.07316. doi:10.48550/arXiv.2210.07316
[50]

Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations. arXiv e-prints, arXiv:1703.03717. doi:10.48550/arXiv.1703.03717
[51]

Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning. arXiv e-prints, arXiv:2402.13950. doi:10.48550/arXiv.2402.13950
[52]

A Primer in BERTology: What we know about how BERT works. arXiv e-prints, arXiv:2002.12327. doi:10.48550/arXiv.2002.12327
[53]

What Does BERT Look At? An Analysis of BERT's Attention. arXiv e-prints, arXiv:1906.04341. doi:10.48550/arXiv.1906.04341
[54]

BERT Rediscovers the Classical NLP Pipeline. arXiv e-prints, arXiv:1905.05950. doi:10.48550/arXiv.1905.05950
[55]

SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability. arXiv e-prints, arXiv:1706.05806. doi:10.48550/arXiv.1706.05806
[56]

Similarity of Neural Network Representations Revisited. arXiv e-prints, arXiv:1905.00414. doi:10.48550/arXiv.1905.00414
[57]

LogitLens4LLMs: Extending Logit Lens Analysis to Modern Large Language Models. arXiv e-prints, arXiv:2503.11667. doi:10.48550/arXiv.2503.11667
[58]

Towards Automated Circuit Discovery for Mechanistic Interpretability. arXiv e-prints, arXiv:2304.14997. doi:10.48550/arXiv.2304.14997
[59]

On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models. arXiv e-prints, arXiv:2406.10625. doi:10.48550/arXiv.2406.10625
[60]

Understanding intermediate layers using linear classifier probes. arXiv e-prints, arXiv:1610.01644. doi:10.48550/arXiv.1610.01644
[61]

Analysis Methods in Neural Language Processing: A Survey. arXiv e-prints, arXiv:1812.08951. doi:10.48550/arXiv.1812.08951
[62]

C-Pack: Packaged Resources To Advance General Chinese Embedding, 2023.
[63]

C-Pack: Packed Resources For General Chinese Embeddings. arXiv e-prints, arXiv:2309.07597. doi:10.48550/arXiv.2309.07597
[64]

ORPO: Monolithic Preference Optimization without Reference Model, 2024.
[65]

Efficient Memory Management for Large Language Model Serving with PagedAttention. Proceedings of the 29th Symposium on Operating Systems Principles (SOSP).
[66]

Hu, Edward J.; Shen, Yelong; Wallis, Phillip; Allen-Zhu, Zeyuan; Li, Yuanzhi; Wang, Shean; Wang, Lu; Chen, Weizhu.
[67]

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, 2019.

