{"total":16,"items":[{"citing_arxiv_id":"2606.23394","ref_index":4,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Do LLM Embedding Spaces Recover Expert Structure?","primary_cat":"cs.CL","submitted_at":"2026-06-22T14:19:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Pretrained and fine-tuned Qwen3 embeddings exhibit measurable alignment with an expert symptom matrix via RSA on Reddit mental-health data, strengthened by fine-tuning at fine-grained levels and larger scale, with residual alignment after VAD/LIWC/topic controls.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.07103","ref_index":10,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Style or Content? Evaluating Style Classifiers with Controlled Content Overlap","primary_cat":"cs.CL","submitted_at":"2026-06-05T09:53:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Defines overlap parameter alpha as normalized residual mutual information between content and style, then shows RoBERTa classifiers degrade differently under content removal depending on training overlap level.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01045","ref_index":197,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Child-directed speech facilitates production, not comprehension, in BabyLMs","primary_cat":"cs.CL","submitted_at":"2026-05-31T06:27:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CDS-trained BabyLMs show earlier and more appropriate production in a new frame-completion task while FineWeb-edu models lead on comprehension benchmarks, indicating current tests underestimate CDS benefits.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.27006","ref_index":50,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Sampling Data with Chains of Forward-Backward Diffusion Steps","primary_cat":"cs.LG","submitted_at":"2026-05-26T13:26:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"U-turn chains are Markov chains formed by short forward-backward diffusion steps that remain on the learned manifold and, with Metropolis-Hastings, sample from energy-modified targets, exhibiting an ergodicity-breaking transition on fragmented manifolds.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14125","ref_index":45,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Polar probe linearly decodes semantic structures from LLMs","primary_cat":"cs.CL","submitted_at":"2026-05-13T21:21:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLMs represent semantic relations geometrically via embedding distance and direction; a linear Polar Probe decodes these structures from middle-layer activations and generalizes to new entities.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12411","ref_index":20,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling","primary_cat":"cs.LG","submitted_at":"2026-05-12T17:09:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A tabular foundation model with LLM-as-Observer features predicts AI agent decisions in controlled games, outperforming baselines by 4 AUC points and 14% lower error at K=16 interactions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"your money where your mouth is: Evaluating strategic planning and execution of LLM agents in an auction arena.arXiv preprint arXiv:2310.05746, 2023. [19] Robert M. Coehoorn and Nicholas R. Jennings. Learning on opponent's preferences to make effective multi-issue negotiation trade-offs. InProceedings of the 6th International Conference on Electronic Commerce, pages 59-68, 2004. [20] Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, and Marco Baroni. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. InProceedings of the 56th Annual Meeting of the Association for Computational Lin- guistics (Volume 1: Long Papers), pages 2126-2136, Melbourne, Australia, 2018. Association"},{"citing_arxiv_id":"2605.11448","ref_index":2,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Deep Minds and Shallow Probes","primary_cat":"cs.LG","submitted_at":"2026-05-12T02:59:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Symmetry under affine reparameterizations of hidden coordinates selects a unique hierarchy of shallow coordinate-stable probes and a probe-visible quotient for cross-model transfer.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"are governed by the same geometric question: which properties of a neural representation survive natural group actions, and which survive only in a particular basis? References [1] Yonatan Belinkov. Probing classifiers: Promises, shortcomings, and advances.Computational Linguis- tics, 48(1):207-219, March 2022. doi: 10.1162/coli_a_00422. URL https://aclanthology. org/2022.cl-1.7/. [2] Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, and Marco Baroni. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2126-2136, Melbourne, Australia, July 2018."},{"citing_arxiv_id":"2605.11410","ref_index":31,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"What Do EEG Foundation Models Capture from Human Brain Signals?","primary_cat":"cs.AI","submitted_at":"2026-05-12T01:57:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"EEG foundation models encode 68.6% of a 63-feature clinical lexicon in a representation-causal way, with frequency-domain features dominant; these recover 79.3% of the models' advantage over random baselines on average.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"C004(Higuchi-style lag-difference slope proxy)= clip [0,3] OLS slope oflogL ∗(k)vs.log(1/k) \u0001 , (30) where L∗(k) = mean t |xt+k −x t| for k= 1, . . . , k max with kmax = min(8,⌊T /4⌋) . This is a lag-difference slope proxy, not the full Higuchi curve-length estimator with multiple starting offsets. C005(DFA-style exponent proxy)= clip [−1,2] OLS slope oflogF(s)vs.logs \u0001 ,(31) 15 where Yt =Pt u=1(xu −¯x), F(s) is the root-mean-square residual after per-window linear detrending of Y on the fixed dyadic window set s∈ {16,32,64,128,256,512} ∩ {w: 2w≤T} , and the slope is fit by ordinary least squares. C006(normalized1/e-decay ACF lag)=τ 1/e / τmax,(32) C007(normalized first-zero ACF lag)=τ 0 / τmax,(33) where ρ(τ) is the FFT-based autocorrelation, τ1/e = min{τ >0 :ρ(τ)≤1/e} , τ0 = min{τ >0 :"},{"citing_arxiv_id":"2605.11206","ref_index":11,"ref_count":4,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Instructions Shape Production of Language, not Processing","primary_cat":"cs.CL","submitted_at":"2026-05-11T20:21:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.","context_count":2,"top_context_role":"background","top_context_polarity":"unclear","context_text":"),Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2126-2136, Melbourne, Australia, July 2018. Association for Computational Linguistics. doi: 10.18653/v1/P18-1198. URLhttps://aclanthology.org/P18-1198/. Artur d'Avila Garcez and Luís C. Lamb. Neurosymbolic AI: the 3rd wave.Artif. Intell. Rev., 56(11):12387- 12406, 2023. doi: 10.1007/S10462-023-10448-W. URLhttps://doi.org/10.1007/s10462-023-10448-w. Ferdinand de Saussure.Cours de linguistique générale. Payot, Paris, 1916. URLhttps://books.google. ch/books?id=B38KAQAAMAAJ. 14 Gary Dell. A spreading-activation theory of retrieval in sentence production.Psychological Review, 93: 283-321, 07 1986."},{"citing_arxiv_id":"2605.04980","ref_index":4,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Conceptors for Semantic Steering","primary_cat":"cs.LG","submitted_at":"2026-05-06T14:32:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Conceptors as soft projection matrices from bipolar activations offer a multidimensional, compositional, and geometrically principled method for semantic steering in LLMs that outperforms single-vector baselines in multi-dimensional subspaces.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01381","ref_index":57,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A framework for analyzing concept representations in neural models","primary_cat":"cs.CL","submitted_at":"2026-05-02T11:08:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A new framework shows concept subspaces are not unique, estimator choice affects containment and disentanglement, LEACE works well but generalizes poorly, and HuBERT encodes phone info as contained and disentangled from speaker info while speaker info resists compact containment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00607","ref_index":11,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe","primary_cat":"cs.CL","submitted_at":"2026-05-01T12:19:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"An encoding probe reconstructs transformer representations from acoustic, phonetic, syntactic, lexical and speaker features, showing independent syntactic/lexical contributions and training-dependent speaker effects.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.14004","ref_index":52,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-01-20T14:23:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"#* vector: Probing sentence embeddings for linguistic properties. In Iryna Gurevych and Yusuke Miyao, editors,Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, pages 2126-2136. Association for Computational Linguistics, 2018. doi: 10.18653/V1/P18-1198. URLhttps://aclanthology.org/P18-1198/. [52] Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, and Lee Sharkey. Sparse autoencoders find highly interpretable features in language models.arXiv preprint arXiv:2309.08600, 2023. [53] Bartosz Cywiński, Bart Bussmann, Arthur Conmy, Josh Engels, Neel Nanda, and Senthooran Rajamanoharan. Can we interpret latent reasoning using current mechanistic interpretabil-"},{"citing_arxiv_id":"2211.05100","ref_index":221,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"BLOOM: A 176B-Parameter Open-Access Multilingual Language Model","primary_cat":"cs.CL","submitted_at":"2022-11-09T18:48:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2202.05262","ref_index":8,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Locating and Editing Factual Associations in GPT","primary_cat":"cs.CL","submitted_at":"2022-02-10T18:59:54+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Factual associations in autoregressive transformers are localized to mid-layer feed-forward modules and can be edited via rank-one model editing while preserving both specificity and generalization on counterfactual tests.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2102.12452","ref_index":19,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Probing Classifiers: Promises, Shortcomings, and Advances","primary_cat":"cs.CL","submitted_at":"2021-02-24T18:36:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Probing classifiers are a common but limited method for analyzing linguistic knowledge in neural NLP models, and this review outlines their promises, methodological shortcomings, and recent advances.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}