{"total":14,"items":[{"citing_arxiv_id":"2606.22873","ref_index":70,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SingGuard: A Policy-Adaptive Multimodal LLM Guardrail with Dynamic Reasoning","primary_cat":"cs.CV","submitted_at":"2026-06-22T05:37:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SingGuard introduces a policy-adaptive multimodal LLM guardrail with dynamic reasoning regimes and SingGuard-Bench, reporting SOTA F1 scores across 35 datasets and improved policy-following accuracy under runtime shifts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.05863","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction","primary_cat":"cs.LG","submitted_at":"2026-06-04T08:39:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Deep linear network theory derives logarithmic decay for cross-entropy loss under gap-growth conditions versus polynomial closure for Schatten-regularized structural energy under late-time KL tails, separating fitting from simplification; conditional reductions extend this to ReLU MLPs with fixed ac","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00230","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization","primary_cat":"cs.LG","submitted_at":"2026-05-29T18:04:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"An exposure-based split on BLiMP data reveals delayed generalization in 
five grammatical phenomena during LLM pre-training, with post-generalization shifts in concept vector predictiveness and attention patterns.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18022","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Unveiling Memorization-Generalization Coexistence: A Case Study on Arithmetic Tasks with Label Noise","primary_cat":"cs.LG","submitted_at":"2026-05-18T08:12:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Experiments on modular arithmetic with heavy label noise show that over-parameterized networks form a distributed internal generalization structure that can be extracted via frequency methods to achieve high accuracy despite 80% noise.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17767","ref_index":194,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Feature Learning in Linear-Width Two-Layer Networks: Two vs. 
One Step of Gradient Descent","primary_cat":"stat.ML","submitted_at":"2026-05-18T02:37:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Two steps of gradient descent on first-layer weights in linear-width two-layer networks produce a spiked random matrix with floor(alpha2/(1/2-alpha1)) outliers, each a learned direction, and batch reuse allows capturing directions with information exponent exceeding one.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07648","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning Large-Scale Modular Addition with an Auxiliary Modulus","primary_cat":"cs.LG","submitted_at":"2026-05-08T12:16:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"An auxiliary modulus during training reduces wrap-around issues and preserves train-test input distributions, enabling better accuracy and sample efficiency for large N and q in modular addition learning.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"Specifically, it has been shown that models treat numbers as points in Fourier space and learn modular addition through algorithms such as addition of angles [21], the \"Pizza\" algorithm [30], and the approximate Chinese Remainder Theorem [19]. Additionally, the weights and embeddings of models trained on modular addition also exhibit periodic structures [17, 22], and periodic analytical solutions for these weights have been discovered [10]. Furthermore, these periodic structures are observed in other modular arithmetic tasks, such as modular multiplication [8, 9]. Therefore, helping models learn periodic structures is important. 
Previous methods, such as angle embeddings [26], directly add periodicity by treating numbers as points on a circle. Instead of changing the inputs, our method uses an auxiliary task with a different period"},{"citing_arxiv_id":"2605.06152","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes","primary_cat":"cs.LG","submitted_at":"2026-05-07T12:45:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Slingshot loss spikes are produced by low-precision arithmetic that breaks the zero-sum gradient constraint and drives exponential growth via Numerical Feature Inflation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00045","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Universal Quantum Transformer","primary_cat":"cs.AI","submitted_at":"2026-04-29T20:49:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"UQT on 5 qubits achieves exact deterministic learning of Z_11 modular arithmetic and S_4 non-Abelian algebra via quantum-native mechanisms, claiming to bypass classical attention limits and run on NISQ hardware.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21691","ref_index":248,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"There Will Be a Scientific Theory of Deep Learning","primary_cat":"stat.ML","submitted_at":"2026-04-23T13:58:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions 
across idealized settings, limits, laws, hyperparameters, and universal behaviors.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20817","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Convergent Evolution: How Different Language Models Learn Similar Number Representations","primary_cat":"cs.CL","submitted_at":"2026-04-22T17:45:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Diverse language models converge on similar periodic number features with a two-tier hierarchy of Fourier sparsity and geometric separability, acquired via language co-occurrences or multi-token arithmetic.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.26745","ref_index":60,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Deep sequence models tend to memorize geometrically; it is unclear why","primary_cat":"cs.LG","submitted_at":"2025-10-30T17:40:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"We sharpen these results by crafting a scenario where this ability is unexpected, plays out vividly, and can be cleanly isolated and analyzed. Specifically, we study path-finding on path-star graphs, a (symbolic) implicit reasoning task. The task was adversarially designed [ 13] to cause failure of next-token trained deep sequence models-Transformer [ 172] and Mamba [60] models alike. 
Whereas in the original task, the model is given the graph in-context, here we make the model memorize the graph's edges in its weights. Where before the model spectacularly failed to learn path-finding even on small graphs, in our in-weights task the model succeeds even on massive graphs. The success in the in-weights path-star task, we argue, is hard to reconcile within the associative"},{"citing_arxiv_id":"2510.04930","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Egalitarian Gradient Descent: A Simple Approach to Accelerated Grokking","primary_cat":"cs.LG","submitted_at":"2025-10-06T15:40:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"EGD equalizes gradient speeds across singular directions, eliminating or shortening grokking plateaus on modular addition and sparse parity problems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.00468","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Feature Identification via the Empirical NTK","primary_cat":"cs.LG","submitted_at":"2025-10-01T03:39:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Eigenanalysis of the empirical NTK surfaces feature directions that align with Fourier features in modular addition networks and grammatical features in Gemma-3-270M, outperforming PCA baselines on activations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2408.12935","ref_index":268,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future 
Directions","primary_cat":"cs.AI","submitted_at":"2024-08-23T09:33:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper introduces a taxonomy of AI safety for LLMs organized into Trustworthy AI, Responsible AI, and Safe AI perspectives, accompanied by a review of state-of-the-art methods, challenges, and future directions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"relies on specialized external metrics [423, 556], modules [198, 503], or another LLMs [118] to assist in checking the level of consistency between the given input the and generated output. The second category centers on assessing the LLM's own confidence in its outputs. Outputs characterized by lower confidence levels are assumed to have a higher risk of hallucination [268, 498]. The confidence is often reflected through various indicators, such as the token probability distribution [268], the LLMs' evaluation [226, 466, 498], or the consistency observed across multiple outputs [475]. Recently, a new form of hallucination called sycophancy has drawn significant research attention. Sycophancy refers to an undesired behavior where models prioritize agreeing with the user's subjective preference over providing truthful"}],"limit":50,"offset":0}