{"total":11,"items":[{"citing_arxiv_id":"2605.23033","ref_index":14,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Uncovering the Latent Potential of Deep Intermediate Representations","primary_cat":"cs.LG","submitted_at":"2026-05-21T20:58:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22223","ref_index":95,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"How Many Different Outputs Can a Transformer Generate?","primary_cat":"cs.LG","submitted_at":"2026-05-21T09:26:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19093","ref_index":5,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts","primary_cat":"cs.AI","submitted_at":"2026-05-18T20:28:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ReElicit uses LLMs to elicit adaptive feature embeddings for Gaussian process Bayesian optimization of system prompts under aggregate-only feedback, outperforming baselines across ten 
tasks with a 30-evaluation budget.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08870","ref_index":9,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"TopoGeoScore: A Self-Supervised Source-Only Geometric Framework for OOD Checkpoint Selection","primary_cat":"cs.LG","submitted_at":"2026-05-09T10:46:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"TopoGeoScore learns a non-negative linear combination of geometric and topological features from source embeddings via self-supervised invariance to select robust checkpoints for OOD scenarios.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07271","ref_index":39,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Understanding Performance Collapse in Layer-Pruned Large Language Models via Decision Representation Transitions","primary_cat":"cs.CL","submitted_at":"2026-05-08T05:35:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Performance collapse in layer-pruned LLMs stems from disrupting the Silent Phase of decision-making, which blocks the transition to correct predictions, while the later Decisive Phase is robust to pruning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16351","ref_index":16,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"PIMSM: Physics-Informed Multi-Scale Mamba for Stable Neural Representations under Distribution Shift","primary_cat":"cs.LG","submitted_at":"2026-05-08T04:58:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PIMSM is a Mamba-based architecture that maps knee 
frequencies from spectra to multi-scale discretization parameters to reduce representation drift under distribution shifts in fMRI and weather forecasting.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01967","ref_index":37,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"MER-DG: Modality-Entropy Regularization for Multimodal Domain Generalization","primary_cat":"cs.LG","submitted_at":"2026-05-03T16:53:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MER-DG applies modality-entropy regularization to reduce fusion overfitting in multimodal domain generalization, reporting average gains of 5% over standard fusion and 2% over prior methods on EPIC-Kitchens and HAC benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.22076","ref_index":27,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"PrivUn: Unveiling Latent Ripple Effects and Shallow Forgetting in Privacy Unlearning","primary_cat":"cs.LG","submitted_at":"2026-04-23T21:01:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PrivUn shows privacy unlearning in LLMs produces gradient-driven ripple effects and only shallow forgetting across layers, with new strategies proposed for deeper removal.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21691","ref_index":99,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"There Will Be a Scientific Theory of Deep Learning","primary_cat":"stat.ML","submitted_at":"2026-04-23T13:58:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A mechanics of the learning 
process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19949","ref_index":159,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages","primary_cat":"eess.AS","submitted_at":"2026-04-21T19:54:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Introduces the Indic-CodecFake dataset for Indic codec deepfakes and SATYAM, a novel hyperbolic ALM that outperforms baselines through dual-stage semantic-prosodic fusion using Bhattacharya distance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2405.07987","ref_index":10,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"The Platonic Representation Hypothesis","primary_cat":"cs.LG","submitted_at":"2024-05-13T17:58:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Representations learned by large AI models are converging toward a shared statistical model of reality.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"Then given two model representations f and g the corresponding features are: ϕi = f(xi) and ψi = g(yi), where the collection of these features are denoted as Φ = {ϕ1, . . . , ϕb} and Ψ = {ψ1, . . . , ψb}. Then for each feature pair (ϕi, ψi), we compute the respective nearest neighbor sets S(ϕi) and S(ψi). dknn(ϕi, Φ \\ ϕi) = S(ϕi) (9) dknn(ψi, Ψ \\ ψi) = S(ψi) (10) where dknn returns the set of indices of its k-nearest neighbors. 
Then we measure its average intersection via mNN(ϕi, ψi) = (1/k) |S(ϕi) ∩ S(ψi)| (11), where | · | is the size of the intersection. The choice to use mutual nearest-neighbors: Our initial efforts to measure alignment with CKA revealed a very weak trend of alignment between models, even when comparing models within their own modality."}],"limit":50,"offset":0}