{"total":35,"items":[{"citing_arxiv_id":"2605.23440","ref_index":20,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"SSDAU: Structured Semantic Data Augmentation for Joint Entity and Relation Extraction","primary_cat":"cs.CL","submitted_at":"2026-05-22T09:52:43+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21318","ref_index":1,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization","primary_cat":"cs.CL","submitted_at":"2026-05-20T15:47:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TextReg mitigates prompt distributional overfitting via regularized text-space optimization, reporting up to +16.5% OOD accuracy gains over prior methods on reasoning benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20730","ref_index":6,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Distributional Alignment as a Criterion for Designing Task Vectors in In-Context Learning","primary_cat":"cs.CL","submitted_at":"2026-05-20T05:26:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A distributional alignment metric d_NTP and a linear regression method LTV for task vectors that improves accuracy by 9.2% over baselines on classification and regression tasks across multiple LLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20477","ref_index":3,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Training Language 
Agents to Learn from Experience","primary_cat":"cs.LG","submitted_at":"2026-05-19T20:41:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces the ICT framework and an RL pipeline to train language agent reflectors that distill experience into reusable prompts, outperforming baselines on held-out tasks in ALFWorld and MiniHack.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19035","ref_index":10,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On","primary_cat":"cs.AI","submitted_at":"2026-05-18T18:57:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Argues that trustworthiness in Agent-to-Agent networks requires a new conceptual framework with four design pillars baked in from the beginning, as retrofitting existing single-agent methods is insufficient.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18979","ref_index":35,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"TabQL: In-Context Q-Learning with Tabular Foundation Models","primary_cat":"cs.LG","submitted_at":"2026-05-18T18:03:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TabQL is a reinforcement learning framework that substitutes a tabular foundation model with in-context capabilities for the parametric Q-network in DQN, with a warm-up phase and theoretical analysis claiming improved sample 
efficiency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18331","ref_index":3,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Prune, Update and Trim: Robust Structured Pruning for Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-18T12:48:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Putri is a structured pruning technique for LLMs that compensates for pruning errors via weight updates and sequential processing while pruning at the attention-head level to reach state-of-the-art results at extreme sparsity.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18226","ref_index":6,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Context Memorization for Efficient Long Context Generation","primary_cat":"cs.CL","submitted_at":"2026-05-18T11:12:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Attention-state memory externalizes long prefixes into a lightweight lookup table of precomputed attention states, yielding higher accuracy than standard in-context learning at fixed memory budgets and lower latency than full attention.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18022","ref_index":1,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Unveiling Memorization-Generalization Coexistence: A Case Study on Arithmetic Tasks with Label Noise","primary_cat":"cs.LG","submitted_at":"2026-05-18T08:12:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Experiments on modular arithmetic with heavy label noise show that over-parameterized 
networks form a distributed internal generalization structure that can be extracted via frequency methods to achieve high accuracy despite 80% noise.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17169","ref_index":9,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Responsible Agentic AI Requires Explicit Provenance","primary_cat":"cs.AI","submitted_at":"2026-05-16T21:56:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Explicit provenance across the full agentic AI lifecycle is the necessary condition for making responsibility computable and actionable.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16423","ref_index":3,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Nonlinear Bipolar Compensation: Handling Outliers in Post-Training Quantization","primary_cat":"cs.CV","submitted_at":"2026-05-14T14:55:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Nonlinear Bipolar Compensation with Bipolar Logarithmic Transformation reduces outlier effects in post-training quantization by performing compensation in a compressed transformed space.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14004","ref_index":1,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Conditional Attribute Estimation with Autoregressive Sequence Models","primary_cat":"cs.AI","submitted_at":"2026-05-13T18:11:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Conditional Attribute Transformers jointly estimate next-token probabilities and conditional attribute values for autoregressive 
sequence models, enabling credit assignment, counterfactuals, and steerable generation in one pass.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12746","ref_index":44,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"CoT-Guard: Small Models for Strong Monitoring","primary_cat":"cs.CR","submitted_at":"2026-05-12T20:49:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CoT-Guard is a 4B model using SFT and RL that achieves 75% G-mean^2 on hidden objective detection under prompt and code manipulation attacks, outperforming several larger models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"correctly identifies it as a separate objective (selecting answer C) (training a custom model is expensive) or when the defender can verify the model identity (e.g., via cryptographic attestation of weights [43]). • Faithful execution. M faithfully follows the instructions in whatever input it receives, including any manipulated task descriptions. Modern language models are tuned for instruction-following [44, 45], and several prior works [46, 47] show that this extends to malicious task descriptions. Attack Paths. The adversary can pursue two independent attack paths, A1 and A2, as outlined in Figure 1. We use ˜· to denote tainted objects that carry information regarding the hidden objective τhid. 
This taint propagates through M - if M receives a tainted input, its outputs are also tainted,"},{"citing_arxiv_id":"2605.12678","ref_index":10,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"No One Knows the State of the Art in Geospatial Foundation Models","primary_cat":"cs.CV","submitted_at":"2026-05-12T19:29:51+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12343","ref_index":1,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Neural-Schwarz Tiling for Geometry-Universal PDE Solving at Scale","primary_cat":"cs.LG","submitted_at":"2026-05-12T16:20:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Local neural operators on 3x3x3 patches, composed via Schwarz iteration, solve large-scale nonlinear elasticity on arbitrary geometries without domain-specific retraining.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Schwarz iteration, offer a reusable local-training path toward scalable learned PDE solvers that generalize across domain size, shape, and boundary-condition configurations. 1 Introduction Machine learning has transformed domains in which data admit a universal representation. In language, large-scale models trained on token sequences generalize across a broad range of tasks, domains, and contexts [ 1]. 
This success rests on a common compositional substrate: sentences, documents, code, and instructions can all be represented as sequences of tokens, enabling a shared modeling paradigm across heterogeneous problems. No comparable foundation has emerged for physical simulation. Many important problems in science and engineering - fluid flow, heat transfer, elasticity, fracture, electromagnetism, and coupled"},{"citing_arxiv_id":"2605.12110","ref_index":1,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"AB-Sparse: Sparse Attention with Adaptive Block Size for Accurate and Efficient Long-Context Inference","primary_cat":"cs.DC","submitted_at":"2026-05-12T13:23:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AB-Sparse adaptively allocates per-head block sizes for sparse attention, adds lossless centroid quantization and custom variable-block GPU kernels, and reports up to 5.43% accuracy gain over fixed-block baselines with no throughput loss.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"compensate for the additional memory overhead, it further employs lossless block centroid quantization. In addition, custom GPU kernels are developed to support efficient execution with variable block sizes. Evaluation results demonstrate that AB-Sparse achieves an accuracy improvement of up to 5.43% over existing block sparse attention baselines without throughput overhead. 1 Introduction Large language models (LLMs) [1, 2, 3] are increasingly deployed in applications that demand long-context understanding, ranging from multi-document summarization [4] to repository-level code analysis [5] and long-form reasoning [6]. While larger context windows enable these capabilities, they introduce significant challenges for efficient model serving. 
At every decoding step, loading the"},{"citing_arxiv_id":"2605.11750","ref_index":23,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies","primary_cat":"cs.RO","submitted_at":"2026-05-12T08:27:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DreamAvoid uses a Dream Trigger, Action Proposer, and Dream Evaluator trained on success/failure/boundary data to let VLA policies avoid critical-phase failures via test-time future dreaming.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"The action input is the downsampled future action chunk ut. We compare two test-time failure prediction paradigms: (1) Direct Semantic Evaluation: We input the visual observation sequence containing the historical and current frames, along with the textified action sequence and its semantic description, into Gemini 3.1 Pro [22]. We use few-shot prompting [23] for it to predict end-to-end whether executing ut will result in a task failure (such as a collision or misalignment). 
(2) Explicit Spatiotemporal Dreaming: We use DreamDojo [8], an action-conditioned, large-scale pre-trained world model, to forward-render ut into future video frames, which are then evaluated by Gemini 3.1 Pro acting as a visual evaluator."},{"citing_arxiv_id":"2605.11558","ref_index":5,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"A Composite Activation Function for Learning Stable Binary Representations","primary_cat":"cs.LG","submitted_at":"2026-05-12T05:41:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"HTAF is a sigmoid-tanh composite that approximates the Heaviside function to allow stable gradient training of binary activation networks, yielding ICBMs with stable discretization and competitive performance on image tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"However, CBMs often suffer from reduced prediction performance and limited practicality due to the requirement of concept annotations, which are costly to obtain. To address this, subsequent works have proposed label-free variants using pretrained models such as Contrastive Language-Image Pre-training (CLIP, [54]) and Large Language Models (LLMs, [5]) to automatically generate concept labels for each image ([50, 81, 37]). Despite these advances, such approaches still achieve lower prediction performance than standard image models. 
Moreover, concept annotations produced by multimodal models such as CLIP can be unreliable and may fail to accurately capture underlying semantic concepts of individual images"},{"citing_arxiv_id":"2605.11317","ref_index":9,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"SOMA: Efficient Multi-turn LLM Serving via Small Language Model","primary_cat":"cs.CL","submitted_at":"2026-05-11T23:07:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SOMA estimates a local response manifold from early turns and adapts a small surrogate model via divergence-maximizing prompts and localized LoRA fine-tuning for efficient multi-turn serving.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[7] Adarsh Prasad Behera, Jaya Prakash Champati, Roberto Morabito, Sasu Tarkoma, and James Gross. Towards efficient multi-llm inference: Characterization and analysis of llm routing and hierarchical techniques. arXiv preprint arXiv:2506.06579, 2025. [8] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 15(6):1373-1396, 2003. [9] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877-1901, 2020. [10] Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, and"},{"citing_arxiv_id":"2605.07111","ref_index":1,"ref_count":2,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Beyond LoRA vs. 
Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation","primary_cat":"cs.CL","submitted_at":"2026-05-08T01:38:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MoLF routes updates between full fine-tuning and LoRA at the optimizer level to match or exceed the better of the two static methods on SQL, medical QA, and counterfactual tasks while an efficient variant outperforms prior adaptive LoRA by up to 20%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06123","ref_index":8,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs","primary_cat":"cs.AI","submitted_at":"2026-05-07T12:30:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A knowledge-first approach to LLM-driven automatic heuristic design in combinatorial optimization yields better discovery efficiency, transfer, and generalization than code-centric baselines by formalizing a distortion-compression trade-off.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05790","ref_index":8,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"GazeMind: A Gaze-Guided LLM Agent for Personalized Cognitive Load Assessment","primary_cat":"cs.HC","submitted_at":"2026-05-07T07:26:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GazeMind encodes gaze data for LLM reasoning to deliver interpretable, personalized cognitive load predictions that generalize across tasks without fine-tuning and outperform baselines by over 20% on a new 152-person 
dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03780","ref_index":4,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Task Vector Geometry Underlies Dual Modes of Task Inference in Transformers","primary_cat":"cs.LG","submitted_at":"2026-05-05T14:07:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"In a controlled synthetic setting, transformers implement in-distribution task inference via convex combinations of task vectors and out-of-distribution inference via nearly orthogonal extrapolative representations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.02638","ref_index":14,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"ViewSAM: Learning View-aware Cross-modal Semantics for Weakly Supervised Cross-view Referring Multi-Object Tracking","primary_cat":"cs.CV","submitted_at":"2026-05-04T14:23:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ViewSAM achieves state-of-the-art weakly supervised performance on cross-view referring multi-object tracking by refining SAM tracklets via affinity-guided re-prompting and modeling view-induced variations as learnable conditions on SAM2.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01711","ref_index":2,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Linear-Time Global Visual Modeling without Explicit Attention","primary_cat":"cs.CV","submitted_at":"2026-05-03T04:51:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Dynamic parameterization of standard layers can replace explicit attention for 
linear-time global visual modeling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12890","ref_index":26,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Towards Long-horizon Agentic Multimodal Search","primary_cat":"cs.CV","submitted_at":"2026-04-14T15:40:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LMM-Searcher uses file-based visual UIDs and a fetch tool plus 12K synthesized trajectories to fine-tune a multimodal agent that scales to 100-turn horizons and reaches SOTA among open-source models on MM-BrowseComp and MMSearch-Plus.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10291","ref_index":5,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale","primary_cat":"cs.AI","submitted_at":"2026-04-11T17:15:26+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TimeSeriesExamAgent combines templates and LLM agents to generate scalable time series reasoning benchmarks, demonstrating that current LLMs have limited performance on both abstract and domain-specific tasks.","context_count":1,"top_context_role":"other","top_context_polarity":"unclear","context_text":"Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. Qwen2.5-vl technical report. arXiv preprint arXiv:2502.13923, 2025. [4] Iz Beltagy, Matthew E. Peters, and Arman Cohan. Longformer: The long-document transformer, 2020. 
[5] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877-1901, 2020. [6] Natasha Butt, Varun Chandrasekaran, Neel Joshi, Besmira Nushi, and Vidhisha Balachan-"},{"citing_arxiv_id":"2604.09611","ref_index":13,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows","primary_cat":"cs.DC","submitted_at":"2026-03-12T10:10:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"This work delivers the first measurements of performance-energy trade-offs across four multi-request LLM workflow patterns on A100 GPUs using vLLM and Parrot.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.18679","ref_index":42,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Transformers for dynamical systems learn transfer operators in-context","primary_cat":"cs.LG","submitted_at":"2026-02-21T01:03:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Small transformers learn to forecast unseen dynamical systems in-context by using delay embeddings to recover the manifold and forecasting its invariant sets via a transfer-operator strategy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.22699","ref_index":6,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion 
Transformer","primary_cat":"cs.CV","submitted_at":"2025-11-27T18:52:07+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"granularly profiling data attributes and orchestrating the training distribution, we ensure that the \"right data\" is aligned with the \"right stage\" of model development. This infrastructure maximizes the utility of real-world data streams, effectively eliminating computational waste arising from redundant or low-quality samples. • Efficient Architecture: Inspired by the remarkable scalability of decoder-only architectures in large language models [6], we propose a Scalable Single-Stream Multi-Modal Diffusion Transformer (S3-DiT). Unlike dual-stream architectures that process text and image modalities in isolation, our design facilitates dense cross-modal interaction at every layer. This high parameter efficiency enables Z-Image to achieve superior performance within a compact 6B parameter size, significantly"},{"citing_arxiv_id":"2509.20328","ref_index":7,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Video models are zero-shot learners and reasoners","primary_cat":"cs.LG","submitted_at":"2025-09-24T17:17:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"general-purpose language understanding, which enables a single model to tackle a wide variety of tasks including coding [1], math [2], creative writing [3], summarization, translation [4], and deep research [5, 6]. 
These abilities started to emerge from simple primitives: training large, generative models on web-scale datasets [e.g., 7, 8]. As a result, LLMs are increasingly able to solve novel tasks through few-shot in-context learning [7, 9] and zero-shot learning [10]. Zero-shot learning here means that prompting a model with a task instruction replaces the need for fine-tuning or adding task-specific inference heads. Machine vision today in many ways resembles the state of NLP a few years ago: There are excellent task-specific models like \"Segment Anything\" [11, 12] for segmentation or YOLO variants"},{"citing_arxiv_id":"2507.00029","ref_index":1,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing","primary_cat":"cs.LG","submitted_at":"2025-06-17T14:58:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LoRA-Mixer routes modular LoRA experts into attention projection matrices with an adaptive Routing Specialization Loss to improve multi-task performance while using fewer trainable parameters than prior LoRA-MoE methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.10465","ref_index":1,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Superposition Yields Robust Neural Scaling","primary_cat":"cs.LG","submitted_at":"2025-05-15T16:18:13+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Strong superposition causes neural loss to scale as the inverse of model dimension due to geometric feature overlaps, explaining scaling laws for broad frequency distributions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"providing insights into questions like when neural scaling laws can be 
improved and when they will break down.1 1 Introduction The remarkable success of large language models (LLMs) has been driven by the empirical observation that increasing model size, training data, and compute consistently leads to better performance [1-4]. Across a wide range of tasks - including language understanding [1, 5, 6], math [7-10], and code generation [11, 12] - larger models achieve lower loss, higher accuracy, and greater generalization abilities [2, 13]. This consistent trend, known as neural scaling laws, has been observed across multiple model families and architectures, fueling the development of increasingly large models [2-4]. These scaling laws have not only shaped the current strategies for building better models"},{"citing_arxiv_id":"2504.01990","ref_index":2,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems","primary_cat":"cs.AI","submitted_at":"2025-03-31T18:00:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Over the decades, AI has evolved from symbolic systems reliant on predefined logic to machine learning models capable of learning from data and experience and adapting to new situations. This progression reached a new frontier with the advent of large language models (LLMs), which demonstrate remarkable abilities in understanding, reasoning, and generating human-like text [2]. Central to these advancements is the concept of agent, a system that not only processes information but also perceives its environment, makes decisions, and acts autonomously. 
Initially a theoretical construct, the agent paradigm has become a cornerstone of modern AI, driving advancements in fields ranging from conversational assistants to embodied robotics as AI systems increasingly tackle dynamic, real-world"},{"citing_arxiv_id":"2503.10471","ref_index":9,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Siamese Foundation Models for Crystal Structure Prediction","primary_cat":"cond-mat.mtrl-sci","submitted_at":"2025-03-13T15:44:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DAO pretrains Siamese diffusion-based models on stable/unstable crystal data to achieve 100% experimental match on Cr6Os2 and 2000x speedup over DFT on real superconductors.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}