{"total":12,"items":[{"citing_arxiv_id":"2605.20299","ref_index":41,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Mechanisms of Misgeneralization in Physical Sequence Modeling","primary_cat":"cs.LG","submitted_at":"2026-05-19T12:34:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Generative sequence models for physical tasks exhibit physical misgeneralization where local prediction errors propagate through physical measurements to distort aggregate distributions over quantities like distance or energy; a data deviation kernel explains and predicts the shifts and supports a kernel","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11654","ref_index":41,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Weather-Robust Cross-View Geo-Localization via Prototype-Based Semantic Part Discovery","primary_cat":"cs.CV","submitted_at":"2026-05-12T07:15:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SkyPart achieves state-of-the-art single-pass cross-view geo-localization on SUES-200, University-1652, and DenseUAV by using prototype-based part discovery, altitude-conditioned modulation, and Kendall-weighted loss, with widening gains under weather corruptions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09226","ref_index":39,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Quantum Injection Pathways for Implicit Graph Neural Networks","primary_cat":"quant-ph","submitted_at":"2026-05-09T23:51:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Independent quantum signal injection into graph DEQs yields higher test accuracy and fewer 
solver iterations than state-dependent or backbone-dependent injection and classical equilibrium models on NCI1, PROTEINS, and MUTAG benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"through Ã, and the equilibrium operator consists of a classical backbone injected with a quantum signal. D. Auxiliary Signal Coupling Strategies How an auxiliary signal is coupled to a host architecture has a long classical history, including feature-wise linear modulation (FiLM) [38] and residual-learning variants in hybrid quantum-classical settings [39], [40]. Related work on integration strategy effects in hybrid QML has examined embedding choice [13] and hyperparameter sensitivity [41]. These studies establish that coupling choice matters for robustness, expressivity, and trainability, but all operate within explicit, finite-depth forward passes. The operator-level question we study - where an auxiliary signal enters a fixed-point equilibrium"},{"citing_arxiv_id":"2605.21499","ref_index":36,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Conditional Neural Field based Reduced Order Model for Dynamic Ditching Load Prediction","primary_cat":"physics.flu-dyn","submitted_at":"2026-05-05T16:03:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Conditional neural fields combined with LSTM networks predict aircraft ditching loads accurately across heterogeneous spatial discretizations using fewer parameters than convolutional autoencoders.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05246","ref_index":23,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Memory-Efficient EDA Denoising via Knowledge Distillation for Wearable IoT Under Severe Motion Artifacts and Underwater 
Conditions","primary_cat":"eess.SP","submitted_at":"2026-05-04T18:37:49+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Knowledge distillation from a hybrid CNN-Transformer teacher to a depth-wise separable CNN student, combined with realistic motion and environmental augmentation, produces a 15x smaller EDA denoiser that cuts underwater reconstruction error from 2.809 to 0.215 MAE and raises downstream CNS-OT AUROC.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"deployable EDA denoising framework designed for multi-site and harsh environments. The proposed approach integrates a hybrid CNN-Transformer teacher model with a lightweight depth-wise separable CNN student model through a knowledge distillation (KD) strategy [22]. To further enhance robustness, we incorporate feature-wise linear modulation (FiLM) [23] to adapt to varying amplitude scales, along with a realistic data augmentation scheme that simulates diverse MAs and environmental distortions. Unlike prior work, we explicitly evaluate the physiological relevance of denoised signals by assessing improvements in both SCR reconstruction and downstream CNS-OT prediction performance. 
The main contributions of this work are summarized as"},{"citing_arxiv_id":"2604.14800","ref_index":22,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Generative Modeling of Complex-Valued Brain MRI Data","primary_cat":"eess.IV","submitted_at":"2026-04-16T09:24:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A cVAE plus flow-matching model generates realistic complex-valued brain MRI that preserves phase coherence above 0.997 and yields synthetic data that trains abnormality classifiers to 0.880 AUROC, beating the 0.842 real-data baseline on fastMRI.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"normal distribution to ensure smooth representations suitable for downstream synthesis. Since the five MRI acquisition sequences produce distinct global intensity and phase distributions, the autoencoder is conditioned on the acquisition sequence to account for these differences during reconstruction. This is achieved via Feature-wise Linear Modulation (FiLM) conditioning [22], which applies learned scale and shift parameters to intermediate activations, allowing the encoder and decoder to adapt to the characteristics of each sequence. In contrast, conditioning on the 5 class label is not needed at this stage. 
Since the objective of the autoencoder is purely reconstructive, abnormalities are treated as local anatomical patterns and can be accurately reconstructed without"},{"citing_arxiv_id":"2604.12456","ref_index":30,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"X-VC: Zero-shot Streaming Voice Conversion in Codec Space","primary_cat":"eess.AS","submitted_at":"2026-04-14T08:42:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"X-VC achieves zero-shot streaming voice conversion via one-step codec-space conversion with dual-conditioning acoustic converter and role-assignment training on generated paired data.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"the English-source evaluations as its content encoder module only supports Chinese. 4.4 Evaluation Metrics We evaluate performance across both generation quality and inference efficiency. To assess speech quality, we use Word Error Rate (WER) to measure linguistic preservation, employing Whisper-large-v3 [30] for English and Paraformer-zh [11] for Chinese. (Footnotes: 1 https://github.com/Plachtaa/seed-vc, 2 https://github.com/ASLP-lab/MeanVC) Speaker identity consistency is evaluated via Speaker Similarity (SIM), computed as the cosine similarity between speaker embeddings extracted by a WavLM-based model [4], while UTMOS [32] is utilized to objectively estimate speech naturalness without human annotations. 
For inference efficiency, all metrics are measured end-"},{"citing_arxiv_id":"2604.11172","ref_index":19,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"NeuVolEx: Implicit Neural Features for Volume Exploration","primary_cat":"cs.GR","submitted_at":"2026-04-13T08:30:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"NeuVolEx extracts robust spatial features from INR training via a structural encoder and multi-task scheme to enable accurate ROI classification with limited supervision and unsupervised viewpoint clustering in volume exploration.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"den layers of 32 channels to generate a structural representation. By encoding local neighborhood around each voxel, this pathway provides local structural context that improves spatial coherence. Rather than simply concatenating the two feature representations, we propose a feature-wise linear modulation (FiLM) to effectively combine them, motivated by Perez et al. [19]. FiLM enables one feature representation to condition another so that the latter reflects additional context. In our case, the structural representation conditions the positional representation based on neighboring voxels around each coordinate. 
Thus, the positional representation is enriched with local structural context, which improves the separation of neighboring regions with similar in-"},{"citing_arxiv_id":"2601.13904","ref_index":54,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"PREFAB: PREFerence-based Affective Modeling for Low-Budget Self-Annotation","primary_cat":"cs.AI","submitted_at":"2026-01-20T12:30:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PREFAB applies preference learning grounded in the peak-end rule to let users annotate only key affective change segments while interpolating the rest, reducing workload and improving confidence in a 25-participant study.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.20616","ref_index":11,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"On Optimal Hyperparameters for Differentially Private Deep Transfer Learning","primary_cat":"cs.LG","submitted_at":"2025-10-23T14:48:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Empirical study of DP transfer learning reveals that larger clipping bounds outperform under tight privacy and cumulative DP noise explains batch-size effects better than existing heuristics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.11717","ref_index":28,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"CodecSep: Prompt-Driven Universal Sound Separation on Neural Audio Codec Latents","primary_cat":"cs.SD","submitted_at":"2025-09-15T09:12:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CodecSep performs prompt-driven universal sound separation directly in neural audio codec 
latents by combining a frozen DAC backbone with a lightweight FiLM-conditioned Transformer masker driven by CLAP embeddings, yielding efficiency gains over AudioSep.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.01925","ref_index":268,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A Survey on Vision-Language-Action Models: An Action Tokenization Perspective","primary_cat":"cs.RO","submitted_at":"2025-07-02T17:34:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"First, discrete autoregressive tokenization can struggle to represent continuous or multi-modal actions, which are especially crucial for dexterous tasks. Additionally, the standard autoregressive generation process produces one action at a time, limiting action inference frequency. To address these issues, a new class of VLA models has emerged as an alternative to pure GPT-style architectures: using diffusion-based action heads with action chunking [268]. Diffusion policies have demonstrated superior ability to model multi-modal action distributions [269], while action chunking allows the model to output sequential actions simultaneously. This approach improves temporal consistency, reduces compounding error, and significantly boosts control frequency. Octo [248] is an early work that introduces a transformer-based policy with a diffusion head, trained on"}],"limit":50,"offset":0}