{"total":12,"items":[{"citing_arxiv_id":"2606.31856","ref_index":36,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Low-dimensional topology of deep neural networks","primary_cat":"cs.LG","submitted_at":"2026-06-30T15:53:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Restricting layers to width 3 and using linking numbers shows ResNets and transformers match in topological power, exceed monotonic feedforward nets which exceed flows, but nonmonotonic activations match the top class.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02788","ref_index":15,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Neutrino Fingerprints: Image-Based Encodings of IceCube Events for CNN Direction Reconstruction","primary_cat":"astro-ph.IM","submitted_at":"2026-06-01T18:54:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"IceCube events are encoded as 72x72x3 images and processed by ResNet18 to reach 1.10 rad mean angular error in neutrino direction reconstruction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.31302","ref_index":46,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MoE-dqINR: A Unified Mixture-of-Experts Implicit Neural Representation Framework for Scan-Specific Dynamic and Quantitative MRI Reconstruction","primary_cat":"eess.IV","submitted_at":"2026-05-29T13:36:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MoE-dqINR factorizes INR-based MRI reconstruction into shared spatial experts plus state-conditioned routing to unify dynamic and quantitative reconstruction at roughly 30 seconds per 
scan.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29039","ref_index":15,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Kolmogorov--Arnold Networks as Implicit Regularizers: Noise Robustness and Interpretability for Stellar Classification","primary_cat":"astro-ph.IM","submitted_at":"2026-05-27T19:38:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"KAN noise robustness in star/galaxy/quasar classification arises from implicit C2-spline regularization rather than architecture, as weight-decay-tuned MLPs match performance on SDSS and DESI data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14131","ref_index":13,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Double Metric Learning for Building Directed Graphs with Chain Connections for the ATLAS ITk Detector","primary_cat":"physics.data-an","submitted_at":"2026-05-13T21:31:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Double metric learning learns two embeddings per node to build directed graphs with chain connections, yielding better performance than single metric learning for high-pT particles and accurate edge direction prediction in ATLAS ITk simulations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10775","ref_index":77,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"On the global convergence of gradient descent for wide shallow models with bounded nonlinearities","primary_cat":"math.OC","submitted_at":"2026-05-11T16:08:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Gradient 
descent on wide shallow models with bounded nonlinearities converges globally in the mean-field limit as non-global critical points are unstable under the dynamics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10994","ref_index":7,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Internally triggered retrospective learning in neural networks","primary_cat":"q-bio.NC","submitted_at":"2026-05-09T14:30:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Neural networks learn via sparse retrospective updates triggered internally when prediction error exceeds a threshold derived from recent error statistics, leading to stepwise parameter changes in simulations.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"The normalized trace was 𝐸̃𝑖𝑗(𝑡𝑘)= 𝐸𝑖𝑗(𝑡𝑘) 𝐸max(𝑡𝑘)+ 𝛿 . 4 Weights were updated as 𝑊𝑖𝑗(𝑡𝑘 +)= 𝑊𝑖𝑗(𝑡𝑘 −)+ 𝜂𝐸̃𝑖𝑗(𝑡𝑘)− 𝜂𝜆𝑊𝑖𝑗(𝑡𝑘 −), with clipping 0.001 ≤ 𝑊𝑖𝑗 ≤ 0.18. Between events, weights remained constant. Update sequence. At each time step 𝑡, the following deterministic order was enforced: (1) 𝐡(𝑡)→ (2) 𝐡̂ (𝑡)→ (3) 𝜀(𝑡)→ (4) 𝜇,𝑣 → (5) 𝜃(𝑡)→ (6) 𝐸𝑖𝑗(𝑡)→ (7) event check → (8) 𝑊(𝑡)→ (9) storage. This ordering ensures that prediction error is computed before weight modification and that eligibility traces accumulate prior to normalization. Control procedures. Three control conditions were defined. A fixed-weight condition imposed 𝑊(𝑡)= 𝑊(0), for all 𝑡. 
A continuous-update condition applied 𝑊𝑖𝑗(𝑡 + Δ𝑡)= 𝑊𝑖𝑗(𝑡)+ Δ𝑡 𝜂𝑐[𝐸̃𝑖𝑗(𝑡)− 𝜆𝑊𝑖𝑗(𝑡)]."},{"citing_arxiv_id":"2605.02745","ref_index":64,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Bolek: A Multimodal Language Model for Molecular Reasoning","primary_cat":"cs.LG","submitted_at":"2026-05-04T15:46:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Bolek injects Morgan fingerprint embeddings into an instruction-tuned text model, then fine-tunes on molecular alignment and synthetic chain-of-thought tasks to improve performance and grounding on 15 TDC binary classification endpoints while generalizing to unseen tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"(B) When BOLEK mentions a feature it is most accurate on size, polarity, and lipophilicity descriptors and weaker on stereocenter and surface-area features; the other LLMs, when they mention a feature at all, are roughly as accurate but mention features far less often. rollouts for the same compound, without being penalized for wrong prediction. BBB Martins illustrates this ambiguity cleanly. The famous BOILED-egg framing of the BBB permeability [64] makes BBB essentially a logP-and-TPSA decision, so a chemist reading a BBB rationale expects numerical values for those two descriptors. Disturbingly, GPT-5.4 and TxGemma never mention a numerical value for either descriptor on BBB CoTs (Figure 1A). 
Their reasoning aggregates to a respectable AUC, but it is not anchored in the variables that decide the task."},{"citing_arxiv_id":"2605.01283","ref_index":145,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Developing a Strong Pre-Trained Base Model for Plant Leaf Disease Classification","primary_cat":"cs.CV","submitted_at":"2026-05-02T06:33:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A DenseNet201 base model trained on a constructed plant leaf disease dataset outperforms baselines and enables faster, more robust transfer learning with less data than general models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16955","ref_index":23,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Training-inference input alignment outweighs framework choice in longitudinal retinal image prediction","primary_cat":"cs.CV","submitted_at":"2026-04-18T10:28:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Training-inference input alignment outweighs framework choice for longitudinal retinal image prediction, with deterministic regression matching complex models when acquisition variability dominates disease progression.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"independent delta-weighted attention modules operate at scales 1, 2, and 3, each with its own projection 𝐰(𝑠). History Injection and Residual Prediction The aggregated history features 𝐆(𝑠) are injected into the U-Net encoder at the corresponding scales via channel-wise concatenation followed by a 1 × 1 convolution acting as a pixel-wise temporal mixer, followed by GroupNorm with SiLU activation [23], providing normalization and non-linear refinement of the fused representation. 
The model predicts the residual change from the most recent history frame rather than the absolute target [24]: 𝐼̂∗ = 𝐼𝑁 + 𝑓𝜃(𝐼𝑁, ℋ, 𝛥𝑡∗) where 𝑓𝜃 denotes the U-Net output (prior to the residual addition). This residual formulation concentrates capacity on the disease-relevant change signal."},{"citing_arxiv_id":"2604.09543","ref_index":21,"ref_count":3,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ANTIC: Adaptive Neural Temporal In-situ Compressor","primary_cat":"cs.LG","submitted_at":"2026-04-10T17:58:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ANTIC reduces storage for large-scale PDE simulations by orders of magnitude through adaptive temporal snapshot selection combined with continual neural-field residual compression while preserving physics accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2401.10774","ref_index":14,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads","primary_cat":"cs.LG","submitted_at":"2024-01-19T15:48:40+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}