{"total":13,"items":[{"citing_arxiv_id":"2604.21184","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Predicting the thermodynamics in the chromosphere from the translation of SDO data into the IRIS$^{2}$ inversion results using a visual transformer model","primary_cat":"astro-ph.SR","submitted_at":"2026-04-23T01:00:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A visual transformer model trained on IRIS inversions predicts chromospheric temperature and density from SDO data with correlations around 0.8 on 80% of test cases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09922","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"K-STEMIT: Knowledge-Informed Spatio-Temporal Efficient Multi-Branch Graph Neural Network for Subsurface Stratigraphy Thickness Estimation from Radar Data","primary_cat":"cs.LG","submitted_at":"2026-04-10T21:41:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"K-STEMIT reduces RMSE by 21% for subsurface stratigraphy thickness estimation from radar data via a knowledge-informed spatio-temporal GNN with adaptive feature fusion and physical priors from the MAR weather model.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"Preserving these features throughout the network helps maintain node-specific signals and mitigates early-stage over-smoothing before deeper layers like attention-based encoders. As a result, GraphSAGE better supports generalization and robustness in learning from spatially and temporally varying radargram data. 4.4. 
Temporal Convolution In this paper, we use a gated temporal convolution block [11] that extracts temporal patterns from node features via gated 2D convolution and skip connection. Zesheng Liu, Maryam Rahnemoonfar: Preprint submitted to Elsevier, Page 9 of 20. Knowledge-Informed Spatio-Temporal Efficient Multi-Branch Graph Neural Network. Figure 4: Diagram of Temporal Convolution Block (labels: Conv_1, Conv_2, Conv_3, GLU, ReLU, P, Q, R). As shown in Figure 4, the input tensor X̃_temporal ∈ ℝ^{256×m×6} is passed into three two-dimensional convolutions"},{"citing_arxiv_id":"2512.06938","ref_index":8,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Progress Ratio Embeddings: An Impatience Signal for Robust Length Control in Neural Text Generation","primary_cat":"cs.CL","submitted_at":"2025-12-07T17:43:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Progress Ratio Embeddings use a trigonometric progress-ratio signal to deliver stable length control in transformers that generalizes to unseen target lengths.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2309.00071","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"YaRN: Efficient Context Window Extension of Large Language Models","primary_cat":"cs.CL","submitted_at":"2023-08-31T18:18:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"YaRN extends the context window of RoPE-based LLMs like LLaMA more efficiently than prior methods, using 10x fewer tokens and 2.5x fewer steps while surpassing state-of-the-art performance and enabling extrapolation beyond fine-tuning 
lengths.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2003.00295","ref_index":189,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Adaptive Federated Optimization","primary_cat":"cs.LG","submitted_at":"2020-02-29T16:37:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.09207","ref_index":86,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Deep Learning for Time Series Forecasting: The Electric Load Case","primary_cat":"cs.LG","submitted_at":"2019-07-22T10:03:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Compares feedforward, recurrent, sequence-to-sequence and temporal convolutional neural networks for short-term electric load forecasting through experiments on two real datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.05321","ref_index":17,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Time2Vec: Learning a Vector Representation of Time","primary_cat":"cs.LG","submitted_at":"2019-07-11T15:47:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Time2Vec learns a vector representation of time that improves model performance when used in place of raw time inputs across various models and 
problems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.12158","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks","primary_cat":"cs.CV","submitted_at":"2019-06-28T12:23:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces HCSA, a hierarchical convolutional self-attention network for efficient long-form video QA with question-aware dependency modeling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.09084","ref_index":21,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Joint Detection of Malicious Domains and Infected Clients","primary_cat":"cs.LG","submitted_at":"2019-06-21T11:50:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Sluice network transfer learning jointly detects infected clients and malicious domains from HTTPS traffic, outperforming separate models and identifying previously unknown threats.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.08996","ref_index":16,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Incremental Adaptation of NMT for Professional Post-editors: A User Study","primary_cat":"cs.CL","submitted_at":"2019-06-21T08:10:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"User study with professional translators shows that incremental online adaptation of NMT reduces post-editing effort and improves translation 
quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1904.10509","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Generating Long Sequences with Sparse Transformers","primary_cat":"cs.LG","submitted_at":"2019-04-23T19:29:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Sparse Transformers factorize attention to handle sequences tens of thousands long, achieving new SOTA density modeling on Enwik8, CIFAR-10, and ImageNet-64.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1807.03819","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Universal Transformers","primary_cat":"cs.CL","submitted_at":"2018-07-10T18:39:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1706.03762","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Attention Is All You Need","primary_cat":"cs.CL","submitted_at":"2017-06-12T17:57:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Pith review generated a malformed one-line summary.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"relying entirely on an attention mechanism to draw global dependencies between input and output. 
The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs. 2 Background The goal of reducing sequential computation also forms the foundation of the Extended Neural GPU [16], ByteNet [18] and ConvS2S [9], all of which use convolutional neural networks as basic building block, computing hidden representations in parallel for all input and output positions. In these models, the number of operations required to relate signals from two arbitrary input or output positions grows in the distance between positions, linearly for ConvS2S and logarithmically for ByteNet."}],"limit":50,"offset":0}