{"total":31,"items":[{"citing_arxiv_id":"2606.23235","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A First-Order Mean Field Control Analysis of Transformer Layers under Cross-Entropy Training","primary_cat":"math.OC","submitted_at":"2026-06-22T12:21:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Transformer residual layers are approximated as an explicit Euler scheme for a controlled hidden-state flow whose mean-field limit is a first-order transport control problem with Pontryagin terminal condition given by the softmax residual.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.31535","ref_index":19,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video","primary_cat":"cs.CV","submitted_at":"2026-05-29T16:50:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RayDer is a unified transformer backbone for self-supervised static-scene novel view synthesis that absorbs dynamic content as a nuisance factor and shows power-law scaling with data and compute while matching supervised methods in zero-shot settings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23409","ref_index":26,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Online Hand Gesture Recognition Using 3D Convolutional Neural Networks","primary_cat":"cs.CV","submitted_at":"2026-05-22T09:18:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"Proposes an online hand gesture recognition system using 3D CNNs achieving 98%+ detector accuracy and 90%+ classifier accuracy on Jester, with 
37.5% Levenshtein accuracy on a homemade dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20276","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"OmniISR: A Unified Framework for Centralized and Federated Learning via Intermediate Supervision and Regularization","primary_cat":"cs.LG","submitted_at":"2026-05-19T04:13:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OmniISR unifies centralized, federated, and hybrid learning by injecting mutual-information supervision and negative-entropy regularization at multiple hidden layers, with supporting convergence and drift bounds.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15416","ref_index":175,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Margin-Adaptive Confidence Ranking for Reliable LLM Judgement","primary_cat":"cs.LG","submitted_at":"2026-05-14T21:01:05+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25889","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Robust Deepfake Detection: Mitigating Spatial Attention Drift via Calibrated Complementary Ensembles","primary_cat":"cs.CV","submitted_at":"2026-04-28T17:32:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A multi-stream ensemble using DINOv2 and CLIP backbones trained with extreme degradations achieves stable deepfake detection and fourth place in the NTIRE 2026 
challenge.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25663","ref_index":103,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Quantum memory and scrambling from the perspective of a classical neural network","primary_cat":"quant-ph","submitted_at":"2026-04-28T13:58:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Time-dependent quantum memory oscillates faster than OTOC, does not equilibrate, and is more sensitive to symmetry breaking, as shown by neural-network predictions on helical spin chains.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25538","ref_index":100,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Simultaneous measurements of $N$-subjettiness observables in jets from gluons and light-flavour quarks, and in decays of boosted W bosons and top quarks","primary_cat":"hep-ex","submitted_at":"2026-04-28T12:04:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CMS reports a simultaneous measurement of 25 N-subjettiness observables in 1-, 2-, and 3-prong jets, unfolded to stable particles with particle-level correlations for QCD modeling.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"sets are constructed by keeping a subset of the QCD jets, approximately corresponding to the size of the data set in the W boson or top quark signal class for detector- and particle-level selections. We ensure that the distributions of the AK8 jet $p_T$ and $m_{jet}$ are not biased by this selection. The feed-forward DNNs trained for the classification tasks use dense layers with dropout regularization [100] excepting the penultimate hidden layer. 
A deeper network architecture was used for jet classification at the particle level than at the detector level. For the particle- and detector-level studies the DNNs start with two dense layers of 500 and 250 nodes or two 250-node layers, respectively, each followed by a dropout layer with a dropout rate of 0."},{"citing_arxiv_id":"2604.20505","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Explicit Dropout: Deterministic Regularization for Transformer Architectures","primary_cat":"cs.LG","submitted_at":"2026-04-22T12:45:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Explicit dropout reformulates stochastic dropout as deterministic loss penalties for Transformers, matching or exceeding standard performance with independent control per component.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17465","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Language models recognize dropout and Gaussian noise applied to their activations","primary_cat":"cs.AI","submitted_at":"2026-04-19T14:30:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Language models detect, localize, and distinguish dropout from Gaussian noise applied to their activations, often with high accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.06339","ref_index":55,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Enhancing Gravitational Lens Study with Deep Learning: A Study on Effects of Dropout 
Regularization","primary_cat":"astro-ph.CO","submitted_at":"2026-03-06T14:46:28+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Dropout in an AlexNet-based CNN raises R² to ~0.96 and reduces SIE parameter errors by 60-76% on simulated galaxy-galaxy lens images.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.11938","ref_index":58,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Improving Neutrino Oscillation Measurements through Event Classification","primary_cat":"hep-ph","submitted_at":"2025-11-14T23:26:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Supervised ML classification of neutrino events by interaction channel prior to energy reconstruction improves accuracy and sensitivity by 10-20% in simulated DUNE analyses while remaining robust to generator mismodeling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.20535","ref_index":27,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Rotary Masked Autoencoders are Versatile Learners","primary_cat":"cs.LG","submitted_at":"2025-05-26T21:45:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RoMAE applies rotary positional embeddings to masked autoencoders to enable representation learning and interpolation on continuous positional data across irregular time-series, images, and audio without modality-specific modifications.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2309.16131","ref_index":25,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Spectral Approach for Learning Spatiotemporal Neural Differential 
Equations","primary_cat":"cs.LG","submitted_at":"2023-09-28T03:22:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A spectral neural differential equation learning method is proposed that handles nonlocal spatial interactions on unbounded domains without discretization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.10346","ref_index":29,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Delving Deep into Liver Focal Lesion Detection: A Preliminary Study","primary_cat":"cs.CV","submitted_at":"2019-07-24T10:24:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Authors outline a preliminary CNN framework for liver lesion detection in CT images that adds image processing, region proposal, registration and classification steps to handle 3D medical data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.10406","ref_index":52,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Open DNN Box by Power Side-Channel Attack","primary_cat":"cs.CR","submitted_at":"2019-07-21T11:52:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Power side-channel analysis recovers DNN architecture and parameters at 96.5% average accuracy on real embedded devices.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.06968","ref_index":49,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB 
Camera","primary_cat":"cs.CV","submitted_at":"2019-07-16T12:50:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A multitask framework lifts 2D keypoints to 3D poses via a two-stream network then applies ENAS to model spatio-temporal pose evolution for action recognition on Human3.6M, MSR Action3D and SBU datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.06800","ref_index":8,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Graph Interpolating Activation Improves Both Natural and Robust Accuracies in Data-Efficient Deep Learning","primary_cat":"cs.LG","submitted_at":"2019-07-16T00:28:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Graph Laplacian interpolating activation replaces softmax in DNNs and improves natural accuracy, robust accuracy, and data efficiency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.05905","ref_index":16,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Voice Pathology Detection Using Deep Learning: a Preliminary Study","primary_cat":"eess.AS","submitted_at":"2019-07-12T18:06:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A CNN plus LSTM model on 64 ms raw audio segments from SVD achieves 68% test accuracy for voice pathology detection, comparable to earlier work with different features.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.05351","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Optimized Sharing of Coefficients in Parallel Filter 
Banks","primary_cat":"eess.SP","submitted_at":"2019-07-11T16:19:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A two-stage coefficient grouping algorithm for parallel filter banks that increases sharing and reduces registers, LUTs, and DSP48s by up to 50% on FPGAs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.03217","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Adaptive Weighting Depth-variant Deconvolution of Fluorescence Microscopy Images with Convolutional Neural Network","primary_cat":"eess.IV","submitted_at":"2019-07-07T03:58:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A CNN predicts depth-variant PSFs for patch-wise deconvolution of fluorescence microscopy images, with adaptive weighting to reduce artifacts, claiming 98.2% accuracy and up to 6.6 dB PSNR gain.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.01869","ref_index":8,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Simple vs complex temporal recurrences for video saliency prediction","primary_cat":"cs.CV","submitted_at":"2019-07-03T12:02:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Both ConvLSTM and exponential moving average modifications to a static saliency model achieve state-of-the-art video saliency prediction on DHF1K after SALICON pre-training and yield similar maps.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.12230","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"FIESTA: Fast IdEntification of State-of-The-Art models using 
adaptive bandit algorithms","primary_cat":"cs.LG","submitted_at":"2019-06-28T14:11:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"FIESTA uses bandit algorithms to adaptively decide how many seeds and splits to run for each candidate model, focusing effort on promising ones while providing guarantees on selecting the optimal model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.01960","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Fashion Retail: Forecasting Demand for New Items","primary_cat":"cs.OH","submitted_at":"2019-06-27T09:31:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Generalized ML models trained on past sales data forecast demand for new fashion items from their attributes, with experiments across neural architectures and loss functions showing robust performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.10973","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Defending Adversarial Attacks by Correcting logits","primary_cat":"cs.LG","submitted_at":"2019-06-26T11:07:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A two-layer network trained on mixed clean and perturbed logits recovers original predictions for a range of adversarial attacks without needing image data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1708.04552","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Improved Regularization of Convolutional Neural Networks with 
Cutout","primary_cat":"cs.CV","submitted_at":"2017-08-15T15:21:53+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Randomly masking square regions of input images during CNN training yields new state-of-the-art test errors of 2.56% on CIFAR-10, 15.20% on CIFAR-100, and 1.30% on SVHN.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1512.03385","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Deep Residual Learning for Image Recognition","primary_cat":"cs.CV","submitted_at":"2015-12-10T19:51:55+00:00","verdict":"ACCEPT","verdict_confidence":"HIGH","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Residual networks reformulate layers to learn residual functions, enabling effective training of up to 152-layer models that achieve 3.57% error on ImageNet and win ILSVRC 2015.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"before activation, following [16]. We initialize the weights as in [13] and train all plain/residual nets from scratch. We use SGD with a mini-batch size of 256. The learning rate starts from 0.1 and is divided by 10 when the error plateaus, and the models are trained for up to 60 × 10^4 iterations. We use a weight decay of 0.0001 and a momentum of 0.9. We do not use dropout [14], following the practice in [16]. In testing, for comparison studies we adopt the standard 10-crop testing [21]. For best results, we adopt the fully-convolutional form as in [41, 13], and average the scores at multiple scales (images are resized such that the shorter side is in {224, 256, 384, 480, 640}). 4. Experiments 4.1. 
ImageNet Classification"},{"citing_arxiv_id":"1412.6980","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Adam: A Method for Stochastic Optimization","primary_cat":"cs.LG","submitted_at":"2014-12-22T13:54:29+00:00","verdict":"ACCEPT","verdict_confidence":"HIGH","novelty_score":7.5,"formal_verification":"none","one_line_summary":"A first-order stochastic optimizer that maintains bias-corrected exponential moving averages of the gradient and its square, dividing the former by the square root of the latter to set per-parameter step sizes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1411.1784","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Conditional Generative Adversarial Nets","primary_cat":"cs.LG","submitted_at":"2014-11-06T22:33:22+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Conditional GANs generate samples matching a given condition by supplying the condition to both generator and discriminator.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1406.2661","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Generative Adversarial Networks","primary_cat":"stat.ML","submitted_at":"2014-06-10T18:58:17+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":9.0,"formal_verification":"none","one_line_summary":"A generative model is trained to match a data distribution by competing in a minimax game against a discriminator, reaching an equilibrium where the generator recovers the true distribution and the discriminator outputs 1/2 everywhere.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"In this article, we explore the special case when the generative model generates samples by 
passing random noise through a multilayer perceptron, and the discriminative model is also a multilayer perceptron. We refer to this special case as adversarial nets. In this case, we can train both models using only the highly successful backpropagation and dropout algorithms [17] and sample from the generative model using only forward propagation. No approximate inference or Markov chains are necessary. 2 Related work An alternative to directed graphical models with latent variables are undirected graphical models with latent variables, such as restricted Boltzmann machines (RBMs) [27, 16], deep Boltzmann machines (DBMs) [26] and their numerous variants."},{"citing_arxiv_id":"1308.3432","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation","primary_cat":"cs.LG","submitted_at":"2013-08-15T15:19:34+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"The paper introduces and compares gradient estimators for stochastic binary neurons, notably a decomposition approach and the straight-through estimator, to support sparse conditional computation in deep networks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}