FigSIM is the first annotated dataset for fine-grained suicide severity and figurative language in suicide memes, accompanied by benchmarks on 16 unimodal and multimodal models.
Deep residual learning for image recognition
Mixed citation behavior. Most common role is method (46%).
claims ledger
- method These channels are not independent signals but jointly represent a single complex-valued measurement, where the relationship between them encodes the local phase. Unlike magnitude-only approaches, where a single intensity channel is compressed, this coupling must be explicitly preserved. The architecture, loss function, and evaluation metrics described below are designed accordingly. The architecture is implemented as a ResNet-based [20] conditional variational autoencoder (CVAE) [21]. The encod
- method Together, these considerations make a scalable, high-speed, and robust reconstruction capable of operating at Monte Carlo scale essential for Hyper-Kamiokande. Machine-learning based reconstruction offers a promising path toward meeting these computational and topological challenges. Convolutional neural networks [16], and in particular residual networks (ResNets) [17], are well suited to process the high-dimensional charge and time images recorded by the PMT array. At Super-Kamiokande, machi
- method Instead of binary classification, our model classifies into four states (LL, L, H, HH), and instead of training CNN feature extractors from scratch, we use pre-trained ResNet50 using transfer learning. The model architecture is shown in Figure 3. 3.6.1 Feature extraction. The first step is to extract features from each of the seven images. Here we apply transfer learning using ResNet50 [22], pre-trained on a large dataset. We extract information from the penultimate layer of ResNet50, compressing ea
- dataset historical video and recomputes attention upon query arrival. (2) ReKV [12] retrieves query-relevant KVCache at the token level. (3) LiveVLM [13] further combines token-level retrieval with KVCache compression to reduce memory usage. (4) StreamMem [14] also compresses KVCache, but under a TABLE II DATASET CONFIGURATIONS. Dataset Max Length Description MLVU [19] 703s multi-task long video LongVideoBench [20] 468s long-term multi-modal video VideoMME [21] 1,018s full-spectrum multi-modal video RVS
- background Training on such data could reinforce areas where AI systems are vulnerable [37, 796], enhancing their robustness in real-world applications. Adversarial examples can be constructed in various ways. One straightforward approach is to add small perturbations to inputs, which preserves their original labels while introducing adversarial characteristics [100, 260, 300, 504]. Another effective strategy is red teaming, which usually involves human teams systematically testing to find vulnerabilities
- method histopathological images [2], [4], [5], [6]. CNN have been widely adopted for cancer detection due to their ability to capture local texture patterns and hierarchical spatial features. Residual learning has been introduced to alleviate the vanishing gradient problem, leading to significant improvements in deep feature representation, as exemplified by ResNet architectures [7]. Similarly, DenseNet and kernel architectures enhance feature reuse and gradient flow, while EfficientNet achieves state-
representative citing papers
Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.
HASTE enables training-free dynamic compression of pre-trained CNNs by patch-wise LSH-based merging of redundant channels, reporting 46.2% FLOPs reduction on ResNet34 CIFAR-10 with 1.25% accuracy drop.
An event-camera system with active gaze control and contrast-maximization spin estimation achieves real-time performance in table tennis with 8.8% magnitude error, 6.4° axis error, 3 ms latency, and 750 Hz throughput.
MATCH is the first flow matching method for multi-view anomaly detection, reporting SOTA results on Real-IAD and the first comprehensive evaluation on MANTA-Tiny while enabling real-time use by omitting the divergence term.
Spatial multiplexing in optical neural networks is repurposed as a trainable representational coordinate, demonstrated in multi-layer architectures for image classification, regression, and hybrid vision-language captioning with over one million optical phase parameters.
An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.
DELOS applies contrastive learning to phase-folded light curves to detect shallow intermediate-to-long period transits, reporting 15.5% and 11.25% gains in combined precision-recall over BLS and TLS in low-SNR tests plus 3-80x speedups.
SDM is a new staged gradient attack that reconstructs the adversarial objective around probability differences and reports stronger performance than prior methods like APGD.
Argus enables backdoor detection in decentralized ML by collaborative neighbor-based validation of triggers, backed by convergence theory and reducing attack success by up to 90% on tested datasets.
RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
LLQR+SAM pairs a slow learned geometry preconditioner with fast SAM perturbations to amplify escape from locally sharp 'potholes' while stabilizing flat basins, producing consistent gains over SAM and LLQR alone.
MorphoHELM is a new benchmark for Cell Painting morphology representations that tests methods across increasing batch effect levels and finds classic computer vision strategies remain the strongest general-purpose performers.
VCR learns valid contextual representations for incomplete wearable signals via orthogonal disentanglement and missing-aware mixture-of-experts, improving robustness across full and missing-modality settings.
The paper develops a martingale-consistent SSL framework enforcing expected coherence between coarse and refined predictions via new objectives and a Monte Carlo estimator, improving robustness under partial observations.
Urban-ImageNet is a 2-million-image multi-modal dataset with HUSIC 10-class taxonomy enabling benchmarks for urban scene classification, cross-modal retrieval, and instance segmentation.
GPROF-IR is a CNN-based retrieval that uses temporal context in geostationary IR observations to produce precipitation estimates with lower error than prior IR methods and climatological consistency with PMW retrievals for integration into IMERG V08.
The paper introduces the VODA setting for domain adaptation from scratch using vision-language models and presents TS-DRD, which achieves competitive performance on standard benchmarks without source models.
GEODE uses per-sample cosine-similarity scaling in a norm loss to preserve feature geometry for universal scorer-compatible OOD detection, matching or exceeding OE performance on CIFAR benchmarks.
Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.
Trust-SSL introduces additive-residual trust weights in SSL to selectively handle corruptions in aerial imagery, yielding higher linear-probe accuracy and larger gains under severe degradations than SimCLR or VICReg.
FRTSearch reframes fast radio transient detection as instance segmentation on dynamic spectra and uses the segmented shapes to infer dispersion measure and time of arrival, achieving 98% recall with over 99.9% fewer false positives than traditional methods.
CapBench is a new multi-PDK dataset of post-layout 3D windows with high-fidelity capacitance labels and multiple ML-ready representations, plus baseline results showing CNN accuracy versus GNN speed trade-offs.
citing papers explorer
- Your Neighbors Know: Leveraging Local Neighborhoods for Backdoor Detection in Decentralized Learning
Argus enables backdoor detection in decentralized ML by collaborative neighbor-based validation of triggers, backed by convergence theory and reducing attack success by up to 90% on tested datasets.
- Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation
RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
- Navigating Potholes with Geometry-Aware Sharpness Minimization
LLQR+SAM pairs a slow learned geometry preconditioner with fast SAM perturbations to amplify escape from locally sharp 'potholes' while stabilizing flat basins, producing consistent gains over SAM and LLQR alone.
- VCR: Learning Valid Contextual Representation for Incomplete Wearable Signals
VCR learns valid contextual representations for incomplete wearable signals via orthogonal disentanglement and missing-aware mixture-of-experts, improving robustness across full and missing-modality settings.
- Martingale-Consistent Self-Supervised Learning
The paper develops a martingale-consistent SSL framework enforcing expected coherence between coarse and refined predictions via new objectives and a Monte Carlo estimator, improving robustness under partial observations.
- GEODE: Angle-Adaptive OOD Detection with Universal Scorer Compatibility
GEODE uses per-sample cosine-similarity scaling in a norm loss to preserve feature geometry for universal scorer-compatible OOD detection, matching or exceeding OE performance on CIFAR benchmarks.
- PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training
Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.
- Hidden in the Multiplicative Interaction: Uncovering Fragility in Multimodal Contrastive Learning
Multimodal contrastive learning using multilinear products is fragile to single bad modalities, and a gated version improves top-1 retrieval accuracy on synthetic and real trimodal data.
- Dynamic Free-Rider Detection in Federated Learning via Simulated Attack Patterns
S2-WEF detects dynamic free-riders in federated learning by simulating attack WEF patterns from prior global models, combining them with mutual deviation scores, and using two-dimensional clustering without proxy data or pre-training.
- Effective Model Pruning: Measure The Redundancy of Model Components
EMP maps importance scores to effective sample size N_eff and prunes the lowest N - N_eff components, with a derived lower bound on retained effective mass and upper bound on loss increase.
- CascadeFormer: Depth-Tapered Transformers Motivated by Gradient Fan-in Asymmetry
CascadeFormer tapers Transformer width with depth based on gradient fan-in asymmetry to match uniform baselines in perplexity while cutting latency 8.6%.
- Constrained hybrid modelling to predict microbial dynamics and organic matter turnover in soil systems
A hybrid neural-process model derives biokinetic parameters from genomic traits for soil organic matter turnover under ecological constraints, and outperforms baselines on synthetic and real data.
- Operator Boosting Produces Pareto-Efficient PDE Surrogates
Operator Boosting constructs compact neural-operator PDE surrogates by sequential residual learning with validation-selected shrinkage, yielding 72-95% parameter reduction and accuracy gains on 21 of 30 dataset-architecture pairs.
- A Comprehensive Inference-Time Augmentation Framework in Physiological Signals: Application to PPG-Based AF Detection
A unified inference-time augmentation framework with 13 methods and Bayesian-optimized parameters improves AUROC up to 8.5% and reduces false positives in PPG-based AF detection across five datasets.
- Mechanical Field Networks: Structured Neural Dynamics for Multivariate Systems
MF-Net learns a shared field state and mechanical transition rule from trajectories to deliver competitive forecasting and recoverable relation matrices on Lorenz-96 and real systems.
- Frequency-Domain Latent Attention Gating for Cross-Domain Token Aggregation
FLaG is a frequency-domain module using FFT, latent queries, and gating that improves token aggregation and shows gains on ESM2 AMP prediction and CIFAR-100 image classification while staying competitive on text tasks.
- SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training
SparseOpt is a new optimizer that counters batch normalization's gradient skew in dynamic sparse training, yielding faster convergence and better accuracy on ResNet models for CIFAR-100 and ImageNet.
- Iterative Refinement Neural Operators are Learned Fixed-Point Solvers: A Principled Approach to Spectral Bias Mitigation
IRNO augments neural operators with learned fixed-point iterative refinement modules and a progressive spectral loss, achieving up to 56% error reduction on turbulent flow and large drops in high-frequency normalized errors on active matter.
- ARC-STAR: Auditable Post-Hoc Correction for PDE Foundation Models
ARC-STAR reduces velocity rollout error by at least 36x over raw Poseidon across all tested regime cells via auditable global and local correction stages on five flow benchmarks.
- Chessformer: A Unified Architecture for Chess Modeling
Chessformer is a unified encoder-only transformer for chess that uses square tokens, geometric attention bias, and an attention-based policy head to set new records in human move prediction accuracy, playing strength, and interpretability.
- OUIDecay: Adaptive Layer-wise Weight Decay for CNNs Using Online Activation Patterns
OUIDecay adaptively rescales layer-wise weight decay in CNNs using an online activation-based Overfitting-Underfitting Indicator and outperforms fixed decay in 7 of 8 tested settings.
- Gradient-Discrepancy Acquisition for Pool-Based Active Learning
Introduces a gradient-discrepancy acquisition criterion for active learning, derived from the generalization bound of Luo et al. (2022).
- Model Merging: Foundations and Algorithms
New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.
- PrismAgent: Illuminating Harm in Memes via a Zero-Shot Interpretable Multi-Agent Framework
PrismAgent deploys four specialized LLM agents in sequence to analyze meme intent, gather context, make preliminary judgments, and deliver a final harm verdict, outperforming prior zero-shot methods on three public datasets.
- Preventing Latent Rehearsal Decay in Online Continual SSL with SOLAR
SOLAR prevents latent rehearsal decay in online continual SSL by adaptively managing replay buffers with deviation proxies and an explicit overlap loss, delivering both fast convergence and state-of-the-art final accuracy on vision benchmarks.
- Extraction of linearized models from pre-trained networks via knowledge distillation
Koopman theory plus knowledge distillation yields linearized models from pre-trained nets that outperform standard least-squares Koopman approximations on MNIST and Fashion-MNIST in accuracy and stability.
- Positive-Unlabelled Active Learning to Curate a Dataset for Orca Resident Interpretation
Curates over 900 hours of SRKW acoustic data plus other marine mammal recordings via positive-unlabeled active learning, releasing transformer classifiers that report AUROC 0.58-0.77 and species top-1 accuracy of 53.2% on held-out benchmarks.
- Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD
Shuffled DP-SGD requires σ ≥ 1/√(2 ln M) or κ ≥ (1/√8)(1 - 1/√(4π ln M)) to limit adversarial advantage, preventing strong privacy and high utility simultaneously.
- HOLE: Homological Observation of Latent Embeddings for Neural Network Interpretability
HOLE applies persistent homology to latent embeddings in neural networks and uses visualizations such as cluster flow diagrams to reveal patterns of class separation, feature disentanglement, and robustness.
- Closed-Form Last Layer Optimization
A method that alternates gradient steps on a neural network backbone with closed-form optimal updates to the final linear layer under squared loss, including an SGD adaptation and NTK-regime convergence analysis.
- Bayesian E(3)-Equivariant Interatomic Potential with Iterative Restratification of Many-body Message Passing
Bayesian E(3)-equivariant MLPs with joint energy-force NLL loss achieve competitive accuracy while enabling uncertainty-guided active learning, OOD detection, and calibration.
- Neural Mean-Field Games: Extending Mean-Field Game Theory with Neural Stochastic Differential Equations
Neural mean-field games integrate mean-field game theory with neural SDEs to learn strategic interactions from data in a model-free way, demonstrated on games and viral dynamics.
- Deep Privacy Funnel Model: From a Discriminative to a Generative Approach with an Application to Face Recognition
Introduces Generative Privacy Funnel (GenPF) and deep variational PF (DVPF) models that extend the privacy funnel to generative settings and provide a controllable privacy-utility trade-off with reduced sensitive attribute leakage in face recognition.
- Generalizing from a few environments in safety-critical reinforcement learning
RL agents fail dangerously on unseen environments; ensembles reduce catastrophes in gridworld but not CoinRun, with uncertainty enabling intervention prediction.
- From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training
MTCL learns multi-scale temporal correlations in videos via contrastive learning to produce more informative representations that improve sample efficiency and performance in downstream RL tasks.
- Discovering Collaboration from Novelty: Random Network Distillation for Clustered Federated Learning
Random Network Distillation enables pre-training discovery of client clusters in federated learning via local novelty signals, supporting autonomous grouping under non-IID data without a priori cluster count.
- Frequency-Domain Neural ODEs for Modeling Non-Linear Dynamical Systems
FNODE projects Neural ODE dynamics into the frequency domain via FFT and reports better generalization and convergence stability than GRUs, LSTMs, and ANODE on Lotka-Volterra, forced Duffing, Van der Pol, and Lorenz systems.
- Pretrained Approximators for Low-Thrust Trajectory Cost and Reachability
Neural surrogates trained with scaling laws and self-similar transformations accurately approximate low-thrust trajectory costs and reachability while generalizing across orbital parameters.
- Rethinking Federated Unlearning via the Lens of Memorization
Introduces Grouped Memorization Evaluation and FedMemPrune to remove unique memorized information in federated unlearning while preserving overlapping knowledge.
- Classification of Single and Mixed Partial Discharges under Switching Voltage Using an AWA-CNN Framework
AWA patterns from PD pulse amplitude, width, and area enable CNNs to classify single and mixed partial discharge sources under switching voltage with over 96% test accuracy.
- Soft Learning
Soft Learning optimally combines heterogeneous ML specialists via cross-validated non-negative least squares, achieving top performance on 70% of 37 datasets with formal guarantees and 72-435x CPU speedups over deep networks.
- Dynamical Predictive Modelling of Cardiovascular Disease Progression Post-Myocardial Infarction via ECG-Trained Artificial Intelligence Model
A contrastive-learning ECG foundation model with multitask heads predicts post-MI outcomes better than training from scratch (AUC 0.794 vs 0.608).
- Multi-Narrow Transformation as a Single-Model Ensemble: Boundary Conditions, Mechanisms, and Failure Modes
Multi-narrow single-model ensembles outperform wide baselines in low-data image classification by learning diverse features but underperform in data-rich settings where training favors few paths.
- CNNs for Vis-NIR Chemometrics: From Contradiction to Conditional Design
Contradictions across CNN studies for Vis-NIR chemometrics are expected outcomes of uncontrolled variables in spectral physics and validation design, motivating a conditional rather than universal design framework.
- The Mathematics of AI Winters: The mathematical Taxonomy of Paradigm Fragility in AI Winter
Established mathematical bottlenecks in representation, optimization, complexity, and high-dimensional learning are aligned with the central disappointments of early AI research periods.
- Deep Learning in the Automotive Industry: Recent Advances and Application Examples
An overview of deep learning applications and challenges in the automotive industry, covering ADAS, automated driving, virtual sensing, and data-driven development.
- Layer-wise Geometric Approximation Rates for Deep Networks