FigSIM is the first annotated dataset for fine-grained suicide severity and figurative language in suicide memes, accompanied by benchmarks on 16 unimodal and multimodal models.
Deep residual learning for image recognition
Mixed citation behavior. Most common role is method (46%).
claims ledger
- method These channels are not independent signals but jointly represent a single complex-valued measurement, where the relationship between them encodes the local phase. Unlike magnitude-only approaches, where a single intensity channel is compressed, this coupling must be explicitly preserved. The architecture, loss function, and evaluation metrics described below are designed accordingly. The architecture is implemented as a ResNet-based [20] conditional variational autoencoder (CVAE) [21]. The encod
- method Together, these considerations make a scalable, high-speed, and robust reconstruction capable of operating at Monte Carlo scale essential for Hyper-Kamiokande. Machine-learning based reconstruction offers a promising path toward meeting these computational and topological challenges. Convolutional neural networks [16], and in particular residual networks (ResNets) [17], are well suited to process the high-dimensional charge and time images recorded by the PMT array. At Super-Kamiokande, machi
- method Instead of binary classification, our model classifies into four states (LL, L, H, HH), and instead of training CNN feature extractors from scratch, we use pre-trained ResNet50 using transfer learning. The model architecture is shown in Figure 3. 3.6.1 Feature extraction. The first step is to extract features from each of the seven images. Here we apply transfer learning using ResNet50 [22], pre-trained on a large dataset. We extract information from the penultimate layer of ResNet50, compressing ea
- dataset historical video and recomputes attention upon query arrival. (2) ReKV [12] retrieves query-relevant KVCache at the token level. (3) LiveVLM [13] further combines token-level retrieval with KVCache compression to reduce memory usage. (4) StreamMem [14] also compresses KVCache, but under a TABLE II DATASET CONFIGURATIONS. Dataset Max Length Description MLVU [19] 703s multi-task long video LongVideoBench [20] 468s long-term multi-modal video VideoMME [21] 1,018s full-spectrum multi-modal video RVS
- background Training on such data could reinforce areas where AI systems are vulnerable [37, 796], enhancing their robustness in real-world applications. Adversarial examples can be constructed in various ways. One straightforward approach is to add small perturbations to inputs, which preserves their original labels while introducing adversarial characteristics [100, 260, 300, 504]. Another effective strategy is red teaming, which usually involves human teams systematically testing to find vulnerabilities
- method histopathological images [2], [4], [5], [6]. CNNs have been widely adopted for cancer detection due to their ability to capture local texture patterns and hierarchical spatial features. Residual learning has been introduced to alleviate the vanishing gradient problem, leading to significant improvements in deep feature representation, as exemplified by ResNet architectures [7]. Similarly, DenseNet and kernel architectures enhance feature reuse and gradient flow, while EfficientNet achieves state-
representative citing papers
Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.
HASTE enables training-free dynamic compression of pre-trained CNNs by patch-wise LSH-based merging of redundant channels, reporting 46.2% FLOPs reduction on ResNet34 CIFAR-10 with 1.25% accuracy drop.
An event-camera system with active gaze control and contrast-maximization spin estimation achieves real-time performance in table tennis with 8.8% magnitude error, 6.4° axis error, 3 ms latency, and 750 Hz throughput.
MATCH is the first flow matching method for multi-view anomaly detection, reporting SOTA results on Real-IAD and the first comprehensive evaluation on MANTA-Tiny while enabling real-time use by omitting the divergence term.
Spatial multiplexing in optical neural networks is repurposed as a trainable representational coordinate, demonstrated in multi-layer architectures for image classification, regression, and hybrid vision-language captioning with over one million optical phase parameters.
An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.
DELOS applies contrastive learning to phase-folded light curves to detect shallow intermediate-to-long period transits, reporting 15.5% and 11.25% gains in combined precision-recall over BLS and TLS in low-SNR tests plus 3-80x speedups.
SDM is a new staged gradient attack that reconstructs the adversarial objective around probability differences and reports stronger performance than prior methods like APGD.
Argus enables backdoor detection in decentralized ML by collaborative neighbor-based validation of triggers, backed by convergence theory and reducing attack success by up to 90% on tested datasets.
RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
LLQR+SAM pairs a slow learned geometry preconditioner with fast SAM perturbations to amplify escape from locally sharp 'potholes' while stabilizing flat basins, producing consistent gains over SAM and LLQR alone.
MorphoHELM is a new benchmark for Cell Painting morphology representations that tests methods across increasing batch effect levels and finds classic computer vision strategies remain the strongest general-purpose performers.
VCR learns valid contextual representations for incomplete wearable signals via orthogonal disentanglement and missing-aware mixture-of-experts, improving robustness across full and missing-modality settings.
The paper develops a martingale-consistent SSL framework enforcing expected coherence between coarse and refined predictions via new objectives and a Monte Carlo estimator, improving robustness under partial observations.
Urban-ImageNet is a 2-million-image multi-modal dataset with HUSIC 10-class taxonomy enabling benchmarks for urban scene classification, cross-modal retrieval, and instance segmentation.
GPROF-IR is a CNN-based retrieval that uses temporal context in geostationary IR observations to produce precipitation estimates with lower error than prior IR methods and climatological consistency with PMW retrievals for integration into IMERG V08.
The paper introduces the VODA setting for domain adaptation from scratch using vision-language models and presents TS-DRD, which achieves competitive performance on standard benchmarks without source models.
GEODE uses per-sample cosine-similarity scaling in a norm loss to preserve feature geometry for universal scorer-compatible OOD detection, matching or exceeding OE performance on CIFAR benchmarks.
Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.
Trust-SSL introduces additive-residual trust weights in SSL to selectively handle corruptions in aerial imagery, yielding higher linear-probe accuracy and larger gains under severe degradations than SimCLR or VICReg.
FRTSearch reframes fast radio transient detection as instance segmentation on dynamic spectra and uses the segmented shapes to infer dispersion measure and time of arrival, achieving 98% recall with over 99.9% fewer false positives than traditional methods.
CapBench is a new multi-PDK dataset of post-layout 3D windows with high-fidelity capacitance labels and multiple ML-ready representations, plus baseline results showing CNN accuracy versus GNN speed trade-offs.
citing papers explorer
- Improving Prognostic Performance in Resectable Pancreatic Ductal Adenocarcinoma using Radiomics and Deep Learning Features Fusion in CT Images
Risk-score based fusion of radiomics and deep learning features from CT images improves AUC for overall survival prediction in resectable PDAC by 51% over radiomics alone.
- New pointwise convolution in Deep Neural Networks through Extremely Fast and Non Parametric Transforms
Replacing pointwise convolutions with DWHT yields a model with 79.1% fewer parameters, 48.4% fewer FLOPs, and 1.49% higher accuracy than MobileNet-V1 on CIFAR-100.
- MAPE: Defending Against Transferable Adversarial Attacks Using Multi-Source Adversarial Perturbations Elimination
MAPE combines a channel-attention U-Net (SAPE) trained on multi-model adversarial examples scheduled by PPSA to eliminate perturbations, reporting over 95.1% average defense on CIFAR-10 and 71.5% on Mini-ImageNet against black-box transferable attacks.
- SEADA: An efficient methodology for optimizing mixed-precision DNNs on multi-precision spatial architectures
SEADA introduces an analytical framework combining cost models, mapping tools, and entropy-based precision selection to optimize mixed-precision DNNs on multi-precision spatial architectures.
- A multi-task spatiotemporal deep neural network for predicting penetration depth and morphology in laser welding
A CNN-plus-state-space-model multi-task network predicts laser weld penetration state (99.35% accuracy), depth (1.79 mm error), and cross-section morphology (95.65% accuracy) from top-view weld-pool images and welding parameters.
- A cross-process welding penetration status prediction algorithm based on unsupervised domain adaptation in laser and TIG welding
Unsupervised domain adaptation with GSDE achieves ~80% accuracy in cross-process TIG-laser weld penetration prediction, improving supervised baselines by over 43%.
- NEURON-Fabric: Architecture-Runtime Co-Design for Controlled Low-Bit Gradient Communication
NEURON-Fabric provides a profile-guided runtime for controlled low-bit gradient communication that preserves accuracy near full-precision levels while reducing modeled communication traffic across vision, transformer, and language model workloads.
- Contrastive Augmented Transformer with Domain-specific Enhancement for Robust Multi-scenario Metal Surface Defect Detection
CAT framework reports 99.54% pixel-level AUROC on KolektorSDD2 with claimed superior generalization to three unseen defect datasets.
- Latent Geometric Chords for Query-Efficient Decision-Based Adversarial Attacks
LGC performs curvature-aware geometric search in a compressed semantic manifold for decision-based attacks, using residual adversarial generation to reach SSIM >0.99 and LPIPS <0.01 at 5000 queries while attacking robust models.
- Benchmarking Convolutional, Transformer, Hybrid, and Vision Language Models for Multi Disease Retinal Screening
Empirical benchmark finds attention-based models (SwinTiny, CoAtNet0, MaxViTTiny) achieve highest AUC above 84% on RFMiD binary screening and best F1 scores on multi-label task, with VLMs competitive but not superior and external Messidor-2 AUC 66.8-84.7%.
- Machine Learning-based Separation of the He I 10830Å Chromospheric Signal: Quantitative Analysis of Chromosphere-Corona Intensity in the Quiet Sun
CNN separation of He I 10830Å chromospheric signal from photospheric contamination in quiet Sun reveals R ≈ -0.84 anti-correlation with 304Å and magnetic-field-dependent EUV coupling.
- HeartBeatAI: An Interpretable and Robust Deep Learning Framework for Multi-Label ECG Arrhythmia Detection
HeartBeatAI reports 98% Macro F1 under intra-source testing on four ECG datasets but shows significant degradation on rare anomalies under leave-one-domain-out evaluation.
- A Value-added Physical Properties Catalog for Low-redshift Galaxies from DESI Legacy Imaging Surveys DR10
A multimodal neural network trained on MPA-JHU references produces SFR, stellar mass, and metallicity estimates for 547 million low-redshift galaxies in DESI LS DR10.
- 7DT Insight: Variability in Young Stellar Objects
Two-epoch medium-band photometry of 769 YSO candidates in Orion A identifies 110 variables (~14%), with best-fit templates dominated by cold and hot spot models over extinction or gray changes.
- Visibility nowcasting in South Korea: a machine learning approach to class imbalance and distribution shift
The study applies an ensemble of machine learning and deep learning models with synthetic oversampling on 2018-2020 data to nowcast visibility, finding a performance decline on 2021 test data attributed to distributional shift confirmed by Wasserstein distance on the SHAP-identified feature.
- XiYOLO: Energy-Aware Object Detection via Iterative Architecture Search and Scaling
XiYOLO uses iterative energy-aware neural architecture search and scaling to produce object detectors with stronger accuracy-energy tradeoffs than YOLO baselines on GPUs and NPUs.
- CNNs for Vis-NIR Chemometrics: From Contradiction to Conditional Design
Contradictions across CNN studies for Vis-NIR chemometrics are expected outcomes of uncontrolled variables in spectral physics and validation design, motivating a conditional rather than universal design framework.
- A Heterogeneous Two-Stream Framework for Video Action Recognition with Comparative Fusion Analysis
DualStreamHybrid assigns ViT-Tiny to RGB and MobileNetV2 to 20-channel flow, projects features to common space, and finds cross-attention best on UCF11 (98.12%) while weighted fusion is most consistent on UCF50 (96.86%).
- Revealing Geography-Driven Signals in Zone-Level Claim Frequency Models: An Empirical Study using Environmental and Visual Predictors
Augmenting zone-level MTPL claim frequency models with coordinates, environmental features at 5 km scale, and image embeddings improves predictive accuracy on unseen postcodes across GLM, regularized GLM, and tree-based models.
- Opportunistic Bone-Loss Screening from Routine Knee Radiographs Using a Multi-Task Deep Learning Framework with Sensitivity-Constrained Threshold Optimization
STR-Net achieves AUROC of 0.933 for binary bone-loss screening and 0.801 correlation for T-score estimation from knee X-rays on a held-out test set.
- Autonomous Unmanned Aircraft Systems for Enhanced Search and Rescue of Drowning Swimmers: Image-Based Localization and Mission Simulation
A UAS with YOLO-based swimmer detection and DES simulations reduces drowning rescue response time by a factor of five versus standard operations in tested lake areas.
- Human Centered Non Intrusive Driver State Modeling Using Personalized Physiological Signals in Real World Automated Driving
Personalized deep learning models on multimodal physiological signals from an Empatica E4 sensor achieve 92.68% accuracy for driver state classification in real-world automated driving, compared to 54% for generalized models across four drivers.
- A Compact and Efficient 1.251 Million Parameter Machine Learning CNN Model PD36-C for Plant Disease Detection: A Case Study
PD36-C is a 1.25 million parameter CNN achieving 99.53% average test accuracy on 38 plant disease classes from the New Plant Diseases Dataset, with a Qt-based app enabling edge deployment.
- A Multi-modal Fusion Network for Star-Galaxy Classification from CSST Simulated Datasets
A ResNet-50 and BiLSTM multi-modal fusion network achieves 99.81% galaxy recall and 99.66% star recall on a CSST simulated dataset of 125,896 objects.
- DSVTLA: Deep Swin Vision Transformer-Based Transfer Learning Architecture for Multi-Type Cancer Histopathological Cancer Image Classification
A hybrid Swin Transformer and ResNet50 transfer learning model achieves up to 100% test accuracy on multi-type cancer histopathological image classification.
- FedKLPR: KL-Guided Pruning-Aware Federated Learning for Person Re-Identification
FedKLPR introduces KL-divergence-guided training, pruning-aware weighted aggregation, and cross-round recovery to achieve 40-42% communication reduction on ResNet-50 while preserving competitive accuracy in federated person re-identification across eight datasets.
- Safeguarding AI in Medical Imaging: Post-Hoc Out-of-Distribution Detection with Normalizing Flows
Post-hoc normalizing flows for OOD detection in medical imaging achieve 84.61% AUROC on MedOOD and 93.8% on MedMNIST, outperforming ViM, MDS, and ReAct.
- On $L^\infty$ stability for wave propagation and for linear inverse problems
Regularization of Fourier multipliers yields $L^\infty$ stability for wave propagation and compact operator inversion in inverse problems.
- AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions
The paper introduces a taxonomy of AI safety for LLMs organized into Trustworthy AI, Responsible AI, and Safe AI perspectives, accompanied by a review of state-of-the-art methods, challenges, and future directions.
- Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays- Part 2: Design Knobs and DNN Accuracy Trends
Simulation study at 7 nm finds FeFET best for large arrays on ResNet-20/CIFAR-10, ReRAM competitive at higher bit-slices on ResNet-50/CIFAR-100, with partial wordline activation and custom ADC levels each raising accuracy by up to ~32%.
- Stitching Gaps: Fusing Situated Perceptual Knowledge with Vision Transformers for High-Level Image Classification
Hybrid knowledge graph embeddings fused with vision transformer features outperform standard techniques on abstract concept classification by integrating situated perceptual knowledge from a new cultural image resource.
- MalariAI: A Label-Resilient Decoupled Framework for Universal Cell Segmentation and Explainable Stage Classification in Dense Malaria Blood Smears
A decoupled watershed-plus-EfficientNet pipeline recovers 75.95% of cells without annotations and reaches 98.36% stage classification accuracy with instance-level explainability on the NIH BBBC041 dataset.
- The Mathematics of AI Winters: The mathematical Taxonomy of Paradigm Fragility in AI Winter
Established mathematical bottlenecks in representation, optimization, complexity, and high-dimensional learning aligned with the central disappointments of early AI research periods.
- Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics
A science of AI requires theories of training dynamics to predict outcomes from early signals, intervene on trajectories, and design procedures that reliably produce desired capabilities, biases, robustness, and safety properties.
- Digital Image Forgery Detection Using Transfer Learning
A hybrid RGB plus compression-feature transfer learning pipeline with Youden-optimized thresholds improves forgery detection on the CASIA v2.0 dataset using off-the-shelf CNN backbones.
- Multilevel neural networks with dual-stage feature fusion for human activity recognition
Multilevel CNN-LSTM architectures using both late and intermediate feature fusion achieve higher accuracy in human activity recognition than late fusion alone on two benchmark datasets.
- Multilingual Vision-Language Models, A Survey
The survey identifies a key tension in multilingual vision-language models between language neutrality via contrastive learning and cultural awareness via diverse data, with most benchmarks relying on translation-based evaluation.
- CNNs, Transformers, Hybrid, and Vision Language Models for Skin Cancer Detection
Benchmark of twelve models finds hybrid CNN-transformer architectures and a SigLIP vision-language model deliver the strongest overall performance on skin cancer detection using the PAD-UFES-20 dataset.
- MiniGPT: Rebuilding GPT from First Principles
MiniGPT is a self-contained PyTorch implementation of standard GPT autoregressive modeling that reaches 1.478 validation loss on Tiny Shakespeare with a 10.77M-parameter model and produces recognizable Shakespeare-style text.
- Vision Language Models versus Machine Learning Models Performance on Polyp Detection and Classification in Colonoscopy Images
Empirical benchmark of 11 models on polyp detection and classification in colonoscopy images shows ResNet50 highest, BiomedCLIP and GPT-4 moderate on detection, and general VLMs weak on classification.
- A Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation
A survey that categorizes deep learning models for point cloud tasks by backbone architecture, evaluates benchmark performance, and outlines challenges and future research directions.
- Deep Learning in the Automotive Industry: Recent Advances and Application Examples
An overview of deep learning applications and challenges in the automotive industry, covering ADAS, automated driving, virtual sensing, and data-driven development.
- Layer-wise Geometric Approximation Rates for Deep Networks