pith. machine review for the scientific record.

arxiv: 2509.08461 · v4 · submitted 2025-09-10 · cs.LG · cs.AI · cs.CV · hep-ex


Adapting Vision-Language Models for Neutrino Event Classification in High-Energy Physics

Authors on Pith: no claims yet
keywords: models · neutrino · classification · physics · language · transformer · vision · architecture
Original abstract

Recent advances in Large Language Models (LLMs) have demonstrated their remarkable capacity to process and reason over structured and unstructured data modalities beyond natural language. In this work, we explore the application of Vision Language Models (VLMs), specifically a fine-tuned variant of LLaMA 3.2, to the task of identifying neutrino interactions in pixelated detector data from high-energy physics (HEP) experiments. We benchmark this model against a state-of-the-art convolutional neural network (CNN) architecture, similar to those used in major neutrino experiments, which have achieved high efficiency and purity in classifying electron and muon neutrino events, and against a Vision Transformer (ViT-h/14), the same architecture used in the VLM's vision encoder. Our evaluation considers both the classification performance and the interpretability of each model's predictions. We find that the transformer-based architectures outperform conventional CNNs in classification accuracy and robustness, with the VLM providing additional flexibility through the integration of auxiliary textual or semantic information and enabling more interpretable, reasoning-based predictions. These results highlight the potential of large transformer models, particularly vision-language models, as general-purpose backbones for physics event classification, combining strong performance, robustness, and interpretability, and opening new avenues for multimodal reasoning in experimental neutrino physics.
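The VLM workflow the abstract describes — passing a pixelated detector image to the model alongside a text prompt and reading back a class label with a natural-language rationale — can be sketched as below. This is a hypothetical illustration, not the authors' code: the prompt wording, the `parse_vlm_response` helper, and the JSON reply format are all assumptions, and the model call itself is mocked.

```python
import json

# Hypothetical prompt for a vision-language model; the detector event
# image would be supplied alongside this text in a real pipeline.
PROMPT = (
    "You are shown a pixelated neutrino detector event. "
    "Classify it as 'nue' (electron neutrino) or 'numu' (muon neutrino) "
    "and describe the event topology that supports your answer. "
    'Reply as JSON: {"label": ..., "reasoning": ...}'
)

def parse_vlm_response(text: str) -> tuple[str, str]:
    """Parse a model's JSON reply into (label, reasoning).

    Falls back to an 'unknown' label when the reply is not valid JSON
    or names an unexpected class, since generative models do not always
    follow the requested format.
    """
    try:
        reply = json.loads(text)
        label = reply.get("label", "unknown")
        reasoning = reply.get("reasoning", "")
    except (json.JSONDecodeError, AttributeError):
        label, reasoning = "unknown", text
    if label not in {"nue", "numu"}:
        label = "unknown"
    return label, reasoning

# Example with a mocked model reply (no real model is called here):
mock_reply = '{"label": "nue", "reasoning": "Compact EM shower, no long muon track."}'
label, why = parse_vlm_response(mock_reply)
print(label)  # nue
```

The parse-and-validate step is where the "interpretable, reasoning-based predictions" claim cashes out in practice: unlike a CNN's bare softmax score, the VLM's reply carries a rationale that can be logged and audited per event.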

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Towards foundation-style models for energy-frontier heterogeneous neutrino detectors via self-supervised pre-training

    hep-ex · 2026-04 · conditional novelty 6.0

    Self-supervised pre-training on multimodal neutrino detector simulations produces reusable representations that improve downstream classification, regression, and data efficiency over training from scratch.