12 Pith papers cite this work, alongside 16,729 external citations. Polarity classification is still indexing.
Years: 2026 (12)
Verdicts: unverdicted (12)
citing papers explorer
- An Answer is just the Start: Related Insight Generation for Open-Ended Document-Grounded QA
  InsightGen uses thematic clustering and graph neighborhood selection to generate diverse, relevant insights for open-ended document-grounded questions and releases the SCOpE-QA dataset of 3000 questions.
- Soft-MSM: Differentiable Context-Aware Elastic Alignment for Time Series
  Soft-MSM is a smooth, gradient-enabled version of the context-aware MSM distance for time series alignment that outperforms Soft-DTW alternatives in clustering and nearest-centroid classification.
- The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code
  A single commercial LLM can cheaply generate large populations of behaviorally equivalent yet structurally diverse malware payloads.
- Generalized Category Discovery in Federated Graph Learning
  GCD-FGL mitigates neighborhood absorption and global semantic inconsistency in federated generalized category discovery, delivering +4.86 average HRScore gain over baselines on five graph datasets.
- Class Angular Distortion Index for Dimensionality Reduction
  CADI quantifies the preservation of relative cluster angles in low-dimensional projections using internal angles from point triples.
- Empirical Insights of Test Selection Metrics under Multiple Testing Objectives and Distribution Shifts
  A broad empirical benchmark shows how 15 existing test selection metrics perform for fault detection, performance estimation, and retraining under corrupted, adversarial, temporal, natural, and label shifts across image, text, and Android data.
- A Machine Learning Approach to Meteor Classification
  Machine learning clustering of meteor observations produces a new hardness classification H_class that refines traditional Kb models using more parameters and reveals compositional structure in meteoroid populations.
- AFGNN: API Misuse Detection using Graph Neural Networks and Clustering
  AFGNN detects API misuses in Java code more effectively than prior methods by representing usage as graphs and clustering learned embeddings from self-supervised training.
- On Similarity of Computational Kernels in our Codes and Proxies
  New hardware-usage-based similarity metrics can identify matching computational kernels between proxy applications and performance suites on both CPU and GPU systems.
- AI-Derived Reproductive Phenotypes and Explainable ML for Concurrent Early Multimorbidity in U.S. Women: NHANES 2017-March 2020
  PCA and k-means on NHANES data identified four reproductive phenotypes in U.S. women aged 20-44, with one fragile subgroup showing 77.5% early multimorbidity prevalence; XGBoost improved discrimination over logistic regression but had worse calibration.
- Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs
  wSSAS is a two-phase deterministic framework that uses hierarchical text organization and SNR-based feature prioritization to improve clustering integrity, categorization accuracy, and reproducibility when applying LLMs to large review datasets.
- An Explainable Unsupervised-to-Supervised Machine Learning Framework for Dietary Pattern Discovery Using UK National Dietary Survey Data
  An unsupervised-to-supervised ML pipeline on UK NDNS data discovers four dietary patterns, reproduces them with macro-F1 0.963 using a surrogate classifier, and interprets them via SHAP for potential clinical use.