hub Mixed citations

emnlp-main.308/

URL https://ojs · 2021 · DOI 10.18653/v1/2021

Mixed citation behavior. Most common role is background (57%).

34 Pith papers citing it

Background 57% of classified citations

citation-role summary

background 4 · baseline 1 · dataset 1 · method 1

citation-polarity summary

background 4 · baseline 1 · use dataset 1 · use method 1

representative citing papers

GS-QA: A Benchmark for Geospatial Question Answering

cs.DB · 2026-05-21 · unverdicted · novelty 7.0

GS-QA is a new benchmark of 2,800 QA pairs on 28 templates using OSM and Wikipedia data to evaluate LLMs on spatial predicates, multi-source reasoning, and diverse answer types including distances and counts.

How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

A fitted iso-depth scaling law measures that one recurrence in looped transformers is worth r^0.46 unique blocks in validation loss.

NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions

cs.DB · 2026-04-13 · conditional · novelty 7.0

NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.

MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models

cs.CV · 2025-09-26 · unverdicted · novelty 7.0

MultiMat shows multimodal large models plus constrained search produce higher-quality procedural material graphs than text-only baselines on a new production dataset.

ProMQA-Assembly: Multimodal Procedural QA Dataset on Assembly

cs.CL · 2025-09-03 · unverdicted · novelty 7.0

ProMQA-Assembly is a new multimodal procedural QA dataset with 646 pairs on assembly activities, built via LLM-generated candidates verified by humans plus 81 task graphs, and used to benchmark multimodal models.

Towards Measuring the Representation of Subjective Global Opinions in Language Models

cs.CL · 2023-06-28 · conditional · novelty 7.0

LLMs default to responses more similar to opinions from the USA and some European and South American countries; prompting for a country shifts alignment but can introduce stereotypes, while translation does not reliably match language speakers.

TAVR-VLM: Risk-Conditioned Causal Grounding for Hallucination-Resistant Report Generation

cs.AI · 2026-06-25 · unverdicted · novelty 6.0

TAVR-VLM introduces Risk-Conditioned Causal Grounding Attention to achieve SOTA AUROC 0.896, CIDEr 0.936, and 8.1% hallucination rate on a 1,482-patient TAVR cohort.

On the Position Bias of On-Policy Distillation

cs.LG · 2026-06-21 · unverdicted · novelty 6.0 · 2 refs

Position bias in on-policy distillation degrades later-token supervision; IW-OPD weights tokens by accumulated discrepancy, yielding faster convergence and up to 6.9 point gains on AIME-2025.

Temporal Preference Optimization for Unsupervised Retrieval

cs.IR · 2026-06-16 · unverdicted · novelty 6.0

TPOUR uses a novel TRPO method to improve unsupervised retrievers for temporal relevance, outperforming baselines including a much larger model on nDCG@5 for explicit and implicit time queries.

MÖVE: A Holistic LLM Benchmark for the German Public Sector

cs.CL · 2026-06-11 · unverdicted · novelty 6.0

MÖVE presents a new German-language benchmark evaluating 39 LLMs on performance and governance criteria using ten public-administration datasets.

Sparsely gated tiny linear experts

cs.LG · 2026-06-05 · unverdicted · novelty 6.0

Sgatlin replaces transformer FF layers with sparse single linear neurons, improving perplexity across compute budgets and enabling direct interpretation of semantically clustered circuits for factual recall.

Multi-Agent Framework Leveraging Knowledge Graphs for Virtual Commissioning Models

cs.CE · 2026-06-02 · unverdicted · novelty 6.0

A knowledge-graph multi-agent framework semi-automates virtual commissioning model creation by integrating Siemens TIA Portal and NX MCD data for system understanding, component generation, and signal mapping.

An Assessment of Human vs. Model Uncertainty in Soft-Label Learning and Calibration

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

Controlled experiments on MNIST show human soft-labels act as a regularizer that improves calibration on hard samples and aligns model uncertainty with humans, beyond accuracy gains from correcting mislabels.

Aligning LLM Uncertainty with Human Disagreement in Subjectivity Analysis

cs.CL · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

DPUA is a two-phase framework that aligns LLM uncertainty expressions with human disagreement distributions in subjectivity analysis while preserving task performance.

Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

Latent-GRPO stabilizes reinforcement learning in latent space, delivering 7.86 Pass@1 gains on low-difficulty tasks over latent baselines and 4.27 points over explicit GRPO on high-difficulty tasks with 3-4x shorter reasoning chains.

Layerwise Convergence Fingerprints for Runtime Misbehavior Detection in Large Language Models

cs.CR · 2026-04-27 · unverdicted · novelty 6.0

LCF detects multiple LLM runtime threats by computing aggregated diagonal Mahalanobis distances on layer-wise hidden-state differences, calibrated on clean examples, achieving high detection rates with low overhead across several model architectures.

Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks

cs.CR · 2026-04-22 · unverdicted · novelty 6.0

A context-aware Sentinel-Strategist system for RAG selectively applies defenses to block membership inference and data poisoning while recovering most retrieval utility compared to always-on defense stacks.

Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks

cs.AI · 2026-04-20 · conditional · novelty 6.0

Token-level contrastive attribution yields informative signals for some LLM benchmark failures but is not universally applicable across datasets and models.

When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making

cs.AI · 2026-02-03 · unverdicted · novelty 6.0

Adversarial explanation attacks preserve nearly all human trust in wrong AI outputs by using persuasive framing, shown in a study varying reasoning, evidence, style, and format with over 200 participants.

HyEm: Query-Adaptive Hyperbolic Retrieval for Biomedical Ontologies via Euclidean Vector Indexing

cs.IR · 2026-01-26 · unverdicted · novelty 6.0

HyEm maps radius-controlled hyperbolic ontology embeddings to Euclidean space for ANN indexing and applies query-adaptive hyperbolic reranking to improve hierarchy-aware retrieval while preserving most Euclidean performance on flat queries.

CodeT5+: Open Code Large Language Models for Code Understanding and Generation

cs.CL · 2023-05-13 · conditional · novelty 6.0

CodeT5+ is a flexible encoder-decoder LLM family for code pretrained with diverse objectives on multilingual corpora and initialized from existing LLMs, achieving state-of-the-art results on code generation, completion, math programming, and retrieval tasks including new SoTA on HumanEval with the 1

Improving alignment of dialogue agents via targeted human judgements

cs.LG · 2022-09-28 · unverdicted · novelty 6.0

Sparrow uses targeted rule-based human feedback and evidence provision to outperform baselines in preference while violating rules only 8% of the time under adversarial probing.

Empirical Study for Structured Output Control in LLMs for Software Engineering

cs.SE · 2026-06-08 · conditional · novelty 5.0

Empirical benchmarks on four SE tasks show grammar-constrained decoding and TTMG eliminate most syntax errors in LLM outputs while structural and semantic errors persist and cascade in downstream tools.

Training Prompt Matters: State-Adaptive Optimization for Robust Fine-Tuning

cs.CL · 2026-06-01 · unverdicted · novelty 5.0

Paraphrased training prompts induce correlated cross-task differences in forgetting and generalization during LLM fine-tuning; superior prompts can be identified via pre-learning task loss and used in a state-adaptive optimization method (SAPO) to improve robustness.

citing papers explorer

Showing 34 of 34 citing papers.

GS-QA: A Benchmark for Geospatial Question Answering cs.DB · 2026-05-21 · unverdicted · none · ref 10
GS-QA is a new benchmark of 2,800 QA pairs on 28 templates using OSM and Wikipedia data to evaluate LLMs on spatial predicates, multi-source reasoning, and diverse answer types including distances and counts.
How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models cs.LG · 2026-04-22 · unverdicted · none · ref 49
A fitted iso-depth scaling law measures that one recurrence in looped transformers is worth r^0.46 unique blocks in validation loss.
NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions cs.DB · 2026-04-13 · conditional · none · ref 13
NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.
MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models cs.CV · 2025-09-26 · unverdicted · none · ref 7
MultiMat shows multimodal large models plus constrained search produce higher-quality procedural material graphs than text-only baselines on a new production dataset.
ProMQA-Assembly: Multimodal Procedural QA Dataset on Assembly cs.CL · 2025-09-03 · unverdicted · none · ref 42
ProMQA-Assembly is a new multimodal procedural QA dataset with 646 pairs on assembly activities, built via LLM-generated candidates verified by humans plus 81 task graphs, and used to benchmark multimodal models.
Towards Measuring the Representation of Subjective Global Opinions in Language Models cs.CL · 2023-06-28 · conditional · none · ref 43
LLMs default to responses more similar to opinions from the USA and some European and South American countries; prompting for a country shifts alignment but can introduce stereotypes, while translation does not reliably match language speakers.
TAVR-VLM: Risk-Conditioned Causal Grounding for Hallucination-Resistant Report Generation cs.AI · 2026-06-25 · unverdicted · none · ref 4
TAVR-VLM introduces Risk-Conditioned Causal Grounding Attention to achieve SOTA AUROC 0.896, CIDEr 0.936, and 8.1% hallucination rate on a 1,482-patient TAVR cohort.
On the Position Bias of On-Policy Distillation cs.LG · 2026-06-21 · unverdicted · none · ref 20 · 2 links
Position bias in on-policy distillation degrades later-token supervision; IW-OPD weights tokens by accumulated discrepancy, yielding faster convergence and up to 6.9 point gains on AIME-2025.
Temporal Preference Optimization for Unsupervised Retrieval cs.IR · 2026-06-16 · unverdicted · none · ref 17
TPOUR uses a novel TRPO method to improve unsupervised retrievers for temporal relevance, outperforming baselines including a much larger model on nDCG@5 for explicit and implicit time queries.
MÖVE: A Holistic LLM Benchmark for the German Public Sector cs.CL · 2026-06-11 · unverdicted · none · ref 44
MÖVE presents a new German-language benchmark evaluating 39 LLMs on performance and governance criteria using ten public-administration datasets.
Sparsely gated tiny linear experts cs.LG · 2026-06-05 · unverdicted · none · ref 56
Sgatlin replaces transformer FF layers with sparse single linear neurons, improving perplexity across compute budgets and enabling direct interpretation of semantically clustered circuits for factual recall.
Multi-Agent Framework Leveraging Knowledge Graphs for Virtual Commissioning Models cs.CE · 2026-06-02 · unverdicted · none · ref 24
A knowledge-graph multi-agent framework semi-automates virtual commissioning model creation by integrating Siemens TIA Portal and NX MCD data for system understanding, component generation, and signal mapping.
An Assessment of Human vs. Model Uncertainty in Soft-Label Learning and Calibration cs.LG · 2026-05-18 · unverdicted · none · ref 48
Controlled experiments on MNIST show human soft-labels act as a regularizer that improves calibration on hard samples and aligns model uncertainty with humans, beyond accuracy gains from correcting mislabels.
Aligning LLM Uncertainty with Human Disagreement in Subjectivity Analysis cs.CL · 2026-05-11 · unverdicted · none · ref 23 · 2 links
DPUA is a two-phase framework that aligns LLM uncertainty expressions with human disagreement distributions in subjectivity analysis while preserving task performance.
Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning cs.LG · 2026-04-30 · unverdicted · none · ref 16
Latent-GRPO stabilizes reinforcement learning in latent space, delivering 7.86 Pass@1 gains on low-difficulty tasks over latent baselines and 4.27 points over explicit GRPO on high-difficulty tasks with 3-4x shorter reasoning chains.
Layerwise Convergence Fingerprints for Runtime Misbehavior Detection in Large Language Models cs.CR · 2026-04-27 · unverdicted · none · ref 4
LCF detects multiple LLM runtime threats by computing aggregated diagonal Mahalanobis distances on layer-wise hidden-state differences, calibrated on clean examples, achieving high detection rates with low overhead across several model architectures.
Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks cs.CR · 2026-04-22 · unverdicted · none · ref 19
A context-aware Sentinel-Strategist system for RAG selectively applies defenses to block membership inference and data poisoning while recovering most retrieval utility compared to always-on defense stacks.
Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks cs.AI · 2026-04-20 · conditional · none · ref 34
Token-level contrastive attribution yields informative signals for some LLM benchmark failures but is not universally applicable across datasets and models.
When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making cs.AI · 2026-02-03 · unverdicted · none · ref 73
Adversarial explanation attacks preserve nearly all human trust in wrong AI outputs by using persuasive framing, shown in a study varying reasoning, evidence, style, and format with over 200 participants.
HyEm: Query-Adaptive Hyperbolic Retrieval for Biomedical Ontologies via Euclidean Vector Indexing cs.IR · 2026-01-26 · unverdicted · none · ref 63
HyEm maps radius-controlled hyperbolic ontology embeddings to Euclidean space for ANN indexing and applies query-adaptive hyperbolic reranking to improve hierarchy-aware retrieval while preserving most Euclidean performance on flat queries.
CodeT5+: Open Code Large Language Models for Code Understanding and Generation cs.CL · 2023-05-13 · conditional · none · ref 15
CodeT5+ is a flexible encoder-decoder LLM family for code pretrained with diverse objectives on multilingual corpora and initialized from existing LLMs, achieving state-of-the-art results on code generation, completion, math programming, and retrieval tasks including new SoTA on HumanEval with the 1
Improving alignment of dialogue agents via targeted human judgements cs.LG · 2022-09-28 · unverdicted · none · ref 8
Sparrow uses targeted rule-based human feedback and evidence provision to outperform baselines in preference while violating rules only 8% of the time under adversarial probing.
Empirical Study for Structured Output Control in LLMs for Software Engineering cs.SE · 2026-06-08 · conditional · none · ref 27
Empirical benchmarks on four SE tasks show grammar-constrained decoding and TTMG eliminate most syntax errors in LLM outputs while structural and semantic errors persist and cascade in downstream tools.
Training Prompt Matters: State-Adaptive Optimization for Robust Fine-Tuning cs.CL · 2026-06-01 · unverdicted · none · ref 4
Paraphrased training prompts induce correlated cross-task differences in forgetting and generalization during LLM fine-tuning; superior prompts can be identified via pre-learning task loss and used in a state-adaptive optimization method (SAPO) to improve robustness.
User-Aware Active Knowledge Acquisition for Emotional Support Dialogue cs.CL · 2026-05-28 · unverdicted · none · ref 8
UKA is a gradient-free active dialogue learning framework using Theory-of-Mind uncertainty estimation to acquire user-aligned conversational knowledge, outperforming baselines in dialogue quality and user alignment across benchmarks.
A Comparative Study of Controlled Text Generation Systems Using Level-Playing-Field Evaluation Principles cs.CL · 2026-05-12 · unverdicted · none · ref 22
Re-evaluating controlled text generation systems under standardized conditions reveals that many published performance claims do not hold, highlighting the need for consistent evaluation practices.
CEZSAR: A Contrastive Embedding Method for Zero-Shot Action Recognition cs.CV · 2026-05-01 · unverdicted · none · ref 36
CEZSAR uses contrastive learning to align video and sentence embeddings with automatic negative sampling, claiming state-of-the-art zero-shot action recognition on UCF-101 and Kinetics-400.
Are Latent Reasoning Models Easily Interpretable? cs.LG · 2026-04-06 · unverdicted · none · ref 1
Latent reasoning models often ignore their latent tokens for predictions and their correct outputs can be decoded into natural language reasoning traces more reliably than incorrect outputs.
AI Evaluation Should Require Standardized Item-Level Data Releases cs.AI · 2026-02-27 · conditional · none · ref 10 · 2 links
AI benchmark evaluations require standardized item-level data releases as core infrastructure to support validity assessment, demonstrated via the OpenEval archive of 10M responses across 155k items.
Bridging Semantics and Strategy: A Dual-Stream Graph Network for Equitable Negotiation Forecasting cs.GT · 2026-05-28 · unverdicted · none · ref 1
ST-GFN adaptively fuses semantic and strategic signals in negotiations using gated fusion and fairness regularization, showing 43.8% reduction in inequality discrepancy on DealOrNoDeal and CaSiNo.
Automatic Combination of Sample Selection Strategies for Few-Shot Learning cs.LG · 2024-02-05 · unverdicted · none · ref 16
ACSESS automatically combines 23 sample selection strategies to outperform individual strategies in few-shot learning on text and image datasets.
AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval cs.IR · 2026-03-17 · unverdicted · none · ref 16
AgriIR is a configurable RAG framework using modular stages and 1B-parameter models to deliver grounded, citable answers for Indian agricultural information access.
Multimodal Sexism Identification and Characterization using Large Language Models and Gradient Boosting cs.CV · 2026-06-04 · unverdicted · none · ref 26
A late-fusion gradient-boosting pipeline with LLM semantic features is submitted to the EXIST 2026 lab for sexism identification in memes and videos, showing mixed generalization from development to test data.
Pseudo-Deliberation in Language Models: When Reasoning Fails to Align Values and Actions cs.CL · 2026-05-11 · unreviewed · ref 14
