TADI shows that domain-specialized tools orchestrated by an LLM over dual structured and semantic databases can convert heterogeneous wellsite data into evidence-grounded drilling intelligence, with tool design mattering more than model scale.
citation dossier
arXiv preprint arXiv:2406.06608
why this work matters in Pith
Pith has found this work in 17 reviewed papers. Its strongest current cluster is cs.AI (5 papers). The largest review-status bucket among citing papers is UNVERDICTED (15 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
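The dual-database orchestration pattern attributed to TADI above can be sketched minimally. Everything below is an illustrative assumption, not TADI's actual implementation: the database contents, tool names, and the keyword-based router standing in for the LLM's tool-selection step are all invented for the sketch.

```python
# Minimal sketch of LLM-style tool orchestration over dual databases:
# a structured (tabular) store for exact lookups and a semantic (free-text)
# store searched by similarity. All data and names are hypothetical.
from difflib import SequenceMatcher

STRUCTURED_DB = {"well_7": {"depth_m": 3120, "mud_weight_ppg": 11.2}}
SEMANTIC_DB = [
    "Daily report: drilling stalled due to stuck pipe at 2,900 m.",
    "Mud weight was raised after a minor influx event on day 14.",
]

def query_structured(well: str, field: str):
    """Exact lookup in the structured store."""
    return STRUCTURED_DB[well][field]

def query_semantic(question: str) -> str:
    """Return the report snippet most similar to the question."""
    return max(
        SEMANTIC_DB,
        key=lambda s: SequenceMatcher(None, question.lower(), s.lower()).ratio(),
    )

def orchestrate(question: str):
    """Stand-in for the LLM's tool-selection step: route to the
    structured tool when a known well and field are both named in the
    question, otherwise fall back to semantic search over reports."""
    for well, fields in STRUCTURED_DB.items():
        for field in fields:
            if well in question and field in question:
                return "structured", query_structured(well, field)
    return "semantic", query_semantic(question)
```

In a real agentic system the routing decision would be made by the LLM choosing among registered tools; the keyword match here only makes the control flow visible.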
years
2026: 17 papers
citing papers explorer
- TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data
  TADI shows that domain-specialized tools orchestrated by an LLM over dual structured and semantic databases can convert heterogeneous wellsite data into evidence-grounded drilling intelligence, with tool design mattering more than model scale.
- Can Vision Language Models Judge Action Quality? An Empirical Evaluation
  Vision-language models perform only marginally above random on action quality assessment and retain systematic biases even after targeted prompting and contrastive reformulation.
- Alignment has a Fantasia Problem
  AI alignment must move beyond assuming users have fully formed goals and instead provide active cognitive support to help form and refine intent over time.
- From Craft to Kernel: A Governance-First Execution Architecture and Semantic ISA for Agentic Computers
  Arbiter-K is a new execution architecture that treats LLMs as probabilistic processors inside a neuro-symbolic kernel with a semantic ISA, enabling deterministic security enforcement and interdiction of unsafe trajectories in agentic AI.
- LLMs for Qualitative Data Analysis Fail on Security-specific Comments in Human Experiments
  LLMs improve with detailed code descriptions but remain insufficient to replace human annotators for security-specific qualitative coding.
- LLARS: Enabling Domain Expert & Developer Collaboration for LLM Prompting, Generation and Evaluation
  LLARS is an integrated platform combining collaborative prompt authoring, cost-controlled batch generation, and hybrid evaluation so that domain experts and developers can jointly build and assess LLM systems.
- U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning
  U-Define improves user control in LLM planning by letting people define hard rules and soft preferences in natural language, each paired with a matching verification method, raising usefulness and satisfaction scores.
- Looking Into the Past: Eye Movements Characterize Elements of Autobiographical Recall in Interviews with Holocaust Survivors
  Eye movements during Holocaust survivor interviews vary by episodic, semantic, affective, and temporal memory dimensions, with pre-onset gaze alone sufficient to predict a sentence's temporal context.
- OOPrompt: Reifying Intents into Structured Artifacts for Modular and Iterative Prompting
  OOPrompt reifies user intents into structured, manipulable artifacts to enable modular and iterative prompting in LLM-based interactive systems.
- Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis
  Agent Mentor analyzes semantic trajectories in agent logs to identify undesired behaviors and derives corrective prompt instructions, yielding measurable accuracy gains on benchmark tasks across three agent setups.
- Confidence Without Competence in AI-Assisted Knowledge Work
  Standard LLM chats produce high perceived understanding but low objective learning in students; future-self explanations best align confidence with actual gains, and guided hints maximize learning at moderate workload.
- The PICCO Framework for Large Language Model Prompting: A Taxonomy and Reference Architecture for Prompt Structure
  PICCO is a five-element reference architecture (Persona, Instructions, Context, Constraints, Output) for structuring LLM prompts, derived by synthesizing prior frameworks, together with a taxonomy that distinguishes prompt concepts.
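The five PICCO element names come from the summary above; a prompt assembled from them might look like the sketch below. The section contents and the helper function are invented for illustration, not taken from the PICCO paper.

```python
# Illustrative PICCO-style prompt assembly: one section per element
# (Persona, Instructions, Context, Constraints, Output). The element
# order and names follow the summary above; the contents are made up.
def picco_prompt(persona: str, instructions: str, context: str,
                 constraints: str, output: str) -> str:
    sections = [
        ("Persona", persona),
        ("Instructions", instructions),
        ("Context", context),
        ("Constraints", constraints),
        ("Output", output),
    ]
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections)

prompt = picco_prompt(
    persona="You are a drilling-operations analyst.",
    instructions="Summarize the daily report in three bullet points.",
    context="Report: circulation lost at 2,900 m; mud weight raised.",
    constraints="Cite the report verbatim for every numeric claim.",
    output="Markdown bullet list, at most 50 words.",
)
```

Structuring prompts this way keeps each concern (role, task, evidence, guardrails, format) separately editable, which is the point of treating the prompt as an architecture rather than free text.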
- Characterizing Students' LLM Usage Behaviors and Their Association with Learning in Critical Thinking Tasks
  Bottom-up categories of LLM usage in critical thinking homework, labeled by student initiative, are examined for associations with midterm performance across two course offerings.
- Hint-Writing with Deferred AI Assistance: Fostering Critical Engagement in Data Science Education
  In a randomized experiment with 97 graduate students, deferred AI assistance produced the highest-quality hints and helped students spot more code mistakes than either independent writing or immediate AI help.
- Prompt Engineering Strategies for LLM-based Qualitative Coding of Psychological Safety in Software Engineering Communities: A Controlled Empirical Study
  Multi-shot prompting raises agreement with human coders for Claude Haiku but not for DeepSeek-Chat or Gemini 2.5 Flash; the models differ in stability and share a consistent bias toward over-labeling negative feedback.
- CLaC at SemEval-2026 Task 6: Response Clarity Detection in Political Discourse
  An LLM ensemble reached 80 macro-F1 on 3-class clarity detection and 59 on 9-class evasion detection; partial layer unfreezing and multilingual ensembles improved encoder results, while enriched context helped only the LLMs.
- A Reproducibility Study of Metacognitive Retrieval-Augmented Generation
  MetaRAG is only partially reproducible, with lower absolute scores than originally reported; it gains substantially from reranking and shows greater robustness than SIM-RAG under extended retrieval features.