Bloemen-van Gurp, Andre Dekker, and Rianne R.R
4 Pith papers cite this work.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
citing papers explorer
- The Order Matters: Sequential Fine-Tuning of LLaMA for Coherent Automated Essay Scoring
  Sequential fine-tuning of LLaMA-3.1-8B on discourse elements in order outperforms independent and randomized curricula for AES on PERSUADE 2.0, with specific F1/accuracy gains and competitiveness vs. LLaMA-70B on conclusion scoring.
- Early-Token Confidence Predicts Reasoning Quality in Multi-Agent LLM Debate
  Early-token log-probabilities from LLM decoding are stronger predictors of reasoning quality than full-sequence statistics in multi-agent debate on essay scoring tasks.
- The Confident Liar: Diagnosing Multi-Agent Debate with Log-Probabilities and LLM-as-Judge
  In two-agent debate, log-probability confidence aligns with LLM-judged reasoning quality roughly twice as strongly for the Constructor (AUROC 0.804 for critical failure detection) as for the Auditor (0.634).
- Parallel Adaptive Multi-Objective Evolutionary Learning of Discretized Bayesian Network Classifiers for Clinical Data
  Enhanced Baymex with parallelization and adaptive steering yields statistically similar or better classification performance than decision trees, logistic regression, naive Bayes and random forests on clinical data while returning multiple compact, inspectable Bayesian networks.