Efficacy of synthetic data as a benchmark
4 Pith papers cite this work.
Citing papers:
- SynAE: A Framework for Measuring the Quality of Synthetic Data for Tool-Calling Agent Evaluations
  SynAE is a multi-metric framework that evaluates how well synthetic benchmarks replicate the characteristics of real data for testing multi-turn tool-calling agents.
- NodeSynth: Socially Aligned Synthetic Data for AI Evaluation
  NodeSynth uses a taxonomy generator to create evidence-based synthetic queries for evaluating LLMs. It reveals failure rates up to 5× higher than those on human-authored benchmarks, as well as gaps in guard models.
- On Privacy Leakage in Tabular Diffusion Models: Influential Factors, Attacker Knowledge, and Metrics
  Tabular diffusion models leak membership information to membership-inference attacks even when the attacker has only partial knowledge, and common heuristic privacy metrics such as distance to closest record (DCR) are unreliable.
- Synthetic Data in Education: Empirical Insights from Traditional Resampling and Deep Generative Models
  Resampling methods achieve near-perfect utility (TSTR 0.997) but fail on privacy (DCR ≈ 0), while VAEs balance 83.3% utility with full privacy protection for synthetic educational data.
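The last summary contrasts utility (TSTR) with privacy (DCR). The following is a minimal sketch that assumes DCR is computed as the Euclidean distance from each synthetic record to its nearest real record. Papers differ in their distance metric and in how they aggregate the per-record values, for example as a median or a 5th percentile. The sketch shows why resampling drives DCR to zero: bootstrap rows are verbatim copies of real rows. The toy data, the `dcr` helper, and the noisy-copy generator are hypothetical stand-ins. They are not the cited paper's data, code, or VAE.

```python
import math
import random


def dcr(synthetic, real):
    """Distance to closest record: for each synthetic row, the Euclidean
    distance to its nearest real row (hypothetical helper, one common formulation)."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]


random.seed(0)
# Toy "real" dataset: 50 rows with two numeric features.
real = [(random.random(), random.random()) for _ in range(50)]

# Resampling (bootstrap with replacement) copies real rows verbatim,
# so every synthetic row sits at distance 0 from some real row.
resampled = random.choices(real, k=50)

# Hypothetical stand-in for a generative model (not a VAE):
# real rows perturbed with Gaussian noise, so copies are no longer exact.
noisy = [(x + random.gauss(0, 0.1), y + random.gauss(0, 0.1)) for x, y in real]

print(max(dcr(resampled, real)))  # 0.0: every resampled row is an exact copy
print(min(dcr(noisy, real)))      # strictly positive distance
```

A DCR of zero means that synthetic records reproduce real records exactly. This is why near-perfect TSTR utility from resampling comes with no privacy protection.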