The remaining 13,500 entries were used for training and will henceforth be called the training set.

Separate held-out test set: a test set was split off from the original dataset (size: 10% of the entire dataset, approximately 1,500 entries) to later perform statistical analysis o
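The split described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the excerpt does not say how entries were selected, so the seeded random shuffle, the function name `split_held_out`, and the placeholder dataset of 15,000 integers (inferred from 13,500 training entries plus roughly 1,500 test entries) are all assumptions.

```python
import random


def split_held_out(entries, test_fraction=0.10, seed=0):
    """Split off a held-out test set from a list of entries.

    Hypothetical helper: random selection and the fixed seed are
    assumptions, not details given in the source.
    """
    shuffled = list(entries)               # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled entries become the test set,
    # the remainder become the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data standing in for the original dataset (size assumed).
entries = list(range(15_000))
train_set, test_set = split_held_out(entries)
print(len(train_set), len(test_set))  # 13500 1500
```

Splitting once, before any training, keeps the test entries unseen so that later statistical analysis is not contaminated by training data.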

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Synthetic Eggs in Many Baskets: The Impact of Synthetic Data Diversity on LLM Fine-Tuning

cs.CL · 2025-11-03 · unverdicted · novelty 4.0

Fine-tuning LLMs on multi-source synthetic data mitigates distribution collapse and self-preference bias while increasing output quality relative to single-source or human-only fine-tuning.

