PolSeT: Polish Semantics of Timbre Dataset
Pith reviewed 2026-06-26 16:02 UTC · model grok-4.3
The pith
PolSeT releases raw Polish timbre data from two experiments: free verbal descriptions of 11 stimuli by 60 listeners, and ratings of 18 instrument sounds by 105 listeners.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PolSeT has two parts. The first is a free-verbalization lexicon of Polish timbre descriptors gathered from 60 listeners. The second is a set of semantic differential ratings, collected from 105 listeners, of 18 instrument sounds on eight bipolar scales. Both are released with demographics, audio files, and extraction scripts to support psychoacoustic and MIR research in Polish and cross-cultural settings.
What carries the argument
The sequential design that first extracts a Polish lexicon through free verbalization and then maps it onto fixed bipolar rating scales for quantitative timbre judgments.
If this is right
- Quantitative ratings become available for statistical modeling of timbre perception in Polish.
- The raw responses and code enable direct replication and extension of the semantic differential task.
- Demographic breakdowns allow subgroup analyses by musical experience or age.
- The dataset can serve as training material for multilingual embedding models that map timbre words across languages.
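The subgroup analysis mentioned in the list above can be sketched as a simple group-and-average step. This is a minimal illustration, not the authors' analysis: the field names (`participant_id`, `scale`, `rating`, `experience`), the scale label `bright-dark`, and the toy values are all assumptions made up for the example and are not taken from the PolSeT files.

```python
from collections import defaultdict
from statistics import mean


def subgroup_means(ratings, participants, group_key):
    """Return the mean rating for every (group, scale) pair.

    ratings: iterable of dicts with hypothetical keys
             'participant_id', 'scale', 'rating'.
    participants: dict mapping participant_id -> dict of demographics.
    group_key: demographic field to split on (e.g. 'experience').
    """
    buckets = defaultdict(list)
    for row in ratings:
        group = participants[row["participant_id"]][group_key]
        buckets[(group, row["scale"])].append(row["rating"])
    return {key: mean(values) for key, values in buckets.items()}


# Toy data, invented for illustration only (not PolSeT values).
participants = {
    "p1": {"experience": "musician"},
    "p2": {"experience": "non-musician"},
    "p3": {"experience": "musician"},
}
ratings = [
    {"participant_id": "p1", "scale": "bright-dark", "rating": 2},
    {"participant_id": "p2", "scale": "bright-dark", "rating": 5},
    {"participant_id": "p3", "scale": "bright-dark", "rating": 4},
]
result = subgroup_means(ratings, participants, "experience")
```

With real data, the same function could be called once per demographic field (experience, gender, age band) after loading the participant and rating tables.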
Where Pith is reading between the lines
- The data could be combined with equivalent datasets in other languages to test for universal versus language-specific timbre dimensions.
- Reliability metrics from the repeated trials could be used to weight individual participant contributions in future models.
- The acoustic feature code could be applied to additional non-instrument sounds to check whether the same scales generalize.
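The reliability-weighting idea in the list above can be sketched as follows. The source excerpts describe intra-rater reliability as a Pearson correlation between original and repeated trials for each participant; turning those correlations into weights by clipping negatives to zero and normalizing is an assumption of this sketch, not something the paper specifies. Participant ids and ratings are toy values.

```python
from math import isnan, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def reliability_weights(trials):
    """Map each participant to a normalized weight from test-retest correlation.

    trials: dict participant_id -> (original_ratings, repeated_ratings).
    Negative or undefined correlations get weight 0 (an assumed policy).
    """
    raw = {pid: pearson(first, repeat) for pid, (first, repeat) in trials.items()}
    clipped = {pid: 0.0 if isnan(r) else max(r, 0.0) for pid, r in raw.items()}
    total = sum(clipped.values())
    if total == 0:
        return clipped
    return {pid: w / total for pid, w in clipped.items()}


# Toy data, invented for illustration only.
trials = {
    "p1": ([1, 2, 3, 4], [1, 2, 3, 4]),  # perfectly consistent
    "p2": ([1, 2, 3, 4], [4, 3, 2, 1]),  # reversed on repeat
    "p3": ([1, 2, 3, 4], [2, 2, 2, 2]),  # constant repeat, correlation undefined
}
weights = reliability_weights(trials)
```

Other policies, such as a Fisher z-transform or a hard exclusion threshold, would be equally plausible. The clipping rule is chosen here only because it is easy to inspect.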
Load-bearing premise
The eight bipolar scales drawn from the free-verbalization results are sufficient and unbiased for representing Polish timbre semantics on the chosen instrument sounds.
What would settle it
A new study in which participants frequently produce descriptors that fall outside the eight bipolar scales, or in which ratings on those scales show low test-retest reliability.
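One way to make the first half of this check concrete is to measure how many of the most frequent free-verbalization descriptors are not covered by any scale anchor. The sketch below assumes coverage means exact string match after lowercasing, which is a simplification: a real analysis would likely need lemmatization and synonym grouping. The Polish words and the anchor set are illustrative only and are not the actual PolSeT scales.

```python
from collections import Counter


def uncovered_frequent(descriptors, scale_terms, top_k):
    """Return the top_k most frequent descriptors missing from scale_terms,
    together with their share of the top_k list."""
    counts = Counter(d.strip().lower() for d in descriptors)
    covered = {t.strip().lower() for t in scale_terms}
    frequent = [word for word, _ in counts.most_common(top_k)]
    missing = [word for word in frequent if word not in covered]
    share = len(missing) / len(frequent) if frequent else 0.0
    return missing, share


# Toy data, invented for illustration only.
descriptors = ["jasny", "jasny", "jasny", "ciepły", "ciepły", "metaliczny"]
scale_terms = {"jasny", "ciemny", "ostry"}  # hypothetical anchors
missing, share = uncovered_frequent(descriptors, scale_terms, top_k=3)
```

A high share among the most frequent descriptors would count against the premise that the eight scales are sufficient.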
Original abstract
This data report introduces PolSeT (Polish Semantic Timbre), a dataset designed to facilitate research in psychoacoustics and Music Information Retrieval (MIR) in Polish and cross-cultural contexts. The dataset contains data from two sequential experiments. Experiment 1 (N=60) was a free-verbalization task aimed at creating a lexicon of Polish semantic descriptors. Using 11 stimuli, a total of 1901 descriptors (701 unique) were gathered. Experiment 2 (N=105) utilized this lexicon to conduct a semantic differential study, where participants rated 18 instrument sounds on 8 bipolar scales, with repeated trials for reliability analysis. The released dataset includes raw listener responses, comprehensive demographics (experience, gender, age), audio stimuli, and extracted acoustic features with Python extraction code. This dataset addresses a gap in open timbre research data, providing both the qualitative linguistic groundwork and the quantitative ratings necessary for psychoacoustic research and the training of multilingual semantic embedding models.
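The counts in the abstract allow a few back-of-the-envelope figures. The sketch below uses only numbers stated in the abstract. The per-stimulus average and the nominal rating count both assume a full design in which every listener responded to every stimulus, which the abstract does not confirm, and they exclude the repeated trials, whose number is not given.

```python
# Figures stated in the abstract.
n_listeners_exp1 = 60
n_stimuli_exp1 = 11
n_descriptors = 1901
n_unique = 701
n_listeners_exp2 = 105
n_sounds_exp2 = 18
n_scales = 8

# Share of the collected descriptors that are distinct.
unique_share = n_unique / n_descriptors

# Average number of descriptors contributed by each Experiment 1 listener.
per_listener = n_descriptors / n_listeners_exp1

# Assumption: every listener described every stimulus.
per_listener_per_stimulus = n_descriptors / (n_listeners_exp1 * n_stimuli_exp1)

# Assumption: full design in Experiment 2, repeated trials excluded.
ratings_per_listener = n_sounds_exp2 * n_scales
nominal_ratings = ratings_per_listener * n_listeners_exp2

print(f"unique share: {unique_share:.3f}")
print(f"descriptors per listener: {per_listener:.1f}")
print(f"descriptors per listener per stimulus: {per_listener_per_stimulus:.2f}")
print(f"nominal Experiment 2 ratings: {nominal_ratings}")
```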
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This data report introduces the PolSeT dataset from two experiments on Polish timbre semantics. Experiment 1 (N=60) collected 1901 free-verbalization descriptors (701 unique) from 11 stimuli to build a lexicon. Experiment 2 (N=105) applied 8 bipolar scales derived from that lexicon to rate 18 instrument sounds, with repeated trials. The release includes raw responses, demographics (experience, gender, age), audio stimuli, extracted acoustic features, and Python extraction code, aimed at supporting psychoacoustics, MIR, and multilingual semantic models.
Significance. If the described collection and release hold, the dataset supplies a needed open resource for Polish-language timbre semantics, enabling both qualitative lexicon work and quantitative rating analyses. Releasing the full raw free-verbalization responses alongside the scale ratings and code allows users to re-derive scales or conduct alternative analyses, which strengthens utility for cross-cultural and embedding-model research.
minor comments (2)
- The abstract states that Experiment 2 "utilized this lexicon" to build the 8 bipolar scales, but it does not specify the derivation procedure or the criteria for selecting scales from the 701 unique descriptors; a brief methods subsection or a table listing the final scales with their source descriptors would improve transparency.
- Stimulus counts differ between experiments (11 vs. 18); a short note on whether the additional 7 sounds in Experiment 2 were chosen to match acoustic properties or for other reasons would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of PolSeT and the recommendation to accept. The review accurately captures the dataset's structure, release contents, and intended utility for psychoacoustics, MIR, and cross-lingual semantic research.
Circularity Check
No significant circularity
This is a data-report paper whose central claim is the release of raw listener responses, demographics, audio stimuli, and acoustic features from two empirical experiments. No derivations, predictions, fitted quantities, or self-citation chains appear in the described content; the eight bipolar scales are presented as an output of Experiment 1 rather than an input that is then predicted. The dataset explicitly includes the full raw free-verbalization responses, allowing any user to re-derive scales independently. The work is therefore self-contained against external benchmarks with no load-bearing step that reduces to its own inputs by construction.
Reference graph
Works this paper leans on
- [2] (2019) among others, very few have made their data available, often publishing only averages and resulting conclusions. Granular data containing raw listener answers is limited and even dedicated timbre evaluation datasets typically contain only averaged results (Jiang et al., 2019). This limitation hinders the development of cross-cultural models of timbre perc...
- [3] (2010) Priority was given to scales that align with previous research in other languages (Alluri & Toiviainen, 2010; Faure et al., 1996; Von Bismarck, 1974; Zacharakis et al.,
- [4] (2015) (Left) Correlation matrix of the 8 semantic scales across all ratings. (Right) Histogram of intra-rater reliability, displaying the distribution of Pearson correlation coefficients calculated between original and repeated trials for each participant. ACOUSTIC FEATURES A set of acoustic features was extracted from the audio files used in Experiment 2, incl...
- [5] (arXiv 2025) stimuli across 8 bipolar semantic scales from 105 participants polset_exp2_participants.csv CSV Participant information: age, gender, musical experience and listening habits polset_exp2_scales_glossary.csv CSV Polish-English translations for all scale anchors polset_exp2_raw_responses.zip ZIP Raw export of the full experimental data. /audio_features polse...
- [6] Correspondence can be addressed to: jjasinsk@agh.edu.pl REFERENCES Alluri, V., & Toiviainen, P. (2010). Exploring perceptual and acoustical correlates of polyphonic timbre. Music Perception, 27(3), 223–242. Faure, A., McAdams, S., & Nosulenko, V. (1996). Verbal correlates of perceptual dimensions of timbre. 4th International Conference on Music Perception...