PolSeT: Polish Semantics of Timbre Dataset

Jan Jasiński

arxiv: 2606.19987 · v1 · pith:VNXHYIQ6new · submitted 2026-06-18 · 💻 cs.SD · eess.AS

Pith reviewed 2026-06-26 16:02 UTC · model grok-4.3

keywords: Polish timbre semantics · semantic differential · timbre dataset · psychoacoustics · musical instruments · music information retrieval · open data release · listener ratings

The pith

PolSeT releases raw Polish timbre descriptor data from 165 listeners on 18 instrument sounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PolSeT as an open dataset built from two sequential experiments that capture Polish-language descriptions of musical timbre. Experiment 1 collected 1901 free verbalizations from 60 participants on 11 stimuli, yielding a lexicon of 701 unique terms. Experiment 2 used eight bipolar scales derived from that lexicon to collect ratings of 18 instrument sounds from 105 participants, with repeated trials for reliability analysis. The release supplies the full raw responses, listener demographics (experience, gender, and age), the audio stimuli, and extracted acoustic features together with the Python extraction code. It therefore provides both qualitative linguistic material and quantitative ratings for studies of timbre perception and for training semantic models.
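
As a rough illustration of how a total descriptor count relates to a unique-term count (1901 versus 701 in Experiment 1), the sketch below tallies both after light normalization. The function name `build_lexicon`, the normalization rule, and the toy Polish responses are assumptions made for this example; they are not the author's procedure or data.

```python
from collections import Counter


def build_lexicon(responses):
    """Count descriptor frequencies after trimming whitespace and lowercasing.

    `responses` is an iterable of free-text descriptors (hypothetical format).
    Empty or whitespace-only entries are skipped.
    """
    terms = [r.strip().lower() for r in responses if r and r.strip()]
    return Counter(terms)


# Toy responses for illustration only; not taken from PolSeT.
toy_responses = ["ciepły", "Jasny ", "ciepły", "ostry", "", "jasny"]

lexicon = build_lexicon(toy_responses)
total_descriptors = sum(lexicon.values())   # all non-empty descriptors
unique_descriptors = len(lexicon)           # distinct normalized terms
```

The normalization choice (case, whitespace, inflected forms) changes the unique count, so any re-derived lexicon size depends on it.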

Core claim

PolSeT consists of a free-verbalization lexicon of Polish timbre descriptors gathered from 60 listeners and subsequent semantic differential ratings on eight bipolar scales for 18 instrument sounds collected from 105 listeners, together with demographics, audio files, and extraction scripts, to support psychoacoustic and MIR research in Polish and cross-cultural settings.

What carries the argument

The sequential design that first extracts a Polish lexicon through free verbalization and then maps it onto fixed bipolar rating scales for quantitative timbre judgments.

If this is right

Quantitative ratings become available for statistical modeling of timbre perception in Polish.
The raw responses and code enable direct replication and extension of the semantic differential task.
Demographic breakdowns allow subgroup analyses by musical experience or age.
The dataset can serve as training material for multilingual embedding models that map timbre words across languages.
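
A minimal sketch of the first step in such modeling: averaging raw ratings into one mean value per stimulus and scale. The long-format row layout and the field names `stimulus`, `scale`, and `rating` are hypothetical; the actual PolSeT file columns are not described on this page.

```python
from collections import defaultdict
from statistics import mean


def mean_profiles(rows):
    """Average ratings per (stimulus, scale) pair.

    `rows` is an iterable of dicts with hypothetical keys
    'stimulus', 'scale', and 'rating'.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["stimulus"], row["scale"])].append(row["rating"])
    return {key: mean(values) for key, values in grouped.items()}


# Toy ratings for illustration only; not taken from PolSeT.
toy_rows = [
    {"stimulus": "s01", "scale": "scale_1", "rating": 2},
    {"stimulus": "s01", "scale": "scale_1", "rating": 4},
    {"stimulus": "s02", "scale": "scale_1", "rating": 5},
]

profiles = mean_profiles(toy_rows)
```

Because the release includes raw responses rather than averages only, the same grouping could be done per demographic subgroup by adding a participant attribute to the key.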

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The data could be combined with equivalent datasets in other languages to test for universal versus language-specific timbre dimensions.
Reliability metrics from the repeated trials could be used to weight individual participant contributions in future models.
The acoustic feature code could be applied to additional non-instrument sounds to check whether the same scales generalize.
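
The reliability-weighting idea can be sketched as follows, assuming each participant has paired ratings for original and repeated trials. The Pearson correlation mirrors the intra-rater reliability measure quoted from the paper in the reference graph; the weighting rule (negative or undefined correlations clipped to zero) and the data layout are assumptions of this example, not something the paper proposes.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def reliability_weights(trials):
    """Map participant id -> weight in [0, 1] from test-retest correlation.

    `trials` maps a participant id to (original_ratings, repeated_ratings).
    Hypothetical rule: weight = max(r, 0); undefined r gets weight 0.
    """
    weights = {}
    for pid, (original, repeated) in trials.items():
        r = pearson(original, repeated)
        weights[pid] = 0.0 if math.isnan(r) else max(r, 0.0)
    return weights


# Toy data for illustration only; not taken from PolSeT.
toy_trials = {
    "p1": ([1, 2, 3, 4], [1, 2, 3, 4]),  # perfectly consistent
    "p2": ([1, 2, 3, 4], [4, 3, 2, 1]),  # reversed ratings
    "p3": ([3, 3, 3, 3], [1, 2, 3, 4]),  # constant original ratings
}

weights = reliability_weights(toy_trials)
```

Clipping at zero is only one option; a threshold-based exclusion or a Fisher-transformed weight would be equally defensible depending on the analysis.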

Load-bearing premise

The eight bipolar scales drawn from the free-verbalization results are sufficient and unbiased for representing Polish timbre semantics on the chosen instrument sounds.

What would settle it

A follow-up study would settle it if participants either frequently produced descriptors from the original lexicon that fall outside the eight bipolar scales, or showed low test-retest reliability on those scales.
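
One simple way to quantify the coverage side of that test is the share of all descriptor occurrences whose term matches a scale anchor. The sketch below uses exact string matching, which is a deliberate simplification (synonyms and inflected forms are ignored); the function name, the toy counts, and the toy anchor set are hypothetical.

```python
def anchor_coverage(lexicon_counts, scale_anchors):
    """Fraction of descriptor occurrences whose term is one of the scale anchors.

    `lexicon_counts` maps term -> frequency; `scale_anchors` is a set of terms.
    Returns 0.0 for an empty lexicon.
    """
    total = sum(lexicon_counts.values())
    if total == 0:
        return 0.0
    covered = sum(
        count for term, count in lexicon_counts.items() if term in scale_anchors
    )
    return covered / total


# Toy values for illustration only; not taken from PolSeT.
toy_counts = {"ciepły": 4, "jasny": 3, "metaliczny": 2, "szorstki": 1}
toy_anchors = {"ciepły", "jasny"}

coverage = anchor_coverage(toy_counts, toy_anchors)
```

A low value on real data would not by itself show the scales are inadequate, since uncovered terms may be near-synonyms of an anchor.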

Figures

Figures reproduced from arXiv: 2606.19987 by Jan Jasiński.

**Figure 1.** The graphical user interface used in Experiment 1 (Free Verbalization). The Polish instructions translate to: Listen to the sound. Describe it using as many adjectives using the text boxes. Click “Add more” or “Delete” to add and remove boxes. Remember, there are no wrong answers.

The stimulus set for Experiment 1 consisted of 11 diverse sounds selected to elicit a wide range of descriptors. These consisted…

Original abstract

This data report introduces PolSeT (Polish Semantic Timbre), a dataset designed to facilitate research in psychoacoustics and Music Information Retrieval (MIR) in Polish and cross-cultural contexts. The dataset contains data from two sequential experiments. Experiment 1 (N=60) was a free-verbalization task aimed at creating a lexicon of Polish semantic descriptors. Using 11 stimuli, a total of 1901 descriptors (701 unique) were gathered. Experiment 2 (N=105) utilized this lexicon to conduct a semantic differential study, where participants rated 18 instrument sounds on 8 bipolar scales, with repeated trials for reliability analysis. The released dataset includes raw listener responses, comprehensive demographics (experience, gender, age), audio stimuli, and extracted acoustic features with Python extraction code. This dataset addresses a gap in open timbre research data, providing both the qualitative linguistic groundwork and the quantitative ratings necessary for psychoacoustic research and the training of multilingual semantic embedding models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PolSeT is a basic but usable dataset release that gives the first open Polish timbre semantics data from two standard experiments, with raw responses and code included.

The paper's main contribution is the PolSeT dataset itself. Experiment 1 collected 1901 free descriptors from 60 Polish listeners on 11 sounds, and Experiment 2 had 105 listeners rate 18 instrument sounds on eight bipolar scales built from that lexicon. The release bundles the raw responses, listener demographics, the audio files, extracted features, and Python code for the features.

That combination is new for Polish and fills a documented gap in language-specific timbre resources. Releasing the full free-verbalization list alongside the ratings lets users re-derive scales or check coverage themselves, which is practical for MIR work on multilingual embeddings or cross-cultural comparisons.

The limitations are straightforward and proportionate. The stimulus set is small (18 sounds in Experiment 2) and the design relies on two classic paradigms, so the data will not support broad claims about Polish timbre semantics in general. Without the full manuscript we cannot check the stimulus selection criteria or the exact reliability figures, though the abstract states that repeated trials were run. These are typical constraints for an initial dataset paper rather than fatal problems.

The work is aimed at researchers who need non-English timbre data for modeling or psychoacoustics. It is honest data collection with no circular claims or invented quantities. A serious editor should send it to peer review so the community can assess the stimulus choices and scale construction directly.

Referee Report

0 major / 2 minor

Summary. This data report introduces the PolSeT dataset from two experiments on Polish timbre semantics. Experiment 1 (N=60) collected 1901 free-verbalization descriptors (701 unique) from 11 stimuli to build a lexicon. Experiment 2 (N=105) applied 8 bipolar scales derived from that lexicon to rate 18 instrument sounds, with repeated trials. The release includes raw responses, demographics (experience, gender, age), audio stimuli, extracted acoustic features, and Python extraction code, aimed at supporting psychoacoustics, MIR, and multilingual semantic models.

Significance. If the described collection and release hold, the dataset supplies a needed open resource for Polish-language timbre semantics, enabling both qualitative lexicon work and quantitative rating analyses. Releasing the full raw free-verbalization responses alongside the scale ratings and code allows users to re-derive scales or conduct alternative analyses, which strengthens utility for cross-cultural and embedding-model research.

minor comments (2)

The abstract states that Experiment 2 'utilized this lexicon' to conduct the semantic differential study, but it does not specify how the 8 bipolar scales were derived or selected from the 701 unique descriptors; a brief methods subsection or a table listing the final scales with their source descriptors would improve transparency.
Stimulus counts differ between experiments (11 vs. 18); a short note on how the two sets relate, and on whether the Experiment 2 sounds were chosen to match acoustic properties or for other reasons, would aid reproducibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of PolSeT and the recommendation to accept. The review accurately captures the dataset's structure, release contents, and intended utility for psychoacoustics, MIR, and cross-lingual semantic research.

Circularity Check

0 steps flagged

No significant circularity

This is a data-report paper whose central claim is the release of raw listener responses, demographics, audio stimuli, and acoustic features from two empirical experiments. No derivations, predictions, fitted quantities, or self-citation chains appear in the described content; the eight bipolar scales are presented as an output of Experiment 1 rather than an input that is then predicted. The dataset explicitly includes the full raw free-verbalization responses, allowing any user to re-derive scales independently. The work is therefore self-contained against external benchmarks with no load-bearing step that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical data-collection report; no mathematical derivations, fitted parameters, or postulated entities are involved.

pith-pipeline@v0.9.1-grok · 5691 in / 1088 out tokens · 32740 ms · 2026-06-26T16:02:33.152306+00:00 · methodology

Reference graph

Works this paper leans on

5 extracted references · 1 canonical work pages

[2]

Granular data containing raw listener answers is limited and even dedicated timbre evaluation datasets typically contain only averaged results (Jiang et al., 2019)

among others, very few have made their data available, often publishing only averages and resulting conclusions. Granular data containing raw listener answers is limited and even dedicated timbre evaluation datasets typically contain only averaged results (Jiang et al., 2019). This limitation hinders the development of cross-cultural models of timbre perc...

2019
[3]

Priority was given to scales that align with previous research in other languages (Alluri & Toiviainen, 2010; Faure et al., 1996; Von Bismarck, 1974; Zacharakis et al.,

2010
[4]

(Right) Histogram of intra-rater reliability, displaying the distribution of Pearson correlation coefficients calculated between original and repeated trials for each participant

(Left) Correlation matrix of the 8 semantic scales across all ratings. (Right) Histogram of intra-rater reliability, displaying the distribution of Pearson correlation coefficients calculated between original and repeated trials for each participant. ACOUSTIC FEATURES A set of acoustic features was extracted from the audio files used in Experiment 2, incl...

2015
[5]

stimuli across 8 bipolar semantic scales from 105 participants polset_exp2_participants.csv CSV Participant information: age, gender, musical experience and listening habits polset_exp2_scales_glossary.csv CSV Polish-English translations for all scale anchors polset_exp2_raw_responses.zip ZIP Raw export of the full experimental data. /audio_features polse...

arXiv 2025
[6]

Correspondence can be addressed to: jjasinsk@agh.edu.pl REFERENCES Alluri, V., & Toiviainen, P. (2010). Exploring perceptual and acoustical correlates of polyphonic timbre. Music Perception, 27(3), 223–242. Faure, A., McAdams, S., & Nosulenko, V. (1996). Verbal correlates of perceptual dimensions of timbre. 4th International Conference on Music Perception...

work page doi:10.17651/polon.45.1 2010
