pith. machine review for the scientific record.

arxiv: 2605.08048 · v1 · submitted 2026-05-08 · 💻 cs.CL

Recognition: no theorem link

Accurate and Efficient Statistical Testing for Word Semantic Breadth

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:29 UTC · model grok-4.3

classification 💻 cs.CL
keywords: word embeddings · semantic breadth · permutation test · Householder reflection · dispersion · contextual diversity · hypothesis testing

The pith

Aligning mean directions via Householder reflection lets permutation tests isolate true differences in word semantic breadth.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

When comparing the contextual spread of two words through their token embeddings, differences in average direction can falsely appear as differences in dispersion and produce misleading significance results. The paper introduces a Householder-aligned permutation test that first applies one reflection to match the mean directions of the two clouds and then permutes labels on the aligned data to obtain calibrated p-values for dispersion. This correction lowers Type-I error while keeping the test sensitive to real breadth variation, and a batched GPU version makes the procedure fast enough for practical vocabulary-scale use.

Core claim

Applying a single Householder reflection to align the mean directions of two word-type token clouds, followed by a permutation test on the aligned vectors, produces non-parametric p-values that correctly reflect dispersion differences rather than directional mismatches.
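Read as a hypothesis-testing problem (notation mine, not the paper's): let $X_A, X_B$ be the two token clouds and $D(\cdot)$ a dispersion statistic. The test targets

```latex
H_0 : D(X_A) = D(X_B)
\qquad \text{vs.} \qquad
H_1 : D(X_A) \neq D(X_B),
```

where the mean directions $\mu_A/\|\mu_A\|$ and $\mu_B/\|\mu_B\|$ may differ under both hypotheses; the Householder reflection exists to remove that directional nuisance before labels are permuted.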

What carries the argument

Householder-aligned permutation test: one Householder reflection aligns the two mean vectors so that subsequent label permutations test only for dispersion equality.
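A minimal sketch of this mechanism in NumPy. Everything here is illustrative rather than the paper's exact construction: the dispersion proxy (mean distance to centroid) and all function names are assumptions, and the paper's statistic may differ.

```python
import numpy as np

def householder_align(A, B):
    """Reflect cloud B so its mean direction matches cloud A's.

    A Householder reflection H = I - 2 v v^T (unit v) maps the unit
    mean direction of B onto that of A when v is the normalized
    difference of the two unit means. Illustrative, not the paper's
    exact construction.
    """
    u = B.mean(axis=0); u = u / np.linalg.norm(u)   # B's mean direction
    w = A.mean(axis=0); w = w / np.linalg.norm(w)   # A's mean direction
    v = u - w
    if np.linalg.norm(v) < 1e-12:                   # already aligned
        return B
    v = v / np.linalg.norm(v)
    return B - 2.0 * np.outer(B @ v, v)             # rows mapped by H

def dispersion(X):
    """One common breadth proxy: mean distance of tokens to centroid."""
    return np.linalg.norm(X - X.mean(axis=0), axis=1).mean()

def permutation_pvalue(A, B, n_perm=2000, rng=None):
    """Label-permutation p-value for |dispersion(A) - dispersion(B)|."""
    rng = rng or np.random.default_rng(0)
    X, n_a = np.vstack([A, B]), len(A)
    obs = abs(dispersion(A) - dispersion(B))
    hits = sum(
        abs(dispersion(X[p[:n_a]]) - dispersion(X[p[n_a:]])) >= obs
        for p in (rng.permutation(len(X)) for _ in range(n_perm))
    )
    return (hits + 1) / (n_perm + 1)                # add-one correction
```

Because H maps B's unit mean onto A's, the reflected cloud's mean direction coincides with A's while all within-cloud distances are untouched.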

If this is right

  • Breadth comparisons between words become less likely to report significance when only their average contexts differ in direction.
  • The test remains sensitive to genuine increases in contextual diversity after alignment.
  • A GPU-batched implementation reduces runtime by a factor of 23 compared with a CPU baseline, enabling larger-scale applications.
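A hedged sketch of the kind of GPU batching the last bullet refers to, in PyTorch. This is the generic batched-permutation idea, not the authors' released implementation, and it reuses the illustrative centroid-distance dispersion from the sketch above.

```python
import torch

def batched_permutation_pvalue(A, B, n_perm=10_000, device="cuda"):
    """Evaluate all label permutations of a two-sample dispersion test
    in one batched pass on the GPU (illustrative, not the paper's code)."""
    X = torch.cat([A, B]).to(device)               # (n, d) pooled tokens
    n_a, n = A.shape[0], A.shape[0] + B.shape[0]

    def disp(Y):                                   # Y: (P, m, d) -> (P,)
        return (Y - Y.mean(dim=1, keepdim=True)).norm(dim=2).mean(dim=1)

    obs = (disp(X[None, :n_a]) - disp(X[None, n_a:])).abs()

    # (P, n) random permutations obtained by argsorting uniform noise
    idx = torch.argsort(torch.rand(n_perm, n, device=device), dim=1)
    Xp = X[idx]                                    # (P, n, d) permuted pools
    stat = (disp(Xp[:, :n_a]) - disp(Xp[:, n_a:])).abs()
    return (stat >= obs).float().mean().item()
```

Generating permutations by argsorting noise keeps everything on-device; memory scales as O(n_perm · n · d), so a real implementation would chunk the permutation axis.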

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same single-reflection alignment could be applied to other vector-cloud comparisons where direction must be factored out of a magnitude test.
  • The method offers a route to more reliable automatic sense distinction in dictionaries by grounding decisions on calibrated breadth statistics.
  • Further work could examine whether the alignment step extends without modification to multi-class or time-varying embedding clouds.

Load-bearing premise

Aligning the means with a Householder reflection preserves the dispersion geometry and does not alter the null distribution of the permutation statistic.
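Half of this premise is pure linear algebra: any Householder reflection is orthogonal, so distances (and hence distance-based dispersion statistics) are preserved when a cloud is reflected as a whole. A short check:

```latex
H = I - 2vv^{\top},\ \|v\|=1
\;\Rightarrow\;
H^{\top}H = I - 4vv^{\top} + 4v(v^{\top}v)v^{\top} = I
\;\Rightarrow\;
\|Hx - Hy\| = \|x - y\|\ \text{for all } x, y.
```

The contested half is the null-distribution part: preserving geometry within each cloud does not by itself show that a reflection fitted to the observed means leaves permuted relabelings exchangeable, which is exactly the referee's first major comment below.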

What would settle it

Generate two synthetic clouds with identical dispersion but offset means, run the aligned permutation test repeatedly under the null, and check whether the resulting p-values are uniformly distributed.
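A hedged sketch of that experiment, reusing the illustrative householder_align and permutation_pvalue helpers from the first code block (note it exercises the fixed, align-once variant, which is precisely what the referee report below contests):

```python
import numpy as np
from scipy import stats

def null_calibration_check(n_runs=200, n=100, d=16, seed=0):
    """Repeatedly test two clouds with identical spread but offset
    mean directions; returns a KS test of the p-values against U(0,1).
    Assumes householder_align / permutation_pvalue from the sketch above."""
    rng = np.random.default_rng(seed)
    pvals = []
    for _ in range(n_runs):
        mu_a, mu_b = rng.normal(size=d), rng.normal(size=d)  # offset means
        A = mu_a + rng.normal(size=(n, d))                   # same unit spread
        B = mu_b + rng.normal(size=(n, d))
        pvals.append(permutation_pvalue(A, householder_align(A, B)))
    return stats.kstest(pvals, "uniform")
```

Near-uniform p-values (a small KS statistic) would support calibration; a skew toward small p-values would indicate the fixed-reflection variant is miscalibrated.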

Figures

Figures reproduced from arXiv: 2605.08048 by Yo Ehara.

Figure 1: Illustration of Type-I error inflation in naive …
Figure 2: Comparison of (a) Type-I Error rate and (b) Precision across dispersion ranking gaps from 1 to 10. …
Figure 2 (second extraction; no caption recovered)
Figure 3: Baseline, i.e., t-SNE visualization before ap…
Figure 4: Proposed, i.e., t-SNE visualization after ap…
Original abstract

Measuring the breadth of a word's meaning, or its spread across contexts, has become feasible with contextualized token embeddings. A word type can be represented as a cloud of token vectors, with dispersion-based statistics serving as proxies for contextual diversity (Nagata and Tanaka-Ishii, ACL2025). These measurements are useful for deciding appropriate sense distinctions when constructing thesauri and domain-specific dictionaries. However, when comparing the breadth of two word types, naive hypothesis testing on dispersion can be misleading: differences in semantic direction can masquerade as dispersion differences, inflating Type-I error and yielding "statistically significant" outcomes even when there is no true breadth difference. This is problematic because significance testing should distinguish genuine effects from incidental fluctuations in small-difference regimes. We propose a Householder-aligned permutation test to isolate dispersion differences from directional differences. Our method applies a single Householder reflection to align the mean directions of the two word types and then performs a permutation test on the aligned token clouds, yielding calibrated, non-parametric p-values. For practicality, we introduce a GPU-oriented implementation that batches permutations and linear algebra operations. Empirically, our alignment reduced Type-I error by 32.5% while preserving sensitivity to genuine breadth differences, and achieved a 23x speedup over the CPU baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a Householder-aligned permutation test for comparing semantic breadth (dispersion) between two word types in contextualized embedding spaces. A single Householder reflection is computed from the observed mean vectors to align directional differences, after which a permutation test is run on the fixed aligned token clouds to yield non-parametric p-values for dispersion differences. A GPU-batched implementation is introduced for efficiency. Empirical claims include a 32.5% reduction in Type-I error relative to naive tests while retaining power, plus a 23x speedup over CPU baselines.

Significance. If the alignment procedure preserves exact calibration, the method would address a practical problem in NLP by enabling reliable statistical comparisons of contextual diversity for tasks such as sense inventory construction and domain dictionary building. The work supplies a novel combination of Householder reflections with permutation testing plus a practical GPU implementation that could scale to large embedding corpora.

major comments (2)
  1. [Proposed method (Householder alignment and permutation procedure)] The method description states that a single Householder reflection is computed once from the two observed mean vectors and then applied to produce fixed aligned clouds on which the permutation test is performed. Under the null of equal dispersion (after directional alignment), this fixed transformation violates exchangeability: each permuted pair of clouds has new means, so the original reflection no longer aligns them, and the test statistic is evaluated under a transformation that does not match the permuted data. This directly undermines the claim of 'calibrated, non-parametric p-values'.
  2. [Experiments and empirical validation] The reported 32.5% Type-I error reduction and preservation of sensitivity are presented without simulation details confirming that the alignment step was either (a) recomputed inside the permutation loop or (b) shown to leave the null distribution of the dispersion statistic unchanged. Without such verification, the empirical calibration claim cannot be assessed.
minor comments (2)
  1. The citation 'Nagata and Tanaka-Ishii, ACL2025' appears in the abstract but should be expanded to a full reference entry with title and venue details for completeness.
  2. Consider adding a short pseudocode listing that explicitly shows whether the Householder reflection is recomputed per permutation or held fixed; this would clarify the exact algorithm for readers implementing the test.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on the manuscript. We address each major comment below and describe the revisions we will implement.

Point-by-point responses
  1. Referee: The method description states that a single Householder reflection is computed once from the two observed mean vectors and then applied to produce fixed aligned clouds on which the permutation test is performed. Under the null of equal dispersion (after directional alignment), this fixed transformation violates exchangeability: each permuted pair of clouds has new means, so the original reflection no longer aligns them, and the test statistic is evaluated under a transformation that does not match the permuted data. This directly undermines the claim of 'calibrated, non-parametric p-values'.

    Authors: We appreciate the referee highlighting this critical aspect of permutation test validity. The concern about exchangeability is well-founded: a fixed Householder reflection derived solely from the observed means does not guarantee that the null distribution remains correctly calibrated when applied to permuted samples whose means differ. To resolve this, we will revise the method to recompute the Householder reflection for every permutation using the means of the current permuted clouds. This ensures the alignment procedure is identically applied to both observed and permuted data, restoring exact non-parametric calibration. We will update the method description, Algorithm 1, and the GPU batching implementation accordingly while retaining the overall efficiency gains. revision: yes

  2. Referee: The reported 32.5% Type-I error reduction and preservation of sensitivity are presented without simulation details confirming that the alignment step was either (a) recomputed inside the permutation loop or (b) shown to leave the null distribution of the dispersion statistic unchanged. Without such verification, the empirical calibration claim cannot be assessed.

    Authors: We agree that the current manuscript lacks sufficient detail on the simulation protocol and whether alignment was handled inside the permutation loop. In the revised version we will add a dedicated subsection (and supplementary code) that fully specifies the null simulation design, confirms that the Householder reflection is recomputed per permutation under the updated procedure, and reports the resulting empirical Type-I error rates together with power curves. This will allow direct assessment of calibration and sensitivity. revision: yes
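A minimal sketch of the revised loop described in response 1 (recomputing the reflection for every permuted split), again built on the illustrative helpers from the first code block rather than the paper's Algorithm 1:

```python
import numpy as np

def aligned_permutation_pvalue(A, B, n_perm=2000, rng=None):
    """Permutation test that recomputes the Householder alignment
    for every permuted split, as proposed in the rebuttal.
    Assumes householder_align / dispersion from the sketch above."""
    rng = rng or np.random.default_rng(0)
    obs = abs(dispersion(A) - dispersion(householder_align(A, B)))
    X, n_a = np.vstack([A, B]), len(A)    # pool the *unaligned* tokens
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(len(X))
        Pa, Pb = X[p[:n_a]], X[p[n_a:]]
        Pb = householder_align(Pa, Pb)    # realign this split's means
        hits += abs(dispersion(Pa) - dispersion(Pb)) >= obs
    return (hits + 1) / (n_perm + 1)
```

The only change from the fixed variant is the householder_align call inside the loop, which restores the symmetry between how the observed and permuted splits are processed.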

Circularity Check

0 steps flagged

No circularity: method is an explicit algorithmic construction on standard primitives

full rationale

The paper defines a Householder-aligned permutation test by first computing a single reflection from the two observed mean vectors to align directions, then running a standard permutation test on the transformed clouds. This is presented as a direct procedural description without any equation or result that reduces to a fitted parameter, self-referential definition, or load-bearing self-citation. No ansatz is smuggled, no uniqueness theorem is invoked from prior author work, and no renaming of known results occurs. The empirical claims (Type-I error reduction, speedup) are separate experimental outcomes, not tautological consequences of the method definition itself. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Relies on standard assumptions in embedding spaces and non-parametric statistics; no new entities introduced.

axioms (2)
  • domain assumption Dispersion of contextual token embeddings serves as a proxy for semantic breadth
    Basis for the measurements as stated in the abstract.
  • domain assumption Permutation tests on aligned clouds provide calibrated p-values for dispersion differences
    Core assumption of the statistical method.

pith-pipeline@v0.9.0 · 5516 in / 1296 out tokens · 54861 ms · 2026-05-11T02:29:31.174894+00:00 · methodology


Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

  1. [1]

    The Theory of Parsing, Translation, and Compiling

    Alfred V. Aho and Jeffrey D. Ullman. The Theory of Parsing, Translation, and Compiling. 1972

  2. [2]

    Publications Manual

    American Psychological Association. Publications Manual. 1983

  3. [3]

    Alternation

    Ashok K. Chandra, Dexter C. Kozen, and Larry J. Stockmeyer. Alternation. Journal of the ACM. 1981. doi:10.1145/322234.322243

  4. [4]

    Scalable training of L1-regularized log-linear models

    Galen Andrew and Jianfeng Gao. Scalable training of L1-regularized log-linear models. Proceedings of the 24th International Conference on Machine Learning. 2007

  5. [5]

    Algorithms on Strings, Trees and Sequences

    Dan Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press. 1997

  6. [6]

    Yara Parser: A Fast and Accurate Dependency Parser

    Mohammad Sadegh Rasooli and Joel R. Tetreault. Yara Parser: A Fast and Accurate Dependency Parser. Computing Research Repository. 2015

  7. [7]

    A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data

    Rie Kubota Ando and Tong Zhang. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Journal of Machine Learning Research, 6:1817--1853. 2005

  8. [12]

    A Cluster-based Approach for Improving Isotropy in Contextual Embedding Space

    Rajaee, Sara and Pilehvar, Mohammad Taher. A Cluster-based Approach for Improving Isotropy in Contextual Embedding Space. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2021. doi:10.18653/v1/2021.acl-short.73

  9. [14]

    Statistical Significance Tests for Machine Translation Evaluation

    Koehn, Philipp. Statistical Significance Tests for Machine Translation Evaluation. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. 2004

  10. [16]

    On Some Pitfalls in Automatic Evaluation and Significance Testing for MT

    Riezler, Stefan and Maxwell, John T. On Some Pitfalls in Automatic Evaluation and Significance Testing for MT. Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 2005

  11. [17]

    Breaking Through the 80% Glass Ceiling: Raising the State of the Art in Word Sense Disambiguation by Incorporating Knowledge Graph Information

    Bevilacqua, Michele and Navigli, Roberto. Breaking Through the 80% Glass Ceiling: Raising the State of the Art in Word Sense Disambiguation by Incorporating Knowledge Graph Information. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.255

  12. [18]

    The British National Corpus, XML edition

    BNC Consortium. The British National Corpus, XML edition. 2007

  13. [19]

    Balanced Corpus of Contemporary Written Japanese

    Maekawa, Kikuo and Yamazaki, Makoto and Ogiso, Toshinobu and Maruyama, Takehiko and Ogura, Hideki and Kashino, Wakako and Koiso, Hanae and Yamaguchi, Masaya and Tanaka, Makiro and Den, Yasuharu. Balanced Corpus of Contemporary Written Japanese. Language Resources and Evaluation, 48:345--371. 2014

  14. [20]

    An Empirical Investigation of Statistical Significance in NLP

    Berg-Kirkpatrick, Taylor and Burkett, David and Klein, Dan. An Empirical Investigation of Statistical Significance in NLP. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 2012

  15. [21]

    WordNet: A Lexical Database for English

    Miller, George A. WordNet: A Lexical Database for English. Communications of the ACM, 38(11):39--41. 1995

  16. [22]

    Educational Cone Model in Embedding Vector Spaces

    Yo Ehara. Educational Cone Model in Embedding Vector Spaces. Proceedings of ICCE 2025: The 33rd International Conference on Computers in Education (short paper). 2025

  17. [23]

    Mining Words in the Minds of Second Language Learners: Learner-Specific Word Difficulty

    Ehara, Yo and Sato, Issei and Oiwa, Hidekazu and Nakagawa, Hiroshi. Mining Words in the Minds of Second Language Learners: Learner-Specific Word Difficulty. Proceedings of COLING 2012. 2012

  18. [25]

    A Generalized Solution of the Orthogonal Procrustes Problem

    Schönemann, Peter H. A Generalized Solution of the Orthogonal Procrustes Problem. Psychometrika, 31(1):1--10. 1966. doi:10.1007/BF02289451

  19. [26]

    Visualizing Data using t-SNE

    van der Maaten, Laurens and Hinton, Geoffrey. Visualizing Data using t-SNE. Journal of Machine Learning Research, 9:2579--2605. 2008

  20. [27]

    Taylor Berg-Kirkpatrick, David Burkett, and Dan Klein. 2012. https://aclanthology.org/D12-1091/ An empirical investigation of statistical significance in NLP . In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 995--1005, Jeju Island, Korea. Association for Com...

  21. [28]

    BNC Consortium . 2007. http://hdl.handle.net/20.500.14106/2554 The British National Corpus , XML edition . License: http://www.natcorp.ox.ac.uk/docs/licence.html

  22. [29]

    Francis Bond, Arkadiusz Janz, Marek Maziarz, and Ewa Rudnicka. 2019. https://doi.org/10.18653/v1/2019.gwc-1.44 Testing Zipf's meaning-frequency law with wordnets as sense inventories. In Proceedings of the 10th Global Wordnet Conference, pages 342--352, Wroclaw, Poland. Global Wordnet Association

  23. [30]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. https://doi.org/10.18653/v1/N19-1423 BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long a...

  24. [31]

    Rotem Dror, Gili Baumer, Segev Shlomov, and Roi Reichart. 2018. https://doi.org/10.18653/v1/P18-1128 The hitchhiker's guide to testing statistical significance in natural language processing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1383--1392, Melbourne, Australia. Associ...

  25. [32]

    Yo Ehara. 2022. https://doi.org/10.1007/978-3-031-11644-5_37 An intelligent interactive support system for word usage learning in second languages. In Artificial Intelligence in Education - 23rd International Conference, AIED 2022, Durham, UK, July 27-31, 2022, Proceedings, Part I, Lecture Notes in Computer Science, pages 453--464. Springer

  26. [33]

    Yo Ehara. 2025. https://library.apsce.net/index.php/ICCE/article/view/5944 Educational cone model in embedding vector spaces . In Proceedings of ICCE 2025: The 33rd International Conference on Computers in Education (short paper)

  27. [34]

    Yo Ehara, Issei Sato, Hidekazu Oiwa, and Hiroshi Nakagawa. 2012. https://aclanthology.org/C12-1049/ Mining words in the minds of second language learners: Learner-specific word difficulty . In Proceedings of COLING 2012 , pages 799--814, Mumbai, India. The COLING 2012 Organizing Committee

  28. [35]

    Kawin Ethayarajh. 2019. https://doi.org/10.18653/v1/D19-1006 How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCN...

  29. [37]

    Yvette Graham, Nitika Mathur, and Timothy Baldwin. 2014. https://doi.org/10.3115/v1/W14-3333 Randomized significance tests in machine translation . In Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 266--274, Baltimore, Maryland, USA. Association for Computational Linguistics

  30. [38]

    Philipp Koehn. 2004. https://aclanthology.org/W04-3250/ Statistical significance tests for machine translation evaluation . In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 388--395, Barcelona, Spain. Association for Computational Linguistics

  31. [40]

    Kikuo Maekawa, Makoto Yamazaki, Toshinobu Ogiso, Takehiko Maruyama, Hideki Ogura, Wakako Kashino, Hanae Koiso, Masaya Yamaguchi, Makiro Tanaka, and Yasuharu Den. 2014. Balanced corpus of contemporary written Japanese . Language Resources and Evaluation, 48:345--371

  32. [41]

    George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39--41

  33. [44]

    Stefan Riezler and John T. Maxwell. 2005. https://aclanthology.org/W05-0908/ On some pitfalls in automatic evaluation and significance testing for MT . In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization , pages 57--64, Ann Arbor, Michigan. Association for Computational Linguistics

  34. [45]

    Schönemann

    Peter H. Schönemann. 1966. https://doi.org/10.1007/BF02289451 A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31(1):1--10

  35. [47]

    Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579--2605

  36. [50]

    Christos Xypolopoulos, Antoine Tixier, and Michalis Vazirgiannis. 2021. https://doi.org/10.18653/v1/2021.eacl-main.297 Unsupervised word polysemy quantification with multiresolution grids of contextual embeddings . In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 3391--3401,...

  37. [51]

    Hiroaki Yamagiwa and Hidetoshi Shimodaira. 2025. 2025.coling-main.521/ Norm of mean contextualized embeddings determines their variance . In Proceedings of the 31st International Conference on Computational Linguistics, pages 7778--7808, Abu Dhabi, UAE

  38. [53]

    A New Formulation of Zipf's Meaning-Frequency Law through Contextual Diversity

    Nagata, Ryo and Tanaka-Ishii, Kumiko. A New Formulation of Zipf's Meaning-Frequency Law through Contextual Diversity. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.744

  39. [54]

    A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change

    Periti, Francesco and Tahmasebi, Nina. A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.naacl-long.240

  40. [55]

    Analysing Lexical Semantic Change with Contextualised Word Representations

    Giulianelli, Mario and Del Tredici, Marco and Fernández, Raquel. Analysing Lexical Semantic Change with Contextualised Word Representations. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.365

  41. [56]

    Exact Paired-Permutation Testing for Structured Test Statistics

    Zmigrod, Ran and Vieira, Tim and Cotterell, Ryan. Exact Paired-Permutation Testing for Structured Test Statistics. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022. doi:10.18653/v1/2022.naacl-main.360

  42. [57]

    HyperLex: A Large-Scale Evaluation of Graded Lexical Entailment

    Vulić, Ivan and Gerz, Daniela and Kiela, Douwe and Hill, Felix and Korhonen, Anna. HyperLex: A Large-Scale Evaluation of Graded Lexical Entailment. Computational Linguistics. 2017. doi:10.1162/COLI_a_00301

  43. [58]

    Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

    Warner, Benjamin and Chaffin, Antoine and Clavié, Benjamin, et al. Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.127

  44. [59]

    Norm of Mean Contextualized Embeddings Determines their Variance

    Yamagiwa, Hiroaki and Shimodaira, Hidetoshi. Norm of Mean Contextualized Embeddings Determines their Variance. Proceedings of the 31st International Conference on Computational Linguistics. 2025

  45. [60]

    Statistical Uncertainty in Word Embeddings: GloVe-V

    Vallebueno, Andrea and Handan-Nader, Cassandra and Manning, Christopher D and Ho, Daniel E. Statistical Uncertainty in Word Embeddings: GloVe-V. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.510

  46. [61]

    UiO-UvA at SemEval-2020 Task 1: Contextualised Embeddings for Lexical Semantic Change Detection

    Kutuzov, Andrey and Giulianelli, Mario. UiO-UvA at SemEval-2020 Task 1: Contextualised Embeddings for Lexical Semantic Change Detection. Proceedings of the Fourteenth Workshop on Semantic Evaluation. 2020. doi:10.18653/v1/2020.semeval-1.14