hub Canonical reference

How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings

Canonical reference. 78% of citing Pith papers cite this work as background.

27 Pith papers citing it
Background 78% of classified citations

hub tools

citation-role summary

background 9

citation-polarity summary

roles

background 8

polarities

background 7 support 2

representative citing papers

SimCSE: Simple Contrastive Learning of Sentence Embeddings

cs.CL · 2021-04-18 · conditional · novelty 8.0

SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving prior best results by 4.2% and 2.2% via simple contrastive learning.

Closing the Calibration Gap in Semantic Caching

cs.IR · 2026-06-18 · unverdicted · novelty 7.0

Introduces P-CHR AUC and CRR metrics to demonstrate that semantic caching model selection is limited by calibration quality rather than ranking performance.

Continuous Language Diffusion as a Decoder-Interface Problem

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder systems.

RSRank: Learning Relevance from Representational Shifts

cs.IR · 2026-06-16 · unverdicted · novelty 6.0

RSRank learns calibrated relevance scores from alignment between representational shifts induced by candidate documents and those from oracle document sets, enabling zero-threshold filtering.

Inside the LLM Word Factory

cs.CL · 2026-06-07 · unverdicted · novelty 6.0

Activation patching localizes English detokenization in Llama2-7B to a two-stage attention-then-MLP process at layer 1 that generalizes to 12 models from 8 families, with depth varying by positional encoding, plus an early-layer probe achieving 0.94-0.97 AUROC.

How Many Different Outputs Can a Transformer Generate?

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

The number of output sequences a Transformer can access grows only linearly with prompt length, and the accessible proportion decays exponentially beyond a critical point, even under unbounded context.

Inductive Entity Representations from Text via Link Prediction

cs.CL · 2020-10-07 · unverdicted · novelty 6.0

Entity representations learned from text via link prediction generalize to unseen entities and transfer to classification and retrieval with reported gains of 22% MRR, 16% accuracy, and 8.8% NDCG@10.

Analyzing the Effect of Noise in LLM Fine-tuning

cs.LG · 2026-04-14 · unverdicted · novelty 5.0

Label noise hurts fine-tuning performance the most, while grammatical and typographical noise sometimes acts as a mild regularizer, with changes concentrated in task-specific layers.
