Personal Salience: Highlighting Is Social, but Individuality Lives in Selection
4 Pith papers cite this work.
abstract
Social highlighters let people mark passages that matter to them. We ask how much of an individual is recoverable from these naturalistic traces, using a co-readership identity control (the same document highlighted by many users) that holds document and topic fixed and asks whether a person's own history predicts their marks better than another reader's does. We separate generic salience (structure), crowd salience (what others marked), and personal salience (the individual residual). First, highlighting is social: which sentences you mark is predicted far better by the crowd than by structure or by a personal model, and even a well-estimated crowd, an information-privileged baseline that sees others' marks on the same document, beats a frontier LLM twin built from your other-document history; the within-document personal signal is at most a whisper (own-vs-other gap +0.017 by an embedding scorer, small but significant). Second, in sharp contrast, individuality lives in selection: asked which of the already-salient passages are yours, your own history is a strong, leakage-free predictor (gap +0.14). A topic decomposition shows this is largely stable thematic preference: it shrinks ~6-8x against a topically matched peer, and a thin residual cannot be separated from finer topic. The non-obvious part is an asymmetry: under the same scorer the individual signal is ~6-8x weaker in salience than in selection. Methodologically, naive history-conditioning evaluations leak (the target's own marks enter the profile in ~42% of pairs, inflating personal scores by up to +0.15 AP) and small crowds overstate personalization; our results are leakage-free and use a dense crowd and a model-matched control. Highlights carry a genuine individual signature, but it is a thin layer over a strong shared one and surfaces far more in which salient things a person selects than in what is salient.
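The own-vs-other gap described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the mean-vector profile, and the two-dimensional toy vectors (standing in for an embedding scorer) are all assumptions made for illustration. The sketch scores one document's candidate passages against a profile built from a reader's history, excludes the target document from that history as a leakage guard, and reports the difference in average precision (AP) between the reader's own profile and another reader's.

```python
from math import sqrt

def average_precision(scores, labels):
    """AP of a ranking: mean of precision@k taken at each positive item."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_profile(history, exclude_doc):
    """Mean vector of past highlights, skipping the target document (leakage guard)."""
    vecs = [v for doc_id, v in history if doc_id != exclude_doc]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]

def own_vs_other_gap(candidates, labels, own_history, other_history, doc_id):
    """AP under the reader's own profile minus AP under another reader's profile."""
    own = build_profile(own_history, doc_id)
    other = build_profile(other_history, doc_id)
    ap_own = average_precision([cosine(own, c) for c in candidates], labels)
    ap_other = average_precision([cosine(other, c) for c in candidates], labels)
    return ap_own - ap_other

# Hypothetical toy data: three candidate passages in document "d0".
candidates = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
labels = [1, 0, 0]  # the target reader marked only the first passage
own_history = [("d1", [1.0, 0.1]), ("d0", [1.0, 0.0])]  # the "d0" entry would leak
other_history = [("d2", [0.0, 1.0])]

gap = own_vs_other_gap(candidates, labels, own_history, other_history, "d0")
print(f"own-vs-other AP gap: {gap:.3f}")
```

Dropping the `doc_id != exclude_doc` filter would let the target reader's own marks on "d0" enter the profile, which is the kind of leakage the abstract warns can inflate personal scores.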
fields
cs.IR 4
years
2026 4
verdicts
UNVERDICTED 4
citing papers explorer
- Trait, Not State: The Durability of Reading Identity in Social Highlighting
  Readers' highlighting patterns on a social web platform remain stable over 24 months as a durable trait, with personal profiles from early documents predicting future selections at roughly 3x the average precision of non-personal baselines.
- Factions Within, Uncertain Across: Within-Document Reader Sub-Groups in Social Highlighting
  Within-document highlighting shows strong reader sub-groups beyond null expectations from salience and popularity, but cross-document reproducibility of pair agreement is near zero and unresolved due to insufficient overlap.
- Selection, Not Salience: The Shape and Limits of Personalization in Social Highlighting
  Personalization in social highlighting is modest and topic-driven at document selection (~+0.13) but yields no reliable gain at the sentence salience layer over impersonal baselines.
- The Long Tail, Not the Front Page: Cold-Start Prediction of Crowd Highlight Salience
  A supervised logistic ranker on embeddings and features beats the lead baseline by 0.044 average precision in retrospective cold-start prediction of crowd highlights.