A temporary causal language modeling (CLM) phase followed by masked language modeling (MLM) decay during encoder continued pretraining outperforms standard MLM on biomedical tasks by 0.3 to 2.8 percentage points across languages and model sizes.
Under review
6 Pith papers cite this work.
Citing papers by year: 2026 (6)

Representative citing papers
A probabilistic model with domain-aligned inductive bias detects acts of mechanistic reasoning in student conversations and shows improved generalization to unseen students and novel contexts.
MADE creates a contamination-resistant living benchmark for multi-label classification of medical device adverse events, with evaluations revealing model-specific trade-offs in accuracy and uncertainty quantification.
Cross-encoder reranker performance scales predictably via power laws with model size and training exposure, allowing accurate forecasts for 400M- and 1B-parameter models and informing a data-heavy allocation of compute.
Modestly sized language models acquire sensitivity to the meanings of rare Paired-Focus constructions later than to their syntactic forms, with semantic learning correlating with gains in selected world-knowledge domains.
MATCH augments sparsified attention with an efficient in-context retrieval system to boost performance on long-range recall tasks in transformers.
citing papers explorer
-
A Causal Language Modeling Detour Improves Encoder Continued Pretraining
-
MADE: A Living Benchmark for Multi-Label Text Classification with Uncertainty Quantification of Medical Device Adverse Events
-
Language Models Learn Constructional Semantics, Not To Mention Syntax: Investigating LM Understanding of Paired-Focus Constructions
-
MATCH: Modulating Attention via In-Context Retrieval for Long-Context Transformers