pith. machine review for the scientific record.

citation dossier

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, and Mao Yang · 2024 · arXiv 2402.13753

17 Pith papers citing it
17 reference links
cs.CL · top field · 8 papers
UNVERDICTED · top verdict bucket · 14 papers

This arXiv-backed work is queued for full Pith review when it crosses the high-inbound sweep. That review runs reader · skeptic · desk-editor · referee · rebuttal · circularity · lean confirmation · RS check · pith extraction.

read on arXiv PDF

why this work matters in Pith

Pith has found this work cited in 17 reviewed papers. Its strongest current cluster is cs.CL (8 papers). The largest review-status bucket among citing papers is UNVERDICTED (14 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.

years

2026: 14 papers · 2024: 3 papers

representative citing papers

Generating Complex Code Analyzers from Natural Language Questions

cs.SE · 2026-05-10 · unverdicted · novelty 7.0

Merlin generates CodeQL queries from natural language questions via RAG-based iteration and a self-test technique using assistive queries, achieving 3.8x higher task accuracy and 31% less completion time in user studies while finding additional software issues.
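A minimal sketch of the retrieve, generate, self-test loop that summary describes, with the retriever, LLM, and CodeQL runner abstracted behind placeholder callables; Merlin's actual prompts, assistive queries, and tooling are not reproduced here.

```python
from typing import Callable, List, Optional

def generate_analyzer(question: str,
                      retrieve: Callable[[str], List[str]],
                      llm_generate: Callable[[str], str],
                      self_test: Callable[[str], bool],
                      max_iters: int = 5) -> Optional[str]:
    """Iterate: retrieve example queries, draft a CodeQL query with the LLM,
    and accept the first draft that passes the self-test."""
    examples = retrieve(question)
    prompt = f"Question: {question}\nExample queries:\n" + "\n".join(examples)
    feedback = ""
    for _ in range(max_iters):
        candidate = llm_generate(prompt + feedback)
        if self_test(candidate):  # e.g. assistive queries probing the draft's behaviour
            return candidate
        feedback = f"\nThe previous draft failed self-testing:\n{candidate}\nRevise it."
    return None

# Toy usage with stand-in components (no real LLM or CodeQL runner involved).
if __name__ == "__main__":
    query = generate_analyzer(
        "find calls to eval on user input",
        retrieve=lambda q: ["import javascript", "from CallExpr c select c"],
        llm_generate=lambda p: 'from CallExpr c where c.getCalleeName() = "eval" select c',
        self_test=lambda q: "select" in q,
    )
    print(query)
```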

Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings

q-bio.QM · 2026-04-09 · unverdicted · novelty 7.0

Dual Triangle Attention achieves effective bidirectional attention with built-in positional inductive bias via dual triangular masks, outperforming standard bidirectional attention on position-sensitive tasks and showing strong masked language modeling results with or without positional embeddings.
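A rough sketch of one reading of the dual triangular masks, assuming a past-only (lower-triangular) attention pass and a future-only (upper-triangular) pass whose outputs are summed; the combination rule and scaling are assumptions taken from this one-sentence summary, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def dual_triangle_attention(q, k, v):
    """q, k, v: (batch, heads, seq, dim). No positional embeddings are used;
    the two triangular masks alone inject directional (positional) bias."""
    seq = q.size(-2)
    neg_inf = torch.finfo(q.dtype).min
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5  # (batch, heads, seq, seq)

    lower = torch.tril(torch.ones(seq, seq, dtype=torch.bool, device=q.device))
    upper = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=q.device))

    # Past-only pass (attends left, including the current position).
    past = F.softmax(scores.masked_fill(~lower, neg_inf), dim=-1) @ v
    # Future-only pass (attends right, including the current position).
    future = F.softmax(scores.masked_fill(~upper, neg_inf), dim=-1) @ v
    return past + future  # summing the two passes is an assumption

if __name__ == "__main__":
    q = torch.randn(1, 2, 8, 16)
    out = dual_triangle_attention(q, torch.randn_like(q), torch.randn_like(q))
    print(out.shape)  # torch.Size([1, 2, 8, 16])
```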

FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

FocuSFT uses an inner optimization loop to adapt fast-weight parameters into a parametric memory that sharpens attention on relevant content, then conditions outer-loop supervised fine-tuning on this representation, yielding gains on long-context benchmarks.
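A toy sketch of the bilevel structure described above, not FocuSFT's implementation: an inner loop fits a small fast-weight adapter to the long context (standing in for the parametric memory), and the outer loop runs a supervised update conditioned on it. The model, losses, and hyperparameters are placeholders chosen only so the loop runs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 16
slow = nn.Linear(dim, dim)              # "slow" weights, updated by the outer loop
fast = nn.Linear(dim, dim, bias=False)  # "fast" weights: the parametric memory
outer_opt = torch.optim.Adam(slow.parameters(), lr=1e-3)

long_context = torch.randn(32, dim)     # stand-in for encoded long-context tokens
sft_inputs, sft_targets = torch.randn(8, dim), torch.randn(8, dim)

for step in range(5):
    # Inner loop: adapt the fast weights to the context (memory formation).
    inner_opt = torch.optim.SGD(fast.parameters(), lr=1e-2)
    for _ in range(3):
        inner_loss = (fast(long_context) - long_context).pow(2).mean()
        inner_opt.zero_grad()
        inner_loss.backward()
        inner_opt.step()

    # Outer loop: supervised update conditioned on the adapted memory
    # (detached so only the slow weights receive the outer gradient).
    preds = slow(sft_inputs + fast(sft_inputs).detach())
    outer_loss = (preds - sft_targets).pow(2).mean()
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()

print(f"final outer loss: {outer_loss.item():.4f}")
```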

Sensitivity-Positional Co-Localization in GQA Transformers

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

In Llama 3.1 8B, task-sensitive layers cluster late while RoPE adaptation is strongest early, yet applying both adaptations only to sensitivity-identified layers outperforms other layer choices by 4-16 points on MMLU, GPQA, HumanEval+, MATH, MGSM and ARC.
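A small sketch of the layer-selection idea, assuming sensitivity is scored by per-layer gradient norms on a task loss; the scoring rule, the toy stack, and the choice of k are illustrative assumptions, not the paper's procedure.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(6)])
x, y = torch.randn(4, 8), torch.randn(4, 8)

# Forward through the stack and take a stand-in task loss.
h = x
for layer in layers:
    h = torch.relu(layer(h))
loss = (h - y).pow(2).mean()
loss.backward()

# Per-layer sensitivity: total gradient norm over that layer's parameters.
scores = [sum(p.grad.norm().item() for p in layer.parameters()) for layer in layers]
top_k = sorted(range(len(layers)), key=lambda i: scores[i], reverse=True)[:2]

# Adapt only the sensitivity-identified layers; freeze everything else.
for i, layer in enumerate(layers):
    for p in layer.parameters():
        p.requires_grad = i in top_k

print("layer sensitivities:", [round(s, 3) for s in scores])
print("layers selected for adaptation:", top_k)
```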

Long Context Transfer from Language to Vision

cs.CV · 2024-06-24 · unverdicted · novelty 6.0

Extending language model context length enables LMMs to process over 200K visual tokens from long videos without video training, achieving SOTA on Video-MME via dense frame sampling.

citing papers explorer

Showing 17 of 17 citing papers.