Mixed citations

CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge

Mixed citation behavior. Most common role is background (60%).

56 Pith papers citing it
Background 60% of classified citations

citation-role summary

background: 4, dataset: 1

representative citing papers

Inducing Artificial Uncertainty in Language Models

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

Inducing artificial uncertainty on trivial tasks allows training probes that achieve higher calibration on hard data than standard approaches while retaining performance on easy data.

Improving LLM Unlearning Robustness via Random Perturbations

cs.CL · 2025-01-31 · unverdicted · novelty 7.0

LLM unlearning is reframed as inadvertently installing backdoor triggers on forget-tokens; Random Noise Augmentation is introduced as a defense that improves robustness with theoretical guarantees.

GAIA: a benchmark for General AI Assistants

cs.CL · 2023-11-21 · unverdicted · novelty 7.0

The GAIA benchmark shows that humans, at 92% accuracy on simple real-world questions, far outperform current AI systems at 15%, and proposes this gap as a key milestone for general AI.

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

cs.LG · 2026-06-10 · unverdicted · novelty 6.0

Manifold Power Iteration aligns MoE router rows with principal singular directions of experts via a power-then-retract process, with theory showing convergence and experiments on 1B-11B models showing gains.

Scaling Agentic Capabilities via Grounded Interaction Synthesis

cs.CL · 2026-06-01 · unverdicted · novelty 6.0

GAIS synthesizes diverse, high-fidelity agentic tasks from real-world MCP servers and adversarial planning, outperforming LLM-only baselines on BFCL, τ²-Bench, and ACEBench with greater data efficiency.

Forecasting Downstream Performance of LLMs With Proxy Metrics

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.

SkillFactory: Self-Distillation For Learning Cognitive Behaviors

cs.CL · 2025-12-03 · unverdicted · novelty 6.0

SkillFactory creates silver SFT data from a model's self-generated traces rearranged into cognitive skill formats to prime models for better skill use during subsequent RL, improving post-RL generalization and out-of-domain robustness.
