Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.
Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models
11 Pith papers cite this work.
citing papers explorer
-
Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models
Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.
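A minimal sketch of the entropy-triggered idea described above, assuming a small byte-level model supplies next-byte distributions and that a scratchpad slot is emitted wherever predictive entropy crosses a threshold; the sentinel, threshold, and function names are illustrative, not the paper's implementation.

```python
import math
from typing import List, Sequence

SCRATCHPAD = -1  # hypothetical sentinel id marking a scratchpad slot

def next_byte_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (bits) of a predicted next-byte distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def insert_scratchpads(byte_ids: List[int],
                       next_byte_dists: List[Sequence[float]],
                       threshold: float = 4.0) -> List[int]:
    """Walk the byte stream and emit a scratchpad slot wherever predictive
    entropy spikes above the threshold, so patch context is refreshed on
    demand rather than after every fixed number of bytes."""
    out: List[int] = []
    for b, dist in zip(byte_ids, next_byte_dists):
        if next_byte_entropy(dist) > threshold:
            out.append(SCRATCHPAD)
        out.append(b)
    return out

# Usage: a uniform distribution over 256 bytes (8 bits of entropy) triggers a slot.
stream = [104, 105]
dists = [[1.0] + [0.0] * 255, [1.0 / 256] * 256]
assert insert_scratchpads(stream, dists) == [104, SCRATCHPAD, 105]
```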
-
Affinity Is Not Enough: Recovering the Free Energy Principle in Mixture-of-Experts
Adding temporal memory via leaky integrate-and-fire (LIF) dynamics, precision-weighted gating, and anticipatory prediction to MoE routers recovers effective expert selection at distribution transitions, with ablations confirming a super-additive beta-ant interaction.
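As a rough illustration of the routing recipe summarized above, the sketch below blends instantaneous expert affinity with a leaky (LIF-style) memory trace under a precision weight; the decay constant, blending rule, and names are assumptions rather than the paper's design.

```python
import numpy as np

def route(affinity: np.ndarray, memory: np.ndarray, precision: float,
          decay: float = 0.9, top_k: int = 2):
    """Blend the instantaneous expert affinity with a leaky (LIF-like) trace of
    past affinities; high precision trusts the current affinity, low precision
    leans on temporal memory, stabilizing selection across distribution shifts."""
    memory = decay * memory + (1.0 - decay) * affinity     # leaky integration
    score = precision * affinity + (1.0 - precision) * memory
    experts = np.argsort(score)[-top_k:][::-1]             # top-k expert choice
    return experts, memory

# Usage: 8 experts, two consecutive routing steps sharing the memory trace.
mem = np.zeros(8)
for _ in range(2):
    chosen, mem = route(np.random.randn(8), mem, precision=0.7)
```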
-
Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs
Language models trained on parallel streams of computation can overcome single-stream bottlenecks in autonomous agents by enabling simultaneous reading, thinking, and acting.
-
TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection
TextSeal provides a localized, distortion-free LLM watermark that enables provenance tracking and distillation detection while preserving performance and text quality.
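TextSeal's construction is not spelled out in the summary; the sketch below shows one well-known distortion-free watermarking recipe (keyed Gumbel-max sampling) purely to illustrate provenance-detectable yet distribution-preserving sampling. All names are illustrative and this is not necessarily TextSeal's scheme.

```python
import hashlib
import numpy as np

def keyed_gumbel(key: bytes, context: tuple, vocab_size: int) -> np.ndarray:
    """Pseudorandom Gumbel noise derived from a secret key and the recent context."""
    digest = hashlib.sha256(key + repr(context).encode()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    return -np.log(-np.log(rng.random(vocab_size)))

def watermarked_sample(logits: np.ndarray, key: bytes, context: tuple) -> int:
    """Gumbel-max sampling: marginally matches ordinary softmax sampling, but a
    verifier holding the key can re-derive the noise and detect the bias."""
    return int(np.argmax(logits + keyed_gumbel(key, context, logits.shape[0])))

# Usage: sample one token given a window of the last 4 token ids.
token = watermarked_sample(np.random.randn(50_000), b"secret", (11, 7, 3, 42))
```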
-
BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion
BitLM replaces per-token softmax with bitwise continuous diffusion inside causal blocks to generate multiple tokens in parallel while preserving autoregressive structure.
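A minimal sketch of the "analog bits" encoding that bitwise continuous diffusion typically relies on, assuming tokens are mapped to {-1, +1} bit vectors, denoised as continuous blocks, and thresholded back to ids; the learned denoiser is replaced by an identity stand-in.

```python
import numpy as np

def token_to_bits(token_id: int, n_bits: int = 16) -> np.ndarray:
    """Encode a token id as 'analog bits' in {-1, +1}."""
    bits = [(token_id >> i) & 1 for i in range(n_bits)]
    return np.asarray(bits, dtype=np.float32) * 2.0 - 1.0

def bits_to_token(x: np.ndarray) -> int:
    """Threshold a continuous bit vector back to an integer token id."""
    return int(sum(int(v > 0.0) << i for i, v in enumerate(x)))

# A whole block of future positions is denoised at once (parallel tokens);
# an identity "denoiser" stands in for the learned model here.
ids = [5, 130, 77]
block = np.stack([token_to_bits(t) for t in ids])
recovered = [bits_to_token(row) for row in block]
assert recovered == ids
```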
-
When Hidden States Drift: Can KV Caches Rescue Long-Range Speculative Decoding?
KV cache reuse improves long-range draft acceptance in speculative decoding but delivers only marginal end-to-end speedups due to drafter limitations.
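The "marginal end-to-end speedup" observation follows from the standard speculative-decoding accounting (not the paper's own analysis); the sketch below works through it with an illustrative drafter cost.

```python
def expected_accepted(alpha: float, k: int) -> float:
    """Expected tokens produced per verification step with per-token acceptance
    rate alpha and draft length k (standard speculative-decoding result)."""
    return k + 1.0 if alpha >= 1.0 else (1.0 - alpha ** (k + 1)) / (1.0 - alpha)

def end_to_end_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Tokens per unit cost, where one target verification pass costs 1 and each
    draft token costs draft_cost; drafter overhead caps the benefit even when
    acceptance improves."""
    return expected_accepted(alpha, k) / (1.0 + k * draft_cost)

# Raising acceptance from 0.6 to 0.8 with k=4 drafts helps only modestly
# once the drafter's own cost is charged (values are illustrative).
for a in (0.6, 0.8):
    print(a, round(end_to_end_speedup(a, k=4, draft_cost=0.2), 2))
```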
-
FusionCIM: Accelerating LLM Inference with Fusion-Driven Computing-in-Memory Architecture
FusionCIM is a fusion-driven CIM accelerator for LLM inference that maps QK^T to IP-CIM and PV to OP-CIM, uses a QO-stationary dataflow, and applies pattern-aware online softmax, delivering up to 3.86x energy savings and 1.98x speedup on LLaMA-3 at 29.4 TOPS/W.
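The pattern-aware online softmax builds on the standard streaming softmax recurrence; the sketch below shows only that base recurrence, since the pattern-aware part and the CIM mapping are hardware-specific. Block size and names are illustrative.

```python
import numpy as np

def online_softmax(scores: np.ndarray, block: int = 4) -> np.ndarray:
    """Stream over the score row block by block, carrying a running max m and a
    rescaled running sum s, so normalization needs no separate full pass."""
    m, s = float("-inf"), 0.0
    for start in range(0, scores.shape[0], block):
        chunk = scores[start:start + block]
        m_new = max(m, float(chunk.max()))
        s = s * np.exp(m - m_new) + float(np.exp(chunk - m_new).sum())
        m = m_new
    return np.exp(scores - m) / s

# Matches the ordinary two-pass softmax.
x = np.random.randn(10)
ref = np.exp(x - x.max()); ref /= ref.sum()
assert np.allclose(online_softmax(x), ref)
```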
-
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
GLM-5V-Turbo integrates multimodal perception as a core part of reasoning and execution for agentic tasks, reporting strong results in visual tool use and multimodal coding while keeping text-only performance competitive.
-
GLM-5: from Vibe Coding to Agentic Engineering
GLM-5 is a foundation model that claims state-of-the-art results on coding benchmarks and superior performance on end-to-end software engineering tasks via new asynchronous RL methods and cost-saving DSA.
-
MiMo-V2-Flash Technical Report
MiMo-V2-Flash is a 309B-parameter MoE model with 15B active parameters, trained on 27T tokens with hybrid attention and multi-teacher on-policy distillation; it matches larger models such as DeepSeek-V3.2 while enabling 2.6x faster decoding via repurposed multi-token prediction (MTP) layers.
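Repurposing MTP layers for faster decoding usually amounts to self-speculative drafting; the sketch below illustrates that loop with hypothetical mtp_propose and verify_next stand-ins for model calls, and verifies per token for clarity where a real system would batch verification in one forward pass.

```python
from typing import Callable, List

def decode_with_mtp(prompt: List[int],
                    mtp_propose: Callable[[List[int]], List[int]],
                    verify_next: Callable[[List[int]], int],
                    max_new: int = 64) -> List[int]:
    """Draft a few tokens with the MTP heads, then let the main path verify them
    greedily; accepted tokens come 'for free', a rejection falls back to the
    verified token and drafting restarts from there."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        draft = mtp_propose(out)              # several tokens from the MTP heads
        if not draft:
            break
        for tok in draft:
            expected = verify_next(out)       # main model's own next token
            if tok == expected:
                out.append(tok)               # agreement: accept the draft token
            else:
                out.append(expected)          # disagreement: keep verified token
                break
    return out

# Toy stand-ins: the drafter guesses a counting pattern, the verifier continues
# it, so every draft token is accepted.
result = decode_with_mtp([1, 2, 3],
                         mtp_propose=lambda seq: [seq[-1] + 1, seq[-1] + 2],
                         verify_next=lambda seq: seq[-1] + 1,
                         max_new=6)
assert result == [1, 2, 3, 4, 5, 6, 7, 8, 9]
```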
-
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
GLM-4.5, a 355B-parameter MoE model with hybrid reasoning, scores 70.1% on TAU-Bench, 91.0% on AIME 24, and 64.2% on SWE-bench Verified while ranking 3rd overall and 2nd on agentic benchmarks.