LTL ∩ PCTL is decidable because an LTL formula defines a PCTL-expressible tree language iff its word language is DBW-recognizable, via a new HWTcf automata characterization of PCTL.
Chandra and Dexter C
Mixed citation behavior. Most common role is unclear (48%).
representative citing papers
Develops polynomial-time PMC for FDFA and introduces FUFA for succinct ω-regular specs with improved LTL translation.
Star-height of Parikh images is bounded by 2 for one-register automata but the rational conjecture fails for multiple registers, showing Parikh's theorem does not hold over infinite alphabets.
k-REWB matching cannot be solved in O(n^(2k−ε)) time under SETH, is W[2]-hard parameterized by expression length, and 2-use 2-REWBs require superlinear time unless triangle detection does; 1-use REWBs admit an O(n log² n) algorithm.
Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.
P2R decouples perception from reasoning in VLMs via a two-stage process and PRA-GRPO alternating RL training, reporting gains such as 93.2% on V-Star for the 4B model over its Qwen3-VL backbone.
MECoBench is a benchmark showing that multimodal agent collaboration improves embodied task performance when communication balances coordination costs, with gains also under noisy conditions.
The paper proposes an operator-level visual-token skipping framework for MLLMs that reduces TFLOPs by 33.7% on Qwen3-VL while retaining 99.5% performance across VQA benchmarks.
Cortex uses an Ontological Corpus Graph to structure web-scale corpora, creating a refined 24.14B-token corpus and a new benchmark validated on eight LLMs.
SMDA fits ridge regression on SAE features to distill symbolic policies then decomposes each SFT example's influence via feature-activation and output-probability deltas, demonstrated on refusal behavior in Llama-3.2-3B-Instruct.
OpenFinGym is a multi-task verifiable gym environment for quant-finance agents with automated task construction from publications, containerised runtime, paper trading engine, and support for SFT/RL training.
DISC is a new iterative verify-judge-correct procedure for LLMs that improves accuracy on reasoning benchmarks by modeling verification as denoising signals and using a gate to control correction precision.
Chehre introduces a new emoji-prompted video dataset with multi-annotator labels to benchmark models on dominant and distributional facial expression recognition tasks.
NEST is a new benchmark dataset for narrative event structures in long videos, with baselines reporting ETD below 8%, EL under 6%, EAE below 11%, and ERE at 35-44% F1.
Introduces thermodynamic free-energy signatures and spectral form factors from attention Laplacians for hallucination detection, with stability proofs, expressiveness results, a PAC bound, and empirical AUROC gains over baselines.
Mechanistic tracing shows text suppresses but does not erase audio representations in late layers of Audio LLMs; back-patching reduces text dominance.
LegalWorld is a life-cycle interactive environment modeling Chinese civil litigation as five causally connected stages grounded in 75,309 judgments, paired with LongJud-Bench for cross-stage agent evaluation.
Introduces applicability condition extraction for therapeutic drug-disease relations, creates first annotated dataset of 1,119 pairs, and proposes enhanced LoRA method outperforming baselines.
MolGram integrates a conditional n-gram memory module into molecular language models to address locality gaps in SMILES tokenization, improving performance on generation, forward prediction, and retrosynthesis while outperforming 3x larger baselines.
Translating LIBERO to ten languages shows VLA failures under multilingual instructions are driven by language-sensitive steps; a step-wise inference intervention improves performance.
HDSL is a tree-structured DSL for 3D indoor scenes that lets LLM agents generate subtrees recursively and perform localized edits via hierarchical retrieval and deterministic merge.
NüshuVoice releases the first sentence-level Nüshu TTS dataset and shows that an F0-conditioned VITS model using five-level pitch notation outperforms baselines on spectral fidelity, pitch accuracy, and intelligibility.
LLMs match condition-level patterns in a noodle purchase survey but fail to replicate distributional structure, with no model beating a pooled human baseline for purchase quantities.
EGPS localizes MCMC moves to high-entropy decision points using forward-pass entropy, yielding up to 12.6× wall-clock speedup and best-or-tied accuracy on MATH500, HumanEval, and GPQA for Qwen2.5-Math-7B.
citing papers explorer
- Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation
MPD reduces hallucinations in LVLMs by 23.4% while retaining 97.4% of general capability through semantic disentanglement and selective parameter updates.
- Think before Go: Hierarchical Reasoning for Image-goal Navigation
HRNav decomposes image-goal navigation into VLM-based short-horizon planning and RL-based execution with a wandering suppression penalty to improve performance in complex unseen settings.