CoLVR uses latent contrastive objectives with angle-based perturbation and RL trajectory rewards to increase exploratory visual reasoning in MLLMs, delivering 5-8% gains on VSP, Jigsaw, and MMStar benchmarks.
hub
Gpt-4 technical report
10 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
SELF-EMO lets LLMs bootstrap better emotion recognition and expression via self-play, data flywheel filtering with smoothed IoU rewards, and SELF-GRPO reinforcement learning, yielding SOTA gains on IEMOCAP, MELD, and EmoryNLP.
Unimodal model representations converge to a relational structure captured by the Indra representation via V-enriched Yoneda embedding, which is unique and structure-preserving and improves cross-model and cross-modal robustness when instantiated with angular distance.
Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.
PRPO is a paragraph-level policy optimization technique that grounds vision-language model reasoning in image content to raise deepfake detection accuracy and reasoning quality.
An adaptive compute-optimal strategy for scaling LLM test-time compute achieves over 4x efficiency gains versus best-of-N and lets smaller models outperform 14x larger ones on some problems.
LLM agents exhibit temporal blindness, achieving no better than 65% normalized alignment with human preferences on tool-use decisions across time-sensitive scenarios in the new TicToc dataset.
Proposes token-significance and dynamic length rewards in RL to reduce LLM response length while preserving or improving reasoning correctness across benchmarks.
TableMaster improves LM table understanding by verbalizing tables with enriched semantics and using adaptive textual-symbolic reasoning, reaching 78.13% accuracy on WikiTQ with GPT-4o-mini.
citing papers explorer
-
CoLVR: Enhancing Exploratory Latent Visual Reasoning via Contrastive Optimization
CoLVR uses latent contrastive objectives with angle-based perturbation and RL trajectory rewards to increase exploratory visual reasoning in MLLMs, delivering 5-8% gains on VSP, Jigsaw, and MMStar benchmarks.
-
SELF-EMO: Emotional Self-Evolution from Recognition to Consistent Expression
SELF-EMO lets LLMs bootstrap better emotion recognition and expression via self-play, data flywheel filtering with smoothed IoU rewards, and SELF-GRPO reinforcement learning, yielding SOTA gains on IEMOCAP, MELD, and EmoryNLP.
-
The Indra Representation Hypothesis for Multimodal Alignment
Unimodal model representations converge to a relational structure captured by the Indra representation via V-enriched Yoneda embedding, which is unique and structure-preserving and improves cross-model and cross-modal robustness when instantiated with angular distance.
-
Automated Design of Agentic Systems
Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.
-
PRPO: Paragraph-level Policy Optimization for Vision-Language Deepfake Detection
PRPO is a paragraph-level policy optimization technique that grounds vision-language model reasoning in image content to raise deepfake detection accuracy and reasoning quality.
-
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
An adaptive compute-optimal strategy for scaling LLM test-time compute achieves over 4x efficiency gains versus best-of-N and lets smaller models outperform 14x larger ones on some problems.
-
Your LLM Agents are Temporally Blind: The Misalignment Between Tool Use Decisions and Human Time Perception
LLM agents exhibit temporal blindness, achieving no better than 65% normalized alignment with human preferences on tool-use decisions across time-sensitive scenarios in the new TicToc dataset.
-
Not All Tokens Matter: Towards Efficient LLM Reasoning via Token Significance in Reinforcement Learning
Proposes token-significance and dynamic length rewards in RL to reduce LLM response length while preserving or improving reasoning correctness across benchmarks.
-
TableMaster: A Recipe to Advance Table Understanding with Language Models
TableMaster improves LM table understanding by verbalizing tables with enriched semantics and using adaptive textual-symbolic reasoning, reaching 78.13% accuracy on WikiTQ with GPT-4o-mini.
- Language Modeling with Hyperspherical Flows