Gpt-4o system card

OpenAI · 2024

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

browse 6 citing papers

citation-role summary

background 2 baseline 1

citation-polarity summary

background 2 baseline 1

representative citing papers

Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

Omnimodal LLMs encode premise-perception mismatches in hidden states yet almost never reject false textual claims, exposing a representation-action gap that is modality-asymmetric and prompt-resistant.

CoLVR: Enhancing Exploratory Latent Visual Reasoning via Contrastive Optimization

cs.CV · 2026-05-09 · conditional · novelty 7.0 · 2 refs

CoLVR uses latent contrastive objectives with angle-based perturbation and RL trajectory rewards to increase exploratory visual reasoning in MLLMs, delivering 5-8% gains on VSP, Jigsaw, and MMStar benchmarks.

CCA Reimagined: An Exploratory Study of Large Language Models for Congestion Control

cs.NI · 2026-04-04 · unverdicted · novelty 7.0

LLM-based congestion control reduces latency by up to 50% with less than 0.3% throughput sacrifice versus traditional CCAs in static and dynamic network emulations.

RewardBench 2: Advancing Reward Model Evaluation

cs.CL · 2025-06-02 · unverdicted · novelty 6.0

RewardBench 2 is a new benchmark that supplies challenging fresh human prompts for reward model evaluation, yielding lower average scores but higher correlation with downstream best-of-N sampling and RLHF training performance.

OmniSelect: Dynamic Modality-Aware Token Compression for Efficient Omni-modal Large Language Models

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

OmniSelect is a training-free, modality-adaptive token pruning framework that dynamically selects Audio-Centric, Video-Centric, or Uniform compression regimes using AudioCLIP cross-modal relevance scores and then applies adaptive fine-grained pruning within temporal groups.

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

cs.CV · 2025-01-22 · unverdicted · novelty 4.0

VideoLLaMA3 uses a vision-centric training paradigm and token-reduction design to reach competitive results on image and video benchmarks.

citing papers explorer

Showing 6 of 6 citing papers.

Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs cs.AI · 2026-05-13 · unverdicted · none · ref 48
Omnimodal LLMs encode premise-perception mismatches in hidden states yet almost never reject false textual claims, exposing a representation-action gap that is modality-asymmetric and prompt-resistant.
CoLVR: Enhancing Exploratory Latent Visual Reasoning via Contrastive Optimization cs.CV · 2026-05-09 · conditional · none · ref 17 · 2 links
CoLVR uses latent contrastive objectives with angle-based perturbation and RL trajectory rewards to increase exploratory visual reasoning in MLLMs, delivering 5-8% gains on VSP, Jigsaw, and MMStar benchmarks.
CCA Reimagined: An Exploratory Study of Large Language Models for Congestion Control cs.NI · 2026-04-04 · unverdicted · none · ref 22
LLM-based congestion control reduces latency by up to 50% with less than 0.3% throughput sacrifice versus traditional CCAs in static and dynamic network emulations.
RewardBench 2: Advancing Reward Model Evaluation cs.CL · 2025-06-02 · unverdicted · none · ref 72
RewardBench 2 is a new benchmark that supplies challenging fresh human prompts for reward model evaluation, yielding lower average scores but higher correlation with downstream best-of-N sampling and RLHF training performance.
OmniSelect: Dynamic Modality-Aware Token Compression for Efficient Omni-modal Large Language Models cs.CV · 2026-05-18 · unverdicted · none · ref 15
OmniSelect is a training-free, modality-adaptive token pruning framework that dynamically selects Audio-Centric, Video-Centric, or Uniform compression regimes using AudioCLIP cross-modal relevance scores and then applies adaptive fine-grained pruning within temporal groups.
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding cs.CV · 2025-01-22 · unverdicted · none · ref 1
VideoLLaMA3 uses a vision-centric training paradigm and token-reduction design to reach competitive results on image and video benchmarks.

Gpt-4o system card

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer