OFA: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework

Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang · 2022

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

EMMA: End-to-End Multimodal Model for Autonomous Driving

cs.CV · 2024-10-30 · unverdicted · novelty 6.0

EMMA is an end-to-end multimodal LLM that converts camera data into trajectories, objects, and road graphs via text prompts and reports state-of-the-art motion planning on nuScenes plus competitive detection results on Waymo.

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

cs.CV · 2022-12-06 · unverdicted · novelty 5.0

InternVideo combines masked video modeling and video-language contrastive learning into a single foundation model that reaches state-of-the-art results on 39 video datasets including 91.1% top-1 on Kinetics-400.

citing papers explorer

Showing 2 of 2 citing papers.

EMMA: End-to-End Multimodal Model for Autonomous Driving · cs.CV · 2024-10-30 · unverdicted · none · ref 82
InternVideo: General Video Foundation Models via Generative and Discriminative Learning · cs.CV · 2022-12-06 · unverdicted · none · ref 18
