MediaPipe: A Framework for Building Perception Pipelines

Mixed citation behavior. Most common role is background (60%).

36 Pith papers citing it
abstract

Building applications that perceive the world around them is challenging. A developer needs to (a) select and develop corresponding machine learning algorithms and models, (b) build a series of prototypes and demos, (c) balance resource consumption against the quality of the solutions, and finally (d) identify and mitigate problematic cases. The MediaPipe framework addresses all of these challenges. A developer can use MediaPipe to build prototypes by combining existing perception components, to advance them to polished cross-platform applications and measure system performance and resource consumption on target platforms. We show that these features enable a developer to focus on the algorithm or model development and use MediaPipe as an environment for iteratively improving their application with results reproducible across different devices and platforms. MediaPipe will be open-sourced at https://github.com/google/mediapipe.

citation-role summary

background: 3 · method: 2

representative citing papers

D-Rex : Diffusion Rendering for Relightable Expressive Avatars

cs.GR · 2026-04-30 · conditional · novelty 7.0

D-Rex applies a LoRA-fine-tuned video diffusion model as an image-space post-process to add consistent relighting to any expressive full-body avatar pipeline while preserving motion and facial detail.

Face Anything: 4D Face Reconstruction from Any Image Sequence

cs.CV · 2026-04-21 · unverdicted · novelty 7.0

A single transformer model jointly predicts depth and normalized canonical coordinates to deliver state-of-the-art 4D facial geometry and tracking with 3x lower correspondence error and 16% better depth accuracy.

AvatarPointillist: AutoRegressive 4D Gaussian Avatarization

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

AvatarPointillist autoregressively generates adaptive 3D point clouds via Transformer for photorealistic 4D Gaussian avatars from one image, jointly predicting animation bindings and using a conditioned Gaussian decoder.

The DeepSpeak Dataset

cs.CV · 2024-08-09 · unverdicted · novelty 7.0

DeepSpeak provides over 100 hours of consented, identity-matched real and modern deepfake audiovisual content focused on talking heads, with evaluations showing existing detectors fail to generalize without retraining.

CHOIR: Contact-aware 4D Hand-Object Interaction Reconstruction

cs.CV · 2026-05-20 · unverdicted · novelty 6.0 · 2 refs

CHOIR reconstructs articulated hand motion, object shape with 6D pose over time, and contact locations from monocular videos via contact-aware spatial rectification and joint optimization.

PaintCopilot: Modeling Painting as Autonomous Artistic Continuation

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

PaintCopilot models painting as an open-ended autoregressive process that predicts coherent brushstrokes from partial canvas observations using a ViT target predictor, flow-matching stroke generator, and VAE region sampler.
