WildBox provides over 237k 3D wildlife annotations from drone video and benchmarks reveal zero-shot 3D detection at 0 AP but fine-tuned performance of 8.68 AP-BEV and 13.17 AP3D, with depth estimation causing most errors.
super hub Canonical reference
Emogen: Emotional image content generation with text-to-image diffusion models
Canonical reference. 91% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
co-cited works
representative citing papers
ScaLe-INR is a multi-branch INR architecture that applies directional scaling per the Fourier inverse theorem and a directional edge guidance loss to disentangle scales and improve reconstruction fidelity.
MATCH is the first flow matching method for multi-view anomaly detection, reporting SOTA results on Real-IAD and the first comprehensive evaluation on MANTA-Tiny while enabling real-time use by omitting the divergence term.
GeoFidelity-Bench shows text-to-image models gain city-level plausibility from local names but achieve near-zero improvement in exact segment identity, with GPS coordinates adding no benefit.
Arbor attaches constraint mesh tokens to a frozen text-to-3D denoiser to enable controllable generation obeying hull, avoidance, and touch constraints.
Target dynamics provide an intrinsic source of variation equivalent to controlled illumination changes, enabling scattering-compensated reconstruction of dynamic scenes with one acquisition per frame in holographic and fluorescence imaging.
The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.
FLM-Occ reformulates indoor occupancy prediction as feed-forward likelihood maximization over a mixture model with volume-normalized weights, achieving superior accuracy on Occ-ScanNet using only 32 superquadrics.
HERO maps DNA methylation and miRNA to a 16-dimensional intent vector for TF-IDF caption retrieval and cosine-gated repair in VLM-based multi-task breast cancer prediction, claiming SOTA on TCGA-BRCA.
StylisticBias benchmark shows 15 visual attributes explain nearly 80% of bias variation in six MLLMs by isolating single cues like age and fashion in generated images.
CloudLULC-Net is an end-to-end heterogeneous SAR-optical fusion network for LULC mapping under cloud contamination that achieves 86.60% OA, 83.29% F1, and 73.51% mIoU on a new global benchmark of 40,223 samples.
A two-stage generative model (Graph CVAE + flow matching) learns topology-agnostic motion codes from a new 5k-topology dataset and retargets video motion to arbitrary unseen skeletons.
FisherAdapTune uses temporal drift in Fisher geometry, measured by scale-invariant Jensen-Shannon distance, to progressively freeze stabilized parameter groups during fine-tuning, reporting gains on segmentation and zero-shot transfer.
An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.
Attributed Feature Graphs (AFGs) represent CAD features as attributed nodes and relations as directed edges to enable GNN surrogate models that predict design performance with feature-level interpretability on the CarHoods10K dataset.
Empirical study of five LVR variants finds cosine alignment negatively correlates with accuracy (r=-0.94), supervised latents are bypassed under corruption (max 4-point shift), and answers are decodable downstream but not at the latent.
OTP-FM extends conditional flow matching by incorporating dynamic optimal transport potentials to enable efficient multimarginal transport learning with intermediate observed marginals.
TIDES simulates realistic event camera streams in continuous time via dynamic Gaussian splatting with adaptive occlusion handling and sensor artifact modeling, claiming SOTA fidelity and better downstream transfer than prior methods.
MERIT enables decentralized instruction tuning via conflict-aware PCA splitting and parameter-space merging, raising average benchmark scores above joint training on multimodal and text mixtures.
SuperMemory-VQA provides 4,853 human-verified QA pairs from 52.9 hours of egocentric AI glasses recordings to benchmark AI systems on realistic long-horizon memory tasks including an unanswerable option.
Parameterized Diffusion Policy learns a behavior manifold to condition diffusion policies on low-dimensional continuous parameters, enabling interpolation between strategies and adaptation to novel constraints without policy weight updates.
DirectorBench is a profile-aware diagnostic benchmark that localizes bottlenecks in long-form video generation workflows using structured checkpoints and multi-agent evaluation.
RS2AD-LiDAR reconstructs vehicle LiDAR data from roadside observations via coordinate transformation, virtual LiDAR modeling and resampling, claimed as the first such method, with experiments showing improved object detection when mixed with real data.
AgroVG is a new multi-source benchmark for agricultural visual grounding formulated as generalized set prediction, with protocols for box and mask grounding across single-target, multi-target, and target-absent queries from six object families.
citing papers explorer
-
ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue
ESARBench is the first unified benchmark for MLLM-driven UAV agents that must explore, locate clues, and decide on victim positions in photorealistic simulated SAR environments.