WikiVQABench is a human-curated collection of Wikipedia-based VQA items that require both visual evidence and external knowledge from Wikidata to answer correctly.
Predicate Path expressions
25 Pith papers cite this work.
representative citing papers (all 25 from 2026)
BrepForge factorizes B-rep synthesis into face-aware autoregressive wireframe composition followed by boundary-conditioned surface instantiation using learning-free geometric priors.
R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
The authors introduce an explanation-annotated dataset of manipulative betting advertisements collected from Instagram and Reddit to support explainable detection models.
DisImpact introduces a two-stage MLLM framework to classify disaster-related social media posts into ten impact categories and compute a unified physi-social impact index validated against FEMA and NASA ground-truth data.
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
DeG models 3D Gaussians via learned octree density and uses VecSeq Sobol re-indexing to turn set generation into sequence modeling, claiming SOTA quality in single-image-to-3D.
UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.
A training-free technique manipulates low-frequency noise in diffusion models to control image color and structure using low-frequency priors.
TTCD uses a non-stationary feature learner and reconstruction-guided distillation inside a transformer to infer contemporaneous and lagged causal graphs from non-stationary time series without strong noise assumptions.
A relaxed Picard iteration plus heteroscedastic boundary denoising lets Monte Carlo PDE solvers solve heat equations with nonlinear radiation boundary conditions more accurately than linearization.
Lightweight networks combine bracketed smartphone exposures as convex combinations of raw pixels to produce artifact-free HDR images that generalize from synthetic training to real captures.
ScaffoldAgent improves long-form report generation by modeling outline evolution as expansion, contraction, and revision guided by a utility function estimating downstream value.
PULSE stabilizes mmWave human pose estimation by screening Doppler motion prompts before injecting them into spatial magnitude reasoning.
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
QREAM rewrites documents into a question-focused style using iterative ICL and distilled FT models, improving RAG performance by up to 8% (relative).
Hard maximum similarity pooling in late-interaction models induces higher patch-level gradient concentration and greater length sensitivity than top-k or softmax alternatives.
TRUST searches for minimal input changes that achieve a user-defined confidence target in PTM models, claiming perfect robustness and low cost on benchmarks versus standard boundary-crossing methods.
CS students and recent grads prioritize pay and workplace culture over ethics in job searches and justify conflicting decisions with shared explanations such as money or lack of alternatives.
Y-BotFrame is an extensible software framework that integrates multimodal perception and an LLM to map natural-language instructions to executable tasks on quadruped robots.
AI is shifting researchers from creators to curators of generated content, risking loss of intellectual ownership and genuine understanding of science.
The paper demonstrates the Unified Transform Method on the heat equation with Dirichlet conditions, obtaining a contour integral solution that matches data to machine precision in a Maple implementation.
Round-table discussions with researchers and practitioners indicate verification and validation skills will become central for software engineers in the agentic AI era.
citing papers explorer
- From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents
  A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
- Align Documents to Questions: Question-Oriented Document Rewriting for Retrieval-Augmented Generation
  QREAM rewrites documents into a question-focused style using iterative ICL and distilled FT models, improving RAG performance by up to 8% (relative).