Dahua Lin

Identifiers

  • name variant Dahua Lin 0.60 · backfill

Papers (83)

  1. ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning cs.AI · 2026 · author #8
  2. AMix-2: Establishing Protein as a Native Modality in Large Language Models q-bio.BM · 2026 · author #20
  3. SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation cs.CV · 2026 · author #7
  4. From Pixels to Words -- Towards Native One-Vision Models at Scale cs.CV · 2026 · author #20
  5. ETCHR: Editing To Clarify and Harness Reasoning cs.CV · 2026 · author #6
  6. NanoCP: Request-Level Dynamic Context Parallelism for Data-Expert Parallel Decoding cs.DC · 2026 · author #12
  7. Beyond Mode Collapse: Distribution Matching for Diverse Reasoning cs.AI · 2026 · author #10
  8. What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents cs.AI · 2026 · author #8
  9. SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture cs.CV · 2026 · author #58
  10. WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation cs.CL · 2026 · author #15
  11. ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism cs.DC · 2026 · author #7
  12. OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis cs.AI · 2026 · author #14
  13. Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs cs.AI · 2026 · author #12
  14. MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale cs.CV · 2026 · author #42
  15. Demystifying Video Reasoning cs.CV · 2026 · author #12
  16. Visual-ERM: Reward Modeling for Visual Equivalence cs.CV · 2026 · author #9
  17. Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction cs.RO · 2026 · author #6
  18. EAG-PT: Emission-Aware Gaussians and Path Tracing for Diffuse Indoor Scene Reconstruction and Editing cs.GR · 2026 · author #7
  19. MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing cs.CV · 2025 · author #59
  20. InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency cs.CV · 2025 · author #68
  21. InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling cs.CL · 2025 · author #15
  22. MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence cs.CV · 2025 · author #11
  23. InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models cs.CV · 2025 · author #48
  24. Visual-RFT: Visual Reinforcement Fine-Tuning cs.CV · 2025 · author #7
  25. Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation cs.RO · 2024 · author #5
  26. Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling cs.CV · 2024 · author #39
  27. PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction cs.CV · 2024 · author #11
  28. MinerU: An Open-Source Solution for Precise Document Content Extraction cs.CV · 2024 · author #17
  29. InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output cs.CV · 2024 · author #26
  30. How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites cs.CV · 2024 · author #32
  31. Are We on the Right Way for Evaluating Large Vision-Language Models? cs.CV · 2024 · author #10
  32. InternLM2 Technical Report cs.CL · 2024 · author #100
  33. RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition cs.CV · 2024 · author #8
  34. InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model cs.CV · 2024 · author #22
  35. ShareGPT4V: Improving Large Multi-Modal Models with Better Captions cs.CV · 2023 · author #8
  36. InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition cs.CV · 2023 · author #20
  37. MMBench: Is Your Multi-modal Model an All-around Player? cs.CV · 2023 · author #12
  38. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning cs.CV · 2023 · author #8
  39. MMDetection: Open MMLab Detection Toolbox and Benchmark cs.CV · 2019 · author #25
  40. POPQORN: Quantifying Robustness of Recurrent Neural Networks cs.LG · 2019 · author #6
  41. Learning to Cluster Faces on an Affinity Graph cs.CV · 2019 · author #6
  42. Libra R-CNN: Towards Balanced Learning for Object Detection cs.CV · 2019 · author #6
  43. Self-Supervised Learning via Conditional Motion Propagation cs.CV · 2019 · author #4
  44. WIDER Face and Pedestrian Challenge 2018: Methods and Results cs.CV · 2019 · author #2
  45. Hybrid Task Cascade for Instance Segmentation cs.CV · 2019 · author #12
  46. Region Proposal by Guided Anchoring cs.CV · 2019 · author #5
  47. Monocular 3D Pose Recovery via Nonconvex Sparsity with Theoretical Analysis cs.CV · 2018 · author #2
  48. A Neural Compositional Paradigm for Image Captioning cs.CV · 2018 · author #3
  49. Improving On-policy Learning with Statistical Reward Accumulation cs.LG · 2018 · author #3
  50. Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition cs.CV · 2018 · author #4
  51. Penalizing Top Performers: Conservative Loss for Semantic Segmentation Adaptation cs.CV · 2018 · author #5
  52. Generative Adversarial Frontal View to Bird View Synthesis cs.CV · 2018 · author #5
  53. Pose Guided Human Video Generation cs.CV · 2018 · author #6
  54. Person Search in Videos with One Portrait Through Visual and Temporal Links cs.CV · 2018 · author #3
  55. Move Forward and Tell: A Progressive Generator of Video Descriptions cs.CV · 2018 · author #3
  56. Rethinking the Form of Latent States in Image Captioning cs.CV · 2018 · author #3
  57. Probabilistic Ensemble of Collaborative Filters cs.IR · 2018 · author #2
  58. From Trailers to Storylines: An Efficient Way to Learn from Movies cs.CV · 2018 · author #5
  59. Unifying Identification and Context Learning for Person Recognition cs.CV · 2018 · author #3
  60. Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination cs.CV · 2018 · author #4
  61. Optimizing Video Object Detection via a Scale-Time Lattice cs.CV · 2018 · author #7
  62. Low-Latency Video Semantic Segmentation cs.CV · 2018 · author #3
  63. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition cs.CV · 2018 · author #3
  64. Accelerated Training for Massive Classification via Dynamic Class Selection cs.CV · 2018 · author #4
  65. Peephole: Predicting Network Performance Before Training cs.LG · 2017 · author #3
  66. Learning Sparse Visual Representations with Leaky Capped Norm Regularizers cs.LG · 2017 · author #2
  67. Be Your Own Prada: Fashion Synthesis with Structural Coherence cs.CV · 2017 · author #4
  68. Contrastive Learning for Image Captioning cs.CV · 2017 · author #2
  69. Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data stat.ML · 2017 · author #2
  70. Integrating Specialized Classifiers Based on Continuous Time Markov Chain cs.LG · 2017 · author #2
  71. Discover and Learn New Objects from Documentaries cs.CV · 2017 · author #4
  72. Temporal Segment Networks for Action Recognition in Videos cs.CV · 2017 · author #5
  73. Temporal Action Detection with Structured Segment Networks cs.CV · 2017 · author #6
  74. Detecting Visual Relationships with Deep Relational Networks cs.CV · 2017 · author #3
  75. Towards Diverse and Natural Image Descriptions via a Conditional GAN cs.CV · 2017 · author #4
  76. UntrimmedNets for Weakly Supervised Action Recognition and Detection cs.CV · 2017 · author #3
  77. A Pursuit of Temporal Accuracy in General Activity Detection cs.CV · 2017 · author #4
  78. PolyNet: A Pursuit of Structural Diversity in Very Deep Networks cs.CV · 2016 · author #4
  79. Deep Markov Random Field for Image Modeling cs.CV · 2016 · author #2
  80. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition cs.CV · 2016 · author #5
  81. CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016 cs.CV · 2016 · author #7
  82. Adjustable Bounded Rectifiers: Towards Deep Binary Representations cs.LG · 2015 · author #2
  83. Generating Multi-Sentence Lingual Descriptions of Indoor Scenes cs.CV · 2015 · author #1

Mentions

  • 1511.06201 #2 · backfill · confidence 0.70 Dahua Lin
  • 1711.02857 #2 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2606.03503 #8 · arxiv_oai · confidence 0.70 Dahua Lin
  • 1503.00064 #1 · backfill · confidence 0.70 Dahua Lin
  • 2605.30963 #20 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2605.30116 #7 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2605.28820 #20 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2603.16870 #12 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2505.23764 #11 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2605.23897 #6 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2412.15109 #5 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2605.21100 #12 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2508.08636 #15 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2605.19461 #10 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2605.19447 #8 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2403.13805 #8 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2309.15112 #20 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2509.22186 #59 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2407.03320 #26 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2401.16420 #22 · arxiv_oai · confidence 0.70 Dahua Lin
  • 2409.18839 #17 · arxiv_oai · confidence 0.70 Dahua Lin

Frequent Coauthors