Jian Luan

Identifiers

  • name variant Jian Luan 0.60 · backfill

Papers (27)

  1. Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text cs.SD · 2026 · author #9
  2. Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation cs.AI · 2026 · author #7
  3. PixelWizard: Towards Efficient High-Fidelity Video Generation at Ultra-Large Spatial Resolution cs.CV · 2026 · author #6
  4. SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking cs.AI · 2026 · author #5
  5. PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media cs.CV · 2026 · author #9
  6. Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment cs.LG · 2026 · author #8
  7. How Mobile World Model Guides GUI Agents? cs.AI · 2026 · author #11
  8. StreamPro: From Reactive Perception to Proactive Decision-Making in Streaming Video cs.CV · 2026 · author #9
  9. Reducing Linguistic Hallucination in LM-Based Speech Enhancement via Noise-Invariant Acoustic-Semantic Distillation eess.AS · 2026 · author #8
  10. Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding eess.AS · 2026 · author #8
  11. TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis cs.CL · 2026 · author #10
  12. ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling cs.MM · 2026 · author #13
  13. Doc-V*: Coarse-to-Fine Interactive Visual Reasoning for Multi-Page Document VQA cs.CL · 2026 · author #10
  14. Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models cs.CV · 2026 · author #10
  15. Borderless Long Speech Synthesis cs.SD · 2026 · author #15
  16. From Ideal to Real: Stable Video Object Removal under Imperfect Conditions cs.CV · 2026 · author #7
  17. Visual Para-Thinker: Divide-and-Conquer Reasoning for Visual Comprehension cs.CV · 2026 · author #8
  18. Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation cs.CV · 2026 · author #9
  19. Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models cs.CL · 2026 · author #9
  20. REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding cs.CV · 2025 · author #10
  21. Revisiting Entropy in Reinforcement Learning for Large Reasoning Models cs.CL · 2025 · author #8
  22. Enhancing Trustworthy GUI Grounding via Self-Critiqued Reinforcement Learning cs.CV · 2025 · author #11
  23. MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks eess.AS · 2025 · author #10
  24. Mobile GUI Agents under Real-world Threats: Are We There Yet? cs.CR · 2025 · author #7
  25. End-to-End Optimization of LLM-Driven Multi-Agent Search Systems via Heterogeneous-Group-Based Reinforcement Learning cs.LG · 2025 · author #5
  26. Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding cs.CV · 2025 · author #16
  27. Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security cs.HC · 2024 · author #14

Mentions

  • 2605.27838 #9 · arxiv_oai · confidence 0.70 Jian Luan
  • 2510.27266 #11 · arxiv_oai · confidence 0.70 Jian Luan
  • 2605.27134 #7 · arxiv_oai · confidence 0.70 Jian Luan
  • 2605.25801 #6 · arxiv_oai · confidence 0.70 Jian Luan
  • 2605.25160 #5 · arxiv_oai · confidence 0.70 Jian Luan
  • 2605.10347 #11 · arxiv_oai · confidence 0.70 Jian Luan
  • 2605.16381 #9 · arxiv_oai · confidence 0.70 Jian Luan
  • 2605.14311 #8 · arxiv_oai · confidence 0.70 Jian Luan
  • 2503.13377 #16 · arxiv_oai · confidence 0.70 Jian Luan
  • 2401.05459 #14 · arxiv_oai · confidence 0.70 Jian Luan

Frequent Coauthors