Dawn Song

Identifiers

  • name variant Dawn Song 0.60 · backfill

Papers (89)

  1. VIMPO: Value-Implicit Policy Optimization for LLMs cs.LG · 2026 · author #4
  2. AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility cs.AI · 2026 · author #29
  3. Representational Similarity and Model Behavior in Multi-Agent Interaction cs.CL · 2026 · author #8
  4. CyberGym-E2E: Scalable Real-World Benchmark for AI Agents' End-to-End Cybersecurity Capabilities cs.CR · 2026 · author #16
  5. Can Generalist Agents Automate Data Curation? cs.AI · 2026 · author #7
  6. BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution cs.SE · 2026 · author #13
  7. SCDBench: A Benchmark for LLM-Based Smart Contract Decompilers cs.SE · 2026 · author #2
  8. Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening cs.CR · 2026 · author #7
  9. MemFail: Stress-Testing Failure Modes of LLM Memory Systems cs.AI · 2026 · author #3
  10. Securing LLM Agents Need Intent-to-Execution Integrity cs.CR · 2026 · author #6
  11. Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack cs.AI · 2026 · author #6
  12. ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks? cs.CR · 2026 · author #16
  13. MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI cs.LG · 2026 · author #27
  14. DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents cs.AI · 2026 · author #16
  15. The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break cs.AI · 2026 · author #9
  16. Intent-aligned Formal Specification Synthesis via Traceable Refinement cs.LG · 2026 · author #8
  17. SecPI: Secure Code Generation with Reasoning Models via Security Reasoning Internalization cs.CR · 2026 · author #5
  18. Peer-Preservation in Frontier Models cs.CL · 2026 · author #5
  19. The World Won't Stay Still: Programmable Evolution for Agent Benchmarks cs.AI · 2026 · author #14
  20. Self-Sovereign Agent cs.CR · 2026 · author #4
  21. Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents cs.CR · 2026 · author #6
  22. MalTool: Malicious Tool Attacks on LLM Agents cs.CR · 2026 · author #4
  23. Autonomous Continual Learning for Environment Adaptation of Computer-Use Agents cs.CL · 2026 · author #6
  24. When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents cs.CL · 2026 · author #7
  25. Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities cs.AI · 2026 · author #10
  26. Making Bias Non-Predictive: Training Robust LLM Reasoning via Reinforcement Learning cs.CY · 2026 · author #6
  27. InfoSynth: Information-Guided Benchmark Synthesis for LLMs cs.CL · 2026 · author #4
  28. Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice cs.LG · 2025 · author #5
  29. Measuring Agents in Production cs.CY · 2025 · author #21
  30. Can Aha Moments Be Fake? Towards Quantifying Decorative and True Thinking in Chain-of-Thought cs.LG · 2025 · author #4
  31. Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption cs.CR · 2025 · author #3
  32. CTIConnect: A Benchmark for Retrieval-Augmented LLMs over Heterogeneous Cyber Threat Intelligence cs.CR · 2025 · author #4
  33. RepIt: Steering Language Models with Concept-Specific Refusal Vectors cs.AI · 2025 · author #5
  34. Learning to Reason without External Rewards cs.LG · 2025 · author #5
  35. In-Context Watermarks for Large Language Models cs.CL · 2025 · author #4
  36. Progent: Securing AI Agents with Privilege Control cs.CR · 2025 · author #7
  37. On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective cs.CY · 2025 · author #65
  38. Humanity's Last Exam cs.LG · 2025 · author #906
  39. GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning cs.LG · 2024 · author #11
  40. Representation Engineering: A Top-Down Approach to AI Transparency cs.LG · 2023 · author #18
  41. The False Promise of Imitating Proprietary LLMs cs.CL · 2023 · author #8
  42. Measuring Coding Challenge Competence With APPS cs.SE · 2021 · author #10
  43. Measuring Mathematical Problem Solving With the MATH Dataset cs.LG · 2021 · author #7
  44. Measuring Massive Multitask Language Understanding cs.CY · 2020 · author #6
  45. Aligning AI With Shared Human Values cs.CY · 2020 · author #6
  46. How You Act Tells a Lot: Privacy-Leakage Attack on Deep Reinforcement Learning cs.LG · 2019 · author #6
  47. Sanctorum: A lightweight security monitor for secure enclaves cs.CR · 2018 · author #7
  48. Data Poisoning Attack against Unsupervised Node Embedding Methods cs.LG · 2018 · author #7
  49. Assessing Generalization in Deep Reinforcement Learning cs.LG · 2018 · author #6
  50. Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation cs.CR · 2018 · author #6
  51. Characterizing Audio Adversarial Examples Using Temporal Dependency cs.LG · 2018 · author #4
  52. Physical Adversarial Examples for Object Detectors cs.CR · 2018 · author #9
  53. Efficient Deep Learning on Multi-Source Private Data cs.LG · 2018 · author #3
  54. GamePad: A Learning Environment for Theorem Proving cs.LG · 2018 · author #3
  55. Curriculum Adversarial Training cs.LG · 2018 · author #4
  56. A Machine Learning Approach To Prevent Malicious Calls Over Telephony Networks cs.CR · 2018 · author #9
  57. The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks cs.LG · 2018 · author #5
  58. Tree-to-tree Neural Networks for Program Translation cs.AI · 2018 · author #3
  59. Adversarial Texts with Gradient Methods cs.CL · 2018 · author #4
  60. Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality cs.LG · 2018 · author #7
  61. Spatially Transformed Adversarial Examples cs.CR · 2018 · author #6
  62. Generating Adversarial Examples with Adversarial Networks cs.CR · 2018 · author #6
  63. Exploring the Space of Black-box Attacks on Deep Neural Networks cs.LG · 2017 · author #4
  64. Note on Attacking Object Detectors with Adversarial Stickers cs.CR · 2017 · author #5
  65. A Berkeley View of Systems Challenges for AI cs.AI · 2017 · author #2
  66. The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions cs.LG · 2017 · author #2
  67. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning cs.CR · 2017 · author #5
  68. SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning cs.CL · 2017 · author #3
  69. Fooling Vision and Language Models Despite Localization and Attention Mechanism cs.AI · 2017 · author #6
  70. Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection cs.CR · 2017 · author #6
  71. Robust Physical-World Attacks on Deep Learning Models cs.CR · 2017 · author #9
  72. Towards Practical Differential Privacy for SQL Queries cs.CR · 2017 · author #3
  73. Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong cs.LG · 2017 · author #5
  74. Towards Synthesizing Complex Programs from Input-Output Examples cs.LG · 2017 · author #3
  75. Delving into adversarial attacks on deep policies stat.ML · 2017 · author #2
  76. Making Neural Programming Architectures Generalize via Recursion cs.LG · 2017 · author #3
  77. PIANO: Proximity-based User Authentication on Voice-Powered Internet-of-Things Devices cs.CR · 2017 · author #6
  78. Adversarial examples for generative models stat.ML · 2017 · author #3
  79. Delving into Transferable Adversarial Examples and Black-box Attacks cs.LG · 2016 · author #4
  80. Latent Attention For If-Then Program Synthesis cs.CL · 2016 · author #4
  81. Subliminal Probing for Private Information via EEG-Based BCI Devices cs.CR · 2013 · author #8
  82. Mining Permission Request Patterns from Android and Facebook Applications (extended author version) cs.CR · 2012 · author #4
  83. Evolution of Social-Attribute Networks: Measurements, Modeling, and Implications using Google+ cs.SI · 2012 · author #7
  84. Preserving Link Privacy in Social Network Based Systems cs.CR · 2012 · author #3
  85. Touchalytics: On the Applicability of Touchscreen Input as a Behavioral Biometric for Continuous Authentication cs.CR · 2012 · author #5
  86. Jointly Predicting Links and Inferring Attributes using a Social-Attribute Network (SAN) cs.SI · 2011 · author #8
  87. How Open Should Open Source Be? cs.CR · 2011 · author #4
  88. Towards Practical Oblivious RAM cs.CR · 2011 · author #3
  89. A Learning-Based Approach to Reactive Security cs.CR · 2009 · author #5

Mentions

  • 2606.20008 #4 · arxiv_oai · confidence 0.70 Dawn Song
  • 2606.13608 #29 · arxiv_oai · confidence 0.70 Dawn Song
  • 2602.13379 #6 · arxiv_oai · confidence 0.70 Dawn Song
  • 2602.08235 #7 · arxiv_oai · confidence 0.70 Dawn Song
  • 2606.07818 #8 · arxiv_oai · confidence 0.70 Dawn Song
  • 2512.04123 #21 · arxiv_oai · confidence 0.70 Dawn Song
  • 2510.11974 #4 · arxiv_oai · confidence 0.70 Dawn Song
  • 2606.04460 #16 · arxiv_oai · confidence 0.70 Dawn Song
  • 2606.04261 #7 · arxiv_oai · confidence 0.70 Dawn Song
  • 2606.01286 #13 · arxiv_oai · confidence 0.70 Dawn Song
  • 1312.6052 #8 · backfill · confidence 0.70 Dawn Song
  • 2605.29059 #2 · arxiv_oai · confidence 0.70 Dawn Song
  • 2605.28999 #7 · arxiv_oai · confidence 0.70 Dawn Song
  • 2605.08678 #27 · arxiv_oai · confidence 0.70 Dawn Song
  • 2510.24941 #4 · arxiv_oai · confidence 0.70 Dawn Song
  • 2605.26667 #3 · arxiv_oai · confidence 0.70 Dawn Song
  • 2601.00575 #4 · arxiv_oai · confidence 0.70 Dawn Song
  • 1210.2429 #4 · backfill · confidence 0.70 Dawn Song
  • 1209.0835 #7 · backfill · confidence 0.70 Dawn Song
  • 1208.6189 #3 · backfill · confidence 0.70 Dawn Song
  • 1207.6231 #5 · backfill · confidence 0.70 Dawn Song
  • 1112.3265 #8 · backfill · confidence 0.70 Dawn Song
  • 2008.02275 #6 · arxiv_oai · confidence 0.70 Dawn Song
  • 1109.0507 #4 · backfill · confidence 0.70 Dawn Song
  • 2406.09187 #11 · arxiv_oai · confidence 0.70 Dawn Song
  • 1106.3652 #3 · backfill · confidence 0.70 Dawn Song
  • 2603.05910 #14 · arxiv_oai · confidence 0.70 Dawn Song
  • 2605.16976 #6 · arxiv_oai · confidence 0.70 Dawn Song
  • 2502.14296 #65 · arxiv_oai · confidence 0.70 Dawn Song
  • 2305.15717 #8 · arxiv_oai · confidence 0.70 Dawn Song
  • 0912.1155 #5 · backfill · confidence 0.70 Dawn Song
  • 2505.19590 #5 · arxiv_oai · confidence 0.70 Dawn Song

Frequent Coauthors