Dawn Song

Identifiers

name variant · Dawn Song · 0.60 · backfill

Papers (89)

VIMPO: Value-Implicit Policy Optimization for LLMs cs.LG · 2026 · author #4
AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility cs.AI · 2026 · author #29
Representational Similarity and Model Behavior in Multi-Agent Interaction cs.CL · 2026 · author #8
CyberGym-E2E: Scalable Real-World Benchmark for AI Agents' End-to-End Cybersecurity Capabilities cs.CR · 2026 · author #16
Can Generalist Agents Automate Data Curation? cs.AI · 2026 · author #7
BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution cs.SE · 2026 · author #13
SCDBench: A Benchmark for LLM-Based Smart Contract Decompilers cs.SE · 2026 · author #2
Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening cs.CR · 2026 · author #7
MemFail: Stress-Testing Failure Modes of LLM Memory Systems cs.AI · 2026 · author #3
Securing LLM Agents Need Intent-to-Execution Integrity cs.CR · 2026 · author #6
Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack cs.AI · 2026 · author #6
ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks? cs.CR · 2026 · author #16
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI cs.LG · 2026 · author #27
DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents cs.AI · 2026 · author #16
The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break cs.AI · 2026 · author #9
Intent-aligned Formal Specification Synthesis via Traceable Refinement cs.LG · 2026 · author #8
SecPI: Secure Code Generation with Reasoning Models via Security Reasoning Internalization cs.CR · 2026 · author #5
Peer-Preservation in Frontier Models cs.CL · 2026 · author #5
The World Won't Stay Still: Programmable Evolution for Agent Benchmarks cs.AI · 2026 · author #14
Self-Sovereign Agent cs.CR · 2026 · author #4
Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents cs.CR · 2026 · author #6
MalTool: Malicious Tool Attacks on LLM Agents cs.CR · 2026 · author #4
Autonomous Continual Learning for Environment Adaptation of Computer-Use Agents cs.CL · 2026 · author #6
When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents cs.CL · 2026 · author #7
Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities cs.AI · 2026 · author #10
Making Bias Non-Predictive: Training Robust LLM Reasoning via Reinforcement Learning cs.CY · 2026 · author #6
InfoSynth: Information-Guided Benchmark Synthesis for LLMs cs.CL · 2026 · author #4
Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice cs.LG · 2025 · author #5
Measuring Agents in Production cs.CY · 2025 · author #21
Can Aha Moments Be Fake? Towards Quantifying Decorative and True Thinking in Chain-of-Thought cs.LG · 2025 · author #4
Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption cs.CR · 2025 · author #3
CTIConnect: A Benchmark for Retrieval-Augmented LLMs over Heterogeneous Cyber Threat Intelligence cs.CR · 2025 · author #4
RepIt: Steering Language Models with Concept-Specific Refusal Vectors cs.AI · 2025 · author #5
Learning to Reason without External Rewards cs.LG · 2025 · author #5
In-Context Watermarks for Large Language Models cs.CL · 2025 · author #4
Progent: Securing AI Agents with Privilege Control cs.CR · 2025 · author #7
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective cs.CY · 2025 · author #65
Humanity's Last Exam cs.LG · 2025 · author #906
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning cs.LG · 2024 · author #11
Representation Engineering: A Top-Down Approach to AI Transparency cs.LG · 2023 · author #18
The False Promise of Imitating Proprietary LLMs cs.CL · 2023 · author #8
Measuring Coding Challenge Competence With APPS cs.SE · 2021 · author #10
Measuring Mathematical Problem Solving With the MATH Dataset cs.LG · 2021 · author #7
Measuring Massive Multitask Language Understanding cs.CY · 2020 · author #6
Aligning AI With Shared Human Values cs.CY · 2020 · author #6
How You Act Tells a Lot: Privacy-Leakage Attack on Deep Reinforcement Learning cs.LG · 2019 · author #6
Sanctorum: A lightweight security monitor for secure enclaves cs.CR · 2018 · author #7
Data Poisoning Attack against Unsupervised Node Embedding Methods cs.LG · 2018 · author #7
Assessing Generalization in Deep Reinforcement Learning cs.LG · 2018 · author #6
Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation cs.CR · 2018 · author #6
Characterizing Audio Adversarial Examples Using Temporal Dependency cs.LG · 2018 · author #4
Physical Adversarial Examples for Object Detectors cs.CR · 2018 · author #9
Efficient Deep Learning on Multi-Source Private Data cs.LG · 2018 · author #3
GamePad: A Learning Environment for Theorem Proving cs.LG · 2018 · author #3
Curriculum Adversarial Training cs.LG · 2018 · author #4
A Machine Learning Approach To Prevent Malicious Calls Over Telephony Networks cs.CR · 2018 · author #9
The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks cs.LG · 2018 · author #5
Tree-to-tree Neural Networks for Program Translation cs.AI · 2018 · author #3
Adversarial Texts with Gradient Methods cs.CL · 2018 · author #4
Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality cs.LG · 2018 · author #7
Spatially Transformed Adversarial Examples cs.CR · 2018 · author #6
Generating Adversarial Examples with Adversarial Networks cs.CR · 2018 · author #6
Exploring the Space of Black-box Attacks on Deep Neural Networks cs.LG · 2017 · author #4
Note on Attacking Object Detectors with Adversarial Stickers cs.CR · 2017 · author #5
A Berkeley View of Systems Challenges for AI cs.AI · 2017 · author #2
The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions cs.LG · 2017 · author #2
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning cs.CR · 2017 · author #5
SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning cs.CL · 2017 · author #3
Fooling Vision and Language Models Despite Localization and Attention Mechanism cs.AI · 2017 · author #6
Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection cs.CR · 2017 · author #6
Robust Physical-World Attacks on Deep Learning Models cs.CR · 2017 · author #9
Towards Practical Differential Privacy for SQL Queries cs.CR · 2017 · author #3
Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong cs.LG · 2017 · author #5
Towards Synthesizing Complex Programs from Input-Output Examples cs.LG · 2017 · author #3
Delving into adversarial attacks on deep policies stat.ML · 2017 · author #2
Making Neural Programming Architectures Generalize via Recursion cs.LG · 2017 · author #3
PIANO: Proximity-based User Authentication on Voice-Powered Internet-of-Things Devices cs.CR · 2017 · author #6
Adversarial examples for generative models stat.ML · 2017 · author #3
Delving into Transferable Adversarial Examples and Black-box Attacks cs.LG · 2016 · author #4
Latent Attention For If-Then Program Synthesis cs.CL · 2016 · author #4
Subliminal Probing for Private Information via EEG-Based BCI Devices cs.CR · 2013 · author #8
Mining Permission Request Patterns from Android and Facebook Applications (extended author version) cs.CR · 2012 · author #4
Evolution of Social-Attribute Networks: Measurements, Modeling, and Implications using Google+ cs.SI · 2012 · author #7
Preserving Link Privacy in Social Network Based Systems cs.CR · 2012 · author #3
Touchalytics: On the Applicability of Touchscreen Input as a Behavioral Biometric for Continuous Authentication cs.CR · 2012 · author #5
Jointly Predicting Links and Inferring Attributes using a Social-Attribute Network (SAN) cs.SI · 2011 · author #8
How Open Should Open Source Be? cs.CR · 2011 · author #4
Towards Practical Oblivious RAM cs.CR · 2011 · author #3
A Learning-Based Approach to Reactive Security cs.CR · 2009 · author #5

Mentions

2606.20008 #4 · arxiv_oai · confidence 0.70 Dawn Song
2606.13608 #29 · arxiv_oai · confidence 0.70 Dawn Song
2602.13379 #6 · arxiv_oai · confidence 0.70 Dawn Song
2602.08235 #7 · arxiv_oai · confidence 0.70 Dawn Song
2606.07818 #8 · arxiv_oai · confidence 0.70 Dawn Song
2512.04123 #21 · arxiv_oai · confidence 0.70 Dawn Song
2510.11974 #4 · arxiv_oai · confidence 0.70 Dawn Song
2606.04460 #16 · arxiv_oai · confidence 0.70 Dawn Song
2606.04261 #7 · arxiv_oai · confidence 0.70 Dawn Song
2606.01286 #13 · arxiv_oai · confidence 0.70 Dawn Song
1312.6052 #8 · backfill · confidence 0.70 Dawn Song
2605.29059 #2 · arxiv_oai · confidence 0.70 Dawn Song
2605.28999 #7 · arxiv_oai · confidence 0.70 Dawn Song
2605.08678 #27 · arxiv_oai · confidence 0.70 Dawn Song
2510.24941 #4 · arxiv_oai · confidence 0.70 Dawn Song
2605.26667 #3 · arxiv_oai · confidence 0.70 Dawn Song
2601.00575 #4 · arxiv_oai · confidence 0.70 Dawn Song
1210.2429 #4 · backfill · confidence 0.70 Dawn Song
1209.0835 #7 · backfill · confidence 0.70 Dawn Song
1208.6189 #3 · backfill · confidence 0.70 Dawn Song
1207.6231 #5 · backfill · confidence 0.70 Dawn Song
1112.3265 #8 · backfill · confidence 0.70 Dawn Song
2008.02275 #6 · arxiv_oai · confidence 0.70 Dawn Song
1109.0507 #4 · backfill · confidence 0.70 Dawn Song
2406.09187 #11 · arxiv_oai · confidence 0.70 Dawn Song
1106.3652 #3 · backfill · confidence 0.70 Dawn Song
2603.05910 #14 · arxiv_oai · confidence 0.70 Dawn Song
2605.16976 #6 · arxiv_oai · confidence 0.70 Dawn Song
2502.14296 #65 · arxiv_oai · confidence 0.70 Dawn Song
2305.15717 #8 · arxiv_oai · confidence 0.70 Dawn Song
0912.1155 #5 · backfill · confidence 0.70 Dawn Song
2505.19590 #5 · arxiv_oai · confidence 0.70 Dawn Song

Frequent Coauthors

Bo Li 17 shared papers
Chang Liu 11 shared papers
Xuandong Zhao 9 shared papers
Chaowei Xiao 7 shared papers
Tianneng Shi 7 shared papers
Xinyun Chen 7 shared papers
Dan Hendrycks 6 shared papers
Chenguang Wang 5 shared papers
Neil Zhenqiang Gong 5 shared papers
Steven Basart 5 shared papers
Warren He 5 shared papers
Wenbo Guo 5 shared papers
Andy Zou 4 shared papers
Collin Burns 4 shared papers
Jacob Steinhardt 4 shared papers
Jernej Kos 4 shared papers
Jingxuan He 4 shared papers
Mantas Mazeika 4 shared papers
Prateek Mittal 4 shared papers
Xiaojun Xu 4 shared papers