Identifiers
-
name variant
Dawn Song
0.60 · backfill
Papers (89)
-
VIMPO: Value-Implicit Policy Optimization for LLMs
cs.LG · 2026 · author #4
-
AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility
cs.AI · 2026 · author #29
-
Representational Similarity and Model Behavior in Multi-Agent Interaction
cs.CL · 2026 · author #8
-
CyberGym-E2E: Scalable Real-World Benchmark for AI Agents' End-to-End Cybersecurity Capabilities
cs.CR · 2026 · author #16
-
Can Generalist Agents Automate Data Curation?
cs.AI · 2026 · author #7
-
BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution
cs.SE · 2026 · author #13
-
SCDBench: A Benchmark for LLM-Based Smart Contract Decompilers
cs.SE · 2026 · author #2
-
Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening
cs.CR · 2026 · author #7
-
MemFail: Stress-Testing Failure Modes of LLM Memory Systems
cs.AI · 2026 · author #3
-
Securing LLM Agents Need Intent-to-Execution Integrity
cs.CR · 2026 · author #6
-
Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack
cs.AI · 2026 · author #6
-
ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?
cs.CR · 2026 · author #16
-
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI
cs.LG · 2026 · author #27
-
DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents
cs.AI · 2026 · author #16
-
The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break
cs.AI · 2026 · author #9
-
Intent-aligned Formal Specification Synthesis via Traceable Refinement
cs.LG · 2026 · author #8
-
SecPI: Secure Code Generation with Reasoning Models via Security Reasoning Internalization
cs.CR · 2026 · author #5
-
Peer-Preservation in Frontier Models
cs.CL · 2026 · author #5
-
The World Won't Stay Still: Programmable Evolution for Agent Benchmarks
cs.AI · 2026 · author #14
-
Self-Sovereign Agent
cs.CR · 2026 · author #4
-
Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents
cs.CR · 2026 · author #6
-
MalTool: Malicious Tool Attacks on LLM Agents
cs.CR · 2026 · author #4
-
Autonomous Continual Learning for Environment Adaptation of Computer-Use Agents
cs.CL · 2026 · author #6
-
When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents
cs.CL · 2026 · author #7
-
Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities
cs.AI · 2026 · author #10
-
Making Bias Non-Predictive: Training Robust LLM Reasoning via Reinforcement Learning
cs.CY · 2026 · author #6
-
InfoSynth: Information-Guided Benchmark Synthesis for LLMs
cs.CL · 2026 · author #4
-
Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice
cs.LG · 2025 · author #5
-
Measuring Agents in Production
cs.CY · 2025 · author #21
-
Can Aha Moments Be Fake? Towards Quantifying Decorative and True Thinking in Chain-of-Thought
cs.LG · 2025 · author #4
-
Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption
cs.CR · 2025 · author #3
-
CTIConnect: A Benchmark for Retrieval-Augmented LLMs over Heterogeneous Cyber Threat Intelligence
cs.CR · 2025 · author #4
-
RepIt: Steering Language Models with Concept-Specific Refusal Vectors
cs.AI · 2025 · author #5
-
Learning to Reason without External Rewards
cs.LG · 2025 · author #5
-
In-Context Watermarks for Large Language Models
cs.CL · 2025 · author #4
-
Progent: Securing AI Agents with Privilege Control
cs.CR · 2025 · author #7
-
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
cs.CY · 2025 · author #65
-
Humanity's Last Exam
cs.LG · 2025 · author #906
-
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
cs.LG · 2024 · author #11
-
Representation Engineering: A Top-Down Approach to AI Transparency
cs.LG · 2023 · author #18
-
The False Promise of Imitating Proprietary LLMs
cs.CL · 2023 · author #8
-
Measuring Coding Challenge Competence With APPS
cs.SE · 2021 · author #10
-
Measuring Mathematical Problem Solving With the MATH Dataset
cs.LG · 2021 · author #7
-
Measuring Massive Multitask Language Understanding
cs.CY · 2020 · author #6
-
Aligning AI With Shared Human Values
cs.CY · 2020 · author #6
-
How You Act Tells a Lot: Privacy-Leakage Attack on Deep Reinforcement Learning
cs.LG · 2019 · author #6
-
Sanctorum: A lightweight security monitor for secure enclaves
cs.CR · 2018 · author #7
-
Data Poisoning Attack against Unsupervised Node Embedding Methods
cs.LG · 2018 · author #7
-
Assessing Generalization in Deep Reinforcement Learning
cs.LG · 2018 · author #6
-
Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation
cs.CR · 2018 · author #6
-
Characterizing Audio Adversarial Examples Using Temporal Dependency
cs.LG · 2018 · author #4
-
Physical Adversarial Examples for Object Detectors
cs.CR · 2018 · author #9
-
Efficient Deep Learning on Multi-Source Private Data
cs.LG · 2018 · author #3
-
GamePad: A Learning Environment for Theorem Proving
cs.LG · 2018 · author #3
-
Curriculum Adversarial Training
cs.LG · 2018 · author #4
-
A Machine Learning Approach To Prevent Malicious Calls Over Telephony Networks
cs.CR · 2018 · author #9
-
The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
cs.LG · 2018 · author #5
-
Tree-to-tree Neural Networks for Program Translation
cs.AI · 2018 · author #3
-
Adversarial Texts with Gradient Methods
cs.CL · 2018 · author #4
-
Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality
cs.LG · 2018 · author #7
-
Spatially Transformed Adversarial Examples
cs.CR · 2018 · author #6
-
Generating Adversarial Examples with Adversarial Networks
cs.CR · 2018 · author #6
-
Exploring the Space of Black-box Attacks on Deep Neural Networks
cs.LG · 2017 · author #4
-
Note on Attacking Object Detectors with Adversarial Stickers
cs.CR · 2017 · author #5
-
A Berkeley View of Systems Challenges for AI
cs.AI · 2017 · author #2
-
The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions
cs.LG · 2017 · author #2
-
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
cs.CR · 2017 · author #5
-
SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning
cs.CL · 2017 · author #3
-
Fooling Vision and Language Models Despite Localization and Attention Mechanism
cs.AI · 2017 · author #6
-
Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection
cs.CR · 2017 · author #6
-
Robust Physical-World Attacks on Deep Learning Models
cs.CR · 2017 · author #9
-
Towards Practical Differential Privacy for SQL Queries
cs.CR · 2017 · author #3
-
Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong
cs.LG · 2017 · author #5
-
Towards Synthesizing Complex Programs from Input-Output Examples
cs.LG · 2017 · author #3
-
Delving into adversarial attacks on deep policies
stat.ML · 2017 · author #2
-
Making Neural Programming Architectures Generalize via Recursion
cs.LG · 2017 · author #3
-
PIANO: Proximity-based User Authentication on Voice-Powered Internet-of-Things Devices
cs.CR · 2017 · author #6
-
Adversarial examples for generative models
stat.ML · 2017 · author #3
-
Delving into Transferable Adversarial Examples and Black-box Attacks
cs.LG · 2016 · author #4
-
Latent Attention For If-Then Program Synthesis
cs.CL · 2016 · author #4
-
Subliminal Probing for Private Information via EEG-Based BCI Devices
cs.CR · 2013 · author #8
-
Mining Permission Request Patterns from Android and Facebook Applications (extended author version)
cs.CR · 2012 · author #4
-
Evolution of Social-Attribute Networks: Measurements, Modeling, and Implications using Google+
cs.SI · 2012 · author #7
-
Preserving Link Privacy in Social Network Based Systems
cs.CR · 2012 · author #3
-
Touchalytics: On the Applicability of Touchscreen Input as a Behavioral Biometric for Continuous Authentication
cs.CR · 2012 · author #5
-
Jointly Predicting Links and Inferring Attributes using a Social-Attribute Network (SAN)
cs.SI · 2011 · author #8
-
How Open Should Open Source Be?
cs.CR · 2011 · author #4
-
Towards Practical Oblivious RAM
cs.CR · 2011 · author #3
-
A Learning-Based Approach to Reactive Security
cs.CR · 2009 · author #5
Mentions
-
2606.20008
#4 · arxiv_oai · confidence 0.70
Dawn Song
-
2606.13608
#29 · arxiv_oai · confidence 0.70
Dawn Song
-
2602.13379
#6 · arxiv_oai · confidence 0.70
Dawn Song
-
2602.08235
#7 · arxiv_oai · confidence 0.70
Dawn Song
-
2606.07818
#8 · arxiv_oai · confidence 0.70
Dawn Song
-
2512.04123
#21 · arxiv_oai · confidence 0.70
Dawn Song
-
2510.11974
#4 · arxiv_oai · confidence 0.70
Dawn Song
-
2606.04460
#16 · arxiv_oai · confidence 0.70
Dawn Song
-
2606.04261
#7 · arxiv_oai · confidence 0.70
Dawn Song
-
2606.01286
#13 · arxiv_oai · confidence 0.70
Dawn Song
-
1312.6052
#8 · backfill · confidence 0.70
Dawn Song
-
2605.29059
#2 · arxiv_oai · confidence 0.70
Dawn Song
-
2605.28999
#7 · arxiv_oai · confidence 0.70
Dawn Song
-
2605.08678
#27 · arxiv_oai · confidence 0.70
Dawn Song
-
2510.24941
#4 · arxiv_oai · confidence 0.70
Dawn Song
-
2605.26667
#3 · arxiv_oai · confidence 0.70
Dawn Song
-
2601.00575
#4 · arxiv_oai · confidence 0.70
Dawn Song
-
1210.2429
#4 · backfill · confidence 0.70
Dawn Song
-
1209.0835
#7 · backfill · confidence 0.70
Dawn Song
-
1208.6189
#3 · backfill · confidence 0.70
Dawn Song
-
1207.6231
#5 · backfill · confidence 0.70
Dawn Song
-
1112.3265
#8 · backfill · confidence 0.70
Dawn Song
-
2008.02275
#6 · arxiv_oai · confidence 0.70
Dawn Song
-
1109.0507
#4 · backfill · confidence 0.70
Dawn Song
-
2406.09187
#11 · arxiv_oai · confidence 0.70
Dawn Song
-
1106.3652
#3 · backfill · confidence 0.70
Dawn Song
-
2603.05910
#14 · arxiv_oai · confidence 0.70
Dawn Song
-
2605.16976
#6 · arxiv_oai · confidence 0.70
Dawn Song
-
2502.14296
#65 · arxiv_oai · confidence 0.70
Dawn Song
-
2305.15717
#8 · arxiv_oai · confidence 0.70
Dawn Song
-
0912.1155
#5 · backfill · confidence 0.70
Dawn Song
-
2505.19590
#5 · arxiv_oai · confidence 0.70
Dawn Song