Junyao Yang(杨竣尧)
About Me
Hi there, this is Junyao Yang. I am a graduate student at the School of Computing, National University of Singapore (NUS), specializing in Artificial Intelligence. My research interests lie in Natural Language Processing, Explainable Artificial Intelligence, and Trustworthy Machine Learning.
My research story revolves around the underlying principles and understanding of Artificial Intelligence. In particular, I focus on enhancing the "Robustness" and "Safety" of LLM-generated information and on interpreting model mechanisms. This connects to related areas such as Trustworthy LLMs [ACL 2025 Main, EMNLP 2025 Main] and Agents [Agentic Attribution, AgentDoG], Reasoning Model Merging [AAAI 2026, ReasonAny], and Malicious Attacks [ACL 2025 Main].
News
- 2026.02 Blog post: The Entropy-Gradient Inversion. R1/o1-like reasoning models exhibit significant negative correlations between gradient strength and token entropy, emerging rapidly within the first 200 steps of SFT.
- 2026.01 Tech report: AgentDoG! State-of-the-art diagnostic guardrail framework with an Agentic XAI attribution module.
- 2026.01 Paper: Agentic Attribution! A hierarchical framework to unveil internal factors driving LLM-based agent actions.
- 2026.01 Attending AAAI 2026 in Singapore, Jan 20-27, 2026.
- 2026.01 Paper: ReasonAny! Contrastive gradient identification to resolve destructive performance collapse in model merging.
- 2025.11 First-Author paper RCP-Merging accepted to AAAI 2026 Main Track.
- 2025.08 RewardDS accepted to EMNLP 2025 Main.
- 2025.08 Joined Shanghai AI Lab as a Research Intern, advised by Dongrui Liu.
- 2025.08 New work: RCP-Merging! Integrating long CoT capability into domain-specific LLMs.
- 2025.05 Passed undergraduate thesis defense.
- 2025.05 Co-First-Author paper PrivacyRestore accepted to ACL 2025 Main.
- 2025.02 New papers: RewardDS and PrivacyRestore.
- 2024.07 Joined ZeroNLP as a Research Assistant, advised by Prof. Ziqian Zeng.
- 2024.07 Completed machine learning internship at Tencent.
- 2024.07 Contextless CS reached 20,000 DAU.
- 2024.04 Joined Tencent as a machine learning intern.
- 2024.03 Completed machine learning internship at the Shenzhen Stock Exchange (SZSE).
Publications & Preprints
arXiv Preprint ReasonAny: Incorporating Reasoning Capability to Any Model via Simple and Effective Model Merging
Junyao Yang, Chen Qian, Dongrui Liu†, Wen Shen, Yong Liu†, Jing Shao†
TL;DR: Merging robust chain-of-thought capabilities into domain-specific models (Safety, Biomedicine) using Contrastive Gradient Identification.
arXiv Preprint The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution
Chen Qian, Peng Wang, Dongrui Liu†, Junyao Yang, Dadi Guo, Ling Tang, Jilin Mei, Qihan Ren, Shuai Shao, Yong Liu, Jie Fu, Jing Shao, Xia Hu
TL;DR: A hierarchical framework for agentic attribution, using temporal likelihood and perturbation-based analysis to unveil internal factors driving LLM-based agent actions.
AAAI 2026 Main Track RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior
Junyao Yang, Jianwei Wang, Huiping Zhuang, Cen Chen, Ziqian Zeng*†
TL;DR: Enhancing domain performance while preserving chain-of-thought reasoning abilities by treating reasoning as a prior.
ACL 2025 Main PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration
Ziqian Zeng*†, Jianwei Wang*, Junyao Yang*, Zhengdong Lu, Haoran Li, Huiping Zhuang, Cen Chen
TL;DR: Protecting privacy via activation steering using a protected meta-vector without retraining.
EMNLP 2025 Main RewardDS: Privacy-Preserving Fine-Tuning for Large Language Models via Reward Driven Data Synthesis
Jianwei Wang, Chengming Shi, Junyao Yang, Haoran Li, Qianli Ma, Huiping Zhuang, Cen Chen, Ziqian Zeng†
TL;DR: Using client-side reward models to filter synthetic data, mitigating noise while protecting privacy.
Tech Reports & Projects
Tech Report AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
Shanghai Artificial Intelligence Laboratory (Contributor)
TL;DR: A state-of-the-art diagnostic guardrail framework utilizing a unified three-dimensional taxonomy to provide fine-grained monitoring and root-cause analysis of AI agent safety risks.
Blogs
The Entropy-Gradient Inversion: A New Perspective on LLM Reasoning Capabilities
TL;DR: We discover that reasoning models exhibit a unique "fingerprint": a significant negative correlation between gradient strength and token entropy, the opposite of what traditional base models exhibit. This capability emerges rapidly within the first 200 steps of SFT.
Education

M.S. in AI
National University of Singapore
2025 - 2027 (Expected)
B.S. in CS (with honors)
South China University of Technology
2021 - 2025
High School
Shenzhen Experimental School
2018 - 2021
Experience

Shanghai AI Lab
Research Intern | 2025.06 - Present
South China University of Technology
Research Intern | 2024.07 - 2025.06

Tencent
Machine Learning Intern | 2024.04 - 2024.07

SZSE
Machine Learning Intern | 2024.01 - 2024.04
Honor & Awards
- Excellent Graduation Thesis (2025.06)
- Outstanding Student Leader (2022-2024)
- Second-Class Scholarship of SCUT (2024.10)
- Second-Class Award in CUMCM, Guangdong Province (2022.09)
- Second Prize, Mathematics Olympiad (2020.05)
- Second Prize, Physics Olympiad (2020.02)