Junyao Yang(杨竣尧)
👋
About Me
🤠 Hi there, I'm Junyao Yang. I am a graduate student at the School of Computing, National University of Singapore (NUS), specializing in Artificial Intelligence. My research interests lie in Natural Language Processing, Explainable Artificial Intelligence, and Trustworthy Machine Learning.
🧐 My research centers on the underlying principles and understanding of Artificial Intelligence: improving the "Robustness" and "Safety" of LLM-generated information, and interpreting the internal mechanisms of models. This connects to related areas such as Trustworthy LLMs [ACL 2025 Main, EMNLP 2025 Main], Agents [Agentic Attribution, AgentDoG], Reasoning Model Merging [AAAI 2026, ReasonAny], and Malicious Attacks [ACL 2025 Main].
🔥
News
- 2026.02 ✍️ Blog post: The Entropy-Gradient Inversion. R1/o1-like reasoning models exhibit a significant negative correlation between gradient strength and token entropy, which emerges rapidly within the first 200 steps of SFT. This offers a new, interpretable perspective on a model's reasoning capability.
- 2026.01 🚀 Please check our latest tech report: AgentDoG! It introduces a state-of-the-art diagnostic guardrail framework built on a three-dimensional taxonomy, and features an Agentic XAI attribution module, which I contributed to, for diagnosing the internal drivers of risky actions.
- 2026.01 🚀 Please check our latest paper: Agentic Attribution! A hierarchical framework that uses temporal likelihood and perturbation-based analysis to unveil the internal factors driving LLM-based agent actions.
- 2026.01 🏄♂️ I will attend AAAI 2026 in Singapore during Jan 20-27, 2026. Let’s have fun!
- 2026.01 🚀 Please check our latest paper: ReasonAny! ReasonAny employs contrastive gradient identification to avoid destructive performance collapse, effectively merging reasoning capabilities into domain-specific models!
- 2025.11 🎉 First-Author paper RCP-Merging has been accepted to AAAI 2026 Main Track! See you in Singapore!
- 2025.08 🎉 RewardDS has been accepted to EMNLP 2025 Main!
- 2025.08 🥳 I joined Shanghai AI Lab as a Research Intern, advised by Dongrui Liu.
- 2025.08 🚀 Check out my latest work: RCP-Merging! This novel framework integrates long CoT capability into domain-specific LLMs without sacrificing their performance in the original domain!
- 2025.05 🎉 Successfully passed my undergraduate thesis defense!
- 2025.05 🎉 Co-First-Author paper PrivacyRestore has been accepted to ACL 2025 Main! Deeply grateful to my mentor Ziqian and collaborator Jianwei! See you in Vienna!
- 2025.02 🚀 Please check our newest papers: RewardDS and PrivacyRestore! Thanks to all my collaborators for their help.
- 2024.07 🥳 I joined ZeroNLP as a Research Assistant, advised by Prof. Ziqian Zeng.
- 2024.07 🥳 I spent a wonderful time at Tencent as a machine learning intern!
- 2024.07 🚀 Contextless CS is now available and has reached 20,000 DAU! Check my work here!
- 2024.04 🥳 I joined Tencent as a machine learning intern.
- 2024.03 🥳 I spent a wonderful time at the Shenzhen Stock Exchange as a machine learning intern!
📝
Publications & Preprints
arXiv Preprint ReasonAny: Incorporating Reasoning Capability to Any Model via Simple and Effective Model Merging
Junyao Yang, Chen Qian, Dongrui Liu†, Wen Shen, Yong Liu†, Jing Shao†
TL;DR: Merging robust chain-of-thought capabilities into domain-specific models (Safety, Biomedicine) using Contrastive Gradient Identification.
arXiv Preprint The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution
Chen Qian, Peng Wang, Dongrui Liu†, Junyao Yang, Dadi Guo, Ling Tang, Jilin Mei, Qihan Ren, Shuai Shao, Yong Liu, Jie Fu, Jing Shao, Xia Hu
TL;DR: A hierarchical framework for agentic attribution, using temporal likelihood and perturbation-based analysis to unveil internal factors driving LLM-based agent actions.
AAAI 2026 Main Track RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior
Junyao Yang, Jianwei Wang, Huiping Zhuang, Cen Chen, Ziqian Zeng*†
TL;DR: Enhancing domain performance while preserving chain-of-thought reasoning abilities by treating reasoning as a prior.
ACL 2025 Main PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration
Ziqian Zeng*†, Jianwei Wang*, Junyao Yang*, Zhengdong Lu, Haoran Li, Huiping Zhuang, Cen Chen
TL;DR: Protecting privacy via activation steering using a protected meta-vector without retraining.
EMNLP 2025 Main RewardDS: Privacy-Preserving Fine-Tuning for Large Language Models via Reward Driven Data Synthesis
Jianwei Wang, Chengming Shi, Junyao Yang, Haoran Li, Qianli Ma, Huiping Zhuang, Cen Chen, Ziqian Zeng†
TL;DR: Using client-side reward models to filter synthetic data, mitigating noise while protecting privacy.
📁
Tech Reports & Projects
Tech Report AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
Shanghai Artificial Intelligence Laboratory (Contributor)
TL;DR: A state-of-the-art diagnostic guardrail framework utilizing a unified three-dimensional taxonomy to provide fine-grained monitoring and root-cause analysis of AI agent safety risks.
✍️
Blogs
The Entropy-Gradient Inversion: A New Perspective on LLM Reasoning Capabilities
TL;DR: We discover that reasoning models exhibit a unique "fingerprint": a significant negative correlation between gradient strength and token entropy, in contrast to traditional base models. This pattern emerges rapidly within the first 200 steps of SFT.
🎓
Education

M.S. in AI
National University of Singapore
2025 - 2027 (Expected)
B.S. in CS (with honors)
South China University of Technology
2021 - 2025
High School
Shenzhen Experimental School
2018 - 2021
💻
Experience

Shanghai AI Lab
Research Intern | 2025.06 - Present
South China University of Technology
Research Intern | 2024.07 - 2025.06

Tencent
Machine Learning Intern | 2024.04 - 2024.07

SZSE
Machine Learning Intern | 2024.01 - 2024.04
🏆
Honors & Awards
- Excellent Graduation Thesis (2025.06)
- Outstanding Student Leader (2022-2024)
- Second-Class Scholarship of SCUT (2024.10)
- Second-Class Award in CUMCM, Guangdong Province (2022.09)





