Professor, Doctoral Supervisor. Current position: Director, Education Center of Doctor of Engineering

Ying Li is currently a professor in the School of Software and Microelectronics at Peking University. She is also a research professor at the National Research Center of Software Engineering and manages the Education Center of Doctor of Engineering (electronics and information area). Before joining PKU in 2012, she worked as a Senior Technical Staff Member (STSM) and Senior Manager, leading the Department of Distributed Computing and Service Management at IBM Research – China Lab.

During her time at IBM, Dr. Li conducted several global collaborative research projects whose technologies were successfully delivered and incorporated into IBM software products and global service solutions. This work had significant technical impact and business value, winning the "IBM CIO Leadership Award" (2007) and the "IBM Global Research Accomplishment Award" twice (2008 and 2010). She served as the Executive Assistant to the IBM China Chairman in 2009. As a member of the Software Product Architecture Board, she participated in several industrial projects and made solid contributions to commercial software systems. She also worked as a Strategist for IBM global research strategy and contributed to the IBM Global Technical Outlook (GTO) from 2008 to 2011. She served as the Chair of the IBM Research Lab Patent Review Board for several years and was named an "IBM Master Inventor" in 2011.

Since 2012, Prof. Li’s research interests have centered on AIOps, cloud-native systems, and reliability engineering for AI systems. She and her research team have undertaken related research projects funded by the National Key Research and Development Program, the National Natural Science Foundation of China, and the Key-Area R&D Program of Guangdong Province. They have also established fruitful industry-academia collaborations with leading high-tech enterprises, including Alibaba, Huawei, ZTE, Tencent, Delta, ByteDance, and Ant Group, to address challenging technical problems in industrial scenarios.

Prof. Li holds over 40 granted CN and US patents and has published more than 100 academic papers in top-tier international journals and conferences. She is a Senior Member of the IEEE and has served as a Program Committee (PC) member for leading conferences such as ICSE and AAAI, and as a reviewer for prestigious journals and conferences such as TOSEM, TSE, TSC, ACL, and IJCAI.

Research Interests

• LLM/Agent-driven AIOps: including anomaly detection, root cause localization, and end-to-end auto-remediation for large-scale complex systems such as microservices, databases, and cloud infrastructures, by leveraging large language models, agentic reasoning, and reinforcement fine-tuning.

• Reliability Engineering for AI Systems: including reliability evaluation, failure diagnosis, risk control, and self-healing for LLM-based and agentic AI systems, by leveraging runtime observability, execution tracing, compliance testing, hallucination detection, and harness engineering.

• Observability and Diagnosis for Multi-Agent Systems: including runtime state modeling, reasoning/action trace analysis, inter-agent interaction tracing, and root cause localization for complex multi-agent workflows, by leveraging unified runtime representations, causal analysis, and intelligent root cause localization.

• Controllable and Resilient Execution for Agentic AI Systems: including policy-constrained execution, dynamic intervention, human-in-the-loop control, rollback/retry, and auto-recovery for autonomous AI agents, by leveraging risk-aware orchestration, safety guardrails, and adaptive execution mechanisms.

Academic Papers

[1] L. Zhang, Y. Zhai, T. Jia, C. Duan, S. Yu, J. Gao, B. Ding, Z. Wu, and Y. Li*, “ThinkFL: Self-Refining Failure Localization for Microservice Systems via Reinforcement Fine-Tuning,” ACM Transactions on Software Engineering and Methodology, 2026. (CCF A)

[2] X. Huang, H. Liu, L. Zhang, T. Jia, Y. Li, and Z. Wu*, “UDA-RCL: Unsupervised Domain Adaptation for Microservice Root Cause Localization Utilizing Multimodal Data,” IEEE Transactions on Services Computing, 2026. (CCF A)

[3] H. Liu, X. Huang, M. Jia, T. Jia, Z. Wu, and Y. Li*, “NER-AD: Noise-Robust Reconstruction Enhanced by Representation-Learning for Metric Anomaly Detection in Online Service Systems,” IEEE Transactions on Services Computing, 2026. (CCF A)

[4] X. Zhao, T. Jia, M. He, and Y. Li*, “Generality Is Not Enough: Zero-Label Cross-System Log-Based Anomaly Detection via Knowledge-Level Collaboration,” in Proceedings of the IEEE/ACM International Conference on Software Engineering (ICSE), 2026. (CCF A)

[5] L. Zhang, T. Jia, Y. Zhai, L. Pan, C. Duan, M. He, P. Xiao, and Y. Li*, “Hypothesize-Then-Verify: Speculative Root Cause Analysis for Microservices with Pathwise Parallelism,” in Proceedings of the IEEE/ACM International Conference on Software Engineering (ICSE), 2026. (CCF A)

[6] Z. Li, N. Wang, J. Liu, Y. Zhang, F. Tong, Z. Chen, C. Li, M. Liu, X. Zhang, Y. Wu, T. Jia, and Y. Li*, “MagmaScope: Identifying Root-Cause Changes for Emergency Incident in Large-Scale Cloud Infrastructure,” in Proceedings of the IEEE/ACM International Conference on Software Engineering (ICSE), 2026. (CCF A)

[7] L. Zhang, T. Jia, Y. Zhai, L. Pan, C. Duan, M. He, M. Jia, and Y. Li*, “Agentic Memory Enhanced Recursive Reasoning for Root Cause Localization in Microservices,” in Proceedings of the IEEE/ACM International Conference on Software Engineering (ICSE), 2026. (CCF A)

[8] J. Sun, T. Jia, M. He, and Y. Li*, “VarParser: Unleashing the Neglected Power of Variables for LLM-Based Log Parsing,” in Proceedings of the ACM Web Conference (WWW), 2026. (CCF A)

[9] L. Zhang, Y. Zhai, T. Jia, M. He, C. Duan, Z. Liu, B. Ding, and Y. Li*, “E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning,” in Proceedings of the ACM International Conference on the Foundations of Software Engineering (FSE), 2026. (CCF A)

[10] C. Duan, T. Jia, M. He, P. Xiao, L. Zhang, Z. Zhong, X. Zhang, and Y. Li*, “AIMS: A Content-Aware Resource Management Approach for AI Assistant Systems,” in Proceedings of the ACM International Conference on the Foundations of Software Engineering (FSE), 2026. (CCF A)

[11] L. Zhang, T. Jia, W. Hong, M. Wang, C. Duan, M. He, R. Wang, X. Peng, M. Wang, N. Zhang, R. Che, and Y. Li*, “RuntimeSlicer: Towards Generalizable Unified Runtime State Representation for Failure Management,” in Proceedings of the ACM International Conference on the Foundations of Software Engineering (FSE), 2026. (CCF A)

[12] L. Zhang, T. Jia, M. Wang, W. Hong, C. Duan, M. He, R. Wang, X. Peng, M. Wang, N. Zhang, R. Chen, and Y. Li*, “Efficient Failure Management for Multi-Agent Systems with Reasoning Trace Representation,” in Proceedings of the ACM International Conference on the Foundations of Software Engineering (FSE), 2026. (CCF A)

[13] L. Zhang, T. Jia, M. Jia, Y. Wu, A. Liu, Y. Yang, Z. Wu, X. Hu, P. S. Yu, Y. Li*, et al., “A Survey of AIOps in the Era of Large Language Models,” ACM Computing Surveys, vol. 58, no. 2, Art. no. 44, pp. 1–35, 2025. (CCF A)

[14] L. Zhang, T. Jia, X. Huang, M. Jia, H. Liu, Z. Wu, and Y. Li*, “E-Log: Fine-Grained Elastic Log-Based Anomaly Detection and Diagnosis for Databases,” IEEE Transactions on Services Computing, vol. 18, no. 5, Sep./Oct. 2025. (CCF A)

[15] L. Zhang, T. Jia, M. Jia, H. Liu, Y. Yang, Z. Wu, and Y. Li*, “Towards Close-to-Zero Runtime Collection Overhead: Raft-Based Anomaly Diagnosis on System Faults for Distributed Storage System,” IEEE Transactions on Services Computing, vol. 18, no. 2, Mar./Apr. 2025. (CCF A)