Wang Xiao (王逍)

Posted by: School of Information    Published: 2026-06-03

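Wang Xiao, master's supervisor, received a bachelor's degree from Beijing Institute of Technology in 2015 and a doctoral degree from Beihang University in 2021. Research interests include large models, embodied intelligence, flight vehicle dynamics and control, and autonomous mission planning for spacecraft clusters. To date, Wang has published 18 academic papers in these areas, including, as first author, 10 SCI-indexed papers in journals such as IEEE Transactions on Aerospace and Electronic Systems, Neurocomputing, and Advances in Space Research, and 5 EI-indexed and core-journal papers. Wang has applied for 8 invention patents, 5 of which have been granted. Wang has served as principal investigator on provincial- and ministerial-level government-funded projects and on other government- and industry-funded projects, including "Software Development for Constellation Computing-Power Aggregation and Mission Planning", "Research on Spacecraft Dynamic Game Confrontation Technology Based on Multi-Agent Reinforcement Learning", "Development of an Autonomous Mission Planning Module for Cislunar Space", and "Research on Spatiotemporal Causal Reasoning and Decision-Making Methods for Dynamic, Uncertain Space Targets".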

Courses Taught

Principles of Automatic Control	Undergraduate
Intelligent Control Theory and Applications	Graduate

Academic Papers

[1] Wang X, Li D, Bioinspired Actor-Critic Algorithm for Reinforcement Learning Interpretation with Levy-Brown Hybrid Exploration Strategy, Neurocomputing, 2024, 574(Mar. 14): 1.1-1.16.

[2] Wang, X., Ma, Z., Cao, L. et al. A planar tracking strategy based on multiple-interpretable improved PPO algorithm with few-shot technique. Sci Rep, 2024, 14: 3910. https://doi.org/10.1038/s41598-024-54268-6.

[3] Xiao Wang, Zhuo Yang, Yuying Han, Hao Li, Peng Shi, Method of sequential intention inference for a space target based on meta-fuzzy decision tree, Advances in Space Research, 2024, ISSN 0273-1177, https://doi.org/10.1016/j.asr.2024.06.049.

[4] Xiao Wang, Jiake Li, Lu Cao, Dechao Ran, Mingjiang Ji, Kewu Sun, Yuying Han, Zhe Ma, A data-knowledge joint-driven reinforcement learning algorithm based on guided policy and state-prediction for satellite continuous-thrust tracking, Advances in Space Research, 2024, ISSN 0273-1177, https://doi.org/10.1016/j.asr.2024.06.070.

[5] Wang X., Shi P., Wen C. and Zhao Y. Design of Parameter-Self-Tuning Controller Based on Reinforcement Learning for Tracking Noncooperative Targets in Space[J]. IEEE Transactions on Aerospace and Electronic Systems, 56(6): 4192-4208, Dec. 2020, doi: 10.1109/TAES.2020.2988170.

[6] Wang X, Shi P, Zhao Y, et al. A Pre-Trained Fuzzy Reinforcement Learning Method for the Pursuing Satellite in a One-to-One Game in Space[J]. Sensors, 2020, 20(8): 2253.

[7] Wang X, Shi P, Wen C, et al. An Algorithm of Reinforcement Learning for Maneuvering Parameter Self-Tuning Applying in Satellite Cluster[J]. Mathematical Problems in Engineering, 2020, 2020(5): 1-17.

[8] Wang X, Shi P, Schwartz H, et al. An algorithm of pretrained fuzzy actor–critic learning applying in fixed-time space differential game[J]. Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering, 2021, 235(14): 2095-2112.

[9] Wang X, Ma Z, Mao L, Sun K, Huang X, Fan C, Li J. Accelerating Fuzzy Actor–Critic Learning via Suboptimal Knowledge for a Multi-Agent Tracking Problem. Electronics, 2023, 12(8): 1852.

[10] Wang X, Yang Z, Bai X, Ji M, Li H, Ran D. A Consistent Round-Up Strategy Based on PPO Path Optimization for the Leader–Follower Tracking Problem. Sensors, 2023, 23: 8814.

[11] Li D, Zhu F, Wang X, Jin Q. Multi-objective reinforcement learning for fed-batch fermentation process control[J]. Journal of Process Control, 2022, 115(11): 89-99.

[12] Song L, Li D, Wang X, Xu X. AdaBoost Maximum Entropy Deep Inverse Reinforcement Learning with Truncated Gradient[J]. Information Sciences, 2022, 602(2).

[13] Wang, X., Han, Y., Tang, M., Zhang, F. (2025). Robust Orbital Game Policy in Multiple Disturbed Environments: An Approach Based on Causality Diversity Maximal Marginal Relevance Algorithm. In: Liu, L., Niu, Y., Fu, W., Qu, Y. (eds) Proceedings of 4th 2024 International Conference on Autonomous Unmanned Systems (4th ICAUS 2024). ICAUS 2024. Lecture Notes in Electrical Engineering, vol 1374. Springer, Singapore.

[14] Xiao Wang, Lishuo Wang, Chen Zhang, Hao Zhang, Dazi Li, Robust orbital game policy in multiple disturbed environments: an approach based on causality diversity maximal marginal relevance algorithm, C2-CHINA2025 (accepted).

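[15] Wang Xiao, Wen Changxuan, Zhao Yushan, Shi Peng. A method for assessing collision possibility within the safety corridor of a tumbling target[J]. Journal of Harbin Institute of Technology, 2018, 50(04): 94-101+187. (in Chinese)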

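[16] Wang Xiao, Shi Peng, Wen Changxuan, Zhao Yushan. Adaptive coupled position and attitude control for close-range berthing with a noncooperative target[C]. Technical Committee on Control Theory, Chinese Association of Automation. Proceedings of the 36th Chinese Control Conference (C). Technical Committee on Control Theory, Chinese Association of Automation, 2017: 129-134. (in Chinese)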

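[17] Wang Xiao, Shi Peng, Wen Changxuan, Zhao Yushan. Adaptive position and attitude tracking control for feature points of a failed satellite[J]. Chinese Space Science and Technology, 2018, 38(01): 8-17. (in Chinese)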
