May 20, 2025

US12305967 - Method for designing terminal guidance law based on deep reinforcement learning

The present disclosure describes a method for designing a terminal guidance law based on deep reinforcement learning, and relates to the field of missile and rocket guidance. The method includes: establishing a relative kinematics equation between a missile and a target in the longitudinal plane of the target-interception terminal guidance section of the missile; to fit the reinforcement learning paradigm, abstracting the research problem and modeling it as a Markov decision process; building an algorithm network and setting algorithm parameters, where the selected deep reinforcement learning algorithm is a deep Q-network (DQN); and, in the terminal guidance process of each round, obtaining a sufficient number of training samples through Q-learning, training the neural network and updating a target network each at a fixed frequency, and repeating the above process until the set number of learning rounds is reached.
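The "relative kinematics equation in a longitudinal plane" referred to above is, in standard planar pursuit-engagement form, a pair of differential equations for the missile-target range and line-of-sight angle. The function below is an illustrative sketch of those textbook equations; the patent does not reproduce its exact equations in this abstract, so the variable names and formulation here are assumptions.

```python
import math

def relative_kinematics(r, q, theta_m, theta_t, v_m, v_t):
    """Textbook planar missile-target engagement kinematics (illustrative).

    r        -- relative range between missile and target
    q        -- line-of-sight (LOS) angle
    theta_m  -- missile flight-path angle
    theta_t  -- target flight-path angle
    v_m, v_t -- missile and target speeds

    Returns (r_dot, q_dot): range rate and LOS angular rate.
    """
    r_dot = v_t * math.cos(q - theta_t) - v_m * math.cos(q - theta_m)
    q_dot = (v_t * math.sin(q - theta_t) - v_m * math.sin(q - theta_m)) / r
    return r_dot, q_dot
```

In a head-on geometry (target flying toward the missile along the line of sight) the range rate is simply the negative sum of the two speeds and the LOS rate is zero, which is a quick sanity check on the signs.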

The patent describes a method for designing a terminal guidance law for missiles using deep reinforcement learning, which involves establishing kinematic equations and modeling the problem as a Markov decision process. The process includes training a neural network with Q-learning to optimize the missile’s guidance until it successfully intercepts the target.
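Modeling the engagement as a Markov decision process means choosing a state, a discrete action set, and a reward. The class below is a minimal sketch of one such formulation: the state, the candidate acceleration commands, the initial geometry, the hit radius, and the reward shaping are all assumptions made for illustration, not values published in the patent.

```python
import math

class TerminalGuidanceEnv:
    """Illustrative MDP for the terminal guidance problem.

    State:  (range, LOS angle, missile flight-path angle)  -- assumed
    Action: index into a small set of normal accelerations  -- assumed
    Reward: bonus on hit, otherwise penalize LOS rotation   -- assumed
    """
    ACTIONS = [-30.0, 0.0, 30.0]  # candidate normal accelerations, m/s^2 (assumed)
    DT = 0.1                      # integration step, s (assumed)

    def __init__(self):
        self.reset()

    def reset(self):
        # Head-on closing geometry at 5 km (assumed for the sketch)
        self.r, self.q = 5000.0, 0.2
        self.theta_m, self.v_m = 0.0, 600.0
        self.theta_t, self.v_t = math.pi, 250.0
        return self._state()

    def _state(self):
        return (self.r, self.q, self.theta_m)

    def step(self, action_idx):
        a = self.ACTIONS[action_idx]
        self.theta_m += a / self.v_m * self.DT  # turn rate = accel / speed
        r_dot = (self.v_t * math.cos(self.q - self.theta_t)
                 - self.v_m * math.cos(self.q - self.theta_m))
        q_dot = (self.v_t * math.sin(self.q - self.theta_t)
                 - self.v_m * math.sin(self.q - self.theta_m)) / self.r
        self.r += r_dot * self.DT
        self.q += q_dot * self.DT
        done = self.r <= 50.0                    # "hit" radius (assumed)
        reward = 100.0 if done else -abs(q_dot)  # keep the LOS from rotating
        return self._state(), reward, done
```

Penalizing LOS rotation is a common reward choice for interception because a non-rotating line of sight corresponds to a collision course; the patent's actual reward function may differ.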

Claim 1

  1. A method for designing a terminal guidance law based on deep reinforcement learning, comprising the following steps: establishing a relative kinematics equation between a missile and a target in a longitudinal plane of a target interception terminal guidance section of the missile; abstracting a solving problem of the kinematics equation and modeling as a Markov decision process; building an algorithm network, setting algorithm parameters, and training the algorithm network based on a randomly initialized data set to determine weight parameters of an initial network; continuously caching, by an agent, state transition data and reward values as learning samples in an experience pool based on a Q-Learning algorithm, and continuously selecting a fixed number of samples from the experience pool to train the network until set learning rounds are reached; and generating, during a specific guidance process, an action in real time based on a current state by using a learned network to transfer to a next state, and continuously repeating the process until the target is hit to complete the guidance process. 
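The training steps in the claim (cache state transitions and rewards in an experience pool via Q-learning, repeatedly sample a fixed number of transitions to train the network, sync a target network at a fixed frequency, stop after the set learning rounds) can be sketched as a minimal DQN-style loop. To stay self-contained the sketch uses a linear Q-function in place of the patent's deep network, and the environment is passed in as callables; every hyperparameter default is an assumption.

```python
import random

def train_dqn(env_step, env_reset, n_state, n_actions,
              episodes=20, batch=16, buffer_cap=1000,
              gamma=0.99, lr=1e-4, eps=0.1, target_sync=50):
    """Minimal DQN-style training loop (illustrative, linear Q-function).

    env_reset() -> state tuple; env_step(a) -> (state, reward, done).
    Mirrors the claim: experience pool, fixed-size minibatches,
    fixed-frequency target-network sync, fixed number of learning rounds.
    """
    w = [[0.0] * n_state for _ in range(n_actions)]  # online weights
    w_target = [row[:] for row in w]                 # target-network copy
    buffer, step = [], 0

    def q(weights, s):
        return [sum(wi * si for wi, si in zip(row, s)) for row in weights]

    for _ in range(episodes):                        # set learning rounds
        s, done = env_reset(), False
        while not done:
            # epsilon-greedy action selection (Q-learning)
            a = (random.randrange(n_actions) if random.random() < eps
                 else max(range(n_actions), key=lambda i: q(w, s)[i]))
            s2, r, done = env_step(a)
            buffer.append((s, a, r, s2, done))       # experience pool
            if len(buffer) > buffer_cap:
                buffer.pop(0)
            if len(buffer) >= batch:                 # fixed number of samples
                for bs, ba, br, bs2, bd in random.sample(buffer, batch):
                    target = br if bd else br + gamma * max(q(w_target, bs2))
                    err = target - q(w, bs)[ba]
                    for j in range(n_state):         # one gradient step
                        w[ba][j] += lr * err * bs[j]
            step += 1
            if step % target_sync == 0:              # fixed-frequency sync
                w_target = [row[:] for row in w]
            s = s2
    return w
```

After training, the claim's inference phase amounts to repeatedly picking `max(range(n_actions), key=lambda i: q(w, s)[i])` from the current state until the target is hit, i.e. the learned network generates the guidance action in real time at each step.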

Google Patents

https://patents.google.com/patent/US12305967

USPTO PDF

https://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/12305967
