US12305967 - Method for designing terminal guidance law based on deep reinforcement learning

The patent describes a method for designing a missile terminal guidance law using deep reinforcement learning. The method establishes the relative kinematics between the missile and the target in the longitudinal plane and models the guidance problem as a Markov decision process. A network is trained with Q-learning on samples drawn from an experience pool for a set number of learning rounds. During guidance, the learned network generates an action in real time from the current state, and this repeats until the target is hit.
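To make the kinematics-to-MDP step concrete, here is a minimal Python sketch of one state transition in a planar engagement. It is an illustration under stated assumptions, not the patent's formulation: the state layout, the constant speeds `v_m` and `v_t`, the Euler time step `dt`, the `hit_radius`, and the reward (penalizing line-of-sight rate) are all hypothetical choices. The polar-form relative kinematics used here are the standard textbook ones.

```python
import math
from dataclasses import dataclass


@dataclass
class EngagementState:
    r: float        # missile-target range, m
    q: float        # line-of-sight angle, rad
    theta_m: float  # missile flight-path angle, rad
    theta_t: float  # target flight-path angle, rad


def step(state, a_m, v_m=600.0, v_t=300.0, dt=0.01, hit_radius=5.0):
    """Advance the engagement by one Euler step (all defaults are assumed values).

    a_m is the action: the missile's normal acceleration in m/s^2.
    Returns (next_state, reward, done).
    """
    # Standard planar relative kinematics in polar form.
    r_dot = (v_t * math.cos(state.theta_t - state.q)
             - v_m * math.cos(state.theta_m - state.q))
    q_dot = (v_t * math.sin(state.theta_t - state.q)
             - v_m * math.sin(state.theta_m - state.q)) / state.r
    # Normal acceleration turns the missile velocity vector.
    theta_m_dot = a_m / v_m

    next_state = EngagementState(
        r=state.r + r_dot * dt,
        q=state.q + q_dot * dt,
        theta_m=state.theta_m + theta_m_dot * dt,
        theta_t=state.theta_t,  # assumption: non-maneuvering target
    )
    reward = -abs(q_dot)                # hypothetical reward: small LOS rate is good
    done = next_state.r <= hit_radius   # hypothetical termination condition
    return next_state, reward, done
```

Calling `step(EngagementState(1000.0, 0.0, 0.0, math.pi), a_m=0.0)` advances a head-on geometry by one time step and returns the next state, the reward, and a termination flag, which is the `(state, action, reward, next state)` tuple shape an MDP formulation needs.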
Claim 1
- A method for designing a terminal guidance law based on deep reinforcement learning, comprising the following steps:
  - establishing a relative kinematics equation between a missile and a target in a longitudinal plane of a target interception terminal guidance section of the missile;
  - abstracting a solving problem of the kinematics equation and modeling as a Markov decision process;
  - building an algorithm network, setting algorithm parameters, and training the algorithm network based on a randomly initialized data set to determine weight parameters of an initial network;
  - continuously caching, by an agent, state transition data and reward values as learning samples in an experience pool based on a Q-Learning algorithm, and continuously selecting a fixed number of samples from the experience pool to train the network until set learning rounds are reached; and
  - generating, during a specific guidance process, an action in real time based on a current state by using a learned network to transfer to a next state, and continuously repeating the process until the target is hit to complete the guidance process.
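The experience-pool, fixed-size sampling, and action-generation steps of Claim 1 can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: a lookup table `q` stands in for the claim's neural network so that the example needs only the standard library, and the names `ExperiencePool`, `q_learning_update`, and `greedy_action`, along with the values of `alpha` and `gamma`, are hypothetical.

```python
import random
from collections import defaultdict, deque


class ExperiencePool:
    """Fixed-capacity cache of (state, action, reward, next_state, done) samples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest samples drop out when full

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Draw a fixed number of samples uniformly, without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def q_learning_update(q, batch, n_actions, alpha=0.1, gamma=0.99):
    """Apply the Q-learning target r + gamma * max_a' Q(s', a') to each sample.

    q is a table keyed by (state, action); a deep variant would instead
    regress a network toward the same target.
    """
    for state, action, reward, next_state, done in batch:
        best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
        target = reward + gamma * best_next
        q[(state, action)] += alpha * (target - q[(state, action)])


def greedy_action(q, state, n_actions):
    """Pick the highest-valued action for the current state (used after training)."""
    return max(range(n_actions), key=lambda a: q[(state, a)])
```

In a training loop, the agent would call `store` after every transition and, once the pool holds at least `batch_size` samples, call `sample` followed by `q_learning_update` until the set number of learning rounds is reached. Sampling from the pool rather than learning only from the latest transition breaks up the correlation between consecutive samples, which is the usual motivation for experience replay.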
Google Patents
https://patents.google.com/patent/US12305967
USPTO PDF
https://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/12305967