Abstract:
This article describes a biologically plausible hybrid navigation algorithm for an autonomous mobile agent operating in an unknown dynamic environment. The algorithm is based on a discrete optimization method extended with an incremental descriptor: a statistical memory module that accumulates and updates, step by step, information about the relationships between states and actions in the form of state–action pair statistics. The descriptor acts as a probabilistic approximation of the environment's transition function, which improves the robustness and stability of learning and enables a balance between exploration and exploitation without explicitly modeling the environment's dynamics.
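As a rough illustration of what a statistical memory over state–action pairs might look like, the following Python sketch keeps a visit count and a running mean outcome for each pair and uses a count-based bonus to trade off exploration against exploitation. It is not the article's algorithm: the class name `IncrementalDescriptor`, the scalar `outcome` signal, the `exploration_weight` parameter, and the specific bonus formula are all assumptions made for this example.

```python
import math
from collections import defaultdict


class IncrementalDescriptor:
    """Hypothetical statistical memory over (state, action) pairs."""

    def __init__(self, exploration_weight=1.0):
        # Assumed tuning knob: larger values favor rarely tried actions.
        self.exploration_weight = exploration_weight
        self.counts = defaultdict(int)    # visits per (state, action)
        self.means = defaultdict(float)   # running mean outcome per pair

    def update(self, state, action, outcome):
        """Fold one observed outcome into the stored statistics."""
        key = (state, action)
        self.counts[key] += 1
        # Incremental mean: m_n = m_{n-1} + (x_n - m_{n-1}) / n
        self.means[key] += (outcome - self.means[key]) / self.counts[key]

    def score(self, state, action):
        """Mean outcome plus a bonus that shrinks as the pair is visited."""
        key = (state, action)
        count = self.counts.get(key, 0)
        mean = self.means.get(key, 0.0)
        return mean + self.exploration_weight / math.sqrt(count + 1)

    def select(self, state, actions):
        """Pick the action with the highest score in the given state."""
        return max(actions, key=lambda a: self.score(state, a))
```

In this sketch no transition model is stored; only per-pair statistics are kept, so an action that has been tried many times gradually loses its bonus and an untried alternative eventually gets selected.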