Abstract:
Forecasting the future state of a scene is a key computer vision task for building systems capable of proactive perception and decision-making in changing environments. This work addresses the problem of forecasting future scene graphs: given a video and a sequence of past scene graphs, a model must predict the objects and their relations in subsequent frames. Unlike existing approaches, which are limited to static perception, the proposed method, GraphCast, exploits semantic vision-language features of objects together with their temporal dynamics. We introduce a model architecture that combines object-centric encoding with a transformer foundation model, interaction modeling via a biaffine relation classification head, and a dedicated object presence classifier. In addition, a temporal convolution module extracts temporal features and improves robustness to noise. Experiments on the STAR and Action Genome datasets demonstrate that the proposed architecture outperforms existing baselines.
Keywords: scene graph forecasting, video understanding, spatio-temporal reasoning, neural networks.
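
To make the biaffine relation classification head mentioned in the abstract concrete, the following is a minimal sketch that scores every ordered pair of object embeddings over a set of relation types. The class name, dimensions, and initialization here are illustrative assumptions, not the paper's implementation.

    # Minimal sketch of a biaffine relation scorer; all names and
    # dimensions are hypothetical, not GraphCast's actual code.
    import torch
    import torch.nn as nn

    class BiaffineRelationHead(nn.Module):
        """Scores every (subject, object) pair of node embeddings over R relations."""

        def __init__(self, dim: int, num_relations: int):
            super().__init__()
            # One bilinear form per relation type; the appended bias
            # dimension lets the form also capture unary (per-node) evidence.
            self.bilinear = nn.Parameter(torch.empty(num_relations, dim + 1, dim + 1))
            nn.init.xavier_uniform_(self.bilinear)

        def forward(self, nodes: torch.Tensor) -> torch.Tensor:
            # nodes: (N, dim) object embeddings for one frame.
            ones = torch.ones(nodes.size(0), 1, device=nodes.device)
            h = torch.cat([nodes, ones], dim=-1)   # (N, dim+1), affine trick
            # logits[r, i, j] = h_i^T W_r h_j for relation r and pair (i, j).
            logits = torch.einsum("id,rde,je->rij", h, self.bilinear, h)
            return logits.permute(1, 2, 0)         # (N, N, R) pairwise relation logits

    # Usage: score relations among 5 objects with 64-dim embeddings.
    head = BiaffineRelationHead(dim=64, num_relations=26)
    scores = head(torch.randn(5, 64))              # -> (5, 5, 26)

The appeal of a biaffine head for this task is that it scores all object pairs in one batched bilinear product rather than looping over pairs, which keeps pairwise relation classification cheap as the number of detected objects grows.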