Abstract:
This article examines the efficient storage of multidimensional data models in modern analytical systems. Particular attention is paid to the architecture of multidimensional cubes, which store aggregated facts at the intersections of multiple dimensions. Modern data storage formats (Parquet, ORC, Iceberg, Delta Lake, Hudi) are reviewed with respect to their applicability to multidimensional analytics tasks. It is shown that existing solutions focus mainly on tabular structures and do not fully support multidimensional relationships, hierarchies, and aggregations. The difficulties of integrating different storage formats and the lack of a unified approach to describing metadata are analyzed. Based on the identified limitations, the design tasks for a multidimensional cube storage format are formulated. A conceptual storage model is proposed that combines the principles of relational and multidimensional data organization. The proposed model comprises a fact table, dimensions, a metadata layer, and an API.
Keywords: multidimensional cubes, OLAP systems, data storage model, metadata, integration, data cube.