Abstract:
Query compilation in database management systems traces back to System R, which pioneered a code generation scheme in which small machine code fragments were stitched together into a specialized routine for processing a given SQL statement. Subsequent approaches generated C code, compiled it into dynamic libraries with system compilers such as GCC, and loaded those libraries at runtime. The current state of the art for dynamic query compilation is the LLVM framework, which avoids compiler frontend overhead by generating intermediate representation directly, enabling machine-independent optimizations and efficient machine code generation. However, LLVM is designed primarily as an optimizing compiler and is resource-intensive, so compilation can take orders of magnitude longer than query execution, which is particularly problematic for queries with millisecond-level interpretation costs. This paper evaluates two lightweight code generation frameworks for the x86-64 architecture as alternatives to LLVM in PostgreSQL, assessing their code generation speed and the quality of the emitted machine code. We present a qualitative comparison with LLVM, analyzing the trade-off between compilation latency and runtime performance across databases of varying sizes. Experimental results demonstrate that lightweight code generation can not only outperform LLVM on small datasets but also remain competitive on larger ones.