Abstract:
We present an approach to static analysis of Python programs that combines a low-level intermediate representation with devirtualization to enable interprocedural and intermodule analysis. The approach can analyze Python programs that lack type annotations and can find complex defects that are out of reach for traditional AST-based analysis tools. Starting from CPython bytecode, we construct a representation suitable for static analysis and resolve calls with an interprocedural devirtualization algorithm. We implemented the approach in a static analyzer that finds errors in C, C++, Java, and Go programs, and it required only minimal modifications to the existing detectors. On open-source projects, the detectors relevant to Python achieved true positive rates ranging from 60% to 96%. This demonstrates that our approach makes it possible to apply techniques developed for the analysis of statically typed languages to Python.
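
As a rough illustration of why call resolution is needed, the sketch below disassembles a small function with the standard `dis` module. The bytecode records only the attribute name being called, not the class that defines it. The classes `Circle` and `Square`, the function `describe`, and the name-based candidate lookup are hypothetical and serve only as an example; the lookup is a deliberately naive over-approximation and is not the devirtualization algorithm described in this paper.

```python
import dis


class Circle:
    def area(self):
        return 3


class Square:
    def area(self):
        return 4


def describe(shape):
    # Without annotations, the type of `shape` is unknown here.
    return shape.area()


# Decode the CPython bytecode of `describe`.
instructions = list(dis.get_instructions(describe))

# The attribute load names the method ("area") but not its owner class.
# Depending on the CPython version the opcode is LOAD_ATTR or LOAD_METHOD.
attribute_names = [
    ins.argval
    for ins in instructions
    if ins.opname in {"LOAD_ATTR", "LOAD_METHOD"}
]

# The call opcode name also varies by version, but always contains "CALL".
call_opcodes = [ins.opname for ins in instructions if "CALL" in ins.opname]

# Naive, name-based resolution (an assumption for illustration only):
# every known class that defines `area` is a possible call target.
known_classes = (Circle, Square)
candidates = [cls.__name__ for cls in known_classes if "area" in vars(cls)]

print(attribute_names)
print(call_opcodes)
print(candidates)
```

The exact opcode names printed depend on the CPython version, which is one reason an analyzer would typically lift bytecode into its own version-independent representation before resolving calls.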