Abstract:
Large language models are widely used in natural language processing. However, despite their effectiveness, deploying large language models is difficult because of their high computational and memory costs.
One way to address this problem is neural network quantization, that is, converting the weights and activations of the network to a lower bit-width representation. A special case of quantization is binarization, which compresses the network parameters to a bit-width of $1$.
In this paper, the structure of binary neural networks is examined, an overview of current methods for binarizing language models is provided, and the obtained results are described.
Keywords: natural language processing, binary neural networks, binarization, quantization, large language models