Compact and Computationally Efficient Representation of Deep Neural Networks

Wiedemann, Simon; Müller, Klaus-Robert; Samek, Wojciech

Abstract: At the core of any inference procedure in deep neural networks are dot product operations, which are the components that require the highest computational resources. A common approach to reducing the cost of inference is to reduce its memory complexity by lowering the entropy of the weight matrices of the neural network, e.g., by pruning and quantizing their elements. However, the quantized weight matrices are then usually represented in either a dense or a sparse matrix storage format, whose associated dot product complexity is not bounded by the entropy of the matrix. This means that the associated inference complexity ultimately depends on the implicit statistical assumptions that these matrix representations make about the weight distribution, which can in many cases be suboptimal. In this paper we address this issue and present new efficient representations for matrices with low-entropy statistics. These new matrix formats have the novel property that their memory and algorithmic complexity are implicitly bounded by the entropy of the matrix, which implies that they are guaranteed to become more efficient as the entropy of the matrix is reduced. In our experiments we show that performing the dot product under these new matrix formats can indeed be more energy and time efficient under practically relevant assumptions. For instance, we attain up to 42x compression ratios, 5x speed-ups and 90x energy savings when we losslessly convert the weight matrices of state-of-the-art networks such as AlexNet, VGG-16, ResNet152 and DenseNet into the new matrix formats and benchmark their respective dot product operations.
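
To make the notion of "lowering the entropy of the weight matrices" concrete, the following is a minimal sketch, not the authors' method. It assumes that entropy means the Shannon entropy of the empirical distribution of element values, and it uses a simple magnitude threshold plus uniform rounding as a stand-in for pruning and quantization. The function names, the threshold, and the step size are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def empirical_entropy(matrix):
    """Shannon entropy (bits per element) of the empirical value distribution."""
    values = [v for row in matrix for v in row]
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def prune_and_quantize(matrix, threshold, step):
    """Zero out small-magnitude entries, then round the rest to a uniform grid.

    Assumption: this is a simple illustrative scheme, not the one used in the paper.
    """
    return [
        [0.0 if abs(v) < threshold else round(v / step) * step for v in row]
        for row in matrix
    ]


# Hypothetical 64x64 weight matrix drawn from a standard normal distribution.
random.seed(0)
weights = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(64)]
compressed = prune_and_quantize(weights, threshold=0.5, step=0.5)

h_dense = empirical_entropy(weights)
h_compressed = empirical_entropy(compressed)
print(f"entropy before: {h_dense:.2f} bits/element")
print(f"entropy after:  {h_compressed:.2f} bits/element")
```

Under these assumptions the pruned and quantized matrix takes only a handful of distinct values, so its entropy is far below that of the original matrix. A dense or sparse storage format does not automatically benefit from that drop, which is the gap the abstract describes.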

Comments:	17 pages, 14 figures
Subjects:	Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Cite as:	arXiv:1805.10692 [cs.LG]
	(or arXiv:1805.10692v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1805.10692
