Yue Guan, Changming Yu, Yangjie Zhou, Jingwen Leng, Chao Li, Minyi Guo
In Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
ABSTRACT
Model pruning, which eliminates redundant parameters and reduces computational complexity, has emerged as a viable strategy for efficient deep neural network (DNN) deployment. Owing to the irregular memory access and computation patterns of sparse DNN models after pruning, prior works have proposed various structured sparse patterns to improve sparse DNN performance. In this work, we propose a unique perspective that views existing sparse pattern designs as computation skipping after tiling the tensor computation into multi-level hierarchies. This unified perspective opens up a new design space of multi-level sparse tiling to maximize the sparsity benefits of DNNs, as opposed to the single-level choice in current practices. We present Fractal, an auto-tuning system for sparse patterns that identifies the optimal multi-level sparse tiling pattern. We introduce PatternIR, a novel high-level intermediate representation (IR), to express a diverse range of multi-level sparse patterns. By leveraging insights from prior dense operator optimizations, we translate PatternIR into low-level compiler IRs, facilitating further operator optimization and code generation. Our evaluations demonstrate that Fractal yields substantial speedups, averaging 3.16× on CUDA Cores and 2.52× on Tensor Cores of GPUs over the state-of-the-art dense baseline under 75% sparsity, while incurring minimal accuracy degradation compared to prior sparse operator libraries.
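To make the abstract's core idea concrete, the following is a minimal sketch (not Fractal's or PatternIR's actual implementation) of viewing structured sparsity as computation skipping over a two-level tiling hierarchy: an all-zero outer tile is skipped wholesale, and within a nonzero outer tile, all-zero inner tiles are skipped as well. The tile sizes, function name, and NumPy formulation are illustrative assumptions.

```python
import numpy as np

def two_level_sparse_matmul(A, B, outer=32, inner=8):
    """Compute A @ B while skipping zero tiles of A at two tiling levels.

    Hypothetical illustration: `outer` is the level-1 tile size and
    `inner` the level-2 tile size nested inside each level-1 tile.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % outer == 0 and K % outer == 0 and outer % inner == 0
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, outer):                 # level-1 tiles of A
        for k in range(0, K, outer):
            outer_tile = A[i:i + outer, k:k + outer]
            if not outer_tile.any():             # skip an all-zero outer tile
                continue
            for ii in range(0, outer, inner):    # level-2 tiles inside it
                for kk in range(0, outer, inner):
                    inner_tile = outer_tile[ii:ii + inner, kk:kk + inner]
                    if not inner_tile.any():     # skip an all-zero inner tile
                        continue
                    C[i + ii:i + ii + inner, :] += inner_tile @ B[k + kk:k + kk + inner, :]
    return C
```

In this framing, a conventional single-level structured pattern corresponds to skipping at only one of the two loop levels; a multi-level sparse tiling pattern chooses tile sizes and skipping granularity at both.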