Jingwen's Homepage


  • Architecture for Deep Learning

BlockSkim: Efficient Question Answering for Transformer

VELTAIR: Towards High-Performance Multi-Tenant Deep Learning Services via Adaptive Compilation and Scheduling

Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS

Enable Simultaneous DNN Services Based on Deterministic Operator Overlap and Precise Latency Prediction

Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators

Exploiting Intra-SM Parallelism in GPUs via Persistent and Elastic Blocks

  • Resiliency and Efficiency

  • Cloud Computing

Astraea: Towards QoS-Aware and Resource-Efficient Multi-stage GPU Services

AlphaR: Learning-Powered Resource Management for Irregular, Dynamic Microservice Graph

  • Misc.