Jingwen's Homepage

Publications

  • System and Architecture for Deep Learning

[HPCA’25]
VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference

[HPCA’25]
MANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type

[MICRO’22]
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
IEEE Micro Top Picks from Computer Architecture Conferences Honorable Mention

  • Resiliency and Efficiency

  • Cloud Computing

[ASPLOS’22]
Astraea: Towards QoS-Aware and Resource-Efficient Multi-stage GPU Services

[IPDPS’21]
AlphaR: Learning-Powered Resource Management for Irregular, Dynamic Microservice Graph

  • Misc.