<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><pre style="width: 1731.84px; word-break: break-word !important;">Hi, Adapters,
<div>Large language models (LLMs) have demonstrated remarkable abilities across a host of NLP tasks. However, the ever-growing size of these auto-regressive LLMs poses significant challenges for deployment. Specifically, the auto-regressive decoding process is heavily memory-bound and utilizes only a small fraction of the parallel computation capacity of modern GPU accelerators. In this talk, I will introduce speculative decoding, a lossless approach for accelerating LLM inference.</div><div><br></div><div>Hope you find it useful and interesting.</div>
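As a rough preview of the idea, here is a minimal sketch of the draft-then-verify loop behind (greedy) speculative decoding. Both "models" below are hypothetical toy stand-ins, not real LLMs: the cheap draft model proposes a block of tokens, and the expensive target model checks them, so its output is identical to running the target model alone.

```python
def draft_model(prefix, k):
    """Cheap draft model: guesses the next k tokens (toy rule: count up)."""
    return [prefix[-1] + 1 + i for i in range(k)]

def target_model(prefix):
    """Expensive target model: returns the one correct next token.
    Toy rule: previous + 1, except it 'disagrees' with the draft at 5."""
    nxt = prefix[-1] + 1
    return nxt if nxt != 5 else 50

def speculative_decode(prompt, k=4, steps=8):
    """Generate `steps` tokens: draft k tokens, verify them against the
    target model, keep the accepted prefix plus the target's correction."""
    seq = list(prompt)
    while len(seq) < len(prompt) + steps:
        draft = draft_model(seq, k)
        ctx = list(seq)
        accepted = []
        for t in draft:
            correct = target_model(ctx)
            if t == correct:          # draft token verified: keep it
                accepted.append(t)
                ctx.append(t)
            else:                     # first mismatch: take the target's
                accepted.append(correct)  # token and discard the rest
                break
        seq.extend(accepted)
    return seq[:len(prompt) + steps]
```

In a real system the k verification calls are one batched forward pass of the target model, which is what recovers the GPU's unused parallel capacity; since decoding is memory-bound, checking k tokens at once costs roughly the same as generating one.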
Meeting link: https://teams.microsoft.com/l/meetup-join/19%3ameeting_M2VmMTU5MzgtODUzOC00NmU4LTg0MzktNGFjNDdiMmIwYTI1%40thread.v2/0?context=%7b%22Tid%22%3a%225cdc5b43-d7be-4caa-8173-729e3b0a62d9%22%2c%22Oid%22%3a%221a8b9fa0-af57-4a1c-9390-22d1c201d622%22%7d
Best wishes, Roy
_______________________________________________
Adapt mailing list
Adapt@cs.sjtu.edu.cn
<a href="http://cs.sjtu.edu.cn/mailman/listinfo/adapt" _src="http://cs.sjtu.edu.cn/mailman/listinfo/adapt">http://cs.sjtu.edu.cn/mailman/listinfo/adapt</a> </pre></div>