[Adapt][Seminar] Efficient Streaming Language Models With Attention Sinks

任思宇 rsy0702 at 163.com
Tue Oct 17 14:41:47 CST 2023


Hi, Adapters,

Large language models (LLMs) have taken the NLP community by storm in merely one year. However, LLMs all come with a pre-defined context length limit. For instance, LLaMA's context window is 2K tokens and LLaMA 2's is 4K. This limitation has largely hindered LLMs' application in an array of scenarios where long inputs are indispensable, such as multi-turn conversation, multi-document summarization, and paper analysis. In this talk, I will introduce a recent paper titled "Efficient Streaming Language Models With Attention Sinks", which proposes a simple yet effective way to apply LLMs to streaming applications while enjoying a constant memory footprint and inference latency. I chose this paper because it is a good example of how a working idea is motivated by empirical observation (though more theoretical understanding is still needed).
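The key idea of the paper is that the model attends heavily to the first few tokens ("attention sinks"), so keeping their key/value states together with a sliding window of recent tokens preserves quality at constant memory. A minimal sketch of that cache-eviction policy follows; `num_sinks` and `window_size` are illustrative values, not the paper's exact configuration:

```python
def streaming_cache_positions(positions, num_sinks=4, window_size=8):
    """Return which token positions stay in the KV cache:
    the first num_sinks "attention sink" tokens plus a sliding
    window of the window_size most recent tokens."""
    if len(positions) <= num_sinks + window_size:
        return list(positions)  # everything still fits
    # Sinks survive forever; the middle of the stream is evicted.
    return list(positions[:num_sinks]) + list(positions[-window_size:])

# Simulate a stream of 20 decoded token positions.
kept = streaming_cache_positions(list(range(20)))
print(kept)  # positions 0-3 (sinks) plus the last 8 tokens
```

Because the cache size is bounded by `num_sinks + window_size`, both memory and per-token attention cost stay constant no matter how long the stream runs.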

Hope you find this talk useful.

Time: Wed, 10:00 am to 11:30 am
Meeting link: https://teams.live.com/meet/9558525993023?p=kwWYI7ENfpXHnxOR 

Best wishes,
Roy



