Hi Adapters,
Large language models (LLMs) have taken the NLP community by storm in merely one year. However, every LLM comes with a pre-defined context length limit: for instance, LLaMA's context window is 2K tokens and LLaMA 2's is 4K. This limitation has largely hindered the application of LLMs to scenarios where long inputs are indispensable, such as multi-turn conversation, multi-document summarization, and paper analysis. In this talk, I will introduce a recent paper titled "Efficient Streaming Language Models with Attention Sinks", which proposes a simple yet effective method for applying LLMs to streaming applications with a constant memory footprint and constant per-token inference latency. I chose this paper because it is a good example of how a working idea can be motivated by empirical observation (though more theoretical understanding is still needed).

Hope you find this talk useful.
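For anyone who wants a quick preview before the talk, here is a minimal Python sketch (my own, not the authors' code) of the cache-eviction rule as I understand it: keep the key/value entries of the first few tokens, which act as "attention sinks", plus a sliding window of the most recent tokens, so the KV cache never grows. The names num_sinks and window_size and their default values are illustrative placeholders, not the paper's API.

    # Minimal sketch of an attention-sink KV-cache eviction policy.
    # The cache holds one (key, value) entry per generated/consumed token,
    # oldest first. We always keep the first `num_sinks` entries plus the
    # last `window_size` entries, so memory stays constant during streaming.
    def evict_kv_cache(cache, num_sinks=4, window_size=1020):
        budget = num_sinks + window_size
        if len(cache) <= budget:
            return cache                  # still within budget, keep everything
        sinks = cache[:num_sinks]         # initial tokens serving as attention sinks
        recent = cache[-window_size:]     # most recent tokens
        return sinks + recent             # middle tokens are evicted

We can dig into why those initial tokens matter so much during the discussion.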
Time: Wed 10:00 am - 11:30 am
Meeting link: https://teams.live.com/meet/9558525993023?p=kwWYI7ENfpXHnxOR
Best wishes,
Roy