[Adapt] [Seminar] Long Inputs for Transformer-based Models

贾琪 jia_qi_0217 at 163.com
Tue May 18 23:51:39 CST 2021

Hi Adapters,

As we all know, training large transformer models on long sequences is expensive and may not even be possible on a standard GPU card, because the memory and compute cost of self-attention grows quadratically with sequence length. In this seminar, I'll first analyze the memory usage of a standard transformer-based model (BART). Then I'll talk about two solutions to the long document summarization problem, one from the perspective of the model and one from the perspective of the input data.
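
To give a rough feel for the quadratic growth, here is a back-of-the-envelope sketch (not the analysis from the talk). It counts only the attention score matrix of a single self-attention layer, which has shape (batch, heads, seq_len, seq_len). The function name and the defaults (16 heads, fp32, batch size 1) are illustrative assumptions, and the estimate ignores weights, other activations, gradients, and optimizer state.

```python
def attention_scores_bytes(seq_len, num_heads=16, batch_size=1, bytes_per_element=4):
    """Bytes needed to hold one layer's attention score matrix.

    Assumptions (illustrative only): scores are stored densely with shape
    (batch_size, num_heads, seq_len, seq_len), in fp32 (4 bytes per element).
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_element


if __name__ == "__main__":
    for n in (1024, 4096, 16384):
        mib = attention_scores_bytes(n) / 2**20
        print(f"seq_len={n:>6}: {mib:,.0f} MiB per layer")
```

Under these assumptions, a 4x longer input needs 16x the memory for the scores: 64 MiB at 1,024 tokens, 1,024 MiB at 4,096 tokens, and 16,384 MiB at 16,384 tokens, for one layer alone.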

See you there~

Time: Wed 4:00pm

Venue: SEIEE 3-414


Best Regards,
