[Adapt] [Seminar] Long Inputs for Transformer-based Models
贾琪
jia_qi_0217 at 163.com
Tue May 18 23:51:39 CST 2021
Hi Adapters,
We all know that training large transformer models on long sequences is expensive, and may even be infeasible on a standard GPU, because the memory cost of the self-attention mechanism grows quadratically with sequence length. In this seminar, I'll first analyze the memory usage of a standard transformer-based model (BART). Then I'll discuss two solutions to the long-document summarization problem, one from the perspective of the model and one from the perspective of the input data.
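To see why the quadratic term dominates, here is a minimal back-of-the-envelope sketch (not from the talk; the layer/head counts roughly match BART-large, and fp32 activations are assumed) of the memory needed just to materialize the self-attention score matrices:

```python
def attention_scores_bytes(seq_len, num_heads=16, num_layers=12, bytes_per_float=4):
    """Memory for the self-attention score matrices alone.

    Each layer and each head materializes a seq_len x seq_len matrix
    of scores, so the total grows as O(seq_len^2).
    Hypothetical config: 12 layers, 16 heads, fp32 (4 bytes).
    """
    return num_layers * num_heads * seq_len * seq_len * bytes_per_float

# Doubling the sequence length quadruples the score-matrix memory.
for n in (512, 1024, 4096, 16384):
    print(f"seq_len={n:6d}: {attention_scores_bytes(n) / 2**30:8.2f} GiB")
```

At 16k tokens the score matrices alone would need hundreds of GiB under these assumptions, which is why sub-quadratic attention variants or input-side strategies (truncation, chunking, retrieval) are needed for long documents.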
See you there~
Time: Wed 4:00pm
Venue: SEIEE 3-414
Best Regards,
Angel