[Adapt] [Seminar] Long Inputs for Transformer-based Models
jia_qi_0217 at 163.com
Tue May 18 23:51:39 CST 2021
We all know that training large transformer models on long sequences is expensive and may not even fit on a standard GPU card, because the memory cost of the self-attention mechanism grows quadratically with sequence length. In this seminar, I'll first give an analysis of memory consumption for a standard transformer-based model (BART). Then I'll present two solutions to the long-document summarization problem, one from the model side and one from the input-data side.
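(A quick back-of-the-envelope sketch, not part of the talk itself: just the attention score matrices are seq_len x seq_len per head per layer, so memory blows up quadratically with input length. The head/layer counts below are illustrative defaults roughly matching BART-large's encoder, and the estimate ignores activations, weights, and optimizer state.)

    def attn_score_memory_gb(seq_len, n_heads=16, n_layers=12, bytes_per_float=4):
        # One score matrix is seq_len x seq_len per head, kept for every layer
        # during the forward pass; everything else in the model is ignored here.
        return n_layers * n_heads * seq_len * seq_len * bytes_per_float / 1e9

    for n in (512, 1024, 4096, 16384):
        print(n, round(attn_score_memory_gb(n), 2), "GB")

Even under these rough assumptions, going from 512 to 16k tokens takes the score matrices alone from well under a gigabyte to hundreds of gigabytes, which is why long inputs need either a different model or different input handling.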
See you there~
Time: Wed 4:00pm
Venue: SEIEE 3-414