[Adapt] [Seminar] Long Inputs for Transformer-based Models

Tue May 18 23:51:39 CST 2021

Hi Adapters,

We all know that training large transformer models on long sequences is expensive and may not even be possible on a standard GPU card, because the memory and compute cost of the self-attention mechanism grows quadratically with sequence length. In this seminar, I'll first give an analysis of the memory usage of a standard transformer-based model (BART). Then, I'll talk about two solutions to the long document summarization problem, one from the model side and one from the input data side.

See you there~

Time: Wed 4:00pm

Venue: SEIEE 3-414

Best Regards,

Angel