[Adapt] [Seminar] From Transformer to BERT (From Angel to Jessie)
贾琪
jia_qi_0217 at 163.com
Wed Apr 17 09:18:56 CST 2019
Hi, Adapters,
This week Jessie and I are going to introduce two famous models, the Transformer and BERT, proposed in the papers "Attention Is All You Need" and "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". Although they have been mentioned many times in previous seminars, as far as we know their details have not yet been discussed.
(Angel) I will mainly introduce the model architecture of the Transformer, which is said to be the first sequence transduction model based entirely on attention. What is the multi-head self-attention mechanism? Why is the Transformer more parallelizable, and why does it require less training time? I hope you will find the answers.
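As a taste of what multi-head self-attention looks like, here is a minimal NumPy sketch (an assumption on our part, not code from the paper): a single sequence, no padding mask, and small toy dimensions. Note that every position attends to every other position through one batched matrix product, which is the source of the parallelism discussed above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product attention over num_heads heads.
    X: (seq_len, d_model); all weight matrices: (d_model, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    # Project inputs, then split the model dimension into heads.
    Q = (X @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    K = (X @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    V = (X @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # All positions attend to all positions at once; no recurrence,
    # so the whole sequence is processed in parallel.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ V                                  # (heads, seq, d_head)
    # Concatenate heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 5, 16, 4
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4))
out = multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads)
print(out.shape)  # (5, 16)
```

The attention weights for each head sum to 1 over the key positions, and the output keeps the input shape, so such layers can be stacked.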
(Jessie) I will introduce a new language representation model called BERT, which is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. BERT outperforms previous state-of-the-art models on 11 NLP tasks.
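The bidirectional pre-training rests on a masked language model objective. As a rough illustration (our own sketch with a toy vocabulary, not the authors' code), each chosen position is replaced by [MASK] 80% of the time, by a random token 10% of the time, and left unchanged 10% of the time, while the label records the original token to be predicted:

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]  # toy vocabulary

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style masked-LM corruption: select ~mask_prob of positions;
    labels keep the original token only at corrupted positions."""
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # the model must predict this original token
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK            # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.choice(VOCAB)  # 10%: random token
            # else 10%: keep the token unchanged
    return inputs, labels

inputs, labels = mask_tokens(["the", "cat", "sat", "on", "the", "mat"])
```

Because the prediction at a masked position may depend on tokens on both sides, the encoder can be trained bidirectionally, unlike a left-to-right language model.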
See you then!
Time: 17:00, April 17
Venue: SEIEE 3-414
Best,
Jia Qi & Luo Zhiyi