[Adapt] [Seminar] From Transformer to BERT (From Angel to Jessie)
贾琪
jia_qi_0217 at 163.com
Wed Apr 17 09:18:56 CST 2019
Hi, Adapters,
This week Jessie and I are going to introduce two famous models, the Transformer and BERT, proposed in the papers "Attention Is All You Need" and "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". Although they have been mentioned many times in previous seminars, as far as we know their details have not yet been discussed.
(Angel) I will mainly introduce the model architecture of the Transformer, which is said to be the first sequence transduction model based entirely on attention. What is the multi-head self-attention mechanism? Why is the Transformer more parallelizable, and why does it require less training time? I hope you will find the answers.
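As a taste of what multi-head self-attention looks like, here is a minimal NumPy sketch (an assumption on our part, not code from the paper): a single sequence, no padding mask, and small toy dimensions. Note that every position attends to every other position through one batched matrix product, which is the source of the parallelism discussed above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product attention over num_heads heads.
    X: (seq_len, d_model); all weight matrices: (d_model, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    # Project inputs, then split the model dimension into heads.
    Q = (X @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    K = (X @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    V = (X @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # All positions attend to all positions at once; no recurrence,
    # so the whole sequence is processed in parallel.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ V                                  # (heads, seq, d_head)
    # Concatenate heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 5, 16, 4
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4))
out = multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads)
print(out.shape)  # (5, 16)
```

The attention weights for each head sum to 1 over the key positions, and the output keeps the input shape, so such layers can be stacked.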
(Jessie) I will introduce a new language representation model called BERT, which is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. BERT outperforms previous state-of-the-art models on 11 NLP tasks.
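The bidirectional pre-training rests on a masked language model objective. As a rough illustration (our own sketch with a toy vocabulary, not the authors' code), each chosen position is replaced by [MASK] 80% of the time, by a random token 10% of the time, and left unchanged 10% of the time, while the label records the original token to be predicted:

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]  # toy vocabulary

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style masked-LM corruption: select ~mask_prob of positions;
    labels keep the original token only at corrupted positions."""
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # the model must predict this original token
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK            # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.choice(VOCAB)  # 10%: random token
            # else 10%: keep the token unchanged
    return inputs, labels

inputs, labels = mask_tokens(["the", "cat", "sat", "on", "the", "mat"])
```

Because the prediction at a masked position may depend on tokens on both sides, the encoder can be trained bidirectionally, unlike a left-to-right language model.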
See you then!
Time: 17:00, April 17
Venue: SEIEE 3-414
Best,
Jia Qi & Luo Zhiyi