<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div>Hi, Adapters, </div><div><br></div><div><div>This week Jessie and I are going to introduce two well-known models, Transformer and BERT, proposed in the papers "Attention Is All You Need" and "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". Although they have been mentioned many times in previous seminars, as far as we can tell their details have not yet been discussed.</div><div>(Angel) I will mainly introduce the model architecture of Transformer, which is said to be the first sequence transduction model based entirely on attention. What is the multi-head self-attention mechanism? Why is Transformer more parallelizable, and why does it need less training time? I hope you will find the answers in the talk.</div><div>(Jessie) I will introduce a new language representation model called BERT, which is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. BERT outperforms previous state-of-the-art models on 11 NLP tasks.</div></div><div>See you then!</div><div><br></div><div>Time: 17:00, April 17</div><div>Venue: SEIEE 3-414</div><div>Best, </div><div><a name="_olk_signature" style="color: rgb(6, 73, 119);"><span lang="EN-US">Jia Qi & Luo Zhiyi</span></a></div></div><br><br><span title="neteasefooter"><p> </p></span>