[Adapt] Seminar

黄姗姗 798508656 at qq.com
Tue May 7 22:03:02 CST 2019


Hi Adapters,

I will introduce an efficient language model called Transformer-XL this week. 

To correctly understand an article, one sometimes needs to refer back to a word or sentence that occurred a few thousand words earlier. This is an example of long-range dependence, a common phenomenon in sequential data that must be captured to handle many real-world tasks. While people do this naturally, modeling long-term dependency with neural networks remains a challenge. Gating-based RNNs and the gradient clipping technique improve the ability to model long-term dependency, but are still not sufficient to fully address this issue.

One way to approach this challenge is to use Transformers, which allow direct connections between data units and thus offer the promise of better capturing long-term dependency. However, in language modeling, Transformers are currently implemented with a fixed-length context, i.e., a long text sequence is truncated into fixed-length segments of a few hundred characters, and each segment is processed separately. The paper "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" introduces a segment-level recurrence mechanism to avoid the fixed-length limitation and model long-term dependency.
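If it helps to picture the idea before the talk, below is a minimal PyTorch-style sketch of segment-level recurrence. This is not the authors' code: the single layer, the layer sizes, the memory length, and the omission of the paper's relative positional encoding and causal masking are all simplifications of my own. The key point it illustrates is that hidden states from earlier segments are cached (with gradients stopped) and reused as extra keys/values when processing the next segment.

import torch
import torch.nn as nn

class RecurrentSegmentLayer(nn.Module):
    """One self-attention layer that reuses cached hidden states ("memory")
    from previous segments as extra keys/values."""

    def __init__(self, d_model=64, n_head=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                nn.Linear(d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, memory=None):
        # Keys/values cover memory + current segment; queries are current only.
        kv = x if memory is None else torch.cat([memory, x], dim=1)
        out, _ = self.attn(x, kv, kv, need_weights=False)
        x = self.norm1(x + out)
        x = self.norm2(x + self.ff(x))
        return x

def run_segments(layer, segments, mem_len=32):
    """Process fixed-length segments in order, carrying detached hidden
    states forward so information can flow across segment boundaries."""
    memory = None
    outputs = []
    for seg in segments:
        h = layer(seg, memory)
        outputs.append(h)
        # Append the new hidden states to the cache, keep only the most
        # recent mem_len positions, and stop gradients through the cache.
        new_mem = h if memory is None else torch.cat([memory, h], dim=1)
        memory = new_mem.detach()[:, -mem_len:]
    return outputs

# Toy usage: a batch of 2 sequences split into 3 segments of length 16.
layer = RecurrentSegmentLayer()
segments = [torch.randn(2, 16, 64) for _ in range(3)]
outputs = run_segments(layer, segments)
print([o.shape for o in outputs])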


Time: 17:00 May 8
Venue: SEIEE 3-414

Best,
Shanshan