[Adapt][SEMINAR] Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers

Jieyi Huang (黄洁仪) xsiling at sjtu.edu.cn
Wed Nov 17 09:34:19 CST 2021


Hi Adapters,

Transformer-based V&L models have achieved strong performance on multimodal tasks. However, it is hard to tell how these models actually combine vision and language.

In this seminar, I'll talk about the paper "Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers", accepted at EMNLP 2021. I'll introduce the modality ablation method the authors propose to help understand how multimodal models integrate vision and language information, and then present their results on several BERT-based V&L models.

Hope you enjoy it!✋

Time: Wed 4:00pm
Venue: SEIEE 3-526A

Best wishes,
Jieyi
