[Adapt][SEMINAR] Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Jieyi Huang (黄洁仪)
xsiling at sjtu.edu.cn
Wed Nov 17 09:34:19 CST 2021
Hi Adapters,
Transformer-based V&L models have achieved strong performance on multimodal tasks. However, it is hard to tell how these models actually combine vision and language.
In this seminar, I'll talk about a paper, Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers, accepted at EMNLP 2021. I'll introduce the modality ablation method the authors propose to help understand how multimodal models integrate vision and language information, then show the results on several BERT-derived models.
Hope you enjoy it!✋
Time: Wed 4:00pm
Venue: SEIEE 3-526A
Best Wishes.
Jieyi