[Adapt] [Seminar]An Empirical Study of Training End-to-End Vision-and-Language Transformers

王宇飞 arthur-w at sjtu.edu.cn
Wed Apr 13 11:47:26 CST 2022


Hi Adapters,
Vision-and-language pre-training (VLP) has proven highly effective on a variety of VL downstream tasks. While recent work has shown that fully transformer-based VL models can be more efficient than earlier region-feature-based models, their performance on downstream tasks often degrades significantly. In this talk, I will introduce how to design and pre-train a fully transformer-based VL model in an end-to-end manner.

Hope you will enjoy it. 
Arthur

