[Adapt] [Seminar] Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective

王宇飞 arthur-w at sjtu.edu.cn
Wed May 18 11:19:34 CST 2022


Hi, Adapters:

In recent years, joint text-image embeddings have improved significantly thanks to the development of transformer-based vision-language (VL) models. Despite these advances, it remains unclear what the representations produced by these models actually capture. It is therefore of interest to study the multimodal capacity of VL representations and to explore what information is learned and forgotten between pre-training and fine-tuning, which could reveal the current limits of the pre-training process. This talk will examine the multimodal representation capacity of several VL architectures through model probing. Several of the conclusions should be inspiring for future research.
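For those new to probing, here is a minimal sketch of the idea (purely illustrative; the placeholder features, labels, and the choice of a logistic-regression probe are my assumptions, not the speakers' exact setup): a simple classifier is trained on frozen embeddings, and its accuracy indicates how much of a given property is encoded in the representation.

    # Minimal linear-probing sketch (illustrative placeholders only).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Assume `embeddings` are frozen pooled vectors extracted from a
    # pre-trained vision-language transformer, and `labels` encode a
    # property to probe for (e.g., a binary linguistic/visual attribute).
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(1000, 768))   # placeholder features
    labels = rng.integers(0, 2, size=1000)      # placeholder binary property

    # Fit a linear probe on top of the frozen features; test accuracy
    # measures how linearly decodable the property is from the embedding.
    train_x, test_x = embeddings[:800], embeddings[800:]
    train_y, test_y = labels[:800], labels[800:]

    probe = LogisticRegression(max_iter=1000)
    probe.fit(train_x, train_y)
    print("probe accuracy:", accuracy_score(test_y, probe.predict(test_x)))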
Hope you will enjoy it.   

Time: 2022/05/18 16:00-18:00

Venue: Tencent Meeting
https://meeting.tencent.com/p/8332583699
Meeting ID: 833-258-3699

Best regards,
Arthur
