A number of ‘Embodied AI’ tasks that combine language, visual perception, and navigation in realistic 3D environments have recently gained prominence, including Interactive and Embodied Question Answering, Vision-and-Language Navigation (VLN), and challenges based on household tasks. Tomorrow I will present a paper I found interesting on localization from embodied dialog.

Imagine the following scenario: you get lost in a new building while trying to visit a friend who lives or works there. Unsure of exactly where you are, you call your friend and start describing your surroundings ("I’m standing on a red carpet looking at a seating area."), and your friend asks questions ("Is there a round table in the middle of the room?"). After a few rounds of dialog, your friend, who is familiar with the building, will hopefully know where you are. The paper "Where Are You? Localization from Embodied Dialog" focuses on exactly this scenario.

I hope you'll enjoy the talk.

