[Adapt] Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation
Zitong Li
autsky_jadek at sjtu.edu.cn
Wed Dec 14 13:08:14 CST 2022
Hi Adapters,
There is growing interest in developing goal-oriented dialog systems that serve users in accomplishing complex tasks through multi-turn conversations. Although many methods have been devised to evaluate and improve the performance of individual dialog components, there is a lack of comprehensive empirical study on how the different components contribute to the overall performance of a dialog system. In the paper "Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation", the authors perform a system-wise evaluation and present an empirical analysis of different types of dialog systems composed of different modules in different settings.
Their results show that (1) a pipeline dialog system trained with fine-grained supervision signals at the level of individual components often outperforms systems built from joint or end-to-end models trained on coarse-grained labels, (2) component-wise, single-turn evaluation results are not always consistent with the overall performance of a dialog system, and (3) despite the discrepancy between simulators and human users, simulated evaluation is still a valid alternative to costly human evaluation, especially in the early stages of development.
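To make the pipeline-versus-end-to-end distinction concrete, here is a minimal toy sketch of the classic modular pipeline (NLU -> DST -> Policy -> NLG) that finding (1) refers to. All component implementations below are placeholder rules of my own, not the authors' models; in a real system each stage would be a separately trained model with its own fine-grained supervision, whereas an end-to-end model collapses all four stages into one network trained only on response-level labels.

```python
# Toy sketch of a modular goal-oriented dialog pipeline (hypothetical rules,
# for illustration only; not the components evaluated in the paper).

def nlu(utterance):
    """Natural language understanding: map user text to a semantic frame."""
    frame = {}
    if "cheap" in utterance:
        frame["price"] = "cheap"
    if "restaurant" in utterance:
        frame["domain"] = "restaurant"
    return frame

def dst(state, frame):
    """Dialog state tracking: accumulate slot values across turns."""
    new_state = dict(state)
    new_state.update(frame)
    return new_state

def policy(state):
    """Dialog policy: choose a system act from the tracked state."""
    if "price" in state and "domain" in state:
        return ("inform", state)
    return ("request", "price")

def nlg(act):
    """Natural language generation: render the system act as text."""
    kind, payload = act
    if kind == "request":
        return f"What {payload} range are you looking for?"
    return f"Here is a {payload['price']} {payload['domain']} for you."

def pipeline_turn(state, utterance):
    """One turn through the NLU -> DST -> Policy -> NLG pipeline."""
    state = dst(state, nlu(utterance))
    return state, nlg(policy(state))

state, reply = pipeline_turn({}, "I want a cheap restaurant")
```

Because each stage exposes an intermediate output (frame, state, act), each can be supervised and evaluated on its own, which is exactly why finding (2) is interesting: good per-component scores at these interfaces do not guarantee a good system-wise result.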
I hope you will find my talk interesting and helpful.
Time: Wed 4:00pm
Tencent: 118-364-753
Best wishes,
Zitong