[Adapt] [Seminar] Is Human Evaluation Really That Reliable?

Tue Oct 12 20:54:18 CST 2021

Hi Adapters,

For most NLG tasks, human evaluation is considered the gold standard. But as models become increasingly capable of generating fluent utterances, how well can human judges detect and judge machine-generated text?

In this talk, I'll introduce the paper "All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text", one of the outstanding papers at ACL 2021. Following the authors' intent, we will discuss two questions:

1. How well can untrained evaluators identify machine-generated text?
2. Can we train evaluators to better identify machine-generated text?

Afterwards, we will discuss what we can currently do to improve human judgement.

Hope you enjoy it!

Time: October 13th at 4pm
Venue: SEIEE 3-414

Best,
Ruolan