<meta http-equiv="Content-Type" content="text/html; charset=GB18030"><div style=""><font size="4">Hi Adapters,<br><br></font><p class="p1" style="margin: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; line-height: normal;"><font size="4">     Although measuring held-out accuracy has been the primary approach to evaluating generalization, it often overestimates the performance </font><span style="font-size: large;">of NLP models. </span></p><p class="p1" style="margin: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; line-height: normal;"><font size="4"><br></font></p><p class="p1" style="margin: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; line-height: normal;"><font size="4">     In this seminar, I will present CheckList, a task-agnostic methodology for testing NLP models. The authors report that, in a user study, NLP practitioners using CheckList created twice as many tests and found almost three times as many bugs as users without it. I will introduce this method and my own work in this area.</font></p><div data-mce-style="clear: both;" style="clear: both;"><font size="4"><br></font></div><font size="4">Related papers:</font></div><div style=""><p class="p1" style="margin: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; line-height: normal;"><font size="4"><a href="https://arxiv.org/pdf/2005.04118.pdf">Beyond Accuracy: Behavioral Testing of NLP Models with CheckList</a></font></p><p class="p1" style="font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; line-height: normal; margin: 0px;"></p></div><div style=""><font size="4"><br>I hope you gain a fresh perspective from the talk.<br><br><br>Time: Wed 4:30pm<br><br>Venue: SEIEE 3-414<br><br>Best regards,<br><br>Shanshan</font></div>