[Adapt] [Seminar] Benchmark for General AI Assistants
吕铭浩
913257822 at qq.com
Tue Nov 28 21:38:44 CST 2023
Hi, Adapters,
Large Language Models (LLMs) arguably open the way to general-purpose systems. However, evaluating these systems is an open problem: given their emerging capabilities, LLMs are breaking AI benchmarks at an ever-increasing rate. Furthermore, open-ended generation generally requires human or model-based evaluation. Human evaluation becomes less and less feasible as task complexity increases. Model-based evaluation, on the other hand, is by construction dependent on stronger models and hence cannot evaluate new state-of-the-art models. Overall, evaluating new AI systems requires rethinking benchmarks.
Hope you find this talk interesting.
Time: Wed, 10:00 am - 11:30 am
Meeting link: https://teams.microsoft.com/l/meetup-join/19%3ameeting_M2VmMTU5MzgtODUzOC00NmU4LTg0MzktNGFjNDdiMmIwYTI1%40thread.v2/0?context=%7b%22Tid%22%3a%225cdc5b43-d7be-4caa-8173-729e3b0a62d9%22%2c%22Oid%22%3a%221a8b9fa0-af57-4a1c-9390-22d1c201d622%22%7d
Best wishes,
minghao