[Adapt] [Seminar] TTS frontend：NLP as backbone of Audio synthesis

Wed Dec 11 09:43:16 CST 2019

Hi Adapters,

I'm Zhiling, a senior undergraduate student, and I'll give my first presentation today. I just finished an internship in the TTS (Text-To-Speech) group at Bytedance AI Lab, and I'd like to share some of what I learned about TTS there.

Like all of you, my research interest is NLP. However, I had to learn TTS from scratch after the TTS group unexpectedly picked my internship application. Once I moved beyond my limited NLP-only view, I found many connections and similarities between TTS and NLP. The TTS frontend is a typical example, where NLP acts as the backbone.

My topic is "TTS frontend：NLP as backbone of Audio synthesis". 
First, I'll introduce the basics of a TTS system from an NLPer's perspective.
Then, I'll dig into the TTS frontend by introducing some recent work by the Bytedance TTS group.

Some of the ideas may be unfamiliar, but they are useful for the NLP community. I hope you'll draw some inspiration from my presentation.

Related papers:
https://arxiv.org/abs/1911.04111 | A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis
https://arxiv.org/abs/1911.04128 | A hybrid text normalization system using multi-head self-attention for Mandarin

Time: Wed 4:30pm

Venue: SEIEE 3-414

Best regards,

Zhiling