ICASSP 2025 Tutorial: Speech Synthesis with Discrete Speech Tokens

Date:


This half-day tutorial is an extended version of an earlier tutorial with the same title, given at NCMMSC 2024 in Xinjiang, China. I led all of the experiments and was the main presenter for Section 1 (Discrete Speech Tokens: Taxonomy) and Section 2 (Experimental Analysis of Discrete Tokens).

Abstract

In this tutorial, we will systematically introduce and taxonomize tokenization methods for speech. We will then compare these tokens experimentally under fair conditions to show their distinct properties. Next, we will describe in detail various types of speech synthesis systems built on discrete tokens, especially TTS models. Following this, we will focus on combining discrete speech tokens with large language models (LLMs), an approach that paves the way for cross-modal interactive voice agents with understanding and reasoning abilities. Finally, we will share insights on existing problems and open challenges.