Talk at the University of Cambridge: Reducing Speaker and Temporal Redundancy in Discrete Speech Tokenization

Date: October 20, 2025

Discrete speech tokens have emerged as a fundamental representation for many downstream speech processing tasks, particularly speech generation. However, most existing tokens encode dense, fixed-rate acoustic information, which introduces substantial redundancy and limits their efficiency. In this talk, I will first briefly review the taxonomy of current discrete speech tokens, then present our work on reducing this redundancy in two critical directions:

(1) Speaker timbre disentanglement, introducing a low-bitrate, single-codebook and speaker-decoupled codec for speech.

(2) Variable-rate temporal compression, exploring methods to dynamically adjust the frame rate of discrete tokens for greater compactness and a better bitrate-performance tradeoff.

Together, these efforts highlight pathways toward more efficient and controllable discrete speech representations, paving the way for the next generation of speech technologies.

Yiwei Guo
