Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Posts

Future Blog Post

less than 1 minute read

Published: January 01, 2199

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published: August 14, 2015

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published: August 14, 2014

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published: August 14, 2013

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published: August 14, 2012

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Portfolio

Portfolio item number 1

Short description of portfolio item number 1

Portfolio item number 2

Short description of portfolio item number 2

Publications

Unsupervised word-level prosody tagging for controllable speech synthesis

Published in IEEE ICASSP, 2022

This paper aims to enhance word-level prosody controllability in TTS models through decision tree-based clustering.

Recommended citation: Yiwei Guo, Chenpeng Du, Kai Yu. (2022). "Unsupervised word-level prosody tagging for controllable speech synthesis." In Proc. IEEE ICASSP, 2022, pp. 7597-7601.

VQTTS: High-fidelity text-to-speech synthesis with self-supervised VQ acoustic feature

Published in ISCA Interspeech, 2022

This paper is the first to successfully integrate discrete SSL features into TTS, producing a competitive high-fidelity TTS system.

Recommended citation: Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu. (2022). "VQTTS: High-fidelity text-to-speech synthesis with self-supervised VQ acoustic feature." In Proc. ISCA Interspeech, 2022, pp. 1596-1600.

EmoDiff: Intensity controllable emotional text-to-speech with soft-label guidance

Published in IEEE ICASSP, 2023

This paper designs an emotion intensity-controllable TTS model using a new soft-label guidance algorithm in the diffusion paradigm.

Recommended citation: Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu. (2023). "EmoDiff: Intensity controllable emotional text-to-speech with soft-label guidance." In Proc. IEEE ICASSP, 2023.

Speaker adaptive text-to-speech with timbre-normalized vector-quantized feature

Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023

This paper proposes TN-VQTTS, which leverages timbre-normalized vector-quantized acoustic features for TTS speaker adaptation with little data.

Recommended citation: Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu. (2023). "Speaker adaptive text-to-speech with timbre-normalized vector-quantized feature." IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, vol. 31, pp. 3446-3456.

UniCATS: A unified context-aware text-to-speech framework with contextual vq-diffusion and vocoding

Published in AAAI, 2024

This paper proposes a context-aware TTS system with strong zero-shot TTS and speech editing abilities, using the contextual token vocoder CTX-vec2wav and the discrete diffusion-based CTX-txt2vec.

Recommended citation: Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen, Shuai Wang, Hui Zhang, Kai Yu. (2024). "UniCATS: A unified context-aware text-to-speech framework with contextual vq-diffusion and vocoding." In Proc. AAAI, 2024, vol. 38, no. 16, pp. 17924-17932.

VoiceFlow: Efficient text-to-speech with rectified flow matching

Published in IEEE ICASSP, 2024

This paper applies the rectified flow matching algorithm to improve the efficiency of TTS systems in the differential-equation family (e.g., diffusion and flow matching).

Recommended citation: Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu. (2024). "VoiceFlow: Efficient text-to-speech with rectified flow matching." In Proc. IEEE ICASSP, 2024, pp. 11121-11125.

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec

Published in ISCA Interspeech, 2025

This paper proposes LSCodec, a low-bitrate (50 Hz/0.45 kbps and 25 Hz/0.25 kbps, single codebook), speaker-decoupled discrete speech codec.

Recommended citation: Yiwei Guo, Zhihan Li, Chenpeng Du, Hankun Wang, Xie Chen, Kai Yu. (2025). "LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec." In Proc. ISCA Interspeech, 2025, pp. 5018-5022. doi: 10.21437/Interspeech.2025-1106

Recent Advances in Discrete Speech Tokens: A Review

Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

This review provides a comprehensive, in-depth summary and analysis of recent discrete speech tokenization methods.

Recommended citation: Yiwei Guo, Zhihan Li, Hankun Wang, Bohan Li, Chongtian Shao, Hanglei Zhang, Chenpeng Du, Xie Chen, Shujie Liu, Kai Yu. (2025). "Recent Advances in Discrete Speech Tokens: A Review." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions

Published in AAAI, 2026

This paper proposes simply masking some attention heads in a large audio language model (LALM) to achieve reliable task specification, because selectively masking attention heads can trigger the model's specific task functionalities well.

Recommended citation: Yiwei Guo, Bohan Li, Hankun Wang, Zhihan Li, Shuai Wang, Xie Chen, Kai Yu. (2026). "AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions." In Proc. AAAI, 2026.