Publications

Below are some selected publications. You can find a full list of my articles on my Google Scholar profile.

Conference Papers


AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions

Published in AAAI, 2026

This paper proposes masking a subset of attention heads in a large audio language model (LALM) to achieve reliable task specification. The approach works because selectively masking attention heads in an LALM can trigger its specific task functionalities.

Recommended citation: Yiwei Guo, Bohan Li, Hankun Wang, Zhihan Li, Shuai Wang, Xie Chen, Kai Yu (2026). "AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions." In Proc. AAAI, 2026.

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec

Published in ISCA Interspeech, 2025

This paper proposes LSCodec, a low-bitrate (50Hz/0.45kbps and 25Hz/0.25kbps, single codebook), speaker-decoupled discrete speech codec.

Recommended citation: Yiwei Guo, Zhihan Li, Chenpeng Du, Hankun Wang, Xie Chen, Kai Yu (2025). "LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec." Proc. Interspeech 2025, pp. 5018-5022, doi: 10.21437/Interspeech.2025-1106

UniCATS: A unified context-aware text-to-speech framework with contextual vq-diffusion and vocoding

Published in AAAI, 2024

This paper proposes a context-aware TTS system with strong zero-shot TTS and speech editing abilities, built from two components: the discrete diffusion-based CTX-txt2vec and the contextual token vocoder CTX-vec2wav.

Recommended citation: Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen, Shuai Wang, Hui Zhang, Kai Yu. (2024). "UniCATS: A unified context-aware text-to-speech framework with contextual vq-diffusion and vocoding." Proc. AAAI, 2024, vol. 38, no. 16, pp. 17924-17932.

Journal Articles


Speaker adaptive text-to-speech with timbre-normalized vector-quantized feature

Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023

This paper proposes TN-VQTTS, which leverages timbre-normalized vector-quantized acoustic features for TTS speaker adaptation with little data.

Recommended citation: Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu. (2023). "Speaker adaptive text-to-speech with timbre-normalized vector-quantized feature." IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, vol. 31, pp. 3446-3456.