Gene-Ping Yang
Research Scientist at Meta.
I am currently a Research Scientist at Meta, working on speech post-training for controllable and expressive TTS.
I completed my Ph.D. in Informatics at the University of Edinburgh, where I spent a wonderful time at the Centre for Speech Technology Research (CSTR) with Prof. Hao Tang and Prof. Peter Bell. My research focuses on self-supervised pre-training, speech tokenization, and speech-text alignment to uncover the underlying patterns and geometry of speech representations.
I received my M.S. in Computer Science and B.S. in Electrical Engineering from National Taiwan University, where I built my foundation in speech and had the pleasure of working with Prof. Lin-shan Lee and Prof. Hung-yi Lee on speech separation and enhancement.
Selected Publications
Speech Representation Learning & Tokenization
- A Simple HMM with Self-Supervised Representations for Phone Segmentation
  Gene-Ping Yang, Hao Tang. SLT 2024.
- Towards Matching Phones and Speech Representations
  Gene-Ping Yang, Hao Tang. ASRU 2023.
- Autoregressive Predictive Coding: A Comprehensive Study
  Gene-Ping Yang, Sung-Lin Yeh, Yu-An Chung, James Glass, Hao Tang. JSTSP 2022.
- On-Device Constrained Self-Supervised Learning for Keyword Spotting via Quantization Aware Pre-Training and Fine-tuning
  Gene-Ping Yang, Yue Gu, Sashank Macha, Qingming Tang, Yuzong Liu. ICASSP 2024.
- On-device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation
  Gene-Ping Yang, Yue Gu, Qingming Tang, Dongsu Du, Yuzong Liu. Interspeech 2023.
Automatic Speech Recognition
- Beyond Words: Towards Effective Modeling of Non-Verbal Vocalizations in Automatic Speech Recognition
  Gene-Ping Yang, Haibin Wu, Peng Su, Ruizhe Huang, Suwon Shon, ..., Yuzong Liu. Under Review.
- Supervised Attention In Sequence-to-Sequence Models for Speech Recognition
  Gene-Ping Yang, Hao Tang. ICASSP 2022.
Speech Separation & Enhancement
- Distributed Asynchronous Device Speech Enhancement via Windowed Cross-Attention
  Gene-Ping Yang, Sebastian Braun. WASPAA 2025.
- Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training
  Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-yi Lee. Interspeech 2021.
- Interrupted and Cascaded Permutation Invariant Training for Speech Separation
  Gene-Ping Yang, Szu-Lin Wu, Yao-Wen Mao, Hung-yi Lee, Lin-shan Lee. ICASSP 2020.
- Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering
  Gene-Ping Yang, Chao-I Tuan, Hung-yi Lee, Lin-shan Lee. Interspeech 2019 (Oral).
Text-to-Speech
- T-Mimi: A Transformer-Based Mimi Decoder for Real-Time On-Phone TTS
Haibin Wu, Bach Viet Do, Naveen Suda, Julian Chan, Madhavan C R, Gene-Ping Yang, Yi-Chiao Wu, Naoyuki Kanda, Yossef Adi, Xin Lei, Yue Liu, Florian Metze, Yuzong Liu. ICASSP 2026.