Gene-Ping Yang

Research Scientist at Meta.

I am currently a Research Scientist at Meta, working on speech post-training for controllable and expressive TTS.

I completed my Ph.D. in Informatics at the University of Edinburgh, where I spent a wonderful time at the Centre for Speech Technology Research (CSTR) working with Prof. Hao Tang and Prof. Peter Bell. My research focuses on self-supervised pre-training, speech tokenization, and speech-text alignment to uncover the underlying patterns and geometry of speech representations.

I received my M.S. in Computer Science and B.S. in Electrical Engineering from National Taiwan University, where I built my foundation in speech processing and had the pleasure of working with Prof. Lin-shan Lee and Prof. Hung-yi Lee on speech separation and enhancement.

Research Interests

  • Representation Learning: Adaptive pre-training methods for learning segment-based speech units beyond fixed-frame representations.
  • Neural Speech Tokenization: Joint segmentation and discretization methods that turn continuous audio into high-fidelity tokens for LLM integration.
  • LLM Post-Training: SFT and GRPO methods for controllable TTS, natural prosody, conversational flow, and non-verbal speech.
  • Automatic Speech Recognition: Robust ASR systems that optimize implicit speech-text alignment.

Selected Publications

Speech Representation Learning & Tokenization

  • A Simple HMM with Self-Supervised Representations for Phone Segmentation
    Gene-Ping Yang, Hao Tang. SLT 2024.
  • Towards Matching Phones and Speech Representations
    Gene-Ping Yang and Hao Tang. ASRU 2023.
  • Autoregressive Predictive Coding: A Comprehensive Study
    Gene-Ping Yang, Sung-Lin Yeh, Yu-An Chung, James Glass and Hao Tang. JSTSP 2022.
  • On-Device Constrained Self-Supervised Learning for Keyword Spotting via Quantization Aware Pre-Training and Fine-tuning
    Gene-Ping Yang, Yue Gu, Sashank Macha, Qingming Tang, Yuzong Liu. ICASSP 2024.
  • On-device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation
    Gene-Ping Yang, Yue Gu, Qingming Tang, Dongsu Du, Yuzong Liu. Interspeech 2023.

Automatic Speech Recognition

  • Beyond Words: Towards Effective Modeling of Non-Verbal Vocalizations in Automatic Speech Recognition
    Gene-Ping Yang, Haibin Wu, Peng Su, Ruizhe Huang, Suwon Shon, ..., Yuzong Liu. Under Review.
  • Supervised Attention in Sequence-to-Sequence Models for Speech Recognition
    Gene-Ping Yang and Hao Tang. ICASSP 2022.

Speech Separation & Enhancement

  • Distributed Asynchronous Device Speech Enhancement via Windowed Cross-Attention
    Gene-Ping Yang, Sebastian Braun. WASPAA 2025.
  • Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training
    Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-yi Lee. Interspeech 2021.
  • Interrupted and Cascaded Permutation Invariant Training for Speech Separation
    Gene-Ping Yang, Szu-Lin Wu, Yao-Wen Mao, Hung-yi Lee, Lin-shan Lee. ICASSP 2020.
  • Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering
    Gene-Ping Yang, Chao-I Tuan, Hung-yi Lee, Lin-shan Lee. Interspeech 2019 (Oral).

Text-to-Speech

  • T-Mimi: A Transformer-Based Mimi Decoder for Real-Time On-Phone TTS
    Haibin Wu, Bach Viet Do, Naveen Suda, Julian Chan, Madhavan C R, Gene-Ping Yang, Yi-Chiao Wu, Naoyuki Kanda, Yossef Adi, Xin Lei, Yue Liu, Florian Metze, Yuzong Liu. ICASSP 2026.