NCSOFT Speech AI

(2025) ULF-TTS: An Uncluttered Hybrid TTS System using Language and Flow Matching Models, Accepted by APSIPA ASC 2025

Demo page

(2025) When Humans Growl and Birds Speak: High-Fidelity Voice Conversion from Human to Animal and Designed Sounds, Accepted by Interspeech 2025

Demo page

(2024) MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech, Accepted by EMNLP 2024

Demo page

(2023) Synthe-Sees: Face based Text-to-Speech for Virtual Speaker, Accepted by ICASSP 2024

Demo page

(2022) Avocodo: Generative Adversarial Network for Artifact-free Vocoder, Accepted by AAAI 2023

Demo page

(2022) Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch, Accepted by Interspeech 2022

Demo page

(2022) Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Speech Synthesis, Accepted by Interspeech 2022

Demo page

(2022) Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis, Accepted by Interspeech 2022

Demo page

(2021) GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis, Accepted by Interspeech 2021

Demo page

(2021) FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis, Accepted by Interspeech 2021

Demo page

(2021) N-Singer: Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement, Accepted by Interspeech 2021

Demo page

(2021) Hierarchical Context-Aware Transformers for Non-AutoRegressive Text to Speech, Accepted by Interspeech 2021

Demo page

(2021) A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music, Accepted by ICASSP 2021

Demo page

(2020) Detecting Mismatch Between Text Script and Voice-Over Using Utterance Verification Based on Phoneme Recognition Ranking, pp. 8264-8268, ICASSP 2020

Paper & Presentation

(2020) VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network, pp. 200-204, Interspeech 2020

Paper & Demo

(2020) Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning, pp. 4402-4406, Interspeech 2020

Paper & Demo

(2020) Effective Emotion Transplantation in an End-to-End Text-to-Speech System, IEEE Access, vol. 8, pp. 161713-161719, 2020

Paper & Demo

(2020) WaveGlowGAN: the bipartite flow based vocoder with generative adversarial networks for high quality speech synthesis (Submitted)

Demo page

(2020) Improving End-to-end Korean Voice Command Recognition using Domain-specific Text (Submitted)

Demo page

(2020) Multi-task Learning using Morphological Information for End-to-end ASR (Submitted)

Demo page
