
N-Singer2: Improving the naturalness of synthesized singing voice

in the Korean pop ballad genre


Gyeong-Hoon Lee, Tae-Woo Kim, Minsu Kang

{ghlee3401, ktw0114, mskang}@ncsoft.com


Abstract

 Singing voices in the pop ballad genre tend to have more dynamic and complex components
related to a singer’s vocal techniques, such as vibrato, breathy sound, and fundamental frequency (F0) fluctuations,
than voices in the children’s song genre. Generating a natural singing voice therefore requires understanding
the characteristics of the singer. However, modeling a singing voice for pop ballads is difficult
because both linguistic content and singer identity are entangled with prosody information.
In this paper, we propose modeling natural singing voices by disentangling singer-independent
and singer-dependent features using a multi-singer system. In the variation predictor of our system,
we use a difference-based F0 for each input note pitch to model only the vibrato and F0 fluctuations
that stem from the characteristics of the singer. Our system also predicts the voiced range of a singing voice so that
pitch is not predicted in unvoiced ranges, an error that arises from incorrect musical note durations. In addition, we apply a data
augmentation method to the training dataset to generate a singing voice with more stable prosody.
Experimental results show that our system can generate a more natural singing voice than the conventional system.
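
To make the idea of a difference-based F0 concrete, the following is a minimal sketch, not the authors' implementation. It assumes that note pitches are given as MIDI note numbers, that the difference is taken on a semitone scale, and that a per-frame voiced flag is available; the function names (`midi_to_hz`, `hz_to_midi`, `difference_f0`, `reconstruct_f0`) are hypothetical and do not come from the paper.

```python
import math


def midi_to_hz(note):
    """Convert a MIDI note number to Hz (assumes A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def hz_to_midi(f0_hz):
    """Convert a positive frequency in Hz to a fractional MIDI note number."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def difference_f0(f0_hz, note_pitch, voiced):
    """Per-frame deviation of the sung F0 from the score note, in semitones.

    Unvoiced frames get 0.0, so no pitch deviation is modeled there
    (an assumption made for this sketch).
    """
    return [
        hz_to_midi(f) - n if v else 0.0
        for f, n, v in zip(f0_hz, note_pitch, voiced)
    ]


def reconstruct_f0(diff, note_pitch, voiced):
    """Add the deviation back to the note pitch; unvoiced frames become 0 Hz."""
    return [
        midi_to_hz(n + d) if v else 0.0
        for d, n, v in zip(diff, note_pitch, voiced)
    ]


# Three frames on the note A4: on pitch, one semitone sharp, and unvoiced.
f0 = [440.0, 440.0 * 2 ** (1 / 12), 0.0]
notes = [69, 69, 69]
voiced = [True, True, False]
diff = difference_f0(f0, notes, voiced)
restored = reconstruct_f0(diff, notes, voiced)
```

In this sketch the deviation carries only how far the voice departs from the written note, which is where vibrato and F0 fluctuations would appear, while the voiced flag keeps pitch from being produced in unvoiced frames.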

Structure


Contents
  1. Audio Samples in the ballad genre (Korean)
  2. Audio Samples in the children’s song genre (Korean)
Demo page of N-Singer2


1. Audio Samples in the ballad genre (Korean)


Sentence: 우우우우후 나에게
(Pronunciation): woo woo woo woo hoo na ege
N-Singer (Baseline) N-Singer2 (Ours) Ground Truth
Sentence: 사랑이라는 말 그 끝에 그대
(Pronunciation): sa lang ilaneun mal geu kkeut-e geudae
N-Singer (Baseline) N-Singer2 (Ours) Ground Truth
Sentence: 감추기만했던 내 마음 말할게요
(Pronunciation): gam chu gi man hae ddeon nae ma-eum mal hal geyo
N-Singer (Baseline) N-Singer2 (Ours) Ground Truth
Sentence: 내가 어떻게 헤아릴 수가 있을까요
(Pronunciation): nae ga eo tteo ke he a lil suga isseulkkayo
N-Singer (Baseline) N-Singer2 (Ours) Ground Truth
Sentence: 누군가의 한숨 그 무거운 숨을
(Pronunciation): nu gun ga ui han sum geu mugeoun sum eul
N-Singer (Baseline) N-Singer2 (Ours) Ground Truth


2. Audio Samples in the children’s song genre (Korean)


Sentence: 얼마나 더 울어야 제대로 사랑할까요
(Pronunciation): eolmana deo ul eoya je dae lo sa lang hal kka yo
N-Singer (Baseline) N-Singer2 (Ours) Ground Truth
Sentence: 어디도 가지 못하게 깊숙히 숨겨둘까요
(Pronunciation): eo di do gaji mo ta ge gip sug ki sum gyeo dul kkayo
N-Singer (Baseline) N-Singer2 (Ours) Ground Truth
Sentence: 달님도 쉬었다 가는 길 산노루가 넘나드는 길
(Pronunciation): dal nim do swi eo dda ganeun gil san no luga neom na deu neun gil
N-Singer (Baseline) N-Singer2 (Ours) Ground Truth
Sentence: 새로운 세상이 자꾸자꾸 보인다
(Pronunciation): sae loun se sang-i jakku jakku bo inda
N-Singer (Baseline) N-Singer2 (Ours) Ground Truth
Sentence: 오색빛이 찬란한 거리거리의 성탄빛
(Pronunciation): o saeg bichi chan lan han geo li geoli eui seong tan bit
N-Singer (Baseline) N-Singer2 (Ours) Ground Truth