Multi-task Learning using Morphological Information for End-to-end ASR

Authors

Hyeopwoo Lee, Minsoo Na, Hoon-Young Cho

Abstract

A morpheme is the smallest meaningful unit of a language, and a part-of-speech (POS) tagger can divide a word into its constituent morphemes. Conventional end-to-end (E2E) automatic speech recognition (ASR) is trained on pairs of speech and word transcriptions. In this paper, we propose a joint multi-task learning method that additionally uses morphological information, obtained by applying a POS tagger to the transcription text, as a prediction target. With a Transformer-based architecture, the proposed method achieved a relative syllable error rate (SER) reduction of 19.4% on a Korean public corpus and a relative word error rate (WER) reduction of 6.1% on an English public corpus.
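The abstract does not state how the ASR task and the morphological task are combined. One common way to realize joint multi-task learning is to interpolate the per-task losses. The sketch below assumes such a weighted sum, with a hypothetical interpolation weight `lam`, a cross-entropy over ASR output tokens, and an auxiliary cross-entropy over morphological (POS) tags produced by a tagger. It illustrates the general idea under these assumptions and is not the authors' exact objective.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood of the target index at each step."""
    if len(probs) != len(targets) or not targets:
        raise ValueError("probs and targets must be non-empty and equally long")
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


def joint_loss(
    asr_probs: Sequence[Sequence[float]],
    asr_targets: Sequence[int],
    morph_probs: Sequence[Sequence[float]],
    morph_targets: Sequence[int],
    lam: float = 0.3,  # hypothetical weight of the auxiliary morphological task
) -> float:
    """Assumed interpolated objective: (1 - lam) * L_asr + lam * L_morph."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    l_asr = cross_entropy(asr_probs, asr_targets)        # main ASR task
    l_morph = cross_entropy(morph_probs, morph_targets)  # auxiliary POS/morpheme task
    return (1.0 - lam) * l_asr + lam * l_morph


# Toy distributions: 2 ASR steps over 3 tokens, 2 tagging steps over 2 tags.
asr_p = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
morph_p = [[0.5, 0.5], [0.9, 0.1]]
print(joint_loss(asr_p, [0, 1], morph_p, [0, 0], lam=0.3))
```

Setting `lam = 0` recovers plain ASR training, so the weight directly controls how strongly the morphological targets shape the shared model.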