Audio examples
- Input accented speech: The original test audio from the L2-ARCTIC corpus
- Accent Conversion and Improving Pronunciation: The audio outputs converted by the non-streaming model and our streaming model, plus the synthetic ground truth generated by native TTS
Input with Arabic accent | Output audio for Accent Conversion and Improving Pronunciation
Non-streaming model | Our streaming model | Synthetic ground truth
Input with Chinese accent | Output audio for Accent Conversion and Improving Pronunciation
Non-streaming model | Our streaming model | Synthetic ground truth
Input with Vietnamese accent | Output audio for Accent Conversion and Improving Pronunciation
Non-streaming model | Our streaming model | Synthetic ground truth
Input with Indian accent | Output audio for Accent Conversion and Improving Pronunciation
Non-streaming model | Our streaming model | Synthetic ground truth
Input with Korean accent | Output audio for Accent Conversion and Improving Pronunciation
Non-streaming model | Our streaming model | Synthetic ground truth
Video examples
- Input accented speech: The original videos from YouTube (recorded in noisier environments)
- Accent Conversion and Improving Pronunciation: The videos converted by the non-streaming model and our streaming model
Input video with Indian accent | Output video for Accent Conversion and Improving Pronunciation
Non-streaming model | Our streaming model
Input video with Chinese accent | Output video for Accent Conversion and Improving Pronunciation
Non-streaming model | Our streaming model
Input video with Vietnamese accent | Output video for Accent Conversion and Improving Pronunciation
Non-streaming model | Our streaming model