Audio examples
- Input accented speech: The original test audio from the L2-Arctic corpus
- Accent Conversion and Improving Pronunciation: The output audio converted by the non-streaming model and our streaming model, along with the synthetic ground truth generated by native TTS
| Input with Arabic accent | Output audio for Accent conversion and Improving Pronunciation | |||||
| Non-Streaming model | Our streaming model | Synthetic Ground-Truth | ||||
| Input with Chinese accent | Output audio for Accent conversion and Improving Pronunciation | |||||
| Non-Streaming model | Our streaming model | Synthetic Ground-Truth | ||||
| Input with Vietnamese accent | Output audio for Accent conversion and Improving Pronunciation | |||||
| Non-Streaming model | Our streaming model | Synthetic Ground-Truth | ||||
| Input with Indian accent | Output audio for Accent conversion and Improving Pronunciation | |||||
| Non-Streaming model | Our streaming model | Synthetic Ground-Truth | ||||
| Input with Korean accent | Output audio for Accent conversion and Improving Pronunciation | |||||
| Non-Streaming model | Our streaming model | Synthetic Ground-Truth | ||||
Video examples
- Input accented speech: The original videos from YouTube (noisier environments)
- Accent Conversion and Improving Pronunciation: The videos converted by the non-streaming model and our streaming model
| Input Video with Indian accent | Output Video for Accent conversion and Improving Pronunciation | |||
| Non-Streaming model | Our streaming model | |||
| Input Video with Chinese accent | Output Video for Accent conversion and Improving Pronunciation | |||
| Non-Streaming model | Our streaming model | |||
| Input Video with Vietnamese accent | Output Video for Accent conversion and Improving Pronunciation | |||
| Non-Streaming model | Our streaming model | |||