According to Google, Translatotron uses a sequence-to-sequence network model that takes a voice input, processes it as a spectrogram (a visual representation of the frequencies in the audio), and generates a new spectrogram in the target language, without first converting the speech to text. The result is a much faster translation with less likelihood of something getting lost along the way. The tool also supports an optional speaker encoder component, which helps preserve the speaker’s voice. The translated speech is still synthesized and sounds a bit robotic, but it retains some characteristics of the original voice. You can listen to samples of Translatotron’s attempts to maintain a speaker’s voice as it completes translations on Google Research’s GitHub page. Some are certainly better than others, but it’s a start.
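To make the term "spectrogram" concrete, here is a minimal sketch of how one can be computed from raw audio samples. It is not Google's code, and the article does not describe Translatotron's actual feature extraction; the function name `spectrogram`, the frame sizes, and the test tone are all assumptions chosen for illustration. It uses a deliberately naive Fourier transform from Python's standard library so it runs without extra packages.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames; each frame is a list of magnitudes per frequency bin."""
    # A Hann window tapers each frame so its edges do not smear energy across bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive discrete Fourier transform of the windowed frame.
        mags = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ]
        frames.append(mags)
    return frames


# Hypothetical input: a pure 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(1024)]

spec = spectrogram(tone)

# Average each frequency bin over time and find the strongest one.
avg = [sum(frame[k] for frame in spec) / len(spec) for k in range(len(spec[0]))]
peak_bin = max(range(len(avg)), key=avg.__getitem__)
peak_hz = peak_bin * sample_rate / 64  # bin width = sample_rate / frame_len
print(peak_hz)
```

Each row of `spec` is one short slice of time and each column is one frequency band, which is why a spectrogram can be drawn as an image. A model like the one described above would consume a grid of this kind for the source speech and emit another for the translated speech.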

Model architecture of Translatotron

Google has been fine-tuning its translations in recent months. Last year, the company introduced accents in Google Translate, letting it speak a variety of languages with region-based pronunciations, and added more languages to its real-time translation feature. Earlier this year, Google Assistant got an “interpreter mode” for smart displays and speakers that can translate between 26 languages.
