Síntese de voz aplicada ao português brasileiro usando aprendizado profundo (Speech synthesis applied to Brazilian Portuguese using deep learning)


Main author: Casanova, Edresson
Format: Undergraduate thesis (Trabalho de Conclusão de Curso)
Language: Portuguese
Published: Universidade Tecnológica Federal do Paraná, 2020
Online access: http://repositorio.utfpr.edu.br/jspui/handle/1/12513
Abstract: Deep artificial neural networks have been used to solve a wide range of problems. In particular, this methodology has substantially advanced the state of the art in speech synthesis. In this work we explore the state of the art of speech synthesis for Brazilian Portuguese. To do so, it was necessary to create an audio dataset containing approximately 10 hours of speech from a single speaker in the language. Deep neural networks are formed by a number of nodes, or units, connected by links; these nodes represent artificial neurons and are arranged in layers connected by sets of weights. Several speech synthesis models were investigated, including DCTTS, Tacotron and Mozilla TTS. Experiments were proposed to explore the main speech synthesis models and vocoders in the literature. The results showed that the Mozilla TTS model sounds more natural and performs better than the other models explored, although the quality of the audio synthesized by the DCTTS model is very close. In addition, transfer learning from English to Portuguese was explored, and the results demonstrated advantages in applying this technique.