Contributions to the study of the protein folding problem using deep learning and molecular dynamics
The Protein Folding Problem (PFP) is one of the main challenges in Computational Biology. Globular proteins are believed to evolve from random initial conformations through folding pathways, reaching, in almost all cases, a functional native structure. Studies of the folding process are...
Main author: | Hattori, Leandro Takeshi |
---|---|
Format: | Thesis |
Language: | English |
Published: |
Universidade Tecnológica Federal do Paraná
2021
|
Subjects: | |
Online access: |
http://repositorio.utfpr.edu.br/jspui/handle/1/24963 |
id |
riut-1-24963 |
---|---|
recordtype |
dspace |
spelling |
riut-1-24963 2021-05-17T06:11:27Z Contributions to the study of the protein folding problem using deep learning and molecular dynamics Contribuições para o estudo do problema de dobramento de proteínas usando métodos de aprendizado profundo e dinâmica molecular Hattori, Leandro Takeshi Lopes, Heitor Silverio https://orcid.org/0000-0003-3984-1432 http://lattes.cnpq.br/4045818083957064 Benitez, Cesar Manuel Vargas https://orcid.org/0000-0002-5691-5432 http://lattes.cnpq.br/3930929146154435 Britto Junior, Alceu de Souza https://orcid.org/0000-0002-3064-3563 http://lattes.cnpq.br/4251936710939364 Lopes, Fabricio Martins http://orcid.org/0000-0002-8786-3313 http://lattes.cnpq.br/1660070580824436 Lopes, Heitor Silverio https://orcid.org/0000-0003-3984-1432 http://lattes.cnpq.br/4045818083957064 Frigori, Rafael Bertolini https://orcid.org/0000-0002-4861-7240 http://lattes.cnpq.br/5836878566801544 Parpinelli, Rafael Stubs https://orcid.org/0000-0001-7326-5032 http://lattes.cnpq.br/4456007001373501 Proteins Molecular dynamics Computational biology High performance computing Computational molecular biology Proteomics - Data processing Computer simulation CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO Electrical Engineering
The Protein Folding Problem (PFP) is one of the main challenges in Computational Biology. Globular proteins are believed to evolve from random initial conformations through folding pathways, reaching, in almost all cases, a functional native structure. Studies of the folding process are related to several abnormal events, such as misfolding and protein aggregation. Therefore, several computational approaches have been proposed in the literature for this problem. Deep Learning (DL) methods have been highlighted in Proteomics studies, given their ability to extract feature vectors and their efficiency after the training process. Recurrent Neural Networks (RNNs) are cyclic DL methods that have achieved state-of-the-art performance on sequential and temporal problems. Therefore, this thesis presents contributions to studying the spatial-temporal pathways of protein folding using RNN methods. To achieve these contributions, the experiments of this thesis were organized in three steps: develop a framework to generate a massive amount of protein folding data using purely sequential and parallel Molecular Dynamics (MD) methods in the canonical ensemble; propose a Neighbourhood List (NL) approach for the parallel MD method; apply RNNs to the PFP. In the first step, we presented a package called PathMolD-AB to simulate and analyze folding trajectory data using the 3D-AB off-lattice model to represent the protein structure. The datasets generated with PathMolD-AB correspond to the MD evolution of 3,500 folding pathways, encompassing 35×10⁶ states. The speedup analysis showed that the parallel approach obtained faster simulations for protein sequences with more than 99 amino acids. In the second step, the NL approach with parallel MD showed a higher speedup than the purely parallel MD version for protein sequences of 99 to 1,000 amino acids, a range that covers 80% of the entire Protein Data Bank (PDB). In the last step of this thesis, a comparative analysis of RNN architectures was carried out using the many-to-one model with datasets generated by PathMolD-AB. Results indicate that the Long Short-Term Memory (LSTM) network obtained better performance than the other RNN architectures in terms of prediction error. The biological analysis indicated that the LSTM predicted structures with features similar to the target (MD) in terms of hydrophobic and polar compactness, as well as torsion and bond energies, suggesting that this approach is promising for the study of the PFP.
2021-05-16T20:44:31Z 2021-05-16T20:44:31Z 2020-11-30 doctoralThesis HATTORI, Leandro Takeshi. Contributions to the study of the protein folding problem using deep learning and molecular dynamics. 2020. Thesis (Doctorate in Electrical Engineering and Industrial Informatics) - Universidade Tecnológica Federal do Paraná, Curitiba, 2020. http://repositorio.utfpr.edu.br/jspui/handle/1/24963 eng openAccess http://creativecommons.org/licenses/by/4.0/ application/pdf Universidade Tecnológica Federal do Paraná Curitiba Brazil Programa de Pós-Graduação em Engenharia Elétrica e Informática Industrial UTFPR |
institution |
Universidade Tecnológica Federal do Paraná |
collection |
RIUT |
language |
English |
topic |
Proteins; Molecular dynamics; Computational biology; High performance computing; Computational molecular biology; Proteomics - Data processing; Computer simulation; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO; Electrical Engineering |
spellingShingle |
Proteins; Molecular dynamics; Computational biology; High performance computing; Computational molecular biology; Proteomics - Data processing; Computer simulation; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO; Electrical Engineering; Hattori, Leandro Takeshi; Contributions to the study of the protein folding problem using deep learning and molecular dynamics |
description |
The Protein Folding Problem (PFP) is one of the main challenges in Computational Biology. Globular proteins are believed to evolve from random initial conformations through folding pathways, reaching, in almost all cases, a functional native structure. Studies of the folding process are related to several abnormal events, such as misfolding and protein aggregation. Therefore, several computational approaches have been proposed in the literature for this problem. Deep Learning (DL) methods have been highlighted in Proteomics studies, given their ability to extract feature vectors and their efficiency after the training process. Recurrent Neural Networks (RNNs) are cyclic DL methods that have achieved state-of-the-art performance on sequential and temporal problems. Therefore, this thesis presents contributions to studying the spatial-temporal pathways of protein folding using RNN methods. To achieve these contributions, the experiments of this thesis were organized in three steps: develop a framework to generate a massive amount of protein folding data using purely sequential and parallel Molecular Dynamics (MD) methods in the canonical ensemble; propose a Neighbourhood List (NL) approach for the parallel MD method; apply RNNs to the PFP. In the first step, we presented a package called PathMolD-AB to simulate and analyze folding trajectory data using the 3D-AB off-lattice model to represent the protein structure. The datasets generated with PathMolD-AB correspond to the MD evolution of 3,500 folding pathways, encompassing 35×10⁶ states. The speedup analysis showed that the parallel approach obtained faster simulations for protein sequences with more than 99 amino acids.
In the second step, the NL approach with parallel MD showed a higher speedup than the purely parallel MD version for protein sequences of 99 to 1,000 amino acids, a range that covers 80% of the entire Protein Data Bank (PDB). In the last step of this thesis, a comparative analysis of RNN architectures was carried out using the many-to-one model with datasets generated by PathMolD-AB. Results indicate that the Long Short-Term Memory (LSTM) network obtained better performance than the other RNN architectures in terms of prediction error. The biological analysis indicated that the LSTM predicted structures with features similar to the target (MD) in terms of hydrophobic and polar compactness, as well as torsion and bond energies, suggesting that this approach is promising for the study of the PFP. |
format |
Thesis |
author |
Hattori, Leandro Takeshi |
author_sort |
Hattori, Leandro Takeshi |
title |
Contributions to the study of the protein folding problem using deep learning and molecular dynamics |
title_short |
Contributions to the study of the protein folding problem using deep learning and molecular dynamics |
title_full |
Contributions to the study of the protein folding problem using deep learning and molecular dynamics |
title_fullStr |
Contributions to the study of the protein folding problem using deep learning and molecular dynamics |
title_full_unstemmed |
Contributions to the study of the protein folding problem using deep learning and molecular dynamics |
title_sort |
contributions to the study of the protein folding problem using deep learning and molecular dynamics |
publisher |
Universidade Tecnológica Federal do Paraná |
publishDate |
2021 |
citation |
HATTORI, Leandro Takeshi. Contributions to the study of the protein folding problem using deep learning and molecular dynamics. 2020. Thesis (Doctorate in Electrical Engineering and Industrial Informatics) - Universidade Tecnológica Federal do Paraná, Curitiba, 2020. |
url |
http://repositorio.utfpr.edu.br/jspui/handle/1/24963 |
_version_ |
1805302213183012864 |
score |
10,814766 |
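
The description above mentions a many-to-one RNN model trained on folding pathways generated by PathMolD-AB. A minimal sketch of how one pathway could be cut into many-to-one training samples is given here, assuming a fixed window of consecutive states as input and the immediately following state as the target. The function name `many_to_one_windows`, the window length, and the flattened-coordinate state layout are illustrative assumptions, not details taken from the thesis.

```python
from typing import List, Sequence, Tuple

# One conformation, stored as a flat list of coordinates (assumed layout).
State = Sequence[float]


def many_to_one_windows(
    trajectory: Sequence[State], window: int
) -> List[Tuple[List[State], State]]:
    """Pair each run of `window` consecutive states with the state that follows it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    # The last usable window must still leave one state to serve as the target.
    for start in range(len(trajectory) - window):
        inputs = list(trajectory[start:start + window])
        target = trajectory[start + window]
        pairs.append((inputs, target))
    return pairs


# Toy pathway with five one-dimensional "states" (not real folding data).
toy_pathway = [[0.0], [0.1], [0.2], [0.3], [0.4]]
samples = many_to_one_windows(toy_pathway, window=3)
```

Each element of `samples` is an (input sequence, single target) pair, which is the shape a many-to-one recurrent network consumes: many time steps in, one prediction out. A pathway shorter than or equal to the window yields no samples.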