RNAplonc: um classificador para identificação de longos RNAs não codificantes em plantas

Long non-coding RNAs (lncRNAs) correspond to a non-coding RNA class that has gained emerging attention in the last years as a higher layer of regulation for gene expression in cells. There is, however, a lack of specific computational approaches to reliably predict lncRNA in plants, which contrast w...

ver descrição completa

Autor principal: Negri, Tatianne da Costa
Formato: Dissertação
Idioma: Português
Publicado em: Universidade Tecnológica Federal do Paraná 2018
Assuntos:
Acesso em linha: http://repositorio.utfpr.edu.br/jspui/handle/1/3415
Tags: Adicionar Tag
Sem tags, seja o primeiro a adicionar uma tag!
Resumo: Long non-coding RNAs (lncRNAs) correspond to a non-coding RNA class that has gained emerging attention in the last years as a higher layer of regulation for gene expression in cells. There is, however, a lack of specific computational approaches to reliably predict lncRNA in plants, which contrast with the myriad of prediction tools available for mammalian lncRNAs. Given that the biological features and mechanisms generating lncRNAs in the cell are likely different between animals and plants, specific tools for plants is a need for these studies. With this in mind, we present here RNAplonc, a classifier approach for the identification of lncRNAs in plants from mRNA-based data. To build this tool, we used publicly available lncRNA and mRNA sequences from six plant genomes: Arabidopsis thaliana, Cucumis sativus, Glycine max, Oryza sativa, Populus trichocarpa and Setaria italica. This data was extracted from the public databases PLNlncRbase, GreeNC and Phytozome, from which we used 22.543 lncRNAs and 29.960 mRNAs as a training set. We selected 16 features that could best classify lncRNAs from 5.468 features with the REPTree algorithm for lncRNA. After an extensive comparison with tools used for lncRNA identification in plants (CPC) and animals (PLEK and lncRScan-SVM), we found that RNAplonc obtained a better accuracy (92%) in the training dataset when compared to the 77% of accuracy obtained with the CPC tool. We also found that RNAplonc produced more reliable lncRNA predictions from plant transcripts, as estimated for 17 datasets in 13 species from the CANTATAdb, GreeNC and PNRD databases. We also evaluated RNAplonc performance in two case studies that identified lncRNAs from Populus tomentosa and Gossypium, respectively. RNAplonc could correctly identify 98.5% of biologically validated lncRNAs in Populus and 99.1% in Gossypium. RNAplonc, its documentation and training datasets are available at the website: http://rnaplonc.cp.utfpr.edu.br/. We can conclude that RNAplonc retrieves correctly known plant lncRNAs. Moreover, RNAplonc can be a strategy for lncRNA discovery, providing a rich resource of candidate lncRNAs specifically for plants.