Browse items (1 total)

  • Abstract is exactly "De novo transcriptome assembly of species without a reference genome is a common problem among researchers working in functional genomics. Although programs and routines for transcriptome assembly are continuously published, the quality of de novo assemblies built from short reads is limited by different types of error. One open issue is the lack of guidance on which methodologies to apply in each experiment to obtain a high-quality de novo assembly. It is also unclear how accurately the quality metrics evaluate assemblies, individually or comparatively, when different tools and settings are used to obtain multiple de novo transcriptome assemblies. In this context, this work analyzes the performance of the bioinformatics tools and quality metrics commonly applied to de novo transcriptome assembly using short reads (RNA-Seq). During this work, transcriptomic sequencing datasets were simulated from high-quality real data. Datasets exhibiting different degrees of complexity were used to test assembly programs and different strategies for improving the primary results. To compare and classify the assemblies obtained, we used a set of reference-dependent and reference-independent metrics. These metrics were analyzed individually and collectively through multivariate analysis. From the results, the level of alternative splicing and the fragment size of the paired-end (PE) reads were identified as the variables with the greatest influence on assembly quality. Analysis of the assemblies obtained with different values of read size (SE) and fragment size (PE) revealed sampling problems associated with the distribution of transcript sizes, exon numbers, and splicing levels. Several clustering strategies were implemented; they did not improve the final results and instead increased the levels of error and redundancy.
We also worked on the characterization and modeling of the different types of errors produced in the assemblies. We then trained different classifiers to predict the probability that a contig is correctly assembled, which can be used to filter out erroneous contigs. The results highlight the importance of obtaining assemblies with a greater number of represented genes, instead of trying to resolve all splicing isoforms by implementing clustering strategies that increase error rates."

Title: Desarrollo de estratégias bioinformáticas para el análisis genómico funcional de datos provenientes de secuenciación masiva

Output Formats

atom, csv, dc-rdf, dcmes-xml, json, omeka-xml, rss2