Year of Award:
Molecular & Cellular Analysis Technologies
WELSH, JOHN T
Other PI or Project Leader:
VACCINE RESEARCH INSTITUTE OF SAN DIEGO
Current methods for RNA-seq library preparation attempt to sample all sequences uniformly across every mRNA molecule, ideally with sufficient overlap to allow de novo reassembly of the mRNA sequences from which they derive or, alternatively, to allow inference of mRNA sequence by alignment with reference sequences. Genes that encode mRNAs in multiple isoforms present a challenge: even given a complete set of short sequence reads that span every exon and splice junction, certain alternative underlying mRNA isoform models cannot be deconvoluted from such data. This confounding situation occurs when more than one isoform model can explain the observed frequencies of exon and junction sequence reads, and it is mathematically unavoidable: short sequence reads do not contain the information needed to unambiguously identify the correct isoform model for certain common splicing patterns. We propose to test a method that preserves the necessary information. The goal of this method is to associate a single barcode with the multiple sequence reads derived from the same mRNA molecule, and a different barcode with the sequence reads derived from each other mRNA molecule transcribed from the same gene. This will be done by random-primed synthesis of cDNA using barcoded random primers in such a manner that each mRNA molecule is exposed to one, and only one, barcode during random-primed reverse transcription. In principle, this method produces molecule-specific collections of barcoded cDNAs which, upon high-throughput sequencing, can be aligned to reveal the specific structural details of mRNA isoforms on a molecule-by-molecule basis. This approach would solve the isoform model identifiability problem.
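The identifiability problem described above can be illustrated with a toy example. The Python sketch below is not the proposed laboratory or analysis method; it is a minimal illustration under stated assumptions: a hypothetical five-exon gene (E1 to E5) with two cassette exons (E2 and E4), idealized reads in which every exon and junction of every molecule is observed exactly once, and integer barcodes standing in for barcoded random primers. All names in it are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical gene: E2 and E4 are independent cassette exons.
FULL = ("E1", "E2", "E3", "E4", "E5")
SKIP_BOTH = ("E1", "E3", "E5")
SKIP_E2 = ("E1", "E3", "E4", "E5")
SKIP_E4 = ("E1", "E2", "E3", "E5")

# Two different isoform mixtures, one molecule of each isoform.
mix_a = [FULL, SKIP_BOTH]
mix_b = [SKIP_E2, SKIP_E4]


def features(exons):
    """Exons plus adjacent-exon junctions of one molecule (idealized reads)."""
    return list(exons) + list(zip(exons, exons[1:]))


def pooled_counts(molecules):
    """Exon and junction counts pooled over molecules, as without barcodes."""
    counts = Counter()
    for exons in molecules:
        counts.update(features(exons))
    return counts


def barcoded_reads(molecules):
    """Tag every feature of molecule i with barcode i (assumed one per molecule)."""
    return [(barcode, f)
            for barcode, exons in enumerate(molecules)
            for f in features(exons)]


def isoforms_by_barcode(reads):
    """Group reads by barcode and recover each molecule's exon chain."""
    exons_seen = defaultdict(set)
    for barcode, feature in reads:
        if isinstance(feature, str):  # exon reads; junctions are tuples
            exons_seen[barcode].add(feature)
    return Counter(tuple(sorted(e)) for e in exons_seen.values())


# Pooled short-read evidence cannot tell the two mixtures apart...
same_pooled = pooled_counts(mix_a) == pooled_counts(mix_b)
# ...but per-molecule barcodes recover the distinct isoform sets.
iso_a = isoforms_by_barcode(barcoded_reads(mix_a))
iso_b = isoforms_by_barcode(barcoded_reads(mix_b))
```

In this toy case the two mixtures yield identical pooled exon and junction counts, so no analysis of un-barcoded reads could separate them, whereas grouping by barcode recovers the exon chain of each molecule. Real data would add sampling noise, incomplete coverage, and possible barcode collisions, none of which are modeled here.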