University of Texas at Austin

Past Event: CSEM Student Forum

Free Divergence Error Correcting DNA Barcodes

John Hawkins, CSEM, ICES, UT Austin

10 – 11AM
Friday Jan 19, 2018

POB 6.304

Abstract

Many large-scale next-generation sequencing (NGS) experiments utilize DNA barcodes (short DNA sequences prepended to DNA libraries) for identification of individuals in pooled biomolecule populations. However, DNA synthesis and sequencing errors confound the correct interpretation of observed barcodes and can lead to significant data loss or spurious results. Widely used error-correcting DNA barcodes (e.g., Hamming and Levenshtein codes) do not properly account for insertions and deletions in the DNA, the most common types of synthesis error. Here, we overcome this problem with the design and experimental validation of Filled/truncated Right End Edit (FREE) barcodes, which correct for substitution, insertion, and deletion errors, even when these errors alter the barcode length. FREE barcodes are designed with experimental considerations in mind, including balanced GC content, minimal homopolymer runs, and reduced internal hairpin propensity. We generate and include lists of barcodes with different lengths and error-correction levels that may be useful in diverse NGS applications, including >10^6 single-error-correcting 16-mers that strike a balance between decoding accuracy, barcode length, and library size. Moreover, combining two or more FREE codes into a single barcode increases the available barcode space combinatorially, which we demonstrate by finding lists with >10^15 error-correcting barcodes. Our software for creating barcode libraries and decoding sequenced barcodes is efficient and designed to be user-friendly for the general biology community.
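Two ideas in the abstract can be illustrated with a minimal Python sketch: screening candidate barcodes for GC content and homopolymer runs, and seeing why a single deletion defeats a fixed-length Hamming comparison. This is not the FREE barcode algorithm or the speaker's software. The function names, the 40-60% GC window, the maximum run length of 2, and the example sequences are all assumptions chosen for illustration.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C (assumes a non-empty sequence)."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of identical consecutive bases."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def passes_filters(seq, gc_min=0.4, gc_max=0.6, max_run=2):
    """Hypothetical screen: thresholds are illustrative, not from the talk."""
    return (gc_min <= gc_fraction(seq) <= gc_max
            and longest_homopolymer(seq) <= max_run)


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


# A made-up 8-mer barcode followed by made-up downstream library sequence.
barcode = "ACGTACGT"
read = barcode + "TTGA"

# Simulate one deletion at the second base of the barcode.
read_with_deletion = read[:1] + read[2:]

# A decoder that reads a fixed-length window now sees shifted bases,
# with the end of the window filled in from the downstream sequence.
observed = read_with_deletion[:len(barcode)]

print(passes_filters(barcode))     # screening result for the candidate
print(observed)                    # window actually read after the deletion
print(hamming(barcode, observed))  # mismatches caused by one deletion
```

In this example one deleted base shifts every later position, so the fixed-length window differs from the true barcode at most positions even though only a single edit occurred. That frame shift, together with the right end of the window being filled or truncated, is the situation the abstract says FREE barcodes are designed to handle.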

Event information

Date 10 – 11AM, Friday Jan 19, 2018
Location POB 6.304
Hosted by Ivana Escobar