Algorithm Hunts for Anomalies in Gene Expression Data
Technology Networks | November 28, 2019
Computational biologists at Carnegie Mellon University have devised an algorithm that rapidly sorts through mountains of gene expression data to find unexpected phenomena that might merit further study. What's more, the algorithm then re-examines its own output, looking for mistakes it has made and correcting them.
This work by Carl Kingsford, a professor in CMU's Computational Biology Department, and Cong Ma, a Ph.D. student in computational biology, is the first attempt at automating the search for these anomalies in gene expression inferred by RNA sequencing, or RNA-seq, the leading method for estimating the activity level of genes.
The researchers report that they have already detected 88 anomalies (unexpectedly high or low levels of expression of regions within genes) in two widely used RNA-seq libraries. These anomalies are both common and not previously known. "We don't yet know why we're seeing those 88 weird patterns," Kingsford said, noting that they could be a subject of further investigation.
Though an organism's genetic makeup is static, the activity level, or expression, of genes varies greatly over time. Gene expression analysis has thus become a major tool for biological research, as well as for diagnosing and monitoring cancers.