Statisticians at the University of Wisconsin School of Medicine and Public Health have developed a mathematical formula that clears a major roadblock to accurately analyzing genes in single-cell samples.

Single-cell RNA sequencing allows an investigator to measure the messenger RNA expression level of every gene in every cell, and it has already enabled critical insights into embryonic development, cancer and neural diversity, according to Christina Kendziorski, professor in the Department of Biostatistics and Medical Informatics.

But a major challenge with single-cell RNA sequencing is that the technology itself introduces technical changes in the data. For many types of analyses, if these changes are not taken into account, biological signals can be obscured or distorted, she said.

In 2016, Kendziorski and her graduate student, Rhonda Bacher, first discovered this phenomenon while analyzing H1 stem cells with Li-Fang Chu and other collaborators in the lab of James Thomson at the Morgridge Institute for Research in Madison.

They noticed thousands of changes in gene expression from cell to cell.

“Hundreds of genes that were not supposed to be changing were changing,” Kendziorski said.

To identify the problem, her team took two sets of cell samples that were identical except for the number of gene-expression sequences sampled from each cell. One set contained a large number of sequences, and the other contained far fewer.

She expected to see 20 to 50 variations appearing between the two groups due to random chance, she said, but her research team was finding as many as 2,000 variations.

They discovered that standard normalization approaches were not properly adjusting the data for the number of sequences sampled.
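For readers who want a concrete picture of what such an adjustment involves, here is a minimal sketch of one common standard approach, global scaling, in which every gene count in a cell is divided by a single factor based on that cell's total number of sequences. The function name, the toy counts and the choice of the average depth as the target are all assumptions made for illustration; this is not the team's code and it is not SCnorm.

```python
def global_scale_normalize(counts, target=None):
    """Rescale each cell's gene counts by one per-cell factor.

    counts: list of cells, each a list of per-gene sequence counts.
    target: depth to rescale every cell to (assumed default: mean depth).
    """
    depths = [sum(cell) for cell in counts]  # total sequences per cell
    if target is None:
        target = sum(depths) / len(depths)
    # One factor per cell, applied identically to every gene in that cell.
    return [[c * target / d for c in cell] for cell, d in zip(counts, depths)]


# Hypothetical toy data: rows are cells, columns are genes.
deep_cell = [100, 400, 500]   # many sequences sampled
shallow_cell = [10, 40, 50]   # same proportions at one tenth the depth

normalized = global_scale_normalize([deep_cell, shallow_cell])
print(normalized)
```

In this toy case the two cells come out identical after scaling, because every gene's count shrank by the same proportion. A single per-cell factor relies on that assumption; if counts for different genes respond differently to the number of sequences sampled, the same factor cannot correct all of them.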

To address this, her team created an algorithm called SCnorm. The algorithm makes it easier to properly normalize single-cell RNA-sequencing data, which helps ensure that downstream analyses are accurate.

A paper describing SCnorm was recently published in the journal Nature Methods.