Development and validation of 2-gene host-based COVID-19 diagnostic classifiers

Testing for coronavirus disease 2019 (COVID-19) has been a cornerstone of most countries' strategies for fighting the disease. Governments initially had to rely on makeshift PCR testing, a lengthy and poorly scalable process, but more standardized workflows were quickly developed, and lateral flow tests that provide point-of-care diagnostics helped further. A group of researchers from Biohub has been working on a more accurate PCR-based testing method. Their research is available on the medRxiv* preprint server.

The study

The researchers started by identifying the top-performing 2-gene candidates in an RNA-seq cohort developed in previous work, supplemented with additional samples gathered since. The final cohort included 318 patients, of whom 90 had COVID-19 and 59 had other viral acute respiratory infections (ARIs). The cohort was split into training and testing sets, with roughly equal proportions of patients with and without COVID-19 in each.

A greedy selection algorithm was used to identify the 2-gene combinations that best predicted COVID-19 status. A first gene was selected to maximize the area under the receiver operating characteristic curve (AUC) of a support vector machine (SVM) binary classifier that used the selected gene as its feature. A second gene was then selected to maximize the AUC when combined with the first gene. The best 'first' genes were the interferon-stimulated genes (ISGs) IFI6, IFI44L, and HERC6. Most of the 'second' genes were similarly related to immune and inflammatory processes and signals.
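The greedy two-step selection described above can be sketched in a few lines. This is only an illustration under stated assumptions: to stay dependency-free it swaps the SVM for a simple difference-of-class-means linear score, it scores on the same samples it fits (no cross-validation), and the gene names and expression values are invented toy data, not figures from the study.

```python
from itertools import product
from statistics import mean


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def centroid_scores(expr, genes, labels):
    """Stand-in for the SVM (an assumption): weight each gene by its class-mean difference."""
    weights = {}
    for g in genes:
        pos_mean = mean(v for v, y in zip(expr[g], labels) if y == 1)
        neg_mean = mean(v for v, y in zip(expr[g], labels) if y == 0)
        weights[g] = pos_mean - neg_mean
    return [sum(weights[g] * expr[g][i] for g in genes) for i in range(len(labels))]


def greedy_two_genes(expr, labels):
    """Pick the best single gene by AUC, then the best partner for that gene."""
    def score(genes):
        return auc(centroid_scores(expr, genes, labels), labels)

    first = max(expr, key=lambda g: score([g]))
    second = max((g for g in expr if g != first), key=lambda g: score([first, g]))
    return first, second, score([first, second])


# Hypothetical toy data: 4 COVID-19 samples, 4 non-viral ARIs, 2 other viral ARIs.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
expr = {
    "isg_like":     [8, 9, 7, 8, 1, 2, 1, 2, 8, 9],  # high in any viral infection
    "viral_marker": [2, 3, 2, 3, 2, 3, 2, 3, 9, 8],  # high only in other viral ARIs
    "noise":        [5, 4, 5, 4, 5, 4, 5, 4, 4, 5],  # uninformative
}
first, second, combined_auc = greedy_two_genes(expr, labels)
```

In this toy setup the ISG-like gene is picked first because it separates COVID-19 from the non-viral samples, and the second pick is the gene that resolves the remaining overlap with other viral infections, mirroring the division of labor the article describes.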

The performance of the top nine combinations was estimated in two ways: with 10,000 rounds each of 5-fold cross-validation within the training and testing sets, or by training on the training set and predicting on the testing set. The latter approach showed AUC values up to 0.93. The classifiers were then validated on an external, independently generated nasopharyngeal (NP) swab RNA-seq dataset acquired from colleagues in New York, where the 2-gene combinations showed very similar results.
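Repeated k-fold cross-validation of the kind mentioned above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the stratified fold assignment, the `evaluate` callback (which would train a classifier on the training indices and return a metric such as AUC on the held-out indices), and the default of 100 rounds are all assumptions made for the example.

```python
import random


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with similar class proportions in each."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def repeated_cv(labels, evaluate, k=5, rounds=100, seed=0):
    """Average a per-fold metric over `rounds` independently reshuffled k-fold splits."""
    rng = random.Random(seed)
    values = []
    for _ in range(rounds):
        for held_out in stratified_folds(labels, k, rng):
            held = set(held_out)
            train = [i for i in range(len(labels)) if i not in held]
            # `evaluate` is a hypothetical callback: fit on `train`, score on `held_out`.
            values.append(evaluate(train, held_out))
    return sum(values) / len(values)
```

Reshuffling before every round is what makes many repetitions useful: each round yields a different partition, so the averaged metric depends less on any single lucky or unlucky split.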

Generally, the first gene was sufficient to distinguish COVID-19 from non-viral ARIs, whereas the second gene was more useful for telling COVID-19 apart from other viral ARIs. IFI6 alone successfully separated COVID-19 from non-viral samples, but some viral ARI samples showed similar levels of IFI6 expression. Adding GBP5 allowed for better separation, as it is typically expressed more highly in other viral ARIs.

The researchers then refined the gene selection by considering the expression fold-change between COVID-19 and the other groups. They plotted the AUC of single-gene SVM classifiers against the fold-change of each gene between the COVID-19 and non-viral samples, averaged across both datasets. Several ISGs showed robust predictive value together with fold-changes large enough to be detected by qPCR. Plotting the AUC of classifiers that combined IFI6 with each candidate second gene against the fold-change of that second gene between COVID-19 and other viral ARIs revealed several candidates with slightly smaller fold-changes, although still detectable by PCR. Only some of these genes were among those selected by the greedy algorithm.

Four 'first' genes were selected, and their expression relative to a reference gene was measured by qPCR in swabs from a new cohort of patients with and without COVID-19. The swabs were not sequenced, so the samples without COVID-19 could be viral or non-viral. All four genes were able to assign most samples correctly as COVID-19 or not, and SVM classifiers relying on any single one of them showed significant predictive performance on their own.
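The article does not say how expression relative to the reference gene was calculated. One common convention is the delta-Ct method, sketched here as an assumption rather than as the authors' procedure; it further assumes perfect amplification efficiency (a doubling per cycle), and the Ct values in the usage lines are made up.

```python
def relative_expression(ct_target, ct_reference):
    """Delta-Ct relative expression: 2 ** -(Ct_target - Ct_reference).

    A lower Ct means the transcript was detected earlier, i.e. it is more abundant.
    """
    return 2.0 ** -(ct_target - ct_reference)


# Hypothetical Ct values for one target gene against one reference gene.
induced = relative_expression(ct_target=22.0, ct_reference=25.0)   # target crosses threshold 3 cycles earlier
baseline = relative_expression(ct_target=28.0, ct_reference=25.0)  # target crosses threshold 3 cycles later
fold_change = induced / baseline
```

With these invented numbers the induced sample comes out 64-fold higher than the baseline, the kind of difference that is comfortably measurable by qPCR.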

To test the ability to separate COVID-19 from other viruses, four 'second' genes were chosen and measured in the COVID-19 samples compared to a subset of the original sequenced viral samples. Three genes showed significantly higher expression in the non-COVID-19 samples, and the fourth showed higher expression in COVID-19 samples. All could successfully predict COVID-19 status from qPCR data.

Finally, the researchers assessed how well the method works across variants by performing qPCR for IFI6 and GBP5 on Omicron and Delta samples; COVID-19 was predicted with high likelihood in all variant samples. They also examined whether laboratory cross-contamination, a common cause of false positives in PCR tests, would affect the results. Even with simulated false positives, the predictions remained accurate.

Conclusion

This research could prove valuable for future COVID-19 testing by helping to ensure accurate results from patient samples. While further development is needed, the authors have shown that a host-gene signature is a viable and realistic approach to detecting COVID-19, and their findings could inform test manufacturers and help the industry maintain accuracy.

*Important notice

medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Journal reference:
  • Albright, J. et al. (2022) "A 2-Gene Host Signature for Improved Accuracy of COVID-19 Diagnosis Agnostic to Viral Variants". medRxiv. doi: 10.1101/2022.01.06.21268498. https://www.medrxiv.org/content/10.1101/2022.01.06.21268498v1

Written by

Sam Hancock

Sam completed his MSci in Genetics at the University of Nottingham in 2019, fuelled initially by an interest in genetic ageing. As part of his degree, he also investigated the role of rnh genes in originless replication in archaea.
