GENOME-BASED APPROACH TO PREDICT COVID-19 SURGES
Introduction
- New and emerging variants of SARS-CoV-2 continue to threaten the health of populations across the globe. As of January 2022, more than 6,000 mutations had been reported in the spike gene of SARS-CoV-2.
- Early prediction of the emergence of new strains is critical for pandemic preparedness.
- Most currently available predictive models rely on reported infections and deaths.
- Researchers have now developed Strainflow, a supervised predictive model that uses features of SARS-CoV-2 genome sequences.
Strainflow Model
- Earlier models do not use features of the virus sequences as predictors.
- Strainflow fills this gap by taking a sequence-driven approach, using a novel artificial intelligence pipeline to predict future surges.
- The study rests on a simple hypothesis: virus sequences can be treated as documents that natural language understanding (NLU) models can read like a book. These models can then discover underlying “grammar” patterns that are causally predictive of future surges.
- Thus, Strainflow is a genomic surveillance model for SARS-CoV-2 genome sequences.
- Sequences are treated as documents whose words are codons, and the skip-gram algorithm is used to learn the codon context of 0.9 million spike genes.
- The team experimented with several NLU models optimised to learn the “grammar” of the spike gene efficiently.
- The best model compressed the viral sequences into 36 dimensions, each a different “cocktail” of codon-level relationships. Some of these 36 mixtures may encode the patterns that make the virus spread faster.
- Time-series analysis of these dimensions shows that they lead monthly COVID-19 case counts in seven countries, including the USA, Japan and India.
- Machine learning models built on these features can help develop an epidemiological early-warning system for COVID-19 caseloads.
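The codon-as-word idea in the Strainflow bullets can be illustrated with a minimal Python sketch of how skip-gram training pairs are generated. It assumes reading frame 0 and a small context window; the function names `to_codons` and `skipgram_pairs` are hypothetical, the sequence is a toy fragment, and Strainflow's actual tokenisation and window size are not given in these notes. A skip-gram model (for example gensim's `Word2Vec` with `sg=1`) is then trained to predict each context codon from its centre codon, which yields one vector per codon.

```python
from typing import List, Tuple

def to_codons(seq: str) -> List[str]:
    """Split a nucleotide sequence into non-overlapping codons (reading frame 0).

    A trailing partial codon (fewer than 3 bases) is dropped.
    """
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def skipgram_pairs(codons: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (centre, context) pairs: each codon is paired with every codon
    at most `window` positions to its left or right."""
    pairs = []
    for i, centre in enumerate(codons):
        lo = max(0, i - window)
        hi = min(len(codons), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, codons[j]))
    return pairs

# Toy fragment (hypothetical input): codons are the "words", the gene is the "document".
codons = to_codons("ATGTTTGTTTTTCTT")
print(codons)                         # ['ATG', 'TTT', 'GTT', 'TTT', 'CTT']
print(skipgram_pairs(codons, 1)[:4])  # [('ATG', 'TTT'), ('TTT', 'ATG'), ('TTT', 'GTT'), ('GTT', 'TTT')]
```

A wider window captures longer-range codon context at the cost of many more training pairs per gene.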
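These notes do not describe how a whole spike gene becomes a single 36-dimensional vector. One common choice, assumed here purely for illustration, is to average (mean-pool) the vectors of a sequence's codons, then average all sequences collected in a month into one monthly feature vector. The embeddings below are random placeholders rather than learned values, and `mean_vector` and `sequence_vector` are hypothetical names.

```python
import random
from typing import Dict, List

DIM = 36  # number of dimensions reported for the best Strainflow model

def mean_vector(vectors: List[List[float]], dim: int = DIM) -> List[float]:
    """Element-wise mean of equal-length vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def sequence_vector(codons: List[str], emb: Dict[str, List[float]]) -> List[float]:
    """Mean-pool the codon vectors of one sequence (assumed aggregation).

    Codons with no embedding (e.g. ones containing ambiguous bases) are skipped.
    """
    return mean_vector([emb[c] for c in codons if c in emb])

# Placeholder embeddings: random numbers standing in for learned codon vectors.
rng = random.Random(0)
all_codons = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]
emb = {c: [rng.uniform(-1, 1) for _ in range(DIM)] for c in all_codons}

v1 = sequence_vector(["ATG", "TTT", "GTT"], emb)
v2 = sequence_vector(["ATG", "TTT", "CTT", "NNN"], emb)  # "NNN" is skipped
month_feature = mean_vector([v1, v2])  # one 36-d feature for this month
print(len(month_feature))  # 36
```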
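A “leading relationship” between a sequence feature and case counts can be checked with lagged correlation: correlate the feature in month t with cases in month t + k and see which lag k gives the strongest correlation. The sketch below does this for one dimension using Pearson correlation. The series are synthetic (cases are built to echo the feature two months later), the function names are hypothetical, and the study's exact time-series method is not specified in these notes.

```python
import math
from typing import Dict, List

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

def lagged_correlations(feature: List[float], cases: List[float],
                        max_lag: int) -> Dict[int, float]:
    """corr(feature[t], cases[t + lag]) for lag = 0..max_lag.

    A peak at lag k > 0 suggests the feature leads cases by k months.
    Both series must be monthly, aligned and of the same length.
    """
    out = {}
    for lag in range(max_lag + 1):
        out[lag] = pearson(feature[:len(feature) - lag], cases[lag:])
    return out

# Synthetic monthly data: cases echo the feature two months later.
feature = [1, 3, 2, 5, 4, 6, 8, 7, 9, 12]
cases = [5, 5] + [10 * f for f in feature[:-2]]

corrs = lagged_correlations(feature, cases, max_lag=3)
best_lag = max(corrs, key=corrs.get)
print(best_lag)  # 2
```

Correlation at a lag only shows that the feature moves ahead of cases; a claim of causal predictiveness needs a stronger test, such as a Granger-causality test.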
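The early-warning idea can be sketched as a supervised regression of future cases on the current feature value. This minimal version is hypothetical throughout: it uses a single feature, a fixed two-month lead, an ordinary least-squares line and an arbitrary alert threshold. The real model works with 36 dimensions, and its learning algorithm is not specified in these notes.

```python
from typing import List, Tuple

def fit_lagged_ols(feature: List[float], cases: List[float],
                   lag: int) -> Tuple[float, float]:
    """Least-squares line cases[t + lag] ~ a + b * feature[t] (hypothetical model)."""
    x, y = feature[:len(feature) - lag], cases[lag:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    return my - b * mx, b

def early_warning(feature: List[float], cases: List[float], lag: int,
                  threshold: float) -> Tuple[List[float], List[bool]]:
    """Forecast cases for the next `lag` months and flag those above `threshold`.

    The last `lag` feature values have no observed outcome yet, so they are
    the inputs for the forecast. Requires lag >= 1.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1 month")
    a, b = fit_lagged_ols(feature, cases, lag)
    forecasts = [a + b * f for f in feature[-lag:]]
    return forecasts, [f > threshold for f in forecasts]

# Synthetic monthly data: cases echo the feature two months later.
feature = [1, 3, 2, 5, 4, 6, 8, 7, 9, 12]
cases = [5, 5] + [10 * f for f in feature[:-2]]
forecasts, alerts = early_warning(feature, cases, lag=2, threshold=100)
print(forecasts, alerts)  # [90.0, 120.0] [False, True]
```

Because the feature leads cases, the forecast for a coming month is available before any of that month's infections are reported, which is what makes an early warning possible.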