CovidGraph – a COVID-19 Knowledge Graph

CovidGraph was founded in early 2020 as a nonprofit collaboration of researchers, software developers, data scientists, and medical professionals. In April 2021, CovidGraph became part of HealthECCO and will continue under this umbrella going forward.

The research and communication platform we built, starting with the CovidGraph project, currently includes more than 130,000 publications, case statistics, genes and their functions, molecular data, and more. Together, these form the basis for current and future projects within the HealthECCO initiative.

Who Is This Project Aimed At?

Our aim is to help researchers find their way through COVID-19 datasets quickly and efficiently, and to provide tools that use artificial intelligence, advanced visualization techniques, and intuitive user interfaces. These tools let users explore papers, patents, existing treatments, and medications related to the coronavirus family.

In addition to literature data, we connected information on fundamental biological entities, namely genes, proteins, and their functions, spanning a network of unparalleled size and scope.

The knowledge is primarily centered on the domain of coronaviruses but is steadily being extended to other connected diseases.

Applications

The CovidGraph project provides a growing number of applications for interacting with the data stored in the Knowledge Graph. All apps are free to use, and no registration or sign-up is needed.


Datasets

We integrate data from various sources and link them in our knowledge graph:

COVID-19 Open Research Dataset (CORD-19)

In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of over 44,000 scholarly articles, including over 29,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community.
https://pages.semanticscholar.org/coronavirus-research

The Lens COVID-19 Datasets

The Lens has assembled free and open datasets of patent documents, scholarly research metadata, and biological sequences from patents, and has deposited them in a machine-readable, explorable form.
https://about.lens.org/covid-19/

Ensembl Genome Browser

Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation, and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function, and collects disease data. Ensembl tools include BLAST, BLAT, BioMart, and the Variant Effect Predictor (VEP) for all supported species.
http://www.ensembl.org

NCBI Gene Database

Gene integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide.
https://www.ncbi.nlm.nih.gov/gene

The Gene Ontology Resource

The Gene Ontology (GO) knowledgebase is the world’s largest source of information on the functions of genes. This knowledge is both human-readable and machine-readable, and is a foundation for computational analysis of large-scale molecular biology and genetics experiments in biomedical research.
http://geneontology.org

Experimental data from clinical studies and molecular genetics

tbd

2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE

This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). It is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).
https://github.com/CSSEGISandData/COVID-19

United Nations World Population Prospects 2019

The 2019 Revision of World Population Prospects is the twenty-sixth round of official United Nations population estimates and projections that have been prepared by the Population Division of the Department of Economic and Social Affairs of the United Nations Secretariat.
https://population.un.org/wpp/

Systems Biology Models

Systems biology data is integrated from MaSyMoS (Management System for Models and Simulations), a Neo4j graph database. MaSyMoS stores computational models, simulation descriptions, and associated metadata, including a collection of COVID-19 models from BioModels.
https://masymos.readthedocs.io/en/latest/
https://www.ebi.ac.uk/biomodels/covid-19
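
The datasets above only form a knowledge graph once records from different sources are linked to each other. As a rough illustration, and not a description of the actual CovidGraph pipeline, the Python sketch below links toy publication records to toy gene records by gene symbol. The record fields (`paper_id`, `title`, `gene_mentions`, `gene_id`, `symbol`), the `Paper` and `Gene` labels, and the `MENTIONS` relationship are hypothetical names chosen for this example. The two gene IDs are the NCBI Gene identifiers for human ACE2 and TMPRSS2; the paper and the symbol "XYZ1" are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy property graph: nodes keyed by string, edges as (source, relation, target)."""
    nodes: dict[str, dict] = field(default_factory=dict)
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def add_node(self, key: str, label: str, **props) -> None:
        # Create the node on first sight, then merge in any new properties.
        self.nodes.setdefault(key, {"label": label}).update(props)

    def add_edge(self, source: str, relation: str, target: str) -> None:
        # A set makes repeated links between the same two nodes idempotent.
        self.edges.add((source, relation, target))


def build_graph(papers: list[dict], genes: list[dict]) -> Graph:
    """Link hypothetical paper records to gene records via mentioned gene symbols."""
    graph = Graph()
    symbol_to_key: dict[str, str] = {}

    for gene in genes:
        key = f"Gene:{gene['gene_id']}"
        graph.add_node(key, "Gene", symbol=gene["symbol"])
        symbol_to_key[gene["symbol"]] = key

    for paper in papers:
        key = f"Paper:{paper['paper_id']}"
        graph.add_node(key, "Paper", title=paper["title"])
        for symbol in paper["gene_mentions"]:
            # Mentions without a matching gene record are skipped, not invented.
            if symbol in symbol_to_key:
                graph.add_edge(key, "MENTIONS", symbol_to_key[symbol])

    return graph


# Toy input: gene IDs are NCBI Gene identifiers; the paper and "XYZ1" are made up.
genes = [
    {"gene_id": 59272, "symbol": "ACE2"},
    {"gene_id": 7113, "symbol": "TMPRSS2"},
]
papers = [
    {"paper_id": "p1", "title": "Toy paper on viral entry",
     "gene_mentions": ["ACE2", "TMPRSS2", "XYZ1"]},
]

graph = build_graph(papers, genes)
for edge in sorted(graph.edges):
    print(edge)
```

Matching on plain gene symbols is a simplification. A real integration would likely have to handle synonyms and species, which is one reason to key nodes on database identifiers rather than on symbols.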