Extracting insights from the HealthECCO knowledge graph
Biology and medicine advance rapidly because we are able to uncover meaningful relations between key concepts, such as protein expression and RNA, controlled cell death and caspase activation, or viral infection and cytokine release by immune cells. Researchers meticulously study such relationships in controlled experiments, trying to uncover what is yet to be discovered. As always, one must be aware of what has already been done and attempted before taking on new endeavors. This is where the limits of our brain capacity begin to show: biology is simply too vast for a single person to grasp at once. Still, the most interesting discoveries often happen at the intersection of different research areas. We saw tremendous developments in biology when disillusioned physicists entered the field and brought their ideas and machinery into the labs.
Perhaps there is a way to tackle the ever-growing complexity of biological knowledge with modern mathematics and computer science? Representing the body of knowledge as a graph looks promising, and below we demonstrate what insights this approach might uncover.
The concept of a knowledge graph has been around for a long time, and it is used successfully in many industries. A knowledge graph represents data from multiple domains in one place, with explicit connections between the entities. These connections are the most valuable part, since they reveal relationships that are hard to notice when the amount of data is enormous.
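To make the idea of explicit connections concrete, here is a minimal Python sketch of a knowledge graph stored as (subject, relation, object) triples. All node names and relation types in it (`paper_1`, `gene_A`, `MENTIONS`, and so on) are made-up placeholders, not the actual Covidgraph schema. The sketch only illustrates how following connections can link entities from different domains, here a publication and a biological function.

```python
from collections import defaultdict, deque

# Toy triples (subject, relation, object).
# Every name below is a hypothetical placeholder, not real Covidgraph data.
TRIPLES = [
    ("paper_1", "MENTIONS", "gene_A"),
    ("paper_2", "MENTIONS", "gene_A"),
    ("paper_3", "MENTIONS", "gene_B"),
    ("gene_A", "CODES_FOR", "protein_A"),
    ("protein_A", "HAS_FUNCTION", "function_X"),
]


def build_adjacency(triples):
    """Index outgoing edges by subject so a node's neighbors are easy to look up."""
    adjacency = defaultdict(list)
    for subject, relation, obj in triples:
        adjacency[subject].append((relation, obj))
    return adjacency


def find_path(adjacency, start, goal):
    """Breadth-first search along edge direction.

    Returns the list of (relation, node) hops from start to goal,
    or None if goal cannot be reached.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(relation, neighbor)]))
    return None


adjacency = build_adjacency(TRIPLES)

# A publication is linked to a function only through intermediate entities.
print(find_path(adjacency, "paper_1", "function_X"))
# A publication about an unrelated gene has no such link.
print(find_path(adjacency, "paper_3", "function_X"))
```

In a real graph database the traversal is handled by the query language, so this hand-written search is only meant to show what kind of question the connections make answerable.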
In this blog post we are going to use data from a biomedical graph called Covidgraph by HealthECCO. As its name suggests, it is mostly dedicated to Covid-19, and it contains over 130,000 publications, case statistics, genes and functions, molecular data, and more. The main goal of this post is to demonstrate how even a relatively small piece of data from a big publicly available knowledge graph can be used for analysis, how that analysis can be done using graph algorithms and simple no-code tools, and what tasks can be solved using a graph representation of the data.
Covidgraph is built on the Neo4j graph database, so we are going to use Cypher statements and the Graph Data Science (GDS) library. We are also going to use KNIME as the tool for analysis and visualization of the results.