COVID-19 Open Research Dataset Aims to Speed Up Coronavirus Cure with 29,000+ Articles


The dataset comprises over 29,000 scholarly articles on the coronavirus family, compiled by Microsoft, the National Library of Medicine, the Chan Zuckerberg Initiative and others.

Shouvik Das
  • News18.com
  • Last Updated: March 19, 2020, 4:26 PM IST

Technology has a bigger role than ever before to play in fighting the coronavirus pandemic, and medical research is one of the key areas where technology giants can contribute significantly. With this in mind, Eric Horvitz, the chief scientific officer at Microsoft, published details of the COVID-19 Open Research Dataset (CORD-19), a collaborative database of scholarly articles presented in a machine-readable format. CORD-19 comprises over 29,000 scholarly articles on the coronavirus family, more than 13,000 of which are available in full text that machines can read and process.

CORD-19 is one of the collaborative efforts put forward by a consortium of technology giants, research institutes and scientific organisations, which in this case includes the National Library of Medicine (NLM), the Allen Institute for AI, Georgetown University, the Chan Zuckerberg Initiative, Kaggle and the White House Office of Science and Technology Policy (OSTP). The database collates many research articles that might otherwise have been difficult for scientists to procure, as the world works towards a better understanding of COVID-19 and a cure that can stop the pandemic in its tracks.

Explaining in his post, Horvitz says, "A key aspect of aggregating scientific literature into a valuable unified data resource is gaining access to the full content of articles—including permissions to analyze the content with computational tools. Many medical articles are tucked behind paywalls. Even when text is made available, publishers may not provide researchers with the rights to perform machine analysis and datamining. Much has been going on behind the scenes to open up the literature on the coronavirus family and on COVID-19 to create this kind of machine-readable resource."

Machine learning is one of the most important tools for speeding up scientific research, because it can process massive amounts of data much faster than humans and surface results that may be key to understanding the new virus strain. The coronavirus pandemic has brought the world to a standstill, with many industries facing a crisis. As of now, 159 countries are affected by COVID-19, with over 219,000 confirmed cases and nearly 9,000 deaths worldwide. In such times, machine learning and computational techniques such as natural language analysis can play a key role in helping scientists across the world, something that CORD-19 aims to support going forward.

The CORD-19 database is now live and available for access.
