Microsoft Launches COVID-19 Open Research Dataset

The machine-readable dataset could help win the war on this pandemic

Why This Matters:

The more we know about COVID-19, the better our chances of beating it. Pulling together disparate yet relevant datasets into one free, AI-searchable database is an important step.


The war on COVID-19 will be won with social distancing, cleanliness, and, probably, a lot of data. Microsoft led a coalition of partners in developing a new machine-readable dataset from 29,000 articles on COVID-19 and coronavirus-related ailments. On Wednesday, the company opened it to the public.

A team effort: The data was collected from the National Library of Medicine (NLM), the Allen Institute for AI, Georgetown University, the Chan Zuckerberg Initiative (from Facebook’s Mark Zuckerberg and his wife Priscilla Chan), Kaggle, and the White House Office of Science and Technology Policy (OSTP).

The secret sauce: This is the data the White House asked technology companies to tackle a week ago. Microsoft indexed and mapped thousands of articles. In a LinkedIn post on what is now being called the COVID-19 Open Research Dataset (CORD-19), the company noted that full text is available for over 13,000 of the nearly 30,000 articles.

Actually, a big deal: Because scientific articles are often hidden behind paywalls, having full-text indexing for this many of them is a significant achievement. Microsoft and its partners will apparently keep adding to that data store, presumably by getting more publishers of scientific papers to make their text freely available to the project.

What can researchers do with this? "It’s my hope that the machine-readable content will stimulate advances in computing methods that can help investigators to develop deeper understandings and approaches to addressing the COVID-19 pandemic," wrote Eric Horvitz, Technical Fellow and Chief Scientific Officer at Microsoft, in the LinkedIn post. Artificial intelligence may be able to, for example, spot patterns in the research that human readers might otherwise miss.

Bottom Line: They say knowledge is power, and this free, open resource for the global research community may also be the key to a cure.

