COVID-19 Machine Learning Tool Assimilates Research Papers

April 28, 2020
Online AI tool uses text mining algorithms to scan and make sense of hundreds of new papers every day.

The volume of literature produced on the topic of COVID-19 is daunting, so much so that scientists can’t keep up and need help finding relevant papers and drawing connections among them.

Enter COVIDScholar. The search engine uses natural language processing techniques to scan, search, synthesize, draw insights and make connections.

A group of materials scientists at Lawrence Berkeley National Laboratory (Berkeley Lab), who usually spend their time researching high-performance materials for thermoelectrics or battery cathodes, built the text mining tool. Their quest to develop text and data mining techniques that can help answer high-priority questions related to COVID-19 stems from the White House’s March 16 call to action.

At the time, the COVID-19 Open Research Dataset (CORD-19), a body of scholarly literature about COVID-19, SARS-CoV-2 and the coronavirus group, was the most extensive machine-readable coronavirus literature collection available for data and text mining, with more than 29,000 articles.

Once the Berkeley Lab team set to work, its prototype was up and running within a week; after a month the tool had collected more than 61,000 research papers. About 8,000 were specifically about COVID-19, and the balance covered related topics, such as other viruses and pandemics in general. The team estimates that 200 new articles on the coronavirus are published every day. “Within 15 minutes of the paper appearing online, it will be on our website,” said Amalie Trewartha, a postdoctoral fellow who is one of the lead developers.

Ready for Public Use

The tool went live this week when the Berkeley Lab team released an upgraded version that allows the user to search for “related papers” and sort articles using machine-learning-based relevance tuning. COVIDScholar will also recommend similar abstracts and automatically sort papers in subcategories, such as testing or transmission dynamics, allowing users to do specialized searches.
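
The article does not say how COVIDScholar ranks "related papers," so the following is only a minimal sketch of one common approach: scoring abstracts against each other with TF-IDF weights and cosine similarity. The function names (`tokenize`, `tfidf_vectors`, `related`) and the sample abstracts are hypothetical and written for illustration; the team's machine-learning-based relevance tuning is likely more sophisticated.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))
    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        # Smoothed inverse document frequency keeps every weight positive.
        vectors.append({
            t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1.0)
            for t, c in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(index, docs, k=2):
    """Return the indices of the k documents most similar to docs[index]."""
    vecs = tfidf_vectors(docs)
    scored = [(cosine(vecs[index], v), i)
              for i, v in enumerate(vecs) if i != index]
    return [i for _, i in sorted(scored, reverse=True)[:k]]


# Hypothetical abstracts, invented for this example.
abstracts = [
    "Rapid antigen testing for SARS-CoV-2 in clinics",
    "Transmission dynamics of coronavirus in households",
    "PCR testing accuracy for SARS-CoV-2 samples",
    "Thermoelectric materials for waste heat recovery",
]
print(related(0, abstracts, k=1))
```

In this toy set, the abstract that shares the most distinctive terms with the first one (the other testing paper) ranks highest, which is the behavior a "related papers" feature is after.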

The developers built automated scripts to grab new papers (including preprint papers), clean them up and make them searchable. At the most basic level, COVIDScholar acts as a simple search engine, albeit a highly specialized one that its developers tout as the largest single-topic literature collection on COVID-19.
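
To make the "clean them up and make them searchable" step concrete, here is a rough sketch of such a pipeline using only the Python standard library. It is not the team's code: the `clean` function, the `PaperIndex` class and the sample records are assumptions made for illustration, and a production system would sit on a full search backend rather than an in-memory index.

```python
import html
import re
from collections import defaultdict


def clean(raw):
    """Strip markup, decode HTML entities and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


class PaperIndex:
    """A tiny in-memory inverted index mapping tokens to paper IDs."""

    def __init__(self):
        self.papers = {}
        self.postings = defaultdict(set)

    def add(self, paper_id, raw_abstract):
        """Clean an incoming abstract and index each of its tokens."""
        text = clean(raw_abstract)
        self.papers[paper_id] = text
        for token in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.postings[token].add(paper_id)

    def search(self, query):
        """Return the sorted IDs of papers containing every query token."""
        tokens = re.findall(r"[a-z0-9]+", query.lower())
        if not tokens:
            return []
        hits = set.intersection(
            *(self.postings.get(t, set()) for t in tokens)
        )
        return sorted(hits)


# Hypothetical records standing in for newly scraped papers.
index = PaperIndex()
index.add("p1", "<p>Spike protein binding</p>")
index.add("p2", "Spike mutations in a preprint")
print(index.search("spike binding"))
```

Each new paper passes through the same clean-then-index path, so adding a paper makes it searchable immediately, which is the property the developers describe.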

Next Steps

The team of artificial intelligence experts will now train its algorithms to look for unnoticed connections between concepts. “You can use the generated representations for concepts from the machine learning models to find similarities between things that don’t actually occur together in the literature, so you can find things that should be connected but haven’t been yet,” said John Dagdelen, a UC Berkeley graduate student and Berkeley Lab researcher who is one of the lead developers.
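
Dagdelen's point, that two concepts can look similar even if they never appear together, can be illustrated with a deliberately simple stand-in for learned representations. The sketch below builds a vector for each word from the words it shares sentences with, then compares vectors by cosine similarity. Plain co-occurrence counts are an assumption chosen for brevity, not the team's models, and the sentences and terms (`drug_a`, `drug_b`) are invented toy data.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def context_vectors(sentences):
    """Map each word to counts of the words it shares a sentence with."""
    vecs = defaultdict(Counter)
    for sentence in sentences:
        words = set(sentence.lower().split())
        for a, b in permutations(words, 2):
            vecs[a][b] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity between two Counter-based sparse vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented toy corpus: drug_a and drug_b never share a sentence.
corpus = [
    "drug_a inhibits protease",
    "drug_b inhibits protease",
    "drug_a reduces replication",
    "drug_b reduces replication",
    "mask limits transmission",
]
vecs = context_vectors(corpus)

never_together = vecs["drug_a"]["drug_b"] == 0
similarity = cosine(vecs["drug_a"], vecs["drug_b"])
print(never_together, round(similarity, 2))
```

Here the two drug terms never co-occur, yet their context vectors match because both appear alongside the same words. That is the kind of signal that could flag "things that should be connected but haven't been yet."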

Further on, the team plans to work with researchers in Berkeley Lab’s Environmental Genomics and Systems Biology Division and UC Berkeley’s Innovative Genomics Institute to improve COVIDScholar’s algorithms. The idea is to synthesize systems in a way that will allow researchers to discover new connections within their data, said Dagdelen.

Not From Left Field

The entire tool runs on the supercomputers of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science user facility located at Berkeley Lab. The online search engine and portal are powered by the Spin cloud platform at NERSC.

The speed with which the team was able to iterate on ideas comes down to experience. The group spent three years doing natural language processing for materials science and built a similar tool, called MatScholar, in a project supported by the Toyota Research Institute and Shell.

Last year the team published a paper in Nature that showed how an algorithm with no training in materials science could recommend materials for functional applications several years before their discovery.
