IBM’s new cloud analytics platform helps researchers extract data from patents and scientific journals
IBM has introduced a new cloud analytics application that lets researchers speed up searches of massive numbers of patents and scientific journals for information on pharmaceutical chemicals.
Search algorithms in the IBM Strategic IP Insight Platform (SIPP), announced on 8 December, extract drawings, figures and articles from scientific publications faster than humans can by laboriously sifting through pages.
IBM donated to the US National Institutes of Health (NIH) a large amount of data that the company curated using SIPP. Researchers at the NIH will use the information to discover new medications and research cures for cancer.
The data contains 12 million patents and 20 million Medline scientific abstracts. Medline is a National Library of Medicine database of biomedical and life sciences journal citations. Life sciences and consumer goods companies can also access the NIH data to research chemicals.
Universities such as Berkeley, Johns Hopkins and Stanford are interested in using the data donated to the NIH, Dr. Ying Chen, a research scientist at IBM and a developer of SIPP, told eWEEK.
IBM pulled the data from millions of patents and scientific literature published from 1976 to 2000. Scientists can search the data at the National Centre for Biotechnology Information’s PubChem site, a database that aggregates scientific data on chemical structures. PubChem allows scientists to research chemicals for new drugs, cancer treatments and consumer products.
Researchers also use PubChem to see which pharmaceutical companies may have registered patents for certain drugs, according to Chen.
Scientists traditionally had to search for chemical names in paper journals. IBM’s cloud-based platform will now help them curate data on molecules and chemicals within 24 hours of publication. In the database, each chemical name is mapped to its synonyms.
“We’ve invented a machine-curation technology that would automatically read patents and scientific literature and extract chemical names,” Chen said.
For the NIH project, IBM took pharmaceutical data from AstraZeneca, Bristol-Myers Squibb, DuPont and Pfizer.
“They really contributed their domain expertise around the chemistry, biology and drug research to develop the technologies around chemical names extraction and curation,” Chen said. “It’s their contributions that made it possible for us to be able to extract chemicals and make the information available to the public.”
IBM extracted 2.4 million chemical compounds from 4.7 million patents and 11 million biomedical journal abstracts.
The SIPP software runs on IBM’s SmartCloud software-as-a-service (SaaS) platform. SIPP speeds up automated image analysis and enhanced optical recognition of chemical images and symbols taken from patents and literature. Researchers can access this information in real time using analytics and natural language processing.
SIPP will allow the NIH as well as other organisations to build similar databases in the cloud, according to Chen.
“We have set up SIPP in a very scalable cloud model that allows us to grow our underlying data content on an ongoing basis,” Chen said. “Today we’re mining abstracts of content, and tomorrow we can analyse full articles.”
“What makes SIPP stand out in my view is the comprehensiveness of the data made available in SIPP and the ways you can look at this data and the cross-linking to other data sets out there,” Marc Nicklaus, head of the Computer-Aided Drug Design (CADD) group at the NIH’s National Cancer Institute, said in an IBM video.
SIPP will also help organisations share data and collaborate on research and development, according to Chris Moore, partner and vice president at IBM Global Business Services.
“It provides a new approach to finding and correlating critical information through the combination of underlying data, deep analytics [and] delivery via the cloud as well as customised services to help research and development organisations fundamentally change their business,” Moore said in a statement.