An artificial intelligence that scans old scientific papers is capturing discoveries that people have missed so far.
The algorithm, which relies on the relationships between words, identifies promising candidate thermoelectric materials that are likely to be better than those currently in use.
Scientists used machine learning to uncover information hidden in old research papers. The algorithm, trained on the language of millions of those papers, has started to make new discoveries.
Researchers at Lawrence Berkeley National Laboratory described their work in an article published in Nature on July 3. They ran an algorithm called Word2Vec over old scientific papers to look for discoveries that people may have missed, and shared the results. After scanning the papers, the algorithm began to make predictions about possible thermoelectric materials.
The lead author of the study, “Unsupervised Word Embeddings Capture Latent Knowledge from Materials Science Literature,” is Vahe Tshitoyan, a Berkeley Lab postdoctoral fellow now working at Google. Along with Anubhav Jain, Berkeley Lab scientists Kristin Persson and Gerbrand Ceder helped lead the study.
Even though the algorithm made these predictions, it has no knowledge of what thermoelectricity is or of any equations that describe it. It makes predictions and picks candidates by looking only at how words are used together in the text.
Anubhav Jain, one of the researchers on the study, says that the algorithm can examine any subject. Acting much like a researcher, it can move between different subjects.
To be trained, the algorithm has so far read the abstracts of 3.3 million scientific articles. At this stage it has a vocabulary of approximately 500,000 words and has learned the relationships between those words through machine learning.
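How a model can learn relationships between words from raw text is easier to see in a small example. The sketch below assumes the skip-gram form of Word2Vec, in which each word is paired with its neighbors inside a small window and those pairs become the training data. The function name, the window size of 2, and the toy sentence are illustrative assumptions, not details taken from the study.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for words at most `window` positions apart."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical toy sentence standing in for one abstract.
abstract = "bi2te3 is a promising thermoelectric material".split()
pairs = skipgram_pairs(abstract, window=2)
```

A real model would then adjust each word's vector so that words sharing many contexts end up close together. That training step is omitted here.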
The algorithm treats words according to their relationships and the contexts in which they appear, and represents each word as a vector. As a result, chains of association can emerge, such as periodic table, nonmetals, carbon, organic molecule, DNA. What was noteworthy here were materials that the algorithm associated with the word "thermoelectric" even though no abstract had described them that way.
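One common way to turn word vectors into a list of candidates is to rank words by cosine similarity to a query word. The sketch below illustrates that idea with made-up three-dimensional vectors and placeholder material names. Real embeddings have many more dimensions, and the exact ranking procedure used in the study is not described here, so this is an assumption for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up vectors; the material names are hypothetical placeholders.
vectors = {
    "thermoelectric": [0.9, 0.1, 0.0],
    "material_a": [0.8, 0.2, 0.1],
    "material_b": [0.1, 0.9, 0.2],
    "material_c": [0.0, 0.1, 0.9],
}


def rank_candidates(query, candidates, vectors):
    """Sort candidate words from most to least similar to the query word."""
    q = vectors[query]
    return sorted(candidates, key=lambda w: cosine(q, vectors[w]), reverse=True)


ranking = rank_candidates(
    "thermoelectric", ["material_c", "material_b", "material_a"], vectors
)
```

With these toy numbers, `material_a` ranks first because its vector points in nearly the same direction as the vector for "thermoelectric".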
The researchers also trained the algorithm only on articles published up to 2009 and found that it had already pointed to a structure that was not discovered until 2012.
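This kind of look-back test can be sketched as a split by publication year: train on everything up to a cutoff, then check whether the predictions show up in later papers. The records, material names, and the simple keyword check below are hypothetical stand-ins, and treating the cutoff year as inclusive is an assumption. The study's actual evaluation is not described in this article.

```python
def split_by_year(records, cutoff_year):
    """Split (year, text) records into training texts (<= cutoff) and later texts."""
    train = [text for year, text in records if year <= cutoff_year]
    later = [text for year, text in records if year > cutoff_year]
    return train, later


def confirmed_later(predictions, later_texts):
    """Return predictions that appear next to 'thermoelectric' in a later text."""
    hits = []
    for name in predictions:
        for text in later_texts:
            words = text.split()
            if name in words and "thermoelectric" in words:
                hits.append(name)
                break  # one confirming text is enough
    return hits


# Hypothetical records: (publication year, abstract text).
records = [
    (2005, "material_x shows an unusual band structure"),
    (2008, "material_y is a known thermoelectric compound"),
    (2012, "material_x is reported as a thermoelectric material"),
]
train, later = split_by_year(records, cutoff_year=2009)

# Suppose a model trained only on `train` proposed these two candidates.
hits = confirmed_later(["material_x", "material_z"], later)
```

In this toy data, `material_x` counts as a confirmed prediction because it is only linked to "thermoelectric" in the 2012 record, which the model never saw.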
Their findings highlight the possibility of extracting knowledge and relationships from the massive body of scientific literature in a collective manner, and point towards a generalized approach to the mining of scientific literature.