Drawing on millions of biomedical journal publications to do predictive biology

The biomedical literature captures the most current biomedical knowledge and is a tremendously rich resource for research. With over 24 million publications currently indexed in the US National Library of Medicine’s PubMed index, however, it is becoming increasingly difficult for biomedical researchers to keep up with this literature, and automated strategies for extracting information from it are required. Large-scale processing of the literature enables direct biomedical knowledge discovery. This paper introduces the use of text mining techniques to support the analysis of biological data sets, focusing on two applications supported by literature analysis: protein function prediction and the analysis of genetic variants. A review of this work suggests that methods integrating simple text analysis with more targeted relation extraction, and methods combining literature-derived information with complementary biological data, represent the most promising future directions.