Biomedical Knowledge Discovery in the Big Data Era

Large volumes of data are generated in biomedical research every day by laboratories in both academia and industry. Generally speaking, there are two kinds of big data in biology and the biomedical field. The first is high-throughput experimental data, such as genomics data. The second is unstructured text data in the scientific literature. The availability of such data has the potential to substantially accelerate research in the biomedical sciences and drug development in the pharmaceutical industry. However, effective utilization of the data has become one of the bottlenecks in biomedical research. In this presentation, I will talk about several of our recent studies in which we either used such data to make new discoveries or developed methods to address the general challenges of big data in biomedical discovery. Specifically, I will describe our research in cancer to illustrate the potential for making new discoveries by reanalyzing public genomics data. I will also discuss our research on cross-platform normalization and on maximizing the reusability of public gene expression data, which addresses two of the four components of the FAIR data principles (Findable, Accessible, Interoperable, and Reusable). Finally, I will talk about our research in text mining, including a new AI-powered search engine for biomedical literature, the Biomedical Knowledge Discovery Engine (BioKDE), and share some of my thoughts on AI-assisted knowledge discovery.

Dr. Jinfeng Zhang
Department of Statistics
Florida State University