Machine learning is expected to alleviate the low throughput and high assay cost of cell-based phenotypic screening. However, applying machine learning to sufficiently complex phenotypic screens remains challenging because of imbalanced datasets, non-linear prediction, and the unpredictability of new chemotypes. Here, we developed a prediction model based on the heat-diffusion equation (PM-HDE) to address these issues. We verified the feasibility of the algorithm for virtual compound screening using bioassay data from 946 assay systems registered in PubChem. PM-HDE was then applied to an actual screen. After supervised learning on data for about 50,000 compounds from a biological phenotypic screen using motor neurons derived from induced pluripotent stem cells of ALS patients, we performed virtual screening of >1.6 million compounds. We confirmed that PM-HDE enriched hit compounds and identified new chemotypes. This prediction model could overcome the inflexibility of machine learning, and our approach could provide a novel platform for drug discovery.
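To give a concrete sense of how a heat-diffusion equation can score untested compounds, the following is a minimal sketch of generic heat diffusion on a compound-similarity graph. It is not the PM-HDE algorithm itself, whose details are not given here. The function name `heat_diffusion_scores`, the toy similarity matrix, and the diffusion time `t` are all hypothetical choices made for illustration. Known hits act as heat sources, and heat spreads to structurally similar compounds according to du/dt = -Lu, where L is the graph Laplacian.

```python
import numpy as np

def heat_diffusion_scores(similarity, initial_heat, t=1.0):
    """Diffuse initial_heat over a similarity graph for time t.

    Solves du/dt = -L u with L = D - W (unnormalized graph Laplacian),
    whose closed-form solution is u(t) = exp(-t L) u(0).
    All names and parameters are illustrative assumptions.
    """
    W = np.asarray(similarity, dtype=float)
    W = (W + W.T) / 2.0            # enforce symmetry
    np.fill_diagonal(W, 0.0)       # drop self-loops
    L = np.diag(W.sum(axis=1)) - W
    # L is symmetric, so exp(-t L) follows from its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(L)
    kernel = eigvecs @ np.diag(np.exp(-t * eigvals)) @ eigvecs.T
    return kernel @ np.asarray(initial_heat, dtype=float)

# Toy data (hypothetical): compounds 0-2 form one cluster, 3-4 another,
# linked only by a weak 2-3 similarity.
similarity = np.array([
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.0],
    [0.8, 0.7, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0],
])
initial_heat = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # compound 0 is a known hit

scores = heat_diffusion_scores(similarity, initial_heat, t=0.5)
ranking = np.argsort(-scores)  # compounds ordered from highest to lowest score
print(scores, ranking)
```

In this sketch, compounds that are closely connected to the known hit accumulate more heat than those in the distant cluster, so ranking by score prioritizes them for follow-up. The diffusion time `t` controls how far the signal spreads and would need tuning on real data.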
All Science Journal Classification (ASJC) codes
- Decision Sciences(all)