Zorn KM, Foil DH, Lane TR, Russo DP, Hillwalker W, Feifarek DJ, Jones F, Klaren WD, Brinkman AM, Ekins S. 2020. Machine learning models for estrogen receptor bioactivity and endocrine disruption prediction. Environ Sci Technol 54(19):12202–12213, PMID: 32857505.
The U.S. Environmental Protection Agency (EPA) periodically releases in vitro data across a variety of targets, including the estrogen receptor (ER). In 2015, the EPA used these data to construct mathematical models of ER agonist and antagonist pathways to prioritize chemicals for endocrine disruption testing. However, such mathematical models require in vitro data before they can predict estrogenic activity, whereas machine learning methods are capable of prospective prediction from molecular structure alone. The current study describes the generation and evaluation of Bayesian machine learning models, grouped according to the EPA’s ER agonist pathway model and built from multiple data types with the proprietary software Assay Central. External predictions for three test sets of in vitro and in vivo reference chemicals with agonist activity classifications were compared with results from previously published mathematical models. Training data sets were also modeled with additional machine learning algorithms, and the algorithms were compared using rank-normalized scores of internal five-fold cross-validation statistics. External predictions were found to be comparable or superior to those in previous studies published by the EPA. Relative to the six additional algorithms assessed on the training data sets, Assay Central performed similarly at a reduced computational cost. This study demonstrates that machine learning can prioritize chemicals for future in vitro and in vivo testing of ER agonism.
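As a rough illustration of the internal five-fold cross-validation mentioned above, the following minimal sketch fits a Bernoulli naive Bayes classifier to binary structural fingerprints and reports per-fold accuracy. It is not the Assay Central implementation, which is proprietary and not described here. The data are synthetic, and the function names (`train_bernoulli_nb`, `predict`, `k_fold_accuracy`, `make_toy_data`), the 16-bit fingerprint length, and the bit probabilities are all assumptions chosen for illustration.

```python
import math
import random


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing (alpha)."""
    n_bits = len(X[0])
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        # Smoothed probability that each fingerprint bit is set, given the class.
        p_bit = [(sum(r[j] for r in rows) + alpha) / (n + 2 * alpha)
                 for j in range(n_bits)]
        prior = (n + alpha) / (len(X) + 2 * alpha)
        model[label] = (math.log(prior), p_bit)
    return model


def predict(model, x):
    """Return 1 (active) or 0 (inactive) by comparing class log-posteriors."""
    scores = {}
    for label, (log_prior, p_bit) in model.items():
        s = log_prior
        for bit, p in zip(x, p_bit):
            s += math.log(p if bit else 1.0 - p)
        scores[label] = s
    return 1 if scores[1] > scores[0] else 0


def k_fold_accuracy(X, y, k=5, seed=0):
    """Shuffle once, split into k folds, and score each held-out fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = train_bernoulli_nb([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


def make_toy_data(n=200, n_bits=16, seed=1):
    """Synthetic fingerprints: bits 0-3 are enriched in 'active' molecules."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = [int(rng.random() < (0.8 if (label and j < 4) else 0.2))
             for j in range(n_bits)]
        X.append(x)
        y.append(label)
    return X, y


X, y = make_toy_data()
fold_acc = k_fold_accuracy(X, y, k=5)
mean_acc = sum(fold_acc) / len(fold_acc)
print("per-fold accuracy:", [round(a, 3) for a in fold_acc])
print("mean accuracy:", round(mean_acc, 3))
```

In practice the fingerprints would come from real molecular structures and the summary statistics would typically go beyond accuracy; the sketch shows only the fold-splitting and scoring logic.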