Hagan B, Groff L, Patlewicz G, Shah. 2025. Toward metabolic similarity in read-across: A case study using graph convolutional networks to predict genotoxicity outcomes. Chem Res Toxicol 38(6):1122-1133; doi: 10.1021/acs.chemrestox.5c00120.
Abstract
Metabolic similarity is a key consideration in evaluating candidate source analogues for read-across (RAx), but approaches to systematically characterize metabolism for read-across prediction are still evolving. Metabolic similarity is multifaceted, considering the similarity of the metabolic tree, the metabolites simulated, and the transformation pathways. The structure of metabolic trees lends itself naturally to graph representations, for which several methods, including graph convolutional networks (GCNs), can be applied to quantify the pairwise similarity between the target and source analogue(s) within an analogue or category approach. In this study, we compared graph representations of predicted metabolism against structural similarity for predicting genotoxicity outcomes, using a data set comprising 5403 chemicals. Xenobiotic metabolism pathways were predicted using the rat liver models within the commercial expert system, TIssue MEtabolism Simulator (TIMES), and the phase I and II xenobiotic metabolism modules within the freely available system BioTransformer. Metabolic pathways were converted to graphs and used to train GCNs, generating embeddings for each chemical. The classification performance of generalized read-across (GenRA), random forest (RF), logistic regression (LR), and multilayer perceptron (MLP) was compared using GCN-derived embeddings versus both Morgan and MACCS chemical fingerprints to identify genotoxic chemicals. LR on GCN embeddings, built from in vivo TIMES metabolism predictions with MACCS fingerprints as node features, achieved the highest area under the receiver operating characteristic curve (AUC), 0.807, outperforming GenRA and LR with MACCS fingerprints by 14.47% and 5.49%, respectively. Our findings suggest that GCN embeddings of predicted metabolism pathways perform substantially better than structural features of the parent chemicals in predicting genotoxicity outcomes. Such GCN embeddings offer new avenues of systematically encoding end point metabolic information to facilitate analogue identification for read-across.
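To make the idea of embedding a metabolic tree concrete, the following is a minimal, untrained sketch of one graph-convolution step followed by mean pooling. It is not the authors' architecture or code: the node names, the 3-bit node features (stand-ins for MACCS keys), the fixed weight matrix, and the choice to treat parent-to-metabolite edges as undirected are all assumptions made for illustration.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def build_adjacency(nodes, edges):
    """Symmetric adjacency matrix; edges are (parent, metabolite) pairs."""
    index = {name: i for i, name in enumerate(nodes)}
    adj = [[0.0] * len(nodes) for _ in nodes]
    for parent, child in edges:
        i, j = index[parent], index[child]
        adj[i][j] = adj[j][i] = 1.0  # assumption: treat the tree as undirected
    return adj

def gcn_layer(adj, feats, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Self-loops let each node keep its own features during aggregation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

def mean_pool(hidden):
    """Average node vectors into one graph-level embedding."""
    return [sum(col) / len(hidden) for col in zip(*hidden)]

# Hypothetical metabolic tree: a parent chemical and three predicted metabolites.
nodes = ["parent", "M1", "M2", "M3"]
edges = [("parent", "M1"), ("parent", "M2"), ("M1", "M3")]

# Toy binary node features (hypothetical stand-ins for fingerprint bits).
features = [
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
]

# Fixed 3x2 weights for illustration; a trained GCN would learn these.
weight = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]

adjacency = build_adjacency(nodes, edges)
embedding = mean_pool(gcn_layer(adjacency, features, weight))
print(embedding)
```

In a setup like the one described, a graph-level vector of this kind would be computed per chemical and passed to a downstream classifier such as LR; here the weights are fixed, so the output only illustrates the data flow.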