Chappell G, Welsh B, Harvey S, Harris M, Wikoff D. 2017. Validation and application of a text mining tool in the identification and categorization of mechanistic data: A case study in improving problem formulation of carcinogenicity assessments. Poster presented at Society of Toxicology Annual Meeting, March 15, Baltimore, MD.


As the role of systematic literature review increases in the field of toxicology, efforts to establish methodology and best practices are ongoing. To this end, we assessed the utility of a text-mining and machine learning tool (SWIFT) in characterizing mechanistic data associated with carcinogenic endpoints using the Ten Key Characteristics of Carcinogens (TKCC) organizational approach. Objectives were to: 1) assess the utility of SWIFT in characterizing mechanistic information from peer-reviewed literature, using previously conducted systematic reviews that employed the TKCC approach, and 2) assess TKCC as a problem formulation tool via a validated text mining approach within SWIFT. Using as case studies agents that the International Agency for Research on Cancer has classified in different groups, we found that applying a text mining strategy within SWIFT aided title/abstract screening following a broad search for individual agents. This method returned an average of ~90% of the TKCC-relevant studies that were previously identified by an analyst, suggesting its utility as a problem formulation tool. We also found that optimization of the search syntax was critical; internally developed search strings identified different subsets of papers than SWIFT’s built-in carcinogen-relevant categorization (e.g., 71 vs 13 papers, respectively, out of the 100 analyst-identified studies for TKCC #8 “Modulates receptor-mediated effects” for one of the agents). This highlights the critical nature of transparency in literature identification. Regarding the second objective, the overall profiles of studies categorized according to the TKCC varied considerably across agents (e.g., each TKCC included 3-17% of all studies for one Group 2B agent, while 54% of all studies were categorized under one of two TKCC for a Group 2A agent).
Collectively, our findings demonstrate the potential utility of computational tools to support problem formulation, but ultimately underscore the importance of evaluation beyond TKCC categorization. These efforts also highlight the continued need to identify appropriate literature for chemical risk assessment comprehensively and transparently.
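
The retrieval figures above (e.g., 71 vs 13 of 100 analyst-identified studies) are recall-style proportions: the share of analyst-identified studies that a given search strategy also returned. A minimal Python sketch of that arithmetic follows. It is illustrative only and is not part of SWIFT; the function name `recall` and all study identifiers are hypothetical placeholders, and the overlap between the two hit sets is arbitrary because the abstract does not report it.

```python
def recall(tool_hits, analyst_hits):
    """Fraction of analyst-identified studies that the tool also returned."""
    analyst_hits = set(analyst_hits)
    if not analyst_hits:
        raise ValueError("analyst_hits must contain at least one study")
    return len(analyst_hits & set(tool_hits)) / len(analyst_hits)


# Hypothetical identifiers standing in for the 100 analyst-identified
# studies for TKCC #8 for one agent (real identifiers are not given).
analyst_studies = {f"study-{i}" for i in range(100)}

# Placeholder hit sets sized to match the reported counts (71 and 13).
# Which specific studies each strategy found is an assumption.
custom_string_hits = {f"study-{i}" for i in range(71)}
builtin_category_hits = {f"study-{i}" for i in range(60, 73)}

custom_recall = recall(custom_string_hits, analyst_studies)
builtin_recall = recall(builtin_category_hits, analyst_studies)

print(f"Internally developed search strings: {custom_recall:.0%}")
print(f"Built-in categorization: {builtin_recall:.0%}")
```

Computing the proportion from explicit sets, rather than from counts alone, also makes it possible to list which analyst-identified studies a strategy missed, which supports the transparent reporting of literature identification emphasized above.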