East A, Wheeler M, Kennedy S. Artificial intelligence application to critical appraisal of published literature: A case example using the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) evaluation method. Poster presentation, Health and Environmental Sciences Institute (HESI) Biannual Meeting, Washington, DC, June 2025.
Abstract
Use of systematic reviews is increasing; these rigorous assessments typically require substantial resources. To date, much of the focus has been on identification and selection of literature, with fewer efforts directed at critical appraisal. Recent advances in artificial intelligence (AI) suggest that scientific manuscripts can be evaluated by a large language model (LLM). The objective of this effort was to evaluate the feasibility of leveraging AI to rapidly assess study reliability. OpenAI’s 4o-Turbo was used to assess the reliability of six full-text manuscripts on the ecotoxicity of 6PPD-Q. Specifically, the Criteria for Reporting and Evaluating Ecotoxicity Data (CRED) evaluation method was applied; CRED assesses 20 criteria related to study conduct and reporting. Ratings are assigned based on fulfillment: criterion fulfilled, criterion not fulfilled, criterion not reported, or criterion not applicable. All 20 criterion questions were analyzed by the AI, and prompt engineering was performed to calibrate the AI responses to the expert classifications. For each question, only the relevant sections of the text were provided in the context window. AI results were then compared to expert judgments. The AI model was partially concordant with expert judgment but was more likely to rate a criterion not fulfilled rather than not reported, suggesting a lack of ‘expectation’ and subject-matter expertise. However, the AI correctly identified all non-applicable criteria and was strong at determining whether good laboratory practice (GLP) conditions were used. This exploratory effort shows promise for using AI in systematic review and suggests that, with additional parameterization and prompt engineering, a more attuned study-reliability model could be developed. LLMs may enhance the speed and breadth of literature reviews, and AI models that can critically appraise papers using frameworks such as CRED could expand the scope of systematic reviews by being deployed at scale across larger bodies of literature.
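The AI-versus-expert comparison described above can be sketched as a simple concordance calculation over the four CRED rating categories. The snippet below is an illustrative sketch only, not the authors' actual pipeline; the function names and the example ratings are hypothetical, and a real implementation would first obtain the AI ratings from LLM calls over the relevant manuscript sections.

```python
from collections import Counter

# CRED assigns one of four ratings to each of its 20 criteria.
RATINGS = {"fulfilled", "not fulfilled", "not reported", "not applicable"}

def concordance(ai_ratings, expert_ratings):
    """Return the fraction of criteria where the AI and expert ratings agree,
    plus a Counter of (expert, ai) disagreement pairs."""
    assert len(ai_ratings) == len(expert_ratings)
    matches = sum(a == e for a, e in zip(ai_ratings, expert_ratings))
    disagreements = Counter(
        (e, a) for a, e in zip(ai_ratings, expert_ratings) if a != e
    )
    return matches / len(ai_ratings), disagreements

# Fabricated example ratings for four criteria of one study:
ai     = ["fulfilled", "not fulfilled", "not applicable", "not fulfilled"]
expert = ["fulfilled", "not fulfilled", "not applicable", "not reported"]
rate, diffs = concordance(ai, expert)
# diffs exposes the pattern reported in the abstract: the AI rates a
# criterion "not fulfilled" where the expert rates it "not reported".
```

Tallying disagreements as (expert, AI) pairs makes systematic biases visible, such as the not-reported/not-fulfilled confusion the abstract highlights, which in turn suggests where prompt engineering should focus.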