Pharmacophore

pharmacophorejournal.com

ISSN: 2229-5402

pharmacophorejournal.com-6853

DOI: 10.51847/vwHtDaETbQ

Original research

Regulatory Text Mining System for Pharmaceutical Quality Risk Detection from Guidelines and Deviation Reports

Bruno Martins1, Lucas Pereira1, Renata Azevedo2, Pedro Costa

1Department of Computational Pharmacology, Faculty of Pharmacy, University of Minho, Braga, Portugal.

2Department of Pharmaceutical Intelligence Systems, Faculty of Pharmacy, University of Porto, Porto, Portugal.

Address for correspondence: Prof. Wael Abu Dayyih, Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Mutah University, Al-Karak 61710, Jordan. E-mail: bruno.martins@gmail.com

28 April 2025

Volume 16, Issue 2, pp. 32-42

2026

https://creativecommons.org/licenses/by-nc-sa/4.0/

This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.

ABSTRACT

Pharmaceutical quality relies on the proactive detection of process, product, and compliance risks, yet critical signals are often hidden within unstructured sources such as regulatory guidance, deviation narratives, CAPA records, inspection observations, and other quality documents, which are typically reviewed in a fragmented manner. Traditional quality risk management depends heavily on manual document review and local expertise, making it challenging to identify recurring issues across sites, benchmark internal deviations against external regulatory expectations, or develop a comprehensive view of emerging risks. This article proposes an AI-powered regulatory text mining system that ingests regulatory guidelines, deviation reports, and CAPA records to extract risk entities and their relationships, link them to manufacturing processes, and build a queryable quality-risk knowledge graph. The framework integrates document ingestion, preprocessing, named entity recognition, relation extraction, transformer-based risk classification, knowledge graph construction, and dashboard-based decision support, with human verification to ensure interpretability, auditability, and compliance with regulatory standards. By converting scattered textual information into actionable quality-risk intelligence, the system enables quality teams to anticipate compliance gaps, prioritize CAPA activities, and respond more rapidly to evolving regulatory expectations, shifting pharmaceutical organizations from reactive documentation toward predictive, science-based quality oversight.

Keywords: Regulatory text mining, Pharmaceutical quality, Deviation reports, CAPA, Natural language processing, Knowledge graph