<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD with MathML3 v1.3 20210610//EN" "JATS-archivearticle1-3-mathml3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"
  dtd-version="1.3" xml:lang="en" article-type="research-article">
  <?DTDIdentifier.IdentifierValue -//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN?>
  <?DTDIdentifier.IdentifierType public?>
  <?SourceDTD.DTDName JATS-journalpublishing1.dtd?>
  <?SourceDTD.Version 1.2?>
  <?ConverterInfo.XSLTName jats2jats3.xsl?>
  <?ConverterInfo.Version 1?>
  <?properties open_access?>
  <front>
    <journal-meta>
      <journal-id journal-id-type="iso-abbrev">Pharmacophore</journal-id>
      <journal-id journal-id-type="publisher-id">pharmacophorejournal.com</journal-id>
      <journal-id journal-id-type="publisher-id">Pharmacophore</journal-id>
      <journal-title-group>
        <journal-title>Pharmacophore</journal-title>
      </journal-title-group>
      <issn pub-type="epub">2229-5402</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">pharmacophorejournal.com-6859</article-id>
      <article-id pub-id-type="doi">10.51847/8PE7sUIyYy</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Original research</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Multimodal Foundation Model for Pharmaceutical Knowledge Extraction from Structures, Assays, Patents, and Regulatory Text</article-title>
      </title-group>
                    <contrib-group>
                      <contrib contrib-type="author">
              <name>
                <surname>Ricci</surname>
                <given-names>Paolo</given-names>
              </name>
                              <xref rid="aff1" ref-type="aff">1</xref>
                                                            <xref rid="cor1" ref-type="corresp" />
                          </contrib>
                      <contrib contrib-type="author">
              <name>
                <surname>De Luca</surname>
                <given-names>Marco</given-names>
              </name>
                              <xref rid="aff1" ref-type="aff">1</xref>
                                        </contrib>
                      <contrib contrib-type="author">
              <name>
                <surname>Ferraro</surname>
                <given-names>Giulia</given-names>
              </name>
                              <xref rid="aff2" ref-type="aff">2</xref>
                                        </contrib>
                      <contrib contrib-type="author">
              <name>
                <surname>Russo</surname>
                <given-names>Antonio</given-names>
              </name>
                              <xref rid="aff1" ref-type="aff">1</xref>
                                        </contrib>
                  </contrib-group>
                  <aff id="aff1">
            <label>1</label>Department of AI-Based Drug Discovery, Faculty of Pharmacy, University of Naples Federico II, Naples, Italy.
          </aff>
                  <aff id="aff2">
            <label>2</label>Department of Computational Pharmaceutical Systems, Faculty of Engineering, University of Bologna, Bologna, Italy.
          </aff>
                          <author-notes>
            <corresp id="cor1">
              <bold>Address for correspondence:</bold> Paolo Ricci, Department of AI-Based Drug
              Discovery, Faculty of Pharmacy, University of Naples Federico II, Naples, Italy.
                              E-mail: <email xlink:href="paolo.ricci@gmail.com">paolo.ricci@gmail.com</email>
                          </corresp>
          </author-notes>
                    <pub-date pub-type="epub">
        <day>28</day>
        <month>10</month>
        <year>2025</year>
      </pub-date>
      <volume>16</volume>
      <issue>5</issue>
      <fpage>20</fpage>
      <lpage>30</lpage>
      <permissions>
        <copyright-statement>
          Copyright: &#x000a9; 2025 Pharmacophore
        </copyright-statement>
        <copyright-year>2025</copyright-year>
        <license>
          <ali:license_ref xmlns:ali="http://www.niso.org/schemas/ali/1.0/"
            specific-use="textmining" content-type="ccbyncsalicense">
            https://creativecommons.org/licenses/by-nc-sa/4.0/</ali:license_ref>
          <license-p>This is an open access journal, and articles are distributed under the terms of
            the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows
            others to remix, tweak, and build upon the work non-commercially, as long as appropriate
            credit is given and the new creations are licensed under the identical terms.</license-p>
        </license>
      </permissions>
      <abstract>
        <title>A<sc>BSTRACT</sc></title>
        <p>Pharmaceutical research and development generates diverse knowledge spanning chemical structures, biological assay readouts, patent claims, and regulatory documents. These sources are typically curated, searched, and interpreted through separate workflows, even when they describe the same compound, target, or safety concern. No single system currently provides unified reasoning across structural chemistry, pharmacology, intellectual property, and regulatory evidence, so researchers must manually integrate information from multiple databases, document repositories, and expert interpretations. This article proposes a conceptual multimodal foundation model for pharmaceutical knowledge extraction that aligns molecules, assays, patents, and regulatory text within a shared representation space. The proposed architecture combines molecular encoders, assay-table encoders, document-text encoders, contrastive alignment modules, retrieval-augmented generation, and a conversational interface to enable evidence-grounded question answering across pharmaceutical data modalities. Such a model could help medicinal chemists, pharmacologists, regulatory scientists, and competitive-intelligence teams retrieve integrated answers that currently require separate searches, and could support drug repurposing, safety signal review, and patent landscape analysis by linking evidence across modalities. By facilitating cross-domain reasoning, it could make the synthesis of complex pharmaceutical evidence a routine and accessible capability.</p>
      </abstract>
      <kwd-group>
                <kwd>Multimodal foundation model</kwd>
                <kwd>Pharmaceutical informatics</kwd>
                <kwd>Chemical language model</kwd>
                <kwd>Regulatory text mining</kwd>
                <kwd>Patent knowledge extraction</kwd>
                <kwd>Retrieval-augmented generation</kwd>
              </kwd-group>
    </article-meta>
  </front>
</article>