NEURAL NETWORK–BASED SEARCH FOR COX-2 ACTIVE LIGANDS FROM COXIB-LIKE AND SIMILAR COMPOUNDS
Liza Tybaco Billones1*, Alex Cerbito Gonzaga1
|
|
ABSTRACT
The development of novel non-steroidal anti-inflammatory drugs (NSAIDs) free of serious side effects remains an attractive area of research. The availability of hundreds of compounds with known inhibitory activity against COX-2, the intended enzyme target of most NSAIDs, provides an excellent opportunity to explore various quantitative structure-activity relationship models and apply them in the binary classification of compounds. In this work, an artificial neural network or neural net (NN) model was constructed on a dataset consisting of 1380 compounds and 184 attributes, i.e., molecular descriptors. A feedforward NN consisting of 63 input nodes, 1 hidden layer with 33 nodes, and trained on 80% of the dataset by a backpropagation algorithm, has learned after 200 training cycles to classify compounds as active or inactive against COX-2. It has excellent predictive performance (accuracy = 93.5%, AUC = 0.97) on the 20% test set. The neural net classified 875 newly designed variants of COX-2 selective inhibitors and 163 structurally related compounds as active against the COX-2 target. The top hits have superior (or at least comparable) binding affinities compared to the control and possess the desirable properties of an oral drug.
Keywords: Molecular descriptors, NSAID, COX-2 inhibitors, Neural network, Anti-inflammatory
Introduction
The long-established non-steroidal anti-inflammatory drugs (NSAIDs) such as ibuprofen, naproxen, and indomethacin possess adverse effects that include gastrointestinal bleeding, ulceration, and perforation. This is associated with the inhibition of the cytoprotective cyclooxygenase-1 (COX-1) enzyme [1] that is expressed in many tissues. COX-1 is a housekeeping enzyme [2] that mediates important physiological functions such as the production of the mucus lining for inner stomach protection and overall tissue homeostasis [1], and thus dubbed the “good COX”. On the other hand, the inducible isozyme COX-2 is expressed largely at the sites of inflammation and is responsible for the production of pro-inflammatory prostaglandins that cause pain, fever, and inflammation [3], and hence tagged the “bad COX”. Thus, drugs that would selectively inhibit the activity of COX-2 exert analgesic, antipyretic, and anti-inflammatory effects and the side effects associated with COX-1 inhibition [4, 5].
The newer so-called “coxibs” such as celecoxib [6], etoricoxib [7], and parecoxib [8] are remarkably selective to COX-2. But even this group of NSAIDs has an array of undesirable side effects that are specific to a particular drug [9, 10]. The later discovery of COX-2 expression in kidneys, brain, and spinal cord [11, 12] indicated that the prostaglandins derived from COX-2 have beneficial effects including renal function regulation [13] and ulcer healing [14]. For this reason, some selective NSAIDs may also present side effects such as cardiovascular and renal problems [15, 16] depending on the extent of COX-2 inhibition. Thus, the discovery of new classes of anti-inflammatory compounds with optimal COX-2 risk is still an active area of research.
The ligand-based method is an established approach in drug discovery that uncovers new active chemical entities using the structural information of known active compounds [17]. It utilizes quantitative models that relate the observed biological activity to structure-derived molecular properties. The experimental bioactivity can be measured with a continuous variable like IC50 (half maximal inhibitory concentration), EC50 (half maximal effective concentration), or MIC (minimum inhibitory concentration); or a nominal variable like compound classification, i.e., active or inactive. Several statistical techniques and machine learning methods can be applied in classification tasks such as logistic regression [18], discriminant analysis [19], random forest [20], and artificial neural network [21], among others. This work developed a neural network (NN) model that was applied to the Derivatives or “coxib”-like variants, and to the Similars, another set of structurally related compounds.
Materials and Methods
The compounds included in this study were collected from literature as described elsewhere [22, 23]. These published articles on COX inhibitors covered more than two decades of research. The compounds in the dataset were assigned to either the Active or Inactive group depending on their IC50 values, that is, Active if IC50 ≤ 10 μM and Inactive if IC50 > 10 μM.
A personal computer running on Microsoft Windows 7 Professional 64-bit OS using a 3.50-GHz IntelÒ CoreÔ i7-4770K processor with 8.00-GB RAM was used in the creation, manipulation, and optimization of the molecular structures and calculation of their properties. A machine with a MacOS Catalina operating system, 3.1-GHz Dual-Core Intel Core i7 processor, and 16-GB RAM was used in the data analysis and model construction. The chemical structures were drawn using ChemDraw Professional 16.0 (www.perkinelmerinformayics.com) tool, the geometry optimization at the semi-empirical PM3 level was carried out in Spartan 16 (www.wavefun.com), and the calculation of molecular descriptors was performed in both Discovery Studio (DS) version 4.0 (Biovia, Inc.) and Spartan 16. MS Excel was used in organizing the data and RapidMiner Studio 9.7.001 (www.rapidminer.com) in Neural Net modeling and data analyses [24]. The ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties, QED (Quantitative Estimate of Druglikeness) scores, Synthetic Accessibility (SA) scores, and molecular docking data were obtained as described in our previous work [22, 23].
Two sets of compounds were generated where the hits would be determined. The set of Derivatives was composed of 1100 compounds created by performing bioisosteric replacement [25] at selected positions in 5 structures that represent the 5 largest families, i.e. cyclopentenes, imidazolyls, difluorobenzenes, furanyls/thiophenyls, and isoxazoles, of known COX-2 inhibitors [22, 23]. The similarities included the 600 top hits in a similarity search study done in SwissSimilarity (http://www.swisssimilarity.ch).
Results and Discussion
Data was first cleaned by removing duplicate compounds and descriptors that were essentially invariable or had numerous missing items. The final number of compounds was 1380 with 929 actives and 451 inactives, and the number of descriptors was 184. Subsequently, the 80-20 dataset split was applied putting aside 1104 (80%) compounds for model construction and 276 (20%) for model validation. The best NN model was then implemented in the Derivatives and the Similars.
Mimicking biological neural networks, artificial neural networks (ANN) or simply neural nets (NN) are composed of artificial neurons or nodes, which are interconnected by weights that resemble the biological synapses [26]. Excitatory connections are characterized by positive weights while the inhibitory synapses are reflections of negative weights [27]. A linear combination was formed that summed up all inputs, which are modified according to assigned or calculated weights. An activation function, which in many applications was a sigmoid function, was employed to limit the amplitude or keep the output within an acceptable range, i.e. scaled to 0 and 1, or -1 and +1. NN is an adaptive system that uses nonlinear data modeling of complex relationships between inputs and outputs like in this case, between molecular descriptors and compound classification, i.e., active or inactive.
This work applied multilayer perceptron (MLP) that develops a model utilizing a feedforward neural network and backpropagation learning technique [28]. It consists of at least three layers of nonlinearly-activating nodes that are fully connected in a feedforward way [29], that is, each node in one layer connects with a certain weight to every node in the following layer, moving in only one direction from the input nodes through the hidden nodes to the output nodes. The backpropagation algorithm is a supervised learning method involving two phases, propagation and weight update, that are repeated until a good network performance is reached [30]. This algorithm computes the value of some predefined error function by comparing the predicted outcomes with the observed values. The error is then fed back through the network by which the algorithm adjusts the weights such that the error diminishes. This process is repeated for a sufficiently large number of training cycles until this error reaches a certain minimum value and the network has learned a certain target function. In this work, the optimization was stopped when the error decreased to 1 × 10-4.
For hyperparameter tuning, the 80-20 data split was again applied to the dataset intended for model construction, so that a set of 883 compounds was used to build the models that were then tested on the remaining 221 compounds. Among the performance metrics, model specificity was considered also important in that a model should be able to identify potential dropouts and so avoid the costly high attrition rate at the later stages of the drug development. Figure 1 shows the best model constructed using 63 attributes, that allowed a maximum of 0.75 correlation coefficient among the variables. On the validation set, the corresponding best NN model had 96.4% accuracy, 97.3% sensitivity, and 94.4% specificity, outshining both the Multiple Logistic Regression [22] and Random Forest [23] models. Like in the RF study, the models with fewer input nodes, i.e., limited information, gave less accurate predictions. But the inclusion of more but very highly correlated descriptors (r > 0.75) did not further improve the model performance. The best NN model was also indicated with an excellent predictive performance by both ROC-AUC [31] and Cohen’s kappa coefficient [32] with high values of 0.977 and 0.918, respectively (Figure 1b).
|
|
a) |
b) |
Figure 1. Performance Scores of Neural Net models as a function of the correlation coefficient (r) among the independent variables: training cycle = 200, learning rate = 0.01, momentum = 0.9, number of hidden layers = 1; input data was shuffled before learning, the sigmoid function was used as activation function, and optimization was stopped at epsilon error below 1 × 10-4. |
Figure 2 shows that performance scores improve with an increase in the number of training cycles, until the 200th cycle. Setting the training cycle at 200, Figure 3 shows that performance scores fluctuate around the average values of 92.5% accuracy, 94.4% sensitivity, 88.6% specificity, 0.98 AUC, and 0.92 kappa. Nevertheless, a careful examination of the Figure unveiled the best-performing NN, that is the one with 33 nodes in the hidden layer. The effect of the size of the hidden layer was further explored, but more hidden layers beyond what is shown in the graph did not result in an improvement in the model performance.
|
|
a) |
b) |
Figure 2. Performance Scores of Neural Net models as a function of the number of training cycles: r < 0.75, learning rate = 0.01, momentum = 0.9, number of hidden layers = 1; input data was shuffled before learning, the sigmoid function was used as activation function, and optimization was stopped at epsilon error below 1 × 10-4. |
|
|
a) |
b) |
Figure 3. Performance Scores of Neural Net models as a function of the hidden layer size: r < 0.75, learning rate = 0.01, momentum = 0.9, number of hidden layers = 1; input data was shuffled before learning, the sigmoid function was used as activation function, and optimization was stopped at epsilon error below 1 × 10-4. |
Hence, the best NN model (Figure 4a) was that built from 63 input nodes with a single hidden layer of 33 nodes on 200 training runs. The model, as shown in Figure 4b, correctly predicted the classification of the compounds in the test set with 93.5% overall accuracy, 96.2% sensitivity (179 out of 186 actives), and 87.8% specificity (79 out of 90 inactives). The outstanding performance of the model was further demonstrated by the ROC curve that is very close to the top-left corner with the corresponding AUC value that is very close to 1 (0.97) indicating a very good measure of separability between active and inactive.
|
|
a) |
|
|
|
b) |
c) |
Figure 4. a) The best NN model constructed from 63 independent variables using 1 hidden layer with 33 nodes, the closer to 1 (red) the greater the confidence that the compound is active against COX-2; b) the NN model prediction of compound classification, i.e., active/inactive; c) the NN model ROC-AUC plot: training cycle = 200, learning rate = 0.01, momentum = 0.9; input data was shuffled before learning, the sigmoid function was used as activation function, and optimization was stopped at epsilon error below 1 × 10-4. |
When applied to the set of 1100 Derivatives, the NN model classified 875 as actives, i.e., orange to red data points in Figure 5. The average probability of being active 0.95 is slightly higher than the Multiple Logistic Regression (MLogR) model prediction (0.94) [22], and much higher than that of the RF model (0.69) [23]. Consistent with previous studies, nearly all (96%) cyclopentane derivatives are predicted as active.
|