Predictive QSAR models for the toxicity of Phenols
Author: Hamada Hakim*
Materials and Environment Analytical Sciences Laboratory, Larbi Ben M'hidi University - Oum El Bouaghi
B.P. 358 route de Constantine, 04000 Oum el Bouaghi, Algeria.
*Corresponding Author E-mail: hakimannaba2178@yahoo.fr, hakim.hamada@univ-oeb.dz
ABSTRACT:
Toxicity data, expressed as the 50% growth inhibitory concentration against Tetrahymena pyriformis (pCIC50 = -log CIC50), were obtained experimentally for 85 substituted phenols. Log (CIC50)-1 was modeled using the hydrophobicity, expressed as the logarithm of the 1-octanol/water partition coefficient (log Kow), together with R2u (a GETAWAY descriptor). The entire data set was randomly split into a training set (60 chemicals) used to establish the QSAR model and a test set (25 chemicals) used for external statistical validation. The model descriptors were selected from an extensive set of descriptors (topological, geometrical and quantum). The values of the statistical parameters obtained from the multiple linear regression analysis (R² = 95.5%, Q² = 95.01%, S = 0.157, F = 604.34, P = 0, SDEC = 0.153, SDEP = 0.161, Q²ext = 95.96%, SDEPext = 0.153) testify to the good fit of the model.
1. INTRODUCTION:
Quantitative structure activity relationship (QSAR) studies can help you find bioactive molecules in a more rational way.5 The ability to estimate the properties of new chemicals is the QSAR method's main success.6
The QSAR method makes use of powerful computers, molecular graphics, and sophisticated software. Attempts were made to broaden the quantitative understanding of the relationship between intrinsic, physical, chemical, biological, or molecular properties.7
2. MATERIAL AND METHODS:
A QSAR model was reported for studying the antioxidant activity of a series of phenols using descriptors calculated with the Dragon8 (version 5.3) and HyperChem9 (version 7.5) software packages, in addition to an empirical descriptor taken from previous studies (logP)10.
Other variable selection methods have been found to be inferior to the genetic algorithm (GA)11. Thus, variable selection was performed on the training set by maximizing the variance explained in leave-one-out cross-validation, using Todeschini's MobyDigs software12. Models were computed by ordinary least squares regression, with the subsets of explanatory variables selected by the genetic algorithm (Genetic Algorithm-Variable Subset Selection, or GA-VSS)13.
The crossover and mutation processes of the genetic algorithm in the MobyDigs software are controlled by a parameter T ranging from 0 to 1. The genetic algorithm's parameters were set as follows: Pop = 100 for the model population; T = 0.5 to balance the roles of the two processes of crossover and mutation.
The GA-VSS method yielded several good models for predicting pCIC50 based on various sets of molecular descriptors. The best model combines the R2u (R autocorrelation of lag 2 / unweighted) and logP descriptors.
Data sources:
Toxicity13
Population growth inhibition data for Tetrahymena pyriformis were taken from the literature. Toxicity is expressed as pCIC50 = -log (CIC50), where CIC50 is the concentration (in mmol/L) that reduces growth to 50% of the control after 96 h of exposure.
Definition CIC50 14
The concentration required to produce 50% inhibition of a biological activity (for example, an enzyme reaction, cell growth, or reproduction). In simpler terms, it measures how much of a particular substance or molecule is needed to inhibit some biological process by 50%. IC50 is commonly used as a measure of drug-receptor binding affinity.
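The conversion between CIC50 and pCIC50 can be illustrated with a minimal Python sketch. It assumes that "log" in pCIC50 = -log (CIC50) means the base-10 logarithm and that CIC50 is given in mmol/L; the function names are hypothetical and chosen only for this example.

```python
import math


def pcic50(cic50_mmol_per_l: float) -> float:
    """Return pCIC50 = -log10(CIC50), with CIC50 in mmol/L (base 10 assumed)."""
    if cic50_mmol_per_l <= 0:
        raise ValueError("CIC50 must be a positive concentration")
    return -math.log10(cic50_mmol_per_l)


def cic50_from_pcic50(p_value: float) -> float:
    """Invert the transformation: CIC50 (mmol/L) = 10 ** (-pCIC50)."""
    return 10.0 ** (-p_value)


if __name__ == "__main__":
    # A compound that halves growth at 0.1 mmol/L has pCIC50 = 1.
    print(pcic50(0.1))
```

On this scale a larger pCIC50 means a more toxic compound, since a lower concentration is enough to halve population growth.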
R2u (R autocorrelation of lag 2/unweighted), GETAWAY descriptors15:
GETAWAY (GEometry, Topology, and Atom-Weights AssemblY) descriptors are derived from the Molecular Influence Matrix (MIM), a matrix representation of molecules denoted by H and defined as follows [Consonni, Todeschini et al., 2002a, 2002b]:
$$\mathbf{H} = \mathbf{M}\,(\mathbf{M}^{T}\mathbf{M})^{-1}\,\mathbf{M}^{T} \qquad (1)$$
Where M denotes the molecular matrix composed of the centered Cartesian coordinates x, y, and z of the molecule atoms (including hydrogens) in a given conformation. Atomic coordinates are assumed to be calculated with respect to the geometrical center of the molecule to obtain translational invariance. The molecular influence is a symmetric AxA matrix, where A represents the number of atoms, and shows rotational invariance with respect to the molecule coordinates, thus resulting independent of molecule-alignment rules. The R indices are defined in the same way as the H indices, by using the off-diagonal elements of the influence/distance matrix R instead of the elements of the matrix H:
$$[\mathbf{R}]_{ij} = \frac{\sqrt{h_{ii}\,h_{jj}}}{r_{ij}}, \qquad i \neq j \qquad (2)$$
where hii and hjj are the leverages of atoms i and j, and rij is their geometric distance. The diagonal elements of the matrix R are zero, while each off-diagonal element i–j, resembling the single terms in the summation of the gravitational indices, is calculated as the ratio of the geometric mean of the corresponding ith and jth diagonal elements of the matrix H over the interatomic distance rij provided by the geometry matrix G.
The square-root product of the leverages of two atoms is divided by their interatomic distance. According to the basic idea that interaction between atoms in the molecule decreases as their distance increases, it appears to make less significant contributions from pairs of atoms far apart. Obviously, the largest values of the matrix elements derive from the most external atoms (i.e., those with high leverages) and simultaneously from those next to each other in the molecular space (i.e., those having small interatomic distances).
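$$R_k(w) = \sum_{i=1}^{A-1}\sum_{j>i} \frac{\sqrt{h_{ii}\,h_{jj}}}{r_{ij}}\; w_i\, w_j\; \delta(k; d_{ij}), \qquad k = 1, 2, \ldots, D \qquad (3)$$
where w is an atomic weighting scheme (u denotes the unweighted case, as in R2u), and δ(k; dij) is a Dirac delta function, equal to 1 when the topological distance dij between atoms i and j is equal to k and zero otherwise. D is the molecule's topological diameter, that is, the maximum topological distance in the molecule.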
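The construction described above (influence matrix, influence/distance matrix, then an autocorrelation at a given topological lag) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the Dragon implementation: the function names are hypothetical, a pseudo-inverse is used so that planar geometries do not make the matrix product singular, and the toy coordinates and topological distances in the example are invented.

```python
import numpy as np


def influence_matrix(coords):
    """Molecular influence matrix H = M (M^T M)^-1 M^T from centered coordinates."""
    m = np.asarray(coords, dtype=float)
    m = m - m.mean(axis=0)  # center on the geometrical center
    # Assumption: pseudo-inverse, so planar molecules (rank-deficient M) still work.
    return m @ np.linalg.pinv(m.T @ m) @ m.T


def r_matrix(coords):
    """Influence/distance matrix: sqrt(h_ii * h_jj) / r_ij off the diagonal, 0 on it."""
    xyz = np.asarray(coords, dtype=float)
    h = np.clip(np.diag(influence_matrix(xyz)), 0.0, None)
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    r = np.zeros_like(dist)
    off = ~np.eye(len(xyz), dtype=bool)
    r[off] = np.sqrt(np.outer(h, h))[off] / dist[off]
    return r


def r_autocorrelation(coords, topo_dist, lag, weights=None):
    """Sum of R elements over atom pairs (i < j) whose topological distance equals lag."""
    r = r_matrix(coords)
    w = np.ones(len(r)) if weights is None else np.asarray(weights, dtype=float)
    pairs = np.triu(np.asarray(topo_dist) == lag, k=1)  # each pair counted once
    return float(np.sum(r[pairs] * np.outer(w, w)[pairs]))


if __name__ == "__main__":
    # Invented 4-atom chain: 3D coordinates and bond-count (topological) distances.
    coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0], [3.4, 1.6, 0.9]]
    topo = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
    print(r_autocorrelation(coords, topo, lag=2))  # unweighted lag-2 value
```

With `weights=None` every atom has weight 1, which corresponds to the unweighted scheme of R2u; other schemes would pass per-atom weights.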
LogP (n-octanol/water partition coefficient, Pow)10.
The octanol/water partition coefficients P are experimental values taken from the LOGKOW databank (Sangster Research Laboratories, Montreal, Canada).
Development and Validation of QSAR Model:
Application of GA-VSS led to several good models for the prediction of pCIC50 based on different sets of molecular descriptors. The best models were constructed using the pretreated logP and R2u data, which were divided into a training set and a test set by Kennard-Stone selection: about 70% of the data set (60 compounds) forms the training set and the remaining 30% (25 compounds) forms the test set.
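For readers unfamiliar with Kennard-Stone selection, the following is a minimal sketch of the classical algorithm, assuming Euclidean distances in the (already pretreated) descriptor space. It is not the code used in this work; the function name and the toy data are hypothetical.

```python
import numpy as np


def kennard_stone(x, n_train):
    """Return (train_idx, test_idx) chosen by the Kennard-Stone max-min rule."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Start with the two most distant samples.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # For each candidate, distance to its nearest already-selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]  # farthest from the selected set
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining


if __name__ == "__main__":
    toy = [[0.0], [1.0], [2.0], [10.0], [5.0]]  # invented one-descriptor data
    print(kennard_stone(toy, n_train=3))
```

Because the training compounds are picked to span the descriptor space, the test compounds tend to fall inside the region covered by the training set. For the data set described here the call would use `n_train=60` on the 85 rows of (logP, R2u) values.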
The data is shown in the following table 1.
Table 1: Parameters of the 85 substituted phenols
| MolID | N° | Status | pCIC50 | logP | R2u |
| 2,3,6-trimethylphenol | 1 | Training | 0.418 | 2.67 | 1.778 |
| 2,3-dichlorophenol | 2 | Training | 1.271 | 3.04 | 0.803 |
| 2,4,6-tribromophenol | 3 | Training | 1.695 | 3.69 | 0.762 |
| 2,4,6-trichlorophenol | 4 | Training | 1.036 | 3.17 | 0.809 |
| 2,4-dibromophenol | 5 | Training | -0.029 | 2.3 | 1.663 |
| 2,4-dichlorophenol | 6 | Training | 1.128 | 3.14 | 0.807 |
| 2,4-dimethylphenol | 7 | Training | 0.396 | 1.65 | 0.854 |
| 2,5-dichlorophenol | 8 | Training | 0.078 | 1.92 | 1.601 |
| 2,5-dimethylphenol | 9 | Training | 0.789 | 2.91 | 1.449 |
| 2,6-difluorophenol | 10 | Training | 0.504 | 2.33 | 0.838 |
| 2-acetylphenol | 11 | Training | 0.64 | 2.85 | 1.472 |
| 2-allylphenol | 12 | Training | 0.277 | 2.15 | 0.851 |
| 2-bromo-4-methylphenol | 13 | Training | 0.031 | 1.61 | 0.778 |
| 2-bromophenol | 14 | Training | 0.176 | 2.47 | 1.626 |
| 2-chloro-5-methylphenol | 15 | Training | 0.284 | 1.76 | 0.875 |
| 2-chlorophenol | 16 | Training | 0.483 | 1.81 | 0.851 |
| 2-cyanophenol | 17 | Training | -0.252 | 1.1 | 0.802 |
| 2-ethylphenol | 18 | Training | -0.953 | 0.73 | 1.359 |
| 2-fluorophenol | 19 | Training | 0.803 | 2.88 | 1.716 |
| 2-hydroxybenzaldehyde | 20 | Training | 0.842 | 2.78 | 0.822 |
| 2-hydroxybenzaldoxime | 21 | Training | 2.573 | 5.7 | 1.265 |
| 2-hydroxybenzamide | 22 | Training | 0.93 | 3.42 | 1.704 |
| 2-hydroxybenzylalcohol | 23 | Training | 1.562 | 3.61 | 0.792 |
| 2-isopropylphenol | 24 | Training | 0.113 | 2.35 | 1.678 |
| 2-phenylphenol | 25 | Training | 0.957 | 2.5 | 0.839 |
| 2-tert-butyl-4-methylphenol | 26 | Training | -0.065 | 1.7 | 0.756 |
| 3-chloro-4-fluorophenol | 27 | Training | 0.473 | 1.93 | 0.869 |
| 3-methoxyphenol | 28 | Training | 0.085 | 1.38 | 0.835 |
| 3,4,5,6-tetrabromo-2-methylphenol | 29 | Training | 1.118 | 2.93 | 0.815 |
Table 1: Parameters of the 85 substituted phenols (Continued).
| MolID | N° | Status | pCIC50 | logP | R2u |
| 3,4,5-trimethylphenol | 30 | Training | -0.819 | 0.46 | 1.543 |
| 3,4-dimethylphenol | 31 | Training | -0.093 | 1.35 | 1.575 |
| 3,5-dichlorophenol | 32 | Training | 1.038 | 3.14 | 1.427 |
| 3,5-dimethylphenol | 33 | Training | 1.278 | 3.93 | 1.612 |
| 3-acetylphenol | 34 | Training | 0.702 | 2.9 | 1.725 |
| 3-chlorophenol | 35 | Training | 0.7 | 2.78 | 1.43 |
| 3-cyanophenol | 36 | Training | 0.795 | 3.1 | 1.458 |
| 3-ethylphenol | 37 | Training | 1.203 | 3.78 | 1.652 |
| 3-fluorophenol | 38 | Training | 0.545 | 2.39 | 0.849 |
| 3-hydroxybenzaldehyde | 39 | Training | 1.292 | 3.69 | 1.964 |
| 3-iodophenol | 40 | Training | 0.013 | 1.81 | 1.645 |
| 3-iso-propylphenol | 41 | Training | 0.206 | 2.5 | 1.606 |
| 3-methylphenol | 42 | Training | 0.017 | 1.77 | 0.873 |
| 3-phenylphenol | 43 | Training | 2.033 | 4.75 | 1.811 |
| 3-tert-butylphenol | 44 | Training | 1.779 | 3.84 | 0.751 |
| 4-acetamidophenol | 45 | Training | 1.024 | 3.07 | 1.021 |
| 4-acetylphenol | 46 | Training | -0.384 | 0.9 | 1.342 |
| 4-benzyloxyphenol | 47 | Training | 0.854 | 2.9 | 0.829 |
| 4-bromo-2,6-dimethylphenol | 48 | Training | -0.143 | 1.34 | 1.639 |
| 4-bromo-6-chloro-2-methylphenol | 49 | Training | -0.192 | 1.97 | 1.502 |
| 4-bromophenol | 50 | Training | 1.355 | 3.75 | 1.032 |
| 4-butoxyphenol | 51 | Training | 0.635 | 3.2 | 1.645 |
| 4-chloro-2-methylphenol | 52 | Training | 0.913 | 3.31 | 1.691 |
| 4-chloro-3-methylphenol | 53 | Training | 2.092 | 5.31 | 1.908 |
| 4-chloro-3,5-dimethylphenol | 54 | Training | 1.233 | 3.98 | 1.731 |
| 4-chloro-3,5-dimethylphenol | 55 | Training | 0.618 | 2.88 | 1.37 |
| 4-chlorophenol | 56 | Training | 0.478 | 2.47 | 1.656 |
| 4-cyclopentylphenol | 57 | Training | 0.572 | 2.47 | 1.649 |
| 4-ethoxyphenol | 58 | Training | -0.046 | 1.89 | 1.551 |
| 4-ethylphenol | 59 | Training | 0.084 | 1.96 | 1.561 |
| 4-fluorophenol | 60 | Training | 2.523 | 5.18 | 0.684 |
Table 1: Parameters of the 85 substituted phenols (Continued).
| MolID | N° | Status | pCIC50 | logP | R2u |
| 4-heptyloxyphenol | 61 | Test | 2.05 | 4.08 | 0.727 |
| 4-hexyloxyphenol | 62 | Test | 1.403 | 3.25 | 0.783 |
| 4-bromo-2,6-dichlorophenol | 63 | Test | 0.009 | 2.34 | 1.644 |
| 4-hydroxybenzamide | 64 | Test | 0.346 | 2.64 | 1.356 |
| 4-hydroxybenzophenone | 65 | Test | -0.242 | 1.28 | 0.888 |
| 4-hydroxybenzylcyanide | 66 | Test | 1.094 | 3.09 | 0.989 |
| 4-hydroxyphenethylalcohol | 67 | Test | 1.297 | 4.1 | 1.842 |
| 4-hydroxypropiophenone | 68 | Test | -0.145 | 1.58 | 1.623 |
| 4-iodophenol | 69 | Test | 0.122 | 2.23 | 1.622 |
| 4-iso-propylphenol | 70 | Test | -0.381 | 1.39 | 1.565 |
| 4-methoxyphenol | 71 | Test | 0.229 | 2.5 | 1.621 |
| 4-methylphenol | 72 | Test | 0.609 | 3.05 | 1.645 |
| 4-phenoxyphenol | 73 | Test | -0.062 | 1.98 | 1.496 |
| 4-phenylphenol | 74 | Test | 1.351 | 3.23 | 0.962 |
| 4-propylphenol | 75 | Test | 0.73 | 3.3 | 1.701 |
| 4-sec-butylphenol | 76 | Test | 1.277 | 3.87 | 1.38 |
| 4-tert-butylphenol | 77 | Test | 0.681 | 2.59 | 0.836 |
| 4-tert-octylphenol | 78 | Test | 1.203 | 3.78 | 1.629 |
| 4-tert-pentylphenol | 79 | Test | 1.648 | 4.22 | 1.788 |
| a-3-trifluoro-p-cresol | 80 | Test | -0.78 | 0.33 | 0.875 |
| ethyl-3-hydroxybenzoate | 81 | Test | -0.828 | 0.67 | 1.492 |
| ethyl-4-hydroxybenzoate | 82 | Test | 0.056 | 2.03 | 1.63 |
| methyl-3-hydroxybenzoate | 83 | Test | 0.473 | 3.05 | 1.678 |
| methyl-4-hydroxybenzoate | 84 | Test | 1.355 | 3.75 | 0.963 |
| pentachlorophenol | 85 | Test | 0.98 | 3.58 | 1.734 |
Formulas used to calculate the statistical and cross-validation parameters:
A multiple linear regression analysis method was used to generate QSAR models employing Quick Stat software. To check the predictive power of the models, cross-validation was done by the leave-one-out procedure. The following statistical parameters were considered to compare the generated QSAR models: R2, standard deviation (S), the F-test, and the internal predictive power given by the cross-validated coefficient (Q2)16.
Determination coefficient (R2)17.
The coefficient of determination (R2) indicates the quality of fit and is calculated as:
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n}(y_i - \bar{y})^{2}} \qquad (4)$$
where yi, ŷi and ȳ are, respectively, the measured, calculated and averaged (over the entire data set) values of the dependent variable. For the ideal model, the sum of squared residuals being 0, the value of R2 is 1. As the value of R2 deviates from 1, the fitting quality of the model deteriorates. The square root of R2 is the multiple correlation coefficient (R). The correlation coefficient (r2) is a relative measure of the quality of fit18.
Adjusted determination coefficient (Adjusted R2)19
$$R^{2}_{adj} = 1 - (1 - R^{2})\,\frac{n - 1}{n - k - 1} \qquad (5)$$
where n is the number of observations and k is the number of descriptors.
If one goes on increasing the number of descriptors in a model for a fixed number of observations, R2 values will always increase, but this will lead to a decrease in the degree of freedom and low statistical reliability. Therefore, a high value of R2 is not necessarily an indication of a good statistical model that fits well the available data20.
F-ratio (Fisher-Snedecor coefficient)21.
The F statistic, calculated from R2 and the number of data points, determines the statistical significance of the regression equation at specified degrees of freedom (df).
The F-ratio test is one of the best-known statistical tests. It is defined as:
$$F = \frac{R^{2} / k}{(1 - R^{2}) / (n - k - 1)} \qquad (6)$$
Standard deviation (s)22.
The values of R² and adjusted R² attest to the good fitting performance of the model, which, moreover, is very highly significant (large value of the Fisher parameter F).18
$$s = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^{2}}{n - k - 1}} \qquad (7)$$
It is an indicator of dispersion. It provides information on how the distribution of data is performed around the average. The closer its value is to 0, the better the adjustment and the more reliable the prediction will be.
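The goodness-of-fit statistics discussed in this section can be computed as in the sketch below. It assumes the standard least-squares definitions (residual and total sums of squares, n observations, k descriptors); the function names and the toy numbers are hypothetical.

```python
import numpy as np


def fit_statistics(y, y_calc, k):
    """R2, adjusted R2, F and s for a model with k descriptors plus an intercept."""
    y = np.asarray(y, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)
    n = y.size
    rss = float(np.sum((y - y_calc) ** 2))    # residual sum of squares
    tss = float(np.sum((y - y.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - rss / tss
    return {
        "R2": r2,
        "R2_adj": 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1),
        "F": (r2 / k) / ((1.0 - r2) / (n - k - 1)),
        "s": float(np.sqrt(rss / (n - k - 1))),
    }


def f_from_r2(r2, n, k):
    """F statistic recovered from R2 alone, for n observations and k descriptors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


if __name__ == "__main__":
    print(fit_statistics([1, 2, 3, 4, 5], [1.1, 1.9, 3.0, 4.2, 4.8], k=1))
    print(f_from_r2(0.955, n=60, k=2))
```

As a consistency check, `f_from_r2(0.955, 60, 2)` gives about 604.8, close to the F = 604.34 reported for the model; the small gap is what one would expect from rounding R² to three digits.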
The predicted residual sum of squares (PRESS)23
The PRESS (predicted residual sum of squares) statistic appears to be the most important parameter for a good estimate of the real predictive error of the models. A small value indicates that the model predicts better than chance and can be considered statistically significant. It is calculated by the following equation:
$$PRESS = \sum_{i=1}^{n}\left(\frac{e_i}{1 - h_{ii}}\right)^{2} \qquad (8)$$
where ei is the ith residual and hii is the ith diagonal element of the hat matrix:
$$\mathbf{H} = \mathbf{X}\,(\mathbf{X}^{T}\mathbf{X})^{-1}\,\mathbf{X}^{T} \qquad (9)$$
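Standard deviation error in calculation (SDEC)24
$$SDEC = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^{2}}{n}} \qquad (10)$$
Standard deviation error of prediction (SDEP)23
$$SDEP = \sqrt{\frac{PRESS}{n}} \qquad (11)$$
Cross-validated R2 (R2cv, or Q²)25
$$Q^{2} = 1 - \frac{PRESS}{\sum_{i=1}^{n}(y_i - \bar{y})^{2}} \qquad (12)$$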
A value Q2 > 0.5 is generally regarded as a good result and Q2 > 0.9 as excellent26
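The leave-one-out quantities above do not require refitting the model n times: for ordinary least squares, each leave-one-out residual equals the ordinary residual divided by (1 - hii). The sketch below uses that identity. It assumes an OLS model with an intercept, and the function name and synthetic data are hypothetical.

```python
import numpy as np


def loo_statistics(x, y):
    """PRESS, SDEC, SDEP, R2 and Q2 for an OLS model with intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    xd = np.column_stack([np.ones(n), x])       # model matrix with intercept
    hat = xd @ np.linalg.inv(xd.T @ xd) @ xd.T  # hat matrix
    h = np.diag(hat)
    e = y - hat @ y                             # ordinary residuals
    rss = float(np.sum(e ** 2))
    press = float(np.sum((e / (1.0 - h)) ** 2)) # leave-one-out shortcut
    tss = float(np.sum((y - y.mean()) ** 2))
    return {
        "PRESS": press,
        "SDEC": float(np.sqrt(rss / n)),
        "SDEP": float(np.sqrt(press / n)),
        "R2": 1.0 - rss / tss,
        "Q2": 1.0 - press / tss,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(20, 2))                # synthetic descriptors
    y = 0.5 + x @ np.array([0.7, -0.4]) + 0.1 * rng.normal(size=20)
    print(loo_statistics(x, y))
```

Since every leave-one-out residual is at least as large in magnitude as the fitted residual, SDEP is never smaller than SDEC and Q² never exceeds R², which matches the pattern of the values reported for the model (SDEC = 0.153, SDEP = 0.161).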
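External validation coefficient (Q²ext)27
$$Q^{2}_{ext} = 1 - \frac{\sum_{i=1}^{n_{ext}}(y_i - \hat{y}_i)^{2} / n_{ext}}{\sum_{i=1}^{n_{tr}}(y_i - \bar{y}_{tr})^{2} / n_{tr}} \qquad (13)$$
Here, next refers to the number of test set compounds, ntr to the number of training set compounds, and ȳtr to the mean response of the training set.
$$SDEP_{ext} = \sqrt{\frac{\sum_{i=1}^{n_{ext}}(y_i - \hat{y}_i)^{2}}{n_{ext}}} \qquad (14)$$
where the sum runs over the test set objects (next).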
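A sketch of the external validation statistics follows. Because the original equation is not reproduced in the text, the form of Q²ext used here is an assumption: it is the variant from the external-validation literature cited above, in which the mean squared prediction error on the test set is compared with the mean squared deviation of the training responses from their own mean. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def external_validation(y_train, y_ext, y_ext_pred):
    """Q2ext and SDEPext from training responses and test-set predictions."""
    y_train = np.asarray(y_train, dtype=float)
    y_ext = np.asarray(y_ext, dtype=float)
    y_ext_pred = np.asarray(y_ext_pred, dtype=float)
    # Mean squared prediction error over the test set.
    mse_ext = float(np.mean((y_ext - y_ext_pred) ** 2))
    # Mean squared deviation of training responses from the training mean.
    var_train = float(np.mean((y_train - y_train.mean()) ** 2))
    return {
        "Q2_ext": 1.0 - mse_ext / var_train,  # assumed form, see lead-in
        "SDEP_ext": float(np.sqrt(mse_ext)),
    }


if __name__ == "__main__":
    print(external_validation([0, 1, 2, 3], [1.0, 2.0], [1.1, 1.8]))
```

Normalizing both sums by their own sample sizes keeps the statistic comparable when the training and test sets differ in size, as they do here (60 versus 25 compounds).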
The applicability domain (AD)[28-29] is a theoretical region in the space defined by the descriptors of the model and the modeled response, for which a given QSAR should make reliable predictions. In this work, the structural AD was verified by the leverage approach. The leverage, hii, 32, is defined as follows:
$$h_{ii} = \mathbf{x}_i\,(\mathbf{X}^{T}\mathbf{X})^{-1}\,\mathbf{x}_i^{T}, \qquad i = 1, \ldots, n \qquad (15)$$
where xi is the descriptor row vector of the ith compound, xiT is its transpose, and X is the model matrix derived from the calibration set descriptor values. The warning leverage, h*, is generally fixed at 3(p+1)/n, where n is the total number of samples in the training set and p is the number of descriptors involved in the correlation.
The Williams plot, the plot of the standardized residuals versus the leverage, was exploited to visualize the applicability domain (AD).
The warning leverage (h*) is defined as:
$$h^{*} = \frac{3\,(p + 1)}{N} \qquad (16)$$
where N is the number of training compounds and p is the number of predictor variables. When the h value of a compound is lower than h*, the probability of agreement between predicted and actual values is as high as that for the compounds in the training set. A chemical with hi > h* will reinforce the model if the chemical is in the training set. But if such a chemical is in the validation set, its predicted data may be unreliable. However, this chemical may not appear to be an outlier because its residual may be low. Thus, the leverage and the jackknifed residual should be combined for the characterization of the AD.
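The leverage check can be sketched as follows. The code assumes that the model matrix includes a column of ones for the intercept, which is consistent with the p + 1 term in the warning leverage; the function names and the synthetic descriptor values are hypothetical.

```python
import numpy as np


def leverages(x_train, x_query=None):
    """Leverage of each query compound relative to the training model matrix."""
    xt = np.asarray(x_train, dtype=float)
    xt = np.column_stack([np.ones(len(xt)), xt])  # assumed intercept column
    core = np.linalg.inv(xt.T @ xt)
    if x_query is None:
        xq = xt
    else:
        xq = np.asarray(x_query, dtype=float)
        xq = np.column_stack([np.ones(len(xq)), xq])
    # Row-wise quadratic form x_i (X^T X)^-1 x_i^T.
    return np.einsum("ij,jk,ik->i", xq, core, xq)


def warning_leverage(n_train, p):
    """Warning leverage h* = 3 (p + 1) / n_train."""
    return 3.0 * (p + 1) / n_train


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_train = rng.normal(size=(60, 2))      # synthetic stand-in for (logP, R2u)
    h_star = warning_leverage(60, 2)
    flagged = np.where(leverages(x_train) > h_star)[0]
    print(h_star, flagged)
```

With 60 training compounds and 2 descriptors, `warning_leverage(60, 2)` gives 0.15, the critical value quoted for the Williams plot. Test compounds are screened by passing their descriptors as `x_query`, so that their leverages are measured against the training set only.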
External validation criteria, or "Tropsha's criteria"30
Golbraikh and Tropsha proposed a set of parameters for determining the external predictability of a QSAR model. According to Golbraikh and Tropsha, models are considered satisfactory if all of the following conditions are satisfied:
2- R2ext>0.7
where R2 is the correlation coefficient between the predicted and observed activities, R02 and R0′2 are the coefficients of determination for regressions through the origin (Y-intercept set to zero) of predicted versus observed and observed versus predicted activities, respectively, and k and k′ are the slopes of the corresponding regression lines through the origin.
3. RESULTS AND DISCUSSION:
QSAR Analysis:
The MLR model was built by stepwise regression on training set as follows:
pCIC50 = -0.602 (±0.60192) + 0.661 (±0.66051) × logP - 0.400 (±0.40005) × R2u (17)
S = 0.157 R² = 95.5% R² (adj) = 95.3% F= 604.34 p= 0.000
In the equation, n is the number of compounds, R2 is the coefficient of determination, S is the standard error, F is the Fisher statistic, and p is the significance level. A higher R2 and a lower standard error indicate that the model is more reliable. The p value is much smaller than 0.05, which means that the regression equation is statistically significant.
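To show how the regression equation is applied, the following sketch evaluates it for a single compound. The coefficients are the rounded values printed in the equation above; the function name is hypothetical.

```python
def predict_pcic50(log_p: float, r2u: float) -> float:
    """Evaluate the reported MLR equation with its rounded coefficients."""
    return -0.602 + 0.661 * log_p - 0.400 * r2u


if __name__ == "__main__":
    # Compound 1 of Table 1 (2,3,6-trimethylphenol): logP = 2.67, R2u = 1.778.
    print(round(predict_pcic50(2.67, 1.778), 3))
```

For this compound the equation gives about 0.452, compared with the tabulated pCIC50 of 0.418, a residual of roughly 0.03 log units. The signs of the coefficients indicate that predicted toxicity rises with logP and falls with R2u.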
External validation:
The results are shown in Table 2.
Table 2. External validation
| ntr | next | Q² (%) | R² (%) | R²adj (%) | Q²ext (%) | SDEC | SDEP | SDEPext |
| 60 | 25 | 95.01 | 95.5 | 95.34 | 95.96 | 0.153 | 0.161 | 0.153 |
Cross-validation
Fig. 1. Cross-validation vs. experimental pCIC50
The additional external validation according to Golbraikh and Tropsha (2002), described previously, confirms the validity of the proposed model. The results are as follows:
Fig. 2. Plot of experimental vs. predicted values in a regression model
Fig. 3. Plot of predicted vs. experimental values in a regression model
Applicability Domain:
The applicability domain of the presented MLR model is shown in the Williams plot (Fig. 4).
Fig. 4. The Williams plot, the plot of the standardized residuals vs. leverages
From this plot, the leverage values (hi) of all compounds in the training and test sets are below the critical value (h* = 0.15), except for 4-fluorophenol and 2-hydroxybenzaldoxime, which exceed h* and are flagged as outliers. In addition, the standardized residuals of all compounds in the training and test sets lie within two standard deviation units (±2σ).
As a result, the model displays good statistical parameters and good prediction properties.
Y-randomization Test
Table 3. Randomization test
| Iteration | R² (%) | Q² (%) |
| 0 | 95.5 | 95.01 |
| 1 | 13.3 | 3.78 |
| 2 | 1.1 | 0 |
| 3 | 0.7 | 0 |
| 4 | 2.5 | 0 |
| 5 | 0.5 | 0 |
| 6 | 1.5 | 0 |
| 7 | 0.1 | 0 |
| 8 | 1.7 | 0 |
| 9 | 1.7 | 0 |
| 10 | 2 | 0 |
This is a widely used technique to ensure the robustness of a QSAR model. In this test, the dependent-variable vector, Y-vector, is randomly shuffled and a new QSAR model is developed using the original independent-variable matrix.
The results are shown in Table 3. Both Q2 and R2 are low for every randomized model.
Values obtained after every shuffle indicate that the good results in our original model are not due to a chance correlation of the training set.
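The Y-randomization procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, not the routine used to produce Table 3; it reports only R² for each shuffled model, and the function names are hypothetical.

```python
import numpy as np


def r2_ols(x, y):
    """R2 of an ordinary least squares fit with intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(xd, y, rcond=None)
    res = y - xd @ beta
    return float(1.0 - (res @ res) / np.sum((y - y.mean()) ** 2))


def y_randomization(x, y, n_iter=10, seed=0):
    """Refit the model on shuffled responses; return the R2 of each shuffle."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    # The descriptor matrix is kept fixed; only the response vector is permuted.
    return [r2_ols(x, rng.permutation(y)) for _ in range(n_iter)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(60, 2))  # synthetic descriptors
    y = -0.6 + x @ np.array([0.66, -0.40]) + 0.05 * rng.normal(size=60)
    print("original R2:", r2_ols(x, y))
    print("shuffled R2:", y_randomization(x, y))
```

If the shuffled models reached an R² comparable to the original one, the original fit could be attributed to chance correlation. A large gap between the two, as in Table 3, argues against that.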
General conclusion:
The toxicity of 85 variously substituted phenols, characterized by the concentration causing 50% inhibition of growth (CIC50) in a population of the ciliated protozoan Tetrahymena pyriformis, could be linked to the octanol/water partition coefficient (log Kow) and R2u.
Through the determination of suitable statistical parameters, we observed a strong correlation between experimental and predicted activity values, which supports the validity and quality of the derived QSAR model.
REFERENCES:
1. J. Michałowicz, W. Duda, Phenols--Sources and Toxicity, Polish Journal of Environmental Studies, 2007.16.
2. K.E. Hevener, D.M. Ball, J.K. Buolamwini, R.E. Lee, Quantitative structure–activity relationship studies on nitrofuranyl anti-tubercular agents, Bioorganic & medicinal chemistry,16;2008:8042-8053.
3. Parimal M. Prajapati, Yatri R. Shah, DhruboJyoti Sen. Artificial Neural Network: A New Approach for QSAR Study. Research J. Science and Tech. 3(1); 2011: 17-24
4. Sudhanshu Dhar Dwivedi, Arpan Bharadwaj, Amit Shrivastava. Application of Topological Descriptor: QSAR Study of Chalcone Derivatives. Asian J. Research Chem. 3(4); 2010:1030-1034.
5. Satyajit Dutta, Sagar Banik, Sovan Sutradhar, Sangya Dubey, Ira Sharma. 4D-QSAR: New Perspectives in Drug Design. Asian J. Research Chem. 4(6); 2011: 857-862.
6. Lokendra Kumar Ojha, Ajay M Chaturvedi, Arpan Bhardwaj, Abhilash Thakur, Mamta Thakur. Physiochemical Investigation and Role of Indicator Parameter in the Modeling of Tetrahydroimidazole Benzodiazepine -1- one (TIBO): A QSAR Study. Asian J. Research Chem. 5(3); 2012: 377-382.
7. Sapkale GN, Khandare DD, Patil SM, Ulhas S Surwase. Drug Design: An Emerging Era of Modern Pharmaceutical Medicines. Asian J. Research Chem. 3(2); 2010: 261-264.
8. R. Todeschini, V. Consonni, M. Pavan, DRAGON Software for the Calculation of Molecular Descriptors, Release 5.4 for Windows, Milano, 2006.
9. Hyperchem™ Release 7, Hypercube for Windows, Molecular Modeling System, 2000.
10. K. Piršelová, Š. Baláž, T. W. Schultz. Model-Based QSAR for Ionizable Compounds: Toxicity of Phenols Against Tetrahymena pyriformis. Arch. Environ. Contam. Toxicol. 30; 1996: 170-177.
11. R. Leardi, R. Boggia and M. Terrile. Genetic Algorithms as a Strategy for Feature Selection, Journal of Chemometrics, 6; 1992: 267-281.
12. R. Todeschini, D. Ballabio, V. Consonni, A. Mauri, M. Pavan, MOBYDIGS, version 1.1, Copyright TALETE srl.2009.
13. M. Pavan, A. Mauri and R. Todeschini. Total Ranking Models by the Genetic Algorithm Variable Subset Selection (GA–VSS) Approach for Environmental Priority Settings, Analytical and Bioanalytical Chemistry, 380; 2004: 430-444.
14. Mark T.D. Cronin, T. Schultz W. Structure-toxicity relationships for Phenols to Tetrahymena Pyriformis, Chemosphere.32; 1996:1453-1468.
15. Venkatesh Kamath, Aravinda Pai. Application of Molecular Descriptors in Modern Computational Drug Design: An Overview. Research J. Pharm. and Tech. 10(9); 2017: 3237-3241. doi: 10.5958/0974-360X.2017.00574.1
16. V. Consonni, R. Todeschini, M. Pavan, Structure/response correlations and similarity/diversity analysis by GETAWAY descriptors, 1—Theory of the novel 3D molecular descriptors, Journal of Chemical Information and Modeling .42;2002:682-692.
17. Prarthana V Rewatkar, Ganesh R Kokil. QSAR Studies of Novel 1- and 8-Substituted-3-Furfuryl Xanthines: An Adenosine Receptor Antagonist. Asian J. Research Chem. 3(2): April- June 2010. 416-420.
18. Chatterjee, S. and Hadi, A.S. Regression Analysis by Example. 4th Edition, John Wiley & Sons, Inc., Hoboken, 2006. p366
19. Sameer Dixit, Arun K. Sikarwar. Statistical Approach to Modelling of Activity of Phenol’s and its Derivatives against L1210 Leukaemia cells. Asian J. Research Chem. 13(3); 2020: 237-240. doi: 10.5958/0974-4150.2020.00046.2
20. Besse, P Pratique de la modélisation statistique; Publication du laboratoire de statistique et Probabilité .2003
21. Bando, P., et al. Single-Component Donor-Acceptor Organic Semiconductors Derived from TCNQ. The Journal of Organic Chemistry,59;1994: 4618-4629.
22. Siegel, A.F. Practical Business Statistics. IRWIN, 1997.3rd Edition.
23. Kiran Madhawai, Dinesh Rishipathak, Santosh Chhajed, Sanjay Kshirsagar. Predicting the Anti-Inflammatory Activity of Novel 5-Phenylsulfamoyl-2-(2-Nitroxy) (Acetoxy) Benzoic acid derivatives using 2D and 3D-QSAR (kNN-MFA) Analysis. Asian J. Res. Pharm. Sci.7(4); 2017: 227-234. doi: 10.5958/2231-5659.2017.00036.4
24. T. Hastie, R. Tibshirani and J. Friedman, “The Elements of Statistical Learning: Data Mining, Inference, and Prediction,” 2nd Edition, Springer, New York, 2009.
25. Golbraikh, A. and Tropsha, A. Beware of q2! Journal of Molecular Graphics and Modelling.20;2002: 269-276.https://doi.org/10.1016/S1093-3263(01)00123-1
26. Roy K., Kar S., Das R. A Primer on QSAR/QSPR Modeling. Springer International Publishing;. Statistical methods in QSAR/QSPR. 2015: 37–59.
27. Consonni, V., Ballabio, D. and Todeschini, R. Evaluation of Model Predictive Ability by External Validation Techniques. Journal of Chemometrics, 24; 2010: 194-201. https://doi.org/10.1002/cem.1290.
28. R. S. Kalkotwar, R. B. Saudagar. Design, Synthesis and anti- microbial, anti-inflammatory, Antitubercular activities of some 2,4,5-trisubstituted imidazole derivatives. Asian J. Pharm. Res. 3(4); 2013: 159-165.
29. L. Eriksson, J. Jaworska, A. Worth, M. Cronin, R.M. Mc Dowell, P. Gramatica, Methods for reliability, uncertainty assessment, and applicability evaluations of regression based and classification QSARs, Environmental Health Perspectives.111;2003:1361-1375.
30. A. Tropsha, P. Gramatica, V.K. Gombar, The importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSPR models, QSAR and Combinatorial Science. 22; 2003: 69-76.
Received on 26.05.2022 Modified on 22.07.2022
Accepted on 19.09.2022 ©AJRC All right reserved
Asian J. Research Chem. 2022; 15(6):433-438.