Application of Topological Descriptor: QSAR Study of Chalcone Derivatives

 

Sudhanshu Dhar Dwivedi1, Arpan Bharadwaj2 and Amit Shrivastava*

1Govt. Science and Commerce College Benazeer, Bhopal (M.P.)

2Govt. Madhav Science P.G. College, Ujjain (M.P.)

*Corresponding Author E-mail: amit_chem@yahoo.com

 

ABSTRACT:

A set of chalcone derivatives was tested for antimalarial activity. Quantitative structure-activity relationship (QSAR) analysis was applied to forty-two of these derivatives using a combination of topological descriptors. A multiple linear regression (MLR) procedure was used to model the relationships between the molecular descriptors and the antimalarial activity of the chalcone derivatives. Stepwise regression was used to derive the most significant models as calibration models for predicting the antimalarial activity of this class of molecules. The best QSAR models were further validated by calculating statistical parameters for the established models. The high agreement between experimental and predicted activity values obtained in the validation procedure indicates the good quality of the derived QSAR models.

 

KEYWORDS: QSAR; chalcone derivatives; multiple linear regression; statistical parameters.

 


 

INTRODUCTION:

Quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) studies are of great importance in modern chemistry and biochemistry. To obtain a significant correlation, it is essential to employ appropriate descriptors. For this purpose the molecular structure is often represented as a simple mathematical object, such as a number, a sequence, or a set of selected invariants of matrices, generally referred to as molecular descriptors2-4. Multiple regression analysis is usually used in such studies in the hope that it will point to the structural factors that influence a particular property. It may help in model building and assist in the design of molecules with prescribed desirable properties, which is an important goal in drug research. In chemistry, anything that can be said about the magnitude of a property and its dependence on changes in molecular structure5 depends on the chemist's ability to establish valid relationships between structure and property. In many areas of physical, organic, biochemical and biological chemistry, it is increasingly necessary to translate these general relations into quantitative associations expressed as algebraic equations, known as quantitative structure-activity (or structure-property) relationships (QSAR/QSPR)6.

 

To gain insight into structure-activity relationships we need molecular descriptors that can effectively characterize molecular size, molecular branching and variations in molecular shape, and that relate the structure to its activity. Many descriptors reflect simple molecular properties and can thus provide insight into the physicochemical nature of the activity or property under consideration. Chemical graph theory7 advocates an alternative approach to QSAR/QSPR studies based on mathematically derived molecular descriptors, which are often referred to as topological indices8. If molecular structure is critical for understanding a particular structure-activity or structure-property relationship, then one should consider structural invariants derived from the molecular structure9. Several graph-theoretical invariants have been generalized so that they produce structure-dependent descriptors10-13. Ideally, activity and structure are connected by some known mathematical function F: Biological activity = F(structure). In the present study, topological and physicochemical descriptors are used as the structural parameters. Biological activity can be any measure such as log 1/C, Ki, IC50, ED50, EC50, log K or Km.

 

The relationship or function is usually a mathematical expression derived by statistical or related techniques. In the present study the multiple linear regression (MLR) technique is used. The parameters describing structural properties are the independent variables and the biological activities are the dependent variables.

 

In the present investigation a QSAR study is performed on a set of 42 chalcone derivatives. Their biological activity is measured as IC50a (µM). To simplify the calculations we use log IC50a (µM). The study is based on the application of topological parameters in QSAR.
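The logarithmic transformation of the response can be illustrated with a minimal sketch. The three IC50 values are those of compounds 1-3 in Table 1; the base-10 logarithm is inferred from the tabulated values, and the variable names are illustrative only.

```python
import math

# IC50 values (µM) of compounds 1-3, copied from Table 1
ic50_um = [5.4, 18.0, 3.0]

# Base-10 logarithm compresses the range of the response variable
log_ic50 = [math.log10(value) for value in ic50_um]

for raw, logged in zip(ic50_um, log_ic50):
    print(f"IC50 = {raw:6.1f} µM  ->  log IC50 = {logged:.5f}")
```

The printed values can be compared with the "Log IC50a (µM)" column of Table 1.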

 

MATERIAL AND METHOD: –

We studied a series of chalcones whose activity, expressed as IC50a (µM), was taken from the literature. These chalcone derivatives and their activities are presented in Table 1. Topological indices: All the topological indices used are calculated from the hydrogen-suppressed molecular graph; their calculation is extensively discussed in the literature. Topological indices are used to convert structural properties into numerical form. The calculated topological descriptors included the Wiener index14-15 (W); mean distance degree deviation (MDDD); Xu index (Xu); Kier flexibility index (PHI); mean information content on the distance degree magnitude (IDDM); unipolarity (UNIP); Randić connectivity index19 of fifth order (X5A); hyper-detour index (WW); log of product of row sums (LPRS); Gutman molecular topological index (GMTI); solvation connectivity index chi-0 (X0Sol); polarity number [P3] (P); DECC; total information content on the distance magnitude18 (IDMT); mean information content on the distance equality18 (IDE); eccentric connectivity index17 (CSI); solvation connectivity index chi-2 (X2Sol); average Randić connectivity index19 of first order (X1A); mean information content on the distance degree equality (IDDE); second Mohar index (TI2); mean Wiener index (WA); Harary index (HAR1); Schultz molecular topological index16 (SMTI); first Zagreb index (Zm1); Randić connectivity index chi-5 (X5); Randić connectivity index19 chi-3 (X3); total walk count (TWC); and average Randić connectivity index19 of zero order (X0A).
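As an illustration of how a topological index is obtained from a hydrogen-suppressed graph, the following sketch computes two of the simplest descriptors named above, the Wiener index (sum of shortest-path distances over all atom pairs) and the first Zagreb index (sum of squared vertex degrees). It uses the textbook definitions on small alkane graphs. It is not the DRAGON implementation, and the adjacency-list representation and function names are assumptions made for the example.

```python
from collections import deque

def bfs_distances(adj, start):
    """Shortest-path (bond-count) distances from `start` to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def wiener_index(adj):
    """Sum of distances over all unordered pairs of atoms."""
    total = sum(sum(bfs_distances(adj, u).values()) for u in adj)
    return total // 2  # every pair was counted twice

def first_zagreb_index(adj):
    """Sum of squared vertex degrees."""
    return sum(len(neighbours) ** 2 for neighbours in adj.values())

# Hydrogen-suppressed graphs: carbon atoms only, bonds as adjacency lists
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

for name, graph in [("n-butane", n_butane), ("isobutane", isobutane)]:
    print(name, "W =", wiener_index(graph), "M1 =", first_zagreb_index(graph))
```

The two isomers have the same atoms but different index values, which is how such descriptors encode branching.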

 

Topological molecular descriptors are used in QSAR studies because they are readily accessible and easily computed with available software. The molecular descriptors used in this study were calculated with the DRAGON software20. Stepwise multiple regression analysis was used to perform the QSAR analysis. Stepwise multiple linear regression is a commonly used variant of MLR. Variables are added to the equation one at a time, and a new regression is performed after each addition. The new term is retained only if the equation passes a test for significance. This regression method is especially useful when the number of variables is large and the key descriptors are not known. It is the basis of the maximum-R2 method for deriving the most appropriate QSAR model. When the number of independent variables is greater than the number of molecules, ordinary multiple linear regression cannot be applied, so we applied stepwise multiple regression.
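The forward stepwise procedure described above can be sketched as follows. This is a minimal illustration and not the software or the exact entry criterion used in this study: the partial F-statistic as the significance test, the threshold `f_to_enter = 4.0`, the function names and the synthetic data are all assumptions.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_stepwise(X, y, f_to_enter=4.0):
    """Add one descriptor at a time; keep it only if its partial F is large enough."""
    n, p = X.shape
    selected = []
    current_rss = rss(X[:, selected], y)  # intercept-only model
    while len(selected) < p:
        best = None
        for j in range(p):
            if j in selected:
                continue
            trial = selected + [j]
            dof = n - len(trial) - 1
            if dof <= 0:
                continue  # not enough molecules to estimate another term
            trial_rss = rss(X[:, trial], y)
            f_stat = (current_rss - trial_rss) / (max(trial_rss, 1e-12) / dof)
            if best is None or f_stat > best[1]:
                best = (j, f_stat, trial_rss)
        if best is None or best[1] < f_to_enter:
            break  # no remaining descriptor passes the significance test
        selected.append(best[0])
        current_rss = best[2]
    return selected

# Synthetic example: 42 "molecules", 6 "descriptors", only columns 1 and 4 matter
rng = np.random.default_rng(0)
X = rng.normal(size=(42, 6))
y = 1.5 * X[:, 1] - 2.0 * X[:, 4] + rng.normal(scale=0.1, size=42)
chosen = forward_stepwise(X, y)
print("selected descriptor columns:", chosen)
```

Because candidates are screened one at a time, the procedure can start even when the descriptor pool is larger than the number of molecules, provided each fitted model keeps positive residual degrees of freedom.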

 

The best model was selected on the basis of various statistical parameters: PSE (prediction square error), K (degree of lack of relationship), E (index of forecasting efficiency), PE (probable error of estimation), the t test, adjusted R2, Q (quality of the proposed model), SPRESS (uncertainty of prediction), PRESS (predictive residual sum of squares), the F test and SEE (standard error of estimation). These statistical parameters show the predictivity and significance of the model.
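Several of these statistics can be computed directly from a fitted regression. The sketch below uses common textbook definitions of R2, adjusted R2, SEE and leave-one-out PRESS; the text does not state the exact formulas used in this study, so these definitions, the function names and the synthetic data are assumptions.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients; element 0 is the intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def regression_stats(X, y):
    n, p = X.shape
    resid = y - predict(fit_ols(X, y), X)
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    see = (ss_res / (n - p - 1)) ** 0.5
    # Leave-one-out PRESS: refit without molecule i, then predict molecule i
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef_i = fit_ols(X[keep], y[keep])
        press += float((y[i] - predict(coef_i, X[i:i + 1])[0]) ** 2)
    return {"R2": r2, "adj_R2": adj_r2, "SEE": see,
            "SS_res": ss_res, "PRESS": press}

# Synthetic example with 42 observations and 3 descriptors
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(42, 3))
y_demo = 0.8 + X_demo @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.1, size=42)
stats = regression_stats(X_demo, y_demo)
for name, value in stats.items():
    print(f"{name:7s} = {value:.5f}")
```

With these definitions PRESS is never smaller than the ordinary residual sum of squares, so a large gap between the two is one sign of overfitting.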

 

RESULT: -

When the data were subjected to stepwise multiple linear regression analysis to develop a QSAR between the antimalarial activity of the compounds (dependent variable) and the topological indices (independent variables), several equations were obtained.

 

Various widely used topological indices were tested in the present study. In proposing QSAR models for the antimalarial activity of the compounds we used the maximum-R2 method. We used cross-validation parameters to investigate the predictive power of the models and to support our findings. For the QSAR study of this series we tested multivariate combinations of the parameters. The results obtained from the multivariate combinations are encouraging, and the better models are shown below with their statistics.

 

Figure 1. Graph of observed versus calculated values of log IC50a (µM) for Model 1.

Figure 2. Graph of observed versus calculated values of log IC50a (µM) for Model 2.


Table 1. Data set of chalcone derivatives

Compound   R'                     R                    IC50a (µM)   Log IC50a (µM)
1          2',3',4'-trimethoxy    2,4-dichloro         5.4          0.73239376
2          2',3',4'-trimethoxy    4-dimethylamino      18           1.255272505
3          2',3',4'-trimethoxy    4-trifluoromethyl    3            0.477121255
4          2',3',4'-trimethoxy    2,4-dimethoxy        16.5         1.217483944
5          2',3',4'-trimethoxy    4-methyl             25.6         1.408239965
6          2',3',4'-trimethoxy    4-ethyl              16.5         1.217483944
7          2',3',4'-trimethoxy    3-quinolinyl         2            0.301029996
8          2',3',4'-trimethoxy    4-methoxy            25           1.397940009
9          2',3',4'-trimethoxy    4-fluoro             9.5          0.977723605
10         2',3',4'-trimethoxy    4-phenyl             26.2         1.418301291
11         2',3',4'-trimethoxy    4-nitro              22.5         1.352182518
12         2',3',4'-trimethoxy    3,4-dichloro         14.5         1.161368002
13         2',3',4'-trimethoxy    4-chloro             14.5         1.161368002
14         2',3',4'-trimethoxy    2-chloro             41.5         1.618048097
15         2',3',4'-trimethoxy    3-chloro             24.4         1.387389826
16         2',3',4'-trimethoxy    H                    15.8         1.198657087
17         4'-butoxy              2,4-dimethoxy        108          2.033423755
18         2',4'-dimethoxy        2,4-dichloro         18.8         1.274157849
19         2',4'-dimethoxy        4-trifluoromethyl    5.9          0.770852012
20         2',4'-dimethoxy        2,4-difluoro         6.2          0.792391689
21         2',4'-dimethoxy        2,4-dimethoxy        2.1          0.322219295
22         2',4'-dimethoxy        4-dimethylamino      70           1.84509804
23         2',4'-dimethoxy        4-cyano              94.5         1.975431809
24         2',4'-dimethoxy        H                    55.5         1.744292983
25         4'-ethoxy              2,4-difluoro         28.1         1.44870632
26         4'-ethoxy              4-methoxy            33           1.51851394
27         4'-ethoxy              3-quinolinyl         24.9         1.396199347
28         4'-ethoxy              4-fluoro             24.1         1.382017043
29         4'-ethoxy              2,4-dichloro         96           1.982271233
30         4'-ethoxy              4-trifluoromethyl    24           1.380211242
31         4'-ethoxy              2,4-dimethoxy        30           1.477121255
32         4'-ethoxy              4-methyl             38           1.579783597
33         4'-ethoxy              4-nitro              39           1.591064607
34         4'-ethoxy              4-dimethylamino      30           1.477121255
35         4'-ethoxy              H                    43           1.633468456
36         2',4'-dihydroxy        2,4-difluoro         16           1.204119983
37         2',4'-dimethoxy        3-quinolinyl         2.2          0.342422681
38         2',4'-dimethoxy        4-quinolinyl         27           1.431363764
39         2',4'-dimethoxy        4-methoxy            128          2.108903128
40         2',4'-dimethoxy        4-dimethylamino      55.3         1.742725131
41         4'-methoxy             4-methoxy            21.7         1.336459734
42         4'-methoxy             4-methyl             70           1.84509804

 

 


Model 1:

Log IC50a (µM) = -14.51602 (±6.26413) - 0.16991 (±0.04304) Polarity - 1.06474 (±0.57544) IDDM - 1.44263 (±0.26085) χ5 + 1.22205 (±0.36563) χ3 - 0.45028 (±0.07633) PHI + 0.4167 (±0.05330) MDDD + 0.25054 (±0.02875) χ0Sol - 1.71862 (±0.25943) Xu - 0.00004 (±0.00001) IDMT - 0.0125 (±0.00130) UNIP + 0.44561 (±0.09717) TWC - 0.37731 (±0.22138) χ0A     (1)

R2 = 0.913, K = 0.29495, PE = 0.00886, T test = 20.4899, Adjusted R2 = 0.87723, Q = 6.0832, PRESS = 0.71568, SPRESS = 0.13259, F test = 1448.3125, SEE = 0.15708

Model 2:

Log IC50a (µM) = 33.13728 (±4.46492) + 0.24704 (±0.02355) χ0Sol + 0.29242 (±0.06762) MDDD - 0.00005 (±0.00001) WW - 0.02983 (±0.00657) UNIP + 4.08436 (±1.34974) WA - 2.1381 (±0.25467) Xu - 1.51101 (±0.48849) IDDM - 0.00004 (±0.00001) IDMT + 2.04721 (±0.68426) TI2 - 0.42213 (±0.06214) PHI - 69.25261 (±13.9105) MSD + 0.00038 (±0.00015) SMTI + 0.04010 (±0.02159) ZM1     (2)

R2 = 0.938, K = 0.24789, PE = 0.00625, T test = 24.7171, Adjusted R2 = 0.91002, Q = 7.18878, PRESS = 0.50203, SPRESS = 0.11099, F test = 2511.0221, SEE = 0.15708


 

Table 2. Observed and predicted activity, in log IC50a (µM), for Models 1-5

Compound  Observed   Model 1    Model 2    Model 3    Model 4    Model 5
1         0.73239    1.02089    0.92891    0.97410    1.06121    1.13101
2         1.25527    1.54172    1.49445    1.54016    1.50278    1.49716
3         0.47712    0.58998    0.61528    0.63435    0.62694    0.57958
4         1.21748    1.07307    1.22440    1.14510    1.07060    1.13235
5         1.40823    1.36881    1.34052    1.36368    1.33628    1.27479
6         1.21748    1.32103    1.21784    1.26829    1.29253    1.18994
7         0.30102    0.25855    0.330803   0.30377    0.34310    0.40866
8         1.3979     1.27465    1.17436    1.22433    1.25710    1.13488
9         0.97772    1.15059    1.12535    1.15566    1.12167    1.01462
10        1.41830    1.13429    1.32820    1.31806    1.32469    1.25247
11        1.35218    1.13330    1.11157    1.15301    1.19083    1.01227
12        1.16136    1.08600    1.02888    1.06656    1.00411    1.23805
13        1.16136    1.16267    1.15335    1.16893    1.20633    1.05240
14        1.61804    1.63004    1.61594    1.63698    1.58720    1.41616
15        1.38738    1.20093    1.23782    1.25089    1.12471    1.42246
16        1.19865    1.38663    1.25664    1.24719    1.25998    1.19533
17        2.03342    2.05434    2.05688    2.00715    2.13225    1.68585
18        1.27415    1.27692    1.33508    1.31272    1.23466    1.16756
19        0.77085    0.86221    0.87289    0.87991    0.77889    0.82978
20        0.79239    0.79594    0.84832    0.85060    0.71443    0.60096
21        0.32221    0.39664    0.34325    0.22252    0.43659    0.97262
22        1.84509    1.67710    1.68122    1.70540    1.63719    1.73499
23        1.97543    2.07336    2.07830    2.07019    2.02382    2.08726
24        1.74429    1.70478    1.86562    1.69970    1.95032    1.62932
25        1.4487     1.45354    1.47495    1.47252    1.46579    1.42927
26        1.51851    1.64972    1.65048    1.71175    1.74612    1.58463
27        1.39619    1.53912    1.47980    1.51416    1.20640    1.29266
28        1.38201    1.37208    1.31277    1.34289    1.28960    1.54545
29        1.98227    1.92028    1.94585    1.91861    1.97304    1.97609
30        1.38021    1.32096    1.21059    1.23624    1.23374    1.39867
31        1.47712    1.33978    1.38967    1.40775    1.38365    1.31998
32        1.57978    1.59005    1.52769    1.55067    1.50396    1.80563
33        1.59106    1.54329    1.63636    1.59526    1.58199    1.68033
34        1.47712    1.54329    1.63636    1.59526    1.58199    1.68033
35        1.63346    1.56356    1.59581    1.59647    1.60432    1.67691
36        1.20411    1.042072   1.103161   1.05644    1.11997    0.95651
37        0.34242    0.38530    0.37       0.37279    0.55018    0.47563
38        1.43136    1.42323    1.36617    1.29995    1.41221    1.32409
39        2.10890    2.07336    2.07830    2.07019    2.02382    2.08726
40        1.74272    1.67710    1.68122    1.70540    1.63719    1.73499
41        1.33645    1.60147    1.44365    1.49650    1.63841    1.49339
42        1.84509    1.70457    1.77775    1.77510    1.74666    1.79495

 


Model 3:

Log IC50a (µM) = 29.92919 (±3.87923) - 2.62764 (±0.31106) Xu + 0.23882 (±0.02212) χ0Sol + 2.04259 (±0.69733) TI2 + 5.15492 (±1.27396) WA + 0.29195 (±0.05752) MDDD - 0.00003 (±0.00001) WW - 0.02905 (±0.00670) UNIP - 1.4805 (±0.49789) IDDM - 0.00004 (±0.00001) IDMT - 0.42685 (±0.06265) PHI - 64.30332 (±12.25940) MSD + 0.33954 (±0.07770) HAR1     (3)

R2 = 0.933, K = 0.2569, PE = 0.00672, T test = 23.7879, Adjusted R2 = 0.90625, Q = 7.03919, PRESS = 0.54632, SPRESS = 0.11579, F test = 2245.076, SEE = 0.13727

Model 4:

Log IC50a (µM) = 34.12237 (±5.04790) + 0.33232 (±0.05648) MDDD - 1.61711 (±0.43606) Xu - 0.34393 (±0.077355) PHI - 1.78642 (±0.60925) IDDM - 0.00928 (±0.00342) UNIP - 124.21328 (±25.71528) χ5A - 0.00008 (±0.00001) WW + 0.15379 (±0.07474) LPRS + 0.00021 (±0.00009) GMTI + 0.24638 (±0.02663) χ0Sol - 0.10159 (±0.04324) Polarity     (4)

R2 = 0.898, K = 0.31829, PE = 0.01031, T test = 18.8368, Adjusted R2 = 0.86155, Q = 5.6828, PRESS = 0.83854, SPRESS = 0.14356, F test = 1133.8266, SEE = 0.16681

Model 5:

Log IC50a (µM) = 27.12575 (±6.03589) + 3.25065 (±0.46785) DECC + 0.36882 (±0.06741) χ2Sol - 0.00005 (±0.00002) IDMT - 0.51515 (±0.09513) PHI - 4.99326 (±1.19148) IDE - 0.00009 (±0.00002) WW + 0.00667 (±0.00245) CSI - 28.35979 (±10.20545) χ1A - 1.23082 (±0.62680) IDDM + 0.40981 (±0.21460) IDDE     (5)

R2 = 0.857, K = 0.37815, PE = 0.01456, T test = 15.4828, Adjusted R2 = 0.81087, Q = 4.7481, PRESS = 0.72938, SPRESS = 0.13399, F test = 643.789, SEE = 0.19497
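Some of the reported statistics can be reproduced from R2 alone, which gives a quick check on how they were probably defined. The sketch below assumes the usual formulas: adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), K = sqrt(1 - R2) and Q = R/SEE, with n = 42 compounds and p the number of descriptors in the model. These formulas are not stated in the text and are assumptions; they agree with the reported values for Models 5 and 1 to within rounding.

```python
import math

N_COMPOUNDS = 42  # size of the data set in Table 1

def adjusted_r2(r2, n, p):
    """Adjusted R2 for a model with p descriptors fitted to n compounds."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def lack_of_relationship(r2):
    """K, assumed here to be sqrt(1 - R2)."""
    return math.sqrt(1.0 - r2)

def quality_factor(r2, see):
    """Q, assumed here to be R / SEE."""
    return math.sqrt(r2) / see

# Model 5: 10 descriptors, reported R2 = 0.857 and adjusted R2 = 0.81087
adj_m5 = adjusted_r2(0.857, N_COMPOUNDS, 10)

# Model 1: reported R2 = 0.913, K = 0.29495, SEE = 0.15708, Q = 6.0832
k_m1 = lack_of_relationship(0.913)
q_m1 = quality_factor(0.913, 0.15708)

print(f"Model 5 adjusted R2: {adj_m5:.5f}")
print(f"Model 1 K: {k_m1:.5f}, Q: {q_m1:.4f}")
```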

 

In the above equations, a positive sign on a coefficient indicates that log IC50a increases with the corresponding descriptor, and a negative sign that it decreases. We carried out several multiparametric regression analyses, and in all of them better results were obtained than with the monoparametric models. The observed and calculated activities for these models are given in Table 2, and the graphs of observed versus calculated activity are shown in Figures 1, 2, 3, 4 and 5.
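The agreement between observed and calculated activities can be quantified as the squared correlation between the two columns. The sketch below uses only the first six rows of Table 2 (observed values and Model 2 predictions) to keep the example short, so the resulting number is illustrative and is not the R2 reported for the full data set. The variable names are assumptions.

```python
import numpy as np

# First six rows of Table 2: observed log IC50 and Model 2 predictions
observed = np.array([0.73239, 1.25527, 0.47712, 1.21748, 1.40823, 1.21748])
model2 = np.array([0.92891, 1.49445, 0.61528, 1.22440, 1.34052, 1.21784])

# Pearson correlation between observed and calculated activity
r = np.corrcoef(observed, model2)[0, 1]
r_squared = r ** 2

# Residuals show which compounds are over- or under-predicted
residuals = observed - model2

print(f"r^2 on these six compounds: {r_squared:.3f}")
print("residuals:", np.round(residuals, 5))
```

A negative residual means the model predicts a higher log IC50 (lower potency) than was observed.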

 

Figure 3. Graph of observed versus calculated values of log IC50a (µM) for Model 3.

Figure 4. Graph of observed versus calculated values of log IC50a (µM) for Model 4.

Figure 5. Graph of observed versus calculated values of log IC50a (µM) for Model 5.

 

DISCUSSION:

The model expressed by Equation 2 has the highest R2 value, together with good statistics. The best model is the one that has the best statistics as well as the best predictive power. We therefore obtained the predictive correlation coefficient (R2) for the models expressed by Equations 1-5 by correlating the observed activity with the calculated activity. The R2 values obtained are presented in Figures 1-5. They show that the model expressed by Equation 2 is the most appropriate for modeling the activity. The predictive power of a model can also be judged by calculating PSE, K, E, PE, the t test, the F test, PRESS, SPRESS, SEE and adjusted R2. These statistics are reported with Models 1-5. A comparative analysis of the statistics associated with the models shows that the model based on Equation 2 is the most suitable for modeling the activity.
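The comparison of models can be made explicit by tabulating the reported statistics and ranking on them. The sketch below copies R2, adjusted R2 and PRESS from the statistics listed with Models 1-5; the dictionary layout and the choice of ranking criteria are assumptions made for the example.

```python
# Statistics as reported with Models 1-5
reported = {
    "Model 1": {"R2": 0.913, "adj_R2": 0.87723, "PRESS": 0.71568},
    "Model 2": {"R2": 0.938, "adj_R2": 0.91002, "PRESS": 0.50203},
    "Model 3": {"R2": 0.933, "adj_R2": 0.90625, "PRESS": 0.54632},
    "Model 4": {"R2": 0.898, "adj_R2": 0.86155, "PRESS": 0.83854},
    "Model 5": {"R2": 0.857, "adj_R2": 0.81087, "PRESS": 0.72938},
}

# Higher adjusted R2 is better; lower PRESS is better
best_by_adj_r2 = max(reported, key=lambda m: reported[m]["adj_R2"])
best_by_press = min(reported, key=lambda m: reported[m]["PRESS"])

ranking = sorted(reported, key=lambda m: reported[m]["adj_R2"], reverse=True)
print("ranking by adjusted R2:", ranking)
print("best by adjusted R2:", best_by_adj_r2, "| best by PRESS:", best_by_press)
```

Ranking on adjusted R2 rather than R2 penalizes models with more descriptors, which matters here because the five models contain between 10 and 13 terms.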

 

The involvement of these topological properties in modeling the activity indicates that the activity of the compounds is affected by these properties; in other words, these properties are responsible for the observed behavior. These topological parameters are related to the steric and electronic character of the molecule.

 

The activity of an unknown compound can be predicted from the values of its topological indices. The above discussion supports the utility of the topological parameters used in this work.

 

REFERENCES:

1. J. Med. Chem. 2001, 44, 4443-4452.

2. Randić, M. J. Chem. Inf. Comput. Sci. 1997, 37, 672.

3. Balaban, A. T. et al. Topics Curr. Chem. 1983, 114, 21.

4. Balaban, A. T. Historical developments of topological indices. In Topological Indices and Related Descriptors in QSAR and QSPR; Devillers, J., Balaban, A. T., Eds.; Gordon and Breach: Amsterdam, the Netherlands, 1999; p 403.

5. Balaban, A. T. J. Mol. Struct. (THEOCHEM) 1988, 165, 243.

6. Basak, S. C. Information theoretic indices of neighborhood complexity and their applications. In Topological Indices and Related Descriptors in QSAR and QSPR; Devillers, J., Balaban, A. T., Eds.; Gordon and Breach: Amsterdam, the Netherlands, 1999; p 563.

7. Trinajstić, N. In Chemical Graph Theory; CRC Press: Boca Raton, FL, 1992; p 225.

8. Basak, S. C. et al. Use of graph-theoretic geometrical molecular descriptors in structure-activity relationships. In From Chemical Topology to Three-dimensional Geometry; Plenum Press: New York, 1977; p 73.

9. Randić, M.; Razinger, M. On characterization of 3D molecular structure. In From Chemical Topology to Three-dimensional Geometry; Plenum Press: New York, 1977; p 159.

10. Randić, M. Int. J. Quantum Chem: Quantum Biol. Symp. 1988, 15, 201.

11. Randić, M. et al. Computer Chem. 1990, 14, 237.

12. Randić, M. J. Chem. Inf. Comput. Sci. 1995, 35, 373.

13. Randić, M. New J. Chem. 1995, 196, 781.

14. Wiener, H. J. Am. Chem. Soc. 1947, 69, 2636-2638.

15. Wiener, H. J. Phys. Chem. 1947, 15, 766.

16. Schultz, H. P. J. Chem. Inf. Comput. Sci. 1989, 29, 227-228.

17. Sharma, V. et al. J. Chem. Inf. Comput. Sci. 1997, 37, 273-282.

18. Bonchev, D. Information Theoretic Indices for Characterization of Chemical Structures; RSP-Wiley: Chichester, U.K., 1983.

19. Randić, M. Stud. Phys. Theor. Chem. 1988, 54, 101.

20. Dragon 5 software, http://www.disat.unimib.it/chm/Dragon.htm

 

 

 

Received on 14.06.2010        Modified on 29.06.2010

Accepted on 08.07.2010        © AJRC All right reserved

Asian J. Research Chem. 3(4): Oct. - Dec. 2010; Page 1030-1034