Method for screening structural descriptors, and for terminating the screening, in quantitative structure-activity relationship models of pollutants



The invention discloses a method for screening the structural descriptors of a quantitative structure-activity relationship (QSAR) model of pollutants and for deciding when the screening should terminate. The method combines the cross-validated correlation coefficient q2 with the adjusted correlation coefficient R2adj and comprises the following steps: (1) building a statistical model on a candidate variable subset, obtaining the correlation coefficient r2 between the observed values and the model-estimated values, and obtaining the adjusted correlation coefficient R2adj; (2) cross-validating the model built on that variable subset to obtain its cross-validated correlation coefficient q2, using two methods, leave-one-out (LOO) and leave-many-out (LMO) cross-validation; and (3) constructing a new parameter, QRadj, from the statistics obtained in the preceding steps. For a given system, the value of QRadj is proportional to both the stability and the predictive ability of the model. Using QRadj as the new criterion, the method ensures that the model retains a relatively high q2 while avoiding over-fitting, prevents QSAR variable combinations with a low r2 but a high q2 from being selected, and describes the stability and predictive ability of the model on a scientific basis.
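The abstract names the statistics that QRadj is built from but does not say how QRadj combines them. The following is a minimal pure-Python sketch of the inputs only. It assumes an ordinary least-squares (OLS) model with an intercept and uses the textbook definitions r2 = 1 - SS_res/SS_tot, R2adj = 1 - (1 - r2)(n - 1)/(n - p - 1) for n compounds and p descriptors, and q2 = 1 - PRESS/SS_tot. The function names, the choice of OLS, and the interleaved fold assignment used for leave-many-out are illustrative assumptions, not the patented procedure.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations (collinear descriptors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Assumed model: OLS coefficients [b0, b1, ..., bp] with intercept b0."""
    rows = [[1.0] + [float(v) for v in x] for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return _solve(xtx, xty)


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def r2_and_adjusted(X, y):
    """Return (r2, R2adj) for a model built on descriptor subset X."""
    n, p = len(y), len(X[0])
    coef = fit_ols(X, y)
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - predict(coef, x)) ** 2 for x, v in zip(X, y))
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, r2_adj


def q2_cross_validated(X, y, n_groups=None):
    """q2 = 1 - PRESS / SS_tot.

    n_groups=None gives leave-one-out; otherwise the observations are split
    into n_groups interleaved folds (a hypothetical leave-many-out scheme)."""
    n = len(y)
    k = n if n_groups is None else n_groups
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    press = 0.0
    for g in range(k):
        test = [i for i in range(n) if i % k == g]
        train = [i for i in range(n) if i % k != g]
        coef = fit_ols([X[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - predict(coef, X[i])) ** 2 for i in test)
    return 1.0 - press / ss_tot
```

For example, `q2_cross_validated(X, y)` gives the LOO value and `q2_cross_validated(X, y, n_groups=3)` an LMO value for the same subset. Because the abstract does not define QRadj, the sketch deliberately stops at r2, R2adj and q2; any particular way of combining them into QRadj would be an additional assumption.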




