NMR chemical shift model description

Introduction

The current version of NMRPredictor combines two basic methods for chemical shift prediction: similarity search based on HOSE code technology, and QSPR modeling. After a concise introduction to HOSE codes, QSPR descriptors, and decision tree-based QSPR modeling, we show how the HOSE and QSPR approaches can be merged to obtain an accurate and robust chemical shift prediction model.

HOSE code-based chemical shift prediction [1]

A HOSE code describes the chemical environment of a selected atom up to a certain radius. Atoms with the same HOSE code are assumed to have similar chemical shift values, and the larger the radius of the shared HOSE code, the more similar the chemical shifts are expected to be. Given a database with HOSE codes as keys and the corresponding experimental chemical shifts as values, chemical shifts can be predicted by similarity search.

Example:

[Figure: Example1]
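
The similarity search described above can be sketched as a dictionary lookup that starts at the largest radius and falls back to smaller ones. This is only a minimal illustration, not NMRPredictor's implementation: the database contents, the code strings (which are placeholders and do not claim to follow exact HOSE syntax), the function name predict_shift, and the choice of averaging the matching shifts are all assumptions.

```python
from statistics import mean

# Hypothetical database: (radius, HOSE code) -> experimental shifts in ppm.
# Code strings and shift values are made up for illustration only.
HOSE_DB = {
    (1, "C;HHHC"): [14.1, 15.3, 22.6],
    (2, "C;HHHC(HHC)"): [13.9, 14.2],
    (3, "C;HHHC(HHC/HHC)"): [14.0],
}


def predict_shift(codes_by_radius, db, max_radius=3):
    """Return (shift, radius) for the largest radius whose code is in db.

    codes_by_radius maps a radius to the HOSE code of the focus atom at that
    radius. Averaging the matching shifts is an assumption of this sketch.
    Returns (None, 0) when no radius matches.
    """
    for radius in range(max_radius, 0, -1):
        shifts = db.get((radius, codes_by_radius.get(radius)))
        if shifts:
            return mean(shifts), radius
    return None, 0


# The radius-3 code of this query atom is not in the database,
# so the lookup falls back to the radius-2 entry.
query = {1: "C;HHHC", 2: "C;HHHC(HHC)", 3: "C;HHHC(HHC/HHO)"}
shift, matched_radius = predict_shift(query, HOSE_DB)
print(shift, matched_radius)
```

The radius of the match is returned together with the shift because, as noted above, a match at a larger radius is expected to be more reliable.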

Chemical shift descriptors [2,3,4]

Two kinds of descriptor sets were implemented for chemical shift prediction: "sum" (for 1H NMR) and "detailed" (for 13C NMR). Both are based on a traversal of the molecule graph starting from a selected atom (the focus atom, that is, the atom whose chemical shift is to be predicted). After identifying the neighbors of the focus atom in the different spheres (see the figure below), we count the occurrences of predefined atom types in each sphere. The current implementation employs 6 spheres around the focus atom and an additional sphere containing the rest of the atoms. Currently 40 atom types are handled by the descriptor computation. The number of ring closures and the number of hydrogen atoms in a given sphere are added to the 40 atom type counts for each sphere. The same procedure is repeated for atoms belonging to one of the pi-electron systems of the molecule. Thus the total number of chemical shift descriptors in the sum model equals 2*(6+1)*(40+2) = 588.

In addition to the descriptors of the sum model, the detailed model uses 8 physicochemical descriptors (valence, period, electronegativity, van der Waals radius, hybridization, bond type to previous atom, number of protons attached, ring closure count) to characterize atoms of the inner spheres (in our case, only the first sphere). The rest of the spheres are described by the method outlined above. Thus the detailed model generates 2*6*(40+2)+4*8 = 536 descriptors for 13C NMR chemical shift prediction.

[Figure: Spheres]
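
The sphere-based counting can be illustrated with a breadth-first traversal from the focus atom. The sketch below is an assumption-laden simplification: the adjacency-list molecule representation, the function name sphere_counts, and the element symbols used as atom types are hypothetical, and the ring closure, hydrogen, and pi-system counts are omitted. The constants at the end only re-check the descriptor totals quoted in the text.

```python
from collections import Counter, deque

N_SPHERES = 6      # spheres around the focus atom (from the text)
N_ATOM_TYPES = 40  # atom types handled (from the text)


def sphere_counts(adjacency, atom_types, focus):
    """Count atom types per sphere around the focus atom.

    adjacency: dict mapping atom index -> list of bonded atom indices.
    atom_types: dict mapping atom index -> atom type label.
    Returns N_SPHERES + 1 Counters; the last one collects every atom
    farther than N_SPHERES bonds away. Atoms not connected to the focus
    atom are ignored in this sketch.
    """
    distance = {focus: 0}
    queue = deque([focus])
    while queue:  # breadth-first search yields topological distances
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)

    counts = [Counter() for _ in range(N_SPHERES + 1)]
    for atom, d in distance.items():
        if d == 0:
            continue  # the focus atom itself is not counted
        sphere = min(d, N_SPHERES + 1) - 1
        counts[sphere][atom_types[atom]] += 1
    return counts


# Toy molecule: a linear chain of eight carbons ending in an oxygen.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 8] for i in range(9)}
types = {i: "C" for i in range(8)}
types[8] = "O"
counts = sphere_counts(chain, types, focus=0)

# Descriptor totals as given in the text.
sum_total = 2 * (N_SPHERES + 1) * (N_ATOM_TYPES + 2)
detailed_total = 2 * 6 * (N_ATOM_TYPES + 2) + 4 * 8
print(sum_total, detailed_total)
```

For the toy chain with the first atom as focus, each of the six inner spheres holds one carbon, and the final "rest" sphere holds the remaining carbon and the oxygen.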

Decision tree-based (M5P) chemical shift prediction

In order to reach a better chemical shift prediction accuracy the following clusters have been introduced:

Each of these clusters has an M5P decision tree-based chemical shift prediction model. Decisions corresponding to the nodes of the tree are made based on the chemical shift descriptor values until one of the leaves is reached. Each leaf of the decision tree corresponds to a multilinear regression (MLR) model which is employed for the prediction of the chemical shift of the focus atom.
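
How such a model tree produces a prediction can be sketched as follows. The classes Split and Leaf, the function predict, and the tiny example tree with its thresholds and coefficients are hypothetical and serve only to show the traversal and the leaf regression; smoothing along the path, which M5-style trees may apply, is left out.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Union


@dataclass
class Leaf:
    """Leaf holding a multilinear regression over descriptor values."""
    intercept: float
    coefficients: Dict[int, float]  # descriptor index -> weight

    def predict(self, x: Sequence[float]) -> float:
        return self.intercept + sum(w * x[i] for i, w in self.coefficients.items())


@dataclass
class Split:
    """Inner node testing one descriptor against a threshold."""
    feature: int
    threshold: float
    low: "Node"   # followed when x[feature] <= threshold
    high: "Node"  # followed otherwise


Node = Union[Leaf, Split]


def predict(node: Node, x: Sequence[float]) -> float:
    """Walk from the root to a leaf, then evaluate that leaf's MLR model."""
    while isinstance(node, Split):
        node = node.low if x[node.feature] <= node.threshold else node.high
    return node.predict(x)


# Made-up tree over three descriptors, for illustration only.
tree = Split(
    feature=0,
    threshold=0.5,
    low=Leaf(intercept=1.0, coefficients={1: 0.5}),
    high=Leaf(intercept=3.0, coefficients={1: 0.25, 2: 1.0}),
)

print(predict(tree, [0, 2, 0]))  # low branch:  1.0 + 0.5 * 2
print(predict(tree, [1, 4, 1]))  # high branch: 3.0 + 0.25 * 4 + 1.0 * 1
```

In the setting described above, one such tree would exist per cluster, and the descriptor vector of the focus atom would be passed to the tree of the cluster the atom belongs to.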

Mixed chemical shift model

In order to predict chemical shifts accurately, we combined the decision tree-based and HOSE models as follows:

Chemical shift data source

The training and test chemical shift data were obtained from NMRShiftDB; see http://nmrshiftdb.nmr.uni-koeln.de/ for further details.

References

  1. Anal. Chim. Acta 103, 355-365 (1978).
  2. J. Chem. Inf. Model. 48, 128-134 (2008).
  3. J. Chem. Inf. Comput. Sci. 40, 1169-1176 (2000).
  4. J. Magn. Reson. 157, 242-252 (2002).
 
Copyright © 1999-2014 ChemAxon Ltd.    All rights reserved.