NMR chemical shift model description

Introduction
HOSE code-based chemical shift prediction
Chemical shift descriptors
Decision tree-based (M5P) chemical shift prediction
Mixed chemical shift model
Chemical shift data source
References

Introduction

The current version of NMRPredictor combines two basic methods for chemical shift prediction: similarity search based on HOSE code technology and QSPR modeling. After a concise introduction to the HOSE code technology, the QSPR descriptors, and decision tree-based QSPR modeling, we show how the HOSE and QSPR approaches can be merged to obtain an accurate and robust chemical shift prediction model.

HOSE code-based chemical shift prediction¹

The HOSE code technology is often used to describe the chemical environment of a selected atom up to a certain radius. Atoms with the same HOSE code are assumed to have similar chemical shift values. The larger the radius of the common HOSE code, the more similar the chemical shifts are. If we have a database containing HOSE codes as keys and the corresponding experimental chemical shifts as values, we can predict chemical shift values by similarity search.
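The lookup idea can be illustrated with a minimal Python sketch. It is not NMRPredictor's implementation: the function names, the placeholder keys ("code-A", "code-B") and the shift values are invented for illustration, and averaging the matching shifts is an assumption borrowed from the mixed model described later in this document.

```python
from collections import defaultdict
from statistics import mean


def build_hose_database(records):
    """Group experimental shifts (ppm) by HOSE code: code -> list of shifts."""
    database = defaultdict(list)
    for hose_code, shift in records:
        database[hose_code].append(shift)
    return dict(database)


def predict_from_hose(database, hose_code):
    """Return the mean shift of all entries with this exact code, or None."""
    shifts = database.get(hose_code)
    return mean(shifts) if shifts else None


# Placeholder keys and invented values; real keys would be HOSE code strings.
records = [("code-A", 128.0), ("code-A", 129.0), ("code-B", 21.0)]
database = build_hose_database(records)
```

A query with a code that is absent from the database returns None, which signals that a smaller radius or another model has to be tried.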

Chemical shift descriptors²,³,⁴

Two kinds of descriptor sets were implemented for chemical shift prediction: "sum" (for ¹H NMR) and "detailed" (for ¹³C NMR). Both are based on a traversal of the molecule graph starting from a selected atom (the focus atom, i.e. the atom whose chemical shift is to be predicted). After identifying the neighbors of the focus atom in the different spheres (see the figure below), we count the occurrences of predefined atom types in each sphere. The current implementation employs 6 spheres around the focus atom and an additional sphere containing the rest of the atoms. Currently 40 atom types are handled by the descriptor computation. The numbers of ring closures and hydrogen atoms in a given sphere are added to the 40 atom type counts for each sphere. The same procedure is repeated for atoms belonging to one of the pi-electron systems of the molecule. Thus the total number of chemical shift descriptors equals 2*(6+1)*(40+2) = 588.

In addition to the descriptors of the sum model, the detailed model also uses 8 physicochemical descriptors (valence, period, electronegativity, van der Waals radius, hybridization, bond type to previous atom, number of protons attached, ring closure count) to characterize atoms of the inner spheres (in our case, only the first sphere). The rest of the spheres are described by the previously outlined method. Thus the detailed model generates 2*6*(40+2)+4*8 = 536 descriptors for ¹³C NMR chemical shift prediction.
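The sphere-based counting and the descriptor totals can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual descriptor code: sphere k is taken to be the set of atoms at topological distance k from the focus atom, the molecule is given as a plain adjacency dictionary, and the atom type labels are free-form strings rather than the 40 predefined types. Ring closure, hydrogen and pi-system counts are left out of the traversal.

```python
from collections import deque

N_SPHERES = 6       # explicit spheres around the focus atom (from the text)
N_ATOM_TYPES = 40   # atom types handled by the descriptor computation
EXTRA_COUNTS = 2    # ring closure count and hydrogen count per sphere


def sum_descriptor_count():
    """2 passes (all atoms, pi-system atoms) x 7 spheres x 42 counts."""
    return 2 * (N_SPHERES + 1) * (N_ATOM_TYPES + EXTRA_COUNTS)


def detailed_descriptor_count():
    """Formula quoted in the text for the detailed model."""
    return 2 * N_SPHERES * (N_ATOM_TYPES + EXTRA_COUNTS) + 4 * 8


def sphere_type_counts(adjacency, atom_types, focus, n_spheres=N_SPHERES):
    """Count atom types per sphere; the last list entry is the rest sphere."""
    # Breadth-first search gives the bond distance of every reachable atom.
    distance = {focus: 0}
    queue = deque([focus])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    counts = [{} for _ in range(n_spheres + 1)]
    for atom, d in distance.items():
        if d == 0:
            continue  # the focus atom itself is not counted
        sphere = min(d, n_spheres + 1) - 1  # distances beyond n_spheres share one bin
        label = atom_types[atom]
        counts[sphere][label] = counts[sphere].get(label, 0) + 1
    return counts


# Toy heavy-atom skeleton of ethanol: C0-C1-O2, focus on C0.
ethanol = {0: [1], 1: [0, 2], 2: [1]}
ethanol_types = {0: "C", 1: "C", 2: "O"}
ethanol_counts = sphere_type_counts(ethanol, ethanol_types, focus=0)
```

Flattening one such list of per-sphere counts over a fixed atom type order yields one block of the descriptor vector. Atoms that are not connected to the focus atom are ignored in this sketch.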

Decision tree-based (M5P) chemical shift prediction

To improve chemical shift prediction accuracy, the following clusters have been introduced:

¹³C clusters:
- aromatic C
- aromatic CH
- sp³ CH₃
- sp³ CH₂
- sp³ CH
- sp³ C
- sp² CH or CH₂
- sp² C
- sp C

¹H clusters:
- protons attached to C
- heteroatomic protons (X-H, where X is not C)
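Assigning a focus atom to one of these clusters could look like the following Python sketch. The function names and the input encoding (a hybridization string, an aromaticity flag, a hydrogen count) are hypothetical, and the ASCII labels stand for the cluster names listed above. Two choices are assumptions of this sketch and are not stated in the text: all sp carbons go to "sp C" regardless of attached hydrogens, and an sp³ carbon with more than three hydrogens is rejected.

```python
def carbon_cluster(hybridization, aromatic, n_hydrogens):
    """Map a carbon focus atom to one of the nine 13C clusters."""
    if aromatic:
        return "aromatic CH" if n_hydrogens > 0 else "aromatic C"
    if hybridization == "sp3":
        names = {0: "sp3 C", 1: "sp3 CH", 2: "sp3 CH2", 3: "sp3 CH3"}
        if n_hydrogens not in names:
            raise ValueError("no cluster listed for this hydrogen count")
        return names[n_hydrogens]
    if hybridization == "sp2":
        return "sp2 CH or CH2" if n_hydrogens > 0 else "sp2 C"
    if hybridization == "sp":
        return "sp C"  # assumption: terminal alkyne CH is not split off
    raise ValueError(f"unknown hybridization: {hybridization}")


def proton_cluster(attached_element):
    """Map a proton to one of the two 1H clusters by its bonding partner."""
    if attached_element == "C":
        return "protons attached to C"
    return "heteroatomic protons"
```

For example, a methyl carbon would be passed as `carbon_cluster("sp3", False, 3)` and a hydroxyl proton as `proton_cluster("O")`.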

Each of these clusters has an M5P decision tree-based chemical shift prediction model. Decisions corresponding to the nodes of the tree are made based on the chemical shift descriptor values until one of the leaves is reached. Each leaf of the decision tree corresponds to a multilinear regression (MLR) model which is employed for the prediction of the chemical shift of the focus atom.
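The way such a model tree produces a prediction can be sketched in Python. This is a toy illustration and not a trained model: the descriptor names ("sphere1_O", "sphere1_C"), thresholds and coefficients are invented, and the convention that values less than or equal to the threshold follow the low branch is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Multilinear regression model stored at a leaf."""
    intercept: float
    coefficients: Dict[str, float]

    def predict(self, descriptors):
        # Descriptors missing from the input are treated as 0.
        return self.intercept + sum(
            weight * descriptors.get(name, 0.0)
            for name, weight in self.coefficients.items()
        )


@dataclass
class Node:
    """Internal node: tests one descriptor against a threshold."""
    descriptor: str
    threshold: float
    low: Union["Node", Leaf]    # followed when value <= threshold
    high: Union["Node", Leaf]   # followed when value > threshold


def predict_shift(tree, descriptors):
    """Walk from the root to a leaf, then evaluate the leaf's MLR model."""
    while isinstance(tree, Node):
        value = descriptors.get(tree.descriptor, 0.0)
        tree = tree.low if value <= tree.threshold else tree.high
    return tree.predict(descriptors)


# Invented two-leaf tree that splits on a first-sphere oxygen count.
toy_tree = Node(
    descriptor="sphere1_O",
    threshold=0.5,
    low=Leaf(intercept=20.0, coefficients={"sphere1_C": 5.0}),
    high=Leaf(intercept=60.0, coefficients={"sphere1_C": 4.0}),
)
```

With one tree per cluster, prediction amounts to selecting the tree for the focus atom's cluster and calling `predict_shift` with its descriptor values.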

Mixed chemical shift model

In order to predict chemical shifts accurately, we combined the decision tree-based and HOSE models as follows:

For ¹H NMR:
- Start with a HOSE radius of 6 and generate the HOSE code for the focus atom
- If there are shifts corresponding to this HOSE code, return their average
- If not, go to a HOSE radius of 5, ...
- The minimal possible HOSE radius is 4
- Invoke the M5P-based chemical shift model if there have not been any HOSE hits.

For ¹³C NMR:
- Very similar to the ¹H NMR mixed chemical shift model
- The minimal possible HOSE radius is 3 in this case
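The fallback procedure for both nuclei can be summarized in a short Python sketch. The names are hypothetical: `hose_code_at` stands for any callable that returns the focus atom's HOSE code at a given radius, `database` maps HOSE codes to lists of shifts, and `fallback_model` stands for the M5P prediction. The starting radius of 6 for ¹³C is an assumption based on the statement that the procedure is very similar to the ¹H case.

```python
from statistics import mean

START_RADIUS = 6                      # assumed to apply to both nuclei
MIN_RADIUS = {"1H": 4, "13C": 3}      # minimal HOSE radii from the text


def predict_mixed(nucleus, hose_code_at, database, fallback_model):
    """Return (shift, source) using HOSE hits first and the tree model last."""
    # Try the largest radius first and shrink it down to the minimum.
    for radius in range(START_RADIUS, MIN_RADIUS[nucleus] - 1, -1):
        shifts = database.get(hose_code_at(radius))
        if shifts:
            return mean(shifts), f"HOSE radius {radius}"
    # No HOSE hit at any allowed radius: use the decision tree-based model.
    return fallback_model(), "M5P"


# Invented example: only the radius-3 code has database entries.
codes = {6: "r6", 5: "r5", 4: "r4", 3: "r3"}
database = {"r3": [30.0, 32.0]}
carbon_result = predict_mixed("13C", codes.get, database, lambda: 99.0)
proton_result = predict_mixed("1H", codes.get, database, lambda: 99.0)
```

In this example the ¹³C query finds a hit at radius 3, while the ¹H query stops at radius 4 and falls through to the fallback model. Returning the source alongside the shift makes it easy to see which part of the mixed model produced a value.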

Chemical shift data source

The training and test chemical shift data were obtained from NMRShiftDB; see http://nmrshiftdb.nmr.uni-koeln.de/ for further details.

References

1. Anal. Chim. Acta 103, 355-365 (1978).
2. J. Chem. Inf. Model. 48, 128-134 (2008).
3. J. Chem. Inf. Comput. Sci. 40, 1169-1176 (2000).
4. J. Magn. Reson. 157, 242-252 (2002).