pKa calculation training

If you think your experimental data could improve the accuracy of the default pKa calculator, you can use the supervised pKa learning method that is built into it. Special structural features can affect the pKa values calculated by the built-in method, so a correction library based on your experimental data helps the pKa calculator improve its prediction accuracy.

Training set

First, identify the inaccurately predicted ionization centers and collect experimental data for them. Because the learning algorithm is based on linear regression analysis, you need to collect as many experimental pKa values as possible; otherwise there will be no usable correlation. There is no strict rule on the number of experimental data points. If your purpose is to create a local model for only a certain type of ionization center, it may be enough to collect a few representative structures. A robust model, however, requires as many diverse structures and pKa values as possible.
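To see why the number of data points matters for a regression-based correction, the sketch below fits a straight line between predicted and experimental values by ordinary least squares and reports the correlation coefficient. It only illustrates linear regression in general; it is not the calculator's actual training algorithm, the function name fit_line is hypothetical, and the numbers are made-up placeholders rather than measured pKa values.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ slope * xs + intercept, plus Pearson r."""
    n = len(xs)
    if n < 3:
        # Two points always lie on a line, so the correlation says nothing.
        raise ValueError("need at least 3 points for a meaningful fit")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical placeholder data, NOT measured pKa values.
predicted = [2.0, 4.0, 6.0, 8.0, 10.0]
experimental = [2.5, 4.5, 6.5, 8.5, 10.5]

slope, intercept, r = fit_line(predicted, experimental)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.3f}")
```

With only two points any line fits perfectly, which is why the sketch refuses to fit fewer than three; a diverse training set is what makes the fitted relationship trustworthy.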

The experimental data should be collected in an sdf file. The training algorithm is then run on this file to create a correction library, which is stored on your local computer, in your user folder. Finally, the correction library can be applied via MarvinSketch, cxcalc, or Chemical Terms.

Input file preparation
The input file can be compiled by using either Instant JChem or JChem for Excel. The sdf file should contain the following:


A sample of a typical training set is shown in the picture (pKa_trainingset.sdf). ID1 is the index of the atom with the experimental pKa1 value.

[Figure: sample training set (mypkadata)]
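Before training, it can help to check that every record in the sdf file carries the expected data fields. The sketch below reads the SD data items of each record and reports which ones are missing. The field names ID1 and pKa1 come from the sample description above; whether these are the only required fields is an assumption, the function names are hypothetical, and the sample records use placeholder text instead of real molblocks and made-up values.

```python
def read_sd_fields(sdf_text):
    """Return one dict of SD data fields per record in an SDF string."""
    records = []
    for block in sdf_text.split("$$$$"):
        if not block.strip():
            continue
        fields = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i]
            # A data header looks like ">  <NAME>"; its value follows
            # on the next lines, up to the first blank line.
            if line.startswith(">") and "<" in line and ">" in line[1:]:
                name = line[line.index("<") + 1 : line.rindex(">")]
                value_lines = []
                i += 1
                while i < len(lines) and lines[i].strip():
                    value_lines.append(lines[i])
                    i += 1
                fields[name] = "\n".join(value_lines)
            i += 1
        records.append(fields)
    return records

def missing_fields(records, required=("ID1", "pKa1")):
    """List (record number, missing field names) for incomplete records."""
    problems = []
    for number, fields in enumerate(records, start=1):
        missing = [name for name in required if name not in fields]
        if missing:
            problems.append((number, missing))
    return problems

# Placeholder records: the molblock is replaced by dummy text and the
# values are hypothetical.
sample = """\
placeholder-molblock-1
M  END
>  <ID1>
7

>  <pKa1>
11.2

$$$$
placeholder-molblock-2
M  END
>  <ID1>
3

$$$$
"""

print(missing_fields(read_sd_fields(sample)))
```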

Generate the correction library

The correction library can be created from the command line with cxtrain:
cxtrain pka -i [library name] [training file] 
Example:
cxtrain pka -i mypka mydata.sdf
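If the training step is part of a larger script, the same command can be assembled programmatically. The sketch below builds exactly the argument list shown above and only runs it when cxtrain is found on the PATH. The function names cxtrain_command and train are hypothetical helpers, not part of any ChemAxon API, and no options beyond those shown above are assumed.

```python
import shutil
import subprocess

def cxtrain_command(library_name, training_file):
    """Build the argument list for: cxtrain pka -i [library name] [training file]."""
    return ["cxtrain", "pka", "-i", library_name, training_file]

def train(library_name, training_file):
    """Run cxtrain if it is on PATH and return its exit code."""
    if shutil.which("cxtrain") is None:
        raise FileNotFoundError("cxtrain not found on PATH")
    command = cxtrain_command(library_name, training_file)
    return subprocess.run(command, check=False).returncode

# Only prints the command; call train(...) to actually execute it.
print(" ".join(cxtrain_command("mypka", "mydata.sdf")))
```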

Application of the correction library

    MarvinSketch

  1. In the MarvinSketch menu, select Tools > Protonation > pKa.
  2. Check the 'Use correction library' box to activate the training option (see figure below).
  3. If you have created multiple training sets, choose the most accurate one from the dropdown list below the checkbox.

    Figure: pKa options panel in Marvin

    The next figure shows the results with (I) and without (II) applying the correction library.

    Figure: I. pKa calculation with training data; II. pKa calculation without training data

    Cxcalc

    To include your correction library in the pKa calculation, use the parameter --correctionlibrary or its short form, -L.
    cxcalc pKa --correctionlibrary [library name] [input file/string]
    Example
    $ cxcalc pKa --correctionlibrary mypka "CSC1=NC2=C(N1)C=NC(O)=N2"
    Result
     id      apKa1   apKa2   bpKa1   bpKa2   atoms
     1       11.19   16.01   2.34    -2.59   7,11,9,4

    If you run the cxcalc pKa calculation without the correction library, the results are calculated with the built-in dataset.
    Example
    $ cxcalc pKa "CSC1=NC2=C(N1)C=NC(O)=N2"
    Result
     id      apKa1   apKa2   bpKa1   bpKa2   atoms
     1       8.34    16.01   2.34    -2.59   7,11,9,4
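To see at a glance which values the correction library changed, the two result tables can be compared programmatically. The sketch below parses the output shown above and reports the columns whose values differ. It assumes that every column has a value in every row, because splitting on whitespace would misalign rows with empty cells; the function names are hypothetical.

```python
def parse_cxcalc_table(text):
    """Parse whitespace-separated cxcalc output into a list of row dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]

def changed_values(untrained_row, trained_row, skip=("id", "atoms")):
    """Return {column: (untrained, trained)} for numeric columns that differ."""
    changes = {}
    for column, old in untrained_row.items():
        if column in skip:
            continue
        old_value, new_value = float(old), float(trained_row[column])
        if old_value != new_value:
            changes[column] = (old_value, new_value)
    return changes

# Output copied from the two cxcalc examples above.
trained = parse_cxcalc_table("""
id      apKa1   apKa2   bpKa1   bpKa2   atoms
1       11.19   16.01   2.34    -2.59   7,11,9,4
""")
untrained = parse_cxcalc_table("""
id      apKa1   apKa2   bpKa1   bpKa2   atoms
1       8.34    16.01   2.34    -2.59   7,11,9,4
""")

print(changed_values(untrained[0], trained[0]))
# prints {'apKa1': (8.34, 11.19)}
```

In this example only the strongest acidic pKa (apKa1) is affected by the correction library; the other three values are identical in both runs.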

    For more options see this page.

    Chemical Terms

    Chemical Terms are available either from the command line (evaluator) or from Instant JChem.

    Command line, evaluator

    The evaluator is designed to evaluate mathematical expressions on molecules. Your correction library can be applied as follows:
    evaluate -e "pKa('correctionlibrary:[library name]')" "[input file/string]"
    Example
    evaluate -e "pKa('correctionlibrary:mypka')" "CSC1=NC2=C(N1)C=NC(O)=N2"
    Result
    ;;;-2,59;;;11,19;;2,34;;16,01;
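The evaluator result above can be turned into numbers with a few lines of code. The sketch below treats the string as one semicolon-separated field per atom, with empty fields for atoms that have no pKa value. This reading is an inference: the non-empty positions (4, 7, 9, 11) match the atoms column of the cxcalc output for the same molecule, but it is not stated by the documentation. The decimal commas are assumed to come from the locale of the machine that produced the output, and the function name is hypothetical.

```python
def parse_per_atom(result):
    """Map 1-based atom position to pKa for a semicolon-separated result."""
    values = {}
    for atom_index, field in enumerate(result.strip().split(";"), start=1):
        field = field.strip()
        if field:
            # Accept a decimal comma as well as a decimal point.
            values[atom_index] = float(field.replace(",", "."))
    return values

print(parse_per_atom(";;;-2,59;;;11,19;;2,34;;16,01;"))
# prints {4: -2.59, 7: 11.19, 9: 2.34, 11: 16.01}
```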

    For more evaluator functions on pKa training see this page.

    Instant JChem

    Click the 'New Chemical Terms Field' icon and type the Chemical Terms expression into the window, using the correctionlibrary:[library name] parameter. Do not forget to set the Name, the Type and the DB Column Name.
    Example
    The following figure demonstrates the usage of pKa training in the 'New Chemical Terms' window. The expression pKa ('correctionlibrary:mypKa type:acidic','1') specifies that the plugin uses the correction library named mypKa and calculates the strongest acidic pKa of the molecule(s).


    New Chemical Terms window in Instant JChem


    The results of this calculation are shown in the figure below. You can see the difference between the untrained (column 5, Strongest acidic pKa) and trained (column 6, Trained strongest acidic pKa) pKa values.

    New Chemical Terms window in Instant JChem