This manual walks you through using the cxtrain command-line tool.
Introduction
Some property calculations can be improved when experimental data are available for molecules that are similar to the target. Such user-specific information can be incorporated into so-called training libraries, which are generated with ChemAxon's command-line tool cxtrain. The tool is part of the JChem and Marvin Beans packages.
The generated training library is stored on the user's computer and is used by the calculator plugins to improve property prediction.
Usage
Invoking cxtrain
Invoking cxtrain -h gives the following output:
cxtrain <prediction> [options] [input file (training set)]
Prediction:
  pka                               train pKa prediction
  logp                              train logP prediction
  prediction                        train custom prediction
General options:
  cxtrain -h, --help                this help message
  -i, --training-id<training>       sets the training ID
  -l, --list                        list available training ID's
  -g, --ignore-error                continue with next molecule on error
pKa options:
  -V, --validation <filepath>       validation results file path
logP options:
  -t, --tag <tag name>              name of the SDFile tag that stores the experimental logP values
  -a, --add-built-in-training-set   add built-in logP training set
Custom prediction options:
  -t, --tag <tag name>              name of the SDFile tag that stores the experimental property values
So you can train a plugin by calling cxtrain:
cxtrain <prediction> [options] [input file (training set)]
where <prediction> must be one of pka, logp, or prediction (the last is used for a custom property).
Input of cxtrain
cxtrain can handle any molecular file format that is supported by ChemAxon (e.g., MDL Molfile, SDF).
Placing the training library
The generated training library is stored on your computer, and it can be used via Marvin, Chemical Terms, Instant JChem or cxcalc.
Options
General options
- With the option --training-id (-i), you can set the ID of your training. Afterwards, this ID refers to the given training during calculations.
- The available training IDs can be listed using the option --list (-l).
- The option --ignore-error (-g) skips a molecule that causes an error and continues with the next one.
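As an illustration of the general options, the following shell sketch first lists the existing training IDs and then trains with --ignore-error. The file name training.sdf, the ID mytraining, and the helper run_if_available are hypothetical. Placing -l after the prediction keyword follows the usage line shown above, but it is an assumption, since this manual does not show a --list invocation.

```shell
# Hypothetical helper (not part of cxtrain): print the command, and execute
# it only when the tool is installed, so the sketch is safe to run anywhere.
run_if_available() {
  echo "+ $*"
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  fi
}

# List the training IDs that already exist (argument order is an assumption).
run_if_available cxtrain pka -l

# Train under the ID "mytraining"; -g continues with the next molecule on error.
# "training.sdf" is a placeholder file name.
run_if_available cxtrain pka -i mytraining -g training.sdf
```

Checking the list first makes it easier to choose an ID that does not collide with an existing training.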
Plugin-specific options
The following plugin-specific options are available:
pKa Plugin:
- --validation <filepath> (-V) creates validation data; optionally, the file path of the pKa training validation chart can be specified.
logP Plugin:
- --add-built-in-training-set (-a) merges your data with the data from the built-in logP training set.
- --tag <tag name> (-t) defines the name of the SDFile tag that stores the experimental logP values.
Custom prediction:
- --tag <tag name> (-t) defines the name of the SDFile tag that stores the experimental values of the custom property.
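The plugin-specific options combine with the general ones on a single command line. The sketch below only assembles and prints two such command lines: one for pKa training with a validation file, and one for training a custom property. The file names, the tag MYPROP, and the training IDs are placeholders chosen for illustration; the manual does not state the type of the validation file, so the name is left without an extension.

```shell
# Placeholder names (assumptions for illustration only).
PKA_SET="pKa_trainingset.sdf"
VALIDATION_OUT="pka_validation_results"
CUSTOM_SET="custom_trainingset.sdf"
CUSTOM_TAG="MYPROP"

# pKa training that also writes validation results (-V).
PKA_CMD="cxtrain pka -i mypka -V $VALIDATION_OUT $PKA_SET"

# Custom property training: -t names the SDFile tag that holds the values.
CUSTOM_CMD="cxtrain prediction -t $CUSTOM_TAG -i myprop $CUSTOM_SET"

# Print the commands for review instead of running them.
echo "$PKA_CMD"
echo "$CUSTOM_CMD"
```

Printing the commands before running them is a cheap way to confirm that the tag name matches the one actually used in the SDFile.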
Examples
Training pKa calculations
Step #1 Creating the training library from a given data file pKa_trainingset.sdf with a training ID mypka:
cxtrain pka -i mypka pKa_trainingset.sdf
Step #2 Using the generated training library in pKa calculations with cxcalc:
cxcalc pKa --correctionlibrary mypka "CSC1=NC2=C(N1)C=NC(O)=N2"
The result of the calculation is:
id apKa1 apKa2 bpKa1 bpKa2 atoms
1 11.19 16.01 2.34 -2.59 7,11,9,4
Training logP calculations
Step #1 Creating the training library from the given data file logP_trainingset.sdf (with experimental logP values stored in the SDF tag named LOGP), setting training ID to mylogp and including data from the built-in training set:
cxtrain logp -t LOGP -i mylogp -a logP_trainingset.sdf
Step #2 To apply the generated logP training library in calculations, pass the parameter --trainingid together with the parameter --method to cxcalc:
cxcalc logp --method user --trainingid mylogp "CC(C)CCO"
The result of the calculation is:
id logP
1 1,13
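To see the effect of a training library, it can help to run the same calculation with and without it. The following is a minimal sketch, assuming that cxcalc logp without --method and --trainingid falls back to the default, untrained method; the variable names are illustrative. The commands are printed, and they are executed only if cxcalc is found on the PATH.

```shell
SMILES="CC(C)CCO"
TRAINING_ID="mylogp"

# Default calculation versus calculation with the user training library.
DEFAULT_CMD="cxcalc logp"
TRAINED_CMD="cxcalc logp --method user --trainingid $TRAINING_ID"

for cmd in "$DEFAULT_CMD" "$TRAINED_CMD"; do
  echo "+ $cmd \"$SMILES\""
  if command -v cxcalc >/dev/null 2>&1; then
    # $cmd is left unquoted on purpose so that it splits into separate words.
    $cmd "$SMILES"
  fi
done
```

A difference between the two reported values indicates that the training library was picked up by the calculation.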