The Chemical Terms Evaluator is designed to evaluate mathematical expressions on molecules. These expressions usually have a chemical meaning formulated in ChemAxon's Chemical Terms Language using built-in chemical and general purpose functions. It is also possible to extend this built-in set of calculations by a user-defined configuration.
Apart from evaluating Chemical Terms with ChemAxon's evaluate command line tool, this evaluation mechanism is used for chemical calculations in ChemAxon products where computational and/or search conditions come into the picture, such as:
- pharmacophore feature identification (PMapper); note that PMapper feature definitions use a specific syntax,
- reaction definitions (Reactor),
- database filters and chemical calculations (JChem Cartridge).
The heart of the evaluator mechanism is the JEP Java Expression Parser.
You may want to look at the complete language reference including a description of the expression syntax and some simple examples showing how some well-known chemical rules can be formulated in this language.
Evaluator uses molecule context to set the input molecule, therefore calculations refer to the input molecule by default. The language reference also includes a set of Evaluator examples. A set of working examples is available.
- Download and launch the platform-specific installer by following the installation instructions.
The evaluate command line tool evaluates a single expression and either prints the result in human-readable text format or outputs the input molecule with the result stored in a specified SDF tag.
evaluate [options] [input files/strings]
Options:
  -h, --help                        this help message
  -l, --list-functions              list Chemical Terms functions

Input Options:
  -c, --config <filepath>           configuration XML file (if omitted then default configuration is applied)
  -n, --no-input-mol                expression should be evaluated without input molecule
  -e, --expr-string <str|filepath>  expression string or file

Output Options:
  -o, --output <filepath>           output file path (default: stdout)
  -g, --ignore-error                continue with next molecule on error
  -v, --verbose                     verbose output
  -C, --clean <dim[:opts]>          clean output molecules (dim: 2 or 3) with options (default: t2000 - time limit: 2 sec) (see http://www.chemaxon.com/marvin/help/sci/cleanoptions.html)
  -f, --format <format>             output format if result is molecule (default: smiles or smarts) (ignores the output options below)
  -x, --extract <format>            extract mode: write exactly those molecules in the specified format that satisfy the input boolean expression (excludes other output options)
  -p, --precision <precision>       max. number of fractional digits in the output (default: 2)
  -S, --sdf-output                  SDF output (otherwise text output)
  -t, --tag                         name of the SDFile tag to store the evaluation result (default: CALC)
  -i, --include-expr                output expression string
The input molecule file can contain more than one molecule; in this case the expression is evaluated for each input molecule in turn.
The command line parameter --config specifies the filename of the configuration file. If this parameter is not specified, the default configuration is used.
If the command line parameter --no-input-mol is specified then the expression is evaluated without an input molecule.
The command line parameter --expr-string specifies either the expression string itself, given on the command line, or the path of a file containing the expression string.
The command line parameter --format specifies the output molecule format when the output is a molecule or a molecule array. The default format is SMILES / SMARTS. If this option is used then all other output options except for --output, --ignore-error and --verbose are ignored.
If the command line parameter --clean is specified then result molecules as well as SDF output are cleaned in the given dimension.
If the command line parameter --extract is specified then the input expression is used as a molecule filter: for each input molecule it is evaluated as a boolean condition, and the program keeps the molecules that satisfy this condition, that is, those for which the expression evaluates to true. These molecules are written to the output in the specified format. If this option is used then all other output options except for --output, --ignore-error and --verbose are ignored.
The command line parameter --precision specifies the maximum number of fractional digits to be displayed in the output.
If the command line parameter --sdf-output is specified then input molecules are written to the output in SDF format with the evaluation result set as an SDF tag. The command line parameter --tag specifies this SDF tag.
If the command line parameter --include-expr is specified then the evaluation result is preceded by the expression string itself in the output.
If the command line parameter --ignore-error is specified, then import/export errors do not stop the processing; instead the error is written to the console and the molecule is skipped. By default, the program exits in case of molecule import/export errors.
The software may take molecules from a text file. Most molecular file formats are accepted (MDL molfile, Compressed molfile, SDfile, Compressed SDfile, SMILES, etc.).
If no input file name is given in the command line the standard input is read.
If no output file name is given, results are written to the standard output.
If the --sdf-output command line parameter is specified, the output format is SDF and the evaluation result is written to an SDF tag (default tag: CALC). Otherwise only the evaluation result is written to the output in simple text format.
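When the tool is driven from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of the ChemAxon distribution: the helper name build_evaluate_args and its keyword arguments are hypothetical, and it only uses long options that appear in the help text above. It builds the command without running it.

```python
import shlex


def build_evaluate_args(expr, inputs, config=None, sdf_tag=None,
                        precision=None, output=None):
    """Assemble an argument list for the evaluate tool (hypothetical helper)."""
    args = ["evaluate"]
    if config is not None:
        args += ["--config", config]
    args += ["--expr-string", expr]
    if precision is not None:
        args += ["--precision", str(precision)]
    if sdf_tag is not None:
        # --tag names the SDF tag, so it is paired with --sdf-output here.
        args += ["--sdf-output", "--tag", sdf_tag]
    if output is not None:
        args += ["--output", output]
    return args + list(inputs)


args = build_evaluate_args("charge()", ["target.sdf"], config="config.xml",
                           sdf_tag="CHARGE", precision=3, output="result.sdf")
# Print a shell-quoted form of the command for inspection.
print(shlex.join(args))
```

The resulting list could be passed to subprocess.run on a machine where the evaluate tool is installed and on the PATH.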
The configuration file is an XML file containing some/all of the following optional subsections:
The evaluator parameter section currently sets the cache-mode attribute: if set to "true" then matching condition and plugin calculation results are cached in the molecule object and reused instead of performing the same structure search or chemical calculation repeatedly. The default is "false", since typically a Chemical Terms evaluation does not contain multiple references to the same matching condition or calculation and the caching procedure by itself also has some overhead.
Example:
<Params Cached="true"/>
The plugin declarations enable different structure-based chemical calculations (e.g. pKa, logP, logD) to be referenced in the expression strings.
Declaration
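The plugin definition section contains the following data for each plugin reference that is to be used in the expressions:
- the ID: the name by which the plugin is referenced from the expression;
- the Class: the Java class that implements the plugin;
- the JAR: the Java archive in the marvin/plugins directory (marvin refers to the Marvin installation directory) where the plugin class should be loaded from (optional, loaded from the usual CLASSPATH if omitted);
- the Param elements: name/value pairs that preset the plugin parameters (optional).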
The set of possible plugin parameters and a short description for each plugin can be seen with the help of the cxcalc program:
cxcalc <plugin> -h
where plugin is the plugin ID in the cxcalc configuration file. The parameter names used by the Evaluator are the long command line parameter names, without the leading '--' double dashes. For example, for pKa, type:
cxcalc pka -h
which prints out the following help text:
Calculator plugin: pka.
pKa calculation.

Usage:
  cxcalc [general options] [input files] pka [pka options] [input files]

pka options:
  -h, --help       this help message
  -p, --precision  <floating point precision as number of fractional digits: 0-8 or inf> default: 2
  -t, --type       [pKa|acidic|basic] (default: pKa)
  -m, --mode       [macro|micro] (default: macro)
  -n, --ions       max number of ionizable atoms to be considered (default: 8)
  -i, --min        min basic pKa (default: -10)
  -x, --max        max acidic pKa (default: 20)
  -a, --na         number of acidic pKa values displayed (default: 2)
  -b, --nb         number of basic pKa values displayed (default: 2)
The help, precision, na and nb parameters refer to display options, therefore these are not used by the Evaluator.
Thus the parameter set for the pKa calculation in our case is: type, mode, ions, min, max.
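The derivation above can be sketched in a few lines of Python. This is an illustration only: the function name evaluator_param_names is hypothetical, the help text is pasted in as a string rather than obtained by running cxcalc, and the set of display-only options is the one stated above for pKa (other plugins may differ).

```python
import re

# Option lines copied from the "cxcalc pka -h" output shown above.
HELP_TEXT = """\
  -h, --help       this help message
  -p, --precision  <floating point precision as number of fractional digits: 0-8 or inf> default: 2
  -t, --type       [pKa|acidic|basic] (default: pKa)
  -m, --mode       [macro|micro] (default: macro)
  -n, --ions       max number of ionizable atoms to be considered (default: 8)
  -i, --min        min basic pKa (default: -10)
  -x, --max        max acidic pKa (default: 20)
  -a, --na         number of acidic pKa values displayed (default: 2)
  -b, --nb         number of basic pKa values displayed (default: 2)
"""

# Display options named in the text; these are not used by the Evaluator.
DISPLAY_ONLY = {"help", "precision", "na", "nb"}


def evaluator_param_names(help_text, display_only):
    """Return long option names without the leading '--', minus display options."""
    long_names = re.findall(r"--([A-Za-z][\w-]*)", help_text)
    return [name for name in long_names if name not in display_only]


print(evaluator_param_names(HELP_TEXT, DISPLAY_ONLY))
```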
The same plugin can be used with different parameter settings if the XML configuration has more than one <Plugin> section with the same Java class but different plugin names, each name referencing the plugin with its own parameter section. In the following example the pKa1 name references pKa calculation with minimal basic pKa value -3 and maximal acidic pKa value 10, while the pKa2 name references pKa calculation with minimal basic pKa value -20 and maximal acidic pKa value 30.
Different functions of a calculator plugin can be referenced by different IDs. In the example below, the "mass" result type of the ElementalAnalyser plugin is referenced by the mass name, while the "exactmass" result type of the same plugin is referenced by the exactmass name.
<Plugins>
  <Plugin ID="charge" Class="chemaxon.marvin.calculations.ChargePlugin" JAR="ChargePlugin.jar"/>
  <Plugin ID="ioncharge" Class="chemaxon.marvin.calculations.IonChargePlugin">
    <Param Name="pH" Value="3.6"/>
    <Param Name="max-ions" Value="6"/>
    <Param Name="min-percent" Value="5"/>
    <Param Name="charge-type" Value="accumulated"/>
  </Plugin>
  <Plugin ID="microspecies" Class="chemaxon.marvin.calculations.MajorMicrospeciesPlugin"/>
  <Plugin ID="pka" Class="chemaxon.marvin.calculations.pKaPlugin"/>
  <Plugin ID="pKa1" Class="chemaxon.marvin.calculations.pKaPlugin">
    <Param Name="min" Value="-3"/>
    <Param Name="max" Value="10"/>
  </Plugin>
  <Plugin ID="pKa2" Class="chemaxon.marvin.calculations.pKaPlugin">
    <Param Name="min" Value="-20"/>
    <Param Name="max" Value="30"/>
  </Plugin>
  <Plugin ID="logp" Class="chemaxon.marvin.calculations.logPPlugin">
    <Param Name="type" Value="logPMicro"/>
  </Plugin>
  <Plugin ID="mass" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin">
    <Param Name="type" Value="mass"/>
  </Plugin>
  <Plugin ID="exactmass" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin">
    <Param Name="type" Value="exactmass"/>
  </Plugin>
  <Plugin ID="logp" Class="chemaxon.marvin.calculations.logPPlugin"/>
  <Plugin ID="logd" Class="chemaxon.marvin.calculations.logDPlugin"/>
  <Plugin ID="acc" Class="chemaxon.marvin.calculations.HBDAPlugin">
    <Param Name="type" Value="acc"/>
  </Plugin>
  <Plugin ID="don" Class="chemaxon.marvin.calculations.HBDAPlugin">
    <Param Name="type" Value="don"/>
  </Plugin>
  <Plugin ID="acceptorcount" Class="chemaxon.marvin.calculations.HBDAPlugin">
    <Param Name="type" Value="acceptorcount"/>
  </Plugin>
  <Plugin ID="donorcount" Class="chemaxon.marvin.calculations.HBDAPlugin">
    <Param Name="type" Value="donorcount"/>
  </Plugin>
</Plugins>
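To check which plugin IDs a configuration declares and which preset parameters each one carries, the section can be read with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a shortened copy of the example; the function name read_plugins is hypothetical and this is not how the Evaluator itself loads its configuration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <Plugins> example above.
CONFIG = """
<Plugins>
  <Plugin ID="pka" Class="chemaxon.marvin.calculations.pKaPlugin"/>
  <Plugin ID="pKa1" Class="chemaxon.marvin.calculations.pKaPlugin">
    <Param Name="min" Value="-3"/>
    <Param Name="max" Value="10"/>
  </Plugin>
  <Plugin ID="pKa2" Class="chemaxon.marvin.calculations.pKaPlugin">
    <Param Name="min" Value="-20"/>
    <Param Name="max" Value="30"/>
  </Plugin>
</Plugins>
"""


def read_plugins(xml_text):
    """Map each plugin ID to (class name, {parameter name: value})."""
    root = ET.fromstring(xml_text)
    plugins = {}
    for plugin in root.findall("Plugin"):
        params = {p.get("Name"): p.get("Value") for p in plugin.findall("Param")}
        plugins[plugin.get("ID")] = (plugin.get("Class"), params)
    return plugins


plugins = read_plugins(CONFIG)
for plugin_id, (class_name, params) in plugins.items():
    # Show the short class name so that shared implementations stand out.
    print(plugin_id, class_name.rsplit(".", 1)[-1], params)
```

In this shortened input, pka, pKa1 and pKa2 all point at the same Java class and differ only in their preset parameters.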
The expression strings can also include references to predefined functions. These functions are implemented by Java classes that have to implement the org.nfunk.jep.function.PostfixMathCommandI interface. See the JEP API Documentation for details.
Declaration
The function definition section contains the user-defined function implementation Java classes accessible from the expressions. Each class is given an ID: this is the name by which the function is referenced from the expression. The Class attribute specifies the Java class that implements the function. A predefined function may have preset parameters, in a similar fashion to the Plugin declaration section. Currently only the atomic property query function uses this, to preset the name of the atomic property to be queried.
Example:
<Functions>
  <Function ID="array" Class="chemaxon.jep.function.IntArray"/>
  <Function ID="min" Class="chemaxon.jep.function.Min"/>
  <Function ID="max" Class="chemaxon.jep.function.Max"/>
  <Function ID="count" Class="chemaxon.jep.function.Count"/>
  <Function ID="sum" Class="chemaxon.jep.function.Sum"/>
  <Function ID="sortasc" Class="chemaxon.jep.function.SortAsc"/>
  <Function ID="sortdesc" Class="chemaxon.jep.function.SortDesc"/>
  <Function ID="in" Class="chemaxon.jep.function.In"/>
  <Function ID="eval" Class="chemaxon.jep.function.AtomEvaluatorFunction"/>
  <Function ID="filter" Class="chemaxon.jep.function.Filter"/>
  <Function ID="minatom" Class="chemaxon.jep.function.MinAtom"/>
  <Function ID="maxatom" Class="chemaxon.jep.function.MaxAtom"/>
  <Function ID="minvalue" Class="chemaxon.jep.function.MinValue"/>
  <Function ID="maxvalue" Class="chemaxon.jep.function.MaxValue"/>
  <Function ID="atomprop" Class="chemaxon.jep.function.AtomProperties"/>
  <Function ID="hcount" Class="chemaxon.jep.function.AtomProperties">
    <Param Name="property" Value="hcount"/>
  </Function>
  <Function ID="connections" Class="chemaxon.jep.function.AtomProperties">
    <Param Name="property" Value="connections"/>
  </Function>
  <Function ID="valence" Class="chemaxon.jep.function.AtomProperties">
    <Param Name="property" Value="valence"/>
  </Function>
  <Function ID="atno" Class="chemaxon.jep.function.AtomProperties">
    <Param Name="property" Value="atno"/>
  </Function>
  <Function ID="map" Class="chemaxon.jep.function.AtomProperties">
    <Param Name="property" Value="map"/>
  </Function>
  <Function ID="arom" Class="chemaxon.jep.function.AtomProperties">
    <Param Name="property" Value="arom"/>
  </Function>
</Functions>
The matching condition declaration enables the Match function to be used in expression strings. This function performs substructure search and optionally checks for atom matching.
Declaration
The declaration gives a reference ID to the function. It should contain a Class attribute, which specifies the Java class that implements the function, and it can specify the search attributes when they differ from the default settings. Specifying search attributes is optional; if omitted, the default values are used. For a detailed description of the search options see the JChem Query Guide.
Attribute | Range | Default Value |
---|---|---|
StereoSearch | true/false | true |
DoubleBondStereoMatchingMode | none/marked/all | marked |
SubgraphSearch | true/false | true |
ExactAtomMatching | true/false | false |
ExactStereoMatching | true/false | false |
OrderSensitiveSearch | true/false | false |
Example:
<Matching ID="match" Class="chemaxon.jep.function.Match">
  <Search DoubleBondStereoMatchingMode="all" OrderSensitiveSearch="true"/>
</Matching>
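The interplay between the declared search attributes and the defaults in the table can be illustrated with a short Python sketch. It is only a model of the documented behaviour (explicit attributes replace defaults, everything else keeps its default value); the function name effective_search_options is hypothetical, and rejecting unknown attribute names is an assumption of this sketch, not documented Evaluator behaviour.

```python
import xml.etree.ElementTree as ET

# Default values taken from the attribute table above.
DEFAULTS = {
    "StereoSearch": "true",
    "DoubleBondStereoMatchingMode": "marked",
    "SubgraphSearch": "true",
    "ExactAtomMatching": "false",
    "ExactStereoMatching": "false",
    "OrderSensitiveSearch": "false",
}

MATCHING = """
<Matching ID="match" Class="chemaxon.jep.function.Match">
  <Search DoubleBondStereoMatchingMode="all" OrderSensitiveSearch="true"/>
</Matching>
"""


def effective_search_options(matching_xml):
    """Overlay the attributes of the <Search> element on the defaults."""
    matching = ET.fromstring(matching_xml)
    search = matching.find("Search")
    overrides = dict(search.attrib) if search is not None else {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Assumption of this sketch: flag names that are not in the table.
        raise ValueError("unknown search attributes: %s" % sorted(unknown))
    return {**DEFAULTS, **overrides}


options = effective_search_options(MATCHING)
for name, value in options.items():
    print(name, "=", value)
```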
A detailed description of the usage of the match function in expression strings is given below. A table of match function descriptions with examples is also available as a short reference.
Default plugin and function definitions as well as the default matching condition are read from the built-in evaluator.xml file located under the chemaxon/jep directory in marvinbeans.jar / jchem.jar provided by ChemAxon.
Plugins, functions and matching conditions defined by the user are read from the marvin/config/evaluator.xml file (where marvin is the Marvin installation directory) and from the MARVIN_MAJOR_VERSION/evaluator.xml file (where MARVIN_MAJOR_VERSION is the major version of Marvin, e.g. "5.1") located under the .chemaxon (UNIX / Linux) or chemaxon (Windows) subdirectory in the user's home directory. The user-defined XML configuration elements are added to the default configuration; if both exist, the user-defined configuration overrides the built-in settings.
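The add-then-override rule can be modelled as a dictionary merge. The sketch below rests on assumptions that the text does not confirm: that an override replaces a whole entry with the same element name and ID, and that the configuration root element is called Evaluator (a placeholder name used only so that the sample strings are well-formed XML). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder documents; the root element name is an assumption.
DEFAULT_XML = """
<Evaluator>
  <Plugins>
    <Plugin ID="pka" Class="chemaxon.marvin.calculations.pKaPlugin"/>
    <Plugin ID="logp" Class="chemaxon.marvin.calculations.logPPlugin"/>
  </Plugins>
</Evaluator>
"""

USER_XML = """
<Evaluator>
  <Plugins>
    <Plugin ID="pka" Class="chemaxon.marvin.calculations.pKaPlugin">
      <Param Name="min" Value="-3"/>
    </Plugin>
  </Plugins>
</Evaluator>
"""


def entries(xml_text):
    """Index every element that carries an ID by (element name, ID)."""
    root = ET.fromstring(xml_text)
    return {(el.tag, el.get("ID")): el
            for el in root.iter() if el.get("ID") is not None}


def merge_configs(default_xml, user_xml):
    """User entries are added to the defaults and win on a key clash."""
    merged = entries(default_xml)
    merged.update(entries(user_xml))
    return merged


merged = merge_configs(DEFAULT_XML, USER_XML)
for (tag, entry_id), element in merged.items():
    params = {p.get("Name"): p.get("Value") for p in element.findall("Param")}
    print(tag, entry_id, params)
```

Here the user-defined pka entry replaces the built-in one, while logp is kept from the defaults.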
Computes the mass of the molecules in the target.sdf file, where the mass calculator plugin is defined in the config.xml configuration file:
evaluate -c config.xml -e "mass()" target.sdf
Writes the molecules with a mass of at least 200 to heavy.sdf in SDF format; molecule mass is computed according to the default configuration:
evaluate -e "mass() >= 200" -x sdf -o heavy.sdf target.sdf
evaluate -e calc.txt target.sdf
evaluate -e calc.txt -S -i -t RESULT -o result.sdf target.sdf
expr.txt file:
evaluate -e expr.txt -m query.sdf target.sdf
Computes charges with a precision of 3 fractional digits, using the charge calculation defined in config.xml:
evaluate -c config.xml -e "charge()" -p 3 target.sdf
evaluate -c config.xml -e "charge()" -p 3 -S -t CHARGE -o result.sdf target.sdf
1 and 2 of the Markush structure m.mrv, writes the resulting structures in MRV format:
evaluate -e "markushEnumerations('1,2')" m.mrv -f mrv
Generates 3 random enumerations of the Markush structure m.mrv, writes the resulting structures in MRV format, aligns the scaffold and stores scaffold/R-group coloring data:
evaluate -e "randomMarkushEnumerationsDisplay(3)" m.mrv -f mrv
Note that the display options (coordinates and attached coloring data) cannot be stored in the default SMILES output format, so the MRV output format has to be specified in this case.