Extended SMILES, SMARTS

Codename: cxsmiles,cxsmarts


Extended SMILES, SMARTS format

ChemAxon Extended SMILES/SMARTS stores special features of a molecule after the SMILES string. Any information can be stored after the SMILES string if it is separated from it by a space or tab character, because SMILES parsers either ignore the trailing text or treat it as a comment. The extended features are stored in the following format:
SMILES_String |<feature1>,<feature2>,...|
ChemAxon's extended SMILES/SMARTS does not contain non-ASCII characters; they are escaped in the usual form, "&#n;", where n is the character code. The ASCII characters ',', ';', '|', '{', '}' and ':' in Data Sgroup information are escaped in the same way, as are the symbols '$', ';', '|', '{', '}' between dollar signs (see Atom labels / aliases / values).
The extended feature description is economical: if a feature is missing from the molecule, the corresponding special characters are not written. (E.g., if none of the atoms of the molecule has an alias string, no "$" and ";" characters are written.) Moreover, if the molecule has no features to write, the extended feature field is omitted entirely.
Please note that the SMILES string part generated in cxsmiles format is not always the same as the one generated by smiles output. E.g., in the case of ferrocene, the coordinate bonds are not exported to plain SMILES ([Fe].c1cccc1.c1cccc1), but they do appear in the cxsmiles (c12c3c4c5c1[Fe]23451234c5c1c2c3c45 |C:4.5,0.6,1.7,2.8,3.9,7.12,6.10,9.16,10.18,8.14|).
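
As a rough illustration of the layout described above, the following Python sketch separates a line into its SMILES part and the raw text between the '|' delimiters. The function name split_cxsmiles is hypothetical and is not part of any ChemAxon API. The sketch assumes that the extension begins right after the first run of whitespace, and it does not interpret the individual features.

```python
def split_cxsmiles(line):
    """Split a line into (smiles, extension).

    extension is the text between the two '|' characters, or None when
    the line carries no extended feature field. Hypothetical helper,
    not ChemAxon code.
    """
    parts = line.strip().split(None, 1)  # split at the first space or tab
    if not parts:
        return "", None
    smiles = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    if rest.startswith("|"):
        # A literal '|' inside a feature is escaped, so the next '|'
        # is taken to be the closing delimiter.
        end = rest.find("|", 1)
        if end != -1:
            return smiles, rest[1:end]
    # Anything else after the SMILES string is treated as a comment.
    return smiles, None


smiles, ext = split_cxsmiles("C[C@H]1CC[C@@H](C)CC1 |r|")
print(smiles)  # C[C@H]1CC[C@@H](C)CC1
print(ext)     # r
```

Splitting the extension further into features is deliberately left out: a simple split on ',' would break features such as the coordinate bond list, which contain commas themselves.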

In extended smiles export the following additional features are exported:

Escaping

In some places, special characters are escaped as '&#code', where code is the ASCII code of the special character.
Characters that are not escaped in fields of Sgroups and DataSgroups: 'a'-'z', 'A'-'Z', '0'-'9' and '><\"!@#$%()[]./\\?-+*^_~=' and the space character.
Characters that are not escaped in atom property keys and values: 'a'-'z', 'A'-'Z', '0'-'9' and '><\"!@#$%()[]./\\?-+*^_~=' and the space character.
Characters that are not escaped in atom labels and atom values: 'a'-'z', 'A'-'Z', '0'-'9' and '><\"!@#%()[]./\\?-+*^_~=,:' and the space character.
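
The escaping rule for Sgroup and DataSgroup fields can be sketched in Python as follows. This is an illustration only, not ChemAxon's implementation. It assumes the "&#n;" form with a trailing semicolon, and it reads the \" and \\ in the character list above as a plain double quote and a single backslash. The names SAFE_SGROUP, escape_field and unescape_field are hypothetical.

```python
import re
import string

# Characters left unescaped in Sgroup / DataSgroup fields, per the list
# above (assumption: \" means '"' and \\ means a single backslash).
SAFE_SGROUP = set(
    string.ascii_letters + string.digits + '><"!@#$%()[]./\\?-+*^_~= '
)


def escape_field(text, safe=SAFE_SGROUP):
    """Replace every character outside `safe` with '&#n;' (n = code point)."""
    return "".join(ch if ch in safe else "&#%d;" % ord(ch) for ch in text)


def unescape_field(text):
    """Turn each '&#n;' sequence back into the character with code n."""
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)


print(escape_field("a,b"))  # a&#44;b
print(unescape_field("a&#44;b"))  # a,b
```

Because '&' is not in the safe set, it is escaped as well, so an escaped field never contains a stray '&#' sequence and the round trip stays unambiguous. For atom labels and atom values, a different safe set (without '$', with ',' and ':') would be passed as the safe argument.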

Import options

s
Fix chiral flag from cxsmiles input.
By default, the molecule's absolute stereoconfiguration (relative or absolute chirality, i.e. the chiral flag) is specified in the extended part of the cxsmiles string. If it is missing, the configuration is assumed to be absolute (see Molecule absolute stereoconfiguration above). With the 's' option, the importer instead tries to work out the molecule's absolute stereoconfiguration.
Example: molconvert cxsmiles -s 'C[C@H]1CC[C@@H](C)CC1{cxsmiles}' results in C[C@H]1CC[C@@H](C)CC1
But: molconvert cxsmiles -s 'C[C@H]1CC[C@@H](C)CC1{cxsmiles:s}' results in C[C@H]1CC[C@@H](C)CC1 |r|

See also SMILES import options.

Export options

Export options can be specified in the format string. The format descriptor and the options are separated by a colon. All options have default values (see below). A "+" or "-" sign before an option changes its default export value to "true" or "false", respectively. If an option is given without a "+" or "-" modifier, the default values are not used and only the listed features are exported.
Examples:
"cxsmiles:" writes all default features (absolute stereoconfiguration, enhanced stereo features, atom labels, wiggly bond indexes, ring stereo bond info and reaction fragment level grouping),
"cxsmiles:lc" writes the atom labels and the atomic coordinates only,
"cxsmiles:+c" writes all default features and the atomic coordinates,
"cxsmiles:-le" writes absolute stereoconfiguration, enhanced stereo features, ring stereo bond info and reaction fragment level grouping but not atom labels and wiggly bond indexes.

u Write unique cxsmiles output (includes unique smiles string). Enhanced stereo information is also stored in unique format. Default value: false.
e Write relative stereo configuration and enhanced stereo features. Default value: true.
l Write atom labels / aliases / values. Default value: true.
w Write wiggly bond indexes and, when atomic coordinates are exported, also UP and DOWN bond indexes. Default value: true.
d Write CIS, TRANS ring bond indexes. Default value: true.
f Reaction fragment level grouping. Default value: true.
p Write local parities. Default value: true.
R Write radical numbers. Default value: true.
L Write lone electron pairs. Default value: true.
m Write multicenter SGroups and coordinate bonds. Default value: true.
N Write link nodes. Default value: true.
c[p] Write atomic coordinates. p can optionally specify the coordinate precision. If p is not specified, the default value 2 is used. Default value: false.
D Write Data Sgroup information. Default value: true.
BOM Write the UTF-8 byte order mark (BOM), if the given or the system's encoding is UTF-8. Default value: false.
q Write MDL query features. Default value: true.
P Write polymer Sgroups. Default value: true.
b Write local bicyclo-alkane stereo information. Default value: true.
B Write Hydrogen bonds. Default value: true.
A Write atom properties. Default value: true.
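
To make the "+", "-" and bare-option rules concrete, here is a minimal Python sketch that resolves a format string into the set of features that would be written. It only models the rules as stated in this section and is not ChemAxon's parser. The names DEFAULTS and resolve_export_options are hypothetical, a sign is assumed to apply to every option letter that follows it, and the coordinate precision of c[p] and the BOM option are not handled.

```python
# Default values of the single-letter export options listed above.
DEFAULTS = {
    "u": False, "e": True, "l": True, "w": True, "d": True, "f": True,
    "p": True, "R": True, "L": True, "m": True, "N": True, "c": False,
    "D": True, "q": True, "P": True, "b": True, "B": True, "A": True,
}


def resolve_export_options(format_string):
    """Return a dict mapping each option letter to True or False."""
    _, _, opts = format_string.partition(":")
    flags = dict(DEFAULTS)
    if not opts:
        return flags  # "cxsmiles:" or "cxsmiles": all defaults
    if opts[0] not in "+-":
        # Bare options: defaults are dropped, only listed features remain.
        flags = {key: False for key in DEFAULTS}
    value = True
    for ch in opts:
        if ch == "+":
            value = True
        elif ch == "-":
            value = False
        elif ch in flags:
            flags[ch] = value
        else:
            raise ValueError("option not handled by this sketch: %r" % ch)
    return flags


enabled = [k for k, v in resolve_export_options("cxsmiles:lc").items() if v]
print(sorted(enabled))  # ['c', 'l']
```

With "cxsmiles:+c" the same function keeps every default and additionally switches on the atomic coordinates.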

See also SMILES export options and basic export options.
