Currently JChem supports searching Markush structures that contain Homology groups only with specific molecule queries (queries with no query features). Homology groups are supported on both the target and the query side; on the query side they are supported only for non-Markush targets. Properties can also be specified for the groups.
Read the user's guide about Homology groups and editing their properties in MarvinSketch.
Homology groups are represented by Pseudo atoms, labeled with the common chemical annotations of these groups. Most groups also have Alias names that serve as shorter forms. The names are case insensitive and may contain spaces.
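The naming rule above can be illustrated with a small, self-contained sketch. It is not JChem code: the `ALIASES` table and the `resolve_group` function are hypothetical, the table contains only a few of the name/alias pairs from Table 1, and it assumes that "spaces may be inserted" means that whitespace is simply ignored when a label is compared.

```python
from typing import Optional

# Hypothetical lookup table: a few group names and aliases taken from Table 1.
# Keys are stored in normalized form (lowercase, no whitespace).
ALIASES = {
    "alkyl": "Alkyl", "chk": "Alkyl",
    "alkenyl": "Alkenyl", "che": "Alkenyl",
    "alkynyl": "Alkynyl", "chy": "Alkynyl",
    "carboaryl": "Carboaryl", "ary": "Carboaryl", "aryl": "Carboaryl",
    "fusedheterocyclyl": "FusedHeterocyclyl",
    "hef": "FusedHeterocyclyl",
    "fusedhetero": "FusedHeterocyclyl",
}


def normalize(label: str) -> str:
    """Lowercase the label and drop all whitespace (assumed reading of the rule)."""
    return "".join(label.split()).lower()


def resolve_group(label: str) -> Optional[str]:
    """Return the group name for a pseudo atom label, or None if it is unknown."""
    return ALIASES.get(normalize(label))
```

With this sketch, `resolve_group("chk")`, `resolve_group("CHK")` and `resolve_group("Al kyl")` all resolve to the same group, while a label that is not in the table yields `None`.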
Pseudo atoms can be easily drawn in MarvinSketch using the Homology groups template group.
The "Example" column shows complete structures representing the homology groups.
Table 1. Built-in Homology groups
Group name (Alias names) | Compulsory | Optional | Incomplete case | Example |
---|---|---|---|---|
Alkyl (CHK) | minimum of one carbon atom; only carbon and hydrogen atoms; single bonds; no ring bonds | connection point at arbitrary position(s) | same requirements | |
Alkenyl (CHE) | at least one double bond; minimum of 2 carbon atoms; otherwise same as for Alkyl | same as above | same as at compulsory, but the matching structure does not need to have any double bond | |
Alkynyl (CHY) | at least one triple bond; minimum of 2 carbon atoms; otherwise same as for Alkyl | same as above, double bond | same as at compulsory, but the matching structure does not need to have any triple bond | |
CarbonTree (acyclicCarbon) | Any connected acyclic carbon structure. | - | - | |
Carboalicyclyl (CYC, cycloalkyl) | monocyclic or fused aliphatic rings; only carbon and hydrogen atoms; no substitution by (saturated) alkyl chains | double or triple bonds in the ring but not aromatic; several connection points on the rings | any carbon structure without aromatic bonds; the substituting alkyl chain can be unsaturated | |
Carboaryl (ARY, aryl) | monocyclic or fused rings; among these rings at least one should be aromatic; only carbon and hydrogen atoms | double bonds/triple bonds in the aliphatic rings; several connection points but all must be on an aromatic ring (can't have external connection on an aliphatic ring) | similar to carboalicyclyl but the atoms can have aromatic bonds: any carbon structure where the external connection is on an atom that has an aromatic bond or has only one bond; the matching structure doesn't need to have a ring | |
Heteromonoalicyclyl (HET, heterocycle, heterocyclyl, AliphaticHeterocyclyl) | monocyclic, aliphatic ring with at least one hetero atom, carbon atom is also required | same as carboalicyclyl | similar to carboalicyclyl but here hetero atoms are accepted as well, which means any structure without aromatic bonds | |
Heteromonoaryl (HEA, Heteroaryl) | similar to aryl but the monocyclic aromatic ring should contain at least one hetero atom, carbon atom is also required; no fused rings | same as aryl | Similar to aryl but here hetero atoms are accepted as well. Condition for the externally connecting atom holds as in case of aryl. | |
FusedHeterocyclyl (HEF, fusedHetero) | Fused ring system having at least one hetero atom, carbon atom is also required | same as aryl, but the connection point can be on an aliphatic ring as well | Any structure having hetero, carbon and hydrogen atoms, with any bonds. | |
Cyclyl (anycyclyl, anyring) | Any kind of ring regardless of fusion, aromaticity and hetero-carbo nature. | - | - | |
RingSegment | A part of a ring where every atom has only 2 ring connections. Non-ring connections are allowed. The group does not represent a whole ring. | - | - | |
HeteroSubstitutedAlkyl (HSA) | at least one carbon atom; at least one hetero atom; single bonds; no ring bonds | connection point at arbitrary carbon atom(s) | same requirements | |
Haloalkyl | at least one carbon atom; at least one halogen atom; single bonds; no ring bonds | connection point at arbitrary carbon atom(s) | same requirements | |
Hydroxyalkyl | at least one carbon atom; at least one terminal O atom; single bonds; no ring bonds | connection point at arbitrary carbon atom(s) | same requirements | |
Unknown group (UNK) | - | Any structure. Unknown structures are enumerated as the union of all other homology groups. | - | - |
AnyAtom | Any atom except hydrogen | - | - | C, N, O, P, S, ... |
Metal (MX) | Any metal | - | - | U, K, Fe, Na, Ni, Al, ... |
AlkaliMetal (AMX) | Alkali and alkaline earth metals | - | - | Na, K, Ca, Mg, ... |
OtherMetal (A35) | Group IIIa-Va metals | - | - | Al, Ga, ... |
TransitionMetal (TRM) | Transition metals excluding lanthanum | - | - | Fe, Ni, Zn, Co, Hg, W, ... |
Lanthanide (LAN) | Lanthanides (including lanthanum) | - | - | Nd, Ce, Pr, ... |
Actinide (ACT) | Actinides (including actinium) | - | - | U, Th, Pa, ... |
User-defined Homology groups are defined by the user, although some Predefined groups are also provided. User-defined Homology groups are represented by R-group definitions, and during search their Pseudo atoms are translated to the corresponding R-group definitions.
These group definitions are customizable: the user can modify them or create new definitions. Group names are treated as case insensitive, but on case-sensitive file systems the names of the definition files should be lowercase.
The following Predefined (User-defined) Homology groups are readily available in the system:
The definition file of the protecting groups contains several definitions, each protecting a different functional group. The protected functional group is defined by the neighborhood of the R-atom. When the R-atom has the same neighborhood as the "protecting" pseudo atom, the group is replaced by the R-atom.
The conversion processes the group definitions in the order in which they appear in the file. This means that more specific environments should be placed earlier. For example, a carboxyl protecting group definition should precede an alcohol definition; otherwise the alcohol definition would be applied instead. Currently they are located in the following order:
Currently the system can't handle protecting groups that have more than one attachment point, or groups where the heavy atoms of the functional group would have to be changed by the substitution. The readily available definitions contain amine, carboxyl and hydroxyl protecting groups.
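The effect of the definition order can be shown with a toy model. This is only a sketch, not the actual conversion code: each definition is reduced to a hypothetical set of required neighborhood features, a definition is taken to apply when all of its features are present at the site, and the feature names (`hydroxyl_O`, `carbonyl_C`) are invented for illustration.

```python
from typing import FrozenSet, Optional, Sequence, Tuple

# A definition is modeled as (name, features its environment requires).
Definition = Tuple[str, FrozenSet[str]]


def first_matching_definition(
    site_features: FrozenSet[str], definitions: Sequence[Definition]
) -> Optional[str]:
    """Return the name of the first definition, in file order, that applies."""
    for name, required in definitions:
        if required <= site_features:  # every required feature is present
            return name
    return None


# Hypothetical environments: the carboxyl one is more specific than the alcohol one.
CARBOXYL = ("carboxyl", frozenset({"hydroxyl_O", "carbonyl_C"}))
ALCOHOL = ("alcohol", frozenset({"hydroxyl_O"}))

carboxyl_site = frozenset({"hydroxyl_O", "carbonyl_C"})
alcohol_site = frozenset({"hydroxyl_O"})
```

With `[CARBOXYL, ALCOHOL]` the carboxyl site is assigned to the carboxyl definition. With the order reversed, the less specific alcohol definition is reached first and captures the carboxyl site as well, which is why the more specific environment has to come earlier.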
JChem's group name: protecting
alias name: PRT
Some examples with different protected functional groups can be found in Table 2.
Table 2. Protecting group examples
Protecting group | Represented examples | ||
An acyl group is the residue left after removal of one or more OH groups from an acid. Currently it behaves as a simple pseudo atom: it can only be matched by itself and is not enumerated. This behavior complies with the Thomson-Reuters/Questel acyl group handling.
JChem's group name: acyl
Alias name: acy
The any group is the union of all other homology groups except acyl, unknown and protecting. This union is represented by the cyclyl, carbonTree, metal and halogen groups. If the group occurs in a ring, it represents a ringSegment homology group.
JChem's group name: any
alias names: XX, anygroup
Currently there is one regulating option, 'completeHG'. It specifies whether the part of the query structure that matches a given group must represent an entire homology group, or whether substructures are also accepted. In the incomplete case an entire structure can, of course, still match the given homology group. For example, if completeHG is set to true (the default), an alkyl chain cannot match a cycloalkyl group; only a ring (system) can. The detailed behavior is described in the definitions of the groups.
An example is shown in Table 3.
Table 3. Complete and incomplete structures of Homology groups
target | query | hit | |
completeHG:y | completeHG:n | ||
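The difference between the two settings can be sketched for the Carboalicyclyl row of Table 1. The code below is a toy model rather than the JChem matching algorithm: a query fragment is reduced to a hypothetical `FragmentSummary` with four flags, the "Compulsory" column is assumed to apply when completeHG is true, and the "Incomplete case" column is assumed to apply when it is false.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentSummary:
    """Hypothetical, heavily simplified description of a query fragment."""
    only_carbon_and_hydrogen: bool
    has_ring: bool
    has_aromatic_bond: bool
    has_alkyl_substituent: bool


def matches_carboalicyclyl(frag: FragmentSummary, complete_hg: bool = True) -> bool:
    # Required in both cases: a carbon structure without aromatic bonds.
    if not frag.only_carbon_and_hydrogen or frag.has_aromatic_bond:
        return False
    if not complete_hg:
        # Incomplete case: any such structure is accepted, ring or not.
        return True
    # Complete case: an aliphatic ring (system) not substituted by alkyl chains.
    return frag.has_ring and not frag.has_alkyl_substituent


alkyl_chain = FragmentSummary(True, False, False, False)
plain_ring = FragmentSummary(True, True, False, False)
```

In this model the alkyl chain is rejected with the default setting and accepted with `complete_hg=False`, while the plain ring matches in both cases.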
To enable the enumeration of homology groups, the 'Homology Enumeration' option of Markush enumeration has to be switched on. Otherwise the 'Homology groups' are kept as 'Pseudo atoms'. This latter option might be useful for showing that these structures can't be fully enumerated.
For the Predefined groups, the R-group definitions specify the enumerable library as User-defined groups, i.e. these group definitions can be customized. These structures are characteristic of the Homology group and include both simple and large structures.
Note that these definitions are used only for enumeration and do not affect searching. As noted earlier, arbitrary structures fulfilling the requirements for the Homology group will match such a target.
By default, enumeration definitions contain two attachment points. After enumeration these are the atoms that connect to the first two neighbors of the group. If the Pseudo atom of the enumerated Homology group has more than two connections, further attachment points are added. They are placed on atoms that have free valence and comply with the requirements for externally connecting atoms of the given group; e.g. for 'aryl' only aromatic ring atoms can be connection points. The atoms of the definition are examined in the order of their atom numbers. If a definition does not have a sufficient number of such atoms, it is rejected. When every definition of the homology group is rejected, an exception is thrown indicating that the given homology group does not have any valid enumeration definition.
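The attachment point selection described above can be sketched as follows. This is an illustrative model, not JChem's implementation: the `Atom` and `Definition` classes, the `may_connect` predicate and the use of `ValueError` are all hypothetical, and the sketch assumes that each atom receives at most one additional attachment point.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Atom:
    number: int                       # atom number within the definition
    free_valence: int                 # how many further bonds the atom can take
    aromatic_ring_atom: bool = False


@dataclass(frozen=True)
class Definition:
    name: str
    atoms: Tuple[Atom, ...]
    default_points: Tuple[int, int]   # atom numbers of the two default points


def attachment_points(
    defn: Definition, connections: int, may_connect: Callable[[Atom], bool]
) -> Optional[List[int]]:
    """Return the attachment atom numbers, or None if the definition is rejected."""
    points = list(defn.default_points)
    # Examine the atoms in atom-number order and add points until enough exist.
    for atom in sorted(defn.atoms, key=lambda a: a.number):
        if len(points) >= connections:
            break
        if atom.number in points:
            continue
        if atom.free_valence > 0 and may_connect(atom):
            points.append(atom.number)
    if len(points) < connections:
        return None                   # not enough suitable atoms: reject
    return points[:connections]


def usable_definitions(
    definitions: Sequence[Definition],
    connections: int,
    may_connect: Callable[[Atom], bool],
) -> List[Tuple[str, List[int]]]:
    """Collect the accepted definitions; fail if every definition is rejected."""
    usable = []
    for defn in definitions:
        points = attachment_points(defn, connections, may_connect)
        if points is not None:
            usable.append((defn.name, points))
    if not usable:
        raise ValueError("no valid enumeration definition for this homology group")
    return usable
```

For an 'aryl'-like group, `may_connect` could be `lambda atom: atom.aromatic_ring_atom`, so that a definition with too few free aromatic ring atoms is rejected even if it has other atoms with free valence.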
Enumeration of User-defined Homology groups uses the same (customizable) R-group definitions as searching. User-defined Homology groups should have the same number of connections as in the definitions.
Customization modes:
The default location of the user's chemaxon_home directory on different platforms:
Location of "User-defined" (for search and enumeration) user-defined homology group definition files: chemaxon_home/homology/user_def_groups/
Location of "Enumeration-only" user-defined homology group definition files: chemaxon_home/homology/enumeration_only/
Note: Create the above two directories if they do not exist.
The files of enumeration-only type user-defined groups must be placed into the directory chemaxon_home/homology/enumeration_only/.
If you would like to have different definitions for searching and enumeration of a user-defined group, then a separate file should be placed under the same file name in the "enumeration_only" directory as well. In this case the content of the "user_def_groups" directory will be used during searching and the content of the "enumeration_only" directory for enumeration.
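The lookup rule for the two directories can be summarized in a short sketch. It is not part of JChem: the function `definition_file` and its `purpose` argument are hypothetical, the file name (including whatever extension the definition files use) is passed in by the caller, and the name is lowercased following the earlier recommendation for case-sensitive file systems.

```python
from pathlib import Path
from typing import Optional


def definition_file(chemaxon_home: Path, file_name: str, purpose: str) -> Optional[Path]:
    """Pick the definition file used for 'search' or for 'enumeration'."""
    if purpose not in ("search", "enumeration"):
        raise ValueError("purpose must be 'search' or 'enumeration'")
    name = file_name.lower()
    homology = Path(chemaxon_home) / "homology"
    shared = homology / "user_def_groups" / name
    enum_only = homology / "enumeration_only" / name
    # An enumeration-only file takes precedence, but only for enumeration.
    if purpose == "enumeration" and enum_only.is_file():
        return enum_only
    # Otherwise fall back to the file shared by search and enumeration.
    return shared if shared.is_file() else None
```

In this model a group that has a file only in "user_def_groups" uses it for both purposes, a group with files in both directories uses a different one for each purpose, and a group that exists only in "enumeration_only" is not available for searching.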
If a definition is modified, the change takes effect immediately; adding a new group, however, requires a restart of the Java Virtual Machine.
Table 4. Modifying amino protecting group definitions.
overwriting the definition | sample markush file | enumerations |
Some Homology groups have important properties. You might want to specify whether the alkyl chain is branched, or whether any deuterium atoms are present. The Homology groups have a special property editing dialog where you can set the different properties. They include the following (with the group to which each may be applied):
Read the user's guide about Homology groups and property editing in MarvinSketch.