OELib Primer

An Introduction to Programming with the OpenEye Library

by Matthew T. Stahl


Table of Contents



Introduction

Molecule and Related Classes

OEMol
OEAtom
OEBond
OEResidue
OERing

Molecule Operator Classes

OESmartsPattern
OEAtomTyper
patty
OEGastChrg

Utility Classes

OEBitVec
OEStopwatch
OEExtensionTable
CommandLine

Atom Data Translation Classes

OEElementTable
OETypeTable

Math Classes

Vector
Matrix3x3

Examples

Largest Ring Extraction
SMARTS Grep

Acknowledgments



Introduction


It is fair to say that OELib is a direct result of Babel. Programmers have long known that application development is facilitated by building software on top of libraries rich in functionality. Babel was the first experience I had in working with a molecule library. In addition to developing Babel, Pat Walters and I developed `OBabel' at Vertex Pharmaceuticals. OBabel was the first attempt at developing an object oriented molecule library. Although OBabel was a successful project, my departure from Vertex Pharmaceuticals provided a great opportunity to include much of what I had learned in previous projects into a new molecule class library. OELib was designed to be flexible, extensible, portable, and efficient class library for processing small molecules. I hope that others find it as useful as I have.


I admire open-source projects. Linux and the Free Software Foundation are both great examples of how summing the spare time of Geeks and Hackers can result in useful projects. Although the potential audience is small, I hope to take advantage many of the great minds writing chemical software by making OELib available to others. Two things can be accomplished by releasing source code to OELib. First, development time can be shortened by basing projects on OELib. The less people have to reinvent the wheel (or the function) the better. Second, by releasing the source code hopefully other programmers can contribute to the project. Joe Corkery, Brian Goldman, Anthony Nicholls, Roger Sayle, and Pat Walters have already made significant contributions to OELib. As the list of contributors grows all the users of OELib benefit.


This primer is not intended to be an exhaustive manual. OELib is still too young and changing too rapidly to write a manual that will not be sorely lacking soon after the first draft. The primer will hopefully be sufficient to get developers started with the basic functionality of OELib. Writing a program to perform simple manipulations of molecules should possible after reading and understanding the material in the primer. More complex software based on OELib will require the developer to dig into the headers and source code to understand all of the classes and member functions.



Molecule and Related Classes


OEMol


The most important class in OELib is OEMol, or the molecule class. The OEMol class is designed to store all the basic information associated with a molecule, to make manipulations on the connection table of a molecule facile, and to provide member functions which automatically perceive information about a molecule. A guided tour of the OEMol class is a good place to start.

An OEMol class can be declared in either of the following ways


OEMol mol;
//or
OEMol mol(SDF,MOL2);


The second declaration type sets the input and output formats for a molecule. For example


#include <iostream.h>
#include "mol.h"
int main(int argc,char **argv)
{
OEMol mol(SDF,MOL2);
cin >> mol;
cout << mol;
return(1);
}


will read in a molecule in SD file format from stdin (or the C++ equivalent cin) and write a MOL2 format file out to standard out. Additionally, The input and output formats can be altered after declaring an OEMol by the member functions


void SetInputType(enum io_type type);
void SetOutputType(enum io_type type);


where the current values of enum io_type are
 

enum io_type {UNDEFINED,SDF,MOL2,PDB,DELPDB,SMI,BOX,FIX,OEBINARY}

 
The following lines of code show how to set the input and output types of an OEMol through the member functions

 
OEMol mol;
mol.SetInputType(SDF);
mol.SetOutputType(MOL2);


Once a molecule has been read into an OEMol the atoms and bonds can be accessed by the following methods

 
OEAtom *atom;
atom = mol.GetAtom(5); //random access of an atom

  or


  OEBond *bond;
bond = mol.GetBond(14); //random access of a bond

  or


  OEAtom *atom;
vector<OEAtom*>::iterator i;
for (atom = mol.BeginAtom(i);atom;atom = mol.NextAtom(i)) //iterator access

or

OEBond *bond;
vector<OEBond*>::iterator i;
for (bond = mol.BeginBond(i);bond;bond = mol.NextBond(i)) //iterator access

 
It is important to note that atom arrays begin at one and bond arrays begin at zero. Requesting atom zero

 
OEAtom *atom = mol.GetAtom(0);


will result in an error, but


OEBond *bond = mol.GetBond(0);


is perfectly valid.

The ambiguity of numbering issues and off-by-one errors led to the use of iterators in OELib. An iterator is essentially just a pointer, but when used in conjunction with Standard Template Library (STL) vectors it provides an unambiguous way to loop over arrays. OEMols store their atom and bond information in STL vectors. Since vectors are template based, a vector of any user defined type can be declared. OEMols declare vector<OEAtom*> and vector<OEBond*> to store atom and bond information. Iterators are then a natural way to loop over the vectors of atoms and bonds.

Before going on to the OEAtom and OEBond classes in more detail it is important to point out several things about the OEMol class. Much of the information contained in a molecule can be accessed or modified by Get or Set methods. For instance, the GetAtom() method has already been discussed. The following member functions are examples of Get and Set methods in OEMol
 

//***data retrieval methods***
void *GetData(string &);
void *GetData(const char *);
bool HasFlag(int flag);
const char *GetTitle();
enum io_type GetInputType();
enum io_type GetOutputType();
OEAtom *GetAtom(int);
OEBond *GetBond(int);
OEBond *GetBond(int, int);
float GetEnergy();
 
//***data modification methods***
 
void SetTitle(char *title)
void SetTitle(string &title)
void SetEnergy(float energy)
void SetInputType(enum io_type type)
void SetOutputType(enum io_type type)
bool SetCoordsFromArray(float *);
void SetData(string &,void *);
void SetData(const char *,void *);
void SetFlag(int flag);

 
Of the preceding member functions GetData and SetData deserve special attention. The data structure of OEMol can be include all types of data through the GetData() and SetData() routines. Attribute/Value data can be added to an OEMol in the following manner

 
int *foo = new int [100];
mol.SetData("IntegerArray",(void*)foo);

 
and retrieved from the molecule by

 
int *foo;
foo = (int*)mol.GetData("IntegerArray");

 
The onus of freeing the memory stored with GetData is on the user as the pointer to the data inside OEMol is released when the Clear() member function is called.

The following OEMol member functions are useful manipulation routines

 
//***molecule modification methods***
bool Clear(); //clears all data out of an OEMol for re-use
void ToInertialFrame();
void RenumberAtoms(vector<int>&); //re-order atoms to those specified in the vector
bool Kekulize(); //convert an aromatic to a kekule structure
bool DeleteHydrogens();
bool DeleteNonPolarHydrogens(); //delete hydrogens attached to carbon
bool AddHydrogens();
bool AddPolarHydrogens(); //add hydrogens to hetero-atoms
bool StripSalts(); //delete all but the largest contiguous structure
Vector Center();
Vector Center(vector<int> &);
Vector Center(int);
void Translate(const Vector &v);


The following are useful perception member functions for OEMol.

 
//***molecule utilities and perception methods***
void FindSSSR(); //smallest set of smallest rings
void FindRingAtomsAndBonds();
void FindChildren(vector<int> &,int,int); //find all atoms connected to one end of a bond
void ContigFragList(vector<vector<int> >&); //get a list of contiguous units in a molecule
void FindLargestFragment(OEBitVec &); //find the largest contiguous unit of a molecule
//align a molecule by rotating such that the two atoms lie on top of the two points
void Align(OEAtom*,OEAtom*,Vector&,Vector&);
bool RemoveData(const char*); //Remove Data previously stored with SetData()
bool HasNonZeroCoords();

 
The basics of the OEMol class have now been covered. For more details and the inner workings of OEMol please look at mol.h and mol.cpp in the OELib distribution. To understand more about what can be done with the OEMol class it is necessary to look at the associated data classes.

 
OEAtom

 
To understand the OEAtom class it is important to state a key decision on which the design was based. In OBabel the atom class existed, but it was only a data container. All data access and modification of atoms was done through the molecule. The result was a molecule class that was very large an unwieldy. I decided to make the OEAtom class smarter, and separate out many of the atom specific routines into the OEAtom thereby decentralizing and shrinking the OEMol class. As a result the OEAtom class not only holds data, but facilitates extraction of data perceived from both the atom and the molecule.


A number of data extraction methods perform what is called `Lazy Evaluation', which is essentially on the fly evaluation. For example, when an atom is queried as to whether it is cyclic or what it's hybridization state is the information is perceived automatically. The perception of a particular trait is actually performed on the entire molecule the first time it is requested of an atom or bond, and stored for subsequent requests for the same trait of additional atoms or bonds.The OEAtom class is similar to OEMol and the whole of OELib in that data access and modification is done through Get and Set methods. The following are Get and Set methods in OEAtom

//***methods to set atomic information***
void SetIdx(int idx);
void SetHyb(int hyb);
void SetAtomicNum(int atomicnum);
void SetFormalCharge(int fcharge);
void SetType(char *type);
void SetType(string &type);
void SetPartialCharge(float pcharge);
void SetVector(Vector v);
void SetVector(float x,float y, float z);
void SetFlag(int flag);
void SetParent(OEMol *ptr);

//***methods to retrieve atomic information***
 
int GetFlag();
int GetIdx();
int GetHyb();
int GetAtomicNum();
int GetStereo();
int GetFormalCharge();
int GetValence();
int GetHvyValence(); //number of connected non-hydrogen atoms
char *GetType();
float GetX(); //return the X coordinate
float GetY();
float GetZ();
float GetPartialCharge();
Vector &GetVector(); //return the coordinates
OEResidue *GetResidue();
OEMol *GetParent();
bool GetNewBondVector(Vector&,float);
OEBond *GetBond(OEAtom *);

 
The following code demonstrates how to print out the atom numbers, element numbers, and coordinates of a molecule.

 
OEMol mol(SDF,SDF);
cin >> mol;
OEAtom *atom;
vector<OEAtom*>::iterator i;
for (atom = mol.BeginAtom(i);atom;atom = mol.NextAtom(i))
{
    cout << atom->GetIdx() << ` `;
    cout << atom->GetAtomicNum() << ` `;
    cout << atom->GetVector() << endl;
}

 
The OEAtom class also has a number of convenience methods for extracting property information from an atom.

//***requests for atomic property information***
int CountFreeOxygens();
int ImplicitHydrogenCount();
int ExplicitHydrogenCount();
int MemberOfRingCount();
int BOSum(); //sum of the bond orders of the bonds to the atom
bool HasFlag(int flag);
bool HasResidue();
bool IsHydrogen();
bool IsCarbon();
bool IsNitrogen();
bool IsOxygen();
bool IsSulfur();
bool IsPhosphorus();
bool IsConnected(OEAtom*);
bool IsOneThree(OEAtom*);
bool IsOneFour(OEAtom*);
bool IsCarboxylOxygen();
bool IsPhosphateOxygen();
bool IsSulfateOxygen();
bool IsNitroOxygen();
bool IsAmideNitrogen();
bool IsPolarHydrogen();
bool IsNonPolarHydrogen();
bool IsAromaticAtom();
bool IsInRing();
bool IsInRingSize(int);
bool HasAlphaBetaUnsat(bool includePandS=true);
bool HasBondOfOrder(int);
bool HasSingleBond();
bool HasDoubleBond();
bool HasAromaticBond();


A number of the property member functions indicate that atoms have some knowlege of their covalently attached neighbor atoms. Bonding information is partly redundant within a molecule in that an OEMol has a complete list of bonds in a molecule, and an OEAtom has a list bonds of which it is a member. The following code demonstrates how an OEAtom uses its bond information to loop over atoms attached to itself.

 
OEMol mol(SDF,SDF);
cin >> mol;
OEAtom *atom,*nbr;
vector<OEBond*>::iterator i;
atom = mol.GetAtom(1);
for (nbr = atom->BeginNbrAtom(i);nbr;nbr = atom->NextNbrAtom(i))
{
cout << "atom #" << atom->GetIdx() << " is attached to atom #" << nbr->GetIdx() << endl;
}

should produce an output like

atom #1 is attached to atom #2

 
OEBond


The OEBond class is straightforward in its data access and modification methods. OEBonds store pointers to the atoms on each end of the bond. In storing pointers to atoms instead of integer indices, the necessity of having to reorder bonds when atoms are shuffled, added, or delete is obviated.

 

//***bond modification methods***
void SetIdx(int idx);
void SetBO(int order);
void SetBegin(OEAtom *begin);
void SetEnd(OEAtom *end);
void SetFlag(int flag);
void SetLength(OEAtom*,float);
void Set(int,OEAtom*,OEAtom*,int,int);
 
//***bond data request methods***
 
int GetBO();
int GetBondOrder();
int GetFlag();
int GetBeginAtomIdx();
int GetEndAtomIdx();
int GetIdx();
OEAtom *GetBeginAtom();
OEAtom *GetEndAtom();
OEAtom *GetNbrAtom(OEAtom *ptr);
float GetEquibLength();
int GetNbrAtomIdx(OEAtom *ptr);

Several convenience property methods are provided


//***property request methods***
bool IsRotor();
bool IsAmide();
bool IsEster();
bool IsCarbonyl();
bool IsInRing();

OEResidue

Residue information drawn from PDB or MOL2 files are stored in the OEResidue class. OEResidues are stored inside the OEAtom class. The residue information for an atom can be requested in the following way

 
OEAtom *atom;
OEResidue *r;
atom = mol.GetAtom(1);
r = atom->GetResidue();
 
The following are OEResidue member functions.
 
void SetSerialNum(int sernum);
void SetChainNum(int chainnum);
void SetNum(int resnum);
void SetName(string &resname);
void SetAtomID(string &atomid);
void SetHetAtom(bool val);
int GetSerialNum();
int GetChainNum();
int GetNum();
string &GetName();
string &GetAtomID();
bool IsHetAtom();

OERing

Ring information beyond atom and bond membership is usually not necessary, but more information can be had about the rings in a molecule. The OERing class is used to store the information from a Smallest Set of Smallest Rings (SSSR) perception of a molecule. The OEMol member function GetSSSR() stores the information it perceives in a vector<OERing*> inside the molecule. Perception is only done once for a molecule unless the connection table is modified. The following code demonstrates how to extract the SSSR information.


OEMol mol(SDF,SDF);
cin >> mol;
vector<OERing*> vr;
vr = mol.GetSSSR();

OERings store the atom numbers of the atoms in each of the smallest set of smallest rings in both a vector<int> and an OEBitVec (described later in this document). An example of how to print out the atom numbers present in all SSSR rings is show below.


vector<OERing*>::iterator i;
vector<int>::iterator j;
vector<OERing*> *rlist = (vector<OERing*>*)mol.GetData("RingList");
for (i = rlist->begin();i != rlist->end();i++)
{
    for(j = (*i)->_path.begin();j != (*i)->_path.end();j++)
        cout << *j << ` `;
       cout << endl;
}

 
will produce something like the following output for benzene

1 2 3 4 5 6

Ring information is automatically deleted from an OEMol when it goes out of scope or the Clear() member function is called.


Molecule Operator Classes


Molecule operator classes add information to or discover information about molecule. Although some operator classes could have been made member functions, they are necessary in all applications using the OEMol class and therefore were broken out into operator classes to avoid excessive code bloat and complexity of the OEMol class.

OESmartsPattern

Substructure search is an incredibly useful tool in the context of a small molecule programming library. Having an efficient substructure search engine reduces the amount of hard code needed for molecule perception, as well as increases the flexibility of certain operations. For instance, atom typing can be easily performed based on hard coded rules of element type and bond orders (or hybridization). Alternatively, atom typing can also be done by matching a set of substructure rules read at run time. In the latter case customization based on application (such as changing the pH) becomes a facile operation. Fortunately for OELib and it's users, Roger Sayle donated a SMARTS parser which became the basis for SMARTS matching in OELib.

The SMARTS matcher, or OESmartsPattern, is a separate object which can match patterns in the OEMol class. The following code demonstrates how to use the OESmartsPattern class.

OEMol mol(SDF,SDF);
cin >> mol;
OESmartsPattern sp;
sp.Init("CC");
sp.Match(mol);
vector<vector<int> > maplist;
maplist = sp.GetMapList();
//or maplist = sp.GetUMapList();
//print out the results
vector<vector<int> >::iterator i;
vector<int>::iterator j;
for (i = maplist.begin();i != maplist.end();i++)
{
    for (j = i->begin();j != i->end();j++)
        cout << j << ' `;
    cout << endl;
}


The preceding code reads in a molecule, initializes a smarts pattern of two single-bonded carbons, and locates all instances of the pattern in the molecule. Note that calling the Match(OEMol&) function does not return the results of the substructure match. The results from a match are stored in the OESmartsPattern, and a call to GetMapList() or GetUMapList() must be made to extract the results. The function GetMapList() returns all matches of a particular pattern while GetUMapList() returns only the unique matches. For instance, the pattern [OD1]~C~[OD1] describes a carboxylate group. This pattern will match both atom number permutations of the carboxylate, and if GetMapList() is called, both matches will be returned. If GetUMapList() is called only unique matches of the pattern will be returned. A unique match is defined as one which does not cover the identical atoms that a previous match has covered.

OEAtomTyper

The OEAtomTyper class is designed to read in a list of atom typing rules and apply them to molecules. The code that performs atom typing is not usually used directly as atom typing, hybridization assignment, implicit valence assignment and charge are all done automatically when their corresponding values are requested of atoms.


atom->GetType();
atom->GetFormalCharge();
atom->GetHyb();


Atom types are defined in the file atomtyp.txt which is composed of two sections. The list of atom is at the head of the file and appears as

***ATOM_TYPES***
1 N3 3 0 3
2 N3+ 3 1 4
3 N2 2 0 3
4 N1 1 0 2
5 Npl 2 0 3

Each atom type line contains a reference number, the internal atom type, hybridization, formal charge, and implicit valence. The reference numbers for each of the atom types are used in the second section in association with SMARTS patterns.

***SMARTS_PATTERNS***
P(~[OD1])(~[OD1])~[OD1] 25 36 36 36
n 6
NC=O 10 18 14
NS(=O)=O 1 29 14 14
[Nr0]~[C,C+]([#6,#7])~[Nr0] 7 18 0 7
[OD1]~N~[OD1] 14 8 14


The reference numbers following the SMARTS patterns correspond to the reference numbers given in the atom types section. Atoms matching the SMARTS patterns will be assigned to the type given by the reference number unless the reference number is zero. Only the first SMARTS pattern that covers an atom will be used to assign the atom type. All subsequent matches of other patterns do not affect an atom's type. SMARTS patterns are applied in order they appear in atomtyp.txt so specific patterns should be defined toward the top of the patterns section and more general patterns toward the bottom.

The atom typing system in OELib gives the user a great deal of control over customizing typing rules. Since SMARTS patterns determine how OELib perceives a molecule it is rather trival to write a rule set that reflects the state of functional groups in any type of media. Charge and protonation states can reflect a wide variety of pH's in aqueos environments, or organic solvents. Translating file formats that have low information content about atom types to those with high information content becomes a process with much higher certainty and specificity than a system with hardcoded rules.

patty

Patty stands for programmable atom typer. The patty class was kindly donated by W. Patrick Walters. The patty class provides a more flexible means for atom typing than the OEAtomTyper. The behavior of patty is similar to the OEAtomTyper in that rules apply only to the first atom in the SMARTS pattern. The patty class can read any free format ASCII file which contains SMARTS patterns associated with user defined atom type. The following is an example of a valid patty rule

O=C hbacceptor

The following is a code sample that demonstrates the use of patty class.


OEMol mol(SDF,SDF);
cin >> mol;
string rulefile = "rules.txt";
patty p;
p.read_rules(p);
vector<string> type;
p.assign_types(mol,type);
for (int i = 1;i <= mol.NumAtoms();i++)
    cout << "atom number " << i << " was given a type " << type[i] << endl;

The array indices in the vector<string> into which the result values are placed match the corresponding atom numbers. Since atoms are numbered beginning from one, the first element in the vector<string> is empty, and the values are placed in [1...mol.NumAtoms()].

OEGastChrg

The OEGastChrg class is responsible for the assignment of partial charges to a molecule according to the Gasteiger charge model. When the partial charge of an atom is requested and partial charges do not yet exist for the molecule the OEGastChrg class will automatically be called to assign charges for the molecule. If charges have been read in for a molecule the following code shows how to force the recalculation of partial charges.


OEMol mol(MOL2,SDF);
cin >> mol;
mol.UnsetPartialChargesPerceived();
OEAtom *atom;
vector<OEAtom*>::iterator i;
for (atom = mol.BeginAtom(i);atom;atom = mol.NextAtom(i))
{
    cout << "atom number = " << atom->GetIdx();
    cout << " charge = " << atom->GetPartialCharge() << endl;
}

Formal charges are used as seed values of the initial charge of atoms, and the partial charge is propagated to neighboring atoms. For example, quaternary amines would have a +1 charge, and the effect of the positive charge would be felt by neighboring atoms according to the Gasteiger model.



Utility Classes

Utility classes provide functionality that is commonly desired in software in general. These classes facilitate the construction of larger end user programs based on OELib.

OEBitVec

The OEBitVec class is a fast and efficient bitstring class that is handy to use as a truth table. Truth tables are an easy way to store whether a list of items has a particular propery. Instances of OEBitVec can by dynamically resized, and have a number of overloaded operators that make code simple and readable. The following examples demonstrate uses of the OEBitVec class.


OEBitVec bv1,bv2,bv3;
bv1.SetBitOn(5);
bv2.SetBitOff(200);
bv1 |= bv2;
bv1 = bv1 & bv2;
if (bv1.Empty()) //Empty() returns true if no bits are set on
{
    cout << "bv1 = " << bv1 << endl;
}
 
int bit;
for (bit = bv1.NextBit(0);bit != bv1.EndBit();bit = bv1.NextBit(bit))
{
    cout << "the next bit turned on is " << bit << endl;
}

 
For a full description of the OEBitVec class member functions please consult the header file bitvec.h.
OEStopwatch

The OEStopwatch class makes timing the execution of blocks of code to microsecond accuracy very simple. The class effectively has two functions, Start() and Elapsed(). The usage of the OEStopwatch class is demonstrated by the following code.


OEStopwatch sw;
sw.Start();
//insert code here
cout << "Elapsed time = " << sw.Elapsed() << endl;


Although based on system specific function calls, OEStopwatch works on Unix flavored systems as well as Microsoft Operating systems.

OEExtensionTable

A number of file extensions are common to certain formats of molecule file types. The OEExtensionTable class facilitates setting input and output types in the OEMol class based on input and output file names. The OEExtensionTable class is dependent on a file called extable.txt to provide the extension to type translation. The following is an example extable.txt file

EXTENSION TYPE
sdf sdf
mdl sdf
mol sdf
mol2 mol2
pdb pdb
ent pdb
smi smi
box box
mmd mmd

 
The default constructor for the OEExtensionTable class takes a single parameter specifying the name of the environment variable which points to the default copy of the extable.txt file. The OEExtensionTable class will look first at the current working directory to see if a copy the extable.txt file exists. If extable.txt is not found, the class with then look in the directory pointed to by the environment variable for a global copy of the file. An example of the initialization and usage follows

 
OEExtensionTable extable;
enum io_type inputType,outputType;
inputType = extable.FilenameToType("foo.mol2");
outputType = extable.FilenameToType("bar.sdf");
OEMol mol(inputType,outputType);

The input and output types taken from the extensions on the filenames and used in the molecule constructor.

CommandLine

The command line class provides a means of facile development of command line interfaces (CLI). The basic idea is that the CLI is predefined in a class which then parses what was typed on the command line, checks for errors, and stores the relevant data until retrieval is necessary. The following code demonstrates how to define a command line interface.

Void Usage()
{
    cerr << "usage statement here" << endl;
    exit(0);
}
 
int main(int argc,char **argv)
{
bool on;
int val;
vector<string> files;
CommandLine cmdline;
cmdline.SetUsageFunction(Usage);
cmdline.AddFlag("-on",on);
cmdline.AddFlag("-value",val,10);
cmdline.AddFlag("-files",files);
cmdline.ProcessArguments(argc,argv);
cerr << "the state of on is = " << on << endl;
cerr << "the value of val is = " << val << endl;
vector<string>::iterator i;
for (i = files.begin();i!= files.end();i++)
cerr << "file = " << *i << endl;
}

 
It is important to set a usage function in the command line class that alerts the user to errors in the arguments given on the command line. If unknown flags are used or arguments to flags are omitted the user can be alerted to the error. The command line interface is defined by the AddFlag() functions. As the command line arguments are parsed the values associated with each flag are placed in the variable given in the AddFlag() function. Default values can be given as a third argument such as AddFlag("-val",val,10). Booleans are set to false by default, and turned on if the associated flag is given on the command line. Please see the file commandline.h in the distribution for the full list of member functions of the command line class.



Atom Data Translation Classes

OEElementTable

Translating element data is a common task given that many file formats give either element symbol or atomic number information, but not both. The OEElementTable class facilitates conversion between textual and numeric element information. An instance of the OEElementTable class is declared as a global in data.cpp. Source files that include the header file mol.h automatically have an extern definition to the global in data.cpp. The following code sample demonstrates the use of the OEElementTable class.


cout << "The symbol for element 6 is " << etab.GetSymbol(6) << endl;
cout << "The atomic number for Sulfur is " << etab.GetAtomicNum(16) << endl;
cout << "The vanDerWaal radius for Nitrogen is " << etab.GetVdwRad(7);

OETypeTable

Molecular file formats frequently store information about atoms in an atom type field. Some formats store only the element for each atom, while others include hybridization and local environments, such as the mol2 atom type field. The OETypeTable class acts as a translation table to convert atom types between a number of different molecular file formats. The constructor for OETypeTable automatically reads the text file types.txt. Just as OEElementTable, an instance of OETypeTable is declared as a global in data.cpp and is referenced as extern OETypeTable ttab in mol.h.  The following code demonstrates how to use the OETypeTable class to translate the internal representation of atom types in an OEMol to Sybyl Mol2 atom types.


ttab.SetFromType("INT");
ttab.SetToType("SYB");
OEAtom *atom;
vector<OEAtom*>::iterator i;
string src,dst;
for (atom = mol.BeginAtom(i);atom;atom = mol.EndAtom(i))
{
    src = atom->GetType();
    ttab.Translate(dst,src);
    cout << "atom number " << atom->GetIdx() << "has mol2 type " << dst << endl;
}


Math Classes

Vector

The Vector class was designed to simplify operations with floating point coordinates. To this end many of the common operations have been overloaded for simplicity. Vector addition, subtraction, scalar multiplication, dot product, cross product, magnitude and a number of other utility functions are built in to the vector class. For a full description of the class member functions please consult the header file Vector.h. The following code demonstrates several of the functions of the vector class.


Vector v1,v2,v3;
v1 = VX;
v2 = VY;
v3 = cross(v1,v2);
v3 *= 2.5;
v3.normalize();


Matrix3x3

Rotating points in space can be performed by a vector-matrix multiplication. The Matrix3x3 class is designed as a helper to the Vector class for rotating points in space. The rotation matrix may be initialised by passing in the array of floating point values, by passing euler angles, or a rotation vector and angle of rotation about that vector. Once set, the Matrix3x3 class can be used to rotate Vectors by the overloaded multiplication operator. The following demonstrates the usage of the Matrix3x3 class.


Matrix3x3 mat;
mat.SetupRotMat(0.0,180.0,0.0); //rotate theta by 180 degrees
Vector v = VX;
v *= mat; //apply the rotation


Examples

Largest Ring Extraction

Finding the largest ring system in a molecule is a useful example of how to loop over atoms and bonds in a molecule, as well as proper usage of the OEBitVec class. The FindLargestRingSystem function uses the method of expanding shells to identify ring systems. A seed atom is chosen which is part of a cycle, and all bonds of the seed are crossed if the bonds are part of a ring system. The same operations are performed to the new shell that has been created by following cyclic bonds. When no more shells are found a ring system has been identified, and it is stored if it is larger than any previously identified ring system. The bits corresponding to the atoms numbers in the largest ring system are turned on the OEBitVec bv which is passed as the second argument to the function.


void FindLargestRingSystem(OEMol &mol,OEBitVec &bv)
{
bv.Clear(); //unset all bits
int j;
OEAtom *atom,*nbr;
vector<OEAtom*>::iterator i;
vector<OEBond*>::iterator k;
OEBitVec rsys,curr,used,next;
for (atom = mol.BeginAtom(i);atom;atom = mol.NextAtom(i)) //loop over all atoms
    if (atom->IsInRing() && !used[atom->GetIdx()])
            // if the atom is a member of a ring and hasn't already been visited
    {
        rsys.Clear();
        curr.Clear();
        rsys |= atom->GetIdx();
        curr |= atom->GetIdx();
        used |= atom->GetIdx();
//set an atom on in the curr bitvec - then loop over all of the neighbors of atoms
//in the curr bitvec and put them in the next bitvec. Transfer the next bitvec
//to the curr bitvec when no new neighbors are found.
    while (!curr.Empty())
    {
        next.Clear();
        for (j = curr.NextBit(0);j != curr.EndBit();j = curr.NextBit(j))
        {
            atom = mol.GetAtom(j);
            for (nbr = atom->BeginNbrAtom(k);nbr;nbr = atom->NextNbrAtom(k))
                if ((*k)->IsInRing() && !used[nbr->GetIdx()])
                {
                    next |= nbr->GetIdx();
                    used |= nbr->GetIdx();
                }
            rsys |= next;
            curr = next;
        }
    }
    if (rsys.CountBits() > bv.CountBits())
    bv = rsys;
    }
}

SMARTS Grep

A crude method of substructure searching can be accomplished in much the same way as the Unixtm grep command. The SMARTS grep program given below takes a SMARTS string and a smiles filename on the command line. All molecules that match the SMARTS pattern are then written to standard out. Note that the program can be modified to read from any file format that OELib supports, as well as write out any file format just by changing the declaration of OEMol.

#include "mol.h"
#include "oeutil.h"
#include "parsmart.h"
using namespace std;
using namespace OpenBabel;
int main(int argc,char **argv)
{
if (argc != 3)
{
string err = "Usage: ";
err += argv[0];
err += " \"PATTERN\" filename.smi";
ThrowError(err);
exit(0);
}
OESmartsPattern sp;
sp.Init(argv[1]);
ifstream ifs;
if (!SafeOpen(ifs,argv[2])) return(1);
OEMol mol(SMI,SMI);
for (;;)
{
mol.Clear();
ifs >> mol;
if (mol.Empty()) break;
if (sp.Match(mol))
cout << mol;
}
return(1);
}


Acknowledgments

Many thanks to those who have participated in the development of OELib.

Joe Corkery
Brian Goldman
Anthony Nicholls
Roger Sayle
Patrick Walters