Structural information about macromolecules may be stored in two states: dynamic and static. Experimentally determined structures are stored in an online database known as the Protein Data Bank (PDB). Every PDB file consists of a number of lines; each line is 80 columns wide and is terminated by an end-of-line indicator. Our focus is to simplify the PDB format so that most of the information present in PDB files can be put to use. In our experience, much of the information given in the REMARK section of a PDB file is generally overlooked because it is free text, which makes it hard to use computationally. However, some of these points can be vital to the research of some users, so we have decided to highlight the most important of them and present them in an accessible form. Secondly, we aim to overcome the limitations of the PDB format. For this, we introduce a remodeling of the PDB file in which useful information is added and redundancy is eliminated. Lastly, a number of new applications are developed to analyze the structural and functional aspects of biological structures. These are built by implementing interdisciplinary algorithms and by using task-oriented data structures such as kd-trees, octrees and 3D grids.
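To make the fixed 80-column layout concrete, the following is a minimal Python sketch of reading one ATOM/HETATM line by column position. The column ranges follow the published PDB coordinate record layout; the names Atom, make_atom_line and parse_atom_line are hypothetical and are not taken from the toolkit described here.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z, occ, b, element):
    """Build an 80-column ATOM line (hypothetical helper, used only for the demo)."""
    return (
        f"{'ATOM':<6}{serial:>5} {name:<4} {res_name:>3} {chain:1}{res_seq:>4}"
        + " " * 4                      # insertion code (col 27) and unused cols 28-30
        + f"{x:8.3f}{y:8.3f}{z:8.3f}"  # cols 31-54
        + f"{occ:6.2f}{b:6.2f}"        # cols 55-66
        + " " * 10                     # unused cols 67-76
        + f"{element:>2}"              # cols 77-78
        + " " * 2                      # charge, cols 79-80
    )


def parse_atom_line(line):
    """Parse one ATOM/HETATM line by fixed column positions (0-based slices)."""
    line = line.rstrip("\r\n").ljust(80)   # drop the end-of-line indicator, pad short lines
    if line[0:6] not in ("ATOM  ", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temp_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


sample = make_atom_line(1, " N", "MET", "A", 1, 27.340, 24.430, 2.614, 1.00, 9.67, "N")
atom = parse_atom_line(sample)
print(len(sample), atom.name, atom.res_name, atom.chain_id, atom.x, atom.y, atom.z)
```

The sketch assumes every numeric field is present; a line with a blank occupancy or temperature-factor field would raise a ValueError and would need explicit handling in a real parser.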
New techniques and algorithms for analyzing proteins and better understanding their function are being developed every day. Many of these algorithms have not yet been implemented as computer programs. We want to build a toolkit in which such functions, old and new, are available in one environment for complete analysis.
Overcoming limitations of PDB:
A few problems with the PDB format are listed in the table:
PDB format faults:
  - Presence of dummy coordinates and identical coordinates
  - Atomic occupancies larger than 1.00 or even negative
  - Use of different ligand names for the water molecule, e.g. H2O, WAT, HOH
  - Presence of solvents that are there only because of the experimental technique
  - Use of different ligand names for the same small molecule
  - Size limits imposed by the format: at most 99999 atoms, 9999 residues (amino acids) and 62 chains per file
  - Textual descriptions are not computationally accessible and cannot easily be used to generate the complete assembly automatically
Structural faults:
  - Atoms too close to each other, resulting in close contacts
  - Unnatural dihedral angles
  - Incorrect connectivity between atoms
  - Mismatch between the UniProt sequence and the structural sequence
Structural faults that cannot be corrected:
  - Missing coordinates
  - Electron density that does not match the coordinates
  - Experimental errors
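Several of the faults listed above can be checked mechanically. The Python sketch below is illustrative only: it maps the water names quoted above to a single name, flags occupancies outside the range 0 to 1, and finds identical coordinates and close contacts with a simple 3D grid. The function names, the choice of HOH as the canonical water name and the 2.0 Angstrom cutoff are assumptions made for the example, not values from this work.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor

# Water names quoted in the fault list; "HOH" as the canonical name is an assumption.
WATER_NAMES = {"H2O", "WAT", "HOH"}


def normalize_residue_name(name):
    """Map any known water name to one canonical name; leave other names as they are."""
    name = name.strip().upper()
    return "HOH" if name in WATER_NAMES else name


def bad_occupancies(occupancies):
    """Return the indices of occupancies that are negative or larger than 1.00."""
    return [i for i, occ in enumerate(occupancies) if not 0.0 <= occ <= 1.0]


def close_pairs(coords, cutoff=2.0):
    """Return index pairs (i, j), i < j, whose distance is below `cutoff`.

    Atoms are binned into cubic cells of edge `cutoff`, so only the 27 cells
    around each atom need to be searched. The cutoff is an illustrative value.
    """
    cells = defaultdict(list)
    for i, point in enumerate(coords):
        cells[tuple(floor(c / cutoff) for c in point)].append(i)

    pairs = []
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get() avoids creating new cells while iterating over the grid
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and dist(coords[i], coords[j]) < cutoff:
                        pairs.append((i, j))
    return sorted(pairs)


def identical_pairs(coords):
    """Return index pairs that share exactly the same coordinates."""
    return [(i, j) for i, j in close_pairs(coords) if coords[i] == coords[j]]


coords = [
    (0.0, 0.0, 0.0),
    (0.5, 0.0, 0.0),     # close contact with atom 0
    (10.0, 10.0, 10.0),
    (10.0, 10.0, 10.0),  # identical to atom 2
    (-3.1, 4.0, 2.0),    # isolated atom
]
print(normalize_residue_name("wat"))
print(bad_occupancies([1.0, 1.2, -0.5, 0.0]))
print(close_pairs(coords))
print(identical_pairs(coords))
```

A grid is used here because it is one of the data structures named in this work; a kd-tree or octree could serve the same neighbour search. The cost of the grid search grows roughly linearly with the number of atoms when the atoms are spread evenly, whereas comparing every pair grows quadratically.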
Approach and Methods:
We apply concepts from mathematics and computer science to improve existing tools and to build new applications for analysis. We have already developed a PDB parser in MAT which works faster than currently available parsers. These applications are relevant to, and crucial for, the interpretation of results. Some of the applications are listed with their approach.