# ChargeCalculator:Theoretical background

## Latest revision as of 20:47, 29 May 2015

The *Electronegativity Equalization Method* (EEM) is the approach employed by ACC to calculate atomic charges.

EEM is an empirical method developed as a cost-effective alternative to quantum mechanics (QM) based methods. It yields atomic charges that are sensitive to the molecule's topology and three-dimensional structure. EEM has been successfully applied to zeolites, small organic molecules, polypeptides and proteins.

ACC implements the classical EEM formalism (*Full EEM*), along with two additional modifications (*EEM Cutoff* and *EEM Cutoff Cover*). We give a brief description of each below. Please refer to the literature for a more in-depth description of EEM and examples of applications (e.g., [1], [2]).

# Full EEM

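The classical EEM formalism estimates atomic charges via a set of coupled linear equations. For a molecule with $N$ atoms, each atom $i$ contributes one equation, $A_i + B_i q_i + k \sum_{j \neq i} q_j / R_{ij} = \bar{\chi}$. In addition, the charges must sum to the total molecular charge $Q$. In matrix form:

$$
\begin{pmatrix}
B_1 & \frac{k}{R_{12}} & \cdots & \frac{k}{R_{1N}} & -1 \\
\frac{k}{R_{21}} & B_2 & \cdots & \frac{k}{R_{2N}} & -1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\frac{k}{R_{N1}} & \frac{k}{R_{N2}} & \cdots & B_N & -1 \\
1 & 1 & \cdots & 1 & 0
\end{pmatrix}
\begin{pmatrix} q_1 \\ q_2 \\ \vdots \\ q_N \\ \bar{\chi} \end{pmatrix}
=
\begin{pmatrix} -A_1 \\ -A_2 \\ \vdots \\ -A_N \\ Q \end{pmatrix}
$$

Here $q_i$ is the charge of atom $i$ and $R_{ij}$ is the distance between atoms $i$ and $j$. $A_i$ and $B_i$ are the empirical parameters of the atom type of atom $i$, $k$ is a global empirical parameter, and $\bar{\chi}$ is the equalized molecular electronegativity.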

To solve this system and obtain the charges of all atoms, the following quantities must be known:

- distances between all pairs of atoms
- total molecular charge
- empirical parameters (here *k*, *A* and *B*) covering all atom types present in the molecule

ACC calculates the interatomic distances from the atomic positions it reads from the molecular structure file. The user may provide the total charge; otherwise, ACC assumes the molecule is neutral (total molecular charge 0). EEM parameters for each atom type (e.g., carbon, oxygen) present in the molecule are taken from a set of EEM parameters suitable for the molecule in question. Many sets of EEM parameters have been published in the literature and are available in ACC as built-in sets. These sets may be used as they are or modified by the user where necessary.
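Once these inputs are known, the coupled linear system described above can be assembled and solved directly. The following is a minimal Python sketch, not ACC's implementation. The function names `solve_linear` and `eem_charges` are hypothetical, and so are the parameter values in the usage line. A plain Gaussian elimination stands in for whatever linear solver ACC actually uses.

```python
import math


def solve_linear(a, b):
    """Solve the dense linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def eem_charges(coords, params, k, total_charge=0.0):
    """Full EEM. coords: list of (x, y, z) in Angstrom; params: list of (A, B) per atom.

    Returns (charges, chi_bar). total_charge defaults to 0, i.e. a neutral molecule.
    """
    n = len(coords)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for i in range(n):
        A_i, B_i = params[i]
        a[i][i] = B_i
        for j in range(n):
            if j != i:
                a[i][j] = k / math.dist(coords[i], coords[j])
        a[i][n] = -1.0  # coefficient of the equalized electronegativity chi_bar
        b[i] = -A_i
        a[n][i] = 1.0   # last row: the charges sum to the total molecular charge
    b[n] = total_charge
    x = solve_linear(a, b)
    return x[:n], x[n]


# Hypothetical two-atom example: the atom with the lower A ends up positive.
charges, chi_bar = eem_charges([(0, 0, 0), (1, 0, 0)], [(2.5, 5.0), (3.5, 5.0)], k=0.5)
print(charges, chi_bar)  # charges ≈ [0.111, -0.111], chi_bar ≈ 3.0
```

Storing the full (N+1) × (N+1) matrix makes memory grow roughly with N² and dense elimination time roughly with N³. This scaling is what makes very large molecules expensive.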

EEM parameters are generally developed from reference QM calculations. A QM-based charge calculation approach is characterized by the setup of the wave function calculation (theory level, basis set, environment). It is also characterized by the procedure used to partition the molecular electron density or to deduce the electrostatic contribution of each atom. We refer to the combination of these characteristics as the "charge definition".

The maximum attainable accuracy and the range of applications of any set of EEM parameters are determined by the charge definition used during its development. Performance is further influenced by the procedure used to fit the EEM parameters to the reference data.

# EEM Cutoff

While EEM is very fast compared to QM methods, handling large molecules or complexes still requires significant time and memory. To make such calculations available to you in real time, ACC implements two special EEM approximations.

The *EEM Cutoff* approximation limits the size of each system of equations being solved. Specifically, for each atom, ACC solves a system containing only the equations for the atoms within a certain distance (the *cutoff radius*, in Å) of the given atom. The number of equations in each system therefore depends on how densely the structure is packed around that atom and on the local shape of the molecule.
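Selecting the atoms of one such system is a simple distance filter. The Python sketch below assumes that an atom lying exactly at the cutoff radius is included, which the text does not specify. It also scans all atoms for every center, costing O(N²) overall; a production code would more likely use a spatial grid. The names `cutoff_fragment` and `fragment_sizes` are hypothetical.

```python
import math


def cutoff_fragment(coords, center, radius):
    """Indices of all atoms within `radius` Angstrom of atom `center`, the center included."""
    origin = coords[center]
    return [j for j, p in enumerate(coords) if math.dist(origin, p) <= radius]


def fragment_sizes(coords, radius):
    """Number of EEM equations (atoms) in the fragment built around each atom."""
    return [len(cutoff_fragment(coords, i, radius)) for i in range(len(coords))]


# Hypothetical linear chain with 1.5 Angstrom spacing: atoms near the ends get smaller fragments.
chain = [(1.5 * i, 0.0, 0.0) for i in range(10)]
print(fragment_sizes(chain, 3.0))  # [3, 4, 5, 5, 5, 5, 5, 5, 4, 3]
```

Even in this toy chain, the fragment size varies with the local environment of each atom. This matches the remark that the number of equations depends on structure density and shape.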

Thus, for a molecule with 10000 atoms and a cutoff radius of 10 Å, ACC does not solve one matrix with 10000 × 10000 elements. Instead, it solves 10000 much smaller matrices (approximately from 50 × 50 up to 400 × 400). The essence of the *EEM Cutoff* method is that ACC replaces one very large calculation with many small ones, each far less demanding in memory and time than the original. The cost of each small calculation depends on the cutoff radius rather than on the size of the molecule, so the overall saving only materializes when the molecule is much larger than a single fragment. *EEM Cutoff* is therefore efficient only for large molecules, containing at least several thousand atoms.
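A back-of-the-envelope estimate shows where the savings come from. This Python sketch assumes dense Gaussian elimination (about 2n³/3 operations for an n × n system) and 8-byte floating-point storage. It ignores the cost of building the matrices, and ACC may use a different solver. The function names are hypothetical.

```python
def dense_solve_cost(n):
    """Approximate cost of solving one dense n x n system by Gaussian elimination.

    Returns (floating-point operations, bytes needed to store the matrix as 8-byte doubles).
    """
    return (2.0 / 3.0) * n ** 3, 8 * n * n


def compare(n_atoms, fragment_atoms):
    """Full EEM (one system of size N+1) versus EEM Cutoff (N systems of size m+1)."""
    full_flops, full_bytes = dense_solve_cost(n_atoms + 1)
    frag_flops, frag_bytes = dense_solve_cost(fragment_atoms + 1)
    return {
        "full_matrix_MB": full_bytes / 1e6,
        "fragment_matrix_MB": frag_bytes / 1e6,
        # > 1 means the many small solves need fewer operations in total
        "flop_ratio": full_flops / (n_atoms * frag_flops),
    }


for n, m in [(10000, 50), (10000, 400), (1000, 400)]:
    print(n, m, compare(n, m))
```

For 10000 atoms, the largest matrix held in memory shrinks from about 800 MB to at most about 1.3 MB. The total operation count drops by a factor of roughly 1.6 with 400-atom fragments and roughly 750 with 50-atom fragments. For 1000 atoms with 400-atom fragments, the ratio falls below 1: the fragments cost more in total than one full solve. This is consistent with *EEM Cutoff* paying off only for large molecules.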

In other words, running *EEM Cutoff* is like running *Full EEM* on a set of overlapping fragments of the original molecule, one fragment per atom. The positions and types of the atoms in each fragment are the same as in the original molecule. The only open question is the total charge of each fragment. *EEM Cutoff* assigns each fragment a share of the total molecular charge proportional to the number of atoms in the fragment, irrespective of the nature of these atoms. ACC then solves the EEM equations for each fragment and keeps only the charge of the atom at the center of the fragment. Once all fragments have been processed, every atomic charge is shifted by the same constant so that the sum of all atomic charges equals the total molecular charge. This algorithm may not be chemically rigorous. Nevertheless, it has proven both robust and sufficiently accurate (RMSD below 0.003 e compared to *Full EEM*), provided the cutoff radius is large enough (over 8 Å).
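The procedure just described translates into a short loop. This is a minimal Python sketch under stated assumptions, not ACC's code. The helper names are hypothetical, a plain Gaussian elimination replaces ACC's solver, and fragments are selected by brute-force distance checks. The charge quota and the final constant shift follow the text.

```python
import math


def solve_linear(a, b):
    """Solve the dense linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def eem_fragment(coords, params, k, atoms, charge):
    """Full EEM restricted to the atom indices in `atoms`, with total charge `charge`."""
    n = len(atoms)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for r, i in enumerate(atoms):
        A_i, B_i = params[i]
        a[r][r] = B_i
        for c, j in enumerate(atoms):
            if c != r:
                a[r][c] = k / math.dist(coords[i], coords[j])
        a[r][n] = -1.0  # equalized electronegativity of the fragment
        b[r] = -A_i
        a[n][r] = 1.0   # fragment charges sum to the assigned quota
    b[n] = charge
    return solve_linear(a, b)[:n]


def eem_cutoff(coords, params, k, radius, total_charge=0.0):
    """EEM Cutoff sketch: one fragment per atom, keeping only the central atom's charge."""
    n = len(coords)
    charges = [0.0] * n
    for center in range(n):
        frag = [j for j in range(n) if math.dist(coords[center], coords[j]) <= radius]
        quota = total_charge * len(frag) / n  # share proportional to fragment size
        frag_charges = eem_fragment(coords, params, k, frag, quota)
        charges[center] = frag_charges[frag.index(center)]
    shift = (total_charge - sum(charges)) / n  # one constant for every atom
    return [q + shift for q in charges]


# Hypothetical parameters; a radius larger than the molecule reproduces Full EEM.
print(eem_cutoff([(0, 0, 0), (1, 0, 0)], [(2.5, 5.0), (3.5, 5.0)], k=0.5, radius=10.0))
```

When the cutoff radius exceeds the size of the molecule, every fragment is the whole molecule and its quota is the full charge. The sketch then returns exactly the *Full EEM* charges, which gives a convenient sanity check.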

# EEM Cutoff Cover

To further enhance the time and memory efficiency of EEM, ACC implements an additional approximation aimed specifically at large biomolecular complexes with hundreds of thousands of atoms. This approximation builds on the *EEM Cutoff* method and reduces the number of EEM matrices that must be solved.

Whereas *EEM Cutoff* generates one fragment for each atom in the molecule, this further approximation generates fragments only for a subset of atoms. The subset is chosen so that each atom in the molecule still contributes to at least one fragment. In other words, the entire volume of the molecule is covered, hence the name *EEM Cutoff Cover*.

In *EEM Cutoff Cover*, the subset of fragment-generating atoms is chosen so that:

- no two atoms in the subset are bonded to each other
- each atom in the molecule has at least one subset atom within two bonds of it
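The text does not say how ACC picks this subset. One simple way to satisfy both conditions is a greedy pass over the atoms, sketched below in Python. An atom is chosen only if no already-chosen atom lies within two bonds of it. This keeps chosen atoms non-bonded (in fact at least three bonds apart) and leaves every atom within two bonds of a chosen one. Several details are assumptions: the greedy order, the adjacency-list input, and the convention that a chosen atom counts as covering itself. `cover_subset`, `within_two_bonds` and `check_cover` are hypothetical names.

```python
def within_two_bonds(adj, atom):
    """Atoms reachable from `atom` in at most two bonds, `atom` itself included."""
    near = {atom}
    for nb in adj[atom]:
        near.add(nb)
        near.update(adj[nb])
    return near


def cover_subset(adj):
    """Greedy choice of fragment-generating atoms; adj[i] lists the atoms bonded to atom i."""
    covered = [False] * len(adj)
    chosen = []
    for atom in range(len(adj)):
        if covered[atom]:  # already within two bonds of a chosen atom
            continue
        chosen.append(atom)
        for near in within_two_bonds(adj, atom):
            covered[near] = True
    return chosen


def check_cover(adj, subset):
    """True if no two subset atoms are bonded and every atom is within two bonds of the subset."""
    members = set(subset)
    independent = all(nb not in members for a in members for nb in adj[a])
    covering = all(members & within_two_bonds(adj, a) for a in range(len(adj)))
    return independent and covering


# Hypothetical 7-atom chain 0-1-2-3-4-5-6
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
print(cover_subset(chain))  # [0, 3, 6]
```

On this chain, three fragments replace the seven that *EEM Cutoff* would build.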

The fragments for *EEM Cutoff Cover* are generated in the same way as for *EEM Cutoff*, according to the *cutoff radius*, so the average size of the resulting EEM matrices is the same. However, since fewer fragments are generated, the number of EEM matrices to be solved is up to 4 times smaller than for *EEM Cutoff*. The charge of each atom is then computed as the sum of its charge contributions from each fragment that contains it. Every atomic charge is then corrected so that the sum of all atomic charges equals the total molecular charge. *EEM Cutoff Cover* has also proven robust and sufficiently accurate (RMSD below 0.003 e compared to *EEM Cutoff* with a comparable cutoff radius). It is the method of choice for biomolecular complexes of tens of thousands of atoms or more.
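The combination step can also be sketched in Python. The text states only that an atom's charge is the sum of its contributions from the fragments it belongs to; it does not specify how each contribution is weighted. This sketch assumes equal weights, so each of an atom's fragment charges contributes an equal share (i.e. the charges are averaged). It also assumes the charge quota of each fragment follows the *EEM Cutoff* rule. The helper names, the brute-force fragment search, the Gaussian elimination solver and the `ValueError` guard for atoms outside every fragment are all hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve the dense linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def eem_fragment(coords, params, k, atoms, charge):
    """Full EEM restricted to the atom indices in `atoms`, with total charge `charge`."""
    n = len(atoms)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for r, i in enumerate(atoms):
        a[r][r] = params[i][1]
        for c, j in enumerate(atoms):
            if c != r:
                a[r][c] = k / math.dist(coords[i], coords[j])
        a[r][n], b[r], a[n][r] = -1.0, -params[i][0], 1.0
    b[n] = charge
    return solve_linear(a, b)[:n]


def eem_cutoff_cover(coords, params, k, radius, centers, total_charge=0.0):
    """EEM Cutoff Cover sketch: fragments only around `centers` (fragment-generating atoms)."""
    n = len(coords)
    total = [0.0] * n
    count = [0] * n
    for center in centers:
        frag = [j for j in range(n) if math.dist(coords[center], coords[j]) <= radius]
        quota = total_charge * len(frag) / n  # assumed: same quota rule as EEM Cutoff
        for j, q in zip(frag, eem_fragment(coords, params, k, frag, quota)):
            total[j] += q  # every atom of the fragment receives a contribution
            count[j] += 1
    if 0 in count:
        raise ValueError("an atom lies in no fragment; enlarge the radius or the subset")
    charges = [t / c for t, c in zip(total, count)]  # assumption: equal-weight contributions
    shift = (total_charge - sum(charges)) / n  # restore the total molecular charge
    return [q + shift for q in charges]
```

A single center whose fragment spans the whole molecule reproduces *Full EEM*. A small hypothetical chain with centers chosen to cover it yields charges that sum to the requested total.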

The only notable issue with the *EEM Cutoff Cover* approach arises when the molecular system contains atoms for which no EEM parameters are available. Unlike in the *Full EEM* and *EEM Cutoff* approaches, these atoms are not entirely ignored here: they are included in the subset of atoms used in the fragment generation step. In extremely rare cases, an atom may be bonded only to atoms that both lack EEM parameters and were selected for fragment generation. *EEM Cutoff Cover* then cannot calculate a charge for such an atom, even when EEM parameters are available for it. This problem is very unlikely to arise if H atoms are present, and even if it does arise, it affects only a small number of charges.
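The situation described above can be detected from the bond graph before running *EEM Cutoff Cover*. The Python sketch below is hypothetical: the function name and the adjacency-list input are assumptions, not part of ACC. It encodes the condition in the text: an atom that has EEM parameters, but whose every bonded neighbor both lacks parameters and belongs to the fragment-generating subset.

```python
def atoms_at_risk(adj, has_params, subset):
    """Atoms that have EEM parameters but are bonded only to parameterless subset atoms.

    adj[i] lists the atoms bonded to atom i; has_params[i] says whether atom i has
    EEM parameters; subset holds the fragment-generating atoms.
    """
    members = set(subset)
    return [
        a for a in range(len(adj))
        if has_params[a]
        and adj[a]  # atoms without any bonds are excluded; the condition concerns bonded atoms
        and all(not has_params[nb] and nb in members for nb in adj[a])
    ]


# Hypothetical system: atom 0 has no parameters and is in the subset;
# atom 1 is bonded only to atom 0, so its charge may be left uncomputed.
adj = [[1], [0], [3], [2]]
print(atoms_at_risk(adj, [False, True, True, True], {0, 2}))  # [1]
```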


# References

1. Ionescu C-M, Geidl S, Svobodová Vareková R, Koca J. Rapid Calculation of Accurate Atomic Charges For Proteins via the Electronegativity Equalization Method. J. Chem. Inf. Model. 2013, 53(10), 2548-2558. DOI: 10.1021/ci400448n
2. Svobodová Vareková R, Geidl S, Ionescu C-M, Skrehota O, Bouchal T, Sehnal D, Abagyan R, Koca J. Predicting pKa values from EEM atomic charges. J. Cheminform. 2013, 5, 18.