Protein Folding -- Predicting Structure And Function

My Thesis

The topic of my masters thesis project at George Mason University is the study of a genetic algorithm approach to predicting protein structure in abstract proteins folded in 3-dimensional lattices. This is an important topic in the field of proteomics and one of the "grand challenges" of molecular biology. If you are unfamiliar with the protein folding problem, check out my introduction to protein folding.

Hydrophobic Packing Problems

Because hydrophobicity is thought to influence something like 70% of a protein's folding behavior, I am discarding the electrodynamics and mechanical constraints in real world proteins, and abstracting the proteins as simple chains of hydrophobic or polar amino acids. Instead of folding these proteins in normal three-space, we will fold our abstract proteins onto an artificial three dimensional lattice.

These are the essentials of the "hydrophobic packing model" pioneered by K.A. Dill. Hydrophobic packing models discard the ancillary details of protein folding without reducing the computational complexity of the problem. Thus, we can devise problem solving systems which can scale-up to solving real-world protein folding problems.

Evolutionary Computation Approach

Because the protein folding problem is considered to be an NP-complete problem, most investigators have given up on trying to find polynomial time solutions, and turned to massively parallel computers to derive solutions by brute force. These approaches are unacceptable for two reasons:

Interesting proteins are typically large: several thousand amino acids in length. An exhaustive computer search for the optimal solution would require billions of years even harnessing all the computer power on the planet.
Such approaches contribute nothing to the understanding of the pathways by which proteins fold. Big powerful computers are impressive engineering feats, but not scientifically useful because they can't tell you "why" proteins fold the way they do.

Other research has turned to non-admissible methods which use clever heuristics for deriving the structures of proteins. Genetic algorithms, because they have been used to successfully solve other problems with NP-complete characteristics, are also a popular approach and can be further enhanced using heuristics. My research utilizes a novel genetic algorithm to folding proteins.

No Free Lunch

Unlike exhaustive search techniques, genetic algorithms utilize non-admissible heuristics and thus cannot gaurentee polynomial time solutions... and even if a GA arrived at the optimal solution, it wouldn't know that the solution is optimal because it hasn't examined all of the conformations possible. Gaurenteeing a solution is tantemount to saying that you know the optimal solution's fitness value ahead of time. But since fitness relies on structure, you would have to know the structure ahead of time as well. This is the so-called "termination problem" in GAs, and is also an unsolved problem.