De novo protein design space extends far beyond biology

Foldit is a protein molecule modeling program used by citizen scientists worldwide to contribute to protein design research. Credit: University of Washington Institute for Protein Design

In his first publication (1981) on what he later (1986) termed nanotechnology, Eric Drexler pointed to molecular engineering as a pathway from current biotechnology toward “general capabilities for molecular manipulation”, more recently described as “high-throughput atomically precise manufacturing”. Specifically, he pointed to de novo protein design as a path leading eventually to complex non-biological machinery, suggesting that designing proteins to fold as needed would be easier than predicting how natural proteins fold. Accordingly, de novo protein design has been one of our favorite topics on Nanodot; for example, these milestones from the past five years: “Designing protein-protein interactions for advanced nanotechnology”, “Gamers, citizen science, and protein structures”, “Crowd-sourced protein design a promising path to advanced nanotechnology”, “Nanotechnology milestone: general method for designing stable proteins”, “Computational design of protein-small molecule interactions”.

This past year, several major advances in de novo protein design have been reported by the research group of David A. Baker at the University of Washington, who shared the 2004 Feynman Prize in Nanotechnology in the Theory category, and by their collaborators at the Fred Hutchinson Cancer Research Center. A hat tip to ScienceDaily for reprinting this University of Washington news release, “Big moves in protein structure prediction and design”:

Custom design with atomic level accuracy enables researchers to craft a whole new world of proteins

The potential of modular design for brand new proteins that do not yet exist in the natural world is explored Dec. 16 in the journal Nature. The reports are the latest in a recent series of developments toward custom-designing proteins.

Naturally occurring proteins are the nanoscale machines that carry out nearly all the essential functions in living things.

While it has been known for more than 40 years that a protein’s sequence of amino acids determines its shape, it has been challenging for scientists to predict a protein’s three-dimensional structure from its amino acid sequence. Conversely, it has been difficult for scientists to devise brand new amino acid sequences that fold up into hitherto unseen structures. A protein’s structure dictates the types of biochemical and biological tasks it can perform.

The Nature letters look at one type of natural construction: proteins formed of repeat copies of a structural component. The researchers examined the potential for creating new types of these proteins. Just as the manufacturing industry was revolutionized by interchangeable parts, originating protein molecules with the right twists, turns and connections for their modular assembly would be a bold direction for biotechnology.

The letters are, “Exploring the repeat protein universe through computational design” [abstract, full text PDF courtesy of the Baker lab] and “Rational design of alpha-helical tandem repeat proteins with closed architecture.” [abstract, full text PDF courtesy of the Baker lab] The findings suggest the possibilities for producing protein geometries that exceed what nature has achieved. The work was led by postdoctoral fellows TJ Brunette, Fabio Parmeggiani and Po-Ssu Huang in David Baker’s lab at the University of Washington, and Lindsey Doyle and Phil Bradley at the Fred Hutchinson Cancer Research Institute in Seattle.

In addition, over the past several months, researchers at the UW Institute for Protein Design (IPD), Fred Hutch, and other institutions have described several advances in two longstanding problem areas in building new proteins from scratch.

“It has been a watershed year for protein structure predictions and design,” said Baker, a UW professor of biochemistry, Howard Hughes Medical Institute investigator, and head of the IPD. …

Because this news release reports a number of major advances that are not individually described, we will only consider the first of the above two Nature letters in this post. Subsequent posts will consider other research results highlighted by this news release. “Exploring the repeat protein universe through computational design” presents a completely automatic protocol to design variations on a specified structural motif. The protocol produces folds that are completely unlike those found in nature, suggesting that known families of protein structures sample only a small part of the polypeptide structure space, and thus opening up a wide array of new possibilities for molecular engineering.

Families of proteins made of multiple tandem copies of a repeating structural motif occur widely in nature, and some of these natural repeat proteins have been re-engineered for molecular recognition and molecular scaffolding applications. All known designed repeat structures, however, have been based on naturally occurring protein families. The authors therefore ask whether these families cover all stable repeat structures that can be built from the 20 genetically coded amino acids, or whether natural evolution has sampled only a small subset of what is possible.

To explore the range of possible repeat protein structures, they generated new protein backbone arrangements and designed sequences, unrelated to any naturally occurring repeat protein, that were predicted to fold into those structures. Since they knew from natural proteins that tandem repetition of a helix-loop-helix-loop unit can generate a wide variety of curvatures, they chose a motif in which each of the two helices ranged from 10 to 28 residues and each of the two turns from 1 to 4 residues. A completely automated design process was used to generate designs fitting the chosen motif with low energies and complementary core side-chain packing. All the designs have an overall helical structure, and are thus classified on the basis of three parameters defining a helix: (1) the radius, (2) the twist between adjacent repeats around the helical axis, and (3) the translation between adjacent repeats along the helical axis.
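To make the three helical parameters concrete, here is a minimal Python sketch that places the centers of successive repeat units on a helix defined by a radius, a twist, and a translation. It is our own illustration, not the authors' design protocol: the names `HelicalParams`, `repeat_centers`, `repeats_per_turn`, and `neighbor_distance` are hypothetical, the numeric values are invented for the example, and each repeat unit is reduced to a single point, which ignores its orientation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HelicalParams:
    """Hypothetical container for the three parameters named in the text."""
    radius: float       # distance of each repeat from the helical axis (angstroms)
    twist: float        # rotation about the axis between adjacent repeats (radians)
    translation: float  # shift along the axis between adjacent repeats (angstroms)


def repeat_centers(p, n_repeats):
    """Return (x, y, z) centers of n_repeats units placed along the helix."""
    return [
        (p.radius * math.cos(i * p.twist),
         p.radius * math.sin(i * p.twist),
         i * p.translation)
        for i in range(n_repeats)
    ]


def repeats_per_turn(p):
    """Repeats needed for one full turn about the axis; infinite if untwisted."""
    return math.inf if p.twist == 0 else 2 * math.pi / abs(p.twist)


def neighbor_distance(p):
    """Distance between the centers of two adjacent repeats."""
    chord = 2 * p.radius * math.sin(abs(p.twist) / 2)  # in-plane separation
    return math.hypot(chord, p.translation)


# Illustrative values only; none of these numbers come from the paper.
flat_ring = HelicalParams(radius=30.0, twist=math.radians(45), translation=0.0)
straight_rod = HelicalParams(radius=0.0, twist=0.0, translation=10.0)
spiral = HelicalParams(radius=20.0, twist=math.radians(30), translation=5.0)

for name, params in [("flat ring", flat_ring),
                     ("straight rod", straight_rod),
                     ("spiral", spiral)]:
    print(name, repeats_per_turn(params), round(neighbor_distance(params), 2))
```

In this simplified picture, a zero translation keeps every repeat in one plane, so the chain closes into a ring after one full turn, while a zero radius and zero twist give a straight rod. Mapping these regimes onto the shape names used in the paper is our reading, not a classification taken from the authors.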

The 761 designed helical repeat proteins (DHRs) that passed all of the filters used to check for stable helices cover a much larger range of these parameters than do the repeat families found in natural proteins. Of these, 83 designs, none of which were related to natural proteins, were selected for experimental characterization, expressed in Escherichia coli, and purified. Of the 83, 72 were stably folded at 95°C, and 53 of those were monomeric. Crystal structures were solved for 15 of the designs and were found to match the design models over both the protein backbone and the hydrophobic core side chains. The designs have very different overall shapes: for example, linear and untwisted, linear and twisted, spiral, and a flat toroid. The authors conclude:

… The crystal structures illustrate both the wide range of twist and curvature sampled by our repeat protein generation process and the accuracy with which these can be designed. … The design models and sequences are very different from each other and from naturally occurring repeat proteins, without any significant sequence or structural homology to known proteins …
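The experimental counts reported above can be turned into rough success rates with a few lines of Python. This is simple arithmetic on the numbers given in this post, under our assumption that each stage is a subset of the 83 designs selected for testing; the names `STAGES` and `funnel_fractions` are ours.

```python
# Counts as reported in this post; each stage is assumed to be a subset
# of the 83 designs selected for experimental characterization.
DESIGNS_PASSING_FILTERS = 761
STAGES = [
    ("selected for experimental characterization", 83),
    ("stably folded at 95 °C", 72),
    ("monomeric", 53),
    ("crystal structure solved", 15),
]


def funnel_fractions(stages):
    """Return (label, count, fraction of the first stage) for each stage."""
    tested = stages[0][1]
    return [(label, count, count / tested) for label, count in stages]


for label, count, fraction in funnel_fractions(STAGES):
    print(f"{label:<45} {count:>3}  {fraction:.0%} of tested designs")

share_tested = STAGES[0][1] / DESIGNS_PASSING_FILTERS
print(f"share of filtered designs that were tested: {share_tested:.0%}")
```

On these assumptions, roughly 87% of the tested designs were stably folded, about 64% were also monomeric, and about 18% yielded crystal structures, out of the roughly 11% of filtered designs that were tested at all.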

The fully automatic design protocol that this research uses to design examples of one structural motif clearly demonstrates that known, natural proteins are “… only the tip of the iceberg of what is possible for polypeptide chains …”. It would seem that the skills and software for designing proteins that fold predictably have now reached a level of maturity at which a serious attempt to investigate designed proteins as a path to general capabilities in molecular engineering, and eventually to high-throughput atomically precise manufacturing, as proposed by Drexler in 1981, would not be unreasonable.
—James Lewis, PhD
