115:412/508 Proteins and Enzymes                                                                                                    spring 2002

Post-translational Modification - The Peptide Chain

Last time I talked about translational modification of proteins; today I will begin talking about post-translational modifications.  Translational and post-translational modification, chemical modification of proteins, and site-directed mutagenesis can at a deep level be grouped together: they are all ways of making a protein something other than what the wild-type gene specified, with its code for 20 'natural' amino acids.  There are two deepest ways of dividing them.  Is the modification translational - during normal protein synthesis by ribosomes - or post-translational, after synthesis of the protein chain?  Is the modification natural, something we observe in nature and try to determine the significance of, or is it artificial, something we do to the protein to find out more about how the natural protein works, or to improve its working, its stability or efficiency?  The post-translational modifications can be further subdivided into modifications of the side chains and modifications of the peptide chain, which include controlled proteolysis - for instance, formation of insulin from proinsulin, activation of proteases - but also modifications of the amino and carboxyl termini, rearrangements of the protein sequence, and even inversion of an amino acid, L to D.

Let me make a table:

              Translational              Post-translational

Natural       selenocysteine             Side-chain modifications, e.g. phosphorylation,
              mutations                  prenylation, acylation, O- and N-glycosidation,
                                         nucleosidylation, disulfide formation, methylation,
                                         hypusine, vitamin K-dependent carboxylation,
                                         hydroxylation, cross-links, etc.
                                         Main chain modification: proteolytic cleavage,
                                         rearrangement, inversion, N- and C-terminal
                                         modification

Artificial    site-directed mutation     Side-chain modifications: acylation, alkylation,
              insertion of unnatural     ring substitution, disulfide reduction &
              amino acids from tRNA      oxidation
              analog amino acids         Main chain modification: removal of amino
                                         acids & insertion of others.

We have so far covered the translational changes, natural and artificial, the chemical modification of side chains, and a few of the natural modifications of side chains.  Today I will talk about modifications involving the main peptide chain of proteins, which result in proteins with a sequence different from that coded for by the gene.  The most remarkable of these, the excision and splicing of internal protein sequences called inteins, was largely discovered at New England BioLabs and was turned by them into a product - I'm giving you their flyer on the subject.  Next lecture I will talk about natural post-translational modifications; as with chemical modifications, I can't talk about all of them, but I'll give you copies of the tables of contents of three volumes of Methods in Enzymology, as at least a list of presently known modifications.  I will talk about prenylation and fatty acylation, which are hot topics now, and I hope about disulfide bond interchange, which is of basic importance.

The natural changes can be further classified in at least two ways: whether they are reversible or irreversible, and whether they are for control purposes or to change the properties of the protein - these last categories aren't fully separate.  Reversible modifications such as phosphorylation are generally for control purposes, to activate or deactivate an enzyme, but so are some irreversible processes, notably activation of proteases by cleavage of a peptide bond.  Some modifications such as glycosidation, prenylation and fatty acylation seem to direct the protein to a particular location inside or outside the cell, as into membranes.  In many cases, such as the transfer of an aminobutyl moiety specifically to the initiation factor eIF-5A and its hydroxylation to create a hypusine residue, which Dr. K.Y. Chen works on, we don't know what the function of the modification is, but it must be important, as disrupting it is lethal.

I begin with modifications of the N- and C-termini of proteins.  Polypeptide chains are synthesized beginning with formylmethionine, but the formyl group always, and indeed the methionine usually, is removed in the mature protein.  Quite often the mature protein has the amino terminus acetylated, usually if the amino acid is alanine or serine, sometimes glycine, methionine or aspartic acid; this is a pain if you are trying to determine the amino-terminal sequence by Edman degradation.  Amino-terminal glutamine easily cyclizes to form pyroglutamic acid, in which the α-amino group replaces the separate nitrogen in amide linkage to the γ-carboxyl.  Sometimes this is an artifact of work-up of the protein, but usually it is real.  Another modification of the amino terminus, as of lysine, histidine and arginine side chains, is methylation.  In certain bacterial proteins, such as histidine decarboxylase of Lactobacillus and Clostridium perfringens, the amino-terminal serine or threonine is deaminated to generate a pyruvoyl or α-ketobutyryl terminus; this is used as an analog of pyridoxal 5'-phosphate in amino acid transamination and decarboxylation reactions.  The enzyme is synthesized as an inactive precursor protein with a Ser-Ser sequence; the carboxyl of the first serine shifts to the β-hydroxyl of the second and then is eliminated, taking the oxygen with it and leaving an α-aminoacrylate residue; the double bond presumably shifts to the nitrogen, yielding an imino acid, and the =NH is easily replaced by oxygen from water, yielding the final keto acid. [1]   The sequence before the split remains in the mature complex as a separate subunit (the complex is actually a pentamer with two each of these small and large subunits).  The α-amino group also reacts with sugars, as in glucuronylglycine in some fungal enzymes and non-enzymic glycation when blood sugar is high in diabetes.

There are a number of cases where the C-terminal carboxyl group of the polypeptide is converted to an amide.  There aren't many other modifications of the C-terminus, though there is a removal of several amino acids associated with farnesylation or geranylgeranylation of a cysteine side chain three residues in from the C-terminus, which I'll talk about in con­nection with that reaction.

There are also cases where an additional amino acid, most often arginine, is put on to a synthesized protein, usually at the amino terminus but in one case at the C-terminus (tyrosine added to tubulin), usually from the charged tRNA, but without ribosomes or genetic code; the result is a protein which has a seemingly normal sequence but includes an amino acid not found in the gene!  This can also be done artificially.  A protein can be cleaved at a specific position by a specific protease, a single amino acid at the new C-terminus removed by a carboxypeptidase, and the protein incubated in an organic solvent with a protease stable under that condition and a high concentration of another amino acid.  Under these conditions the proteolytic cleavage runs in reverse, because water is at a low concentration rather than 55 M, and the added amino acid is inserted and peptide bonds reformed.

Mainly I want to talk about things that happen to the peptide bond.  Of course the sim­plest case is removal of the original N-terminal methionine; the next simplest case is proces­sing of signal peptides.  This is a major topic which a whole lecture could be spent on, and you can find it in general biochemistry textbooks.  The polypeptide as synthesized includes an amino-terminal sequence rich in hydrophobic amino acids, which may bind to the signal recog­nition parti­cle and be delivered to the membranes of the endo­plasmic reticulum, where gly­cosylation takes place and the signal peptide is cleaved off.  Or this pre­protein goes to and into the cell membrane of the bacterial cell, or the outer membrane of the mitochondrion or chloroplast.  In these cases the signal peptide is cleaved off in two stages, one on going through the outer membrane to the intermem­brane space, one on going through the inner membrane; only when both signal sequences are removed does it fold to its final structure.  Or, as with cytochrome c1, the protein may go first to the matrix inside the inner membrane, then back to the intermembrane space.  There is a specific signal peptidase, or several, which recognizes a specific sequence of the signal peptide to cleave it; by now computers are programmed to recognize this in a gene sequence.  Proof of a signal peptide is obtained by comparing the amino acid sequence of the mature protein with that deduced from the gene; the signal sequence is the difference.
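The "signal sequence is the difference" argument can be written out as a small sketch.  This is only an illustration, assuming the mature protein differs from the gene-derived precursor by one contiguous amino-terminal segment and nothing else; the function name and the two sequences are made up for the example and are not a real protein.

```python
def signal_peptide(precursor: str, mature: str) -> str:
    """Return the N-terminal stretch of `precursor` that is missing from `mature`.

    Assumption: the only processing is removal of one contiguous
    N-terminal segment, so `mature` must be the C-terminal part of `precursor`.
    """
    if not precursor.endswith(mature):
        raise ValueError("mature sequence is not the C-terminal part of the precursor")
    return precursor[: len(precursor) - len(mature)]


# Hypothetical sequences, for illustration only.
precursor = "MKLLVLALLAFASAQDTPEKWSNR"   # as deduced from the gene
mature = "QDTPEKWSNR"                     # as sequenced from the mature protein

leader = signal_peptide(precursor, mature)
print(leader)
```

If the mature protein has also been trimmed at its C-terminus or cut internally, the check fails and raises an error, which is itself a hint that more than signal-peptide removal has happened.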

Another even vaster topic is the cleavage of a precursor form, called a proprotein, to yield the final active form.  Even 20 years ago there was a very large book, Proteases and Biological Control, on this subject.  The classic cases are the activation of chymotrypsinogen to chymotrypsin and the formation of peptide hormones such as insulin.  Chymotrypsin is synthesized in the pancreas as the inactive zymogen chymotrypsinogen and secreted into the intestine in this form.  The critical cleavage, by trypsin, is of the arg15-ile16 peptide bond, freeing the α-amino group of ile16 to interact with the side chain carboxyl of asp194; this allows small conformational changes which make the active site active in ways I shall discuss later.  This form is called π-chymotrypsin; the amino-terminal peptide is still attached to the rest of the protein by a disulfide bond of cys1.  A further proteolytic cleavage removes the dipeptide ser14-arg15, yielding δ-chymotrypsin, and others remove the dipeptide thr147-asn148, yielding the final product α-chymotrypsin, the form usually studied.

Insulin is synthesized as a preproprotein 105 amino acids long.  A 24-a.a. signal peptide is removed as it goes into the endoplasmic reticulum, yielding proinsulin.  This folds and forms disulfide bonds between cysteines 7 and 67 and between 19 and 80.  A trypsin-like protease cleaves this at arg31 and arg60.  The sequence 32-60 goes away, and arg31 is trimmed off by a carboxypeptidase to yield mature insulin, with the A and B chains held together by the disulfide bonds.  It will not fold up naturally on reduction of the disulfides, because folding information in the 32-60 sequence is no longer present.  Peptide hormones and other bioactive peptides are generally formed similarly, by being cut out of an initially synthesized large protein; for instance, the precursor preproopiomelanocortin can yield, on different cleavages, corticotropin, β- and γ-lipotropin, α-, β- and γ-MSH, enkephalin and endorphin.
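The bookkeeping of the insulin processing steps can be checked with a short sketch.  It uses only the residue numbers given above (105-residue preproprotein, 24-residue signal peptide, cuts at arg31 and arg60 in proinsulin numbering); the sequence itself is a placeholder of marker letters, not the real insulin sequence, and the function name is made up for illustration.

```python
SIGNAL_LEN = 24  # length of the signal peptide, from the text


def process_preproinsulin(prepro: str):
    """Apply the processing steps described in the text to a preproinsulin string.

    Residue numbers are 1-based positions in proinsulin, i.e. counted
    after the signal peptide has been removed.
    """
    pro = prepro[SIGNAL_LEN:]             # proinsulin

    def segment(first: int, last: int) -> str:
        return pro[first - 1:last]        # 1-based, inclusive

    b_chain = segment(1, 30)              # arg31 has been trimmed off
    connecting = segment(32, 60)          # the piece that "goes away"
    a_chain = segment(61, len(pro))
    return pro, b_chain, connecting, a_chain


# Placeholder sequence: letters mark regions only (s = signal, B = B chain,
# R = the two arginines at 31 and 60, c = connecting piece, A = A chain).
prepro = "s" * 24 + "B" * 30 + "R" + "c" * 28 + "R" + "A" * 21

pro, b_chain, connecting, a_chain = process_preproinsulin(prepro)
print(len(prepro), len(pro), len(b_chain), len(connecting), len(a_chain))
```

With these numbers the B chain comes out at 30 residues and the A chain at 21, and the disulfides at 7-67 and 19-80 each link one cysteine in the B chain to one in the A chain.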

This is just cleavage of peptide bonds; could new peptide bonds be formed?  Indeed they can; some cases are summarized by Cooper and Stevens [2] .  In sever­al cases, notably the RecA protein from Mycobacter­ium tuberculosis [3] and the catalytic subunit Tfp1p of the vacuolar H+ ATPase of Saccharomyces cerevisiæ (yeast) [4] , the protein is the same size as in other organ­isms, but the open reading frame in the gene is larger, and the homology is at the beginning and the end of the sequence.  This could be due to RNA editing, but the only RNA observed is the size expected from the open reading frame.  In the yeast case it was shown that translation results in one copy each of the 69 kDa Tfp1p and of a 50 kDa spacer protein.  Frame-shifting deletions in the spa­cer resulted in a trun­cated protein, the NH2 terminal part of the mature protein + part of the spacer, while deletion of an entire codon still yielded the mature protein.  Sequencing the ap­propriate peptide of the mature protein yielded a sequence running across the splice point, proving that there really is a peptide bond formed.  In both cases splicing substi­tutes a cyste­ine from the amino terminus of the C-terminal domain for a cyste­ine of the spacer.

The well-known lectin (carbohydrate-binding protein) concanavalin A does even more [5] .  It is synthesized as a 261-a.a. precursor which goes into the endo­plasmic reticulum.  Here it is clipped at residues 119 and 130; the small peptide goes away, but the two-chain form is properly folded and binds carbohydrates.  In the folded form the amino terminus apparently is very near residue 252; it displaces the amino group of residue 253, with formation of a new peptide bond.  Subsequently the peptide 131-134 is trimmed off.  Thus the mature protein has the original residue 135 as amino terminus and the original residue 119 as C-terminus.  Concanavalin A made in E. coli with a bacterial signal peptide attached goes to the periplasmic space and does not rearrange, but a related protein which stays in the cytoplasm does rearrange in this way.  All the changes occur at the carboxyl side of an asparagine; the residues following are not specified in the Cooper and Stevens paper.  It is now known that the residue on the C-termi­nal side of the cleavage is always a cysteine, serine or threonine.
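The residue arithmetic of this rearrangement is easy to lose track of, so here is a sketch that works only with residue numbers.  The join order (residues 135-252 followed by 1-119) is my reading of the description above, and the numbering is assumed to be that of the 261-residue precursor; no real sequence is used.

```python
PRECURSOR_LEN = 261  # precursor length, from the text


def residues(first: int, last: int) -> list[int]:
    """Residue numbers first..last inclusive, in precursor numbering."""
    return list(range(first, last + 1))


# Pieces lost during processing.
removed_linker = residues(120, 134)          # 120-130 clipped out, 131-134 trimmed later
removed_tail = residues(253, PRECURSOR_LEN)  # displaced when the new peptide bond forms

# Mature chain: the old C-terminal half now comes first, and the original
# residue 1 is joined to the carboxyl of residue 252.
mature_order = residues(135, 252) + residues(1, 119)

print(len(mature_order), mature_order[0], mature_order[-1])
```

The mature chain comes out at 237 residues, the length usually quoted for concanavalin A, and together with the 15 + 9 residues removed it accounts for all 261 residues of the precursor.  The protein is in effect a circular permutation of its own precursor.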

The most studied case is the DNA polymerases from the thermophilic arch­æa Thermococcus littoralis and Pyrococcus sp. GB-D, which Francine Perler at New England BioLabs was trying to clone.  Homology of the gene to other DNA polymerases showed in Thermococcus two intervening sequences [6] (these are now termed inteins [7] , by anal­ogy with introns; the pieces spliced together are called exteins), and expression of the gene in E. coli gave a mess.  Removal of one of the inteins resulted in the expected 90 kDa polymerase and a 45 kDa protein representing the other intein.  This has sequence similarity to 'homing' endo­nucleases which are coded by introns and install their DNA sequence into allelic genes that lack the introns, and in some cases endo­nuclease activ­ity has been shown; it seems to be involved in moving the intein DNA sequence around in the genome of the org­an­ism.

Perler's group studied [8] the Pyrococcus intein by installing it in a chimeric gene for maltose-binding protein + paramyosin ΔSal; proteins containing the maltose-binding domain can be purified by adsorption on amylose.  Letting E. coli synthesize protein overnight at 12° to 20° made lots of a three-part 132 kDa chimeric protein, MIP, as well as an apparently 180 kDa form they called MIP*, and products of cleavage at either the N- or the C-terminus of the intein (M + IP, MI + P).  Presumably MIP can accumulate because the temperature is so far below the growth temperature of the thermophile from which the intein came.  Warming up the precursor MIP to 37° resulted in splicing, yielding mainly the spliced maltose-binding protein-paramyosin (MP) + the intein (I), along with some IP.  Splicing is fastest at pH 6, slower at higher pH.  The slow-migrating form, MIP*, is a branched intermediate of the splicing reaction; it has two amino-terminal sequences, those of the maltose-binding protein and the intein.  During a slow splicing reaction the amount of it increases, then decreases.  At high pH (10) it goes back to the linear form MIP.

Splicing of MIP made in E. coli and purified by chromatography on amylose indicates that it doesn't require any other enzyme, and only the intein sequence, followed by serine, cyste­ine or in one case threonine, is necessary to get splicing.  Presumably the intein folds to bring the splice points in close proximity, and presumably the extein structures have to allow this folding.  The intein sequence begins with a cys or ser and ends, in the seven inteins known, with three hydro­phobic amino acids, then his-asn.  (Some other 'putative' inteins have been found by sequence homology which however have a glycine, or in one case a phenyl­alanine, instead of the histidine.  It is not known whether these putative inteins actually splice.)  The released intein has been shown [9] by mass spectrometry to be released with a C-terminal succinimide formed from the asparagine.  The branched intermediate is considered to involve an ester linkage to the OH or SH of the N-terminal ser or cys of the C-terminal extein, paramyosin (P) in this example.  This paper found that incubating guanidine-denatured branched intermediate at pH 9 yielded M + IP, indicating that the more alkali-labile ester bond was from the amino-terminal extein to the C-terminal extein, rather than from the intein.

The now fairly accepted mechanism for splicing [10], [11] is shown in Fig. 2 of the handout.  It involves (i) nucleophilic attack of the intein ser or cys on the C-terminal carboxyl of the amino-terminal extein, an N-S or N-O shift such as is also seen in formation of the α-keto group in histidine decarboxylase, and as observed with ordinary peptides under acid conditions; (ii) transesterification to the downstream ser or cys, forming the branched intermediate; (iii) attack of the side-chain NH2 of the C-terminal asparagine of the intein on its own main-chain carboxyl, yielding the succinimide C-terminus of the intein and releasing the intein; (iv) O (or S) to N shift of the carboxyl back to the amino group of the downstream extein.  This O-N shift must be rapid, since the MP product is alkali-stable.  The histidine next to the asparagine seems to be involved in step iii but not i or ii, since when it is replaced by other amino acids the branched intermediate is formed but splicing doesn't occur.  The mechanism has also been investigated for the VMA intein of the vacuolar ATPase of Saccharomyces cerevisiæ [12], Sce VMA intein for short.
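The sequence requirements and the net outcome of splicing can be put into a small sketch.  It encodes only the junction rules stated in these notes (intein begins with cys or ser and ends with his-asn; the downstream extein begins with ser, cys or thr) and returns the two end products; it does not model the ester intermediates or the succinimide.  The function names and the one-letter sequences are invented for the example.

```python
def check_splice_junctions(intein: str, c_extein: str) -> list[str]:
    """List which of the junction rules from the notes a construct violates."""
    problems = []
    if intein[:1] not in ("C", "S"):
        problems.append("intein must begin with Cys or Ser")
    if not intein.endswith("HN"):
        problems.append("intein must end with His-Asn")
    if c_extein[:1] not in ("S", "C", "T"):
        problems.append("C-terminal extein must begin with Ser, Cys or Thr")
    return problems


def splice(n_extein: str, intein: str, c_extein: str) -> tuple[str, str]:
    """Return (spliced exteins, released intein) for a linear precursor."""
    problems = check_splice_junctions(intein, c_extein)
    if problems:
        raise ValueError("; ".join(problems))
    # Net result of steps i-iv: the exteins are joined by a normal
    # peptide bond and the intein is released as a separate chain.
    return n_extein + c_extein, intein


# Hypothetical sequences (one-letter code), for illustration only.
spliced, released = splice("MKIEEG", "SILPEEWVVHN", "SGKTAD")
print(spliced, released)
```

Replacing the penultimate histidine in the example makes `check_splice_junctions` report a violation, which loosely parallels the mutants that stop at the branched intermediate; the putative inteins with glycine or phenylalanine at that position would also be flagged, so the rule is deliberately the strict one.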

Since Xu and Perler are at a company, they have developed a product, a protein purification system similar to those with a cleavable glutathione-S-transferase or maltose-binding protein but requiring no proteolytic enzyme for cleavage; it uses the intein for cleavage, but does not splice the C-terminal sequence back on.  Your protein is cloned into a vector with its C-terminus next to the amino-terminal cysteine of the Sce VMA intein, with a chitin-binding domain beyond the intein.  The fusion protein is produced in E. coli and adsorbed onto a chitin column.  The column is then incubated overnight at 4° with dithiothreitol or β-mercaptoethanol, which plays the role of Cys455, the N-terminal residue of the second extein - it replaces the intein cysteine by transesterification.  Your protein is thus released, with dithiothreitol in thioester linkage to the C-terminus.  You could easily cleave this off with hydroxylamine, or replace it with [14C]-cysteine, which would transesterify and then undergo S-N shift to form a peptide bond, thus stably radiolabeling the protein.

Some related reactions are mentioned by Shao and Kent.  The 'sonic hedge­hog' pre­cursor protein - a name resulting from the imagination of Drosophila geneticists - which is important in patterning of embryonic structures, self-cleaves into two proteins at a cysteine residue.  In this case the subsequent transesterifica­tion is not to another cysteine, but to the hydroxyl of a cholesterol, making one protein more hydrophobic [13] .  So, even developmental molecular biologists need to know protein chemistry.

It remains to crystallize the splicing precursor protein to locate the various amino acids in space and determine whether only neighboring amino acids are needed to catalyze the reac­tions, or others more distant in space also act.  If there are, they can only be in the intein, since that is all that is needed; and the frequent appearance of self-cleavage reactions leaving an amino terminal serine, threonine or cysteine suggests that at least the N-O or N-S shift requires no unusual structure.

An even newer modification of an amino acid in a protein is inversion of an L-amino acid to a D-amino acid [14], [15].  The funnel-web spider Agelenopsis aperta produces a venom containing peptide toxins which paralyze its prey by blocking voltage-sensitive Ca++ channels.  The toxins are synthesized as larger precursors which also contain signal sequences for extracellular transport and acidic sequences that are cleaved off to produce the mature toxins.  Two of these toxins, IVB and IVC, have identical sequences 48 a.a. long and the same disulfide bonds, but are separable on HPLC.  IVB is considerably more toxic.  Protease cleavage at Glu42 or CNBr cleavage at Met43 yielded C-terminal hexa- or pentapeptides from the two toxins which were separable by HPLC, while the peptides from the rest of the toxin were indistinguishable.  Peptides with the C-terminal sequence, Gly-Leu-Ser-Phe-Ala, were synthesized with either L- or D-ser at position 46 (probably they tried other D-amino acids too, but the structure paper wasn't published yet) and coeluted with the peptides from the toxins, the L-ser peptide with that from IVC, the D-ser peptide with that from the more active IVB.  They then synthesized the complete toxins with L- or D-ser at position 46, and showed that what folded correctly was identical to the natural toxin.
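The fragment sizes in this experiment follow from simple position counting, sketched here.  The only assumptions are the usual ones that both cleavages occur on the carboxyl side of the named residue (Glu42 for the protease, Met43 for CNBr) and that the 48-residue numbering above applies; the names in the code are just for illustration.

```python
TOXIN_LEN = 48                                          # toxin length, from the text
C_TERMINAL_PEPTIDE = ["Gly", "Leu", "Ser", "Phe", "Ala"]  # sequence given in the text


def c_terminal_fragment(cleave_after: int) -> list[int]:
    """Residue numbers of the C-terminal fragment left by a cut after `cleave_after`."""
    return list(range(cleave_after + 1, TOXIN_LEN + 1))


hexapeptide = c_terminal_fragment(42)    # protease cut after Glu42
pentapeptide = c_terminal_fragment(43)   # CNBr cut after Met43

# Assign residue numbers to the pentapeptide sequence.
numbered = dict(zip(pentapeptide, C_TERMINAL_PEPTIDE))
print(len(hexapeptide), len(pentapeptide), numbered)
```

The pentapeptide covers residues 44-48, which places the serine at position 46 as stated, and the hexapeptide is the same piece with Met43 still attached.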

Crude venom converted IVC to IVB; this was better demonstrated when the venom was fractionated by gel filtration, separating the 30 kDa isomerase from an 86 kDa metalloprotease which otherwise degraded IVC.  This points up one reason for such inversion: D-amino acid peptide bonds are not cleaved by the proteases that cleave natural L-amino acid peptide bonds, which the insect prey might use to detoxify IVC.  But more importantly, the inversion allows the toxin to be a better fit to the Ca++ channels it blocks, making it a more potent toxin; the repertory of protein conformation has been added to.  I could give you more examples of similar but much smaller peptides with D-amino acids, but they are in the Kreil paper I am giving you.

In connection with what I just said about D-amino acid peptide bonds being stable to proteases, I mention that pharmaceutical companies are much interested in biologically active peptides with either D-amino acids or methylated nitrogens in the peptide bond, since these will be resistant to proteolytic cleavage; the pill can then be taken by mouth, and the peptide won't be chewed up in the stomach.


References on polypeptide splicing and amino acid inversion 



[1] Recsei, P.A., Huynh, Q.K., and Snell, E.E., Proc. Nat. Acad. Sci. 80:973-977 (1983)

[2] Cooper, A.A., and Stevens, T.H., BioEssays 15:667-674 (1993)  Review.

[3] Davis, E.O., Jenner, P.J., Brooks, P.C., Colston, M.J., and Sedgwick, S.G., Cell 71:201-210 (1992)  Splicing of M. tuberculosis RecA protein.

[4] Cooper, A.A., Chen, Y., Lindorfer, M.A., and Stevens, T.H., EMBO J. 12:2575-2583 (1993)  Splicing of S. cerevisiæ TFP1.

[5] Bowles, D.J., and Pappin, D.J., Trends Biochem. Sci. 13:60-64 (1988) Traffic and assembly of concanavalin A.

[6] Perler, F.B., Comb, D.G., Jack, W.E., Moran, L.S., Qiang, B., Kucera, R.B., Benner, J., Slatko, B.E., Nwankwo, D.O., Hempstead, S.K., Carlow, C.K.S., and Jannasch, H., Proc. Nat. Acad. Sci. USA 89:5577-5581 (1992)  Intervening sequences in an Archæa DNA polymerase gene.

[7] Perler, F.B., Davis, E.O., Dean, G.E., Gimble, F.S., Jack, W.E., Neff, N., Noren, C.J., Thorner, J., and Belfort, M., Nucleic Acids Res. 22:1125-7 (1994)  Nomenclature of protein splicing.

[8] Xu, Ming-Qun, Southworth, M.W., Mersha, F.B., Hornstra, L.J., and Perler, F.B., Cell 75:1371-1377 (1993)  In vitro splicing of purified precursor and identification of branched intermediate.

[9] Xu, Ming-Qun, Comb, D.G., Paulus, H., Noren, C.J., Shao, Y., and Perler, F.B., EMBO J. 13:5517-5522 (1994)  Analysis of the branched intermediate and its resolution by succinimide formation.

[10] Xu, Ming-Qun, and Perler, F.B., EMBO J. 15:5146-5153 (1996)  The mechanism of protein splicing, as explicated by mutation of critical residues at the splice sites.

[11] Shao, Y., and Kent, S.B.H., Chemistry & Biology 4:187-194 (1997)  Review.

[12] Chong, S., Shao, Y., Paulus, H., Benner, J., Perler, F.B., and Xu, M.-Q., J. Biol. Chem. 271:22159-22168 (1996)  Similar study of splicing at the VMA intein of the vacuolar ATPase of Saccharomyces cerevisiæ; development of a mesophilic in vitro splicing system and protein cleavage reaction.

[13] Porter, J.A., Young, K.E. & Beachy, P.A., Science 274:255-9 (1996).

[14] Kreil, G., Science  266:996-7 (1994) Commentary on the next paper.

[15] Heck, S.D., Siok, C.J., Krapcho, K.J., Kelbaugh, P.R., Thadeio, P.F., Welch, M.J., Williams, R.D., Ganong, A.H., Kelly, M.E., Lanzetti, A.J., Gray, W.R., Phillips, D., Parks, T.N., Jackson, H., Ahlijanian, M.K., Saccomano, N.A., and Volkmann, R.A., Science 266:1065-1068 (1994)