Edward Jenner Institute for Vaccine Research, Compton, Berkshire, United Kingdom
| Abstract |
|---|
| Introduction |
|---|
The largest and most systematic screen for determining the sequence specificity of TAP was performed by Uebel et al. (14). A peptide-binding motif for TAP has been defined by van Endert et al. (15). Hydrophobic aromatic amino acids are preferred at the C terminus, position 3 (p3), and position 7 (p7). Hydrophobic or positively charged residues are preferred at position 2 (p2), whereas aromatic or acidic residues are preferred at position 1 (p1). Proline at positions p1 and p2 has a strong negative effect on TAP binding. TAP preferences were confirmed by combinatorial peptide libraries (16), artificial neural network studies (17), and matrix-based scoring functions (18, 19). The results of such studies are consistent with TAP contributing significantly to epitope selection.
Recently, we proposed a method for predicting the binding affinity of peptide-protein interactions, which we called the additive method (20). The additive method assumes that each amino acid makes an additive and constant contribution to the biological activity regardless of amino acid variation in the rest of the peptide, with possible interactions between amino acids being accounted for by cross-terms. The method was applied initially to predict the affinity of peptides binding to the HLA-A*0201 molecule (20) and was then extended to 10 other human MHC class I (21, 22), three murine class I (our unpublished observations), three human class II (23), and six murine class II molecules (our unpublished observations). Most of these predictive models can be accessed via the MHCPred server (URL: http://jenner.ac.uk/MHCPred) (24). The additive method is universal, being equally applicable to any peptide-protein interaction. We have subsequently used this method in the cyclical optimization of high affinity peptides binding to HLA-A*0201, generating superbinders and anchorless epitopes (25). In the present study we used the additive method to develop a TAP binding prediction model, analyzing and extending the TAP binding motif. Importantly, we also evaluated how well this model acts as a preselection step in predicting MHC binding peptides. To distinguish between fully and partially TAP-dependent alleles, two datasets were examined. Peptides binding to HLA-A*0201 were selected as representatives of HLA alleles exhibiting partial TAP dependence, and peptides binding to HLA-A*0301 represented fully TAP-dependent HLA alleles.
| Materials and Methods |
|---|
A set of 163 polyalanine nonameric peptides was used as a training set. Originally, using the peptide AAASAAAAY as the parent peptide, a set was prepared that included all natural amino acids except cysteine substituted at each position. This set was used to develop an artificial neural network-based prediction model (17). Because the binding affinities for the peptides were presented in the form of a histogram as IC50 values relative to the reference peptide RRYNASTEL, we first extracted the relative values from the graph and then calculated raw, nonnormalized IC50 values, which we present as -logIC50 values (pIC50). This set was used to develop an additive model for TAP binding.
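The conversion from relative to raw affinities can be sketched as follows. This is a minimal illustration rather than the procedure actually used: it assumes that a relative IC50 is the ratio IC50(test)/IC50(reference) and that IC50 values are in molar units. The function names and the reference IC50 value are hypothetical placeholders, not values taken from the study.

```python
import math


def raw_ic50(relative_ic50: float, reference_ic50: float) -> float:
    """Recover a non-normalized IC50 from a value reported relative to a reference peptide."""
    if relative_ic50 <= 0 or reference_ic50 <= 0:
        raise ValueError("IC50 values must be positive")
    return relative_ic50 * reference_ic50


def pic50(ic50_molar: float) -> float:
    """pIC50 is the negative decimal logarithm of the IC50 (assumed to be in mol/L)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)


# Hypothetical reference IC50 (mol/L), for illustration only.
REFERENCE_IC50_M = 1e-6

# A peptide read off the histogram at 10x the reference IC50.
example_pic50 = pic50(raw_ic50(10.0, REFERENCE_IC50_M))
```

On this scale, a larger pIC50 corresponds to a lower IC50 and hence to stronger binding.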
Two sets of nonameric peptides were used to test the predictive ability of the additive model. Set A consisted of 47 analogues of the peptide ALAKAAAAV (data from Fig. 1 of Ref. 17). Originally, affinities were presented as IC50 values relative to the parent peptide ALAKAAAAV. Set B included 38 nonamers (data from Tables II and IV of Ref. 17), with affinities presented as IC50 values relative to the reference peptide RRYNASTEL.
Additive method
The additive method (20) was applied to the training set of 163 polyalanine nonameric peptides. A matrix of 172 columns and 163 rows was generated. The columns corresponded to 171 independent variables (nine positions × 19 amino acids) and one dependent variable (pIC50). The number of rows equaled the number of peptides in the training set. Each peptide was represented by a binary string of 171 ones and zeros. A term is equal to 1 when a particular amino acid is present at a particular position and is equal to 0 when it is absent. The matrix was solved by partial least squares (PLS) linear regression, as implemented in SYBYL 6.7 (29). PLS is a so-called projection method (30) that can handle matrices with more variables than observations and with noisy and highly collinear data. In this situation, conventional statistical methods, such as multiple regression, produce over-fitted models, i.e., models that fit the training data well but are unreliable in prediction. PLS forms new variables, named principal components, as linear combinations of the initial variables and then uses them as predictors of the dependent variable, TAP binding affinity in our case. The additive model was assessed using the squared correlation coefficient r², and its predictive power was validated with the two external test sets and presented as rpred, the correlation coefficient between the predicted and experimentally derived relative IC50 values.
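The binary encoding and the additive scoring it feeds can be sketched as below. This is an illustrative sketch only: the PLS fit itself is not shown, cross-terms are omitted, and the ordering of amino acids within each position, the function names, and the toy coefficients are assumptions made for the example.

```python
# The 19 amino acids used in the training set: the 20 natural ones minus cysteine.
AMINO_ACIDS = "ADEFGHIKLMNPQRSTVWY"
PEPTIDE_LENGTH = 9


def encode_peptide(peptide: str) -> list[int]:
    """Return the 171-element indicator vector (9 positions x 19 amino acids)."""
    if len(peptide) != PEPTIDE_LENGTH:
        raise ValueError("expected a nonamer")
    vector = [0] * (PEPTIDE_LENGTH * len(AMINO_ACIDS))
    for position, residue in enumerate(peptide):
        index = AMINO_ACIDS.find(residue)
        if index < 0:
            raise ValueError(f"unsupported residue {residue!r}")
        # Exactly one term per position is set to 1.
        vector[position * len(AMINO_ACIDS) + index] = 1
    return vector


def additive_score(peptide: str, coefficients: list[float], constant: float = 0.0) -> float:
    """Sum the position-specific contribution of each residue (no cross-terms)."""
    encoded = encode_peptide(peptide)
    return constant + sum(c * x for c, x in zip(coefficients, encoded))


# Toy coefficients (hypothetical): only Phe at position 9 contributes.
toy_coefficients = [0.0] * (PEPTIDE_LENGTH * len(AMINO_ACIDS))
toy_coefficients[8 * len(AMINO_ACIDS) + AMINO_ACIDS.index("F")] = 1.0

score_phe9 = additive_score("AAASAAAAF", toy_coefficients)
score_tyr9 = additive_score("AAASAAAAY", toy_coefficients)
```

In a real model the coefficient vector and the constant would come from the regression, and each row of the training matrix would be one such encoded peptide followed by its pIC50.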
| Results |
|---|
The TAP additive model is shown in Table I. Two principal components explain 99.9% of the variance in the set. Positive and negative coefficients indicate amino acids that make positive and negative contributions to TAP binding affinity, respectively. The most positive coefficient belongs to Phe at p9, followed by Phe, Tyr, and Trp at p3. The most negative value corresponds to Ser at p9, followed by Pro at p2, and Asp and Gly at p9.
External validation
Two sets were used to test the predictive ability of the additive model. Both were taken from Ref. 17, but because the reference peptides used in the calculation of relative IC50 values were different, they are given as two separate test sets: A and B. Test set A included 47 peptides, and test set B comprised 38 peptides. The binding affinities were calculated by the additive model and are presented as the logarithm of the relative IC50 value (IC50(test)/IC50(reference)). The correlation between the predicted and the experimental log relative IC50 (rpred) was used to assess model predictability. The correlation graphs for both sets are presented in Fig. 1. Set A has an rpred of 0.717 (Fig. 1, top panel), and set B has an rpred of 0.832 (Fig. 1, bottom panel). The high predictive ability of the TAP additive model confirmed the applicability of the additive method for TAP binding affinity prediction.
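The validation statistic can be illustrated with a short sketch. It assumes that rpred is the ordinary Pearson correlation coefficient between predicted and experimental log relative IC50 values. The function names are hypothetical, and no data from the study are reproduced here.

```python
import math


def log_relative_ic50(ic50_test: float, ic50_reference: float) -> float:
    """Decimal logarithm of IC50(test) / IC50(reference)."""
    if ic50_test <= 0 or ic50_reference <= 0:
        raise ValueError("IC50 values must be positive")
    return math.log10(ic50_test / ic50_reference)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

For a test set, `pearson_r(predicted, experimental)` would be evaluated once over all peptides sharing the same reference peptide, which is why sets A and B are treated separately.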
Comparison with other quantitative matrices
Peters et al. (18) generated a consensus scoring matrix for TAP binding affinity prediction. The matrix elements represent log(IC50) values for TAP binding that add up to the log(IC50) value of the peptide. A low matrix entry at a given position corresponds to an amino acid well suited for TAP binding. Bhasin and Raghava (19) developed a quantitative matrix using a support vector machine-based method to model the TAP binding affinity of peptides. These two quantitative matrices were compared with the matrix generated by the additive method in the present study. The correlation coefficients are given in Table II. Good correlation was found between the Peters and additive matrices (r = -0.838). The negative sign of the correlation coefficient arises from how TAP affinity is expressed: in Peters' model, affinity is presented as log(IC50), whereas in the additive model it is given as -log(IC50). Correlation coefficients for positions 1, 2, 3, 7, and 9 were higher than 0.9. Moderate correlations were found between the matrix of Bhasin and Raghava and that of Peters and between Bhasin and Raghava's and the additive matrices (r = 0.683 and r = 0.594, respectively). Again, the highest correlation coefficients were found for positions 1, 2, 3, and 9. The predictive ability of the TAP quantitative matrices was similar for the anchor positions and differed only for the nonanchor positions.
TAP preselection of MHC binders
Many studies have suggested that the peptide selectivity exhibited by TAP binding may contribute to epitope selection (18, 31, 32, 33). To assess quantitatively the impact of this preselection for alleles that are strongly or weakly TAP dependent, two ROC analyses were performed. HLA-A*0201 was considered a representative of alleles that receive peptides via both TAP-dependent and TAP-independent mechanisms. This set included 317 A*0201 binders and 239 MHC nonbinders (Fig. 2A). For fully TAP-dependent alleles, HLA-A*0301 was selected. This set included 76 A*0301 binders and 237 MHC nonbinders (Fig. 2B). TAP and HLA (23) additive scoring functions were applied to predict binders and nonbinders. In both cases, HLA scoring functions gave better predictions than TAP scoring functions. However, TAP scores were better for fully TAP-dependent A*0301 than for partially TAP-dependent A*0201 (AROC = 0.874 vs AROC = 0.721). As is evident from the number of false and true negatives at different TAP cutoffs (Table III), a lower TAP cutoff (logIC50, <3.00) is recommended for A*0201 peptide preselection than for A*0301 (logIC50, <5.00). Increasing the TAP cutoff drastically increased the number of false negatives for A*0201, but did not affect the number of false negatives for A*0301. Thus, a TAP cutoff of 3.00 eliminated only 24 of the nonbinders (10%) for A*0201, whereas at a TAP cutoff of 5.00, 80 A*0301 nonbinders (33%) were eliminated. Unsurprisingly, TAP preselection was more efficient for fully TAP-dependent alleles than for partially TAP-dependent alleles.
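The two quantities discussed above, the area under the ROC curve and the effect of a TAP cutoff, can be sketched as follows. This is a minimal illustration, not the analysis used in the study. It assumes that a higher score means stronger predicted binding and that peptides scoring below the cutoff are eliminated. The function names and the example scores are hypothetical.

```python
def roc_auc(binder_scores: list[float], nonbinder_scores: list[float]) -> float:
    """Area under the ROC curve, computed as the fraction of (binder, nonbinder)
    pairs in which the binder scores higher; ties count as one half."""
    if not binder_scores or not nonbinder_scores:
        raise ValueError("both classes must be non-empty")
    wins = 0.0
    for b in binder_scores:
        for n in nonbinder_scores:
            if b > n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binder_scores) * len(nonbinder_scores))


def preselect(binder_scores: list[float], nonbinder_scores: list[float], cutoff: float) -> dict:
    """Count peptides eliminated by a TAP cutoff (score < cutoff).

    Eliminated binders are false negatives; eliminated nonbinders are true negatives.
    """
    return {
        "false_negatives": sum(1 for s in binder_scores if s < cutoff),
        "true_negatives": sum(1 for s in nonbinder_scores if s < cutoff),
    }


# Hypothetical TAP scores, for illustration only.
counts = preselect([2.5, 4.0, 6.0], [1.0, 2.0, 5.5], cutoff=3.0)
```

Scanning the cutoff over a range of values and tabulating both counts gives the kind of trade-off that guides the choice of an allele-specific threshold.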
| Discussion |
|---|
Because TAP transport precedes HLA binding, a conflict will only arise between positions that are deleterious for TAP binding but preferred for HLA binding, not between TAP-preferred and HLA deleterious positions. The absolute sum of contributions (Table I) indicates that positions 1, 2, 3, and 9 exhibit the greatest variation in amino acid preference. It is widely assumed that positions 2 and 9 are the primary anchors, and positions 1 and 3 are secondary anchors for MHC binding. Supermotifs for certain major MHC class I supertypes are shown in Table IV, together with those amino acids that are preferred for TAP at the same positions. Most HLA alleles prefer peptides with hydrophobic or aromatic amino acids at their C termini; only the A3 binding motif has positively charged amino acids (Arg or Lys) in this study. Phe, Tyr, and Trp are the preferred amino acids at the C terminus of TAP binding peptides, whereas Arg makes a small positive contribution, and Lys makes a negligible contribution (Table I). Ile, Leu, and Val exhibit moderate negative values (<0.4 log units). Ser, Asp, Gly, Asn, Thr, and Glu are all detrimental for TAP binding, and this provides a possible explanation of why few human class I MHC ligands have these amino acids at their C termini.
There is a great variety of preferred amino acids at anchor position 2 in HLA motifs (Table I). A2 and A3 supertypes prefer hydrophobic amino acids, A24 prefers aromatic, B7 prefers Pro, B27 prefers positively charged amino acids, and B44 prefers negatively charged ones. All these amino acids make positive contributions to TAP binding (Table I), except for Pro and Asp. At position 2, Pro is a preferred anchor for B7, whereas Asp is preferred for B44. These are the only points of conflict between TAP and HLA binding preferences. The deleterious effects of Pro and Asp suggest that ligands with Pro and Asp at position 2 are unlikely to be transported into the lumen of the ER via a TAP-dependent mechanism. Fortunately, these ligands often bear Phe and Tyr at their C termini, which are strongly preferred by TAP, indicating a potential compensating effect for Pro and Asp.
Position 1 is the next most sensitive position for TAP binding after position 9. It is thought to be a secondary anchor for MHC binding, and the side chain occupies pocket A (34). However, the TAP_19 model (AROC = 0.563) suggests that this position is not overly important for TAP transport. Additionally, the highest negatively contributing amino acids for TAP affinity, Glu and Asp, are common at position 1 in many HLA ligands (35).
Phe, Tyr, and Trp at position 3 have the highest positive contribution to TAP binding after Phe9, whereas Asp and Gly contribute negatively. The side chain at position 3 occupies pocket D in the MHC binding groove, and it is thought to be an important secondary anchor (36). A wide range of amino acids, including Asp and Gly, are available at this position in different MHC ligands, which point to the moderate importance of this position for TAP transport.
Weak amino acid contributions to TAP binding were seen at positions 4, 5, 6, 7, and 8 (Table I). Similar results have been reported by others (18, 31, 32). The primary interaction of the TCR is with residues 5-8 of a class I MHC binding nonapeptide (36). Thus, Ag recognition by a TCR occurs in the region of the peptide where TAP exerts minimal selection. Moreover, TAP transport is only one part of the complexity inherent in the emerging picture of class I presentation (6, 37).
MHC class I presentation has long been assumed to be, broadly speaking, a linear process beginning with the ubiquitin-mediated transport of proteins to the proteasome, a multimeric protease responsible for digestion of most cytosolic protein (38); this step is followed by binding to TAP. However, this simple picture is becoming more complicated by the day. Peptides are unfolded and cleaved by cytosolic proteases other than the proteasome. Tripeptidyl peptidase II is the only such proteolytic enzyme that has been properly characterized (39), but it may be one of many more. Peptides generated by the proteasome or tripeptidyl peptidase II may be degraded by cytosolic proteases such as leucine aminopeptidase (40) and thimet oligoendopeptidase (41). Peptides entering the ER pool are trimmed by ER-associated aminopeptidase (42) and other proteases, such as furin (43), leukocyte-derived arginine aminopeptidase-1 (44), or puromycin-resistant aminopeptidase (45), before binding to MHCs. The process can be decomposed into a set of peptide cleavage and peptide binding steps, each of which will require modeling using bioinformatic techniques, such as our additive method, which can predict the peptide specificity of each stage (where peptides are cleaved) and associated on- and off-rates (how rapidly peptide cleavage occurs). Because the TAP additive model considers the nine C-terminal positions, it is applicable to any peptide extended at the N terminus. N-terminal extensions do not affect the TAP binding affinity prediction, which is a particular advantage of quantitative methods. However, because of the inherent complexity of immunological presentation, one cannot account for its dynamic behavior solely using bioinformatic methods to predict individual steps. Rather, we will need to supplement them with well-understood mathematical models, similar to those used in metabolic control theory.
These methods, which can account for substrate fluxes within complex multicomponent metabolic pathways, allow ready incorporation of quantitative aspects of bioinformatic models and may help us better understand why particular peptide Ags become immunodominant (46). In particular, it is clear that peptide transport into the ER proceeds via both TAP-mediated and TAP-independent pathways. Because there are no clear sequence signals differentiating peptides transported via these several mechanisms, and the molecular components of TAP-independent pathways are not yet understood, it will require both experimentation to identify and characterize other transporters as well as in silico informatics techniques to produce robust methods able to reliably predict the complexities of Ag presentation.
Our current quantitative TAP model is our first attempt to address this wider challenge: the modeling of the complete class I presentation pathway. Overall, TAP selection is neither overly precise nor restrictive, allowing a wide variety of peptides to be transported into the ER. Using TAP preselection in binding affinity prediction methods could reduce the number of nonbinders by one-sixth to one-third.
| Acknowledgments |
|---|
| Footnotes |
|---|
1 Address correspondence and reprint requests to Dr. Darren R. Flower, Edward Jenner Institute for Vaccine Research, Compton, Berkshire, U.K. RG20 7NN. E-mail address: darren.flower{at}jenner.ac.uk
2 Abbreviations used in this paper: ER, endoplasmic reticulum; PLS, partial least squares; ROC, receiver operating characteristic.
Received for publication April 23, 2004. Accepted for publication May 30, 2004.
| References |
|---|
. 1995. MHC ligands and peptide motifs: first listing. Immunogenetics 41:178.