Abstract
The classical HLA-C and the nonclassical HLA-E and HLA-G molecules play important roles both in the innate and adaptive immune system. Starting already during embryogenesis and continuing throughout our lives, these three Ags exert major functions in immune tolerance, defense against infections, and anticancer immune responses. Despite these important roles, identification and characterization of the peptides presented by these molecules has lagged behind that for the more abundant HLA-A and HLA-B gene products. In this study, we elucidated the peptide specificities of these HLA molecules using a comprehensive analysis of naturally presented peptides. To that end, the 15 most frequently expressed HLA-C alleles as well as HLA-E*01:01 and HLA-G*01:01 were transfected into lymphoblastoid C1R cells expressing low endogenous HLA. Identification of naturally presented peptides was performed by immunoprecipitation of HLA and subsequent analysis of HLA-bound peptides by liquid chromatographic tandem mass spectrometry. Peptide motifs of HLA-C unveil anchors in position 2 or 3 with high variances between allotypes, and a less variable anchor at the C-terminal end. The previously reported small ligand repertoire of HLA-E was confirmed within our analysis, and we could show that HLA-G combines a large ligand repertoire with distinct features anchoring peptides at positions 3 and 9, supported by an auxiliary anchor in position 1 and preferred residues in positions 2 and 7. The wealth of HLA ligands resulted in prediction matrices for octa-, nona-, and decamers. Matrices were validated in terms of their binding prediction and compared with the latest NetMHC prediction algorithm NetMHCpan-3.0, which demonstrated their predictive power.
This article is featured in In This Issue, p.2609
Introduction
The MHC is a polygenic and polymorphic segment on human chromosome 6 that encodes histocompatibility Ags including the classical (or class Ia) and nonclassical (or class Ib) MHC molecules (in humans also called HLA). HLA-A, HLA-B, and HLA-C belong to the classical MHC molecules, which display a high degree of polymorphism. In contrast, HLA-E and HLA-G are considered nonclassical MHC molecules showing limited polymorphism. Similar to classical MHC molecules, HLA-E and HLA-G are heterodimers, consisting of a heavy α-chain and β2-microglobulin, and take part in the peptide-presentation pathway. HLA-C, -E and -G share the ability to interact with NK cell receptors as well as TCRs, thereby bridging between innate and adaptive immunity.
Within the classical HLA molecules, HLA-C plays a special role in the interaction with NK cells. This feature manifests itself in the unusually conserved α1 domain (1) that, in combination with a generally less polymorphic region in the α2 domain, shapes the binding site of killer cell Ig-like receptors (KIRs). Compared to HLA-A and HLA-B, HLA-C shows a lower expression level at the cell surface and represents only ∼10% of classical MHC molecules. HLA-C allotypes have been implicated in many diseases, including viral infections, cancer, and autoimmune disorders, with HLA-C–restricted epitopes recognized by either CTLs or NK cells. One of the most frequent cancer mutations, KRAS G12D, has recently been shown to be presented by HLA-C*08:02. Moreover, the corresponding epitope is able to induce T cell responses in cancer patients, which can be harnessed for adoptive-transfer immunotherapy (2).
Many genetic associations of HLA-C alleles with several diseases have been reported, ranging from increased protection to higher susceptibility for a certain disease (3). Last but not least, HLA-C expression on extravillous trophoblasts plays a central role in the development and tolerance of the fetus during pregnancy by interacting with maternal NK cells (4).
Peptide motifs of HLA-C were first based on pool sequencing and few individual sequences (5). The first high-throughput approach to determine the binding specificities of a larger set of HLA-C alleles was conducted by Rasmussen et al. (6) applying an in vitro peptide–HLA class I dissociation assay with synthetic peptides. By using this approach, binding motifs for 16 HLA-C allotypes were uncovered, although often with less-pronounced anchor residues.
The nonclassical HLA-E has been implicated in the presentation of MHC class I leader peptides (7, 8). Its expression level is dependent on the HLA class Ia expression level, and previous reports suggest it to be around 5% of the HLA-C expression (9). The HLA-E–peptide complex acts as ligand for the family of CD94/NKG2 receptors expressed predominantly on NK cells, but also on a subset of CD8+ T cells (10, 11). Both the KIR and CD94/NKG2 receptor family sense changes in HLA expression by interacting with HLA-C or HLA-E, respectively. Whereas the conserved HLA-E–CD94/NKG2 system seems to be specialized in sensing HLA expression levels, polymorphic KIRs are able to detect early changes in the peptide repertoire presented on classical HLA, especially HLA-C (12–14). The HLA-E–CD94/NKG2 interaction has also been associated with fetal–maternal tolerance through inhibition of uterine NK cells by HLA-E–expressing, extravillous trophoblasts (15). In addition to the presentation of MHC class I leader peptides, HLA-E is able to present pathogenic epitopes to CTLs (16, 17). However, the peptide binding pocket of HLA-E is highly hydrophobic and thus especially adapted for binding of HLA class I leader peptides. This unusual hydrophobicity within the binding pockets may further restrict the peptide repertoire. In fact, only few peptides could be shown to be presented in vivo by HLA-E.
The nonclassical HLA-G is mainly expressed on fetal tissue exerting a major tolerogenic function and promoting fetal development (18). In adults, expression of HLA-G is found on immune-privileged organs, including cornea, thymus, pancreatic islets, endothelial, and erythroblasts. In addition, dendritic cells and macrophages may also express HLA-G (19). Moreover, expression can be induced during various diseases, including cancer, viral infections, inflammatory diseases, or autoimmune disorders, mainly as an escape strategy to avoid immune recognition. Due to the checkpoint function, HLA-G is considered an attractive target for anticancer treatment using blocking Abs (20). In contrast, HLA-G expression in transplants is associated with better tolerance of the graft (21). HLA-G interacts with different inhibitory receptors such as Ig-like transcript 2 (ILT2) expressed by B cells, subsets of NK and T cells, monocytes, and dendritic cells (22); ILT4, which is solely expressed by monocytes and dendritic cells (23); and KIR2DL4, which is expressed mainly on NK cells (24). Compared to HLA-E, the peptide repertoire of HLA-G is larger but less complex than the peptide repertoire of MHC class Ia molecules (25). The peptide motif of HLA-G was first defined by Diehl et al. (26) from a small set of naturally eluted and pool-sequenced peptides exhibiting anchors at position 2 (isoleucine or leucine), position 3 (proline), and position 9 (leucine).
Considering the high importance of HLA-C, -E, and -G in many immunological processes, the clarification of ligand characteristics of these HLA molecules is of great relevance. In this study, peptide motifs were unveiled via comprehensive analyses of naturally presented HLA ligands. HLA-presented peptides were analyzed by liquid chromatographic tandem mass spectrometry (LC-MS/MS) after immunoprecipitation of HLA molecules from transfected C1R cells. The EBV-transformed lymphoblastoid C1R cell line is well suited for this approach due to a functional Ag presentation pathway and low endogenous HLA expression (27, 28). We had applied this approach previously for monoallelic motif determinations (29–37), and it was also used more recently by Abelin et al. (38). We used this approach for the 15 most frequent HLA-C alleles. To our knowledge, we comprehensively analyzed, for the first time, the peptide pool presented by the nonclassical HLA molecules HLA-E and HLA-G. All analyzed HLA-C allotypes as well as HLA-G binding motifs were generated by Gibbs clustering (39). SYFPEITHI (40) matrices were subsequently created for octa-, nona-, and decamers, and their predictive power has been analyzed in comparison with NetMHCpan-3.0 (41).
Materials and Methods
DNA vectors
42) for C*01:02:01, C*02:02:01, C*03:03:01, C*03:04:01:01, C*04:01:01:01, C*05:01:01:01, C*06:02:01:01, C*07:01:01:01, C*07:02:01:01, C*08:02:01:01, C*12:03:01:01, C*14:02:01, C*15:02:01, C*16:01:01, C*17:01:01:01, E*01:01:01:01, and G*01:01:01:01. Codon usage was adapted to the codon bias of Homo sapiens genes without changing the protein sequence.
Vectors were linearized by mixing 50 μg plasmid DNA with 50 μl CutSmart Buffer (New England Biolabs, Ipswich, MA), 10 μl PvuI-HF (20,000 U/ml; New England Biolabs), and 390 μl ddH2O and incubating for 2 h at 37°C. Complete linearization was confirmed by agarose gel electrophoresis. DNA was extracted by phenol (Sigma-Aldrich, St. Louis, MO)/chloroform (Merck, Darmstadt, Germany)/isoamyl alcohol (Sigma-Aldrich) and precipitated by adding 1/10 vol of 3 M sodium acetate (Roth, Karlsruhe, Germany) and 2.5 vol 100% ethanol (VWR Chemicals, Radnor, PA). The linearized vector was frozen for 2 h at −80°C and then centrifuged at 13,000 rpm for 30 min at 4°C. The supernatant was removed and the pellet was dried under sterile conditions. The DNA pellet was dissolved in 40 μl sterile Ampuwa water and the concentration was determined by absorbance at 260 nm (NanoDrop 1000 Spectrophotometer; Peqlab, Erlangen, Germany).
Transfection and selection
6 cells/ml. For transfection, 500 μl mycoplasma-free cell suspension was mixed with 10 μg linearized plasmid DNA in a Gene Pulser electroporation cuvette (0.4-cm gap; Bio-Rad, Hercules, CA). Electroporation was conducted using the Gene Pulser II (Bio-Rad) at 250 V and 975 μF. Afterward, cells were incubated in 75-cm²
HLA expression and cell sorting
HLA cell surface expression was verified by flow cytometry. For this purpose, 1 × 10⁶ cells were washed with FACS buffer consisting of 2% FBS with 2 mM EDTA (Roth) in PBS (Lonza, Basel, Switzerland) and transferred into a 96-well plate (Greiner Bio-One, Kremsmünster, Austria). After an additional wash, cells were incubated with 100 μl of 20 μg/ml of pan-HLA class I–specific monoclonal W6/32 Ab (in-house production) (43
For intracellular staining of the C1R–E*01:01 transfectant, cells were fixed with 100 μl Cytoperm/Cytofix solution (BD Biosciences) for 20 min prior to incubation with the respective Abs. For washing, 2% FBS, 2 mM EDTA, 0.1% saponin (AppliChem, St. Louis, MO), and 0.5% BSA (Roth) in PBS was used.
Cell sorting
Cell populations showing high expression of HLA were sorted using a BD FACSJazz Cell Sorter (BD Biosciences) following the HLA cell surface staining procedure.
Cell harvest
Cells were cultured up to 2.5 × 10⁹ cells and harvested by centrifugation at 1500 rpm for 15 min at 4°C. After two washing steps with cold PBS, cells were collected in a 50-ml centrifugation tube and frozen at −80°C.
Isolation of HLA ligands by immunoaffinity purification
HLA class I molecules were isolated using standard immunoaffinity purification as described previously (44, 45). In brief, cell pellets were lysed in 10 mM CHAPS (Applichem)/PBS (Lonza) containing protease inhibitor (Complete; Roche, Basel, Switzerland). HLA molecules were purified employing the pan-HLA class I–specific monoclonal W6/32 Ab, covalently linked to CNBr-activated Sepharose (GE Healthcare, Little Chalfont, U.K.). HLA–peptide complexes were eluted by repeated addition of 0.2% trifluoroacetic acid (Merck). Elution fractions E1–E8 were pooled and HLA ligands were separated from larger molecules by ultrafiltration using centrifugal filter units (Amicon; Merck Millipore). HLA ligands were extracted and desalted using ZipTip C18 pipette tips (Merck Millipore). Extracted peptides were eluted in 35 μl of acetonitrile (Merck)/0.1% trifluoroacetic acid, vacuum centrifuged to complete dryness, and resuspended in 25 μl of 1% acetonitrile/0.05% trifluoroacetic acid. Samples were stored at −20°C until analysis by LC-MS/MS.
Analysis of HLA ligands by LC-MS/MS
TopSpeed method. Survey scans were generated in the Orbitrap at a resolution of 120,000. Precursor ions were isolated in the quadrupole, fragmented by collision-induced dissociation in the ion trap, and finally fragment ions were recorded in the Orbitrap. Mass range was limited to 400–650 m/z with charge states 2+ and 3+ selected for fragmentation.
Database search and spectral annotation
Data were processed against the human proteome included in the Swiss-Prot database (http://www.uniprot.org, release September 27, 2013; containing 20,279 reviewed protein sequences) applying the SequestHT algorithm (46) in the Proteome Discoverer (version 1.3; Thermo Fisher) software. Precursor mass tolerance was set to 5 ppm and fragment mass tolerance to 0.02 Da. The search was not restricted to an enzymatic specificity. Oxidized methionine was enabled as a dynamic modification. Percolator (47)-assisted false discovery rate (FDR) calculation was set at a target value of q ≤ 0.05 (5% FDR). Peptide-spectrum matches with q ≤ 0.05 were filtered according to additional orthogonal parameters to ensure spectral quality and validity. Peptide lengths were limited to 8–12 aa.
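The two numeric filters named above (q ≤ 0.05 and a length window of 8–12 aa) and the meaning of a 5 ppm precursor tolerance can be illustrated with a minimal Python sketch. It is not the Proteome Discoverer or Percolator workflow; the `PSM` record, the function names, and the example q-values and m/z values are hypothetical, and the additional orthogonal quality parameters mentioned in the text are not modeled because they are not specified.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record (not a Proteome Discoverer type)."""
    peptide: str
    q_value: float


def filter_psms(psms, q_max=0.05, min_len=8, max_len=12):
    """Keep PSMs that pass the q-value cutoff and the peptide length window."""
    return [
        p for p in psms
        if p.q_value <= q_max and min_len <= len(p.peptide) <= max_len
    ]


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass deviation in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=5.0):
    """True if the absolute deviation does not exceed the ppm tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Invented example values: one PSM passes, one is too short, one has q > 0.05.
example = [PSM("VMAPRTLIL", 0.01), PSM("VMAPRTL", 0.01), PSM("VMAPRTLVL", 0.20)]
kept = filter_psms(example)
```

At m/z 500, a tolerance of 5 ppm corresponds to a window of roughly ±0.0025 m/z, which is why the precursor filter is far stricter than the 0.02 Da fragment tolerance.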
HLA ligand annotation, length distribution, and ligand and source proteome overlap
Due to endogenous expression of HLA-B*35:03 and HLA-C*04:01 in C1R cells, isolated HLA ligands of these allotypes had to be excluded from further analysis to allow for identification of HLA ligands of the transfected allele. To this end, peptides were clustered with GibbsCluster 1.1 (39), an unsupervised method that groups peptides according to their sequence similarity. For each transfectant, clustering of nonameric peptides was carried out because nonamers represent the most abundant length variant in all analyzed alleles. The number of clusters was set to 1–3. A “trash cluster” with a threshold of 0 was incorporated to remove outliers. Sequence weighting type was set to “Clustering.” The default settings were used for all other options. The peptide motifs of HLA-B*35:03 and HLA-C*04:01 were previously described (48, 49) and were confirmed by Gibbs clustering of selected HLA-B*35:03– or HLA-C*04:01–positive samples from our in-house database, which contains different samples and their corresponding HLA typings. Thus, clusters of these two allotypes could be well distinguished from the previously undefined cluster that was assigned to the transfected HLA. The transfected HLA cluster was visualized employing Seq2Logo 2.0 (50) with the Kullback–Leibler logo type and default settings. Anchor and auxiliary anchor positions were defined based on the respective nonamer clusters that were assigned to the transfected HLA and subsequently adopted for octa- and decamers. This workaround was necessary because a clear distinction of all three expressed allotypes was not possible in all cases due to low peptide counts and a higher proportion of non-HLA peptides (unsupervised clusters show combinations of transfected HLA, HLA-B*35:03, and HLA-C*04:01 motifs). With the exception of HLA-C*01:02, peptide anchor residues did not differ between length variants, and clusters for octa- and decamers showed no obvious difference from the nonamer cluster.
Peptides possessing anchor residues of the assigned transfected HLA cluster were selected from the initial peptide list for 8- to 11-mers and were defined as ligands. SYFPEITHI matrices were determined for 8- to 10-mers using frequencies of amino acids at each position from defined ligands according to established procedures (40). Length distribution was calculated including 8- to 11-mer ligands. Ligand overlap was determined using the 500 highest expressed ligands of each allele, defined by the sum of all precursor areas in all five technical replicates. Source proteome overlap was determined using the source proteins of the respective top 500 presented ligands.
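The selection of ligands by anchor residues and the tabulation of amino acid frequencies per position can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and the toy peptides are hypothetical, the anchor definition used in the example (proline, isoleucine, or valine at position 3 and leucine at the C terminus) is the HLA-G*01:01 motif reported in this study, and the sketch only counts relative frequencies. It does not reproduce the SYFPEITHI scoring scheme (40).

```python
from collections import Counter


def has_anchors(peptide, anchors):
    """Check anchor residues.

    `anchors` maps a 0-based position (negative values count from the
    C terminus) to a string of allowed one-letter amino acid codes.
    """
    return all(peptide[pos] in allowed for pos, allowed in anchors.items())


def position_frequencies(peptides):
    """Return one {amino_acid: relative frequency} dict per peptide position."""
    if not peptides:
        raise ValueError("need at least one peptide")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    n = len(peptides)
    return [
        {aa: count / n for aa, count in Counter(p[i] for p in peptides).items()}
        for i in range(length)
    ]


# Toy nonamers (invented for illustration, not identified ligands).
candidates = ["KLPAQFYIL", "RIIDKLNEL", "AAAAAAAAK"]
motif = {2: "PIV", -1: "L"}  # position 3 and C-terminal anchor
ligands = [p for p in candidates if has_anchors(p, motif)]
frequencies = position_frequencies(ligands)
```

Keeping the anchor definition as a plain mapping makes it easy to reuse the same nonamer-derived anchors for octa- and decamers, because the C-terminal anchor is addressed relative to the end of the peptide.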
Validation of SYFPEITHI matrices
For SYFPEITHI matrix validation, a k-fold (k = 5) cross-validation was used (51). For this purpose, the peptide list of each transfected allotype was randomly split into five equal folds, of which four were used as training data to determine a SYFPEITHI matrix applying the GibbsCluster approach described above. The fifth fold was used for evaluation of the matrix. Clustering was performed on the fifth fold, and peptides in the transfected HLA cluster were defined as true binders, whereas peptides in the other clusters and outliers were defined as false binders for the transfected HLA. As an example, evaluation was performed for one nonamer evaluation data set. Receiver operating characteristic (ROC) curve analysis was conducted to visualize the performance. Area under the curve (AUC) was calculated for each ROC curve. For comparison with NetMHCpan-3.0, commonly used thresholds were set to decide whether a peptide is defined as a binder or not. For SYFPEITHI, a threshold of ≥60% of the maximal score (defined by the sum of the highest possible scores in each position of the peptide) was set, and for NetMHCpan-3.0 a threshold of rank <2 was employed.
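The random five-fold split and the score threshold of ≥60% of the maximal score can be illustrated with a short Python sketch. It assumes that a matrix is represented as one dictionary of residue scores per position; the function names, the fixed random seed, and the score values in the toy matrix are hypothetical and are not taken from the SYFPEITHI matrices of this study.

```python
import random


def five_fold_split(peptides, k=5, seed=0):
    """Shuffle the peptide list and deal it into k folds of near-equal size."""
    shuffled = list(peptides)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def max_score(matrix):
    """Sum of the highest possible score at each position."""
    return sum(max(column.values()) for column in matrix)


def score(peptide, matrix):
    """Sum of per-position scores; residues missing from a column score 0."""
    return sum(column.get(aa, 0) for aa, column in zip(peptide, matrix))


def is_predicted_binder(peptide, matrix, fraction=0.60):
    """Apply the threshold of >= 60% of the maximal score."""
    return score(peptide, matrix) >= fraction * max_score(matrix)


# Toy two-position matrix with invented scores.
toy_matrix = [{"A": 10, "L": 4}, {"L": 10}]

folds = five_fold_split([f"PEP{i}" for i in range(10)])
held_out = 4  # index of the evaluation fold
training = [p for j, fold in enumerate(folds) if j != held_out for p in fold]
evaluation = folds[held_out]
```

Expressing the threshold as a fraction of the maximal score, rather than as an absolute value, keeps it comparable between matrices of different allotypes and peptide lengths.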
Results
HLA expression of transfected C1R cells
HLA expression of transfected C1R cells was analyzed by flow cytometry using the pan-HLA class I–specific Ab W6/32. Untransfected C1R cells were included as a negative control to distinguish expression of the transfected HLA from endogenous HLA-B*35:03 and HLA-C*04:01 expression. All transfectants, except C1R–HLA-E*01:01, demonstrated expression of the transfected HLA at the cell surface (Supplemental Fig. 1). C1R–HLA-E*01:01, stained by either W6/32 or HLA-E–specific Ab 3D-12, exhibited no cell surface expression of transfected HLA-E*01:01. However, intracellular staining of C1R–HLA-E*01:01 with 3D-12 Ab revealed the presence of intracellular pools of HLA-E*01:01. Furthermore, PCR of isolated plasmid DNA and subsequent sequencing of the HLA-E*01:01 locus confirmed the persistence of the transfected gene as well as the correct sequence (data not shown). Because the C1R cell line is HLA-E*01:03+ (52), which has a higher affinity to MHC class Ia leader peptides (53), this might explain the missing expression of transfected HLA-E*01:01 due to a lack of sufficient leader peptides. However, neither HLA-E*01:01 nor HLA-E*01:03 could be detected on the cell surface by flow cytometry, which in turn might be due to the overall low expression of endogenous HLA (Supplemental Fig. 1C). For all remaining transfected HLA, cell surface expression was sufficient for subsequent characterization of naturally processed and presented HLA ligands.
Peptide motifs of HLA-C
Peptides were obtained after immunoaffinity chromatography of HLA molecules from cell lysates. After separation by reversed-phase liquid chromatography, peptides were analyzed by mass spectrometry. GibbsCluster 1.1 (39) was used to separate ligands of the transfected HLA from those of endogenously expressed alleles in an unbiased manner (Supplemental Fig. 2). For HLA-C*05:01 and HLA-C*08:02, clustering revealed a similar motif to the endogenously expressed HLA-C*04:01. To avoid cross-contamination within the groups, clustering was repeated after exclusion of all peptides extracted from the C1R–HLA-C*04:01 transfectant.
In total, 392–3,463 ligands possessing the anchor amino acids defined by clustering of nonamers could be identified for the respective HLA transfectants (Table I, Supplemental Table I). Fig. 1 displays the peptide motifs of the 15 analyzed HLA-C molecules. All HLA-C allotypes share a hydrophobic C-terminal anchor position with differences in the preferred amino acid residues. These range from aliphatic residues, such as valine or leucine in HLA-C*15:02, to the aromatic residues phenylalanine and tyrosine in HLA-C*02:02. Most allotypes accept multiple hydrophobic or aromatic anchor residues at the C terminus, whereas a few have a clear preference for a single amino acid (e.g., leucine in HLA-C*01:02, -C*03:03/04, or -C*17:01). The frequency of aromatic residues correlates with the polymorphism at position 116 within the HLA molecules (Table II) (54, 55). Allotypes with a serine at position 116 more often favor aromatic residues at the C-terminal position of the peptide, whereas phenylalanine, tyrosine, or leucine at position 116 may interfere with the binding of aromatic residues. Eleven of 15 HLA-C allotypes possess a second anchor at peptide position 2 (HLA-C*02:02, -C*03:03, -C*03:04, -C*06:02, -C*07:01, -C*07:02, -C*12:03, -C*14:02, -C*15:02, -C*16:01, and -C*17:01), whereas residues at position 3 constitute the second anchor for four of 15 HLA-C alleles (HLA-C*01:02, -C*04:01, -C*05:01, and -C*08:02). In contrast to the small variations in the C-terminal anchor residues, preferred residues at position 2 or 3 display a high degree of variability. HLA-C*01:02 is unique in its preference for proline at position 3. Small aliphatic or hydrophilic residues at position 2 constitute the anchor of HLA-C*02:02, -C*03:03, -C*03:04, -C*12:03, -C*15:02, -C*16:01, and -C*17:01. All of these allotypes possess a tyrosine at position 9, which may inhibit binding of larger anchor residues (54).
Of note, six of them favor large aromatic residues at position 1, which may support the interaction provided by the small anchor residue at position 2. Only HLA-C*15:02 displays a preference for basic residues at position 1, which are also able to support the binding of the peptide. This may be feasible due to an asparagine at position 66 instead of the lysine found at this position in most allotypes. Differences in peptide specificities are marginal within the HLA-C*03 subtypes. Acidic residues at position 3 form the anchor for HLA-C*04:01, -C*05:01, and -C*08:02. All three molecules combine an asparagine at position 114 and an arginine at position 156. The arginine may serve for electrostatic interaction, whereas the asparagine at position 114 instead of aspartic acid may enable the binding of an acidic residue. Whereas HLA-C*04:01 has a clear preference for aromatic residues at position 2, HLA-C*05:01 and -C*08:02 accept only small residues at this position. An explanation for this may be that the phenylalanine at position 9 of HLA-C*05:01 and -C*08:02 reduces the space available to accommodate larger residues (54). HLA-C*04:01 possesses a serine at this position, which may enable the binding of large aromatic residues. Basic anchor residues at position 2 are preferred by HLA-C*06:02, -C*07:01, and -C*07:02. This may be explained by the aspartic acid at position 9 of HLA-C*06:02, -C*07:01, and -C*07:02. Major differences in the peptide specificities of the HLA-C*07 subtypes HLA-C*07:01 and -C*07:02 are revealed at position 1 and the anchor position 2. Both subtypes prefer arginine as anchor residue, whereas alternatively accepted anchor residues are threonine or asparagine for HLA-C*07:01 and tyrosine or lysine for HLA-C*07:02. The tyrosine in position 2 of HLA-C*07:02 ligands may be accepted due to a serine at position 99, where an aromatic residue is usually located.
Further, HLA-C*07:01 favors basic residues at position 1, whereas HLA-C*07:02 does not have such a preference. As for HLA-C*15:02, the preference for basic residues at position 1 of HLA-C*07:01 ligands can be explained by an asparagine at position 66. Unique to HLA-C*14:02 is its preference for aromatic residues at anchor position 2. Again, a serine at position 9, instead of the aromatic amino acid generally found at this position, may enable binding of large aromatic residues. Further allotypes favoring aromatic residues at anchor position 2 are HLA-A*23 and -A*24, which also possess a serine at position 9. Auxiliary anchors (defined by a percentage share of >50% of amino acids with similar features) are located at position 1 of HLA-C*03:03, -C*03:04, and -C*17:01 ligands and at position 2 of HLA-C*04:01 ligands, with a preference for aromatic residues. Remarkable is the higher frequency of aromatic residues at positions 5 and 7 of HLA-C*07:01 and -C*07:02 ligands and at position 8 of HLA-C*17:01 ligands, which may be explained by a leucine at position 147 instead of the tryptophan found at this position in the other allotypes. The preference for aromatic residues at positions 5 and 7 of HLA-C*07:01 and -C*07:02 ligands may also be explained by an alanine at position 152, which may provide a larger pocket for residues at positions 5 and 7 of the ligands, a feature not shared by HLA-C*17:01. HLA-C*01:02 is exceptional in that its anchor changes from proline at position 3 for octameric and nonameric ligands to a shared anchor with aliphatic residues at position 2 and proline, serine, or histidine at position 3 for longer ligands. In sum, peptide motifs of all analyzed HLA-C molecules could be identified and are in agreement with our knowledge of allotype-specific pocket characteristics. All HLA allotypes that have been analyzed in this study prefer nonameric ligands with frequencies varying from 62.5 to 91.3% (Fig. 2).
Octamer frequency ranges from 4.9 to 25.0%. Decamers and undecamers were less frequent with 2.9–17.3% or 0.4–7.1%, respectively.
Sequence logos of the clusters corresponding to the transfected allotype visualized using Seq2Logo 2.0 (50). The size of the letter indicates the impact of the corresponding amino acid, presented by a given position in either positive or negative fashion. Black, aliphatic residues; gray, aromatic residues; green, hydrophilic residues; blue, basic residues; red, acidic residues.
Length distribution of HLA-C and HLA-G ligands.
HLA-C in the context of the supertype concept
The concept of grouping HLA allotypes into supertypes depending on their main anchor specificities was introduced in 1995 (36, 56). In sum, nine supertypes could be defined covering most of the HLA-A and HLA-B alleles: HLA-A*01, -A*02, -A*03, -A*24, -B*07, -B*27, -B*44, -B*58, and -B*62 (57, 58). In 2004, Doytchinova et al. (59) applied a bioinformatics approach based on structural similarities between allotypes, also integrating the HLA-C alleles. Using this strategy, two HLA-C supertypes could be defined, named C1 and C4. Supertype C1 was defined by a serine or glycine at position 77, whereas C4 supertypic allotypes possess an asparagine at this position. Allotypes from our study belonging to the C1 supertype are HLA-C*01:02, -C*03:03, -C*03:04, -C*07:02, -C*08:02, -C*12:03, -C*14:02, and -C*16:01; whereas HLA-C*02:02, -C*04:01, -C*05:01, -C*06:02, -C*07:01, -C*15:02, and -C*17:01 belong to the C4 supertype. However, this definition is not in line with the peptide motifs of HLA-C allotypes unveiled in this study (Fig. 1). Considering the peptide motifs of HLA-C, we now propose a new categorization into five groups. Three of these groups may be integrated into HLA-A and HLA-B supertypes (HLA-C*02:02, -C*03:03, -C*03:04, -C*12:03, -C*15:02, -C*16:01, and -C*17:01 into the A*01, B*58, or B*62 supertype; HLA-C*14:02 into the A*24 supertype; and HLA-C*06:02, -C*07:01, and -C*07:02 into the B*27 supertype). Allotypes with an anchor at position 3 may deserve additional supertype definitions. A C*01 supertype with proline at position 3 and aliphatic residues at the C terminus may account for the uniqueness of HLA-C*01. A C*04 supertype would integrate HLA-C*04:01, -C*05:01, and -C*08:02 into the supertype concept.
Characteristics of HLA-E and HLA-G ligands
HLA-E*01:01–transfected C1R cells present two MHC class I leader peptides, namely VMAPRTLIL derived from HLA-C*04:01 and VMAPRTLVL derived from HLA-A*02:01. The latter is to some extent surprising because there is no evidence for surface expression of HLA-A*02:01 in C1R (27, 60). VMAPRTLVL was detected in every C1R transfectant (note: C1R is HLA-E*01:03+), indicating that it is not a false positive (FP) but most probably derived from a defective ribosomal product. Overall, three additional MHC class I leader peptides, VMAPRTLLL (HLA-C*02:02 and -C*15:02), VMAPRALLL (HLA-C*07:01 and -C*07:02), and VMAPRTLFL (HLA-G*01:01), were detected, which are presented by HLA-E*01:03. MHC class I leader peptides of HLA-B*35:03 (VTAPRTVLL) and HLA-C*17:01 (VMAPQALLL) are not presented by HLA-E*01:01 (note: only HLA-B*35:03 signal sequence could have been expressed on C1R-E*01:01) or the endogenously expressed HLA-E*01:03. This discrimination of peptides with one or two amino acid changes, mostly in positions contributing less to the interaction with HLA, illustrates the adaptation of HLA-E to MHC leader peptide presentation and its restricted peptide repertoire.
HLA-G*01:01 reveals a marked peptide motif with anchors at position 3, composed of proline, isoleucine and valine, and at the C-terminal position (Ω) formed by leucine. An auxiliary anchor with lysine and arginine is shaped at position 1. Hydrophobic residue preferences show up at position 2 and position Ω-2 (Fig. 1). Contrary to HLA-E, HLA-G*01:01 exhibits a large peptide binding repertoire with 2258 identified ligands eluted solely from HLA-G*01:01–transfected C1R cells.
Ligand overlap
To look for ligand overlap across the analyzed allotypes, the top 500 most abundant ligands (defined by the sum of the AUC values of five technical replicates) of each HLA molecule were integrated. For HLA-C*07:01, only 392 ligands could be considered (Table III). The overlap in HLA-presented peptides among allotypes with clearly distinguishable peptide motifs (one or both anchor residues are different) was marginal with a maximal overlap of 2.46% between HLA-C*03:04 and HLA-C*08:02. Allotypes with consistent anchor residue preferences display higher overlap within the presented peptides, ranging from 3.20% between HLA-C*07:02 and HLA-C*14:02, and up to 11.98% between HLA-C*02:02 and HLA-C*12:03. Notably, HLA-C*05:01 and HLA-C*08:02 show a high overlap, sharing 27.39% ligands. The comparison to HLA-C*04:01 is limited due to endogenous expression of the allotype on C1R and its similarity to HLA-C*05:01 and -C*08:02. Because peptides from C1R-C*04:01 had to be excluded for ligand definition of HLA-C*05:01 and -C*08:02, the overlap is consequently zero. Nevertheless, overlap of HLA-C*05:01 and HLA-C*08:02 to HLA-C*04:01 should be markedly lower because HLA-C*04:01 favors large aromatic residues in position 2, whereas HLA-C*05:01 and HLA-C*08:02 prefer small residues. In fact, including intrinsic HLA-C*04:01 ligands (no exclusion of peptides of C1R-C*04:01 from C1R-C*05:01 and -C*08:02 peptide lists), the overlap is higher between HLA-C*05:01 and -C*08:02 with 40.45% compared with 26.26% between HLA-C*04:01 and -C*05:01 or 25.31% between HLA-C*04:01 and -C*08:02, respectively. High overlap is seen between the HLA-C*03 subtypes HLA-C*03:03 and HLA-C*03:04, with 42.86% of shared ligands. The HLA-C*07 subtypes HLA-C*07:01 and HLA-C*07:02 display a rather small overlap of 10.12% within their ligands compared with the HLA-C*03 subtypes, which can be explained by differences in the preferred residues in anchor position 2. 
In general, ligand overlap is marginal within HLA allotypes unless the same anchor residues are shared (61).
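The pairwise comparison of top-ranked ligand lists can be sketched in Python as follows. The formula for the overlap is not stated in the text, so the sketch assumes that overlap is the number of shared peptides divided by the number of distinct peptides in both lists combined (a Jaccard index expressed as a percentage). Under this assumption, 215 shared peptides between two lists of 500 give 27.39% and 300 shared peptides give 42.86%, which is consistent with the values reported above but does not confirm the definition. The function name and the placeholder peptide identifiers are hypothetical.

```python
def ligand_overlap_percent(ligands_a, ligands_b):
    """Shared peptides as a percentage of all distinct peptides in both lists.

    Assumption: overlap = |A intersect B| / |A union B| * 100.
    """
    a, b = set(ligands_a), set(ligands_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


# Placeholder identifiers standing in for the top 500 ligands of two allotypes.
top_a = [f"P{i}" for i in range(500)]
top_b = [f"P{i}" for i in range(285, 785)]  # shares 215 entries with top_a
overlap = round(ligand_overlap_percent(top_a, top_b), 2)
```

Normalizing by the union rather than by a fixed list size also accommodates allotypes such as HLA-C*07:01, for which fewer than 500 ligands were available.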
Source proteome overlap
Theoretically, all proteins within a cell may be used as source for peptide presentation. However, different factors such as source protein expression level, Ag processing, and transport efficiency and affinity of the peptide to the HLA and their stability may select for a smaller set of source proteins, which are presented by one allotype. The source proteins of the 500 most abundant ligands were selected for the overlap analysis of the source proteome (Tables IV, V) (62). The source protein overlap was the highest for allotypes with a high ligand overlap, which is obvious because the overlapping ligands derive from the same source protein. More interesting is the overlap of the source proteome added by nonoverlapping ligands. In fact, the additional overlap contributed by nonoverlapping ligands is comparable within all allotypes, independent of their peptide motifs, with a median increase of 5.42%. This allotype- and also subtype-independent low increase in the source proteome overlap displays the high diversification that is added by an additional HLA molecule.
Performance of established SYFPEITHI matrices
Identified peptides were used to establish SYFPEITHI matrices. To this end, peptides were clustered using GibbsCluster 1.1 (39). Anchor positions and residues were defined from clusters of the transfected HLA. Peptides harboring predefined anchor residues were defined as ligands and were used to establish SYFPEITHI matrices (Supplemental Fig. 2, Supplemental Table II). To examine the performance of the established SYFPEITHI matrices, fivefold cross-validation was performed (51). For that purpose, the initial peptide list of each analyzed transfectant was split into five parts. Four parts were used for clustering and subsequent definition of the SYFPEITHI matrix, whereas one part remained to validate the matrix. Peptides in the cluster corresponding to the transfected HLA were defined as true binders, whereas peptides of the other clusters and peptides fitting to no cluster were defined as false binders. ROC points were calculated in 5% steps of the SYFPEITHI maximal score. AUC values range from 0.88 for HLA-C*14:02 and HLA-C*17:01 to 0.97 for the HLA-C*01:02 nonamer matrix. Only the HLA-C*16:01 matrix performed less well, with an AUC of 0.78 (Fig. 3). In conclusion, the matrices discriminate well between true and false binders within the data set.
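The validation scheme can be sketched as follows, assuming that each peptide already carries a matrix score and a true/false binder label. The helper names, the random fold assignment, and the trapezoidal AUC estimate are illustrative assumptions; the clustering and matrix construction steps performed on the four training parts are not reproduced here.

```python
import random


def k_fold_splits(items, k=5, seed=0):
    """Yield (training, held_out) lists; each item is held out exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, held_out


def roc_points(scores, labels, max_score, step_percent=5):
    """Return (FP rate, TP rate) pairs for cutoffs at 0, 5, ..., 100% of max_score.

    Assumes that both true and false binders are present in the labels.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = []
    for percent in range(0, 101, step_percent):
        cutoff = max_score * percent / 100
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule, anchored at (0,0) and (1,1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy example: made-up scores for three "true" and three "false" binders.
scores = [30, 27, 24, 12, 9, 6]
labels = [1, 1, 1, 0, 0, 0]
points = roc_points(scores, labels, max_score=30)
auc = auc_trapezoid(points)
splits = list(k_fold_splits(range(10), k=5))
print(len(points), round(auc, 3), len(splits))
```

Because every true binder in the toy data scores higher than every false binder, each cutoff yields either no FPs or all TPs, so the estimated AUC is 1; real data sets give intermediate values such as those reported above.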
ROC analysis of SYFPEITHI matrices of nonamers. Each point represents the TP- and FP-predicted ligands from applying SYFPEITHI thresholds in 5% steps from 0 to 100% of the maximal score.
Comparing SYFPEITHI and NetMHCpan-3.0 binding predictions
SYFPEITHI (40) and NetMHC (41, 63) are commonly used tools for HLA binding predictions. However, the two tools are based on different strategies. SYFPEITHI uses a position-based matrix scoring system that depends on amino acid frequencies at each position and on the definition of anchor and auxiliary anchor positions using naturally eluted MHC ligands. In contrast, NetMHCpan-3.0 uses artificial neural networks that were trained on quantitative in vitro binding data of peptide–MHC class I complexes from the Immune Epitope Database (64). Thus, all ligands are also binders, but peptides identified as binders in vitro are not necessarily natural ligands. To compare both prediction tools, peptides of each transfectant were split into true or false binders for the transfected HLA by clustering (peptides of the transfected HLA cluster = “true binders,” other peptides = “false binders”). Commonly used thresholds for binder definition were applied: ≥60% of the maximal score for SYFPEITHI and rank <2 for NetMHCpan-3.0. The rates of false positive (FP)– and true positive (TP)–predicted binders are illustrated in Fig. 4 for all analyzed allotypes except HLA-E*01:01. SYFPEITHI provides a powerful prediction, with a high TP rate ranging from 0.66 to 0.91 and a low FP rate ranging from 0.00 to 0.22. Only the nonameric matrix for HLA-C*14:02, with an FP rate of 0.54, performed poorly, which can be explained by the motif similarities to HLA-C*04:01 (endogenously expressed by C1R) at position 2 (anchor and auxiliary anchor, respectively, preferring aromatic residues) and anchor position 9. NetMHCpan-3.0 exhibits higher TP rates between 0.71 and 0.97, but at the same time higher FP rates between 0.07 and 0.65. Similar to SYFPEITHI, the NetMHCpan-3.0 prediction for HLA-C*14:02 performs poorly, with an FP rate of 0.72. The highest disparity is seen for HLA-G*01:01, with a TP rate of 0.91 and an FP rate of 0.05 for SYFPEITHI, and a TP rate of 0.67 and an FP rate of 0.65 for NetMHCpan-3.0.
Hence, NetMHCpan-3.0 displays a rather random prediction for HLA-G*01:01. In conclusion, the performance of the established SYFPEITHI matrices could be confirmed by comparison with NetMHCpan-3.0. SYFPEITHI outperformed NetMHCpan-3.0, with higher precision for all allotypes (Fig. 5).
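To make the position-based scoring and the 60% threshold concrete, a minimal sketch is given below. The three-position matrix and its values are hypothetical toy numbers chosen for illustration only; they are not taken from any SYFPEITHI matrix, and the function names are assumptions.

```python
# Hypothetical toy matrix for three-residue peptides: one dict of residue scores
# per position. Matrices for HLA class I ligands cover 8 to 10 positions.
TOY_MATRIX = [
    {"A": 10, "S": 8},          # position 1, treated as an anchor in this toy
    {"P": 4, "D": 2},           # position 2, treated as an auxiliary position
    {"L": 10, "F": 8, "M": 6},  # position 3, treated as the C-terminal anchor
]


def matrix_score(peptide, matrix):
    """Sum the position-specific scores; residues not listed contribute 0."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length does not match the matrix")
    return sum(column.get(residue, 0) for residue, column in zip(peptide, matrix))


def maximal_score(matrix):
    """Highest achievable score: the best residue at every position."""
    return sum(max(column.values()) for column in matrix)


def is_predicted_binder(peptide, matrix, fraction=0.60):
    """Apply the threshold of at least 60% of the maximal score."""
    return matrix_score(peptide, matrix) >= fraction * maximal_score(matrix)


for peptide in ("APL", "SDF", "GGM"):
    print(peptide, matrix_score(peptide, TOY_MATRIX), is_predicted_binder(peptide, TOY_MATRIX))
```

Expressing the cutoff as a fraction of the maximal score makes thresholds comparable between matrices of different lengths and scales, whereas the NetMHCpan-3.0 criterion (rank <2) is applied directly to the rank value reported by that tool.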
Performance of SYFPEITHI matrices and NetMHCpan-3.0 prediction. Data set of each transfectant was divided into true and false binders to the respective HLA using unbiased clustering. Peptides in clusters of the corresponding transfected HLA were defined as true binders; peptides in clusters representing the endogenously expressed HLA molecules were defined as false binders of the transfected HLA. Peptides were defined as ligands with a SYFPEITHI score of ≥60% or NetMHCpan-3.0 rank <2.
Precision of SYFPEITHI and NetMHCpan-3.0. Precision is defined as the number of TPs divided by the sum of TPs and FPs (= TP/[TP+FP]).
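The three measures used in Figs. 4 and 5 can be computed from the cluster-based labels and the thresholded predictions as in the following sketch. The function name and the toy labels are hypothetical; rates with an empty denominator are returned as 0.0 by assumption.

```python
def prediction_metrics(truth, predicted):
    """Compute TP rate, FP rate, and precision from paired boolean labels.

    truth: True for peptides in the cluster of the transfected HLA.
    predicted: True for peptides passing the binder threshold.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return {
        "tp_rate": tp / (tp + fn) if tp + fn else 0.0,    # TP / all true binders
        "fp_rate": fp / (fp + tn) if fp + tn else 0.0,    # FP / all false binders
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # TP / (TP + FP)
    }


# Toy labels: four true binders and six false binders.
truth = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
metrics = prediction_metrics(truth, predicted)
print(metrics)
```

Note that the TP rate and precision share the numerator but not the denominator: the TP rate is normalized to all true binders, precision to all predicted binders.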
Discussion
Defining the binding specificities of the classical but often underestimated HLA-C alleles and of the nonclassical HLA-E and HLA-G is of great importance. Given their roles in many diseases, such as cancer, viral infections, inflammatory diseases, and autoimmune disorders, as well as in transplantation, naturally processed and presented HLA ligands may contribute to our understanding of disease and foster approaches for intervention.
HLA-C
In this study, the peptide motifs of the 15 most frequently represented HLA-C alleles were comprehensively analyzed using mass spectrometry–based characterization of naturally presented HLA ligands (Fig. 1). Due to the low expression of HLA-C (65–69), it is hardly feasible to determine the peptide motifs by unsupervised clustering in a system with simultaneous normal expression of HLA-A and -B. Hence, the lymphoblastoid C1R cell line with a low endogenous expression of HLA-B*35:03 (27) and HLA-C*04:01 was used to uncover the peptide motifs of HLA-C alleles.
The ligand yields encompass a wide range, which is attributable to differences in the expression levels but may also be caused by performance variances of mass spectrometric measurements. Nevertheless, yields of extracted HLA ligands were sufficient for all alleles to determine the binding motif of the predominant nonamers in an unsupervised manner using GibbsCluster 1.1 (39). Peptide yields for less frequent length variants were in some cases insufficient, leading to clusters of the transfected HLA that were contaminated with peptides from the endogenously expressed HLA molecules. However, except for HLA-C*01:02, no changes in the preferred anchor residues emerged between the less frequent lengths and nonamers. Therefore, anchor residues were defined from the cluster of nonamers and assigned to the other length variants. This was helpful for the definition of ligands of all length variants. For HLA-C*01:02, additional anchor residues for longer length variants were included.
A common feature of all allotypes is their preference for hydrophobic and/or aromatic residues at the C-terminal position, similar to HLA-B, whereas some HLA-A allotypes accept basic amino acids as C-terminal anchors. The restricted repertoire of anchor residues at the C-terminal position could be a result of the proximity to the interaction site of KIRs with HLA-C. KIRs interact with residues α73 to α90 of the HLA molecule (70, 71), which are less polymorphic in HLA-C than in HLA-A and -B. This region is also mainly involved in the C-terminal anchor contacts (54). The low polymorphism of the α2 domain of HLA-C molecules restricts the binding repertoire of HLA-C alleles, reducing the number of potential ligands. At first glance, this appears rather unfavorable for the body’s defense, because in theory fewer pathogen-derived or tumor-associated Ags could be presented on HLA-C molecules. This low polymorphism could be associated with the particular role of HLA-C in delivering inhibitory signals to NK cells to ensure self-tolerance. HLA-C alleles guarantee NK cell inhibition in every individual regardless of their HLA allele combination because, in contrast to HLA-A and -B (of which only a few alleles carry epitopes for KIR recognition), all HLA-C molecules have either the C1 or C2 epitope for KIR recognition (72, 73). HLA-C thereby allows combinatorial diversity of HLA-A and -B molecules and a still sufficiently broad binding repertoire within the population, without the disadvantage of a loss of self-tolerance due to missing-self signals.
The anchors in position 2 or 3, respectively, display high variability and thus contribute most to the peptide repertoire of HLA-C alleles. Based on the motif variability, five groups can be distinguished: 1) small residues in position 2 of HLA-C*02:02, -C*03:03, -C*03:04, -C*12:03, -C*15:02, -C*16:01, and -C*17:01; 2) acidic residues in position 3 of HLA-C*04:01, -C*05:01, and -C*08:02; 3) basic residues in position 2 of HLA-C*06:02, -C*07:01, and -C*07:02; 4) proline in position 3 of HLA-C*01:02; and 5) aromatic residues in position 2 of HLA-C*14:02. Interestingly, small residues in position 2 are often accompanied by aromatic residues in position 1 or 3, which may stabilize the binding of the position 2 anchor. A striking change in the peptide motif is seen for longer-length variants of HLA-C*01:02, where the auxiliary anchor in position 2 almost reaches the importance of the anchor at position 3.
The predominant length variant in HLA-C is 9 aa. However, a higher rate of shorter HLA ligands was obtained for HLA-C*04:01, -C*05:01, -C*07:01, -C*07:02, -C*08:02, -C*14:02, -C*15:02, and -C*17:01. Interestingly, six of the eight listed allotypes prefer charged or aromatic anchor residues, leading to the assumption that stronger interaction with the HLA molecule by charged or aromatic anchor residues may stabilize shorter peptides in the binding pocket.
Peptide overlap was generally low among the HLA alleles, ranging from 0% in allotypes with nonoverlapping motifs (note: the C-terminal anchor has low variance throughout the HLA-C alleles) to 10.86% in allotypes with similar motifs (HLA-C*02:02 and HLA-C*12:03). This underlines the distinct peptide repertoire even of allotypes with similar binding specificities, a feature that has also been reported for members of the HLA-B*44 supertype (61). However, high overlap is observed between HLA-C*05:01 and HLA-C*08:02, whose peptide motifs are virtually indistinguishable from each other. In addition, HLA subtypes usually demonstrate a high degree of binding similarity (41.24% peptide overlap between HLA-C*03:03 and HLA-C*03:04). An exception is the lower promiscuity (8.12% overlap) in the HLA-C*07 subtypes HLA-C*07:01 and -C*07:02, caused primarily by distinct differences in the favored anchor residues in position 2 and differences in position 1 (basic auxiliary anchor in HLA-C*07:01 or uncharged residues in HLA-C*07:02, respectively).
The source proteome overlap of the top 500 ligands of each allotype was increased particularly by overlapping ligands. A limitation is that only the source proteome of the top 500 ligands was considered, which underestimates the source proteome overlap, because further-included ligands (up to 3463 ligands were detected for one HLA molecule [Table I]) have a higher probability of deriving from already included source proteins. However, this restriction was necessary due to high variations in the ligand yields. By including the source proteins of all ligands, the source proteome overlap of nonoverlapping ligands would increase from 5.42 to ∼10%. This percentage still illustrates the high diversification added by a second allotype, including subtypes.
The SYFPEITHI matrices resulting from this work reveal high TP prediction rates for HLA-C ligands in combination with a low FP prediction rate, whereas NetMHCpan-3.0 generally achieves slightly higher TP rates that are often accompanied by a high FP rate. This high FP rate is problematic in that the HLA-C*04:01 and HLA-B*35:03 peptides used for this comparison (endogenously expressed by C1R) exhibit distinguishable anchor residues. In general, SYFPEITHI prediction is more conservative (lower TP rates but markedly lower FP rates) than binding prediction with NetMHCpan-3.0, and it is also more precise.
The importance of HLA-C becomes apparent in that several peptides of tumor-associated Ags are known to be presented by HLA-C and are recognized by CD8+ T cells. HLA-C ligands, which are known to be recognized by CD8+ T cells, arise from the shared tumor-specific Ags MAGE (74–77), BAGE (78), GAGE (79), and NY-ESO1 (80); the differentiation Ags DCT, PMEL (81), and SLC45A3 (82); the overexpressed Ag TPBG (83); and the Ag PARP12 (84). Indeed, SAFPTTINF (MAGEA1) and VYPEYVIQY (PARP12) were also found within our data set. Furthermore, neoepitopes are known to arise from KRAS (85) or MUM2 (86). Within our data set, further peptides of tumor-associated Ags (according to Ref. 62) were found to be presented by HLA-C alleles, which may be targets of CD8+ T cells and NK cells (Table V).
HLA-E
C1R cells transfected with HLA-E*01:01 exhibit no increase in cell surface expression, although successful transfection was demonstrated by sequencing. This is in line with results of Braud et al. (52) revealing a correlation of HLA-E surface expression with the presence of HLA molecules. In fact, the presentation of the HLA-C*04:01 signal peptide VMAPRTLIL was higher in C1R transfected with an HLA allele harboring the same signal peptide (HLA-C*01:02, -C*03:03, -C*03:04, -C*05:01, -C*06:02, -C*08:02, -C*12:03, -C*14:02, and -C*16:01) compared with HLA-E*01:01–transfected cells (data not shown).
In accordance with Braud et al. (52), only five HLA signal peptides could be found to be presented by HLA-E when looking for recurring sequence similarities throughout the transfectants after the exclusion of HLA-B*35:03 and HLA-C*04:01 ligands (unsupervised clustering was likewise unsuccessful). However, some conventional peptides from pathogens (87–90) and a prostate cancer–associated Ag (91) were reported to elicit HLA-E–dependent T cell responses. Furthermore, a broader binding repertoire of HLA-E was reported with similarities to the HLA-A*02 binding motif in TAP-deficient K562 cells (55, 92).
HLA-G
In contrast to HLA-E*01:01, HLA-G*01:01 displayed a much larger peptide repertoire with >2200 detected HLA ligands, including peptides of tumor-associated Ags such as PSA, cyclin B1, sperm protein 17, and LCK (Table V), which may elicit inhibitory effects on T and NK cells. The peptide motif displays unusual and highly specialized binding preferences. In contrast to the conclusion made by Diehl et al. (26) assuming three anchor residues (I/L in position 2, P in position 3, and L in position 9), our results indicate that position 2 seems to be less important for peptide binding. The preferred length of HLA-G*01:01 ligands is 9 aa with a sparse peptide overlap to the analyzed HLA-C alleles. The SYFPEITHI matrix for nonameric HLA-G*01:01 reveals a strong performance with a TP rate of 0.91 and an FP rate of 0.05 using the C1R-HLA-G*01:01 peptide data set, whereas NetMHCpan-3.0 exhibits a random prediction with a TP rate of 0.67 and an FP rate of 0.65.
In summary, the peptide motif of HLA-G*01:01 was uncovered, making use of >2200 HLA-G*01:01 ligands. The SYFPEITHI matrix for nonamers outperforms prediction by NetMHCpan-3.0.
Disclosures
H.-G.R. is a shareholder of Immatics Biotechnologies GmbH (Tübingen, Germany) and CureVac GmbH (Tübingen, Germany). H.S. is an employee of Immatics Biotechnologies GmbH. The authors declare that Immatics did not provide financial or scientific support in any direct relation to this manuscript or the underlying studies and was not involved in data collection, analysis, or decision to publish. The other authors have no financial conflicts of interest.
Acknowledgments
We thank Claudia Falkenburger, Zsofia Bittner, and Martin Laure for supporting some of the cell culture experiments and Beate Pömmerl for PCR of isolated plasmid DNA.
Footnotes
This work was supported by the European Union (European Research Council Grant AdG339842 Mutaediting), the Deutsche Forschungsgemeinschaft (SFB 685 and GRK 794), and the Interfaculty Center for Pharmacogenomics and Pharma Research Graduate School at Tübingen–Stuttgart.
The online version of this article contains supplemental material.
Abbreviations used in this article:
- AUC: area under the curve
- FDR: false discovery rate
- FP: false positive
- ILT2/4: Ig-like transcript 2/4
- KIR: killer cell Ig-like receptor
- LC-MS/MS: liquid chromatographic tandem mass spectrometry
- ROC: receiver operating characteristic
- TP: true positive
- Received July 3, 2017.
- Accepted August 12, 2017.
- Copyright © 2017 by The American Association of Immunologists, Inc.