Edward Jenner Institute for Vaccine Research, Compton, Berkshire, United Kingdom
| Abstract |
|---|
), DR3 (Glu9β, Gln70β, and Gln/Arg74β), DR4 (Glu9β, Gln/Arg70β, and Glu/Ala74β), DR5 (Glu9β, Asp70β), and DR9 (Lys/Gln9β); DQ1 (Ala/Gly86β), DQ2 (Glu86β, Lys71β), and DQ3 (Glu86β, Thr/Asp71β); DPw1 (Asp84β and Lys69β), DPw2 (Gly/Val84β and Glu69β), DPw4 (Gly/Val84β and Lys69β), and DPw6 (Asp84β and Glu69β). Apart from the good agreement between known binding motifs and our classification, several new supertypes, and corresponding thematic binding motifs, were also defined.
| Introduction |
|---|
A principal feature of MHC molecules is their allelic polymorphism: the July 2004 ImMunoGeneTics/HLA database release lists 1114 class I and 707 class II molecules (3). Such polymorphism presumably enhances the probability of mounting an immune response by at least a subset of individuals within a population, ultimately increasing the chance of group survival against infection (4). Unlike many proteins, MHC alleles have arisen under a specific and discernible evolutionary pressure, adapting to a fitness landscape mediated by geographically constrained infectious disease. Moreover, any poly-epitope vaccine targeting the whole population would, on the same basis, need to bind a range of HLA molecules. Gulukota and DeLisi (5) found that three to six class I HLA alleles, depending on the ethnic group, would cover 90% of the population. Indeed, because of linkage disequilibrium (the joint probability of a given allelic pair is not generally equal to the product of their individual probabilities, Pij ≠ PiPj), it is not necessarily optimal to choose the alleles with the highest individual frequencies.
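The inequality above can be written with the standard linkage disequilibrium coefficient. This is the textbook definition, given here only as a notational aid; the symbol D is not used elsewhere in this paper.

```latex
D_{ij} = P_{ij} - P_i P_j , \qquad D_{ij} \neq 0 \ \text{under linkage disequilibrium}
```

When D is nonzero, the joint frequency of an allelic pair cannot be recovered from the two individual frequencies alone, which is why ranking alleles by individual frequency need not maximize population coverage.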
The peptide binding site of MHC molecules is composed of a single protein chain for class I and two separate chains in class II. X-ray data reveal that the walls of the cleft are formed by two antiparallel helices and the floor is formed by an eight-stranded β-sheet (6, 7, 8, 9, 10). In MHC class I molecules, the ends of the cleft are closed off, generally allowing only short peptides of 8–11 aa to bind. In contrast, the cleft in class II is open-ended, allowing much longer peptides to bind, even though only 9 aa occupy the site itself. Both clefts have binding pockets, corresponding to primary and secondary anchor positions on the binding peptide. The combination of two or more anchors is called a motif. It has been found that certain class I alleles can recognize similar motifs (11, 12, 13, 14) and thus be grouped into HLA "supertypes", binding common "supermotifs". The classification of MHC molecules into supertypes, based on structural features and/or peptide specificity, is of prime importance in the development of epitope-based vaccines (15, 16). The experimental determination of motifs for every allele is prohibitively expensive in terms of labor, time, and resources. The only comprehensive, yet practical, alternative is a bioinformatic approach.
Chemometric methods are widely used and extensively validated in computational chemistry for structural classifications (17, 18). Recently, we proposed a "three-dimensional (3D) supertype fingerprint" approach which classifies alleles on the basis of information from the structure of the binding sites using two chemometric techniques: principal component analysis and hierarchical clustering (18). We applied this approach to class I MHC molecules belonging to the HLA-A, HLA-B, and HLA-C loci and showed that only 13 aa are sufficient for an allele to be classified within a particular supertype.
In the present study, a combined two-dimensional-3D approach was applied to class II HLA molecules belonging to the DR, DQ, and DP loci, identifying a consensus supertype classification. In contrast to class I, supertypes and supermotifs for class II MHC molecules have not been widely studied. There are only a few classifications for class II molecules: three for HLA-DR molecules (3, 19, 20) and one each for HLA-DQ (21) and HLA-DP (22). Here, we use two clustering techniques, hierarchical and nonhierarchical, applied to both the sequences of HLA class II proteins and their 3D structures. Clustering is a data analysis technique that, when applied to a set of heterogeneous items, identifies homogeneous subgroups as defined by a given model or measure of similarity. For a detailed review see Ref. 23.
In hierarchical clustering, the data set is analyzed iteratively: at each step either a pair of clusters is merged (agglomerative) or a single cluster is divided (divisive). Determining the number of "natural" clusters is among the most difficult problems in clustering and, to date, no general solution has been found. Agglomerative hierarchical clustering was applied to HLA class II molecules using similarity fields generated by Comparative Similarity Indices Analysis (CoMSIA) (24, 25). CoMSIA is widely used in 3D molecular design to model the interactions between small molecules and proteins (26, 27, 28, 29, 30, 31, 32).
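The agglomerative procedure described above can be sketched in a few lines. This is a minimal illustration only, not the Sybyl implementation used in this study: it assumes Euclidean distances between plain coordinate tuples and the complete-linkage rule named in Materials and Methods, and the function name `complete_linkage` is hypothetical.

```python
import math
from itertools import combinations


def complete_linkage(points, n_clusters):
    """Merge clusters bottom-up until n_clusters remain.

    points: list of equal-length coordinate tuples (illustrative input).
    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Complete linkage: distance between the most distant pair of members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Pick the pair of clusters that are closest under complete linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return [sorted(c) for c in clusters]
```

Stopping at a chosen `n_clusters` corresponds to cutting the dendrogram at one level; the text notes that choosing this level has no general solution.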
Nonhierarchical methods generate a specific number of disjoint, flat, unconnected clusters. K-means clustering is a nonhierarchical method in which the dataset is partitioned into k clusters by choosing an initial set of k seed compounds to act as initial cluster centers. Each compound is assigned to its nearest cluster and cluster membership is iteratively refined by shifting compounds between clusters until stability is achieved, i.e., no compounds are moved from one cluster to another. The k-means method was applied to a set of z-scales, as defined by Hellberg et al. (33) and extended by Sandberg et al. (34), which describe the most important properties of each amino acid within the HLA class II binding site. In the field of quantitative structure-activity relationships, z-descriptors are used to model the interactions between peptides and proteins (35, 36, 37, 38, 39).
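The k-means loop described above (seed, assign, recompute, repeat until no item moves) can be sketched as follows. This is a generic illustration under stated assumptions, not the MDL QSAR routine used in this study: distances are Euclidean, seeds are supplied by the caller, and the name `k_means` is hypothetical.

```python
import math


def k_means(points, seeds, max_iter=100):
    """Partition points into len(seeds) clusters.

    points, seeds: lists of equal-length coordinate tuples.
    Returns (assignment, centers), where assignment[i] is the cluster
    index of points[i].
    """
    centers = [tuple(s) for s in seeds]
    assignment = None
    for _ in range(max_iter):
        # Assign every item to its nearest current center.
        new_assignment = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # stability: no item changed cluster
        assignment = new_assignment
        # Recompute each center as the mean of its members.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centers
```

The result depends on the seeds, which is one reason the number of clusters is taken from the hierarchical analysis rather than chosen independently.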
Based on the consensus of the classifications, made by the hierarchical and nonhierarchical methods, twelve class II supertypes were defined: five DR, three DQ, and four DP. Fingerprints for each supertype were also identified. In a similar way to our existing analysis of class I supertypes (40), it was found that 13 aa are sufficient to distinguish between class II supertypes.
| Materials and Methods |
|---|
The protein sequences of HLA class II molecules were collected from the ImMunoGeneTics/HLA database (3). Only the first 80 aa of the α-chains and the first 90 residues of the β-chains were required as they formed the binding site. Sequences missing >10 aa at the N terminus were removed. All possible combinations of α- and β-chains for DQ and DP were modeled. This generates 738 DQ molecules (18 DQA × 41 DQB) and 1140 DP molecules (12 DPA × 95 DPB). Because the HLA-DR α-chain has yet to exhibit binding site polymorphism, only the HLA-DR β-chains were varied, generating 347 HLA-DRB structures. The rotameric placement of amino acid substitutions was modeled using SCWRL2.8 (41), with invariant amino acids held rigid. The x-ray structures of 1PYW (42) for DRB and 1JK8 (43) for DQ were used as templates for the backbone and invariant side chains. As x-ray data for DP molecules are not available, a homology model of the DP α1β1 domain (DPA1*0103/DPB1*0101) was derived using the structure of HLA-DR1 (DRB1*0101) (PDB: 1PYW) as a template. The de facto standard protein homology modeling system MODELLER was used (44). We applied standard settings for all parameters (45). Deletions were made at positions 23 and 24 in the β-chain. The resulting DP structure was used as the basis for side chain placement using SCWRL2.8. In the multivariant cases only the first molecule (*xxxx01) was considered. For use in Sybyl, all homology models were aligned to the starting structure. Hydrogen atoms and Kollman charges were added for each molecule. To avoid the influence of polymorphism outside the binding site ("polymorphic noise"), only the amino acids forming the binding site ("polymorphic signal") were considered, with conserved binding site amino acids also omitted. The amino acids forming the binding sites were selected on the basis of x-ray data of peptide-class II protein complexes (Refs. 3, 9, 10, 21, 22, 42, 43, 46, 47) (Table I).
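The molecule counts quoted above follow from pairing every α-chain with every β-chain. The short sketch below reproduces that arithmetic; the chain labels are placeholders generated for counting purposes, not real allele names, and `pair_chains` is a hypothetical helper.

```python
from itertools import product


def pair_chains(alpha_chains, beta_chains):
    """Return every alpha/beta combination as an 'alpha/beta' label."""
    return [f"{a}/{b}" for a, b in product(alpha_chains, beta_chains)]


# Placeholder labels only; the counts (18, 41, 12, 95, 347) come from the text.
dqa = [f"DQA_{i:02d}" for i in range(1, 19)]   # 18 DQ alpha-chains
dqb = [f"DQB_{i:02d}" for i in range(1, 42)]   # 41 DQ beta-chains
dpa = [f"DPA_{i:02d}" for i in range(1, 13)]   # 12 DP alpha-chains
dpb = [f"DPB_{i:02d}" for i in range(1, 96)]   # 95 DP beta-chains

dq_molecules = pair_chains(dqa, dqb)
dp_molecules = pair_chains(dpa, dpb)
n_dr = 347  # only the DR beta-chain varies, so no pairing is needed

total = len(dq_molecules) + len(dp_molecules) + n_dr
```

Here `len(dq_molecules)` is 738, `len(dp_molecules)` is 1140, and `total` is 2225, the number of class II molecules analyzed.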
CoMSIA was used as implemented in Sybyl version 6.9 (Tripos, 2004). The 3D structures of proteins belonging to the same locus were aligned: the x-ray structure 1PYW was used as a template for DR (42), the x-ray structure 1JK8 (43) was used for DQ, and the modeled DP structure was used for DP molecules. The amino acids outside the binding site were excluded. The grid had a resolution of 2.0 Å and extended beyond the molecular dimensions by 4.0 Å in all directions. At each grid point, a similarity index between the probe and the target molecule is calculated using a Gaussian-type distance-dependent function. Similarity index fields were generated with an attenuation factor α = 0.3. The attenuation factor determines the steepness of the Gaussian-type function. The probe used had a 1 Å radius, charge +1, hydrophobicity +1, hydrogen-bond donor +1, and hydrogen-bond acceptor +1 properties. The agglomerative hierarchical clustering (23) option of Sybyl version 6.9 was applied to CoMSIA fields. According to this technique the clusters are built from the bottom up, first by merging individual items into clusters, and then by merging clusters into superclusters, until the final merge brings all items into a single cluster. The distance between the clusters was calculated using the complete-linkage method, i.e., using the distance between the most distant pair of data points in both clusters. The last four levels of the hierarchy were considered for supertype definition.
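A Gaussian-type similarity index of the kind described above can be sketched as below. The functional form, a negative sum of probe value times atom value times exp(-αr²), is the commonly quoted CoMSIA expression and is an assumption here, since the text does not spell it out; Sybyl's internal handling (units, probe radius, per-field atom values) is not reproduced. The names `similarity_index` and `grid_axis` are hypothetical.

```python
import math


def similarity_index(grid_point, atoms, probe_value=1.0, alpha=0.3):
    """Similarity index at one grid point for one property field.

    atoms: list of (position, property_value) pairs, position as (x, y, z).
    alpha: attenuation factor controlling how steeply the Gaussian decays.
    """
    total = 0.0
    for position, atom_value in atoms:
        # Squared distance between the probe and this atom.
        r2 = sum((g - a) ** 2 for g, a in zip(grid_point, position))
        total += probe_value * atom_value * math.exp(-alpha * r2)
    return -total  # sign convention assumed from the commonly quoted form


def grid_axis(lo, hi, spacing=2.0, margin=4.0):
    """Grid coordinates along one axis, padded by margin on both sides."""
    start = lo - margin
    n = int(math.floor((hi + margin - start) / spacing)) + 1
    return [start + i * spacing for i in range(n)]
```

Because the Gaussian never diverges at short distances, no arbitrary cutoff is needed near atom centers; a larger `alpha` makes each atom's contribution more local.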
Nonhierarchical clustering on z-scores
The protein sequences for each class II locus were aligned. As in the CoMSIA study, amino acids outside the binding site were excluded. Each amino acid was described by five z-descriptors: z1 (hydrophobicity), z2 (steric bulk), z3 (polarity), z4, and z5 (electronic effects) (34). An X-matrix was formed for each locus. Rows corresponded to the number of proteins and columns equaled five times the number of polymorphic amino acids in the binding site. The X-matrices were imported into MDL QSAR version 2.2. K-means clustering was applied, with the initial set of k seeds equal to the number of clusters generated by the hierarchical clustering. The members of the clusters generated by the hierarchical and nonhierarchical clustering were compared, and the commonly clustered members were calculated as a percentage of all alleles for every locus.
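The construction of the X-matrix can be illustrated as follows: each protein becomes one row, and each polymorphic binding-site residue contributes five columns. The descriptor table in the usage below contains toy values chosen only to show the layout; they are not the published z-scale values of Sandberg et al. (34), and `build_x_matrix` is a hypothetical helper.

```python
def build_x_matrix(binding_site_sequences, z_table):
    """Encode aligned binding-site sequences as rows of z-descriptors.

    binding_site_sequences: equal-length strings of one-letter residue codes,
        restricted to the polymorphic binding-site positions.
    z_table: mapping residue -> tuple of five descriptor values.
    Returns a list of rows, each 5 * (number of positions) long.
    """
    matrix = []
    for sequence in binding_site_sequences:
        row = []
        for residue in sequence:
            row.extend(z_table[residue])  # z1..z5 for this position
        matrix.append(row)
    return matrix


# Toy descriptor values for illustration only (not real z-scales).
toy_z = {
    "A": (0.0, 0.0, 0.0, 0.0, 0.0),
    "K": (1.0, 1.0, 1.0, 1.0, 1.0),
}
x_matrix = build_x_matrix(["AK", "KA", "KK"], toy_z)
```

With three sequences of two positions each, `x_matrix` has three rows and ten columns, matching the rows-by-5n layout described above.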
| Results |
|---|
Making use of similarity fields generated by CoMSIA, agglomerative hierarchical clustering was applied to amino acids forming the binding sites on HLA class II molecules (Table I). CoMSIA is a 3D grid method, in which a probe is placed at all lattice points in a regular 3D grid in and around the target molecule (24, 25). At each point, a similarity index between probe and target is calculated using a Gaussian-type distance-dependent function. Five similarity fields were calculated: steric bulk, electrostatic potential, local hydrophobicity, and hydrogen-bond donor and acceptor abilities. In hierarchical clustering, each level defines a partition of the data set into clusters. However, in general it is not clear which level is best in terms of splitting the data set into a "natural" number of clusters, so that each cluster contains the most appropriate compounds (23). In the present study, the number of clusters was selected to be in good agreement with previous classifications and known binding motifs, where these are available. Usually, the last four levels were considered for the supertype definition. The number of clusters defined in the hierarchical clustering was used as the input k cluster number in the nonhierarchical k-means clustering.
Nonhierarchical k-means clustering was applied to a set of z-properties describing each amino acid of the HLA class II binding site. These z-scales, as defined by Hellberg et al. (33), reflect the most important properties of amino acids and are referred to as "principal properties". These scales were derived by principal component analysis from a data matrix consisting of a large number of physicochemical variables, such as m.w., pKas, 13C NMR-shifts, etc. The first principal component (PC) reflects amino acid hydrophobicity, the second PC reflects their size, and the third, their polarity. The three PCs are labeled: z1-, z2- and z3-scales, respectively. More recently, Sandberg et al. (34) extended the three z-scales to five, adding z4 and z5, which account for electronic effects of the amino acids. With these z-scales, it is possible to quantify numerically the structural variations within a series of related peptides, by arranging the z-scales according to the amino acid sequence. In the present study the five z-scales were used to describe the polymorphic amino acid sequences of the binding site of HLA class II molecules (Table I).
The common members of the clusters derived by both methods were expressed as a percentage of all alleles for every locus. Supertype fingerprints were defined on the basis of common amino acids found within the multiple sequence alignment of alleles belonging to one supertype.
HLA-DR supertypes
Hierarchical clustering.
The hierarchical clustering, using CoMSIA fields, of HLA-DRB is plotted in Fig. 1, and the detailed content of each cluster is shown in Table II. At the second level of the hierarchy there are three clusters: two small clusters flanking one huge central cluster. At this level, the clustering is associated with polymorphism around pocket 9. The leftmost cluster is composed of structures with Trp9β, the middle cluster has Glu9β, and the rightmost has Lys/Gln9β. As the structures in the middle cluster were still quite diverse, mainly around binding pocket 4 (β70, β71, and β74), this group was further subdivided. At the fourth level there are five roughly equally sized clusters, which corresponds well with known DR binding motifs. The clusters were defined as supertypes and named after the lowest serotype included.
. Residues β9, β30, β38, β57, and β61 are part of pocket 9, which accommodates peptide residue 9. Trp9β makes the pocket shallow and allows the binding of small nonpolar residues: Leu, Ala, Gly, and Pro. The binding motif for HLA-DRB1*0101 favors Leu or Ala at position 9 (Table II) and that for HLA-DRB1*1501 favors Gly, Ser, Pro, or Thr (Table II). Trp9β is the fingerprint residue for the DR1 supertype.
The second cluster, called the DR3 supertype, comprises DR3 (DRB1*0301–25), DR52 (DRB3*0101–10, DRB3*0201–18, DRB3*0301–03), DRB1*0422, and DRB1*1107. The common features for this cluster are Gln70β, Lys71β, and Gln/Arg74β. Residues β70, β71, and β74 are the polymorphic residues forming pocket 4. This pocket binds one of the main anchor residues for MHC class II molecules (9, 10, 19). Amino acid side chain charges are important for interaction between TCRs and peptide-DR complexes (48). Due to Lys71β the total charge in pocket 4, for this supertype, is positive, which corresponds well to the preference for negatively charged Asp and Glu at position 4 in the DRB1*0301 binding motif (55, 56, 57). DR3 supertype fingerprint residues are Gln70β, Lys71β, and Gln/Arg74β.
The third cluster consists of DR4 (DRB1*0402, 12, 15, 25, 36, 37, 47), DR5 (DRB1*1101–47, DRB1*1201–09), DR6 (DRB1*1301–62, DRB1*1403, 16, 22, 25, 27, 40), and DR8 (DRB1*0801–25). As most DR4 alleles go into another cluster, this supertype was named after the next numerically lowest serotype, DR5. The common feature for all MHC molecules belonging to this supertype is Asp70β. To exclude certain DR51 alleles with Asp70β, which belong to another supertype, Glu9β is added to the DR5 supertype fingerprint. It is probable that Asp70β is negatively charged in the pocket. When Glu71β is next to Asp70β, as in DRB1*0402, negatively charged residues (Asp and Glu) at peptide position 4 are detrimental to binding (61). If Arg or Lys is present at position β71, the pocket becomes both positively and negatively charged and can accommodate neutral amino acids like Leu, Val, and Ile, as in the DRB1*1101 binding motif (59, 64).
The fourth cluster includes DR4 (DRB1*0401, 03–48 without alleles from the DR5 supertype), DR5 (DRB1*1113, 17, 26, 34, 42), DR6 (DRB1*1309, DRB1*1401–48), DR10 (DRB1*1001), and DR53 (DRB4*0101–06). This cluster was called the DR4 supertype. The DR4 supertype fingerprint is Gln/Arg70β, Arg/Lys71β, and Glu/Ala74β. Arg or Lys at position β71 makes pocket 4 positively charged, and residues like Arg and Lys at peptide position 4 become detrimental for MHC binding, as is evident from binding motifs for DRB1*0401 and DRB1*0404 (58, 59, 60, 61).
The last cluster, which we call the DR9 supertype, is composed of DR9 (DRB1*0901 and DRB1*0902) and DR51 (DRB5*0101–12, DRB5*0202–05), and has the fingerprint Lys/Gln9β. Lys/Gln9β coexists with Asp11β. Both residues take part in the formation of binding pocket 9. Asp11β makes the pocket negatively charged, and peptides with Arg and Lys at position 9 are preferred, as is evident from the DRB5*0101 binding motif (53, 54).
Nonhierarchical clustering. The contents of clusters derived by nonhierarchical clustering are given in Table II. The main discrepancies concern alleles DR3 (DRB1*0301–25), DR7 (DRB1*0701–07), and DR51 (DRB5*0101–12, DRB5*0202–05). The first was classified as DR3 by hierarchical clustering and as DR4 by nonhierarchical. The second was considered part of the DR1 supertype by hierarchical clustering and as part of DR9 by nonhierarchical. The third belongs to the DR9 supertype according to hierarchical clustering and to DR5 according to nonhierarchical. Despite these differences, 82% (285 of 347) of the DR alleles were classified in the same supertype by both clustering methods.
HLA-DQ supertypes
Hierarchical clustering.
The hierarchical clustering of HLA-DQ molecules is shown in Fig. 2 and the cluster contents are listed in Table III. Two clusters exist at the first level. The structural differences here concern the polymorphic region 84–87 of the β-chain. Alleles from the left cluster contain Gln84β, Leu85β, Glu86β, and Leu87β (DQB1*02, 03, and 04), whereas the right cluster members have Glu84β, Val85β, Ala86β or Gly86β, and Tyr87β or Phe87β (DQB1*05 and 06). Position β86 is part of pocket 1, together with the amino acids at positions 24, 31, and 52 from the α-chain. X-ray data for DQ8 (DQA1*0301/DQB1*0302) indicate that pocket 1 is lined by two positively charged side chains (His24α and Arg52α) inside the entrance and two negatively charged residues (Glu31α and Glu86β) deeper in the pocket (43). Together, they form a hydrogen-bonding network. The replacement of Glu86β with Ala or Gly destroys this network and the side chain of Arg52α might reorient closer to the pocket entrance. This might explain the pronounced intolerance of positively charged amino acids at position 1 according to the DQA1*0102/DQB1*0602 binding motif (65). The cluster with a fingerprint Ala/Gly86β was called the DQ1 supertype. It includes the serotypes DQ1 (DQB1*0611, 12), DQ5 (DQB1*0501, 02, 03), and DQ6 (DQB1*0601–05, 09) and the rest of the molecules containing DQB1*05 or 06.
, which is the main difference compared with other DQ α-chains. As these alleles were not classified as a separate cluster by the nonhierarchical clustering, we decided to define them as outliers rather than as a separate supertype. The alleles from the other two clusters differ in several positions at the binding site and correspond well to known motifs.
The cluster containing the serotype DQ2 (DQB1*0201, 02, and 03) was labeled the DQ2 supertype. Its fingerprint is Glu86β and Lys71β. The amino acid at position 71 takes part in the formation of pockets 4 and 7. DQ2 is the only serotype with Lys at position 71. This gives the pockets a strongly basic character, a factor accounting for the almost absolute requirement for acidic residues, principally at peptide position 7, but also at position 4 (67). A computer model of DQ2 with the HLA class I α46–60 peptide in the binding cleft indicates that Lys71β makes a direct salt-bridged hydrogen bond to the glutamic acid or aspartic acid at position 7 of the peptide (67).
The cluster named DQ3 includes several serotypes: DQ3 (DQB1*0306), DQ4 (DQB1*0401, 02), DQ7 (DQB1*0301, 04), DQ8 (DQB1*0302, 05), DQ9 (DQB1*0303), and the rest of the alleles carrying DQB1*03 or 04. Its fingerprint is Glu86β and Thr71β (DQB1*03) or Asp71β (DQB1*04). Peptide positions 4 and 7 must be aliphatic amino acids (Table III), although Godkin et al. (72) found that Arg is also tolerated.
Nonhierarchical clustering. The nonhierarchical clustering did not classify DQA1*03 alleles in a separate cluster. For the rest of the DQ molecules, we found an 83% (615 of 738) agreement with the hierarchical classification (Table III).
HLA-DP supertypes
Hierarchical clustering.
The DP dendrogram is plotted in Fig. 3 and the cluster members are listed in Table IV. Like the DR locus, DP clustering depends on the polymorphism of the β-chain only. Two large clusters are apparent at the first level of the hierarchy. The left cluster comprises alleles with Asp84β, Glu85β, Ala86β, and Val87β. The right one includes alleles with Gly84β or Val84β, Gly85β, Pro86β, and Met87β. All four residues play a major role in forming the contact area between α- and β-chains and only position β84 and partly β85 are involved in forming the surface of pocket 1 (46). Further division appears at the second level of the hierarchy, and it is connected with position β69, which is involved in forming pockets 4 and 6 (47). Clustering at the third level is contingent upon positions β55 and β56. Position β55 is part of pocket 9. At this level of the hierarchy, DPB1*0401 and 0402 split into separate clusters. As recent experimental data indicate that they probably belong to the same supertype (22), the second level of the dendrogram was chosen for definition of the DP supertypes.
and Lys69β. The MHC molecules forming this supertype have a negatively charged pocket 1 and a positively charged pocket 4. To the best of our knowledge, no binding motif is currently available for any of the alleles in this supertype. One may imagine that peptides with complementary charges (positively charged amino acids at position 1 and negatively charged residues at position 4) should, in general, bind well to members of this supertype.
The second cluster involves DPw6 (DPB1*0601), DPB1*08, 09, 10, 11, 13, 16, 17, 19, 21, 22, 29, 30, 37, 44, 54, 55, 58, 69, 88, and 93. This supertype was called DPw6. All alleles have Asp84β and Glu69β, except DPB1*1101 and DPB1*6901, which have Arg69β. Unfortunately, no binding motif was available for any alleles from this supertype. Again, peptides with complementary charges (positively charged amino acids at positions 1 and 4) may be supposed to bind well to this supertype.
The third cluster, called the DPw2 supertype, consisted of DPw2 (DPB1*0201 and 0202), DPB1*32, 33, 41, 46, 47, 48, 71, 81, 86, and 95. Its fingerprint is Gly84β or Val84β and Glu69β. Alleles of the supertype have a deep, nonpolar pocket 1 capable of accepting bulky amino acids, as is evident from the available binding motif (Table IV) (74). Due to a negatively charged Glu69β, pocket 4 of HLA-DP2 showed high affinity for peptides with positively charged residues at this position (47).
DPw4 (DPB1*0401 and 0402), DPB1*15, 18, 23, 24, 28, 34, 39, 40, 49, 51, 53, 59, 60, 62, 66, 72, 73, 74, 75, 77, 80, 83, 94, 96, and 99 form the last cluster, which we call the DPw4 supertype. Its fingerprint is Gly84β or Val84β and Lys69β; only DPB1*1501 and 74 have Arg69β. Again, consistent with our results, known motifs for DPB1*0401 and 0402 indicate preferences for bulky aromatic amino acids at positions 1 and 4 (Table IV) (22).
Nonhierarchical clustering.
The contents of clusters derived by nonhierarchical clustering are listed in Table IV. Among several minor differences, DPB1*11 and 69 (Arg69β) are clustered into the DPw1 supertype (Lys69β), as opposed to DPw6 (Glu69β), which was identified by hierarchical clustering. In total, 85% (972 of 1140) of the DP molecules were clustered into the same supertype by both methods.
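The per-locus agreement figures reported in this section can be checked with a few lines of arithmetic. The counts are taken directly from the text; the helper name `percent_agreement` is introduced here for illustration only.

```python
def percent_agreement(common, total):
    """Share of molecules placed in the same supertype by both methods."""
    return 100.0 * common / total


# (commonly clustered, total) per locus, as reported in the Results.
agreement = {"DR": (285, 347), "DQ": (615, 738), "DP": (972, 1140)}

per_locus = {
    locus: round(percent_agreement(common, total))
    for locus, (common, total) in agreement.items()
}

# Pooling the three loci gives the overall consensus.
pooled_common = sum(common for common, _ in agreement.values())
pooled_total = sum(total for _, total in agreement.values())
pooled_percent = round(percent_agreement(pooled_common, pooled_total))
```

The rounded values are 82, 83, and 85% for DR, DQ, and DP, and the pooled counts are 1872 of 2225 molecules, or 84%.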
| Discussion |
|---|
In this study, we have applied a combined bioinformatics approach, using both protein sequence and structural data, to 2225 HLA class II molecules, to detect similarities in their peptide binding sites and to define supertype fingerprints. Two chemometric techniques were used: hierarchical clustering on 3D CoMSIA fields and nonhierarchical k-means clustering on sequence-based z-descriptors. The former method classifies the molecules on the basis of binding site similarities, in terms of steric bulk, electrostatic potential, local hydrophobicity, and hydrogen-bond-donor and acceptor abilities. The latter method uses five principal properties (z-scales) of the amino acids and classifies the proteins according to their sequence-based binding site similarities.
An average consensus of 84% was achieved, i.e., 1872 of 2225 class II molecules were classified in the same supertype by both techniques. Twelve class II supertypes were defined: five DRs, three DQs, and four DPs. The DR supertypes are DR1 (fingerprint Trp9β), DR3 (Glu9β, Gln70β, and Gln/Arg74β), DR4 (Glu9β, Gln/Arg70β, and Glu/Ala74β), DR5 (Glu9β, Asp70β), and DR9 (Lys/Gln9β). The DQ supertypes are DQ1 (Ala/Gly86β), DQ2 (Glu86β, Lys71β), and DQ3 (Glu86β, Thr/Asp71β) and the DP supertypes are DPw1 (Asp84β and Lys69β), DPw2 (Gly/Val84β and Glu69β), DPw4 (Gly/Val84β and Lys69β), and DPw6 (Asp84β and Glu69β). Apart from the good agreement between known binding motifs and our classification, several new supertypes have been defined and thematic binding motifs have been outlined for them. In the following, we discuss the congruence of our systematic structural analysis of binding with extant data on the biology of class II human MHCs, rather than making unsupported speculations.
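To show how such fingerprints could be applied, the sketch below matches the β-chain residues of a DR molecule against the five DR fingerprints listed above. It is an illustration only: the lookup-table encoding and the function name `classify_dr` are assumptions, the fingerprints are transcribed from the text using one-letter codes, and a list is returned because the text does not state that fingerprints are mutually exclusive for every allele.

```python
# DR supertype fingerprints from the text: beta-chain position -> allowed residues.
DR_FINGERPRINTS = {
    "DR1": {9: {"W"}},
    "DR3": {9: {"E"}, 70: {"Q"}, 74: {"Q", "R"}},
    "DR4": {9: {"E"}, 70: {"Q", "R"}, 74: {"E", "A"}},
    "DR5": {9: {"E"}, 70: {"D"}},
    "DR9": {9: {"K", "Q"}},
}


def classify_dr(residues):
    """Return the DR supertypes whose fingerprint matches.

    residues: mapping of beta-chain position -> one-letter residue code.
    A missing position never matches a fingerprint that requires it.
    """
    return [
        name
        for name, fingerprint in DR_FINGERPRINTS.items()
        if all(residues.get(pos) in allowed for pos, allowed in fingerprint.items())
    ]
```

For example, `classify_dr({9: "E", 70: "Q", 74: "A"})` returns `["DR4"]`, while `classify_dr({9: "W"})` returns `["DR1"]`; a molecule matching no fingerprint yields an empty list.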
HLA-DR molecules account for >90% of the HLA class II isotypes expressed on APCs (83). Although the HLA-DRA locus is monomorphic, >300 alleles have been described for the HLA-DRB1 locus (3). X-ray data indicate that 12 hydrogen bonds exist between conserved DR atoms and main-chain atoms of the bound peptide (9). As they do not involve the side chains of the peptide, these hydrogen bonds are likely to play a common role in peptide binding to HLA-DR.
Five binding pockets, pockets 1, 4, 6, 7, and 9 (named after the corresponding positions on the binding peptide), were found to be common for most DR proteins (9, 10). Specificity of pocket 1 is modulated by a Gly/Val86β dimorphism. DR proteins with Gly86β show strong preferences for large hydrophobic side chains (Trp, Tyr, Phe) at peptide position 1, whereas Val86β restricts the pocket size and alters the preferences to small hydrophobic side chains (Val and Ala) at this position. The main difference in the preferences concerns bulky aromatic residues (Trp, Tyr, and Phe), which are not accepted at pocket 1 when it contains Val86β. However, the medium sized hydrophobic amino acids Leu and Ile are well accepted in all DR molecules and peptide position 1 could not be considered as an anchor able to distinguish between different DR alleles.
Pocket 4 is formed by polymorphic amino acids at positions β13, β26, β28, β70, β71, β74, and β78. Residues at positions β70, β71, and β74 play a significant role both in protein binding and T cell recognition (Refs. 4, 9, 10, 19). Residues β71 and β74 also take part in the formation of pockets 6 and 7 (9, 84). Ou et al. (19) made a functional categorization of DR alleles on the basis of pocket 4 polymorphism, associating each group with certain autoimmune diseases. Good agreement was found between this categorization and our classification. The DR3 supertype corresponds to the functional DR restrictive supertype pattern (RSP) "R". It contains the pattern Gln70β, Lys71β, and Arg/Gln74β and the overall charge within pocket 4 is positive, which requires negatively charged amino acids Asp and Glu at position 4 of the binding peptide (Table II, motif DRB1*0301). This supertype is associated with two autoimmune diseases: systemic lupus erythematosus and Hashimoto's thyroiditis (19, 83). The DR4 supertype corresponds to DR RSP "A" (19). Its pattern, Gln/Arg70β, Arg/Lys71β, and Glu/Ala74β, is close to that of DR RSP "R", differing only in position β74. When Ala appears at β74, pocket 4 increases in size and can accommodate larger amino acids such as Phe, Trp, and Ile (Table II, motifs DRB1*0401, 04, 05). Unfortunately, no binding motif is available for any allele bearing Glu74β, but one could suppose that small polar residues, like Ser and Thr, will be accepted. This supertype is associated with a susceptibility to rheumatoid arthritis (19, 83). The DR5 supertype corresponds to DR RSP "D" with pattern Asp70β, Glu/Arg71β, and Leu/Ala74β (19). The main feature here is the negatively charged Asp at position β70, which restricts the accommodation of negatively charged amino acids at peptide position 4 (Table II, motif DRB1*0402). Juvenile rheumatoid arthritis (JRA), pemphigus vulgaris, and allergic bronchopulmonary aspergillosis are autoimmune diseases associated with this supertype (19).
Residues β9, β30, β37, β38, β57, and β61 are involved in the formation of pocket 9 (9, 84). The polymorphism at β9 determines the pocket size and hence binding motif preferences at this position. The clustering at the first and second level of the DR dendrogram (Fig. 1) is associated with the β9 polymorphism. Trp9β is the fingerprint for the DR1 supertype, Lys/Gln9β for DR9, and Glu9β for DR3, DR4, and DR5. Small amino acids (Ala, Val, Gly, Ser, Thr, Pro) are accepted in pocket 9 of the DR1 supertype (Table II, motifs DRB1*0101, 1501). Glu9β, in combination with Asp57β, makes this pocket negatively charged, facilitating the accommodation of positively charged amino acids, such as Lys (motifs DRB1*0401, 0404) and His (motif DRB1*0402). In most MHC class II alleles, Asp57β makes a salt-bridged hydrogen bond with Arg76α, allowing the pocket to also accommodate aliphatic and polar amino acids (43). In cases where Asp57β is replaced by Ser (DRB1*0405) or Ala (DQ8), the hydrogen bonding network is destroyed and Arg76α can strongly attract negatively charged amino acids (Asp, Glu) available at position 9 of the binding peptide (motif DRB1*0405). Lys/Gln9β always coexists with Asp11β and Asp/Gly30β. Vogt et al. (53) suggested that the positively charged anchor residues R and K (motif DRB5*0101) may form a salt bridge with Asp at position 11 and/or position 30 of the DRB5*0101 molecule.
During the last 10 years, interest in HLA-DQ proteins has increased because certain DQ alleles are associated with susceptibility to type 1 diabetes and celiac disease (85, 86). The x-ray structure of DQ8 (DQA1*0301/DQB1*0302) complexed with an immunodominant peptide from insulin was solved (43). Several DQ binding motifs have been defined (65, 66, 67, 68, 69, 70, 71, 72, 73). The initial hypothesis was that class II molecules with non-Asp β57 (i.e., DQ2, DQ8, I-Ag7) preferentially bind peptides with a negatively charged anchor residue at peptide position 9, such as peptides from insulin β-chain, gliadin, and glutenin, and present them to islet-infiltrating T cells or mucosal T cells (87, 88, 89, 90). As was discussed above, the molecular explanation for this phenomenon is that Asp57β forms a salt bridge with Arg76α, whereas in non-Asp57β molecules Arg76α is free to interact with the negatively charged peptide anchor at position 9 (43). However, recent data do not support this hypothesis: not all non-Asp57β class II molecules have a preference for negatively charged anchor residues at peptide position 9 and should thus be associated with susceptibility to type 1 diabetes and celiac disease (69). For example, in the Japanese population the class II molecule DQA1*0301/DQB1*0401, which has the same α-chain as DQ8, but has a β-chain containing an Asp57β, is associated with increased susceptibility to type 1 diabetes (43). Other exceptions include molecules DQA1*0201/DQB1*0201 and DQ9 (DQA1*0301/DQB1*0303). The former does not contain Asp57β but is neutral-protective to type 1 diabetes (43), while the latter does contain Asp57β yet is associated with susceptibility to celiac disease (73).
The DQ classification defined in the present study is based on two important amino acids from the β-chain: positions β71 and β86. Residue β71 participates in the formation of pockets 4 and 7, while residue β86 is part of pocket 1. Pocket 1 is a deep, very polar pocket in HLA-DQ molecules, formed by two positively and two negatively charged amino acids, which form a hydrogen bonding network. Replacement of Gluβ86 with Ala or Gly will destroy this network and leave Argα52 free to contact the side chain of peptide position 1 (43). This is consistent with the strong intolerance for positively charged amino acids at position 1 for the DQ1 supertype (Table III, motif for DQA1*0102/DQB1*0602). Ala/Glyβ86 coexists with Phe/Tyrβ87. The latter residue is also part of pocket 1, and the Phe/Tyr→Leu replacement increases the pocket size. Large hydrophobic amino acids (Trp, Tyr, Phe) at position 1 are well accepted by alleles bearing Gluβ86/Leuβ87, which belong to supertypes DQ2 and DQ3 (Table III, motifs DQA1*0501/DQB1*0201, DQA1*0301/DQB1*0301), whereas alleles with Ala/Glyβ86 and Phe/Tyrβ87 (supertype DQ1) prefer medium sized hydrophobic or polar amino acids (Leu, Ile, Thr, Ser) (Table III, motif DQA1*0102/DQB1*0602).
DQ pocket 4 is significantly deeper than the corresponding pocket 4 in DR molecules (43). Lysβ71 accounts for the strong basic character of this pocket in DQ2 supertype molecules. Lysβ71 makes a salt bridge with acidic residues at position 7 of the binding peptide (67). Asp and Glu are preferred amino acids at positions 4 and 7 of the DQ2 binding motif (Table III). In the DQ3 supertype, Lysβ71 is replaced by Thrβ71, which coexists with Gluβ74. The latter residue makes the pocket negatively charged, and acidic residues (Asp and Glu) are not observed at this peptide position (motif DQA1*0301/DQB1*0301).
DQ alleles beginning DQA1*03 differ from other DQ alleles in having an additional Arg residue after Argα52. This affects the architecture of pocket 1 (21) and determines a preference for small to medium sized amino acids at peptide position 1, including aliphatic or negatively charged side chains (Table III, motif DQA1*0301/DQB1*0301). DQA1*03 alleles were classified as outliers and not as a separate supertype.
Apart from type 1 diabetes and celiac disease, HLA-DQ alleles are strongly associated with either protection against or susceptibility to other autoimmune diseases. Susceptibility to multiple sclerosis has been suggested for individuals with DQA1*0102/DQB1*0602 (91, 92); pemphigus vulgaris is associated with DQB1*0503 (93); rheumatoid arthritis with DQ3 (DQA1*03/DQB1*03 and DQA1*03/DQB1*04) and DQ5 (DQA1*0101/DQB1*0501) (94); systemic sclerosis with DQA1*0501 (95); and protection against type 1 diabetes with DQA1*0102/DQB1*0602 (96). Although these associations concern single HLA-DQ alleles, one could draw a more general conclusion, connecting susceptibility to multiple sclerosis, pemphigus vulgaris, or rheumatoid arthritis, as well as protection against type 1 diabetes, with alleles from the DQ1 supertype.
In contrast to HLA-DR and DQ, HLA-DP molecules have not been studied extensively, as they have been viewed as less important in immune responses than DR and DQ molecules. Moreover, no x-ray data currently exist for any peptide/HLA-DP complex. However, it is now known that HLA-DP proteins contribute to the risk of graft-vs-host disease (97, 98), and that some DP alleles are associated with chronic beryllium disease (99), sarcoidosis (100), and JRA (101). Both the α- and β-chains of HLA-DP are polymorphic, allowing multiple combinations, but only a few DP molecules are abundant globally. For example, DPA1*0103/DPB1*0401 and 0402 are overrepresented, carried by 76% of individuals in the Caucasian population (22).
The HLA-DP classification made in this study is based on two key amino acids of the DP β-chain: positions β69 and β84. These positions correspond to DR/DQ β71 and β86. Both are important for DQ classification, while only β71 takes part in the DR classification. Positions β84 and, to a lesser extent, β85 take part in the formation of pocket 1. Almost half (40 of 95) of the β-chains have Gly/Valβ84 and Glyβ85; the remainder (55 of 95) have Aspβ84 and Gluβ85. The chemical nature of the two pairs is very different, and this determines the strong differences in the pockets formed by them. Pocket 1 with Gly/Valβ84 is deep and nonpolar and could accept large hydrophobic amino acids like Phe, Tyr, and Leu (Table IV). Pocket 1 with Aspβ84 is shallower and negatively charged. Because no binding motif is available for alleles with Aspβ84, one might suppose that positively charged amino acids, such as Arg and Lys, are favored here. Position β84 was also found to be a key position in the HLA-DP classification of Castelli et al. (22). They defined three supertypes, based on positions β11 and β84, in contrast to the four identified by our analysis.
Glu/Lys dimorphism exists at position DP β69. Additionally, there are four alleles (DPB1*11, 15, 69, and 74) with Argβ69. Because Lys and Arg are similar, these alleles were grouped into the Lysβ69 clusters. Position β69 affects the shape and charge distribution of pockets 4 and 6 (47). Pockets 4 and 6 with Gluβ69 show high affinity for positively charged or polar residues like Arg, Lys, Gln, and Asn or nonpolar aromatic residues (Phe, Trp, Tyr, and His), but reduced affinity for large nonpolar aliphatic residues (Table IV, motif DPA1*0103/DPB1*0201). Because Gluβ69 is associated with sarcoidosis, one could suppose a connection between susceptibility to this disease and alleles from the DPw2 and DPw6 supertypes (100). Susceptibility to JRA is strongly associated with the DPB1*0201 allele (101). By analogy, a relation between JRA and the DPw2 supertype could be supposed. Pockets 4 and 6 with Lys/Argβ69 have reduced amino acid selectivity, with aromatic residues most preferred (motifs DPA1*0103/DPB1*0401 and 0402). Additionally, Lysβ69 favors the binding of large residues endowed with the capacity to form hydrogen bonds (such as Arg) with residue Gln60 (47).
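The two-position DP fingerprint amounts to a 2 x 2 lookup. The Python sketch below is a hedged illustration, not the method used in the study: `classify_dp` and `DP_SUPERTYPES` are names introduced here, the residue arguments are one-letter codes, and the table follows the supertype definitions stated in the abstract (DPw1: Asp84/Lys69; DPw2: Gly/Val84/Glu69; DPw4: Gly/Val84/Lys69; DPw6: Asp84/Glu69). Arg at β69 is folded into the Lys class, as described above.

```python
# Lookup keyed by (pocket-1 class from beta-84, pocket-4/6 class from beta-69).
DP_SUPERTYPES = {
    ("D", "K"): "DPw1",    # Asp84 and Lys69
    ("GV", "E"): "DPw2",   # Gly/Val84 and Glu69
    ("GV", "K"): "DPw4",   # Gly/Val84 and Lys69
    ("D", "E"): "DPw6",    # Asp84 and Glu69
}


def classify_dp(res69: str, res84: str) -> str:
    """Assign a DP supertype from the residues at beta-chain positions 69 and 84."""
    res69, res84 = res69.upper(), res84.upper()

    # Beta-84: Gly and Val are treated as one class, Asp as the other.
    if res84 in ("G", "V"):
        pocket1 = "GV"
    elif res84 == "D":
        pocket1 = "D"
    else:
        return "unclassified"

    # Beta-69: Arg is grouped with Lys because the two residues are similar.
    if res69 in ("K", "R"):
        pocket46 = "K"
    elif res69 == "E":
        pocket46 = "E"
    else:
        return "unclassified"

    return DP_SUPERTYPES[(pocket1, pocket46)]


if __name__ == "__main__":
    # Hypothetical residue pairs, not tied to any specific allele.
    for pair in [("K", "D"), ("E", "G"), ("R", "V"), ("E", "D")]:
        print(pair, classify_dp(*pair))
```

Residues outside the listed classes return "unclassified" rather than being forced into a supertype, since the text defines no rule for them.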
Analysis of our classification of HLA class II proteins into supertypes reveals several general trends. First, β-chain polymorphism within the peptide binding site plays the leading role in the overall polymorphism of human MHC. The key polymorphic positions found to be important for our and other supertype definitions (22) all belong to β-chains. Second, despite the extraordinary diversity of HLA proteins, common structural features and similarities could be detected and used as fingerprints for their identification and classification into supertypes. The number of amino acids involved in the supertype fingerprints is strikingly small, i.e., one to three. Finally, the classifications proposed here are based on key amino acids with very different, even opposite, properties. For example, position Glu/Lysβ69 for HLA-DP alleles could be considered a key position because of the opposite properties of Glu and Lys. However, position Gly/Valβ86 could not be a key position for DR classification because of the similar properties of Gly and Val.
The MHC is among the most polymorphic of human proteins, and this has greatly complicated the discovery of epitope vaccines. Supertype analysis is one approach taken to address this confounding problem. We have previously identified class I supertypes using computational methods (40), which we now complement with our present analysis of human class II supertypes. The veracity of this analysis is confirmed, as far as possible, by reference to known peptide binding motifs. Although such motifs are an imperfect, or at least incomplete, representation of binding (102, 103), they have clear utility as an approximation to peptide specificity. All supertypes are theoretically derived. Supertypes based on "binding motifs" may possess a certain verisimilitude, but are, at best, only a partial definition of supertypic membership, limited by the lack of available data for most MHC molecules. Indeed, all work based on the analysis of experimental data, including our own (104, 105, 106, 107), is necessarily limited by the paucity and haphazard nature of extant experimental binding studies. The approach presented here is complementary to such analysis and to existing supertype analyses (3, 19, 20, 21, 22). However, our approach is fundamentally different, at a conceptual and technical level, from earlier attempts to cluster alleles into supertypes using a structural approach.
We have discussed such data as exist that support and verify our analysis, rather than speculating in a specious and uncorroborated manner. In the context of human class II MHC, these data are, unfortunately, only partial. Further demonstration of the accuracy of our classification will come in either of two ways: through the accumulation of further motifs in the literature or by the exploration of the peptide specificity repertoire of MHC molecules through systematic study. The utility of the method, though obvious to us, will again require independent, external validation for a sufficiently large number of peptides and alleles that its accuracy can be demonstrated with statistical significance. We see supertype definition as a grand challenge with significant scientific and utilitarian merit: it is difficult, and thus exciting, and is also truly valuable as a pivotal tool in the drive to develop new and better vaccines.
| Disclosures |
|---|
|
|
|---|
| Footnotes |
|---|
1 This work was supported by GlaxoSmithKline, Medical Research Council, Biotechnology and Biological Sciences Research Council, and United Kingdom Department of Health.
2 Address correspondence and reprint requests to Dr. Darren R. Flower, Edward Jenner Institute for Vaccine Research, Compton, Berkshire, U.K., RG20 7NN. E-mail address: darren.flower{at}jenner.ac.uk
3 Abbreviations used in this paper: 3D, three dimensional; CoMSIA, Comparative Similarity Indices Analysis; RSP, restrictive supertype pattern; JRA, juvenile rheumatoid arthritis.
Received for publication September 8, 2004. Accepted for publication February 28, 2005.
| References |
|---|
|
|
|---|
residues 9, 11, 35, 55, 69 and 84-87 in T cell allorecognition and peptide binding. Int. Immunol. 15: 565-576.
69 human leukocyte antigen-DP polymorphism on peptide-binding specificity. Tissue Antigens 62: 459-471.
, G. Jung, H.-G. Rammensee. 1994. Pool sequencing of natural HLA-DR, DQ, and DP ligands reveals detailed peptide motifs, constraints of processing, and general rules. Immunogenetics 39: 230-242.
, V. Gnau, G. Jung, A. Melms. 1993. Natural peptide ligand motifs of two HLA molecules associated with myasthenia gravis. Int. Immunol. 5: 1229-1237.
, H.-G. Rammensee. 1996. Natural ligand motifs of closely related HLA-DR4 molecules predict features of rheumatoid arthritis associated peptides. Mol. Basis Dis. 1316: 85-101.
-
-dimers. J. Immunol. 150: 499-507.
α1*0501, β1*0201) molecule. Eur. J. Immunol. 26: 2764-2772.
α1*0501, β1*0201) vs the non-disease-associated DQ(α1*0201, β1*0202) molecule. Immunogenetics 46: 484-492.
β57) has no particular preference for negatively charged anchor residues found in other type 1 diabetes-predisposing non-Aspβ57 MHC class II molecules. Int. Immunol. 10: 1229-1236.
gene contributes to susceptibility and resistance to insulin-dependent diabetes mellitus. Nature 329: 599-604.
/
heterodimer. J. Exp. Med. 169: 345-350.
chain protects against type I diabetes. A family study. Proc. Natl. Acad. Sci. USA 85: 8111-8115.
chain and susceptibility to develop insulin-dependent diabetes mellitus. Hum. Immunol. 26: 215-225.
α1*0501, β1*0201) restricted T cells isolated from the small intestinal mucosa of celiac disease patients. J. Exp. Med. 178: 187-196.
α1*0102, β1*0602) heterodimer may confer susceptibility to multiple sclerosis in the absence of the HLA-DR (α1*01, β1*1501) heterodimer. Tissue Antigens 50: 15-22.
in a tropical population. Exp. Clin. Immunogenet. 16: 131-138.
allele is associated with pauciarticular juvenile rheumatoid arthritis but not adult rheumatoid arthritis. Proc. Natl. Acad. Sci. USA 86: 9489-9493.