Abstract
“Classical” MHC class I (I-a) genes are extraordinarily polymorphic, but “nonclassical” MHC class I (I-b) genes are monomorphic or oligomorphic. Although diversifying (positive) Darwinian selection is thought to explain the origin and maintenance of MHC class I-a polymorphisms, genetic mechanisms underlying MHC class I-b evolution are uncertain. In one extreme model, MHC class I-b loci are derived by gene duplication from MHC class I-a alleles but rapidly drift into functional obsolescence and are eventually deleted. In this model, extant MHC class I-b genes are relatively young, tend to be dysfunctional or pseudogenic, and orthologies are restricted to close taxa. An alternative model proposed that the mouse MHC class I-b gene thymus leukemia Ag (TL) arose ∼100 million years ago, near the time of the mammalian radiation. To determine the mode of evolution of TL, we cloned TL from genomic DNA of 11 species of subfamily Murinae. Every sample we tested contained TL, suggesting this molecule has been maintained throughout murine evolution. The sequence similarity of TL orthologs ranged from 85–99% and was inversely proportional to taxonomic distance. The sequences showed high conservation throughout the entire extracellular domains with exceptional conservation in the putative Ag recognition site. Our results strengthen the hypotheses that TL has evolved a specialized function and represents an ancient MHC class I-b gene.
The origin and function of MHC-linked class I-b genes remain unresolved. One view suggests they derive from “classical” class I-a MHC genes (1), the chief presenters of Ag peptides to T cells. Class I-a molecules exhibit extraordinary polymorphism, apparently maintained by positive Darwinian selection within the Ag recognition site (ARS)⁴ for the ability to present diverse pathogen peptides (2). In contrast, class I-b genes are monomorphic or oligomorphic. Hughes and Nei (1) suggested that many mouse class I-b genes, including thymus leukemia Ags (TL), do not have human orthologs but rather are more closely related to mouse class I-a genes. They proposed that mouse class I-b genes were derived by gene duplication from mouse I-a alleles but then rapidly drifted into pseudogene status. Indeed, many class I-b genes are poorly expressed in most cell types, either because of low transcription or protein instability, and some are clearly pseudogenes (3). Moreover, some mouse class I-b genes such as Qa-2 (4) and H2-B1 (5) still appear to be more closely related to mouse class I-a genes than to any rat MHC gene. This process is consistent with Ohno’s (6) model of gene duplication, in which one duplicated copy is released from selection pressure and usually degenerates into a pseudogene.
Obata et al. (7) studied seven alleles of TL from inbred strains of Mus musculus and reported it is not closely related to other mouse class I genes. They suggested TL originated at or before the time of the mammalian radiation (∼100 million years ago (MYA)). Moreover, the ratio of nonsynonymous (dN) to synonymous (dS) substitutions in the putative ARS was <1 and not higher than for non-ARS residues in TL, consistent with negative or neutral selection (7). The possibility of negative (purifying) selection operating on TL, coupled with its apparently ancient origin, suggested that TL may have evolved a specialized and highly conserved function early in mammalian evolution.
TLs are encoded by certain MHC class I-b genes within the T subregion (8). They are ∼45-kDa cell surface glycoproteins noncovalently associated with β2-microglobulin (β2m) (3) and are expressed on activated T cells (9), developing thymocytes and small intestinal epithelium, and intraepithelial lymphocytes (10, 11) and certain leukemias (12). Cell surface expression of TL is TAP-independent (13, 14, 15). The observations that TL is expressed at a site enriched for γδ T cells and is oligomorphic led to the hypothesis that TL presents conserved Ags to γδ T cells in the gut (10, 16, 17). However, no peptides or motifs have been characterized (13, 18), leaving open the possibility that TL does not present peptide.
Unlike H2-M3 and Qa-1 (19, 20, 21), TL orthologs have not been reported in other species (22, 23). Obata et al. (7) suggested TL is not closely related to other mouse H2 genes, but their phylogenetic analysis could not rule out the possibility that TL arose relatively recently from another mouse MHC class I gene. The “young TL” model is consistent with the model of Hughes and Nei (1) that monomorphic MHC class I-b genes arise by duplication from MHC class I-a alleles. However, to achieve the degree of divergence noted by Obata et al. (7), the young TL model requires that a proto-TL gene underwent rapid diversifying evolution (positive selection) for some new function, rather than degenerating under neutral evolution. In contrast, the “old TL” model posits that TL is as old as it appears, that is, that it has evolved like a molecular clock. This model leaves unspecified the origin of the proto-TL gene itself, for example, whether it arose from an early mammalian MHC class I-a allele. These two models differ chiefly in the time of divergence from non-TL MHC genes, and thus the rapidity with which TL must have diverged from a putative MHC class I-a origin. To distinguish between these models and obtain direct evidence for the time of divergence, we collected TL sequences from the murine genera Mus and Rattus.
The Old World subfamily Murinae within family Muridae includes several lineages whose relationships are unclear. One lineage includes Rattus and its close allies, Tokudaia and Diplothrix (24). A second lineage includes Mus and, probably, Hybomys and Mastomys. These two lineages diverged between 14 and 40 MYA (25, 26). The genus Mus consists of four subgenera, Mus, Nannomys (African pygmy mice), Pyromys (spiny mice), and Coelomys (shrew mice). These subgenera diverged ∼9 MYA (27).
We characterized the extracellular domains of 27 different TL sequences from 10 species of Mus, including 2 subspecies from M. musculus and 3 strains of Rattus norvegicus. These molecules show hyperconservation of the putative ARS, based on the ratio of dN and dS. The conserved orthology within Murinae strengthens the hypothesis that TL evolved a specialized function before the divergence of mice and rats and likely at the beginning of the mammalian radiation.
Materials and Methods
Cell lines and animals
Cell lines from Mus dunni, Mus abbotti, Mus setulosus, Mus minutoides, Mus platythrix, Mus shortridgei, and Mus pahari were obtained from Dr. S. Chattopadhyay (National Institutes of Health, Bethesda, MD). Specimens of Mus musculus praetextus (M. m. praetextus) derived from animals captured in Faiyum and Giza, Egypt, were obtained from R. D. Sage (University of California, Berkeley, CA) (28). Cell lines were generated from primary tail cell fibroblasts of M. m. praetextus (Faiyum) and M. m. praetextus (Giza) using wild-type SV40 virus provided by Dr. J. Butel (Baylor College of Medicine, Houston, TX) by the technique of Lander and Chattopadhyay (29). Genomic DNA samples of Mus cookii and Mus caroli were provided by C. Kozak (National Institute of Allergy and Infectious Diseases, Bethesda, MD). BALB/cJ, C57BL/6J, and M. pahari/Ei were obtained from The Jackson Laboratory (Bar Harbor, ME). Outbred Sprague Dawley (Holtzman strain) rats were obtained from Harlan Breeders (Indianapolis, IN). Tissues were harvested surgically and stored at −80°C until use. DNA was extracted by phenol:chloroform from tissues or cell lines. DNA from an outbred Wistar rat (Harlan Breeders) and the Fischer rat-derived CREF cell line (30) were kindly provided by Drs. J. Rosen and S. Marriott, respectively (Baylor College of Medicine).
Isolation of TL genes by PCR
For genomic amplification, oligonucleotide primers were designed from conserved regions of known TL genes. Primers, exon 2–4 forward (5′-GTTCTGGGAGGAGGTCGGAGTCTCAC) and exon 2–4 reverse (5′-CATTGTTCTTTCTCATCCACATCATAAC) were used to generate an ∼3-kb product encoding the extracellular domain of TL under these conditions: initial denaturation at 94°C for 2 min, then 35 cycles of 94°C for 1 min, 60 ± 10°C for 1 min, 68°C for 3 min, and a final extension for 10 min at 68°C. Each reaction was optimized for Mg2+ concentration, which ranged from 1–5 μM, and temperature, using a Mastercycler gradient thermocycler (Brinkman Instruments, Hamburg, Germany). This pair of primers worked for all species of Mus except M. pahari. A 3′ primer for M. pahari (5′-CTGGGAAGGGAAGGGTAAGGACATGATGG) was complementary to a different region of intron 4. A low stringency search of the rat TRACE archive (http://www.ncbi.nlm.nih.gov/blast/mmtrace.html) demonstrated putative TL sequences. Intronic primers for putative rat TL were designed based on the sequences obtained for exons 2 (5′-GGCTCCCATCGGATTCCACG and 3′-GGCCTGAGTCCTGCTCCCTTCTTG), 3 (5′-GGAAACCTCCAGACCATGCTTG and 3′-GAGGAGGCTCCCATCGGATTCC), and 4 (5′-CAACTTCCACTCTTCTCCTC and 3′-CCATCACCATTATGAATCTGTC). The following primers were used to generate an ∼1150-bp fragment of the entire coding sequence of cytochrome b (cytb): a degenerate forward primer (5′-TYTYCWTYTTNGGTTTACAARAC) and reverse primer (5′-TGAAAAAYCATCGTTGT) specific to flanking tRNA sequences (S. Steppan, unpublished observations). A total of 200 ng of genomic DNA was amplified using platinum Pfx (Invitrogen, Carlsbad, CA).
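The cytb primers above are degenerate, that is, they contain IUPAC ambiguity codes (Y, W, N, R). As a minimal sketch of what that implies, the following expands a degenerate primer into the set of concrete oligonucleotides it represents. The function names `degeneracy` and `expand` are illustrative and not part of the original methods; only the two primer strings are taken from the text.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]

# Primer sequences as given in the text (5' to 3').
CYTB_FORWARD = "TYTYCWTYTTNGGTTTACAARAC"
CYTB_REVERSE = "TGAAAAAYCATCGTTGT"

if __name__ == "__main__":
    print(degeneracy(CYTB_FORWARD), degeneracy(CYTB_REVERSE))
    print(expand(CYTB_REVERSE))
```

Counting rather than expanding is usually enough when judging how much of a primer pool matches any single template.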
Nomenclature
We followed a convention for naming MHC alleles in which the first two letters of the genus are followed by the first two letters of the species, followed by the gene name and four digits (31). The first two digits represent a major lineage and the last two represent subtypes. For example, the first TL gene described from Mus minutoides would be designated MumiTL1401 because it is the fourteenth major lineage of TL analyzed. Subtype digits were applied to newly described TL sequences based on phylogeny of exons 2 and 3 even though allelism could not be determined. New sequences were deposited in GenBank.⁵ An older nomenclature exists for TL of inbred mice (Table I). TLag refers to the protein encoded by the TLaw1 locus (7); in our analysis the name is changed to MumuTL0302.
Table I. Revision of nomenclature of TL genes
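The naming rule can be written down as a short function. This is only a sketch of the convention as described in the paragraph above; the function name `allele_name` and its argument names are hypothetical.

```python
def allele_name(genus, species, gene, lineage, subtype):
    """Build an allele designation: two letters of the genus, two letters of
    the species, the gene name, then two lineage digits and two subtype digits."""
    if not (0 <= lineage <= 99 and 0 <= subtype <= 99):
        raise ValueError("lineage and subtype must each fit in two digits")
    prefix = genus[:2].capitalize() + species[:2].lower()
    return f"{prefix}{gene}{lineage:02d}{subtype:02d}"

if __name__ == "__main__":
    # Examples named in the text.
    print(allele_name("Mus", "minutoides", "TL", 14, 1))
    print(allele_name("Mus", "musculus", "TL", 3, 2))
```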
DNA sequencing
PCR-amplified genomic DNA was cloned into pZero TOPO Blunt II (Invitrogen) plasmid and multiple clones were selected for DNA sequence analysis. DNA sequencing was performed by Lone Star Labs (Houston, TX) on the ABI Prism Automated DNA sequencer 377XL using Big Dye Terminator Ready Reaction Cycle Sequencing kit (Applied Biosystems, Foster City, CA). The cytb gene was sequenced with the PCR primers used in the PCR and with an internal forward primer (5′-CCCTAGTCGAATGAATTTGAGG) derived from a consensus region of Mus cytb. Overlapping primers were used to sequence the 3-kb fragment of TL. External primers (M13F and M13R) specific for the plasmid backbone were used initially. Exon 2 (5′-GTTCTGGGAGGAGGTCGGAGTCTCAC forward and 5′-TGGGGACAGACTCTTAGATTT reverse), exon 3 (5′-GTTTGGAGAATTCCTAGGGTGGGCGGG forward and 5′-CTGTTGTCACCTTTTAAAATTAAA reverse) and exon 4 (5′-TTTTATGTAACCTACTGGGGAAATTTGA forward and 5′-CTGGGAAGGGAAGGGTAAGGACATGATGG reverse) specific primers were used to determine exon sequences. Intron 3 is ∼1.8 kb. Intron 4-specific primers (5′-GAACAGAAAAAAGACACAGGAGTGCACAGG forward, 5′-CACATGTGTTTTTGGAGGATCTGAGGAGAAG reverse, and internal 5′-AGGAACATGAAGAGGCTGAACCTTGAG, 5′-ACMGWTAGAATCKCCACTTG, 5′-CCTTTCATCCTGAAGAGA) were also used to sequence the entire intron 3 of representative samples. Ambiguities were resequenced in the opposite direction or called manually.
dS and dN were calculated according to the method of Nei and Gojobori (32, 33) with the Synonymous/Nonsynonymous Analysis Program (SNAP): http://hiv-web.lanl.gov. SEs were calculated by the method of Ota and Nei (34). Amino acid residues implicated in the ARS of TL Ags (7) were predicted based on sequence alignment with HLA-A2 (35).
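In the Nei and Gojobori approach, the proportions of synonymous and nonsynonymous differences per site are converted to substitution rates with the Jukes-Cantor correction, d = −(3/4) ln(1 − 4p/3). The sketch below shows only that final step; counting synonymous and nonsynonymous sites and differences codon by codon (the part SNAP performs) is assumed to have been done already, and the function and argument names are illustrative.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion of differences p (0 <= p < 0.75)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)

def dn_ds(syn_diffs, syn_sites, nonsyn_diffs, nonsyn_sites):
    """Return (dN, dS) from precomputed difference and site counts."""
    ds = jukes_cantor(syn_diffs / syn_sites)
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    return dn, ds

if __name__ == "__main__":
    # Hypothetical counts for one pairwise comparison, not data from this study.
    dn, ds = dn_ds(syn_diffs=6.0, syn_sites=60.0, nonsyn_diffs=3.0, nonsyn_sites=210.0)
    print(dn, ds, dn / ds)
```

A dN/dS ratio below 1 computed this way is what the text interprets as negative (purifying) selection.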
Phylogenetic analysis
DNA sequences were aligned using MEGALIGN (DNAstar) and Clustal X software (36). We excluded insertions and deletions from our analysis. Trees were constructed using Clustal X and MEGA2 (37) by the neighbor-joining method (38). The overall significance of the branching pattern for each tree was estimated by bootstrapping (39) and by internal branch test (40). The murine MHC class I sequences used in this study were H2-Kb, H2-Kd, H2-Kf, H2-Kj, H2-Kk, H2-Kq, H2-Ks, H2-Kw28, H2-D2d, H2-Dd, H2-Df, H2-Dp, H2-Dr, H2-Ds, H2-B1, H2-M3, M3-spretus, H2-Q1, H2-Q4, H2-Q5, H2-Q7, H2-Qa-1, H2-T24, H2-T10, RT1.M3, RT1.BM1, RT1.P1, RT1.P2, RT1.A2n, RT1.A2h, RT1.Af, and RT1.Au. Peromyscus maniculatus sequences used were Pm13, Pm41, Pm52, Pm62, Pm11, and Pm53. cytb sequences used were Rattus rattus, R. norvegicus, Rattus argentiventer, Sigmodon hispidus, M. caroli, Mus cookii, Mus poschiavinus, Mus musculus domesticus, Mus musculoides, and Mus spicilegus.
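Neighbor-joining works from a matrix of pairwise distances. As a rough sketch of the input to that step, the code below removes every alignment column that contains a gap (one reading of "excluded insertions and deletions", assumed here to mean complete deletion) and computes Jukes-Cantor distances between the remaining sites. The toy sequences and function names are hypothetical; the tree building and bootstrapping done by Clustal X and MEGA2 are not reproduced.

```python
import math

def drop_gap_columns(seqs, gap="-"):
    """Remove every alignment column in which any sequence has a gap."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    keep = [i for i in range(length) if all(s[i] != gap for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]

def jukes_cantor_distance(a, b):
    """Substitutions per site between two gap-free aligned sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(seqs):
    """Symmetric matrix of pairwise Jukes-Cantor distances."""
    clean = drop_gap_columns(seqs)
    n = len(clean)
    return [[jukes_cantor_distance(clean[i], clean[j]) for j in range(n)]
            for i in range(n)]

# Toy alignment for illustration only.
TOY = ["ACGTAC-T", "ACGTACGT", "ACCTACGT"]

if __name__ == "__main__":
    for row in distance_matrix(TOY):
        print(row)
```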
Statistical analysis of slopes of dN/dS
We used Student’s t statistic to compare the slopes of two lines from linear regression analysis, forcing the y-intercept to zero to place all the error into the slope. The significance of the difference between two slopes λ1 and λ2 was tested with t = (λ1 − λ2)/SE(λ1 − λ2), where the SE of the difference combines the SEs of the two slopes, each of which depends on σx², the variance of the dS values, and on Ni, the number of pairwise comparisons (41). The degrees of freedom were Ns − 2, where Ns is the number of sequences. Letting degrees of freedom = Ns − 2 corrects for the partial nonindependence of multiple comparisons.
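The following is a minimal sketch of a slope comparison of this kind, under stated assumptions. The slope is fitted through the origin by least squares, its SE uses the common textbook form sqrt(RSS / ((N − 1) Σx²)), and the two SEs are combined as the root of the sum of squares. These specific SE formulas are assumptions and may differ from the ones used in the original analysis; the function names are illustrative.

```python
import math

def slope_through_origin(x, y):
    """Least-squares slope of y on x with the intercept forced to zero,
    plus a textbook standard error for that slope (an assumed form)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    slope = sxy / sxx
    rss = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / ((len(x) - 1) * sxx))
    return slope, se

def slope_difference_t(slope1, se1, slope2, se2):
    """t statistic for the difference between two independent slope estimates."""
    return (slope1 - slope2) / math.sqrt(se1 ** 2 + se2 ** 2)

def degrees_of_freedom(n_sequences):
    """Degrees of freedom as described in the text: Ns - 2."""
    return n_sequences - 2

if __name__ == "__main__":
    # Hypothetical (dS, dN) pairs, not data from this study.
    ds_values = [1.0, 2.0]
    dn_values = [1.0, 3.0]
    print(slope_through_origin(ds_values, dn_values))
```

The resulting t would be compared against a t distribution with Ns − 2 degrees of freedom rather than with one based on the much larger number of pairwise comparisons.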
Results
Phylogeny of MHC class I-a and I-b genes
The MHC class I gene family based on representative sequences from human, rat, and mouse illustrates the relationships of MHC class I-a and I-b genes (Fig. 1). MHC class I-b genes distantly related to the MHC (CD1d, HFE, and FcRn) have orthologs in human, rat, and mouse. T23/Qa-1 has a rat ortholog (RT1.BM1), but HLA-E is apparently not orthologous even though these molecules appear to have similar functions (42).
FIGURE 1. The TL genes represent a divergent multigene family. Exon 2 sequences from human, rat, and mouse MHC class I genes were aligned and used to construct dendrograms by the neighbor-joining method. The dendrogram is based on nucleotide substitutions per site calculated by the Jukes and Cantor method (35). A chicken MHC class I sequence was used as an outgroup. Parenthetical citations designate the origin of the orthologous genes: m = mouse, r = rat, and h = human. An asterisk indicates that only two A-strain genes are used for this dendrogram, TLaa1 and TLaa4, a newly described gene from the A-strain-derived ASL.1 cell line (12). TLaa2 and TLaa3 exon 2 sequences are not published. A newer nomenclature for TL genes is given after the older name. Bootstrap values and interior branch tests were calculated by MEGA2 with 1,000 replications and values >50% are shown.
The seven TL sequences available before this study fall within a significant clade (100%) that is on a long branch representing ∼20–25% divergence from the nearest non-TL neighbor. These results are concordant with those of Obata et al. (7) in which the TL clade lies outside a cluster containing other murine and human MHC-linked class I genes. The TL sequences share between 92 and 99% sequence identity at the nucleotide level. TL genes are tightly clustered and fall into at least four different groups (Fig. 1⇑, inset): A-strain sequences, T3 sequences, T18-like sequences, and Tlaf (strain 129) sequences. The putative rat TL-like genes (23), RT1.P1 and RT1.P2, are outside of the TL family and do not represent rat orthologs of mouse TL.
Isolation of cytb genomic sequences from mice and rats
We determined the entire sequence of cytb to confirm sample identity and to create an independent phylogeny among specimens sampled (Fig. 2⇓) for comparison with previously published sequences within the family Muridae (43).
FIGURE 2. cytb orthologs are highly conserved within the subfamily Murinae. A total of 1,140 nucleotides of cytb were aligned and used to create dendrograms as previously described. A single asterisk indicates a cytb sequence obtained from GenBank. A double asterisk indicates a sequence obtained from GenBank and confirmed by our analysis. A S. hispidus (cotton rat) cytb sequence serves as an outgroup.
This analysis revealed only one discrepancy with other phylogenetic analyses based on molecular and morphological criteria (44). Thus, Rattus and Mus formed distinct clades with high bootstrap values (92%). All samples assigned M. musculus fell into a significant clade. The four subgenera (Mus, Coelomys, Nannomys, and Pyromys) of Mus were supported. The exceptional sample was M. shortridgei. This species has been included within the subgenus Pyromys on morphological grounds (44), but multiple sequences (cytb, TL, and H-2 M3) (C. Doyle, R. Rich, R. Cook, and J. Rodgers, manuscript in preparation) suggest that our sample of M. shortridgei is more closely allied with Mus Coelomys pahari (98%).
The TL gene family is well conserved in mice and rats
Using a 5′ primer from intron 2 and a 3′ reverse primer from intron 4 of T18d, we cloned segments encoding the extracellular domains of putative TL sequences from Mus species. No specific PCR products were generated with these primers from R. norvegicus (Sprague Dawley). We screened the rat TRACE archive and rat genome database under relaxed stringency with exons 2–4 from T18d as probes. This screen revealed putative TL sequences for each exon. 5′ and 3′ primers complementary to flanking introns were made for exons 2–4 individually and these permitted isolation of TL from three strains of rats (Sprague Dawley, Wistar, and Fischer). RT-PCR analysis confirmed that these exons were derived from a single gene and not isolated fragments (data not shown).
A neighbor-joining tree of the coding region of exons 2 and 3 for 33 different sequences (Fig. 3A) reveals a single TL cluster encompassing all 33 sequences with high bootstrap value (100%) when compared with the outgroups, M. m. domesticus and R. norvegicus class I-a and I-b genes. The shallowness of the TL clade in Fig. 3A, inset, reflects the well-conserved nature of TL orthologs. Rattus TL (RanoTL1901) is the TL gene most divergent from any mouse gene, but the coding regions of all three strains of rats tested are identical. As expected, its degree of similarity to mouse TL (∼85%) is comparable to that of other MHC orthologs between these genera (45, 46). Sequences from close relatives of laboratory mice form a highly significant (100%) cluster with T3, T18, Tlaf, and A-strain sequences. As with the cytb tree, the four Mus subgenera are supported, with the exception of M. Pyromys shortridgei noted before.
FIGURE 3. Orthologs of TL exist outside of M. musculus. A, Exon 2 to 3 coding sequences were created from genomic DNA sequence based on alignment. These sequences were used to create dendrograms. B, Exon 4 sequences from contiguous DNA segments were used to create a dendrogram. Dendrograms were based on the nucleotide sequence beginning at position +3 to maintain the correct reading frame. H2-K/D/Q/T and RT1 complex exons were used as outgroups. Interior branch test values are shown. A double dashed line indicates an artificial compression of the scale introduced to minimize the dendrogram. a, Sequences from a M. m. praetextus cell line derived from an animal caught near Giza, Egypt. b, Sequences from a M. m. praetextus cell line derived from an animal caught near Faiyum, Egypt.
When we constructed phylogenetic trees based on the α3 region (Fig. 3B), the total branch length of the tree was reduced. This may reflect the functional constraints of binding to β2m and CD8 coreceptors or might reflect homogenization of this exon (2). However, the α3 dendrograms still showed that TL sequences form a separate branch with significant interior branch strength (97%; the bootstrap value was 32%, data not shown), although locus specificity is not as great as in exons 2 and 3. The M. shortridgei sequences (MushTL1801 and 1802) grouped significantly away from TL family members. This suggests that M. shortridgei has exchanged exon 4 with another MHC class I gene. Our overall phylogenetic data demonstrate that TL orthologs exist outside of laboratory strains of mice and are most easily defined by exons 2 and 3 and not exon 4.
dN and dS in TL orthologs
Both MHC class I-a and II genes are highly polymorphic and show evidence of positive selection in the ARS relative to the non-ARS residues of exons 2 and 3 (2, 47). In contrast, some MHC class I-b genes showed reduced or absent positive selection in the ARS (1). To determine whether the ARS of TL has been under negative selection, we calculated dN and dS values for ARS and non-ARS residues and compared the ratios of each region within species and across species (see Table II⇓ and Fig. 4⇓). The mean dN (2.2 ± 0.9) from the ARS of TL orthologs was significantly lower (p < 0.05) than mean dS values (17.8 ± 3.9) (Table II⇓) in the ARS. In contrast, the mean dN (23.0 ± 3.0) was significantly higher (p < 0.05) than mean dS (15.6 ± 3.3) in the MHC class I-a ARS, and the mean dN (23.0 ± 3.0) was significantly higher (p < 0.05) in the ARS than in the non-ARS and exon 4 (6.6 ± 1.2 and 5.1 ± 1.1, respectively).
FIGURE 4. The TL gene family has evolved under negative selection. dN and dS were calculated from pairwise comparisons of all TL orthologs (□) and MHC class I-a (♦) used in our study and plotted as a scatter diagram. Rates were calculated for A, ARS; B, non-ARS; and C, exon 4. Slopes were calculated by linear regression. The slope of the linear regression of TL ARS is significantly lower than the slope of the non-ARS and exon 4; p < 0.005.
Table II. Mean ± SE nucleotide substitutions/100 synonymous and nonsynonymous sites in comparison among murid TL genes
Within exon 4 of MHC class I-a genes the ratio of mean dN to mean dS (1.4 ± 0.02) is significantly (p < 0.05) greater than the ratio in the non-ARS (0.58 ± 0.03). This is mainly due to a depressed dS value (see Table II⇑) that may result from homogenization of this exon within the mouse MHC class I-a genes (2). The dN:dS ratio for TL in exon 4 is similar to the dN:dS ratio in the non-ARS of TL but significantly (p < 0.05) lower than the dN:dS ratio of MHC class I-a exon 4. Nonetheless, exon 4 of TL orthologs appears to have an evolutionary history different from that of exon 4 of MHC class I-a genes. This might reflect a lack of homogenization of exon 4 of TL orthologs with other MHC class I genes, with the exception of M. Pyromys shortridgei.
The fact that the dN:dS ratio of the ARS in TL is significantly lower than the dN:dS ratio in the non-ARS and exon 4 suggested that the ARS is hyperconserved relative to other regions of the molecule. To correct for the high variance in dS due to including sequences from both close and distant taxa, we plotted the dN vs dS values for all pairwise comparisons of TL (Fig. 4). The slope of TL ARS (m = 0.121 ± 0.003) is significantly less than the slope of the non-ARS (m = 0.300 ± 0.005, p < 0.005) and exon 4 (m = 0.270 ± 0.006, p < 0.005). Thus, the ARS of TL is hyperconserved relative to the non-ARS and exon 4.
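As a rough check on the size of these differences, the reported slopes can be plugged into a simple t statistic. This sketch assumes that the ± values are SEs of the slopes and that the two estimates can be treated as independent; neither assumption is stated in the text, and the function name is illustrative.

```python
import math

def t_for_slope_difference(slope_a, se_a, slope_b, se_b):
    """Difference between two slopes divided by the SE of that difference."""
    return (slope_a - slope_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# Slopes and uncertainties as reported in the text (assumed here to be SEs).
ARS = (0.121, 0.003)
NON_ARS = (0.300, 0.005)
EXON4 = (0.270, 0.006)

t_non_ars = t_for_slope_difference(*NON_ARS, *ARS)
t_exon4 = t_for_slope_difference(*EXON4, *ARS)

if __name__ == "__main__":
    print(round(t_non_ars, 1), round(t_exon4, 1))
```

Under these assumptions both statistics are far larger than typical critical values of the t distribution, which is in line with the reported p < 0.005.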
Characterization of TL orthologs from the genera Mus and Rattus
A striking feature of the TL sequences is the highly conserved nature of the extracellular domains, especially the ARS. All TL orthologs contain the four conserved cysteine residues needed to form the two intramolecular disulfide bonds that are necessary for MHC class I structure (48). A majority of TL orthologs also contain N-linked glycosylation motifs (NXS/T, X≠P) (49) at positions 86 and 90. Two natural variants occur: A-strain alleles have an N86S substitution that destroys the glycosylation site, and Rattus TL (RanoTL1901) has only the N86 glycosylation site (Fig. 5). All TL orthologs that have a glycosylation motif at N86 use the less efficient (20%) and rarely used motif, NLS (49). Almost all other MHC class I molecules use NQS (data not shown) as a recognition motif, which is more efficiently glycosylated (49).
FIGURE 5. Predicted amino acid sequences of TL orthologs. A majority sequence of our data set is shown above the samples but does not represent a natural sequence. Identity is indicated by a dash. Dots represent deletions and an asterisk represents a stop codon. a and b, Sequences from two different M. m. praetextus cell lines (see Fig. 3). ∧ represents residues that are rarely found in other MHC class I molecules. Conserved cysteines are indicated by bold lettering. Glycosylation sites are underlined. Only one rat TL sequence (RanoTL1901) is shown. A, Predicted α1 domain; B, predicted α2 domain; and C, predicted α3 domain.
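The NXS/T (X≠P) motif is simple to scan for in a protein sequence. The sketch below reports every such motif with its 1-based position in the input string; the peptide used is a made-up example, not a TL sequence, and positions such as 86 and 90 in the text refer to the authors' alignment numbering rather than to raw string offsets.

```python
import re

# Lookahead so that overlapping motifs are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(protein, first_position=1):
    """Return (position, motif) for each N-X-S/T motif where X is not proline."""
    return [(m.start() + first_position, m.group(1))
            for m in SEQUON.finditer(protein.upper())]

if __name__ == "__main__":
    # Hypothetical peptide: NLS and NQT qualify, NPS does not.
    print(find_sequons("AANLSGGNPSAANQT"))
```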
The MHC class I-a ARS, based on HLA-A2, contains six pockets (A–F) that anchor individual residues of an extended peptide (35). Pocket A includes 10 residues that anchor the N terminus, of which three tyrosines are conserved in both TL and class I-a molecules. Pocket F seals the C terminus of the peptide (50). All of the anchoring and sealing positions conserved in class I-a molecules are also conserved in TL (Table III⇓), suggesting that if TL binds peptides they could be 8–10 aa in length. The internal pockets of TL sequences show slight variability within the TL gene family. Molecular modeling suggests that the ARS of TL is not as occluded or hydrophobic as the ARS of CD1 molecules (51), thus it seems unlikely that TL would bind lipids.
Table III. ARS composition
Obata et al. (7) described 17 residues that are unique to seven TL sequences in laboratory mouse strains and that are rarely found in other MHC class I molecules. Of these, 13 are in the ARS and 4 are in the non-ARS. Within the ARS, six of these TL signatures are conserved in our data set: L155 and K165 are not found in 200 other mammalian MHC class I sequences. A61, E65, F73, and M81 are only found in one other sequence (PemaT24 (52), T24d, H2-M9, and H2-M10, respectively). Several other signatures in TL exist outside of the ARS region. Residues 11–13 (Ala, Leu, Ser) are highly conserved in TL sequences but are also found in several H2-M3 and Qa-1 orthologs.
Table III⇑ shows the limited variability of the TL ARS. Sixteen of 57 residues are variable in the ARS. Of the 16 variable residues, 5 are conservative changes (L82F, L169F, Y22F, R62K, and V67I), 8 have infrequent but nonconservative changes (24, 61, 69, 116, 145, 156, 163, and 166), and 3 residues (76, 149, and 150) in the ARS are frequently and nonconservatively variable. Nine residues within the ARS are invariant and conserved in other mammalian and avian MHC class I molecules (1) and might represent residues involved in maintaining the structure of the ARS. Of these, all TL molecules have a nonconservative V165K substitution.
Discussion
We describe 27 new TL sequences in 11 species of Murinae and show that they have limited variability. Our data support the model of Obata et al. (7), based on a limited data set, n = 7 in one species, which suggested TL is ancient and thus should exist in distantly related genera. They do not support the model of Hughes and Nei (1) and Rogers (53) which suggests that mouse MHC class I-b genes arise from MHC class I-a genes and rapidly evolve toward pseudogeny.
Our data show that TL genes arose before the split of Rattus and Mus and thus have been retained for at least 30 million years. The estimated time of divergence between mice and rats remains controversial. The fossil record dates the separation between 12 and 14 MYA (54). Molecular studies based on different genes estimate the divergence time of Rattus and Mus to be between 20 and 40 MYA (25, 26, 55). This range is attributable to different approaches and “calibration” times (26) based on the divergence of birds and mammals. Extrapolating from our expanded data set of the TL gene family, we estimate TL to have diverged from other mammalian MHC class I genes ∼100 MYA. This estimated divergence time, at or before the time of the mammalian radiation, suggests that TL was present in ancestral mammals. We do not find a TL ortholog in the human genome, suggesting that TL was lost in this species.
Our data also support the hypothesis of multiple ancient TL genes in Mus. As seen in Fig. 3⇑A, the percentage of divergence (Jukes-Cantor distance) is as great for different loci of TL (T3 and T18) within the same species as the percentage of divergence for orthologs from different species. The amount of diversity between T3 and T18 in the ARS and non-ARS is due to an increase in dS (0.03 and 0.05, respectively) relative to the dN (0.0 and 0.02, respectively). Our extensive data set modifies the original model of Obata et al. (7) by demonstrating that the TL gene family is well conserved between mice and rats and contains multiple sequences that are ancient.
TL gene family members showed no signs of positive selection in the ARS. In particular, the slopes of dN/dS for the ARS were significantly lower, approximately one-third, than those of non-ARS and exon 4. Thus, the ARS is hyperconserved in TL sequences. These data suggest that the ARS of TL has been functionally conserved for at least 30 million years. This region may interact with a conserved ligand such as a TAP-independent peptide, or another protein such as a NK cell receptor-like molecule.
The functions of most MHC class I-b molecules are poorly defined. Some are obviously pseudogenes but others have been shown to contribute to host defense and nonimmune functions (3). The biological role of TL is unknown, but the restricted expression pattern of TL suggests a role in intestinal immunology and/or T cell effector function. TL transcripts (α1–α3) are expressed in the small intestine of both M. pahari/Ei and Rattus (data not shown). The conserved expression pattern of TL orthologs strengthens the hypothesis that TL has a biological role in mucosal immunity.
T18d was shown to bind CD8αα homodimers expressed on intestinal intraepithelial lymphocytes with greater affinity than it binds CD8αβ heterodimers (56). The affinity of T18d for CD8αα was 10-fold higher than that of H-2Kb for CD8αα. T18d and H2-Kb probably bind to CD8αα analogously (57). H2-Kb complexed with CD8αα revealed two key contact regions of H2-Kb, the AB and CD loops (58). Only the AB loop differs considerably in T18d: H2-Kb has PEDK (195–198) while T18d has PEGY. Two major motifs are seen in the AB loop of Mus TL sequences: PEGY (T18-like) and PEGD (T3-like). TL Ags that contain either motif are expressed in the small intestine (10, 11). If preferential binding to CD8αα is necessary for TL function, our data suggest some plasticity in the interaction with the CD8αα homodimers. Alternatively, the TL family members may have multiple and complementary functions such that only TL molecules that contain PEGY interact with CD8αα homodimers preferentially while other TL loci perform another function.
Another functional signature of TL is its cell surface expression in the absence of TAP2 (14). Cells lacking functional TAP show a marked decrease in MHC class I-a expression because peptides are limiting (59). Given the hyperconserved nature of its ARS, TL may bind a relatively invariant, TAP-independent Ag or no peptide at all. No informative Ag-elution studies of TL Ags have described specific peptides or motifs (our unpublished observations) (13, 18).
An alternative hypothesis suggests that TL molecules bypass the quality control mechanisms of the peptide loading complex in the endoplasmic reticulum that normally restrict MHC class I-a maturation. Calreticulin, an endoplasmic reticulum-resident chaperone, interacts with sugar moieties on the N86 glycosylation site of H2-Ld (60). Most TL molecules have two glycosylation sites in close proximity, N86 and N90, but are only monoglycosylated (data not shown). Two natural variants occur. A-strain alleles have an N86S substitution that ablates the first glycosylation site. Rattus TL molecules contain only the first glycosylation site at N86, which is the poor acceptor motif, NLS. Another quality control interaction involves the TAP-tapasin-calnexin complex. Residues 128–137 in α2 of H-2Ld have been implicated in binding this complex. This region is highly conserved in TL gene family members and MHC class I genes. There is an A136V substitution in several sequences across species boundaries in TL gene family members.
Cell surface expression of TL, like that of other MHC class I molecules, depends on β2m association (14). A total of 82 different MHC class I molecules were shown to have 19 residues that make 44 contacts to 18 β2m residues (61). Of these, 37 contact points were conserved >90%. Within laboratory strains of mice, TL molecules maintain 89% conservation of β2m-contact residues. In the larger set of 34 TL sequences, 74% of the residues are conserved. Conservation of specific contact residues suggests that all TL gene family members require β2m.
The evolution of the MHC class I genes in the mouse is characterized by a birth and death process (62). Gene duplications are thought to generate novel protein functions but little is known about the selective pressures governing this process (63). Ohno (6) hypothesized that once a gene duplicated, one copy was freed from selective pressures to drift and either assume novel function or be lost. An alternative hypothesis suggests that the selective pressures are lessened on both copies, allowing “subfunctionalization” (64). Alternatively, “complementation” occurs such that both genes complement separate functions (64). The TL gene family provides an opportunity to study the selective pressures governing a multigene family in the MHC during the past 30–100 million years.
Acknowledgments
We thank D. Brake for technical assistance and C. Doyle and J. Levitt for useful discussions.
Footnotes
1 This work was supported by the National Institutes of Health Grants RO1 AI30036 and AI18882 (to R.R.R.) and RO1 AI17897 (to R.G.C. and J.R.R.).
2 Emory University School of Medicine, Atlanta, GA 30322.
3 Address correspondence and reprint requests to Dr. John R. Rodgers, Department of Immunology, Baylor College of Medicine, One Baylor Plaza Room M929, Houston, TX 77030. E-mail address: jrodgers@bcm.tmc.edu
4 Abbreviations used in this paper: ARS, Ag recognition site; TL, thymus leukemia Ag; MYA, million years ago; dN, rate of nonsynonymous substitution; dS, rate of synonymous substitution; β2m, β2-microglobulin; cytb, cytochrome b.
5 The sequences have been deposited in GenBank under accession nos. AY144125–AY144179.
Received August 26, 2002.
Accepted October 15, 2002.
- Copyright © 2002 by The American Association of Immunologists