Abstract
The mouse MHC class I-b molecule H2-M3 has unique specificity for N-formyl peptides, derived from bacteria (and mitochondria), and is thus a pathogen-associated molecular pattern recognition receptor (PRR). To test whether M3 was selected for this PRR function, we studied M3 sequences from diverse murid species of murine genera Mus, Rattus, Apodemus, Diplothrix, Hybomys, Mastomys, and Tokudaia and of sigmodontine genera Sigmodon and Peromyscus. We found that M3 is highly conserved, and the 10 residues coordinating the N-formyl group are almost invariant. The ratio of nonsynonymous and synonymous substitution rates suggests the Ag recognition site of M3, unlike the Ag recognition site of class I-a molecules, is under strong negative (purifying) selection and has been for at least 50–65 million years. Consistent with this, M3 α1α2 domains from Rattus norvegicus and Sigmodon hispidus and from the “null” allele H2-M3b specifically bound N-formyl peptides. The pattern of nucleotide substitution in M3 suggests M3 arose rapidly from murid I-a precursors by an evolutionary leap (“saltation”), perhaps involving intense selective pressure from bacterial pathogens. Alternatively, M3 arose more slowly but prior to the radiation of eutherian (placental) mammals. Older dates for the emergence of M3, and the accepted antiquity of CD1, suggest that primordial class I MHC molecules could have evolved originally as monomorphic PRR, presenting pathogen-associated molecular patterns. Such MHC PRR molecules could have been preadaptations for the evolution of acquired immunity during the early vertebrate radiation.
The mouse class I-b molecule H2-M3 preferentially binds N-formyl peptides (1, 2), pathogen-associated molecular patterns (PAMP) (3) also recognized by neutrophil chemotactic receptors (4). Thus, M3 is a pattern-recognition receptor (PRR). In this respect, M3 resembles CD1, which presents mycobacterial waxy lipids to T cells (5). M3 may be important for protection against intracellular bacteria (6). Indeed, M3-restricted CTL are protective in experimental infections by the intracellular pathogen Listeria monocytogenes (7, 8, 9). The laboratory of Fischer Lindahl and colleagues (10) showed that Norway rats have a gene nearly identical to H2-M3, suggesting that M3 has been conserved since the rat/mouse divergence ∼14–40 million years ago (MYA).
Like other class I-b genes, M3 is virtually monomorphic in Mus musculus (11). A minor allele, M3b, has been considered null because it does not restrict lysis by known M3-specific CTL (11); null activity was mapped to a Leu95Gln substitution in the α2 domain (12). In contrast to the minimal oligomorphism of class I-b genes (6), class I-a genes are extremely polymorphic, allowing presentation of diverse intracellular Ags to T cells (13). Polymorphism is pronounced especially in the Ag recognition site (ARS) and is thought to be generated through diversifying (positive) selection (13) evidenced by a high ratio of nonsynonymous to synonymous substitutions in the ARS (14).
The paucity of I-b orthologs shared among species of different taxonomic orders led to the hypothesis that I-b genes are relatively young, formed by duplications of class I-a genes (15). Such duplicates are often redundant and may drift rapidly under neutral selection towards pseudogeny (15). Functional divergence of gene duplicates (16) probably requires positive selection (17). Phylogenetic analyses suggest that many mouse class I-b genes, such as Qa-2 (15) and H2-B1 (our unpublished observations) arose since the rat/mouse divergence from duplications of class I-a genes. Because M3 has been unknown outside the murine genera Rattus and Mus, it may also have evolved from murine or murid I-a genes. A contrasting model, similar to one proposed for H2-TL (18, 19), suggests that M3 arose before the mammalian radiation.
The unique ligand specificity of M3 makes it especially interesting as a model to study how MHC specificities evolve, presumably in response to pathogen or other immune pressure. The mechanism of N-formyl specificity in M3 is well-studied (11). The crystal structure of M3a (20) indicated 10 amino acids coordinated N-formyl specificity, with a key contribution from histidine in position 9 (His9). Five of these residues are rarely found in other class I molecules. However, our unpublished studies in which we transplanted these residues between M3 and H2-Kb suggested other residues are required to achieve N-formyl specificity. Moreover, the differences between M3 and murine class I-a molecules are not concentrated in the ARS but are spread throughout the α1 and α2 domains. This raises the question of how M3 evolved from a class I-a gene. Identification of M3 orthologs from related species should identify transitional forms, or “missing links,” between murine M3 and I-a genes.
The phylogeny of Muridae, the largest extant rodent family, has been studied intensively. Four subgenera of Mus (Coelomys, shrew mice; Mus; Nannomys, African pygmy mice; and Pyromys, spiny mice) diverged ∼9 MYA (21). The Old World subfamily Murinae includes Mus and Rattus, which diverged 14–40 MYA (22, 23). Two genera of the New World subfamily Sigmodontinae, Sigmodon (cotton rats) and Peromyscus (deer mice), diverged ∼12 MYA (24). Subfamilies Murinae and Sigmodontinae diverged 50–65 MYA (22). We isolated 43 unique M3 sequences from 22 species of these two subfamilies. We tested these for evidence of selective pressures, and the most disparate members for N-formyl specificity. Finally, we asked whether the origins of M3 from class I-a genes could be discerned by phylogenetic analysis.
Materials and Methods
Genomic DNA templates for PCR
Genomic DNA was isolated from tissue samples or cell lines. Cell lines from Mus dunni, Mus macedonicus and Mus spicilegus, Mus Coelomys pahari, M. Nannomys minutoides and M. Nannomys setulosus, M. Pyromys platythrix and M. Pyromys shortridgei were from Dr. S. Chattopadhyay (National Institutes of Health, Bethesda, MD). Cell lines of Mus praetextus (mice from R. Sage; Ref. 25) and B10.CAS2 (bearing H2-M3b) were made as described (26, 27). Rattus norvegicus DNA was from outbred Holtzman Sprague-Dawley (Harlan Breeders, Indianapolis, IN) and outbred Wistar rats (Harlan Breeders) and from the Fischer strain CREF cell line (28). DNA from the African murine species Hybomys univittatus (Eastern Black-striped field mouse), Mastomys (Praomys) natalensis (African multimammate rat) and Nannomys gratus was provided by P. d’Eustachio (New York University School of Medicine, New York, NY) (29). DNA from the murine species Apodemus agrarius (Asian Black-striped field mouse), Diplothrix legata (Ryukyu long-tailed giant rat), Rattus tanezumi (Sladen’s rat), and Tokudaia osimensis (Ryukyu spiny rat) was from H. Suzuki (Hokkaido University, Sapporo, Japan; Ref. 30). Peromyscus attwateri, Peromyscus leucopus, and Peromyscus pectoralis DNA were from R. Pfau (Tarleton State University, Stephenville, TX). A Sigmodon hispidus tissue sample was from P. Wyde (Baylor College of Medicine, Houston, TX). We authenticated every sample by sequencing cytochrome b (cytb) and phylogenetic comparison with published sequences. cytb primers were 5′-TYT YCW TYT TNG GTT TAC AAR AC (forward) (where Y = C + T, W = A + T, N = A + T + G + C, R = A + G) and 5′-TGA AAA AYC ATC GTT GT (reverse). These modifications of Ref. 31 were suggested by S. J. Steppan (Florida State University, Tallahassee, FL).
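To make the degenerate-base notation concrete, the following minimal Python sketch expands the two cytb primers quoted above into the pool of plain sequences they stand for. The function names (degeneracy, expand) are hypothetical helpers introduced for illustration only, and the code table covers only the symbols defined in the text.

```python
from itertools import product

# Degenerate base codes as defined in the text (Y, W, N, R) plus the plain bases.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT", "W": "AT", "R": "AG", "N": "ACGT",
}

def clean(primer):
    """Remove the triplet spacing used in the text."""
    return primer.replace(" ", "").upper()

def degeneracy(primer):
    """Number of distinct plain sequences represented by a degenerate primer."""
    count = 1
    for base in clean(primer):
        count *= len(CODES[base])
    return count

def expand(primer):
    """Yield every plain sequence represented by a degenerate primer."""
    choices = [CODES[base] for base in clean(primer)]
    for combo in product(*choices):
        yield "".join(combo)

CYTB_FORWARD = "TYT YCW TYT TNG GTT TAC AAR AC"
CYTB_REVERSE = "TGA AAA AYC ATC GTT GT"

if __name__ == "__main__":
    print("forward primer pool size:", degeneracy(CYTB_FORWARD))
    print("reverse primer pool size:", degeneracy(CYTB_REVERSE))
    print("reverse primer variants:", list(expand(CYTB_REVERSE)))
```

The pool size is simply the product of the number of bases allowed at each position, so a single N contributes a factor of four while each two-base code contributes a factor of two.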
High fidelity PCR
High-fidelity PCR used the Invitrogen (Carlsbad, CA) Platinum Pfx kit and the manufacturer’s protocol: 35 cycles of a 30-s melt at 94°C, 30 s of annealing at an optimized temperature (usually 55°C), and extension at 68°C for 1 min per kilobase, with MgSO4 usually at 1 mM. Most primers were within introns. Primers for exon 2 introduced SalI and HindIII sites at the 5′ and 3′ ends, respectively; primers for exon 3 introduced HindIII and XhoI sites. Mus-specific intron primers were based on H2-M3b (32). Exon 2 primers were 5′-GTC GAC CAA TGC TTG TTC ACT GGC CC (forward) and 5′-AAG CTT TGG ACC TAA ACT GAA AGT GA (reverse); exon 3 primers were 5′-AAG CTT TCA CTT TCA GTT TAG GTC CA (forward) and 5′-CTC GAG TGG TTC CTA GTT GTT CCT CA (reverse). Rat-specific primers were based on putative rat M3 intron sequences (National Center for Biotechnology Information database, Accession no. AC112568). Rat-specific primers for exon 2 were 5′-GTC GAC GGT TAT CAG TGA AGG GTT (forward) and 5′-AAG CTT GGC TAA TCT AGC TTA GCA GTA (reverse); exon 3 primers were 5′-AAG CTT TGG TTT CAC TTT CAG TTT (forward) and 5′-CTC GAG CCC AGA CAA CAA GCC TCA CTT (reverse). M3 sequences from non-Mus species were obtained initially using a degenerate forward primer based on an alignment of exon 1 in Mus and Rattus M3: 5′-GGT CKC TYT GGC TGT TA. The reverse primer was based on a similar region at the beginning of exon 4: 5′-CAC ATG TGC CTT TGG GGG AT. The S. hispidus sequence allowed us to design intronic primers to clone S. hispidus M3 into our expression vector. The intronic primers to amplify exon 2 were 5′-GTC GAC GCC CAG GTT CTT GGA GGA A (forward) and 5′-AAG CTT GGA CAT GTG GGA GTT (reverse); exon 3 primers were 5′-AAG CTT GAT CCA AAC CTG GCA GAT (forward) and 5′-CTC GAG CCT AAG GTT GAG GGA TTT (reverse). PCR with these primers recovered S. hispidus M3 sequences that were not found with the original primer set.
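The cloning sites carried by the intronic primers can be checked mechanically. The sketch below is an illustration rather than part of the original protocol: it looks for the SalI, HindIII, and XhoI recognition sequences at the 5′ end of the Mus- and rat-specific primers quoted above. The dictionary layout and the helper name five_prime_site are assumptions made for this example.

```python
# Recognition sequences of the three enzymes named in the text.
SITES = {"SalI": "GTCGAC", "HindIII": "AAGCTT", "XhoI": "CTCGAG"}

# Primer sequences copied from the text (5' to 3'), keyed by (set, exon, direction).
PRIMERS = {
    ("Mus", 2, "forward"): "GTC GAC CAA TGC TTG TTC ACT GGC CC",
    ("Mus", 2, "reverse"): "AAG CTT TGG ACC TAA ACT GAA AGT GA",
    ("Mus", 3, "forward"): "AAG CTT TCA CTT TCA GTT TAG GTC CA",
    ("Mus", 3, "reverse"): "CTC GAG TGG TTC CTA GTT GTT CCT CA",
    ("rat", 2, "forward"): "GTC GAC GGT TAT CAG TGA AGG GTT",
    ("rat", 2, "reverse"): "AAG CTT GGC TAA TCT AGC TTA GCA GTA",
    ("rat", 3, "forward"): "AAG CTT TGG TTT CAC TTT CAG TTT",
    ("rat", 3, "reverse"): "CTC GAG CCC AGA CAA CAA GCC TCA CTT",
}

def five_prime_site(primer):
    """Return the enzyme whose site starts the primer, or None if there is none."""
    sequence = primer.replace(" ", "").upper()
    for enzyme, site in SITES.items():
        if sequence.startswith(site):
            return enzyme
    return None

if __name__ == "__main__":
    for (group, exon, direction), primer in PRIMERS.items():
        print(group, "exon", exon, direction, "->", five_prime_site(primer))
```

Because the exon 2 reverse and exon 3 forward primers both carry HindIII, the two amplified exons can be joined at that site and the pair inserted between SalI and XhoI.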
Cloning PCR products
PCR products of the expected size were gel-purified using the Qiagen (Valencia, CA) Qiaquick Gel Extraction kit and eluted products cloned with the Zero Blunt TOPO PCR Cloning kit (Invitrogen). Plasmids were sequenced by Lone Star Labs (Houston, TX) or the DNA sequencing core facility (Baylor College of Medicine) using an Applied Biosystems (Foster City, CA) ABI PRISM 377 DNA Sequencer. Most sequences were confirmed from independent clones and/or sequencing in the reverse direction. M3 and cytb sequences have been deposited at GenBank with Accession nos. AY263509-AY263623.
Phylogenetic trees
A well-aligned database of exons 2 and 3 of 160 class I genes, including the new M3 sequences, sampled four murid subfamilies, diverse mammalian orders, and five vertebrate classes. It was aligned using the Clustal method (33) by MegAlign (DNAstar, Madison, WI) and adjusted manually with BioEdit (34) to maintain codon alignments. The transition/transversion bias was 0.7 in the exon 2 and 3 data set. Distant class I sequences such as CD1 and FcRN were excluded as they were difficult to align and often generated “long branch” effects. None of these exclusions affected the conclusions described. The database included the new M3 genes and five M3 sequences from the National Center for Biotechnology Information database: M3a (C57BL/6, U18797); M3b (Mus musculus castaneus, M62844); M3f (B10.SHH, L36074); M3sp (Mus spretus, L36072), and RT1-M3 (Wistar, AJ249342). Trees were built in Clustal X (35) and MEGA2 version 2.1 (http://watson.hgen.pitt.edu) (36) using neighbor-joining (NJ), minimum evolution, and unweighted pair-group method with arithmetic means (UPGMA) algorithms, with significance estimated from 1000 bootstrap trials. Maximum parsimony methods were run with 100 bootstraps. Using the Tamura-Nei substitution model, the γ parameter was varied from 0.2 to 3.5 (37) without affecting our conclusions. The inclusion of long branch murid class I-b genes, such as M10, Qa-1, and TL, had no effect on the relevant conclusions. Divergence times were estimated by linear regression of genetic distance vs multiple reference divergence times.
Comparisons of genetic distances
Unless otherwise noted, genetic distances used the Tamura-Nei NJ method, γ parameter = 2. We tested a variety of distance-estimating algorithms and parameters. To avoid overweighting heavily represented rodent genera in calculating genetic distances using monosubstitution models (e.g., Jukes-Cantor), between-group distances were averaged from between-genera distances. For example, Mus M3 vs nonmurid eutherian I-a distances were averaged as a single observation. Assuming constant rates of evolution, we used Tajima’s (38) relative rate test to assess which of three sequences was the outgroup. Thirty-five sets of the sequences were selected randomly and analyzed in MEGA2. χ² values were summed and significance was tested with 35 degrees of freedom.
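The logic of Tajima’s relative rate test can be sketched in a few lines of Python. This is a minimal illustration of the published single-degree-of-freedom form of the test, not the MEGA2 implementation used here; the function names and the toy sequences are hypothetical. For three aligned sequences, m1 counts sites where only sequence 1 differs from the other two and m2 counts sites where only sequence 2 differs; if sequence 3 is the true outgroup and rates are equal, m1 and m2 are expected to be equal.

```python
def tajima_counts(seq1, seq2, outgroup):
    """Count sites unique to seq1 (m1) and unique to seq2 (m2).

    Sites with alignment gaps, sites where all three sequences agree or all
    differ, and sites where only the outgroup differs are ignored.
    """
    m1 = m2 = 0
    for a, b, c in zip(seq1, seq2, outgroup):
        if "-" in (a, b, c):
            continue
        if a != b and b == c:
            m1 += 1  # change on the lineage leading to seq1
        elif a != b and a == c:
            m2 += 1  # change on the lineage leading to seq2
    return m1, m2

def tajima_chi2(m1, m2):
    """Chi-square statistic (1 degree of freedom) for one sequence triplet."""
    if m1 + m2 == 0:
        return 0.0
    return (m1 - m2) ** 2 / (m1 + m2)

def summed_chi2(triplets):
    """Sum the statistic over many triplets; degrees of freedom = len(triplets)."""
    total = sum(tajima_chi2(*tajima_counts(*t)) for t in triplets)
    return total, len(triplets)

if __name__ == "__main__":
    # Toy sequences for illustration only; not data from this study.
    s1, s2, out = "TCGAAC", "ACTATC", "ACGATT"
    print(tajima_counts(s1, s2, out), tajima_chi2(*tajima_counts(s1, s2, out)))
```

Summing the statistic over 35 independent triplets and testing against a χ² distribution with 35 degrees of freedom follows the procedure described above; the p value itself would come from a statistics library or table.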
Nonsynonymous and synonymous substitution rates
Pair-wise nonsynonymous (dN) and synonymous (dS) substitution rates (39) in the ARS and NARS were calculated using SNAP (http://hiv-web.lanl.govl) (40). The ARS, based on HLA-A2 (41), was residues 5, 7, 9, 22, 24, 26, 57–59, 61–77, 80–82, 84, 95, 97, 99, 114, 116, 143, 145–147, 149–152, 154–159, 161–163, 165–167, 169, 171 where residue 1 is the first Gly of α1. Class I-a controls were from pairwise comparison of Db, Dd, Df, Dk, Dp, Ds, Dx, Kd, Kdv, Kf, Kj, Kk, Ks, Ku, RT1.A1°, RT1.A1b, RT1.A1c, RT1.A1f, RT1.A1h, RT1.A1k, RT1.A1n, RT1.A1q, RT1.A2°, RT1.A2b, RT1.A2c, RT1.A2n, and RT1.A2q (all available from the National Center for Biotechnology Information).
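As a bookkeeping aid, the ARS residue list above can be parsed and used to partition an α1α2 coding sequence into ARS and NARS codons before substitution rates are computed. The sketch below is an illustration only: the helper names are hypothetical, it assumes the coding sequence starts at the codon for residue 1, and it does not reproduce the SNAP calculation of dN and dS.

```python
# ARS residue positions copied from the text (numbering from the first Gly of alpha-1).
ARS_SPEC = (
    "5, 7, 9, 22, 24, 26, 57–59, 61–77, 80–82, 84, 95, 97, 99, 114, 116, "
    "143, 145–147, 149–152, 154–159, 161–163, 165–167, 169, 171"
)

def parse_residues(spec):
    """Expand a comma-separated list of residues and ranges into a set of ints."""
    residues = set()
    for item in spec.split(","):
        item = item.strip().replace("–", "-")  # accept en-dash or hyphen ranges
        if "-" in item:
            start, end = item.split("-")
            residues.update(range(int(start), int(end) + 1))
        else:
            residues.add(int(item))
    return residues

def split_codons(cds, ars):
    """Split a coding sequence into (ARS codons, NARS codons).

    Assumes cds begins at the first base of the codon for residue 1;
    any trailing partial codon is dropped.
    """
    ars_codons, nars_codons = [], []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        residue = i // 3 + 1
        target = ars_codons if residue in ars else nars_codons
        target.append(cds[i:i + 3])
    return ars_codons, nars_codons

if __name__ == "__main__":
    ars = parse_residues(ARS_SPEC)
    print("ARS residues:", len(ars))
```

The two codon lists can then be concatenated separately and passed to whichever dN/dS estimator is in use.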
When multiple taxa are studied, variation in time of divergence has a heavy impact on variation in substitution rates (42). Therefore, we regressed (one parameter) the rate of nonsynonymous substitution (dN) on the rate of synonymous substitution (dS) and compared the slopes (43) to the value of 1 expected under neutral selection using t = (m − μ)/(s/√n), where m is the slope, μ = 1, s is the SD of the one-parameter slope and n is the number of sequences (not the number of comparisons).
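A one-parameter regression of dN on dS is a least-squares line forced through the origin. The following sketch shows one way to compute such a slope and to compare it with the neutral expectation of 1. It assumes the comparison takes the one-sample form t = (m − μ)/(s/√n), which is consistent with the symbols defined above but is not confirmed as the exact formula used; the function names and the numbers in the usage lines are hypothetical.

```python
import math

def slope_through_origin(ds_values, dn_values):
    """Least-squares slope m for dN = m * dS (no intercept term)."""
    sxx = sum(x * x for x in ds_values)
    sxy = sum(x * y for x, y in zip(ds_values, dn_values))
    return sxy / sxx

def t_versus_neutral(m, s, n, mu=1.0):
    """t statistic comparing slope m with the neutral value mu.

    s is the SD of the slope and n the number of sequences; the form
    (m - mu) / (s / sqrt(n)) is an assumption made for this sketch.
    """
    return (m - mu) / (s / math.sqrt(n))

if __name__ == "__main__":
    # Hypothetical pairwise rates for illustration only; not data from this study.
    ds = [0.10, 0.20, 0.40]
    dn = [0.02, 0.05, 0.09]
    m = slope_through_origin(ds, dn)
    print("slope:", round(m, 3))
    print("t:", round(t_versus_neutral(m, s=0.05, n=3), 2))
```

A slope well below 1 with a large negative t indicates purifying selection, while a slope above 1 indicates diversifying selection.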
To test for hyperconservation of the 10 residues coordinating N-formyl binding, the dN/dS ratio of those residues was compared to the non-ARS (NARS) ratio using R = dN/dS and ER = √[(σN/dN)² + (σS/dS)²], where R is the ratio, ER its relative error and σN and σS are the SD of dN and dS, respectively.
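The relative error of a ratio can be obtained by standard propagation of independent errors. The short sketch below illustrates that calculation; it assumes dN and dS are treated as independent, which is the usual simplification but is not stated in the text, and the function name and input values are hypothetical.

```python
import math

def ratio_with_relative_error(dn, sd_n, ds, sd_s):
    """Return (R, E_R) for R = dN/dS.

    E_R is the relative error of R from first-order propagation of
    independent errors: sqrt((sd_n/dn)^2 + (sd_s/ds)^2).
    """
    ratio = dn / ds
    relative_error = math.sqrt((sd_n / dn) ** 2 + (sd_s / ds) ** 2)
    return ratio, relative_error

if __name__ == "__main__":
    # Hypothetical values for illustration only; not data from this study.
    r, e = ratio_with_relative_error(dn=0.02, sd_n=0.002, ds=0.40, sd_s=0.04)
    print("R =", round(r, 3), "+/-", round(r * e, 4))
```

Multiplying R by its relative error gives the absolute uncertainty, which is the form in which ratios such as dN/dS ± error are usually reported.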
Construction of M3-Ld chimeric expression vectors
Correct ligation of exons 2 and 3 of M3 in an H2-Ldα3 expression vector (44), which carries the epitope for mAb 28-14-8S (HB-27; American Type Culture Collection, Manassas, VA), was confirmed by sequencing. Because the null CTL activity of M3b mapped to a Leu95Gln substitution (12), only exon 3 of M3b was inserted into pM3aLd to test for M3b N-formyl specificity.
Functional assay of N-formyl peptide specificity
DAP-3 cells (45), cotransfected with pSVneo and pM3-Ld in Lipofectamine Plus (Invitrogen), were selected in 750 μg/ml geneticin (Invitrogen) and sorted with a Beckman-Coulter (Fullerton, CA) Altra flow cytometer after overnight culture at 37°C with N-formyl-MLF-benzylamide (Sigma-Aldrich, St. Louis, MO) and staining with mAb 28-14-8S. pM3Ld transfectants were cultured overnight with 20 μM peptide or DMSO vehicle before staining with 28-14-8S and FITC-goat anti-mouse Ig (Baxter, Mundelein, IL) (46, 47). Cells fixed in 1% paraformaldehyde were analyzed on an EPICS XL-MCL flow cytometer (Beckman-Coulter, Fullerton, CA) with Beckman-Coulter System II version 3.0 software. Peptides (Baylor College of Medicine Protein Core Facility) from L. monocytogenes (fMIVIL), or influenza hemagglutinin (fHA; fMLIIW) (48) and their nonformyl forms were dissolved in DMSO.
Results
Confirmation of sample identity using cytb
To confirm sample identity, we constructed phylogenetic trees from sample cytb sequences (Fig. 1) and published sequences. cytb sequences for M. Nannomys gratus and R. tanezumi had not been published. As expected, M. N. gratus and R. tanezumi cytb clustered significantly (with high bootstrap values) with other Mus Nannomys and Rattus sequences, respectively. M. P. shortridgei clustered with M. C. pahari (98% of bootstraps) rather than with other Pyromys (49), consistent with our results for cytb and TL (19), suggesting Pyromys might be polyphyletic. Rattus and Diplothrix sequences formed a single cluster with 100% bootstrap values. All samples of subfamilies Murinae and Sigmodontinae clustered appropriately.
Phylogeny of cytb. ∗∗, Six cytb sequences previously reported by this laboratory. Underlined sequences were determined for this study. The accession number indicates GenBank sequences. Percent (%) values represent the percentage of bootstrap trials supporting the branch; only values >50% are shown.
Sequence analysis of M3 in murid rodents
From 22 murid species we sequenced 43 distinct genes with high homology to H2-M3 in exons 2 and 3. These genes were not well-differentiated from class I-a-like sequences in exon 4 (data not shown). To confirm orthology with M3, we constructed phylogenetic trees of exons 2 and 3 with candidate sequences, five known M3, and over 100 other class I genes. All M3 candidates clustered together (>98% of bootstraps) in all tree-forming methods used (Fig. 2, A and B), justifying their classification as “M3”. The M3 species tree (Fig. 2C) resembled that of cytb, except for T. osimensis, which has one copy of M3 very similar to that of A. agrarius, as expected (30), and one highly divergent copy.
Phylogeny of M3. Sequences for exons 2 and 3 were aligned as described in Materials and Methods. A, Representative “close” phylogeny of M3 sequences. ∗, M3 sequences previously published. B, Large-scale tree representative of non-UPGMA methods. Most branches are collapsed for clarity. M3 clusters with eutherian I-a genes, but not with murid I-a, suggesting an origin at or prior to the eutherian radiation. C, UPGMA methods place M3 outside the eutherian cluster (86%), suggesting an origin prior to the eutherian radiation.
Large-scale trees (Fig. 2, B and C) revealed three notable features regardless of the algorithm used. First, the M3 branch never clustered significantly with other murid class I genes even when (as in most NJ algorithms) it did cluster with eutherian class I-a genes in general (Fig. 2B). Second, M3 genes were on a “long branch,” indicating a larger genetic distance between M3 and class I-a genes than between class I-a genes of different mammalian orders. Third, in trees using an algorithm (UPGMA) that assumes constant rates of substitution, M3 appeared to have evolved before the radiation of placental mammals (Fig. 2C). Thus, all tree-building methods suggested that M3 evolved before the eutherian radiation, or evolved unusually rapidly.
Thirty-nine of 43 new M3 sequences were easily aligned with class I-a genes in exons 2 and 3. Four sequences (marked insertion and/or deletion (indel) in Fig. 2A) from M. C. pahari and M. P. shortridgei are altered near the 5′ end of exon 2 (Fig. 3). In each of these, a 9-bp tandem duplication encoding the N-formyl interaction residue His9 is closely followed by an “indel”, predicting a polypeptide with a net gain of one amino acid without a frameshift. The indel+ M. C. pahari sequences have premature stop codons and exhibit a nonfunctional Asn Pro Ser glycosylation motif (50) at positions 86–88, a site critical for peptide binding by H2-Ld (51). In contrast, indel+ M3 from M. P. shortridgei are otherwise intact and might encode functional proteins.
Amino acid alignment of M3 genes. Nucleotide sequences predicting identical protein sequences are listed together to the left of the sequence. A, α1, B, α2. Key:—identity with M3a; ∗ stop codon; large arrows: N-formyl coordinating residues; #, deletion; @, insertion of HYFH; &, insertion of HFFH; $, insertion of RAPWMETR (a near tandem duplication).
Multiple copies of M3 have not been reported in laboratory mice. However, R. norvegicus has at least three copies (52). We isolated four sequences from M. C. pahari, three from M. P. shortridgei, and five from M. N. gratus. Assuming monomorphism, these sequences represent distinct loci. Two distinct M3 sequences were found in T. osimensis, each with multiple defects. We cannot rule out the possibility that this species carries another copy of M3 that is functional.
Conservation of the N-formyl coordination residues
Ten residues of M3a (Y7, H9, Y22, S24, L63, K66, V67, I70, V99, Y159) coordinate N-formyl binding (53) of which four (Y7, Y22, S24, Y159) are frequent in other class I molecules. Excepting the six indel+ sequences, there were only three variants in these 10 residues among 42 sequences (99% identity), suggesting hyperconservation (Table I). All three variants, at residues 7 and 159, in two genes from S. hispidus, are at sites highly conserved among most MHC molecules; two were identical conservative Tyr159Phe substitutions. There was complete codon conservation for His9, Leu63, and Val99, and only synonymous substitutions at Lys66, Val67, and Ile70 sites (Table I). Thus, the N-formyl-coordinating residues are extremely well conserved and should be under negative selection.
Conservation of the 10 N-formyl coordinating residues
Synonymous and nonsynonymous rates of substitution in M3
To test whether the ARS of M3 has been under positive (diversifying) or negative (purifying) selection, we regressed nonsynonymous substitution rates against synonymous rates (Fig. 4). As expected, mouse and rat I-a ARS were positively selected (slope = 1.48 ± 0.03), but the M3 ARS was under strong negative selection (slope = 0.236 ± 0.002, p < 0.001 versus a slope of 1). As expected, the M3 NARS was also negatively selected (slope = 0.162 ± 0.002, p < 0.001; Fig. 4B); this slope is similar to that of many murine non-MHC genes (54). Negative selection of murine class I-a NARS sequences (slope = 0.59 ± 0.01) was consistent with other reports for class I-a genes (15). The 10 N-formyl-coordinating residues were hyperconserved (dN/dS = 0.025 ± 0.003; p < 0.001) compared to the NARS (19).
ARS and NARS residues of M3 are under negative selection. Pairwise rates of nonsynonymous (dN) substitution are plotted against synonymous rates (dS). •, M3; □, mouse H2-K and H2-D and rat RT1-A. A, ARS. The slope of M3 was 0.236 ± 0.002; the slope of I-a was 1.48 ± 0.027. B, NARS. The slope of M3 was 0.162 ± 0.002; of I-a, 0.589 ± 0.006. All four slopes differed from neutrality (dashed line, slope = 1, p < 0.01).
Conservation of N-formyl peptide binding specificity
Hyperconservation of N-formyl coordination residues suggested conservation of N-formyl binding specificity. To test this, the α1α2 domains of “null” H2-M3b and both rat and S. hispidus M3 were expressed in an Ldα3 expression vector and tested for peptide-induced surface expression in transfected cell lines (46, 47). fMIVIL and fHA but not nonformylated peptides induced surface expression of M3a, M3b, RT1-M3, and Sihi-M303 (Fig. 5A), demonstrating conservation of N-formyl specificity.
Rat, cotton rat, and the null H2-M3b proteins are N-formyl specific. Cells transfected with indicated M3Ld vectors were incubated with 20 μM of the indicated peptide derived from L. monocytogenes (MIVIL) or influenza hemagglutinin (MLIIW) and stained the next day for expression of M3Ld. Cells transfected with a neo plasmid served as a control. Values are the average and SD of three independent experiments.
The indel+ M. C. pahari sequences had multiple debilitating mutations in exon 2 but the indel+ Mush-M302 was intact except for two nonconservative substitutions in the N-formyl coordinating site (Fig. 3A). This suggests the indel was acquired in the ancestral species before other mutations accumulated in M. C. pahari. Null duplicates can be fixed in a population but are also rapidly lost (42); the presence of indel+ M3 in two species suggests it may retain some function, and allowed us to test experimentally and statistically whether the indel event neutralized M3.
To test whether indel+ M3 retains N-formyl specificity, we expressed the α1 and α2 domains of Mush-M302 in the Ld vector. In contrast to our results using M3b, previously considered a “null” allele, the indel+ M3 was not induced by any tested peptide (not shown). N-formyl peptides did induce a chimeric molecule with M3a exon 2 and exon 3 from the indel+ gene (data not shown). This negative experimental result suggests the indel of exon 2 destroyed N-formyl specificity but cannot rule out the possibility that the indel simply altered peptide specificity. Compared with all other M3, the dN/dS slopes of both the ARS (0.27 ± 0.01) and NARS (0.40 ± 0.02) of the indel+ genes were below 1, but significantly higher than the ARS (0.21 ± 0.01) and NARS (0.20 ± 0.01) of indel-negative genes of the same two species, consistent with a relaxation of negative selection after indel acquisition. Moreover, in direct comparisons (n = 2) between exons 2 and 3 of M3 of M. C. pahari and M. P. shortridgei, the dN/dS ratio of indel+ M3 was 1.4 ± 0.22 compared to 0.27 ± 0.01 for indel-negative M3. Clearly, only the indel-negative genes have been under negative selection. The experimental and statistical results are consistent with the interpretation that the indel in exon 2 inactivated M3.
Three models of M3 evolution before the sigmodontine/murine divergence
Despite the long branch length of M3 genes (Fig. 2, A and B), the dN/dS ratios and functional data indicated M3 evolved conservatively under negative selection since the murine/sigmodontine split ∼65 MYA. Phylogenetic trees failed to join M3 to a murid branch, and UPGMA trees put M3 completely outside eutherian I-a genes. However, UPGMA models assume constant evolutionary rates, and all models assume a uniform or monophasic γ distribution of rates at different sites along the gene. Therefore, the origins of M3 within murid I-a genes (Fig. 6A, model I) might be obscured by nonuniformity or inconstancy of substitution rates.
Evolution of NARS residues in M3 and I-a genes. A, Three models of M3 evolution. I. M3 was a recent offshoot of murid I-a genes. II. M3 arose before the eutherian radiation. III. Murid rodents were an early offshoot of eutherian mammals. B, M3 and I-a genes have evolved linearly since the mammal/bird divergence. The average between-group protein p-distances were plotted against estimated divergence time for different taxa. “Young” estimates (<∼65 MYA) were obtained from between-taxa comparisons of M3 or of murid I-a genes; older estimates used marsupial and birds as outgroups. For example, the average pairwise distance between Peromyscus and Mus M3 was plotted against the divergence time (50–65 MYA) of Murinae and Sigmodontinae. “Old” estimates were computed from genetic distances between M3 (or I-a) and marsupial or bird class I genes, the latter treated as outgroups. •, M3 distance = 0.043 + 0.0024/MY, r2 = 0.98. □, I-a, distance = 0.068 + 0.0019/MY, r2 = 0.96. This plot is representative of plots produced using protein and nucleotide data, and using Poisson or Tamura-Nei corrections for multiple substitutions. Substitution rates for I-a related genes ranged from 0.0013 to 0.0019/MY (r2 from 0.80 to 0.98). Substitution rates for M3 were 10–30% higher, and ranged from 0.0016 to 0.0022/MY (r2 from 0.97 to 0.99). Δ, Distances between M3 and murid and eutherian I-a genes (0.335 ± 0.004) are plotted at the time ∼75 MYA estimated for the murid radiation under model I or at 131 MYA predicted as the M3/I-a divergence (assuming linear evolution under model II). Values using other distance measures and divergence times ranged from 100 to 200 MYA, but were linear in all the models. A saltation is indicated by the gray line leaving the I-a line after the eutherian radiation, and returning to the M3 line by the time of the murine/sigmodontine divergence. Transitional forms would be predicted to fall along this line. 
C, No evidence for positive selection in the NARS driving rapid evolution of M3 from murid I-a genes. Nonsynonymous substitution rates are plotted against synonymous rates. •, M3 pairs, slope = 0.17; □, I-a pairs; slope = 0.6; Δ, M3-I-a pairs, slope = 0.20. Dashed line indicates a slope of 1 (neutral selection).
Because ARS and NARS residues can evolve at different rates in class I genes, we analyzed NARS residues, which are under negative selection and should exhibit more uniform evolutionary rates. As expected, genetic distances between M3 and murine I-a genes were less using the NARS alone, compared with ARS + NARS. However, this shortening also affected distances between I-a genes, such that NARS genetic distances between M3 and murid I-a-like genes were significantly greater than between murid and eutherian I-a-like genes (data not shown). Moreover, as in Fig. 2, the NARS of M3 genes remained on a long branch (not shown) and did not cluster with other murid I-a genes.
To test whether M3 and/or I-a evolution rates were constant over time, we plotted NARS genetic distance against estimated taxonomic divergence dates (Fig. 6B). Using a variety of distance-estimating algorithms, and the range of published rat/mouse, murine/sigmodontine, eutherian/marsupial, and mammal/bird divergence times, both M3 and I-a genetic distances were linear since the mammal/bird divergence. Moreover, M3 and I-a rates were very similar, though rates estimated for M3 ranged from 0 to 30% faster than those estimated for I-a genes.
Three models might explain why M3 looks old (Fig. 6A). In model I, exons 2 and 3 of M3 arose from murid I-a genes, through a “leap” (saltation) involving rapid divergence after gene duplication, during which M3 evolved N-formyl specificity and, after which, M3 evolution became conservative. The saltation generated a large genetic distance, giving M3 the appearance of old age. This model predicts M3 will be found only in murids, that transitional forms should occur in other murids, and the genetic distance between M3 genes and such transitional forms should follow the gray line indicated in Fig. 6B. In model II, M3 looks older because it is older, arising before the eutherian radiation. This model predicts M3 orthologs should be found in other rodent or mammalian groups. Model III combines the first two: M3 arose in murids, but murids are older than nonmurid mammals (55, 56).
Model I predicts M3 and murid I-a genes are most closely related, while model II predicts that murid I-a and eutherian I-a sequences are more closely related. However, the distance between M3 and murid I-a genes (0.23 ± 0.02, SEM; n groups = 36) was not different from that between M3 and nonmurid eutherian I-a genes (0.21 ± 0.01; n = 18), and longer than the distance between murid and nonmurid eutherian I-a genes (0.14 ± 0.004; n = 8). Assuming equal substitution rates (Fig. 6B), we tested whether eutherian I-a genes are a likely outgroup for M3 and murid I-a genes (model I) or if M3 is the outgroup (model II). We used Tajima’s (38) relative rate test to assess these two models using randomly selected sequence sets from our database. The results differed significantly from the predictions of model I (p = 2 × 10⁻¹⁵) but not from those of model II (p = 0.58). Thus, NARS genetic distances and Tajima’s test both favor model II. It might be argued that the NARS of M3 expresses an altered function, and that positive selection for new functionality (17) drove rapid evolution of M3 ARS. Thus, both models I and III predict an elevated dN/dS slope in comparisons of M3 NARS with murid I-a-like genes. However, this slope was very low (Fig. 6C), consistent with model II.
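Under a linear relation between distance and time, a genetic distance can be converted to an approximate divergence date by inverting the fitted line. The sketch below does this with the intercepts and rates quoted in the Fig. 6B legend (M3: 0.043 + 0.0024/MY; I-a: 0.068 + 0.0019/MY) and the M3 to I-a distance of 0.335. It is a worked illustration of the arithmetic only; the function name is hypothetical, and the exact procedure used to obtain the dates reported in the legend is not specified here.

```python
def time_from_distance(distance, intercept, rate_per_my):
    """Invert distance = intercept + rate * time to estimate time in MY."""
    return (distance - intercept) / rate_per_my

# Fitted lines quoted in the Fig. 6B legend.
M3_LINE = {"intercept": 0.043, "rate_per_my": 0.0024}
IA_LINE = {"intercept": 0.068, "rate_per_my": 0.0019}

# Average distance between M3 and I-a genes quoted in the legend.
M3_TO_IA_DISTANCE = 0.335

if __name__ == "__main__":
    t_m3 = time_from_distance(M3_TO_IA_DISTANCE, **M3_LINE)
    t_ia = time_from_distance(M3_TO_IA_DISTANCE, **IA_LINE)
    print("date implied by the M3 line (MY):", round(t_m3, 1))
    print("date implied by the I-a line (MY):", round(t_ia, 1))
```

With these inputs the two lines give roughly 122 and 141 MY, values that fall within the 100 to 200 MYA range the legend reports across distance measures.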
Discussion
M3 is a highly conserved MHC class I-b molecule with unusual specificity for N-formyl peptides, which are PAMP. Using the criteria of close phylogenetic relationship, presence of residues characteristic of the N-formyl coordination site and N-formyl specificity, we identified M3 orthologs in diverse murine and a smaller subset of sigmodontine murid rodents. We did not find the expected transitional forms, but genes representing the missing link between M3 and I-a might have been missed using PCR probes biased toward detecting M3-like genes. Detecting such transitional forms, if they exist, will require a different strategy. There is a second caveat: we have assumed throughout that M3 is an “I-b” gene in all the species studied. From these data, we see that M3 appears essentially monomorphic in R. norvegicus, as it is in M. musculus. This leaves open the possibility that M3 is polymorphic (much more I-a-like) in other murids, consistent with model I.
M3 is specialized in other ways: it lacks conventional side chain specificities, and binds peptides both shorter and longer than the canonical range of 8–10 amino acids. In most other respects, M3 behaves like a class I-a molecule: widespread tissue distribution, TAP dependency, presentation to diverse αβ T cell receptors. Like many I-b genes, M3 has not been known outside a narrow taxonomic range. This is consistent with a model of frequent “birth and death” in which class I-b genes are derived by duplication from I-a genes but rarely survive long enough to transcend the boundaries of taxonomic families or even genera (15). Therefore, we expected isolation of M3 from murid species distantly related to mouse and rat species to offer molecular clues as to how M3 evolved from I-a genes. However, the α1α2 domain of M3 from sigmodontine rodents maintains N-formyl specificity and scarcely differs from that of murine rodents. In particular, the N-formyl coordination residues, ARS and NARS of M3 have all evolved under intense purifying selection and no part of M3 exons 2 and 3 appears closely related to other murid I-a-like genes.
Assuming standard models of gene evolution, in which mutations occur singly and at random, M3 appears older than the radiation of eutherian (placental) I-a genes and therefore appears to have evolved well before the eutherian radiation (57). This model (II) is consistent with purifying selection for M3 function, but does not explain the apparent absence of M3 in nonmurid species.
The conventional model (I) of I-b evolution might be retained if unusual molecular mechanisms contributed significantly to M3 evolution. The algorithms used to analyze sequence evolution in this paper all assume monosubstitution mechanisms in which mutations at different sites occur independently of one another. Multisubstitution mechanisms might allow both nonsynonymous and synonymous substitutions to race ahead without elevating dN/dS ratios, giving the appearance of old age to a young gene. For example, homologous gene conversion has been proposed repeatedly to contribute to class I gene diversity (58, 59, 60). If the donors for M3 gene conversion were homologous I-a genes, however, average genetic distances between I-a and M3 would be expected to remain low, contrary to our findings. Alternatively, nonhomologous (61) or error-prone gene conversion (62), perhaps involving recombination hot spots (63) or a high frequency of dinucleotide mutations (64), might generate large genetic distances without elevating dN/dS ratios (42). To identify this process, transitional forms of M3 from other murid subfamilies and rodent families would be useful. Whatever the mechanism of substitution, model I seems to require a substantial period (∼10–20 MY) of intense immune selection to drive rapid M3 evolution. Presumably, this pressure involved bacterial infections and could hardly have failed to leave its mark on other genes, perhaps accounting for the appearance of other long branch murid class I-b genes, such as Qa-1 and TL.
In contrast, model II asserts that M3 evolved before the eutherian radiation but fails to explain the apparent absence of M3 in nonmurid species. Using the basic local alignment search tool (BLAST), we identified human MHC class I gene fragments homologous to M3 or other murid I-b sequences but none with high bootstrap values. Under model II, M3 evolved under negative selection but might still be largely redundant with class I-a genes. If so, loss of M3 might be only slightly deleterious and M3 might be lost stochastically, especially in species with low effective population sizes such as humans (65).
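The argument that a slightly deleterious loss of M3 is more likely to become fixed in small populations can be made concrete with Kimura's diffusion approximation for the fixation probability of a new mutant. The Python sketch below is illustrative only and is not a calculation performed in this study: the function name, the selection coefficient, and the population sizes are hypothetical, and the formula assumes a diploid population whose census size equals its effective size.

```python
from math import expm1


def fixation_probability(n_e, s):
    """Fixation probability of a new mutant with selection coefficient s.

    Uses Kimura's diffusion approximation,
        u = (1 - exp(-2 s)) / (1 - exp(-4 N s)),
    for a mutant starting at frequency 1/(2N) in a diploid population
    with census size equal to effective size N (an assumption of this sketch).
    Negative s denotes a deleterious mutant.
    """
    if s == 0:
        return 1.0 / (2.0 * n_e)  # neutral expectation
    try:
        # expm1(x) = exp(x) - 1; the sign change cancels in the ratio
        return expm1(-2.0 * s) / expm1(-4.0 * n_e * s)
    except OverflowError:
        return 0.0  # strongly deleterious relative to drift: effectively never fixed


# Hypothetical values: a null allele with a 0.01% fitness cost
s = -1e-4
for n_e in (1e3, 1e4, 1e5):
    relative = fixation_probability(n_e, s) * 2.0 * n_e  # relative to neutral
    print(f"Ne = {n_e:.0e}: fixation probability relative to neutral = {relative:.3g}")
```

With these assumed values, the same small fitness cost is nearly invisible to selection at the smallest population size but makes fixation very unlikely at the largest, which is the sense in which M3 might be lost stochastically in species with low effective population sizes.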
M3 shares properties with PRR and has been under negative selection for this property for at least ∼65 MY. CD1, another monomorphic class I PRR molecule (5), is found in multiple eutherian orders, and appears even older than M3. M3 and CD1 are not at all closely related by sequence and differ in many other ways.
F. M. Burnet (66) asserted that it was their polymorphism that made MHC genes biologically significant. Certainly this is true for I-a function, but modern PRR-like I-b molecules suggest an alternate model for MHC origins. The duplication model of MHC origins (67) hypothesizes that a primitive locus expanded to become the modern polymorphic MHC, leaving unspecified whether the original MHC gene was itself polymorphic. Because most genes are monomorphic or minimally oligomorphic, and most class I-like genes not linked to the MHC are monomorphic (68), parsimony suggests the ancestral MHC locus was also monomorphic. This primitive MHC molecule, functioning as a PRR, would have been preadapted for the evolution of polymorphic class I-a molecules in the evolving adaptive immune system.
Acknowledgments
We thank J. Levitt, P. d’Eustachio, H. Suzuki, and S. Steppan for advice, the laboratories of J. Richards, J. Rosen, and S. Marriott, P. d’Eustachio, and P. Wyde for murid DNA, H. Suzuki for DNA samples of the endangered Japanese species, T. osimensis and D. legata, and two anonymous reviewers who challenged us to mount a deeper analysis of the three models.
Footnotes
1 This work was supported by National Institutes of Health RO1 Grants AI30036 (to R. R. R. and R. G. C.), AI18882 (to R. R. R. and J. R. R.), and RO1 AI17897 (to R. G. C. and J. R. R.).
2 Current address: Emory University School of Medicine, Atlanta, GA 30322.
3 Address correspondence and reprint requests to Dr. John R. Rodgers, Department of Immunology Room M929, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030. E-mail address: jrodgers{at}bcm.tmc.edu
4 Abbreviations used in this paper: PAMP, pathogen-associated molecular pattern; ARS, Ag recognition site; NARS, non-ARS; MY, million years; MYA, MY ago; dN, rate of nonsynonymous substitution; dS, rate of synonymous substitution; indel, insertion and/or deletion; NJ, neighbor-joining; PRR, pattern recognition receptor; UPGMA, unweighted pair-group method with arithmetic means; cytb, cytochrome b.
- Received December 23, 2002.
- Accepted May 12, 2003.
- Copyright © 2003 by The American Association of Immunologists