β2-Microglobulin Locus of Rainbow Trout (Oncorhynchus mykiss) Contains Three Polymorphic Genes
Departments of Structural Biology, and Microbiology and Immunology, Stanford University, Stanford, CA 94305
Abstract
β2-microglobulin (β2m) associates with MHC and related class I H chains to form cell surface glycoproteins that mediate a variety of functions in defense. In humans, monomorphism of a single β2m gene contrasts with the diversity and polymorphism of the class I H chain genes, and a similar picture was seen in almost all other species examined. In this regard, rainbow trout (Oncorhynchus mykiss) appeared unusual: trout β2m genes gave a complicated and polymorphic pattern in Southern blots, and a minimum of 10 different mRNA encoding two distinct types of β2m were expressed by a single fish. Characterization of genomic clones from the same fish now shows that the rainbow trout β2m locus consists of two expressed genes and one partial gene that are closely linked. Four copies of the locus were identified and allelic variants of each gene defined, largely through comparison of the noncoding regions. A dramatic variation in the lengths of introns is caused by variable repetitive elements and accounts for the complex pattern seen in Southern blots. By comparison to noncoding sequences, the coding regions are conserved, but the three loci differ within a cluster of codons that encode residues of β2m that do not interact with class I H chains. Additional diversity in the trout β2m genes appears to be due to somatic mutation that might be facilitated by the abundance of repetitive DNA elements within the 12 β2m genes of an individual rainbow trout.
Introduction
β2-microglobulin (β2m) was first identified as a small serum protein (1). It was subsequently shown to be the noncovalently bound L chain of highly polymorphic MHC class I molecules (2). In these membrane glycoproteins, which present peptide Ags to CTLs (3) and regulate NK cells (4), β2m forms a central part of the structure, one necessary for the proper folding and cell surface display of the class I molecule (5, 6, 7, 8). Many different loci encode class I H chains, and only some of them are located in the MHC (9). Although all these H chains associate with β2m to form molecules with recognizably similar structures, they exhibit a wide range of functions, most of which are concerned with immunity and defense.
In humans, β2m is encoded by a single gene for which no protein polymorphism has been discovered, despite considerable search. This monomorphism contrasts with the diversity and polymorphism of class I H chains. Limited polymorphism of a single β2m gene (10, 11) has been described for other mammalian species: three alleles in laboratory mice (12), five in wild mice (13), and three in the owl monkey (14). In chickens (15), β2m is also encoded by a single gene, located on a chromosome different from that of the MHC. In diploid fish, including zebrafish (16), catfish (17), and tilapia (18), β2m is encoded by a single gene, as also seems likely for the tetraploid carp (18) and sturgeon (19).
In the rainbow trout (Oncorhynchus mykiss), another fish species with a tetraploid history, a different situation appears to hold. A cDNA library made from a single fish was found to contain numerous clones encoding β2m (20). Detailed analysis of 12 of the cDNA clones revealed 10 different sequences. These cDNA encoded two distinct forms of the β2m protein and were further differentiated by many noncoding differences. Such variability seemed unlikely to be due to expression of a single gene, even one present in four copies. Consistent with the hypothesis that rainbow trout genomes contain more than one β2m gene, Southern blotting showed a complicated pattern of bands that differed from one fish to another. The goal of the investigation described here was to test directly the hypothesis that the haploid rainbow trout genome contains more than one β2m gene. To do this, we have characterized the β2m genes of the same trout (fish J) from which β2m cDNA had previously been characterized. We describe a trout β2m locus that consists of three linked genes, two of which are expressed and one of which is incomplete and not expressed. In the genomic DNA of fish J, the β2m locus is present in four copies that are readily distinguished. The implications of the diversity of rainbow trout β2m genes for MHC class I function are discussed.
Materials and Methods
A library was constructed in λDASH II from partially Sau3A-digested genomic DNA from liver tissue of trout J (21). The library contained 300,000 clones and was screened directly, without amplification, using a probe (b2m-mp) corresponding to exons 2 and 3 of the trout β2m cDNA Jb-1. This probe hybridizes to all the trout β2m cDNA previously characterized (20).
Subcloning and sequencing of the genomic clones
DNA from the λ phage genomic clones was digested with EcoRI, or with both HindIII and EcoRI. Hybridizing fragments were gel purified and ligated into vectors suitable for transposon sequencing: the pJF5 vector (R. Myers, Stanford University, Stanford, CA) or a modified pGEM vector. Ligation products were used to transform DH5α cells. Clones containing insert were recovered and used to transform pOX38 cells, which carry the Tn1000 transposon (22). (Direct transformation of pOX38 cells with the ligation products was too inefficient to be used routinely.) The pOX38 cells carrying the clone were recovered and used in an in vivo transposon mobilization procedure, with DH5α cells as the recipient, following a bacterial mating protocol developed by the Stanford Human Genome Center.
DH5α cells carrying the plasmid and its randomly integrated transposon were recovered on plates containing nalidixic acid and ampicillin. Between 50 and 100 clones were picked and their DNA recovered. The site of transposon integration in each was mapped by PCR amplification of the segment from each side of the multiple cloning site of the vector to primers located in the ends of the transposon. DNA was sequenced in each direction from the site of transposon integration, using unique primers located on each end of the transposon. The sequences were assembled using the ABI Autoassembler program (Applied Biosystems, Foster City, CA), with the positions of integration determined by PCR used to assemble contigs. Additional transposon insertions were made to complete clones using the GPS-1 Genome Priming System kit (New England Biolabs, Beverly, MA). Analysis of restriction digestion patterns confirmed the lengths of repetitive DNA segments. Sequences of subclones 1–5 (Sc1–Sc5) were deposited in GenBank under accession numbers AY217450–AY217454.
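Using PCR-mapped insertion positions to build contigs can be pictured as an interval-coverage problem. The Python sketch below is illustrative only and is not the ABI Autoassembler algorithm: it assumes that each mapped transposon insertion yields one read in each direction, of a fixed and hypothetical length, and reports the stretches of an insert that no read covers, which are the places where additional insertions would be needed. The function name and example values are hypothetical.

```python
def uncovered_intervals(insert_len, insertion_sites, read_len):
    """Return half-open [start, end) stretches of an insert that no read covers.

    Assumptions (illustrative only): each transposon insertion at position p
    (bp from one end of the insert, mapped by PCR) gives one read extending
    leftward over [p - read_len, p) and one rightward over [p, p + read_len).
    """
    spans = sorted(
        (max(0, p - read_len), min(insert_len, p + read_len))
        for p in insertion_sites
    )
    gaps = []
    covered_to = 0  # rightmost position covered by the spans seen so far
    for start, end in spans:
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    if covered_to < insert_len:
        gaps.append((covered_to, insert_len))
    return gaps


# Hypothetical 5-kb insert, four mapped insertions, 500-nt reads.
print(uncovered_intervals(5000, [400, 1500, 2300, 4600], 500))
# [(900, 1000), (2800, 4100)]
```

The two reported gaps are where further insertions would be targeted; in practice, read lengths vary and reads begin a short distance inside the transposon, which this sketch ignores.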
PCR amplification from the λ clones
Partial sequence was determined for each of the genomic clones by PCR amplification of several subregions within each clone. The amplified DNA fragments were gel purified, recovered using the Qia-Ex gel extraction kit (Qiagen, Valencia, CA), and sequenced directly. Primers used for the amplification of regions of the clones are indicated in Table I.
DNA was amplified from the genomic DNA of trout J, the same trout used for the construction of cDNA and genomic libraries. Primers were designed to amplify the 3' untranslated region of the type 1 gene. Forward primer (3F1) and reverse primer (3R4) were used to amplify directly from genomic DNA in five independent amplification reactions. The fragments were gel purified and ligated into the T-overhang vector Topo-TA (InVitrogen, Carlsbad, CA) and cloned using One-Shot competent cells (InVitrogen). Clones were picked at random and sequenced in both directions.
Results
… β2m gene in rainbow trout
A λ library was made from genomic DNA of rainbow trout J, from which diverse β2m cDNA had been characterized (20). Screening the library with a β2m cDNA probe allowed 10 genomic clones, designated λ1–λ10, to be isolated. On Southern blotting, the genomic DNA of trout J gave 15 hybridizing TaqI bands (Fig. 1), of which three small bands (750, 550, and 330 bp) are common to all rainbow trout, whereas 12 larger bands are polymorphic (20). Each genomic clone had a different pattern, in which the three nonpolymorphic bands were well represented. Also well represented were polymorphic bands in the size range 1.5–4.0 kb. Although only one of the hybridizing TaqI bands in whole genomic DNA was larger than 4 kb, three of the genomic clones (λ1, λ6, and λ9) had fragments >4 kb. The cause of this difference has not been identified, but it could be a cloning artifact caused by infidelity in λ replication of the repetitive DNA that is abundant in the clones and in salmonid genomes in general (23).
Three genomic clones (λ6, λ5, and λ3) were further characterized by restriction digestion, subcloning of fragments, and sequencing (Fig. 2). The λ6 clone contained two complete β2m genes, with the exons found on two EcoRI fragments, both of which were subcloned and sequenced. Subclone 1 (Sc1) contained all three exons of one gene and exon 1 of the second gene. Both genes were in the same orientation. Subclone 2 (Sc2) contained exons 2 and 3 of the second gene. Each gene corresponded to 1 of the 10 β2m cDNA sequences characterized from fish J (20). The exon sequences of the gene contained entirely in Sc1 matched cDNA clone Jb-10 perfectly, whereas the exons of the gene split between Sc1 and Sc2 matched cDNA clone Jb-6.
The presence of both genes on clone λ6 shows that the two lineages are products of different genes. Previously, the 10 trout cDNA sequences identified had been separated into four groups (I, II, III, and IV) based on similarity to each other. We now realize that these sequences derive from two genes. Sequences within groups I, II, and III include the four alleles of the type 1 gene and minor variants of them. The two sequences within group IV represent two alleles of the type 2 gene. For clarity, we have designated genes like the one contained entirely in subclone Sc1 as type 1 genes and genes like the one split between Sc1 and Sc2 as type 2 genes. Individual β2m genes are named according to gene type and the genomic clone on which they were found. Thus, the two genes in the λ6 clone are designated λ6.1 and λ6.2.
Clone λ5 contained exons derived from two different β2m genes. Subclones 3 (Sc3) and 4 (Sc4) were generated by digestion with EcoRI and HindIII and sequenced. Sc4 contained exons 2 and 3 plus the 3' untranslated region of a type 2 gene, but their sequence did not correspond precisely to any of the known cDNA. Likewise, the EcoRI-derived subclone Sc5 from λ3 contained exons 2 and 3 plus the 3' untranslated region of a type 2 gene not represented in the cDNA. The second β2m gene in clone λ5 was contained in Sc3 and represented solely by exon 2; experiments designed to locate and identify exons 1 and 3 and a 3' untranslated region for this gene were unsuccessful. The sequence of this exon 2 did not correspond to any known cDNA sequence and was divergent from them. This partial gene will therefore be referred to as the type 3 gene.
The three conserved TaqI fragments derive from segments that span the boundary of intron 1 and exon 2 in the β2m genes. Moreover, each fragment is associated with one of the three different types of gene: the 730-bp fragment with type 1 (β2m1), the 550-bp fragment with type 2 (β2m2), and the 330-bp fragment with type 3 (β2m3). Thus, these fragments provide conserved and useful markers for the different genes.
The fragment generated on the 3' side of the TaqI site in exon 2 produces the larger, polymorphic TaqI bands in Southern blots. The length of this fragment differs for each of the five genes sequenced, varying not only between type 1, 2, and 3 genes but also between the three type 2 genes examined. The variation is due to the presence of a variable number of tandem repeats (VNTR) within intron 2. Of the various VNTR identified in the β2m gene sequences (Table II), VNTR a and VNTR a' contribute to length variation of intron 2 in all three gene types. The two forms (a and a') alternate in apparently random fashion and are unusual in that they include part of exon 2 and the intron 2 donor splice site.
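To make the link between intron 2 repeat number and band size concrete, the Python sketch below performs an in silico TaqI digest (TaqI recognizes TCGA and cuts T^CGA). The gene it digests, `toy_gene`, is entirely hypothetical: its segment lengths and 6-bp repeat unit are invented and do not model any trout allele. Changing only the repeat copy number leaves the fragment that ends at the exon 2 TaqI site unchanged, while the fragment 3' of that site grows by one unit length per added copy.

```python
import re


def taqi_fragments(seq):
    """Fragment lengths from a complete TaqI digest of a linear sequence.

    TaqI recognizes TCGA and cuts between T and C (T^CGA).
    """
    seq = seq.upper()
    cuts = [m.start() + 1 for m in re.finditer(r"(?=TCGA)", seq)]
    bounds = [0, *cuts, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def toy_gene(n_repeats):
    """Hypothetical gene: upstream TaqI site | intron 1 / exon 2 segment ending
    in a TaqI site | tandem repeat in intron 2 | downstream TaqI site.

    The 6-bp unit AAGGTT and all flanking lengths are invented for illustration.
    """
    return ("G" * 50 + "TCGA"         # upstream TaqI site
            + "C" * 200 + "TCGA"      # intron 1 / exon 2 segment, exon 2 TaqI site
            + "AAGGTT" * n_repeats    # tandem repeat in intron 2
            + "G" * 100 + "TCGA" + "C" * 50)


for n in (5, 20):
    print(n, taqi_fragments(toy_gene(n)))
# 5 [51, 204, 134, 53]
# 20 [51, 204, 224, 53]
```

The 204-bp fragment behaves like the conserved short bands, and the third fragment like the polymorphic long bands; real digests also depend on TaqI sites outside the gene, which this toy ignores.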
A notable feature of the trout β2m genes is the striking variation in intron lengths caused by the presence of VNTR. The type 1 gene is the most compact and contains only the VNTR a/a' of intron 2. By contrast, the type 2 and 3 genes are larger and have accumulated more repetitive DNA. An additional VNTR (VNTR b) is present in intron 2 of the type 2 genes, and one of two different VNTR (VNTR c or d) is present in intron 1. In particular, the DNA flanking the single exon of the type 3 gene is rich in repetitive DNA, carrying a much larger tract of VNTR a/a' in intron 2 and a unique repeat, VNTR e, in the upstream region corresponding to intron 1 (Fig. 2).
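Repeat tracts like these can be located by scanning for consecutive exact copies of a unit. The Python sketch below is illustrative and hypothetical: unit length and minimum copy number are user-chosen parameters, and because it requires exact matches it would split alternating variant forms such as VNTR a and a' rather than merge them. The test sequence is invented.

```python
def exact_tandem_runs(seq, unit_len, min_copies=3):
    """Find maximal runs of an exact repeat unit of fixed length.

    Returns a list of (start, unit, copies). Variant unit forms are not
    merged by this simple scan.
    """
    if unit_len < 1:
        raise ValueError("unit_len must be positive")
    seq = seq.upper()
    runs = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies, j = 1, i + unit_len
        while seq[j:j + unit_len] == unit:  # extend while the next copy matches
            copies += 1
            j += unit_len
        if copies >= min_copies:
            runs.append((i, unit, copies))
            i = j  # skip past the run just reported
        else:
            i += 1
    return runs


# Hypothetical sequence with two tandem arrays of 5-bp units.
demo = "GC" + "ACGTT" * 4 + "CC" + "TTAGG" * 3
print(exact_tandem_runs(demo, 5))
# [(2, 'ACGTT', 4), (24, 'TTAGG', 3)]
```

Comparing copy numbers reported for the same tract in different alleles gives the kind of length difference that distinguishes the genes above.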
Rainbow trout can have four different alleles of β2m genes
Using the three TaqI fragments as markers, we assessed the β2m gene content of all genomic λ clones (Fig. 3). A total of 16 β2m genes were counted: one of type 1, six of type 2, and nine of type 3. Because the trout genome is tetraploid in origin, the identification of three types of gene suggested that a single trout could have up to 12 different β2m genes (four copies of each of three genes). If so, the genomic clones would have to be redundant in their representation of the type 2 and 3 genes and incomplete in their representation of the type 1 gene. Supporting the latter was the previous characterization, from fish J, of eight different cDNA derived from type 1 genes.
To investigate the diversity of type 1 genes further, a 550-bp region of intron 1 that corresponds to part of the 730-bp TaqI fragment was amplified directly from genomic DNA of fish J. The amplified products were cloned, and individual clones were isolated and sequenced. From the analysis of 20 clones obtained in two independent experiments, a total of four different sequences was obtained. They differed by 5–25 nucleotide substitutions, indicating the presence of four alleles of the type 1 β2m gene. The allele present in genomic clone λ6 corresponds to allele A. Pairwise comparison of the sequences derived from the four type 1 alleles revealed no evidence of their sorting into two pairs of more closely related alleles, as would be expected if the locus had reverted to diploidy (Table III).
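The diploidization test amounts to asking whether the four alleles split into two tight pairs. The Python sketch below computes uncorrected pairwise p-distances and, for each of the three ways to pair four alleles, compares the mean within-pair and between-pair distances. A diploidized locus would show one pairing with within-pair distance well below between-pair distance, whereas a ratio near 1 matches the pattern reported above. The function names and sequences are hypothetical, not the trout intron 1 alleles.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def best_pairing(alleles):
    """Split four aligned alleles into the two pairs with the smallest mean
    within-pair distance; return (pairing, within_mean, between_mean)."""
    names = sorted(alleles)
    if len(names) != 4:
        raise ValueError("expected exactly four alleles")
    dist = {frozenset(p): p_distance(alleles[p[0]], alleles[p[1]])
            for p in combinations(names, 2)}
    candidates = []
    for partner in names[1:]:  # the three possible ways to pair up four items
        pair1 = frozenset((names[0], partner))
        pair2 = frozenset(names) - pair1
        within = (dist[pair1] + dist[pair2]) / 2
        between = sum(d for p, d in dist.items() if p not in (pair1, pair2)) / 4
        candidates.append(({pair1, pair2}, within, between))
    return min(candidates, key=lambda c: c[1])


# Hypothetical alleles showing a diploidized pattern (A~B, C~D).
paired = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAC", "C": "GGGGGAAAAA", "D": "GGGGGAAAAC"}
pairing, within, between = best_pairing(paired)
print(sorted(sorted(p) for p in pairing), round(within, 3), round(between, 3))
# [['A', 'B'], ['C', 'D']] 0.1 0.55
```

With real data, corrected distances and a significance test would be preferable; this sketch only shows the shape of the comparison.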
cDNA clone Jb-10 corresponds to the type 1 gene of genomic clone λ6. This has been designated the A allele. To correlate the remaining seven type 1 cDNA with individual type 1 genes, we PCR amplified, cloned, and sequenced the 3' untranslated regions of the type 1 genes, the site of most cDNA sequence variation. A total of 35 clones derived from five independent amplifications were sequenced, a strategy taken to reduce the probability of artifact. The sequences of the 35 clones form four well-defined groups, represented by comparable numbers of clones and corresponding to the four type 1 alleles (Fig. 4). Alignment of the genomic sequences from the 3' untranslated regions with the 3' untranslated regions of the cDNA clones showed that each cDNA sequence preferentially aligns with one of the four genomic sequences: Jb-10 with A; Jb-1, -4, -9, and -11 with B; Jb-2 with C; and Jb-3, -5, -7, and -8 with D. Thus, there is good correlation between the three groups of cDNA defined previously and the alleles of the type 1 gene: group I with allele B, group II with alleles A and C, and group III with allele D.
… β2m gene (15). Additional differences between the cDNAs derive from the use of six possible polyadenylation sites, as has also been seen for the mouse β2m gene (24). For example, clones Jb-1 and -4 (900 nt in length) and clones Jb-9 and -11 (1300 nt in length) all derive from one allele. Variation between individual clones of the 3' untranslated region of a given allele was analogous to that between cDNA clones derived from that allele. Two of the four type 1 alleles carry a run of 12–13 Ts and 13–14 Gs, and extensive variation in this region was observed in genomic DNA as well as in cDNA. In addition, single nucleotide differences were identified in 2 of 8 clones representing allele A, 4 of 8 clones representing allele B, 3 of 12 clones representing allele C, and 2 of 7 clones representing allele D. Three additional clones were recombinant sequences. These apparently nontemplated differences could have arisen during PCR amplification or could result from somatic mutation in trout cells.
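Whether such nontemplated differences exceed what PCR alone would produce can be framed as a binomial question. The sketch below takes the per-allele counts reported above (11 variant clones of 35) but uses hypothetical placeholders for the polymerase error rate, amplicon length, and number of doublings; with measured values it would estimate the chance of seeing at least 11 variant clones from PCR error alone. The Poisson approximation for the per-clone error fraction is a simplification of PCR error accumulation, and the function names are hypothetical.

```python
from math import comb, exp

# Single-nucleotide variant clones per allele, as reported above: (variants, total).
VARIANT_CLONES = {"A": (2, 8), "B": (4, 8), "C": (3, 12), "D": (2, 7)}


def expected_error_fraction(error_rate, amplicon_len, doublings):
    """Approximate fraction of PCR products carrying at least one polymerase error,
    treating errors as Poisson with mean error_rate * amplicon_len * doublings."""
    return 1 - exp(-error_rate * amplicon_len * doublings)


def prob_at_least(k, n, p):
    """Binomial probability of observing k or more variant clones out of n."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


variants = sum(v for v, _ in VARIANT_CLONES.values())
total = sum(n for _, n in VARIANT_CLONES.values())
# Hypothetical placeholders, NOT values from this study: substitute the measured
# polymerase error rate, the real amplicon length, and the effective doublings.
p_pcr = expected_error_fraction(error_rate=2e-5, amplicon_len=500, doublings=30)
print(variants, total, round(p_pcr, 3), round(prob_at_least(variants, total, p_pcr), 3))
```

The output depends entirely on the placeholder parameters, so it should not be read as evidence for either explanation offered in the text.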
Type 2 gene
The type 2 gene on clone λ6 corresponded to the Jb-6 cDNA in the coding region but matched Jb-12 in the untranslated region. Five other λ clones were also known to contain a type 2 gene (Fig. 3). To assess the genes they contained, homologous sequences were obtained from all six type 2 genes after PCR amplification. Three different segments of the gene (II, III, and IV in Table III) were compared. Four variants of the diagnostic TaqI fragment (segment II) of the type 2 gene were defined, indicating that all four alleles of the type 2 gene were represented in the genomic clones. In contrast, only three variants of a region corresponding to intron 3 (segment III) and of a region within the 3' untranslated region (segment IV) were obtained (Table III). One λ clone known to contain a type 2 gene failed to amplify in each reaction, evidence for a fourth allele with differences or deletions in the sequences used for priming PCR amplification. The genomic clones included the two variants of the 3' untranslated region represented by cDNA clones Jb-6 and Jb-12. Within the amplified 3' untranslated region, genomic clones λ1, λ2, λ6, λ9, and λ10 match cDNA Jb-12, and λ8 matches Jb-6. No genomic clone carried the nucleotide difference, identified in Jb-12, that changes amino acid 41 from lysine to glutamic acid.
Type 3 gene
Nine of the 10 genomic clones contain a type 3 gene. Three segments of each gene were determined and compared. One segment, sited upstream of exon 2 and including VNTR f, yielded identical sequences from λ1, λ2, λ9, and λ10 but was otherwise uninformative. The region upstream of exon 2 that contains VNTR e (segment V) revealed three different sequences, whereas analysis of the diagnostic 330-bp TaqI fragment of the type 3 gene (segment VI) gave six different sequences. Included in the latter were three sequences differing by single nucleotide substitutions from related sequences. These differences were reproduced in two independent experiments, indicating that if they are in vitro artifacts, they are not random. In conclusion, the analysis provides good evidence for three alleles of the type 3 gene.
The β2m locus consists of three linked genes
Comparison of the allelic polymorphisms in the type 2 and 3 β2m genes indicates that clones λ1, λ2, λ6, λ9, and λ10 derive from the same haplotype and form a contiguous sequence of 30 kb that includes a type 1, a type 2, and a type 3 gene. Thus, for this haplotype, the β2m locus is shown to be a closely linked set of three genes (Fig. 5). The linkage of the type 2 and 3 genes is also demonstrated for two additional haplotypes, as defined by clones λ3 and λ8. In clone λ5, the type 3 allele is the same as that of the haplotype covered by the λ2, λ6, λ9, and λ10 clones, but it is linked to a different type 2 gene. The λ5 clone also differs from clones λ2, λ6, λ9, and λ10 upstream of exon 2, because amplification across VNTR f failed. Thus, λ5 probably represents the fourth haplotype, its recombinant nature being consistent with only three different type 3 alleles having been found in fish J. Alternatively, this clone might be the result of in vitro recombination between two of the natural haplotypes during construction of the genomic library. In summary, the results are consistent with fish J having four copies of a β2m locus containing three linked genes. Of these genes, the type 1 and type 2 genes are functional and expressed, whereas the type 3 gene consists only of exon 2 and flanking intronic sequences.
… β2m genes
The sequences of the type 1 and type 2 β2m genes were searched for differences that could influence function. The sequence encoding the mature protein is more conserved than that encoding the leader sequence or the 5' and 3' flanking sequences. The promoters are so different that their sequences cannot readily be aligned: a region including the sequence encoding the leader peptide and extending 550 bp upstream of it has only 47% sequence similarity. In Fig. 6, the β2m promoter sequences from various species, including trout, are shown with their S-X-Y motifs in register. This motif regulates constitutive vs inducible expression of the gene through binding of regulatory factor X and the MHC class II transactivator (25, 26). Both the type 1 and type 2 trout β2m genes have the motif, despite their sequence divergence. The Y motif, an inverted CCAAT box, lies 39 bp upstream of the TATA box in the type 1 gene and 45 bp upstream of it in the type 2 gene. In the human gene, the CCAAT box resides at position -39 relative to the mapped transcription start site. The spacing between these three motifs is critical, because the bound proteins function as an enhanceosome (26). A 10–11-bp deletion is seen between the X2 and Y motifs in both the trout and the zebrafish sequences. The S-X-Y motif within the proximal promoter of the Onmy-UBA gene does not have this deletion (27), suggesting that the deletion arose in the β2m promoter early in fish evolution. Because the proximal promoters of the two trout β2m genes carry similar motifs, it is likely that the genes share some features of their regulation.
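Motif spacing of this kind can be measured with a simple pattern scan. The Python sketch below rests on stated assumptions: it treats the Y motif as the inverted CCAAT box (ATTGG on the coding strand), uses the TATAWAW consensus for the TATA box, and measures spacing from the first base of one motif to the first base of the other, which may not be the convention behind the 39-bp and 45-bp values above. The promoter is a synthetic test string and the function names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
TATA_BOX = re.compile(r"TATA[AT]A[AT]")  # TATAWAW consensus


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def y_to_tata_spacing(promoter):
    """Start-to-start distance (bp) from the nearest upstream inverted CCAAT box
    to the first TATA box, reading the coding strand 5' to 3'."""
    seq = promoter.upper()
    tata = TATA_BOX.search(seq)
    if tata is None:
        return None
    inverted_ccaat = revcomp("CCAAT")  # "ATTGG" on the coding strand
    y_starts = [m.start() for m in re.finditer(f"(?={inverted_ccaat})", seq[:tata.start()])]
    return tata.start() - y_starts[-1] if y_starts else None


# Synthetic promoter: inverted CCAAT box at 10, TATA box at 49.
toy = "GC" * 5 + "ATTGG" + "C" * 34 + "TATAAAT" + "G" * 10
print(y_to_tata_spacing(toy))
# 39
```

Scanning real promoters this way would also require handling degenerate motif variants and both strands, which the sketch omits.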
We also examined the distribution of CpG dinucleotides in the β2m genes. The type 1 gene has 42 CpGs upstream of the leader, 2 in the leader, 17 in intron 1, and 10 in exon 2, but only 1 in intron 2 and none in the downstream region. The type 2 leader was also flanked by CpGs, but to a lesser extent than that of the type 1 gene. The region upstream of the type 2 leader contained CpGs, with 4 within the leader and 12 in the part of intron 1 contained in subclone 1. CpG abundance in much of intron 1 is not known because this part of the type 2 gene was not sequenced; however, the region upstream of exon 2 on subclone 2 carries an additional nine CpGs. Exon 2 contained 9 CpGs, and intron 2 contained 138, mostly attributable to a 2-kb region of unstable repetitive DNA flanked on either side by VNTR b. Whereas intron 3 of the type 1 gene contains no CpG, the corresponding intron in the type 2 gene has 47 CpGs, most of which lie within a single tract just upstream of the 3' untranslated region. The type 3 gene has 19 CpGs upstream of exon 2, 8 within exon 2, and 10 downstream of it.
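CpG tallies such as these are straightforward to reproduce from sequence. The sketch below counts CpG dinucleotides per named region and also reports the observed/expected CpG ratio often used to flag CpG islands, a measure not used in the text above. Region names, boundaries, and sequences are hypothetical inputs.

```python
def cpg_stats(seq):
    """Return (CpG count, observed/expected CpG ratio) for a DNA sequence.

    Observed/expected = CpG count * length / (C count * G count).
    """
    s = seq.upper()
    cpg = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")
    c, g = s.count("C"), s.count("G")
    ratio = cpg * len(s) / (c * g) if c and g else 0.0
    return cpg, ratio


def cpg_by_region(regions):
    """Tabulate CpG counts for named gene regions (name -> sequence)."""
    return {name: cpg_stats(seq)[0] for name, seq in regions.items()}


# Hypothetical region sequences.
print(cpg_by_region({"upstream": "ACGTCGA", "intron 2": "ATATAT"}))
# {'upstream': 2, 'intron 2': 0}
```

Summed over the regions listed for the type 1 gene, the reported counts (42 + 2 + 17 + 10 + 1 + 0) total 72 CpGs.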
Exon 2, which encodes all but four amino acids of the mature protein, has been partially or fully sequenced for almost all of the β2m genes present in the genomic clones. The inferred amino acid sequences were determined for the partial sequences and aligned with those known from the cDNA for the type 1 and type 2 genes (Fig. 7). The three loci differ primarily at the codons for amino acid positions 16, 17, 19, and 20. At residues 16–20, the type 1 gene encodes NFGDK, the type 2 gene QHGKD, and the type 3 gene EYGKD. These differences reside in the loop between strands 1 and 2 of the mature β2m protein. The glycine at position 18 is required for the turn in the loop and is invariant in all species examined. Residues 16 through 20 are relatively conserved in the β2m of other fish species, with a consensus sequence, EYGKE, that most closely resembles that of the unexpressed trout type 3 β2m gene.
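Using only the residue 16–20 motifs reported above, a few lines of Python confirm the pattern of differences: the type 1 and type 2 motifs differ at positions 16, 17, 19, and 20, glycine 18 is shared, and the type 3 motif is closest to the fish consensus. The helper name is hypothetical; the motifs are taken from the text.

```python
LOOP_16_20 = {"type 1": "NFGDK", "type 2": "QHGKD", "type 3": "EYGKD"}
FISH_CONSENSUS = "EYGKE"  # consensus of other fish species at residues 16-20
FIRST_RESIDUE = 16


def differing_positions(a, b, first=FIRST_RESIDUE):
    """Residue numbers at which two equal-length motifs differ."""
    return [first + i for i, (x, y) in enumerate(zip(a, b)) if x != y]


for gene, motif in LOOP_16_20.items():
    print(gene, motif, len(differing_positions(motif, FISH_CONSENSUS)))
# type 1 NFGDK 4
# type 2 QHGKD 3
# type 3 EYGKD 1
```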
Discussion
Diverse β2m cDNA had previously been characterized from an individual rainbow trout, fish J. The cDNA were divided into four groups, of which groups I, II, and III were closely related and group IV was more divergent. From the analysis here of the β2m genes of fish J, we have shown that group I, II, and III cDNA are the products of one β2m gene (the type 1 gene) and group IV cDNA are the products of a second, linked β2m gene (the type 2 gene). These genes are composed of four exons: the 5' untranslated region and leader are encoded by exon 1, most of the mature protein by exon 2, the C-terminal four amino acids and part of the 3' untranslated region by exon 3, and the remainder of the 3' untranslated region by exon 4. In addition, a third, unexpressed β2m gene fragment (the type 3 gene), consisting of a complete, normal-looking exon 2 and its flanking region, is also linked to the two expressed β2m genes. Thus, the trout β2m locus consists of three different genes (Fig. 5).
Diversity within the mRNA transcribed from the two expressed β2m genes is due to polymorphism in the four alleles of each gene present in the genome of fish J. Thus, the trout genome is tetraploid for the β2m locus, as also seen for the CK-1 chemokine locus (28). Unlike the sturgeon, in which the four β2m alleles sort into two pairs of more closely related alleles (19), the rainbow trout β2m locus shows no indication of diploidization. This contrasts with MHC class I of rainbow trout, for which a maximum of two Onmy-UBA alleles can be detected in an individual (29, 30).
Certain structural features of the β2m locus account for the diversity and polymorphism of the band pattern observed on Southern blotting of TaqI-digested DNA. Two groups of TaqI fragments, long and short, are produced, and each gene contributes one fragment of each type. The short fragments (<1 kb) derive from intron 1 and the 5' part of exon 2 and are characteristic of each β2m gene; the long fragments (>1 kb) derive from the 3' part of exon 2 and intron 2 and are characteristic of particular alleles. Consequently, the small bands are common to all rainbow trout, whereas the larger bands vary between alleles, causing extensive variation in the TaqI RFLP pattern in populations of rainbow trout (11). A major cause of the differences in band length between genes and alleles is novel elements of repetitive DNA, especially VNTRs.
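The band arithmetic implied by this model can be written out as a short check. It assumes, as the text does, four copies of a three-gene locus, one conserved short fragment per gene type, and one long fragment per gene copy; the result is an upper bound, because long fragments of equal length would comigrate. Under these assumptions the maximum is 3 + 12 = 15 bands, the number observed for fish J. The function name is hypothetical.

```python
def max_taqi_bands(gene_types, locus_copies):
    """Upper bound on hybridizing TaqI bands under the model described above.

    Each gene type contributes one conserved short fragment shared by all copies,
    and each gene copy contributes one allele-specific long fragment; the bound is
    reached only if every long fragment differs in length (no comigration).
    """
    short_bands = gene_types
    long_bands = gene_types * locus_copies
    return short_bands, long_bands, short_bands + long_bands


print(max_taqi_bands(gene_types=3, locus_copies=4))
# (3, 12, 15)
```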
An inconsistency we have been unable to resolve is that the long TaqI fragments differ in length between λ clones that, by other criteria, appear to derive from the same β2m gene. In some cases, such as λ1.3, this is likely due to proximity to the vector arm. However, for other pairs of clones, for example λ4/λ7 and λ2/λ10, which are predicted to include the complete TaqI fragment, this cannot be the explanation. These differences could be artifacts due to the repetitive DNA sequences undergoing expansion or deletion during phage replication. We should emphasize that the clones selected for subcloning and sequencing had restriction fragments that corresponded to the hybridizing fragments on the Southern transfer. Another possible explanation is that the differences are due to somatic variation in the trout. Finally, we cannot rule out the possibility that these pairs of clones represent different genes, which, if true, would increase the number of β2m genes in the genome of fish J beyond 12. However, all the other data obtained in this study are consistent with this trout having 12 genes.
In the cDNA analysis, mRNA from the type 1 gene was more abundant than type 2 gene mRNA. Thus, a hierarchy appears in which the type 1 gene is expressed at a high level, the type 2 gene at a low level, and the type 3 gene not at all. The difference between the two expressed genes is associated with highly divergent promoters and with a larger amount of repetitive DNA in the type 2 gene than in the type 1 gene. (The type 3 gene also has much repetitive DNA.) Comparison of the three β2m genes shows that exon 2, which codes for all but the four C-terminal amino acids of the mature protein, is the most conserved element of the gene. The sequence of exon 2 is much more conserved than the DNA of the flanking introns and untranslated regions, and it is only from analysis of the latter that the different alleles of each gene can be distinguished with confidence. The effect is most extreme for the type 1 gene, for which all four alleles have the same exon 2 sequence.
In our analysis of both β2m genes and cDNA, we encountered sequence variation that could not be explained by germline inheritance of 12 β2m genes and that was more prevalent than expected for errors made in PCR or cDNA synthesis. The 3' untranslated regions of the type 1 genes were a particular focus of this phenomenon. In both cDNA and amplified genomic DNA, mutations attributable to slipped-strand mispairing were present in the polynucleotide repeat sequence T11–12G13–14 present in two of the four type 1 alleles. In fact, so many variant sequences were obtained for this polynucleotide repeat that the germline sequence could not be assigned. Point mutations were also seen in clones derived from all the alleles in the region downstream of the polynucleotide tract. DNA polymerases and repair mechanisms can have difficulty replicating reiterated sequences such as mononucleotide tracts, which, as a consequence, are highly mutable (31). For repeats of fewer than eight nucleotides, slippage can be corrected by the 3' exonucleolytic proofreading activity of DNA polymerases, but alterations in longer sequences require the postreplicative mismatch repair process (32). The mutations we observed within and downstream of the T11–12G13–14 sequence could therefore be due to errors of DNA mismatch repair of β2m genes in somatic cells.
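The eight-nucleotide threshold quoted above suggests a simple screen for tracts likely to depend on mismatch repair. This Python sketch is a hypothetical helper, not a tool used in this study: it lists single-base runs at or above a chosen length, with the default set to the threshold from the text. The test sequence is invented, loosely echoing a T12G13 tract.

```python
import re

PROOFREADING_LIMIT = 8  # per the text: runs of >= 8 nt rely on mismatch repair


def mononucleotide_runs(seq, min_len=PROOFREADING_LIMIT):
    """Return (start, base, length) for every single-base run of at least min_len."""
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in re.finditer(r"([ACGT])\1*", seq.upper())
            if len(m.group(0)) >= min_len]


# Invented sequence: a T12 run, a G13 run, and a T7 run below the threshold.
demo = "AC" + "T" * 12 + "G" * 13 + "CAG" + "T" * 7 + "C"
print(mononucleotide_runs(demo))
# [(2, 'T', 12), (14, 'G', 13)]
```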
A similar phenomenon has been described for the noncoding regions of the chicken β2m gene, where a frequency of somatic variation of 1% was observed. Mutations in a region downstream of a G17 tract in the 3' untranslated region yielded five cDNA and genomic variants of the sequence from a single animal (15). Moreover, in human colorectal cancers with the "mutator phenotype" caused by defective mismatch repair, the β2m gene is particularly prone to the accumulation of errors (33). An 8-bp CT repeat in the leader peptide sequence was particularly variable (34, 35). Many other subtle mutations of β2m accounted for complete loss of surface MHC Ags (36). Thus, by being difficult to replicate, the β2m gene might act as a sentinel for the disruption of normal replication. Furthermore, the loss of β2m expression would have the added advantage of rendering cancer cells more susceptible to NK cell attack.
The function, if any, of "minisatellites" or tandem repeat DNA is not known. Putative roles that have been ascribed to VNTRs include the facilitation of gene conversion and the promotion of recombination (37). Alternatively, it has been argued that the accumulation of repetitive DNA is neutral and occurs as a by-product of unequal chromatid exchange (38). In the highly variable human pathogen Neisseria meningitidis, repetitive DNA flanks the majority of genes encoding cell surface receptors and is involved in processes promoting genome fluidity, antigenic variation, and thus escape from host immunity (39). It seems possible that the high and variable content of repetitive DNA in trout β2m genes is evidence for analogous mechanisms operating on the side of the host. Perhaps the repeats flanking β2m also contribute instability through error-prone replication, and potentially useful polymorphic variation can exist in the alternate copies of the gene. Accessing that useful polymorphism might proceed by a mechanism analogous to the gene conversion involved in the exchange of 1 of 17 silent partial copies into the functional pilin locus in Neisseria (40).
The three loci differ primarily at the sequences encoding amino acids 16, 17, 19, and 20. The type 1 gene has the sequence NFGDK. The type 3 clones all have the sequence EYGKD. The type 2 gene differs from type 3 at positions 16 and 17, having the sequence QHGKD. These differences reside in the loop between strands 1 and 2 of the mature β2m protein. The glycine at position 18 is required for the turn in the loop and is invariant. Although the amino acid sequence at positions 16 through 20 is not invariant in other fish species, it is relatively conserved, with the consensus sequence EYGKE. The type 3 sequence most closely resembles this motif. Overall, the exon 2 sequences of rainbow trout are more similar to each other than to those of other species, indicating a monophyletic origin of the three genes within rainbow trout.
The differences in the putative amino acid sequences within the S1-S2 loop appear to be the result of selection, because each of the two distinct types of β2m carries a group of linked nonsynonymous nucleotide changes. It is possible that the two β2m proteins have different functions in the trout. Two widely divergent types of MHC class I molecules, UAA and UBA, have been identified in the rainbow trout (21), although whether they associate with different β2m proteins has not been examined. Alternatively, divergence of the β2m molecule in the loops is tolerated because this part of the molecule plays no role in its function. A third possibility is that the diversification of β2m in trout reflects direct pressure from a pathogen targeting this part of an otherwise highly conserved protein: trout that had undergone duplication and subsequent diversification of the β2m gene would have been those able to elude the pathogen. Consistent with this hypothesis, the type 3 sequence, which is now defunct, most closely resembles the β2m sequences of other fish species at the S1-S2 loop.
β2m sequences vary greatly between fish species. Despite the constraint of having to interact with various class I H chains, several regions of the molecule have diverged, albeit primarily in the exposed loops. This interspecies variation reflects the need for β2m to coevolve with its partner, the rapidly evolving class I H chain. Rapid evolution of the H chain allows class I molecules to present different Ags, protecting the species against changing pathogens. We speculate that β2m, the conserved subunit of the class I molecule, is an easy target for viruses. For example, a viral protein could bind to β2m and alter the structure of class I molecules on the surface of all infected cells. One can envision conformational changes that interfere with cytolytic T cell recognition without alerting the NK cells, which scan for the presence of epitopes on the H chain. The diversification of β2m in trout may reflect direct pressure from a pathogen targeting this conserved protein. Likewise, the extensive interspecies variation could reflect this type of pressure over evolutionary time.
| Footnotes |
|---|
2 Current address: Department of Biological Sciences, University of Alberta, Edmonton, Alberta, T6G 2E9, Canada.
3 Address correspondence and reprint requests to Dr. Peter Parham, Department of Structural Biology, Stanford University School of Medicine, Sherman Fairchild Building D-159, 299 Campus Drive West, Stanford, CA 94305-5126. E-mail address: peropa{at}stanford.edu
4 Abbreviations used in this paper: β2m, β2-microglobulin; Sc, subclone; VNTR, variable number of tandem repeats.
Received for publication September 12, 2003. Accepted for publication December 23, 2003.
| References |
|---|
β2-Microglobulin: a free immunoglobulin domain. Proc. Natl. Acad. Sci. USA 69:1697.
β2-Microglobulin on the cell surface: relationship to HL-A antigens and the mixed leucocyte culture reaction. Tissue Antigens 4:186.
β2-microglobulin is fixed during de novo synthesis and irreversible by exchange or dissociation. J. Immunol. 142:2751.
β2-microglobulin in peptide binding by class I molecules. Science 250:1423.
β2-microglobulin, class I heavy chain conformation, and tapasin in the interactions of class I heavy chain with calreticulin and the transporter associated with antigen processing. J. Immunol. 158:2236.
β2-microglobulin genes. Cell 29:661.
β2-microglobulin gene: primary structure and definition of the transcriptional unit. J. Immunol. 139:3132.
β2-microglobulin in the mouse. Proc. Natl. Acad. Sci. USA 77:7395.
β2-microglobulin affects binding of class I MHC molecules by the W6/32 antibody. Immunogenetics 49:312.
β2-microglobulin gene is located on a non-major histocompatibility complex microchromosome: a small, G + C-rich gene with X and Y boxes in the promoter. Proc. Natl. Acad. Sci. USA 93:1243.
β2-microglobulin gene in the zebrafish. Immunogenetics 38:1.
β2-microglobulin of ictalurid catfishes. Immunogenetics 48:339.
β2-microglobulin transcripts from two teleost species. Immunogenetics 38:27.
β2-microglobulin in a primitive fish, the Siberian sturgeon (Acipenser baeri). Immunogenetics 50:79.
β2-microglobulin sequence diversity in individual rainbow trout. Proc. Natl. Acad. Sci. USA 93:2779.
-chain diversity (D) and joining (J) segments in the rainbow trout: presence of many repeated sequences. Mol. Immunol. 34:653.
β2-microglobulin gene. Nature 302:449.
β2-microglobulin genes. Immunity 9:531.
β2-microglobulin gene transactivation. J. Immunol. 167:5175.
β2-microglobulin mutation in mismatch repair-defective colorectal carcinomas. Curr. Biol. 6:1695.
β2-microglobulin gene mutations: a study of established colorectal cell lines and fresh tumors. Proc. Natl. Acad. Sci. USA 91:4751.