




* Department of Molecular Life Science, Division of Basic Medical Science and Molecular Medicine, Tokai University School of Medicine, Bohseidai, Isehara, Kanagawa, Japan;
Laboratory of Animal Physiology, Faculty of Agriculture, Tokyo University of Agriculture, Atsugi, Kanagawa, Japan;
Laboratory of Human Sequencing/Immunogenomics, Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom;
Centre for Bioinformatics and Biological Computing, School of Information Technology, Division of Arts, Murdoch University, Murdoch, Western Australia, Australia
| Abstract |
|---|
| Introduction |
|---|
To better elucidate the molecular structure and gene organization of the quail Mhc and to investigate how the genomic differences between quail and chicken might affect their respective immune responses, we determined the nucleotide sequence of the 180-kb quail Mhc that is possibly in synteny with the 92-kb chicken Mhc (B-F/B-L region).
| Materials and Methods |
|---|
Cosmid clones were isolated by colony-hybridization screening of a previously constructed quail cosmid library (23). Approximately 1.3 × 10⁶ independent colonies derived from this library were screened with the Coja class I cDNA clone QF63 and with Coja class IIB and Coja-BG7 PCR products. The Coja class IIB PCR product was obtained with class IIB-specific primers covering exons 1–3, CIIBA (5'-CCGGTCTGCCTACGGAAC-3') and CIIBB (5'-AACCACTTCACCTCGATCTC-3'), and the Coja-BG7 PCR product was made with BG7-specific primers covering exons 1–3, QBGA (5'-TCAGCCCACACCTCTTCCA-3') and QBGB (5'-GGCTCATCTCGTAGCTCTGCA-3'). PCR primer pairs and probes from PCR products were also made for 29 locus-specific markers based on the end sequences of cosmid clones and for other genes such as RING3, TAPBPL, and TAP2. Contigs were assembled by Not I restriction mapping, PCR-based mapping with 29 locus-specific primer pairs, and nucleotide sequence determination by PCR-based sequencing of portions of class I, class IIB, RING3, TAPBPL, and TAP2 (Fig. 1).
Six cosmids covering a 180-kb segment of the quail Mhc from the Coja-BG8 to Coja-H3 genes were subjected to nucleotide sequence determination by the shotgun method (27). DNA sequencing was performed by the cycle sequencing method using AmpliTaq DNA polymerase FS and fluorescently labeled BigDye terminators in a GeneAmp PCR system (Applied Biosystems, Foster City, CA). ABI 377 and 3100 DNA sequencers (Applied Biosystems) were used for automated fluorescent sequencing.
Assembly and genomic analyses
Individual sequences were minimally edited to remove vector sequences, transferred to a SPARC station (Sun Microsystems, Palo Alto, CA) and assembled into a contig using the GENETYX-
/SQ software (Software Development, Tokyo, Japan). Remaining gaps or areas of ambiguity were analyzed by the direct-sequencing procedure using PCR amplification products obtained with appropriate PCR primers. The final sequence was initially analyzed using the GENETYX software (Software Development). The analysis was complemented by using BLAST (www.ncbi.nlm.nih.gov/blast) for homology searches, Genscan (http://genes.mit.edu/GENSCAN.html) for the prediction of coding sequences, Repeatmasker2 (http://repeatmasker.genome.washington.edu/) for the identification and classification of repeat sequences, and Grail for the identification of CpG islands. Prediction of transmembrane regions and primary structures of amino acid sequences of Coja-BG genes were conducted using the SOSUI (http://sosui.proteome.bio.tuat.ac.jp/sosuiframe0.html) and Chou-Fasman programs. Dot matrix analyses were performed using the Harrplot 2.0 software (Software Development). Multiple alignment of sequences and construction of phylogenetic trees were conducted using the Clustal W software at DDBJ (http://www.ddbj.nig.ac.jp/).
RT-PCR cloning and sequence determination
Total RNA was extracted from quail tissues with the TRIzol-LS reagent (Invitrogen, Groningen, Netherlands) based on the guanidinium thiocyanate method. Coja class IIB gene-specific oligonucleotide primers were used for RT-PCR amplification: the sense primer (5'-CCGGTCTGCCTACGGAAC-3') and the antisense primer (5'-AACCACTTCACCTCGATCTC-3'). The unique sequence in the signal peptide region (exon 1) of the Coja class IIB loci was used to design the sense primer and the sequence of the conserved β2 domain region (exon 3) was used to design the antisense primer. RT-PCR was performed by the oligo(dT) primer method following the protocol provided by the manufacturer (TOYOBO, Osaka, Japan). RT-PCR products (10 ng) were amplified by PCR with 2.5 U of TaqNT polymerase (Nippon Gene, Toyama, Japan) using the thermal cycler GeneAmp PCR system 9700 (Applied Biosystems). The reaction mixture (10 µl), containing dNTPs (200 µM), MgCl2 (2.5 mM), and gelatin (0.01%), was subjected to 35 cycles of 30 s at 96°C, 30 s at 65°C, and 30 s at 72°C. PCR products were cloned into the pCR2.1 vector with the TA cloning kit according to the protocol provided by the manufacturer (Invitrogen) and subjected to nucleotide sequence determination. To avoid PCR and sequencing artifacts generated by polymerase errors, 100 clones per organ were sequenced.
| Results |
|---|
To determine the genomic sequence of the quail Mhc region, a sequence-ready contig was constructed by screening a quail cosmid genomic library made from a 3-wk-old male. This male was from an inbred line of high IgY producers (16, 23) and was confirmed in this study to be heterozygous for Mhc alleles. Screening was performed by colony hybridization with the Coja class I cDNA clone, QF63 (25), and PCR products of the Coja class IIB gene, Coja-BG7 gene, and Coja-E 5' flanking regions as probes. A total of 28 cosmid clones were isolated and assembled into a map of five contiguous sequences (1, 2, 3, 4, 5) covering 508 kb by using restriction enzyme Not I mapping, PCR-based mapping with 29 locus-specific markers derived from the end sequences of cosmid clones, and direct sequencing of 29 PCR-based mapping regions. The overlapping clones were also assessed and ordered in part by PCR-sequencing analysis of the class I and class IIB genes, and nonclass I and nonclass II genes such as RING3, TAPBPL, and TAP2 (Fig. 1). The nucleotide sequence differences between the class I loci (2–15%), the class II loci (1–28%), and the 29 locus-specific markers (1–8%) within the cosmid clones were used to confirm that the cosmids were assembled correctly into their respective contigs 1–5. Contigs 3 and 5 correspond to clusters 1 and 2 that we reported previously (23).
Contig 1 represents the first fully sequenced Coja Mhc haplotype A and it is composed of 10 cosmid clones covering a 180-kb region with 7 class IIB genes (DGB1, DFB1, DEB1, DDB1, DCB1, DBB1, and DAB1), 4 class I genes (D1, D2, B1, and E), TAPBPL, RING3, and TAP2. Contig 2 is part of the other Coja Mhc haplotype B composed of eight clones covering 130 kb with eight class IIB genes (tentatively designated as IIB9, IIB8, IIB7, IIB6, IIB5, IIB4, IIB3, and IIB2), and TAPBPL. Contig 3 is also part of the Coja Mhc haplotype B but it is composed of five clones covering 70 kb with one class IIB gene (tentatively designated as IIB1), six class I genes (D1, D2, B1, B2, E, and C2), RING3, and TAP2 (Fig. 1). The gap between contig 2 and 3 has not been connected due to the absence of overlapping cosmid clones for this region. The size of the gap between the two contigs was estimated to be shorter than 2 kb in length by PCR analysis using one primer set (sense primer: 5'-ACCCGCCAGCAGCAACCGCAGCCCCGCAGCCAT-3' and anti-sense primer: 5'-TCTGGAACCACTTCACCTCGATCT-3'). The nucleotide sequence of one of the sequence tagged site markers, C7R in contig 3, showed a perfect match with the LINE sequence identified between the DBB1 and DAB1 genes in contig 1. In addition, the TAPBPL, RING3, and TAP2 genes, which are highly conserved among jawed vertebrates, and the three Not I sites were localized to almost the identical positions in contig 2 and 3, suggesting that they are physically linked to each other and are different from that of contig 1. Such different gene organizations between Mhc haplotypes have been well characterized previously in the HLA-DRB gene region of HLA class II haplotypes (28). Contig 4 and 5 are relatively short segments covering 40 kb and 78 kb with two (A and C1) and four (D3, C3, D4, and C4) class I genes, respectively (Fig. 1). 
The exact locations of contigs 4 and 5 on the quail genome, their linkage to each other, and their linkage to other contigs remain to be established.
Contig 1 (haplotype A), which appears to be in synteny with the chicken Mhc B-F/B-L region, was selected for genome sequencing (Fig. 2A). Six cosmids (C52, C37, C9, C1, SW70, and SW201) were shotgun sequenced to produce 179,985 bp (GenBank accession number AB078884 and Fig. 2B) of continuous sequence linking Coja-BG8 (one of the quail orthologs of chicken B-G) to histone (the quail histone-H3-like gene). This result was obtained with a high redundancy of 7.4. Overlaps between all the cosmid clones were ascertained at the nucleotide level. The G+C content was 54.6% in the quail Mhc region (Fig. 2, C and E) and 56.8% in the chicken Mhc B-F/B-L region (3).
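The G+C percentages quoted above are simple base-composition ratios. As a minimal sketch of that calculation (not the GENETYX or Grail routines used in this study), the following Python functions compute the overall G+C content and a sliding-window profile of the kind plotted in Fig. 2. The function names, window size, and step are illustrative assumptions; the only real sequence used is the CIIBA primer from Materials and Methods, which serves purely as a toy input.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a nucleotide sequence, ignoring non-ACGT symbols."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (window start, G+C percentage) pairs for sliding windows along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy input only: the class IIB sense primer CIIBA, not the 180-kb Mhc sequence.
primer = "CCGGTCTGCCTACGGAAC"
print(f"G+C content of CIIBA: {gc_content(primer):.1f}%")
print(windowed_gc(primer, window=6, step=6))
```

Applied to the full AB078884 sequence with a window of a few hundred bases, the same profile would be expected to highlight the G+C-rich class IIB exons, although that run is not shown here.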
Analysis of the complete sequence with the RepeatMasker 2 program detected the following repeats: 8 long interspersed nuclear elements (LINEs), 17 long terminal repeats (LTRs), 2 minisatellite repeats (QMR1 and QMR2), and 37 simple repeats (Fig. 2D). Of the eight LINEs, two were located within the intergenic regions between the DAB1 and Lec1 genes and between the DFB1 and Lec4 genes. Although the Lec2 gene is composed of four exons, Lec1 and Lec4 lack exons 1 and 2, with the LINEs occupying 3 kb around each of these areas (Fig. 2B and Table I). Two QMR1-LTR and QMR2-LTR sequences were located in tandem within the intergenic regions between DDB1 and BG2, and between DGB1 and BG3, respectively. QMR1 and QMR2 are long imperfect repeat sequences, (ACTGGATTTGCCAGCAGGC)30 and (ATTTATCGATCGTTGC)4, respectively (Fig. 2D). In contrast, none of these repeat sequences were found in the chicken Mhc B-F/B-L region (3).
The 180-kb genomic sequence stretching from BG8 to histone-H3 was subjected to gene identification analysis using the BLAST, Genscan, and Rummage programs. These analyses revealed 41 genes, corresponding to one gene per every 4.4 kb. This gene density is slightly higher than that in the chicken Mhc B-F/B-L region with one gene per every 4.6 kb (3), but four times higher than in the HLA region with one gene per every 15.6 kb (29). The 41 genes within the quail Mhc include 15 "expressed" genes as assessed by their sequence identity with cDNA or expressed sequence tag clones (six class IIB genes, DGB1, DEB1, DDB1, DCB1, DBB1, and DAB1; four class I genes, D1, D2, B1, and E; and five other genes, TAPBPL, RING3, TAP1, TAP2, and C4). Twelve other genes, based on their structural integrity, were assessed to be "possibly expressed" (four Coja class II genes, DFB1, DMA1, DMB1, and DMB2; three Coja-BG genes, BG7, BG6, and BG4; four Coja-NK genes, NK1 to NK4; and one Coja-Lec gene, Lec2). Fourteen genes were classified as pseudogenes because of the presence of premature termination sequences or lack of exons (three class I genes, F, G, and H; five Coja-BG genes, BG8, BG5, BG3, BG2, and BG1; five Coja-Lec genes, Lec1 and Lec3 to Lec6; and the histone-H3) (Fig. 2B and Table I). Four Coja class I genes, D1, D2, B1, and E have the same isotypic loci in contig 2/3 or cluster 1 that was previously described (23). All of the other genes are described and designated in this study for the first time. The donor/acceptor sites for the 15 expressed and 12 possibly expressed genes were conserved in all intron/exon boundaries (data not shown), except for DBB1 (see below). Among the 27 expressed or possibly expressed gene sequences within this region, 26 of them, with the exception of RING3, have a previously determined function in the immune response system.
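The gene counts and densities in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only figures quoted in the text (the 180-kb region, the three gene categories, and the reported chicken and HLA densities); the variable and function names are illustrative and are not taken from any analysis script used in this study.

```python
REGION_KB = 180.0  # length of the sequenced Coja segment, as quoted in the text

# Gene categories and counts as listed in the paragraph above.
genes_by_class = {
    "expressed": 15,
    "possibly expressed": 12,
    "pseudogene": 14,
}


def kb_per_gene(region_kb: float, gene_count: int) -> float:
    """Average spacing between genes, in kb per gene."""
    return region_kb / gene_count


total_genes = sum(genes_by_class.values())
quail_spacing = kb_per_gene(REGION_KB, total_genes)
print(f"{total_genes} genes, one gene per {quail_spacing:.1f} kb")

# Reported spacings (kb per gene) for the comparison regions cited in the text.
reported_spacing = {"chicken B-F/B-L": 4.6, "HLA": 15.6}
for region, spacing in reported_spacing.items():
    print(f"{region}: {spacing / quail_spacing:.2f} times the quail spacing")
```

The three categories sum to the 41 genes reported, and 180 kb divided by 41 gives the stated spacing of roughly 4.4 kb per gene.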
Coja class I genes
The Coja class I region of haplotype A has seven class I loci with the gene order of D1, F, G, H, D2, B1, and E from the Coja class IIB to class I regions, spanning 31 kb (Fig. 2B and Table I). The length of the quail class I gene region is twice as long as the corresponding chicken class I gene region (16 kb) (3). Gene orientations are the same for D1, F, G, H, and D2, but the reverse for B1 and E. The F, G, and H genes are class I pseudogenes newly identified within the 10-kb segment between the D1 and D2 genes (Fig. 2B and Table I). The F gene encompasses only 224 bp from intron 6 to exon 8, and has 84.0% nucleotide identity with D2 and E. The G and H genes encompass 460 and 133 bp from exon 1 to intron 1 with 87.4 and 99.2% nucleotide identity to B1, respectively. Thus, several class I pseudogenes are embedded in the quail Mhc region as compared with the absence of Mhc class I pseudogenes in the chicken Mhc (3). Of the four expressed class I genes, D1 and D2 appear to be the nonclassical class I loci in terms of weak expression and limited tissue-specificity, as described previously for B-FA2 in the chicken Mhc (3).
Phylogenetic trees of the quail and chicken Mhc class I genes showed that the quail class I genes are more closely related to each other than to the chicken class I B-FA1 and B-FA2 and Y-FA genes (Fig. 3A). This suggested that the quail class I genes were generated after speciation of the quail and the chicken from the common ancestor. The locations of the D1, F, G, H, and D2 genes in the quail Mhc region correspond to that of the chicken B-FA2 gene region, whereas the locations of the quail B1 and E genes correspond to the chicken B-FA1 gene region (see Fig. 6B). However, two amino acid insertions observed in the D1 and D2 proteins at amino acid position 53 in the midst of the α1 domain (exon 2) were absent from the quail B1 and E proteins and the chicken FA1 and FA2 proteins. In addition, two amino acid deletions at position 292 of the transmembrane region (exon 5) were observed in the quail D1, D2, and E proteins, but not in the quail B1 or the chicken FA1 and FA2 proteins (25, 26). Therefore, neither D1, D2, nor E is likely to be the orthologous gene of the chicken B-FA1 or B-FA2. In fact, Coja-B1 displays the highest nucleotide identity (86%) with B-FA1, and they may be the orthologous counterparts.
|
|
The gene order of the 10 Coja class II genes was established within an 88-kb segment from DGB1 to DMB1 (Fig. 2B and Table I). If the class II gene region in the quail Mhc is defined as the genomic area from BG8 to DMB1, then it is 125 kb in length as compared with 42 kb in the corresponding chicken class II gene region (Fig. 4). The gene orientation is the same for DGB1, DFB1, DEB1, DDB1, and DCB1 but reversed for DBB1 and DAB1. All of the class IIB genes, except for DBB1, are composed of six exons with an average length of 1.2 kb, and an extremely compact structure with small intronic sizes (the total length of introns 1–5 is 400 bp) (Table I). The lengths of the quail genes and introns are similar to those of the chicken class IIB B-LB1 and B-LB2 genes.
RT-PCR analysis of the Coja class IIB loci by cloning and sequencing RT-PCR products revealed DAB1 and DBB1 transcription in four immunological organs, the lymphocytes, bursa of Fabricius, thymus, and spleen. In contrast, transcripts of DCB1, DEB1, DFB1, and DGB1 were found only in one or other of these immunological organs in a tissue-specific manner. No DDB1 transcripts were detected in the four immunological organs that we examined (Table II). Although the transcript of DBB1 was observed in all of the immunological tissues, it contained an additional 30-bp nucleotide sequence corresponding to the hypothetical intron 1 region. The donor site of this hypothetical DBB1 intron 1 region is conserved (GT), but its acceptor site changes from AG to CT, which results in no splicing of the 30-bp genomic fragment. The amino acid sequence deduced from the inserted 30-bp nucleotide sequence does not contain any structural defects such as premature termination and frame shift in codon usage, suggesting that DBB1 consists of five exons (exon 1: signal peptide + additional 10 aa + β1 domain; exon 2: β2 domain; exon 3: transmembrane domain; exons 4 and 5: cytoplasmic domain) that are likely to encode a class IIB-like protein.
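The reasoning about DBB1 intron 1 rests on two checks: whether an intron carries the canonical GT donor and AG acceptor dinucleotides, and whether a retained intron adds a whole number of codons. The Python sketch below illustrates both under stated assumptions. The two 30-nt sequences are hypothetical placeholders with the boundary dinucleotides described in the text, not the actual DBB1 sequence, and the function names are ours. A full analysis would also scan the retained segment for in-frame stop codons, which is omitted here.

```python
CANONICAL_DONOR = "GT"
CANONICAL_ACCEPTOR = "AG"


def has_canonical_splice_sites(intron: str) -> bool:
    """True if the intron starts with GT and ends with AG (the GT-AG rule)."""
    intron = intron.upper()
    return (
        len(intron) >= 4
        and intron.startswith(CANONICAL_DONOR)
        and intron.endswith(CANONICAL_ACCEPTOR)
    )


def retention_keeps_frame(intron: str) -> bool:
    """True if retaining the intron inserts a whole number of codons."""
    return len(intron) % 3 == 0


# Hypothetical 30-nt placeholders, NOT the real DBB1 intron 1 sequence.
spliceable = "GT" + "A" * 26 + "AG"   # canonical donor and acceptor
retained = "GT" + "A" * 26 + "CT"     # acceptor changed from AG to CT

for label, intron in (("GT...AG", spliceable), ("GT...CT", retained)):
    print(
        label,
        "canonical sites:", has_canonical_splice_sites(intron),
        "| frame kept if retained:", retention_keeps_frame(intron),
        "| extra codons:", len(intron) // 3,
    )
```

A 30-nt retained segment adds 10 codons without shifting the frame, which matches the additional 10 aa assigned to exon 1 of DBB1.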
Exon 2, which encodes the β1 domain, is the most variable of the nucleotide and amino acid sequences among the Coja class IIB genes (75.6–97.8% nucleotide identities; 83.3% on average), as similarly observed in the class IIB genes of other species. The 18 putative peptide binding sites, predicted from human and pheasant class IIB structural analyses (30, 31), displayed a high degree of nucleotide variation mostly accompanied by an amino acid exchange, not only among the Coja class IIB loci, but also between the Coja and Gado class IIB loci (Table III). The number of divergent amino acid residues within the peptide binding sites of the Coja class IIB loci ranged between 2 and 15, 11.4 on average, which is higher than the average of eight divergent residues detected between the two Gado class IIB loci, B-LB1 and B-LB2, as listed in Table IV. In contrast, exon 3 that encodes the β2 domain is completely conserved (100% nucleotide identity) and exons 4–6 are well conserved (93.9–100% nucleotide identities) among the Coja class IIB loci, as was similarly observed between the chicken B-LB1 and B-LB2 (data not shown). Interestingly, these well-conserved regions have very high GC contents (70%) and contain CpG islands (Fig. 2, C and E).
|
|
… β1, β2, transmembrane, and cytoplasmic regions) ranged from 90.8 to 99.0% between the Coja class IIB loci and from 87.9 to 89.8% between the Coja and Gado class IIB loci. The sequence identity was 97.7% between Gado B-LB1 and B-LB2. The DAB1 and DBB1 loci were found within the 11-kb segment between TAPBPL and RING3, and this location corresponds to the chicken B-LB2 locus in the Gado Mhc region. Moreover, these two quail genes were expressed in lymphocytes, bursa of Fabricius, thymus, and spleen, similarly to B-LB2 that encodes a major class IIB protein in the chicken. In comparison, the quail DCB1, DDB1, DEB1, and DGB1 genes had limited expression in the immunological organs. These genes are located within a 52-kb segment between Coja-Lec6 and TAPBPL and correspond in location to the chicken B-LB1 locus, which is also only moderately expressed (Table II and Ref. 32). Three nonclassical class II genes, DMA1, DMB1, and DMB2, that possibly encode the class II Ag-processing molecules (33), were identified within the 18-kb segment between the RING3 and D1 genes near the class I and class II boundary region (Fig. 2B). The amino acid sequences of the translated genes DMA1, DMB1, and DMB2 have 82.5, 85.7, and 83.8% similarity with the chicken B-MA1, B-MB1, and B-MB2 molecules, respectively (see Fig. 6D). The amino acid identity between Coja-DMB1 and Coja-DMB2 was considerably lower (52.8%), although six cysteine sites essential for formation of functionally important disulfide bridges were conserved in both molecules. This relatively low sequence identity may indicate that DMB1 and DMB2 have distinct but unknown functions rather than the chaperone-like stabilizing role previously assigned to the DM proteins as a function in the class II-mediated Ag-presentation pathway (33, 34).
Coja-BG, -Lec, and -NK genes
The quail genes that are in synteny with the chicken B-G genes were tentatively designated as Coja-BG instead of Coja-G to avoid confusion with the quail class I gene, Coja-G. Of the eight Coja-BG loci, only three genes, BG4, BG6, and BG7, were predicted by Genscan analysis to have intact genomic structures consisting of 22, 15, and 16 exons, respectively, that would express transcripts of 5–6 kb (Fig. 5 and Table I). Gene orientation of BG6 and BG7 is the same as the chicken B-G gene whereas BG4 is in the opposite direction (Fig. 2B). Of the various coding exons, exons 1–3 and the last exon are well conserved with 87.6–89.8% nucleotide sequence identity. Exon 2 encodes the Ig variable-like region with 42% amino acid identity to both the human myelin oligodendrocyte glycoprotein and the human butyrophilin (GenBank accession numbers Q61885 and U90548, respectively) (Fig. 3B). Exons 1 and 3 were predicted to encode the hydrophobic transmembrane domain by the SOSUI program. However, the internal exons between exon 3 and the last exon were not well conserved with 65% nucleotide sequence difference among BG4, BG6, and BG7. All of these exons are composed of 21 nucleotides and they encode the α-helical coiled-coil structure as predicted by the Chou-Fasman program (Fig. 5). The other five Coja-BG loci are apparently pseudogenes. Namely, BG1 has a nonsense mutation in exon 2, and BG8, BG5, BG3, and BG2 are cryptic genes lacking most exons (Table I and Fig. 5). Phylogenetic analysis of their Ig variable-like region domains (exon 2) suggests that the genetic distance from the chicken B-G gene is closer to Coja-BG6 than to BG4 and BG7 (Fig. 3B). Therefore, BG6 may be the quail orthologous gene for the chicken B-G, although the number of exons is different between these two genes, 10 for BG6 and 11 for B-G (35, 36).
On the basis of their sequence identity and gene structure, the four Coja-NK loci (NK1 to NK4) all appear to encode proteins. The nucleotide sequence identities in the coding region of the Coja-NK genes range from 96.0 to 97.2%. The nucleotide sequence identities between each of the Coja-NK1–4 genes and the chicken B-NK gene range from 83.4 to 84.6%. The Coja-NK1–4 genes each encode a 136-aa C-type lectin-like receptor with amino acid identity to the C-type lectin-like domain segments of a human NK cell receptor protein, NKR-P1A (33% identity), and a human cell adhesion molecule, CD209 (ICAM-3) (27% identity) (GenBank accession numbers NP_002249 and NM_021155, respectively). The human genes encoding these proteins are located on human chromosome 12p13-p12 within the NK cell receptor gene cluster and on 19p13 within the Mhc paralogous genomic regions, respectively (37, 38).
The Coja-Lec2 and Coja-NK1–4 gene products have a low amino acid identity with each other (29%) but they both belong to the C-type lectin-like gene superfamily and they share seven conserved cysteine residues involved in disulfide bridges that may play a role in eliciting their receptor function. The seven cysteine residues are also well conserved in LLT1, CLRF, AICL, CD69, NKR-P1A, and CD209 at the corresponding positions as seen in chicken B-NK (3). Coja-Lec2 shares the C-type lectin-like domain with Coja-NK1–4, B-Lec, B-NK, and other immune-response-related genes in various species.
Other genes
The other quail genes, TAPBPL, TAP1, TAP2, and RING3 (Figs. 2B and 4B), are found within orthologous positions of the chicken Mhc region (3) but in the class II gene region of the human and mouse Mhc (28, 39). Interestingly, the quail and chicken TAP1 and TAP2 genes reside within the class I region, as similarly observed in the zebrafish, pufferfish, and Xenopus Mhc regions (40, 41, 42).
The quail TAPBPL or Tapasin gene, known as the TAP-binding protein gene (43), was found within a 6-kb segment between the DCB1 and DBB1 genes (Fig. 2B). It is homologous to the chicken, human, and mouse Tapasin genes with 90, 34, and 33% amino acid identities, respectively. The quail TAP1 and TAP2 genes that belong to the ATP-binding cassette Mhc class I Ag-transporter gene family (44) were identified within a 10-kb segment between the D2 and B1 genes (Fig. 2B). The amino acid sequence identities of the quail TAP1 and TAP2 proteins were 91, 45, and 43% with the chicken, human, and mouse TAP1 and 90, 43, and 41% with their TAP2 proteins, respectively (23). The amino acid sequence of TAP2 in contig 1 has two amino acid substitutions (Thr/Lys and Ala/Gln at amino acid positions 328 and 329, respectively) (Fig. 1) when compared with TAP2 in contig 2/3 (Ref. 15, GenBank accession number AB007195, and Fig. 1) due to two single nucleotide polymorphisms between the two different haplotypes. The genomic and deduced amino acid sequences of TAPBPL, TAP1, and TAP2 in contigs 1–3 had no structural defects that could be deleterious to Mhc class I or class II Ag presentation.
The RING3 gene that encodes a nuclear serine-threonine kinase protein and has the bromodomain (45) was identified within a 14-kb segment between the DAB1 and DMA1 genes (Fig. 2B). It is homologous to the chicken, human, and mouse RING3 genes with 99, 69, and 68% amino acid identity, respectively. Exon positions were determined by sequence alignment with the quail and chicken RING3 genomic sequences. Consequently, quail RING3 was found to consist of 11 exons in contrast to the 12 exons in the human and mouse RING3 genes. This difference can be explained by the absence of a quail exon that would correspond to exon 1 in the human and the mouse and transcribe the 5' untranslated region. The quail RING3 protein (735 aa) and chicken RING3 protein (733 aa) are smaller than the human (798 aa) and mouse (801 aa) RING3 proteins. The biological function of RING3 remains to be established although its gene is always linked to the Mhc of all the jawed vertebrates so far examined (29).
In human and mouse, the C4 gene that encodes the complement factor C4 resides in the class III region that is located between the class II and class I regions (28, 39). In quail, the C4 gene was found within a 17-kb segment between the E and histone-H3 genes (Fig. 2B). The quail C4 has 38 exons with 91.7% nucleotide and 91.8% amino acid sequence identity with the chicken C4. The translated quail C4 gene had 35% amino acid sequence identity with the human and mouse C4A and C4B proteins. The highest similarity was observed within the domain regions of the α2-macroglobulin N-terminal family, the anaphylatoxin-like domain, the α2-macroglobulin family, and the netrin C-terminal (NTR/C345C) domain, which are essential for interacting with the C3, C4, and C5 proteins (data not shown).
The quail and chicken histone-H3-like genes, located at the terminal end of the class I gene regions, are pseudogenes that lack a large part of their coding regions. As with the Gado B region (3), there were no class II α and LMP2/7 genes found within the Coja Mhc region.
Dot-matrix analysis
The patterns of regional genomic duplication were examined by dot-matrix analysis of the 180-kb Coja sequence (Fig. 4A). The analysis revealed that the class IIB region has >22 pairs of duplicated segments ranging in size from 1 to 20 kb. This striking observation prompted us to look more closely into each of these segments. Five class IIB genes (DGB1, DFB1, DEB1, DDB1, and DCB1) are oriented in the same direction and two upstream 20-kb segments have a high degree of similarity. The 20-kb upstream segment with DCB1 (BG2, NK1, BG1, Lec2, and DCB1) has 95.7% nucleotide similarity with the 20-kb upstream segment carrying DGB1 (BG5, NK4, BG4, Lec6, and DGB1). Therefore, there are five class IIB-linked homologous segments that share a unique combination of three genes that are members of the NK, BG, and Lec multigene families, such as the DCB1 segment (BG2, NK1, BG1, Lec2, and DCB1), the DDB1 segment (NK2, Lec3, and DDB1), the DEB1 segment (Lec4 and DEB1), the DFB1 segment (BG3, NK3, Lec5, and DFB1) and the DGB1 segment (BG5, NK4, BG4, Lec6, and DGB1). Furthermore, the DFB1/DEB1 unit (DFB1, LINE, Lec4, and DEB1) resembles the DBB1/DAB1 unit (DBB1, Lec1, LINE, and DAB1), although they are in the opposite direction to each other (Fig. 2B). Thus, successive cis- or trans-segmental duplications of this basic unit appear to have produced the present Coja class IIB region. No similar segmental duplications were recognized in the quail class I or the chicken class I or class II region by dot matrix analyses. However, dot matrix of the Coja and the B-F/B-L regions revealed that the Coja class IIB duplicated units converged at the single chicken Mhc class IIB unit (B-G, B-NK, B-Lec and B-LB1) (Fig. 4B). The RING3 to Coja-E region is well conserved between two species with 84% nucleotide sequence identity, when the quail-specific region around the Coja-D1 and Coja-D2 segment was excluded from this comparison.
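For readers unfamiliar with dot-matrix comparison, the idea is to mark every pair of positions at which two sequences share an identical short word; duplicated segments then appear as runs of hits off the main diagonal. The Python sketch below is a minimal exact-match version written for illustration. It is not the Harrplot 2.0 algorithm used in this study, the word size is an arbitrary assumption, and the toy sequence is invented rather than taken from the Coja Mhc.

```python
def dot_matrix(seq_a: str, seq_b: str, word: int = 8) -> list[tuple[int, int]]:
    """Return (i, j) pairs where seq_a[i:i+word] exactly equals seq_b[j:j+word]."""
    # Index every word of seq_b by its start positions for fast lookup.
    index: dict[str, list[int]] = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)

    hits = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            hits.append((i, j))
    return hits


# Invented toy sequence: one 8-nt unit, a spacer, then a duplicated copy of the unit.
unit = "ACGGTCAT"
toy = unit + "TTTTT" + unit

hits = dot_matrix(toy, toy, word=8)
off_diagonal = sorted((i, j) for i, j in hits if i != j)
print("hits on main diagonal:", sum(1 for i, j in hits if i == j))
print("off-diagonal hits (duplicated segment):", off_diagonal)
```

In a self-comparison every position matches itself, so only the off-diagonal hits are informative. Real tools typically tolerate mismatches and filter short chance matches, which this exact-match sketch does not attempt.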
Genome diversity between the quail and chicken Mhc regions
The genome diversity for 13 representative genes within the quail and chicken Mhc regions was determined by calculating the differences in the nucleotide sequences of the genes (exonic and intronic regions) and the amino acid sequences deduced from the nucleotide sequences (Fig. 6). The average percentage nucleotide differences for the genes (53,287 bp), exons (18,090 bp) and introns (35,197 bp) were 13.9, 10.8, and 16.2%, respectively (Fig. 6). The average amino acid sequence divergence for the coding regions (6030 aa) was 14.7%. The range of diversities between the 13 genes for each of the analyses can be seen in Fig. 6. The amino acid divergences varied considerably between the quail and the chicken depending on the gene products (Fig. 6D). In contrast, the nucleotide diversities were relatively uniform across the entire gene sequence for all 13 genes (Fig. 6, A–C). In this regard, the amino acid sequences of the quail BG6, NK1, DCB1, and B1 proteins, which fulfill immunological functions as cell surface molecules, were more divergent than in the other proteins. For example, the amino acid divergence of NK1 (31.3%) was twofold higher than the average divergence (14.7%), and BG6, DCB1, and B1 were also highly divergent (22.0, 20.1, and 23.7%, respectively). DMA1, DMB1, and DMB2 displayed moderate divergence that was close to the average divergence between quail and chicken (17.5, 14.3, and 16.2%, respectively). Lec2 was less divergent than other cell surface molecules (10.3%). TAPBPL, TAP1, and TAP2 were more conserved with similar divergences of 10.2, 9.3, and 9.7%, respectively, than the other proteins. RING3 was the most conserved protein sequence with the lowest amino acid diversity of 1.2% between quail and chicken, supporting the previous finding that RING3 is evolutionarily stable and highly conserved from yeast to human.
It is noteworthy that BG6 and NK1 revealed a surprisingly high degree of amino acid diversity. The diversity between these loci in quail and chicken is equivalent to or higher than that at the class I (B1) and class II (DCB1) loci. This is surprising because class I and class II are among the most polymorphic loci in vertebrates. This observation suggests that BG6 and NK1 may also be under selection for diversity. Therefore, not only the class I B1 and class II DCB1, but also the BG6 and NK1 appear to undergo selective pressure to maintain genetic polymorphisms. The other expressed NK and BG genes also encode a high degree of amino acid divergence (30.9% for NK2, 33.1% for NK3, 30.9% for NK4, 24.2% for BG4, and 24.2% for BG7) when compared with the corresponding chicken B-NK and B-G proteins.
The percentage nucleotide divergence of the intronic regions between quail and chicken was relatively uniform at 16.2% for the 13 genes on average (Fig. 6B). The separation time (T) between the quail (Coja) and chicken (B-F/B-L) Mhcs was calculated as 81 ± 1.9 million years ago by using the equation d = 2µT, where d is the average nucleotide distance (Kimura 2 parameter; 0.162) between the intronic regions of the two species and µ is the mutation rate determined for introns and pseudogenes in primates (1.0 × 10⁻⁹) (46).
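The separation time follows from rearranging d = 2µT to T = d / (2µ). The short Python sketch below reproduces the arithmetic, taking µ as 1.0 × 10⁻⁹ and assuming its units are substitutions per site per year, which is the reading that yields the reported 81 million years. The function name is illustrative, and the ± 1.9 million-year error term is not reproduced because the underlying variance is not given in the text.

```python
def separation_time_years(distance: float, mutation_rate: float) -> float:
    """Solve d = 2 * mu * T for T, the time since two lineages diverged."""
    if mutation_rate <= 0:
        raise ValueError("mutation rate must be positive")
    return distance / (2.0 * mutation_rate)


d = 0.162       # average intronic Kimura 2-parameter distance, quail vs. chicken
mu = 1.0e-9     # assumed units: substitutions per site per year

t_years = separation_time_years(d, mu)
print(f"T = {t_years / 1e6:.0f} million years")
```

The factor of 2 reflects that substitutions accumulate independently along both lineages after the split, so the pairwise distance grows at twice the per-lineage rate.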
| Discussion |
|---|
The structural differences between the Mhc regions of the quail and chicken may be explained by segmental duplication of two distinct duplication units (Fig. 4). For instance, the genomic segment consisting of the Coja-class IIB, Lec, BG, NK, and BG gene family members has been duplicated at least five times along with some deletion events to form part of the present-day Coja class IIB region. Similarly, repeated gene duplications of the essential class I segment have contributed to the main organization of the Coja class I region (Fig. 4). Phylogenetic analysis of the quail and chicken Mhc class I and class II genes showed that these gene sequences were more closely related within species than between species, suggesting that the quail class I and class II genes were generated after the separation of the quail and chicken from their common ancestor. The genomic structure of the Coja Mhc region was probably shaped by duplication events in response to environmental conditions and pathogens that the quail encountered after its divergence from the chicken. For example, the chicken is a nonmigratory bird originating from Southeastern Asia while the quail is a migratory bird originating from Northern and Southern Asia and flying only short distances at a time. Therefore, the quail immune response system may have been stimulated by a larger variety of environmental pathogens during its migratory history, promoting an increase in the number of class I and class IIB loci and/or the degree of genetic polymorphism. A similar explanation may be applied to other migratory birds such as the warbler (47). In addition, the chicken may have lost or translocated several Mhc loci and gene isotypes after its separation from the quail lineage and during its domestication and/or expansion of Rfp-Y, resulting in the emergence of the minimal essential Mhc (32, 48).
To trace the evolutionary process of the quail Mhc class II region, phylogenetic analyses were performed using the BG, NK, and Lec amino acid sequences derived from the nucleotide-coding sequences (Fig. 3). However, these analyses could neither identify the progenitor gene nor elucidate the detailed duplication processes, probably because large-scale insertions/deletions have occurred frequently in the Coja class II regions after the regional duplication events (Fig. 4). Generally, our findings support the "birth and death" model for the evolution of vertebrate multigene families (49, 50). In addition, eight LINE and seventeen LTR sequences were identified in the 180-kb Coja region, but no LINE or LTR sequences were present in the 92-kb Gado Mhc region (Fig. 2D). In the human Mhc region, the density of LTR sequences is significantly higher around the duplicated regions. For example, the MHC class I polypeptide-related sequence B (MICB) to HLA-C region has an LTR content of 23% on average, whereas nonduplicated regions have an LTR content of 8.2% on average. Indeed, both ends of the HLA class I-duplicated units contain LTR sequences (51, 52), implying that retrotransposon-mediated segmental genome duplications occurred within the Mhc regions (53, 54). Therefore, the presence of retrotransposons in the quail Mhc class II region and their absence from the chicken Mhc class I and class II regions are consistent with a model of retrotransposon-driven segmental genome duplication.
Of the expressed class I and class IIB loci that were identified in contig 1 of the quail Mhc region, D1, D2, DCB1, DDB1, DEB1, and DGB1 appear to be nonclassical class I or class II loci in terms of their weak expression and limited tissue specificity, as previously observed for the B-FA2 and B-LB1 genes in the chicken Mhc region. Although the functions of the DMA1, DMB1, and DMB2 genes are not known, they are unlikely to be the major class II loci. In contrast, the expression of B1, E, DAB1, and DBB1 is very high in a large variety of tissues, as similarly observed for the B-FA1 and B-LB2 genes in the chicken. Therefore, these genes are probably the major or classical class I and class II loci (3, 32). It is likely that the classical class I and class II genes acquired their locus-specific function gradually by repeated nonsynonymous substitutions after their generation by duplication. The amino acid diversities between the quail and chicken Mhc class I and class IIB loci were found to be as high as 20%, and the maintenance or progression of genetic polymorphisms within these two species is probably favored by balancing or overdominant selection. The TAPBPL, TAP1, TAP2, DMA1, DMB1, and DMB2 genes were also found to have relatively high degrees of amino acid diversity, ranging from 9.3 to 17.4% (Fig. 6D). The amino acid divergences observed in the Ag-processing pathway-associated transporters or chaperones are much higher than that in RING3 (1.2%), which seemingly is a genetically conserved housekeeping or regulatory gene (45). Therefore, it may be speculated that the quail Mhc class I and class II genes and the Mhc Ag-processing genes have coevolved through various genetic interactions in pursuit of efficient allelic combinations that are necessary for the establishment and maintenance of a sophisticated immune response system.
The peptide binding sites deduced from the amino acid sequences of the quail Mhc class I and class IIB genes are highly variable (63% amino acid diversity) among the class I and class IIB proteins (Tables III and IV, and Ref. 23). In contrast, the amino acid diversities between the chicken class I proteins (B-FA1 and B-FA2) and between the class II proteins (B-LB1 and B-LB2) were much lower than those between the corresponding quail class I and class II proteins (Table IV). This suggests that the Ag-presentation repertoire in the quail is larger and better adapted to environmental pathogens than that in the chicken, which has been domesticated and raised under artificial selection and in more simplified environmental conditions for at least the last 7500 years (55). In this regard, the effect of domestication on the possible genomic contraction of the B-F/B-L region and the expansion of the Rfp-Y region should not be overlooked or dismissed completely, and therefore, these two chicken Mhc regions might be worth investigating in the ancestors of the domestic breeds (55). In mammals, T cells undergo negative selection in the thymus, where Mhc molecules with self peptides are recognized to avoid autoimmune reactions. Therefore, the expression of too many Mhc loci may reduce the T cell repertoire (56). If a similar mechanism is operating in the quail, then the TCR repertoire is likely to be smaller in the quail than in the chicken because of the larger number of expressed Mhc loci. This hypothesis seems to be consistent with the successful xenogeneic transplantation of extrathymic chick tissue into the quail in early embryonic stages (21), when a low TCR repertoire would be expected to increase immunotolerance in xenotransplantation.
However, many of the quail class I and class II genes are nonclassical (two of four expressed class I genes and four of seven expressed class II genes) in that they are weakly expressed in a tissue-specific manner (Table II), and they may limit Ag presentation against not only pathogens, such as those causing Marek's disease and Newcastle disease, but also self Ags. Thus, there are only two major (classical) class I genes and two classical class IIB genes in the quail Mhc. This scenario is similar for the Mhc of the chicken (one for each of the class I and IIB loci) and human (three loci for class I and three for class IIB). The relatively small number of classical class I and class II genes in the quail Mhc may allow a more efficient immune response to foreign and self Ags by maintaining a large TCR repertoire that may still permit some immunotolerance in xenotransplantation.
The average nucleotide sequence divergence of intronic regions between the quail and chicken was 16.2% for 13 genes (Fig. 6B), which is similar to the average divergences in other chromosomal intronic regions (14–17%) and in the mitochondrial sequence (14.3%) (Ref. 57 and data not shown). Because of this sequence divergence, chicken microsatellite primers did not generally allow successful PCR amplification in the quail genome (58). The intronic regions are presumed to have evolved neutrally, although they are surrounded by exonic regions with a high degree of nucleotide and amino acid diversity that has been maintained under positive selection (Fig. 6D). Based on the average of 16.2% nucleotide divergence in the intronic regions (d = 0.162), the divergence time between the quail and the chicken was estimated to be 81 ± 1.9 million years ago. To our knowledge, this divergence time is the first estimate based on a genomic sequence comparison between these two species.
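As a check on the arithmetic, the point estimate follows from rearranging d = 2µT. The calculation below assumes µ = 1.0 × 10⁻⁹ substitutions per site per year, the value that reproduces the reported figure; the per-year units are an assumption made here so that T comes out in years.

$$
T = \frac{d}{2\mu} = \frac{0.162}{2 \times \left(1.0 \times 10^{-9}\ \text{yr}^{-1}\right)} = 8.1 \times 10^{7}\ \text{yr} \approx 81\ \text{million years}
$$

The ± 1.9 million-year uncertainty cannot be recovered from these two numbers alone and is not reproduced in this sketch.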
The genetic distance between the quail and the chicken appears deceptively small compared with their distance to other birds, possibly because the quail and chicken can produce chimeric animals. However, the estimated 81 million years of separation between quails and chickens is almost the same as the 90 million years of separation between humans and cattle (59). Therefore, why quails and chickens can still produce hybrid animals in experimental crosses by artificial insemination with chicken semen (15), and what their genetic and immunological interrelationships are, remain to be elucidated. In this respect, it will be necessary to investigate the immunological functions of the chicken and quail Mhc molecules and their related proteins for a better understanding of avian immunology. Further comparative genomic analyses will help to provide clearer insights into the molecular mechanisms and evolutionary processes responsible for Mhc diversity between and within different avian species. So far, the only reports available for avian genomic sequences are for the Coja and Gado Mhc regions (present study and Refs. 3, 23, 25, and 26) and the 32 kb and 45 kb of genomic sequence for the class II regions of the house finch (Carpodacus mexicanus) and red-winged blackbird (Agelaius phoeniceus), respectively, both of which belong to the order Passeriformes (60, 61, 62). We are continuing to conduct large-scale sequencing of the Mhc regions in the quail as well as in other birds. Such comparative genomic studies among avian species will give us more information about the genetic, immunological, and evolutionary differences that exist between avian genomes.
| Footnotes |
|---|
2 The sequence presented in this article has been submitted to GenBank under accession number AB078884.
3 Address correspondence and reprint requests to Dr. Hidetoshi Inoko, Department of Molecular Life Science, Division of Basic Medical Science and Molecular Medicine, Tokai University School of Medicine, Bohseidai, Isehara, Kanagawa 259-1193, Japan. E-mail address: hinoko{at}is.icc.u-tokai.ac.jp
4 Abbreviations used in this paper: Gado, Gallus domesticus; Coja, Coturnix japonica; contig, contiguous cosmid; LINE, long interspersed nuclear element; LTR, long terminal repeat.
Received for publication May 30, 2003. Accepted for publication March 15, 2004.
| References |
|---|
genes are closely linked to the class I genes and the nucleolar organizer. EMBO J. 7:2775.[Medline]
-chain gene, a truncated class II
-chain gene, and a large CR1 repeat. Immunogenetics 55:100.[Medline]
-chain (B-LB) gene flank the Tapasin gene in the B-F/B-L region of the chicken major histocompatibility complex. Immunogenetics 51:138.[Medline]