










* Department of Experimental Immunology and
Department of Genome Analysis, Helmholtz Centre for Infection Research, Braunschweig, Germany; and
Torrey Pines Institute for Molecular Studies, San Diego, CA 92121
Introduction
The murine Igh locus is about 3 million bases (Mb) in size and is located close to the telomere on chromosome 12. The locus is known to be highly polymorphic within the genus Mus (3). Based on detailed Southern blot analyses, the Igh loci of inbred strains were assigned to different haplotypes (4, 5). The strain 129/Sv carries the Igha haplotype, as does BALB/c, perhaps the strain most widely used in studies of Ab responses in the mouse. The Ighb haplotype is present in C57BL/6, and in the context of the mouse genome project (6) the Ighb locus was sequenced completely (7, 8, 9). In this study, we provide a 1.6-Mb sequence of the 129 substrain 129S1/SvImJ, abbreviated 129S1, spanning the CH exons up to the beginning of the large VHJ558 family region. This Igha sequence expands the picture of the murine Igh locus in two ways. On the one hand, comparing the genomic sequences of the two haplotypes gives a detailed view of recent evolutionary changes in the copy number and sequence of VH genes and of other features such as interspersed repeats. On the other hand, the sequence is of general use and will support the elucidation of the special phenomena occurring in the Igh locus: VDJ recombination, class switch recombination, and somatic hypermutation, all of which are guided, regulated, or at least influenced by different sequence motifs (10, 11, 12).
Materials and Methods
Igha locus BACs were isolated from the CitbCJ7 library, abbreviated CT7 (see http://www.tree.caltech.edu/). This library was prepared from the CJ7 ES cell line (13), which is derived from the 129S1/SvImJ mouse strain (ftp://ftp.informatics.jax.org/pub/reports/ES_CellLine.rpt). Library superpools, high-density membranes for screening, and individual BAC clones were purchased from Research Genetics (now Invitrogen Life Technologies) and are also available from OpenBiosystems. Superpools were screened with a series of sequence-tagged sites (STSs) described in Ref. 14. These included sites in the 3' end of the CH region (IgA exon 3), the JH region, the DH region (DFL16.1), various VH region sites (VHgroupIII-VH7183, VH11, VHS107, VHGAM3-8, and VHJ606), and a yeast artificial chromosome (YAC) end (ADGC9-left arm) located in the VH10 region. Positive BAC clones obtained from Research Genetics were plated on agar plates containing 12.5 µg/ml chloramphenicol and confirmed by colony PCR with the identifying primer sets. BAC ends were sequenced either directly with T7 and SP6 primers or after amplification by Vector-Hexamer PCR (15). BAC end sequences were deposited in EMBL/GenBank/DDBJ under accession nos. BH021141–BH021349. Contigs were assembled by assessing STS content (using the screening sites, D12Mit markers, and others) and by extensive Southern blot analysis with VH probes and BAC and YAC ends, all as described in Ref. 14. Gaps were closed by developing new screening PCR assays from BAC end sequences.
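The STS-content part of contig assembly can be pictured as a graph problem: clones that share at least one STS are taken to overlap, and connected groups of clones form contigs. The Python sketch below illustrates only that idea; the clone and marker names are hypothetical, and it ignores the Southern blot evidence and the marker-order information that were also used in the actual mapping (Ref. 14).

```python
from itertools import combinations

# Hypothetical STS content per BAC clone (names invented for illustration).
STS_CONTENT = {
    "BAC_A": {"sts1", "sts2"},
    "BAC_B": {"sts2", "sts3", "sts4"},
    "BAC_C": {"sts4", "sts5"},
    "BAC_D": {"sts6"},  # shares no STS with the others: a gap in the map
}


def overlapping_pairs(content):
    """Map each pair of clones sharing at least one STS to the shared markers."""
    pairs = {}
    for a, b in combinations(sorted(content), 2):
        shared = content[a] & content[b]
        if shared:
            pairs[(a, b)] = shared
    return pairs


def build_contigs(content):
    """Group clones into contigs (connected components of the overlap graph)."""
    parent = {clone: clone for clone in content}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlapping_pairs(content):
        parent[find(a)] = find(b)

    groups = {}
    for clone in content:
        groups.setdefault(find(clone), set()).add(clone)
    return sorted(groups.values(), key=lambda g: sorted(g))


print(build_contigs(STS_CONTENT))
```

Singleton groups such as `BAC_D` mark the places where new screening assays developed from BAC end sequences would be needed to close a gap.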
Sequencing and assembly
Twenty-three BACs of the CT7 library were sequenced by a shotgun approach as follows. Sheared fragments of either 1 or 3 kb (GeneMachines) were subcloned separately into the pTZ18R vector. At least 800 clones were selected from each library; most of the plasmid DNA was prepared following a protocol supplied by Millipore, and one-third of the selected clones were amplified by TempliPhi (384-well format), essentially following the supplier's instructions (Amersham Pharmacia Biotech). Cycle sequencing was routinely performed using a DYEnamic ET terminator cycle sequencing premix kit (Amersham Pharmacia Biotech) and UPO/RPO primers (MWG-BioTech). Most of the samples were separated on Applied Biosystems 377 slab gel sequencers and one-third on MegaBACE capillary sequencers. Data were assembled and edited using the GAP4 program (16).
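The 10-fold average coverage reported in the Results can be related to read numbers with the standard Lander-Waterman model: average depth c = N·L/G, and the expected fraction of bases covered by no read is e^-c. The Python sketch below assumes a 500-bp read length and a 110-kb insert (roughly the average implied by 2.5 Mb of insert in 23 BACs); both are illustrative values, not figures from this study, and the model ignores cloning bias and unresolved repeats such as the simple-repeat stretches mentioned in the Results.

```python
import math


def fold_coverage(n_reads, read_length_bp, target_length_bp):
    """Average depth c = N * L / G for N reads of length L over a target of G bp."""
    return n_reads * read_length_bp / target_length_bp


def reads_needed(coverage, read_length_bp, target_length_bp):
    """Reads required to reach a given average depth, N = c * G / L (rounded up)."""
    return math.ceil(coverage * target_length_bp / read_length_bp)


def fraction_uncovered(coverage):
    """Lander-Waterman (Poisson) estimate of the fraction of bases hit by no read: e^-c."""
    return math.exp(-coverage)


# Assumed values: a 110-kb BAC insert and 500-bp reads.
n = reads_needed(10, 500, 110_000)
print(n, fold_coverage(n, 500, 110_000), fraction_uncovered(10))
```

Under these assumptions, about 2,200 reads per BAC give 10-fold coverage, leaving a negligible fraction of bases (on the order of 1 in 20,000) unsequenced by chance alone.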
Sequence analysis
The VH genes were annotated using a procedure developed for the automatic generation of the VBASE2 database (17). In this procedure, VH genes are detected by a BLAST search (18) of the BAC sequences with known germline VH gene sequences. BLAST hits with a minimum identity of 80% and a minimum alignment length of 200 bp are analyzed with the DNAPLOT program (W. Müller and H. H. Althaus, unpublished data; http://www.dnaplot.de) and matched to VDJ rearrangements from the EMBL/GenBank/DDBJ databases. The DNAPLOT analysis is limited to exon 2 of the VH genes, but includes detection of RSS elements. Exon 1 of each VH gene, as well as the DH, JH, and CH genes, was annotated manually by sequence comparison with BLAST, PipMaker (19), and other alignment programs, with reference to previously published annotations; the EMBL/GenBank/DDBJ accession numbers of the respective reference sequences are given in the feature table of EMBL/GenBank/DDBJ entry AJ851868.
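The first filtering step of this procedure (at least 80% identity over at least 200 bp) can be sketched as follows. The sketch assumes the search was run with BLAST+ and written in its default 12-column tabular format (`-outfmt 6`); the original pipeline may have used a different BLAST version and output format, and the example rows are made up. It covers only the threshold filter, not the DNAPLOT analysis, and does not merge overlapping hits from several germline queries at the same BAC position.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6, default fields).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_vh_hits(tabular_text, min_identity=80.0, min_length=200):
    """Keep hits with at least min_identity % identity over at least min_length bp."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        record = dict(zip(FIELDS, row))
        if float(record["pident"]) >= min_identity and int(record["length"]) >= min_length:
            hits.append(record)
    return hits


# Made-up example rows: germline VH query vs. a BAC subject sequence.
EXAMPLE = "\n".join([
    "germline_VH_1\tBAC_X\t85.00\t290\t40\t2\t1\t290\t10501\t10790\t1e-60\t230",
    "germline_VH_2\tBAC_X\t75.00\t300\t70\t5\t1\t300\t52001\t52300\t1e-20\t110",
    "germline_VH_3\tBAC_X\t95.00\t150\t7\t0\t1\t150\t80001\t80150\t1e-40\t190",
])

for hit in candidate_vh_hits(EXAMPLE):
    print(hit["qseqid"], hit["sstart"], hit["send"])
```

Only the first example row passes: the second fails the identity cutoff and the third the length cutoff.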
Dot plots and percent identity plots (PIPs) were generated with the PipMaker program using the options "search both strands" and sensitivity mode "default", with "show all matches" for dot plots and "chaining" for PIPs. Interspersed repeats were detected using the RepeatMasker web server (A. F. A. Smit, R. Hubley, and P. Green, unpublished data; http://www.repeatmasker.org). Searches were performed against the Repbase mouse dataset with the "cross_match" search engine in slow sensitivity mode. Phylogenetic trees were generated from IMGT alignments created with DNAPLOT (20) and visualized with MEGA version 3.1 (21). Further sequence analysis and formatting were done with the EMBOSS package (22) and Perl scripts.
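PipMaker derives its dot plots from alignments; the simplified Python sketch below instead marks exact k-mer matches on both strands. That is enough to show why duplicated segments appear as parallel diagonals and inverted copies as antidiagonals. The word length k = 8 is an arbitrary choice, not a PipMaker parameter.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dot_plot(seq_x, seq_y, k=8):
    """Return (x, y, strand) for every exact k-mer match between seq_x and seq_y.

    strand "+": same orientation (repeats show up as diagonals);
    strand "-": seq_y carries the reverse complement (inversions show up as antidiagonals).
    """
    x, y = seq_x.upper(), seq_y.upper()
    index = {}
    for j in range(len(y) - k + 1):
        index.setdefault(y[j:j + k], []).append(j)
    dots = []
    for i in range(len(x) - k + 1):
        word = x[i:i + k]
        dots.extend((i, j, "+") for j in index.get(word, []))
        dots.extend((i, j, "-") for j in index.get(revcomp(word), []))
    return dots


# An inverted copy produces an antidiagonal of "-" dots.
print(dot_plot("AAAACCCC", "GGGGTTTT", k=4))
```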
Results
To elucidate the genomic sequence of the murine Igha haplotype, 23 BACs from the Citb (CT7) library of the 129S1 mouse were selected using a physical map of the locus assembled with YAC and BAC end STSs, D12Mit simple sequence length polymorphism sites, and Igh-C, J, D, and V gene segments (Refs. 14 and 15; C. Chevillard and R. Riblet, unpublished observations). The BACs were sequenced to an average 10-fold coverage. The BAC inserts have a combined length of 2.5 Mb and were assembled into a 1.6-Mb sequence. This sequence covers approximately one-half of the Igh locus, including the CH, JH, and DH regions and the JH-proximal part of the VH region (Fig. 1). The assembly is contiguous between BACs 407I12 and 34H6, except for two simple repeat stretches of unknown length, one in 34H6 and one in the overlap of 459E6 and 436C3. The unfinished sequence of BAC 4K8 overlaps 407I12 and extends the assembly toward the JH-distal part of the VH region with 12 single fragments. The order of these fragments is hypothetical, based on sequence comparison with the homologous Ighb region in C57BL/6. The complete assembled sequence and annotation are available from EMBL/GenBank/DDBJ (accession no. AJ851868).
The Igha sequence assembly was screened for VH genes using a BLAST search followed by a V gene analysis method based on the program DNAPLOT (17). This method annotates each V gene from the start of the coding sequence of the mature peptide chain to the noncoding RSS at its 3' end. The results of this procedure were complemented by the annotation of exon 1. VH genes with obvious defects, such as stop codons, defective splice sites, or abnormal RSS elements, were marked as pseudogenes. To further address V gene functionality with a defined in silico approach, the VH genes were matched against 6190 Ig VDJ rearrangements extracted from EMBL/GenBank/DDBJ; VH genes with a 100% match to a rearranged sequence were classified as functional. A physical map of the annotated region is shown in Fig. 1.
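The three-way classification used here can be written as a small decision rule. In this Python sketch the sequences are toy strings, the defect checks are reduced to an in-frame stop codon scan plus two precomputed flags, and the "100% match" is modeled as exact containment of the germline coding sequence within a rearranged sequence. The actual analysis used DNAPLOT alignments; because a rearranged V gene is trimmed at its 3' junction, a real comparison would also exclude the junction-proximal end of the germline sequence.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class VHGene:
    name: str
    coding_seq: str          # in-frame coding sequence (hypothetical input)
    splice_sites_ok: bool = True
    rss_ok: bool = True


def has_in_frame_stop(coding_seq):
    """True if any codon in reading frame 1 is a stop codon."""
    s = coding_seq.upper()
    return any(s[i:i + 3] in STOP_CODONS for i in range(0, len(s) - 2, 3))


def classify(gene, rearranged_seqs):
    """Assign one of the three categories used in the text."""
    if has_in_frame_stop(gene.coding_seq) or not (gene.splice_sites_ok and gene.rss_ok):
        return "pseudogene"
    # Simplification: a 100% match is modeled as exact containment of the
    # germline coding sequence within a rearranged sequence.
    seq = gene.coding_seq.upper()
    if any(seq in r.upper() for r in rearranged_seqs):
        return "functional"
    return "possibly functional"


rearranged = ["GGGCAGGTCCAGCTGCAGTTTTT"]  # toy "rearrangement"
print(classify(VHGene("toy_VH", "CAGGTCCAGCTGCAG"), rearranged))  # functional
```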
In the 3' half of the Igha V region, 128 VH genes were annotated, 49 of which have been found in rearrangements. Of the 79 VH genes without clear evidence of functionality, 63 are obvious pseudogenes; the remaining 16 are designated "possibly functional." To further analyze the potential functionality of these VH genes, we compared each of the 79 possibly functional genes and pseudogenes pairwise with the 49 functional VH genes. The result (summarized in Table I) shows that most pseudogenes carry major sequence changes, although some are very similar to functional genes. Whereas 12 possibly functional VH genes have at least 90% sequence identity to a functional gene, this is also true for 6 pseudogenes. High sequence similarity to a functional gene therefore cannot be taken as evidence for functionality, and the functionality of the 16 possibly functional genes cannot be proven with the available data. However, because somatic mutations in VDJ rearrangements can prevent a 100% match to the germline sequence, it is likely that at least some of the possibly functional genes are indeed functional.
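The comparison summarized in Table I amounts to finding, for each pseudogene or possibly functional gene, its closest functional relative and asking whether the identity reaches 90%. (The counts above are consistent: 49 functional + 63 pseudogenes + 16 possibly functional = 128.) The Python sketch below assumes the sequences are already aligned to equal length, for example in the IMGT numbering produced by DNAPLOT, and ignores gap columns; this is only one of several ways to define percent identity, and the sequences are toy data.

```python
def percent_identity(a, b):
    """Percent identical columns in two pre-aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def closest_functional(query, functional):
    """(name, identity) of the functional gene most similar to the query."""
    return max(((name, percent_identity(query, seq)) for name, seq in functional.items()),
               key=lambda item: item[1])


def count_similar(queries, functional, threshold=90.0):
    """How many query genes reach the threshold identity to some functional gene."""
    return sum(closest_functional(q, functional)[1] >= threshold for q in queries.values())


functional = {"f1": "ACGTACGTAC", "f2": "TTTTTTTTTT"}   # toy aligned sequences
print(closest_functional("ACGTACGTAA", functional))     # ('f1', 90.0)
```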
The JH-proximal VH region of the Igha and Ighb haplotypes: sequence comparison and VH gene allele assignment
The distinct polymorphisms in the murine Ig receptor loci reflect the strong evolutionary dynamics of these loci. The divergent evolution of VH genes has been attributed to diversifying selection and to evolution by a birth-and-death process (24). Inbred strains provide a snapshot of these loci, so that haplotype and even allele assignment is possible. We compared the Igha and Ighb haplotypes with a dot plot of the homologous VH region sequences from 129S1 and C57BL/6 (Fig. 2A). The plot indicates that the overall structure, namely the positions of the discrete VH gene family clusters, is very similar in both haplotypes, confirming previous experimental results based on Southern blot analyses (25). For individual VH gene family clusters, however, the dot plot shows two regions of expansion in the Igha locus: the VH7183/VHQ52 region (previously described in Ref. 26) and the interspersed VHGAM3-8/VH12 families. The VH7183/VHQ52 cluster is well conserved between the two strains at its 3' end, followed by a region that is much larger in Igha than in Ighb. This local expansion results in approximately twice as many functional VH7183/VHQ52 genes in the Igha repertoire as in Ighb (Table II). Similarly, the VHGAM3-8/VH12 region is larger and comprises more VH genes in Igha than in Ighb. A detailed view of the differences between Igha and Ighb is provided by a percent identity plot of the homologous regions (supplemental data; see footnote 7). The plot displays annotations and interspersed repeats in the 129S1 sequence along with the percent identity detected in the C57BL/6 sequence.
In the Igh-D region of 129S1, 15 DH genes are annotated and assigned to one of the sequence families DSP2, DFL16, DST4, or DQ52. In addition, three copies of the truncated DH gene DMB1 (27), designated DMB1–DMB3, were found. The DH gene DSP2.2 is duplicated; the two copies are designated DSP2.2a and DSP2.2b. Two new DH genes, DFL16.3 and DST4.3, were identified, both probably nonfunctional (see below). The dot plot of the DH region (Fig. 2B) shows that the DH genes are arranged in two separate homology blocks. The main block, which includes all DSP2 genes, starts before DFL16.1 and ends after DST4. Upstream of this main DH cluster lies an additional small block consisting of the genes DST4.2, DMB1, and DFL16.3. The DH gene DQ52 constitutes the 3' end of the DH region and is separate from the main DH gene block. Both the main and the small blocks are composed of multiple related repeats of a 3-kb sequence, each containing one DH gene.
The sequences of the DH genes DFL16.3, DST4.2, and DST4.3 (Fig. 3) were checked in silico for potential functionality. A BLAST search of the DH gene sequences was performed against a database composed of the junctional regions of 6190 Ig VDJ rearrangements extracted from EMBL/GenBank/DDBJ. For DFL16.3 and DST4.3, no specifically matching rearrangements were found, so both DH genes are classified as nonfunctional. For DST4.2, six rearrangements were found that share the DST4.2-specific insertion of a cytidine nucleotide at the second position. Interestingly, three of these rearrangements stem from a mouse line in which the main block of DH genes has been deleted (27). Thus, the lack of DH genes from the main block appears to force the use of the otherwise rarely or never used DH genes of the small block.
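The DST4.2 evidence rests on a sequence feature that distinguishes it from related DH genes. The Python sketch below swaps the BLAST search for a simpler exact-match approach: collect k-mers present in the target DH gene but absent from its relatives, then keep the junctions that contain one of them. The sequences are invented toy strings (the "variant" carries an extra C at its second position), not the actual DST4 or DST4.2 sequences. In real junctions, short diagnostic k-mers can also arise by chance from N nucleotides or V and J ends, so hits would still need inspection.

```python
def kmers(seq, k):
    """Set of all substrings of length k."""
    s = seq.upper()
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def diagnostic_kmers(target, related, k=4):
    """k-mers of the target DH gene that occur in none of the related DH genes."""
    background = set().union(*(kmers(r, k) for r in related))
    return kmers(target, k) - background


def supporting_junctions(target, related, junctions, k=4):
    """Junction sequences containing at least one target-specific k-mer."""
    diagnostic = diagnostic_kmers(target, related, k)
    return [j for j in junctions if kmers(j, k) & diagnostic]


# Toy sequences only: "variant" differs from "relative" by an extra C at position 2.
relative = "TTTATGGTAAC"
variant = "TCTTATGGTAAC"
junctions = ["GGGTCTTATGG", "GGGTTTATGG"]
print(sorted(diagnostic_kmers(variant, [relative])))         # ['CTTA', 'TCTT']
print(supporting_junctions(variant, [relative], junctions))  # ['GGGTCTTATGG']
```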
Although the Igh-V regions of different inbred mouse strains can be clearly assigned to a relatively small number of haplotypes, the Igh-D regions are even more polymorphic (28). DH gene usage and the role of the DH region in VDJ rearrangement have been studied intensively (29, 30, 31, 32, 33), yet a complete nucleotide sequence of the BALB/c DH region has never been published; the existing map is based on Southern blot hybridization experiments (34, 35). The Mouse Genome Sequencing Consortium provided a complete sequence of the C57BL/6 DH region, which has recently been annotated (7, 36). Fig. 4 compares the DH regions of 129S1, BALB/c, and C57BL/6. The three maps share a common feature: the main DH gene block is separate from the downstream DH gene DQ52. The small DH gene block is present in 129S1 and C57BL/6, but it is unknown whether it exists in BALB/c. Compared with 129S1 and BALB/c, the main DH gene block of C57BL/6 is smaller and contains fewer genes. The main blocks of 129S1 and BALB/c contain different DSP2 genes. Differences in the size of the main DH blocks may partly reflect the limited resolution of the BALB/c map. Nevertheless, the available data show notable differences between the DH regions of the two Igha haplotype strains 129S1 and BALB/c.
Close to the most downstream DH gene, DQ52, four JH genes are located within a short stretch of 1.5 kb. The adjacent CH region contains eight genes encoding the eight different isotypes of the Ab H chain. Each CH gene consists of a group of three to five exons encoding different domains of the Ab molecule. B cells can switch the isotype of the Ab they produce by class switch recombination, which is mediated by repetitive sequences, the switch (S) regions, located upstream of each isotype exon group. JH genes, CH region exons, and S regions were annotated with reference to previously published annotations of these sequences from BALB/c mice (for references, see EMBL/GenBank/DDBJ AJ851868). In addition to the previously known Cγ pseudogenes (37), one new Cγ pseudogene group was found, consisting of CH2, CH3, and a truncated M1 exon. The new pseudogene group was designated Cγpsi0; it lies in inverted orientation upstream of the Cγ3 group, thereby heading the Cγ cluster.
To depict the genomic structure of the CH region, including internal sequence homologies and the location and size of repetitive sequences, we generated a dot plot of the genomic sequence (Fig. 5). In the dot plot, S regions appear as black boxes adjacent to the I exons. The S region of Cγ1 is noticeably enlarged. Four sequence blocks comprising the Cγ exon groups show strong conservation, starting upstream of the I exons and extending beyond the membrane exons of each group. Parts of this conserved block are inserted in inverted orientation upstream of Cγ3, Cγ2b, and Cγ2a. These findings on the overall CH region structure agree with data published for the CH region of BALB/c (37, 38). A refined view of the relationship between the JH and CH loci of both strains is gained by comparing the coding sequences. The sequences of the four JH genes of 129S1 and BALB/c are identical, except for a silent point mutation (C vs T) in the third valine codon of JH1. Remarkably, the CH coding sequences of 129S1 and BALB/c show several amino acid exchanges in the isotypes IgD, IgG1, IgG2b, IgE, and IgA (Table IV). In contrast, the coding sequences of IgM, IgG3, and IgG2a are identical.
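Deciding whether a nucleotide difference is silent (as in JH1) or an amino acid exchange (as in Table IV) comes down to translating the codons of both alleles. The Python sketch below uses the standard genetic code; the example codons and sequences are illustrative, since the text does not give the actual JH1 codon or the CH sequences, and the comparison assumes both coding sequences are in frame and aligned without gaps.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def amino_acid_differences(cds_a, cds_b):
    """List (codon_number, aa_a, aa_b) for codons that encode different residues.

    Assumes both coding sequences are in frame and aligned without gaps.
    """
    diffs = []
    for pos in range(0, min(len(cds_a), len(cds_b)) - 2, 3):
        aa_a = CODON_TABLE[cds_a[pos:pos + 3].upper()]
        aa_b = CODON_TABLE[cds_b[pos:pos + 3].upper()]
        if aa_a != aa_b:
            diffs.append((pos // 3 + 1, aa_a, aa_b))
    return diffs


# A C/T change at the third position of a valine codon (GTN) is silent:
print(CODON_TABLE["GTC"], CODON_TABLE["GTT"])          # V V
print(amino_acid_differences("GTCGCC", "GTTACC"))      # [(2, 'A', 'T')]
```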
Discussion
For a sensible gene nomenclature, the gene name should convey a reasonable amount of information about the gene. For the mouse Igh locus in particular, numerous nomenclature proposals have been made, each reflecting different aspects of the genes. IMGT (http://imgt.cines.fr/) established a systematic nomenclature by assigning a number to each V gene family and numbering the V genes within each family (39). This nomenclature treats genes independently of their chromosomal position and haplotype, but it accounts for V gene alleles. de Bono (8) applied another rule set to the Ighb-V genes, introducing position-dependent numbering. The nomenclature rules that Johnston et al. (9) and we applied to the genomic sequences of the Ighb and Igha loci, respectively, include both haplotype and positional information, thereby increasing the information content of the name. For the family names, we used the well-established names that trace back to the original myelomas and other cell lines from which the first family members were derived. Maintaining different nomenclatures in parallel clearly offers no benefit. However, until the Mouse Genomic Nomenclature Committee establishes a definitive standard, diverse opinions and the resulting parallel nomenclatures must be accepted. For transparency, we have made available a list of corresponding names for each V gene in the germline V gene database VBASE2 (http://www.vbase2.org).
Heterogeneity within the Igha haplotype
Serological studies using polyclonal antiallotype sera classified 129/Sv and BALB/c together as Igha (40), and in later studies mAbs against BALB/c allotypes (IgM, IgD, and IgG1) bound 129/Sv Ig as well (41, 42). RFLP analyses of the VH region showed a strong correspondence between CH region and VH region haplotypes (4, 43): strains with identical CH region patterns often also exhibit the same VH region patterns. For 129/Sv, comprehensive Southern blot analyses revealed restriction patterns identical to those of the BALB/c VH region, with the exception of the pattern for the VH3909P family at the distal part of the locus (5).
Given this background, the remarkable differences between the CH coding sequences of 129S1 and BALB/c listed in Table IV were unexpected. The DH region sequence of 129S1 shows a mixed haplotype, represented by a DST4-Igha allele and a DQ52-Ighb allele. A detailed comparison with the DH region of BALB/c is limited by the lack of BALB/c genomic sequence. However, the available sequence information reveals obvious differences between the DH regions of BALB/c and 129S1. Taken together, our results point to a distinct heterogeneity within the Igha haplotype that had not been detected in previous experiments and that might also affect the VH region of 129S1, although to a minor degree.
Interspersed repeats in the murine Igh locus
An unusually high content of interspersed repeats has repeatedly been noted for the murine Igh and Igk loci (14, 15, 44). In the elucidated VH region of 129S1, RepeatMasker analysis shows that interspersed repeats occupy 54% of the sequence: the LINE-1 (L1) content is 36%, whereas the SINE content is <2%. This deviates considerably from the mouse genome average of 19% L1 and 8% SINE content (6). The distribution of repetitive elements in the VH region of 129S1 is very similar to that reported for the VH region of C57BL/6 (9). This unusually high density of LINE elements has several interesting parallels: a high density has been noted around other monoallelically expressed genes (45), and the X chromosome also has a high LINE density. Lyon (46) has proposed that this L1 density is a factor in the heterochromatization of the inactive X. To visualize interspersed repeats in relation to the V, D, J, and C gene positions, we generated a percent identity plot (PIP) of the Igha vs Ighb sequence with the PipMaker program (supplemental data; see footnote 7). The PIP shows not only differences between the haplotypes but also the position and class of interspersed repeats in the 129S1 sequence, and can therefore serve as a high-resolution map of the region. It graphically displays the relatively uniform distribution of L1 elements throughout the VH region and their rarity in the CH region; also apparent are insertions of L1 elements in 129S1 that are absent in C57BL/6, consistent with the continuing evolution of this complex locus.
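Repeat percentages such as 54% interspersed repeats or 36% L1 are the fraction of bases covered by annotated elements, with overlapping annotations counted once. (For scale, 36% L1 is about 1.9 times the genome average of 19%.) The Python sketch below assumes the RepeatMasker output has already been parsed into (class, start, end) tuples with 0-based, half-open coordinates; RepeatMasker's own .out file uses 1-based inclusive coordinates, so a real parser would convert them. The example annotation is made up.

```python
def merged_length(intervals):
    """Total bases covered by half-open intervals [start, end), counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)      # extend the current run
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def repeat_content(annotations, seq_length):
    """Percent of the sequence covered per repeat class, plus all classes combined."""
    by_class = {}
    for cls, start, end in annotations:
        by_class.setdefault(cls, []).append((start, end))
    result = {cls: 100.0 * merged_length(iv) / seq_length for cls, iv in by_class.items()}
    result["all"] = 100.0 * merged_length([(s, e) for _, s, e in annotations]) / seq_length
    return result


TOY_ANNOTATION = [("LINE/L1", 0, 300), ("LINE/L1", 200, 400), ("SINE/B1", 500, 520)]
print(repeat_content(TOY_ANNOTATION, 1000))  # {'LINE/L1': 40.0, 'SINE/B1': 2.0, 'all': 42.0}
```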
Expansion of functional and nonfunctional VH genes by block duplications
Duplications of both large and small sequence blocks are a common phenomenon in the Ig loci (9, 47, 48, 49) and are assumed to be an essential force in the generation of multiple gene copies in these loci. Ota and Nei (24) explain the maintenance of diversity within the large VH gene family by selective forces favoring diversification of duplicated genes. To explain the high number of pseudogenes within the Ig loci, Kawasaki et al. (48) suggested the coamplification and fixation of pseudogenes along with adjacent functional V genes. We tested this hypothesis on the VH7183/VHQ52 region of the murine Igh-V locus, which is expanded in the 129S1 strain compared with C57BL/6 (Fig. 2A). A phylogenetic tree of the Igha VH7183 family (Fig. 6A) shows a clear separation of nonfunctional and functional VH7183 sequences, indicating that the nonfunctional sequences have evolved independently of the functional ones; with the exception of VH7183.a1psi.1 and VH7183.a43psi.70, no pseudogenes are related to the functional sequences. The physical map of the VH7183 region shows repeated patterns in which a certain functional gene lies close to a pseudogene at a conserved distance and in a conserved order (Fig. 6B). The first and most obvious example of such a block consists of a VH relic sequence next to a functional VHQ52 gene. This block, indicated by an uncolored box in Fig. 6B, appears four times within the VH7183/VHQ52 region. In addition, we find other blocks in which pseudogenes of the VH7183 family lie next to functional genes of the VHQ52 or VH7183 family, indicated by colored boxes in Fig. 6B. When these blocks are superimposed on the phylogenetic tree in Fig. 6A, the clustering of the pseudogenes within the tree becomes clearly visible.
From our analysis, we conclude that functional VH genes are duplicated as large stretches of DNA that contain more flanking sequence than is needed to encode a functional V gene, and that pseudogenes are expanded as well through this "blockwise" duplication. In Fig. 6A, we further show the relationship to the corresponding alleles of the C57BL/6 locus, indicating that the phylogenetic separation between functional and nonfunctional VH genes also holds for the C57BL/6 strain. When we analyzed the order of functional and nonfunctional sequences within the entire Igha region, we found that pairs of functional VH genes lying next to each other are underrepresented (data not shown): a random distribution would have produced a higher fraction of neighboring functional VH genes. This finding supports the idea of blockwise expansion of large DNA segments containing at least one functional VH gene. We cannot rule out that the adjacent pseudogenes comprise functional regulatory elements that are used, for example, for opening the locus during B lymphocyte development or later by the functional VH genes, and that are thereby positively selected. However, the latter possibility seems unlikely, because there are examples of pseudogenes located both upstream and downstream of a functional gene within the indicated homology blocks. In addition, the fact that all VH genes have the same orientation, irrespective of whether they are functional, nonfunctional, or relics, points to a directed mechanism in the evolution of this locus. Currently, we have no conclusive explanation for the driving forces underlying the evolution of VH genes. Given the complexity of the genealogy of mouse inbred strains (50), we cannot state when and how the modifications of the VH genes happened. Either they occurred during the generation of inbred strains within the last 100 years, or these differences represent allelic variants of wild mice, selected over a time span of 100,000 years or more, that were fixed during the generation of inbred lines. Because BALB/c and 129S1 mice have similar Igh haplotypes although they are distantly related in the genealogical tree of mouse inbred strains, we favor the latter possibility.
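The underrepresentation of neighboring functional VH genes can be made quantitative by comparing the observed number of adjacent functional pairs with its distribution under random ordering. The Python sketch below is one way to do this and is an assumption, not the (unreported) procedure behind the data-not-shown result: genes are reduced to a string of "F" (functional) and "P" (other) labels in chromosomal order, and a one-sided permutation test asks how often a random shuffle yields as few or fewer functional neighbors. For orientation, if the 128 VH genes annotated in the 3' half of the V region, 49 of them functional, were ordered at random, about k(k-1)/n = 49·48/128 ≈ 18.4 adjacent functional pairs would be expected.

```python
import random


def adjacent_functional_pairs(order):
    """Number of neighbouring positions that are both functional ('F')."""
    return sum(1 for a, b in zip(order, order[1:]) if a == "F" and b == "F")


def expected_pairs_random(n, k):
    """Expected F-F neighbours when k of n genes are functional and order is random.

    (n - 1) adjacent slots, each with probability k(k-1) / (n(n-1)) -> k(k-1)/n.
    """
    return k * (k - 1) / n


def permutation_p_value(order, n_perm=10_000, seed=0):
    """One-sided P value: fraction of shuffles with <= the observed F-F neighbours."""
    rng = random.Random(seed)
    observed = adjacent_functional_pairs(order)
    labels = list(order)
    at_most_observed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if adjacent_functional_pairs(labels) <= observed:
            at_most_observed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_most_observed + 1) / (n_perm + 1)


print(expected_pairs_random(128, 49))  # 18.375
# Hypothetical strictly alternating order: no functional neighbours at all.
print(permutation_p_value("FP" * 20, n_perm=2000))
```

Treating the region as a simple sequence of labels ignores physical distances between genes; a finer test could, for instance, count only neighbors within a fixed distance.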
Footnotes
1 This research was supported by the German Bundesministerium für Bildung und Forschung through Grant PT DLR (FKZ 01KW0003) and the Bioinformatics Competence Centre "Intergenomics" Grant 031U110A/031U210A, and by National Institutes of Health Grant R01 AI23548.
2 I.R. and C.C. are co-first authors.
3 Current address: Institut National de la Santé et de la Recherche Médicale, Immunology and Genetics of Parasitic Diseases, Marseille, France and Laboratory of Parasitology-Mycology, Faculté de Medecine, Université de la Méditerranée, Marseille, France.
4 H.B., W.M., and R.R. are co-last authors.
5 Address correspondence and reprint requests to Prof. Werner Müller at his current address: University of Manchester, Bill Ford Chair of Cellular Immunology, Michael Smith Building, Oxford Road, Manchester, U.K.
6 Abbreviations used in this paper: RSS, recombination signal sequence; BAC, bacterial artificial chromosome; STS, sequence-tagged site; PIP, percent identity plot; YAC, yeast artificial chromosome.
7 The online version of this article contains supplemental material.
Received for publication February 12, 2007. Accepted for publication May 29, 2007.
References
gene loci of murine immunoglobulin heavy chains. Genomics 41: 100-104.
(IGK) genes. Exp. Clin. Immunogenet. 18: 255-279.
gene sequence. Immunogenetics 56: 490-505.
genes and pseudogenes during evolution. J. Exp. Zool. 288: 120-134.
locus and the germline repertoire of the V genes. Eur. J. Immunol. 31: 1017-1028.