The mouse Ig H chain (Igh) complex locus is composed of >100 gene segments encoding the variable, diversity, joining, and constant portions of the Ab H chain protein. To advance the characterization of this locus and to identify all the VH genes, we have isolated the entire region from C57BL/6 and C57BL/10 as a yeast artificial chromosome contig. The mouse Igh locus extends approximately three megabases and contains at least 134 VH genes classified in 15 partially interspersed families. Two non-Igh pseudogenes (Odc-rs8 and Rpl32-rs14) were localized in the distal part of the locus. This physical yeast artificial chromosome map will provide important structure and guidance for the sequencing of this large, complex, and highly repetitive locus.
The Ig H chain locus, termed Igh in mice, encodes the H chains of Abs. It comprises adjacent clusters of gene segments for VH, DH, JH, and the different isotype H chain constant regions (CH). To produce a secreted or cell surface receptor Ab, a B cell must assemble an active V region gene by fusing a segment from each of the V, D, and J clusters. The overall nature of the structure and functioning of these genes is well established, but much remains to be explained. If we are to thoroughly understand, and be able to predict and manipulate, the functioning of the Ab response, we must obtain a complete description of the structural loci, both coding elements and regulatory sequences of Ab H and L chains. In the human IGH locus, this goal is nearly complete (1), but for experimental manipulation these loci must be characterized in the mouse. Recently, the mouse κ L chain locus was extensively described by Zachau and colleagues (2). All Igk constant, joining, and variable coding elements have been identified and sequenced. For the mouse H chain locus, the constant and joining gene segments were mapped and sequenced by pioneering work of Honjo and coworkers (3), and the diversity segments have been identified, mapped, and sequenced (4, 5, 6). The V region gene segments, Igh-V or VH, have posed a larger challenge; estimates of their numbers range from hundreds to thousands (7, 8, 9, 10). Genetic mapping experiments indicated that the locus is a centimorgan or larger (11, 12, 13), suggesting a sequence size of several million base pairs of DNA, consistent with a gene content of hundreds to thousands of coding elements. Detailed studies of the organization of VH gene families by Brodeur and colleagues (14) have provided a better understanding, but much remains to be elucidated.
We have undertaken several approaches to complete the characterization of mouse Igh; these will ultimately lead to determination of the DNA sequence of the entire locus. In this study, we describe the assembly of a yeast artificial chromosome (YAC)3 contig, or array of overlapping clones, that spans the Igh locus in the C57BL mouse strains, and the initial characterization of its size, physical structure, and gene content. To develop this YAC contig and resultant physical map of the mouse Igh locus, we screened four YAC libraries by PCR using multiple sequence-tagged sites (STSs) within the locus and identified 36 YACs. Several additional YACs were obtained from a fifth library through reference to Internet-posted data (15). YAC insert ends were isolated from all clones by vector-hexamer PCR (16) to characterize YAC overlap and assemble the contig, and to detect chimeric clones. This physical YAC map will provide important structure and guidance for the sequencing of this large, complex, and highly repetitive locus.
Materials and Methods
The Princeton University mouse YAC library (Princeton, NJ) was prepared from C57BL/6J female mouse genomic DNA (17, 18). It consists of 26,000 individual pYAC4 vector clones in yeast host strain AB1380. The average clone length is 250 kb and the total estimated genome coverage is 2.2 haploid genomic equivalents. This library was screened by PCR using DNA pools from Dr. S. M. Tilghman at the Howard Hughes Medical Institute, Princeton University.
The first Whitehead Institute (WI-I) mouse YAC library (Cambridge, MA) was prepared from C57BL/6J female mouse DNA (19). It consists of two groups of pYAC4 vector clones in yeast host AB1380. The first group contains 4,100 clones with an average size of 480 kb, and the second contains 15,840 clones with an average size of 640 kb. The total estimated coverage for the whole library is 4.3 haploid genomic equivalents.
The second Whitehead Institute (WI-820) mouse YAC library was made from female C57BL/6J mouse DNA (20). It contains 38,400 clones with an average size of 820 kb, providing 10-fold coverage. All clones from this library are in the pRML vector in yeast host strain J57D. Extensive characterization of this library and identification of YACs bearing over 600 genetic loci is published (15). The Whitehead Institute libraries can be screened by pool PCR or membrane hybridization, and clones can be obtained from Research Genetics (Huntsville, AL).
The Saint Mary’s Hospital Medical School (London, U.K.) RAD52 mouse YAC library was prepared from C57BL/10 female mouse DNA (21). It contains 41,568 pYAC4 vector clones in the RAD52 mutant yeast host strain 3a with an average insert size of 240 kb, providing 3.5 genome equivalents. Clones from this library are available from the Mouse Genome Center (Harwell, U.K.; contact Dr. P. Denny: firstname.lastname@example.org. See Web server at http://www.mgc.har.mrc.ac.uk/).
The Imperial Cancer Research Fund mouse YAC library (London, U.K.) was prepared from C3H male mouse DNA (22). It contains 15,000 pYAC4 vector clones in yeast host AB1380 with an average insert size of 700 kb. It covers three haploid genomic equivalents and can be screened by high-density filter colony hybridization on membranes from the Resource Center/Primary Database (Berlin, Germany) of the German Human Genome Project. (See the information server of the Resource Center/Primary Database at http://www.rzpd.de).
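The coverage figures quoted for the five libraries can be roughly cross-checked as clone count times mean insert size divided by haploid genome size. The sketch below is only an illustration: the genome size of about 3,000 Mb is an assumption that the text does not state, and the function and variable names are hypothetical. Under that assumption the results approximate the quoted values but do not reproduce all of them exactly.

```python
# Rough cross-check of library coverage: clones x mean insert size / genome size.
# GENOME_KB is an ASSUMPTION (about 3,000 Mb haploid); the text does not state it.
GENOME_KB = 3_000_000

# (clone count, mean insert size in kb) groups taken from the library descriptions.
LIBRARIES = {
    "Princeton": [(26_000, 250)],
    "WI-I": [(4_100, 480), (15_840, 640)],
    "WI-820": [(38_400, 820)],
    "RAD52": [(41_568, 240)],
    "ICRF": [(15_000, 700)],
}


def coverage(groups, genome_kb=GENOME_KB):
    """Return haploid genome equivalents for a list of (clones, mean_kb) groups."""
    total_kb = sum(clones * mean_kb for clones, mean_kb in groups)
    return total_kb / genome_kb


if __name__ == "__main__":
    for name, groups in LIBRARIES.items():
        print(f"{name}: {coverage(groups):.1f} genome equivalents")
```

Changing `GENOME_KB` shifts every estimate proportionally, so small differences from the quoted figures may simply reflect a different genome size used by the library builders.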
YAC library screening
Four YAC libraries were screened by PCR according to the protocols accompanying the DNA pools. Screening pools for the Princeton University and Whitehead Institute (WI-I) YAC libraries were obtained from Dr. S. M. Tilghman. The pools for RAD52 and the Imperial Cancer Research Fund libraries rearrayed as the “3D” library were obtained initially from Dr. S. D. M. Brown at Saint Mary’s Hospital Medical School and later from Dr. E. Brundage at the Baylor College of Medicine (Houston, TX). Identified clones were obtained from these same sources.
A fifth library (Whitehead Institute (WI-820); Ref. 20) was produced as the vehicle for establishing a YAC physical map of the entire mouse genome; it was screened with many MIT simple-sequence repeat markers on all chromosomes (these data are available at http://www-genome.wi.mit.edu; Ref. 15). Three markers used in this screening, D12Mit263, D12Mit134, and D12Mit150, are localized in the Igh locus, and the identified clones were purchased from Research Genetics.
STS content analysis of YAC clones
Clones identified in the screening were confirmed after selecting single yeast colonies. YAC DNAs were prepared as described (23). PCR primer sequences and the PCR conditions used are indicated in Table I (10, 14, 24, 25, 26, 27, 28, 29). The PCR products were separated by electrophoresis on 10% acrylamide gels and visualized by ethidium bromide staining.
Pulsed field gel electrophoresis (PFGE) and Southern blot
YAC plugs were prepared as described (23). YAC clone chromosomes were separated on 1% Seakem LE agarose gels (FMC, Rockland, ME) using a contour-clamped homogeneous electric field apparatus (CHEF-DRIII; Bio-Rad, Richmond, CA), visualized by ethidium bromide staining, and, when needed, blotted and probed with labeled total mouse DNA. For YAC content analysis, YAC DNAs were digested with EcoRI (Stratagene, La Jolla, CA), electrophoresed on Seakem LE agarose gels (FMC), blotted onto Hybond-N+ membranes (Amersham, Arlington Heights, IL), and probed according to Sambrook et al. (30). DNA probes were labeled by random priming with the Prime-it II kit (Stratagene). Probes included plasmid-cloned VH, DH, and CH gene segments and YAC end PCR products.
Isolation of YAC insert ends, sequencing, and generation of STSs and probes
Both ends of each YAC were isolated by the vector-hexamer technique (16). This method consists of the PCR amplification of YAC insert ends using a nested series of vector primers in conjunction with a hexamer primer that anneals to a short arbitrary sequence randomly located in the mouse insert. Insert end PCR products were gel purified and sequenced with the ABI PRISM dye terminator cycle sequencing kit and ABI373 automated sequencer (PerkinElmer, Foster City, CA). For many ends, several products made with different hexamer primers were sequenced. Sequences were analyzed using the basic local alignment search tool (BLAST) e-mail and Web servers (at http://www.ncbi.nlm.nih.gov/BLAST; Ref. 31). The YAC end sequences were deposited in GenBank with accession numbers B07512 to B07602. PCR assays for STS markers were designed with PRIMER 0.5 (32). YAC insert ends were prepared for hybridization probes by reamplifying third-round gel purified ends with the innermost nested primer to reduce the content of the vector arm in the PCR product. Amplification products were electrophoresed on acrylamide gels and the correct size bands were isolated (16). Depending upon the brightness of the band, 1–10 microliters of gel eluate were labeled for use as probes.
Chromosome 12 synteny analysis
When YAC insert ends did not hybridize to other YACs in the contig they were tested on DNAs from somatic cell hybrid lines to determine whether they mapped to chromosome 12. Cell lines Mae28 and Mae32 were kindly provided by Dr. P. D’Eustachio. These are Chinese hamster cell lines containing mouse chromosomes X and 12 and X and 16, respectively.
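The logic of this two-line hybrid panel can be written out as a small sketch. It assumes that a probe gives a clean positive or negative signal on each line and does not cross-hybridize to hamster DNA; the function name and data layout are hypothetical and are not part of the original methods.

```python
# Mouse chromosome content of the two hybrid lines, as described in the text.
HYBRID_CHROMOSOMES = {
    "Mae28": {"X", "12"},
    "Mae32": {"X", "16"},
}


def candidate_chromosomes(signals):
    """Return the panel chromosomes consistent with a probe's hybridization pattern.

    signals maps a cell line name to True (probe hybridizes) or False.
    A chromosome is consistent if it is present in every positive line
    and absent from every negative line.
    """
    panel = set().union(*HYBRID_CHROMOSOMES.values())
    consistent = set()
    for chrom in panel:
        if all(
            (chrom in HYBRID_CHROMOSOMES[line]) == positive
            for line, positive in signals.items()
        ):
            consistent.add(chrom)
    return consistent


if __name__ == "__main__":
    # Positive on Mae28 only: consistent with chromosome 12.
    print(candidate_chromosomes({"Mae28": True, "Mae32": False}))
```

An end that is negative on both lines returns an empty set, meaning only that it maps to none of the three mouse chromosomes carried by this panel.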
YAC library screening
To assemble a YAC contig and establish a physical map of the mouse Igh locus with as much redundancy as possible, we screened all four available YAC libraries and isolated 36 clones. The libraries were screened by PCR for a series of STSs representing C region, D region, and various V region sites: Ch α exon 3, DhFL16.1, VH group III, VH11, VHEse26.1, VH group II, VHG8 (Table I). Certain of the VH PCR assays (VH group II, VH group III) were designed to detect multiple VH genes for efficient screening; although unintended, other assays performed similarly, e.g., VHEse26.1. Two additional screenings (with yADGC9 left end and yFCDB1 right end) were done later to close a gap in the YAC contig. More recently an additional library, prescreened for many “D_Mit_” microsatellite markers, has become available (15, 20). Two YAC clones bearing the D12Mit134 marker (y138G1 and y139H4) were added to this study.
The YAC clones were analyzed with all PCR markers localized in the Igh locus. These include five microsatellites (D12Mit41, D12Mit134, D12Mit150, D12Mit208, D12Mit263) genetically mapped in the region (33). The sequences of the primers, the PCR conditions, and the reference sequences for each marker are described in Table I. The results of this characterization by PCR are indicated in Fig. 1. Most YACs are positive for several loci, and the overlapping of these clones is obvious, although it is not possible to construct a completely consistent marker order from these data. Some of the VH assays do not detect a single-copy locus; rather, they amplify several or many sites that may be widely spread throughout the locus. In addition, some YACs have internal deletions; for example, in the C region locus (the structure of which was completely determined with phage clones; Ref. 3), y36D5 contains IgA and IgG1 constant region sequences but not the intervening D12Mit41, a microsatellite in the 3′ flank of the IgG2b gene.
YAC sizing and end clone characterization
To measure the length of the physical map, we determined the size of each YAC by PFGE. The sizes ranged between 100 and 800 kb. Five clones contain two different artificial chromosomes (yA8D10, yC79B11, yC110F7, yC174B4, yFEEH5), but blotting the PFGE gel and probing with an Igh region probe identified the chromosome 12 YAC in each case.
For this study, we developed a new generalized strategy, called vector-hexamer PCR, for the isolation of insert ends from YACs (16). YAC insert ends were isolated from all 38 clones to characterize YAC overlap and to detect chimeric clones. Eighty-one ends were recovered from these 38 YACs. More than two ends were isolated from some of the clones which contained more than one YAC. End sequences were compared with the GenBank database with BLAST (31). A supplementary Table with GenBank accessions for the end sequences, BLAST results, YAC sizes, and end probe hybridization results is at http://www.tpims.org/research/riblet-genetics.html.
Repetitive sequences were identified in 45% of ends, and most of these, 40% of the total, belonged to the LINE1, or L1, family of high-copy number repetitive sequences. Of the 81 ends, 11 were similar to known Igh sequences, usually sequences flanking coding regions. With a few exceptions, these matches were only 70–90% identical to GenBank entries; these likely represent diverged duplications that are not yet present in GenBank. These sequence identifications were consistent with Southern blot and PCR characterization of the content of these YACs, and they served to anchor the developing YAC contig and physical map to the established genetic and deletion maps. Known genes other than Igh or repeat elements were identified by five YAC end sequences, but they appear to be the result of YAC chimerism or other artifact. For example, when yFDXB10 left (dopamine transporter), yF3A5 right (zinc finger) and left (potassium channel), and yC187G2 left (γ-aminobutyric acid receptor) ends were used as probes on panels of Igh YACs, they only hybridized to the YACs from which they were isolated, indicating chimerism. The first 129-bp portion of the yFDXB10 right end sequence is unique and hybridizes to other YACs in the contig; however, the remainder of this sequence matches the Escherichia coli mdoGH gene (accession number X64197).
Most of the PCR insert ends (53/81 = 65%) were suitable for hybridization probes on YAC blots, including some of the more highly diverged L1 ends. Forty ends hybridized to at least one EcoRI fragment shared with other Igh YACs and were used to construct the contig. The majority of these cross-hybridized to multiple fragments on Igh YACs, reflecting the duplicated nature of this locus. The other 13 probes hybridized only to the YAC from which the probe was generated, indicating that they are not present in the Igh locus. Two of these probes were from clones containing more than one YAC. Thus 11 of 38 YACs are judged chimeric.
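The counts in this paragraph can be tallied explicitly. The sketch below uses only the numbers given in the text; it assumes that each remaining self-only end comes from a different YAC, which is how the figure of 11 chimeric clones appears to be reached, and the function name is hypothetical.

```python
def tally_end_probes(
    ends_recovered=81,
    ends_usable=53,
    ends_shared=40,
    self_only_from_multi_yac_clones=2,
    yacs_total=38,
):
    """Recompute the end-probe and chimerism counts reported in the text."""
    # Usable probes that hybridized only to their own YAC.
    ends_self_only = ends_usable - ends_shared
    # ASSUMPTION: each remaining self-only end marks one distinct chimeric YAC.
    chimeric_yacs = ends_self_only - self_only_from_multi_yac_clones
    return {
        "usable_percent": round(100 * ends_usable / ends_recovered),
        "ends_self_only": ends_self_only,
        "chimeric_yacs": chimeric_yacs,
        "chimeric_percent": round(100 * chimeric_yacs / yacs_total),
    }


if __name__ == "__main__":
    print(tally_end_probes())
```

On these numbers roughly 29% of the YACs are judged chimeric, a fraction that follows directly from the 11 of 38 stated above.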
VH gene content of YACs
The content of VH genes in each YAC was determined by Southern blot using probes for each of the 15 VH gene families (14).4 Most of these families are relatively small, containing 1–10 genes and a like number of hybridizing EcoRI fragments on Southern blots. These were easily identified on Southern blots of digested YAC DNAs, and the content of small VH gene families in each YAC is shown in Fig. 2. It is evident that there is considerable variation in size and content of the YACs, and that one YAC, ADGC9, carries nearly all of these VH genes except the distally mapping VH15, VH3609P, and VHJ558 families. The VHJ558 family (the largest family) was more difficult to analyze. Seventy-seven EcoRI fragments were clearly mapped on the different YACs (Fig. 3), but this is unlikely to be a complete analysis due to problems with resolving small differences in mobility, potential deletions in some or all YACs, and judging weak cross-hybridizations on these YAC clone blots.
The YAC contig of Igh
The YAC clones were arranged in overlapping order using the patterns of hybridization obtained when YAC panel blots were probed with each useful insert end. This is represented in the lower portion of Fig. 4. Note that the map is presented in chromosomal orientation, centromere to the left, telomere to the right, rather than the transcriptional V→D→J→C orientation usually seen (14). Note also that only three of the YACs are derived from C3H and none are crucial to the construction of the contig. The overlaps yielded a continuous path of YACs from a point midway in the constant region gene cluster at the left of the figure through all known VH gene families. All hybridizing bands visible on genomic Southern blots of C57BL/6 DNA are present in this YAC contig, indicating that we have recovered the entire Igh locus. We estimate that the overall length of the mouse Igh locus is at least 2.5 million base pairs, as follows: the 3′ portion of the C region gene cluster containing the 3′ enhancer and IgA, IgE, and IgG2a CH genes comprises ∼100 kb and is proximal (3′, left in Fig. 4) to YAC yD24C4, 380 kb. yD24C4 overlaps yADGC9, 765 kb, and their combined length is ∼1100 kb. At the 5′ end of the locus is yFFQG5, which is 540 kb and contains the most distal VHJ558 genes, lacking only the most terminal one. Between yADGC9 and yFFQG5, but not overlapping them, is the large YAC y138G1. Its length was not determined, but it is from the Whitehead Institute (WI-820) library of large YACs, and its VH gene content and its overlaps with other YACs indicate a length of ∼1 Mb. Thus, we conclude that Igh spans 2.5–3 Mb.
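The length estimate in the preceding paragraph amounts to summing four segments. The sketch below repeats that arithmetic with the values given in the text; the segment labels and function names are hypothetical, the y138G1 length is the inferred value rather than a measurement, and any DNA lying between the non-overlapping YACs is not counted.

```python
# Segment lengths in kb, as quoted in the text.
SEGMENTS_KB = [
    ("proximal C region (3' enhancer, IgA, IgE, IgG2a)", 100),
    ("yD24C4 + yADGC9, combined", 1100),
    ("y138G1 (inferred, not measured)", 1000),
    ("yFFQG5", 540),
]


def total_length_kb(segments):
    """Sum the segment lengths to give a minimum span for the locus."""
    return sum(length_kb for _, length_kb in segments)


def implied_overlap_kb(length_a_kb, length_b_kb, combined_kb):
    """Overlap implied when two YACs of known size have a known combined length."""
    return length_a_kb + length_b_kb - combined_kb


if __name__ == "__main__":
    print(total_length_kb(SEGMENTS_KB), "kb summed over the four segments")
    print(implied_overlap_kb(380, 765, 1100), "kb implied yD24C4/yADGC9 overlap")
```

The sum, 2,740 kb, falls inside the 2.5–3 Mb range stated above; the spread of that range reflects the uncertainty in the y138G1 length and in the uncounted intervals.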
The pattern of overlaps of YACs generated a series of sequential segments or bins along the contig ranging in size from 20 to 200 kb or more. Each PCR marker or hybridizing Southern blot band was assigned to one of these bins, producing a detailed physical map of the entire locus. The bins are indicated in the upper portion of Fig. 4, where we have summarized the blot hybridization results from Fig. 2, listing the number of hybridizing bands from each VH gene family that reside in each bin. This physical dissection of the VH gene array provides a more accurate count of the number of hybridizing restriction fragments and minimum estimate of the number of VH genes (and pseudogenes). Hybridization of YAC panel Southern blots with probes from all 15 VH gene families reveals 134 bands. This is likely to be an underestimate of the number of VH genes due to multiple genes on some restriction fragments and multiple unresolved fragments appearing as single bands.
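One simple way to picture the binning described above is to group markers by the exact set of YACs on which they are detected, so that markers sharing the same presence/absence pattern fall into the same bin. The sketch below is a minimal illustration with invented YAC and marker names; it is not the authors' procedure, and it ignores the internal deletions and multi-copy probes that complicate real data.

```python
from collections import defaultdict


def assign_bins(marker_hits):
    """Group markers by the exact set of YACs on which each is detected.

    marker_hits maps a marker name to the set of YAC names that carry it.
    Returns a dict from frozenset-of-YACs (the bin signature) to a sorted
    list of the markers sharing that signature.
    """
    bins = defaultdict(list)
    for marker, yacs in marker_hits.items():
        bins[frozenset(yacs)].append(marker)
    return {signature: sorted(markers) for signature, markers in bins.items()}


# HYPOTHETICAL data: three overlapping YACs (A-B-C) and four markers.
EXAMPLE_HITS = {
    "m1": {"yacA"},
    "m2": {"yacA", "yacB"},
    "m3": {"yacA", "yacB"},
    "m4": {"yacB", "yacC"},
}

if __name__ == "__main__":
    for signature, markers in assign_bins(EXAMPLE_HITS).items():
        print(sorted(signature), markers)
```

In this toy case m2 and m3 share a bin because both lie in the region where the first two YACs overlap; counting the bands per VH family in each such bin gives a summary of the kind described in the text.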
This physical map is consistent with the detailed deletion map (14); the VH7183 and VHQ52 families are nearest the DH segments and are thoroughly interspersed. Following the first two VH families are the VHS107 V1 and V3 genes and then a region of complex interspersion of the remaining small VH families. The VHJ558 family begins in this region, as well, and extends to the distal end of the locus. The entire distal half or more of the VH gene array is occupied by the VHJ558 gene family. Also in the distal region are the VH3609P genes interspersed among the VHJ558 genes. The VHJ558 family is shown extending proximally (leftward) approaching the VH7183/VHQ52 region. The identities of these most proximal YAC bands that hybridize with a J558 probe are not known; because the effective stringency of these YAC clone blots is reduced they may, in fact, be members of a related family, e.g., VHSM7. The more stringent genomic blots in the deletion mapping studies (14) did not suggest any VHJ558 genes in this region. There are disagreements in gene placements between this map and the deletion map; for instance, we place a VH11 gene immediately distal (5′) to a bin occupied by the VHS107 V1/V3 pair and a VHJ558 or VHSM7 gene, while the deletion map inserts a VHX24 gene in this interval. This could reflect haplotype differences, i.e., the YAC map is of C57BL/6 and C57BL/10 (the 3 C3H YACs in our study do not contribute to the binning structure) while the deletion map is a composite of C57BL/6 and BALB/c, and in the region just discussed it is dominated by BALB/c information. Overall, this physical map confirms and provides additional detail and physical scale to the previous deletion and genetic maps of Igh.
Five D12Mit markers (33) that map in the Igh locus are clearly localized on the contig. D12Mit41 marks the IgG2b gene and is present on YACs that hybridize with CH region probes. D12Mit263 maps in the interval between the last DH segment, DFL16.1, and the first VH gene, VH81X. D12Mit208 marks the region of interspersed small VH families. D12Mit134 marks the proximal (3′) part of the VHJ558 region, and D12Mit150 marks the end of the VH array. We localized on the contig several additional genetic markers previously mapped in or near Igh. D12N1 is an anonymous DNA segment identified by an aberrant H chain rearrangement (34); it is located near D12Mit208 in the bin containing all VHJ606 and VH3609N genes. Two pseudogenes, Odc-rs8 and Rpl32-rs14, map at the end of the VH array just beyond D12Mit150.
We have screened four YAC libraries representing ∼12-fold coverage of the mouse genome and have identified and analyzed 38 YACs bearing portions of the Igh locus. We isolated the ends of each YAC insert and used these to identify overlaps among the YACs and assemble a continuous path of YAC clones through Igh. Mouse Igh is large, 2.5–3 Mb, and contains a minimum of 134 VH gene (or pseudogene) segments. This contrasts with the recently sequenced human IGH locus (1), which is more compact, only 1 Mb, but contains nearly as many VH segments, independently counted as 95 (35) and 123 (1). At least half of the human VH segments are pseudogenes. The mouse Ig κ L chain locus, Igk, is also large, 3.5 Mb, and contains ∼140 Vk segments, of which as many as two-thirds are functional (2). The total complement of mouse VH genes and its proportion of expressed vs pseudogenes remain to be determined by more detailed physical characterization and sequencing. Relatively little analysis of VH genes has been done in the C57BL/6 or other Ighb mouse strains relevant to this work, with the notable exceptions of the VHJ558 family in C.B-20 (9), the VHS107 family in B10.P (36), and the VH10 (37) and VH7183 families (38) in C57BL/6. A thorough analysis of germline and expressed VHJ558 genes identified 67 candidates considered to be expressible germline genes (9). This estimate is surprisingly close to our band count of 77 and suggests that this gene family may have relatively few pseudogenes. All four germline VHS107 genes were cloned and sequenced from the closely related B10.P strain (36); one of these, V3, was found to be a pseudogene in B10.P as it is in BALB/c. We were unable to consistently detect the V13 gene in our analysis and have omitted it. Langdon et al. (38) isolated 13–15 germline VH7183 genes by PCR from C57BL/6 liver DNA; we count only nine hybridizing fragments. If the PCR experiment is correct, then either some restriction fragments carry several VH7183 genes or some hybridization bands conceal multiple similar-sized fragments.
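The contrast in compactness among the three loci can be made concrete as V segments per megabase. The sketch below uses only the counts and sizes quoted in this paragraph; the names are hypothetical, the mouse Igh count of 134 is a minimum, and the Igk figures are approximate, so the densities are rough illustrations rather than measurements.

```python
# (low, high) V segment counts and locus sizes in Mb, as quoted in the text.
LOCI = {
    "mouse Igh": {"v_segments": (134, 134), "size_mb": (2.5, 3.0)},
    "human IGH": {"v_segments": (95, 123), "size_mb": (1.0, 1.0)},
    "mouse Igk": {"v_segments": (140, 140), "size_mb": (3.5, 3.5)},
}


def density_range(v_segments, size_mb):
    """Return (lowest, highest) V segments per Mb for the given ranges."""
    lowest = min(v_segments) / max(size_mb)
    highest = max(v_segments) / min(size_mb)
    return lowest, highest


if __name__ == "__main__":
    for name, locus in LOCI.items():
        low, high = density_range(locus["v_segments"], locus["size_mb"])
        print(f"{name}: {low:.0f} to {high:.0f} V segments per Mb")
```

On these figures the human locus carries roughly twice as many V segments per megabase as either mouse locus, which is the sense in which it is described as more compact.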
Have we recovered the entire Igh locus in this YAC contig? Gaps remain in the representation of the constant region; presumably the isotype switch sequences or other repetitive elements render this portion of the Igh locus especially unstable in yeast. The human IGH constant region locus was also resistant to YAC cloning (39). In the VH gene array, we found another potential impediment to YAC stability, an extraordinary frequency of LINE1 elements. In our sequencing of YAC insert ends, we found that 40% belonged to the L1 family of high copy number repetitive sequences (16); this frequency is surprisingly higher than the 5% incidence observed in another YAC contig on chromosome 11 (40), but more similar to the 25% incidence in the Igk locus (41) and a probably higher frequency in the MHC (42). Interestingly, the human IGH locus also has a 40% content of L1 sequence (1). This density of repetitive sequences makes these regions potentially unstable in YACs because of the high activity of the homologous recombination machinery in yeast cells; this and the repetitive nature of the VH gene families themselves likely are responsible for the many deletions we detected in these YACs and perhaps additional deletions that we cannot detect at this level of analysis. Although we are aware of deletions in many YACs in the contig, other clones appear stable and intact, and the redundancy, or depth of coverage, that we sought in screening all available YAC libraries has resulted in apparently complete representation of all VH genes. Every hybridizing fragment seen on C57BL/6 Southern blots probed with all 15 VH families is present in the YAC contig.
Eleven of the YAC insert ends matched or were similar to known Igh sequences. The orientation of these 11 sequences relative to chromosome 12 was consistent with all other known elements in the H chain locus, supporting the generality that all gene segments of the Igh locus are in the same transcriptional orientation and that rearrangement in Igh occurs exclusively by deletion. The binning structure provided by the YAC overlaps has added a great deal of detail to the physical map of the VH gene array and reveals the relatively massive scale of the VHJ558 gene family.
We thank the following for generous assistance and access to YAC libraries: S. Tilghman, D. Koos, and G. Guan, Princeton University; S. D. M. Brown, F. Chartier, and G. Argyropoulos, St. Mary’s Hospital Medical School; and E. Brundage, Baylor College of Medicine. We thank R. Lieberson, L. Eckhardt, P. Brodeur, M. Caulfield, and C. Carmack for primers and R. D. Miller, C. Carmack, and R. Epstein-Baak for early help and encouragement.
↵1 This work was supported by National Institutes of Health Grant AI23548.
↵2 Address correspondence and reprint requests to Dr. Roy Riblet, Torrey Pines Institute for Molecular Studies, 3550 General Atomics Court, San Diego, CA 92121. E-mail address:
↵3 Abbreviations used in this paper: YAC, yeast artificial chromosome; STS, sequence-tagged site; PFGE, pulsed field gel electrophoresis; BLAST, basic local alignment search tool.
↵4 Southern blots of YAC panels probed with 15 VH gene families and other probes may be seen at http://www.tpims.org/research/riblet-genetics.html.
- Received November 2, 2001.
- Accepted March 28, 2002.
- Copyright © 2002 by The American Association of Immunologists