* Center for Molecular and Human Genetics, Columbus Children's Research Institute, Columbus, OH 43205; and
Integrated Biomedical Science Graduate Program,
Department of Pediatrics,
Department of Internal Medicine, and
¶ Department of Statistics, Ohio State University, Columbus, OH 43210
| Abstract |
|---|
| Introduction |
|---|
Technical challenges exist and hinder rapid progress in studies of inherent CNVs in human diseases. Unlike somatic gene amplifications in oncogenesis or gene expression studies, which involve changes in the copy numbers of target sequences of more than one order of magnitude, the inherent gene copy difference among individuals is usually within the range of 1–10 copies in a diploid genome. It is essential to accurately distinguish subtle yet discrete differences such as one, two, three, or four copies of a target gene among different subjects. There is also a need to distinguish and determine the copy numbers of polymorphic variants of target genes that share a high degree of sequence identity. To date, the most definitive methods to elucidate CNVs in the range of 5–500 kb are probably: 1) long range mapping using genomic DNA digested by rare-cutting restriction enzymes that are not affected by DNA methylation and resolved by pulsed field gel electrophoresis (PFGE); and 2) carefully designed genomic RFLP analyses with appropriate probes for hybridization. The long range mapping technique yields information on the number of segmental duplications on each haplotype. Deliberately designed genomic RFLP analyses may give detailed information on structural variation within duplicated modules or complexes. Major limitations of these Southern blot-based strategies, however, include the time-consuming procedures, which require 3–14 days for one complete cycle of experiments, and the need for a relatively large quantity of high m.w. genomic DNA, in the range of 5–10 micrograms for each reaction. Large-scale epidemiologic studies of CNVs in human diseases usually require sensitive and high throughput methods. Thus, the high sensitivity and speed of real-time PCR make it an attractive alternative for determining the CNVs of well-characterized genomic loci.
In a real-time PCR, a fluorescent dye is incorporated into the amplified DNA and emits light proportional to the number of amplified copies as the PCR proceeds. The kinetics of the entire PCR are recorded as the light emitted during each PCR cycle is detected by the real-time machine. Quantification of the initial amount of DNA present in a reaction is based on the number of cycles it takes to reach a threshold (CT), at which the fluorescence has increased significantly and passes an arbitrarily defined value (9). During a real-time PCR, the greater the initial amount of DNA template present, the sooner the reaction reaches the fluorescence threshold and the smaller the CT observed (Fig. 1B). Several dye chemistries are available to monitor the kinetics of the amplification process. The TaqMan dye chemistry is a preferred choice because of its specificity and precision in quantitative PCR (10). Therefore, we investigated the feasibility of applying real-time PCR using the TaqMan chemistry to determine the CNV of the human complement C4.
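The inverse relationship between starting template and CT can be illustrated with a minimal sketch. It assumes idealized exponential amplification, N_c = N_0 (1 + E)^c, which is a textbook simplification rather than a model taken from this study; the function name `threshold_cycle` and the threshold value are hypothetical and chosen only for illustration.

```python
import math

def threshold_cycle(initial_copies, threshold_copies=1e10, efficiency=1.0):
    """Cycles needed for initial_copies to reach threshold_copies.

    Assumes idealized growth N_c = N_0 * (1 + efficiency) ** c.
    threshold_copies is an arbitrary illustrative value, not an
    instrument setting.
    """
    return math.log(threshold_copies / initial_copies) / math.log(1.0 + efficiency)

# Two hypothetical reactions that differ 10-fold in starting template.
ct_low_input = threshold_cycle(1e3)
ct_high_input = threshold_cycle(1e4)

# With perfect doubling, a 10-fold increase in template shifts CT
# earlier by log2(10) cycles.
ct_shift = ct_low_input - ct_high_input
print(f"CT (1e3 copies) = {ct_low_input:.2f}")
print(f"CT (1e4 copies) = {ct_high_input:.2f}")
print(f"shift = {ct_shift:.2f} cycles")
```

Under this idealization, distinguishing two from three gene copies corresponds to a CT difference of well under one cycle, which is one way to see why careful calibration is needed.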
The human MHC is noted for the high degree of polymorphisms of the class I and class II genes, the long range disequilibrium of classes I, III, and II alleles in certain ancestral haplotypes, and the association with numerous autoimmune and complex diseases (24, 25, 26). There have been intensive investigations over the past three decades to determine polymorphic markers, including microsatellites and single nucleotide polymorphisms (SNPs), as surrogates of MHC haplotypes that are highly significant for transplantation matching and for genetic studies of MHC-associated diseases (27, 28, 29, 30, 31, 32, 33). Unfortunately, there are still no markers in the MHC that accurately reflect the CNVs of C4A and C4B or RCCX modules, and the C4 CNVs in most human MHC ancestral haplotypes remain relatively uncharacterized.
From our cumulative studies of C4 genotypes and phenotypes from >2000 healthy subjects and SLE patients of different ethnic groups, it becomes clear that the copy number of total C4 genes mainly varies between two and seven, that of C4A genes between zero and five, that of C4B genes between zero and four, that of long C4 genes between zero and six, and that of short C4 genes between zero and four (5, 8, 14, 19, 20). Our objectives are to design and validate methods that can definitively elucidate each of these genetic features.
The most informative methods for elucidating C4 GCN variation are Southern blot analyses of the following: 1) PmeI-digested genomic DNA in agarose plugs resolved by PFGE that deciphers the copy number and size of RCCX modules in haplotypes; 2) TaqI genomic RFLP that reveals the configurations and numbers of the long and short C4 genes plus the copy numbers of the neighboring genes CYP21A and CYP21B and TNXA and TNXB (5, 8, 18, 34); and 3) PshAI-PvuII genomic RFLP that discloses the molar ratio of C4A and C4B genes (35, 36, 37). To reduce the amounts of genomic DNA needed for C4 genotyping, alternative methods including module-specific PCR and labeled primer, single-cycle, DNA polymerization PCR were developed to determine the total number of C4 genes and the relative dosage of C4A and C4B, respectively (35, 38). However, these methods are relatively laborious and time consuming and therefore not favorable for high throughput epidemiologic studies.
In this study, we present five different quantitative real-time PCR (qPCR) assays that allow an in-depth interrogation of the CNVs of C4A and C4B, C4L and C4S, and RCCX modules. These methods have been rigorously tested in selected genomic DNA samples with C4 GCN varying from 2 to 7 and in DNA from common human cell lines. In addition, the CNVs of C4A, C4B, C4L, C4S, and RCCX modules in 50 consanguineous subjects that contain many MHC ancestral haplotypes have been elucidated by these qPCR strategies.
| Materials and Methods |
|---|
Genomic DNA was isolated from the peripheral EDTA-blood of consenting donors (20, 36) or from cultured cells using the Puregene DNA isolation kit (Gentra Systems) following the manufacturer's instructions. The C4 GCNs of each DNA sample were previously determined by TaqI RFLP-Southern blot analysis and PvuII-PshAI RFLP-Southern blot analysis. The RCCX modules for each sample were confirmed by PFGE (14, 35, 37). Genomic DNA samples from consanguineous subjects were purchased from the International Histocompatibility Working Group (IHWG). Details of the HLA class I and II genotypes for the samples can be found at http://www.ihwg.org/cellbank/dna/refpan_hla_consang_table.html.
Cosmid DNAs
Three genomic DNA cosmids containing human complement C4 and its neighboring genes were used for the qPCR assays with the standard curve method. Cos 3A3 contains a genomic DNA fragment spanning from SKI2W to the 3' end of a long C4A gene (13, 39, 40). Cos KEM-1 was isolated from a subject with a monomodular short C4B (41). Cos 8 was isolated from a genomic library made from MOLT4 and contains a genomic DNA fragment with a long C4 gene, a short C4 gene, and the TNXA-RP2 sequence at the intergenic region between the two C4 genes (12, 42, 43). In addition to containing the target gene, these cosmids also contain sequences for the endogenous control gene RP1.
Real-time PCR using TaqMan dye chemistry
All of our real-time PCR assays used TaqMan minor groove binder (MGB) probes (Applied Biosystems). The target probes were VIC labeled and the endogenous control probe was FAM labeled. Each reaction consisted of each of the forward and reverse primers (0.5–1 µM) for both the target and control amplicons, 100 nM of the target probe and endogenous control probe, 15 ng of test genomic DNA (diluted to 5 ng/µl and used at 3 µl per reaction), and 2x TaqMan universal PCR master mix (P/N 4324018; Applied Biosystems). The final volume was adjusted to 10 µl with molecular grade water. Each sample was analyzed in triplicate and reactions were conducted in a MicroAmp fast 96-well optical reaction plate (P/N 4346906; Applied Biosystems) sealed with an optical adhesive cover (P/N 4311971; Applied Biosystems). Real-time PCR was performed on the ABI 7500 fast real-time PCR system using PCR cycles of 95°C for 10 min followed by 40 cycles of 95°C for 15 s and 59–60°C for 1 min. Specific features for each amplicon are shown in Table I. Sequences for PCR primers and probes are shown in Table II.
To calculate the copy number of the target genes, the relative standard curve method was used. To set up the 96-well plate for quantification using this method, we selected the "absolute quantification (standard curve)" assay option in the SDS software (Applied Biosystems). To construct the standard curve for each plate, a serially diluted "standard" sample is required. In this case, we serially diluted selected cosmid samples with defined C4 and RP1 genes over 6 logs of DNA concentration that cover a CT range of 10–30. Note that in this method the "absolute copy number" for the cosmid sample in each dilution is not necessarily required because we are only interested in deciphering the molar ratio of the target gene to the endogenous control (ENDO) gene, and all of the cosmid samples used contain equal copy numbers of the target and ENDO genes. For example, the sample at the lowest dilution can be assumed to have 10^6 copies of the target gene and the ENDO gene, the next 10-fold diluted sample has 10^5 copies, the next one has 10^4 copies, and so on. Each test genomic DNA sample was diluted to 5 ng/µl, and 3 µl was used for each reaction. This quantity yields an ENDO CT of 24–26. After the PCR was completed, we allowed the machine to automatically set the fluorescence threshold for both the target and ENDO gene to calculate the CT for each sample. A standard curve is generated by plotting the log ("absolute" copy number) of DNA for the ENDO or the target genes vs their corresponding CT for each dilution (Fig. 1, C and D). Based on these standard curves, the initial "absolute" copy numbers of the ENDO and target genes in the test samples were calculated by the SDS software. Because the target and ENDO genes were amplified from the same diluted DNA sample, the ratio of the "absolute" copy number of the target gene to the "absolute" copy number of the endogenous control gene represents the molar ratio of the target gene to the endogenous control. Because the copy number of our endogenous control RP1 in a diploid genome is always 2, this ratio multiplied by 2 is the GCN of our target genes per diploid genome. Our individual qPCR assays are capable of accurately distinguishing GCNs from zero to three copies. However, at higher gene dosages (four copies or more) there is an intrinsic underestimation of the actual GCN. To correct such an underestimation, we include samples with a known GCN of the target gene in each plate (Fig. 1D). A calibration curve (similar to Figs. 2B, 3B, and 4B) is constructed by plotting the calculated GCN vs the actual GCN of these samples. Using the inverse prediction equation generated from this calibration curve, the GCNs of test samples are calculated and rounded to the closest integers.
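The relative standard curve calculation described above can be sketched as follows. This is a minimal illustration, not the SDS software's implementation: the dilution-series CT values and the test-sample CT values are made up, and the helper names (`fit_line`, `copies_from_ct`, `gene_copy_number`) are hypothetical. Only the logic (fit CT against log10 nominal copies, interpolate the test sample on each curve, take the target:ENDO ratio, multiply by 2) follows the text.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def copies_from_ct(ct, curve):
    """Interpolate a nominal copy number from a CT on a standard curve."""
    slope, intercept = curve
    return 10 ** ((ct - intercept) / slope)

def gene_copy_number(target_ct, endo_ct, target_curve, endo_curve, endo_copies=2):
    """GCN per diploid genome = (target / ENDO molar ratio) * ENDO copies."""
    ratio = copies_from_ct(target_ct, target_curve) / copies_from_ct(endo_ct, endo_curve)
    return endo_copies * ratio

# Hypothetical cosmid dilution series with nominal 10^6 ... 10^1 copies.
log10_copies = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
endo_std_ct = [33.0 - 3.40 * x for x in log10_copies]    # made-up CT values
target_std_ct = [33.5 - 3.55 * x for x in log10_copies]  # made-up CT values

endo_curve = fit_line(log10_copies, endo_std_ct)
target_curve = fit_line(log10_copies, target_std_ct)

# Hypothetical test sample (mean CT of triplicate wells).
gcn = gene_copy_number(target_ct=24.0, endo_ct=24.5,
                       target_curve=target_curve, endo_curve=endo_curve)
print(f"estimated GCN = {gcn:.2f}")
```

Because each amplicon is read against its own curve, the two amplicons are not required to share the same amplification efficiency, which is the property the method relies on.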
The CT was recorded by the SDS version 1.3.1 software accompanying the ABI 7500 fast real-time PCR system (Applied Biosystems). For assays based on the relative standard curve method, we imported the CT data and the calculated "absolute" copy numbers generated by the SDS software and performed additional calculations in Excel (Microsoft). The calibration equations (i.e., Figs. 2B, 3B, and 4B) used to adjust for the intrinsic underestimation of the observed GCNs relative to the actual GCNs were generated as best-fit lines in Excel (Microsoft).
| Results |
|---|
Five nucleotide changes within a span of 20 bp in exon 26 contribute to the isotype-specific sequences for C4A and C4B (Fig. 2A). Based on these sequences, amplicons specific for C4A and C4B were designed. The gene copy numbers (GCNs) of C4A and C4B are determined by independent assays. In each assay, a reverse primer (sequence shown in box) discriminately anchors to the isotypic sequence of C4A or C4B (Fig. 2A). In a typical reaction to interrogate the copy number of a target gene in a genomic DNA sample, there are two amplicons: one for an endogenous control gene and one for a target gene. The copy number of the endogenous control gene is invariable, while the copy number of the target varies but can be deduced when its molar ratio to the endogenous control (ENDO) gene is known. In our assays, exon 4 of the RP1 gene, a region that has no known sequence polymorphism or duplicated regions in the human genome, was used to design the ENDO amplicon.
The specificity of each primer set for C4A or C4B was tested and confirmed in reactions using genomic DNA samples that have both C4A and C4B, C4A only, and C4B only. To illustrate the accuracy of the quantification of C4A and C4B GCNs, we selected 28 human genomic DNA samples that had good representation of every GCN group for C4A or C4B. The GCNs of those samples were previously defined by PmeI-PFGE and genomic TaqI RFLP for the total copy number of C4 genes and by PshAI/PvuII RFLP for the relative copy numbers of C4A and C4B (20).
In Fig. 2B, the mean C4A and C4B GCNs from DNA samples determined by real-time PCR (i.e., observed copy number) were plotted against the actual copy number determined by Southern blot analyses. Values with SD are shown in Table III. For the C4A GCN, individuals with 0–5 copies per diploid genome have been identified, and the most common C4A GCN group is 2. As in most real-time PCR quantification assays, when the GCN of the target amplicon is 0, depending on where the fluorescence threshold is set, an undetectable or very high threshold cycle number for the target (i.e., more than five cycles later than the CT of ENDO) will be observed. These samples can be easily identified as having a homozygous deficiency of that particular target gene. Using the relative standard curve method, the observed copy numbers for C4A GCN groups 1, 2, 3, 4, and 5 were 1.22 ± 0.12, 2.08 ± 0.16, 2.84 ± 0.07, 3.33 ± 0.23, and 4.19 ± 0.23, respectively. Therefore, the C4A assay can accurately distinguish among zero, one, two, and three copies of C4A genes. At the higher C4A GCN groups 4 and 5, there is a tendency toward underestimation as well as decreased resolution between the neighboring groups.
To correct the intrinsic underestimation of the actual GCNs, especially at the higher end, a calibration curve is constructed by fitting a line to the observed mean GCNs plotted against the actual GCNs. This yields a linear equation of the form Y = mX + b, where Y is the observed GCN and X is the actual GCN. From this equation, an inverse prediction equation is obtained: X' = (Y – b)/m. X' calculated in this way is adjusted for the underestimation seen for GCNs above 2 and therefore more closely represents the actual GCN. The adjusted means obtained by this method, listed in Table III, showed improved estimation of the actual GCN, with an overall observed to actual GCN quotient of >0.95 for GCNs greater than one.
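As a worked illustration of the inverse prediction step, the sketch below fits Y = mX + b to the observed mean C4A copy numbers quoted in the text for GCN groups 1–5 and then applies X' = (Y – b)/m. It is an assumption that only these five group means enter the fit; the actual calibration used on each plate (Fig. 2B, Table III) may include other points, so the fitted coefficients here are illustrative and are not the published calibration equation. The function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = m * x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    return m, mean_y - m * mean_x

def inverse_predict(observed, m, b):
    """Inverse prediction X' = (Y - b) / m."""
    return (observed - b) / m

# Actual GCN (Southern blot) and observed mean GCN (qPCR) for C4A,
# as quoted in the text for GCN groups 1-5.
actual_gcn = [1, 2, 3, 4, 5]
observed_gcn = [1.22, 2.08, 2.84, 3.33, 4.19]

m, b = fit_line(actual_gcn, observed_gcn)
adjusted = [inverse_predict(y, m, b) for y in observed_gcn]
assigned = [int(round(x)) for x in adjusted]

print(f"Y = {m:.3f} X + {b:.3f}")
for x, y, xa, xi in zip(actual_gcn, observed_gcn, adjusted, assigned):
    print(f"actual {x}: observed {y:.2f} -> adjusted {xa:.2f} -> assigned {xi}")
```

Samples with no target amplification (GCN of 0) are identified separately from their undetectable or very late CT, so they are left out of this sketch.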
Determining the copy numbers of C4L and C4S
A human C4 gene can either be a long gene of 21 kb or a short gene of 14.6 kb. The difference is due to the insertion of an endogenous retroviral element, HERV-K(C4), into intron 9 of a long C4 gene. To selectively amplify a long gene or a short gene, we use the distinct sequences created by the insertion of HERV-K(C4). A common forward primer and a common TaqMan probe are used for both the C4L and the C4S assays (Fig. 3A). To selectively amplify the C4L gene, a reverse primer that anchors at the 3' long terminal repeat (LTR) of HERV-K(C4) is used. Because there is no insertion of HERV-K(C4) in a short C4 gene, C4S is not amplified by this primer set. Likewise, for the specific amplification of C4S a reverse primer is designed to span the putative integration site that would have been the location of HERV-K(C4) in a long gene and, therefore, is incapable of amplifying the long C4. Both primer sets for C4L and C4S were tested and proven to be specific for their target genes by using genomic DNA samples with a long C4 only and samples with a short C4 only.
The GCN assays for long and short C4 were performed in a similar manner as those for C4A and C4B by using the RP1 exon 4 amplicon as an endogenous control. Copy numbers were calculated based on the relative standard curve method. The GCN of C4L observed in an individual ranges from 0 to 6, whereas the GCN of C4S ranges from 0 to 4. In the GCN assays of C4L, the observed mean copy numbers for GCN groups 1, 2, 3, 4, 5, and 6 are 1.05 ± 0.10, 1.93 ± 0.16, 2.62 ± 0.24, 3.42 ± 0.32, 4.40 ± 0.30, and 4.89 ± 0.26, respectively. Accurate results are obtained for C4L GCNs 0, 1, 2 and 3, but those with 4, 5, and 6 are undervalued. After adjustments using the calibration curve (Fig. 3B), the GCN of C4L can be deduced with higher precision by using the ratio of adjusted observed value to actual value (adjusted observed/actual ratio) of 1.00 ± 0.06 for all GCN groups (Table III).
For the GCN assays of C4S, the observed mean copy numbers for GCN groups 1, 2, 3, and 4 are 1.11 ± 0.09, 1.87 ± 0.15, 2.88 ± 0.12, and 3.60 ± 0.04, respectively. Given the relatively small SD (<8%) and the smaller potential range (0–4 copies), this assay worked with high resolution for all GCN groups. As in the C4A, C4B, and C4L assays, an intrinsic underestimation of the high GCN groups exists but can be corrected by using a calibration equation (Fig. 3B and Table III).
Quantifying the number of TNXA-RP2 junctions to corroborate the total number of C4 genes
The GCN of total C4 can be deduced separately by summing the copy numbers of C4A and C4B or the copy numbers of C4L and C4S from the same subject. Under conditions of experimental error or a high copy number of C4A, C4B, C4L, or C4S, ambiguous results can be obtained. Therefore, we designed an independent assay to interrogate the GCN of total C4 based on the unique structural feature of the RCCX modular variation. This assay specifically quantifies the junction of the TNXA and RP2 genes in any duplicated RCCX module by using a TNXA-specific forward primer and an RP2-specific reverse primer coupled with a TaqMan probe that spans the junction of TNXA and RP2 (Fig. 4A). In a monomodular RCCX, TNXA and RP2 are absent and hence there is no amplification of the amplicon for the TNXA-RP2 junction. In a bimodular, a trimodular, or a quadrimodular RCCX, one, two, or three copies of TNXA-RP2 junctions are present, respectively. A subject homozygous for monomodular RCCX with a total of two C4 genes has no TNXA-RP2 junction; a subject heterozygous for a monomodular and a bimodular RCCX with a total of three C4 genes has one copy of the TNXA-RP2 junction; a subject homozygous for bimodular RCCX with a total of four C4 genes has two copies of the TNXA-RP2 junction, and so on. Therefore, by using the characteristic modular duplication of RCCX we can quantify the total C4 gene dosage independently by interrogating the copy number of TNXA-RP2 junction(s). The relatively lower copy number of TNXA-RP2 junctions improves the accuracy and resolution for subjects with high C4 gene dosages, as it brings down the copy number being interrogated by two. To date, individuals with 2–7 copies of C4 genes have been identified. The TNXA-RP2 junction copy number therefore ranges from 0 to 5. In the TNXA-RP2 GCN assay the observed mean copy numbers for one, two, three, four, and five copies are 0.99 ± 0.05, 1.97 ± 0.10, 2.72 ± 0.23, 3.44 ± 0.39, and 4.12, respectively. The respective observed vs actual GCN quotients were 0.99, 0.98, 0.91, 0.86, and 0.82. After adjusting for underestimations at higher GCN groups, the adjusted observed/actual GCN quotients improve to >0.98 (for GCN > 1). Using this assay in combination with the complementary C4A and C4B and C4L and C4S assays, we can consistently resolve the CNVs of C4 and RCCX modules with a high degree of accuracy.
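The relationship between TNXA-RP2 junctions and total C4 genes, and the cross-check among the three routes to total C4, can be written as a short sketch. The function names are hypothetical. The first example uses the copy numbers implied by a homozygous quadrimodular LLLS-AABB genotype (four C4A, four C4B, six long, two short, and six junctions); the second is a made-up discordant case.

```python
def total_c4_from_junctions(junction_copies):
    """Each RCCX haplotype has one fewer TNXA-RP2 junction than C4 genes,
    so a diploid genome has (total C4 - 2) junctions."""
    return junction_copies + 2

def check_concordance(c4a, c4b, c4l, c4s, junctions):
    """Compare total C4 derived from the three independent routes.

    Inputs are integer GCNs per diploid genome (after calibration and
    rounding). Returns (concordant, totals_by_route).
    """
    totals = {
        "C4A+C4B": c4a + c4b,
        "C4L+C4S": c4l + c4s,
        "TNXA-RP2+2": total_c4_from_junctions(junctions),
    }
    concordant = len(set(totals.values())) == 1
    return concordant, totals

# Homozygous quadrimodular LLLS-AABB genotype: all three routes give 8.
ok, totals = check_concordance(c4a=4, c4b=4, c4l=6, c4s=2, junctions=6)
print(ok, totals)

# Hypothetical discordant result that would prompt a repeat of the assays.
bad, bad_totals = check_concordance(c4a=2, c4b=2, c4l=3, c4s=1, junctions=3)
print(bad, bad_totals)
```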
To facilitate the molecular genetic studies of the role of the C4 gene CNV in human diseases, we characterize the CNVs of C4A, C4B, C4L, C4S, and RCCX in eight common cell lines that can be used to construct standard curves or as control samples to construct an internal calibration curve. The cell lines are the B lymphoid cell line Daudi, the T lymphoid cell line MOLT4, the monocyte cell line U937, the cervical carcinoma cell line HeLa, the intestine cell line HT29, the teratocarcinoma cell line T2, and the nasopharyngeal carcinoma cell lines C666 and NPC-HK1 (44). These cell lines were characterized by quantitative C4 and RCCX real-time PCR assays, TaqI genomic RFLP, and PshAI/PvuII genomic RFLP (Fig. 5). The detailed C4 gene copy numbers of these cell lines are listed in Table IV along with the observed C4 GCNs obtained in the real-time PCR assays. The assignments for the copy numbers of C4 genes and RCCX modules for these cell lines were based on the results from the real-time assays in which the results determined by the genomic Southern blots were not revealed until after the assignments were made. The results from the real-time PCR assays and the genomic Southern blot analyses were 100% congruent. Briefly, the C4 GCNs 2, 3, and 4 are each represented by three cell lines in each group. These cell lines contain zero, one, or two copies of C4A, C4B, and C4S, and zero, two, or three copies of C4L. It is noteworthy that T2 had two C4B but no C4A genes, while C666 and NPC-HK1 each had two C4A but no C4B genes. Daudi and U937 both had two C4A genes and one C4B gene. HeLa, HT29, and MOLT4 each had two C4A and two C4B genes.
We applied the C4 qPCR methods to determine the C4 and RCCX CNVs in 50 genomic DNA samples derived from cell lines of consanguineous subjects whose HLA class I and class II alleles have been defined by the IHWG. Because of the consanguinity, the HLA haplotypes for these selected cell lines are overwhelmingly homozygous and, therefore, the GCNs for total C4, C4A, C4B, C4L, and C4S are expected to be in even numbers or 0 if the corresponding locus is absent. Results of the qPCR assays revealed that the copy number of total C4 genes in a diploid genome among these samples varied from two to eight (Table V).
For bimodular or multimodular RCCX haplotypes, the orders of the long and short C4 genes and the C4A and C4B genes with respect to HLA class I and class II genes cannot be determined by qPCR and they are therefore listed alphabetically. Thirty-two samples contained homozygous bimodular RCCX haplotypes. Among them, 14 haplotypes contained two long genes (LL), 17 contained one long gene and one short gene (LS), and only one contained two short genes (SS). Two LL haplotypes had isoexpression of C4A from both RCCX modules (LL-AA; IHW-9029, HLA B*1401-DRB1*0401; and IHW-9030, HLA B*510101-DRB1*0407). Each of the other 30 bimodular RCCX haplotypes consisted of one C4A and one C4B (LL-AB; LS-AB), including the haplotype with two short genes (SS-AB; IHW-9069, HLA B*4001-DRB1*0801). This phenomenon underscores the relatively high prevalence of bimodular haplotypes with one C4A and one C4B in the MHC, which contributed to the earlier two-locus model for C4 genetics (45, 46, 47).
Four samples contained homozygous trimodular LLL haplotypes. One haplotype consisted of two C4A and one C4B (LLL-AAB; IHW-9061, HLA B*1801-DRB1*1401); the other three each had one C4A and two C4B (LLL-ABB; IHW-9016, HLA B*510101-DRB1*1602; IHW-9064 and IHW-9099, both are HLA B*1501-DRB1*1402). In other words, each of these four samples had six copies of C4 genes in a diploid genome. There were four C4A plus two C4B genes in one sample and two C4A plus four C4B genes in each of the other three samples.
Remarkably, sample IHW-9060 with HLA B*1501-DRB1*1301 was homozygous for a quadrimodular RCCX haplotype (i.e., a total of eight C4 genes in a diploid genome). This haplotype contained three long C4 genes and one short C4 gene coding for two C4A proteins and two C4B proteins (LLLS-AABB). This is the first example demonstrating the presence of eight C4 genes (four C4A plus four C4B) in a human sample.
The remaining five samples in the IHW consanguineous panel did not appear to be homozygous in the HLA haplotypes, as different alleles were present in one or more of the class I, II, or III genes.
Conservation of MHC haplotypes
Long-range linkage disequilibrium (LLD) among alleles of class I, III, and II genes with conserved sequences or identical genetic markers spanning hundreds to thousands of kilobases is a remarkable feature of many haplotypes of the MHC (32, 33, 48). Such LLD is conspicuous when we analyze the CNVs of RCCX-C4 and class I and class II alleles in the HLA consanguineous panel. There are three pairs of samples that contained virtually identical genetic markers throughout the entire MHC: IHW-9048 (LBUF) and IHW-9096 (LBF); IHW-9072 (SPACHECO) and IHW-9101 (SPL); and IHW-9064 (AMALA) and IHW-9099 (LZL). There are four pairs of samples that contained continuously identical gene alleles spanning the three MHC regions and two pairs of samples that contained identical alleles spanning two MHC regions (from HLA-Cw to RCCX-C4) (Table VI). In contrast, we did observe an exception in two samples, MOU (IHW-9050) and PITOUT (IHW-9051) (Table VI). Despite the high similarity of the flanking class I and class II genes, MOU had a monomodular long haplotype with a C4B gene (L-B) and PITOUT had a bimodular long-short haplotype with a C4A and a C4B gene (LS-AB).
| Discussion |
|---|
In designing the qPCR strategies, we consider specificity, accuracy, cost, and convenience. For amplicons to determine the GCNs of C4A and C4B, the specificities for the A and B isotypes are based on two reverse primers that incorporate specific sequences of five SNPs within 20 nucleotides close to the 3' end of exon 26. For determining the copy numbers of the long C4 and short C4 genes we use a common probe and a common forward primer that hybridize to the 5' region of intron 9. The specificities for the long and short genes were built into the reverse primers. The reverse primer for the long C4 gene amplicon is located in the 3' LTR of the endogenous retrovirus HERV-K(C4), while that of short C4 gene amplicon is located downstream of the putative HERV-K(C4) integration site. We designed an additional method that yields the copy number of RCCX modules and therefore the number of total C4 genes by interrogating the number of junctions for TNXA-RP2. In these C4 GCN assays we use the same endogenous control ENDO amplicon that is designed at an invariable region of RP1 exon 4, which is 7.9 kb upstream of the breakpoint of RCCX modular duplication at RP1 exon 7 and is independent of the CNVs of the C4 and RCCX modules (15). The ENDO amplicon has a constant copy number of two in a diploid genome among all human subjects. It was tested not to interfere with the amplification in any of the five target amplicons described here. Therefore, PCRs for the ENDO and target gene amplicons were performed in the same reaction mixture to ensure accurate quantification of the molar ratio of the target gene to the ENDO gene.
An important prerequisite pertaining to a qPCR of human CNVs is a means to ascertain the accuracy of experimental results, even under the constraint of a limited supply of genomic DNA samples. We addressed this issue by creating five different real-time PCR amplicons, each of which yields relevant and complementary data. The copy number of total C4 genes equals the copy number of C4A plus C4B or the copy number of C4L plus C4S. Agreement of the data for total C4 genes from three independent sources on the same sample increases the confidence in experimental accuracy.
In using real-time PCR for quantification assays a frequently used approach is the ΔΔCT method, in which the difference between the endogenous control and the target amplicon of a single calibrator is used as a reference and the fold changes are based on the ratio of each sample's ΔCT to the ΔCT of the calibrator. However, this method requires almost identical amplification efficiencies approaching 100% for both the endogenous control and the target amplicon. Otherwise, the results are sensitive to slight variations in DNA concentration (35, 52). In C4 gene dosage quantification assays the specificities of each assay are restricted to sequences that define the A and B isotypes or the long and short genes. We found it exceedingly difficult to design target amplicons that have a high amplification efficiency identical to that of an ENDO amplicon, and the ΔΔCT strategy tended to yield ambiguous or inaccurate results. We overcame this drawback by applying the relative standard curve method for quantification. For each amplicon used to determine the CNV of C4A, C4B, C4L, C4S, or RCCX modules, we assign the copy number of target genes after two levels of calibration. In the first level we use cloned genomic DNA covering 6 logs of DNA concentration for the ENDO and target amplicons. Such calibration allows the calculation of copy numbers of the target amplicon relative to the ENDO amplicon at a specific DNA concentration and therefore minimizes the errors that arise, when a single calibrator at a single concentration is used, from differences in amplification efficiency between the ENDO and target amplicons combined with variations among test DNA concentrations. In our experience, unequivocal results are obtained for low copy numbers of the target genes but a slight underestimation is an intrinsic tendency for subjects with high copy numbers of a target gene. In the second level of calibration we correct the intrinsic underestimation of high GCN groups by creating a calibration curve for observed and actual GCNs among all GCN groups. We have applied these C4 qPCR assays to determine the C4 CNVs in >1000 human samples with autoimmune or neurological diseases (S. L. Savelli, R. A. S. Roubey, Y. W. Wu, G. Buxton, and C. Y. Yu, manuscript in preparation; K. Mayilyan, D. R. Weinberger, Y. L. Wu, B. Kolachana, and C. Y. Yu, manuscript in preparation). This technique has proven to be robust, sensitive, and reliable. We observed that in 5% of samples the data from the five independent assays may not be in total agreement. Under such scenarios we usually repeat the assays with the inconsistent data and, if possible, seek data for C4A and C4B phenotypes of the same subjects and from family members to support the final assignment.
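The sensitivity of a single-calibrator ΔΔCT calculation to unequal amplification efficiencies can be illustrated with a small simulation. This is a sketch under stated assumptions: CT values are generated from an idealized model N_c = N_0 (1 + E)^c, the efficiencies (100% for ENDO, 90% for the target) and input amounts are hypothetical, and the function names are invented for illustration. It is not a reconstruction of the data in this study.

```python
import math

def simulate_ct(initial_copies, efficiency, threshold_copies=1e10):
    """CT under idealized growth N_c = N_0 * (1 + efficiency) ** c."""
    return math.log(threshold_copies / initial_copies) / math.log(1.0 + efficiency)

def ddct_ratio(sample_endo, sample_target, eff_endo, eff_target,
               calibrator_copies=1e4):
    """Target:ENDO ratio estimated as 2 ** (-ddCT) against one calibrator.

    The calibrator is assumed to carry equal copies of target and ENDO.
    """
    d_ct_sample = simulate_ct(sample_target, eff_target) - simulate_ct(sample_endo, eff_endo)
    d_ct_cal = (simulate_ct(calibrator_copies, eff_target)
                - simulate_ct(calibrator_copies, eff_endo))
    return 2.0 ** (-(d_ct_sample - d_ct_cal))

# True target:ENDO molar ratio is 2 in every case below.
# Equal, perfect efficiencies: the estimate is exact at any concentration.
ideal = ddct_ratio(2500.0, 5000.0, eff_endo=1.0, eff_target=1.0)

# Target amplicon at 90% efficiency: the estimate now depends on how the
# sample concentration compares with the calibrator concentration.
same_conc = ddct_ratio(1e4, 2e4, eff_endo=1.0, eff_target=0.9)
diluted = ddct_ratio(2500.0, 5000.0, eff_endo=1.0, eff_target=0.9)

print(f"equal efficiencies:       {ideal:.3f}")
print(f"unequal, same input:      {same_conc:.3f}")
print(f"unequal, 4-fold diluted:  {diluted:.3f}")
```

In this toy model the two unequal-efficiency estimates fall on opposite sides of the true ratio, which is the kind of concentration dependence that reading each amplicon against its own multi-point standard curve is meant to remove.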
The quality of DNA is an important factor in performing qPCR assays. In our experience, partially degraded DNA often yields conflicting results between complementary assays. Because DNA is rather unstable in a diluted state, we recommend diluting genomic DNA in Tris-EDTA buffer rather than water and performing the required experiments within 2 wk after sample dilution. We also observed that whole genome-amplified DNA yielded a wide variation of target GCNs and is therefore not suitable for qPCR of CNV.
To facilitate application of the qPCR techniques for C4 genotyping, we have elucidated the genetic diversities of C4 in eight common human cell lines including Daudi, HeLa, HT29, MOLT4, and U937. Genomic DNA samples of these cell lines can be used as controls for calibration purposes. The monocyte cell line U937 is commonly used for studies of complement C4 gene expression (53, 54). This cell line contains two C4A and one C4B genes and, therefore, the expression of total C4, C4A, and C4B transcripts would have to be interpreted together with its genotypic background.
The IHWG consanguineous panel is overwhelmingly homozygous for HLA class I and class II alleles and is therefore ideal for the discovery and haplotyping of SNPs and for the characterization of CNVs of C4 and RCCX in defined HLA haplotypes. The former is exemplified by the MHC Haplotype Project, in which eight representative haplotypes are being sequenced and characterized (http://www.sanger.ac.uk/HGP/Chr6/MHC/) (32, 33). The latter is demonstrated in this study by the presence of monomodular, bimodular, trimodular, and quadrimodular RCCX haplotypes with different copy numbers of long and short C4 genes, each coding for either C4A or C4B. The results illustrate CNV as a mechanism for generating the genetic diversity of an important immune effector protein. As we and others have shown previously, the phenotypic outcome of the complex genotypic diversity of complement C4 is a wide range of plasma or serum C4 protein levels among different subjects and two isotypes (C4A and C4B) with multiple protein variants (allotypes) that can have different physiologic functions (8, 14, 19, 55, 56, 57, 58, 59, 60).
The mechanism leading to the LLD of numerous polymorphic markers as large "frozen blocks" of genomic sequences in the MHC is not known (24, 33). The length variations caused by interindividual CNVs at the class II DRB region and the class III RCCX-C4 region could create mismatches during meiosis and play a role in suppressing productive recombinations among certain haplotypes (5, 8, 61). A set of MHC alleles on chromosome 6 that persists in LLD in human populations is also known as an ancestral haplotype (AH). Some of the MHC ancestral haplotypes are associated with autoimmune or genetic diseases (24, 26). For example, AH47.1 (IHW-9047), with HLA B*4701, RCCX: L-C4A, DRB1*0701, is associated with congenital adrenal hyperplasia (5, 62); AH8.1 (IHW-9022, 9023), with HLA B*0801, RCCX: S-C4B, DRB1*0301 (DR3), and DQB1*0201, is associated with SLE and type 1 diabetes mellitus; AH7.1, with HLA B*0702, RCCX: LL-C4B-C4A, DRB1*1501 (DR2), is associated with SLE and multiple sclerosis (24, 63); and AH57.1, with HLA B*5701, RCCX: LS-AB, DRB1*0701, is associated with psoriasis (24, 30, 64). In-depth characterization of the genetic variations of all MHC genes, including the polymorphisms of complement factor B and C2 (65, 66) and the constituents of RCCX modules, using the consanguineous panel would prove highly informative and aid the understanding of the genetic basis of MHC-associated diseases.
HLA-DR3 has been consistently implicated as a risk factor in SLE (24, 28, 30), but HLA haplotypes with DR3 can have different RCCX or C4A and C4B gene contents. The IHWG consanguineous panel included three samples with DR3 haplotypes: two with HLA-B*0801 and a monomodular-short RCCX carrying a single C4B gene (S-C4B) and no C4A (COX and VAVY), and a third with HLA-B*1801 and a monomodular-long RCCX carrying a single C4A gene (L-C4A) and no C4B (QBL). In European Americans we found that the absence or low GCN of C4A, but not of C4B, is a risk factor for SLE. By contrast, a high GCN of C4A is a protective factor against the onset of this systemic autoimmune disease (20). When examining genetic risk factors in HLA-associated diseases, it is prudent to elucidate the status of complement C4A and C4B CNVs in addition to SNPs of MHC genes and the conventional class I and class II alleles.
| Acknowledgments |
|---|
| Disclosures |
|---|
| Footnotes |
|---|
1 This work was supported by National Institute of Arthritis and Musculoskeletal and Skin Diseases Grant R01 AR050078 and National Institute of Diabetes and Digestive and Kidney Diseases Grant P01 DK55546 from the National Institutes of Health.
2 Address correspondence and reprint requests to Dr. C. Yung Yu, Room W402, Columbus Children's Research Institute, 700 Children's Drive, Columbus, OH 43205. E-mail address: cyu{at}chi.osu.edu
3 Abbreviations used in this paper: CNV, copy number variation; AH, ancestral haplotype; C4L, long C4 gene with endogenous retrovirus HERV-K(C4); C4S, short C4 gene without HERV-K(C4); CT, threshold cycle; ENDO, endogenous control; GCN, gene copy number; LLD, long-range linkage disequilibrium; LTR, long terminal repeat; qPCR, quantitative real-time PCR; RCCX, RP-C4-CYP21-TNX module; PFGE, pulsed field gel electrophoresis; SLE, systemic lupus erythematosus; SNP, single nucleotide polymorphism; TNX, tenascin-X.
Received for publication April 12, 2007. Accepted for publication June 22, 2007.
| References |
|---|