Department of Clinical Immunology, Odense University Hospital, Denmark
| Abstract |
|---|
|
|
|---|
| Introduction |
|---|
|
|
|---|
SHM is dependent on several cis-acting elements, including the promoter and the elements of the Ig enhancer regions (1, 2, 3, 4, 5, 6). The promoter and enhancer requirement is likely to be due to a requirement for transcription, although other properties of the enhancers, e.g., protein binding, are likely to be involved as well (6). The transcription dependence is furthermore supported by the fact that the mutation frequency decays exponentially from a starting point approximately 150-200 bp from the promoter (7, 8). The 3' boundary is in the intron region downstream of the J genes and no mutations are normally found in the constant domain. In contrast, the entire variable domain is targeted by SHM. The CDRs have been described as being more prone to mutation than the framework regions (FRs), and this has been attributed to the presence of more RGYW hot spot motifs (9, 10).
The only trans-acting factor described to be absolutely mandatory for SHM is activation-induced cytidine deaminase (AID). AID is not only required for SHM but also for class switch recombination (CSR) and gene conversion (11, 12, 13). Ectopic expression of AID can turn on mutation and CSR in human hybridomas, Escherichia coli, and murine fibroblasts (14, 15, 16, 17), proving that it is the only B cell-specific factor necessary for SHM. AID is a cytidine deaminase shown to be able to deaminate cytidine residues in ssDNA, in particular in WRC motifs (18, 19, 20). It has also been suggested that AID could be involved in SHM by modulation of the mRNA of an involved protein (11).
According to the current model for SHM (21, 22), the process is initiated by cytidine deamination by AID. The targeted sequence is thought to be single stranded because of the ongoing transcription from the Ig promoter. The generated uracil can either be replicated over, generating a C to T transition in the sister cell (or G to A if the transcribed strand is targeted) (phase I), or it can be removed. Uracil DNA-glycosylase (UNG) and a complex of MutS homologs (MSH) 2 and 6 (MSH2 and MSH6) have both been described as being capable of uracil removal.
Deletion of or mutation in the murine and human UNG gene changes the mutation pattern of G and C residues almost exclusively to transitions (21, 23, 24). MSH2/MSH6 deficiency leads to impaired mutation of A and T residues (21, 25, 26), and UNG-MSH6 double knock-out mice have almost exclusively C to T and G to A transitions, i.e., phase I mutations (27). A and T residues are targeted during phase II of SHM. Phase II involves resolution of the abasic site generated by uracil removal and has been suggested to involve repair by error-prone DNA polymerases such as polymerase (pol)
and pol
and the involvement of exonuclease I (EXO1) (28, 29, 30, 31, 32, 33, 34, 35). Pol
has been shown to be involved in phase II C/G mutation (36). An alternative hypothesis suggests that incorporation of dUTP may be the cause of A/T mutations (22).
In this study we have analyzed 6,912 substitutions in 386 mutated, nonproductive human H chain rearrangements using IGHV3-23*01 and 542 substitutions in 56 mutated rearrangements using the IGHV3-h pseudogene. This high number of nonproductive sequences enables us to study nonselected mutations with a good statistical power. We show that mutations in C and G residues can be assigned to AID deamination equally targeted to both strands. The substitution rate depends on the motifs in which the nucleotide is found and the recognized motifs are at least four nucleotides long. The substitution rates in A and T residues also depend on the motifs but, as for phase I C/G mutations, we find no sign of strand specificity. Interestingly, the phase II substitution rates in T and G showed an inverse correlation to the distance to the nearest 3' WRC, indicating that phase II mutations predominately occur in T and G residues 5' of the initial AID-deaminated cytidine residue or, alternatively, in the corresponding A and C residues on the opposite strand.
| Materials and Methods |
|---|
|
|
|---|
| Results |
|---|
|
|
|---|
Fig. 2 shows the substitution frequencies in the different positions in the VH region in the 386 nonproductive IGHV3-23*01 rearrangements. The overall substitution frequency per nucleotide in the VH region was 6.3%, varying from 5.6% for T residues to 6.8% for A residues (see Table I). The transition/transversion ratio was 1.01. In the mutated productive sequences the transition/transversion ratio was significantly higher (1.30, p < 0.0001) and the mutation rate lower (5.0%, p = 0.0026). This was largely due to a lower rate of replacement mutation yielding a lower replacement-to-silent mutation ratio (2.53 vs 4.02, p < 0.0001) that is indicative of selection and highlights the importance of using nonselected sequences to study the mechanism of SHM. To confirm that the mutations in the nonproductive rearrangements were indeed unselected, we studied mutations (including insertions and deletions) abrogating the open reading frame of the VH segment. Only 46 (1%) of 3,701 mutated, productive rearrangements contained one or more stop codons. These were likely to result from Taq errors because B cells lacking a functional Ag receptor are rapidly lost from the circulation and should therefore not appear in our material (40). In contrast, as many as 171 (44%, p < 0.0001 when compared with productive rearrangements) of the 386 nonproductive sequences contained stop codon(s) in the VH segment. This significantly higher proportion was not different from that found in rearrangements of the pseudogene IGHV3-h (45%, p = 1.00), supporting the notion that substitutions in the nonproductive IGHV3-23*01-derived sequences are indeed unselected.
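The transition/transversion ratio reported above can be illustrated with a minimal Python sketch. It assumes substitutions are available as (germline base, observed base) pairs; the function names and the toy input are hypothetical and are not taken from the study's analysis pipeline.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    if ref == alt:
        raise ValueError("not a substitution")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(substitutions) -> float:
    """Ratio of transitions to transversions.

    substitutions: iterable of (germline_base, observed_base) pairs.
    """
    counts = Counter(
        "ts" if is_transition(ref, alt) else "tv" for ref, alt in substitutions
    )
    if counts["tv"] == 0:
        raise ZeroDivisionError("no transversions observed")
    return counts["ts"] / counts["tv"]


# Toy input (made up for illustration): two transitions, two transversions.
toy = [("C", "T"), ("G", "A"), ("C", "G"), ("A", "T")]
print(ts_tv_ratio(toy))
```

With equal numbers of transitions and transversions the ratio is 1, close to the value of 1.01 seen in the nonproductive rearrangements.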
Fig. 3 shows the distribution of substitution rates for all C (Fig. 3a), G (Fig. 3b), A (Fig. 3c), and T (Fig. 3d) positions, respectively, divided into the three different substitution types. The curves suggest that the substitutions in C and G residues were caused by a mechanism with a very high preference for some positions (substitution rates >30%) and a low preference for other positions (substitution rates <1%). In general, transitions predominated followed by transversions to the complementary nucleotide. In contrast, the substitution rates in the A and T positions were less variable and the rates for the two types of transversions were comparable. The graphs for C and G have similar courses and so have the graphs for A and T, suggesting that the mutation mechanism is strand symmetric.
To further investigate strand specificity, we compared the substitution rates in different motifs with those of the reverse complementary motifs when both were present in the nontranscribed strand of the germline gene. In case the mutations are targeted to both strands by similar mechanisms, this method should show a correlation between the mutation rates because the mutation of a given residue in a motif on the nontranscribed strand should correspond to the mutation of the same nucleotide in the same motif on the transcribed strand.
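The pairing of motifs with their reverse complements can be sketched as follows. This is an illustrative Python fragment under the assumption that motifs are plain strings over A, C, G, and T read 5' to 3' on the nontranscribed strand; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif: str) -> str:
    """Reverse complement of a DNA motif written 5' to 3'."""
    return motif.translate(COMPLEMENT)[::-1]


def mirrored_position(position: int, motif_length: int) -> int:
    """1-based position of the paired base in the reverse complementary motif."""
    return motif_length + 1 - position


def paired_motifs(motifs_present):
    """Motif pairs whose two members both occur in the germline sequence.

    Palindromic motifs (their own reverse complement) pair with themselves.
    """
    present = set(motifs_present)
    pairs = set()
    for motif in present:
        rc = reverse_complement(motif)
        if rc in present:
            pairs.add(tuple(sorted((motif, rc))))
    return sorted(pairs)


print(reverse_complement("AGCC"))   # GGCT
print(mirrored_position(3, 4))      # 2
print(paired_motifs(["AGCC", "GGCT", "AGCT", "TACG"]))
```

A C in position 3 of a four-nucleotide motif therefore corresponds to a G in position 2 of the reverse complementary motif on the same strand.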
We analyzed all possible three-, four-, and five-nucleotide motifs and substitutions in all positions within these motifs. The strongest correlations between substitutions in the same position of the same motif on the two strands were found for C and A residues in position 3 in four-nucleotide motifs on the nontranscribed strand (compared with G and T, respectively, in position 2 in the reverse complementary motifs of the same strand) (data not shown). Table II (top portion) shows the substitution rates in all pairs of reverse complementary C/G motifs. Motifs are ranked by declining total substitution rate in the C residue or the corresponding G in case the C motif was not found on the nontranscribed strand. For a given motif, only sequences in which all motif positions other than the nucleotide in question were unmutated were included, to minimize the risk of looking at an influence from neighboring substitutions. With this restriction, approximately 70% of the 6,912 detected substitutions were included in the analysis. The 10 most mutable motifs (that all have a substitution rate of >10% for C and/or G) included six of the seven RGYW/WRCY (where R is A or G, Y is C or T, and W is T or A) motifs present in the sequence. RGYW/WRCY motifs have earlier been defined as hot spots for SHM in vivo (41, 42, 43). The last RGYW/WRCY motifs (AGCC/GGCT) had mutation rates of 7.8% and 7.3%, respectively, indicating that RGYW/WRCY is indeed good at predicting high mutability (the targeted nucleotides of the four nucleotide motifs are set in boldface type). WRCY includes WRC that has been found to be a hot spot for AID deamination of the C residue (19, 20, 44), and hence we find that there is a good correlation between deamination hot spots and C/G SHM hot spots. The other four highly mutable motifs (CGCT, AGCA, TACG, and CGCT) only deviate from RGYW/WRCY in position 1 or 4. Noticeably, the first three of these motifs are WRC/GYW motifs, suggesting that the three first nucleotides within the motifs are most important for the mutability.
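Degenerate motifs such as WRCY and RGYW can be matched against concrete four-nucleotide sequences by expanding each code into a character class. The Python sketch below uses the definitions of R, Y, and W given in the text; S (C or G) follows the standard IUPAC code and is included because it appears in the SYC/SSC cold spots. Function names are hypothetical.

```python
import re

# Degenerate nucleotide codes; R, Y, W as defined in the text, S per IUPAC.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
}


def motif_regex(motif: str) -> str:
    """Translate a degenerate motif such as WRCY into a regex pattern."""
    return "".join(IUPAC[code] for code in motif)


def matches(motif: str, sequence: str) -> bool:
    """True if the whole sequence is an instance of the degenerate motif."""
    return re.fullmatch(motif_regex(motif), sequence) is not None


def find_motif(sequence: str, motif: str = "WRCY"):
    """0-based start positions of all occurrences, overlapping ones included."""
    lookahead = re.compile("(?=" + motif_regex(motif) + ")")
    return [m.start() for m in lookahead.finditer(sequence)]


print(matches("WRCY", "AGCC"), matches("RGYW", "GGCT"))  # True True
print(matches("WRCY", "CGCT"), matches("RGYW", "CGCT"))  # False False
print(find_motif("TTAGCTAACT"))                          # [2, 6]
```

The lookahead pattern is a design choice that reports overlapping occurrences, which matters when hot spot motifs overlap in the germline sequence.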
A set of rearrangements using IGHV3-h was also analyzed. Fifty-six of 103 sequences had more than three mutations in the VH region and were thus included in the substitution analysis. IGHV3-h is a pseudogene due to a disturbed translation initiation codon and thus these sequences have not undergone selection following SHM. Generally, the mutation frequencies in the different motifs were lower than in the same motifs in the IGHV3-23-using rearrangements. ATCC, TACT, AGCT, and AACT were the only motifs where the C mutation frequency was >10%. The last three are included in the consensus WRCY hot spot motif. ATCC and AGCT are both present only once in the sequence, leading to considerable uncertainty in the relatively small sample. All possible combinations of SYC and SSC present in the germline sequence were also found to have a mutation frequency of <1% in IGHV3-h.
A and T substitution rates in different motifs
For A/T substitutions, the 17 most mutated four-nucleotide motifs (see Table II, bottom portion) contained WA/TW mutations previously described as A/T SHM hot spots (30, 45). No motifs had an A/T substitution rate of <1%, suggesting that there are no A/T mutational cold spots.
In IGHV3-h sequences the A/T mutation frequency tended to be lower than in IGHV3-23. This was also the case for the C/G mutation frequency, indicating that IGHV3-h is less targeted by SHM than IGHV3-23. However, the three most mutable A/T motifs were the same in the two VH genes.
Correlation between substitution rates in reverse complementary motifs
The apparent correlation between the substitution rates in reverse complementary motifs prompted a closer analysis. Fig. 4a shows a significant correlation between the total substitution rates in C residues in position 3 in four-nucleotide motifs and in G residues in the corresponding reverse complementary motifs (p < 0.0001, Pearson's correlation analysis), and Fig. 4b shows that the same holds for A/T substitutions (p < 0.0001). These data show that reverse complementary motifs were targeted equally well, indicating that the SHM machinery targeted individual motifs similarly on the two strands.
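As a rough illustration of the correlation analysis, the Pearson correlation coefficient between paired substitution rates can be computed as in the Python sketch below. The input values are invented placeholders, not data from Table II, and the sketch does not reproduce the significance test.

```python
from math import sqrt


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient for two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical rates (%) for C in a motif and G in its reverse complement.
c_rates = [20.0, 8.0, 3.0, 0.5]
g_rates = [18.0, 7.0, 4.0, 1.0]
print(round(pearson_r(c_rates, g_rates), 3))
```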
Seven areas in the FRs contain 3-6 G residues in a row and showed a particularly low degree of substitution that correlates well with CCC/GGG being a cold spot. An interesting observation, however, was the substitution pattern of the T residue in position 3 of codon 15 immediately adjacent to a run of six G residues. This residue had a very high substitution frequency of 22.5%, the highest for any T residue (marked by an asterisk (*) in Fig. 3d). Ninety-three percent of the substitutions were transversions to G, a substitution type that only accounted for 23% of substitutions in other T residues (p < 0.0001, Fisher's exact test). Only two of 262 (0.8%) sequences with less than three mutations in the VH region had a substitution in this T residue (one to G and one to C). This is significantly lower than in mutated sequences (p < 0.0001), indicating that Taq errors cannot account for the high substitution rate.
In the IGHV3-h-using sequences the substitution rate in the corresponding residue was also remarkably high (14.6%) and again consisted of mostly T to G transversions. Also, the T residue in the last position of codon 7 preceding a run of five G residues showed predominately T to G transversions (75%), but at a somewhat lower substitution rate (5.3%).
Substitution rates when the motif is present more than once
Many of the motifs are present more than once in the IGHV3-23 gene. The highly mutable AGCT motif, for example, is present four times, but many other four-base motifs are present up to six times. As seen in Table III, the substitution rates of C and G in the individual AGCT motifs are clearly different without any obvious relation to the location in the gene, i.e., whether the position is in a FR or a CDR. Nor is there any correlation between the substitution rates of C on the nontranscribed and transcribed strands (G on the nontranscribed strand) within the same motif.
Correlation between the substitution rate and the distance to the nearest 3' AID target
Replication over an AID-generated uracil can only account for C to T transitions and G to A transitions (when occurring on the transcribed strand), and the different processes involved in the repair of a uracil are thought to generate other mutations. These may involve the generation of an abasic site by uracil removal or the removal of a stretch of nucleotides by endonucleases and/or exonucleases followed by gap filling by error-prone polymerases that introduce substitutions in the flanking nucleotides. If this is the case, one would expect to find a correlation between the substitution rate and the distance to the nearest AID target motif. We tested such correlations in both directions on both strands, that is, the distance to the nearest 5' WRC and the distance to the nearest 3' WRC. Naturally, one cannot determine whether a mutation originally occurred on the nontranscribed or the transcribed strand, so correlations were calculated for two extreme scenarios: namely that all substitutions had occurred on the nontranscribed or the transcribed strand, respectively. Substitutions on the transcribed strand were counted as the complementary substitution on the nontranscribed strand and correlations were calculated to G in GYW on the nontranscribed strand. Table IV shows that there is a statistically significant inverse correlation between substitution rates in T and G residues on both strands and the distance to the nearest 3' WRC motif on the same strand. The only exception is T to G transversions on the nontranscribed strand. However, when the substitutions of the T residues in position 3 of codons 15 and 7 (the ones preceding the runs of G residues) are omitted, the inverse correlation between T to G substitutions and the distance to WRC on the nontranscribed strands increases (correlation coefficient of 0.24, p = 0.06).
For substitutions of A there was a trend toward an inverse correlation between the substitution rate and the distances to the nearest 3' AID hot spots that was borderline significant for transversion to C only. There was no correlation between substitution rates and the distance to the nearest 5' WRC for any substitutions, suggesting that the error-inducing repair process following AID deamination only works 5' of the deaminated C.
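The distance measure used in this analysis can be sketched in Python as follows. The sketch assumes that the distance is counted in nucleotides from a residue to the C of the nearest WRC lying 3' of it on the same strand; the exact convention used in the study is not stated here, so this definition, the function names, and the toy sequence are assumptions.

```python
def wrc_cytidine_positions(sequence: str):
    """0-based indices of the C in every WRC (W = A/T, R = A/G) occurrence."""
    return [
        i + 2
        for i in range(len(sequence) - 2)
        if sequence[i] in "AT" and sequence[i + 1] in "AG" and sequence[i + 2] == "C"
    ]


def distance_to_nearest_3prime_wrc(sequence: str):
    """For each position, nucleotides to the nearest downstream WRC cytidine.

    Positions with no WRC cytidine 3' of them get None.
    """
    targets = wrc_cytidine_positions(sequence)
    distances = []
    for i in range(len(sequence)):
        downstream = [t - i for t in targets if t > i]
        distances.append(min(downstream) if downstream else None)
    return distances


# Toy sequence with a single WRC (AGC), whose C sits at index 5.
print(distance_to_nearest_3prime_wrc("TTTAGCTT"))
# [5, 4, 3, 2, 1, None, None, None]
```

For the transcribed strand, the same functions could be applied to the reverse complement of the sequence. Pairing these distances with per-position substitution rates would then allow a correlation of the kind summarized in Table IV.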
We also tested whether substitutions in C or G in a given AGCT target would influence the mutation rate in the neighboring G/C by changing the hot spot motif to a less mutable one. When comparing sequences with such substitutions to those without, we found that the substitution rate in the neighboring G/C was less than half (average 43%). This was true for all four AGCT motifs present in the germline sequence, indicating that substitutions in a given position can indeed influence the mutability of the neighboring nucleotide in subsequent rounds of SHM. We saw, however, no significant differences in the substitution pattern further away than the closest nucleotide (data not shown).
Substitution rates in JH gene and D gene motifs resemble those of the VH region
Because of the size of our material, we had an opportunity to study mutations in individual JH genes and in several D genes. Fig. 5 shows the substitution rates for individual residues in IGHJ6 (average 4.7% per nucleotide) and IGHJ4 (average 3.9%) that are not significantly different (p = 0.19). The highest substitution rates were seen in the 5' end of the JH genes, which falls within the CDR3 and contains most of the motifs found to have a high substitution rate in the VH region (e.g., TACT, GGTA, CTAT, and CTAT; see Table II). Substitution rates in these motifs in the JH genes correspond well to the rates found in the IGHV3-23, indicating that it is the same regulatory mechanism that controls VH and JH mutations. The 3' ends of the genes that encode FR4 have fewer mutations, consistent with a high content of cold spots (e.g., GCC, GGC, GTC, GGC, and GGG).
We noticed that the fraction of unmutated sequences, defined as sequences with less than three mutations in the VH region, varied depending on the JH gene. Twenty-one percent of the sequences using JH4 (30 of 143 sequences) were unmutated, whereas 46.5% of the JH6-carrying sequences were unmutated (223 of 473 sequences). This was statistically significant (p < 0.0001). When looking at the mutated sequences only (more than three mutations in the VH region) we also found that the substitution frequency varied between the two subsets of sequences. JH4-carrying sequences had an average of 18.1 substitutions in the VH region compared with 15.2 for JH6 (p = 0.03). Substitution rates in the different hot and cold spot motifs were comparable in sequences using JH6 and JH4. Also, there was no difference in the ratio between C/G and A/T substitutions in JH6- and JH4-carrying sequences (p = 0.29), suggesting that it was the overall substitution rate that was decreased.
| Discussion |
|---|
|
|
|---|
Using the hitherto largest published set of nonfunctionally rearranged, somatically mutated human IgH sequences, we found that the most mutable motifs for the SHM of C and G residues corresponded well to the previously described WRCY/RGYW four-nucleotide motifs (41, 42, 43). It has been claimed that WRCH/DGYW is an even better predictor for C/G mutability (46); however, this cannot be supported by our data (targeted nucleotides are set in boldface type). TACA and TGCA, for example, have mutation rates as low as 1.3 and 3.3%, respectively, which is lower than average. The discrepancies can be due to the differences in sample sizes and methods.
The WRCY motifs include the reported WRC deamination preference of AID (19, 20, 44). This is in line with a previous report based on 25 nonproductive human
IgL rearrangements and 17 IgH rearrangements (47). Similarly, the reported deamination cold spots (SYC and SSC) (19, 20, 44) were found to be cold spots for SHMs in C residues on both strands.
It is interesting to note that many of the highly mutable four-nucleotide motifs include a hot spot for C deamination on both strands, e.g., AGCT, as this provides a simple explanation for the double-strand breaks shown to appear during the course of SHM (48, 49). Others, however, have not been able to find a clear correlation between SHM and double-strand breaks in the BL2 cell line (50).
Replication over an AID-generated uracil can only account for C/G transitions and, thus, the preferences of other enzymes involved in the mutation process may influence the mutability of the nucleotides in and around a given motif. UNG or a complex of MSH2 and MSH6 are proposed to be able to remove the created uracil, and the sequence specificities of these enzymes may therefore influence the resolution of the U:G mismatch and hence the mutability of different motifs. Bovine and E. coli UNG have been shown to have high activity in ATU (51, 52), which corresponds well with the finding of ATC being a hot spot. However, there are also some discrepancies because AGU, corresponding to the AGC mutational hot spot, displays intermediate to low uracil removal efficiency (51, 52). It is possible that human UNG has a different nucleotide preference or that MSH2/MSH6 provides the necessary backup. Substitutions in C and G residues during phase II could also influence the substitution rates. In support of this, it has been shown in mice that the inactivation of pol
reduces the number of C/G substitutions, particularly in hot spot motifs (36). Another possible phase II C/G mutator is Rev1. Rev1-deficient mice have been shown to have fewer C to G and G to C mutations while the relative frequencies of C to A, T to C, and A to T substitutions were increased (53), suggesting that at least in mice Rev1 is involved in the generation of several types of phase II substitutions.
A and T substitutions
A and T substitutions occur only during phase II. Several enzymes have been shown to be involved, among them some error-prone DNA polymerases likely to be involved in DNA repair following the removal of the AID-generated uracil. One such error-prone polymerase is the translesion pol η. Pol η-deficient mice and patients with variant xeroderma pigmentosum who have a mutation in the gene encoding pol η display a reduced level of A/T substitution despite a normal overall mutation rate (29, 31, 54). Mouse pol η is expressed in germinal center B cells (29) and has been found to interact with MSH2/6 (31), suggesting a possible way of recruitment. The substitution pattern of mouse and human pol η in vitro shows a preference for mutations in WA/TW motifs (30), which corresponds well with our findings that the most mutable A and T four-nucleotide motifs include the WA and TW motif, respectively.
Pol
and pol
have also been suggested as being involved in phase II mutations (32, 33, 35, 55), and their error preferences may also influence the substitution patterns. Pol
is, for example, known for a preference for creating A to G transitions and for incorporating G and T opposite dUTP and A opposite an abasic site (56).
Strand symmetry indicates that SHM happens on both strands
Substitution rates in complementary A and T residues showed strand symmetry, indicating that both strands are targeted by phase II mutations. Likewise, the correlation between the C substitution frequency in a particular motif and the G substitution frequency in the motif corresponding to the same motif on the transcribed strand strongly suggests that AID can deaminate both strands equally well during the initial phase of SHM. This is in agreement with data from Foster et al. and Boursier et al. who found that SHM could target both strands in the human
locus (42) and
locus (47), respectively. Studies of AID deamination in vitro are, however, contradictory at this point, as some find only deamination of the nontranscribed strand (19, 57) while others have shown that the transcribed strand can also be targeted (58, 59, 60), although in some cases to a lesser extent than the nontranscribed strand. This discrepancy could be due to the different experimental ways of detecting deamination in vitro, and the presence of cofactors in vivo may help the targeting of AID to both strands. One such cofactor, which has very recently been shown to be involved in targeting AID in engineered mice, is MSH6 (61). MSH6 thus seems to be involved in not only phase II but also phase I SHM.
Models to explain correlations between substitutions in T and G residues and the distance to the nearest 3' AID target
Interestingly, we found significant inverse correlations between phase II substitution rates on T and G residues and the distance to the nearest 3' WRC AID hot spot. Only nonsignificant or borderline significant trends were found for A and C substitutions. In contrast, no correlations were found between substitution rates and the distance to the nearest 5' WRC. It could be argued that the inverse correlation to the distance to WRC is an artifact caused by the location of most WRC sequences in the CDRs. However, the fact that not only the distance but also the direction is important shows that this is not the case. These findings suggest that phase II SHM predominately targets T and G residues in the AID-targeted strand 5' of the deaminated C. Alternatively, phase II substitutions could target the corresponding A and/or C on the opposite strand. Because both strands are targeted, both models account for phase II substitutions in all four nucleotides. These two models are discussed further below.
G and T substitutions 5' of the AID target suggest involvement of a 3'-5' nuclease followed by gap filling
The inverse correlation between the substitution rate in G and T residues and the distance to the nearest 3' WRC motif suggests a molecular mechanism involving a 3'-5' exonuclease and/or an endonuclease. Such enzyme(s) could be recruited to the abasic site created at the site of the initial deamination event where it/they could remove a stretch of DNA 5' of the abasic site. DNA removal in turn could be followed by error-prone gap filling.
Several human 3'-5' exonucleases are known. These include polymerases
and
, WRN, APE1, and MRE11 (62). MRE11 forms the MRN complex along with RAD50 and NBS1. Ectopic expression of NBS1 increases SHM in a hypermutating Ramos cell line (63), suggesting that MRN is involved. This is further supported by the finding that MRE11 binds to a rearranged VH region only in mutating cells and that recombinant MRE11/RAD50 can cleave abasic sites in ssDNA (64). The ability to cleave DNA is separable from the 3'-5' activity (64) and it is possible that both functions are important for SHM. APE1 is also capable of DNA cleavage at abasic sites; however, APE1 does not bind to VH region (64), speaking against an involvement in SHM.
EXO1-deficient mice have normal mutation frequencies but their mutations are C/G biased and hot spot focused (34), suggesting a possible involvement of EXO1 in phase II SHM. EXO1 binds to the VH region, but not the C region in hypermutating BL-2 cells (34). However, EXO1 is a 5'-3' exonuclease and therefore does not fit into this model unless it also has 3'-5' exonuclease or endonuclease activity as previously suggested (65). Alternatively, its involvement in SHM may not be as a nuclease.
As mentioned earlier, several error-prone DNA polymerases have been suggested as being involved in phase II mutations including polymerases
,
,
, and
(29, 31, 54, 32, 33, 55). These could be involved in gap filling following DNA removal. However, to account for the finding of a correlation only between T and G substitution rates and distances to WRC, the involved polymerase(s) would have to make mistakes mainly opposite A and C residues.
An alternative explanation that easily accounts for the strong correlation between substitutions in T residues but a less strong correlation for A substitutions is that a large fraction of the T/A substitutions could be caused by the occasional incorporation of dUTP (instead of dTTP) opposite A during phase II repair. The occasional incorporation of dUTP as a means of generating SHMs has been suggested by Neuberger et al. (22). According to their model, the incorporated dUTP would subsequently be excised and substitutions would be generated during replication over the abasic site.
Phase II substitution on the opposite strand
As mentioned above, the finding of inverse correlation between T and G substitution rates and the distance to the nearest 3' WRC can also be explained if the main targets for phase II mutations are C and A residues on the strand opposite the AID target. This could, for example, be the case if the generated uracil is either removed to generate an abasic site or is left untouched until replication. During replication, the abasic site/uracil could cause the DNA polymerase to stall and recruit an error-prone translesion polymerase. In fact, all of the error-prone polymerases implicated in SHM (
,
,
, and
) are known translesion polymerases. The translesion polymerase would be predicted to be engaged opposite the targeted C and introduce errors while synthesizing a short DNA segment. If errors are preferentially introduced opposite T and G residues, this model would explain our findings. This is, for example, the case for polymerase
, which has been shown to preferentially incorporate G opposite T, leading to a T to C transition on the AID-targeted strand 5' of the targeted C (56).
Influence by substitutions on the neighboring nucleotides
We found that substitutions in C and G residues in hot spot motifs decreased the substitution frequency of the neighboring G/C residue. This can be explained if the SHM machinery does not normally make substitutions in neighboring nucleotides during the same cell cycle, because substitutions in the hot spot motifs most often lead to a less mutable motif such as AGCA (mutated in 23.8% of the motifs) becoming AACA (8% mutation), for example. It is also possible that substitutions do occur on both strands during the same round of SHM but that the two strands subsequently end up in sister cells before the substitutions are fixed.
Despite the previously shown inverse correlation between the mutation rate of a given nucleotide and the distance to the nearest 3' WRC, we find that mutations in a given AGCT position (included in the WRC/GYW motif) do not influence the overall mutation distribution in the sequence. This suggests that the C/G nucleotide that undergoes the initial deamination step (index nucleotide) leading to phase II SHM is sometimes repaired, while on other occasions the mutation is fixed during phase II. If the index nucleotide mutation is always fixed during phase II we would expect to find more mutations 5' of the index nucleotide in the sequences mutated in the index and, on the contrary, if the index is always repaired we would expect to find fewer mutations.
Substitution pattern of T residue preceding a run of G residues
The substitution pattern of the T residue in the last position of codon 15 is remarkable because the frequency is very high, it has almost exclusively T to G transversions, and it is outside any known SHM hot spot motifs. The adjacent run of six G residues contains almost no substitutions but a high frequency of 1-3 nucleotide insertions likely to be caused by Taq polymerase slippage. However, comparison of the numbers of substitutions in the T residue in the mutated (62 in 383 sequences) and unmutated (2 in 263 sequences), nonproductive sequences clearly shows that Taq errors are not the cause of the unusual substitution pattern. Substitutions in T or A residues preceding the other runs of at least four G residues are also predominantly to G. The JH genes also have a run of four G residues, but this region is found to contain very few mutations in all sequences.
Runs of three or four G residues in G-rich motifs are known to be able to form G quartets when single stranded (66). G quartets are, for example, formed in the G-rich nontranscribed strand of the switch region during CSR, and AID is found to bind to them (67). It can be hypothesized that the runs of G residues in the variable region also fold into G quartets during the transcription-dependent, single-stranded phase of SHM. Although AID may bind to the quartets, the activity of the enzyme may be inhibited, which would account for the low substitution rate. How this can lead to a T to G transversion in the flanking base is unknown. One possibility is that the quartets attract other proteins. GQN1 is a human endonuclease highly expressed in B cells and has been shown to cleave DNA 2-5 nucleotides upstream of G quartets (66). This endonuclease may possibly be involved in the cleavage of the DNA leading to SHM, although the observed substitution pattern is not readily explained.
D and JH gene substitutions
We found that substitution rates in motifs in the VH region were comparable to the substitution rates found in the same motifs in the D and JH genes. It is noteworthy that the two JH genes analyzed contain very few C/G mutational hot spot motifs in the region encoding FR4, while the regions encoding CDR3 have several hot spots, for example four overlapping TACT motifs in JH6 that create hot spots for T, A, and C mutations. Hot spots are also common in the D genes contributing to the CDR3. In contrast, the FR4 regions encoded by the JH genes contain many cold spot motifs and showed very low substitution rates. This suggests that there has been an evolutionary selection against mutational hot spots in the FR4 region, which would be in line with earlier studies suggesting that the codon usage in CDR regions of IgV has been optimized for SHM (9, 10, 68).
The substitution rate depends on the JH gene
Although substitution patterns in the JH genes are the same as in the VH region, we find that the mutation status of a rearrangement partly depends on the JH gene. JH6-carrying sequences are less likely to be mutated than JH4-carrying sequences and, when mutated, contain fewer substitutions on average. The mutation pattern does not seem to change, but the overall frequency is decreased. The simplest explanation would be that cells with a JH6-containing rearrangement constitute a special cell subset carrying fewer mutations. However, because the difference was found in nonproductive rearrangements, this is very unlikely: it would require the rearrangements on both alleles to use the same JH gene, which is not thought to be the case.
Rearrangements using JH6 tend to have longer CDR3 loops than rearrangements with, for example, JH4 (69) (L. Ohm-Laursen et al., manuscript in preparation), and CDR3s are longer among unmutated sequences than among mutated sequences (69, 70, 71 and S. Petersen and T. Barington, manuscript in preparation). This corresponds well with the finding of more JH6 sequences in the unmutated subset. However, even when the mutation analysis is restricted to rearrangements within a narrow range of CDR3 lengths (44–52 bp), we still find that the JH6-carrying sequences are less likely to be mutated and, when mutated, are significantly less mutated than JH4-carrying sequences (data not shown). Thus, the length of the CDR3 does not seem to account for the changed mutation frequency.
Therefore, we suggest that rearrangements using JH6 have special properties influencing the mutation rate. Perhaps a binding site for a cofactor is located within the intronic region upstream of JH6 and is therefore deleted when JH6 is used. Also, it is possible that JH6 is simply too close to the regulatory elements in the 3' intronic enhancer (Eµ) (72, 73) for optimal effect.
Another possible regulator could be the E box motif 5'-CAGGTG-3', which is known to bind the regulatory E47 protein (74). This motif is found in the 3' end of JH1, JH2, JH4, and JH5 but not in JH3 and JH6, where the last nucleotide of the motif is replaced by an A. When inserted into the locus, the 5'-CAGGTG-3' motif has been shown to enhance SHM in transgenic mice without changing the mutation pattern (74). Mutation of the E box motif to 5'-AAGGTG-3' decreases this effect. Furthermore, inactivation of the E2A gene in the DT40 chicken B cell line reduces SHM. Mutations can be restored by the expression of either of the E2A splice variants, E47 or E12, showing the importance of these proteins for the level of SHM (75). Because the 5'-CAGGTA-3' motifs found in JH3 and JH6 deviate from the consensus 5'-CANNTG-3' E box motif, we hypothesize that this one-nucleotide difference may be involved in reducing the mutational load of JH6-carrying sequences compared with JH4-carrying sequences. Regardless of the cause, the finding that the mutation frequency depends on the type of JH gene has implications for affinity maturation and fine tuning of the repertoire, as JH6 and JH4 are the two most commonly used JH genes in the repertoire (76).
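The deviation from the E box consensus can be checked mechanically. The following minimal sketch tests the three six-nucleotide motifs named in the text against the 5'-CANNTG-3' consensus, where N is any nucleotide; the helper name `matches_e_box` is hypothetical, and the check says nothing about actual E47 binding.

```python
import re

# Consensus E box as given in the text: 5'-CANNTG-3' (N = any nucleotide).
E_BOX = re.compile(r"CA[ACGT]{2}TG")


def matches_e_box(motif):
    """Return True if the six-nucleotide motif fits the CANNTG consensus."""
    return E_BOX.fullmatch(motif.upper()) is not None


# CAGGTG: JH1, JH2, JH4, JH5; CAGGTA: JH3, JH6; AAGGTG: mutated E box (74).
for motif in ("CAGGTG", "CAGGTA", "AAGGTG"):
    print(motif, matches_e_box(motif))
```

Only CAGGTG satisfies the consensus; CAGGTA fails at the last position and AAGGTG at the first.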
Concluding remarks
The Ig repertoire is known to be shaped by the SHM of the variable regions during an immune response. In this study we report that the mutation machinery operates equally well on both strands. The substitution frequency of a given residue is dependent on the motif in which it resides and the distance to the nearest 3' AID deamination hot spot, suggesting that phase II substitutions occur 5' of the site of the initial deamination. Alternatively, phase II substitutions occur on the opposite strand 3' of the G residue facing the AID-targeted C residue. Substitutions in the neighboring nucleotide also influence the substitution frequency of C and G in AGCT double hot spots. Motifs are the same in VH, D, and JH genes; however, the JH gene of the rearrangement influences the overall mutation frequency, because JH6-using rearrangements are found to contain fewer mutations than JH4-using rearrangements.
The sequences in this study use the IGHV3-23*01 VH gene, and it can therefore be argued that the findings may be special to this VH gene. However, when possible we have confirmed the results by analysis of a set of sequences using the IGHV3-h pseudogene. Also, previous work from many groups suggests that the mutation process is similar irrespective of which VH genes have been studied. We therefore suggest that the results presented in this paper are also applicable to other human VH genes.
| Acknowledgment |
|---|
| Disclosures |
|---|
| Footnotes |
|---|
1 This study was supported by Danish Medical Research Council Grant 22-01-0156.
2 Current address: University of Oxford, The Peter Medawar Building for Pathogen Research, South Parks Road, Oxford, U.K.
3 Address correspondence and reprint requests to Prof. Torben Barington, Department of Clinical Immunology, Odense University Hospital, 5000 Odense C, Denmark. E-mail address: barington{at}dadlnet.dk
4 Abbreviations used in this paper: SHM, somatic hypermutation; AID, activation-induced cytidine deaminase; CSR, class switch recombination; EXO1, exonuclease I; FR, framework region; MSH, MutS homolog; pol, DNA polymerase; UNG, uracil DNA glycosylase.
Received for publication July 13, 2006. Accepted for publication January 8, 2007.
| References |
|---|
gene: critical role for the intron enhancer/matrix attachment region. Cell 77: 239-248.
chains is independent of local and neighbouring sequences and related to the distance from the initiation of transcription. Eur. J. Immunol. 27: 3115-3120.
. J. Immunol. 167: 327-335.
plays a major role in Ig and bcl-6 somatic hypermutation. Immunity 14: 643-653.
J
rearrangements: targeting of RGYW motifs on both DNA strands and preferential selection of mutated codons within RGYW motifs. Eur. J. Immunol. 29: 4011-4021.
light chain transgene. Proc. Natl. Acad. Sci. USA 99: 9954-9959.
) gene segments suggests that both DNA strands are targets for deamination by activation-induced cytidine deaminase. Mol. Immunol. 40: 1273-1278.