Biomathematics Program, Department of Statistics, North Carolina State University, Raleigh, NC 27695.
where $N$ is the total number of counts, $n_{i\cdot} = \sum_j n_{ij}$ and $n_{\cdot j} = \sum_i n_{ij}$. This rescaling of the residuals makes their marginal distributions under the null hypothesis approximately standard normal.
| Abstract |
|---|
| Introduction |
|---|
In addition, this sequence specificity has itself had a striking influence on the evolution of Ig V genes allowing an enhancement of their plasticity under affinity maturation. Codon bias differs between framework and complementarity-determining regions, with the result that the framework nucleotides are less mutable than those in the complementarity-determining regions (5, 6, 7). Direct counting of mutations accumulated in nonproductively rearranged Ig genes confirms that this difference hypothesized under relatively simple empirical models for mutability is indeed realized in a significant and observable way (8).
There is another aspect of the mutational mechanism that has the
potential for providing a distinctive signature and thereby information
about the underlying mechanisms of somatic hypermutation: the mutation
spectrum. By this we mean the frequencies at which specific bases occur
at a particular position, given that the original nucleotide at that
position has mutated (again, under selection-free conditions). One
aspect of this issue has been addressed already: the transition to
transversion ratio has been determined to be approximately 2:1
(3, 9). We have analyzed a much larger dataset than has
previously been considered and therefore can provide a detailed
characterization of the mutation spectrum for all nearest-neighbor
interactions. This dataset comprises four independent sets of
nonfunctional Ig V sequences and Ig introns that have undergone somatic
hypermutation free of selective pressure. Our analysis is based
primarily on straightforward χ² tests of multidimensional
contingency tables. We find that the spectrum depends not only on the
identity of the mutating base, but also on the identity of the
immediate 5' and 3' neighbors.
An effect of this sort has been documented for meiotic (germline) mutations by comparing a large dataset of human genes and their related pseudogenes (10). We find that the mutation spectrum and its context specificity for somatic hypermutation is very different from that observed in meiotic mutation. The context-dependent effects under somatic hypermutation can be very crudely summarized by the observation that the effect of flanking nucleotides is frequently to promote homogeneity of the local sequence. For example, a nucleotide within a homodimer is more likely to experience a transition than the same nucleotide in another context, while a nucleotide neighbored by its complementary base is more likely to experience a transversion to its complement.
Somatic hypermutation shows a marked strand bias; adenines mutate more frequently than do thymines, for example. Recently, this simple observation has been complicated by the results of analyses finding correlation between the mutability of trinucleotide motifs (11) or quartets of the RGYW motif (12) and the mutability of the corresponding inverted complement motifs. This apparent symmetry is largely confined to symmetry between G and C nucleotides, so the tentative picture now drawn is that of a compound mechanism that mutates A and T in a strand-biased manner, but that mutates G and C without notable bias (13, 14, 15).
We find that the mutation spectrum exhibits very little strand bias at all. In particular, the symmetry between A and T is quite marked. We speculate that this difference between the targeting of somatic mutation and the resulting mutational spectrum is due to the action of multiple distinct mechanisms responsible for the biochemistry of somatic hypermutation, even beyond the multiple mechanisms hitherto postulated.
| Materials and Methods |
|---|
The pooled set of somatically mutated sequences contains a total of 1721 mutations: 610 A, 452 G, 336 C, and 323 T. The sequences included in this analysis are as follows: murine J-C intron sequences containing 510 mutations (4), nonfunctionally rearranged human heavy chains containing 349 mutations (16), nonfunctionally rearranged human heavy chains and κ and λ light chains with 67, 319, and 84 mutations, respectively (8), and murine 3' flanking region sequences (3' VJ 1, 154 mutations (17, 18); 3' JH1, 162 mutations (19)) and J-C intron sequences (77 mutations (20, 21)). We performed comparisons with germline mutations using pseudogene data from Hess et al. (10).
Because the sequences come from a variety of sources and include both
murine and human gene segments, we tested whether the mutation
spectra from the different datasets have a similar distribution using
the heterogeneity χ² test for pooling contingency tables
(22) (see below for statistical methods). The tests found
no differences at the 0.05 level for complete 4 x 4 x 4 x 3
tables for 5' and 3' adjacent nucleotides (p =
0.167 and p = 0.158, respectively). Thus, the
pooling of data from all datasets is very unlikely to cause errors in
the statistical inferences of interest to us here. This procedure does
not provide an exhaustive comparison of the characteristics of each
dataset, however, and should not be taken as positive evidence that
they are identical in all respects. Further data might very well reveal
differences between murine and human sequences or between exon and
intron sequences, but the lack of differences under the heterogeneity
χ² test does provide confidence that the effects
discussed in this paper are not artifacts of the pooling process.
Statistical methods
To test independence of the mutation spectrum from the identity of the mutating nucleotide's 5' or 3' neighbor, we formed two 4 x 3 x 4 contingency tables with 24 degrees of freedom each, in which the rows categorize the identity of the 5' or 3' flanking base, respectively, the columns categorize the identity of the destination base (the base that a mutation is to), and the tiers categorize the mutating, or original, base. We used a χ² test of the null hypothesis of independence of neighboring base and destination base, conditional on mutating base (23). That is, for each mutating base Y, we tested whether the identity of the destination base is random with respect to the identity of the 5' or 3' neighbor base. When testing for conditional independence, the total χ² is the sum of the partial χ² values, which, in our case, represent the effect for each of the four mutating bases. Under the null hypothesis, each of these partial χ² values, as well as the total, is distributed as χ² with the appropriate number of degrees of freedom (here, (4 - 1)(3 - 1) = 6 for each partial and 4 x 6 = 24 for the total).
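The conditional-independence test described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy counts are hypothetical, and it assumes the ordinary Pearson statistic is computed separately for each 4 x 3 tier (one tier per mutating base) and then summed. It also assumes every row and column total is positive.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table.

    `table` is a list of rows (neighbor base) by columns (destination base).
    Assumes all row and column totals are positive.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df


def conditional_independence_chi2(tiers):
    """Sum the partial statistics over tiers (one tier per mutating base)."""
    partials = {base: pearson_chi2(table) for base, table in tiers.items()}
    total = sum(stat for stat, _ in partials.values())
    total_df = sum(df for _, df in partials.values())
    return partials, total, total_df


# Toy counts, invented for illustration only (not data from the paper).
toy_tiers = {
    "A": [[2, 4, 6], [1, 2, 3], [3, 6, 9], [2, 4, 6]],   # exactly independent
    "C": [[9, 2, 1], [3, 5, 4], [2, 6, 4], [4, 3, 5]],
    "G": [[5, 5, 5], [8, 2, 2], [3, 7, 2], [4, 4, 4]],
    "T": [[6, 1, 5], [2, 7, 3], [4, 4, 4], [5, 2, 5]],
}
partials, total, total_df = conditional_independence_chi2(toy_tiers)
print(total_df, round(total, 3))
```

With four tiers of 4 x 3 tables, the degrees of freedom add up to 24, matching the count stated in the text. A p value would then be read from the χ² distribution with the corresponding degrees of freedom.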
The available pseudogene data (10) take the form of substitution frequencies for the center base pair in each of the 32 base-pair triplets. The authors of that study "collapsed" the data by summing the frequencies for each triplet with that for its complement, thereby obviating the need to discriminate between the two DNA strands. For example, the mutation frequency of base C to base T in the context ACG was computed as n(ACG → ATG) + n(CGT → CAT), where n(XYZ → XY'Z) is the total number of mutations of Y to Y' when flanked by X and Z. To make comparison with these pseudogene data possible, the somatic hypermutation data were converted to this format as well. For both the pseudogene dataset and this collapsed somatic hypermutation dataset, 4 x 3 x 2 contingency tables were set up as described above, but in this case, there are just two
mutating bases, C and T. To test for differences between the two data sets, we constructed two 4-way tables (2 x 3 x 2 x 4) in which the classifications are 1) data source (somatic hypermutation or pseudogene), 2) destination base, 3) mutating base, and 4) neighboring base (5' and 3' neighbors, respectively). We tested the null hypothesis of independence of data source and destination base conditional on both mutating base and neighboring base, that is, for each of the eight dimers XY (YZ), we tested whether the identity of the destination base is random with respect to the source of the data.
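The collapsing of strand-specific counts onto the two mutating bases C and T can be sketched as follows. This is an illustrative sketch only: the data layout (a dictionary keyed by triplet and destination base) and the function names are assumptions, and the counts are invented.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA string, e.g. CGT -> ACG."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def collapse_counts(counts):
    """Merge each triplet with its reverse complement.

    `counts` maps (triplet, destination_base) to a mutation count, where the
    center base of the triplet is the mutating base. Entries whose mutating
    base is A or G are re-expressed on the opposite strand, so that every
    collapsed entry has C or T as the mutating base.
    """
    collapsed = {}
    for (triplet, dest), n in counts.items():
        if triplet[1] in "AG":
            triplet, dest = reverse_complement(triplet), COMPLEMENT[dest]
        key = (triplet, dest)
        collapsed[key] = collapsed.get(key, 0) + n
    return collapsed


# Invented counts for illustration: C -> T in ACG and G -> A in CGT
# describe the same base-pair change seen from opposite strands.
toy_counts = {("ACG", "T"): 3, ("CGT", "A"): 2, ("ATA", "C"): 4}
print(collapse_counts(toy_counts))
```

Keying on the triplet as read on the strand carrying C or T keeps the 5' and 3' neighbor labels consistent after collapsing.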
For graphical representation of the contingency tables, we computed the adjusted residuals3 for each cell (23). These have the attractive property that they are distributed approximately as standard normal random variables under the null hypothesis. Thus, in studying the figures, values of z larger in absolute value than 1.96 are significant at the 0.05 level, those larger than 2.58 are significant at the 0.01 level, and so on.
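A minimal sketch of the adjusted-residual calculation follows. It assumes the standard definition for a two-way table, in which each raw residual is divided by an estimate of its standard error built from the row and column totals; the function name and the toy table are hypothetical.

```python
import math


def adjusted_residuals(table):
    """Adjusted residuals z_ij for a two-way contingency table.

    Assumed definition (standard for two-way tables):
        z_ij = (n_ij - e_ij) / sqrt(e_ij * (1 - n_i./N) * (1 - n_.j/N))
    with e_ij = n_i. * n_.j / N the expected count under independence.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    residuals = []
    for i, row in enumerate(table):
        z_row = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            variance = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            z_row.append((observed - expected) / math.sqrt(variance))
        residuals.append(z_row)
    return residuals


# Invented 2 x 2 table for illustration.
z = adjusted_residuals([[10, 20], [20, 10]])
flagged = [[abs(value) > 1.96 for value in row] for row in z]
print(z, flagged)
```

Cells whose absolute residual exceeds 1.96 are the ones that would be read as significant at the 0.05 level in the figures.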
We computed and tested Spearman's rank correlation statistic using S-PLUS (MathSoft, Seattle, WA) to evaluate the degree to which the sequence specificity of the mutation spectrum is symmetric and to check for similarities between the mutation spectra of the somatically mutated Ig sequences and the germline mutated pseudogene sequences.
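For readers without access to the original software, the rank correlation itself can be sketched in a few lines of Python. This is a stand-in, not the routine used in the study: it computes the Pearson correlation of average ranks (ties share the mean rank) and omits the significance test. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```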
Parameterization of the mutation probabilities
For convenience, we have adopted a notation with one symbol for each of the following: the identity of the mutating base ("self"); the base's complement ("self-complement"); the base's transition base ("transition"); and the complement of the base's transition base ("transition-complement"). For convenient reference, we have provided a translation guide in Table I.
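We denote by pS the proportion of mutations of the mutating base, in a dimer with 5' neighbor X or 3' neighbor Z, that are transitions, and by pC the proportion of its transversions that are mutations to the complement of the mutating base.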
| Results |
|---|
The mutation spectrum at a given base is clearly not independent of the microsequence containing that base. The χ² values for the test of the effect of both 5' and 3' nucleotides are significant at the traditional 0.05 level (Table II). The p value for the conditional independence of the 5' base and the destination base is smaller than 0.001. In fact, the effect of the 5' base seems to be larger than that of the 3' base for all four mutating bases. For the effect of the 5' nucleotides, all of the partial χ² values are much larger than expected under independence, as are the partials testing the effect of the 3' nucleotide on C and G. The effect of the 3' base on the mutation spectrum of A and T appears to be weak, consistent with the relative weakness of the 3' nucleotide in general.
) while a reduction in the probability of
transversions is insignificant. This enhancement of
the transition probability for homodimers is consistently observed when
the four datasets are analyzed separately: 27 of the 32 relevant
adjusted residuals are positive (9 of them significantly so), 2 of them
are approximately 0, and 3 of them are negative (none of them
significantly so; data not shown).
Scaled transition frequencies, pS, computed by dividing the transition frequency for the mutating base in a dimer by its background transition frequency, are plotted in the contour plots in Fig. 3, upper panel. The red row at the bottom of each figure reveals that both nucleotides in homodimers have an elevated transition frequency pS. For example, the 3' C in the homodinucleotide CC is more likely to mutate to T than would be expected based on the transition frequency of base C when considered out of context.
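The scaling of pS can be sketched as below. The sketch assumes that the background transition frequency of a base is obtained by pooling its mutations over all neighbors; the text does not spell this out, so treat it as an assumption. The function names and counts are hypothetical.

```python
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def transition_fraction(dest_counts, base):
    """Fraction of mutations of `base` that are transitions."""
    total = sum(dest_counts.values())
    return dest_counts.get(TRANSITION[base], 0) / total


def scaled_transition_frequency(counts_by_neighbor, base):
    """Scaled pS for one mutating base.

    `counts_by_neighbor` maps a neighbor base to {destination: count}.
    Each per-neighbor transition fraction is divided by the background
    fraction, here assumed to be the fraction pooled over all neighbors.
    """
    pooled = {}
    for dest_counts in counts_by_neighbor.values():
        for dest, n in dest_counts.items():
            pooled[dest] = pooled.get(dest, 0) + n
    background = transition_fraction(pooled, base)
    return {
        neighbor: transition_fraction(dest_counts, base) / background
        for neighbor, dest_counts in counts_by_neighbor.items()
    }


# Invented counts for mutating base C with two possible 5' neighbors.
toy = {
    "C": {"T": 8, "A": 1, "G": 1},
    "A": {"T": 4, "A": 3, "G": 3},
}
print(scaled_transition_frequency(toy, "C"))
```

A scaled value above 1 indicates that transitions are more frequent in that context than for the base overall, which is the pattern reported for homodimers.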
The lower panel of Fig. 3 shows the scaled proportion of transversions that are to the complement of the mutating base, pC, computed as described above for the scaled pS. The lower left panel of this figure suggests that when T or A is preceded by its complement, the proportion of transversions to complement is enhanced. In fact, a preceding T may inhibit pC for all bases except its own complement A, while it enhances pC when preceding A. A similar pattern appears to hold for A. A 5' A may enhance pC for all bases except itself, while inhibiting pC when preceding itself.
The row in the lower right panel of Fig. 3 corresponding to the mutating base being followed by its complement suggests that having its complement as its 3' neighbor enhances a base's transversion-to-complement frequency. Bases T, A, and G mutate more often to their complement when they are followed by their complement. The exception is the transversion CG → GG. Of the 336 mutations of C, only 8 occur for C in the dimer CG; only two of these are transversions, and neither is a mutation to G. The expected number of CG → GG transversions is 1.8.
A T 3' of C, T, or G inhibits the transversion-to-complement frequency, just as a 5' T does; a 3' A appears not to have the same effect as a 5' A.
Strand symmetry
One of the characteristic features of somatic hypermutation is its apparent strand asymmetry. For example, mutations are found at adenines much more frequently than at thymines (3, 9). This has been taken as evidence for strand bias of the mutator, that mutations are introduced preferentially into one strand. Recent analyses suggest a more complex picture than this; the sequence specificity of the mutator indicates some degree of symmetry, especially between G and C mutations (11, 12).
Our analyses indicate a high degree of strand symmetry in the effect that neighboring nucleotides have on mutation spectra. Inspection of Fig. 3 reveals a great deal of similarity between the effects of X on Y in the dinucleotide XY and those of the complements $\bar{X}$ on $\bar{Y}$ in the complementary dinucleotide $\bar{Y}\bar{X}$; e.g., the effect of A preceding C is quite similar to the effect of T following G. The figures have been constructed in such a way that corresponding plots will be identical if symmetry under complementation is exact.
Indeed, computing Spearman's rank correlation for scaled pS, in which the scaled pS for XY (Y mutating) is paired with the scaled pS for $\bar{Y}\bar{X}$ ($\bar{Y}$ mutating), we find a moderately high correlation, r = 0.51, which, in spite of the small number of points, is significantly different from zero (p = 0.047). This means that when the transition probability pS for the dimer XY (Y mutating) is elevated (or inhibited), so is that for the complementary dimer $\bar{Y}\bar{X}$ ($\bar{Y}$ mutating), suggesting that Y in XY is replaced by Y' on both strands of the DNA with similar probabilities. The complementary pairs for pC, formed in just the same way, are even more strongly correlated: r = 0.77, p = 0.003.
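The pairing of each dimer with its complementary dimer can be sketched as follows. The dictionary layout and function names are hypothetical: one table holds values for the 3' base of a dimer mutating (5' neighbor effect) and the other holds values for the 5' base mutating (3' neighbor effect). The resulting pairs would then be passed to a rank-correlation routine.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def complementary_dimer(dimer):
    """Reverse complement of a dinucleotide, e.g. AC -> GT."""
    return COMPLEMENT[dimer[1]] + COMPLEMENT[dimer[0]]


def paired_values(five_prime_effect, three_prime_effect):
    """Pair values across strands.

    five_prime_effect[XY]  : value for Y mutating with 5' neighbor X.
    three_prime_effect[YZ] : value for Y mutating with 3' neighbor Z.
    The value for XY is paired with the value for its reverse complement,
    in which the complement of Y mutates and the complement of X follows it.
    """
    xs, ys = [], []
    for dimer in sorted(five_prime_effect):
        xs.append(five_prime_effect[dimer])
        ys.append(three_prime_effect[complementary_dimer(dimer)])
    return xs, ys


# Placeholder values (all 1.0) just to show the shape of the pairing.
all_dimers = [x + y for x in BASES for y in BASES]
five = {d: 1.0 for d in all_dimers}
three = {d: 1.0 for d in all_dimers}
xs, ys = paired_values(five, three)
print(len(xs), complementary_dimer("AC"))
```

The 16 dimers give 16 paired points, which is the small sample size the text refers to.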
Collapsed somatic hypermutation data
To compare the mutation spectrum of the somatic hypermutation data
with that of the pseudogene data, we first analyzed the somatic
hypermutation data in the format of the pseudogene data as described
above to ensure that the statistical patterns were not lost or changed
by combining reverse complement motifs. The 4 x 3 contingency
tables were analyzed as described in Materials and Methods,
and, as can be seen from Table III, the
mutation spectrum remains nonrandom with respect to the target context
in the collapsed dataset. The total effect χ² values for
5' and 3' neighbors are significantly large, as are the partial
χ² values.
… homodimers, while the probability of transversions is reduced (data not shown). The enhancement of pS for homodimers is also evident in Fig. 4 … and T … dimers have an enhanced … transversion frequency (Fig. 4).
The mutation spectrum of the pseudogene data is dependent on the target context; the patterns of dependency, however, are not the same as those for the somatic hypermutation data. To test for independence of the mutation spectrum and the target context in the pseudogene data, 4 x 3 contingency tables were analyzed as described in Materials and Methods. The χ² values are shown in Table IV; all of the χ² values are highly significant, indicating that the mutation spectrum does depend on the target context. To test for differences between the mutation spectra of the two datasets, 2 x 3 contingency tables were set up for each of the 8 dimer motifs, and their partial χ² values were used to compute the relevant total χ² values (described in Materials and Methods). Both of the total effect χ² values are significant (see Table V), indicating that the mutation spectra of the two datasets do differ.
| Discussion |
|---|
The most consistent dependence is in the increased tendency of homodimers to mutate via transitions and the attendant decrease of homodimer mutations to the complementary base. This is true regardless of the identity of the mutating base and whether it appears in the 5' or 3' position within the dimer. For example, both As in the homodimer AA have an enhanced probability of mutating to G and a reduced probability of mutating to T when compared to As in any other context.
Another feature is the tendency of A/T mixed dinucleotides
to homogenize. That is, when A flanks T (or
T flanks A), the mutating base tends to become
that neighbor; e.g., AT → AA is enhanced. This
effect is not seen for G/C dinucleotides.
There is a striking symmetry under complementation, especially for A or T mutating. This is in notable contrast to what has been suggested for the targeting of mutation, in which A and T seem to be more asymmetric than G and C.
We compared the effects of neighboring bases under somatic hypermutation to those observed in pseudogenes and found not only that the patterns differ, but that there is, in fact, no correlation between them. Thus, there is no evidence here to support the hypothesis that the mechanism of hypermutation is essentially related to normal DNA repair pathways. Our own analysis, however, of the targeting of somatic mutation shows a strong correlation between the microsequence specificity of the mutation targeting under somatic mutation and that under meiotic mutation.4
It is our hope that the patterns we have begun to elucidate will help identify the elusive mechanism(s) of somatic hypermutation. While we are not prepared to propose specific hypotheses in this regard, we would like to offer one general observation. The behavior of the mutation spectrum under complementation symmetry is rather different from the behavior of the targeting of mutations. Whereas the spectrum is strongly symmetric, especially between A and T nucleotides, there is a strong disparity between the targeting of mutation at A and T nucleotides. We suggest that this fact makes quite plausible the notion that the mechanism is complex, involving at least two stages, the introduction of mutations followed by their resolution. For example, the first stage might involve the insertion of mispaired bases, in a way that depends on the local microsequence. A second stage might consist of the recognition and resolution of the noncanonical base pairs, again in a local microsequence-dependent manner, but one that is wholly different from that of the first stage. This scheme is consistent with our analysis4 of hypermutation targeting, which further suggests that the first stage is closely related to the "targeting" of mutation under meiotic processes. Within the first stage, there may be two distinct mechanisms as suggested by others: one with strong strand bias, the other acting symmetrically (11); these two mechanisms may affect A/T and G/C nucleotides differently (13, 14, 15). We present evidence for an additional stage during which the distribution of resultant nucleotides is determined in a sequence-specific and strand-independent manner.
| Acknowledgments |
|---|
| Footnotes |
|---|
2 Address correspondence and reprint requests to Dr. Thomas B. Kepler, Biomathematics Program, Department of Statistics, Box 8203, North Carolina State University, Raleigh, NC 27695-8203.
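3 Adjusted residuals are defined as follows. For a contingency table with two factors, let the count in cell (i, j) be denoted $n_{ij}$ and its expected value $e_{ij}$. Then the adjusted residuals $z_{ij}$ are given by
$$z_{ij} = \frac{n_{ij} - e_{ij}}{\sqrt{e_{ij}\,\left(1 - n_{i\cdot}/N\right)\left(1 - n_{\cdot j}/N\right)}}$$
with $N$ the total number of counts and $n_{i\cdot}$, $n_{\cdot j}$ the row and column totals.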
4 T. B. Kepler, M. Oprea, and L. G. Cowell. The targeting of somatic hypermutation closely resembles that of meiotic mutation. Submitted for publication.
Received for publication August 13, 1999. Accepted for publication December 9, 1999.
| References |
|---|
genes: unequal distribution of mutation in 5' and 3' flanking regions. Int. Immunol. 5:255.
and its 5' flanking sequences determines the location of somatic mutations in the J locus. J. Immunol. 146:3652.