The Journal of Immunology, 2006, 177: 3857-3864.
Copyright © 2006 by The American Association of Immunologists, Inc.
A Model for TCR Gene Segment Use1
Aryeh Warmflash* and Aaron R. Dinner2,†,‡,§,¶
* Department of Physics, † Department of Chemistry, ‡ Committee on Immunology, § Institute for Biophysical Dynamics, and ¶ James Franck Institute, University of Chicago, Chicago, IL 60637
Abstract
The TCR α-chain is assembled by somatic recombination of variable (V) and joining (J) gene segments at the CD4+CD8+ stage of development. In this study, we present the first analytical model for deletional rearrangement and show that it is consistent with almost all available data on VαJα use in mice and humans. A key feature of the model is that both "local" and "express service" models of rearrangement can be obtained by varying a single parameter that describes the number of gene segments accessible at a time. We find that the window is much larger for Vα segments than Jα segments, which reconciles seemingly conflicting data for the former. Implications for the properties of the repertoire as a whole and experiments that seek to probe them are discussed. Special considerations for allelic inclusion are treated in the Appendices.
Introduction
To provide protection from pathogens, T lymphocytes must react to an enormous variety of foreign molecules. The specificity of a clone for Ag is determined by the TCRs on its surface, which are typically heterodimers of α- and β-chains. The diversity of the TCR repertoire derives in large part from the fact that both α- and β-chains are generated during intrathymic development by somatic recombination of gene segments encoding the constant and variable portions of these molecules. The latter include variable (V), diversity (D), and joining (J) gene segments in the case of β-chains and V and J gene segments in the case of α-chains.
Obtaining a quantitative understanding of how physical factors impact rearrangement of TCR loci is important for assessing the diversity of specificities in the TCR repertoire. Loci with D segments can only rearrange each chromosome once because all but the used D segment are deleted by the primary rearrangement. In contrast, those without D can rearrange repeatedly; here, we focus on α loci. In mice, there are 104 Vα segments and 61 Jα segments; in humans, the number of Jα segments is about the same, but there are only about half as many Vα segments. When a rearrangement brings specific Vα and Jα segments together, intervening gene segments are deleted rather than inverted due to the orientation of the recombination signal sequences (1).
There is general agreement that Jα segments are used in an essentially sequential manner (termed "local service"), starting at the 5' end (proximal to the Vα segments) and proceeding to the 3' end (2, 3, 4, 5, 6, 7, 8). There is no such consensus concerning the Vα segments. One group reported preferential use of 3' (Jα proximal) over 5' (Jα distal) Vα segments (8), but another argued for nonsequential use ("express service") of Vα segments based on the fact that there is only a loose correlation in Vα/Vα pairs in cells with productive rearrangements on both chromosomes (6). In the present paper, we develop a mathematical model that reconciles these seemingly conflicting observations. We estimate the Vα and Jα window sizes (WV and WJ, respectively) from the average separations of segments on different alleles in selected cells and then show that, without further adjustment, these parameters yield good agreement with independent experimental data on Vα and Jα use in the selected TCR repertoire. We find that the window is much larger for Vα segments than Jα segments (37 segments compared with 13 segments, respectively), which accounts for quasisequential use of Vα gene segments (8) with only loose correlation between alleles (6). Beyond shedding light on mechanisms of rearrangement, the model is useful for extrapolating statistics on the TCR repertoire from data obtained with limited numbers of Vα-specific mAbs, as demonstrated in the Appendices.
Materials and Methods
We derived an expression for the number of possible ways each VαJα gene configuration can be generated and then used it to estimate probabilities of observing in the periphery: 1) Vα and Jα segments irrespective of the gene with which they are paired, 2) VαJα pairs, and 3) Vα/Vα and Jα/Jα pairs in dual TCR cells.
Counting paths to VαJα gene configurations
As mentioned in the Introduction, we can treat Vα and Jα rearrangements with the same expressions by encoding the degree to which each set is used sequentially by a parameter (W) that describes the number of accessible gene segments. In other words, we assumed that rearrangements at any time can be made to the W most proximal gene segments remaining; this scheme is discussed in further detail below. A small value of W corresponds to more sequential use, and a large one to more random use.
For clarity, we numbered the Vα and Jα segments separately according to their initial (prerearrangement) positions starting from the most proximal ones (9) and refer to the ordered list of (Vα or Jα) segments sampled in multiple rearrangements of a chromosome as a "path" (Fig. 1). The specific goal of this section is to determine g(n,k,W), the number of paths of length k (i.e., those corresponding to k rearrangements) that end in segment n subject to the window constraint described above.

FIGURE 1. Schematic of the model. A path with a length of two rearrangements is illustrated. In each rearrangement, intervening DNA is deleted, and the window of accessible gene segments is shifted to begin at the point of the last rearrangement.
To this end, we determined the number of unrestricted paths of this kind and then removed the contribution from those that violate the window constraint. We began by noting that the segment to which the last rearrangement is made is fixed, and the number of ways of choosing the remaining increasing sequence of k − 1 segments from the n − 1 prior ones was
$$\binom{n-1}{k-1} \tag{1}$$
From these unrestricted paths, we subtracted the number of combinations with immediately sequential segments separated by more than W ("forbidden jumps"). For this purpose, we defined the quantity
$$f(n,k,W,m) = \binom{k}{m}\binom{n-mW-1}{k-1} \tag{2}$$
where m is the number of forbidden jumps in a path. The first binomial coefficient gives the number of ways of placing the m forbidden jumps among the k rearrangements. The second gives the number of ways of choosing the gene segments for the first k − 1 rearrangements given that certain jumps are required to be of length greater than W. In the event that n − mW < 1, we take the second factor on the right side to be 0 here and below.
It is important to note that the quantity f(n,k,W,m) is not in itself the number of paths with m forbidden jumps because it also counts those with more than m forbidden jumps repeatedly. In particular, a path with q (q ≥ m) forbidden jumps contributes q!/[m!(q − m)!] times to f(n,k,W,m) because there are that many ways of choosing the minimum of m jumps specified to violate the window constraint. To obtain only the number of allowed paths, begin by considering f(n,k,W,0). By comparing Eqs. 1 and 2, it can be seen that f(n,k,W,0) is the number of unrestricted paths. It is thus necessary to subtract f(n,k,W,1) from f(n,k,W,0). Then paths with one forbidden jump will no longer be counted. However, doing so overcompensates with regard to the paths with more than one forbidden jump because they are counted q!/[1!(q − 1)!] = q times in f(n,k,W,1). Adding back f(n,k,W,2) overcompensates for paths with more than two forbidden jumps in the opposite direction, and so it is necessary to continue subtracting and adding terms with increasing numbers of forbidden jumps until the maximum possible (k) is reached. Generalizing, in this alternating sum of f(n,k,W,m) over m, the number of times paths with q ≥ 1 are counted is
$$\sum_{m=0}^{q} (-1)^m \frac{q!}{m!\,(q-m)!} = (1-1)^q = 0 \tag{3}$$
where the first equality derives from the binomial expansion. In other words, in an alternating sum of f(n,k,W,m) over m, contributions from paths with forbidden jumps cancel. Thus, the number of paths that satisfy the window constraint is
$$g(n,k,W) = \sum_{m=0}^{k} (-1)^m f(n,k,W,m) = \sum_{m=0}^{k} (-1)^m \binom{k}{m}\binom{n-mW-1}{k-1} \tag{4}$$
Eq. 4 is the main result of this paper, and all subsequent expressions are derived from it.
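The counting argument leading to Eq. 4 can be checked numerically. The Python sketch below is an illustration rather than the authors' code: the function names `g` and `g_brute` are ours, and a path is represented as k jumps of size 1 to W that start from position 0 and end at segment n. The alternating sum is compared with a direct recursive enumeration.

```python
from math import comb

def g(n, k, W):
    """Paths of k rearrangements ending at segment n with every jump <= W (Eq. 4)."""
    if k < 1 or n < k:
        return 0
    total = 0
    for m in range(k + 1):
        top = n - m * W - 1
        if top < 0:  # second binomial factor is taken to be 0 when n - mW < 1
            break
        total += (-1) ** m * comb(k, m) * comb(top, k - 1)
    return total

def g_brute(n, k, W):
    """Same count by recursion over the size of the last jump (independent check)."""
    if k == 0:
        return 1 if n == 0 else 0
    return sum(g_brute(n - j, k - 1, W) for j in range(1, min(W, n) + 1))

if __name__ == "__main__":
    for n, k, W in [(3, 2, 2), (4, 2, 2), (20, 3, 13)]:
        print(n, k, W, g(n, k, W), g_brute(n, k, W))
```

The recursion is exponentially slower than the closed form, so it is only useful as a check for small n and k.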
Probability distributions
To use Eq. 4 to estimate the probabilities of observing particular Vα and Jα segments in various contexts, we assume that the likelihoods of making productive rearrangements (Pr) and being selected (Ps) are constants. Clearly, the latter does in fact vary for specific gene segments, but here we are concerned with the overall statistics of the repertoire; moreover, we effectively average over different β-chains and CDR sequences. Given Pr and Ps as well as the window sizes, we combine these variables into case-dependent aggregate probabilities for attempting to generate a particular gene configuration and then being or not being selected (a and b, respectively).
Vα or Jα probability distributions.
Here, we consider the probability of observing a single Vα or Jα segment (indexed n), irrespective of the gene with which it is paired [P(n)]. The likelihood of making a productive rearrangement and then being selected is the product PrPs, and its complement is 1 − PrPs. We normalize these expressions by the window size for a and b because there is an equal chance of picking any accessible gene segment:
$$a = \frac{P_r P_s}{W}, \qquad b = \frac{1 - P_r P_s}{W} \tag{5}$$
To obtain the probability of interest, we perform a sum over the numbers of rearrangements (k < Nr, where Nr is the maximum number of rearrangements, which acts as a surrogate for time in the thymus) weighted according to the number of paths:
$$P(n) = \frac{1}{Q}\sum_{k} a\, b^{k-1}\, g(n,k,W) \tag{6}$$
where Q is a normalization factor determined by summing over all possible values of n. The factor b^(k−1) arises from the fact that to be selected in k rearrangements, a clone must fail to be selected in k − 1 previous attempts.
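A minimal sketch of how Eqs. 5 and 6 might be evaluated is given below. It is illustrative only: the function names are ours, the default values Pr = 0.3 and Ps = 0.03 are the ones chosen later in the text, and the sum is assumed to run over k = 1, ..., `max_k`, with `max_k` a stand-in for the upper limit set by Nr.

```python
from math import comb

def g(n, k, W):
    """Number of window-respecting paths of k rearrangements ending at segment n (Eq. 4)."""
    if k < 1 or n < k:
        return 0
    total = 0
    for m in range(k + 1):
        top = n - m * W - 1
        if top < 0:
            break
        total += (-1) ** m * comb(k, m) * comb(top, k - 1)
    return total

def single_segment_distribution(N, W, p_r=0.3, p_s=0.03, max_k=5):
    """P(n) for n = 1..N from Eq. 6, normalized so that the values sum to 1."""
    a = p_r * p_s / W            # Eq. 5: attempt a given segment and be selected
    b = (1 - p_r * p_s) / W      # Eq. 5: attempt a given segment and not be selected
    weights = [sum(a * b ** (k - 1) * g(n, k, W) for k in range(1, max_k + 1))
               for n in range(1, N + 1)]
    Q = sum(weights)
    return [w / Q for w in weights]

if __name__ == "__main__":
    P = single_segment_distribution(N=61, W=13)
    print([round(p, 4) for p in P[:15]])
```

With W = 1 the distribution collapses onto the first `max_k` segments, which is the strictly sequential limit described in the text.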
VαJα pair distributions.
In the case that we want the probability of observing Vα segment nV with Jα segment nJ [P(nV,nJ)], it is necessary to normalize the aggregate probabilities a and b instead by the number of possible pairs (WVWJ):
$$a = \frac{P_r P_s}{W_V W_J}, \qquad b = \frac{1 - P_r P_s}{W_V W_J} \tag{7}$$
where WV and WJ are the sizes of the windows of accessible Vα and Jα segments. As above, we sum over the number of rearrangements weighted by the numbers of paths to nV and nJ:
$$P(n_V,n_J) = \frac{1}{Q}\sum_{k} a\, b^{k-1}\, g(n_V,k,W_V)\, g(n_J,k,W_J) \tag{8}$$
where again Q is a normalization factor, but, in this case, it is computed by considering all possible pairs of Vα and Jα segments.
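The pair distribution of Eqs. 7 and 8 can be sketched in the same way. As before, this is an illustration under stated assumptions: the function names are ours, the sum is taken over k = 1, ..., `max_k`, and the default parameters are those chosen later in the text.

```python
from math import comb

def g(n, k, W):
    """Number of window-respecting paths of k rearrangements ending at segment n (Eq. 4)."""
    if k < 1 or n < k:
        return 0
    total = 0
    for m in range(k + 1):
        top = n - m * W - 1
        if top < 0:
            break
        total += (-1) ** m * comb(k, m) * comb(top, k - 1)
    return total

def pair_distribution(NV, NJ, WV, WJ, p_r=0.3, p_s=0.03, max_k=5):
    """P[nV-1][nJ-1] from Eq. 8, normalized over all V/J pairs."""
    a = p_r * p_s / (WV * WJ)          # Eq. 7
    b = (1 - p_r * p_s) / (WV * WJ)    # Eq. 7
    gV = [[g(n, k, WV) for k in range(1, max_k + 1)] for n in range(1, NV + 1)]
    gJ = [[g(n, k, WJ) for k in range(1, max_k + 1)] for n in range(1, NJ + 1)]
    # Index i corresponds to k = i + 1, so b**i is b**(k - 1).
    P = [[sum(a * b ** i * v[i] * j[i] for i in range(max_k)) for j in gJ] for v in gV]
    Q = sum(map(sum, P))
    return [[p / Q for p in row] for row in P]

if __name__ == "__main__":
    P = pair_distribution(NV=104, NJ=61, WV=37, WJ=13)
    print(P[0][0], P[103][60])
```

Because both segments are reached with the same number of rearrangements k, a pair has zero probability whenever no single k can reach both members, for example the most distal V with the most proximal J.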
Vα/Vα and Jα/Jα distributions in dual TCR cells.
Statistical data are available for clones with two productively rearranged α loci (6, 10), so it is of interest to determine the likelihood of pairing like types of segments on different alleles. Although the expression for this joint probability is similar to that for a particular VαJα pair, a and b must be adjusted for dual TCR cells. Assuming for convenience that both alleles are rearranged simultaneously but independently (see Appendix and Ref. 11 for discussions of this simplification), the probability of productively rearranging both chromosomes and then being positively selected is Pr²Ps. This product determines a. The aggregate probability b accounts for clones that edit their gene configurations. It is thus necessary to subtract the contributions from both selected dual and single TCR cells: Pr²Ps and Pr(1 − Pr)Ps, respectively (the possibility of cell death is addressed below). Normalizing by the number of gene segment pairs from the two sets of interest:
$$a = \frac{P_r^2 P_s}{W^2}, \qquad b = \frac{1 - P_r^2 P_s - P_r(1 - P_r)P_s}{W^2} \tag{9}$$
The probability for selecting a cell with segment n1 from the first chromosome and segment n2 from the second is
$$P(n_1,n_2) = \frac{1}{Q}\sum_{k} a\, b^{k-1}\, g(n_1,k,W)\, g(n_2,k,W) \tag{10}$$
which differs from Eq. 8 in that both counting factors use the same window size, and the normalization factor Q is adjusted accordingly.
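The dual TCR distribution can be turned into an average separation between the segments used on the two alleles, the statistic later used to estimate window sizes. The sketch below is a hedged illustration: the function names are ours, the sum is assumed to run over k = 1, ..., `max_k`, and the aggregate probabilities follow the description of Eq. 9 given above (normalization by W² is our reading of "the number of gene segment pairs").

```python
from math import comb

def g(n, k, W):
    """Number of window-respecting paths of k rearrangements ending at segment n (Eq. 4)."""
    if k < 1 or n < k:
        return 0
    total = 0
    for m in range(k + 1):
        top = n - m * W - 1
        if top < 0:
            break
        total += (-1) ** m * comb(k, m) * comb(top, k - 1)
    return total

def dual_tcr_distribution(N, W, p_r=0.3, p_s=0.03, max_k=5):
    """Joint distribution of like segments (n1, n2) on the two alleles (Eq. 10)."""
    a = p_r ** 2 * p_s / W ** 2
    b = (1 - p_r ** 2 * p_s - p_r * (1 - p_r) * p_s) / W ** 2
    counts = [[g(n, k, W) for k in range(1, max_k + 1)] for n in range(1, N + 1)]
    P = [[sum(a * b ** i * c1[i] * c2[i] for i in range(max_k)) for c2 in counts]
         for c1 in counts]
    Q = sum(map(sum, P))
    return [[p / Q for p in row] for row in P]

def mean_separation(P):
    """Average |n1 - n2| under the joint distribution P."""
    return sum(abs(i - j) * p for i, row in enumerate(P) for j, p in enumerate(row))

if __name__ == "__main__":
    for W in (5, 13, 30):
        print(W, round(mean_separation(dual_tcr_distribution(61, W)), 2))
```

Scanning W and comparing the computed separation with a measured one is one way to read off a window size, in the spirit of the procedure described in Results.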
Evaluation of simplifications
The model described above makes no reference to underlying molecular details and is consistent with any mechanism that only allows a certain number of sequential gene segments to be accessible at a given time. To make the model mathematically tractable, additional simplifications were made. In particular, we assume that rearrangements can always access the W most proximal remaining segments. However, data for the Ig H chain locus suggest that the windows of accessible gene segments are predetermined (12). In this case, the number of segments to which rearrangements can be made varies because the ends of the windows are restricted to fixed points along the locus.
To determine how varying the mechanism for making gene segments accessible impacts the results, we performed stochastic simulations for models with a sliding window of constant size (as in the mathematical derivations above) and one with fixed ends but variable size (as mentioned immediately above). The behavior of the latter depends on the number of rearrangements between shifts in the window position, and we tried several values. Overall, the results from the two models are similar (Fig. 2). The main difference is that, when the window moves infrequently in the case of predetermined regions of accessible gene segments, segments toward the ends of the windows are used to a somewhat greater degree. Indeed, the abrupt changes in use across the boundaries of the windows are similar in shape to distributions observed for distal Jα segments in cells that are forced to edit rather than die due to transgenic expression of Bcl-xL (7).
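For readers who want to experiment, a sliding-window simulation of a single allele could look like the sketch below. It is not the authors' simulation code: the function names, the per-attempt selection probability `p_select`, and the rule that a jump past the end of the locus loses the cell (chosen to mirror the analytic path count) are all assumptions made for illustration.

```python
import random
from collections import Counter

def simulate_allele(N, W, p_select, max_k, rng):
    """Return the selected segment index (1..N), or None if the cell is never selected."""
    pos = 0
    for _ in range(max_k):
        pos += rng.randint(1, W)   # uniform choice among the W most proximal remaining segments
        if pos > N:                # assumption: running off the locus loses the cell
            return None
        if rng.random() < p_select:
            return pos
    return None

def empirical_distribution(N, W, p_select, max_k, trials, seed=0):
    """Fraction of selected cells that use each segment, estimated by Monte Carlo."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(trials):
        n = simulate_allele(N, W, p_select, max_k, rng)
        if n is not None:
            hits[n] += 1
    total = sum(hits.values())
    return {n: c / total for n, c in sorted(hits.items())}

if __name__ == "__main__":
    dist = empirical_distribution(N=61, W=13, p_select=0.009, max_k=5, trials=200_000)
    print({n: round(p, 3) for n, p in list(dist.items())[:10]})
```

A fixed-window variant would replace the `rng.randint(1, W)` step with a draw restricted to predetermined window boundaries.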
For simplicity, we also explicitly considered only one allele in deriving the first two probability distributions above. However, positive selection following functional rearrangement of a second chromosome terminates rearrangement of the chromosome of interest. Because the alleles are otherwise independent, we can easily account for both by interpreting Nr as the number of rearrangements per allele rather than the total. Computer simulations confirmed that, as long as the normalization is treated consistently (see Appendix B), considering one allele with Nr rearrangements and two alleles with 2Nr rearrangements yielded identical results to within simulation error (data not shown).
In the derivations above, cell death (due to autoreactivity or neglect) is not considered explicitly for clarity. Accounting for this phenomenon requires subtracting from the numerators of the aggregate probability b in Eqs. 5, 7, and 9 products of the form PrPd where Pd is the probability of cell death. As mentioned with regard to Ps, treating Pd as a constant is expected to be adequate because our focus is on the statistics of the repertoire rather than specific αβ TCR heterodimers.
Lastly, the T early α promoter situated upstream of the Jα segments appears to target primary rearrangements to the 5' end of that locus, but there is also evidence that a second cis-regulatory element initiates rearrangements at a point further downstream (13). This possibility can be incorporated into the model by using in place of g(n,k,W) the weighted average
$$P_u\, g(n,k,W) + (1 - P_u)\,\frac{W}{W_d}\left[h(n,k) + \sum_{i=1}^{W_d} g(n - n_d - i + 1,\; k - 1,\; W)\right] \tag{11}$$
where Pu is the probability of initiating rearrangement at the upstream targeting element, Wd is the number of gene segments in the downstream window, and nd is the index of the first gene in the downstream window. The downstream promoter targets rearrangements to Jα49-Jα45 (13), which corresponds to Wd = 5 and nd = 13. In Eq. 11, the first term, which is weighted by Pu, accounts for the fraction of rearrangements to gene segment n starting from the V-proximal end of the Jα locus as in the original model. The second term, which is weighted by 1 − Pu, adjusts for the primary rearrangements to the downstream window. The function h(n,k) counts these targeting events; it is 1 if k = 1 and n is inside the downstream window and 0 otherwise. The sum counts paths that start in the downstream window and end at gene segment n. The arguments to the function g in this case can be understood as follows. The first, n − nd − i + 1, is the number of gene segments between the gene segment of interest (n) and the initiation point for rearrangement (the i-th gene segment of the downstream window), which corresponds to shifting the gene segment indices to count from the initiation point. The second argument, k − 1, is the number of secondary rearrangements, which reflects the fact that the primary rearrangement is already determined. The third argument is simply the window size, which we take to be the same for rearrangements starting from either cis-regulatory element. For this same reason, one factor of W/Wd is necessary to correct for the size of the primary rearrangement window, which enters through the composite probabilities a and b in Eqs. 5-10.
Based on the areas of the peaks in Fig. 3 in Ref. 13, we estimate Pu to be ~0.7. With this choice, we recalculated the curves in Fig. 3 (data not shown) and found that the modified model yields WJ = 10 gene segments, which is somewhat smaller than our original estimate (more generally, WJ increases with Pu until the original model is recovered for Pu = 1). As a result, secondary rearrangements tend to be shifted upstream slightly; primary rearrangements are shifted downstream slightly due to the second cis-regulatory element. Similar agreement with the data is obtained overall. Specifically, the predictions for Vα20S1 in Table I and Vα6 in Fig. 4a are improved (compare with inset), whereas those for Vα19 in Table I and Fig. 4b are a bit poorer (data not shown).
Choice of parameters
There are five parameters in the model: the Vα and Jα window sizes (WV and WJ), the probability that rearrangements are productive (Pr), the probability of positive selection following productive rearrangements (Ps), and the maximum number of rearrangements per allele (Nr). To estimate values for WV and WJ from a relatively small amount of data, we assume values for Pr and Ps. We take Pr to be 0.3; this choice is somewhat less than the one-in-three chance that a rearrangement is in-frame to account for pseudogenes and the possibility of generating stop codons (14). There is little information from experiments to guide the choice of Ps, the probability of selection following productive rearrangement. We expect it to be small because <5% of thymocytes are positively selected (Ref. 15, and references therein). Based on this fact, we take Ps to be 0.03, which yields an overall selection probability of [1 − (1 − 0.3 × 0.03)^5] × 100% = 4.4% given the choices of Pr and Nr (discussed below). Nearly identical results were obtained with PrPs values ranging over an order of magnitude.
Data from mice incapable of reintroducing recombination activating gene (RAG)3 at the double-positive stage suggest that ~35% of the T cell repertoire is formed by primary α-chain rearrangements (16) (but see Fig. 5 and associated discussion below). Assuming constant Pr and Ps and that the number of gene segments is not limiting, it is straightforward to show that this percentage is theoretically
 | (12) |
This statistic is not very sensitive to the values of Pr and Ps, but it does depend strongly on Nr since it is in the exponent. Reasonable values for P1° in the range 20-34% are obtained with 4 ≤ Nr ≤ 6. We take Nr = 5, which is consistent with other estimates (15).

FIGURE 5. Comparison of calculated (gray and white) and observed (black) Jα use in mice incapable of reintroducing RAG at the double-positive stage. We simulated this system by assuming that following each round of rearrangement there is a 70% probability of being incapable of further rearrangements due to insufficient RAG expression; gray contributions are from primary rearrangements, and white contributions are from secondary rearrangements. Experimental data are from Ref. 16.
Results
In our model for TCRα rearrangement, the degree to which a set of gene segments (V or J) is used sequentially is encoded in a single parameter that describes the size of the window of accessible gene segments (Fig. 1). Using this idea, we derive in Materials and Methods an analytic expression for the number of possible rearrangement "paths" leading to a selected gene configuration (Eq. 4), as well as probabilities for observing specific Vα and Jα gene segments and their combinations on selected cells (Eqs. 5-10). Here, we estimate the Vα and Jα window sizes (WV and WJ, respectively) from the average separations of segments on different alleles in selected cells and then show that, without further adjustment, these parameters yield good agreement with independent experimental data on Vα and Jα use in the overall TCR repertoire.
Separation of gene segments on different alleles
To estimate the numbers of Vα and Jα segments available for rearrangement at any given time, we calculate the average separation between like types of segments on different alleles in selected cells as a function of window size (Fig. 3). Consistent with intuition, the separation between segments on different chromosomes increases monotonically with W. In other words, the correlation between alleles decreases as use becomes less sequential.
For the 61 mouse Jα genes, the experimentally measured average separation is 7.1 (SD 6.7), and, for the 58 human Vα genes, it is 13.8 (SD 9.3) (6). Given these data, we can read the Vα and Jα window sizes off Fig. 3. These averages and SDs correspond to WJ = 12 or 15 and WV = 38 or 31, respectively (Fig. 3); the fact that the averages and SDs yield relatively close estimates for the window sizes suggests that the large SD values observed are inherent to the rearrangement process rather than due to experimental uncertainty.
From the ranges above, we chose WJ = 13 and WV = 37 because the mean generally converges more quickly than the second moment of the distribution and the raw data are quite limited. These values confirm the idea that rearrangement of the Jα segments is less random (more sequential) than that of the Vα segments. The value WJ = 13 is also consistent with the observation that mice that lack the T early α promoter are unable to use the 10 most proximal Jα segments (5), which suggests a window size of about that number of gene segments.
Due to the limited amount of data, we assume that our values of WJ and WV are common to mice and humans, which appears to be justified given the remarkably good agreement with experiment that we obtain below. Taking the window sizes to be the same in mice and humans also reconciles seemingly conflicting data on Vα segment use (M. Krangel, unpublished observation). Because 37 gene segments represent roughly 65% of the human but only 35% of the murine Vα segments, the former appear to be used randomly while the latter appear to be used sequentially.
Jα distributions for particular Vα genes
We now fix the five parameters in the model at the values estimated above (Pr = 0.3, Ps = 0.03, Nr = 5, WJ = 13, and WV = 37) and compare calculated VαJα pair frequencies with measured ones (8, 17, 18, 19). We begin by considering the data from Ref. 8 for Jα segments paired with Vα6 (located at the J-proximal end of the Vα locus) and Vα19 (at the distal end). For these extreme Vα segments, the model and experimentally observed probabilities agree well (Fig. 4). The estimated frequency for Vα6-Jα48 pairs is very sensitive to whether targeting by the cis-regulatory element at Jα49 is considered because almost all of these pairs derive from primary rearrangements and Jα48 falls in the downstream window but not the upstream one. The model predictions compare favorably with the experimental data for Vα19 and Vα20S1 (located at the proximal end of the locus) studied by Huang and Kanagawa (17) as well (Table I). Again, including the downstream initiation site improves the estimates for pair frequencies involving the proximal Vα segment, Vα20S1 (data not shown).
Although the same qualitative trends were observed in Ref. 18, the model cannot reproduce the reported Jα use in detail because the distributions are not unimodal. However, it is important to note that these data are putatively for individual members of the Vα2 superfamily, which are difficult to conclusively identify, as noted by Huang and Kanagawa (17). Additional data for this superfamily were obtained recently for mice with only a single functional Jα segment and control animals (19). In this case, reasonable agreement is obtained with calculated frequencies for the model in which the ends of the windows are restricted to fixed points along the locus (see Fig. 2), but the model exhibits a greater bias toward Jα-proximal Vα segments than was observed (data not shown).
Regulation of secondary rearrangements
The quasisequential scheme on which our model is based is consistent with the results of knockout experiments directed at elucidating the factors that regulate secondary rearrangements. In mice that are unable to reinduce RAG expression at the double-positive stage because they lack a necessary regulatory element, residual RAG from rearrangement of TCRβ at the double-negative stage catalyzes only limited rearrangement of the TCRα locus. Jα use is restricted to the 5' (proximal) segments in T cells from these mice (16). To simulate these experiments, we performed simulations of our model in which there was a large probability of losing residual RAG and stopping rearrangement following each round of rearrangement. We found that probabilities between 60 and 80% give good agreement with the experimental data (Fig. 5). Although there are data suggesting that the half-life of RAG is short (~10 min), they were obtained for the destruction of RAG at the G1-S phase of the cell cycle (20); that following the formation of the β-chain could be slower. In any event, some RAG must be present in these cells to account for the observed α-chain rearrangements.
Discussion
In this study, we present a model for TCR gene segment use in which the degree to which rearrangement is sequential is determined by the size of a window of accessible gene segments. The model is based only on this notion and the fact that intervening gene segments are deleted when Vα and Jα are brought together. We deduced the window sizes from data on the correlations between the two Vα or Jα genes used in selected cells, and showed that, without further adjustment, these parameters yield good agreement between the model and almost all available data on TCRα gene segment use. Although the model cannot reveal the detailed molecular mechanism, it strongly indicates that Vα use is quasisequential, which reconciles seemingly conflicting data for mice and humans.
The model is exactly solvable and provides the first expressions that can be used to extract information directly from data on gene segment use in lymphocytes. Previous theoretical studies were limited to simulations (numeric "experiments") (21, 22) and focused on the use of L chain segments in B cells, in particular Jκ. There are only four such functional gene segments, which are used with a slight bias toward the two more proximal segments. By either assigning probabilities to each of the four (21) or varying the ratio of the likelihoods of choosing each gene segment and the one immediately upstream (22), it was found that quasisequential use best explains the available Jκ data as well.
One consequence of the mechanism identified is that the most distal Vα segments are incapable of pairing with the most proximal Jα segments and vice versa, consistent with Refs. 23 and 24. Although we calculate that 70% of mouse VαJα and 58% of human VαJα pairs are expressed at frequencies within an order of magnitude of the uniform distribution (with the differences arising from the fact that the Vα locus in humans is roughly twice the size of that in mice), ~4% of VαJα pairs are incapable of forming in both species. Thus, the use of specific VαJα pairings can vary dramatically depending on their chromosome locations, and care must be used in extrapolating statistics from experiments specific to particular TCR, as discussed in Appendix A.
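Which pairs cannot form follows from the window constraint alone: k rearrangements with jumps of 1 to W segments can end at segment n only if k ≤ n ≤ kW. The sketch below counts V/J pairs for which no single k works for both members. It is an illustration with hypothetical function names, and it assumes that the number of rearrangements is capped at `max_k` = 5 and that both loci are rearranged the same number of times, so its output need not coincide exactly with the percentage quoted in the text.

```python
def reachable_ks(n, W, max_k):
    """Range of rearrangement counts k that can end at segment n (k <= n <= k*W, k <= max_k)."""
    k_min = (n + W - 1) // W          # integer ceiling of n / W
    return range(k_min, min(n, max_k) + 1)

def forbidden_pair_fraction(NV, NJ, WV, WJ, max_k=5):
    """Fraction of (V, J) pairs that no common number of rearrangements can produce."""
    forbidden = 0
    for nV in range(1, NV + 1):
        kV = reachable_ks(nV, WV, max_k)
        for nJ in range(1, NJ + 1):
            kJ = reachable_ks(nJ, WJ, max_k)
            # The two integer ranges share no k when the larger start reaches the smaller stop.
            if max(kV.start, kJ.start) >= min(kV.stop, kJ.stop):
                forbidden += 1
    return forbidden / (NV * NJ)

if __name__ == "__main__":
    print("mouse:", forbidden_pair_fraction(104, 61, 37, 13))
    print("human:", forbidden_pair_fraction(58, 61, 37, 13))
```

The forbidden pairs found this way are the distal-V/proximal-J and proximal-V/distal-J corners of the pairing table, in line with the qualitative statement above.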
In general, as rearrangement becomes increasingly sequential, there is a tradeoff between diversity and cell conservation in the thymus. Consequently, it is natural to ask whether the diversity of the TCR repertoire is limited significantly by the quasisequential nature of rearrangement. If rearrangement were totally uniform, every VαJα pair would be present in the repertoire with probability 1/(NVNJ), where NV and NJ are the total numbers of Vα and Jα genes. To compare different mechanisms, we used our model to calculate the SD of pairing frequencies for all possible values of WV and WJ (Fig. 6). A lower SD corresponds to a more uniform distribution. Interestingly, the window sizes estimated from the experimental data (Fig. 3) fall close to the bottom of the very shallow basin around the minimum (Fig. 6). Large SDs are obtained for either a perfectly sequential model (WV = 1 and WJ = 1) or a random deletional one (WV = 104 and WJ = 61) because gene segments from the proximal and distal ends, respectively, tend to be used disproportionately. These calculations lead us to speculate that the quasisequential rearrangement mechanism that we identified evolved to maximize the diversity of the repertoire.
Acknowledgments
We thank Martin Weigert for helpful discussions and critical reading of the manuscript and Barry Sleckman for providing data on Vα2 use in advance of publication.
Disclosures
The authors have no financial conflict of interest.
Appendix A
Corrections for phenotypic allelic inclusion frequencies
Roughly one-quarter of mature αβ T lymphocytes have productive rearrangements at both their TCR α-chain gene loci (genotypic allelic inclusion) (10), but not all of these cells express two Ag receptors on their surfaces (25, 26, 27, 28, 29). Dual TCR cells can thus serve as windows to the mechanisms that regulate T cell surface expression in general and, in turn, how events in the lives of T cells can lead to variations in molecular populations underlying autoimmunity (11). To determine the factors that influence phenotypic allelic exclusion, it is important to be able to quantitate its extent accurately. Unfortunately, only a few mAbs for specific Vα protein segments are available, so it is necessary to extrapolate from limited FACS data to estimate the total fraction of mature T cells that express two TCR on their surface. In this Appendix, we use our model for TCR VαJα use together with a brief additional counting argument to improve means for interpreting such experiments.
Correction for overcounting
To determine the extent of phenotypic allelic inclusion in a population of T lymphocytes, FACS is used to count the number of cells that bind reagents specific for two different Vα protein segments. Typically, these data are then used to calculate the fraction
$$f_{ij} = \frac{N\,N_{ij}}{N_i\,N_j} \tag{A1}$$
where Ni is the number of cells that are Vαi+, Nij is the number of cells that are Vαi+/Vαj+, and N is the total number of cells. Although often quoted as such (26, 27, 28, 29), fij is not the frequency of phenotypic allelic inclusion in the total population (F). Rather, it is an approximation for the fraction of Vαi+ and Vαj+ cells which display two receptors at the cell surface (in other words, fij ≈ fi and fj, respectively). To relate fij to F, it is necessary to avoid double-counting cells that express two different Vα protein segments.
The experiments sort cells only according to Vα, which effectively averages over Jα, CDR, and the β-chain. Consequently, it is not unreasonable to assume that fij ≈ fi ≈ fj ≈ f is essentially the same for all Vα (discussed below). Then, denoting the number of Vαi+ cells that express two TCR by di, we can write for each Vα protein segment an equation of the form fNi = di. Summing over segments,
$$f\sum_i N_i = \sum_i d_i \tag{A2}$$
The sums above count dual TCR cells with different Vα twice since Nij contributes to Ni, Nj, di, and dj. Denoting the total number of dual TCR cells by d and the number of such cells with the same Vα on both alleles by ds,
$$\sum_i N_i = N + d - d_s, \qquad \sum_i d_i = 2d - d_s \tag{A3}$$
Substituting the expressions in Eq. A3 into Eq. A2 and solving for F = d/N, we find
$$F = \frac{d}{N} = \frac{f + \left[(1-f)\,d_s/N\right]}{2-f} \approx \frac{f}{2-f} \qquad \text{(A4)}$$
The second (approximate) equality follows from the fact that d_s is expected to be much smaller than N (using the model described in the main text, d_s/N ≈ 0.003), which allows neglect of the term in square brackets. Eq. A4 thus provides a practical means of estimating the fraction of mature T cells that express two TCR on their surfaces from a measurable quantity (f ≈ f_ij, but see below).
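The counting argument can be checked numerically. The sketch below assumes that the sums over segments give ΣN_i = N + d − d_s and Σd_i = 2d − d_s, so that F = [f + (1 − f) d_s/N] / (2 − f) ≈ f / (2 − f); these forms are reconstructed from the bookkeeping described above. The synthetic population (four segments, symmetric counts) is entirely hypothetical and is built so that f is the same for every segment.

```python
from itertools import combinations


def total_dual_fraction(f: float, ds_over_n: float = 0.0) -> float:
    """Fraction F of all cells that are dual TCR, given the per-segment fraction f.

    Assumed form: F = (f + (1 - f) * d_s/N) / (2 - f).
    With ds_over_n = 0 this reduces to the approximation F ~ f / (2 - f).
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return (f + (1.0 - f) * ds_over_n) / (2.0 - f)


# Hypothetical population: each cell is a tuple of the segments it expresses.
segments = range(4)
cells = []
for v in segments:
    cells += [(v,)] * 200      # single TCR cells
    cells += [(v, v)] * 5      # dual TCR cells, same segment on both alleles
for v, w in combinations(segments, 2):
    cells += [(v, w)] * 30     # dual TCR cells, two different segments

n_total = len(cells)
d = sum(len(c) == 2 for c in cells)                     # all dual TCR cells
d_s = sum(len(c) == 2 and c[0] == c[1] for c in cells)  # dual, same segment
n_0 = sum(0 in c for c in cells)                        # cells positive for segment 0
d_0 = sum(0 in c and len(c) == 2 for c in cells)        # dual cells positive for segment 0

f = d_0 / n_0
F_direct = d / n_total
F_formula = total_dual_fraction(f, d_s / n_total)
F_approx = total_dual_fraction(f)
print(f, F_direct, F_formula, F_approx)
```

In this toy population d_s/N is 0.02, far larger than the value quoted from the model, so the gap between the exact and approximate results overstates the error expected in practice.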
Correction for biases in gene rearrangement
Vα gene segments paired in dual TCR cells are weakly correlated (6). As discussed in the main text, such biases in the repertoire come from quasisequential use of the two sets of Vα gene segments at the same rate. In particular, application of Eq. 10 shows that the fraction of dual TCR cells observed depends on the separation of the two V segments detected (Fig. A1). In many cases, f_ij as computed from Eq. A1 will be a poor approximation for f. Here, we show how our model of TCR VαJα rearrangement can be used to mitigate the bias that quasisequential deletional rearrangement introduces to interpretation of measured f_ij.
To this end, we use the model with the parameters given in the main text to calculate the fraction of dual TCR cells as a function of Vα gene segment position and the calculated value of f_ij:
 | (17) |
where P(i,j) is given by Eq. 10 and P(i) is computed as explained in Appendix B. Note that P(i) includes both single and dual TCR cells. We then scale the measured f_ij by the ratio of f_ij^calc and its average value in the model (⟨f_i^calc⟩, where the average is weighted by the number of Vα_i+ cells) to obtain a corrected fraction for use in Eq. A4:
 | (18) |
Basing this correction on our model for TCR VαJα rearrangement implicitly assumes that the extent of phenotypic allelic inclusion is directly proportional to the extent of genotypic allelic inclusion. Although this cannot be wholly the case, it is not an unreasonable approximation given, as discussed above, the effective averaging over Jα, CDR, and the β-chain.
One feature of Eq. A6 that might be confusing to the reader is that f_i is a property of one gene segment, but f_ij is clearly a property of two. However, f_ij as defined by Eq. A1 (26-29) is an estimate for f_i ≈ f_j. Eq. A6 is used to correct for cases when this is a poor estimate due to the separation between gene segments used in the experiment. It is worth noting that the error in f could thus be reduced by instead staining for one Vα protein segment and CD3 as in Ref. 25. Because the ratio of total TCR to CD3 is roughly constant, cells with a low ratio of CD3 to the Vα protein segment studied express only that Vα, while those with a high ratio express another Vα as well.
The discussion immediately above leads to a related point. Given the dependence of f_ij on the separation in position of gene segments Vα_i and Vα_j, it is natural to wonder whether the assumption above that f_ij ≈ f is essentially the same for all Vα protein segments is appropriate. In deriving Eq. A4, the f used represents the actual fraction of dual TCR cells as a function of Vα gene segment position, not the fraction estimated from a specific pair of Vα_i and Vα_j. Due to averaging over Vα_j, f_i calculated with Eq. A5 is much flatter than P(i,j) calculated with Eq. 10 (Fig. A1, inset). The assumption that f_i is essentially the same for all Vα is thus reasonable. Moreover, it improves as the number of rearrangements becomes larger, although a kink persists at the transition from primary to secondary rearrangement.
Total fraction of phenotypic dual TCR cells
We use the corrections in Eqs. A4 and A6 to re-evaluate previously published data. The reagents used were for the Vα2, Vα8, and Vα11 gene families. In these cases, P(i,j) is the probability that any gene segment from family i is paired with any member from family j and P(i) is the probability that any member of gene family i is used. Generally speaking, these gene families are distributed throughout the chromosome; the Vα2 family is slightly biased toward the 3' end whereas the Vα11 family is slightly biased toward the 5' end. The Vα8 family is nearly evenly distributed. Thus, it is to be expected that Vα2/Vα8 and Vα8/Vα11 experiments need little correction, while Vα2/Vα11 experiments significantly underestimate the fraction of dual TCR cells. In agreement with this expectation, the calculated correction factors for experiments with Vα2/Vα8, Vα8/Vα11, and Vα2/Vα11 are 1.02, 1.03, and 1.86, respectively. Because the experiments overestimate the fraction of dual TCR cells by nearly a factor of 2 due to overcounting, the data from Vα2/Vα11 experiments remain nearly the same after applying both corrections. However, since no genotypic bias was introduced in Vα2/Vα8 and Vα8/Vα11 experiments, there was significant overestimation of the fraction of dual TCR cells (Table AI). The separation of Vα2 and Vα11 and the consequent biases in their statistics are likely to account for the fact that the same authors generally found a higher fraction of dual TCR cells in Vα2/Vα8 and Vα8/Vα11 experiments with the exception of two aberrant data points (18-21% in Ref. 28; 31.0% in Ref. 27, the latter of which may have been due to gating on CD8). Thus, the corrected statistics suggest that the overall rate of phenotypic allelic inclusion is 2-11%, which supports the idea that posttranslational control mechanisms regulate TCR surface expression (see Refs. 11, 30, and 31 for discussion).
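To illustrate how the two corrections combine, the sketch below applies the correction factors quoted above (1.02, 1.03, and 1.86) to a single hypothetical measured value. It assumes that the bias correction multiplies the measured f_ij by the correction factor and that the overcounting correction then takes the approximate form F ≈ f / (2 − f). Both forms are inferred from the surrounding text rather than copied from the equations, and the measured value of 0.10 is invented for illustration.

```python
# Correction factors quoted in the text for each reagent pair.
CORRECTION_FACTORS = {
    ("Vα2", "Vα8"): 1.02,
    ("Vα8", "Vα11"): 1.03,
    ("Vα2", "Vα11"): 1.86,
}


def corrected_total_fraction(f_ij_measured: float, correction_factor: float) -> float:
    """Apply the bias correction, then the overcounting correction.

    Assumptions (not taken verbatim from the source equations):
      1. corrected f = correction_factor * measured f_ij
      2. F ~ f / (2 - f), neglecting the small d_s/N term
    """
    f = correction_factor * f_ij_measured
    if not 0.0 <= f <= 1.0:
        raise ValueError("corrected f must lie in [0, 1]")
    return f / (2.0 - f)


measured = 0.10  # hypothetical measured f_ij, identical for all pairs
results = {
    pair: corrected_total_fraction(measured, factor)
    for pair, factor in CORRECTION_FACTORS.items()
}
for pair, value in results.items():
    print(f"{pair[0]}/{pair[1]}: measured {measured:.2f} -> corrected F {value:.3f}")
```

With these inputs the Vα2/Vα11 estimate stays close to the measured value, while the other two pairs drop to roughly half of it, which matches the qualitative statement that the two corrections nearly cancel only for the widely separated pair.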
 |
Appendix 2
|
|---|
Normalization for dual TCR cells
We explicitly considered only one allele in deriving Eq. 6. However, it is important to treat both chromosomes to normalize the probabilities in Eq. A5 consistently. Specifically, in Eq. A5, for the probability of a cell expressing a receptor which uses the mth gene segment regardless of whether it is a single or dual TCR cell, we have
 | (19) |
The first term represents paths in which both alleles have rearranged to the mth gene segment and at least one of them is in-frame. The second term represents paths in which only one of the two alleles has rearranged to the mth gene segment, in which case that allele must be in-frame and the other can be either in- or out-of-frame. The aggregate probabilities are thus
 | (20) |
with b given by Eq. 9. Finally, we set the normalization Q such that
 | (21) |
 |
Footnotes
|
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
1 A.W. was supported by a National Science Foundation graduate research fellowship. 
2 Address correspondence and reprint requests to Dr. Aaron R. Dinner, The University of Chicago, Gordon Center for Integrative Science, 929 E. 57th Street, Chicago, IL 60637. E-mail address: dinner{at}uchicago.edu 
3 Abbreviation used in this paper: RAG, recombination activating gene. 
Received for publication March 24, 2006.
Accepted for publication June 29, 2006.
 |
References
|
|---|
- Krangel, M. S., J. Carabana, I. Abbarategui, R. Schlimgen, A. Hawwari. 2004. Enforcing order within a complex locus: current perspectives on the control of V(D)J recombination at the murine T cell receptor α/δ locus. Immunol. Rev. 200: 224-232.
- Thompson, S. D., J. Pelkonen, R. Marja, S. Jacqueline, J. L. Hurwitz. 1990. Nonrandom rearrangement of T cell receptor Jα genes in bone marrow T cell differentiation cultures. J. Immunol. 144: 2829-2834.
- Thompson, S. D., J. Pelkonen, J. L. Hurwitz. 1990. First T cell receptor α gene rearrangements during T cell ontogeny skew to the 5' region of the Jα locus. J. Immunol. 145: 2347-2352.
- Petrie, H. T., F. Livak, D. Burtrum, S. Mazel. 1995. T cell receptor gene recombination patterns and mechanisms: cell death, rescue, and T cell production. J. Exp. Med. 182: 121-127.
- Villey, I., D. Caillol, F. Selz, P. Ferrier, J. P. de Villartay. 1996. Defect in rearrangement of the most 5' TCR-Jα following targeted deletion of T early α (TEA): implications for TCR accessibility. Immunity 5: 331-342.
- Davodeau, F., M. Difilippantonio, E. Roldan, M. Malissen, J. L. Casanova, C. Couedel, J. F. Morcet, M. Merkenschlager, A. Nussenzweig, M. Bonneville, M. Malissen. 2001. The tight interallelic positional coincidence that distinguishes T-cell receptor Jα usage does not result from homologous chromosomal pairing during VαJα rearrangement. EMBO J. 20: 4717-4729.
- Guo, J., A. Hawwari, H. Li, Z. Sun, S. K. Mahanta, D. R. Littman, M. S. Krangel, Y. W. He. 2002. Regulation of the TCRα repertoire by the survival window of CD4+CD8+ thymocytes. Nat. Immunol. 3: 469-476.
- Pasqual, N., M. Gallagher, C. Aude-Garcia, M. Loiodice, F. Thuderoz, J. Demongeot, R. Ceredig, P. N. Marche, E. Jouvin-Marche. 2002. Quantitative and qualitative changes in V-Jα rearrangements during mouse thymocytes differentiation: implication for a limited T cell receptor α chain repertoire. J. Exp. Med. 196: 1163-1173.
- Bosc, N., M. P. Lefranc. 2004. The mouse (Mus musculus) T cell receptor α (TRA) and δ (TRD) variable genes. Dev. Comp. Immunol. 27: 465-497.
- Malissen, M., J. Trucy, E. Jouvin-Marche, P.-A. Cazenave, R. Scollay, B. Malissen. 1992. Regulation of TCR α and β gene allelic exclusion during T-cell development. Immunol. Today 13: 315-322.
- Warmflash, A., M. Weigert, A. R. Dinner. 2005. Control of genotypic allelic inclusion through TCR surface expression. J. Immunol. 175: 6412-6419.
- Chowdhury, D., R. Sen. 2001. Stepwise activation of the immunoglobulin mu heavy chain gene locus. EMBO J. 20: 6394-6403.
- Hawwari, A., C. Bock, M. S. Krangel. 2005. Regulation of T cell receptor α gene assembly by a complex hierarchy of germline promoters. Nat. Immunol. 6: 481-489.
- Mason, D. 1994. Allelic exclusion of α chains in TCRs. Int. Immunol. 6: 881-885.
- Mason, D. 2001. Some quantitative aspects of T-cell repertoire selection: the requirement for regulatory T cells. Immunol. Rev. 182: 80-88.
- Yannoutsos, N., P. Wilson, W. Yu, H. T. Chen, A. Nussenzweig, H. Petrie, M. C. Nussenzweig. 2001. The role of recombination activating gene (RAG) reinduction in thymocyte development in vivo. J. Exp. Med. 194: 471-480.