Abstract
The nature of CD8^{+} T cell memory is still incompletely understood. We have previously reported that the response to an HLAA2restricted influenzaderived peptide results in a complex T cell repertoire. In this study we extend this analysis and describe the repertoire with more rigor. In one individual we defined 141 distinct T cell clonotypes on the basis of the unique DNA sequence of the third complementaritydetermining region of the TCR βchain. The frequency distribution of the clonotypes is not what is expected of a normal distribution but is characterized by a large lowfrequency tail. The existence of a complex population indicates a mechanism for maintaining a large number of Agspecific clonotypes at a low frequency in the memory pool. Ranking the clonotypes allowed us to describe the population in terms of a power lawlike distribution with a parameter of decay of ∼1.6. If the repertoire is divided into subsets, such as clonotypes that use BJ2.7 or those whose third complementaritydetermining region encodes the amino acid sequence IRSS, the clonotype frequencies could also be described by a power lawlike distribution. This indicates a self similarity to the repertoire in which smaller pieces are slightly altered copies of the larger piece. The power lawlike description is stable with time and was observed in a second individual. The distribution of clonotypes in the repertoire could be mapped onto a polygonal spiral using a recursive algorithm. Self similarity, power laws, and recursive mapping algorithms are associated with fractal systems. Thus, Agspecific memory CD8 T cell repertoires can be considered as fractal, which could indicate optimized flexibility and robustness.
The manner in which naive T cells give rise to the memory repertoire, how the repertoire is sustained, and how it changes in time defines a complex system (1). Upon initial antigenic challenge, the responding repertoire is strongly dependent on the size of the naive precursor pool before immunization (2) and the avidity of the TCR for the AgMHC (3). As the level of Ag decreases, the number of responding cells decreases, but the relative changes in clonotype frequency resulting from the initial stimulation are maintained (4). Upon restimulation there is considerable conservation of the primary repertoire (5, 6), but new additions can be identified. Based on the hypothesis that high avidity T cells will be selectively expanded, it would be predicted that following repeated challenges with Ag, a memory repertoire would contain multiple copies of T cells expressing a limited number of TCR. There have been reports that the repertoire focuses on repeated stimulation (7, 8, 9). Although this would provide an efficient memory response, it might do so at the risk of facilitating pathogen evasion in which small changes in the Ag lead to loss of recognition by the focused repertoire.
Memory T cell repertoires generally represent the culmination of a memory generating process that will be responsible for responding to Ags during the lifetime of an individual. Such repertoires should be highly optimized and generally fault tolerant as they function efficiently on a time order of decades. The analysis of such repertoires should provide important and perhaps novel insights into how robust longterm immunity is generated.
We have undertaken an analysis of the memory compartment responding to a relatively stable Ag from influenza to derive as complete a picture as possible of the complexity of the memory repertoire. An adult blood donor was identified displaying a robust recall (memory) response to an influenza A virusderived antigenic peptide (M1_{5866}) presented by the class I MHC molecule, HLAA2. Preliminary repertoire analysis indicated a significant repertoire complexity (10) as defined by the nucleotide sequence of the third complementaritydetermining region (CDR3)^{4} of the TCR βchain. This response is characterized by the extensive use of T cells expressing the BV17 TCR gene and a restricted amino acid sequence motif in the CDR3 of the TCR βchain (11, 12). As the CDR3 interacts with the antigenic peptide, a restricted motif is indicative of Agdriven selection. The precursor frequency of T cells specific for this viral peptide can be very high as measured by MHC class I tetramer staining (13, 14, 15) and by clonotypespecific hybridization (10).
In this study we further analyze the nature of the responding repertoire, investigating the clonotype distribution as defined by the rankfrequency relationship. Our rankfrequency analysis of the repertoire indicates that the distribution of clonotypes does not follow the commonly expected normal distribution. Rather, a large part of the repertoire can be described in a rigorous mathematical manner by a power law. Such distributions are characterized by an equilibrium of larger numbers of infrequent members and smaller numbers of more frequent members. Furthermore, structurally defined subsets of the repertoire have distributional properties similar to those observed for the entire repertoire. This self similarity indicates that the repertoire possesses fractal characteristics. The implications of a complex fractal repertoire are discussed in terms of the number and nature of structural parameters important for Ag recognition. A model for generating a repertoire similar to that observed in this study is proposed.
Materials and Methods
Generation of M1_{58–66}/HLAA2specific CTL
PBMC were collected from buffy coat from the first HLAA2.1 individual (individual A), and triplicate CTL lines were established. Another CTL line was generated from PBMC collected 23 mo after those used for the first analysis. PBMC were cultured at a concentration of 1 × 10^{6} cells/ml in the presence of influenza A matrix peptide M1_{58–66} (0.5 μg/ml) and recombinant IL2 (10 U/ml). Restimulation used HLAA2.1matched B cell lines, M1_{58–66} peptide, and IL2 at the same concentrations as the starting stimulation. Specific cytotoxicity was measured starting from wk 3 using a standard ^{51}Crrelease assay (16). PBMC collected from the buffy coat of a second individual (individual B) were treated with Miltenyi CD8 microbeads (Miltenyi Biotec, Auburn, CA) to isolate CD8^{+} T cells. The purity of isolation exceeded 95%. The CD8 cells were stimulated with T2 cells (defective for TAP1 and TAP2) that had been pulsed with M1_{58–66} peptide (1 μM) for 24 h and washed and irradiated (3000 rad) before use. CD8^{+} cells were cultured at a concentration of 0.25 × 10^{6} cells/ml in the presence of 0.5 × 10^{4} Agpulsed irradiated T2 cells, recombinant IL2 (10 U/ml), and 14% supernatant from the IL2producing cell line MLA 144 (17). Cells were restimulated weekly under the same conditions as the initial stimulation.
RNA isolation and cDNA synthesis
Unseparated and CD8^{+} selected cells were used for RNA isolation and cDNA synthesis as described elsewhere (10).
PCR product cloning, sequencing, and colony counting
cDNA synthesized from individual A was PCR amplified with a BV17 familyspecific primer containing a PstI restriction site and a Cβspecific primer containing a BamHI restriction site. Amplification products were cloned into the corresponding sites of the plasmid vector pGEM3Zf^{+}10), a dilution of the cDNA was used for the cloning PCR, and the cycle number was below saturation for the PCR to ensure elongation from the primer at the final round, thus guarding against heteroduplex formation.
Sequencing used the Taq
Taq and/or reverse transcriptase.Statistical analysis and modeling
Clonotype distribution.
To display a clonotype distribution, frequencies of each clonotype in each culture are plotted in descending order.
Rankfrequency summary.
To analyze a clonotype distribution, we used a rankfrequency summary, an approach that assigns each clonotype a rank based on its frequency (colony count). Rank 1 consists of clonotypes observed as single copies, rank 2 consists of those observed twice, etc. The grouping of clonotypes with similar frequencies into ranks permits further analysis of the clonotype distribution data as the rank variable can be enumerated. If the rankfrequency summaries show extensive representation of low ranks with a rapid concave decay toward a few clonotypes in the higher ranks, it can be described by a power law equation, y = a/x ^{b}, where x is the rank and y is the frequency for a given rank.
Loglog transformation.
In the simplest situation, plotting a log/log transformation of the data (log y = log a − b log x) should yield a straight line. The parameter a, derived from the intercept, indicates the frequency of observing single copy clonotypes, and parameter b, derived from the slope, describes the shape of the curve by indicating how rapidly the curve decays. To normalize a rankfrequency summary, we used a loglog transformation with the natural log base. For visualization purposes we used a base chosen as the maximum value of both rank and frequency to ensure the [0, 1] boundaries in the loglog plots.
Separation of a loglog rankfrequency curve into two portions.
To achieve the best fit, we separated the portion of loglog rankfrequency summaries with a rapid decay (which represents clonotypes with low frequencies such as singletons, doublets, etc.) from the portion where a few clonotypes were represented by a large number of copies. We assumed that this separation occurs at some critical point, x_{c}, which corresponds to the rank the frequency of which is the reciprocal of the number of different clonotypes, e.g., x_{c} = exp((log N + log a)/b), where N is the number of unique clonotypes.
Estimation of parameters.
The estimates of a slope and an intercept for the first portion of loglog rankfrequency plots were obtained iteratively using a Monte Carlo approach. The estimation starts with an initial value of a_{0} = y_{1}, b_{0} = 1 − a_{0}, x_{c0} = exp((log N + log a_{0})/b_{0}). The estimation progresses by allowing for a random walk for the a parameter with a small increment of 0.0001. Then, at each iteration, j, for each rank, i, we assign new values for a b parameter. For the first portion of the curve, where x_{i} ≤ x_{cj}, b_{j} is the average of the rate of decay, with b_{j} = Σ(log (a_{j}/y_{i})/log x_{i})/M, where M is the number of ranks, and x_{cj} is the current value of x_{c} at the j iteration. For the second portion, when x_{i} > x_{cj}, b_{j} = log(a_{j} N)/log x. At each iteration, for a given rank, i, we calculated the difference (residual) between the observed frequency, y_{i}, and expected frequency, y_{i} ∗. The expected frequency was calculated as follows: y_{i} ∗ = a _{j}/x_{i} ^{bj}. We then minimize the sum of squared residuals, R = Σ(y_{i} − y_{i} ∗)^{2}. The estimates of the a and b parameters, at which R achieves its minimum, were reported in the text and used for plotting superimposed predicted curves on rankfrequency summaries and loglog plots.
To assess the error associated with estimating the parameter values, we performed a bootstrapping procedure that consisted of the following steps: 1) We generated a synthetic data set by expanding the original data set according to our proposed model. 2) A random sample was chosen from this data set. The sample size corresponded to the same number of colonies that defined the original data set. The parameters a and b for this sample were estimated as described above. 3) Repeating this random sampling and parameter estimation procedure 1000 times provided us with a distribution of parameter values from which the SD for the parameters was estimated.
All analysis was performed using SPLUS statistical software (Insightful, Seattle, WA).
Results
Analysys of the memory CD8^{+} T cell repertoire in the M1_{58–66} response
To analyze the memory CD8^{+} T cell repertoire in the M1_{58–66} response, triplicate cultures were generated from the peripheral mononuclear cells of individual A. The cultures showed cytotoxicity specific for the Ag and HLAA2 by the third in vitro stimulation (10). At this point, T cells expressing the BV17 were the predominant species in the cultures. The unit of measure of the repertoire was the clonotype, represented by the unique DNA sequence that encodes the CDR3. The clonotypes present in each culture were identified by the BV17specific PCR of the CDR3, bacterial cloning of the amplified CDR3, and DNA sequencing. The number of bacterial colonies representing identical clonotypes was counted and used as a measure of clonotype frequency. The PCR was performed using a dilution of the cDNA to ensure that no species would be saturated. A strong correlation between the clonotype frequency, as counted by bacterial colonies, and relative frequency, based on hybridization with clonotype and V regionspecific probes (10), supports the use of the bacterial counting procedure.
The clonotype distribution in the recall repertoire of individual A
The clonotype distribution provides important insights into the origin and nature of the memory repertoire. Models in which avidity drives repertoires to focus would yield a few expanded clonotypes. Alternatively, models in which avidity based expansion is not as important would result in a larger number of less expanded clonotypes. Analysis of the three cultures from individual A identified 501 bacterial colonies containing TCR inserts, and these were accounted for by 141 unique clonotypes (data available at www.bloodctrwise.org/research/faculty/Gorski/gorskinsf.html.). When the frequency of each clonotype in each culture is plotted in descending order (Fig. 1⇓, A–C), a striking characteristic is the extensive tail of single copy clonotypes. A large part of the single copy tail in each culture represents clonotypes unique to that culture alone (Fig. 1⇓, filled bars). All three cultures showed this particular clonotype distribution. As expected, combining the data also showed a similar clonotype distribution profile (Fig. 1⇓D).
The rankfrequency distribution of the clonotypes can best be described as composed of two components, the first of which can be fit to a power law
The data shown in Fig. 1⇑ describe the number of times each clonotype was observed. To further analyze the frequency distribution of the clonotypes, we assigned each clonotype a rank based on its frequency (colony count). Grouping of clonotypes with similar frequencies into ranks allows manipulation of the distribution data, as the rank variable can be enumerated. There is an underlying assumption that clonotypes with equal ranking are related. In this case the relation is that they have expanded equally in response to Ag. The relative frequency of clonotypes for each rank is plotted in increasing rank order (Fig. 2⇓). This shows extensive representation of low ranks with a rapid concave decay toward a few clonotypes in the higher ranks. The repertoire distribution can be best described by treating it as composed of two components. Separation of the curve into two portions at a critical point, x_{c}, allows for a good fit of the first portion to a power law that takes the following form: frequency = a/rank ^{b}. This first portion includes the ranks represented by a high number of clonotypes, i.e., the extensive single copy tails from Fig. 1⇑. Applying a power law fit to the first portion provides the following estimates of the parameters for the three cultures, respectively: a = (0.63 ± 0.05, 0.60 ± 0.06, 0.68 ± 0.05); b = (1.77 ± 0.15, 1.64 ± 0.23, 1.82 ± 0.13). For all three samples the estimates are very reproducible (a = 0.64 ± 0.04; b = 1.74 ± 0.09) and similar to the estimates for the combined data set as follows: a = 0.60 ± 0.03, b = 1.67 ± 0.07 (Fig. 2⇓, inset). For all three samples and the combined data set, the power law describes the distribution of the clonotypes in the repertoire. It does so better than other possible descriptions because of the interpretability of the parameters a and b. This is in contrast to other possible descriptions in which parameters such as mean and variance are hard to interpret with respect to the data set.
Distribution analysis to structural properties of the repertoire
One implication of a system in which the distribution can be described in terms of power laws is that the system should be self similar (18). The implication would be that no matter how we dissect the repertoire the distribution should be similar to that observed for the whole repertoire. The repertoire can be dissected on the basis of the structural properties of the clonotypes. VB gene usage is a structural property of a repertoire that is often measured in such a manner. Because we limited our analysis to clonotypes using the BV17 gene, determining the distribution of BV gene usage is not a possibility. Another such property is the CDR3 amino acid sequences in the four key positions (2, 3, 4, 5) that define this response. The CDR3 amino acid sequence use among all the clonotypes shows a power lawlike distribution (Fig. 3⇓A). There are 61 examples of CDR3 sequences, and the distribution of these sequences can be estimated assuming a power lawlike distribution with parameters a = 0.72 ± 0.04 and b = 1.90 ± 0.17. Another property, the usage of BJ regions, also shows a similar distribution, with a number of J regions occurring infrequently and one J region, BJ2.7, occurring very frequently (62% of all occurrences). We also analyzed the CDR3 amino acid sequence usage within the subset of clonotypes using BJ2.7. These showed a distribution similar to that for the entire repertoire (a = 0.61 ± 0.07, b = 1.51 ± 0.24, n = 22). These data indicate that structural properties of the repertoire show similar distributional characteristics to clonotype distributions in the repertoire.
The clonotype distribution of structurally defined subsets of the repertoire is similar to that of the entire repertoire
We then determined the clonotype distribution of a number of repetitive subsets based on structural characteristics. When the portion of the repertoire that uses the BJ2.7 region (n = 89) is analyzed, a power law rankfrequency relationship (Fig. 3⇑B) (a = 0.57 ± 0.04, b = 1.52 ± 0.10) is observed that is similar to that of the entire repertoire (Fig. 1⇑D). Similar distributions of clonotypes are obtained if one examines that part of the repertoire in which the CDR3 encodes IRSS (Fig. 3⇑C). This case is of interest because all IRSS clonotypes identified in this individual were also BJ2.7. Thus, these represent T cells with identical TCR βchains that have arisen from independent rearrangement events. Of the 35 clonotypes in this data set, the parameters a = 0.51 ± 0.06 and b = 1.45 ± 0.18 have values similar to those observed for the entire data set or other subsets therein. Another subset, that part of the repertoire using BJ1.2, also shows similar properties. Although the number of observations in this case is lower, the singletons again contribute a majority of the observations (50%) with a rapid decay at higher ranks. In this way the repertoire shows similarity not only in the distribution of its structural properties as described in the previous section but also in clonotype distributions in subsets defined by structural characteristics, i.e., it is self similar at a number of levels.
Extending the analysis to structural properties of the distributional subsets of the repertoire
The clonotype distribution of the repertoire can be described by ranks, with the lower ranks containing a large number of members. We asked how the structural properties of the clonotypes would distribute within a particular rank. There are 86 clonotypes in rank one, which offers a sufficient sample for analysis of the CDR3 amino acid sequences as was performed earlier on the entire repertoire. There are 51 different CDR3 sequences represented in rank 1, and they show another example of a power lawlike distribution (Fig. 3⇑D) that is similar to the one for the entire repertoire (a = 0.76 ± 0.06, b = 2.23 ± 0.23). Thus, structural properties of the repertoire, distributional properties of structurally defined subsets of the repertoire, and structural properties of distributiondefined subsets of the repertoire all can be described by a power lawlike distribution.
Temporal self similarity in the repertoire
To determine whether the distributional characteristics of the repertoire are stable, the distribution of clonotypes was measured in a culture from PBMC collected 2 years after those used for generating the triplicate cultures (Fig. 4⇓). Forty clonotypes were observed among the 175 bacterial colonies analyzed. There are a number of clonotypes in this culture that were not observed at the earlier time (Fig. 4⇓, filled bars). In this culture, the number of clonotypes observed only once (40%) is dissimilar to that observed in the three previous cultures (∼60%). The rankfrequency analysis of the distribution (Fig. 4⇓B) can be described as power lawlike, with parameters a = 0.41 ± 0.06, and b = 1.28 ± 0.16. Thus, the overall shape of the repertoire distribution is maintained in time, although the value of the parameters differ.
The complexity of the HLAA2restricted M1_{58–66} response is a general property
We wished to determine whether the complexity of this response can be observed in more than this one case. As a first test, we used single strand conformational polymorphism to analyze the CDR3 complexity of BV17 TCR from two cell lines established from two other HLAA2 individuals. The single strand conformational polymorphism gels showed a diffuse DNA smear migrating in the region of expected products (data not shown). Such diffuse bands are indicative of high sequence complexity. One of these lines (from individual B) was examined in detail by clonotype sequence analysis. Among the 93 colonies analyzed, 41 clonotypes were identified. The distribution of the clonotypes (Fig. 5⇓A) showed a similar distribution to that of the data presented in Fig. 1⇑. The rankfrequency relationship (Fig. 5⇓B) was comparable to that of the triplicates from the first data set, and the loglog plot also showed a biphasic character (Fig. 5⇓B, inset) with the complex portion being described by a power law with a = 0.53 ± 0.06, and b = 1.5 ± 0.19. These values are very similar to those obtained from the first data set.
Discussion
Our analysis of the in vitro recall response to a single viral peptide describes a complex system composed of two components. The first component represents ∼57% of the repertoire and is comprised of a large number of clonotypes whose distribution can be described as power lawlike. The second is represented by a few high frequency clonotypes that together constitute ∼43% of the T cells in the memory repertoire. This second component probably represents the T cells that are most often analyzed in immune responses, those with high initial precursor frequencies and excellent response characteristics, which make them amenable to T cell cloning or hybridoma generation. The former have not yet been described in any great detail, and this study is the first detailed report of their characteristics. The existence of such a complex repertoire is a stable phenomenon and is observed in more than one individual.
Our data would indicate that generation of a T cell repertoire should be thought of in terms of selection on populations, a mechanism applied to current theories of evolution (19). Within a population there are particular lineages defined by the unique rearrangements giving rise to the CDR3. These lineages will survive and expand based on factors relating to the affinity for the Ag as well as additional factors (expression levels of the cytokine receptors, cytokine milieu, etc). These expansions are in turn constrained by available space, in terms of the physical contact in the lymph node or at the site of effector function, as well as by other factors needed for cell proliferation. Thus, the repertoire can be viewed as a network of cells responding to the presence of the peptide Ag, but showing lateral connections (20). In addition to those of shared space and energy source, there is the possibility of Agspecific regulatory circuits with the engagement of regulatory CD4^{−} APCeffector CD8 cells. CD8 T cells have been implicated in regulatory roles (21), and some of the clonotypes we have identified may be involved in such Agspecific regulation.
The presence of a complex repertoire of T cells adds a level of optimization and robustness to immune memory that would not be present in its absence. The system is robust in that there is a buffering capacity for changes in the pathogenic peptide. As long as the peptide continues to bind the MHC and mutations in the T cell contact residues are not of a drastic nature, there is a reasonable chance that a subpopulation of the complex portion of the repertoire will be able to respond. These could then expand to fill the niches vacated by those clonotypes that have a lost or diminished capacity to respond. There have been a number of recent analyses of complex networks testing their level of absorbing damage without impairing function. Networks in which the connectivity follows a power law have been observed to be robust under such conditions (e.g., Ref. 22). Such a general characteristic of T cell memory would limit the degradation of the overall immune response until a critical point would be reached, after which collapse would follow.
The existence of a complex set of responding T cells raises the issue of how they arose and how they are related to each other. We propose that the complex repertoire reflects the original frequency of possible responders in the preimmune repertoire. The seven high frequency clonotypes are interpreted as representing those with the best response characteristics reflected in their increased expansion. The distribution of these clonotypes is not smooth, and there is only one example at a particular rank. This can be indicative of an additional level of selection on these clonotypes, perhaps implicating an alternative mode of repertoire generation.
In proposing a model to explain how power lawlike distributions arise, we need to make some assumptions about the starting naive distribution of clonotypes. For the sake of simplicity, we consider the BV17IRSSBJ2.7 clonotypes in the repertoire, as the influence of the TCR βchain is constant. First, we assume that clonotypes leave the thymus as single copies. We then make the simple assumption that the αchain and two other surface molecules (e.g., IL2R and CD28) will add sufficient or good avidity to the overall TAPC interaction. Setting sufficient as 0 and good as 1, there are eight possible BV17IRSSJ2.7 clonotype classes, those with values 0,0,0 (just sufficient) through 1,1,1 (the best) for the three receptors, respectively. This can be described as 2^{3} avidity classes. Because the sum of the interactions decides the avidity, a plot of avidity of the clonotypes demonstrates the power lawlike starting distribution of the unique IRSS clonotypes in the periphery (Fig. 6⇓A). These starting assumptions are unreasonably simple and are just presented to provide the basic idea of initial distributions of clonotypes when thought of in terms of avidity. To reflect the real situation, one might consider the contribution of the CDR3 and CDR2 of the αchain separately and provide each interaction with more avidity scores such as sufficient, good, excellent, and outstanding (0, 1, 2, and 3, respectively) for each interaction. In such a case the distribution for four avidity elements (αchain CDR3, αchain CDR2, molecule 1, and molecule 2) would be described by 4^{4} possibilities, again by a power law. A different view of the starting population can be obtained by ranking the clonotype groups on the basis of their avidity. In the simple example shown in Fig. 6⇓A, the avidity values would be from 0 to 3, with one example of 0, three examples of 1, three of 2, and one of 3. A more complex ranking, using three “receptors” with four possible avidity values, is shown in Fig. 6⇓B. There are 4^{3} (i.e., 64) possible combinations of clonotypes, and a majority of these would have avidities representing middle values. The clonotypes within these avidity rankings would have different probabilities of expansion. We show one such possible correlation of avidity with cell division as the curve in Fig. 6⇓B. This is based on an inverse relative decrease in the probability of dividing. Thus, the highest avidity clonotype is guaranteed to divide, clonotypes in the second highest rank have onehalf the chance, clonotypes in the third highest rank have onethird the chance, etc. Fig. 6⇓C shows the rankfrequency plot of the clonotype distribution after five divisions and assumes a 50% sampling efficiency. The overall shape of the distribution is similar to that observed in Fig. 2⇑. A loglog plot of the rank and rank frequency is approximately linear, with a slope of ∼1.4 (Fig. 6⇓C, inset). This can be compared with the actual distribution of VB17IRSSBJ2.7 clonotypes (Fig. 3⇑C). Thus, even this relatively simple model shows characteristics that are similar to those observed in this study. Modeling using more powerful approaches should provide a closer approximation of the observed memory repertoire. Assumptions about the number of times a clonotype divides after a single stimulation and assumptions about repertoire contraction cycles would have to be incorporated. The importance of repertoire contraction in maintaining lower frequency clonotypes has been recently modeled (23). Incorporation of the probability of missing first contact would all have to be considered. The time of first Ag contact has been shown to be important in establishing dominant responders in the immune repertoire (24).
Power laws and self similarity are expected properties of selfsimilar fractal systems. A simple test to determine whether a system is fractal is to describe it by a recursive selfsimilar rule. The textbook example of a simple fractal, the Koch curve, starts with a single line in which the middle third is substituted by the sides of an equilateral triangle of the length of the original middle third. The rule is then repeated, i.e., substitute the middle third of each line with the sides of an equilateral triangle, etc. In our case, we can take advantage of the power law exponent b to generate a recursive rule for a visual presentation of the clonotype distribution of the component of the repertoire for which b has been derived. An alternative way of fitting the curve for a power law distribution is in the form of a decreasing sequence, such as 1/1 ^{b}, 1/2 ^{b}, … 1/n ^{b}. When b > 1, as is the case in this study, this curve diverges and can be mapped using a polygonal spiral if appropriately transformed (25). The resulting curve can be thought of as sweeping out areas (squares) corresponding to clonotypes in the first rank, the second rank, etc. This is in fact similar to the representation of the “golden spiral,” which is encountered in many situations in biology where growth follows an optimum pattern of accretion of related units (26).
We displayed the rank of the clonotypefrequency relations using a polygonal spiral where the radii of the spiral reflect the relative proportions of distinct clonotypes observed at a particular rank. The radii form a sequence where the ratio of two consecutive radii is g = y_{i}/y_{i+1}, where i is the rank. We let g be proportional to (i + 1) ^{b}/b i ^{b}, and we let b be equal to 1.67. We calculate the Cartesian coordinates for the points on a spiral using the distance to the origin, r_{i}; the angle, λ_{ij}; and the reduction parameter, 1/b ^{i}: x_{ij} = (r_{i} cos(λ_{ij}))/b ^{i} and y _{ij} = (r_{i} sin(λ_{ij}))/b ^{i}, where i is the rank of j bead, j = 1, 2, …, k_{i}. Coordinates of the origin shift depending on the rank and branch of the spiral. We use this iterative approach and the polygonal spiral mapping to generate a representational model of the entire repertoire (Fig. 7⇓). For the first 10 ranks shown, the expected number of beads, k_{i}, was derived from the power law equation using parameters estimated from the actual data (a = 0.61, b = 1.67) and the actual number of observed clonotypes, 141, yielding the following sequence: 86, 28, 14, 9, 5, 4, 3, 2, 2, 1. In this representation, each segment of the spiral corresponds to a rank, and the number of beads within the segment corresponds to clonotypes in that rank. To reflect the actual clonotype distribution, the curve trifurcates at the end of the first segment of the spiral. Two spirals continue as mirror images. The first segment of each of these two spirals displays the clonotypes that appear twice, i.e., the second rank. The third spiral continues into the upper right quadrant and describes the remaining levels in a similar manner starting with rank three. The same process is repeated for the remaining portion of the distribution and then repeated again. These steps describe a selfsimilar recursive rule for generating a map of the repertoire. If one subtracts the first segment and the first two mirror spirals, the remaining portion of the model is an identical image of the entire model, providing a graphic demonstration of the self similarity of the repertoire. This graphic mapping of the clonotypes indicates that the repertoire can be described as a fractal.
There are a number of implications that follow from the description of the repertoire as being a selfsimilar fractal. In fractal systems, increasing the level of resolution should supply additional detail about the repertoire. We would predict that with a more extensive analysis all BJ regions would be used, and the use would continue to be described by a power law. Additional clonotypes would be added to the lower ranks and push some preexisting clonotypes to a higher rank, filling in some of the currently existing gaps and perhaps increasing the number of turns in the spiral. Fractal systems can never be counted in a strict sense (26). As the tools for measuring responding repertoires increase in sensitivity, new clonotypes would be added. This in part could explain some of the large repertoire estimates based on tetramer staining. To the extent that fractal repertoires are the norm, this would indicate that studies based on a small number of clones may be very poor at revealing the details of the system. However, after a certain depth of analysis a general description of the system is available with the proviso that additional detail can be obtained by studying the system further.
The selfsimilar distribution of clonotypes in the various repertoire subsets analyzed in this study indicates that the characteristics defining the subset did not isolate crucial parameters. For example, if the nature of the CDR3 amino acid sequence was important to the response, then the critical sequence would have been over represented in the higher ranks, and we would not have observed a power lawlike distribution when amino acid sequence use was analyzed. Thus, the nature of the J region, the CDR3 length, or the exact CDR3 amino acid sequence are not limiting parameters in the response but rather function within the context of other parameters. Thus, within the context of a particular J region, the CDR3 length may be important, etc. Those clonotypes with an identical TCR βchain (i.e., those using BJ2.7 and encoding IRSS) are selected on further parameters as modeled above.
The data presented in this study indicate that the human immune system can maintain a complex Agspecific CD8^{+} memory repertoire. We propose that it is generated in this manner to optimize the recognition of a particular antigenic epitope. A complex repertoire, in which selection for the highest possible avidity for the currently presented Ag is only one of the parameters, can provide a highly specific response and also improve the chances of recognizing antigenic shifts and optimize the coverage of a complex array of Ags.
Acknowledgments
We acknowledge the capable assistance of Jeminah Pagel and Kristin Ammon. We thank Dr. Raymond Welsh for reading the manuscript and a reviewer for suggesting the use of a bootstrapping procedure for assessing the errors.
Footnotes

↵1 This work was supported by National Institutes of Health Grants PO1HL44612 (to J.G.), PO1AI49320 (to L.K.S.), and The Cancer Center of the Medical College of Wisconsin.

↵2 Y.N.N. and E.N.N. contributed equally to this work.

↵3 Address correspondence and reprint requests to Dr. Yuri N. Naumov, Department of Pathology, Room S2122, 55 Lake Avenue North, University of Massachusetts Medical School, Worchester, MA 01655.

↵4 Abbreviation used in this paper: CDR3, third complementaritydetermining region.
 Received August 8, 2002.
 Accepted February 6, 2003.
 Copyright © 2003 by The American Association of Immunologists