Abstract
Next-generation sequencing (NGS) technology has advanced the study of the human TCR repertoire, which is essential for adaptive immunity. To decipher the complexity of the TCR repertoire, we developed TCRklass, an integrated pipeline built on a K-string–based algorithm that substantially improves accuracy and performance over existing tools. We tested TCRklass on manually curated short-read datasets and on in silico datasets, and it showed higher precision and recall in CDR3 identification. Applied to large datasets from two human and three mouse TCR repertoires, TCRklass demonstrated more reliable CDR3 identification and much less biased V/J profiling. These are the two components that contribute to the diversity of the repertoire. Because of sequencing costs, short paired-end reads generated by NGS are, and will remain, the main source of data. We believe that TCRklass is a useful and reliable toolkit for TCR repertoire analysis.
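The abstract does not describe how the K-string algorithm works. The sketch below is only a rough illustration. It assumes that a K-string is a length-k substring (a k-mer) and that a read is assigned to the reference V segment with which it shares the most K-strings. The voting scheme, the function names (`kmers`, `build_index`, `assign_segment`), and the toy reference sequences are all hypothetical. None of them should be read as TCRklass's actual method.

```python
from collections import Counter
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings (assumed 'K-strings') of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(refs: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each K-string to the names of the reference segments containing it."""
    index: Dict[str, Set[str]] = {}
    for name, seq in refs.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index


def assign_segment(read: str, index: Dict[str, Set[str]], k: int) -> Optional[str]:
    """Hypothetical voting rule: each shared K-string casts one vote for every
    reference that contains it; return the top-voted reference, or None."""
    votes: Counter = Counter()
    for km in kmers(read, k):
        for name in index.get(km, ()):
            votes[name] += 1
    if not votes:
        return None
    # Break ties deterministically by name (set iteration order is not stable).
    return min(votes, key=lambda n: (-votes[n], n))


# Toy, made-up reference segments; not real TRBV gene sequences.
REFS = {
    "TRBV_A": "ATGCGTACGTTAGC",
    "TRBV_B": "TTGACCGGTAACGT",
}
K = 5
INDEX = build_index(REFS, K)

if __name__ == "__main__":
    print(assign_segment("GCGTACGTTA", INDEX, K))  # TRBV_A
```

This set-based vote ignores where each K-string occurs in the read. A practical tool would also need to handle sequencing errors, K-string order, and the reads that span the V-(D)-J junction, where the CDR3 lies.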
Footnotes
1 X.Y. and D.L. should be regarded as joint first authors.
This work was supported by National Basic Research Program of China (“973” project) Grants 2013CB531500 and 2015CB554204.
The online version of this article contains supplemental material.
- Received May 9, 2014.
- Accepted October 16, 2014.
- Copyright © 2014 by The American Association of Immunologists, Inc.