Abstract
Chemokine receptors CXCR4 and CCR5 regulate WBC trafficking and are engaged by the HIV-1 envelope glycoprotein gp120 during infection. We combine a selection of human CXCR4 and CCR5 libraries comprising nearly all of ∼7000 single amino acid substitutions with deep sequencing to define sequence-activity landscapes for surface expression and ligand interactions. After consideration of sequence constraints for surface expression, known interaction sites with HIV-1–blocking Abs were appropriately identified as conserved residues following library sorting for Ab binding, validating the use of deep mutational scanning to map functional interaction sites in G protein–coupled receptors. Chemokine CXCL12 was found to interact with residues extending asymmetrically into the CXCR4 ligand-binding cavity, similar to the binding surface of CXCR4 recognized by an antagonistic viral chemokine previously observed crystallographically. CXCR4 mutations distal from the chemokine binding site were identified that enhance chemokine recognition. This included disruptive mutations in the G protein–coupling site that diminished calcium mobilization, as well as conservative mutations to a membrane-exposed site (CXCR4 residues H79^2.45 and W161^4.50) that increased ligand binding without loss of signaling. Compared with CXCR4–CXCL12 interactions, CCR5 residues conserved for gp120 (HIV-1 BaL strain) interactions map to a more expansive surface, mimicking how the cognate chemokine CCL5 makes contacts across the entire CCR5 binding cavity. Acidic substitutions in the CCR5 N terminus and extracellular loops enhanced gp120 binding. This study demonstrates how comprehensive mutational scanning can define functional interaction sites on receptors, and novel mutations that enhance receptor activities can be found simultaneously.
This article is featured in In This Issue, p.3665
Introduction
Mutational analysis of a protein sequence is classically a “small data” problem. A limited number of mutants are made at sites of suspected importance, and the activity of each variant is individually tested. However, by combining directed evolution and deep sequencing with saturation mutagenesis to create unbiased, diverse libraries, it is now possible to track the activities of thousands of sequence variants in a single experiment (1–5). In this approach, known as deep mutational scanning, mutations that enhance protein activity are enriched following an appropriate selection or screen, whereas deleterious mutations are depleted; the calculated enrichment ratio for each sequence variant is a proxy for relative protein activity. A comprehensive sequence-activity landscape for the protein is thereby determined.
Most deep mutational scans have been applied to proteins expressed in yeast, phages, or bacteria because of well-practiced methods for directed evolution in these hosts. Directed evolution of mammalian transmembrane (TM) proteins, which often are not functionally expressed in microbial hosts, has made slower progress. For example, only a small number of G protein–coupled receptors (GPCRs), including the neurotensin, tachykinin NK1, κ-type opioid, and α-adrenergic receptors, have been evolved in bacteria and/or yeast, mostly for the purpose of generating variants with enhanced expression for structure determination (6–9). This strategy is unsuitable for many GPCRs, which are often not functionally expressed in bacteria or yeast, have critical posttranslational modifications, or interact with large protein ligands that cannot readily cross the bacterial outer membrane or yeast cell wall.
The class A GPCRs CCR5 and CXCR4 recognize small chemoattractant proteins to regulate cell trafficking, especially of WBCs during development and inflammation (10). CXCR4 binds the chemokine CXCL12 to direct homing of hematopoietic stem cells to the bone marrow and is overexpressed on cancers where it can drive metastasis, tumor growth, and therapeutic resistance. CCR5 promiscuously interacts with multiple inflammatory chemokines, notably CCL3, CCL4, and CCL5. Both CCR5 and CXCR4 are also engaged by the HIV-1 envelope glycoprotein Env during infection. HIV-1 Env forms a homotrimeric complex that is cleaved by host proteases during maturation into extracellular gp120 and membrane-tethered gp41 subunits (11, 12). The gp120 subunit binds the primary host receptor CD4, inducing a conformational change that exposes a binding site for the secondary coreceptor (11, 13, 14), CCR5 or CXCR4 (15–17). In nearly all cases, initial infection is by R5 strains that bind CCR5, whereas X4 or dual-tropic R5X4 strains emerge during infection and are associated with accelerated patient decline to AIDS (18, 19). Once the coreceptor is bound, fusogenic regions of gp41 mediate membrane fusion and viral entry into the host cell.
In this study, we use deep mutational scanning to define the sequence-activity landscapes of CXCR4 and CCR5, evolving single-site saturation mutagenesis (SSM) libraries for surface expression and binding interactions to protein ligands, including conformation-specific Abs that block HIV-1 infection, the chemokine CXCL12, and an R5 gp120 subunit bound to CD4. Discernable features from the landscapes include TM regions that are intolerant of polar substitutions, extracellular loops that are often intolerant of mutations to cysteine, and partially overlapping conserved extracellular surfaces for ligand recognition. The sequence-activity landscapes confirm many known details of GPCR structure and chemokine receptor interactions, whereas new mutations are discovered that enhance receptor–ligand interactions and provide insight into potential allosteric mechanisms.
Materials and Methods
Tissue culture
Human Expi293F cells (Life Technologies) were cultured in Expi293 Expression Medium (Life Technologies) at 8% CO2, 37°C, and 125 rpm. A CXCR4-knockout cell line was generated with Cas9/CRISPR-based genome editing. Expi293F cells were cotransfected with plasmids encoding human codon-optimized Streptococcus pyogenes Cas9 (20) and guide RNA (20) with targeting sequence 5′-ACTTCAGATAACTACACCG-3′ for the second CXCR4 exon. Endogenous CXCR4 was stained with fluorescent 12G5 Ab (BD Biosciences), and negative cells were collected by sorting for two rounds on a BD FACSAria II at the University of Illinois Urbana-Champaign (UIUC) Roy J. Carver Biotechnology Center. The culture was periodically retested to ensure the CXCR4− phenotype was maintained.
Cells were transfected with 500 ng plasmid DNA/ml of 2 × 10^6 cells with ExpiFectamine (Life Technologies) unless stated otherwise. Transfected cells were analyzed or sorted 24–26 h posttransfection.
Library generation
Human CCR5 and CXCR4 sequences were constructed by oligo assembly and from gBlocks (Integrated DNA Technologies) and cloned into pCEP4 (Invitrogen). Each receptor was fused to an N-terminal hemagglutinin signal peptide, c-myc tag, and spacer upstream of receptor residues 2–352. SSM libraries were generated by overlap extension PCR (21). The CXCR4 SSM library covered 6995 out of 7020 possible single amino acid mutations (median of 124 reads per sequence variant), based on a minimum of 10 reads in the deep sequenced library, and was transfected into CXCR4-knockout cells for sorting or analysis. An original CCR5 SSM library covered 6895 out of 7020 possible single amino acid mutations (median of 173 reads). This CCR5 library was transfected into wild type (WT) Expi293F cells and was sorted for surface expression after staining with anti-myc FITC or for APC-conjugated 2D7 Ab binding. Because the original CCR5 library was missing many mutations in extracellular loop 2, an important region for ligand interactions, plasmids encoding the missing mutations were added to generate a second CCR5 SSM library, which covered 6982 single amino acid substitutions (median of 183 reads). This second CCR5 library was transfected into CXCR4-knockout cells and was sorted for surface expression after staining with anti-myc Alexa 647 or for gp120–CD4 binding.
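The coverage figures above can be illustrated with a short calculation. The following is a minimal sketch, not the analysis code used in this study: the function name, the input format, and the choice to take the median over covered variants only are assumptions made for illustration. Counting 20 alternatives at each of 351 positions reproduces the stated total of 7020.

```python
from statistics import median

def library_coverage(read_counts, n_positions=351, alternatives_per_position=20,
                     min_reads=10):
    """Summarize coverage of a single-site saturation mutagenesis library.

    read_counts maps (position, alternative) -> reads in the naive library.
    Hypothetical input format; the paper does not specify one.
    """
    possible = n_positions * alternatives_per_position  # 351 x 20 = 7020
    # A variant counts as covered only if it meets the read threshold.
    covered = [n for n in read_counts.values() if n >= min_reads]
    # Assumption: the median is taken over covered variants only.
    median_reads = median(covered) if covered else 0
    return len(covered), possible, median_reads
```

Calling `library_coverage(counts)` on a dictionary of naive-library read counts returns the number of covered variants, the number of possible variants, and the median read depth.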
The CCR5 combinatorial library for enhanced gp120–CD4 binding was synthesized by oligo assembly and cloned into pCEP4.
Libraries were transfected into Expi293F cells such that, on average, no more than one coding sequence was acquired per cell. Cultures at a density of 2 × 10^6 cells/ml were transfected with 1 ng/ml library DNA and 2 μg/ml pUC18 as carrier DNA. Two hours posttransfection, the medium was replaced.
Sorting libraries for surface expression
Transfected cells were washed with cold PBS supplemented with 0.2% BSA (PBS-BSA), stained for 40 min on ice with excess anti-myc FITC (rabbit polyclonal, 1/40 dilution; Immunology Consultants Laboratory) or anti-myc Alexa 647 (clone 9B11, 1/133 dilution; Cell Signaling Technology), washed twice, and sorted on a BD FACSAria II at the Roy J. Carver Biotechnology Center. Gating conditions are described in Supplemental Table I. Sorted cells were frozen at −80°C. To maintain cell viability and mRNA quality during the experiment, samples were sorted for a maximum of 4 h; to collect a greater number of cells, libraries were prepared again, and frozen sorted cell pellets from multiple days’ experiments were pooled during RNA extraction.
Sorting libraries for binding to conformation-specific Abs
Transfected cells were washed with PBS-BSA and stained with anti-myc FITC to detect surface receptor expression as described above. The CCR5 and CXCR4 cell libraries were simultaneously incubated with APC-conjugated anti-CCR5 clone 2D7 (1.5 μg/ml; BD Biosciences) or anti-CXCR4 clone 12G5 (0.15 μg/ml; BD Biosciences), respectively. Ab concentrations were chosen to be similar to published dissociation constants (22, 23). Cells were washed twice with cold PBS-BSA and sorted as described in Supplemental Table I.
Sorting a CXCR4 SSM library for CXCL12 binding
A synthetic gene encoding human CXCL12 fused at the C terminus to superfolder GFP (sfGFP) via a 16-residue Gly/Ser-linker was cloned into pcDNA3.1 (Invitrogen). The plasmid was transfected into CXCR4-knockout Expi293F cells using ExpiFectamine (Invitrogen) as per the manufacturer’s directions, and culture supernatant was harvested 5 d posttransfection. The supernatant was filtered and passed over a HiTrap SP HP cation exchange column (GE Healthcare Life Sciences). The column was washed with 4 column volumes (CV) of 20 mM Tris (pH 7.5)/40 mM NaCl and 2 CV of 20 mM Tris (pH 7.5)/100 mM NaCl, and CXCL12–sfGFP was eluted with 2 CV of 20 mM Tris (pH 7.5)/1.2 M NaCl. The protein was further purified on a Superdex 75 10/300 gel filtration column (GE Healthcare Life Sciences) with running buffer PBS. The protein was concentrated with a centrifugal filtration device, and protein concentration was determined based on sfGFP absorbance at 485 nm. CXCL12–sfGFP was mixed with an equal volume of PBS-BSA, and aliquots were flash frozen and stored at −80°C.
CXCR4-knockout Expi293F cells were transfected with the CXCR4 SSM library. Cells were washed with cold PBS-BSA and incubated on ice for 40 min with excess anti-myc Alexa 647 (clone 9B11, 1/133 dilution; Cell Signaling Technology) and 5 μM CXCL12–sfGFP. Cells were washed twice with PBS-BSA and sorted on a BD FACSAria II (gating conditions are in Supplemental Table I).
Sorting CCR5 libraries for gp120–CD4 binding
A human codon-optimized synthetic gene was constructed from gBlocks (Integrated DNA Technologies) encoding, from the N to C terminus: a CD5 leader sequence (MPMGSLQPLATLYLLGMLVASVLA) for enhanced expression (24); HIV-1BaL gp120 aa 31–511 (GenBank AAA44191.1: https://www.ncbi.nlm.nih.gov/protein/AAA44191.1), with numbering based on the HXB2 reference strain; a 21-residue Gly/Ser-rich linker; CD4 domains D1–D2 (aa K26–S209); an avi tag; and a FLAG tag. The gene was cloned into pCEP4 and transfected into CXCR4-knockout Expi293F cells. Supernatant was harvested and filtered 4 d posttransfection, and aliquots were stored at −20°C.
CXCR4-knockout Expi293F cells transfected with a CCR5 SSM library were washed with cold PBS-BSA and incubated for 60 min with culture supernatant from gp120–CD4-expressing cells. Cells were washed twice with PBS-BSA, stained for 20 min on ice with chicken anti-DYKDDDDK FITC (1/133 dilution; Immunology Consultants Laboratory) and anti-myc Alexa 647 (clone 9B11, 1/133 dilution; Cell Signaling Technology), washed twice, and sorted. Gating conditions for SSM libraries are listed in Supplemental Table I. For the CCR5 combinatorial library that combined mutations predicted to enhance gp120–CD4 binding, the sorted cells were gated for the top 10% of events in the FITC channel, after gating for the top 50% of receptor-positive cells in the Alexa 647 channel.
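The two-step gate described for the combinatorial library can be expressed as a sequential selection on fluorescence values. The sketch below is illustrative only: sorting was performed on the instrument, and the function name, the event representation, and the rank-based handling of the gate boundaries are assumptions.

```python
def sequential_gate(events, expr_fraction=0.5, bind_fraction=0.1):
    """Apply an expression gate, then a binding gate within it.

    events: list of (alexa647, fitc) intensities for receptor-positive cells
    (hypothetical representation). Returns the events passing both gates.
    """
    # Step 1: keep the top fraction of events by Alexa 647 (receptor expression).
    by_expr = sorted(events, key=lambda e: e[0], reverse=True)
    expr_gate = by_expr[: max(1, int(len(by_expr) * expr_fraction))]
    # Step 2: within that gate, keep the top fraction by FITC (gp120-CD4 binding).
    by_bind = sorted(expr_gate, key=lambda e: e[1], reverse=True)
    return by_bind[: max(1, int(len(by_bind) * bind_fraction))]
```

With the default fractions, 5% of the receptor-positive events are collected in total (10% of the upper 50%).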
Deep sequencing
Total RNA was extracted from sorted cells, and first-strand cDNA was synthesized with a high-fidelity reverse transcriptase (AccuScript; Agilent Technologies) primed with the EBV-Reverse sequencing primer. The cDNA was PCR-amplified in two rounds. In the first round (18 thermocycles), primer overhangs added complementary sequences for the Illumina sequencing primers. In the second round (15 thermocycles), primer overhangs added barcodes and adaptor sequences for annealing to the Illumina flow cell. Thermocycling was kept to a minimum to reduce the introduction of PCR biases and errors. Both CXCR4 and CCR5 coding sequences were amplified as three overlapping fragments to achieve full sequencing coverage. DNA was sequenced at the UIUC Roy J. Carver Biotechnology Center on an Illumina MiSeq v3 (2 × 300nt kit) or HiSeq 2500 (2 × 250nt kit).
Deep-sequencing data were analyzed with Enrich (25). Log2 enrichment ratios of mutants were normalized by subtracting the enrichment of the WT sequence. Enrich commands to recreate our analyses and raw data are available in the data deposition with the National Center for Biotechnology Information’s Gene Expression Omnibus (26) under series accession number GSE100368 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE100368).
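The normalization described above can be written out explicitly. The following is a minimal sketch of the calculation under stated assumptions and is not the Enrich implementation: the function name, the dictionary inputs, and the treatment of variants absent from the sorted population (assigned negative infinity) are choices made here for illustration.

```python
import math

def normalized_log2_enrichment(naive_counts, sorted_counts, wt="WT"):
    """Log2 enrichment of each variant, normalized by subtracting the WT value.

    naive_counts / sorted_counts map variant name -> read count
    (hypothetical input format). WT must be present in both.
    """
    naive_total = sum(naive_counts.values())
    sorted_total = sum(sorted_counts.values())

    def raw(variant):
        f_naive = naive_counts[variant] / naive_total
        f_sorted = sorted_counts.get(variant, 0) / sorted_total
        # Variants lost from the sorted population are maximally depleted.
        return math.log2(f_sorted / f_naive) if f_sorted > 0 else float("-inf")

    wt_score = raw(wt)
    return {v: raw(v) - wt_score
            for v, n in naive_counts.items() if v != wt and n > 0}
```

After normalization, a variant that behaves like WT scores 0, a variant enriched twofold relative to WT scores +1, and a variant depleted twofold scores −1.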
Calcium mobilization
Transfected cells were washed with assay buffer (PBS-BSA and 1 mM CaCl2) at ambient temperature (22–23°C). Cells were suspended at a density of 4 × 10^6 cells/ml in assay buffer containing 2 μM Fluo-4 acetoxymethyl ester (Life Technologies) and 1/250 anti-myc Alexa 647 (clone 9B11; Cell Signaling Technology) and stained for 30 min in the dark. Cells were washed, resuspended in assay buffer at 1.2 × 10^6 cells/ml, and analyzed on a BD Accuri C6 Cytometer. Cells were stimulated at 60 s run time with injection of CXCL12 into the sample tube while spinning to mix (concentrated CXCL12 stocks prepared in assay buffer; PeproTech) and at 180 s with ionomycin (4 μM final concentration; Life Technologies).
Purification of BaL gp120–CD4–FLAG
BaL gp120–CD4–FLAG plasmid (500 ng/ml of cells) was transfected into Expi293F cells at 2 × 10^6 cells/ml using ExpiFectamine (Life Technologies). Transfection enhancers (Life Technologies) were added 20 h posttransfection, and the culture was centrifuged for 20 min at 4000 × g after 200 h. Protease inhibitors (SIGMAFAST inhibitor mixture; Sigma-Aldrich) dissolved in PBS were added to the supernatant. The supernatant was incubated with anti-FLAG M2 affinity gel (Sigma-Aldrich) for 4 h at 4°C. The resin was collected by centrifugation at 1000 × g for 5 min, washed with PBS, and transferred to a gravity column for elution. BaL gp120–CD4–FLAG was eluted with five CV of 200 μg/ml DYKDDDDK peptide (ApexBio) in PBS. Eluted protein was concentrated using a Vivaspin device (MWCO 100 kDa; Sartorius) and further purified on a Superose 6 Increase 10/300 GL (GE Healthcare Life Sciences) size exclusion chromatography column equilibrated with PBS. Peak fractions were pooled and concentrated, and aliquots were snap frozen in liquid nitrogen before storage at −80°C.
Bimolecular fluorescence complementation
As outlined above, CCR5 and CXCR4 were cloned into pCEP4 with N-terminal signal peptides from influenza hemagglutinin, followed by a c-myc or FLAG epitope tag, a six-residue spacer, and receptor residues 2–352. Human metabotropic glutamate receptor subtype 3 (mGluR3) was cloned into pCEP4 with the signal peptide of HLA class I A-2 α-chain, a c-myc or FLAG epitope tag, and a three-residue linker to receptor residues R30–L879. At the cytosolic C termini of the receptors, immediately prior to the stop codons, were fused N- (VN: aa V1–A154) or C-terminal (VC: aa D155–K238) residues of Venus (mutant I152L), a yellow fluorescent protein variant with improved signal-to-noise for the detection of protein interactions by bimolecular fluorescence complementation (BiFC) (27, 28). Pairs of VN- and VC-fused receptors (300 ng of each plasmid per milliliter of 2 × 10^6 cells) were cotransfected into CXCR4-knockout Expi293F cells using ExpiFectamine, and cells were analyzed by flow cytometry 22–24 h posttransfection. Washed cells were stained with anti-FLAG Cy3 (clone M2, 1/200 dilution; Sigma-Aldrich) and anti-myc Alexa 647 (clone 9B11, 1/200 dilution; Cell Signaling Technology), washed twice, and analyzed on a BD LSR II flow cytometer with appropriate three-color compensation.
Cell membrane fusion assay
A human codon-optimized sequence encoding Env from the X4 HIV-1 strain MN (GenBank Accession No. P05877: https://www.ncbi.nlm.nih.gov/protein/P05877) was cloned from gBlocks (Integrated DNA Technologies) into the NheI-XhoI sites of pCEP4. EnvMN residues 31–856 (HXB2 reference numbering) were placed immediately downstream of a CD5 leader peptide. Expi293F CXCR4-knockout cells (2 × 10^6 cells in 1 ml) were transfected with 50 ng of pCMV3–CD4 (HG10400-UT; Sino Biological) plus 450 ng pCEP4–myc–CXCR4 (500 ng total; for controls, plasmid DNA was replaced with empty vector) or with 500 ng pCEP4–EnvMN using ExpiFectamine. Five hours posttransfection, 0.2 × 10^6 receptor-expressing cells were mixed with 0.2 × 10^6 EnvMN-expressing cells in each well of a poly-l-lysine–coated 12-well tray, and the final volume was prepared to 800 μl using Expi293 Expression Medium. Wells were previously coated by incubating with 0.01% poly-l-lysine (Sigma-Aldrich) for 30 min at room temperature and washing with water and Expi293 Expression Medium prior to adding cells. The cells were incubated for 20 h without agitation at 37°C and 8% CO2 to allow them to adhere and contact each other. The adhered cells were then washed with warm PBS, treated with 0.25% trypsin-2.21 mM EDTA for 15 min at 37°C to detach them from the plate surface, washed with cold PBS-BSA, and analyzed on a BD LSR II flow cytometer with the forward light scatter/side scatter voltages set low to detect large cell events. The positive gate was set at <1% of untransfected cells for high forward light scatter/side scatter events.
Data availability
Analyzed data in the form of Excel spreadsheets and raw deep-sequencing data are deposited in National Center for Biotechnology Information’s Gene Expression Omnibus (26) under series accession number GSE100368 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE100368).
Plasmid availability
Plasmids have been deposited with Addgene under accession numbers 98942–98968 (https://www.addgene.org/Erik_Procko/).
Results
Linking genotype with phenotype in transiently transfected human cell culture
Deep mutational scanning demands one or a small number of sequence variants per cell, ensuring a close link between cell phenotype and a specific protein-encoding nucleic acid. Tight genotype–phenotype linkage in mammalian cell libraries has been achieved in several different ways: by the generation of stable libraries using engineered lines harboring unique integration sites (29), transduction with viral vectors at very low multiplicity-of-infection (30), or cloning the sequence variants on virus-derived episomal plasmids (5, 31–33) that maintain a small number of copies per cell, such that with antibiotic selection and passaging, plasmid diversity within an individual cell is lost over time (5). Each of these approaches has its own set of limitations, including potential drift of the population with continual passaging or the need to establish stable lines with antibiotic selection over time. In this study, we optimized transient transfections using carrier DNA to achieve expression of no more than one sequence variant per cell.
CXCR4 and CCR5 were genetically tagged at the extracellular N termini with a c-myc epitope for detection of surface protein, and SSM libraries were synthesized, covering nearly every single amino acid substitution. During an evolution experiment, these SSM plasmid libraries were transfected into human Expi293F cells using a 2000-fold excess of carrier DNA. This large excess of carrier DNA diluted the coding sequences such that one cell typically expressed no more than one receptor mutant, maintaining a tight link between a single sequence and cell phenotype (Supplemental Fig. 1). We estimate that under these conditions, <20% of the receptor-positive population expresses more than one sequence variant, although this is achieved at the cost of only a few percent of cells acquiring any coding plasmid at all (Supplemental Fig. 1).
Human Expi293F cells are a suspension-derivative line of HEK 293; this simplifies preparations for flow cytometry and sorting, which require single-cell suspensions, and allows easy scalability. Endogenous CXCR4 expression was knocked out using Cas9/CRISPR-based genome editing to generate indel mutations targeting exon 2. Expi293F cells do not express CCR5, and as a consequence, the CXCR4-knockout line can be considered a “blank slate” for overexpression of receptor mutants. This again ensures that the cell phenotype is unambiguously linked to a single coding sequence.
In evolution experiments, cells transfected with the SSM libraries were evolved by one round of FACS for receptor surface expression or high affinity interactions with protein ligands. Typically, in vitro evolution experiments drive a diverse population of sequence variants toward convergence on a small number of highly optimum sequences using multiple rounds of selection. However, to ensure information was gathered on all mutations including neutral substitutions, changes in sequence frequencies in the libraries were deliberately tracked after just a single round of FACS screening. RNA was then extracted from the sorted cells and deep sequenced. Enrichment ratios for each mutation were calculated based on the change in frequencies between the plasmid library and transcripts in the evolved population. This assumes that transcripts in the transfected cell culture accurately capture the diversity of the plasmid DNA library. In control experiments, we found there was close agreement between the frequencies of abundant sequence variants in the plasmid libraries and transcripts in unsorted transfected cultures, although occasionally transcripts of rare sequence variants failed to be acquired by any cells (Supplemental Fig. 2).
Deep mutational scan of the class A GPCR CXCR4
A deep mutational scan of CXCR4 for surface expression (Fig. 1A) reveals general structural features. Mutations to polar residues within the hydrophobic TM helices are depleted; hydropathy-weighted averages of the log2 enrichment ratios show that regions buried within the membrane or protein core are biased toward apolar residues, whereas surfaces lining the solvent-exposed ligand-binding cavity tolerate some polar substitutions (Fig. 2A). Premature stop codons are generally depleted across the entire 7TM domain, and it is not until the cytosolic, unstructured C-terminal tail that the protein sequence has high mutational tolerance.
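A hydropathy-weighted average of this kind can be sketched as follows. The exact weighting used to produce Fig. 2 is not spelled out in the text, so the formula here (the mean of Kyte–Doolittle hydropathy multiplied by the log2 enrichment ratio, over the substitutions observed at a position) is an assumption, as is the function name. Under this formula, a position where polar substitutions are depleted scores positive, consistent with an apolar preference.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_preference(log2_by_aa):
    """Hydropathy-weighted mean of log2 enrichment ratios at one position.

    log2_by_aa maps substituted amino acid -> log2 enrichment ratio.
    Assumed weighting: hydropathy x log2 ratio, averaged over substitutions.
    Stop codons and unknown symbols are ignored.
    """
    terms = [KYTE_DOOLITTLE[aa] * ratio
             for aa, ratio in log2_by_aa.items() if aa in KYTE_DOOLITTLE]
    return sum(terms) / len(terms) if terms else 0.0
```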
Sequence-activity landscapes for CXCR4 surface expression and interactions with CXCL12 and monoclonal 12G5. (A) Expi293F cells were transfected with a CXCR4 SSM library, stained for surface CXCR4 expression, and sorted. Transcripts were deep sequenced, and frequencies were compared with the naive DNA library to calculate the enrichment of each mutation. The protein sequence is on the horizontal axis with secondary structure at top, whereas amino acid substitutions are on the vertical axis (*, stop codons). Log2 enrichment ratios (averaged from four experiments) are plotted from ≤ −3 (i.e., depleted, orange) to 0 (neutral, white) to ≥ +3 (enriched, blue). Missing mutations (<10 reads in the naive library) are black. (B) Cell libraries in (A) were sorted based on detection of an N-terminal c-myc tag using FITC- or Alexa 647–conjugated anti-myc (two replicates each). Average log2 enrichment ratios are compared. Rare mutations in the naive DNA library (10–100 reads) are red, whereas abundant mutations (>100 reads) are black. R2 is calculated for mutations with >100 reads in the library, and the trend line is shown with a diagonal gray line. Maximum depletion was set to −5. (C) Heatmap of the enrichment ratios (averaged from two experiments) when the CXCR4 SSM library was sorted for both CXCR4 surface expression and binding to Ab 12G5. The protein sequence is on the horizontal axis and amino acid substitutions are on the vertical axis. Log2 enrichment ratios are plotted from ≤ −3 (orange) to 0 (white) to ≥ +3 (blue). Missing mutations (<10 reads in the naive library) are black. (D) Agreement between replicate experiments for CXCR4–12G5 binding. Rare mutations in the naive DNA library (10–100 reads) are red. R2 is calculated for mutations with >100 reads in the library (black), and the trend line is shown with a gray line. (E) Enrichment ratios (average of two experiments) of mutations in a CXCR4 library evolved for surface expression and binding to CXCL12–sfGFP. 
The protein sequence is on the horizontal axis and amino acid substitutions are on the vertical axis. Log2 enrichment ratios are plotted from ≤ −3 (orange) to 0 (white) to ≥ +3 (blue). Missing mutations (<10 reads in the naive library) are black. (F) Agreement between replicates for CXCR4–CXCL12 binding. Rare mutations in the naive DNA library (10–100 reads) are red. R2 is calculated for mutations with >100 reads in the library (black), and the trend line is shown with a gray line. (G–I) Agreement between conservation scores following sorting of CXCR4 libraries for surface expression (G), binding to 12G5 (H) or binding to CXCL12 (I). Conservation scores are calculated by averaging the log2 enrichment ratios for all mutations at each amino acid position. High sequence conservation is associated with negative scores.
Water-exposed residues are tolerant of polar substitutions during evolution for receptor surface expression. (A) After evolution of CXCR4 for surface expression, log2 enrichment ratios for all amino acid substitutions at each position were weighted by Kyte–Doolittle hydropathy and averaged. Amino acid preferences mapped to the CXCR4 crystal structure (PDB 4RWS) are colored from negative (polar, blue) to positive (hydrophobic, yellow). (B) CCR5 (PDB 4MBS) has a similar sequence profile for surface expression.
There was weak agreement when the CXCR4 deep mutational scan was replicated, and therefore mean enrichment ratios provide only a qualitative assessment of the relative activities of each mutant (Fig. 1B). Replicate comparisons of log2 enrichment ratios for depleted variants are scattered in the negative quadrant, whereas neutral mutations cluster near the origin. Similar variation has been seen when evolving yeast or virus libraries (1, 4). Because of this high variability, we use the mutational scans as predictions, similar to a mutant screen, and in later experiments validate interesting mutations individually. However, conservation scores are reproducible (Fig. 1G–I) and define conserved regions of available sequence space. In other studies, Shannon entropy is calculated as a conservation or mutational tolerance metric (1, 21, 34). However, Shannon entropy is lowest for residues that are under selection toward a small number of amino acid identities with peak fitness, including positive selection away from the WT sequence. Instead, we calculate conservation scores by averaging the log2 enrichment ratios for all mutations at any given residue position in which negative scores indicate the sequence is constrained closer to WT.
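The distinction between the two metrics can be made concrete with a small example. This is a minimal sketch: the conservation score follows the definition given above, whereas the entropy calculation (converting log2 enrichment ratios to normalized preferences before computing entropy in bits) is one common formulation assumed here for illustration, and the function names are hypothetical.

```python
import math

def conservation_score(log2_ratios):
    """Mean log2 enrichment ratio over all mutations at one position."""
    return sum(log2_ratios) / len(log2_ratios)

def shannon_entropy(log2_ratios):
    """Entropy (bits) of preferences derived from log2 enrichment ratios.

    Assumed formulation: preference_i = 2**ratio_i / sum_j(2**ratio_j).
    """
    weights = [2.0 ** r for r in log2_ratios]
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights)

# A position under positive selection for one substitution away from WT.
gain = [3.0, 0.0, 0.0, 0.0]
# A position where every substitution is equally deleterious.
constrained = [-3.0, -3.0, -3.0, -3.0]
```

In this toy case, `gain` has lower entropy than `constrained`, yet its conservation score is positive (0.75), whereas the constrained position scores −3. The averaged score therefore separates selection away from WT from constraint toward WT, which entropy alone does not.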
Extracellular regions of CXCR4 are conserved for binding Ab 12G5
Ab 12G5 blocks X4 HIV-1 infection (35), and its interactions have previously been mapped by small-scale alanine mutagenesis to the CXCR4 N terminus and extracellular loop 2 (23); it is, therefore, an excellent test case for validating deep mutational scanning as a method for mapping GPCR interaction sites. Expi293F cells expressing the CXCR4 SSM library were sorted for both surface-expressed receptor and high binding signal to fluorescent 12G5. In the 12G5 affinity landscape (Fig. 1C, 1D), there are stricter demands on the protein sequence, and deleterious mutations extend from TM helices into extracellular loops. Of particular prominence, cysteine substitutions to residues exposed in the oxidizing, extracellular environment tend to be depleted. (Depletion of extracellular cysteines is common to all the deep mutational scans in which CXCR4 or CCR5 were evolved for ligand interactions.) Compared with the sequence-activity landscape for surface expression where premature stop codons were neutral beyond aa 303 (Fig. 1A), stop codons are deleterious for 12G5 binding farther into the CXCR4 sequence up to aa 331 (Fig. 1C). Cysteine substitutions are also depleted in this region, indicating that the cytosolic C-terminal tail is important for presenting a permissive conformation of CXCR4 for 12G5 recognition.
Residue conservation scores for CXCR4 surface expression closely correlate with conservation scores for 12G5 binding, as Ab recognition demands that the receptor be expressed and accessible at the cell surface. But there are some residues that are explicitly conserved for 12G5 recognition (Fig. 3A). To highlight these residues on the CXCR4 crystal structure, conservation scores for surface expression are subtracted from the scores for 12G5 binding, and this 12G5–expression difference score is spatially plotted on the CXCR4 structure (Fig. 3B). Critical residues for 12G5 binding localize to the N terminus and solvent-exposed tips of extracellular loop 2 and extracellular loop 3, in agreement with prior studies.
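The difference score described above is a per-residue subtraction, which can be sketched as follows. The function names and the dictionary inputs are hypothetical, and the numeric values in the usage note are invented purely to show the behavior.

```python
def difference_scores(binding, expression):
    """Binding conservation minus expression conservation, per residue.

    Both arguments map residue position -> conservation score
    (mean log2 enrichment ratio at that position).
    """
    return {pos: binding[pos] - expression[pos]
            for pos in binding if pos in expression}

def most_conserved_for_binding(binding, expression, n=10):
    """Positions with the most negative difference scores, best first."""
    diff = difference_scores(binding, expression)
    return sorted(diff, key=diff.get)[:n]
```

For example, a position scoring −2.5 for binding and −0.2 for expression has a difference score of −2.3 and ranks ahead of a position scoring −2.0 and −1.9, which is conserved mainly because it is required for surface expression.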
Conserved CXCR4 residues define the 12G5 Ab and CXCL12 chemokine interaction sites. (A) Residue conservation scores for surface expression and 12G5 binding are generally correlated, except for a subset of residues that are more highly conserved for 12G5 recognition. Residues most conserved for 12G5 binding, independent of surface expression, are labeled. (B) Conserved residues for 12G5 interactions are mapped to the CXCR4 crystal structure with thick orange tubing. Color intensity and tube thickness correspond to the 12G5–expression difference score. (C) Residue conservation scores for CXCL12 binding versus surface expression. Residues most conserved for CXCL12 interactions are labeled. (D) Important residues for CXCL12 binding (based on the CXCL12–expression difference scores) are shown with thick magenta tubing on the CXCR4 structure. (E) The CXCR4 extracellular surface is colored based on the CXCL12–expression difference score, with critical residues for CXCL12 binding in magenta. The crystallographically determined binding of chemokine vMIP-II is shown as a green tube.
CXCL12 engages the ligand-binding cavity of CXCR4 asymmetrically
Having validated that CXCR4 deep mutational scanning could map a known interaction site with a model Ab, we evolved the CXCR4 library for interactions with the receptor’s physiological ligand CXCL12/SDF1α (Fig. 1E, 1F). CXCL12 was C-terminally fused to sfGFP for fluorescence detection of binding, and cells expressing the CXCR4 SSM library were sorted both for surface receptor expression and CXCL12 binding signal. A comparison of conservation scores for CXCL12 binding versus receptor expression highlights critical residues for CXCL12 interactions (Fig. 3C), and representing the CXCL12 affinity–expression difference scores on the CXCR4 structure spatially reveals the binding site (Fig. 3D, 3E). The structure of CXCL12-bound CXCR4 has not been experimentally determined, but a crystal structure of inactive CXCR4 bound to the antagonistic viral CC chemokine vMIP-II has been solved at moderate resolution (36). Using the CXCR4–vMIP-II structure together with other data, CXCR4 in the active CXCL12-bound conformation has been previously modeled (37). In this study, we compare general trends in the deep mutational scan, both to this previously published CXCL12–CXCR4 model (Fig. 4) and to the vMIP-II–bound crystal structure. We excluded an alternative atomic model of CXCL12–CXCR4 (38), as it fails to explain the binding asymmetry starkly apparent in the sequence landscape.
Comparison of the CXCR4–CXCL12 interaction sequence landscape to an atomic model. (A–D) The deep mutational scan is compared with a structural model of CXCL12-bound CXCR4 (37). Heatmaps show the enrichment of CXCR4 mutations after evolution for CXCL12 binding. Log2 enrichment ratios are plotted from ≤ −3 (i.e., depleted, orange) to ≥ +3 (i.e., enriched, blue). Mutations missing in the library are black. Interactions between CXCR4 (colored based on average log2 enrichment ratios for CXCL12 binding, with dark magenta most conserved) and CXCL12 (green) are shown in the accompanying structural views. The panels are arranged from the base of the ligand-binding cavity (A) up toward the rim (D). CPK coloring is used for non–carbon atoms (yellow, sulfur; red, oxygen; blue, nitrogen).
In the mutational scan, N-terminal CXCR4 residues D20–F29 are important for CXCL12 interactions; disulfide C28-C274^7.25 (Ballesteros-Weinstein numbering follows the caret) forms the branch point between the unstructured N terminus and helix TM7. M24–F29 directly contact the viral chemokine N loop and β3 strand and, by homology, are anticipated to make similar contacts to CXCL12 (Fig. 4D). Substitutions of CXCR4-D20 and -E26 are depleted, and these residues likely make important electrostatic interactions with CXCL12-K56 and -R47, respectively, based on homology to vMIP-II. CXCR4-P27 is one of the most highly conserved residues in the mutational scan for CXCL12 affinity and is predicted to pack against CXCR4-F29 on one side while the opposing side makes apolar contacts within a complementary pocket of the chemokine (Fig. 4D).
CXCR4 residues extending into the ligand-binding cavity are also conserved for CXCL12 binding, although with notable asymmetry (Fig. 3D, 3E). Mutations of residues on TM1, TM2, and TM7 reduce CXCL12 binding, whereas the opposing side of the cavity has higher mutational tolerance. This is again consistent with the CXCR4–vMIP-II crystal structure, in which the vMIP-II N terminus extends with helical conformation along one side of the binding cavity (36). Whereas the CXCL12 N terminus must adopt an alternative structure within the CXCR4 binding cavity because of its shorter length, the data unequivocally demonstrate that asymmetric interactions within the binding cavity are a shared feature of both chemokines. For example, acidic CXCR4 residues D97^2.63 and E288^7.38 at the base of the binding cavity are highly conserved for CXCL12 interaction and are predicted to contact the CXCL12 α-amine and CXCL12-K1, respectively (Fig. 4A, 4B). Aromatic substitutions are tolerated at CXCR4-R30^1.24, and aromatic and aliphatic substitutions are allowed for CXCR4-H281^7.31; both these residues have been modeled to be in proximity to CXCL12-L5 (Fig. 4C). CXCR4-E277^7.27 tolerates aliphatic side chains, which may pack against the chemokine backbone and methylene groups of CXCL12-R8; this tolerance also suggests that the modeled salt bridge between CXCR4-E277^7.27 and CXCL12-R12 may not form in the real structure or may be replaced by equally favorable apolar interactions (Fig. 4C). Finally, CXCR4-D262^6.58 is conserved in the mutational scan and has been modeled to interact with CXCL12-R8 (Fig. 4C). The sequence-activity landscape is therefore in close agreement with prior modeling of CXCL12-bound CXCR4 based on the CXCR4–vMIP-II crystal structure.
Nuclear magnetic resonance structures of CXCL12 bound to N-terminal CXCR4 peptides have suggested CXCR4 residues M1–F29 wrap around the chemokine and make extensive physical contacts (38, 39). CXCR4 conservation in the deep mutational scan falls sharply upstream of residue 20, and hence there is no evidence in the data for extensive interactions between CXCL12 and the receptor N terminus. However, solvent-exposed contacts with the receptor N terminus may simply contribute less energetically than interactions buried deep within the receptor binding cavity, and because of noise in the data, the mutational scan could be biased toward mutations that exert strong phenotypic effects. Substitutions to proline residues in the CXCR4 N terminus are specifically depleted in the CXCL12 affinity landscape, whereas N-terminal proline substitutions are broadly tolerated for CXCR4 surface expression or 12G5 binding. This suggests that some type of structure at the CXCR4 N terminus is necessary for chemokine binding, but our data are ambiguous about the exact nature of the binding interactions.
CXCR4 mutations that increase CXCL12 binding
Based on crystal structures of class A GPCRs in different states, ligand binding to the extracellular cavity stabilizes conformational changes that range from shifts of central aromatic residues to large rotations and outward motions of TM5, TM7, and especially TM6. These helix shifts break contacts between TM2–TM3–TM6–TM7 at the cytoplasmic surface, exposing a G protein–coupling site for signaling (36, 40–44). We searched the CXCR4–CXCL12 interaction landscape for enriched mutations within the cytosolic half and center of the receptor, as these mutations may shift the conformational equilibrium toward high CXCL12-affinity states. Fourteen mutations were individually tested, and 13 increased CXCL12 interactions after consideration of surface expression levels (Fig. 5A–D). The mutations localize to three general regions.
CXCR4 mutations that increase CXCL12 binding and signaling. (A) Sites of selected mutations predicted to increase CXCL12 binding are shown as magenta spheres on the vMIP-II (green tube)–bound CXCR4 (gray ribbon) structure. Mutations fall into three regions: 1) a surface-exposed putative cholesterol-binding site, 2) the base of the extracellular cavity where residues allosterically couple ligand binding to G protein activation, and 3) the G protein–binding site. Control mutation V196A is in cyan. (B) CXCR4-H79 and -W161 (magenta spheres) form a flat hydrophobic surface that may interact with two cholesterols (purple sticks) based on an alignment to the cholesterol-bound structure of the β2AR (PDB 3D4S). CPK coloring is used for non–carbon atoms. (C) CXCL12 binding signal (5 μM CXCL12–sfGFP) of CXCR4 mutants measured by flow cytometry. Mean ± SD, n = 4. Student two-tailed unpaired t test, *p < 0.05, **p < 0.01. (D) Surface expression of CXCR4 mutants. Mean ± SD, n = 4. (E) Fluo-4–loaded Expi293F cells expressing CXCR4 were stimulated with CXCL12, and Ca2+ mobilization was monitored by flow cytometry. (F) Ca2+ signaling responses of cells expressing WT or mutant CXCR4. Peak fluorescence following CXCL12 stimulation is normalized as a percent of the maximum response following 4 μM ionomycin application. Data are mean ± SEM, number of replicates (n) is shown below x-axis. Statistical significance is indicated for increases in signaling. (G) Receptor association measured by BiFC. Cells were gated for equivalent levels of receptor expression, and mean Venus fluorescence ± SD is plotted (n = 4). Unless otherwise indicated, the VC- and VN-fused receptors were identical and homomeric associations were measured. (H) BiFC measurements for homomeric associations between CXCR4 mutants Y255V, R77E, and I130E and nonspecific associations with mGluR3. Mean ± SD, n = 3. *p < 0.05, **p < 0.01.
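The normalization described for panel F can be written out explicitly. The sketch below is a minimal illustration, not the authors' analysis code. Subtracting a pre-stimulation baseline is an assumption (the legend states only that peak fluorescence is expressed as a percent of the maximum response to ionomycin), and the function and argument names are hypothetical.

```python
def percent_of_ionomycin_max(baseline, peak_after_cxcl12, peak_after_ionomycin):
    """Express a CXCL12-evoked Ca2+ response as a percent of the ionomycin maximum.

    All three arguments are Fluo-4 fluorescence values in the same arbitrary units.
    Baseline subtraction is an assumption of this sketch.
    """
    dynamic_range = peak_after_ionomycin - baseline
    if dynamic_range <= 0:
        raise ValueError("ionomycin response must exceed the baseline")
    return 100.0 * (peak_after_cxcl12 - baseline) / dynamic_range

# Hypothetical fluorescence values, chosen only to illustrate the arithmetic.
response = percent_of_ionomycin_max(baseline=100.0,
                                    peak_after_cxcl12=250.0,
                                    peak_after_ionomycin=400.0)
```

Normalizing to the ionomycin response puts cells with different dye loading on a common scale, which is presumably why the responses are reported this way.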
Mutations H79^2.45Q, W161^4.50H, and W161^4.50Y map to a putative allosteric site that is occupied by two stabilizing cholesterol molecules in a crystal structure of the β2-adrenergic receptor (β2AR) (45) (Fig. 5B). The mutations are conservative; glutamine can form hydrogen bonds similar to those of histidine, and tyrosine, histidine, and tryptophan are all polar aromatics. Based on a consensus sequence, the β2AR cholesterol-binding site was not predicted to be conserved in chemokine receptors (45). The crystal structures of CXCR4 and CCR5 also do not show bound cholesterol, despite cholesterol’s inclusion in the lipidic cubic phase (40, 46). However, cholesterol does enhance the activity of CXCR4 toward CXCL12 (47–50), and the site shares structural similarity between CXCR4 and β2AR, including the presence of aromatic and basic side chains. Our mutagenesis data demonstrate that residues at this membrane-exposed surface impact chemokine affinity within the binding cavity, suggesting they may form an allosteric site in a broader range of GPCRs than first proposed, but whether cholesterol binds is unclear. W4.50 is completely conserved among GPCRs, and it has been proposed that local W4.50 rotamer shifts are coupled to receptor activation (51). Mutations H79^2.45Q, W161^4.50H, and W161^4.50Y may therefore shift the equilibrium between active and inactive states, whether cholesterol binds or not. The three mutations also slightly enhanced CXCL12-induced calcium mobilization at the lowest CXCL12 concentration tested (1 ng/ml; Fig. 5E, 5F), providing further evidence that this site may allosterically regulate or be structurally coupled to receptor activity.
Mutations Y116^3.32A, T117^3.33N, Y255^6.51V, and A291^7.42F localize to the base of the ligand-binding cavity at the receptor’s center (Fig. 5A), a region that allosterically couples ligation to G protein activation in the cytoplasm. These mutations are highly disruptive, replacing key aromatic residues that undergo conformational changes with small side chains (Y116^3.32A and Y255^6.51V), or imposing steric clashes (T117^3.33N and A291^7.42F). These mutations may enhance CXCL12 binding (Fig. 5C) by disrupting structural coupling, such that CXCL12 binding energy is no longer required to fuel conformational changes associated with receptor activation. Indeed, these mutations have diminished calcium signaling in response to CXCL12 stimulation, except for T117^3.33N, which has WT responses (Fig. 5F).
Finally, the third set of mutations that enhance CXCL12 binding (R77^2.43E, L127^3.43C, I130^3.46E, A137^3.53N, I138^3.54R, and T241^6.37W) localize to the cytoplasmic junction of helices TM2, TM3, TM6, and TM7, which seal the G protein–coupling site (Fig. 5A). These mutations are anticipated to be highly disruptive in the inactive conformation, including adding buried charges and steric clashes that may drive helices apart, crudely mimicking helix shifts that occur during receptor activation. Whereas this could drive the conformational equilibrium into pseudo-active states with higher CXCL12 affinity, the G protein–coupling site is also damaged; these mutants tend to be inactive for Gα-mediated calcium signaling (Fig. 5F).
Class A GPCRs can form homo- and hetero-associations, although the mechanism of oligomerization is controversial. CXCR4 oligomerization has been proposed to affect, or be affected by, ligand interactions and signaling (48, 52–54), although it is unclear whether associations are through a defined protein–protein interface or via congregation in cholesterol-rich membrane microdomains (48, 55). Although multiple CXCR4 crystal structures have captured a homodimeric assembly (40), specific mutations in chemokine receptors that diminish oligomerization are rare (56), and biochemical and structural studies support a 1:1 stoichiometry for CXCL12-mediated activation (36, 57). Using BiFC, we tested whether CXCR4 mutants with enhanced CXCL12 binding display perturbed oligomerization. Cytosolic receptor termini were fused to N- (VN) or C-terminal (VC) residues of Venus (mutant I152L), a yellow fluorescent protein variant with improved signal-to-noise for the detection of protein interactions (27, 28). If two receptor chains fused to the respective halves of split Venus associate, an active fluorophore assembles, providing a quantitative readout for the physical interaction. We simultaneously detected extracellular tags on the receptors using three-color flow cytometry, allowing us to tightly control for receptor expression levels and gate for equivalent cell populations. As previously reported using BiFC (58), CXCR4 homodimerized and heterodimerized with CCR5, yet had lower nonspecific associations with mGluR3, a physiologically unrelated class C GPCR of neuronal cells (Fig. 5G). Nearly all the CXCR4 mutants display equivalent Venus fluorescence to WT (Fig. 5G). Three CXCR4 mutations (R77^2.43E, I130^3.46E, and Y255^6.51V) cause a large change in BiFC signal, with much higher Venus fluorescence. However, these CXCR4 mutants also have increased association with the mGluR3 control (Fig. 5H) and likely increase nonspecific aggregation of receptor chains in the membrane through structural destabilization. CXCR4-H79^2.45Q has a small but statistically significant decrease in oligomerization, yet this property is not shared by related mutations W161^4.50H/Y at the predicted allosteric site. Overall, mutations enhancing CXCL12 binding are not correlated with changes in receptor oligomerization.
Deep mutational scanning of CCR5 defines an extracellular binding site for Ab 2D7
A CCR5 SSM library was evolved by FACS for surface expression and affinity for the HIV-1 blocking Ab 2D7 (59, 60). The sequence-activity landscapes (Fig. 6) reveal similar constraints to the CCR5 sequence as described above for CXCR4. There are prominent blocks of deleterious polar substitutions within hydrophobic TM helices, whereas polar mutations are tolerated for receptor expression at solvent-exposed positions (Figs. 2B, 6A). Substitutions for cysteine residues in extracellular regions tend to be deleterious for binding to the Ab ligand (Fig. 6C). The 2D7 affinity–expression difference scores highlight conserved CCR5 residues in extracellular loop 2 for binding 2D7 (Fig. 7A, 7B), in agreement with prior small-scale mutagenesis and chemical cross-linking studies (60–62).
Sequence-activity landscapes for CCR5 surface expression and interactions with HIV-1BaL gp120 and monoclonal 2D7. (A) Cells expressing a CCR5 SSM library were evolved by FACS for surface receptor expression. Log2 enrichment ratios (averaged from four experiments) are plotted from ≤ −3 (depleted, orange) to 0 (neutral, white) to ≥ +3 (enriched, blue). Missing mutations (<10 reads in the naive library) are black. Protein sequence is on the horizontal axis, amino acid substitutions are on the vertical axis. *, stop codons. (B) Average log2 enrichment ratios are compared after sorting cell libraries in (A), stained with FITC- or Alexa647-conjugated anti-myc (two replicates each). R² is calculated for mutations with >100 reads in the naive library (black), and rare mutations (10–100 reads) are red. The trend line is gray. Maximum depletion was truncated to −5. (C) Mutation enrichment ratios when the CCR5 library was evolved for surface receptor expression and binding to Ab 2D7. Data are averaged from two experiments. Log2 enrichment ratios are plotted from ≤ −3 (orange) to 0 (white) to ≥ +3 (blue). Missing mutations (<10 reads in the naive library) are black. Protein sequence is on the horizontal axis, amino acid substitutions are on the vertical axis. (D) Agreement between replicate experiments for CCR5–2D7 binding. R² is calculated for mutations with >100 reads in the naive library (black), and rare mutations (10–100 reads) are red. The trend line is gray. (E) Mutation enrichment ratios when the CCR5 SSM library was evolved for surface expression and binding to HIV-1BaL gp120–CD4. Data are averaged from two experiments. Log2 enrichment ratios are plotted from ≤ −3 (orange) to 0 (white) to ≥ +3 (blue). Missing mutations (<10 reads in the naive library) are black. Protein sequence is on the horizontal axis, amino acid substitutions are on the vertical axis. (F) Agreement between replicates for CCR5 binding to gp120–CD4. R² is calculated for mutations with >100 reads in the naive library (black), and rare mutations (10–100 reads) are red. The trend line is gray. (G–I) Agreement between conservation scores following sorting of CCR5 libraries for surface expression (G), binding to 2D7 (H), or binding to gp120–CD4 (I).
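The quantities plotted in these heatmaps can be illustrated with a short calculation. The following is a minimal sketch, assuming that the enrichment ratio is the frequency of a mutation in the sorted population divided by its frequency in the naive library. The read threshold (<10 naive reads treated as missing) and the ±3 color limits come from the legend; the pseudocount and all function names are assumptions added for this example.

```python
import math

MIN_NAIVE_READS = 10  # from the legend: fewer naive reads means the mutation is "missing"
COLOR_LIMIT = 3.0     # from the legend: the color scale saturates at log2 = -3 and +3

def log2_enrichment(naive_reads, sorted_reads, naive_total, sorted_total, pseudocount=0.5):
    """Log2 of (frequency after sorting) / (frequency in the naive library).

    Returns None for mutations below the naive read threshold.
    The pseudocount (an assumption) avoids log2(0) for fully depleted mutations.
    """
    if naive_reads < MIN_NAIVE_READS:
        return None
    naive_freq = (naive_reads + pseudocount) / naive_total
    sorted_freq = (sorted_reads + pseudocount) / sorted_total
    return math.log2(sorted_freq / naive_freq)

def clip_for_heatmap(log2_ratio):
    """Clamp a ratio to the plotted range; missing values stay None (drawn black)."""
    if log2_ratio is None:
        return None
    return max(-COLOR_LIMIT, min(COLOR_LIMIT, log2_ratio))

# Hypothetical read counts: an eightfold frequency increase gives log2 = 3.
enriched = log2_enrichment(100, 800, 1_000_000, 1_000_000, pseudocount=0.0)
```

Normalizing by total reads makes the ratio comparable between sequencing runs of different depth, and clamping affects only how a value is displayed, not the underlying ratio.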
Deep mutational scans identify critical CCR5 residues for binding monoclonal 2D7 and HIV-1BaL gp120. (A) CCR5 residue conservation scores for 2D7 binding versus surface expression. Some of the residues most conserved for 2D7 binding are labeled. (B) 2D7–expression difference conservation scores are mapped to the CCR5 structure. Important residues for 2D7 binding are shown as thick red tubing, with extracellular loop 2 forming the dominant interaction site. (C) CCR5 residue conservation scores for gp120–CD4 binding versus surface expression. (D) Important CCR5 residues for gp120–CD4 interactions are shown in thick blue tubing, based on mapping the gp120–CD4–expression difference scores to the CCR5 structure. (E) Residues conserved for gp120–CD4 binding are colored blue on the extracellular surface of CCR5. Maraviroc, which blocks HIV-1 infection, is shown in magenta.
A CCR5 sequence-activity landscape for interactions with HIV-1BaL gp120
CCR5 binds multiple inflammatory chemokines, including CCL3, CCL4, and CCL5. However, sfGFP fusions or biotin conjugates of these three chemokines were either insoluble or bound Expi293F cells nonspecifically, preventing us from characterizing CCR5–chemokine interactions by deep mutational scanning. Instead, we examined interactions with another important protein ligand, the HIV-1 Env subunit gp120.
R5-tropic gp120 from the prototypical BaL strain (Clade B) was genetically fused to the D1–D2 domains of CD4 (63). A CCR5 SSM library was evolved by FACS for gp120–CD4 binding after incubation with conditioned medium from gp120–CD4-expressing culture. The landscape (Fig. 6E) reveals conserved residues extending from extracellular regions deep into the ligand-binding cavity, with critical residues from TM2, TM3, and the cavity base (Fig. 7C–E). CCR5 residues W86^2.60, Y108^3.32, and E283^7.39 [implicated in prior studies (64, 65)] at the base of the cavity are conserved for gp120–CD4 binding. These residues interact with the chemokine N terminus (66), and the gp120 V3 loop likely extends into the cavity to make alternative contacts. Molecular dynamics simulations have modeled gp120-R315 (HIV-1 strain HXB2 reference numbering) at the tip of the V3 loop as reaching into the cavity and contacting CCR5-E283^7.39, with apolar contacts from the gp120-R315 methylene groups to CCR5-Y108^3.32, a cation–π interaction with CCR5-Y251^6.51, and hydrophobic packing between gp120-I309 and CCR5-W86^2.60 and -Y89^2.63 (67); all these putative interacting CCR5 residues are conserved in the sequence landscape. The Food and Drug Administration–approved CCR5 antagonist maraviroc also binds at the base of the cavity (46), sterically overlapping with the gp120 interaction footprint (Fig. 7E).
The CCR5 N terminus is a critical site for gp120 binding. Sulfonated CCR5 tyrosines Y3, Y10, Y14, and Y15 are important for gp120 binding (68, 69), and these are better conserved than neighboring N-terminal residues. There was extensive enrichment of acidic mutations across the CCR5 N terminus (Fig. 6E), suggesting generic complementarity to a basic patch on gp120. Structural studies and modeling indicate the CCR5 N terminus adopts a helical conformation that extends away from the gp120 V3 loop toward the bridging sheet (70, 71), a region of CD4-induced conformational change (72, 73). This positions the CCR5 N terminus in close proximity to gp120-R298, -K305, and -R327 on the V3 loop stem, as well as possibly gp120-R419 and -K421 in β19 heading toward the bridging sheet (70, 71). This readily explains the enrichment of acidic substitutions in the CCR5 N terminus, whereas we also observed that proline substitutions that can disrupt helical conformations are depleted.
Substitutions for acidic amino acids in CCR5 extracellular regions enhance gp120 binding
Mutations at 12 sites predicted from the deep mutational scan to enhance CCR5 interactions with BaL gp120 were combined in a library that was sorted twice for high gp120–CD4 binding signal (Fig. 8A). CCR5 clones were then screened individually and sequenced. Because of high diversity (over 4 million variants), the library failed to converge, yet a clone (called R5gp120-Hi) was identified that matched the consensus sequence and qualitatively had among the highest binding signals for gp120–CD4. R5gp120-Hi has nine mutations, of which five are to acidic residues in the CCR5 N terminus and extracellular loop 3 (Fig. 8B). The CCR5 libraries were sorted after incubating with saturating gp120–CD4; this was technically necessary to achieve sufficient binding signals. As a consequence, CCR5 mutations that decrease the KD for gp120–CD4 would have no fitness benefit, whereas mutations that enhance maximum binding at saturation would be enriched. Accordingly, R5gp120-Hi has only slightly tighter affinity for gp120–CD4 (apparent KD decreased from 32 to 12 nM) but has a large 2- to 3-fold increase in gp120–CD4 binding at saturation that cannot be explained by a change in surface expression (Fig. 8C, 8D). It is possible that the pool of CCR5 conformations at the plasma membrane, which is known to be highly diverse (22, 60, 74), has shifted to favor conformations competent for gp120 interaction. Alternatively, substitutions for acidic residues in the extracellular terminus and loops may compensate for incomplete tyrosine sulfonation, again increasing the pool of CCR5 receptors competent for gp120 binding.
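The argument that sorting at saturating ligand selects for maximum binding rather than affinity can be checked with a simple calculation. The sketch below assumes a one-site equilibrium binding model, signal = Bmax × [L] / (KD + [L]), which is an idealization introduced here and not a fit reported in the article. The KD values (32 and 12 nM) are the apparent values quoted above; the ligand concentration and Bmax values are hypothetical.

```python
def bound_signal(ligand_nM, kd_nM, bmax):
    """One-site equilibrium binding: signal = Bmax * [L] / (KD + [L])."""
    return bmax * ligand_nM / (kd_nM + ligand_nM)

SATURATING_LIGAND_NM = 1000.0  # hypothetical concentration, far above both KD values

wild_type = bound_signal(SATURATING_LIGAND_NM, kd_nM=32.0, bmax=1.0)

# Tighter KD alone: the signal at saturation barely changes.
affinity_only = bound_signal(SATURATING_LIGAND_NM, kd_nM=12.0, bmax=1.0)
gain_from_affinity = affinity_only / wild_type

# Tighter KD plus a hypothetical 2.5-fold higher Bmax: the signal rises
# roughly in proportion to Bmax.
affinity_and_capacity = bound_signal(SATURATING_LIGAND_NM, kd_nM=12.0, bmax=2.5)
gain_from_capacity = affinity_and_capacity / wild_type
```

In this idealized model, lowering KD from 32 to 12 nM raises the signal at 1 μM ligand by only about 2%, whereas a change in Bmax carries through almost one-to-one, consistent with the sort enriching variants that bind more ligand at saturation.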
Isolation of a CCR5 variant with enhanced HIV-1BaL gp120–CD4 binding. (A) Diversity of the CCR5 combinatorial library is shown at top, where X is any amino acid. The library was sorted twice by FACS for CCR5 surface expression and binding to gp120–CD4, and 35 clones were tested individually. Sequences of the 23 clones found to have elevated binding are aligned, with differences from WT in red. (B) Clone-04 (renamed R5gp120-Hi) matched the consensus sequence and had nine mutations from WT. The mutated sites are shown with blue spheres on the CCR5 structure. Mutations P8L, I9E, I12M, and S17E are in the disordered N terminus. CPK coloring is used for non–carbon atoms. (C) Binding of purified gp120–CD4 (solid line, n = 4) or gp120–CD4 conditioned medium (broken line, n = 6) to cells expressing WT CCR5 (black) or R5gp120-Hi (blue), measured by flow cytometry. Data are mean ± SD. (D) Surface expression levels of R5gp120-Hi relative to WT. Mean ± SD, n = 6.
Differences between CXCL12 and gp120 interactions
Conserved receptor surfaces for gp120BaL–CCR5 and CXCL12–CXCR4 interactions are distinct and only partially overlapping. Whereas CXCL12 binds asymmetrically to one side of the CXCR4 binding cavity, gp120BaL has a more extensive interaction surface with CCR5. This mimics the extensive interactions CCR5 also makes with chemokines (66), suggesting shared molecular binding mechanisms. In particular, we found mutations of cavity residues near the extracellular loop 2–TM5 junction frequently disrupt the CCR5–gp120BaL complex but have little effect on CXCR4–CXCL12 binding. If CXCR4 makes Env interactions equivalent to those of CCR5, then our results predict there will be CXCR4 mutations that selectively diminish X4-tropic Env binding while maintaining physiological CXCL12 signaling. Such selective mutants may be desirable for genome editing therapies targeting X4 HIV-1 infection in advanced AIDS, while maintaining CXCR4-mediated homing of WBCs to the bone marrow. However, the sensitivity of chemokine receptor mutants toward infection by different HIV-1 strains can vary substantially (75, 76), and it is therefore not at all clear that CXCR4 will share the same mutational tolerance for interacting with X4 Env as we observed for CCR5 binding to gp120BaL.
Our attempts to mutationally scan the CXCR4–gp120 interaction failed, as CD4 fusions of gp120 from X4 HIV-1 strains MN, HXB2, and LAI bound cells nonspecifically and could not compete with CXCL12 for binding to CXCR4-positive cells. Alternative evolution strategies based on virus infection and propagation in culture may be needed. Instead, we investigated the capacity for select CXCR4 mutants to fuse with X4-tropic Env-expressing cells.
Single substitution mutations of CXCR4 focused within the extracellular loop 2–TM5 junction were coexpressed with CD4, and fusion with EnvMN-expressing cells was measured (Fig. 9A). Most substitutions had little impact, and we conclude that important CCR5 residues at the extracellular loop 2–TM5 junction for binding R5-tropic BaL Env are not shared by CXCR4 for interacting with Env from the X4 MN strain. For example, mutations of CCR5-K191^5.35 generally disrupted gp120BaL–CD4 binding, suggesting gp120BaL makes a direct contact to this residue. However, mutations to the equivalent CXCR4 position V196^5.35 either had little impact or trended toward enhanced membrane fusion, suggesting there is a cavity between gp120MN and CXCR4-V196^5.35 that can be filled by alternative, longer side chains making new atomic contacts. Similarly, many mutations of CCR5-P183 diminished gp120BaL–CD4 interactions, yet the equivalent mutations to CXCR4-P191 were not only tolerated for mediating fusion to EnvMN-expressing cells but often enhanced it. One exception was CXCR4-P191F, which was found to reduce syncytia formation by roughly half while maintaining similar surface expression to the WT receptor (Fig. 9A, 9B). However, CXCR4-P191F signaling was also partly reduced (Fig. 9C), and hence this mutant is only partially selective for impaired Env interactions. More extensive screening will be necessary to find fully functional CXCR4 mutants incapable of mediating X4 HIV-1 infection.
CXCR4 mutations near the extracellular loop 2–TM5 junction generally do not decrease syncytia formation with Env-expressing cells. (A) CXCR4 and CD4 were coexpressed in CXCR4-knockout Expi293F cells, which were incubated with cells expressing Env from the X4 strain MN. Syncytia were quantified by flow cytometry (n = 8, mean ± SEM). Compared to the background level of enlarged/fused cells, syncytia formation was strictly dependent on all three proteins. Statistical significance was calculated by Student two-tailed unpaired t test. (B) Based on flow cytometry analysis, surface expression of myc-tagged CXCR4-P191F was comparable to the WT sequence. (n = 18, mean ± SD). (C) Ca2+ mobilization was measured in WT or P191F mutant CXCR4-expressing cells after stimulation with CXCL12. (n = 6, mean ± SEM).
Discussion
Deep mutational scanning of two complex polytopic receptors in human cells was used to map interaction sites and engineer variants with greater ligand binding. Because in vitro evolution focused on discrete activities, such as binding to physiological, nonphysiological, or viral ligands independent of signaling, sequence-activity landscapes were defined that could not have been resolved from the proteins’ natural history.
A challenge of deep mutational scanning is whether it can inform protein mechanism beyond what has previously been discovered from small-scale mutagenesis, multiple sequence alignments, and crystal structures. Although much of our data supports known biophysical properties of membrane proteins generally or overlaps with an extensive literature on chemokine receptor biochemistry, new information was also discovered. This includes the finding that CXCL12 binds asymmetrically in the CXCR4 ligand-binding cavity in a way similar to that of a viral antagonistic chemokine, the discovery of mutations within a potential allosteric site in CXCR4 that enhance CXCL12 binding, and the engineering of a CCR5 variant with increased gp120 binding for structural biology purposes, as well as providing direction for future mutational scans for the identification of functional CXCR4 mutants that selectively diminish X4 HIV-1 infection. Hence, even for well-studied proteins, massive mutational scanning can confirm prevailing knowledge while simultaneously inspiring new hypotheses.
There is high variation between biological replicates when comparing enrichment ratios of individual mutations, but variability is substantially reduced when averaged conservation scores are considered (Figs. 1G–I, 6G–I). Despite this noise, the data are consistent with previous mutational studies, and Supplemental Table II compares our data to prior publications. For example, a recent extensive mutational screen of CXCR4 identified 41 important residues for CXCR4 signaling (37). CXCR4-Y116^3.32 and A291^7.42 were shown to be involved in signal initiation with mutations reducing calcium mobilization (37), and we independently found Y116^3.32A and A291^7.42F increased CXCL12 binding with loss of signaling. Eleven residues were previously found to be important for CXCL12 contacts (37), and these are all conserved in our data, many of them highly so. An additional 14 residues of CXCR4 were found to be required for signaling but did not contact CXCL12 (37), and in close agreement, only one of these residues was conserved in our data for CXCL12 binding. Expression and binding to 2D7 by CCR5 mutants has also been extensively studied (61, 62, 77–79), with some of the most critical residues for 2D7 interaction previously found to include CCR5-K171, -E172, and -W190^5.34, which are all highly conserved in the CCR5–2D7 sequence-activity landscape presented in this study. Previously published CCR5 mutations that reduce HIV-1 infection or gp120 binding (64–66, 80–83) compare favorably with our CCR5–gp120 sequence-activity landscape, despite possible differences between diverse HIV-1 strains.
The sequence-activity landscapes qualitatively define interaction surfaces, similar to other “low resolution” methods for analyzing protein–protein interactions such as cross-linking, small angle x-ray scattering, or protection assays. Each of these methods has its own advantages. For example, cross-linking can define contacting residues on both sides of an interface, whereas small angle x-ray scattering curves can reveal gross structural features for model fitting. Deep mutational scanning has the unique advantage that, in addition to defining functional sites through sequence conservation, novel mutations can be found simultaneously. This may prove especially useful for screening for gain-of-function mutations (such as CCR5 or CXCR4 variants with increased ligand binding, as described in this study). It is, after all, easy to break protein structure and function, but not at all trivial to find mutations that enhance activities. We predict deep mutational scanning will emerge as an invaluable method in membrane protein science.
Disclosures
The authors have no financial conflicts of interest.
Acknowledgments
At the UIUC Roy J. Carver Biotechnology Center, Barbara Pilas and Angela Kouris assisted with flow cytometry, and Alvaro Hernandez and Chris Wright assisted with deep sequencing.
Footnotes
This work was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award R01AI129719. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
The sequences presented in this article have been submitted to Addgene (https://www.addgene.org/Erik_Procko) under accession numbers 98942–98968, and to the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE100368.
The online version of this article contains supplemental material.
Abbreviations used in this article:
- β2AR: β2-adrenergic receptor
- BiFC: bimolecular fluorescence complementation
- CV: column volume
- GPCR: G protein–coupled receptor
- mGluR3: metabotropic glutamate receptor subtype 3
- PBS-BSA: PBS supplemented with 0.2% BSA
- sfGFP: superfolder GFP
- SSM: single-site saturation mutagenesis
- TM: transmembrane
- UIUC: University of Illinois Urbana-Champaign
- WT: wild type

Received March 8, 2018.
Accepted March 30, 2018.
Copyright © 2018 by The American Association of Immunologists, Inc.