Medical Research Council Immunochemistry Unit, Department of Biochemistry, University of Oxford, South Parks Road, Oxford, United Kingdom
| Abstract |
|---|
-responsive transcript, are all derived from this gene. The
completion of the sequence of 1C7 (D6S2570E) has revealed that this gene
encodes a putative novel member of the Ig superfamily. A number of
alternatively spliced transcripts of 1C7 were identified by reverse
transcriptase-PCR, all of which are expressed in immune-related cell
lines. Alternative splicing within the Ig domain-encoding region may
result in Ig set switching between an IgV domain and an IgC2 domain.
Lastly, a previously unidentified gene, homologous to a number of
V-ATPase G subunits, has been located 1 kb telomeric of IKBL.
| Introduction |
|---|
4 megabases
(Mb)3 of DNA in the chromosome
band 6p21.3. Of this, the central
1.1 Mb is termed the class III
region (1). It is becoming increasingly apparent that, in common with
the MHC class I and II regions (for review see 2), the class III
region contains many genes that encode proteins involved in immune and
inflammatory responses (3). These include the C2, C4, and factor B
components of the complement system, members of the 70-kDa heat shock
protein family, the cytokines TNF, lymphotoxin α (LTα), and LTβ (3),
and the inflammatory mediator lysophosphatidic acid
acyltransferase (4). Furthermore, genetic studies have indicated that
genes within the MHC contribute to immune-related diseases, such as
insulin-dependent diabetes mellitus (IDDM), rheumatoid arthritis,
ankylosing spondylitis, myasthenia gravis, common variable
immunodeficiency (CVID), and IgA deficiency (5, 6). Strong associations
have been found between these diseases and alleles of genes in the MHC
class II region. However, because linkage disequilibrium extends across
the whole of the MHC, disease susceptibility determinants may also
exist within the MHC class I and class III regions. In addition, a
detailed study using polymorphic microsatellite markers has provided
strong evidence for the involvement of genes in the class III region in
the development of IDDM (R. E. March, unpublished
observations), and a recent linkage study by Schroeder et al.
(7) has localized a disease susceptibility locus for IgA deficiency and
CVID to the telomeric end of the MHC class III region between G1 and
the class I gene HLA-B.
The significance and molecular bases of the MHC-linked disease
associations described in the literature are still unclear, but the
identification and characterization of all coding and regulatory
sequences will be invaluable, and even necessary, for the elucidation
of the role of the MHC in the progression of autoimmune diseases. The
segment of DNA at the telomeric end of the MHC class III region defined
by Schroeder et al. (7) has already been extensively studied (8, 9, 10, 11, 12, 13, 14).
The two most extensive of these are the studies by Shiina et al.
(13) and Guillaudeux et al. (14). The study by Shiina et al. (13)
involved the complete sequence analysis of 146 kb of DNA between the
IKBL and MICA genes from the HLA haplotype A2, B62, Cw10, DR4, whilst
the recent study by Guillaudeux et al. (14) involved the sequence
analysis of 424 kb of DNA between the TNF gene cluster and a newly
identified gene 20 kb telomeric of Otf-3 at the centromeric end of
the class I region. However, gaps remained in the sequence data
available from the G1-IKBL region. In this study, to complete the
analysis of this region and to complement the existing sequence data
with data from a different haplotype, we have sequenced an 82-kb
segment of DNA encompassing the genes between G1 and BAT1.
Transcripts previously mapped to this region included the allograft
inflammatory factor 1 (AIF1), G1 (D6S50E), leukocyte-specific
transcript 1 (LST-1 (B144)) (D6S49E), 1C7, lymphotoxin β (LTB),
TNF, lymphotoxin α (LTA), NB6, IKBL (NFKBIL1), and BAT1 (D6S81E).
The AIF1 gene encodes a cytokine-responsive macrophage-specific protein
(15), while the cytokines TNF, LTα, and LTβ are involved
in the inflammatory response (for a full review see 16). IKBL is a
putative member of the IκB family of proteins that regulate the
NF-κB family of transcription factors (17) and may be involved in
regulating the expression of cytokine genes. BAT1 encodes a putative
nuclear RNA helicase of the DEAD family (18). The products of the G1,
LST-1, and 1C7 genes have not yet had a function or protein family
membership assigned to them. However, they all appear to be expressed
exclusively in immune-related cell lines, suggesting their involvement
in the immune response (19, 20, 21). We now report the positioning of the
first exon of the 1C7 gene, which had not been identified previously
(21) (allocated the name D6S2570E in the Human Gene Nomenclature
Database). The characterization of this first exon has allowed us to
establish that the 1C7 protein is a putative novel member of the Ig
superfamily, and RT-PCR analysis has shown that it is expressed
at the RNA level in a number of alternatively spliced forms. We also
report a new gene, 1 kb telomeric of IKBL, that encodes a putative
V-ATPase G subunit (allocated the name ATP6G in the Human Gene
Nomenclature Database). This gene also exhibits alternative splicing
and variation in the length of its 3' untranslated region (UTR).
Finally, we report the characterization of a number of immune-related
transcripts that are encoded within the G1 genomic region and provide
evidence to show that these transcripts result from the alternative
splicing of a single gene.
| Materials and Methods |
|---|
The two overlapping cosmid clones covering the G1-BAT1 region, TN62 and TN82 (homozygous for the HLA haplotype A2, B7, DR2, C2C, BFS, C4A3, C4BQ0), were sequenced using an M13 shotgun strategy (22) with fluorescent dye primer and dye terminator sequencing chemistries (Amersham, Little Chalfont, U.K.). Cosmid DNA was sonicated, and fragments of 0.5–1 kb were selected for cloning into M13mp18. Recombinant M13mp18 phage DNA was purified from culture supernatants using a Vistra DNA Labstation and cycle sequenced using ThermoSequenase (Amersham) in a 96-well format on a Hybaid Omnigene thermocycler (95°C for 5 min, followed by 20 cycles of 95°C for 30 s and 60°C for 30 s) in the presence of the fluorescent dye-labeled M13 universal primer (5'-TGACCGGCAGCAAAATG-3'). The sequencing reactions were run on an Applied Biosystems 377 automated DNA sequencer (Applied Biosystems, Foster City, CA), and sequence data were analyzed with the ABI377-dedicated software. Individual sequence traces were processed and reassembled using the programs PREGAP and GAP v4.0-β4 from the Staden software suite (Medical Research Council Laboratory of Molecular Biology, Cambridge, U.K.).
Ambiguities within the sequence were resolved, and regions covered by reads in only one orientation were confirmed, using dye terminator sequencing chemistries; gaps between contigs were closed either by sequencing the reverse strand of long clones (over 800 nucleotides (nt)) that extended into the gap or by sequencing PCR products spanning the gaps.
Transcript profiling
The expression of transcripts was investigated by RT-PCR using
total RNA and the Promega reverse transcription system (Promega,
Chilworth Research Centre, Southampton, U.K.) according to the
manufacturer's protocol (gene-specific primers are listed in Table I).
The cell lines used were: Raji (B cell), Jurkat 6 (T cell), Molt 4
(T cell), HL60 (monocyte), U937
(macrophage), HepG2 (hepatocyte), HeLa (epithelial), HT1080
(epithelial), and SW620 (adenocarcinoma). PCR primers were designed to
give products spanning more than one exon so that amplification
products arising from genomic DNA contamination were easily
discernible. First-round cDNA synthesis was performed in a final
volume of 20 µl with 1 µg of total RNA; 10 µl of this reaction
mix was used in a 50-µl PCR with the transcript-specific
primers and amplification conditions listed in Table I. Each
transcript-specific RT-PCR was performed at least in triplicate to
allow for variation between reactions. Control
amplifications with primers derived from β-actin were
conducted for each first-round cDNA synthesis reaction. The identities
of PCR products were confirmed either by direct dye-terminator
sequencing or, when multiple products were obtained, by cloning the
RT-PCR products into the pGEM-T Easy Vector System (Promega) following
the manufacturer's protocol, then isolating and sequencing the cloned
DNA.
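As a minimal sketch of why exon-spanning primers make genomic contamination easy to spot, the Python functions below compare the product expected from spliced cDNA with the product expected from genomic DNA. The exon coordinates and primer positions are hypothetical and are not taken from Table I; the only assumption is that the two primers lie in different exons separated by an intron.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def genomic_product_size(fwd_start: int, rev_end: int) -> int:
    """Product length when genomic DNA is the template (introns included)."""
    return rev_end - fwd_start + 1


def cdna_product_size(exons: List[Exon], fwd_start: int, rev_end: int) -> int:
    """Product length from spliced cDNA: only exonic bases lying between the
    5' end of the forward primer and the 5' end of the reverse primer count."""
    size = 0
    for start, end in exons:
        overlap_start = max(start, fwd_start)
        overlap_end = min(end, rev_end)
        if overlap_start <= overlap_end:
            size += overlap_end - overlap_start + 1
    return size


# Hypothetical layout: exon 1 = 1000-1199, intron = 1200-2099 (900 nt),
# exon 2 = 2100-2349. Forward primer starts in exon 1, reverse primer ends in exon 2.
exons = [(1000, 1199), (2100, 2349)]
fwd, rev = 1050, 2300

cdna = cdna_product_size(exons, fwd, rev)
genomic = genomic_product_size(fwd, rev)
print(cdna, genomic, genomic - cdna)  # 351 1251 900
```

The two products differ by exactly the length of the enclosed intron (900 nt in this invented example), so a band at the larger size points to a genomic template rather than cDNA.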
|
The Wisconsin Package Version 9-UNIX (Genetics Computer Group), maintained at the University of Oxford Molecular Biology Data Centre, was used for the majority of the sequence analysis and database interrogation. The DNA sequence generated was screened against the EMBL, SwissProt, PDB, EMBL-EST, and TIGR-EST (search@hcd.tigr.org) databases to position known genes and identify possible new coding regions. Repetitive elements were identified with the aid of the RepeatMasker server (A.F.A. Smit and P. Green, RepeatMasker at http://ftp.genome.washington.edu/RH/RepeatMasker.html), and potential coding regions were defined using the NIX exon prediction program (http://www.hgmp.mrc.ac.uk/Registered/Webapp/nix/) from the Human Genome Mapping Project Resource Centre (Hinxton, U.K.). Predictions of protein secondary structure, solvent accessibility, and transmembrane regions were made using the JPred consensus secondary structure prediction server (http://circinus.ebi.ac.uk:8081/) or the PredictProtein program (phd@EMBL-Heidelberg.de). The GCG program SIGCLEAVE and the SMART (Simple Modular Architecture Research Tool) server (http://coot.embl-heidelberg.de/SMART/) were used to identify leader peptides. Sequence motifs and protein domains were identified using a combination of the GCG program MOTIF, the Prosite Profilescan server (http://ulrec3.unil.ch/software/PFSCAN_form.html), and the SMART server. Multiple alignments of amino acid sequences were performed using the ClustalX software (National Center for Biotechnology Information, Bethesda, MD), making use of protein structure information from sequences within the PDB database wherever possible. Alignments were hand-edited using the GCG9 SeqLab multiple alignment editor.
| Results |
|---|
The complete nucleotide sequences of cosmids TN62 and TN82 were
determined from a combined total of 1757 templates. A single contig of
81,800 nt in length was obtained with an overlap of 5,341 nt between
the two cosmids and an average depth of
9.6 reads per nucleotide
sequenced. There were no discrepancies between the sequences generated
for the two cosmids across the region of overlap. The complete genomic
DNA sequences of cosmids TN62 and TN82 have been deposited in the EMBL
database under the accession number HSY14768. Exon positions and other
sequence features reported here are included in this database entry.
The precise locations and genomic structures of the nine known genes in
the region analyzed have been determined, and the order of these genes,
from centromere to telomere, has been shown to be G1/AIF1, 1C7,
LST-1 (B144), LTB, TNF, LTA, NB6, IKBL, BAT1 (Fig. 1). The G1 (19)
and AIF1 (23) transcripts
were found to be derived from the same genomic region, while the 1C7
gene has been shown to lie on the opposite side of LST-1 (B144) to that
previously published (21). Approximately 52 kb of the sequence reported
here had previously been deposited in the EMBL database (accession nos.
U00921, L11016, Z15026, X02910, U42625, X59350, X02911, M55913, X59351,
Z15027, AC004181, and AB000876). The database entries AC004181 and
AB000876 contain the sequence data generated by Guillaudeux et al. (14)
and Shiina et al. (13), respectively. Both of these sequences overlap
the telomeric end of our contig with 99.8% identity, AB000876 over
24,702 nt and AC004181 over 31,438 nt. Further investigation will be
needed to determine whether any of the nucleotide differences between
the sequences are haplotype specific.
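The mean read depth quoted above is the total number of sequenced bases divided by the contig length. The sketch below applies that definition and inverts it to estimate the mean usable read length implied by the reported figures. It assumes one usable read per template, which the text does not state, so the result (about 447 nt) is only an estimate.

```python
def mean_depth(total_read_bases: int, contig_length: int) -> float:
    """Mean number of reads covering each contig position."""
    return total_read_bases / contig_length


def implied_mean_read_length(depth: float, contig_length: int, n_reads: int) -> float:
    """Invert mean_depth: total read bases = depth * contig length, shared over n reads."""
    return depth * contig_length / n_reads


# Figures reported above: 81,800-nt contig, mean depth 9.6, 1757 templates.
# Assumption (not stated in the text): one usable read per template.
estimate = implied_mean_read_length(9.6, 81_800, 1757)
print(round(estimate))  # 447
```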
|
The G1 genomic region.
A total of 2 kb of sequence encompassing the G1 gene was screened
against the DNA sequence databases. Two different human, four different
rat, and one pig EMBL cDNA database entries, as well as 13 human, one
mouse, and two fish expressed sequence tags (ESTs), aligned over
this genomic region with significant similarity. The two human EMBL
entries were the AIF1 gene (accession no. U19713) (23) and the
IFN-γ-responsive transcript (IRT-1) (accession no. U95213). The 13
human EST entries (accession nos. W58116, W67117, R71716, N47817,
W67362, T79488, T69387, N32593, N32605, W67118, W21034, AA091585, and
THC167143(TIGR)) fell into two groups: ESTs W21034, AA091585, and
THC167143(TIGR) matched the G1 cDNA (19) exactly, while the remaining
10 ESTs were all found to be partial transcripts of AIF1. The G1, AIF1,
and IRT-1 cDNA sequences are 500, 639, and 1235 nt in length and encode
polypeptides of 93, 147, and 132 amino acids, respectively (19, 23, and
accession no. U95213). Fig. 2 shows the
genomic organization of G1, AIF1, and IRT-1, together with the exon
usage of the three transcripts, which all appear to be splice variants
of the same gene.
|
Encoded within sequence from exons 4 and 5 of AIF1, which is shared by
all the alternative transcripts detected, is a putative EF hand
calcium-binding motif. This motif was previously reported by Olavesen
et al. (19) and Utans et al. (23) for G1 and AIF1, respectively, and
was also detected with the Prosite Profilescan server. The splicing of
the IRT-1 transcript causes an insertion of 66 amino acids into the
conserved loop of the EF hand domain. A multiple alignment of the
splice variants of human AIF1 and the orthologous proteins identified
in other species is presented in Fig. 3.
|
1C7 gene.
In addition to the published 1C7 cDNA sequence (21), two I.M.A.G.E.
consortium cDNA clones, 685808 (EST AA262074) and 683963 (ESTs
AA237100 and AA236886), both of B cell origin, were found to align
within the genomic region immediately centromeric of the LST-1 (B144)
gene. Sequence analysis showed that these clones contained inserts of
1120 nt (683963) and 721 nt (685808), and both aligned
with the genomic sequence over four exons (Fig. 4A). Both clones extend
the published 1C7 cDNA sequence (21, 26) by 306 nt at the 5' end and
contain an additional 1C7 exon. This additional coding sequence lies
2.8 kb upstream of that previously published and contains an in-frame
AUG codon preceded by in-frame stop codons. Clone 683963 also extends
the 1C7 sequence by 141 nt at the 3' end to include a polyadenylation
signal at position 30,678 and a polyadenylation site at position
30,711. This 1C7 polyadenylation signal lies only 47 nt centromeric of
the polyadenylation signal for LST-1, the two genes being transcribed
in opposite directions. The three transcripts derived from the 1C7
gene have been named 1C7a (clone 683963), 1C7b (the partial 1C7
sequence already published; 21), and 1C7c (clone 685808). They
all differ in their last exon (Fig. 4, exons 4I, 4II, and 4III). The
1C7a and 1C7b transcripts, although sharing their last 276 nt, differ
in the splice acceptor site at the 5' end of their last exon, with the
fourth exon of 1C7b (exon 4II) extending into intron 3 of 1C7a by 55 nt.
In contrast, the fourth exon of 1C7c (exon 4I) lies entirely within
intron 3 of 1C7a and 1C7b and has a separate stop codon at position
30,298, a polyadenylation signal at position 30,319, and a
polyadenylation site at position 30,337 (Fig. 4).
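Polyadenylation signals such as those mapped above are usually located by searching for the canonical AAUAAA hexamer (AATAAA in genomic sequence) and its common variant AUUAAA. The sketch below is a generic scan of that kind, not necessarily the method used here; it reports 1-based start positions. The distance helper applies an assumed convention (first base of the signal to the cleavage site) to the coordinates given in the text, giving 33 nt for 1C7a and 18 nt for 1C7c.

```python
import re
from typing import List, Tuple


def find_polya_signals(seq: str,
                       hexamers: Tuple[str, ...] = ("AATAAA", "ATTAAA")) -> List[int]:
    """Return sorted 1-based start positions of candidate polyadenylation signals."""
    seq = seq.upper()
    hits = []
    for hexamer in hexamers:
        # A zero-width lookahead reports overlapping occurrences as well.
        for match in re.finditer(f"(?={hexamer})", seq):
            hits.append(match.start() + 1)
    return sorted(hits)


def signal_to_site_distance(signal_pos: int, site_pos: int) -> int:
    """Distance (nt) from the first base of the signal to the cleavage site."""
    return site_pos - signal_pos


print(find_polya_signals("GCGTAATAAAGCCATTAAAGC"))  # [5, 14] (invented sequence)
print(signal_to_site_distance(30_678, 30_711))        # 33 (1C7a)
print(signal_to_site_distance(30_319, 30_337))        # 18 (1C7c)
```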
|
Apart from the variation in exon 4 usage described above, RT-PCR
identified 1C7 transcripts in which exon 2 is divided into two exons
(exons 2I and 2II in Fig. 4A). In the smaller PCR products
obtained (Fig. 4B, bands 5, 6, and 9), only the 3' segment of exon 2
(2II) is spliced in. These transcripts encode a truncated polypeptide
of only 15 amino acids, because splicing of exon 1 directly to exon 2II
shifts the reading frame and introduces a premature stop codon, and
they are probably nonfunctional. In the RT-PCR
products 3, 4, and 8 (Fig. 4B), both exons 2I and 2II are
used, resulting in a 75-nt deletion in the center of exon 2. Because
75 nt is a whole number of codons, the open reading frame is retained,
resulting in a transcript 75 nt shorter and an encoded polypeptide
25 amino acids shorter than in the corresponding species using the
complete exon 2. This variation in exon 2 usage results in three novel
splice variants: 1C7d, 1C7e, and 1C7f (Fig. 4B). Therefore, the six
detected transcripts of 1C7 that contain an open reading frame of
significant length comprise either four or five exons, including one of
three alternatively spliced last exons, and either a single exon 2 or
two exons from the exon 2 region. All exons are in phase 1 (27) at
their 3' end except where exon 2 is split into two, in which case the
boundary between exons 2I and 2II is of phase 0 (Fig. 5B). The
transcripts 1C7a to 1C7f encode six putative protein isoforms of 201,
178, 190, 176, 153, and 165 amino acids, respectively (Fig. 5A).
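The frame arithmetic behind these splice variants can be made explicit. In the sketch below (function names are illustrative), the phase of an exon boundary is the number of coding bases before it modulo 3, an internal deletion keeps the frame only if its length is a multiple of 3, and two exons join in frame only if their boundary phases match. Applied to 1C7, the 75-nt deletion removes exactly 25 codons, whereas joining exon 1 (ending in phase 1) directly to exon 2II (beginning at the phase 0 boundary with 2I) shifts the frame, consistent with the premature stop codon described above.

```python
def boundary_phase(coding_nt_before_boundary: int) -> int:
    """Intron phase: 0 = between codons; 1 or 2 = after the 1st or 2nd codon base."""
    return coding_nt_before_boundary % 3


def deletion_keeps_frame(deleted_nt: int) -> bool:
    """An internal deletion is in frame only if it removes whole codons."""
    return deleted_nt % 3 == 0


def junction_keeps_frame(upstream_end_phase: int, downstream_start_phase: int) -> bool:
    """Joining two exons preserves the reading frame only if the phases match."""
    return upstream_end_phase == downstream_start_phase


# The 75-nt internal deletion of exon 2 (exons 2I and 2II both used).
print(deletion_keeps_frame(75), 75 // 3)  # True 25

# Exon 1 ends in phase 1; exon 2II starts at the phase 0 boundary with 2I.
print(junction_keeps_frame(1, 0))  # False, i.e. a frameshift

# A hypothetical boundary after 301 coding nucleotides lies in phase 1.
print(boundary_phase(301))  # 1
```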
|
-chain
variable domain (accession no. M27351) for isoforms 1C7a, b, and c
(33% identity and 45% similarity over 117 amino acids) and with an
IgC2 domain from Perlecan (accession no. Q05793) for the truncated
isoforms 1C7d, e, and f (35% identity and 46% similarity over 77
amino acids). Structure-based multiple sequence alignments of the
putative Ig domain from 1C7a, b, and c with other Ig variable domains
and of the putative Ig domain from 1C7d, e, and f with other IgC2
domains are shown in Fig. 6.
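Percent identity and similarity figures such as those above depend on how aligned columns are counted and on which residue pairs are treated as similar. The sketch below counts only columns in which neither sequence has a gap and uses a hypothetical set of residue groups for similarity; database search programs instead score similarity with a substitution matrix, so their figures will not match exactly. For scale, 33% identity over 117 positions corresponds to roughly 39 identical residues.

```python
from typing import Tuple

# Hypothetical residue groups treated as "similar" for illustration only;
# search programs score similarity with a substitution matrix instead.
SIMILAR_GROUPS = ("ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG")


def is_similar(a: str, b: str) -> bool:
    """Identical residues, or residues sharing one of the groups above."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aln_a: str, aln_b: str) -> Tuple[float, float, int]:
    """Percent identity, percent similarity, and number of gap-free columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    n = len(columns)
    if n == 0:
        raise ValueError("no gap-free columns to compare")
    identical = sum(x == y for x, y in columns)
    similar = sum(is_similar(x, y) for x, y in columns)
    return 100 * identical / n, 100 * similar / n, n


# Invented toy alignment (not 1C7 sequence).
ident, simil, n = identity_and_similarity("QVQLVE-SG", "EVKLIESSG")
print(f"{ident:.1f}% identity, {simil:.1f}% similarity over {n} columns")
# 62.5% identity, 75.0% similarity over 8 columns
```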
|
LST-1 gene. In addition to the published LST-1 cDNA sequence, 13 single-entry ESTs were found to match the LST-1 genomic region. These were compiled into seven distinct putative transcripts comprising a minimum of nine exons in total (full details are presented and discussed in Neville and Campbell, 31).
IKBL. The 1.6-kb IKBL transcript (accession no. X77909) encodes a polypeptide of 381 amino acids. The structure of exons 1 and 2 has been reported previously (17), but the remaining genomic structure had not been characterized. Comparison of the IKBL cDNA sequence with the genomic sequence determined here showed that the IKBL gene spans 11,176 nt and comprises four exons of 807, 201, 276, and 125 nt. Exons 2 and 3 are separated by a particularly large intron (intron 2) of 9189 nt, which contains 15 Alu and 4 mammalian-wide interspersed repeat (MIR) elements. There is one conflict in the coding region of exon 4 between the cDNA database entry and the genomic sequence reported here. A change from CG in the database entry to GC in the genomic sequence (positions 61,431 and 61,432) results in two adjacent codon changes, from CACGAG to CAGCAG, and two corresponding amino acid changes: His-238-Gln and Glu-239-Gln.
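The reported CG to GC change can be checked by translation. In the sketch below, the database codon pair is assumed to be CACGAG, the sequence whose codons encode the His and Glu residues named above; swapping the CG dinucleotide at its third and fourth positions gives CAGCAG, which encodes Gln-Gln. For comparison, CACGAC would encode His-Asp. The codon table is the standard genetic code; the helper names are illustrative.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence codon by codon ('*' = stop)."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def swap_dinucleotide(seq: str, pos: int, new: str) -> str:
    """Replace the two bases starting at 0-based index `pos` with `new`."""
    return seq[:pos] + new + seq[pos + 2:]


database_codons = "CACGAG"                                     # assumed His-Glu
genomic_codons = swap_dinucleotide(database_codons, 2, "GC")  # CG becomes GC
print(translate(database_codons), genomic_codons, translate(genomic_codons))
# HE CAGCAG QQ
```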
New potential coding regions
An ATPase G subunit homolog.
The program GRAIL predicted the presence of three exons
1 kb telomeric of the IKBL gene. The I.M.A.G.E. cDNA clone 726424
(ESTs AA401769 and AA399356) and the American Type Culture Collection
(Manassas, VA) cDNA clone 124837 (EST AA324358) were found to align
with this genomic region and, when sequenced, were also found to span
three exons. However, the clones were found to have different splice
sites at the 3' end of exon 1 and to use different polyadenylation
signals (Fig. 7, A and B). Clones
726424 and 124837 contained inserts of 663 and 1320 nt, encoding
polypeptides of 77 and 118 amino acids, respectively (Fig. 8). The
longer protein shows significant sequence similarity with the
vacuolar-ATPase G subunit of Bos taurus (82%), Manduca sexta (75%),
Caenorhabditis elegans (78%), Neurospora crassa
(60%), and Saccharomyces cerevisiae (54%) over the
entire lengths of these proteins. Therefore, the novel gene described
here has been named ATP6G, the 6 denoting a vacuolar-type
H+ ATPase subunit, following the GDB nomenclature, and the G
denoting a G subunit. The level of conservation between human ATP6G and
the orthologous proteins in other species is particularly high over the
first 50 amino acids (Fig. 8). The truncated human protein (ATP6Galt)
encoded by the splice variant represented by cDNA clone 726424 lacks
the first 41 of these residues because the putative initiation AUG
codon encoded in clone 124837 is spliced out of clone 726424. The
second AUG, which serves as the putative initiation codon of clone
726424, is located in exon 2 (Fig. 7, A and B).
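The relationship between the two ATP6G isoforms (118 minus 41 gives the 77 residues of ATP6Galt) can be illustrated with a toy "first ATG" scan. The exon sequences below are invented and are not ATP6G sequence, and the rule of translating from the first ATG to the first in-frame stop is a simplification that ignores the sequence context of the start codon. Splicing out the upstream ATG leaves a downstream in-frame ATG in exon 2, so the shorter product is an N-terminally truncated version of the longer one.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def first_atg_orf(transcript: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon.

    A simplified 'first AUG' rule: sequence context around the start
    codon, which also influences initiation, is ignored.
    """
    seq = transcript.upper()
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Invented exon sequences (not ATP6G): the long form of exon 1 carries an
# ATG that the alternative splice site removes.
exon1_long, exon1_short = "GCCATGGCTGAA", "GCC"
exon2 = "ATGCTGTCTTGAGC"

full = first_atg_orf(exon1_long + exon2)
truncated = first_atg_orf(exon1_short + exon2)
print(full, truncated)  # MAEMLS MLS
```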
|
|
Further potential coding regions. ESTs from cDNA clones 120912 (American Type Culture Collection) and 1143343 (I.M.A.G.E.) were found to align around the NB6 genomic region, and these clones were sequenced. Both inserts contained Alu repeats and were, therefore, considered to be derived from genomic DNA contamination.
| Discussion |
|---|
The comparison of our genomic sequence from the G1 region with the
DNA databases indicates that three alternative human transcripts may
exist, derived from differential splicing of a total of nine exons. Of
these, AIF1 contains six exons, while G1 contains an alternatively
spliced form of exon 4 together with exons 5 and 6 only. The distinctly
different patterns of expression shown here for G1 and AIF1, together
with the fact that three ESTs have been identified that correspond
exactly to G1, indicate that G1 and AIF1 are genuine splice variants of
the same gene, rather than G1 being an incomplete cDNA. AIF1 is an
IFN-γ-inducible molecule expressed within cells of the monocyte
lineage, including CD86+ macrophages (23) and dendritic and
microglial cells (24). AIF1 has been associated with processes
involved in chronic allograft rejection in both humans and rats (23). It
has also been shown to both inhibit and enhance insulin secretion and
has been found at high levels in macrophages isolated from prediabetic
rat pancreatic extracts, but not in normal pancreatic extracts (24).
Thus an involvement of AIF1 in the progression of IDDM has been
suggested. Furthermore, the BART-1 splice variant from rat is
selectively and transiently expressed in response to vascular trauma
and has been suggested to play a role in the early to middle stages of
vascular restenosis (25).
A number of different features have been identified within the encoded amino acid sequence of AIF1 that may help to define the biological activity of this protein. Firstly, a 44-amino acid segment shared by all the splice variants and homologues of AIF1 contains a cluster of paired basic residues, KR-KK-(G)KR (24), which are characteristic cleavage motifs for peptide hormone precursors (32). The last potential cleavage site is preceded by a glycine residue, which is a characteristic amidation signal (24), and is also a feature of hormone precursor proteins. Interestingly, the IRT-1 splice variant contains a further five paired basic residues (KR-KK-(G)RR-RK-RR) encoded within exon 4III, with the middle site being another possible amidation signal.
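The paired basic residues described above can be located with a simple pattern scan. The sketch below reports every KR, KK, RR, or RK pair (overlapping pairs within runs of three or more basic residues are all listed) and flags pairs immediately preceded by glycine, the amidation signal mentioned above. The test sequence is invented and merely mimics the KR-KK-(G)KR arrangement; it is not AIF1 sequence.

```python
import re
from typing import List, Tuple


def dibasic_sites(protein: str) -> List[Tuple[int, str, bool]]:
    """List (1-based position, pair, preceded_by_gly) for each KR/KK/RR/RK pair.

    A lookahead is used so that overlapping pairs (e.g. in 'KRR') are all reported.
    """
    protein = protein.upper()
    sites = []
    for match in re.finditer(r"(?=([KR]{2}))", protein):
        pos = match.start()
        preceded_by_gly = pos > 0 and protein[pos - 1] == "G"
        sites.append((pos + 1, match.group(1), preceded_by_gly))
    return sites


# Invented sequence mimicking the KR-KK-(G)KR arrangement.
print(dibasic_sites("AKRLSKKTGKRV"))
# [(2, 'KR', False), (6, 'KK', False), (10, 'KR', True)]
```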
Residues 48–108 of AIF1 show strong similarity to an EF hand
calcium-binding domain (15, 19, 33), which consists of two
helix-loop-helix subdomains, each containing an EF hand motif across
its loop region (34). The calcium-binding domain in AIF1, like many such
domains, contains one highly conserved EF hand motif (residues 58–79)
and one degenerate or "ancestral" EF hand motif (residues 91–108) that
is no longer able to bind calcium (Fig. 3). The conserved EF hand motif
of AIF1 (shaded in Fig. 3) deviates from the consensus for this motif at
its twelfth (-Z) position, which is almost exclusively an acidic
residue but in AIF1 is a serine. This does not necessarily rule out
the binding of calcium by AIF1, as there are a number of examples of
functional EF hand domains that deviate from the consensus at this
twelfth residue (35, 36, 37). However, the EF hand motif may not be
functional in IRT-1 and G1. Although they contain the conserved EF hand
loop, they are truncated at their N-termini, resulting in the absence
of most of the first helix of this EF hand subdomain, which may affect
calcium binding. More significantly, a 66-amino acid insertion
within the IRT-1 splice variant falls within the conserved loop of the
EF hand (Fig. 3). Also, if the dibasic endoprotease cleavage motifs are
active, the whole EF hand domain of IRT-1 would be disrupted.
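The consensus check discussed above can be sketched as a pattern match on the 12-residue EF hand loop. The pattern below is a deliberate simplification (Asp at position 1, Asp/Asn/Ser at positions 3 and 5, Gly at 6, and Asp/Glu at position 12), not the PROSITE EF hand pattern used by Profilescan. The first calcium-binding loop of calmodulin, DKDGDGTITTKE, matches; a hypothetical variant with serine at position 12, mimicking the substitution described for AIF1 but not taken from AIF1, fails at that position.

```python
import re

# Simplified canonical EF hand loop (12 residues): Asp at 1, Asp/Asn/Ser at 3
# and 5, Gly at 6, any residues at 7-11, and Asp/Glu at 12 (the -Z position).
EF_LOOP = re.compile(r"D.[DNS].[DNS]G.{5}[DE]")


def is_canonical_ef_loop(loop: str) -> bool:
    """True if a 12-residue loop matches the simplified pattern above."""
    return EF_LOOP.fullmatch(loop.upper()) is not None


def position_12_is_acidic(loop: str) -> bool:
    """Check only the -Z position, which is almost always acidic."""
    return len(loop) == 12 and loop.upper()[11] in "DE"


calmodulin_site1 = "DKDGDGTITTKE"
serine_variant = "DKDGDGTITTKS"  # hypothetical, not AIF1 sequence
for loop in (calmodulin_site1, serine_variant):
    print(loop, is_canonical_ef_loop(loop), position_12_is_acidic(loop))
# DKDGDGTITTKE True True
# DKDGDGTITTKS False False
```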
Work undertaken so far suggests that the products of the alternative
transcripts of AIF1 are involved in the inflammatory response and in a
number of disease processes. In addition, the high degree of sequence
conservation at the amino acid level between the AIF1 proteins in
species as divergent as human and fish (Fig. 3) suggests that AIF1
plays a functionally important role in the cell. Taken together, these
two observations highlight AIF1 as a possible candidate disease
susceptibility gene for further study. Future work should shed
light on the significance of the prohormone cleavage motifs and the
calcium binding ability of the EF hand motif in the different splice
variants.
1C7, a putative novel member of the Ig superfamily
All members of the Ig superfamily share a common structure,
comprising two β-sheets with a core consisting of β-strands A, B,
and E in one sheet and G, F, and C in the other (38). The V, C1, C2, and
I Ig sets are distinguished on the basis of other features (38). Harpaz
and Chothia (28) defined a group of 46 residues occupying 20 key sites
that form the characteristic fold of the IgV set, which they called
the "V frame." Alignment of the 1C7a, b, and c Ig-like domain with
other IgV domains (Fig. 6A) highlights the conservation of
all the key "V frame" residues in 1C7 (indicated with an asterisk).
This alignment was based on a secondary structure prediction that
defined the approximate positions of all the β-strands in 1C7. In
addition, the Ig-like domain is encoded by an exon with a phase 1
splice junction at either end (Fig. 5B), another feature
common to Ig domains. This provides strong evidence that 1C7 is a
new member of the Ig superfamily. The expression pattern of this gene
further suggests its involvement in the immune response (Fig. 4B). Of
particular interest is the existence of the transcripts 1C7d, 1C7e, and
1C7f (Fig. 4B), in which the Ig-like domain is split over two exons that
together encode a truncated form of the Ig domain containing features
characteristic of the IgC2 set. It retains the core β-strands
characteristic of all Ig domains (38) and the conserved EF loop
characteristic of the IgV set, but lacks the central region of the
domain containing β-strand D and most of β-strand C' (Fig. 6B). These
are all defining features of the IgC2 set (39). As mentioned above, one
of the features of Ig domains is that the majority are encoded by a
single exon. However, there are exceptions to this rule, and examples
of Ig domains that are divided over two exons include CD4, P0, PolyIgR,
and NCAM (38). Significantly, in nearly all cases where an intron exists
within an Ig domain, the splice junction is in a phase other than
phase 1, as is the case with 1C7d, e, and f (Fig. 5B). This would
preclude the functional splicing of just half the domain within the
same gene.
The mature 1C7 peptide is predicted to be a type I integral membrane
protein with no cysteine residues N-terminal of its membrane-spanning
region other than the two within its single Ig-like domain. This
suggests that the 1C7 protein exists as a monomeric membrane-bound
molecule (Fig. 5C). Other Ig superfamily members with a
similar structure that exist as monomers include THY-1, CD83, CD7, and
CD79a (40). 1C7 also exhibits a complex pattern of alternative
splicing. However, transcripts 5, 6, and 9 (Fig. 4B) are
unlikely to encode a functional protein, and further work is needed to
assess whether all of the other 1C7 splice variants are functional.
Exon 4III in transcripts 1C7a and 1C7e encodes a number of potential
SH3 domain-binding motifs. These motifs are thought to interact with SH3
domain-containing proteins during receptor tyrosine kinase activation
(30) as part of a signal transduction cascade. Thus, at least this
variant of exon 4 contains features suggesting that it is associated
with the expression of a functional protein.
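Candidate SH3 domain-binding motifs of the kind described above are commonly found by scanning for the minimal P-x-x-P core. The sketch below does only that; genuine SH3 ligands also depend on appropriately placed flanking residues, so every hit is merely a candidate. The test sequence is hypothetical and is not 1C7 exon 4III sequence.

```python
import re
from typing import List


def pxxp_motifs(protein: str) -> List[int]:
    """1-based start positions of every P-x-x-P core, overlapping hits included."""
    return [m.start() + 1 for m in re.finditer(r"(?=P..P)", protein.upper())]


print(pxxp_motifs("MAPPLPPRSPAAP"))  # [3, 4, 7, 10]
```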
Clearly, further work is needed to confirm the expression of the various isoforms of 1C7 at the protein level and to resolve their tertiary structures as proof of their Ig superfamily membership. However, if both of the alternative Ig-like domains of 1C7 do turn out to be functional, this would appear to be the first identified example of a protein undergoing Ig set switching. This would provide valuable insight into the evolution of the various sets within the Ig superfamily, particularly the C2 and I sets, which have many of the "V frame" features. Finally, the identification of 1C7 as a putative novel member of the Ig superfamily, in a region where genetic studies have suggested the existence of disease susceptibility loci for conditions such as IDDM and CVID (7) (R. E. March, unpublished observations), highlights this as a potentially important candidate disease susceptibility gene in relation to both autoimmune disorders and immunodeficiencies.
A putative human homolog of the tobacco hornworm V-ATPase G subunit
The analysis of the
7-kb gap between IKBL and BAT1 has led to
the identification of a new gene that encodes a protein homologous to
V-ATPase G subunits. The vacuolar-type ATPases are H+-translocating
ATPases found in most organelles and are involved in a
broad range of functions, including bone resorption, glycosylation in
the Golgi, degradation of cellular debris in lysosomes, and the
processing of endocytosed receptor-ligand complexes (41). The G subunit
has so far been characterized in tobacco hornworm, rat, chicken, and
cow and is one of two known peripheral components of the V1
catalytic ATPase complex involved in the catalysis of ATP hydrolysis
(41, 42), the second peripheral subunit being the H subunit. Both these
subunits have been shown to participate in ATPase activity, but rather
than acting together it seems that either subunit alone can initiate
ATP hydrolysis (41). The G and H subunits appear to be expressed in
different tissue types and are 64% identical at the amino acid level
(41). Thus, it is likely that they confer the same activity, but in
different locations. Therefore, the absence of RT-PCR products
corresponding to the longer form of the ATP6G protein in the monocyte
and macrophage cell lines (Fig. 7B, panel I) may
suggest that in these cell lines the H rather than the G subunit is
constitutively expressed. The level of amino acid sequence conservation
between the G and H subunits and between the G subunits of different
species is particularly high over the first 50 amino acids (see Fig. 8),
suggesting that this region is an important functional domain. The
truncated isoform of human ATP6G, which lacks the first 41 of these
residues, is therefore unlikely to be functional.
A role in the inflammatory and immune responses has been suggested for V-ATPases (43, 44), and IL-1 has been shown to modulate ATPase activity in a dose-dependent manner (43). This is of particular interest in view of the location of ATP6G in the MHC class III region.
It is becoming increasingly apparent from information already available on genes such as LTB, TNF, and LTA, and from the results described here, that many of the genes located at the telomeric end of the class III region are involved in the immune and/or inflammatory responses and are thus good candidate genes for susceptibility to diseases such as IDDM and CVID. The evidence generated from screening the EST databases and the results of RT-PCR have also highlighted the complexity of this region, with many of the genes having alternatively spliced forms (AIF1, 1C7, LST-1, and possibly ATP6G). Finally, the class III region remains the most gene-dense region of the human genome, with an average of 1 gene per 10 kb of DNA. Indeed, the gene density of the 82-kb region discussed here is now 1 gene per 8 kb of DNA.
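The gene density figure can be reproduced with simple arithmetic. The sketch below assumes that the count comprises the ten genes discussed in this paper (treating G1 and AIF1 as one gene and including ATP6G) and uses the 81.8-kb contig length; this gives about 8.2 kb per gene, in line with the figure of 1 gene per 8 kb.

```python
# Assumed gene list: the genes discussed in this paper within the contig,
# with G1 and AIF1 counted once (splice variants of one gene) and ATP6G included.
GENES = ["AIF1 (G1)", "1C7", "LST-1", "LTB", "TNF", "LTA",
         "NB6", "IKBL", "ATP6G", "BAT1"]
CONTIG_KB = 81.8  # 81,800-nt contig reported in the Results

kb_per_gene = CONTIG_KB / len(GENES)
print(f"{len(GENES)} genes, 1 gene per {kb_per_gene:.1f} kb")
# 10 genes, 1 gene per 8.2 kb
```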
| Footnotes |
|---|
2 Address correspondence and reprint requests to Dr. R. Duncan Campbell, U.K. Human Genome Mapping Project Resource Centre, Hinxton, Cambridge CB10 1SB, U.K.
3 Abbreviations used in this paper: Mb, megabases; LT, lymphotoxin; IDDM, insulin-dependent diabetes mellitus; CVID, common variable immunodeficiency; AIF, allograft inflammatory factor; LST, leukocyte-specific transcript; UTR, untranslated region; EST, expressed sequence tag; IRT, IFN-γ-responsive transcript; BART, balloon angioplasty response transcript; nt, nucleotide(s); SH3, Src homology 3.
Received for publication October 13, 1998. Accepted for publication January 25, 1999.
| References |
|---|
B family within a 90 kilobase HLA class III segment. Nat. Genet. 3:137.
B family of proteins. Hum. Mol. Genet. 3:793.