(updated March 2022): [The sequences of all T4 alleles are provided in a pdf file.]
ALLELES WITHIN THE DF3 REGION OF THE 18S rRNA GENE OF SEQUENCE TYPE T4
During a study of Acanthamoeba isolates found in water samples in Hong Kong (Booton et al. J. Clin. Micro. 2002), we defined the set of alleles observed within one of the regions of the 18S rRNA sequences (the ASA.S1 subsequence). This was an attempt to better describe the degree of variation that existed in a population sample of isolates of Acanthamoeba. These alleles represented variation in the primary nucleotide sequence within one of the variable regions of segment DF3 (stem 29-1 of the Acanthamoeba 18S rRNA).
As of March 2022, we have identified 166 alleles that had been reported for more than one Acanthamoeba isolate within sequence type T4.
We initially reported 10 different alleles of stem 29-1 of the Acanthamoeba 18S rRNA within the set of samples from Hong Kong. A former student of our group (Dolena Ledee) added 11 T4 alleles in a subsequent study of Acanthamoeba keratitis patients (Ledee et al., 2009, J. Clin. Micro). In this latter paper, the description of the sequences denoting an allele neglected the last 5 nucleotides of most alleles; we have added back those nucleotides in the table below. At the end of 2009, alleles had thus been numbered as T4/1 to T4/21. Note also that one allele reported by Ledee et al. 2009 (allele T4/15) occurred in a single isolate, and has not been observed in any other isolate in the DNA databases.
After the first two papers describing alleles, the numbering system became confusing. Five different research groups added alleles to the list, but added them independently, with three of the reports using overlapping label numbers. Abe and Kimata (Jpn J Infect Dis 2010) added two alleles (here referred to as AKT4/22 and AKT4/23). Zhao et al. (J Med Micro 2011) added 7 alleles (here labeled ZT4/22 – ZT4/28). 2013 was a busy year for the description of alleles. Magnet et al. (Water Res 2013) added 4 alleles (here labeled MT4/22 – MT4/25). Duarte et al. added a single allele (Experimental Parasitology, 2013). Finally, Risler et al. (Parasit Res 2013) added 4 alleles that they labeled T4/31 – T4/34 (here labeled RT4/31 – RT4/34). Note, again, that some of the alleles described by these groups appeared in only a single isolate, and have not been observed in any subsequent study. These are alleles ZT4/26, ZT4/27, ZT4/28 and RT4/32. In addition, allele MT4/24 does not correspond to any isolate that has ever been deposited into the DNA databases.
The table below summarizes the results from the seven published reports. It lists the 38 different alleles within sequence type T4 that have been reported in the literature (including those that occurred in only a single isolate), together with the DNA sequences that characterize these alleles, standardized to include the last nucleotide presented in Booton et al. (2002). The table also gives the subtype within T4 (T4A – T4F or T4neff, described elsewhere on this website) to which an isolate carrying each allele is assigned, as well as an arbitrary within-subtype designation that we have used in some of our internal analyses.
The subtypes listed correspond to the subtypes of T4 presented in the previous page of the website on 18S rRNA sequences. The arbitrary within-subtype designation is an indication of further variability even within subtypes of T4. However, this lower level of variation is not sufficient to clearly identify statistically significant monophyletic groups. Nevertheless, it appears that specific alleles are not shared across different subtypes of T4.
Arbitrarily, the order of the alleles from top to bottom of the table roughly represents subtypes from T4A backwards towards the root of sequence type T4 on the phylogenetic tree of Acanthamoeba. (As mentioned, the legend in parentheses represents a within-lab label to further sub-classify the sequences within a subtype of T4, but these labels do not rise to the level of phylogenetically significant units.) [A pdf of the table with the sequences listed in order T4/01 to T4/21; AKT4 alleles; ZT4 alleles; MT4 alleles and RT4 alleles is also provided through the link below the table].
Several of the alleles listed in the table above differ from one another by only a single nucleotide. Often this involves a transitional difference (T compared to C, or G compared to A). In our experience observing electropherograms of Acanthamoeba 18S rRNA sequences, a mixture of such transitional differences can often be observed within the same sequencing run, suggesting the occurrence of two different, but very closely related, alleles within a single Acanthamoeba cell. Careful analysis of cloned DNA from Acanthamoeba has shown this to sometimes be the case. It is not known how many copies of the 18S rRNA gene exist within the Acanthamoeba genome. This gene usually exists as multiple copies within the cell, often in tandem arrays. These tandem copies are thought to be subjected to sequence homogenization through a process termed “concerted evolution” (Brown, Wensink and Jordan, 1972; Zimmer et al., 1980). Sequence homogenization within a cell may be ongoing, and some minor sequence variation would be expected. The overall level of allele variability in the 18S rRNA gene between Acanthamoeba isolates is much greater than this within-cell variation.
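The distinction between transitional and other single-nucleotide differences can be illustrated with a minimal Python sketch. The function names and the short test strings are hypothetical and are not real allele sequences. The sketch also assumes the two sequences have equal length, whereas real alleles differ in size and would first need to be aligned.

```python
# Purines and pyrimidines; a transition stays within one of these classes
# (A <-> G or C <-> T), while a transversion crosses between them.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def single_difference(seq_a, seq_b):
    """Return (position, base_a, base_b) if two equal-length sequences
    differ at exactly one site; otherwise return None."""
    if len(seq_a) != len(seq_b):
        return None  # unequal lengths would need an alignment first
    diffs = [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b
    ]
    return diffs[0] if len(diffs) == 1 else None


def is_transition(base_a, base_b):
    """True for A<->G or C<->T substitutions."""
    pair = {base_a.upper(), base_b.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


# Toy example (hypothetical sequences, not actual T4 alleles).
diff = single_difference("ACGTTC", "ACATTC")
if diff is not None:
    position, base_a, base_b = diff
    kind = "transition" if is_transition(base_a, base_b) else "transversion"
    print(position, base_a, base_b, kind)
```

A pair of alleles flagged as a single transition by a check of this kind is the situation in which a mixed peak in an electropherogram would be most plausible.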
FREQUENCY OF PRE-2014 ALLELES (updated July 2021)
Analysis of the distribution of the alleles of T4 within the DNA database shows that there is great heterogeneity in the frequency of occurrence of alleles. The table below shows the number of occurrences of each of the 38 “original” described alleles within the total of T4 type sequences (~4300), including those deposited in the DNA databases, undeposited sequences from our lab (to be deposited soon), or other undeposited sequences from other investigators who have generously provided information on unpublished and undeposited Acanthamoeba 18S rRNA sequences.
The number of observed isolates with typed alleles is lower than the total number of T4 isolates for several reasons. Some isolates are not included because the sequences do not overlap the region from which alleles are typed. Others have ambiguous nucleotide reads in the region. Finally, some have “unique” allele sequences, i.e. “alleles” that occur only once in the database. (data updated as of March, 2022).
(*) allele MT4/24 does not occur in the DNA database. The sequence on which this allele was based has not been deposited by Magnet et al.
The table of described alleles given above indicates, as previously mentioned, that a number of the alleles that were described prior to 2014 occur only once or a few times within the Acanthamoeba DNA database. That database now includes more than 4000 T4 sequences among deposited sequences, plus ~300 additional sequences of 18S rRNA genes of Acanthamoeba that have been recently completed by our lab or have been generously provided to us by collaborators who had determined the sequence of an isolate but who did not expect to include the sequence in any publication. These “old” alleles represent 1973 T4 isolates of Acanthamoeba. (The term “old” refers to when an allele was defined in the literature, not to the evolutionary age of an allele.)
Only five alleles occur more than 100 times, and five others occur 75 or more times (note: see below for the occurrence of additional alleles that were not previously described in the literature, but that have been observed more than 100 times). Twelve other alleles occur more than 20 times in the data (as well as a number of other “new” alleles, below).
As mentioned, there are a total of ~4300 isolates that can be classified as T4. Only 1973 sequences within the T4 sequence type in the augmented DNA databases can be classified into the list of initial alleles. This represents about 45% of the T4 sequences that have been deposited (or contributed to our compilation) by March 2022.
“NEW” ALLELES – ADDITIONS AFTER 2013
An examination of the Acanthamoeba 18S rRNA sequences in the DNA databases that would not be included in the list of alleles above shows that there are many additional alleles with multiple occurrences that have not been formally described in the literature.
Care must be taken before concluding that a comprehensive list of alleles can easily be provided. Some sequences not included in the current list of alleles, and some alleles that appear only once in the extended list of “all” alleles, have attributes suggesting that they may contain sequencing errors, which could cause them to be described (erroneously) as unique alleles.
Nevertheless, by March 2022, 134 additional alleles had been identified that have been observed in more than a single isolate of Acanthamoeba. Below, we provide information on the frequency of occurrence of these additional alleles that were observed multiple times in the DNA database as of March 2022. We will continue to augment this analysis of the DNA alignments from the databases periodically as additional sequences are reported that would link multiple isolates. One update of information with more details was provided in our report from the 2019 FLAM meeting (Fuerst & Booton, 2020).
ADDITIONAL “NEW” T4 ALLELES (updated March 2022)
As mentioned, examination of the DNA databases shows that many of the T4 sequences that have been deposited do not fall into any of the allele classes that have been reported formally in the literature. We began with an analysis of the 18S rRNA gene sequence database for Acanthamoeba, emphasizing almost complete sequences as a starting point, to identify additional alleles whose sequences occur more than once in the databases. New alleles that are identified are labeled as OSUT4/#. Thirty-two additional alleles were identified by the end of 2014 (OSUT4/39 – OSUT4/70). An additional 12 alleles were identified in 2015 (OSUT4/71 – OSUT4/82), 8 additional alleles in 2016 (OSUT4/83 – OSUT4/90), 23 new alleles in 2017 (OSUT4/91 – OSUT4/113), 12 new alleles in 2018 (OSUT4/114 – OSUT4/125), 16 alleles during 2019 (OSUT4/126 – OSUT4/141), 20 alleles during 2020 (OSUT4/142 – OSUT4/161), and 11 alleles in 2021 (OSUT4/162 – OSUT4/172). This has resulted in 172 alleles being defined, of which 166 alleles occur more than once.
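The yearly label ranges listed above can be tallied with a short Python sketch. The variable and function names are illustrative only; the ranges and the count of 38 literature alleles are taken from the text. The per-year counts sum to the 134 OSUT4 alleles, and adding the 38 literature alleles gives the overall number of defined alleles.

```python
# (first, last) OSUT4 label numbers by year of identification, from the text.
OSU_RANGES = {
    2014: (39, 70),
    2015: (71, 82),
    2016: (83, 90),
    2017: (91, 113),
    2018: (114, 125),
    2019: (126, 141),
    2020: (142, 161),
    2021: (162, 172),
}
LITERATURE_ALLELES = 38  # alleles described in the seven published reports


def alleles_per_year(ranges):
    """Number of labels in each inclusive (first, last) range."""
    return {year: last - first + 1 for year, (first, last) in ranges.items()}


def ranges_are_contiguous(ranges):
    """Check that each range starts right after the previous one ends."""
    spans = sorted(ranges.values())
    return all(prev[1] + 1 == nxt[0] for prev, nxt in zip(spans, spans[1:]))


per_year = alleles_per_year(OSU_RANGES)
total_new = sum(per_year.values())
total_defined = LITERATURE_ALLELES + total_new

print(per_year)
print(total_new, total_defined, ranges_are_contiguous(OSU_RANGES))
```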
When we examine the almost complete 18S rRNA sequences in the DNA databases, more than 30 other alleles have been identified that had been seen only once in the entire database. These have not been included in our list of alleles. We will continue to monitor these and other partial 18S rRNA gene sequences to see whether they are identical to any new isolates that are reported.
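The rule used here, that a sequence is listed as an allele only when it occurs in more than one isolate, can be sketched as follows. This is a minimal illustration under the assumption that the allele region has already been extracted from each isolate as a plain string; the function name and the toy sequences are hypothetical.

```python
from collections import Counter


def split_shared_and_singletons(regions):
    """Separate allele-region sequences seen in more than one isolate
    from those seen exactly once.

    regions: iterable of sequence strings, one per isolate.
    Returns (shared, singletons), where shared maps sequence -> count
    and singletons is a sorted list of sequences seen only once.
    """
    counts = Counter(region.upper() for region in regions)
    shared = {seq: n for seq, n in counts.items() if n > 1}
    singletons = sorted(seq for seq, n in counts.items() if n == 1)
    return shared, singletons


# Toy input (hypothetical sequences, not actual T4 alleles).
isolates = ["ACGT", "acgt", "ACGA", "TTTT", "ACGA", "GGGG"]
shared, singletons = split_shared_and_singletons(isolates)
print(shared)
print(singletons)
```

A singleton kept aside in this way moves into the shared group as soon as a second isolate with an identical sequence is reported.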
The sequences of the 134 alleles identified since 2014 are provided in a pdf file (OSUT4 alleles). [Note: this file also contains sequences of the original pre-2014 alleles].
FREQUENCY OF “NEW” ALLELES (updated March 2022)
Among the additional alleles that have now been identified, but had not been previously described formally in the literature, are ones that characterize several of the most important Acanthamoeba forms. These include alleles for (i) the original type isolate for both the genus Acanthamoeba and the species A. castellanii (ATCC 30011), and for (ii) the widely used Neff strain of Acanthamoeba (ATCC 30010).
The allele that corresponds to the original Acanthamoeba castellanii isolate (ATCC 30011) is allele OSUT4/39, which is the second most frequent allele in the DNA databases, occurring 167 times (see table below), even though it was not among those alleles observed in any of the studies that originally described allele types.
The allele that is carried in the Neff strain of Acanthamoeba castellanii (ATCC 30010) is allele OSUT4/48. This allele has been observed in 123 entries in the augmented DNA databases.
A third new allele has also been observed frequently in the database. This is allele OSUT4/56, which corresponds to the strain ATCC 30871, A. polyphaga Page 23. This has been observed 96 times in the databases.
There were 1339 isolates that were found to carry one of the 134 newly defined alleles seen more than once. Twenty-eight of the alleles occur in ten or more isolates found in the DNA databases. The frequency of occurrence of the “NEW” alleles that appear in more than one isolate in the database is given below:
In addition to the alleles that are summarized in the tables above, we have also identified 22 other alleles among the almost complete 18S rRNA sequences in the DNA databases that have been seen only once. These have not been included in our list of alleles. As new deposits into the DNA databases occur, we will continue to monitor the singleton alleles from “almost complete” sequences, as well as singleton alleles from other partial 18S rRNA gene sequences to determine whether they identify additional shared alleles.
As of July 2021, there were still more than 1000 T4 isolate sequences in the DNA databases for which information about allele type is not included in the tables above. This represented about 24% of reported T4 sequences. There are several reasons why these sequences are not included in our tally of alleles.
First, a partial sequence may not completely overlap the region of the gene used to define alleles. There were sequences from over 630 isolates that did not completely overlap the genetic region that corresponds to the sequences used to identify alleles, and thus cannot be classified for allele.
The remaining ~550 sequences represent isolates having sequences that completely overlapped the region represented by the alleles discussed here, but for which the reported sequence did not match any of the alleles (including unique alleles among “almost complete” sequences) that had been cataloged through January 2020. These isolates subsequently can be separated into three groups.
One group of isolates (# = 104) has sequences that include nucleotide ambiguities at one or more sites within the region, which would result in a mismatch when compared to cataloged alleles. We do not try to place sequences with ambiguous nucleotides into an allele class. There are about 100 isolates with nucleotide ambiguities in the allele region.
A second group of isolates (# = 128) has sequences that suggest either sequencing error or a mixture of more than one allele. We would consider placing these into an allele class if evidence of a similar isolate were to occur. In some cases, examination of an electropherogram from a sequencing run can identify unambiguously when two alleles are present in a sample. Given the difference in size of different alleles, it is not surprising to see a confused sequencing region when two alleles occur together in a sample (either because of multiple alleles in a single isolate, or because of mixtures of isolates in a culture). More than 100 isolates have sequences that suggest mixtures.
Finally, there is a group of sequences (# = 320) that have no sequencing ambiguities and no obvious suggestion of sequencing error. These are candidates for true “unique” alleles. Such singletons are sometimes matched later by newly reported isolates. Six of the eight new alleles identified during 2016 had been reported once before 2016. Of the 23 new alleles identified during 2017, 6 had been reported in single isolates previously. For the alleles identified in 2018 and 2019, 4 alleles each had been reported in single isolates previously. Of the 20 alleles described in 2020, 15 had been reported once before 2020. Finally, 7 of the 11 new alleles reported in 2021 had been seen once prior to 2021. At present more than 300 isolates continue to have allele sequences that have been reported only once.
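The sorting of sequences described above can be summarized in a rough Python sketch. It is an illustration only, not the procedure actually used to build the tables: the function name, category strings and toy catalog are hypothetical, and the region is assumed to have been extracted already (or set to None when a partial sequence does not cover it). Deciding whether an unmatched sequence reflects sequencing error or a mixture of alleles requires inspection of the data and is not attempted here.

```python
UNAMBIGUOUS = set("ACGT")


def classify_region(region, catalog):
    """Assign one isolate's allele region to a category.

    region:  sequence string for the allele region, or None when the
             isolate's sequence does not completely overlap the region.
    catalog: dict mapping allele label -> allele sequence.
    Returns an allele label, or one of "no_overlap", "ambiguous",
    "unmatched".
    """
    if region is None:
        return "no_overlap"
    region = region.upper()
    if set(region) - UNAMBIGUOUS:
        # Codes such as N, R or Y mark ambiguous base calls.
        return "ambiguous"
    for label, sequence in catalog.items():
        if sequence.upper() == region:
            return label
    # No ambiguity and no match: possible error, mixture, or unique allele.
    return "unmatched"


# Toy catalog (hypothetical labels and sequences, not actual T4 alleles).
toy_catalog = {"X/1": "ACGTAC", "X/2": "ACGTAT"}
for example in ("acgtat", "ACGTAN", "ACGTAA", None):
    print(example, classify_region(example, toy_catalog))
```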
Further information on the group of unclassified alleles will be posted as they continue to be examined, or if information on new isolates is deposited that results in a match with an isolate from among this collection of “unique” sequences.