(updated March 2022)
The sequence of the 18S rRNA gene was the first major class of sequences used to classify isolates of Acanthamoeba. Sequences can be compared directly, based on primary structure of the nucleotide sequence. Additional information concerning the homology of nucleotide sites within the molecule is provided by examining aspects of the secondary structure of the rRNA molecule. The secondary structure hypothesized for Acanthamoeba is shown in a subpage of this part of the web site.
The sequences represented in tables and figures on this site correspond to different isolates that have also been categorized by “sequence type”, as detailed on another page of the web site. The various sequence types occur in different frequencies among the isolates that have been reported in the DNA databases. By extension, it is hoped that these frequencies are related (perhaps loosely) to the frequency with which sequence types occur in nature.
Sequences in the databases can be considered to fall into two major classes, related to the length of the sequence that has been deposited. Arbitrarily, the sequences have been separated into groups that exceed 2000 bases in length, referred to as “almost complete”, and sequences that are shorter than 2000 bases. Detailed and updated information on each of these two classes is provided in subpages of this part of the web site. [almost complete], [partial].
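The length-based split described above can be written in a few lines. The following is a minimal Python sketch, not the procedure actually used to curate the database: the function name `length_class` and the example lengths are illustrative assumptions. The text does not say how a sequence of exactly 2000 bases is treated; the sketch places it with the partial sequences.

```python
# Arbitrary cutoff described in the text: sequences that exceed
# 2000 bases are called "almost complete".
ALMOST_COMPLETE_THRESHOLD = 2000


def length_class(length: int) -> str:
    """Return the length class for a sequence of the given length in bases."""
    if length < 0:
        raise ValueError("sequence length cannot be negative")
    # Assumption: exactly 2000 bases counts as partial, because the text
    # only says that "almost complete" sequences exceed 2000 bases.
    if length > ALMOST_COMPLETE_THRESHOLD:
        return "almost complete"
    return "partial"


# Hypothetical example lengths, chosen only to illustrate the split.
example_lengths = [47, 450, 2000, 2250, 2784]
example_classes = {n: length_class(n) for n in example_lengths}
```

Under these assumptions, a 450-base sequence is classed as partial and a 2250-base sequence as almost complete.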
From the publication of the first Acanthamoeba 18S rRNA gene sequence in 1986, there has been a steady increase in the number of sequences deposited in the DNA databases, as documented on a previous page. During 2014, the total number of sequences exceeded 2000, while the number exceeded 3000 during 2015. In 2017, the number exceeded 4000, and the 5000 sequence mark was reached during 2019. During 2021 the number of sequences of the 18S rRNA gene exceeded 6000 in our summary statistics. This represented an increase of almost 800 since the end of 2019.
“ALMOST COMPLETE” SEQUENCES OF THE 18S rRNA GENE
The increase in the number of “almost complete” sequences in the international DNA databases since the first report of an Acanthamoeba 18S rRNA gene sequence in 1986 can be seen in the following figure. This shows the total number of 18S rRNA gene sequences deposited by the end of each year through the end of 2021 (including a number of sequences, shared with us by other investigators, that are unlikely to be deposited in the DNA databases in the future). In January 2020 there were 436 “almost complete” sequences curated in our database of 18S rRNA gene sequences. During 2019, only 5 new “almost complete” sequences became directly available for study in the DNA databases. During 2020, 29 sequences were released, raising the total to 465. Deposits during 2021 were significantly higher, with 150 “almost complete” sequences deposited, raising the total to 615.
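The running totals quoted above follow from simple addition, which the short Python sketch below reproduces. It uses only the figures given in the text (436 sequences in January 2020, 29 released during 2020, 150 during 2021); the function and variable names are illustrative assumptions.

```python
# Figures taken from the text above.
START_TOTAL = 436                       # "almost complete" sequences, January 2020
NEW_BY_YEAR = {2020: 29, 2021: 150}     # sequences released during each year


def cumulative_totals(start: int, additions: dict) -> dict:
    """Return the end-of-year running total for each year in `additions`."""
    totals = {}
    running = start
    for year in sorted(additions):
        running += additions[year]
        totals[year] = running
    return totals


totals = cumulative_totals(START_TOTAL, NEW_BY_YEAR)
print(totals)  # {2020: 465, 2021: 615}
```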
Note that an additional 189 “almost complete” sequences were deposited during 2019, and subsequently retracted because a few represented chimeric sequences. It is expected that this set of sequences will be redeposited in the future. They represent a substantial addition to our knowledge of 18S rRNA evolution in Acanthamoeba.
Specific details of the increase in number of sequences (both partial and almost complete) deposited into the DNA databases through the beginning of 2014 have been discussed in the paper by Fuerst (2014, Experimental Parasitology). An update of the cumulative increases is given in the figure below.
LENGTH OF SUBMITTED SEQUENCES IN DNA DATABASES
The databases contain sequences of widely varying lengths, from partial sequences of fewer than 100 nucleotides to almost complete sequences that exceed 2700 nucleotides. As mentioned, by the middle of 2021, more than 6000 sequences had been compiled within our database (including sequences deposited in the DNA databases and more than 500 sequences from various investigators that had never been deposited).
The distribution of sequence lengths for this data is shown in the following figure. Sequence lengths were determined from the sequence that was deposited in the database or submitted to us by collaborators. A small number of the longer sequences contain Group I intron sequences, which tend to be very lineage specific. The introns were removed from the sequence when determining its length in comparison to other Acanthamoeba sequences.
Sequences range from a minimum of 47 nucleotides to a maximum of 2784 nucleotides (not including introns). The longest sequence with the intron included was 3583 nucleotides in length.
There are several peaks in the distribution of sequence length. The largest distinct peak occurs in the 401-500 bin. This peak, which included 3034 sequences in January 2022, corresponds primarily to sequences derived using PCR primers that flank the ASA.S1 region (for further information see the page on Partial 18S rRNA Sequences). The next smaller bin (301-400), which included 923 sequences in January 2022, is also produced by these same primers, or by an alternative set of primers internal to the ASA.S1 region that yield a slightly smaller sequence (~260-270 nucleotides in size), sometimes referred to as DF3. The third bin (201-300), with 788 sequences, primarily represents sequences obtained using this latter set of primers.
An additional peak of note is the bin in the range 2201-2300 nucleotides, which included 463 of the “almost complete” 18S rRNA sequences in January 2022.
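The bins named above (201-300, 301-400, 401-500, 2201-2300) are 100 nucleotides wide and start at 1. As a minimal sketch of how a list of sequence lengths could be tallied into bins of that shape, the Python below uses hypothetical helper names (`length_bin`, `length_histogram`) and made-up example lengths; it is not the code used to produce the figure, and lengths are assumed to have had any intron removed beforehand.

```python
from collections import Counter

BIN_WIDTH = 100  # bins of the form 1-100, 101-200, ..., as named in the text


def length_bin(length: int, width: int = BIN_WIDTH) -> str:
    """Return the label of the bin (e.g. '401-500') containing `length`."""
    if length < 1:
        raise ValueError("sequence length must be at least 1")
    lower = ((length - 1) // width) * width + 1
    return f"{lower}-{lower + width - 1}"


def length_histogram(lengths) -> Counter:
    """Count how many sequence lengths fall into each bin."""
    return Counter(length_bin(n) for n in lengths)


# Hypothetical lengths, for illustration only.
example_lengths = [47, 265, 268, 350, 440, 455, 500, 2250, 2784]
histogram = length_histogram(example_lengths)
```

With these example values, the 401-500 bin receives three sequences (440, 455 and 500) and the 201-300 bin receives two.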
FREQUENCY OF SEQUENCE TYPES IN THE DATABASES
Sequences in the databases can be considered to fall into two major classes, related to the length of the sequence that has been deposited. Arbitrarily, the sequences have been separated into groups that exceed 2000 bases in length, referred to as “almost complete”, and sequences that are shorter than 2000 bases. Further details on each group are given in subpages of the entry on 18S rRNA gene sequences in the web site.
As of March 2022, our worktable includes 6144 sequences of the 18S rRNA gene for isolates of Acanthamoeba. The number of deposited sequences that have been assigned to each of the current sequence types is given in the table below:
Note (September 2014): the sequence type listed above as T19 is described in the paper by Magnet et al. Parasitol Res (2014) 113:2845–2850.
Note (January 2015): The T20 type is described in a review paper (Fuerst, Booton and Crary. 2014).
**Note (January 2017): The T21 type has not yet been described in a publication, but is represented by the sequences from the genome project with accession number CDEZOOOOOO.1 (listed as representing Acanthamoeba royreba). Analysis of the genome sequence indicates that the material clearly represents a member of the Acanthamoebidae, but one that shows significant divergence from other Acanthamoeba isolates. Sequences from the genome project, which purports to represent A. royreba (Oak Ridge) ATCC 30884, are inconsistent with previous sequences from that same ATCC source. Information from other sources that would help to identify the isolate is lacking. Apparently the genome sequence represents an unidentified ATCC isolate, for which no other information has ever been reported. Although we have a genome sequence for this sequence type, we do not yet have any unequivocal isolates, and are unable to identify the ATCC isolate from which the genome DNA was obtained.
*Note added April 2019: The T22 type has been used here for more than two years. It is associated with the sequence for the taxon Acanthamoeba pyriformis n. comb. as described in the paper by Tice et al. 2016.
Note (January 2017): Several isolates with partial 18S rRNA gene sequences can be classified as “generic Acanthamoeba.” These are clearly Acanthamoeba, but the sequences deposited in the databases cover only regions that are strongly conserved within and between all sequence types, so they cannot be typed. It is possible that
Note (January 2018): The sequences deposited in the databases for 31 isolates classified as T4 contain regions specific to T4 but do not permit assignment to a T4 subtype. These generic T4 sequences are included in the T4 total in the table, but not in the breakdown by T4 subtype.
***Note added January 2022: New sequence type T23 described.