(Data updated January 2020)
The class of “almost complete” 18S rRNA sequences represents the most informative set of sequences with respect to phylogenetic information about the relationships between isolates of Acanthamoeba. Most of the phylogenetic information from the rRNA sequences of Acanthamoebae comes from a series of hypervariable sequences within the gene, many of which are not shared with the genes from other organisms. The figure below comes from Gast et al. 1996, and shows the distribution within the 18S rRNA primary structure of 12 variable regions. Numbers below the figure indicate the locations with respect to numbered stems or loops in the predicted secondary structure (given on an associated web page), while the numbers above the figure indicate the position in which a stem or loop begins in the sequence of A. castellanii Neff (GenBank accession #U07416).
“ALMOST COMPLETE” 18S rRNA GENE SEQUENCES IN THE DNA DATABASES
The first two sequences of the Acanthamoeba 18S rRNA gene were deposited from the laboratory of Mitch Sogin at the Marine Biological Laboratory at Woods Hole, Massachusetts. They represented the sequences of the Neff strain identified as A. castellanii (ATCC 30010; GenBank accession M13435) and the Reich strain identified as A. palestinensis (ATCC 30870; CCAP 1547/1; GenBank accession L09599). These two sequences were obtained before PCR-based sequencing, by genetic cloning of the genes, and each includes the complete length of the gene. They are thus generally longer than subsequent sequences obtained by PCR, which often used primers developed from information about the two original sequences.
Most of the sequences that make up the group of “almost complete” sequences were obtained from multiple sequencing runs of overlapping PCR fragments that spanned the length of the rRNA molecule. Determination of these sequences usually involves PCR primers located in the extreme 5′ and extreme 3′ regions of the rRNA molecule. These primers tend to target sequences that are highly conserved in almost all eukaryotic organisms. Other internal “Acanthamoeba-specific” PCR primers are then used to amplify sub-regions and for sequencing.
In many cases, the entire “almost complete” Acanthamoeba rRNA gene product is not easily amplified as a single product, since it exceeds 2000 bases in length, and has considerable secondary structure that can interfere with efficient PCR amplification. Amplification of a 5′-portion and a partially overlapping 3′-portion of the molecule is often performed, and the results combined to obtain the entire sequence.
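The step of combining a 5′-portion with a partially overlapping 3′-portion can be illustrated with a minimal Python sketch. It assumes the two reads share an exact, error-free overlap; the function name, the `min_overlap` parameter and the toy sequences are hypothetical, and real assemblies would tolerate mismatches and use quality scores.

```python
def merge_overlapping(five_prime: str, three_prime: str, min_overlap: int = 20) -> str:
    """Join two fragments where the end of five_prime exactly matches
    the start of three_prime (illustrative only; no mismatches allowed)."""
    longest_possible = min(len(five_prime), len(three_prime))
    # Try the longest candidate overlap first, then shrink.
    for k in range(longest_possible, min_overlap - 1, -1):
        if five_prime[-k:] == three_prime[:k]:
            return five_prime + three_prime[k:]
    raise ValueError("no exact overlap of at least min_overlap bases found")


# Toy fragments (not real Acanthamoeba sequence) sharing the overlap TTGACC.
toy_5p = "ACGTACGTTTGACC"
toy_3p = "TTGACCGGATTA"
merged = merge_overlapping(toy_5p, toy_3p, min_overlap=4)
print(merged)
```

Note that an exact overlap shows only that the two fragments agree in the shared region; it does not show that both fragments came from the same cell lineage.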
As of March 31, 2022, there were 591 almost complete sequences from Acanthamoeba that were in the DNA databases, or available from investigators. Eighteen isolates are represented by multiple sequences (representing multiple independent investigators sequencing standard strains of Acanthamoeba). The 18 isolates are represented by 47 out of the 591 almost-complete sequences in the databases (resulting in 544 independent sequences). Fourteen of the isolates with multiple sequences represent cases in which one sequence results from genome sequencing. The distribution of all almost-complete sequences among the various sequence types or sub-types is shown in the table below:
(updated March, 2022)
It is clear from this table that the largest grouping of sequences (383 out of 591) comes from isolates of Acanthamoeba that are identified as belonging to the subgroups of sequence type T4. The next most frequent sequence type is T5, with 29 sequences. The T2/6 supergroup is represented by 39 sequences. The identification of specific sequences that are included in this compilation, including length of the deposited sequence, accession number, genotype classification, and allelic classification for members of T4, T3, T11 and T5, is provided in the attached PDF (almost complete list 2022).
PROBLEMS RELATED TO OBTAINING “ALMOST COMPLETE” 18S rRNA GENE SEQUENCES
As mentioned above, to obtain an almost complete sequence, amplification of a 5′-portion and a partially overlapping 3′-portion of the molecule can be performed, and the results combined to obtain the entire sequence. This has one potential drawback in situations in which the “isolate” being analyzed is actually a mixed sample of different Acanthamoeba cell lineages. (Note that this may often be the case for environmental isolates that have not been clonally derived by serial subculturing, and even for many clinical samples. This has been demonstrated very convincingly in the case of sequences that we had previously designated T99. These problematic sequences were shown by Corsaro et al. 2017 to be chimeric sequences combining segments from three organisms: a nematode, a cercozoan and a T13 Acanthamoeba.) When this occurs, the 5′ amplification product and the 3′ amplification product may actually represent different cellular lineages, producing a chimeric final sequence. There appear to be several such putative chimeric sequences in the databases, identified because the different ends of the molecule cluster in different parts of the phylogenetic tree of Acanthamoeba, although, unlike the case with the invalid T99 clade, their origin cannot usually be determined with certainty.
Another chimeric sequence that has been identified is that of the isolate TUMSJ-341 (ATCC PRA-11), which was found to contain an intron that is most similar to introns found in T5 isolates. In January of 2019, Corsaro et al. showed that the sequence deposited for this isolate (accession AF352391) is a chimera constructed primarily from the sequence of a T5 strain of Acanthamoeba (presumably representing the strain deposited with ATCC) into which a segment representing T4-like sequences is inserted.
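The idea that a chimera reveals itself when different parts of the molecule resemble different lineages can be sketched in Python. This is a simplified stand-in for phylogenetic analysis, not the method used by the studies cited above: it assumes the query and reference sequences are already aligned to the same length, splits the alignment at its midpoint, and compares the best-matching reference for each half. All function names and toy sequences are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def best_reference(query: str, refs: dict, start: int, end: int) -> str:
    """Name of the reference most similar to the query over alignment columns start..end."""
    return max(refs, key=lambda name: identity(query[start:end], refs[name][start:end]))


def flag_possible_chimera(query: str, refs: dict):
    """Return (flag, best 5' match, best 3' match); flag is True when the halves disagree."""
    mid = len(query) // 2
    five = best_reference(query, refs, 0, mid)
    three = best_reference(query, refs, mid, len(query))
    return five != three, five, three


# Toy pre-aligned references (not real sequences), labeled by genotype for illustration.
toy_refs = {
    "T4": "AAAAAAAACCCCCCCC",
    "T5": "GGGGGGGGTTTTTTTT",
}
toy_query = "AAAAAAAATTTTTTTT"  # 5' half resembles T4, 3' half resembles T5
print(flag_possible_chimera(toy_query, toy_refs))
```

A midpoint split would miss a short inserted segment such as the one described for AF352391; a sliding window over the alignment would be a natural extension, and a disagreement between halves is only a prompt for closer phylogenetic inspection.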