(Data updated October 2023)
The class of “almost complete” 18S rRNA sequences is the most informative set of sequences for inferring phylogenetic relationships between isolates of Acanthamoeba. Most of the phylogenetic information in the rRNA sequences of Acanthamoebae comes from a series of hypervariable regions within the gene, many of which are not shared with the genes of other organisms. The figure below comes from Gast et al. 1996, and shows the distribution of 12 variable regions within the 18S rRNA primary structure. Numbers below the figure indicate the locations with respect to numbered stems or loops in the predicted secondary structure (given on an associated web page), while the numbers above the figure indicate the position at which a stem or loop begins in the sequence of A. castellanii Neff (GenBank accession #U07416).
“ALMOST COMPLETE” 18S rRNA GENE SEQUENCES IN THE DNA DATABASES
The first two sequences of the Acanthamoeba 18S rRNA gene were deposited by the laboratory of Mitch Sogin at the Marine Biological Laboratory at Woods Hole, Massachusetts. They represented the Neff strain identified as A. castellanii (ATCC 30010; GenBank accession M13435) and the Reich strain identified as A. palestinensis (ATCC 30870; CCAP 1547/1; GenBank accession L09599). These two sequences were obtained by cloning the genes, before PCR-based sequencing was available, and each covers the complete length of the gene. They are therefore generally longer than subsequent PCR-based sequences, which depended on PCR primers designed from these two original sequences.
Most of the sequences that make up the group of “almost complete” sequences were obtained from multiple sequencing runs of overlapping PCR fragments spanning the length of the rRNA gene. In many cases the entire “almost complete” Acanthamoeba rRNA gene is not easily amplified as a single product, since it exceeds 2000 bases in length and has considerable secondary structure that can interfere with efficient PCR amplification. Determination of the “almost complete” sequences usually involves PCR primers located at the extreme 5′ and 3′ ends of the molecule; these primers target regions that are highly conserved in almost all eukaryotic organisms. A 5′-portion and a partially overlapping 3′-portion of the molecule are often amplified separately, and the results are combined to obtain the entire sequence. Other internal, “Acanthamoeba-specific” PCR primers are then used to amplify sub-regions and resolve equivocal sequencing.
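The step of combining a 5′-portion with a partially overlapping 3′-portion can be sketched as a simple overlap merge. This is a minimal illustration, assuming both reads are already trimmed, written on the same strand, and overlap with an exact match; real assemblies use alignment software that tolerates sequencing errors and weighs base quality. The function name merge_overlapping and the min_overlap parameter are hypothetical.

```python
def merge_overlapping(five_prime: str, three_prime: str, min_overlap: int = 20) -> str:
    """Join a 5'-fragment read and a 3'-fragment read at their overlap.

    Searches for the longest suffix of five_prime that exactly equals a
    prefix of three_prime (at least min_overlap bases long) and joins the
    two reads there. Assumes error-free, same-strand reads (hypothetical
    simplification; real data need a mismatch-tolerant aligner).
    """
    a = five_prime.upper()
    b = three_prime.upper()
    # Try the longest possible overlap first, shrinking down to min_overlap.
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]
    raise ValueError(f"no exact overlap of at least {min_overlap} bases")


if __name__ == "__main__":
    # Toy reads: the shared segment is TTGACCA (7 bases).
    merged = merge_overlapping("ACGTACGTTTGACCA", "TTGACCAGGCATT", min_overlap=5)
    print(merged)  # → ACGTACGTTTGACCAGGCATT
```

Requiring a minimum overlap length guards against joining the fragments at a short, coincidental match.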
Over the past decade, an increasing number of isolates of Acanthamoeba have been studied using whole genome sequencing (WGS). WGS has provided new insights into the ribosomal RNA genes. Because of the methodology used in obtaining WGS sequences, the 18S rRNA gene sequences extracted from WGS data provide a complete sequence of the gene.
WGS sequences in the DNA databases also provide insight into possible polymorphism (allelism) among the 18S rRNA gene copies within a single Acanthamoeba cell. In most eukaryotic organisms, ribosomal RNA genes occur as multiple copies within the genome, but it has been unclear how many copies of the rRNA genes occur within Acanthamoeba. Especially important in this regard is the study of Matthey-Doret et al. (2022), which analyzed the Neff and C3 strains of Acanthamoeba. Each isolate was found to contain two copies of the nuclear rRNA repeat unit, which is transcribed into the small subunit rRNA (18S), the 5.8S rRNA, and the large subunit rRNA (28S). The two copies of the rRNA unit occur in tandem on one chromosome. The paper is also important because of the rRNA sequences contained within the two genome assemblies. The two copies of the 18S rRNA gene of Neff appear to differ by a single nucleotide. In contrast, the two 18S copies from the C3 genome are distinct, representing two alternative alleles (alleles T4/08 and T4/43 as defined in Fuerst and Booton, 2020).
The presence of multiple alleles within the same Acanthamoeba cell is a potential source of difficulty in accurately determining the 18S rRNA gene sequence of an isolate. Direct sequencing of PCR products can yield a mixed signal that is hard to read when the two versions of the gene within a cell differ in length, especially in the segment associated with “alleles” as defined by Fuerst and Booton, 2020. Downstream of a length difference, the two copies are read out of register, so many positions show two overlapping bases rather than one.
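Why a length difference is so much more disruptive than a single substitution can be shown with a simplified model of two gene copies read together from the same primer. This sketch assumes equal amounts of each copy and sequences containing only A, C, G and T; it ignores peak heights and other features of real chromatograms. Positions where the copies disagree are written as IUPAC ambiguity codes (double peaks). The function names are hypothetical.

```python
# IUPAC ambiguity codes for each pair of distinct bases.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def superimposed_read(copy1: str, copy2: str) -> str:
    """Base calls expected when two gene copies are sequenced together.

    Identical positions give a clean base; differing positions give an
    IUPAC ambiguity code. The read stops at the end of the shorter copy.
    Inputs are assumed to contain only A, C, G and T.
    """
    calls = []
    for b1, b2 in zip(copy1.upper(), copy2.upper()):
        calls.append(b1 if b1 == b2 else IUPAC_PAIRS[frozenset((b1, b2))])
    return "".join(calls)


def first_ambiguous_position(read: str):
    """Index of the first double-peak call, or None if the read is clean."""
    for i, base in enumerate(read):
        if base not in "ACGT":
            return i
    return None


if __name__ == "__main__":
    # One substitution: a single ambiguous position.
    print(superimposed_read("ACGTTAGC", "ACGTCAGC"))  # → ACGTYAGC
    # One-base deletion at index 4: most downstream positions become mixed.
    mixed = superimposed_read("ACGTTAGCCTAGGT", "ACGTAGCCTAGGT")
    print(mixed)  # → ACGTWRSCYWRGK
    print(first_ambiguous_position(mixed))  # → 4
```

In this model, a substitution produces a single double peak, while an indel shifts one copy against the other, so the signal stays mixed from the indel onward.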
As of October 26, 2023, there were 814 almost complete sequences from Acanthamoeba that were in the DNA databases or available from investigators. These represent about 12% of all Acanthamoeba 18S rRNA gene sequences in the DNA databases. Nineteen isolates are represented by multiple sequences (the multiple sequences result from independent investigators sequencing standard strains of Acanthamoeba). These 19 isolates account for 44 of the 814 almost complete sequences in the databases, leaving 789 independent sequences. For fourteen of the isolates with multiple independent sequences, one version comes from genome sequencing. The distribution of all almost complete sequences among the various sequence types or sub-types is shown in the table below:
(updated October 2023)
It is clear from this table that the largest group of sequences (533 of 814) comes from isolates of Acanthamoeba identified as belonging to the subgroups of sequence type T4. Among individual sequence types, the next most frequent is T5, with 34 sequences, while the T2/6 supergroup is represented by 39 sequences. Details of the specific sequences included in this compilation (length of the deposited sequence, accession number, genotype classification and, for members of T3, T4, T5, T11 and T15, allelic classification) are provided in the attached PDFs.
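The counts quoted above can be checked directly. Of the 44 sequences belonging to the 19 multiply sequenced isolates, one per isolate is retained, so 44 − 19 = 25 redundant sequences are removed; the T4 share is computed over all 814 sequences:

```latex
N_{\text{independent}} = 814 - (44 - 19) = 814 - 25 = 789
\qquad
\frac{533}{814} \approx 0.655 \;(\approx 65.5\%\ \text{of all almost complete sequences are T4})
```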
LIST OF ALMOST COMPLETE SEQUENCES:
T4 sequences: T4A T4B T4C T4D T4E T4F T4neff T4H
Group 1 sequences: T7-T8-T9-T17-T18
T13 – T16 – T20 related sequences
T10 – T12 – T14 – T19 sequences
PROBLEMS RELATED TO OBTAINING “ALMOST COMPLETE” 18S rRNA GENE SEQUENCES
As mentioned above, an almost complete sequence can be obtained by amplifying a 5′-portion and a partially overlapping 3′-portion of the molecule and combining the results. This approach has one potential drawback when the “isolate” being analyzed is actually a mixed sample of different Acanthamoeba cell lineages. (Note that this may often be the case for environmental isolates that have not been clonally derived by serial subculturing, and even for many clinical samples. This has been demonstrated very convincingly for the sequences that we had previously designated T99, which Corsaro et al. 2017 showed to be chimeric sequences assembled from three organisms: a nematode, a cercozoan and a T13 Acanthamoeba.) When this occurs, the 5′ and 3′ amplification products may represent different cellular lineages, producing a chimeric final sequence. There appear to be several such putative chimeric sequences in the databases, identified because the two ends of the molecule cluster in different parts of the phylogenetic tree of Acanthamoeba, although, unlike the invalid T99 clade, their origin usually cannot be determined with certainty.
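The signal described here, two ends of one sequence falling in different parts of the tree, can be approximated by a crude screen that compares each half of a query with a set of reference sequences. This is a minimal sketch, not a phylogenetic analysis: it swaps tree placement for "nearest reference by percent identity" and assumes the query and all references are already aligned to the same columns and have equal length. The function names and the toy references type_X and type_Y are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and of equal, non-zero length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_match(segment: str, references: dict[str, str], start: int) -> str:
    """Name of the reference most similar to segment over the same aligned columns.

    Ties go to the reference listed first in the dictionary.
    """
    end = start + len(segment)
    return max(references, key=lambda name: identity(segment, references[name][start:end]))


def halves_disagree(query: str, references: dict[str, str]) -> bool:
    """Flag a possible chimera when the 5' and 3' halves match different references."""
    mid = len(query) // 2
    five_best = best_match(query[:mid], references, 0)
    three_best = best_match(query[mid:], references, mid)
    return five_best != three_best


if __name__ == "__main__":
    refs = {"type_X": "AAAAAAAAGGGGGGGG", "type_Y": "CCCCCCCCTTTTTTTT"}  # toy data
    print(halves_disagree("AAAAAAAATTTTTTTT", refs))  # → True (5' like X, 3' like Y)
    print(halves_disagree("AAAAAAAAGGGGGGGG", refs))  # → False
```

A flag from a screen like this only marks a sequence for closer inspection; as noted above, the origin of such sequences usually cannot be settled with certainty.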
Another chimeric sequence that has been identified is that of the isolate TUMSJ-341 (ATCC PRA-11), whose deposited sequence was found to contain an intron most similar to the introns found in T5 isolates. In January 2019, Corsaro et al. showed that the sequence deposited for this isolate (accession AF352391) is a chimera constructed primarily from the sequence of a T5 strain of Acanthamoeba (presumably representing the strain deposited with ATCC), into which a segment of T4-like sequence has been inserted.
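A chimera of this kind, with a foreign segment inserted into an otherwise uniform backbone, would not be caught by comparing only the two ends, because both ends match the same type. A sliding-window comparison can localize such an insert. The sketch below is a generic illustration, not a reconstruction of the analysis by Corsaro et al.; it assumes pre-aligned, equal-length sequences, and the function names and the toy references T5_like and T4_like are hypothetical placeholders, not real sequences.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def window_best_matches(query: str, references: dict[str, str], window: int, step: int):
    """Return (start, best reference) for each window along an aligned query.

    Ties go to the reference listed first in the dictionary.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    calls = []
    for start in range(0, len(query) - window + 1, step):
        seg = query[start:start + window]
        best = max(references,
                   key=lambda name: identity(seg, references[name][start:start + window]))
        calls.append((start, best))
    return calls


def foreign_segments(calls, backbone: str):
    """Start positions of windows whose best match is not the backbone type."""
    return [start for start, best in calls if best != backbone]


if __name__ == "__main__":
    refs = {"T5_like": "A" * 24, "T4_like": "C" * 24}  # toy placeholders
    query = "A" * 8 + "C" * 8 + "A" * 8                 # backbone with an insert
    calls = window_best_matches(query, refs, window=8, step=8)
    print(calls)  # → [(0, 'T5_like'), (8, 'T4_like'), (16, 'T5_like')]
    print(foreign_segments(calls, "T5_like"))  # → [8]
```

The window size trades resolution against noise: short windows localize an insert more precisely but are more easily swayed by scattered differences.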