THE ACANTHAMOEBA GENOME SEQUENCES IN THE DNA DATABASES: WE NEED TO APPLY CAUTION IN THEIR USE
(updated March 2022)
As the cost of DNA sequencing has declined, efforts have been made to determine the complete genome sequence of a number of isolates belonging to the genus Acanthamoeba. However, some issues have occurred that suggest that great caution must be taken in interpreting the genome information from different isolates within the genus (see below). At the beginning of 2017, nuclear genome sequences (most in the form of unlinked CONTIG sequences) were accessible for 15 isolates of Acanthamoeba. Since that time, genome sequences or nearly complete transcriptome sequences have been added for 8 additional isolates. Three of the isolates have been independently sequenced twice. Complete mitochondrial DNA sequences can be retrieved from 21 of these isolates. Complete mitochondrial genome sequences are also available from an additional 3 isolates.
It is notable that neither the nuclear nor the mitochondrial genome sequence is available for the type isolate of the genus, which was described by Castellani in a series of papers in 1930 (isolate A. castellanii AC30; ATCC 30011 or CCAP 1501/10, and subcultured and deposited as ATCC 30234 and ATCC 50374).
THE GENOME SEQUENCE FOR A. sp. Neff
The first genome released for use by the community was that of the Neff strain of Acanthamoeba (ATCC 30010), released in 2013 as NCBI Reference Sequence: NZ_AHJI00000000.1. The genome sequence was obtained as a whole genome shotgun sequencing project. This genome sequence has been well annotated, and will serve as the template to which other genome sequences can be compared. However, the phylogenetic position of the Neff strain and the modest frequency of Neff-like natural isolates within the universe of Acanthamoeba isolate sequences in the databases suggest that it may not be the best standard to represent the genus, or even sequence type T4.
REFERENCE: Clarke, M. et al. 2013. Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling. Genome Biol. 14 (2), R11.
[Note on the species classification of the Neff strain: The Neff strain has traditionally been classified as belonging to the species A. castellanii. However, this is based on morphology, not gene phylogeny. When the sequences of several genes (discussed elsewhere on this site) are compared, it appears clear that this species designation is not appropriate. Although the Neff strain of Acanthamoeba is a member of the T4 sequence type, within T4 it is not closely related to the type strain of A. castellanii (the original type strain described by Castellani is available as ATCC 30011, or sub-cultured as ATCC 30234 and ATCC 50374). The Neff strain represents a fairly small group of well differentiated isolates (sequence subtype T4-neff), as indicated on the page detailing the phylogenetic relationships among T4 isolates. Our recommendation is that the isolate be labeled as A. sp. Neff to indicate that it does not represent an isolate of A. castellanii.]
THE GENOME SEQUENCES OF A. polyphaga Linc Ap-1
During 2016, investigators at Kingston University deposited sequences from the whole genome shotgun sequencing project of Acanthamoeba polyphaga strain Linc Ap-1 (accession # LQHA01000000). This isolate is typed as a member of sequence type T4, subtype T4A. The complete mitochondrial DNA genome has also been deposited (accession # KP054475). The species classification of this isolate is also somewhat in doubt. Page (1967) described A. polyphaga, based on eight isolates. At least two of these isolates (ATCC 30871; CCAP 1501/3a and ATCC 30872; CCAP 1501/3b) exist in the culture centers. The 18S rRNA gene sequence of ATCC 30871 has been determined and is clearly distinct from that of Linc Ap-1 (ATCC 30871 belongs to sequence type T4 subtype T4E, while Linc Ap-1 belongs to T4 subtype T4A), thus raising questions about the use of “polyphaga” to describe the isolate. The classification of ATCC 30872 is more problematic. The only 18S rRNA sequence directly reported to be from ATCC 30872 (acc # AY026244) places this isolate in sequence type T2/6, subtype A. In the discussion below of other genome sequences, one of the genome projects purported to use ATCC 30872. The 18S rRNA sequence from this genome project does not agree at all with sequence AY026244, nor does it match any other 18S rRNA sequence in the database. It does place the isolate within sequence type T4 subtype T4B. If the genome project does in fact truly represent ATCC 30872, this again raises objections to the use of “polyphaga” for Linc Ap-1 (as well as to the general utility of “polyphaga” as a correct taxonomic name for any Acanthamoeba isolate).
THE GENOME SEQUENCES OF OTHER MEMBERS OF ACANTHAMOEBA (UNIVERSITY OF LIVERPOOL; JANUARY 2015)
The genome sequences of 14 isolates of Acanthamoeba were released to the international DNA databases in 2015, the result of whole genome sequencing by next-generation sequencing procedures. Originally these sequences were released under only species names, with no identification of the source isolates. Unfortunately, a series of mislabelings appears to have occurred at some point (currently unknown), which rendered the species and isolate designations of a number of the genome sequences problematic. We have been cooperating with one of the PIs of this project (Andrew Jackson of the University of Liverpool) to clarify the isolate designations that should be applied to these sequences.
Together with Dr. Jackson, we have analyzed the sequences for a set of genes (nuclear 18S rRNA, mitochondrial 16S-like rRNA, mitochondrial cytochrome oxidase subunit 1, and a set of partial sequences from 5 nuclear genes: beta-tubulin, elongation factor-1, glyceraldehyde-3-dehydrogenase, glycogen phosphorylase 1, and RasC). This set of sequences has allowed us to establish, with extremely high or moderately high probability, the identity of 13 of the 14 isolates from which genome sequences were obtained.
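The logic of a multi-gene comparison of this kind can be sketched in code. The following is only an illustration, not the procedure actually used: it assumes that a percent-identity value has already been computed (for example from a similarity search) between each marker gene recovered from a WGS project and the corresponding gene of each candidate reference isolate. The function name, the isolate labels, the identity values, and the 99% cut-off are all hypothetical.

```python
from collections import Counter


def consensus_source(identity_by_locus, min_identity=99.0):
    """Pick the reference isolate that is the best hit at the most loci.

    identity_by_locus: {locus: {isolate: percent_identity}}, precomputed.
    min_identity: hypothetical cut-off below which a best hit is ignored.
    Returns (isolate or None, loci supporting it, loci with a usable hit).
    """
    best_hits = []
    for locus, scores in identity_by_locus.items():
        if not scores:
            continue  # no reference sequence available for this locus
        isolate, identity = max(scores.items(), key=lambda kv: kv[1])
        if identity >= min_identity:
            best_hits.append(isolate)
    if not best_hits:
        return None, 0, 0
    isolate, support = Counter(best_hits).most_common(1)[0]
    return isolate, support, len(best_hits)


# Hypothetical identities for three of the marker genes named above.
example = {
    "18S rRNA": {"isolate_X": 99.9, "isolate_Y": 97.2},
    "mt 16S-like rRNA": {"isolate_X": 100.0, "isolate_Y": 96.5},
    "cox1": {"isolate_X": 99.8, "isolate_Y": 95.9},
}
isolate, support, usable = consensus_source(example)
print(f"{isolate}: best hit at {support} of {usable} loci")
```

Agreement across all loci would correspond to a high-confidence identification, while disagreement between nuclear and mitochondrial markers would flag a project for closer inspection (possible contamination or a mislabeled sample).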
The following list provides information concerning the original species designations of the isolates and the putative identification of the isolate from which the DNA for each genome project was obtained. It then provides the results of our analysis and the best estimate of the correct identification of the source of project material.
We were collaborating with Dr. Jackson to provide a description of the patterns of genome differentiation as seen in light of the available WGS project genome sequences.
We have been successful in retrieving the sequence of the whole mitochondrial genomes from these genome samples.
[Please note that we are not currently making judgements concerning whether species attributions originally associated with any of the standard ATCC strains are appropriate. Given analysis based either on genome sequences or on sequences of genes such as the 18S rRNA gene, it is likely that some species names will be viewed as synonymous with alternative appropriate designations. Future postings on this site will deal with the question of “species” within Acanthamoeba as revealed by molecular phylogenetics.]
CORRECT ATTRIBUTION OF GENOME SEQUENCES
Six of the Liverpool genome sequences are correctly attributed to isolates. These are:
A. astronyxis (WGS Project: CDFH01): genome source: ATCC 30137; (sequence type T7)
A. culbertsoni (WGS Project: CDFF01): genome source: ATCC 30171 (strain A1); (sequence type T10)
A. lenticulata (WGS Project: CDFG01): genome source: ATCC 30841 (isolate PD2S); (sequence type T5)
A. lugdunensis (WGS Project: CDFB01): genome source: ATCC 50240 (isolate L3a); (sequence type T4, subtype T4A)
A. quina (WGS Project: CDFN01): genome source: ATCC 50241 (isolate Vil3); (sequence type T4, subtype T4A)
A. rhysodes (WGS Project: CDFC01): genome source: ATCC 30973 (isolate Singh); (sequence type T4, subtype T4D)
A seventh genome sequence appears to consist primarily of sequences representing the correct source, but analysis of mitochondrial DNA sequence reads suggests that it also contains a minority of short sequence reads that may be contaminants from another source (probably A. culbertsoni, above).
A. mauritaniensis (WGS Project: CDFE01): genome source: ATCC 50253 (isolate 1652); (sequence type T4, subtype T4D)
PROBLEMATIC ATTRIBUTION OF GENOME SEQUENCES TO ISOLATES
For the remaining seven Liverpool genome sequences, problems exist in identifying the correct source. We list the isolates in order from most certain to least certain attribution.
- A. castellanii (WGS Project: CDFL01): putative genome source: ATCC 50370 A. castellanii (isolate Ma); (sequence type T4, subtype T4B)
Probably attributed correctly. Evidence from the 18S rRNA gene suggests the possibility of minor contamination, or multiple allelism of the 18S rRNA sequence.
- “A. healyi” (WGS Project: CDFA01): putative genome source: ATCC 30866 A. healyi (isolate OC-3A)
ERRONEOUS IDENTIFICATION: The WGS sequences do not match ATCC 30866 A. healyi.
CORRECT IDENTIFICATION: The sequences are a match to ATCC 30870, A. palestinensis Reich (sequence type T2).
- “A. palestinensis” (WGS Project: CDFD01): putative genome source: ATCC 30870 A. palestinensis Reich
ERRONEOUS IDENTIFICATION: The WGS sequences do not match ATCC 30870 A. palestinensis Reich.
CORRECT IDENTIFICATION: The sequences are a match to ATCC 50254, A. triangularis (isolate SH621); (sequence type T4, subtype T4F).
- “A. pearcei” (WGS Project: CDFJ01): putative genome source: ATCC 50435, A. pearcei.
ERRONEOUS IDENTIFICATION: The WGS sequences do not match ATCC 50435, A. pearcei.
CORRECT IDENTIFICATION: problematic. Comparisons are not absolutely conclusive as to the identity of the source strain. The most likely source is Acanthamoeba sp. ATCC 50496 (strain Galka), but other similar standard strains are possible, though less likely. The WGS sequences for the 18S rRNA show a close but not exact match to previous sequences from ATCC 50496 A. sp. Galka (BCM:1282:324). However, the previous sequence from the 16S-like rRNA from A. sp. Galka ATCC 50496 does not match the WGS results as closely as do sequences from some other isolates. (Sequence type T4, subtype T4A)
- “A. polyphaga” (WGS Project: CDFK01): putative genome source: ATCC 30872 A. polyphaga (CCAP 1501/3b).
ERRONEOUS IDENTIFICATION: As mentioned above, the WGS sequences of the 18S rRNA do not match the sequence deposited in the DNA databases (AY026244) to represent ATCC 30872 A. polyphaga (CCAP 1501/3b). This single sequence is the only prior comparative sequence information reported from ATCC 30872 A. polyphaga (CCAP 1501/3b).
CORRECT IDENTIFICATION for WGS Project: CDFK01: problematic. The WGS sequences for several genes show close matches to previous sequences from several ATCC isolates within the sequence subtype T4B. Analyses of 18S rRNA, 16S-like rRNA and Cox-I all suggest three possible isolate sources, equally likely. If the sequence (AY026244) is correct, then the best guess is that the source of the DNA for this WGS project is ATCC 50372, A. polyphaga JAC/S2 (given sequence similarity to previous sequences and some overlap of ATCC number). If AY026244 was inappropriately attributed to ATCC 50372, then WGS project CDFK01 could be the first data for this ATCC isolate. (Sequence type T4, subtype T4B)
- “A. divionensis” (WGS Project: CDFI01): putative genome source: ATCC 50238 A. divionensis
ERRONEOUS IDENTIFICATION: The WGS sequences do not match ATCC 50238 A. divionensis.
CORRECT IDENTIFICATION: The WGS project sequences are a match to ATCC 30137 A. astronyxis. Project sequences appear to be from a duplicate sample of A. astronyxis.
- “A. royreba” (WGS Project: CDEZ01): putative genome source: ATCC 30884 A. royreba (Oak Ridge).
ERRONEOUS IDENTIFICATION: The WGS sequences do not match ATCC 30884 A. royreba Oak Ridge.
CORRECT IDENTIFICATION: very problematic. The sequence reads from the 18S rRNA gene of the WGS project DO NOT MATCH the sequence of any previously described Acanthamoeba isolate (neither from a known and described isolate from a culture center nor from an isolate reported from nature). The isolate ATCC 30884 A. royreba (Oak Ridge) has previously been identified through multiple sequences as a member of Acanthamoeba sequence type T4, subtype T4D. These do not match the information from the genome project. The sequence from the 18S rRNA gene differs from all previously described Acanthamoeba 18S rRNA sequences by more than 10%. Nevertheless, the WGS sequence has the expansion segments within the 18S rRNA gene sequence characteristic of Acanthamoeba. WGS sequences from other genes show a similar large divergence from the genes of known Acanthamoeba isolates. This unknown isolate may represent a sample from one of the ATCC standard isolates for which no previous sequence information has been obtained. Or it may represent some isolate of unknown origin. Whatever its ultimate identification, it appears to represent a new sequence type that is quite distinct from all previously described forms within Acanthamoeba. Identification of its source should be a high priority. (The WGS sequences would thus represent a new sequence type, designated T21).
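For anyone using these projects in scripted analyses, the attributions listed above can be captured as a small lookup table so that the database label is never used unchecked. This is only a convenience sketch: the table restates the entries above, while the class name, function names, and the status labels ("probable", "reassigned", "uncertain", "unknown") are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    labeled_as: str       # name under which the project was released
    inferred_source: str  # source suggested by the analysis listed above
    status: str           # hypothetical label, see lead-in


LIVERPOOL_PROBLEM_PROJECTS = {
    "CDFL01": Attribution("A. castellanii ATCC 50370",
                          "A. castellanii ATCC 50370 (isolate Ma)", "probable"),
    "CDFA01": Attribution("A. healyi ATCC 30866",
                          "A. palestinensis Reich ATCC 30870", "reassigned"),
    "CDFD01": Attribution("A. palestinensis ATCC 30870",
                          "A. triangularis SH621 ATCC 50254", "reassigned"),
    "CDFJ01": Attribution("A. pearcei ATCC 50435",
                          "most likely A. sp. Galka ATCC 50496", "uncertain"),
    "CDFK01": Attribution("A. polyphaga ATCC 30872",
                          "unresolved isolate within subtype T4B", "uncertain"),
    "CDFI01": Attribution("A. divionensis ATCC 50238",
                          "A. astronyxis ATCC 30137", "reassigned"),
    "CDEZ01": Attribution("A. royreba ATCC 30884",
                          "unknown isolate, proposed sequence type T21", "unknown"),
}


def resolve(project: str) -> Attribution:
    """Look up a WGS project prefix, ignoring case and surrounding spaces."""
    return LIVERPOOL_PROBLEM_PROJECTS[project.strip().upper()]


def needs_relabel(project: str) -> bool:
    """True when the database label should not be taken at face value."""
    return resolve(project).status != "probable"


for prefix in LIVERPOOL_PROBLEM_PROJECTS:
    entry = resolve(prefix)
    print(f"{prefix}: {entry.labeled_as} -> {entry.inferred_source} [{entry.status}]")
```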
THE GENOME SEQUENCES OF OTHER MEMBERS OF ACANTHAMOEBA (AUSTRIAN INSTITUTE OF TECHNOLOGY; JANUARY 2017)
Two genome sequences were deposited in the DNA databases as Sequence read archives (SRA).
Acanthamoeba comandoni strain Pb30/40 (ATCC Pra 287): A group I Acanthamoeba (originally designated A. astronyxis Pb30/40). Sequences from the nuclear 18S rRNA gene and the mitochondrial 16S-like rRNA gene (pb30-40 18s rRNA sequence) suggest that this isolate is not closely related to the type isolate for A. comandoni (ATCC 30135), which has been designated as sequence type T9 for the nuclear 18S rRNA gene sequence. Comparison of sequences from the SRA indicates that strain Pb30/40 is roughly equidistant from sequences designated as T17 and T18 (and less than 5% divergent from the sequences of either group). Sequences exist in SRA SRX2460089 and can be accessed through SRA experimental run SRR5141519.
Acanthamoeba lenticulata strain 72/2 (ATCC 50704): A member of sequence type T5, this strain represents a different sub-type of A. lenticulata, compared to A. lenticulata strain PD2S (ATCC 30841), whose genome sequence information was deposited by the University of Liverpool group (above). Strain PD2S is characterized by the presence of an intron in the 18S rRNA gene sequence (Schroeder-Diedrich, Fuerst, and Byers, 1998). The gene in strain 72/2 lacks the intron. Genome sequences for A. lenticulata strain 72/2 exist in SRA SRX2469245 and can be accessed through SRA experimental run SRR5151161.
THE GENOME SEQUENCE OF ACANTHAMOEBA PYRIFORMIS (nov. sp.)
Late in December 2016, a paper appeared that reinterpreted the extent of the Acanthamoebidae (Tice et al., Biology Direct [2016 Dec 28] 11(1):69). This paper included information on a new form of Acanthamoeba. The authors reported that, by sequence analysis, the sporocarpic amoeba “Protostelium” pyriformis is clearly a close relative of the members of the genus Acanthamoeba. This would make the form “Acanthamoeba” pyriformis the first reported member of the genus which individually forms a walled, dormant propagule elevated by a non-cellular stalk.
The paper has been followed by the deposition of transcriptome next-generation sequences in a sequence read archive (SRA) file. The sequence of the 18S rRNA gene has been deposited (accession # KX840327). An equivalent sequence retrieved from the transcriptome sequence read archive was 2220 nucleotides in length and contained regions of the gene equivalent to the expanded hypervariable regions that characterize Acanthamoeba 18S rRNA genes. Initial comparisons of this sequence with the almost complete 18S rRNA genes of other Acanthamoeba taxa found none of the other taxa within Acanthamoeba showing sequence similarity to Acanthamoeba pyriformis sp. nov. greater than ~86%. The sequence appears to be more divergent from the Group I acanthamoebae (A. astronyxis, etc.) than from other taxa, suggesting it may have diverged from within Acanthamoeba Groups II or III. No other partial or almost complete sequences have been reported that show close correspondence with this type sequence from A. pyriformis sp. nov. More extensive comparisons will be forthcoming. (The SRA transcriptome sequences together with the 18S rRNA gene sequence indicate that this taxon would represent a new sequence type, designated T22).
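Similarity and divergence figures like those quoted for 18S rRNA genes are derived from pairwise comparison of aligned sequences. A minimal sketch of such a comparison follows. It assumes the sequences have already been aligned to equal length by an external tool (the alignment step is not shown), uses the simple uncorrected proportion of differing sites rather than any particular published method, and runs on short made-up sequences; all names are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two pre-aligned sequences.

    Columns with a gap ('-') or an ambiguous base ('N') in either sequence
    are skipped, so only unambiguous aligned positions are compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def closest_reference(query: str, references: dict) -> tuple:
    """Return (name, percent_similarity) of the most similar reference."""
    name, dist = min(
        ((n, p_distance(query, s)) for n, s in references.items()),
        key=lambda item: item[1],
    )
    return name, 100.0 * (1.0 - dist)


# Toy aligned sequences (not real 18S rRNA data).
query = "ACGTACGTACGTACGTACGT"
references = {
    "ref_1": "ACGTACGTACGTACGTACGA",  # 1 difference in 20 sites
    "ref_2": "ACGAACGTTCGTACGAACGT",  # 3 differences in 20 sites
}
name, similarity = closest_reference(query, references)
print(f"closest: {name} at {similarity:.1f}% similarity")
```

When even the closest reference falls well below the similarity seen within established sequence types, as reported above, the query is a candidate for a new sequence type.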
GENOME OR TRANSCRIPTOME SEQUENCES SINCE 2016:
In 2017, researchers from National Yang-Ming University, Taipei, Taiwan deposited sequences for an isolate of A. lenticulata, designated PT-14. The culture was isolated from a freshwater well after a typhoon-induced flooding event. Attempts to retrieve a complete mitochondrial genome have not been successful. Some portions of the sequence seem not to be represented in the material deposited in the DNA databases. The genome is designated NAVB (2017).
The year 2020 provided material from the genomes of three isolates, each of which was classified within sequence sub-type T4A.
The first isolate deposited was A. sp. KDN1, a sample obtained from the Kodanuki marsh in Fujinomiya, Shizuoka, Japan, by researchers from the National Institute of Genetics in Mishima, Japan. The data deposited consist of a transcriptome shotgun assembly (TSA), with genome identity IACY. The TSA appears to be quite robust, and a complete, closed mitochondrial genome sequence has been retrieved from the files.
The second genome sequence from 2020 was of the isolate designated A. castellanii strain Namur, obtained from a coprolite sample by researchers at Aix-Marseille University. The whole genome sequence assembly is designated CAIJLO, and a complete mt-DNA genome was retrieved from the WGS files.
The final genome sequence from 2020 was obtained from a transcriptome shotgun assembly (TSA) of a sample obtained from a case of corneal keratitis, designated Cornea-Case12 by researchers at Johns Hopkins University. The sequence read archive (SRA) for the sample is designated SRR12486987. A complete, closed mt-DNA genome has been extracted from the SRA file.
In 2021, three genome sequences were released, only one of which represented a new isolate whose genome was not previously available.
In 2021, the researchers at Aix-Marseille University released a second genome sequence, one that represented A. triangularis strain SH621 (ATCC 50254). The whole genome assembly is designated CACVKS. Note that this represents the second sequencing of the genome of this strain, since it was previously sequenced by the University of Liverpool group in 2015, although the data (WGS project CDFD) were erroneously released as representing A. palestinensis. A whole mt-genome was obtained from the genome assembly, which can be compared to the sequence retrieved from WGS project CDFD. The two sequences show differences at 15 out of 42296 sites (12 nucleotide differences and 3 in/del differences). In addition, an 85 base segment which includes a single t-RNA gene is missing from the CACVKS sequence, seemingly due to lack of overlap of the deposited genomic segments that flank this small segment. This segment is present in the mt-genome sequences of all other T4 isolates for which a mt-genome sequence has been obtained.
At the end of 2021, genome sequences were released by researchers from Dalhousie University and the Institut Pasteur for two strains, A. castellanii strain C3 ATCC 50739 (genome project JAJGAO) and A. castellanii strain Neff ATCC 30010 (genome project JAJGAP). Details have been deposited in BioRxiv. The complete mt-genome has been retrieved from the data for A. castellanii C3. Since the project for A. castellanii Neff represents the fourth time that the genome of this strain has been obtained, no effort has yet been made to retrieve the mt-genome information.
The latter two genomes (A. castellanii C3, JAJGAO, and A. castellanii Neff, JAJGAP) together provide outstanding insight into some aspects of the Acanthamoeba genome. Careful analysis has allowed the material from these genomes to be placed with high confidence into chromosomal units. Each strain showed 35 scaffolds, suggesting a karyotype of 35 chromosomes in the genome. Further, the 18S-5.8S-28S rRNA gene segments in each strain occurred on only a single scaffold. Even further, the data suggest that there are only two copies of this repeat unit in each genome. Finally, the two copies of the 18S rRNA gene from A. castellanii C3 are not identical to one another. They represent two different 18S rRNA sequences that have been observed in a number of other studies. The observation of these two different sequences appearing in tandem within a single isolate provides better understanding of the apparent existence of multiple sequences of the 18S rRNA gene that sometimes have been inferred from sequencing of a single Acanthamoeba isolate.
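To put the mt-genome comparison for strain SH621 in proportion, the counts quoted above can be converted to a difference rate. The short calculation below uses only the figures given in the text (15 differing sites out of 42296, comprising 12 nucleotide differences and 3 in/del differences); it treats each in/del as a single difference and leaves the missing 85 base segment out of the count, both of which are assumptions made for this illustration.

```python
SITES_COMPARED = 42296   # sites compared between the two mt-genome sequences
SUBSTITUTIONS = 12       # nucleotide differences
INDELS = 3               # in/del differences, each counted once (assumption)

total_differences = SUBSTITUTIONS + INDELS
difference_percent = 100 * total_differences / SITES_COMPARED
identity_percent = 100 - difference_percent
sites_per_difference = SITES_COMPARED / total_differences

print(f"differences: {total_differences}")
print(f"difference rate: {difference_percent:.4f}%")
print(f"identity: {identity_percent:.4f}%")
print(f"about one difference per {sites_per_difference:.0f} sites")
```

The two independent sequencings thus differ at roughly 0.035% of sites, or about one difference in every 2,800 sites.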