V. vermiformis sequences in the DNA databases

(updated July 2021)

As of July 2021, there were 931 sequences of the 18S rRNA gene with a putative assignment to V. vermiformis that had been deposited in the DNA databases or were available from researchers for comparison. After the first sequences were deposited in 1993/1994, no additional sequences were deposited until 2001. The pattern of yearly deposits of 18S rRNA sequences into the DNA databases is shown in the first figure below.

After 2003, sequences were deposited on a regular basis. Spikes in the deposition of sequences in 2007, 2010 and 2011-2012 correspond to environmental studies of uncultured eukaryotes, which identified large numbers of V. vermiformis DNA sequences in their samples.


Sequences of the 18S rRNA gene from V. vermiformis in the DNA databases vary greatly in length. Proportionately far fewer “almost complete” sequences are available from putative V. vermiformis isolates than are found in studies of Acanthamoeba. This may be primarily because most isolates are very similar in sequence (usually less than 1% sequence divergence between almost-complete sequences). Nevertheless, as will be documented on another page, the almost-complete sequences provide considerable information with which to better understand population and geographic variation within V. vermiformis.

The distribution of the lengths of the 18S rRNA gene sequences from V. vermiformis in the DNA databases is shown in the following graph.

The sequences range from 123 to 1852 bp in length. The greatest number of sequences (508) fall in the bin for lengths between 501 and 600 bp, which far exceeds every other bin. In part this is due to the choice of a uniform set of PCR primers that produced an optimal fragment size for sequencing in the era before next-generation sequencing. Another factor is that many of the surveys of uncultured eukaryotes in environmental microbiomes produced similarly sized products.
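The binning behind a length distribution of this kind can be sketched as follows. This is a minimal illustration, assuming 100-bp bins whose boundaries match the 501-600 bp bin mentioned above; the function names and the example lengths are hypothetical and are not the actual data set or the method used to produce the graph.

```python
from collections import Counter


def length_bin(length, width=100):
    """Return the (low, high) bin that contains `length`.

    Bins are assumed to run 1-100, 101-200, ..., so a 550 bp
    sequence falls in the 501-600 bin.
    """
    if length < 1:
        raise ValueError("sequence length must be positive")
    low = ((length - 1) // width) * width + 1
    return (low, low + width - 1)


def bin_lengths(lengths, width=100):
    """Count how many sequence lengths fall into each bin, ordered by bin."""
    counts = Counter(length_bin(n, width) for n in lengths)
    return dict(sorted(counts.items()))


# Hypothetical lengths for illustration only; not the real database records.
example_lengths = [123, 501, 550, 600, 601, 1852]

for (low, high), count in bin_lengths(example_lengths).items():
    print(f"{low}-{high} bp: {count}")
```

Subtracting 1 before the integer division places a sequence of exactly 600 bp in the 501-600 bin and not in the next one, which is the convention implied by the bin boundaries quoted in the text.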

The distribution of sizes in the figure above is substantially different from that seen in Acanthamoeba, having fewer sequences (both in numbers and percentages) in the “almost complete” group, as well as fewer in the <500 bp classes. This latter observation derives from the fact that one of the optimal size groups for Acanthamoeba (300-450 bp) represents the set of fragments that include the most informative portion of the Acanthamoeba 18S rRNA gene.

There were 37 sequences whose length (excluding introns) exceeded 1500 bp. (Note that there are also five additional sequences assigned to the genus Hartmannella, the genus that originally contained what is now Vermamoeba, but these are not included in this analysis of V. vermiformis.) The sequences exceeding 1500 bp are the “almost complete” sequences; they represent a core group that can be used to help categorize the shorter sequences into possible subgroups. Information concerning the sequence variation in this select group is presented on an accompanying page.

