Genome sequences of scrub typhus isolates

 

THE GENOME SEQUENCES OF ORIENTIA ISOLATES

 

In general, the genomes of O. tsutsugamushi isolates are larger than those of members of the genus Rickettsia.  Where complete genomes have been determined, they are about 2 Mb in size.  The difference between a Rickettsia genome and an Orientia genome reflects an increase in gene number in Orientia combined with a significant contribution from repeated sequences (some protein-coding, some not), which are far more numerous in Orientia genomes than in Rickettsia genomes.  These repeated sequences greatly complicate the work required to complete a closed circular genome sequence.  As a consequence, many of the Orientia genome sequences in the DNA databases should be considered partial, although they may contain all of the unique coding portions of an isolate's genome.

As of July 2017, the genome sequences of eleven members of the genus Orientia had been deposited in the DNA databases (ten isolates of O. tsutsugamushi and one isolate of O. chuto).  In late 2017, data for 27 additional isolates were released, initially as Sequence Read Archive (SRA) files in GenBank.

 

The Boryong Isolate

The first genome sequence of any Orientia isolate to be deposited in the DNA databases was that of the Boryong isolate [NCBI Reference Sequence: NC_009488, deposited in 2007; reference: Cho,N.H., et al., The Orientia tsutsugamushi genome reveals massive proliferation of conjugative type IV secretion system and host-cell interaction genes. Proc. Natl. Acad. Sci. U.S.A. 104 (19), 7981-7986 (2007)].  The genome sequence was found to encompass 2,127,051 bp and to contain a large number of repeated sequences.

Cho,N.H., Kim,H.R., Lee,J.H., Kim,S.Y., Kim,J., Cha,S., Kim,S.Y., Darby,A.C., Fuxelius,H.H., Yin,J., Kim,J.H., Kim,J., Lee,S.J., Koh,Y.S., Jang,W.J., Park,K.H., Andersson,S.G., Choi,M.S. and Kim,I.S.  2007.  The Orientia tsutsugamushi genome reveals massive proliferation of conjugative type IV secretion system and host-cell interaction genes.  Proc. Natl. Acad. Sci. U.S.A. 104 (19), 7981-7986.

Orientia tsutsugamushi Boryong (NC_009488)  –  2,127,051 bp

The large number of repeated sequences is significant because these sequences hamper the bioinformatic processes that could otherwise quickly establish gene order in subsequent genome assemblies.  It is thus difficult to determine whether gene order is broadly similar in different isolates.
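
Why repeats cause this difficulty can be seen in a toy example.  The Python sketch below is only an illustration: the sequence, the repeat, and the function name repeated_kmers are invented here and are not taken from any Orientia genome or assembly program.  It counts substrings of length k (k-mers) and reports those found more than once.  A read no longer than such a substring cannot be assigned to a single location, so the order of the regions on either side of it stays ambiguous.

```python
from collections import Counter


def repeated_kmers(seq, k):
    """Return the k-mers that occur more than once in seq, with their counts."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


# Hypothetical 8-base repeat placed between three unique 5-base segments.
REPEAT = "ACGTTGCA"
TOY_GENOME = "GGATT" + REPEAT + "TTAGC" + REPEAT + "CCTGA"

if __name__ == "__main__":
    # Any k-mer listed here matches more than one position in the toy genome.
    for kmer, n in repeated_kmers(TOY_GENOME, 8).items():
        print(kmer, "occurs", n, "times")
```

In the toy sequence the two copies of the repeat are followed by different bases, so only a read that spans the repeat and some flanking sequence can place each copy.  Long-read sequencing addresses the same problem at genome scale.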

 

The Ikeda strain

Following the determination of the sequence of the Boryong isolate, the genome sequence of the related Ikeda strain was determined in 2008.

Nakayama,K., Yamashita,A., Kurokawa,K., Morimoto,T., Ogawa,M., Fukuhara,M., Urakami,H., Ohnishi,M., Uchiyama,I., Ogura,Y., Ooka,T., Oshima,K., Tamura,A., Hattori,M. and Hayashi,T.  2008.  The Whole-genome sequencing of the obligate intracellular bacterium Orientia tsutsugamushi revealed massive gene amplification during reductive genome evolution.  DNA Res. 15 (4), 185-199.

Orientia tsutsugamushi str. Ikeda (NC_010793)      –  2,008,987 bp

 

More recently, genome sequences of a number of other isolates have been deposited in the DNA databases.  These include the following “standard” strains:

 

The Karp strain

Two groups have independently sequenced the genome of the Karp strain.  These sequences are:

  1. Liao,H.M., Chao,C.C., Lei,H., Li,B., Tsai,S., Hung,G.C., Ching,W.M. and Lo,S.C.  2016.  Genomic Sequencing of Orientia tsutsugamushi Strain Karp, an Assembly Comparable to the Genome Size of the Strain Ikeda.  Genome Announc. 4 (4), e00702-16.

Orientia tsutsugamushi str. Karp      (LYMA02000000)  –  2,026,724 bp

The second analysis of the Karp genome reported the sequence after removing the portions that consist of highly repeated sequences.  The sequence appeared as:

  2. Daugherty,S.C., Su,Q., Abolude,K., Beier-Sexton,M., Carlyon,J.A., Carter,R., Day,N.P., Dumler,S.J., Dyachenko,V., Godinez,A., Kurtti,T.J., Lichay,M., Mullins,K.E., Ott,S., Pappas-Brown,V., Paris,D.H., Patel,P., Richards,A.L., Sadzewicz,L., Sears,K., Seidman,D., Sengamalay,N., Stenos,J., Tallon,L.J., Vincent,G., Fraser,C.M., Munderloh,U. and Dunning-Hotopp,J.C.  2015.  Genome Sequencing of Rickettsiales.  Unpublished.

Orientia tsutsugamushi str. Karp     (LANM01000000)  –  1,454,354 bp

 

Other standard strains

In addition to Karp, the Daugherty et al. group also reported the sequences of two other “standard” strains, Kato and Gilliam.  The data are given in:

Orientia tsutsugamushi str. Kato PP     (LANN00000000)  –  1,478,442 bp

Orientia tsutsugamushi str. Gilliam     (LANO00000000)  –  1,997,698 bp

 

Further genome sequences for isolates of Orientia tsutsugamushi have been reported by the two latter groups.  These include the following sequences: 

Liao,H.M., et al.  2017.  Genomics Data 12: 84–88.

Orientia tsutsugamushi strain AFSC7 (LYMB00000000)  –  1,437,566 bp

Orientia tsutsugamushi strain AFSC4 (LYMT00000000)  –  1,295,323 bp

 

Daugherty,S.C.,  et al.   2015.  Unpublished.

Orientia tsutsugamushi str. TA716     (LAOA01)  –  2,221,260 bp

Orientia tsutsugamushi str. UT76    (LANZ01)   –  3,033,399 bp

Orientia tsutsugamushi str. UT144    (LAOR01)   –  1,689,193 bp

Orientia tsutsugamushi str. TA763    (LANY01)  –  2,460,104 bp

Orientia tsutsugamushi str. Sido    (LAOM01)  –  712,858 bp

 

Finally, the Daugherty group has also determined at least a partial sequence of the closely related species Candidatus O. chuto:

Orientia chuto str. Fuller         (LANP01)   –  1,092,196 bp
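
The sizes listed above can be put on a common scale by dividing each by the size of the closed Boryong genome.  The Python sketch below does only that arithmetic, using the figures quoted in this section.  The names ASSEMBLY_SIZES and relative_sizes, and the choice of Boryong as the yardstick, are assumptions made for illustration.  The ratio is not a measure of completeness, since isolates may genuinely differ in genome size and a draft assembly may collapse or duplicate repeated regions.

```python
# Assembly sizes in bp, copied from the listings in this section.
ASSEMBLY_SIZES = {
    "Boryong (NC_009488)": 2_127_051,
    "Ikeda (NC_010793)": 2_008_987,
    "Karp (LYMA02000000)": 2_026_724,
    "Karp (LANM01000000)": 1_454_354,
    "Kato PP (LANN00000000)": 1_478_442,
    "Gilliam (LANO00000000)": 1_997_698,
    "AFSC7 (LYMB00000000)": 1_437_566,
    "AFSC4 (LYMT00000000)": 1_295_323,
    "TA716 (LAOA01)": 2_221_260,
    "UT76 (LANZ01)": 3_033_399,
    "UT144 (LAOR01)": 1_689_193,
    "TA763 (LANY01)": 2_460_104,
    "Sido (LAOM01)": 712_858,
    "O. chuto Fuller (LANP01)": 1_092_196,
}

# Yardstick chosen for illustration: the closed Boryong genome.
BORYONG_BP = ASSEMBLY_SIZES["Boryong (NC_009488)"]


def relative_sizes(sizes, reference_bp):
    """Return (name, size in bp, size / reference) tuples, smallest first."""
    rows = [(name, bp, bp / reference_bp) for name, bp in sizes.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, bp, ratio in relative_sizes(ASSEMBLY_SIZES, BORYONG_BP):
        print(f"{name:<28} {bp:>10,} bp  {ratio:7.1%} of Boryong")
```

Sorted this way, the Sido assembly is the smallest entry and the UT76 assembly the largest, which makes it easy to see which deposits fall well below or above the roughly 2 Mb of the closed genomes.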
    

 

During 2017, a set of genome sequences from 32 isolates (including 5 isolates for which sequences were previously available) was added to the databases.  Currently these data are accessible as individual Sequence Read Archive (SRA) runs.  They include the following:

Isolates for which previous sequences were reported:
SRR3503732    ORTS0002 Karp replicate2
SRR3503829    ORTS0069 Gilliam
SRR3503839    ORTS0070 TA716
SRR3503840    ORTS0071 TA763
SRR3503893    Orts0093 Kato
SRR3503897    Ot0001 Karp replicate1

Newly sequenced isolates:
SRR3503734    ORTS0005 TM 2259 (Laos: Vientiane Prefecture)
SRR3503738    ORTS0007 TM 2325 (Laos: Vientiane Prefecture)
SRR3503739    ORTS00020 TM 2978 (Laos: Vientiane Prefecture)
SRR3503740    ORTS0049 isolate 772 (Laos: Salavan Prefecture)
SRR3503824    ORTS0055 isolate 1768 (Laos: Luang Nam Tha Prefecture)
SRR3503847    ORTS0072 Domrow
SRR3503849    ORTS0073 AFC-27
SRR3503851    ORTS0074 AFC-30
SRR3503852    ORTS0075 Garton
SRR3503853    ORTS0076 TH-1811
SRR3503856    Orts0077 TH-1812
SRR3503857    Orts0078 TH-1814
SRR3503859    Orts0079 TH-1817
SRR3503882    Orts0080 TH-1826
SRR3503883    Orts0081 isolate 18-032113
SRR3503884    Orts0082 isolate 18-032460
SRR3503885    Orts0083 isolate 18-032604
SRR3503886    Orts0084 isolate 18-030643
SRR3503887    Orts0086 afc3
SRR3503888    Orts0087 afpl-12
SRR3503889    Orts0088 afsc-7
SRR3503890    Orts0089 brown
SRR3503891    Orts0090 bse125
SRR3503892    Orts0092 citrano
SRR3503894    Orts0094 kostival
SRR3503895    Orts0095 mak119
SRR3503896    Orts0096 mak243
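
For scripted work with the run listing above, each line can be split into its three fields.  The Python sketch below assumes the layout seen in the listing (run accession, sample label, then a free-text isolate name, separated by whitespace).  The names RunRecord and parse_run_line are hypothetical, and no database is contacted.

```python
from typing import NamedTuple


class RunRecord(NamedTuple):
    run: str      # SRA run accession, e.g. "SRR3503732"
    sample: str   # sample label exactly as written in the listing
    isolate: str  # remaining free text: isolate name and any locality


def parse_run_line(line):
    """Split one listing line into run accession, sample label and isolate."""
    fields = line.split(None, 2)  # split on runs of whitespace, at most twice
    if len(fields) != 3:
        raise ValueError(f"unexpected listing line: {line!r}")
    run, sample, isolate = fields
    return RunRecord(run, sample, isolate.strip())


# Three lines copied from the listing, including one with a locality note.
LISTING = """\
SRR3503732    ORTS0002 Karp replicate2
SRR3503734    ORTS0005 TM 2259 (Laos: Vientiane Prefecture)
SRR3503888   Orts0087 afpl-12
"""

RECORDS = [parse_run_line(line) for line in LISTING.splitlines() if line.strip()]

if __name__ == "__main__":
    for record in RECORDS:
        print(record.run, "|", record.sample, "|", record.isolate)
```

Limiting the split to two separators keeps isolate names that themselves contain spaces, such as "TM 2259 (Laos: Vientiane Prefecture)", in a single field.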

 

In March 2017, the sequencing center of the Wellcome Centre for Human Genetics, Oxford, began releasing genome sequences for eight isolates.  These sequences are based on long-read technology, which it was hoped would mitigate the assembly problems caused by the large proportion of the Orientia genome that is made up of repetitive sequences.  The sequences to be released include five isolates whose genome sequences were previously determined.  Those sequences are:

Gilliam – BioSample: SAMEA104570318;  SRA: ERS2181602
UT76 – BioSample: SAMEA104570325;  SRA: ERS2181609
Karp – BioSample: SAMEA104570320;  SRA: ERS2181604
Kato – BioSample: SAMEA104570321;  SRA: ERS2181605
TA763 – BioSample: SAMEA104570323;  SRA: ERS2181607

New isolates included:

FPW1038 – BioSample: SAMEA104570319;  SRA: ERS2181603
TA686 – BioSample: SAMEA104570322;  SRA: ERS2181606
UT176 – BioSample: SAMEA104570324;  SRA: ERS2181608

Further information on these sequences will be added when full sequences or contig sequences are released.