Informatics

Viral metagenomes (viromes) are dominated (∼60-90%) by sequences without significant similarity to known organisms in genomic databases, reducing the power of these analyses to investigate viral ecology and diversity. New, creative methods have been developed to combat this ‘unknowns’ problem.


Community Available Research Tools

 

iVirus  (an unfunded collaboration with Bonnie Hurwitz, Assistant Professor, University of Arizona)

To meet the growing needs of the broader community of viral researchers, tools and important datasets need to coexist in a common cyberinfrastructure. iVirus is currently being developed to support this need. Leveraging the preexisting cyberinfrastructure of the CyVerse Collaborative, iVirus lays the foundation for ongoing development of shared resources for viral datasets, metadata, and tools by the viral community. It will allow for comparative metagenomic analyses across diverse environments to identify new genes and functions. Moreover, new tools will be captured in the cyberinfrastructure, where they can be used immediately and continually developed and adapted by the community to keep pace with rapid tool innovation.

 

VERVE Net  (a Moore Foundation funded project led by Bonnie Hurwitz)

A viral ecology community forum to increase connectivity and knowledge dissemination in viral ecology research at all levels. VERVE Net will leverage and refine existing software from ZappyLab to enhance a researcher’s ability to: discuss and share protocols (via protocols.io), connect with fellow community members (VERVE Net community forum), and learn about new and innovative research in the field (via PubChase).

 

iMicrobe  (a Moore Foundation funded project led by Bonnie Hurwitz)

An extension of the CyVerse Collaborative that aims to enhance the use of shared microbial datasets from diverse environments and to promote large-scale studies of microbial ecology, providing the tools, data storage, and computational resources needed to understand Earth systems.

 


Scripts

 

Sullivan Lab Code Repository
A Bitbucket repository containing scripts used by the Sullivan Lab.

Bioinformatics Wiki
Public portion of a lab wiki containing scripts released from the Sullivan Lab.

Hurwitz Scripts

GitHub repository containing scripts used in Bonnie Hurwitz’s k-mer analysis.

Roux Scripts
GitHub repository containing scripts used in Simon Roux’s analysis.

 


Protein Clusters

Figure: Use of protein clusters (PCs) to identify the ‘core’ and ‘flexible’ viral metagenomes in the 32-metagenome Pacific Ocean Virome dataset (A) and assess niche-differentiation in photic and aphotic marine viruses (B).

Purpose: Organize unknown sequence space in viromes into high-confidence protein clusters to increase identification of metagenomic sequences and facilitate comparison of viral communities.

Brief Description: Open reading frames (ORFs) in viromes are clustered with known GOS protein clusters or self-clustered based on sequence similarity.
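The clustering step described above can be illustrated with a minimal Python sketch. It is not the pipeline used for the POV protein clusters: the function names, the `novel_` cluster labels, the 0.6 threshold, and the use of `difflib.SequenceMatcher` as a stand-in for alignment-based sequence identity are all assumptions made for illustration. In practice, dedicated clustering software would be used. The sketch only shows the logic: each ORF first tries to join an existing (known) cluster and otherwise seeds or joins a self-cluster.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude stand-in for alignment identity (assumption, not a real aligner)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_orfs(orfs, known_reps, threshold=0.6):
    """Greedy clustering sketch.

    orfs:       dict of ORF id -> protein sequence
    known_reps: dict of known cluster id -> representative sequence
    Returns a dict of cluster id -> list of ORF ids.
    """
    reps = dict(known_reps)                      # representatives seen so far
    clusters = {cid: [] for cid in known_reps}   # known clusters start empty
    novel_count = 0
    # Process longest ORFs first so they become representatives of new clusters.
    for orf_id, seq in sorted(orfs.items(), key=lambda kv: -len(kv[1])):
        best_id, best_score = None, threshold
        for cid, rep in reps.items():
            score = similarity(seq, rep)
            if score >= best_score:
                best_id, best_score = cid, score
        if best_id is None:
            # No representative is similar enough: start a self-cluster.
            novel_count += 1
            best_id = f"novel_{novel_count}"
            reps[best_id] = seq
            clusters[best_id] = []
        clusters[best_id].append(orf_id)
    return clusters


if __name__ == "__main__":
    known = {"known_1": "MKTAYIAKQRQISFVKSHFSRQ"}
    toy_orfs = {
        "orf1": "MKTAYIAKQRQISFVKSHFSRQ",
        "orf2": "GGGGGWWWWWPPPPPCCCCC",
        "orf3": "GGGGGWWWWWPPPPPCCCCA",
    }
    print(cluster_orfs(toy_orfs, known))
```

In this toy input, `orf1` matches the known representative, while `orf2` and `orf3` resemble each other but nothing known, so they end up together in a new self-cluster.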

Reference:

  • Hurwitz, B.L., & Sullivan, M.B. (2013). The Pacific Ocean Virome (POV): A marine viral metagenomic dataset and associated protein clusters for quantitative viral ecology. PLoS One. 8(2), e57355. doi:10.1371/journal.pone.0057355.

POV Protein Clusters
Download of the Pacific Ocean Virome (POV) protein clusters available at iMicrobe.
iPlant Discovery Environment/Community Data/iMicrobe/POV.


Shared k-mer and Network Analyses

Figure: Investigating relationships between viral communities using (A) social network analysis of metagenomes in the Pacific Ocean Virome dataset, (B) Euler diagram depicting the portion of sequences shared by the eight clusters shown in panel A, and (C) significant variables influencing viral community structure as determined by a regression model for the network shown in panel A.

Purpose: (i) Directly compare viromes without the need for identification or assembly of sequences, determining the percent of shared sequences between samples, and identifying unique sequences in different samples; and (ii) visualize relationships between communities as well as evaluate the environmental variables that drive these relationships.

Brief Description: Metagenomic sequences are compared in a pairwise fashion based on the frequency of shared 20-mers using bioinformatic scripts. These data are then used to visualize and evaluate the relationships between, and environmental drivers of, viral communities using social network analyses.
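As a rough illustration of the shared k-mer idea, the Python sketch below estimates what fraction of reads in one sample share at least one k-mer with another sample. It is a simplified sketch, not the published scripts: the function names and the `min_hits` parameter are hypothetical, and an in-memory index like this would not scale to full viromes.

```python
from collections import Counter


def kmer_counts(reads, k=20):
    """Count every k-mer across a list of reads (DNA strings)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def fraction_shared(reads_a, reads_b, k=20, min_hits=1):
    """Fraction of reads in sample A with at least `min_hits` k-mers
    that also occur somewhere in sample B (min_hits is a hypothetical knob)."""
    if not reads_a:
        return 0.0
    index_b = kmer_counts(reads_b, k)
    shared = 0
    for read in reads_a:
        read = read.upper()
        hits = sum(
            1 for i in range(len(read) - k + 1) if read[i:i + k] in index_b
        )
        if hits >= min_hits:
            shared += 1
    return shared / len(reads_a)


if __name__ == "__main__":
    # Toy example with k=4 so the sequences stay readable.
    sample_a = ["ACGTACGT", "TTTTTTTT"]
    sample_b = ["GGACGTGG"]
    print(fraction_shared(sample_a, sample_b, k=4))
```

Computing this value for every pair of samples gives a pairwise matrix, which is the kind of input a network analysis of the communities could start from.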

Reference:

  • Hurwitz, B.L., Westveld, A.H., Brum, J.R., & Sullivan, M.B. (2014). Modeling ecological drivers in marine viral communities using comparative metagenomics and network analyses. PNAS. doi:10.1073/pnas.1319778111.

The code is available on GitHub.