Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets
As large genomics and phenotypic datasets become more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze them.
Bionimbus is an open source cloud-computing platform based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, a portal and associated middleware that provide a single entry point and single sign-on for the various Bionimbus resources, and Yates, which automates the installation, configuration, and maintenance of the required software infrastructure.
Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h, and the resulting BAM files ranged from 5 GB to 10 GB per sample.
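The per-sample figures above can be turned into a rough capacity estimate. The following is a minimal sketch, not part of Bionimbus: the function name `estimate_alignment_resources` and the cohort size of 100 are hypothetical, and the estimate covers only the alignment step and its BAM output, using the eight CPUs, roughly 12 h, and 5 GB to 10 GB quoted in the text.

```python
# Per-sample figures quoted in the text for the alignment step.
CPUS_PER_ALIGNMENT = 8      # eight CPUs per sample
HOURS_PER_ALIGNMENT = 12    # "about 12 h" per sample (approximate)
BAM_GB_RANGE = (5, 10)      # BAM file size range per sample, in GB


def estimate_alignment_resources(n_samples):
    """Return rough CPU-hour and BAM storage totals for n_samples.

    Hypothetical helper for illustration only; it ignores quality
    control, variant calling, annotation, and any intermediate files.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return {
        "cpu_hours": n_samples * CPUS_PER_ALIGNMENT * HOURS_PER_ALIGNMENT,
        "bam_gb_min": n_samples * BAM_GB_RANGE[0],
        "bam_gb_max": n_samples * BAM_GB_RANGE[1],
    }


if __name__ == "__main__":
    # The cohort size of 100 samples is an assumption, not a figure from the text.
    print(estimate_alignment_resources(100))
```

Under these assumptions, a single sample costs about 96 CPU-hours for alignment, so compute demand grows much faster than BAM storage as the cohort grows.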
Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computing resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one option for broadening access to research data in genomics.
Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.