Guest post by Adam Talbot and Elspeth Ransom, who work at the University of Warwick with GARNet Advisory Board member Dr Katherine Denby.
Big data is an ever-increasing part of scientific research. With the expanding use of high-throughput genomics, biologists of the future will need not only laboratory skills but also expertise in advanced computing. The sheer volume of data is increasing at an alarming or encouraging rate, depending on your point of view, which makes it more difficult for biologists to manage, process, manipulate and analyse their datasets.
The iPlant Collaborative is a US-based, NSF-funded initiative that offers an easy-to-use, standardised and high-performance platform for biologists across all areas to perform complex analyses, from initial data evaluation to final visualisation. It provides a computing cluster available to all academic biologists and a platform for safely storing and sharing data. The iPlant interface is easy to navigate and presents a large selection of software programs in a straightforward way, so that they are accessible to those with little prior knowledge of big data analysis while still offering multiple running options for those with more experience.
In July 2015 we attended the iPlant Tools and Services Workshop at The Genome Analysis Centre, Norwich. It provided a great grounding in how to use the iPlant platform and the tools it offers, from simply storing and exchanging data to RNA-Seq and GWAS analysis. The course was well structured and well presented, and suited both iPlant novices and those who wished to learn how to use iPlant to analyse data in different ways.
The course began with the basic functions of iPlant. We were shown how to import data into the iPlant Discovery Environment and manage it there, and how to create public links and shared folders so that data can be shared with colleagues, collaborators and other iPlant users. This included how to use iDrop, which allows users to import data directly from their hard drives, and iCommands, which allows interaction with the iPlant Data Store. The session also covered how to annotate data properly within iPlant using metadata, and how to retrieve data quickly and easily using these metadata fields.
The workshop then moved on to navigating the Discovery Environment – one of the main portals for exploring iPlant. The Discovery Environment allows you to view your stored files and run analyses on the cluster through a web browser. Here we were given a taste of the uses of iPlant by performing a large sequence alignment using MUSCLE and analysing RNA-Seq data using the Tuxedo pipeline (TopHat, Cufflinks and CummeRbund).
The final part of the course focussed on Atmosphere cloud computing. This interface lets a user start and save a virtual computer running on the iPlant high-performance computers, which can then be used for many tasks. You can outsource some computing power to these machines, install software you can’t or don’t want to run on your own machine, and use the interface to set up an external server to view from your computer. We used Atmosphere to visualise our RNA-Seq results within a web browser using the software JBrowse.
This workshop gave a broad introduction to computational analysis using iPlant, as well as introducing us to the concept of traceable, reproducible computing. The iPlant platform was easy to use and fully supports the research analysis requirements of today’s life scientists. A must-use resource for all biologists!
The BBSRC has just funded an iPlant UK node, so look out on all the usual forums for announcements of future iPlant tutorial workshops run by iPlant-UK in collaboration with GARNet.