January 15, 2015

I spent Sunday morning wandering in and out of the ‘Tools and Resources from EBI’ session here at PAG. Some EBI resources for plant science will be very familiar to some of our community, but the presenters gave accessible talks that included some news and advice, so I thought I’d round them up for you.

Maria Keays presented ArrayExpress and ExpressionAtlas. These are the functional genomics tools from EBI. Keays defined functional genomics as the study of gene expression, gene function and gene regulation – these tools certainly aren’t just for microarray data!

Users submit their data to ArrayExpress via the Annotare submission tool, which encourages inclusion of information covering everything from how the samples were grown through to data generation. Keays acknowledged that a user may encounter an error message they can’t get around, and assured us that emails sent to the helpdesk (Arrayexpress@ebi.ac.uk) are responded to quickly. Once submitted, the dataset and associated metadata are checked by a human curator before the user can upload them. The data can stay private until publication because two logins are provided: one for the submitter and one for the reviewer of the paper they hope to publish.

We’ve been encouraging our community to share data on NCBI GEO because it is able to disseminate almost any data type. But for functional genomics data, ArrayExpress is just as acceptable to journals as GEO, and the Annotare submission tool requires more extensive metadata and more stringent standards than GEO.

Another advantage of ArrayExpress is ExpressionAtlas, the searchable database of data from ArrayExpress that has been re-analysed by EBI’s in-house team. This analysis asks two questions: has expression changed between two groups, and what is the baseline expression level under normal conditions in different organs?

Claire O’Donovan presented the UniProt Knowledgebase, the protein database. It is the result of the 2002 merger between Swiss-Prot, TrEMBL and PIR-PSD and is half-funded by the NIH. It draws sequences from literature, INSDC, PDB, Ensembl and RefSeq, as well as accepting submissions directly from users. Submission to UniProt is accepted by journals that request public data dissemination as part of publication. Human curators check and add references, annotations from literature, nomenclature and sequence features for the protein, not for the gene.

O’Donovan told the audience that UniProt is focussing on reference proteomes and a lot of non-reference sequences will be deleted from the database soon for practical reasons. A reference proteome is the proteome of a representative, well-studied model organism for biomedical research. There are nearly 2400 reference proteomes currently in the database but they are keen for recommendations for more, so get in touch if you want to ensure your favourite sequences aren’t lost from the database.

Sandra Orchard spoke about a relatively new addition to EBI web services, MetaboLights. This resource has a mission to “build a comprehensive, curated resource of metabolomes and selected related information from all species.” This is certainly an ambitious target (to say the least!), so good luck to the MetaboLights team! For now it has a growing reference library including chemical structures, NMR spectra and MS spectra, so take a look if you’re doing those analyses for the first time.

EBI offers training online, at their Cambridgeshire campus, or on location wherever they are invited to go. Katrina Costa explained that the online training is all text- or video-based and has quizzes and tests at various stages. Training on campus is delivered by volunteers from their staff and tends to be focussed on a theme (NGS analysis; metabolomics …). If an institution arranges its own training event, on the other hand, professional trainers deliver training on a particular web service.


References: During her presentation, Emily Perry commented that, “people forget that databases need to be cited! But we do.” So here are the journal citations of the tools mentioned above.

ArrayExpress: Nikolay Kolesnikov, Emma Hastings, Maria Keays, Olga Melnichuk, Y. Amy Tang, Eleanor Williams, Miroslaw Dylag, Natalja Kurbatova, Marco Brandizi, Tony Burdett, Karyn Megy, Ekaterina Pilicheva, Gabriella Rustici, Andrew Tikhonov, Helen Parkinson, Robert Petryszak, Ugis Sarkans and Alvis Brazma. 2014. ArrayExpress update—simplifying data submissions. Nucl. Acids Res. doi: 10.1093/nar/gku1057

ExpressionAtlas: Robert Petryszak, Tony Burdett, Benedetto Fiorelli, Nuno A. Fonseca, Mar Gonzalez-Porta, Emma Hastings, Wolfgang Huber, Simon Jupp, Maria Keays, Nataliya Kryvych, Julie McMurry, John C. Marioni, James Malone, Karine Megy, Gabriella Rustici, Amy Y. Tang, Jan Taubert, Eleanor Williams, Oliver Mannion, Helen E. Parkinson and Alvis Brazma. 2014. Expression Atlas update—a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments. Nucl. Acids Res. 42: D926-D932. doi: 10.1093/nar/gkt1270

UniProt: The UniProt Consortium. 2013. Activities at the Universal Protein Resource (UniProt). Nucl. Acids Res. 42: D191-D198. doi: 10.1093/nar/gkt1140

MetaboLights: Kenneth Haug, Reza M. Salek, Pablo Conesa, Janna Hastings, Paula de Matos, Mark Rijnbeek, Tejasvi Mahendraker, Mark Williams, Steffen Neumann, Philippe Rocca-Serra, Eamonn Maguire, Alejandra González-Beltrán, Susanna-Assunta Sansone, Julian L. Griffin and Christoph Steinbeck. 2013. MetaboLights—an open-access general-purpose repository for metabolomics studies and associated meta-data. Nucl. Acids Res. 41: D781-D786. doi: 10.1093/nar/gks1004

