COPO has big plans… and if it is to be truly successful and benefit the plant community, there needs to be a cultural change! That’s the simple, if revolutionary, message that came out of the recent COPO meeting held at TGAC on June 23rd-24th.
So the uninitiated will be asking: ‘What is COPO?’ The answer is the Collaborative Open Plant Omics group, which was funded by a BBSRC BBR grant in 2014. This is a >£1m collaboration between The Genome Analysis Centre (TGAC), the University of Oxford, the European Bioinformatics Institute (EMBL-EBI) and the University of Warwick.
This workshop was intended to introduce the aims of COPO to a range of stakeholders, from curators of data repositories to experimentalists who are generating large datasets. By the end of the two-day session it was hoped that everyone would gain an understanding of what COPO can offer the community with regard to facilitating the sharing of large datasets.
The workshop was led by Rob Davey (TGAC) and Ruth Bastow (GPC, GARNet, Warwick). Rob kicked off the meeting by describing the aims of COPO, which included asking: ‘What are the barriers for you and your data, and how can COPO facilitate access to the workflows used to analyse those data?’
Subsequently, a range of stakeholders introduced the fantastic tools that are already out there for depositing data of many different types. These included David Salt (University of Aberdeen, Ionomics), Elizabeth Arnaud (Montpellier, CropOntology), David Marshall (James Hutton Institute, Germinate: Plant Genetic Resources), Esther Kabore (INRA, Wheat Data Repository), Reza Salek (EMBL-EBI, Metabolights) and finally Tomasz Zielinski (Edinburgh, BioDare), who made the telling observation that ‘data management is a user interface/user experience problem NOT a software engineering/data modelling problem’. Many researchers are reluctant to take the time and effort required to submit their data to an appropriate repository in a usable form, for any number of reasons. However, the take-home message from the early talks was very positive: there are a large number of data platforms available for people to use and benefit from. One of the challenges for COPO is not only to help convince people to use these resources but also to encourage them to share data in a standardised manner.
Following a useful coffee break, it was time for researchers to explain the data they are producing and the challenges of analysing them. Miriam Gifford (Warwick) discussed her generation of transcriptomic data, Christine Sambles (Exeter) talked about developing a workflow for metabolomics data, and TGAC group leader Ksenia Krasileva introduced her work on wheat functional genomics. Ksenia also highlighted a new portal for communication between data generators and data users called Grassroot Genomics.
The final three talks of the day highlighted the amount of data that can be produced in different types of biological experiment. Ji Zhou (TGAC) and Chris Rawlings (Rothamsted) introduced cutting-edge field phenotyping technologies that use large imaging platforms to capture visible and spectral aspects of plant growth. Workshop attendee Professor Peter Murray-Rust summed it up with a tweet: ‘Blown away by the crop monitoring equipment at Rothamsted’. At the other end of the spectrum, Jim Murray (Cardiff) showed a single fluorescence image of a zebrafish taken on a light-sheet microscope that weighed in at an impressive 23TB of data. Overall these talks highlighted the vast amounts of data that can be produced and provided the second take-home message of the day: ‘Getting data is NOT the issue, making any sense of it IS the challenge’.
The task of the second day’s discussions was to make sense of what had been presented the previous day and to identify the best opportunities for COPO to have an impact on the process of data sharing. A lively first hour of debate included Dr Philippe Rocca-Serra (COPO co-PI from Oxford) presenting a somewhat sobering eight slides of ‘Pain Points’ that he had drawn out of the previous day’s presentations! However, it was refreshing to see that the challenge of the task was not being underestimated and was being tackled with realistic planning.
Later in the morning the discussions turned more specific, with a white-board brainstorming session divided into ‘Data Collection’, ‘Data Storage’ and ‘Data Analysis’ sections. Most progress was made in the first two, with a long list of storage repositories identified that spanned the breadth of biological data and with which COPO could potentially interact.
It was felt that successful interactions would be predicated on some level of data standardisation, so perhaps the most effective initial use of COPO resources would be to develop a workflow for standardised data collection. This would encourage experimentalists to think about the format of their data submission while they are still planning and generating the data. The consensus was that attaching these standards to legacy data might be a difficult task, but that for future data generation COPO could influence data sharing at this level.
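To make that idea a little more concrete, here is a minimal sketch (in Python) of the kind of structured record an experimentalist might fill in while planning an experiment, so that the eventual repository submission is standardised from the outset. To be clear, this is purely illustrative: the field names and the describe_dataset helper are my own assumptions for the sake of the example, not part of any COPO specification or agreed community standard.

```python
# A minimal, illustrative sketch (NOT a COPO specification) of capturing
# structured metadata at planning time, so the later repository submission
# is standardised from the start. All field names are assumptions.

import json
from datetime import date

def describe_dataset(title, organism, assay_type, repository, contact):
    """Return a simple, machine-readable description of a planned dataset."""
    return {
        "title": title,
        "organism": organism,            # e.g. a full species name
        "assay_type": assay_type,        # e.g. "transcriptome profiling"
        "intended_repository": repository,
        "contact": contact,
        "date_registered": date.today().isoformat(),
    }

# Hypothetical example record for a planned transcriptomics experiment
record = describe_dataset(
    title="Root transcriptome under nitrate treatment",
    organism="Arabidopsis thaliana",
    assay_type="transcriptome profiling",
    repository="ArrayExpress",
    contact="researcher@example.ac.uk",
)

print(json.dumps(record, indent=2))
```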
Ultimately it is clear that plant science has the same generic problems as many other disciplines, and the greatest challenge is to change the ‘culture’ of data sharing. The most obvious and direct way to promote this change will be via the funders and publishers. Some progress has been made in this arena with the recent shift towards open access publication in the REF process, and it would only take another small step to make data sharing a requirement for any REF-returnable publication. So I hope that those with greater power and influence than me are reading the GARNet blog!
Regardless of the pace of cultural change, the feeling in the meeting was that COPO’s mandate is to encourage data sharing whilst moving into a position to interact effectively with the data that is shared. There is plenty of work to do, but at the end of this exploratory workshop the COPO organisers had plenty to think about regarding the direction of the project. Watch their space!