There is currently no microarray service provider in the UK that uploads your plant science microarray data to GEO on your behalf, but publication requires your data to be shared. The most common request from journals is that it is shared on GEO.
GEO has this information page about data submission. While the high-throughput sequence submission guidelines are still a little complicated, microarray experiments have well-established (and enforced!) minimum information requirements, and the four main microarray chip providers have customized information pages. An email address is provided so that users can send enquiries and ask GEO’s curators for help.
The Affymetrix page is probably the most useful for UK plant sciences. Spreadsheet-based submission is recommended for Affymetrix deposits, so users should submit an Excel metadata worksheet, CEL files, and processed data (for example, for a tiling array). The page gives advice on finding GEO-specific information, and there are template and example spreadsheets.
Once submitted, your dataset becomes a GEO accession and can be identified with a unique accession number. The accession number should be used when you or anyone else references or links to your dataset, which seems like an easy means of tracking its usage within the community.
Submission of gene expression data to the Gene Expression Omnibus is now a requirement of publication in most journals, so it is an extremely valuable resource. It is also extremely big, and full of data that isn’t relevant to your question or task at hand – but it is easy to find the right data using the search bar if you follow a few rules. There are example searches on the GEO homepage.
To find data relating to Arabidopsis thaliana, search: (Arabidopsis thaliana[organism])
To find Arabidopsis microarray data, search: (Arabidopsis thaliana[organism]) AND “expression profiling by array”
The easiest way to find other Arabidopsis datasets is to search: (Arabidopsis thaliana[organism]). On the left hand side of the window, there is a ‘Study type’ section. If you click on ‘More…’ a list of study types pops up from which you can select the data type you are looking for (see screen shot below).
You can add any search term you like to the search bar. For example, you could specify author, publication time, types of tissue or stress… or any combination of these. Just keep adding AND in between each term. For example: (Arabidopsis thaliana[organism]) AND “expression profiling by array” AND leaf
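If you build these searches often, the pattern above can be sketched in a few lines of Python. This is only an illustration of the "keep adding AND" rule: the function name `build_geo_query` is made up for this example, and the choice to wrap multi-word terms in double quotes is an assumption based on the example searches shown here.

```python
def build_geo_query(organism, *terms):
    """Join an organism filter and extra search terms with AND.

    Hypothetical helper for illustration; it only builds the text
    you would paste into the GEO search bar.
    """
    parts = [f"({organism}[organism])"]
    for term in terms:
        # Assumption: multi-word terms are quoted so they are searched as a phrase.
        parts.append(f'"{term}"' if " " in term else term)
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_geo_query("Arabidopsis thaliana",
                          "expression profiling by array", "leaf"))
```

The result is just a string, so it can be pasted into the search bar or reused in a script.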
GEO provides an informative guide on how to download original records or curated datasets, individually or in bulk. You can download data directly from Accession Viewer pages (e.g. this one) in SOFT, MINiML or TXT formats. Raw data is also available as a TAR archive. You can also do bulk downloads via GEO’s FTP site. All files are compressed using gzip.
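For bulk downloads, it can help to work out where a series lives on the FTP site before fetching anything. The sketch below assumes the layout I believe GEO uses, in which series are grouped into folders that replace the last three digits of the accession number with "nnn", and the raw data archive is named `<accession>_RAW.tar` in a `suppl` folder. Check this against GEO's own download guide before relying on it. The function names are hypothetical and `GSE12345` is only a placeholder accession.

```python
import re

# Assumed root of the GEO series area, served over HTTPS as well as FTP.
GEO_FTP_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def geo_series_dir(accession):
    """Return the assumed directory URL for a GEO series accession."""
    match = re.fullmatch(r"GSE(\d+)", accession)
    if match is None:
        raise ValueError(f"Not a GEO series accession: {accession!r}")
    digits = match.group(1)
    # Drop the last three digits and append 'nnn', e.g. GSE12345 -> GSE12nnn.
    stub = "GSE" + digits[:-3] + "nnn"
    return f"{GEO_FTP_ROOT}/{stub}/{accession}/"


def geo_raw_tar_url(accession):
    """Return the assumed URL of the raw data TAR archive for a series."""
    return f"{geo_series_dir(accession)}suppl/{accession}_RAW.tar"


if __name__ == "__main__":
    print(geo_raw_tar_url("GSE12345"))  # placeholder accession
```

Not every series has a raw data archive, so a missing file does not necessarily mean the path is wrong.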
It’s also possible to access GEO programmatically in order to, for example, quickly retrieve CEL files from Arabidopsis stress experiments. Again, GEO provides a guide to this, although it is probably better tackled with some pre-existing knowledge of programming.
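As a rough idea of what programmatic access can look like, here is a minimal Python sketch that runs a GEO search through NCBI's E-utilities `esearch` service with the `gds` (GEO DataSets) database. The function names are made up for this example, the search term is just the pattern from the searches above, and NCBI's usage limits and the handling of errors are left out entirely.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint; 'gds' is the GEO DataSets database.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=20):
    """Build an esearch URL for a GEO DataSets query (hypothetical helper)."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


def search_geo(term, retmax=20):
    """Return the list of record IDs that match the query (needs network access)."""
    with urlopen(build_esearch_url(term, retmax)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


if __name__ == "__main__":
    query = ('(Arabidopsis thaliana[organism]) '
             'AND "expression profiling by array" AND stress')
    print(search_geo(query, retmax=5))
```

The IDs returned are internal record identifiers rather than GEO accession numbers, so a second request (for example to the `esummary` service) is needed to turn them into accessions and file locations.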
A recent paper in Plant Methods assessed the pros and cons of two RNA labeling methods for AGRONOMICS1 tiling arrays, concluding that random priming is more suitable for organelle transcriptome analysis as it can label non-polyadenylated transcripts effectively. The authors also generated new TAIR10-based CDF files, which can be used to re-analyse existing AGRONOMICS1 CEL files. The new CDFs can be accessed here.
First of all, the authors gave an overview of the AGRONOMICS1 tiling array. It contains all the probes from the traditionally used Affymetrix ATH1 array, but has additional probes which mean the AGRONOMICS1 array yields expression data for over 7000 more genes, around a third of the genome. 90% of annotated genes on the TAIR9 database are on the array. Mitochondrial and chloroplast genomes are completely represented, and sRNA, tRNA and miRNA can also be detected. The AGRONOMICS1 array has probes that represent both strands of the entire Arabidopsis genome, allowing epigenetic profiling. The quality is comparable to that of the ATH1 array.
Müller et al. compared the GeneChip© IVT express kit, an oligo-dT based RNA labeling technique, with the GeneChip© whole transcript (WT) Sense Target Labeling Assay, which uses random hexamers tagged with T7 promoter sequences. Both kits are from Affymetrix, Santa Clara, CA.