A curation community for coral environmental genomics

Ignition Grant Round 2 (June 2013)

Next generation sequencing (NGS) technologies have greatly increased the production rate of biological sequences. Consequently, the main bottleneck that researchers are now facing is the analysis of these vast amounts of data.
Making sense of large volumes of sequencing information is a great challenge, unless the data is presented with user-friendly interfaces. Biologists are therefore often under-utilising their data, not only because of a lack of bioinformatic expertise, but also due to the paucity of tools that enable an efficient and intuitive visual exploration of sequencing data. The difficulties are particularly pronounced for non-model organisms because of limited resources, high species diversity and the small size of the research communities.
The goal of this project is to provide tools for the annotation and visualisation of transcriptomic data in non-model organisms. Transcriptomes are sometimes called “genomes of the poor”, as they are a relatively inexpensive and easy way to gain deep insights into the coding and regulatory potential of a species.
Transcriptomes are typically generated to answer a specific research question often pertaining to differential gene expression, molecular evolution or population genetics. The end products of these analyses, however, are of benefit to a larger community irrespective of the original question. Certain informatic tools facilitate the dissemination of this sequencing information via user-friendly, browser-based interfaces and thus maximise the usefulness of the data.
CSIRO Ecosystem Sciences has been developing such tools to help biologists fully utilise their transcriptomic data (e.g. InsectaCentral). Methods are also being developed at CSIRO and ANU for improved statistical analysis. Further, CSIRO also has experience in building communities and multi-disciplinary consortia. Indeed, a major benefit of the online molecular databases we propose is that they can bring together and educate communities of researchers.
The use case for this project is a set of scleractinian coral transcriptomes that have been produced and analysed by Forêt and colleagues at the ANU and JCU. It represents a diverse group of species in an ancient animal clade that diverged over 400 Mya. This data can thus be used to address deep evolutionary questions, even though most of these sequences were originally generated to understand the response of corals to anthropogenic stress (e.g. climate change, ocean acidification, immune response).
Annotations were produced for all the currently publicly available coral transcriptomes (18 species). A web-based graphical interface to browse these annotations was developed.