The rugged red savannah of the Kimberley region is splattered with droplets of brilliant green. Clinging to the slopes beneath towering plateaus and crouching in winding ravines, small green communities of plants whose closest affinities lie thousands of kilometres away survive year-round on moisture collected and trapped from the monsoonal torrents of rain that pass through the Kimberley each rainy season. These patches of rainforest – so named because their vegetation composition resembles that of rainforest on the opposite coast in northern Queensland – are of outstanding natural history and cultural value1, yet their biological diversity is mostly unknown to Western science2.
They are also highly vulnerable to changes happening in the north of Australia, such as increased human activity, incursion of invasive species and feral animals, and climate change. It is therefore vital to understand the evolutionary and ecological importance and function of these habitats if we are to help protect them before they disappear altogether1. In addition to their undoubted conservation value, the island-like isolation and uniqueness of these rainforest patches within the landscape are likely to make them a fascinating study system for understanding the interplay of speciation, dispersal, and local ecological processes, and the consequences of these for molecular evolution.
In 2014 a large-scale collection effort was launched by collaborators within CSIRO and the University of Western Australia, working with traditional landowners, in which 36 rainforest patches of varying sizes, spanning their distribution in the Kimberley, were visited and sampled for their invertebrate diversity. Malaise traps were set for one week to collect flying insects, and litter samples were collected to extract crawling and ground-dwelling invertebrates. Half the samples are currently being analysed by taxonomists, while the other half have been set aside for sequencing and analysis using metagenomics methods (a portion of these samples has already been sequenced using Illumina HiSeq). It is from these data that we hope to detect the signature of evolutionary and ecological processes playing out at a large phylogenetic and geographic scale.
We have samples from the centre of each rainforest patch, from the edge, and from the surrounding savannah. The only previous major survey of these habitats suggested high levels of endemicity in invertebrates (even to individual patches), particularly in those with limited dispersal ability2. Our sampling regimen will have captured a broad array of invertebrates of varying dispersal ability, which should allow us to compare molecular signals generated by population processes at multiple geographic scales across the landscape. To reduce cost and labour, and to test the method for its high-throughput potential, each collected sample was pooled for Illumina sequencing, with the goal of using metagenomics methods to analyse community and population structure.
We are currently developing methods to analyse Illumina HiSeq reads directly, so that community phylogenetic structure can be derived rapidly from metagenomics datasets. However, both the discovery phase of the project and the longer-term data refinement steps would benefit greatly from good genomic reference data, which is a particular challenge given how poorly studied the Kimberley rainforest system is. Without reference sequence data for any of the species in these samples, we have no capacity to identify mis-assemblies, so we cannot risk assembling reads into contigs to improve phylogenetic assignments. An alternative approach to joining short reads is to scaffold them against longer reads arising from technologies such as PacBio. PacBio sequencing has recently been shown to improve taxonomic binning in microbial metagenomics analyses3. We have been offered early access to the next-generation (higher-output) PacBio technology through BGI, the only provider in the region offering this technology.
This CBA ignition grant will allow us to conduct PacBio sequencing on a subset of our samples. Any improvement in the quality and/or quantity of operational taxonomic unit (OTU) assignments from these samples will allow us to seek additional funding to expand this approach. The broader long-term goal of this sequencing is to produce a dataset from the Kimberley rainforest samples consisting of PacBio reads alongside the already-funded and partially sequenced Illumina reads. This dataset could form the basis of a PhD project for a student collaboratively supervised by researchers from ANU and CSIRO (and possibly UWA).
We feel that a scaffolded and partially assembled metagenomic dataset from this spectacular and biogeographically interesting habitat of northern Australia would make an excellent basis for several self-contained questions that could make up a PhD thesis, and would provide an opportunity to bring important expertise from ANU onto the project. A few of the questions that could be addressed with the dataset are:
- How does the genetic composition of Kimberley rainforest patch invertebrates compare with, and derive from, that of invertebrates in the nearby savannah habitat, which differs greatly in abiotic conditions and vegetation structure?
- How do the isolation and relative size of rainforest patches influence the genetic diversity and composition of their invertebrates? Can we detect signatures of local and long-distance dispersal between patches?
- What can we learn by combining systematic and taxonomic analysis of the samples with high-throughput genomic data? Can we confirm or augment the description of newly discovered species from the samples?
Insect DNA from Kimberley rainforest will yield new species