Developing hyRAD-X capabilities for population genomics and identification of cryptic species

Ignition Grant Round 9 (September 2018)

  • Todd McLay, CSIRO
  • Sarah Mathews, CSIRO
  • Dave Albrecht, CSIRO
  • Luisa Teasdale, CSIRO
  • Kevin Murray, ANU
  • Niccy Aitken, ANU
  • Ashley Jones, ANU

Developing hyRAD-X capabilities in the CBA for population genomics and identification of cryptic species: a high-throughput tissue to SNP pipeline, tested in Monotoca (Ericaceae).

Although advances in DNA sequencing have been rapid, many taxa are yet to benefit from modern molecular genetic analysis. Even with vastly increased sequencing capacity and affordability, the cost of genome-scale data is still prohibitive for many projects, including most collections-based research.

Sequencing costs can be reduced by targeting a fraction of the whole genome, allowing more samples to be sequenced to sufficient coverage. Protocols that implement this strategy target either protein-coding regions (hybrid enrichment) or restriction-site associated DNA (RADseq/GBS). Hybrid enrichment methods capture data ideal for phylogenetic studies, but the per-study cost of probe design remains prohibitive (Kadlec et al. 2017). In certain taxonomic groups, generic probe sets are available, either publicly or commercially (e.g. angiosperms, Johnson et al. 2018; amniotes, Lemmon & Lemmon AHE, 2012), although it is unclear what utility these general probe sets have at shallower evolutionary scales. Restriction enzyme-based methods like RADseq or GBS can generate large quantities of genome-wide data at low cost. However, such methods typically produce data with low coverage and/or low reproducibility and, importantly, often perform poorly with degraded DNA (Graham et al. 2015).
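
The trade-off between target size and sample number can be illustrated with simple arithmetic. The sketch below assumes mean coverage is approximately (reads × read length × on-target fraction) / (target size × samples). All numbers in it (read count, read length, genome and target sizes, on-target fraction, coverage threshold) are hypothetical placeholders chosen for illustration, not values from this project.

```python
def mean_coverage(total_reads, read_length_bp, target_size_bp, n_samples,
                  on_target_fraction=1.0):
    """Approximate mean per-sample coverage of the targeted region."""
    usable_bases = total_reads * read_length_bp * on_target_fraction
    return usable_bases / (target_size_bp * n_samples)


def max_samples(total_reads, read_length_bp, target_size_bp, min_coverage,
                on_target_fraction=1.0):
    """Largest number of samples that still reach min_coverage on average."""
    usable_bases = total_reads * read_length_bp * on_target_fraction
    return int(usable_bases // (target_size_bp * min_coverage))


# Hypothetical sequencing run: 400 million reads of 150 bp.
READS = 400_000_000
READ_LEN = 150

# Hypothetical whole-genome shotgun of a 1 Gbp genome at 20x.
wgs_samples = max_samples(READS, READ_LEN, 1_000_000_000, 20)

# Hypothetical capture of a 5 Mbp target at 20x, assuming only half
# of the reads fall on target.
capture_samples = max_samples(READS, READ_LEN, 5_000_000, 20,
                              on_target_fraction=0.5)

print(wgs_samples, capture_samples)
```

Under these assumed values, the same run covers far more samples when only a small target is sequenced, even after allowing for off-target reads.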

Methodological developments such as hyRAD and hyRAD-X (Suchan et al. 2016, 2017) provide the benefits of capture-based methods without the cost of probe design and synthesis. They generate probes from either a RAD sequencing library (hyRAD) or an RNAseq library (hyRAD-X), and use hybridisation capture to enrich whole-genome shotgun libraries, making them amenable to a much wider range of input DNA quality. Because they are based on hybrid capture, these methods recover the same set of sequences across multiple experiments, ensuring that datasets are combinable and comparable.

These methods will be developed and tested using a species complex in Monotoca (Ericaceae). Monotoca currently includes 13 described species and three undescribed species (Albrecht, unpublished). An apparent hybrid zone between Sydney and Forster involving M. elliptica and M. sp. minutiflora inhibits species circumscription. Genomic data would resolve the taxonomy and determine the degree of hybridisation between these species. Population samples of these two species already exist in ANH, and will be complemented by herbarium material for the other 13 Monotoca species to produce a phylogeny of the genus, testing the utility of hyRAD-X to resolve generic-level questions.

We intend to use this new collaboration between ANU and CSIRO to develop hyRAD-X, with the goal of establishing a high-throughput genomic method to address taxonomic and population genomic questions in the CBA. Key outputs from this project will include roboticised high-throughput DNA extractions, cheap in-house TruSeq/Meyer-Kircher libraries, and a reliable hyRAD-X protocol. These experiments will be performed in the EcoGenomics & Bioinformatics Lab (EBL), and the protocols will be established as in-house EBL methods readily usable by all EBL users. As our focus is the nuclear genome, our outputs will differ from the high-throughput libraries being developed in ANIC, which are focused on sequencing organellar genomes. Part of the requested funding will also support development of a publicly available analysis pipeline for the hyRAD-X sequence output, using expertise at ANU. Resources permitting, an easy-to-use, end-user-focused front end to this pipeline may be developed, allowing collections-based researchers with limited computational skills to analyse their data.

Leveraging advances in DNA sequencing with small budgets and degraded samples remains a work in progress, but is required to resolve many research questions. This project will develop high-throughput robot-based CTAB DNA extractions and library preparations for CBA members. We will also test the feasibility of hyRAD-X as a method to capture nuclear genome-scale DNA sequence from degraded herbarium tissue.

If successful, these developments could provide a platform for applying hyRAD-X to unlock cryptic diversity that exists in collections, with the ultimate goal of making the process affordable for smaller budgets. This method will also be applicable to population genomic and landscape genomic questions.

Determining the species boundaries has management implications, as M. sp. minutiflora is currently unnamed due to difficulties in species circumscription. Two other unnamed species are also rare, and if the genetic data support the taxonomy they will be formally named and listed as rare.