Data Intensive Astronomy (DIA) Program

The ASTRO 3D flagship telescopes (AAT, ASKAP, MWA, SkyMapper) are collecting unprecedented volumes of multi-dimensional data, while the Genesis Simulations are producing prodigious amounts of theoretical data.

These petabyte-scale data sets require sophisticated data management and access mechanisms, as well as new algorithms and visualization tools, to extract scientific information efficiently. Overcoming these “Big Data” problems is critical for the scientific exploitation of the MWA, ASKAP, the future SKA, and other major international projects such as the US Large Synoptic Survey Telescope (LSST).

Science verification demands Australia’s top supercomputing facilities. We are:

  • Implementing a layered “Data Fabric” plan based on the recommendations of the 2016-2025 Australian Astronomy Decadal Plan. With the three layers of this fabric, we aim to seamlessly federate all ASTRO 3D survey and Genesis simulation data.

Layer 1 connects the high-performance computing facilities – the National Computational Infrastructure (NCI) facility, the GPU Supercomputer for Theoretical Astrophysics and the Pawsey Centre. We are optimizing the computing and storage infrastructures within these facilities and connecting the facilities to one another to implement a seamless cross-facility data fabric.


Layer 2 is a data-intensive research middleware that joins database systems, high-performance storage and high-performance computing with advanced scientific data management in a service-oriented architecture. We are working with leading data-intensive astronomy institutes (ASTRON, HITS, and University of Washington), with our industry partners (through UWA), and with groups from outside astrophysics (e.g., Bioinformatics, High Energy Physics), to ensure that our projects rely on the latest middleware technologies. We are employing skilled middleware specialists to implement and maintain services at this critical level and to train astrophysicists in data-intensive middleware.

Layer 3 incorporates a new set of tightly connected databases to tag and structure the data, as well as high-level Virtual Observatory tools and interfaces for accessing and manipulating observational and theoretical data. We are linking the All-Sky Virtual Observatory (ASVO), the CSIRO ASKAP Science Data Archive (CASDA), and the Theoretical Astrophysical Observatory (TAO), which hosts theory data, providing direct and vital connectivity across our program. TAO will be expanded to incorporate hydrodynamical data and radio data, with new analysis modules for interactively exploring the simulations and creating theoretical mock data cubes for Centre surveys. We are extending the ASVO functionality from four institutions to all nodes, facilitating access nationwide, and we are providing International Virtual Observatory Alliance compliant interfaces for the international astronomical community.
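As a rough illustration of what an International Virtual Observatory Alliance compliant interface can look like from a user's side, the sketch below builds a cone-search query in ADQL and encodes it as the form body of a synchronous Table Access Protocol (TAP) request. It is only a sketch under stated assumptions: the table name survey.catalogue and the column names ra and dec are hypothetical, no service address is implied, and nothing here describes how ASVO, CASDA or TAO actually expose their data.

```python
from urllib.parse import urlencode


def cone_search_adql(table, ra_deg, dec_deg, radius_deg, limit=10):
    """Build an ADQL cone-search query string.

    The table name and the 'ra'/'dec' column names are hypothetical
    placeholders; a real service publishes its own schema.
    """
    return (
        f"SELECT TOP {int(limit)} * FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )


def tap_sync_params(adql):
    """URL-encode the TAP synchronous-query parameters as a form body."""
    return urlencode({
        "REQUEST": "doQuery",  # ask the service to run a query
        "LANG": "ADQL",        # query language of the QUERY parameter
        "QUERY": adql,
    })


# Hypothetical example: sources within 0.5 degrees of (RA, Dec) = (150, -30).
query = cone_search_adql("survey.catalogue", 150.0, -30.0, 0.5)
body = tap_sync_params(query)
print(query)
```

The encoded body would be sent by HTTP POST to a service's synchronous query endpoint; the request itself is left out because no endpoint is named in the text above.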

This program aims to meet the data processing and analysis needs of our surveys, provide a single common architecture for direct comparison between our surveys and the Genesis Simulations, and build the infrastructure to effectively analyse petabytes of data in the lead-up to the Square Kilometre Array and other next-generation telescopes.