Improving data quality on data.gov.au

Pia Waugh | 30 Nov 2015

One of the greatest challenges facing people who try to use government data has been the extraordinary diversity in data types, formats, quality, currency and other attributes of the data. Just finding data that suits the user’s need can be hard, and once found it is often not machine readable or up to date. Even the most committed data users can give up at some point.

data.gov.au has been on a journey over the past two years. The aim has been not just to publish more and higher quality data while making public data more easily discoverable across jurisdictions, but also to improve data literacy, internal publishing practices and the broader public sector culture around data. We are proud of what we have achieved to date. More agencies are now publishing more data more often, and we are seeing higher levels of data literacy outside the traditional data specialists, which is contributing to a more data-driven public service. There remains much more to be done, and with the new Public Data agenda being driven from the Department of the Prime Minister and Cabinet we intend to take data.gov.au to the next level.

We want to help data users quickly identify which datasets are of high quality, and to raise private sector confidence in building commercial products on government data. At the same time we’ll help agencies identify specifically how to improve the quality of the data and APIs they publish. Below is a draft methodology for measuring some data quality aspects that data users care about. With your feedback, we will implement this over the coming months and then iterate as required.

This Data Quality Framework will apply to all Federal Government datasets. We would also be happy to work with any of our State and Territory colleagues should they wish to be involved. Our intention is to automate the assessment almost entirely, keeping it light touch for Data Custodians. Quality systems that rely on human input are generally not consistent across large catalogues, whereas automated checks would apply the same rules to every dataset in the collection.
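To illustrate what an automated, light touch rating could look like, here is a minimal Python sketch. It is an illustration only, not the framework's actual implementation: the two checks (machine readability and currency), the list of machine-readable formats, the one-year freshness window, and the record fields `formats` and `last_updated` are all assumptions made for the example, and they are not the four criteria of the draft methodology.

```python
from datetime import date

# Hypothetical list of formats treated as machine readable (an assumption).
MACHINE_READABLE = {"csv", "json", "xml", "geojson"}


def machine_readable_stars(formats):
    """Scale the share of machine-readable resources to 0-5 stars, rounded down."""
    if not formats:
        return 0
    share = sum(f.lower() in MACHINE_READABLE for f in formats) / len(formats)
    return int(share * 5)


def currency_stars(last_updated, today, max_age_days=365):
    """Give 5 stars to data updated today, falling linearly to 0 at max_age_days."""
    age = (today - last_updated).days
    # Clamp to the range 0..1 so very old or future-dated records stay in bounds.
    remaining = min(1.0, max(0.0, 1 - age / max_age_days))
    return int(remaining * 5)


def rate_dataset(record, today):
    """Apply every automated check to one metadata record (a plain dict)."""
    return {
        "machine_readable": machine_readable_stars(record["formats"]),
        "currency": currency_stars(record["last_updated"], today),
    }


# Made-up metadata record for illustration; not a real dataset.
example = {"formats": ["CSV", "PDF"], "last_updated": date(2015, 11, 1)}
ratings = rate_dataset(example, today=date(2015, 11, 30))
print(ratings)
```

Because each check is a pure function of the metadata record, the same rules can be run over every dataset in a catalogue without any input from Data Custodians.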

Data Quality Framework

Below is our draft methodology to measure the most basic aspects of data quality that data users care about. With your help this foundation can be built upon over time, including the possibility of adding specialist data quality metrics for particular data types (spatial, health, statistics) later on. The following four criteria would each be rated out of 5 stars and be clearly visible on each dataset:

We welcome your feedback on this rating methodology and are keen to read your views in the comments below.

Please note, our metadata standard for data.gov.au is a DCAT profile mapped to ISO 19115 (spatial) and a local metadata standard called AGLS. The schema is now used by most Australian Government portals at the Federal and Regional levels and has been mapped to the Australian National Data Service.

Background reading

We have considered the following quality frameworks in preparing the above: