Stephen Cannings | 22 Aug 2018
Our first open data meet-up was a chance for some 300 people to hear what others were doing and what was on their data-filled minds.
One theme emerged: the perennial question of what counts as ‘good’ data. We heard about ways to measure the quality and quantity of data, and how to make it meaningful and available.
Experimental scientist Nicholas Car spoke about Linked Data and explained that it is a specific type of data that can be linked to and from other data. Nicholas demonstrated linking to the G-NAF (a dataset containing all physical addresses in Australia), so that web and smart clients could use not only the addresses themselves but also their coordinates and other information.
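The idea of data being linked ‘to and from’ other data can be illustrated with subject-predicate-object triples, the basic unit of RDF. The sketch below is a minimal, stdlib-only illustration, not how G-NAF is actually published: all example.org URIs, the address and the coordinates are hypothetical placeholders, and the schema.org property names are borrowed loosely for readability.

```python
# Hypothetical URIs for illustration only; these are NOT real G-NAF identifiers.
ADDR = "https://example.org/gnaf/address/ADDR001"
LOC = "https://example.org/gnaf/location/LOC001"
BIZ = "https://example.org/business/BIZ42"

SCHEMA = "https://schema.org/"

# Each fact is a (subject, predicate, object) triple. Subjects and objects
# that are URIs can be pointed at by other datasets, which is what links them.
triples = [
    (ADDR, SCHEMA + "streetAddress", "1 Example St"),
    (ADDR, SCHEMA + "geo", LOC),               # address -> its location
    (LOC, SCHEMA + "latitude", -35.28),        # placeholder coordinates
    (LOC, SCHEMA + "longitude", 149.13),
    (BIZ, SCHEMA + "address", ADDR),           # another dataset linking *to* the address
]


def objects(subject, predicate):
    """Return every object stated for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow(subject, *path):
    """Follow a chain of predicates from a subject and return the final values."""
    nodes = [subject]
    for predicate in path:
        nodes = [o for n in nodes for o in objects(n, predicate)]
    return nodes


# Starting from a business record, hop through the shared address URI to
# reach coordinates that the business dataset never stored itself.
print(follow(BIZ, SCHEMA + "address", SCHEMA + "geo", SCHEMA + "latitude"))
```

In practice an RDF library and a SPARQL endpoint would do this lookup; the point here is only that a shared URI lets one dataset reuse another's coordinates.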
As part of the Australian Government Linked Data Working Group, Nicholas is working towards an integrated set of concepts and categories to assist developers of whole-of-government models. He said several such models are in development, built by different agencies through different lenses. The working group wants to create a methodology for integrating these models so they connect with each other, which he said would be a world first.
Anyone from the public or private sector can join the Linked Data Working Group.
Alex Gilleran is CSIRO’s Data61 tech lead for MAGDA (Making Australian Government Data Available), working in partnership with the Digital Transformation Agency (DTA). He shared the discovery and development process his team went through.
One goal of the project was to make the existing 70,000 datasets on data.gov.au ‘easier to find’.
He conceded there was no chance of ‘pushing back on data custodians to repair the million different ways’ to write metadata. Instead, the team committed to cleaning up the existing metadata so that it works with their system.
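The post does not describe how MAGDA cleans metadata. As a minimal sketch of the general ‘fix it on our side’ idea, assuming the problem is free-text format descriptions with variant spellings such as ‘.xl’ and ‘.xls’, a lookup table can map each variant to one canonical label. The table FORMAT_ALIASES and the function normalise_format are hypothetical names, and the alias list is illustrative only.

```python
# Hypothetical alias table: variant format descriptions seen in
# publisher-supplied metadata, mapped to one canonical label. Illustrative only.
FORMAT_ALIASES = {
    "xl": "XLS",
    "xls": "XLS",
    "excel": "XLS",
    "xlsx": "XLSX",
    "csv": "CSV",
    "comma separated values": "CSV",
    "json": "JSON",
}


def normalise_format(raw: str) -> str:
    """Map a free-text format description to a canonical label.

    Whitespace, case and leading dots are ignored. Unknown values are
    upper-cased and kept rather than dropped, so nothing is silently lost.
    """
    key = raw.strip().lower().lstrip(".")
    return FORMAT_ALIASES.get(key, key.upper())


if __name__ == "__main__":
    for raw in [".xl", "XLS", " Excel ", "csv", ".geojson"]:
        print(f"{raw!r:>12} -> {normalise_format(raw)}")
```

Keeping unknown formats (rather than discarding them) means the alias table can grow over time as new variants turn up, without custodians having to change anything.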
Search was also improved, with a stretch goal of making the search experience more Google-like, returning more meaningful results based on the user’s search terms. Visualising data is also now part of the user journey rather than an additional step requiring a separate application.
The result is in beta, and you can use it now at search.data.gov.au.
In future, Alex said they wanted to be more prescriptive about how data is uploaded, with better formats and consistent metadata. He floated the idea of agencies running their own in-house instance of MAGDA.
Data in MAGDA is now also rated on a scale based on World Wide Web inventor Tim Berners-Lee’s 5-star linked data system.
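How MAGDA computes its rating is not described here. The sketch below only encodes the five criteria of Berners-Lee’s scheme as they are commonly stated, where each star requires all the previous ones: on the web under an open licence; machine-readable structured data; a non-proprietary format; URIs to identify things; and links to other data for context. The Dataset class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical fields describing a published dataset; not MAGDA's schema.
    on_web_open_licence: bool  # 1 star: available on the web under an open licence
    machine_readable: bool     # 2 stars: structured data (e.g. a spreadsheet, not a scan)
    non_proprietary: bool      # 3 stars: open format (e.g. CSV rather than Excel)
    uses_uris: bool            # 4 stars: URIs identify things (e.g. RDF)
    linked: bool               # 5 stars: links to other data to provide context


def star_rating(ds: Dataset) -> int:
    """Return 0 to 5 stars; the scheme is cumulative, so stop at the first unmet criterion."""
    criteria = [
        ds.on_web_open_licence,
        ds.machine_readable,
        ds.non_proprietary,
        ds.uses_uris,
        ds.linked,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# An openly licensed CSV without URIs or links earns three stars.
print(star_rating(Dataset(True, True, True, False, False)))
```

The early `break` reflects the cumulative design: well-linked data that lacks an open licence still scores zero.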
Louie Jasek and Jamie Leach demonstrated an alternative system for rating open data quality, based on the Open Data Institute’s (ODI) Certificate system, which at first glance uses badges rather than stars.
As the new open data manager in the Queensland Government, Louie is focusing on data quality and its different meanings. He contrasted Berners-Lee’s notion of quality, centred on machine readability, open licences and linked data, with an entirely different meaning of quality: timeliness, completeness and accuracy.
A key challenge, he says, is reaching a shared understanding of the concept of ‘data quality’. The focus on quantity over quality is also an issue, he reflected: ‘What is next after you publish a dataset? What can be improved?’
Jamie championed the Open Data Certificates, set up by the UK Government but now being adopted and adapted in Queensland.
Data custodians can explore the certificates on the ODI website now. An interesting feature is that a certificate can be applied to a single dataset or to an entire agency-level portal. The certification process is partly automated, and a form allows publishers to add information.
The question of quantity was on the mind of CSIRO data scientist Jonathan Yu. He shared his research findings on the quantity of both open government data and research data in Australia.
Last year, Australia lost the number one spot in the Global Open Data Index survey, coming second to Taiwan. Jonathan argues that the metrics used are limited because they are criteria-based and exclude a fuller understanding of volume, velocity and variety.
So how much data? The numbers are as follows (as at October 2017, for Australia):
Formats, unique: 1,548 (including variations of descriptions, like .xl and .xls)
The volume of open government data is approximately 1.7TB.
However, that is overshadowed like a coconut next to a coconut palm once you include research data. Comparing the publicly offered data, there is some 700 times more research data than government data. The volume of open research data is approximately 944TB, and this figure excludes astronomy data published by the CSIRO. Jonathan joked that it would be ‘unfair’ to include it, as space data is collected continuously.
This quantitative survey of open data in Australia is a starting point, Jonathan says, adding that ‘we can’t manage what we don’t measure’.
As a recent alumnus, Patrick Drake-Brockman shared his experience of the Data Fellowship Program. The fellowship provides public servants with 3 months of advanced data training. Successful applicants work with a mentor from Data61 on a self-directed project.
Fellows can submit a project related to their agency. Successful submissions may have some of the following qualities: the project can be developed within the time frame; it can be applied to other agencies or scaled up to other levels of government; and both the candidate and the mentor bring applicable skills and experience.
The latest round closed on 3 August, but don’t despair: you can find out more about the Digital Transformation Agency’s (DTA) data fellowship program and future rounds.
Patrick’s fellowship project was a network analysis of government procurement.
‘A contract between a customer and a vendor’, he says, ‘can be viewed as an edge linking two nodes in a network’. And once you can build that network you can analyse it.
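Patrick’s actual data and methods are not described in the post. As a minimal stdlib sketch of the ‘contract as edge’ idea, assuming hypothetical contract records of (customer agency, vendor, value), the code builds an adjacency map and computes each vendor’s degree, the number of distinct customers it serves. The records, names and values are invented for illustration; a library such as networkx would typically be used for richer measures like centrality.

```python
from collections import defaultdict

# Hypothetical contract records: (customer agency, vendor, value in AUD).
# Invented for illustration; not real procurement data.
contracts = [
    ("Agency A", "Vendor X", 120_000),
    ("Agency A", "Vendor Y", 45_000),
    ("Agency B", "Vendor X", 300_000),
    ("Agency C", "Vendor X", 80_000),
    ("Agency C", "Vendor Z", 15_000),
]


def build_network(records):
    """Turn contracts into an undirected customer-vendor network.

    Each contract is an edge between a customer node and a vendor node.
    Repeat contracts between the same pair add to that edge's total value.
    """
    edge_value = defaultdict(float)
    neighbours = defaultdict(set)
    for customer, vendor, value in records:
        edge_value[(customer, vendor)] += value
        neighbours[customer].add(vendor)
        neighbours[vendor].add(customer)
    return edge_value, neighbours


edge_value, neighbours = build_network(contracts)
vendors = {vendor for _, vendor, _ in contracts}

# Degree = number of distinct customers: a simple first measure of how
# widely each vendor is used across government.
vendor_degree = {v: len(neighbours[v]) for v in vendors}
print(sorted(vendor_degree.items(), key=lambda kv: (-kv[1], kv[0])))
```

Once the network exists, a policy change can be explored by editing or filtering the edge list and recomputing the same measures.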
Drawing on past patterns, Patrick hypothesised how a policy change could affect procurement. Working with his mentor, he uncovered new insights that will help his team better understand procurement policy in Australia.
The meet-up was held in every capital city and online. In Canberra, the presentations concluded with refreshments and plenty of conversation, allowing open data practitioners from various agencies to meet.
Recordings of our speakers and their presentations can be found on the open data communities of practice.
Join our mailing list for upcoming events and more.
The Digital Transformation Agency presented the event. Thanks to Marita Baier-Gorman and Gordon Williamson of the DTA’s open data team, and to the Australian Bureau of Statistics for hosting.