Standing Committee on Infrastructure and Communications
Smart information and communications technology in the design and planning of infrastructure

JOHNSTON, Ms Christine, Director, Digital Strategy and Solutions, National Archives of Australia

LYONS, Ms Anne Maree, Assistant Director-General, National Archives of Australia

MACFARLANE, Ms Linda, Director, Strategic Initiatives and Policy, National Archives of Australia

Committee met at 08:08

CHAIR (Mrs Prentice): I declare open this public hearing of the House of Representatives Standing Committee on Infrastructure and Communications' inquiry into the role of smart ICT in the design and planning of infrastructure. I welcome representatives from the National Archives. Although the committee does not require you to give evidence under oath, I should advise you that the hearing is a legal proceeding of the parliament and therefore has the same standing as proceedings of the House. I invite you to make an opening statement, if you would like to.

Ms Lyons : Thank you. I intend to briefly go through the key points of our submission, which relate mainly but not solely to item d. of the inquiry's terms of reference—that is, 'Harmonising data formats and creating nationally consistent arrangements for data storage and access'—and then I will end with some comments on other inquiry submissions that have already been put forward that relate directly to our main interest.

The National Archives is the lead Australian government agency responsible for the management of government information and data, and is leading the information governance agenda across all Commonwealth government agencies by providing standards, policies, guidance and training for sustainable digital information management and all information management. We are the central agency looking after digital transition, to ensure agencies transition to digital information management by the beginning of next year.

It is governance that is central to our submission to you, and I will go through some of that now. Government data is a key asset and a valuable national resource. It underpins the effectiveness of smart ICT and is an enabler for e-government. The inquiry recognises that the long-term value of smart ICT resides in the data that is created. Data management should be considered through all stages of infrastructure planning, development and maintenance. Data is transformational, but only if it is appropriately managed and governed. Simply spending money on IT or ICT will not advance the infrastructure itself. Thought needs to be given to data consumption and management—how it is used, how it will be used and what value it has—and within the Commonwealth the Archives has part responsibility for ascertaining the answer to some of these questions.

ICT is an enabling technology. It must be combined or collaborated with other disciplines, such as information and data management. Future applications will depend on this collaborative approach, simply because ICT is an enabler and not a doer. The take-up of smart ICT in the design and planning of infrastructure provides the opportunity for a couple of things. It is vitally important to enable unprecedented gains in business productivity and it optimises reliability and the longevity of national infrastructure. However, this can only be achieved if information governance is embedded in the design and planning process.

Information governance is a strategic, multidisciplinary approach to managing information at an organisational level, to ensure regulatory, business and accountability requirements are met. It also addresses how an organisation's information and data are managed to support business outcomes, and it encompasses the governance of technology, security, risk and business continuity. It ensures the longevity, quality and integrity of the data. Governance covers who is responsible for it, the assurance of its data sources gathering, its use, its stewardship, its review and its maintenance.

Digital continuity, which we are working on at the moment, is an approach to keeping and managing digital information to ensure that it can be used in the way that it is required and when it is required for as long as it is required to support business continuity and longer-term value. Information governance is the key element to digital continuity. The Archives has developed digital continuity principles and a digital continuity plan to assist government agencies to ensure that their information and data remain accessible and usable for as long as it is needed.

While smart ICT is the essential enabler for innovation, the enduring value exists in the data generated through the technology, and that is what needs to be managed. Harmonising data formats and creating nationally consistent arrangements for data storage and access will contribute to interoperability of systems and data. It is that interoperability of data and systems, based on common standards, that allows data discovery, sharing, analysis, reuse and use in a whole heap of different ways—in ways that we probably do not even know now. It also enables data to be stored, controlled, managed, understood—and that is the big one—and preserved over time. We also need to ensure that the systems we develop and use have the capability to interact with citizens' and users' preferred devices, such as tablets and smartphones, and to ensure that the data is secure and authentic and capable of being delivered in appropriate formats for the user and is able to ingest data from user-preferred devices—so how we ingest that.

Government can promote preservation, interoperability and optimisation of data related to infrastructure by supporting the further development and adoption of format and metadata standards. The Archives has developed metadata standards for records used in the Australian government, and a number of submissions to the inquiry also note the need for common standards, based on open data, to support interoperability. A couple of those were the Department of Communications; the Hon Richard Wynne MP, Minister for Planning in Victoria; the national ICT Authority and Bentley Systems also recommended that, and we support those recommendations as well.

In closing, the Archives would be happy to contribute in any advisory capacity to any initiatives related to national standards for data storage and access that the committee might consider.

CHAIR: You refer to metadata standards that you have established; is it possible to supply those?

Ms Lyons : Yes, we can do that. We are also working on what we are calling a minimum metadata standard which is for business systems for agencies to make sure they have got them in their business systems. We are finalising those now. They are based on a broader international standard as well, but we would be quite happy to provide those to you, and links to the committee.

CHAIR: Thank you. Or links—we do not want to chop down trees.

Mr THISTLETHWAITE: Are those metadata standards common across government agencies?

Ms Lyons : Yes. Linda and Chris are probably more expert than I am in those standards. Linda works on the international focus of the standard work that we do.

Ms MacFarlane : We might mention AGLS, though. AGLS is mandated formally through the Department of Finance for use by all Australian government agencies. That is to enable online discovery of online resources, so that is used by everyone. That was developed a good 10-plus years ago. The other main one we have—I will just give you the acronym, AGRkMS—is for records management. That is our focus—the Rk stands for record keeping in there. For record keeping, that is our focus. We do not do spatial data standards, for example; that is not our area. We are about records information. Those are the two main metadata standards we use, and we can provide you with links to those afterwards.

Mr THISTLETHWAITE: What does government need to do to improve interoperability?

Ms Lyons : The area we are focusing on is the minimum metadata. Hopefully very soon we will be launching a continuity policy to be able to enable agencies to be interoperable by 2020—so they are common standards, and the minimum metadata standards which we are working on are those. I probably cannot name them off by heart already—we have already identified some of them, but I think it is name and date; that minimum metadata so that that data can be transferred and migrated from one system to another generationally. At the moment with some agencies, even from one generation of software to another is difficult to migrate. So if they have got these open international standards that would help that occur.

Mr THISTLETHWAITE: Are there any security issues with that?

Ms Lyons : I probably cannot comment on the security side of government data. In relation to interoperability, if they are keeping with those standards, there would have to be security requirements put into those systems to transfer the information. But that would be from system to system, so I could not really comment on the actual security requirement.

Ms MARINO: Is any of the information that you currently hold, or are likely to hold, commercially sensitive or in any way confidential information?

Ms Lyons : Yes. The National Archives hold what we call records of national archival significance. The archives also leads and promotes good information management of all government data and information, so it is not just that information that we keep. But yes, we do—we have commercial sensitive, security sensitive, personal sensitive—all of that material, if it is regarded as records of national archival significance.

Ms MARINO: It is just that, when we are looking through this inquiry, some of the information that may come out of the data that we hear about could well fit into that category. I could see that there could be state actors or others who would find a proportion of that data and information useful for their own purposes. So the security issue to me is something that is really significant in this space particularly as the volume of data increases and the opportunities for those who would seek to access it increase. I would be very keen to hear more about the security side of how you manage the data and how you would manage huge volumes and a range of providers in that security space.

Ms Lyons : From the National Archives' perspective, we provide access to records that are open and in the open domain. After a certain number of years we provide that information to the public. There is a section within the Archives Act 1983 that deems that some information cannot be released to the public, and so we have to examine that. We have individuals examining that paper. That relates to commercial or financial sensitivity, national security and personal sensitivity. So we actually do that already. We do not put anything out into the public domain unless it has been through that process. It is open from that. We declassify—

Ms MARINO: But internally the protection of that which you cannot release until the time it is released. I do a lot with the cybersecurity and national and international security side of this space. Do you have processes that ensure that the integrity of that data is maintained and will be ongoing?

Ms Lyons : Yes, we do. We have a digital archive as well at the moment. It is in a very small form. This is for the digital data. It is in a highly classified, secure environment. All of the staff dealing with it have the right security clearances. So we do handle and have for the last many years handled classified material within paper form and now in digital form. We do not have a lot in digital at the moment, but we do have a few royal commissions that are in digital forms. We have air gaps so that we cannot have release of that information over an electronic link. We have a separate place for our secure information within our digital archive.

In relation to the future, though, I do think there is that challenge of information. We have for example—and this is something we have not solved—the records of all of the World War I servicepeople. They are all digitised and up online. At the moment, those digitised copies are just flat copies; you cannot really explore them that well. But, yes, the bringing together of a whole heap of data for individuals is an issue that we as a country and probably others that are much more experienced than the archives would be looking at. But we do not at the moment provide that open slather availability of individuals' data that could be harvested that way. To give you an example: the war service dossier. They are 100 years old. Nearly all of the people are not with us anymore, and it is key data like their height, their eye colour and their address when they actually were enlisted. That sort of information, if more contemporary, would not be released for quite some time. So we are in the process of looking at the contemporary nature of data and whether it would be reasonable to disclose that information in a public way. So we are looking at how we do that and reducing the availability as well. If it is 100 years after the person existed then maybe it is okay to release it. It is a continual looking at and review of what we can release in that sort of information. Anything very personal we would not release at all.

Ms MARINO: But there are some very experienced hackers for whom there is not much they cannot get if they are really serious.

Ms Lyons : We have one database behind a firewall and one in front of a firewall. The one in front of the firewall is purely open, public data. Behind the firewall we have a lot of what we call 'closed material'—material that we have not allowed to be made public or that is not old enough to be in the public domain. Then we have the digital archive, which is a separate entity separated by an air gap at the moment. But it is something we need to explore into the future as we release more information.

Ms MARINO: And, if you were to receive a whole swathe of data in different ways, that would be the case?

Ms Lyons : Yes. For example, a lot of administrative data that relates to people's information we would not keep at the National Archives. It would not be considered records of national archival significance. We do not receive any, say, payments and things like that for certain administrative purposes. Chris has just reminded me that there is the ability to de-identify data as well, so we are looking at some of that release of data and how you de-identify personal information from data, but we do not actively do that at the moment, because we do not have any need to do that. For example—it is probably not a good example—there is the ABS material that we have. You talk about it, Chris.

Ms Johnston : I think Anne was starting to talk about the Australian Bureau of Statistics. As you all know, on census night you are able to opt into your personal information being kept, and that is being held by the National Archives for 99 years. We have mechanisms for keeping that secure—Anne mentioned we have an air gap—so that the data cannot be released. But there is also a lot of what we would call 'transactional data' like Medicare, who pay benefits every time people go to the doctor. That data is used to process payments to individuals, but the National Archives' interest is more at an aggregate level than the individual level. So, although the data is tagged with the person, place, time and subject for agency use, by the time it gets transferred to the National Archives, the person element has been stripped out, leaving the geospatial and other aspects that make the data useful so that we can go onto the Medicare system and look at, from a geospatial, population or demographic point of view, what kind of services were delivered. The names of individuals who went to the doctor with an infected finger are of no significance whatsoever over time, and the National Archives does not have the data delivered to us with personal identification information. It is only where the person is the actual focal topic of historical interest, as with the Defence material or some of the immigration and citizenship case files, that we would even have the name of the person.

Ms Lyons : And part of that too is that we would have that information because part of our role is to ensure the rights and entitlements of citizens of Australia and ensure that accountability and transparency. That data or that information can be accessed in the future to see what governments did, what people did and what rights and entitlements were allowed. That is the other side of what we do.

CHAIR: Ms Marino raised the issue of security. The information behind the firewall is not in the cloud, is it?

CHAIR: It is just a stand-alone computer.

Ms Lyons : It is not in the cloud, but delivery in the cloud is a possibility in the future. We are exploring all of those things, but we are not there. We are still here.

Mr PITT: I really did not expect to hear from the National Archives, I have to say. Given the questions from Ms Marino, if we do manage to somehow get everyone to agree to a particular data format, they all change from their software systems to something else and all those other challenges. I recall my very first computer. It was a 386SX, running DOS on a 5¼ inch disk, with every software program on it and the data storage was on those disks. How do we manage that into the future? Do you have things in place right now? What sort of hardware do you operate? How do you manage things that were recorded in the late seventies or early eighties or on tape?

Ms Lyons : It is a big question and the Archives does do that. It is probably a very broad question, but over time we transfer to digital if it is film or that sort of material, and some of it we keep in its existing form. Technical obsolescence—and that is what we are dealing with at the moment—will always be, but the information that is in there is not obsolete, so we have to work out how we can transfer that information into another form or format. We are working on that at the moment. We already do that in some instances. Some film that is going to die because it is going to explode or catch on fire we transfer to digital. Also, we are looking at paper. With any paper that has come off a computer and a printer, the value is not in the piece of paper; it is the information. So we are looking at how we can digitise paper—more contemporary paper, not handwritten paper—and migrate it into a digital form. How we do that into the future is probably highly technical. I am not a highly technical person, but we have developed our own open-source digital preservation platform and we will update the material and migrate it to the updated open source digital archive format over time. You can keep digital data on tapes as well, and that, again, will be migrated over time.

At the National Archives, our responsibility is to keep that data forever. The records of national archival significance that we keep are probably just under 10 per cent of government information. It is our responsibility to ensure the longevity of that into the future, forever. So we do have a number of systems where we are working on how we can migrate and continue to migrate that over time. Some places around the world stockpile old pieces of hardware, and they stockpile software. Some of the software providers—I think Microsoft might be one; there are a few—are making available their old versions of the software that they have had. There are a few that are not around anymore, such as WordPerfect and a few like that. Then it is a question of how to translate, so we might get translator software that can go in, get that data out and translate that. It will not be the original source because it was on WordPerfect, but we are working out how we can make it functionally, as we say, the same type of thing. If it was on a piece of paper and it is in a digital form, then it is functionally the same thing. It has the information and the approval.

The other side of that is the digital business processes, and this probably impacts on the terms of reference of this committee. It is not just a lot of the analog things that we now do digitally; it is the way we did it in paper. So we are looking at how we can streamline digital processes as well so that all the processes are digital, not based on a paper paradigm, where you print out a piece of paper and get someone to sign it and a whole heap of other things like that. We are looking at all of that as well.

Mr PITT: Regarding the internal policy, how often would it change? Clearly, there is exponential change in data storage, equipment, infrastructure and everything else. Do you put a five-year block around it where, every five years, you update to whatever is current? How does that work?

Ms Lyons : No. We do update regularly. I know that the software that we are using for our digital preservation platform now will have to be changed in 2017 because it has come to its end of life, so we are now working on the next version of that. We do not have a set system of doing it every five years. We do monitor what is happening around the world. Our director-general is currently the President of the International Council of Archives, so we are quite connected internationally. The Archives here is probably one of the three or four main archives in the world that are moving forward in this way. We were the first to have a digital archive of the nature that we have. But it is very small; it is not big enough to take all of the geospatial data or anything like that at the moment. We are now looking at how we can have what we call distributed custody for that specialised material. For example, with the really highly specialised geospatial data, how can that be kept forever? Most of the agencies who deal with geospatial digital data will keep it forever because the data has value.

Mr PITT: Are you suggesting that the Archives should have some sort of overarching responsibility of how it is set, and then the departments would store that particularly? Or should it all be in one place?

Ms Lyons : We are looking at that at the moment. We would want to hold everything. But there might be some occasion where, with specialised material, it would be better with the agency. And such instances are very few and far between. We do that on a case-by-case basis. But it is all under the Archives Act. Technically, it is a negotiation we have with the agency. Under the act, we have sections 29 and 64 where we can give another agency permission to retain that information. The issue with that is: under the Archives Act, once data are 20 years old the public has a right to access that data. The agency has to ensure how they actually provide access to that. We are now in negotiations with a number of agencies. But it is more a negotiation at the moment.

Mr PITT: That is one of the issues: you would need access to this data for just about everybody—for a lot of it. Every designer, every engineer, every local government would want access to what we have been speaking about. How would that be managed?

Ms Lyons : We would require them to be able to provide that under the access provisions of the Archives Act. It is spelt out quite thoroughly in the act.

CHAIR: And, in fact, happens now—

Ms Lyons : Yes.

CHAIR: with everyone's access to historical records that are available to the public. The one thing about the National Archives is: that process is in place.

Ms Lyons : It is in place. I will give you an example of security agencies. Some material is held by the agency, but we have the list of what they have. If someone is looking for that material and wants access to it, they come to us and then we get that information from the agency. We examine it and make sure that it can be released, and all of that sort of thing. Under the Archives Act, it is when information and data are 20 years old. Whether data can be made available prior to the Archives Act comes under FOI and other public release of data. But, as I said, it was 30 years old. We are going down to 20. By 2020, information that is 20 years old is available for access to the public.

CHAIR: That is a bit scary!

Ms Lyons : Yes, it is! It is a bit scary.

CHAIR: Mr Van Manen.

Mr VAN MANEN: I think Mr Pitt stole my thunder. With geospatial data, it is very information rich. Obviously, it requires a lot of storage capacity. To follow on from Mr Pitt, what is the National Archives' capacity to store that level and detail of data for, as Mr Pitt was saying, access to engineers, designers and government departments, particularly in relation to government-built and government-owned infrastructure? I suppose that is the purpose of this inquiry. But, more particularly, how do we do that even for the private sector so that you have an inventory of this geospatial and detailed data of our built environment? If engineers or designers want to modify a building, build a new road or change a road, or whatever, they will have easy access to that data and in whatever the appropriate format is.

Ms Lyons : From the Archives' perspective, at the moment, for all of the geospatial data that are out there, we do not have the capacity from a digital perspective to be able to provide storage and access to that material.

CHAIR: Having said that, no-one else has either.

Ms Lyons : No. That is the issue. We do not have that capacity at the moment. We have a large transfer from Geoscience of aerial photographs that are reasonably well used, but they are in a format that is not at all accessible. They are aerial photographs of the whole of Australia—I forget the time frame—but the cost of transferring them into an easily accessible format is prohibitive. I suppose that is part of the issue. It is a digital format, but it is not a contemporary one. I forget how old it is; it is probably from when digital first started. There are those sorts of technology issues that we still have to face. The future is bright, but I think the cost of transferring from one medium or format to another is quite large for some of that specialised material.

CHAIR: Is it easier when the material is collected digitally?

Mr VAN MANEN: That is a good case in point, because aerial photography will be increasingly used. People are buying drones and taking digital photos of assets.

Ms Lyons : There is the past and the future. I suppose for us, if there were some set data standards in the future, from a management perspective it would be easier to transfer it into a different format. Geospatial data is a lot more complex, and so there is a lot more involved. There is the legacy of the past versus the opportunity of the future to enable that to occur.

CHAIR: Ladies from the National Archives, thank you for attending the public hearing today. It would be helpful if you could forward any additional material, like metadata standards, or any other information that you think would assist us.

Resolved that these proceedings be published.

Committee adjourned at 08:42