Independent analysis from KPMG consultant and ex-IT security chief for a leading global bank, Richard Hackworth

Friday, 30 May 2008

Errors not acceptable if government databases take off

A few years ago I made a flight from London City airport to Geneva. We were all sitting on the plane waiting to leave when the captain came on the PA to say: “Sorry we are a little delayed – temporary technical problem with the navigation computer but it will be sorted out shortly.”

We waited. Ten minutes later he came back on: “We have tried everything but couldn’t resolve the problem, so we have done what you do at home with your PC – we have turned it off and on again. We are on our way now. Thank you for your patience.”

Now, one would hope that navigation systems are considered safety critical but clearly the one on my plane was based on ordinary technology. And yet we trusted it to get us to Switzerland across Europe’s crowded airspace. But of course, as passengers we didn’t have a lot of choice in the matter.

I feel rather the same way about various recent proposals for security databases. For example, there have been announcements about the Communications Data Bill. It is reported that this legislation would enable the Home Office to collect details of all phone calls, emails and internet access by, one assumes, each and every one of us. It is suggested that the data would be held for a year and would be available to the police and security agencies subject to a court order.

Now, I have not seen the Bill and I am certain that any such legislation would be debated vigorously before it hits the statute book. So there is no need to get too hot under the collar about the details at this stage. But is this idea realistic?

This proposal is not alone, of course. There have been recent database proposals to support the national identity card scheme and NHS patient records. I am not going to debate the merits of such databases for actually fighting crime or improving the quality of patient care, vital though those debates are. My interest is with the management of the technology.

Firstly, how do we manage data quality with such massive databases? The Mobile Data Association reports that in the UK we sent 57 billion text messages in 2007, including over 290 million on New Year’s Eve alone. There were more than 17 million mobile internet users at year end. This is just the mobile stuff. Old-fashioned fixed-line users like me are on top of all this.

The national ID card system is going to be pretty beefy too. However one cuts the numbers, if the ID card scheme goes ahead there would eventually be tens of millions of cards issued and recorded with a range of personal data. Practically all of us would have NHS records of some kind – another few tens of millions of complex and interrelated records.

All of this data would only be of real practical value if: a) the data is accurate and complete, as measured by very well understood criteria; and b) any inaccuracies and omissions can be detected efficiently, and with minimal and very well understood consequences for the individuals concerned.

If 50 per cent of the records in a database are wrong, this becomes pretty obvious very quickly. The warehouse will soon tell you if half of all inventory records are out by a factor of 10. If only one or two records are out by a factor of 10, the warehouse probably won’t notice until it is too late. I have known this happen on a real system. Reorder levels on one finished product line were incorrectly increased by a factor of 10. The error propagated through manufacturing planning systems and nobody noticed until a supplier sent an articulated lorry load of stuff to the factory gate instead of the usual transit van.

In a database of 40 million health records an error rate of one in 10,000 might mean 4,000 incorrect patient records, but we probably won’t know which patients. Hopefully no patients would suffer before the error became apparent.

We don’t yet have good ways to measure and manage the quality of data in such large databases. What tools and techniques we have don’t scale to the size of these very large systems. Practical tools are likely to be sample based, using appropriate statistical methods. However, if my liberty or medical treatment were to be based on a statistical assumption about the accuracy of a computer record, arguments about Type 1 and Type 2 sampling errors are not going to cut much ice with me if something goes wrong.
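To make the sampling point concrete, here is a rough sketch in Python of what a sample-based quality check might look like. It assumes we could draw a truly random sample of records and audit each one by hand against a trusted source, which is itself a big assumption. The sample size and error count below are invented for illustration, and the function names are my own. The 40 million total is the figure used above, and the Wilson score interval is just one common way to put bounds on a proportion.

```python
import math


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_bad_records(errors_in_sample, sample_size, total_records):
    """Scale a sampled error rate up to the whole database, with bounds."""
    low, high = wilson_interval(errors_in_sample, sample_size)
    return {
        "point": errors_in_sample * total_records / sample_size,
        "low": low * total_records,
        "high": high * total_records,
    }


# Hypothetical audit: 100,000 records checked by hand, 10 found to be wrong.
TOTAL_RECORDS = 40_000_000
result = estimate_bad_records(
    errors_in_sample=10, sample_size=100_000, total_records=TOTAL_RECORDS
)

print(f"Point estimate of bad records: {result['point']:,.0f}")
print(f"Approximate 95% range: {result['low']:,.0f} to {result['high']:,.0f}")
```

Even with a sample of 100,000 records, the range around the estimate is wide, and the calculation says nothing about which individual records are the wrong ones. That is exactly the limitation I am worried about.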

We need effective management tools to understand and control what we are doing before we entrust the day-to-day direction and oversight of our society to massive-scale IT. We need research and development on ways to address this challenge. Otherwise, our politicians and public servants will suffer the consequences as well as our citizens.

I remember being rather relieved on my flight to Geneva when I saw Lac Leman out of the plane window. We had found Geneva and we were not flying blind on that occasion at least. We should have a clear flight plan before we commit to these systems too.

© 1995-2006 All rights reserved