
Podcast: data for the public good, with Tom Smith

Data is the lifeblood of the public sector. But how do you ensure you're using data responsibly for the public good?  

22 minutes to read

In this episode of our Data Today Podcast Dan is joined by Tom Smith, whose full job title is: Director, Spatial Data Unit, Chief Data Officer, Department of Levelling Up, Housing and Communities.

Dan and Tom discuss the art of using data to cut through external noise and political pressure to improve the lives of millions.

Podcast transcript

Dan Klein:

Hello and welcome to Data Today, brought to you by Zühlke. I'm your host Dan Klein, and I look after everything data and AI at Zühlke. We're living in a world of opportunities, but to fully realize them, we have to reshape the way we innovate. We need to stop siloing data, ring-fencing knowledge, and looking at traditional value chains, and that's what this podcast is about. We're taking a look at data outside the box to see how amazing individuals from disparate fields and industries are transforming the way they work with data, the challenges they are overcoming and what we can all learn from them.

Data is the lifeblood of the public sector. Today we're able to understand so much more about the population than we ever did before. This presents us with a unique opportunity at this time in history to help better lives across the country by really understanding the localized needs of our communities. But how do you make sure you're responsibly using data for the public good? That's been a career-long battle for today's guest, Tom Smith, whose full job title is Director, Spatial Data Unit, Chief Data Officer for the Department of Leveling Up, Housing and Communities. So you've just moved to the Department of Leveling Up. What caused the move, Tom? I mean, COVID didn't cause it clearly.

 

Tom Smith: 

Ah, that's a great starting question. So Leveling Up, the department, is looking at the UK-wide program to really sort of tackle what I think is quite a simply stated problem, which is that talent is spread equally across the country, but opportunity isn't. And so the analysis and the program and the work that you can do under Leveling Up is a real game changer around inequalities across the country. So that was the reason for the move.

 

Dan Klein: 

So what you're basically saying is from an educational attainment perspective, that's fairly sort of uniform across the UK, is that kind of what you're saying? And then the employment opportunities disappear? 

 

Tom Smith: 

Even further back than that. If you look at GCSEs or you go back further and you look at kind of kids in primary school, the levels of attainment are not equal across the country. But you've got to sort of ask questions around why that is. Is that opportunity? Is it support, and a bit income levels, that kind of support for education, et cetera? We're going to go into a big detailed question around that, which is probably a detailed one to get into to start with.

 

Dan Klein: 

Well, I was going to say, are you not then somewhat beholden on the other departments giving you data? I mean, education is going to be the department surely that holds the data to 18, no? 

 

Tom Smith: 

Absolutely. So Leveling Up is the department, and it's the program led by the department, but it's fundamentally government wide. So HMRC hold tax data and more finance data, DWP hold benefits data, understanding kind of which groups of people are receiving particular types of benefits for whatever. DfE obviously have the education data and a few people like Ofsted and so on will have information on schools. So it is a cross-government effort. And the Leveling Up program has 12 missions, which I won't go through, but each of them has some headlines and is owned by a particular department. So the effort and the energy in the work is very much spread across government.

 

Dan Klein: 

So what are the hurdles for what you're trying to do at the moment? I can imagine lots of hurdles myself. 

 

Tom Smith: 

The top hurdle, this is no surprise, this is hard. So inequality or differences between areas, regional differences in, say, productivity, income levels, those sorts of things. That's the result of many years of changes, systematic and systemic changes to, say, industry patterns, to people's movements. Quite large-scale things playing out over literally decades. So the first and probably biggest problem is this is really hard. So shifting the needle, changing some of the patterns, equalizing opportunity for example, those sorts of things really take a long time to embed and to turn that around. So that's the first problem. The second problem is if you're talking about long-term challenges, long-term change, what's the first, what's next? Classic agile question, what's top of the backlog? So thinking about some of the programs and changes that you can work on with local areas that will make some kind of meaningful difference in a matter of a year or two years or five years rather than this perhaps decades, best in the world.

Political attention and political support has to look for real drive and push and rapid time scales and that's something that we will want to be doing clearly. So then the third challenge, and this is probably where I've come in and the data work, is thinking about how do we support with better understanding of what issues we can tackle, what issues can we shine a light on and what do they say, how do they lead into different policy, different support? And a brilliant example of that would be something like transport, the ability to get to work, to get to jobs, get to your education, going to your family. It's critical to employment and economic indicators. It's critical to wellbeing and going and seeing my friends and all of that kind of stuff. The ability for local areas, for example, to support and build up and run transport programs. So for example, like Transport for London or for Greater Manchester. That's the sort of thing that's part of the devolution discussions.

 

Dan Klein: 

The range of people you've just described there are quite, I was going to say, there's lots of personalities in that. I suspect there's a fair amount of political pressure from all sides. How do you carve a straight neutral line through that? Because data can be very misrepresented shall we say, by all flavors. 

 

Tom Smith: 

Yeah, really good question. So I think your north star, your line through it is around trust and credibility and data teams, analysis teams, insight teams in every industry and sector will know this. Basically we're only as good and we're only as impactful as the trust that people have in our outputs in our work. And so building that trust, that's your first. So the statistics or the analysis that we produce is governed and there's a framework for example around quality and reliability and those sorts of things that you'd expect. But there's also then this real understanding of user value. So understanding what people are going to do with this, what decisions they are looking to make and how your work can inform those decisions. That's a really critical bit. So it's not just the technical quality and the reliability, it is also about the connection, the closeness with the user fit. If I'm not talking and helping make the decisions that the top of the shop is really looking at right now, then what's the value? 

 

Dan Klein: 

I'm sort of remembering back historically now to some stuff in your department around the energy performance certificate. And I'm curious because one of the things that was observed at the time is the energy performance certificate had the attribute in it about how high the apartment was in a high rise. And I know that your department used that as a way of looking for Grenfell Tower equivalents. As a department you collect data, how do you square the circle where you've collected it for specific purposes, but actually there may very well, as in the Grenfell Tower example, be a real secondary value to society in opening up the height of a building, which was the Grenfell Tower example?

 

Tom Smith: 

So Grenfell was obviously an incredibly important event in terms of the department saying, "What more do we know that can shine light on this sort of challenge?" And obviously there's a big program going around building safety and so on. And I won't go into that, but I'll talk about the kind of data bits, which is sort of where your question's coming from. As a great example, I think in Leveling Up department at the moment, the EPC certificates, so the energy performance certificates you mentioned, are really kind of bringing quite a lot of data on new builds, housing properties that have changed hands and so on. They've got text descriptions, there are various fields as you say, there's some data that can be used to assess height and so on. It's not comprehensive, it doesn't cover every property. Properties that haven't been sold on yet won't have a requirement to have a certificate yet, but it builds up this store of information that you can use.

For example, looking at net-zero insulation programs. You can link it at property level to data held by Ordnance Survey and the Valuation Office Agency, the VOA, and you start building up this detailed building level footprint data, which you can then start to assess and say, "Well what are the bits we don't know? We don't know about height yet, but there are lots of other sources that we do have for that." So one of the things that we have on height is lidar data. So when I chaired the Environment Agency Data Advisory Board, we looked there with the Environment Agency at all the data that they were collecting for understanding flood risk and various other things and said, "Can we make that freely available? Publish it purely as open data, no cost, free to the user." And we've kind of worked through with them to do that. There were implications, they lost license revenue and so on.

But the secondary use of that data was astonishing and astonishingly quick. So within days of the data being published, we had new Roman roads, Bronze Age settlements being picked up by the data, this lidar, very detailed height data from across the UK. You had people building 3D properties, producing 3D models of urban areas like we were just talking about. You had detailed flood models now using consistent data so you could kind of talk across different organizations, but there are loads of value adds, we keep pushing that. We keep wanting to push that. One of the big programs that we set up at the Joint Biosecurity Centre, which we started as part of the COVID pandemic response, was looking at what inputs or sources could we use to understand and pick up infection levels at local area level, potentially ahead of those infections showing up at hospital or coming up in test results? And sewage and wastewater was one of the points.

So it passes through us, goes out through the toilet, you can pick it up in sewage pipes and so on. We've then built that program into a UK wide wastewater program to test waste sewage outlets for presence of COVID. Covering a really large proportion of the UK population, you can pick up that before it gets into and shows up in your statistics on testing and before it shows up at hospital. 

 

Dan Klein: 

Is this the same one that's being used for polio as well? I see that- 

 

Tom Smith: 

Exactly. So this is an instrument. It's an instrument you can use, it's a source you can use for all sorts of things and it's what's been used to pick up the polio cases in London before they'd shown up at GPs and other points.

 

Dan Klein: 

When you got the phone call to do JBC, to go across and help out with the COVID pandemic, how did you set about doing that? Because bear in mind, the UK landed up with a situation where we could test policy on the fly and that was kind of a bit of a revolution for the British government with COVID. How did you get the data to a state where you had effectively a real-time feed on what was going on?

 

Tom Smith: 

With lots of caffeine is my really clear answer. It's a classic example of having to build and run. So building the capability you know you need. So the platforms, the infrastructure, the tools, the agreements and relationships, the data providers, all of that kind of stuff. But alongside that, every single day you're working through the bronze, silver, gold crisis response model with ministers, with health officials, with Public Health England, and other officials from across the UK. And so the data has to support that, and on day nought, day one, it's going to be brought together by hand. And on day two, some of it'll be automated, but it's still brought together and by day X, the first set of information is there and it's 6:00 AM and it's ready.

I think the dashboard that the COVID team put together as part of the Public Health England work, that's a similarly great example of doing stuff where you have to put in long-term plans, but with a short-term "we need to make this happen today and tomorrow" as well as longer term. So if there's one thing I want us to do as the data program alongside the Leveling Up support, it's increase the data layer around the UK. And one example of that is where does government spending land. So where do departments spend their cash? Where do we invest? How does that land at local level?

 

Dan Klein: 

I was going to say, so this is Leveling Up basically funding local councils, local authorities to provide sort of almost a federated environment of data and you're asking each of the local areas to collect the relevant data for them and then make it available to yourselves. So I suspect funding needs to go in that direction. Does it, in order to make that happen?

 

Tom Smith: 

In some cases, yes. I mean there's always kind of looking to minimize or eliminate the burden on local areas or people who are delivering programs and so on. So there's lots of work to think about that, but there's a kind of general question around how do we make sure or increase the level of data and analysis available to programs and delivery partners at national level, at local level, in public sector, in charities and third sector, in industry and so on, all of whom use this data. So if you have information, for example, on where government spend is landing from across central government, that's a lot of hard work on finance databases to do that and on government investment programs and so on. But getting all of that together then gives you this layer and you can say, "Okay, in this local authority and in these neighborhoods, this is what government is doing and where we're investing." And that leads really to a much more informed debate, a much more kind of mature debate about where should government be putting its money, what should we prioritize?

 

Dan Klein: 

The task of Leveling Up is a huge one that seeks to tackle some of the major historic issues that we face in the UK. COVID-19 really did change the game in terms of collecting and using data to change and shape public policy. It sounds as though Tom's department is trying to pull off something similar when they examine how to help local areas of the UK to grow their opportunity and infrastructure. When I worked on the team developing the COVID app, it really did feel as if we were at the forefront of something that had never been done before. We had to tread really carefully and think how we use data ethically. 

You seem to have come a long way since you were doing robot football. I mean, what happened? Robot football and now you're at Leveling Up.

 

Tom Smith: 

Dan, there's a really clear logical link. There's a golden thread here. 

 

Dan Klein: 

Is there? 

 

Tom Smith: 

There's a very nice thread actually it's just that- 

 

Dan Klein: 

What Brighton and Hove lost. Is that the thread?

 

Tom Smith: 

So I started off doing theoretical physics and I got from that really interested in the sort of technical bit. So that was the PhD and the work at Sussex on evolving control systems. So that got me into robot football and things like what's now called deep learning and various other approaches to designing or evolving neural networks. The Premier League invests cash in grassroots football, and how it uses or decides, certainly a couple of years ago, how it decides where to invest and where to prioritize, is based on how deprived areas are. And that uses a measure called the Indices of Multiple Deprivation, which was developed or commissioned by this department and delivered by the organization that I set up and launched out of Oxford University some years ago, with colleagues there. So there's a nice link through from football to deprivation to Leveling Up, but it's not really that, that kind of brought me here.

So there is a link, but it's basically around using data streams and feeds from different sources to make decisions. My early jobs were working in university departments and really hacking data around, stripping data out of systems run by government. So looking at things like housing benefits and other benefit systems to say what do poverty, income look like at local area level? We don't have direct data, but what models can we produce? And then you can kind of bring those together if you like, and start saying, well how can we improve those models? What can we learn from technical work in data science and AI? So those are the sort of two streams and they collided when ONS set up the Data Science Campus and I applied for it. I saw this job, I sort of pointed at it, it was the first job that I had applied for this century after sort of 10, 15 years of setting up my own businesses and working in academia, and jumping from industry to public sector was quite a shift, quite a jump.

But ONS were trying to do something really interesting, looking to get themselves to the forefront of what you can do and how you use different data sources to understand what's happening in the economy, society, local communities and so on. So we were interested in what can satellite imagery or financial transactions or credit cards or mobile phone data movement patterns, what can those things tell us about what's happening in the economy, maybe in real time, maybe in a faster indicator sense. So that was a really fascinating time, really great experience. And I got bitten by the public sector bug, so I've stayed on and moved across to Leveling Up.

 

Dan Klein: 

Tom may be joking about the golden thread running from robot football to Leveling Up, but I think he might be onto something. Football has often been a lifeline to deprived areas and the Premier League in the past has used data to target the funding to those that will benefit the most from it. I can see why Tom wanted to stay in the public sector. He's an incredibly intelligent man who wants to be challenged by real world problems. In the case of Leveling Up, I wonder how Tom stays focused and on track in the face of so many political interests and views. 

You and I have obviously known each other a fair bit over the years. You focus on solutions more than strategies in a lot of what you do. I can imagine it could be quite difficult to get dragged off into some very highfalutin policy slash ministerial discussions when you're there going, "Well, we just need to build some stuff to make it work." How do you stay true to that mission? 

 

Tom Smith: 

The Government Digital Service, GDS, had this mantra, "the strategy is delivery," and I'm a big believer in that. I think you can show impact and value with small experiments really quickly and that's something that public sector has taken a while to learn. These are big organizations, big programs, but you can show something in a minimum viable product in a sort of couple of weeks, a couple of days, couple of hours.

 

Dan Klein: 

You're touching on my mantra, which is show by doing. Just give some examples. When you're presenting to these ministers and these stakeholders, how do you steer clear of some of the pitfalls that come with show by doing? You and I both know that you can land up in some, particularly if you just try something out for the first time and then suddenly everybody expects it to work almost perfectly all over the UK, how do you avoid that minefield? 

 

Tom Smith: 

So I've already talked about trust and credibility and how you build that, and that's the kind of given, I think it's part of the work. Senior stakeholders are all different. All of the ones we work with are. Some of them want to see the work, see your workings. They want you to pull back the curtain and show them a little bit of the journey that you've been on. So showing stuff that's in flight, in progress, inviting them along to show and tells, that's a really good way of working with that kind of group. Some of them don't want that. They want you to come with the final, this is it down the line, this is what it means for your business or your program or the things, decisions that you're looking at at the moment. Usual thing, you've got to work with the grain of your senior stakeholders, your ministers. What's the best way to bring this information, this analysis, this presentation in to support what they're trying to do?

 

Dan Klein: 

Okay. There's a good segue, slightly, to go a little bit off-topic, but I have a little bit of a bugbear that statisticians use proxies without really explaining proxies particularly well. And particularly when they're trying to bring two sets of data together. Proxies can be very, very damaging in terms of what you then do with the data, if you misinterpret it. From what you're seeing within Leveling Up, are there areas around how we collect data and where we've structurally set ourselves up to fail? I'm not particularly a great fan of the ethnicity classifier in the UK, because I think it structurally gives us problems in terms of how we talk about things and think about things in bringing data together. But I mean, have you got other things where you're saying, "Well actually there are some things we need to fix structurally?"

 

Tom Smith: 

Yeah, this is a really important discussion, really an important area of discussion. Proxies and their value really depend on how accurate your model of the world is. So what's generating the underlying data and distribution and what does that proxy indicator or data source tell you about the real thing you're interested in? As an example of this, when we started at the Data Science Campus at the Office for National Statistics, of every single data source that we even imagined or thought about or dreamed of, we asked a simple question. We said, "Could that give us additional insight on what's happening in the economy?" and the kind of subtext was, could we produce an indicator, a proxy indicator that tells us something real about some aspect of the economy and publish it on a daily basis? And so we looked at global shipping GPS records and we now publish a weekly update on that for ports around the UK and indeed globally.

We looked at internet bandwidth use by local exchanges and what that told us about working patterns and so on. We looked at satellite imagery and ran cattle censuses counting cows from space in places like Kenya and other areas. But there was never a sense that these replace your model of the economy and your GDP or your inflation prices. They're all proxies for that, but they're proxies that tell you something useful. And what you find is there are some instances or certain circumstances where those proxies become super helpful: movements as a result of leaving the EU, or in the early days of COVID when there were huge disruptions to supply chains, transport, travel, movement of goods... Some of those proxies became much more useful than your overall indicators, your overall measure of the economy. And so Bank of England highlighted the faster indicators as this is our route to understanding at speed what's going on. So that's the first example. That kind of proxy can be really helpful.

The second example, and this one's kind of maybe about your ethnicity point. There are times where we want to understand other systematic differences between areas or between groups. They might not be driven by those areas' or those groups' characteristics, but there's something there that's important. And so looking at COVID death rates, infection rates, hospitalization rates, and linking that to ethnicity data and linking it to occupation data and linking it to other things, showed you a real sense of which groups were being affected much faster or much more than the average, particularly in the early days of COVID. And that showed up that there was an ethnic component and that was potentially driven by the types of occupations or the types of areas, rather than ethnicity as a thing itself.

So there's a proxy, but it's an important aspect to the story that you're wanting to tell. So I think that proxies can be really important. There's certainly more than one story. And so that for me kind of comes back to why it is so important to make as much of the underlying material available. There is more than one story that comes out of any moderately interesting data source. And particularly when you link multiple sources over time on a really complex issue, your story's going to be refined, added to over decades literally. 

 

Dan Klein: 

You're bringing in all these lessons you've learned over your career into what you're doing at Leveling Up. So what are the big highlights of the things you've learned over the years? 

 

Tom Smith: 

Great question. Two things maybe. The first is the importance of giving space for creativity and innovation and as a leader and building a team, giving people that license to roam, that scope to really bring their intelligence, their experience, wisdom, ideas, et cetera, to work. A second example, something around the kind of diversity of thinking in the team. One of the things I've learned from academia is the value of the cross-disciplinary. So bringing together people from different academic disciplines. When we started at the campus, about a third of our team was from industry, about a third from academia, about a third from public sector. And really brought together a bit of a melting pot of ideas and approaches, which really I think was successful. So I think some really important lessons there as we build teams and data teams are no different from any others in that way. 

 

Dan Klein: 

That's a great place to end. 

 

Tom Smith: 

Brilliant. Dan, nice to see you. 

 

Dan Klein: 

Likewise. 

I love Tom's views on the diversity of experience enriching a team. I think that approaching diversity in this way is brilliant and could pave the way to welcoming in more people that we are constantly trying to bring into the public sector. 

We should always be striving for creativity and innovation, but those who want us to roll back the curtain and show our workings, need to understand that with uncharted territory comes messiness and mistakes. However, it can also lead to brilliance and huge strides forward. Leveling Up is going to be a hard task that takes many years, but I'm glad, very glad that we've got Tom working on it. 

Business ecosystems are not new. What is new is that they are becoming increasingly data empowered. To realize complex opportunities, we need innovation beyond boundaries, democratized information and close collaboration between diverse players. Collaborative, data-empowered, borderless innovation is how we embrace a world of exponential change. And that's what this podcast is about.
