The co-founder of the Open Data Institute talks about analytics and how to raise a new generation of data scientists
Sir Nigel Shadbolt has spent the last 30 years at the forefront of some of the most important and historic developments of the Web. He has published more than 400 articles on topics ranging from artificial intelligence to cognitive psychology, and is credited with popularising the emerging field of Web Science.
Last December, together with Sir Tim Berners-Lee, he co-founded the Open Data Institute (ODI) – a UK non-profit that aims to create new business models around huge amounts of data collected by the public sector.
At the first-ever ODI summit last week, TechWeekEurope had a chance to talk to Shadbolt about what it is that data scientists do, the digital skills crisis and the impact of Artificial Intelligence (AI) research on analytics.
According to Shadbolt, like any scientific discipline, good analytics starts with correct terminology. “Big Data is not my favourite term. I prefer ‘broad data’, which recognises the fact that data comes in all sorts of varieties and shapes. Somebody’s big data is someone else’s medium or even small data. From the Square Kilometre Array to all of the crime data in the UK – they’re both big but on very different scales.
“Big, small, open, closed, personal, anonymised – data has a landscape that’s very varied. There are different species of data, and they interact with one another in different ways. Understanding that ecosystem of interacting data types and species is very important.”
For example, within Open Data, there are different levels of information quality. Last year, the ODI launched a five-star data certification system. An organisation gets one star as soon as its data is made available, even if it’s recorded in a barely recognisable format. It gets two stars if the data is in a machine-readable format, three stars if it uses an open standard, four stars if it can be integrated into RDF (Resource Description Framework), and a fifth star when it is linked to other data.
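As a rough illustration, the star levels described above could be modelled as a cumulative checklist. The sketch below is hypothetical: the `Dataset` fields and the `star_rating` function are names invented for this example, not part of any ODI tool, and it assumes that each star is only awarded once all the lower ones have been earned.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative fields only)."""
    available: bool = False         # data has been published in some form
    machine_readable: bool = False  # data is in a machine-readable format
    open_standard: bool = False     # format is an open standard
    rdf_compatible: bool = False    # data can be integrated into RDF
    linked: bool = False            # data is linked to other data


def star_rating(ds: Dataset) -> int:
    """Count stars, stopping at the first level that is not met.

    Assumption: the levels are cumulative, so a higher star cannot be
    earned while a lower one is missing.
    """
    criteria = [
        ds.available,
        ds.machine_readable,
        ds.open_standard,
        ds.rdf_compatible,
        ds.linked,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# Example: a published, machine-readable file in a proprietary format.
spreadsheet = Dataset(available=True, machine_readable=True)
print(star_rating(spreadsheet))
```

Under this assumption, a dataset that is linked to other data but has never been published would still score zero, because the first level is not met.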
“The fact that corporations now take analytics seriously has got to help the ODI, and it’s telling that a number of our own start-ups focus on Big Data – for example, Mastodon C is about Hadoop-based large data processing.”
Where are the data scientists?
However, Shadbolt says that the Institute’s efforts to hire data scientists and statisticians have shown the relevant skills are in short supply in the UK. “The demand is there, across the board. Not just the ODI will need to acquire these skills, but the country as a whole.”
According to the UK National Data Capability Strategy, co-written by Shadbolt and published last Thursday, to flourish in a data-driven economy the country needs three things: data infrastructure, data skills and good data assets. Of the three, skills are the most pressing issue. It certainly doesn’t help that various organisations use their own definitions of what a ‘data scientist’ is.
Shadbolt describes this arcane occupation as “a mix of mathematicians and statisticians and computer scientists, with really relevant insights from social and behavioural science.”
Could an organisation like the ODI help shape the emerging field of data science, and introduce some order?
“You can, sometimes, by recognising and naming, identifying something, help push things forward. I think we will see that happening in data science. A facet of what the ODI is doing is working out what the curriculum will look like, what the training will look like. We already do that for Open Data technology. Now, that’s not completely interchangeable with Big Data technology, but some of it is.”
Inspired by computers
As a professor of Artificial Intelligence at the University of Southampton, Shadbolt believes that current research into AI could be of major benefit to analytics. “Machine learning, looking for patterns and signals in datasets, knowledge-based discovery, data mining – a lot of these developments came out of AI labs.
“AI has often come up with really innovative methods, and then the challenge has been to scale the approach. We’ve seen that happen with Bayesian methods. We’ve got probabilistic techniques for doing search on very large data sets that a few years ago would work on relatively small datasets, but couldn’t scale out.”
The ODI has just announced 13 new international ‘nodes’ that support open data projects and communities, and subscribe to the ODI Charter. But even as the Institute spreads its influence across borders, education and evangelism remain its main priorities.
“We hope we can keep on showing the value, generating the evidence, training the people, helping incubate new companies. Our fundamental belief is that you don’t necessarily get high quality Open Data supplied unless you have a strong demand for it. The best way to keep data quality high is to have businesses that depend on it, that require consistency and continuity of supply going forward. If Open Data producers become indispensable to the economic and social wellbeing of the country, that’s the best way to move forward.”
“Some people will say if it’s free, it can’t be that good. And again, the experience of the Web and the Internet teaches us that free doesn’t mean worthless. Free can be as valuable as you can imagine.”