Unleashing the Power of Data for the City of Chicago
Brett Goldstein is the inaugural Fellow in Urban Science at the University of Chicago Harris School of Public Policy. He was formerly the Chief Information Officer for the City of Chicago, having previously served as the city’s Chief Data Officer. Before moving to City Hall, Goldstein founded and led the Chicago Police Department’s Predictive Analytics Group. He holds a BA from Connecticut College, an MS in criminal justice from Suffolk University, and an MS in computer science from the University of Chicago.
This interview was conducted during Brett Goldstein’s tenure as Chicago’s Chief Information Officer and reflects his work in that role as well as the experience he brings to the University of Chicago.
Part of your work in your role as Chief Data Officer has been to help transition Chicago’s city government into a data platform. What technical and institutional challenges have you faced in releasing data to the public for third-party development?
We’re almost two years into the open data program now. Starting in May of 2011, the program, honestly, was very small. They had a platform with some relatively small data sets on there. But, the mayor came in with a mandate that open data and transparency would be part of his administration. So, a near-term goal for me was to determine how we would develop a strong program and ramp it up.
Today, we have arguably one of the biggest open data programs internationally. Everything is located at data.cityofchicago.org and it’s a one-stop shop for administrative data for the city of Chicago. We have over 400 datasets containing millions and millions of rows of data, and it’s been architected in a sustainable way, meaning that 99 percent of it updates on its own. I do not believe in a data portal model where, I joke, there’s someone in a basement who’s refreshing the data manually as he checks every row. We offer an enormous amount of information that is machine readable and downloadable so that people can do whatever they want with it, and that’s exciting.
Now, along the way, this has been hard. One piece I’ve mentioned already is sustainability. It is very easy to have a one-hit wonder, where a dataset gets posted once for others to use. It’s another step to keep that data fresh, up to date, and relevant. We have millions of rows of data. Building the intelligence to know whether things have changed and whether a dataset has been updated can be technically difficult.
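One way to picture the "has anything changed?" problem is to fingerprint each row and compare snapshots. The sketch below is illustrative only and is not a description of how the city's portal works; the function names, the `id` key, and the sample rows are all assumptions made for the example.

```python
import hashlib
import json


def row_fingerprint(row: dict) -> str:
    """Return a stable hash of one row, independent of key order."""
    canonical = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshots(old_rows, new_rows, key="id"):
    """Compare two snapshots keyed by `key` (a hypothetical primary key).

    Returns three sorted lists of keys: added, removed, changed.
    """
    old = {r[key]: row_fingerprint(r) for r in old_rows}
    new = {r[key]: row_fingerprint(r) for r in new_rows}
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = sorted(k for k in new if k in old and new[k] != old[k])
    return added, removed, changed


# Made-up sample data, for illustration only.
yesterday = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
today = [{"id": 1, "status": "closed"}, {"id": 3, "status": "open"}]

added, removed, changed = diff_snapshots(yesterday, today)
print(added, removed, changed)
```

With millions of rows, a real pipeline would more likely lean on source-system timestamps or change logs than on re-hashing everything, which is part of why the problem is hard.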
Another issue is that data in this city, as in many other cities, has historically been siloed. Technology was built at the department level. The data has not been designed with a holistic, city-wide structure in mind. Systems don’t naturally come together and because of that I spend a lot of time changing data, finding the source system, and integrating it internally before sharing it on the data portal.
A third challenge is metadata, or data about the data. One of the things that government and even companies aren’t good about is documenting software and data because that’s the boring part. People are more excited about writing the code, collecting the data, and launching the system; but unfortunately, they often do not write out what every field in the system does. Releasing data without metadata is data without context. So, we’ve really raised the bar for how we document data.
We’re actually doing a project right now with Chapin Hall that has been supported by the MacArthur Foundation, which is all about metadata. It’s a data dictionary for our data. It sounds remarkably mundane, but it’s critical for our internal systems and for when we push a dataset out into the portal, so that people can know what each field means and how that can be tied to other things.
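As a rough illustration of what a data dictionary buys you, the sketch below keeps a description for each field and flags any field that shows up in a dataset without documentation. The field names, descriptions, and the `undocumented_fields` helper are hypothetical; they are not taken from the Chapin Hall project or from any actual city dataset.

```python
# Hypothetical data dictionary: one entry per field, with a type and a description.
DATA_DICTIONARY = {
    "case_number": {
        "type": "text",
        "description": "Identifier assigned to the record by the source system.",
    },
    "reported_date": {
        "type": "date",
        "description": "Date the record was reported.",
    },
    "ward": {
        "type": "integer",
        "description": "City ward associated with the record.",
    },
}


def undocumented_fields(rows, dictionary):
    """Return the sorted field names that appear in rows but not in the dictionary."""
    seen = set()
    for row in rows:
        seen.update(row.keys())
    return sorted(seen - set(dictionary))


# Made-up sample rows; "beat" is deliberately left undocumented.
sample_rows = [
    {"case_number": "A1", "reported_date": "2013-01-05", "ward": 42},
    {"case_number": "A2", "reported_date": "2013-01-06", "beat": "0111"},
]

missing = undocumented_fields(sample_rows, DATA_DICTIONARY)
print(missing)
```

A check like this could run before a dataset is published, so that no field reaches the public without context.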
The fourth challenge has been size. Millions of rows are not technically big data, but when you start to think about open data, it is in fact big. When we released crime data, it was roughly five million rows of data. That was one of the biggest data sets that has been released to date. Recently, we pushed out over eight million rows of data. This is not something people are used to dealing with. They’re not used to open data platforms. So, you have to think about how the platforms hold up, how people use it, and how you manage those things. But, the good thing is that even with those four areas that require work, we’ve come together and built a very impressive program in Chicago.
What types of data are you currently prioritizing for release?
Year one was spent trying to identify what people were interested in through meetups and other venues and picking the low-hanging fruit. This year, we’ve been pursuing a couple of different avenues. First, I’m still trying to take the pulse of the community. I recently set up a Twitter account so people can tweet me with requests for data.
Second is starting to write open data into a standard part of business processes. If we are writing code that pulls data into a data warehouse, can we release it into the data platform at the same time? That is part of the sustainability piece.
The third is supporting the research community. I personally put enormous value into the role of academia and think government and academia should have stronger partnerships. I’ve invested time in our relationship with the University of Chicago, Carnegie Mellon, UIC, and others. What can I be releasing to help them advance research? How can we see more progress in social science research based on the data I’ve put out the door? Back when I was still at the police department, it took me a year to execute a non-disclosure agreement (NDA). Now, I can release data via the portal, it’s out to the public, and we don’t need those NDAs in place. So, that’s an area that we’re trying to grow and be supportive of.
How do the city’s efforts to engage the data community in Chicago compare with other municipalities? What are its strengths and weaknesses?
I love the Chicago community. Over the past couple of years, I’ve gotten to know people in the civic hacker community, the 1871 startup world, and bigger companies like Morningstar and Nielsen. I actually go around now saying that Chicago is becoming the city of data. We are starting to find our niche there. We are starting to see the full range of companies from small startups to large, established firms and everything in between.
At Open Gov meet-ups, people were clamoring for open data. After opening up the data, we’ve seen people build things like Chicagolobbyists.org, a website to improve transparency around lobbying; Sweep Around Us, an app for telling you when the street sweeper is going to come; and Spot Hero, an app to search for and reserve parking. Businesses are being built at many levels.
To steal a phrase from Tim O’Reilly, the city has built out a platform so that people can build upon us now. We’re now seeing a convergence of Chicago talent doing smart things leveraging technology and data. I get excited because this is how we start to bring empiricism and quantitative analytics to the table. When Mayor Emanuel came in, his mandate was that this would be an administration that is driven by data. I heard him on MSNBC several weeks ago, and he was talking about having a proactive, rather than reactive government. Now, we’re starting to leverage data to do that, and you’re seeing it throughout the community. Open data has proven to be a binding agent.
The police department has already made significant progress in using predictive analytics in its daily operations. Where do you see the greatest potential for predictive analytics across other departments?
I think predictive analytics is a game changer. Prediction is how you take data to the next level. First, using the data that’s available, you take a picture of the current status and try to identify what we know now. Then, you couple this with some of the more traditional research, the classical journal research that looks at a social science problem over several years to try to understand what drives outcomes. Prediction is when you bring these pieces together.
What does prediction mean for the city? I would argue that Chicago, like any other city, is an ecosystem. Within Chicago, we have many ecosystems within the broader ecosystem, such as neighborhoods. As we take data and all of these different sensors within urban science, including data from 311, 911, bus movements, and crime, you start to understand how things are related to one another. As you understand how things become leading indicators for other things, that’s when you can start to tweak what happens. Imagine instead of us dealing with problems reactively, we’re able to think about dealing with issues earlier to prevent a certain outcome.
A couple of examples might help illustrate this. One, there’s a small area of Chicago where when the alley lights go out, the garbage cans disappear. Every garbage can costs us money to replace. This seems like a great opportunity to use our understanding of the system to prevent that outcome. Let’s think of other cases. Say we’re about to get a big rainstorm. Where are the areas that have the highest probability of flooding? This can help us stage our resources appropriately. This all comes back to an idea that is not okay with the mayor and not okay with me, which is “good enough for government work.” By leveraging this data, identifying the patterns, and identifying the leading indicators, we’re doing what medicine started to do a number of years ago. We start preventing problems instead of reacting to them, and that’s the core of prediction.
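The alley light example above can be sketched as a search for the time lag at which one series best anticipates another. This is a toy illustration using a plain lagged Pearson correlation, not the method used by the city or the police department; the function names and the weekly counts are invented for the example, and correlation alone does not establish that one thing causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def lagged_correlation(leading, outcome, lag):
    """Correlate leading[t] with outcome[t + lag]."""
    if lag == 0:
        return pearson(leading, outcome)
    return pearson(leading[:-lag], outcome[lag:])


def best_lag(leading, outcome, max_lag):
    """Return the lag (0..max_lag) with the highest correlation."""
    return max(
        range(max_lag + 1),
        key=lambda lag: lagged_correlation(leading, outcome, lag),
    )


# Invented weekly counts: missing cans echo light outages two weeks later.
lights_out = [0, 3, 1, 4, 0, 2, 5, 1, 0, 3]
cans_missing = [0, 0, 0, 3, 1, 4, 0, 2, 5, 1]

lag = best_lag(lights_out, cans_missing, max_lag=3)
print(lag, lagged_correlation(lights_out, cans_missing, lag))
```

In this made-up data the outage counts lead the missing-can counts by two weeks, which is the kind of signal that would let a crew fix lights before the cans disappear.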
How has the budget crisis affected the city’s ability to meet its goals for open data and predictive analytics?
I come from the startup world. I am a huge fan of open source products. Much of what we do is based on open source software, and I would argue that some of the best products in the data platform and data analytics space are open source right now. So, I’ve been successful in delivering products and projects at really low cost. Also, I’m very hands-on and am involved in the software. In general, I’m trying to keep more of a startup ethos around this, which in this case, leads to lower costs. So, we’ve actually been pushing ahead very assertively, and I’m very comfortable saying that you don’t always need a big budget to do big things in city government.
Feature Photo: cc/Eric Fischer