  • Six thousand miles of road,

  • 600 miles of subway track,

  • 400 miles of bike lanes

  • and a half a mile of tram track,

  • if you've ever been to Roosevelt Island.

  • These are the numbers that make up the infrastructure of New York City.

  • These are the statistics of our infrastructure.

  • They're the kind of numbers you can find released in reports by city agencies.

  • For example, the Department of Transportation will probably tell you

  • how many miles of road they maintain.

  • The MTA will boast how many miles of subway track there are.

  • Most city agencies give us statistics.

  • This is from a report this year

  • from the Taxi and Limousine Commission,

  • where we learn that there's about 13,500 taxis here in New York City.

  • Pretty interesting, right?

  • But did you ever think about where these numbers came from?

  • Because for these numbers to exist, someone at the city agency

  • had to stop and say, hmm, here's a number that somebody might want to know.

  • Here's a number that our citizens want to know.

  • So they go back to their raw data,

  • they count, they add, they calculate,

  • and then they put out reports,

  • and those reports will have numbers like this.

  • The problem is, how do they know all of our questions?

  • We have lots of questions.

  • In fact, in some ways there's literally an infinite number of questions

  • that we can ask about our city.

  • The agencies can never keep up.

  • So the paradigm isn't exactly working, and I think our policymakers realize that,

  • because in 2012, Mayor Bloomberg signed into law what he called

  • the most ambitious and comprehensive open data legislation in the country.

  • In a lot of ways, he's right.

  • In the last two years, the city has released 1,000 datasets

  • on our open data portal,

  • and it's pretty awesome.

  • So you go and look at data like this,

  • and instead of just counting the number of cabs,

  • we can start to ask different questions.

  • So I had a question.

  • When's rush hour in New York City?

  • It can be pretty bothersome. When is rush hour exactly?

  • And I thought to myself, these cabs aren't just numbers,

  • these are GPS recorders driving around in our city streets

  • recording each and every ride they take.

  • There's data there, and I looked at that data,

  • and I made a plot of the average speed of taxis in New York City throughout the day.

  • You can see that from about midnight to around 5:18 in the morning,

  • speed increases, and at that point, things turn around,

  • and they get slower and slower and slower until about 8:35 in the morning,

  • when they end up at around 11 and a half miles per hour.

  • The average taxi is going 11 and a half miles per hour on our city streets,

  • and it turns out it stays that way

  • for the entire day.

  • (Laughter)

  • So I said to myself, I guess there's no rush hour in New York City.

  • There's just a rush day.
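
A speed-by-hour plot like the one described here could be computed along these lines. This is a minimal sketch, not the speaker's actual code: the record layout (pickup timestamp, trip miles, trip seconds) and the sample values are assumptions made up for illustration. It divides total miles by total driving time within each pickup hour, which is one of several reasonable ways to define an "average speed."

```python
from collections import defaultdict
from datetime import datetime


def average_speed_by_hour(trips):
    """Return {hour_of_day: mph} from (pickup_iso, miles, seconds) records.

    Speed per hour is total miles divided by total driving hours, so long
    trips weigh more than short ones.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # hour -> [miles, hours]
    for pickup_iso, miles, seconds in trips:
        if miles <= 0 or seconds <= 0:
            continue  # skip records that cannot yield a speed
        hour = datetime.fromisoformat(pickup_iso).hour
        totals[hour][0] += miles
        totals[hour][1] += seconds / 3600.0
    return {hour: m / h for hour, (m, h) in sorted(totals.items())}


# Made-up sample records, for illustration only.
sample_trips = [
    ("2013-01-01T05:10:00", 3.0, 600),
    ("2013-01-01T05:40:00", 6.0, 1200),
    ("2013-01-01T08:30:00", 2.3, 720),
    ("2013-01-01T08:45:00", 0.0, 300),  # dropped: no distance recorded
]
speeds = average_speed_by_hour(sample_trips)
```

Plotting `speeds` against the hour of day would give a curve of the kind the talk describes.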

  • Makes sense. And this is important for a couple of reasons.

  • If you're a transportation planner, this might be pretty interesting to know.

  • But if you want to get somewhere quickly,

  • you now know to set your alarm for 4:45 in the morning and you're all set.

  • New York, right?

  • But there's a story behind this data.

  • This data wasn't just available, it turns out.

  • It actually came from something called a Freedom of Information Law Request,

  • or a FOIL Request.

  • This is a form you can find on the Taxi and Limousine Commission website.

  • In order to access this data, you need to go get this form,

  • fill it out, and they will notify you,

  • and a guy named Chris Whong did exactly that.

  • Chris went down, and they told him,

  • "Just bring a brand new hard drive down to our office,

  • leave it here for five hours, we'll copy the data and you take it back."

  • And that's where this data came from.

  • Now, Chris is the kind of guy who wants to make the data public,

  • and so it ended up online for all to use, and that's where this graph came from.

  • And the fact that it exists is amazing. These GPS recorders -- really cool.

  • But the fact that we have citizens walking around with hard drives

  • picking up data from city agencies to make it public --

  • it was already kind of public, you could get to it,

  • but it was "public," it wasn't public.

  • And we can do better than that as a city.

  • We don't need our citizens walking around with hard drives.

  • Now, not every dataset is behind a FOIL Request.

  • Here is a map I made with the most dangerous intersections in New York City

  • based on cyclist accidents.

  • So the red areas are more dangerous.

  • And what it shows is first the East side of Manhattan,

  • especially in the lower area of Manhattan, has more cyclist accidents.

  • That might make sense

  • because there are more cyclists coming off the bridges there.

  • But there's other hotspots worth studying.

  • There's Williamsburg. There's Roosevelt Avenue in Queens.

  • And this is exactly the kind of data we need for Vision Zero.

  • This is exactly what we're looking for.

  • But there's a story behind this data as well.

  • This data didn't just appear.

  • How many of you guys know this logo?

  • Yeah, I see some shakes.

  • Have you ever tried to copy and paste data out of a PDF

  • and make sense of it?

  • I see more shakes.

  • More of you tried copying and pasting than knew the logo. I like that.

  • So what happened is, the data that you just saw was actually on a PDF.

  • In fact, hundreds and hundreds and hundreds of pages of PDF

  • put out by our very own NYPD,

  • and in order to access it, you would either have to copy and paste

  • for hundreds and hundreds of hours,

  • or you could be John Krauss.

  • John Krauss was like,

  • I'm not going to copy and paste this data. I'm going to write a program.

  • It's called the NYPD Crash Data Band-Aid,

  • and it goes to the NYPD's website and it would download PDFs.

  • Every day it would search; if it found a PDF, it would download it

  • and then it would run some PDF-scraping program,

  • and out would come the text,

  • and it would go on the Internet, and then people could make maps like that.

  • And the fact that the data's here, the fact that we have access to it --

  • Every accident, by the way, is a row in this table.

  • You can imagine how many PDFs that is.

  • The fact that we have access to that is great,

  • but let's not release it in PDF form,

  • because then we're having our citizens write PDF scrapers.

  • It's not the best use of our citizens' time,

  • and we as a city can do better than that.

  • Now, the good news is that the de Blasio administration

  • actually recently released this data a few months ago,

  • and so now we can actually have access to it,

  • but there's a lot of data still entombed in PDF.

  • For example, our crime data is still only available in PDF.

  • And not just our crime data, our own city budget.

  • Our city budget is only readable right now in PDF form.

  • And it's not just us that can't analyze it --

  • our own legislators who vote for the budget

  • also only get it in PDF.

  • So our legislators cannot analyze the budget that they are voting for.

  • And I think as a city we can do a little better than that as well.

  • Now, there's a lot of data that's not hidden in PDFs.

  • This is an example of a map I made,

  • and these are the dirtiest waterways in New York City.

  • Now, how do I measure dirty?

  • Well, it's kind of a little weird,

  • but I looked at the level of fecal coliform,

  • which is a measurement of fecal matter in each of our waterways.

  • The larger the circle, the dirtier the water,

  • so the large circles are dirty water, the small circles are cleaner.

  • What you see is inland waterways.

  • This is all data that was sampled by the city over the last five years.

  • And inland waterways are, in general, dirtier.

  • That makes sense, right?

  • And the bigger circles are dirty. And I learned a few things from this.

  • Number one: Never swim in anything that ends in "creek" or "canal."

  • But number two: I also found the dirtiest waterway in New York City,

  • by this measure, one measure.

  • In Coney Island Creek, which is not the Coney Island you swim in, luckily.

  • It's on the other side.

  • But Coney Island Creek, 94 percent of samples taken over the last five years

  • have had fecal levels so high

  • that it would be against state law to swim in the water.
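
The "94 percent of samples" figure is a share of readings above a limit, which could be computed per waterway roughly as follows. This is a sketch under stated assumptions: the site names, the readings, and the limit value are all placeholders, since the talk does not give the actual state threshold or the raw numbers.

```python
def share_above_limit(readings_by_site, limit):
    """Return {site: fraction of readings strictly above limit}."""
    shares = {}
    for site, readings in readings_by_site.items():
        if readings:  # sites with no samples are left out
            over = sum(1 for r in readings if r > limit)
            shares[site] = over / len(readings)
    return shares


# Made-up readings and a placeholder limit, for illustration only.
sample_readings = {
    "Creek A": [1500, 2400, 800, 3100],
    "Harbor B": [40, 120, 1100, 60],
    "Canal C": [],
}
HYPOTHETICAL_LIMIT = 1000
shares = share_above_limit(sample_readings, HYPOTHETICAL_LIMIT)
dirtiest = max(shares, key=shares.get)
```

Ranking sites by this share is only one measure of "dirty," as the talk itself points out.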

  • And this is not the kind of fact that you're going to see

  • boasted in a city report, right?

  • It's not going to be the front page on nyc.gov.

  • You're not going to see it there,

  • but the fact that we can get to that data is awesome.

  • But once again, it wasn't super easy,

  • because this data was not on the open data portal.

  • If you were to go to the open data portal,

  • you'd see just a snippet of it, a year or a few months.

  • It was actually on the Department of Environmental Protection's website.

  • And each one of these links is an Excel sheet, and each Excel sheet is different.

  • Every heading is different: you copy, paste, reorganize.

  • When you do you can make maps and that's great, but once again,

  • we can do better than that as a city, we can normalize things.
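
The copy, paste, reorganize step amounts to mapping each sheet's headings onto one shared set of column names. A minimal sketch of that idea follows. The heading variants, the canonical names, and the sample rows are hypothetical, since the talk does not show the actual spreadsheets.

```python
# Hypothetical heading variants mapped to one canonical column name.
HEADER_ALIASES = {
    "site": "site",
    "station": "site",
    "sampling location": "site",
    "date": "sample_date",
    "sample date": "sample_date",
    "fecal coliform": "fecal_coliform",
    "fecal coliform (#/100 ml)": "fecal_coliform",
}


def normalize_row(row):
    """Rename a row's headings to canonical names; unknown headings are dropped."""
    out = {}
    for heading, value in row.items():
        key = " ".join(heading.strip().lower().split())  # trim, lowercase, collapse spaces
        if key in HEADER_ALIASES:
            out[HEADER_ALIASES[key]] = value
    return out


# Two made-up rows, as if read from two differently laid-out sheets.
row_a = {"Station": "S1", " Sample  Date ": "2012-06-01", "Fecal Coliform": 2400}
row_b = {"Sampling Location": "S1", "Date": "2013-06-01",
         "Fecal Coliform (#/100 mL)": 1800, "Notes": "rain"}
rows = [normalize_row(row_a), normalize_row(row_b)]
```

Once every sheet passes through the same mapping, the rows can be stacked into one table and mapped.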

  • And we're getting there, because there's this website that Socrata makes

  • called the Open Data Portal NYC.

  • This is where 1,100 data sets that don't suffer

  • from the things I just told you live,

  • and that number is growing, and that's great.

  • You can download data in any format, be it CSV or PDF or Excel document.

  • Whatever you want, you can download the data that way.

  • The problem is, once you do,

  • you will find that each agency codes their addresses differently.

  • So one is street name, intersection street,

  • street, borough, address, building, building address.

  • So once again, you're spending time, even when we have this portal,

  • you're spending time normalizing our address fields.
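
To make the address problem concrete, here is a rough sketch of folding two differently coded records into one address string. The field names, the abbreviation table, and the output format are assumptions chosen for illustration, not an existing city standard.

```python
import re

# Hypothetical abbreviation table; a real standard would be far longer.
# Caveat: naive expansion would also turn "St. Marks Pl" into "STREET MARKS PL".
ABBREVIATIONS = {"ST": "STREET", "AVE": "AVENUE", "AV": "AVENUE", "BLVD": "BOULEVARD"}


def standardize_address(record):
    """Build one 'NUMBER STREET, BOROUGH' string from differing field names."""
    if record.get("address"):
        line = record["address"]  # agency stores one combined field
    else:
        number = record.get("building", "")
        street = record.get("street_name") or record.get("street", "")
        line = f"{number} {street}"  # agency stores number and street separately
    words = re.sub(r"[.,]", " ", line).upper().split()
    words = [ABBREVIATIONS.get(w, w) for w in words]
    borough = record.get("borough", "").strip().upper()
    return " ".join(words) + (f", {borough}" if borough else "")


# Made-up records from two imaginary agencies describing the same place.
agency_one = {"address": "123 Main St.", "borough": "Brooklyn"}
agency_two = {"building": "123", "street_name": "main street", "borough": " brooklyn "}
same_place = standardize_address(agency_one) == standardize_address(agency_two)
```

With a shared format, records from different agencies can be joined on the address without hand cleaning.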

  • And that's not the best use of our citizens' time.

  • We can do better than that as a city.

  • We can standardize our addresses,

  • and if we do, we can get more maps like this.

  • This is a map of fire hydrants in New York City,

  • but not just any fire hydrants.

  • These are the top 250 grossing fire hydrants in terms of parking tickets.

  • (Laughter)

  • So I learned a few things from this map, and I really like this map.

  • Number one, just don't park on the Upper East Side.

  • Just don't. It doesn't matter where you park, you will get a hydrant ticket.

  • Number two, I found the two highest grossing hydrants in all of New York City,

  • and they're on the Lower East Side,

  • and they were bringing in over 55,000 dollars a year in parking tickets.
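
Finding the top-grossing hydrants comes down to summing fines per location and sorting. A minimal sketch, assuming each ticket record carries a hydrant location and a fine amount. The locations and the 115-dollar fine below are made up for illustration.

```python
from collections import Counter


def top_grossing(tickets, n):
    """Return the n locations with the highest total fines, highest first."""
    revenue = Counter()
    for location, fine in tickets:
        revenue[location] += fine
    return revenue.most_common(n)


# Made-up ticket records: (hydrant location, fine in dollars).
sample_tickets = [
    ("Hydrant A", 115), ("Hydrant B", 115), ("Hydrant A", 115),
    ("Hydrant C", 115), ("Hydrant A", 115), ("Hydrant B", 115),
]
top_two = top_grossing(sample_tickets, 2)
```

In practice the hard part is the previous step: tickets only group correctly once their addresses are coded consistently.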

  • And that seemed a little strange to me when I noticed it,

  • so I did a little digging and it turns out what you had is a hydrant

  • and then something called a curb extension,

  • which is like a seven-foot space to walk on,

  • and then a parking spot.

  • And so these cars came along, and the hydrant --

  • "It's all the way over there, I'm fine,"

  • and there was actually a parking spot painted there beautifully for them.

  • They would park there, and the NYPD disagreed with this designation

  • and would ticket them.

  • And it wasn't just me who found a parking ticket.

  • This is the Google Street View car driving by

  • finding the same parking ticket.

  • So I wrote about this on my blog, on I Quant NY, and the DOT responded,

  • and they said,

  • "While the DOT has not received any complaints about this location,

  • we will review the roadway markings and make any appropriate alterations."

  • And I thought to myself, typical government response,

  • all right, moved on with my life.

  • But then, a few weeks later, something incredible happened.

  • They repainted the spot,

  • and for a second I thought I saw the future of open data,

  • because think about what happened here.

  • For five years, this spot was being ticketed, and it was confusing,

  • and then a citizen found something, they told the city, and within a few weeks

  • the problem was fixed.

  • It's amazing. And a lot of people see open data as being a watchdog.

  • It's not, it's about being a partner.

  • We can empower our citizens to be better partners for government,

  • and it's not that hard.

  • All we need are a few changes.

  • If you're FOILing data,

  • if you're seeing your data being FOILed over and over again,

  • let's release it to the public, that's a sign that it should be made public.

  • And if you're a government agency releasing a PDF,

  • let's pass legislation that requires you to post it with the underlying data,

  • because that data is coming from somewhere.

  • I don't know where, but it's coming from somewhere,

  • and you can release it with the PDF.

  • And let's adopt and share some open data standards.

  • Let's start with our addresses here in New York City.

  • Let's just start normalizing our addresses.

  • Because New York is a leader in open data.

  • Despite all this, we are absolutely a leader in open data,

  • and if we start normalizing things, and set an open data standard,

  • others will follow. The state will follow, and maybe the federal government,