  • FUMI YAMAZAKI: OK.

  • Hello everyone.

  • Thank you for coming.

  • I'm super excited that we're having Joel

  • Gurin, the author of this book "Open Data Now," here at Google.

  • JOEL GURIN: OK.

  • Thank you all so much for coming.

  • I want to say a couple of quick things before we get started.

  • You can see on this slide I have a website as well as a book.

  • The website is also open data now,

  • just for the sake of simplicity.

  • I use @joelgurin for Twitter. I also

  • use the hashtag #opendatanow.

  • There is a pattern there.

  • I'm very happy to be speaking to you today.

  • Also, if you didn't see it on the way in, on the way out,

  • there is a sign up sheet if you're

  • interested in getting free email updates from my website

  • or from the GovLab.

  • Please sign up and we'll keep in touch,

  • because there is a lot to talk about.

  • So why open data and how did I get into this particular area?

  • I have to start by saying I am probably by a couple of orders

  • of magnitude the least technical person in this room right now.

  • So what you're going to hear from me--

  • and it is a little humbling, to say the least, to come

  • to talk about data to Google-- but what

  • I hope I can bring to this is a sort

  • of sense of overall perspective and context

  • from the work that I've done in government and non-profits,

  • as a journalist and now in academia.

  • I really have tried to get a sense and sort of paint

  • a picture of how open data is being seen and used

  • in society today that I hope will be helpful to all of you.

  • And I certainly hope we have a little time for questions.

  • So my background very briefly-- as Fumi told you,

  • I began as a science journalist.

  • I was editorial director, and then

  • executive vice president of "Consumer Reports" when

  • we launched consumerreports.org, which

  • is the largest paid information subscription site on the web

  • now with about 3 million active paid subscribers.

  • Shortly after that, I went to the Federal Communications

  • Commission, began as head of the Consumer Bureau.

  • And at that point our chairman, Julius Genachowski,

  • was very interested in figuring out

  • how we can give consumers help in a simple decision

  • like choosing a cellphone plan.

  • Well choosing a cellphone plan ends up

  • being kind of like solving some difficult problem in topology

  • or some such thing or at least in

  • statistics, because there are about 1,000 different cellphone

  • plans offered by a company like Verizon.

  • You multiply that by the number of companies,

  • you factor in the fact that every consumer has

  • different needs, so it became pretty clear as I was looking

  • at this that this is a problem that is more complicated than it

  • looks at first.

  • It also turns out to be very similar to problems

  • that other government agencies face

  • in trying to advise consumers on things

  • like financial services, housing, mortgages, education,

  • and so on.

  • So I began talking to people in other agencies

  • about consumer information, generally.

  • Out of that I was invited to chair the White House Task

  • Force on Smart Disclosure.

  • Smart disclosure being the term that we developed

  • to describe giving data to consumers

  • that they can use to make complex decisions.

  • That report came out last May.

  • And from that work I became more involved

  • in open data and open government more generally.

  • I met Beth Noveck, who some of you

  • may know as the head of the Open Government Initiative

  • during President Obama's first term and a real pioneer

  • in open government and open data.

  • She has now invited me to come work at the GovLab

  • that she founded at NYU.

  • And I'll tell you a few things about that.

  • And I also have this website and this book, "Open Data

  • Now," so I am sort of running the open data

  • practice for the GovLab and looking

  • at the implications of open data in many ways.

  • Just a couple words about the GovLab.

  • I won't read what you can see on the screen,

  • but our basic hypothesis and our mission

  • is to figure out how to use technology and collaborative

  • platforms and basically 21st century

  • approaches to help improve governance and government,

  • and the way that citizens and government interact.

  • We think that people should interact with government more

  • than when they vote once a year or when

  • they happen to make a comment on the We the People petition website

  • or something like that.

  • We're looking at ways to really develop

  • a different level of engagement that

  • is good both for citizens and for government as well.

  • And this model of collaborative democracy we feel

  • has three major modes of operation:

  • the first one is sharing responsibility,

  • where a government can take a piece of what

  • has been a government responsibility

  • and delegate that to citizens.

  • And here the paradigm is participatory budgeting,

  • where in 1,500 cities around the world now

  • the city government is saying, you take a chunk of the budget

  • and spend it as you wish.

  • We think that can be done in many other kinds of governance

  • situations and that would be very productive.

  • The second modality is getting knowledge and expertise in.

  • Figuring out ways that not just the traditional government

  • advisers, but people with technical abilities,

  • technical skills, insight into community issues, and so on,

  • can advise government at the federal, state, and city level

  • and we're seeing a lot of models for that.

  • And then the third modality is getting open data out,

  • which is what I work on and what I'm going to talk about today.

  • So what is open data?

  • There are a number of good definitions

  • that have been done by different groups

  • like the Open Knowledge Foundation and the Sunlight

  • Foundation.

  • What I did in writing this book was

  • to choose a fairly general definition:

  • that open data is accessible public data that people,

  • companies, and organizations can use to launch new ventures,

  • analyze patterns and trends, make data-driven decisions,

  • and solve complex problems.

  • This definition incorporates not only

  • open data from government, which is where a lot of the focus

  • has been, but also open data from sources like social media.

  • From many sources that are accessible to you

  • at Google, and from other kinds of data that companies

  • themselves may choose to release in different ways, as

  • well as scientific data.

  • So what you're going to hear me talk about today

  • is open data in all of those forms

  • and how they relate to each other,

  • and how they relate to social and business goals.

  • I do think-- and I'm certainly convinced having now

  • worked in this area for a couple of years--

  • that we're talking about a phenomenon that

  • has tremendous implications and tremendous impact

  • potentially not only for business, but also

  • for scientists, for journalists,

  • for consumers, and for government.

  • And in many ways, we're starting to see

  • a convergence of the civic and the commercial uses

  • of open data, where we're seeing some ventures that

  • may start as non-profits that turn out to have a sustainable

  • business model.

  • And we're seeing businesses that turn out

  • to be actually extremely mission driven

  • in their use of open data.

  • And you'll see many examples today.

  • What is open data not?

  • Open data is not the same as big data.

  • And it's not the same as open government

  • and it's not even really a blending

  • of big data and open government.

  • It's a different kind of animal.

  • Big data also has many definitions.

  • I think the only thing everybody agrees on, at least when

  • I ask them, is that when you're talking

  • about what you mean by big data, we mean like really, really

  • a lot of data.

  • Really big data sets, which is not too surprising.

  • I think you can more accurately say that big data involves

  • data sets that are at the current limit of our ability

  • to analyze and use, but, of course, that

  • limit changes every day.

  • I do think there are real ways in which the quantity of data

  • has a qualitative impact.

  • In the same way that when I came here from New York,

  • I theoretically could have ridden a bicycle,

  • or even walked, and taking a plane

  • is really more than an accelerated way of doing that,

  • it's a whole different kind of travel.

  • So I think that big data does have that kind of impact.

  • But it's not philosophically different in my view anyway

  • from smaller data problems in the way

  • that open data is philosophically its own thing.

  • Open government is very closely related

  • to the concept of open data, but it's broader.

  • Open government includes all kinds

  • of government transparency.

  • It also includes the kinds of collaboration and things

  • that I just showed on the GovLab's slide.

  • So part of it is data related but part of it

  • is really other kinds of citizen engagement.

  • So the book does present the grand unified theory

  • of what is open data in a simple Venn diagram

  • that you will find in appendix A in the book,

  • and you can also find on my website

  • opendatanow.com in a fairly lengthy blog post.

  • I won't go through all this to analyze it,

  • but the most important thing to notice

  • here is that big data, open data, and open government

  • have several points and areas of intersection.

  • They are distinct, but they overlap.

  • And when they overlap, it gets really interesting.

  • The point in the middle, sector six there,

  • which is all three things-- these

  • are large public government data sets like weather, GPS,

  • Securities and Exchange Commission data, et cetera.

  • This is where we're going to see some of the highest

  • economic value and some of the highest potential civic value.

  • But it is by no means the only thing that's

  • important about open data, or the only kind of open data

  • that's important.

  • So that's the terrain.

  • Having said that, I'm going to take you very quickly,

  • believe it or not, through what I see as nine open data trends.

  • I'm going to pose three open open-data

  • questions that I don't have the answers to,

  • but I think we can all probably discuss and think about.

  • And I'm going to describe a study that we're now

  • doing at the GovLab that I think will be of interest to you, called

  • the Open Data 500, that I think is going to really help advance

  • this field.

  • So let's get started.

  • So the first trend is liberating government data.

  • It's undeniable that governments at all levels

  • not only in the US, but in countries

  • around the world are now focusing on ways that they can

  • take data that they control and make

  • it available to the public as open data.

  • And in the US, we've seen a major step forward.

  • Most recently last May, when President Obama

  • announced the new Open Data Policy.

  • This policy has been called the biggest change

  • in how we deal with federal information

  • since the Freedom of Information Act in the 1960s.

  • It's potentially that big.

  • There are a lot of questions about how we implement it,

  • which I'll talk about.

  • But it is a very ambitious and I think

  • a very right-thinking kind of program

  • to make government data open by default.

  • Meaning that unless there's a security reason or privacy

  • reason or something like that to keep it hidden, it's open

  • and anybody ought to be able to use it.

  • Now one thing that's really significant

  • is that when the president announced this policy,

  • he used these words, which you could see

  • are very business focused and he actually chose a technology

  • center in Austin, Texas, to do it.

  • So the administration is really positioning open data

  • as a job creator and as a business driver.

  • That's partly why we at the GovLab

  • are studying it through that lens,

  • because we want to see to what extent that is really

  • a defensible proposition.

  • I think it is, but I think it's still a work in progress.

  • So the whole question of why should government

  • go to the trouble and the expense,

  • and the time of making data open, part of the answer people

  • think, is that it's going to have an economic benefit

  • as well as being a social good.

  • This Open Data Policy, which was announced last May,

  • talks about presumption of openness, or open by default,

  • making data machine readable, reusable, timely.

  • This was based in many ways on the definitions that the Open

  • Knowledge Foundation and the Sunlight Foundation

  • developed several years ago.

  • One really interesting difference

  • is that those definitions said and the data has got to be free

  • and the government definition doesn't quite say that.

  • So there's still some room for agencies to charge for data,

  • but I think the direction is very

  • much towards free open data.

  • In addition to this policy, there

  • is now something called the DATA Act, which

  • may be the only thing in the known universe

  • that Ralph Nader and Grover Norquist actually agree on.

  • It's an extremely bipartisan movement.

  • It stands for Digital Accountability and Transparency

  • Act.

  • This is another part of open data.

  • So one part of open data, like the Open Data Policy,

  • is let's release data we have on weather, satellite data,

  • GPS, health data, et cetera-- data that government collects

  • that is useful to the public.

  • This is data that government has about itself.

  • The goal of the DATA Act is to make government spending

  • data more thorough, more transparent,

  • more usable than it's ever been by a lot.

  • To be able to make it go all the way down,

  • not just to the contractors to government,

  • but subcontractors, sub-subcontractors,

  • and to do it in a way that is really accurate.

  • There is a website called usaspending.gov.

  • It was intended to do this.

  • The Sunlight Foundation recently calculated

  • that it is inaccurate to the tune of $1.55 trillion a year.

  • Otherwise, it's perfect.

  • So the DATA Act would automate this in a way that would really

  • solve that kind of problem and there

  • is a lot of push now in Congress to pass the DATA Act, which

  • I think would be another major step forward.

  • So this is just at the federal level,

  • but you're seeing similar kinds of activity in cities,

  • in states, in the 60 countries that

  • now belong to the Open Government Partnership.

  • All of which are making similar kinds of commitments

  • to open data for both civic and job-creating reasons.

  • That's one trend.

  • The next trend, which comes right out of that,

  • is that we are actually seeing open data begin

  • to drive business growth in a number of ways.

  • And you can find examples all over the place: health,

  • education, transportation.

  • My book has a number of examples.

  • Somebody tweeted recently, there's

  • so many apps and businesses in here.

  • I can't even count them.

  • So I figured I would count them.

  • There are, to the best of my knowledge, 183 of them.

  • So, happy reading.

  • You will find companies in all of these sectors

  • and they're doing some very creative things

  • with open data that are showing that you don't have

  • to own the data in a proprietary way

  • to make a thriving business out of it.

  • I'll just show you a couple of examples.

  • So the Climate Corporation based here in San Francisco

  • has become in many ways sort of the poster child

  • for the commercial use of open data.

  • I like to say I sort of knew them when.

  • They've gotten a fair amount of publicity over the years.

  • I was fortunate to have a long interview with their CEO David

  • Friedberg last April.

  • It's in the book.

  • And there's actually a longer podcast with him on my website.

  • And if you're really interested in this stuff,

  • I would encourage you to check out the podcast,

  • because it's a fascinating story.

  • And the punch line is they were recently

  • bought by Monsanto for a billion dollars.

  • They've been profiled in "The New Yorker,"

  • so they've emerged as everyone's favorite example,

  • and I think rightly so, of what this kind of data can do.

  • Their story is fascinating.

  • They began by saying that they wanted

  • to sell better weather insurance.

  • And they quickly focused on farming and farmers

  • as their target.

  • They figured that if they could get all this data

  • from the National Oceanic and Atmospheric Administration,

  • from NASA weather data, et cetera,

  • and they applied really extremely smart analytics.

  • And the guy who started it just hired brilliant people.

  • I think he actually used to work at Google.

  • And I'm sure he hired some people from here.

  • But what they figured they could do

  • was do risk calculations that would enable them

  • to calculate the risk that they bore as an insurer

  • more accurately, so that they could both help farmers

  • and also make a business out of it.

  • Well what happened as they got into this

  • is that they found that there were open data

  • sources that they could use that were much better

  • and that would give them a much better result than the commonly

  • used sources.

  • So the first iteration of this is,

  • let's use data from weather stations.

  • Well if you're a farmer, even if you

  • look at every weather station in the US,

  • it might be 30 miles away from your farm

  • and it's not helpful to you.

  • So long story short, they ended up

  • getting data so that they could look at a piece of farmland

  • roughly the size of this mid-sized auditorium or even

  • smaller.

  • They can calculate rainfall to one hundredth of an inch.

  • They can look at soil quality in a way

  • that they know exactly how the soil is going

  • to respond to that amount of rain.

  • And they're doing all of this with-- almost all of it,

  • with a couple of small exceptions,

  • is public open data that anybody, any one of us,

  • theoretically could access, but we

  • wouldn't know what to do with it.

  • And knowing what to do with it and knowing how to analyze it,

  • and bringing together both data analysts, and subject matter

  • experts, to create this new kind of tool

  • is how they have created a billion dollars worth of value.

  • They also believe that they can now

  • increase profitability for farmers worldwide by 20% to 30%

  • and help farmers understand how to deal with climate change

  • by changing the crops they grow and the seasons in which they

  • grow them.

  • So this is huge.

  • This goes from we're insurance salesmen to we're

  • leading the next Green Revolution.

  • It's a direct application of free open data

  • and it's a stunning demonstration

  • of how even data that is free and public

  • can be an incredibly important business driver.

  • A lot of people think health care

  • will be the next big frontier.

  • This is a picture of Todd Park, who was the Chief Technology

  • Officer for Health and Human Services

  • and for the last couple years has

  • been CTO for the United States.

  • He runs this event in Washington every year

  • called The Health Datapalooza.

  • Datapalooza, as somebody pointed out,

  • could be literally defined as an all out crazy party of data

  • and that is pretty much what these things are.

  • They get about 2,000 people a year.

  • And we are seeing a lot of activity

  • in the health care sector. iTriage is an example that

  • uses the public registry of health care providers.

  • So that if you're traveling and you have some symptoms,

  • it can immediately tell you if those symptoms are serious.

  • And if they are, it'll tell you how

  • to get to the nearest emergency room

  • very quickly, even if you're in a strange city.

  • In finance we're seeing a lot of companies like this one.

  • This is CapitalCube, which is now owned by Analytics Insight.

  • There are about 40,000 publicly traded companies

  • in the world for which there is enough information

  • to say anything intelligent about them.

  • These guys figured out algorithms

  • to analyze all 40,000 of them, update their information

  • every single day, put their results into a prose form

  • that any investor can read, provide graphs

  • that show the relative risk and the expected return

  • for a given company compared to its competitors, et cetera.

  • Again this is not actually necessarily using a new data

  • source.

  • They're using SEC data that's been available for a while,

  • but they're applying a level of analytics that probably was not

  • possible before fairly recently.

  • This is becoming-- and we're seeing

  • a lot of businesses in the financial sector.

  • There's stuff happening in energy.

  • Opower is a company that's now working with utilities.

  • It will give you back not only your own energy usage data,

  • but an aggregate summary of your neighbors' energy usage data,

  • which is apparently the most powerful motivator

  • to clean up your own act is the fact that you got to do

  • as well as your neighbors.

  • They're using this together with a lot

  • of open data about energy and energy usage and energy

  • efficiency to help people save energy

  • and ultimately, hopefully help fight climate change.

  • So there's many, many examples but those just

  • give you a sense of how this goes.

  • Now the interesting thing, and a segue from Opower:

  • what they are ultimately about is

  • helping consumers choose how they're going to use energy.

  • So this gets back to what I told you was the problem that got me

  • into this whole area in the first place:

  • how do you choose a cellphone plan?

  • Well this whole area of smart disclosure is about open data.

  • It's almost like a sort of subset of open data.

  • It's about figuring out how to get data that's

  • going to be useful to average people to improve their lives

  • and put it out there in a usable form.

  • Who here has read "Nudge" by Cass Sunstein and Richard

  • Thaler?

  • It's a great book if you're at all

  • interested in behavioral economics.

  • It's a perfect read and it's also interesting,

  • because it inspired a ton of work

  • in the Obama administration.

  • So it's very much about how understanding

  • collective behavior in psychology

  • can help you make policy decisions.

  • It was actually tested during the first Obama campaign.

  • One simple example is they found that if they planted-- not

  • planted, that's too strong a word-- if they promoted news

  • stories before every state primary election

  • that there was going to be huge voter turnout,

  • there would in fact be huge voter turnout,

  • because nobody wants to be left out

  • when there's going to be huge voter turnout.

  • So it became a kind of self-fulfilling prophecy,

  • because they knew that more voter

  • turnout would be helpful to them.

  • Actually that may have been in the election itself,

  • not the primaries-- correcting myself.

  • So anyway Cass Sunstein, who was the regulatory czar

  • for the Obama administration, is a big thinker in this area.

  • Richard Thaler, who's an economist

  • at the University of Chicago, is as well.

  • Their book "Nudge" was about how you can create behavioral cues

  • and use information in ways that nudge people to make choices

  • that are better for them.

  • Well, so here's an example: so while Cass was regulatory czar,

  • one of the things that they did is they reformed the label

  • that you see on cars around energy efficiency.

  • And you can see very clearly the most obvious change here.

  • So they go from the small type saying, estimated fuel cost

  • 2000 something a year to you save $1,850 in fuel costs

  • over five years.

  • So it's a very simple example but a pretty compelling one

  • of how the way you present information affects what people

  • get from it and how they make decisions.

  • OK that was very much the basis of the Smart Disclosure Task

  • Force.

  • And what we set out to do was to say, how do we

  • use these kinds of principles at a time when most people are

  • getting information either on their smartphones or on the web

  • and where we're really trying to figure out how to give people

  • information that is personalized to them?

  • So think about how Kayak works.

  • I mean this is a pretty amazing tool that

  • allows you to go online and choose the flight that you want

  • to take tomorrow to wherever you want to go out of literally

  • thousands of flights and you can do it in about 10 minutes.

  • So the question we started to ask is

  • what if there was a Kayak for everything?

  • What would that look like?

  • There is a lot of work now to try

  • to figure out, how do you do this for financial services?

  • How do you do this for health care insurance?

  • How do you do it for mortgages, credit cards-- all

  • these decisions that frankly drive most of us

  • completely nuts every day, either that or you just sort

  • of pick one and hope you're right.

  • Going back to cell phones as the paradigm here,

  • it's been calculated that Americans lose something

  • like $13 billion a year collectively,

  • because we're not using the most efficient cellphone plans.

  • So this is real money and in many cases, like health

  • insurance it's also safety, and quality of care,

  • and quality of service.

  • So there have been a couple of successful experiments here.

  • One of the ones I like a lot is a site called greatschools.org.

  • This is a nonprofit.

  • They use state data to analyze the quality of public schools

  • and help people make those choices.

  • This thing is now used by more than 40%

  • of all K through 12 households in the US,

  • which is just kind of fantastic and shows you how much hunger

  • there is for this kind of information.

  • Another success-- this one is from the UK-- I always

  • like this because it's just sort of so bizarre. So this

  • is a site called comparethemarket.com. One night,

  • probably after a couple of vodkas,

  • somebody must have been kidding around.

  • They were trying on Russian accents

  • and somebody said it's like, comparethemeerkat.com.

  • Somebody then said, that is a brilliant idea.

  • They decided that their symbol should be a meerkat.

  • And there is now the spokes-meerkat

  • in the UK called Aleksandr Orlov, who

  • is the spokes-thing for comparethemarket.com.

  • This thing became so popular that Harrods

  • was going to-- yes, you can collect

  • all six exclusive meerkat toys.

  • This is like a car insurance shopping site.

  • This is like as if the Geico gecko

  • was sextuplets or something.

  • I don't know what it's like.

  • But this thing became so popular that they

  • were going to sell these one year one Christmas at Harrods

  • and the CEO apparently said, we can't

  • do that, there's going to be a run on the store.

  • We're just going to give them all to charity.

  • It has also made them a very successful company.

  • Now what this shows-- beyond the fact that people like fuzzy

  • stuffed animals and that marketers

  • have bizarre, but successful ideas-- what this shows

  • is that it is also possible to build a successful business doing

  • comparisons of car insurance, home insurance, life

  • insurance, energy, credit cards, travel insurance, et cetera.

  • Nobody has yet made this model really successful in the US,

  • but it is a huge consumer need.

  • And I think one of the things that we're

  • going to see in the years ahead is that smart disclosure

  • people are going to figure out how to really do

  • smart disclosure the right way and it'll be both a consumer

  • service and a successful business model.

  • AUDIENCE: [INAUDIBLE]?

  • JOEL GURIN: Why hasn't it made it in the US?

  • I think there are a couple of reasons.

  • I think one is that people haven't quite

  • found the right business model yet

  • that will do it in an honest way and yet also be successful.

  • A lot of this works off of lead generation.

  • Lead generation gives you the incentive

  • to game the system, which is unfortunate.

  • So that's been a bit of a problem.

  • I think also-- I don't actually really have a good explanation.

  • I think for some reason this started culturally in the UK

  • with smaller companies about 10 years ago

  • and it doesn't seem to have

  • caught on here in the same way.

  • And there are a lot of inherent challenges

  • in trying to do comparisons for 10 different things at once.

  • Like the fact that people generally

  • shop for any one of those only once every couple of years.

  • But one way or another, I think it's still

  • a model that ought to be applicable here,

  • because this is actually one and only one

  • of several sites in the UK that have been operating

  • successfully.

  • Anyway somebody should figure this out.

  • I think it's an interesting challenge.

  • Next trend: we're seeing a lot of use of open data

  • in an investment context, which I think can be good for society

  • as well.

  • This is a British company-- there's

  • a lot of work going on in London-- that

  • is making open data available about small to medium size

  • enterprises.

  • Private companies that have had trouble attracting investment

  • because the investors don't want to go

  • to all the trouble of analyzing whether or not

  • they're a good risk.

  • These guys are providing enough information

  • that they believe they can get about $250 billion

  • more invested in these companies

  • by simply providing the information that lets investors

  • invest with confidence.

  • So that's a good thing for business.

  • But a lot of the potential I think

  • is in what used to be called corporate responsibility--

  • what's now being called environmental social governance

  • measures, because we're seeing more and more investors who

  • consider good sustainable practices to be

  • a sign of good corporate governance.

  • So for example, the Carbon Disclosure Project

  • collects data on carbon footprint

  • from most of the major companies from Fortune

  • 500 and other companies.

  • They represent institutional investors

  • who collectively have about $87 trillion to invest.

  • So we're seeing some real interest from that community.

  • We're seeing the same kind of thing being applied

  • to the consumer field, particularly

  • by a company in San Francisco called GoodGuide, which

  • provides a lot of information to consumers

  • about the environmental impact of the products and services

  • they buy.

  • Much of this is based on EPA and other open data.

  • Companies are now becoming more and more interested in this

  • because they want to see if they have

  • a good profile that consumers will like.

  • And then finally, the Securities and Exchange Commission

  • has begun to demand that companies that report to them

  • include information on things like whether or not

  • they use conflict minerals, which are minerals that

  • are mined under pretty horrible conditions in the Democratic

  • Republic of the Congo.

  • That kind of thing, which happened under Dodd-Frank,

  • could be the beginning of the SEC demanding more and more

  • environmental, social, governance measures.

  • If that were to happen, we could see some real changes

  • in corporate practices.

  • So I think this is a case where open data, because it's

  • of interest not only to citizens, but also

  • to the investor community, can have a lot of leverage

  • in improving corporate behavior.

  • We're seeing open data shape reputation and brand

  • in some powerful ways.

  • Part of this is public complaints

  • and what happens when you make complaints about a company

  • public.

  • So these two people founded a company

  • called PublikDemand, which takes complaints from consumers,

  • amplifies them through social media to an extent

  • that a company like AT&T or United Airlines

  • has to immediately pay attention.

  • And in many cases they've gotten very rapid solutions

  • to problems that otherwise would have gone back and forth

  • with customer service for months.

  • Well this is a strategy that regulatory agencies are also

  • following.

  • The Consumer Financial Protection Bureau in particular

  • has made its complaint database public.

  • And banks are now paying much more attention

  • to customer complaints and customer satisfaction

  • than they ever would have because of this open data.

  • Both "Forbes" and "American Banker"

  • have written about how this is really changing the banking

  • industry, because they have to listen collectively

  • to consumers whereas they could ignore people one at a time.

  • The next stage of this, I think, is analyzing social media.

  • Since we are now at this stage of 2 billion tweets a week

  • which is-- I don't know about you--

  • I find that somewhat terrifying.

  • But not only through the kinds of reviews and comments

  • people do on Google, but on these other sites as well.

  • We're seeing a whole huge amount of social media commentary

  • and you would think that if you could actually figure out

  • how to analyze this and do something with it,

  • you would have a very powerful form of open data that

  • has huge business relevance.

  • Well one company that is working on this

  • is reputation.com, which is in the business of helping people

  • improve their online reputations mostly by promoting

  • more positive and genuinely positive

  • feelings about what they have to say.

  • But there is a whole other level of this--

  • of sentiment analysis-- which many of you

  • may be familiar with.

  • So I always like to ask how many people

  • know who the woman on the left is?

  • How many people know who the guy on the right is?

  • OK, at least a couple. Generally in every tech audience

  • more people recognize Alan Turing

  • than recognize Jane Austen, but that's who they are.

  • And if Jane Austen and Alan Turing had a love child,

  • it would be sentiment analysis.

  • Because sentiment analysis essentially

  • is this technique of doing text analysis to figure out

  • what people feel about brands, celebrities,

  • TV shows, specific products, specific services, et cetera.

  • There is an annual conference now held in New York-- well,

  • I think it's usually New York-- every March

  • where people get together to talk about this stuff.

  • It's a chapter in my book.

  • I've also done a podcast with a guy named

  • Seth Grimes, who's a guru in this area.

  • That's on my website.

  • It's absolutely fascinating.

  • It's not yet a mature technology,

  • but ultimately you can see where this is going.

  • This is going towards treating all of social media

  • as an analyzable, quantifiable form of open data

  • that can have a lot of implications in a lot of areas.

  • Personal data is a specific kind of open data

  • in that this is about making data about my medical records

  • available to me, or like Opower, my energy

  • usage available to me.

  • It doesn't really fit the classic definition

  • of open data.

  • It's not like available to everybody for free,

  • but it's a very important part of the ecosystem.

  • Partly because opening data to me is a different kind of thing

  • than me not being able to access my own data.

  • And also because in many applications of big open data,

  • having the ability to match it up with personal data

  • is an important part of the puzzle.

  • This is actually the diagram from a report by the World

  • Economic Forum.

  • They've now done a couple of reports

  • on unlocking the value of personal data.

  • The basic idea that people are talking about

  • is what if you could establish a data vault.

  • So I'm seeing this as probably a concept that many of you

  • have thought about a lot.

  • It's been kicking around for a while.

  • It may or may not be getting to a point of applicability

  • or maturity.

  • There are companies like reputation.com,

  • personal.com in DC, and others that are looking at this.

  • But the basic idea is, if you had

  • access to your personal data, if you can hold it securely,

  • and if you could then release it selectively to other people

  • or to marketers, what would happen?

  • Well one model, which is being called vendor relationship

  • management by Doc Searls, who talks about it in his book,

  • "The Intention Economy," one model

  • is that instead of marketers targeting you, you target them.

  • It is worth about $2,000 for a Mercedes-Benz dealer

  • to get a qualified buyer on the lot based on the probability

  • that they're going to buy a car.

  • So it might be worth a couple dollars for that person

  • to find you if you wanted to release demographic

  • or whatever kind of information that

  • made you look like a good customer

  • and actually pay you to make a visit.

  • That's a kind of simple form, but some

  • of the people working in this area

  • think there's a lot of economic potential there.

  • I think it's still hypothetical, but at least

  • points towards a greater degree of consumer

  • control over how we are all marketed to.

  • On the other end of the spectrum,

  • there is potentially tremendous public value

  • in sharing personal data.

  • This is this app PulsePoint, which is essentially

  • if you are a person who knows CPR, you tell them that.

  • If there's somebody who is having cardiac arrest,

  • they then immediately send a message

  • to everybody nearby who knows CPR.

  • They can get to them faster than an ambulance can.

  • They can potentially save a life.

  • So this is the use of personal data

  • that I'm not sure anybody would have thought

  • of a couple years ago, but it's the kind of thing

  • that when you start thinking of personal data

  • as a form of open data on a voluntary basis some really

  • interesting things can happen.

  • They talk about themselves as enabling citizen superheroes

  • and I think that's actually pretty accurate.

  • Open data and research-- this is another area where

  • I think we're going to see potentially huge benefits.

  • We're seeing more and more interest and more and more

  • pressure for particularly biomedical, but potentially

  • other kinds of scientific research to be more open.

  • Now a couple things are happening here.

  • One is the open access movement, which, of course,

  • Aaron Swartz was very involved in promoting,

  • and very tragically in the end.

  • But that's very much about, once data

  • is in a published journal, we shouldn't all

  • have to pay thousands of dollars to get at that data

  • or to get at that report.

  • And the federal government recently

  • announced just a couple weeks ago that about half

  • of all federally-funded research will now

  • have to be made publicly available for free

  • online within a year of its publication in a journal.

  • That's sort of after the fact.

  • What gets even more interesting is

  • data sharing while the work is in progress.

  • So a lot of this is coming from patients and from funders.

  • Kathy Giusti was a corporate CEO in her 30s when

  • she discovered she had multiple myeloma.

  • She quickly discovered that there was very little research

  • being done.

  • She started a foundation to fund that research.

  • And a condition was if you take their money,

  • you have to make your data openly available

  • as you make new discoveries.

  • This is in many ways the model that the Human Genome Project

  • worked on very successfully.

  • It's now being followed in Alzheimer's research

  • and Parkinson's research and in other ways as well.

  • It is potentially a transformational change

  • in how we do science,

  • if the business models are worked out and if

  • there's enough cooperation from scientists

  • and from drug companies and others

  • to really make this the norm.

  • We're also seeing a lot of very successful experiments

  • in crowd-sourcing science.

  • One of the most famous was done at the University of Washington

  • a couple years ago.

  • They had been working for a decade trying

  • to solve the structure of a protein related to the AIDS

  • virus.

  • They decided to put it on the site Foldit

  • and asked gamers to solve it.

  • Gamers solved it within a couple weeks.

  • They published in "Nature" and they thanked the gamers

  • publicly.

  • This was, I think, eye-opening for a lot of people.

  • Another example, any of you know Galaxy Zoo or Zooniverse?

  • This is one of the great citizen science projects

  • and it's really a model for many of them.

  • This thing got started in Oxford because some poor PhD student

  • had to look at images of the structure of spiral galaxies,

  • which apparently computers cannot assess very well.

  • And he had hundreds of thousands of these to look at.

  • He looked at 50,000 in a week and he

  • said there's got to be a better way.

  • They decided that the better way was posting these images

  • online, inviting just ordinary people to look at them.

  • They can do it with a high degree of accuracy.

  • They've now taken on other scientific projects,

  • like cancer cells as you see here.

  • They have tapped 800,000 volunteers

  • to help them do skilled human work

  • in the interest of science.

  • And then finally SkyTruth is applying the same kind of thing

  • to the environment.

  • This is a nonprofit in the Washington area

  • that is now using crowd sourcing to look at things like maps

  • of areas of Pennsylvania where fracking is going on

  • and look at signs that fracking is damaging the environment.

  • So this becomes environmental protection

  • through open data from the satellites

  • and crowd sourcing applied to that open data.

  • Data-driven cities is a huge movement right now.

  • Right at NYU we have the Center for Urban Science and Progress,

  • which is doing a lot of work in this area.

  • The idea is to put sensors all over cities, to instrument

  • cities, to see what can be learned,

  • to improve operations, public health, emergency management,

  • all kinds of things.

  • You're also seeing a lot of use of data for accountability

  • in cities like Chicago.

  • Palo Alto has been a leader here.

  • And there are a couple of interesting things

  • to come out of this.

  • One is applications like NextBus,

  • which is now all over the country, where city traffic data can

  • be used to help you figure out when your next bus is coming

  • so you're not waiting endlessly in the rain.

  • To things like this experiment in Washington

  • where they have actually solicited public input

  • about how the different government agencies are doing.

  • So they've actually been able to grade government agencies

  • on the basis of both survey data that they collect

  • and sentiment analysis of what people

  • are saying on social media.

  • When they first did this, four out of five agencies

  • got a C minus; one got a C plus.

  • They were not very happy with the mayor for doing this,

  • but they have gone public with it.

  • It's a really interesting feedback loop

  • and over time, the grades have gone up.

  • So now the last trend-- and this is

  • one where we're really intensely focused at NYU--

  • is trying to figure out when you look at all of this together,

  • what is open data worth?

  • And this is an important question

  • because it is not a slam dunk or particularly

  • easy to take data that has traditionally been siloed

  • and open it to the public.

  • So there have been a number of studies on this.

  • The most recent was a McKinsey study last October

  • that says that open data is worth $3 trillion a year

  • worldwide.

  • That's by far the highest estimate anybody has come up

  • with as you can see from some of the other ones on the screen.

  • But generally the estimates run pretty high.

  • So it's a very interesting challenge.

  • We all think there's potential there,

  • but what we have done at the GovLab at NYU,

  • is we've set out to do this thing called the Open Data

  • 500, which is about figuring out exactly where the value is.

  • Beginning in the business sector,

  • but ultimately wanting to look at the nonprofit sector

  • as well.

  • We're looking at US-based companies.

  • We have actually contacted more than 500 of them.

  • We're in the process of finalizing the list.

  • If this interests you, I would urge

  • you to please go to opendata500.com because we

  • are really seeking public comment on everything

  • from individual companies to our methodology

  • to the whole goals of the study.

  • Or you can tweet to hashtag #OD500 if you have suggestions

  • for us.

  • We have this now on a website as a work in progress,

  • where you can filter by state or by category

  • and see where some of these open data companies are.

  • Not surprisingly the greatest numbers are in California,

  • but I'm glad to say, as a New Yorker,

  • that New York is not far behind.

  • And we're beginning to see some interesting patterns here

  • that I think are really going to be meaningful.

  • So one of those patterns, which I'll show you in a second,

  • helps answer some of the open questions about open data.

  • So having shown you all these trends,

  • and shown you all the stuff that's

  • happening that I talk about in my book and the website

  • and my other work, there's still a lot we don't know.

  • And I would say there are three major questions that we're

  • now all looking to answer.

  • So the first one is, OK if we think open data has value,

  • which sectors are the most promising?

  • Well from the Open Data 500-- even though this

  • is preliminary, I can't stress that enough because this is not

  • a final list, et cetera-- but we're

  • starting to see some hints of that.

  • So the first tier, the sectors that

  • have the most companies in them are

  • what we're calling data slash technology

  • and finance and investment.

  • Finance and investment probably because there's

  • so much interest, and because SEC data

  • and other kinds of business data has been out there

  • for a long time and is a very rich source.

  • Data technology because there is a whole huge emerging

  • sector in helping figure out how to take

  • really unwieldy government data sets and turn them

  • into usable open data.

  • So this is companies like Socrata, Junar, OpenGov here

  • in Palo Alto, or nearby, and many others.

  • Their business is making open data business-friendly.

  • And one of the interesting questions

  • is whether this is something that's

  • going to be around forever, which I think it probably will,

  • as we were talking about a little bit before.

  • Or how much this sector may change as governments

  • get better at releasing open data.

  • Next we have health care-- which I

  • think is emerging-- transportation, energy,

  • and then the third tier, where only about a couple

  • percent of the companies we have are in each of these areas.

  • This includes a number of things.

  • Many of which are really quite significant like education,

  • scientific research, environment, food

  • and agriculture. The Climate Corporation, for example,

  • is somewhere in this tier.

  • So we just have a couple of initial observations

  • and caveats about this.

  • One is that sectors that don't have a lot of companies

  • in them, like weather and agriculture,

  • may have a Climate Corporation in there

  • or may have a very significant company.

  • So simply number of companies per sector

  • doesn't necessarily tell you the importance of the sector,

  • but it does tell you at least where

  • a lot of the entrepreneurial activity is.

  • We still have to do more work. We are getting information

  • on the number of employees per company, which

  • is going to be an important metric.

  • We're trying to get information on financial metrics.

  • And as I said, the data technology category,

  • I was very interested to see that so high up

  • and I think it says something about the sorry

  • state of government data.

  • Which leads directly to question number

  • two-- how do we improve the open data ecosystem?

  • Having worked in the federal government for a while

  • and talked to people at a lot of agencies,

  • I can tell you a lot of those government data sets are

  • a mess and the people who run them know it.

  • And they are trying to fix it, but it's not

  • going to happen overnight.

  • So there are a couple things that are happening.

  • On a city level, OpenGov, which is a company right near here,

  • has developed what they like to call kind of a Sim City

  • for actual cities.

  • So they have this thing you can see in the upper right,

  • where they can take budget and other city data from any city,

  • put it on a platform that makes it usable,

  • and that also makes it comparable to other cities' data.

  • And they can then make town meetings much more productive.

  • They can tell you why Palo Alto has a certain rate of police

  • overtime and how that compares to San Mateo

  • and they can learn things about city governance from that.

  • So this is one of those data slash technology companies

  • that's beginning to make the data more useful.

  • Another one which is a couple blocks from us at NYU

  • is called Enigma.

  • They won TechCrunch Disrupt in New York last May.

  • And what was significant about that

  • was not just that they got this nice large $50,000 check,

  • but that I think there was a recognition of how

  • important data companies are.

  • Their whole thing is taking really unwieldy government

  • federal data sets and making them

  • usable on a common platform and in interoperable ways.

  • And they're getting a ton of attention

  • right now because this is something

  • that everybody who has ever worked with federal data

  • has wanted.

  • And the reason for that is that right now federal data

  • looks something like this.

  • For those of you who have seen "Raiders of the Lost Ark."

  • The pathetic thing about this is not only

  • that federal data is this bad, but that this is the metaphor

  • that everybody in the federal government who works with data

  • uses to describe the state of federal data.

  • It is that bad and we know it.

  • There's good stuff in there somewhere,

  • but good luck finding it.

  • A lot of the work going on in government and in third parties

  • like Enigma and like OpenGov is to make this stuff more useful.

  • And the open question is how can we really make this work?

  • And this is an area where I

  • think certainly Google has done a lot

  • and has a huge role to play.

  • I think also part of this is going

  • to be what I'm calling demand-driven data disclosure.

  • The way it's worked in the past, government agencies

  • have largely released open data when they've identified

  • data sets they think are of interest,

  • or where they're just doing it

  • to comply with a government mandate.

  • One of the things we want to get out of the Open Data 500

  • is to create a kind of round table, where data users can

  • give much more ongoing feedback to data holders

  • in the government agency.

  • We think this is going to really improve the quality of data,

  • the availability of data, and the ecosystem as a whole.

  • And then finally, how can developing countries

  • use open data?

  • This is a huge question that the World Bank among others

  • is putting a lot of effort into.

  • I'm going to be on a panel for them

  • next Wednesday afternoon in DC.

  • And there are a couple of areas here.

  • One is fighting corruption.

  • This is the website that has, I think, the best

  • name of any website I've seen. It is called ipaidabribe.com.

  • This is a website in India, where you can go

  • and through crowd sourcing report if you

  • had to pay a bribe in a way that makes corruption transparent

  • and ultimately decreases corruption.

  • But we're also looking to go beyond transparency

  • to economic development and a lot of people

  • are asking whether developing countries can use open data

  • as a business resource in the same way

  • that we're seeing in the US, the UK, France and places

  • like that.

  • So one of my colleagues at the World Bank, Prasanna Lal Das,

  • recently did a very good blog post

  • summarizing some of the things that are needed.

  • As you can see, it's going to take some work,

  • but the rewards may be great.

  • And I'm seeing a lot of interest right now

  • in figuring out how to make that work and can we make it work.

  • So that's the open data universe at least

  • as I've come to see it through the work I've done here.

  • I would recommend you to a couple

  • of sources for more information.

  • One is at thegovlab.org, where you can see our wiki,

  • subscribe to our digest.

  • You can also sign up outside to get a digest subscription.

  • It comes out every week.

  • It's a curated collection of material in this area.

  • There's opendatanow.com, where I report

  • on this stuff on a regular basis,

  • largely with interviews with people in the field,

  • podcasts when I can do them, hopefully

  • as a resource to the community.

  • And there is of course this book now available,

  • which I see many of you have already purchased-- thank you--

  • and which I hope you find useful in one way or another.

  • So we have a couple of minutes for questions

  • and thank you very much.

  • Thank you.

  • AUDIENCE: Thank you very much.

  • I have two questions.

  • One is, if you develop your business using open data,

  • what are the caveats, what is the license on all this data?

  • And the second question, suppose I want some data.

  • Where would I find it?

  • For example I would like to see some data

  • on education broken down by gender.

  • Where would I go?

  • Where do I even start?

  • JOEL GURIN: OK so two questions, one

  • is in terms of the license to use data.

  • It is only open data if it's released

  • under an open license that makes it usable

  • by anybody, and reusable, and re-publishable.

  • So by definition, that's pretty much built

  • in to the Open Data Policy, at least of the US government,

  • and more and more governments recognize that.

  • In terms of where you go, if you're

  • looking for federal data, you should go to data.gov,

  • this is the central repository of federal data.

  • It was originally built in a way that I think a lot of people

  • found not as user-friendly as they wanted.

  • They've just relaunched it.

  • It's much better.

  • It's getting better all the time.

  • But that's where you would find data like that segmented

  • by agency and by area of interest.

  • As a start, in any case.

  • AUDIENCE: Thank you.

  • Obviously, you logically focused on the US.

  • Are there any other nations that you'd point to as best

  • in class who are really leading the field in terms

  • of leveraging the infrastructures developed

  • in their governments or in their societies?

  • JOEL GURIN: Yes the UK is really the other world leader

  • and in many ways they're doing things

  • in a more advanced way than the US.

  • For example, their equivalent of data.gov

  • is all done with linked data and is really very

  • beautifully and very well designed.

  • So they are in some ways ahead of us

  • in some ways learning from us.

  • They also have an institute there called the Open Data

  • Institute that is funded partly by the government

  • and by other sources as well, that's

  • doing a ton of work-- really global leadership in this area.

  • Beyond that, we're seeing a lot of interest

  • all over the world on every continent.

  • And I think what's happening is that, as I

  • mentioned, there are now 60 countries or so in the Open Government

  • Partnership, which is committed to these open government

  • principles, part of which is open government data.

  • And it's rapidly emerging as an international movement.

  • I think different countries depending

  • on the stage of development will figure out

  • what is the most important and most appropriate

  • form of the data for them to release.

  • AUDIENCE: So all the applications

  • of open data that you mentioned are all vertical.

  • They're trying to solve a particular problem.

  • Do you see a need or an opportunity

  • for more horizontal plays or products

  • that could be usable by many applications that

  • use open data?

  • JOEL GURIN: Well I think probably

  • the best examples of those are these data technology companies

  • like Enigma or OpenGov, because what they're essentially

  • trying to do is to make data of all kinds more usable.

  • And I think what they're hoping to do

  • is to make possible the kinds of mash-ups or interoperability

  • of data that can make a lot of those more complex applications

  • possible.

  • Right now, at least if you're working with US federal data,

  • it's very difficult.

  • We did a project at the GovLab simply

  • to try to mash up EPA and OSHA data about factories

  • and facilities that both agencies regulate.

  • You would think this was dead easy.

  • It's not.

  • I mean even on that basic level, it

  • takes work to make this stuff-- these data sets-- work

  • and play nicely together.

  • So companies that are making that happen, I think,

  • are definitely taking that kind of broad horizontal view

  • where they're going to be helpful to a lot

  • of other companies.

  • Yes?

  • AUDIENCE: Yes.

  • Do you have any comments about Aaron Swartz,

  • who tried to liberate some common government data

  • but got sued by the government?

  • JOEL GURIN: Yeah I write about Aaron and that in my book.

  • I think everybody pretty much recognizes now

  • that MIT was not a good path there, to say the least.

  • And it gets into some complexities,

  • but I think the short answer is what Aaron was really

  • fighting for was open access and access to material that

  • has already been published in a way that the public can use it.

  • That's now just become federal government policy

  • for about half as I said of the research

  • that the federal government funds

  • so I think there's a greater and greater recognition that he was

  • right about that and that we should start getting on board.

  • AUDIENCE: How do you think about accuracy, or even just not

  • necessarily accuracy, but knowing

  • what's in the data set, like keeping track of the metadata,

  • like what's actually being counted.

  • Who was excluded, who wasn't, how data was collected,

  • and that kind of information that

  • can change what the data means?

  • JOEL GURIN: Yeah that's a great question.

  • I would say right now that's very hard. Part of the Open Data

  • Policy is to actually publicly release information

  • about the quality of data.

  • I think this is going to be one of the parts of the policy

  • that federal agencies absolutely hate the most.

  • But there are some really interesting examples

  • of agencies facing up to this problem

  • and dealing with it. So one great example is

  • USAID-- international development--

  • knew that they had lousy geospatial data

  • on the organizations they were giving grants to.

  • They put on a hack-a-thon, but a very careful one.

  • They found people who are sort of geospatial hackers

  • in the Washington area.

  • They invited about 100 people in.

  • They said, we're going to give you special access to our data.

  • We want you to fix it.

  • We'll give you all the weekend.

  • They were done in about 15 or 16 hours.

  • So this idea of kind of crowd-sourcing quality control

  • is one that a couple of government agencies

  • have become interested in.

  • But simply knowing is very hard.

  • And that's one of the reasons that I

  • think establishing feedback loops-- really

  • good feedback loops between data users and the agencies

  • that hold the data-- is going to be a critical next step.

  • So that we can ask those questions of government agencies.

  • We can see what the response is.

  • And where there is a really serious flaw

  • in a really important data set, they

  • can prioritize that as something that stakeholders need fixed.

  • AUDIENCE: And so you showed a lot of great examples.

  • I was wondering if you think that we can leverage

  • mobile in a specific way as opposed to the desktop sites.

  • JOEL GURIN: Yes.

  • I tend to show desktops because they look better on PowerPoint,

  • but absolutely most of these things

  • that I showed either are mobile apps

  • or could be mobile apps as well.

  • I think the one caveat on mobile apps

  • is that we are a little bit at risk of app mania with open data.

  • There have been all these hack-a-thons

  • of apps for this or apps for that, which is great,

  • but I think there are probably some limitations in what

  • is easy to do in that mobile environment.

  • And there are some more sophisticated things

  • that can be done if you, I believe, look more broadly.

  • But definitely pretty much anything

  • that I showed you has a mobile application attached to it.

  • FUMI YAMAZAKI: OK.

  • Thank you very much.

  • I think we're running out of time

  • but Joel will be staying here for us.

  • Thank you very much.

  • JOEL GURIN: Thanks so much for coming.

  • And thank you for the work you're all doing.
