LEE FLEMING: Good evening. I am really pleased to welcome you all to "Leaders in Big Data," hosted by Google and the Fung Institute for Engineering Leadership at UC Berkeley. I'm Lee Fleming. I'm director of the Institute, and this is Ikhlaq Sidhu, chief scientist and co-founder. The first and most important thing is to thank Google for hosting the event. So thank you very, very much. There are a couple of people in particular, Irena Coffman and Gail Hernandez-- thank you-- and also Arnav Anant, our entrepreneur in residence at the Fung Institute. So here's Arnav. AUDIENCE: A lot of work. LEE FLEMING: Huge amount of work. The Fung Institute-- we were founded about two years ago. And the intent is to do research and pedagogical development in topics of engineering leadership. We have our degree, the professional Master of Engineering, or M.Eng., program, centered mainly around the Institute. We also have ties across the campus, though, as you'll see shortly. Our intent is to have a series of talks on topics of interest to engineering leaders. As it turns out, this Wednesday we have our next talk. It's sponsored by [? Thai ?] and the Fung Institute. And the topic is entrepreneurship-- being an entrepreneur within your firm. And fittingly, we have representatives from Google, Cisco, and SAP. That's Wednesday. Consult the Fung website or the [? Thai ?] website for details on that. So besides enjoying a good discussion tonight, we have an ulterior motive, as you can probably tell. We're trying to advertise all of our fantastic programs in big data at Cal. Now, whether you're interested in computation, or inference, or application, or some combination of those things, we've got the right program for you. As I mentioned, there's the professional Master of Engineering, or M.Eng., across all the different engineering departments-- a one-year degree.
We have another one-year degree in the stats department-- a professional degree. There's a two-year degree in the Information School. And finally, there's the Haas MBA. Tonight we've got people from all these programs. You can find their tables, ask them questions, and hopefully we'll see you at Cal soon. We also have additional executive and other programs associated with each of those departments and schools. Ikhlaq will now introduce our speakers. IKHLAQ SIDHU: OK, thanks. So let me see. LEE FLEMING: Just slide this here. IKHLAQ SIDHU: All right. Welcome. I also want to thank a couple of people. One is [? Claus Nickoli ?], who is not here at the moment, so this is to you in the ether; he's just not at the meeting. But he's our host here, so thank you. You guys can tell him that I thanked him. And also, many of you I see here are basically friends, so thanks for coming. It's good to see you again. This is an event on big data. And so I'm going to give you a little data on who is speaking today. The way I think of this is, what we've got is three perspectives on big data from people who represent leading firms in the area. So let's start with NetApp. We've got Gustav Horn. He is a senior consulting engineer with 25 years of experience, and he's built some of the largest enterprise-class Hadoop systems on the planet. From Google, we have Theodore Vassilakis, a principal engineer at Google. He's the head of the team that works on data analytics, and he's been responsible for numerous contributions to Google in search and in the visualization and representation of results. And from VMware, Charles Fan, who's senior VP of strategic R&D. He co-founded Rainfinity and was CTO of the company prior to its acquisition by EMC in 2005. And our distinguished set of speakers is moderated by our distinguished moderator, Hal Varian. He is chief economist here at Google.
He's an emeritus professor at UC Berkeley and the founding dean of the School of Information. So with that, there's hardly anything more I could possibly say. Come on up, Hal, and take it away. HAL VARIAN: Thank you. I'm very impressed with the turnout tonight, seeing as you're missing both the debate and the baseball game. But at least it eliminates a difficult choice for many people. I will say that I'm going to follow the same rules as the presidential debates. So no kicking, biting, scratching, or bean balls are allowed during this performance. We're going to talk about foreign policy, wasn't that the agreement? No. All right. In any event, what I thought we would do is have each person talk for about five minutes and lay out their theme, where they're coming from, and what their perspective is on big data. I'll take some notes, then ask some questions and get a conversation going. And I think we'll have a little time at the end for some questions from the floor. So, take it away. THEO VASSILAKIS: Sure. So, should I start, Hal? HAL VARIAN: Yes. THEO VASSILAKIS: All right. Well, hey, it's a real pleasure to be here. Thank you guys also, and thank you guys for coming. It's a huge, huge audience. Just a couple of words. As you heard, my name is Theo. I lead some of our analytical systems. So I'm responsible-- well, actually, up until two weeks ago, I was responsible for a stack that had parallel data warehousing components; query engines, pieces like Dremel and Tenzing, systems that let you query this data; and visualization layers on top. And that's one of the many, many systems at Google that, from the outside, one would think of as big-data systems. And so I'll try to give you my perspective, at least, on the Google view of big data. And hopefully someone will cut me off when it's time. I think I'll probably go for five minutes. This could take a while. AUDIENCE: [INAUDIBLE] THEO VASSILAKIS: All right, sounds good. Thank you.
I think, as you guys know, Google's business is primarily about taking data, organizing the world's information, and making it universally accessible and useful. So a lot of what the company does is really about sucking in data-- whether it be the web, the imagery from Street View, satellite imagery, maps information, Android pings, or you name it. And then transforming it into usable forms. So really, Google is kind of a big data machine in some sense. And I think the term big data came into currency relatively recently. And we all said, yeah, OK, that speaks to what we do, because we didn't really have a word for it. We just kind of knew that the data was large. But to put more structure on that, I think the Google view of big data processing splits into what I would call ingestion processes-- things like the crawlers, things like all those Street View cars driving through all the streets of the world. And then it goes into transaction processing systems, where perhaps we capture data through interactions on a lot