>> [Narrator] Live from New York, it's The Cube covering the IBM Machine Learning Launch Event brought to you by IBM. Here are your hosts, Dave Vellante and Stu Miniman.
>> Good morning everybody, welcome to the Waldorf Astoria. Stu Miniman and I are here in New York City, the Big Apple, for IBM's Machine Learning Event #IBMML. We're fresh off Spark Summit, Stu, where we had The Cube, this by the way is The Cube, the worldwide leader in live tech coverage. We were at Spark Summit last week, George Gilbert and I, watching the evolution of so-called big data. Let me frame, Stu, where we're at and bring you into the conversation. The early days of big data were all about offloading the data warehouse and reducing the cost of the data warehouse. I often joke that the ROI of big data is reduction on investment, right? There's these big, expensive data warehouses. It was quite successful in that regard. What then happened is we started to throw all this data into the data warehouse. People would joke it became a data swamp, and you had a lot of tooling to try to clean the data warehouse and a lot of transforming and loading, and the ETL vendors started to participate there in a bigger way. Then you saw the extension of these data pipelines to try to do more with that data. The Cloud guys have now entered in a big way. We're now entering the Cognitive Era, as IBM likes to refer to it. Others talk about AI and machine learning and deep learning, and that's really the big topic here today. What we can tell you is that the news goes out at 9:00 a.m. this morning, and it was well known that IBM's bringing machine learning to its mainframe, the z mainframe. Two years ago, Stu, IBM announced the z13, which was really designed to bring analytics and transaction processing together on a single platform.
Clearly IBM is extending the useful life of the mainframe by bringing things like Spark, certainly what it did with Linux, and now machine learning into z. I want to talk about Cloud, the importance of Cloud, and how that has really taken over the world of big data. Virtually every customer you talk to now is doing work on the Cloud. It's interesting to see now IBM unlocking its transaction base, its mission-critical data, to this machine learning world. What are you seeing around Cloud and big data?
>> We've been digging into this big data space since before it was called big data. One of the early things that really got me interested and excited about it is, from the infrastructure standpoint, storage has always been one of those costs that we had to have, and with the massive amounts of data, the digital explosion we talked about, keeping all that information or managing all that information was a huge challenge. Big data was really that bit flip. How do we take all that information and make it an opportunity? How do we get new revenue streams? Dave, IBM has been at the center of this, looking at the higher-level pieces of not just storing data, but leveraging it. Obviously they're huge in analytics, with lots of focus on everything from Hadoop and Spark to newer technologies, but they're digging into how they can leverage up the stack, which is where IBM has done a lot of acquisitions, and they want to make sure that they have a strong position both in Cloud, which was renamed: SoftLayer is now IBM Bluemix, with a lot of services including a machine learning service that leverages the Watson technology, and of course on-prem, where they've got the z and the Power solutions that you and I have covered for many years at the IBM Edge show.
>> Machine learning obviously heavily leverages models. In the early days of big data, the data scientists would build models, and machine learning allows those models to be perfected over time.
So there's this continuous process. We're familiar with the world of batch, and then the minicomputer brought in the world of interactive, so we're familiar with those types of workloads. Now we're talking about a new emergent workload, which is continuous: continuous apps where you're streaming data in, which is what Spark is all about. The models that data scientists are building can constantly be improved. The key is automation, right? Being able to automate that whole process, and being able to collaborate between the data scientists, the data quality engineers, even the application developers. That's something that IBM really tried to address in its last big announcement in this area, which was in October of last year: the Watson Data Platform, what they called at the time DataWorks. So they're really trying to bring together those different personas in a way that they can collaborate and improve models on a continuous basis. The use cases that you often hear in big data, and certainly initially in machine learning, are things like fraud detection. Obviously ad serving has been a big data application for quite some time. In financial services, identifying good targets, identifying risk. What I'm seeing, Stu, is that the phase that we're in now of this so-called big data and analytics world, and now bringing in machine learning and deep learning, is to really improve on some of those use cases. For example, fraud detection has gotten much, much better. Ten years ago, let's say, it took many, many months to detect fraud, if you ever detected it. Now you get it in seconds, or sometimes minutes, but you also get a lot of false positives. Oops, sorry, the transaction didn't go through. Did you do this transaction? Yes, I did. Oh, sorry, you're going to have to redo it because it didn't go through. It's very frustrating for a lot of users. That will get better and better and better. We've all experienced retargeting from ads, and we know how crappy they are. That will continue to get better.
The big question that people have goes back to Jeff Hammerbacher: "The best minds of my generation are trying to get people to click on ads." When will we see big data really start to affect our lives in different ways, like patient outcomes? We're going to hear some of that today from folks in health care and pharma. Again, these are the things that people are waiting for. The other piece is, of course, IoT. What are you seeing, in terms of IoT, in the whole data flow?
>> Yes, a big question we have, Dave, is where's the data? And therefore, where does it make sense to be able to do that processing? In big data we talked about how you've got massive amounts of data, so can we move the processing to that data? With IoT, the day before, our CTO talked about how there's going to be massive amounts of data at the edge and I don't have the time or the bandwidth or the need necessarily