Placeholder Image

Subtitles section Play video

  • Hi! Welcome to the course

  • Data Mining with Weka. I'm Ian Witten from the University of Waikato

  • in New Zealand and I'm presenting the videos for this course which is being prepared

  • by the Department of Computer Science

  • at the University of Waikato. Data mining is a mature technology that a lot

  • of people are

  • beginning to take very seriously, and a lot of other people find it very mysterious.

  • The real aim of this course is to take the mystery out of data

  • mining. This is a practical course on how to use the Weka

  • workbench, which you will download as part of the course,

  • for data mining. We explain the basic principles of several popular

  • algorithms and how to use them in practical applications.

  • In the world today, we're overwhelmed with data,

  • every time you swipe your credit card, every item you checkout out at the

  • supermarkets

  • every time you send a text, make a phone call, or send an email,

  • or type a key on a computer even, every time you walk past a security camera.

  • It all generates a little bit of data in a database. Data mining is about going

  • from the raw data to information. Information that can be used to make

  • predictions

  • that are useful in the real world. Let me

  • give you an example. You're at the supermarket checkout.

  • The till records every item you bought.

  • At the end, you hand over your loyalty card, and they give you a couple of percent off,

  • and you give them your name and address and, indirectly, access to all sorts of

  • demographic information about you

  • and people like you. Everybody likes a good bargain.

  • It's been a good day today, because thanks to those coupons they sent you in

  • the mail last week,

  • you've been able to stock up on some things you wouldn't normally have bought,

  • but that you bought today because they are such a good deal. Next week, they'll send

  • you some more coupons, and

  • you'll go shopping again and buy some more stuff. They do some experiments on you,

  • you know, they try to figure out how much more you would buy if the

  • price was just that little bit less.

  • These coupons are kind of a mechanism for personalized pricing.

  • They have access to all sorts of data

  • from you and people like you, in order to do these experiments and figure these things out.

  • Everybody wins: you get your bargains, they sell more stuff.

  • It sounds like a good deal to me. Here's another application.

  • Suppose you and your partner want a child, but you can't have one.

  • It's fun trying, but it can get

  • a little bit frustrating, and, ultimately, very frustrating, perhaps even tragic.

  • In artificial insemination, they

  • take some eggs from the woman's ovaries, and then they fertilize them with partner or donor

  • sperm, and then, they select from amongst the embryos produced

  • some to implant back into the womb.

  • You want to select the ones with the best chance of success

  • of producing a live birth, but you don't want too many

  • live births. The embryologist has access to all sorts of data

  • on these embryos. I think there are 50-100

  • pieces of information that they record about individual embryos, and they have

  • historical data on which ones produced a live birth,

  • success. So here's an ideal situation for data mining. We have lots of

  • historical data

  • we have data on the present situation, and we want to select

  • those embryos that have the best chance of success. Now, that's a good application

  • for data mining,

  • bringing a live child to couple who wants one.

  • I talk about data mining and machine learning. Data mining is the

  • application, and machine learning is the algorithms we use. We're talking about using

  • machine learning algorithms

  • for the purposes of data mining.

  • This is Data Mining with Weka, so the next question isWhat's Weka?”

  • This is a weka here, this little bird.

  • It's a flightless bird, kind of like its better known cousin

  • the kiwi, found only in the islands of New Zealand.

  • This is what it sounds like,

  • coming to you from New Zealand.

  • However, in our context, Weka is a data mining workbench. It's an acronym for the

  • Waikato Environment for Knowledge Analysis. We just call it Weka.

  • It contains a large number of algorithms for classification,

  • and a lot of algorithms for data preprocessing, feature selection,

  • clustering,

  • finding association rules, things like that. It's a very

  • comprehensive workbench, and it's free, open source software

  • that you will download as part of this course in the next lesson.

  • It runs on

  • any computer. It's written in Java, runs on Linux, Windows, Mac.

  • You'll be able to download it and run it on your workstation and use it

  • during the course. You're going to learn how to load data into Weka and look at it;

  • you're going to learn about preprocessing, cleaning up data using filters;

  • exploring it using visualizations, applying classification algorithms,

  • interpreting the output, understanding evaluation methods,

  • evaluation is very important in this area, understand various representations for

  • models, and how popular

  • machine learning algorithms work, and be aware of common pitfalls

  • with data mining. The ultimate goal really is to empower you to use Weka on your

  • own data, and, most importantly, to understand what it is you are doing.

  • This is the first class. In this class,

  • you're going to get started with Weka. You're going to install it;

  • you're going to explore the Weka Explorer interface

  • and explore some data sets; build a classifier;

  • interpret the output of the classifier; use filters;

  • and visualize your data set. There's lots of things to do

  • in this class. Here's the structure of the course.

  • There are five classes altogether. Each class

  • consists of about six lessons.

  • Class 1 is Getting started with Weka. Then, we're going to look at Evaluation

  • in Class 2,

  • Simple classifiers in Class 3, More classifiers in Class 4, and Putting it all together

  • in Class 5. These are the six lessons

  • in Class1. Each lesson comprises a short video,

  • 5-10 minutes, like this one, followed by an activity.

  • An activity that involves you doing something yourself.

  • You don't learn by me talking to you. You learn by actual doing things.

  • So, we have lots of activities for you

  • that involve using the Weka workbench. In the middle of the class is a mid-class

  • assessment, and at the end there is

  • a post-class assessment. The marks for these are combined, and if you get

  • more than 70%, you will get a signed certificate from the

  • University of Waikato

  • certifying that you have completed this course.

  • The activities are an important part of the course, but they are not part of the

  • assessment.

  • We really think you should do the activities, but you don't have to do them

  • for assessment purposes.

  • it's up to you. As well as that, associated with the course is a textbook

  • called Data Mining. It discusses data mining and Weka

  • in depth. It's a great book; I know it's a great book because I wrote it

  • myself

  • with a couple of friends. The publisher has kindly agreed to make available

  • large chunk this textbook to you

  • online so that you can use it for background reading.

  • It's only background reading; you don't have to read the textbook.

  • It's just if you want to delve into some of the ideas and algorithms in

  • more depth.

  • That's what it's there for. What you

  • need to do are the activities and the assessments,

  • and watch the videos, of course. That's it. I just thought I'd show you were I am.

  • I'm in New Zealand, that's where Weka is from.

  • That's where I'm sitting right now. This is the world as we see it in New Zealand.

  • We're at the top, you're probably down at the bottom somewhere.

  • We're in the top, in the center, and that arrow to the North Island of New Zealand

  • is where the University of Waikato is.

  • That's it for now. There is an activity associated with this lesson,

  • so go ahead and do it. Of course, you haven't learned very much in this lesson, so

  • it's not a very important activity.

  • Don't worry about it too much. You're not expected to do a lot of reading to do

  • this activity.

  • Just have a go and see how you get on,

  • and I'll see you again in the next lesson. I'm looking forward to that.

  • Goodbye for now.

Hi! Welcome to the course

Subtitles and vocabulary

Click the word to look it up Click the word to find further inforamtion about it