
  • - I really enjoy regression.

  • I'd say regression was maybe one of the first concepts that

  • really helped me understand data, so I enjoy regression.

  • - I really like data visualization.

  • I think it's a key element for people to get

  • across their message to people

  • that don't understand that well what data science is.

  • - Artificial neural networks.

  • - I'm really passionate about neural networks

  • because we have a lot to learn from nature

  • so when we are trying to mimic our brain,

  • I think that we can do some applications with this behavior,

  • this biological behavior in algorithms.

  • - Data visualization with R, I love to do this.

  • - Nearest neighbor, it's the simplest,

  • but it just gets the best results so many more times

  • than some overblown, overworked algorithm

  • that's just as likely to overfit

  • as it is to make a good fit.

  • - So, structured data is more like tabular data,

  • things that you're familiar with in Microsoft Excel format,

  • you've got rows and columns,

  • and that's called structured data.

  • Unstructured data is basically data that is coming

  • mostly from the web, where it's not tabular.

  • It is not in rows and columns, it's text.

  • Sometimes it's video and audio.

  • You would have to deploy more sophisticated algorithms

  • to extract data.

  • In fact, a lot of times, we take unstructured data

  • and spend a great deal of time and effort to get

  • some structure out of it and then analyze it.

  • If you have something which just fits nicely into

  • tables and columns and rows, go ahead.

  • That's your structured data,

  • but if it's a weblog,

  • or if you're trying to get information out of webpages,

  • and you've got a gazillion webpages,

  • that's unstructured data,

  • that would require a little bit more effort

  • to get information out of it.
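As a rough illustration of "getting some structure out of" a weblog, the sketch below turns raw log text into rows and columns. The log lines, the regular expression, and the field names (`ip`, `time`, `method`, `path`, `status`, `size`) are all assumptions made up for this example; real logs vary, and this is not a method described in the talk.

```python
import re

# Hypothetical web-server log lines, invented for illustration only.
RAW_LOG = """\
203.0.113.7 - - [10/Nov/2016:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
198.51.100.23 - - [10/Nov/2016:13:56:01 +0000] "GET /missing HTTP/1.1" 404 512
this line is noise and does not match
"""

# One named group per column we want in the structured result.
LINE_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)

def parse_log(text):
    """Return one dict (row) per log line that matches the pattern."""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        row = match.groupdict()
        row["status"] = int(row["status"])
        row["size"] = int(row["size"])
        rows.append(row)
    return rows

rows = parse_log(RAW_LOG)
for row in rows:
    print(row)
```

Once the text is in rows with named columns, it can be analyzed like any other tabular data. The extra effort the speaker mentions shows up in writing and maintaining the pattern, and in deciding what to do with lines that do not match.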

  • Machine learning is basically a set of these advanced tools

  • people use to find answers.

  • I'm not a big fan of machine learning,

  • and I'll give you my bias right now.

  • Imagine there's an island

  • and there are about 45,000 people who live on that island.

  • It's cut off from the rest of the world,

  • nobody can swim into the island, or swim out of the island.

  • Now imagine that island had a murder,

  • and you're the detective who's been tasked

  • with finding who the culprit is.

  • Now, there are various approaches you can take.

  • One approach is you say, well, whoever killed this person

  • is on this island.

  • So there are 45,000 people and there are 45,000 suspects.

  • I'm going to go one by one asking each person

  • until I find the suspect, right.

  • That's machine learning, because you have no other reason,

  • no other assumptions, no other hypothesis, no other feeling.

  • You say, I don't know anything.

  • I'm just going to throw everything into my model

  • and see who the culprit is.

  • Sometimes you get to the culprit, sometimes you don't,

  • but it would take time.

  • Machine learning is basically saying when you do not have

  • many assumptions about your data, and you're short of

  • knowing a lot about your data,

  • you just throw everything into this model,

  • and see what comes out of it.

  • It's more of a black box approach.

  • I know that a large number of professionals live by it.

  • I, on the other hand, like to look at data with my own

  • preconceived notions, because it is said, a data scientist

  • is someone who is very judgmental.

  • That person, a data scientist is one who has an opinion

  • about data.

  • Who has an opinion about the phenomena they're learning,

  • or they're investigating.

  • They cannot simply believe

  • that I'm going to have a kitchen sink approach,

  • I'm going to dump everything in the model.

  • Machine learning is basically saying, dump everything,

  • see what comes out of it.

  • There are thousands of books written on regression,

  • and millions of lectures delivered on regression.

  • And I always feel that they don't do a good job

  • of explaining regression, because they get into data

  • and models and statistical distributions.

  • Let's forget about it, let me explain regression

  • in the simplest possible terms.

  • If you have ever taken a cab ride, a taxi ride,

  • you understand regression.

  • Here's how it works.

  • The moment you sit in a cab,

  • you see that there's a fixed amount there, it says $2.50.

  • Whether the cab moves or you get off,

  • this is what you owe to the driver,

  • the moment you step into a cab.

  • That's a constant, you have to pay that amount,

  • if you have stepped into a cab.

  • Then as it starts moving, for every meter or 100 meters,

  • the fare increases by a certain amount.

  • So, there's a fraction, there's a relationship

  • between distance and the amount you would pay,

  • above and beyond that constant.

  • If you're not moving, and you're stuck in traffic,

  • then every additional minute, you have to pay more.

  • As the minutes increase, your fare increases,

  • as the distance increases, your fare increases,

  • and while all this is happening, you've already

  • paid a base fare, which is the constant.

  • This is what regression is.

  • Regression tells you what the base fare is

  • and what is the relationship between time

  • and the fare you have paid

  • and the distance you have traveled

  • and the fare you have paid.

  • Because in the absence of knowing those relationships,

  • and just knowing how much people traveled for,

  • and how much they paid,

  • regression allows you to compute

  • that constant, which you didn't know was 2.50,

  • and it would compute the relationship between the fare

  • and the distance, and the fare and the time.

  • That's a regression.
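The taxi example can be sketched in a few lines of code. The snippet below generates made-up rides and then uses ordinary least squares to recover the base fare and the two rates from nothing but distance, time, and fare paid. Only the 2.50 base fare comes from the talk; the per-kilometre and per-minute rates, the ride data, and the function name `fit_fare_model` are assumptions for illustration, and the rides are noise-free so the fit is essentially exact.

```python
import numpy as np

# The 2.50 base fare is from the talk; the two rates are hypothetical.
BASE_FARE = 2.50
RATE_PER_KM = 1.75
RATE_PER_MIN = 0.40

def fit_fare_model(distance_km, minutes, fare):
    """Least-squares fit of fare = base + per_km * distance + per_min * minutes."""
    # A column of ones lets the model estimate the constant (the base fare).
    X = np.column_stack([np.ones_like(distance_km), distance_km, minutes])
    coef, *_ = np.linalg.lstsq(X, fare, rcond=None)
    return coef  # [base, per_km, per_min]

# Synthetic rides: we only "observe" distance, time, and the fare paid.
rng = np.random.default_rng(0)
distance_km = rng.uniform(0.5, 20.0, size=200)
minutes = rng.uniform(2.0, 45.0, size=200)
fare = BASE_FARE + RATE_PER_KM * distance_km + RATE_PER_MIN * minutes

base, per_km, per_min = fit_fare_model(distance_km, minutes, fare)
print(f"estimated base fare:   {base:.2f}")
print(f"estimated rate per km: {per_km:.2f}")
print(f"estimated rate per min: {per_min:.2f}")
```

With real fares there would be noise (rounding, tolls, tips), so the estimates would be close to the true values rather than equal to them.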


Technology [Data Science 101]

Posted by 陳賢原 on 2016/11/10