
  • We will have a short lecture about ‘clustering of clustering’!

  • Originally, cluster analysis was developed by anthropologists aiming to explain the origin

  • of human beings.

  • Later it was adopted by psychology, intelligence and other areas.

  • Nowadays, there are 2 broad types of clustering: flat and hierarchical.

  • K-means is a flat method in the sense that there is no hierarchy, but rather we choose

  • the number of clusters, and the magic happens.

  • The other type is hierarchical.

  • And that’s what we are going to discuss in this lecture.

  • Historically, hierarchical clustering was developed first, so it makes sense to get

  • acquainted with it.

  • An example of clustering with hierarchy is taxonomy of the animal kingdom.

  • For instance, there is the general term: animal.

  • Sub-clusters are fish, mammals, and birds, for instance.

  • There are birds which can fly, and those that can’t.

  • We can continue in this way, until we reach dogs and cats.

  • Even then we can divide dogs and cats into different breeds.

  • Moreover, some breeds have sub-breeds.

  • This is called hierarchy of clusters.

  • There are two types of hierarchical clustering: agglomerative, or ‘bottom up’, and divisive,

  • or ‘top down’.

  • With divisive clustering we start from a situation where all observations are in the same cluster.

  • Like the dinosaurs.

  • Then we split this big cluster into 2 smaller ones.

  • Then we continue with 3, 4, 5, and so on, until each observation is its separate cluster.

  • However, in order to find the best split, we must explore all possibilities at each

  • step.

  • Therefore, faster methods have been developed, such as k-means.

  • With k-means, we can simulate this divisive technique.
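One way to picture that simulation is to apply 2-means repeatedly, splitting each cluster in two. The sketch below is only illustrative; the function name, the `depth` parameter, and the toy data are assumptions, not something taken from the lecture.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_split(points, depth=2):
    """Rough divisive-clustering sketch: split each cluster in two with
    2-means, recursing until the requested depth of the hierarchy is reached."""
    if depth == 0 or len(points) < 2:
        return [points]
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
    left, right = points[labels == 0], points[labels == 1]
    return divisive_split(left, depth - 1) + divisive_split(right, depth - 1)

toy = np.array([[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [20, 1]], dtype=float)
print([len(part) for part in divisive_split(toy, depth=2)])  # sizes of the leaf clusters
```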

  • When it comes to agglomerative clustering, the approach is bottom up.

  • We start from different dog and cat breeds, cluster them into dogs and cats respectively,

  • and then we continue pairing up species, until we reach the animal cluster.

  • Agglomerative and divisive clustering should reach similar results, but agglomerative is

  • much easier to solve mathematically.

  • This is also the other clustering method we will explore: agglomerative hierarchical

  • clustering.

  • In order to perform agglomerative hierarchical clustering, we start with each case being

  • its own cluster.

  • There is a total of n clusters.

  • Second, using some similarity measure like Euclidean distance, we group the two closest

  • clusters together, reaching an ‘n minus 1’ cluster solution.

  • Then we repeat this procedure, until all observations are in a single cluster.
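The steps just described can be written down directly. The snippet below is a minimal, naive sketch (closest-pair Euclidean distance as the similarity measure); real implementations use far more efficient algorithms.

```python
import numpy as np

def naive_agglomerative(points, n_clusters=1):
    """Naive agglomerative clustering: every observation starts as its own
    cluster, then the two closest clusters (closest pair of members,
    Euclidean distance) are merged until `n_clusters` remain."""
    clusters = [[i] for i in range(len(points))]            # step 1: n clusters
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):                      # step 2: find the
            for b in range(a + 1, len(clusters)):           # two closest clusters
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])                     # step 3: merge them...
        del clusters[b]                                     # ...and repeat
    return clusters

toy = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
print(naive_agglomerative(toy, n_clusters=2))               # e.g. [[0, 1], [2, 3]]
```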

  • The end result looks like this animal kingdom representation.

  • The name for this type of graph is: a ‘dendrogram’.

  • A line starts from each observation.

  • Then the two closest clusters are combined, then another two, and so on, until we are

  • left with a single cluster.

  • Note that all cluster solutions are nested inside the dendrogram.

  • Alright.

  • Let’s explore a dendrogram and see how it works.

  • Here is the dendrogram created on our ‘Country clusters’ data.

  • Okay.

  • So, each line starts from a cluster.

  • You can see the names of the countries at the beginning of those lines.

  • This is to show that, at the start, each country is a separate cluster.

  • The first two lines that merge are those of Germany and France.

  • According to the dendrogram, these two countries are the closest in terms of the features considered.

  • At this point, there are 5 clusters: Germany and France are 1, and each other country is

  • its own cluster.

  • From this point on, going up, Germany and France will be considered one cluster.

  • Here’s where it becomes interesting.

  • The next two lines that merge are those of the Germany and France cluster, and the UK.

  • At this point there are 4 clusters: Germany, France and the UK are 1, and the rest are

  • single-observation clusters.

  • At the next stage of the hierarchy, Canada and the US join forces.

  • The next step is to unite the Germany, France, UK cluster with the Canada-US one.

  • Australia is still alone.

  • Finally, all countries become one big cluster, representing the whole sample.
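A dendrogram like the one described above could be produced with SciPy along these lines. The longitude and latitude numbers below are rough placeholders chosen for illustration, not the values used in the lecture.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

countries = ['USA', 'Canada', 'France', 'UK', 'Germany', 'Australia']
coords = np.array([            # (longitude, latitude) placeholders
    [-95.0, 37.0],             # USA
    [-106.0, 56.0],            # Canada
    [2.0, 46.0],               # France
    [-2.0, 54.0],              # UK
    [10.0, 51.0],              # Germany
    [134.0, -25.0],            # Australia
])

linked = linkage(coords, method='ward')   # Ward linkage on Euclidean distances
dendrogram(linked, labels=countries)
plt.ylabel('Distance')
plt.show()
```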

  • Okay.

  • Cool!

  • What other information can we get from the dendrogram?

  • Well, the bigger the distance between two links, the bigger the difference in terms

  • of the chosen features.

  • As you can see, Germany, France and the UK merged into 1 cluster very quickly.

  • This shows us that they are very similar in terms of ‘longitude’ and ‘latitude’.

  • Moreover, Germany and France are closer to each other than Germany and the UK, or France and the UK.

  • The USA and Canada came together not long after.

  • However, it took half of the dendrogram to join these 5 countries together.

  • This indicates the Europe cluster and the North America cluster are not so alike.

  • Finally, the distance needed for Australia to join the other 5 countries was the other

  • half of the dendrogram, meaning it is extremely different from them.

  • To sum up, the distance between the links shows similarity, or rather dissimilarity,

  • in terms of the chosen features.

  • Alright.

  • Next on our list is the choice of number of clusters.

  • If we draw a straight line, piercing these two links, we will be left with two clusters,

  • right?

  • Australia in one, and all the rest in the other.

  • Instead, if we pierce them here, we will get three clusters: North America, Europe, and

  • Australia.

  • The general rule is: when you draw a straight line, you should count the number of links

  • that have been broken.

  • In this case, we have broken 3 links, so we will be left with 3 clusters, because the

  • links were coming out of those 3 clusters.

  • Should we break the links here, there will be 4 clusters, and so on.

  • Great!

  • Finally, how should we decide where to draw the line?

  • Well, there is no specific rule, but after solving several problems, you kind of develop

  • an intuition.

  • When the distance between two stages is too big, it is probably a good idea to stop there.

  • For our case, I would draw the line at 3 clusters and remain with North America, Europe, and

  • Australia.
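In code, drawing that line corresponds to cutting the linkage tree into a fixed number of flat clusters, for example with SciPy’s `fcluster` (reusing the `linked` matrix and `countries` list from the earlier sketch):

```python
from scipy.cluster.hierarchy import fcluster

labels = fcluster(linked, t=3, criterion='maxclust')   # cut the tree into 3 clusters
for country, label in zip(countries, labels):
    print(country, '-> cluster', label)
```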

  • Okay.

  • When most people get acquainted with dendrograms, they like them a lot.

  • And I presume that is the case with you, too.

  • Let’s see some pros and cons.

  • The biggest pro is that hierarchical clustering shows all the possible linkages between clusters.

  • This helps us understand the data much, much better.

  • Moreover, we don’t need to preset the number of clusters.

  • We just observe the dendrogram and take a decision.

  • Another pro is that there are many different methods to perform hierarchical clustering,

  • the most famous of which is the Ward method.

  • Different data behaves in different ways, so it is a nice option to be able to choose

  • the method that works better for you.
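For instance, SciPy’s `linkage` accepts several methods ('ward', 'single', 'complete', 'average', and others), so trying a few on the same data is straightforward. The comparison below reuses the placeholder `coords` array from the earlier sketch.

```python
from scipy.cluster.hierarchy import linkage

for method in ['ward', 'single', 'complete', 'average']:
    Z = linkage(coords, method=method)
    print(method, 'final merge distance:', round(Z[-1, 2], 2))
```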

  • K-means is a one-size-fits-all method, so you don’t have that luxury.

  • How about a con?

  • The biggest con, which is also one of the reasons why hierarchical clustering is far

  • from amazing, is scalability.

  • I will just show you a single dendrogram of 1000 observations and you will know what I

  • mean.

  • With 1,000 observations, the dendrogram is extremely hard to examine.

  • You know what else?

  • It’s extremely computationally expensive.

  • The more observations there are, the slower it gets.

  • K-means, on the other hand, hardly has this issue.

  • Thanks for watching!
