  • Hey, everyone. This is my introduction to data analysis/data visualization with Python. In this video, I'm gonna cover why you might want to use data visualization and why you might want to use Python and Matplotlib for it.

  • And then we're gonna go over some simple examples of how to actually use these tools, and then, using these tools, we're gonna do sort of a real analysis with a real data set at the end.

  • In this video, I'm only gonna cover line charts, just to keep everything simple.

  • I'm also gonna put a more detailed version of this outline in the comment section below so you don't have to watch the whole thing if you don't want to.

  • Okay, so why should you use data visualization in the first place?

  • Well, data visualization is actually often the first step of any type of data analysis work, whether it's simple data analysis, statistical analysis, or machine learning analysis.

  • And the reason for that is that visualizing data often gives you an intuitive understanding of the data, and it often helps you see patterns that are otherwise hard to see. We're going to see an example of that later.

  • And why should you use python for this?

  • Well, Python is not the only good choice, but I would say it's one of the best, and the reason is, first of all, it's a general-purpose language that's pretty easy to use and learn.

  • It also has many libraries for scientific computing and data science, including Matplotlib.

  • And if you work at a company, your company might already use Python for something else.

  • If that's the case, that's really nice, because then you and your team are not gonna have to learn a totally new language to do some data analysis.

  • And why are we using Matplotlib for this?

  • Well, Matplotlib is not the only good visualization library for Python, but it's still one of the most popular choices, and there are actually other libraries that are based on Matplotlib.

  • So if you learn Matplotlib, it's gonna help you learn these other libraries later on if you want to, for example, this one called Seaborn.

  • Matplotlib is also pretty easy to get started with.

  • Anyway, let's dive into a demo.

  • For this demo, we're gonna use something called Jupyter Notebook and a few other Python libraries, and we're gonna use Anaconda to install them.

  • If you're not familiar with Jupyter Notebook and Anaconda, I have an explanation about them in my Python tutorial video, so I'm gonna leave a link to that in the description.

  • Anyway, to install Anaconda, just search for "Anaconda Python" or go directly to anaconda.org, find the button that says "Download Anaconda", and select whichever OS you're using.

  • I'm using Mac here. Click Download under the Python 3.x version instead of Python 2.x, because we're gonna use Python 3 here, and select where you want to download this package.

  • Save it, and once it's downloaded, open up the package that you just downloaded and then just click Continue, Continue, Continue, Continue, Agree.

  • Select "Install for me only" or install on a specific disk, it doesn't matter which one, then click Continue and Install. This process is gonna take a while.

  • After some waiting, you might see this prompt to install Microsoft VS Code.

  • We don't need that.

  • So let's just continue here and then close.

  • Then, to launch Jupyter Notebook, you can do it through this thing called Anaconda Navigator, so just launch it like you launch any other application.

  • Just dismiss whatever comes up and then click Launch in the Jupyter Notebook section, and then you should see a browser window show up with the Jupyter Notebook interface.

  • Now, if you want to follow this tutorial, the first thing you should do is create a new folder, let's say on the desktop. Let's call this one "data visualization".

  • We're gonna put all our data and the Jupyter Notebook file here.

  • So let's first download our data. To do that, just go to csdojo.io/data and download these two files, sample_data.csv and countries.csv, and then put these CSV files in the folder that you just created, "data visualization".

  • After that, go back to the Jupyter Notebook interface, and you can just navigate to the desktop and then the folder that we just created, "data visualization".

  • To create a new Jupyter Notebook file here, just find the New button on the right and click Python 3.

  • Right now, this notebook file has "Untitled" as the title, so let's change it to "Data Visualization with Python". Click Rename, and you have a notebook called "Data Visualization with Python".

  • You can check it just by going to the desktop and then to the folder that you just created. You should see that there's a file called "Data Visualization with Python.ipynb".

  • And it's really important that this notebook file is in the same folder as the data that you just downloaded, countries.csv and the other one.

  • Once everything is set up, just write in the first cell: import pandas as pd. This means we want to import a module called pandas as pd, that is, we want to give it sort of a nickname, and that's gonna be pd.

  • You can run the cell by clicking this one, and now pandas is imported as pd. Here we're going to use pandas for importing and using some data from our CSV files, and we need to import another module here.

  • So for that, just write: from matplotlib import pyplot as plt. This says, from the matplotlib package, import the pyplot module and then call it plt.

  • Let's run this cell. Now pyplot is imported.

  • We're gonna use pyplot from Matplotlib for making our charts.

  • So here, let's first take a look at a really simple example of how to use pyplot.

  • Here I'm gonna write x = [1, 2, 3], a list of three elements, and y = [1, 4, 9]. To plot this set of data, you can just write plt.plot(x, y), and this plots x on the x-axis and y on the y-axis.

  • And then you can show this graph by writing plt.show(). When you run this cell, you should see a graph like this.
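
As a minimal sketch, the cell described here could look like the following. The name lines is only added for illustration so the return value of plot can be inspected; it is not part of the video.

```python
from matplotlib import pyplot as plt

x = [1, 2, 3]
y = [1, 4, 9]

# plot() draws x on the horizontal axis and y on the vertical axis.
# It returns a list with one Line2D object per plotted line.
lines = plt.plot(x, y)

# Display the figure (in Jupyter the chart appears below the cell).
plt.show()
```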

  • You see that the values of x are 1, 2, and 3, as expected, and the values of y are 1, 4, and 9.

  • If you want to add a title to this graph, you can do so by writing plt.title("test plot") right after the plot statement, before the show statement.

  • Then you can add an x label and a y label as well by writing plt.xlabel, let's call the x label "x", and plt.ylabel, let's call the y label "y" here.

  • When you run this cell, you see that there's a title called "test plot", an x label called "x", and a y label called "y".
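
Put together, the title and axis labels might be added roughly like this. It is a sketch of the cell as described, with the calls placed between plot and show.

```python
from matplotlib import pyplot as plt

x = [1, 2, 3]
y = [1, 4, 9]

plt.plot(x, y)

# Title and axis labels go after plot() and before show().
plt.title("test plot")
plt.xlabel("x")
plt.ylabel("y")

plt.show()
```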

  • Okay, What if you wanted to plot multiple lines here?

  • Well, to do that, let's create another list.

  • Let's call it z, and this one is gonna have 10, 5, and [unclear] inside.

  • To plot x and z on top of x and y, you can just write plt.plot(x, z) right after plt.plot(x, y). And then let's fix the y label here too, to "y and z".

  • When you run this cell, you should see these two lines. The blue line represents x and y, and the orange line represents x and z.

  • So plt.plot(x, z) plotted x on the x-axis and z on the y-axis.

  • But right now it's kind of hard to tell which line represents which data, so we can fix it by adding a legend statement.

  • Let's add that after the ylabel statement by writing plt.legend(["this is y", "this is z"]).

  • Note here that this legend function takes a list as an argument.

  • When you run this cell, you should see this legend that says the blue line is "this is y" and the orange line is "this is z". Okay, that's the basics of plotting.
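
A sketch of the two-line plot with a legend follows. The third value of the second list is hard to make out in the recording, so the 0 used here is an assumption, and the name legend is only kept so the result can be inspected.

```python
from matplotlib import pyplot as plt

x = [1, 2, 3]
y = [1, 4, 9]
z = [10, 5, 0]  # assumption: the third value is unclear in the recording

# Each plot() call adds one line to the same chart.
plt.plot(x, y)
plt.plot(x, z)

plt.title("test plot")
plt.xlabel("x")
plt.ylabel("y and z")

# legend() takes a list of labels, matched to the lines in plotting order.
legend = plt.legend(["this is y", "this is z"])

plt.show()
```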

  • Now let's see how to load data from a CSV file.

  • For that, you can just write sample_data = pd.read_csv, so pd, or pandas, dot read_csv.

  • By the way, I just pressed Tab here to autocomplete. And then, in parentheses, "sample_data.csv".

  • Now, before you run this cell, make sure that the notebook file, "Data Visualization with Python.ipynb", is in the same folder as sample_data.csv.

  • When you run this cell, this data, sample_data.csv, is loaded by the pandas module, which we call pd, and then it's assigned to this variable called sample_data.

  • You can check what's inside this variable sample_data just by writing sample_data in this new cell.

  • And then when you run this cell, you should see something like this.

  • So as you can see, this data has three columns, column_a, column_b, and column_c, and five rows. You see a bunch of values inside this table.

  • If you want to check whether this set of data is exactly the same as the original data, you can do so by opening up the original data file, sample_data.csv, with Excel or any other spreadsheet application.

  • When you open it, you should see exactly the same data: column_a, column_b, column_c, with five rows with a bunch of values.

  • Okay, the only difference that you might see is that in Jupyter Notebook, you might see these numbers here, 0, 1, 2, 3, and 4. These are just indices for the rows.

  • You can check what type this variable is by writing type(sample_data), and when you run this cell, it says that this is pandas.core.frame.DataFrame.

  • So this is a DataFrame type that's defined by the pandas module, and that DataFrame type is used to contain a table-like piece of information, just like this one.
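
The loading step can be sketched without the downloaded file by reading the same kind of CSV text from memory. The column values below are the ones read out in the video, but pairing them row by row like this is an assumption, and io.StringIO is only a stand-in for the real file.

```python
import io

import pandas as pd

# Stand-in for sample_data.csv so the sketch runs without the file.
# In the notebook you would write pd.read_csv("sample_data.csv") instead.
csv_text = """column_a,column_b,column_c
1,1,10
2,4,8
3,9,6
4,16,4
5,25,2
"""

sample_data = pd.read_csv(io.StringIO(csv_text))

print(sample_data)        # three columns, five rows, row indices on the left
print(type(sample_data))  # the pandas DataFrame type
```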

  • Okay, now what if you wanted to plot data in this DataFrame, for example, the values of column_a on the x-axis and column_c on the y-axis?

  • Well, to do that, you need to be able to retrieve a specific column. You can do that by writing sample_data.column_c.

  • When you run this cell, you'll see that column_c is retrieved. It has the values 10, 8, 6, 4, 2, and the numbers you see on the left are just indices, 0, 1, 2, 3, and 4, just like before.

  • You can check what type this is by writing type(sample_data.column_c), and when you run the cell, you see that this is pandas.core.series.Series.

  • So this is basically a Series type that's defined by the pandas module, and it's a type that's used to store a series of values.

  • For example, these values: 10, 8, 6, 4, 2.

  • Now what if we wanted to retrieve a specific value out of this Series?

  • Well, if you want to retrieve, for example, the second value here, 8, you can do so by writing sample_data.column_c.iloc[1], and this retrieves the second value of the Series, 8.

  • If you want to retrieve the third value, 6, you can write iloc[2], and that gets the third value.

  • And if you want to retrieve the first value, you can write iloc[0]. This should give us 10, and it does.

  • Okay, and using what we've just learned here, we'll be able to plot the data in this DataFrame.
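
Column access and positional lookup can be tried out on a small DataFrame built by hand. This sketch constructs the table directly instead of reading sample_data.csv, and the names first, second, and third are only there for illustration.

```python
import pandas as pd

# Hand-built copy of the table, using the values read out in the video.
sample_data = pd.DataFrame({
    "column_a": [1, 2, 3, 4, 5],
    "column_b": [1, 4, 9, 16, 25],
    "column_c": [10, 8, 6, 4, 2],
})

# Attribute access returns one column as a pandas Series.
column_c = sample_data.column_c
print(type(column_c))

# iloc looks values up by position, counting from 0.
first = column_c.iloc[0]
second = column_c.iloc[1]
third = column_c.iloc[2]
print(first, second, third)
```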

  • So let's say we want to plot column_a on the x-axis and column_b on the y-axis.

  • We could do that by writing plt.plot(sample_data.column_a, sample_data.column_b), and we can show it by writing plt.show().

  • Let's see how it looks. We have 1, 2, 3, 4, and 5 on the x-axis, and on the y-axis we have 1, 4, 9, 16, and 25, as expected.

  • If you want to add column_c to this graph, you can write plt.plot with sample_data.column_a, so let's use column_a as the x-axis again, and sample_data.column_c.

  • When you run the cell, you see that there are two lines here, just like before.

  • If you want to make this graph a little bit easier to read, you can add titles and a legend.

  • By the way, in this plot function, you can use the third argument to change how the plot looks.

  • So, for example, you can give it "o" in a string as the third argument in the first line, for column_b.

  • When you run this cell, the plot becomes dots instead of just a line, and there's a lot more you can do.

  • You can find more about it in the official documentation.
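
Here is a sketch of plotting two DataFrame columns against a shared x-axis, with the "o" format string on the first line. The table is built by hand so the example runs without sample_data.csv, and the name dots is only there for illustration.

```python
import pandas as pd
from matplotlib import pyplot as plt

sample_data = pd.DataFrame({
    "column_a": [1, 2, 3, 4, 5],
    "column_b": [1, 4, 9, 16, 25],
    "column_c": [10, 8, 6, 4, 2],
})

# The third positional argument is a format string.
# "o" draws circle markers instead of a connecting line.
dots = plt.plot(sample_data.column_a, sample_data.column_b, "o")

# No format string here, so column_c is drawn as an ordinary line.
plt.plot(sample_data.column_a, sample_data.column_c)

plt.show()
```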

  • Anyway, let's move on and do sort of a real analysis with a real data set.

  • Now, for this analysis, we're gonna use this data, countries.csv. It should be in the same folder as well.

  • When you open it, you should see this data.

  • So we have a bunch of countries, a bunch of years ranging from 1952 to 2007 in five-year steps, and the population for each year for each country.

  • You can see that there are a lot of rows in this data.

  • So let's now import that data just like before by writing pd, or pandas, dot read_csv, parentheses, and in single quotes or double quotes, countries.csv.

  • By the way, this is a string, 'countries.csv' in single quotes, and in Python you can use either double quotes or single quotes to express a string.

  • Let's assign that to a new variable called data by writing data = in front.

  • When you run this cell, this data is loaded into data.

  • So once you write data in this new cell and run it, you should be able to see this data in a DataFrame.

  • Now, let's say that the analysis we want to do here is to compare the population growth in the US and China.

  • To do this analysis, the first thing we want to do is isolate the data for the US and China.

  • We can do that for the US by writing us = data[data.country == 'United States'], with United States in single quotes.

  • When you run this cell, us now only contains the data for the United States.

  • So let's break down this statement a little bit more. Let's click Insert here and Insert Cell Below.

  • When you write data.country == 'United States', this actually gives a Series of a bunch of Trues and Falses. When the row is not the US, this gives us False, and when it is the US, it gives us True.

  • We don't see any Trues here, but there are a bunch of Trues where the rows are for the US.

  • And then when you write data, square brackets, this Series of a bunch of Trues and Falses, this gives us the portion of the data where the value of the Series is True.

  • That's the data for the US, as you can see here, and then we just assigned it to this variable called us.

  • Okay, let's now do the same thing for China by writing china = data[data.country == 'China'].

  • When you write china here and run this cell, you should only see the data for China.

  • Using these two variables, us and china, we'll be able to compare their population growth.
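
The filtering step can be sketched on a tiny table. The numbers below are made-up placeholders rather than values from countries.csv, and the intermediate name mask is only introduced to show the True/False Series on its own.

```python
import pandas as pd

# Made-up placeholder rows, not the real contents of countries.csv.
data = pd.DataFrame({
    "country": ["China", "China", "United States", "United States"],
    "year": [1952, 1957, 1952, 1957],
    "population": [500, 600, 100, 110],
})

# Comparing a column to a value gives a Series of True/False, one per row.
mask = data.country == "United States"
print(mask)

# Indexing the DataFrame with that Series keeps only the True rows.
us = data[mask]
china = data[data.country == "China"]
print(us)
print(china)
```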

  • So let's first plot the US's population here by writing plt.plot(us.year, us.population). You can show this plot with plt.show().

  • When you run this cell, you should see this graph. You see that us.year is plotted on the x-axis and us.population is plotted on the y-axis.

  • But you see this scientific notation thing, 1e8, because the numbers are so big.

  • So let's divide the whole population, each number in the series, by one million, or 10 to the power of 6. That's 10 ** 6 in Python.

  • When you run the cell again, you now see the population in millions.

  • So this is 160 million, and it goes up to, I think, more than 300 million in 2007.
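
Dividing a whole Series by one million could look like this. The figures are hypothetical round numbers standing in for the US rows of countries.csv, and population_in_millions is a name added only for readability.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical round numbers, not the real values in countries.csv.
us = pd.DataFrame({
    "year": [1952, 1957, 1962],
    "population": [160_000_000, 170_000_000, 180_000_000],
})

# Dividing a Series by a number divides every element in it.
population_in_millions = us.population / 10 ** 6

plt.plot(us.year, population_in_millions)
plt.show()
```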

  • Let's plot China's data on top of this plot by writing plt.plot with china.year. Actually, you could use us.year or china.year, because we have exactly the same set of years for both.

  • Let's just use china.year for the x-axis and china.population for the y-axis, and we're gonna divide this by one million as well to make the population show in millions.

  • When you run this cell, you should see these two lines.

  • Let's add a legend and titles here to make this graph easier to read, so plt.legend, parentheses, square brackets, 'United States' and 'China'.

  • For the x label, plt.xlabel should be just 'year', and plt.ylabel should be 'population'.

  • Run this cell again, and this graph is much easier to read. You can see that China's population started out much larger than the US's in 1952, and it seems like it's growing faster as well.
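
A sketch of the combined chart with a legend and axis labels follows. Both tables hold hypothetical placeholder numbers, not the real data set, and the name legend is only kept so the labels can be inspected.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical placeholder figures, not the real values in countries.csv.
us = pd.DataFrame({
    "year": [1952, 1957],
    "population": [160_000_000, 170_000_000],
})
china = pd.DataFrame({
    "year": [1952, 1957],
    "population": [550_000_000, 640_000_000],
})

# Both lines share one chart; populations are shown in millions.
plt.plot(us.year, us.population / 10 ** 6)
plt.plot(china.year, china.population / 10 ** 6)

# Legend labels follow the order of the plot() calls.
legend = plt.legend(["United States", "China"])
plt.xlabel("year")
plt.ylabel("population")

plt.show()
```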

  • Now, what if you wanted to compare, instead of the absolute amount that you see here, the percentage growth from the first year that we have in our data, 1952?

  • Well, there are several different ways of doing this, but I'm going to show you just one way.

  • So to do that, let's first copy this whole block of code over here.

  • Let's say that for each country we want to find the percentage growth from the first year.

  • So we want to set the first year's amount to 100, as in 100%, and show the rest of the data as a percentage relative to the first year. We can do that by dividing this whole Series, for example us.population, by the first year's population, and then multiplying everything by 100.

  • So to show you what I mean, let's just create a new cell above by clicking Insert Cell Above.

  • Here, first, I'm gonna write us.population, and you see a Series of populations here for each year. The first row you see here is the first year's population, or the population in 1952.

  • Let's insert a new cell below here. Now, to retrieve the first year's population, you can just write us.population.iloc[0], and this gives us the first year's population, which is this amount.

  • Then we can divide the whole population, this whole Series, by the first year's population just by writing us.population / us.population.iloc[0].

  • This gives us this Series. As you can see, the first year is set to 1, and the rest of the years are shown in relative amounts.

  • If you multiply everything by 100, just by writing * 100 here, you'll be able to show everything in percentage amounts.

  • You can see that the first year is shown as 100%, and from 1952 to 2007, which is the last year we have, the population grew by 90%.
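
The normalization can be checked on a few numbers. The populations here are hypothetical, chosen so the last year lands at 190% of the first, and the row labels 10 to 12 imitate what is left over after filtering a larger table.

```python
import pandas as pd

# Hypothetical figures; the index mimics rows filtered out of a bigger table.
us = pd.DataFrame(
    {"year": [1952, 1957, 2007], "population": [100, 120, 190]},
    index=[10, 11, 12],
)

# iloc[0] is the first row by position, whatever its index label is.
first_year_population = us.population.iloc[0]

# Relative size: the first year becomes 1.0.
relative = us.population / first_year_population

# As a percentage: the first year becomes 100.
percent = relative * 100
print(percent)
```

Using iloc matters here because a filtered DataFrame keeps its original row labels, so the first row is usually not labelled 0.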

  • Like I said earlier, this is not the only method to show the relative growth in population, but I chose this method here because it's pretty simple to implement.

  • Anyway, let's copy this whole thing and paste it over here to replace the y-axis.

  • Let's do the same thing for China as well.

  • So copy the whole thing for China here, and then replace us with china.
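
With the y-axis expressions swapped in, the chart of relative growth might look roughly like this. The data is a hypothetical placeholder, and the y label text and the names us_line and china_line are assumptions rather than something taken from the video.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical placeholder figures, not the real values in countries.csv.
us = pd.DataFrame({"year": [1952, 1957], "population": [100, 110]})
china = pd.DataFrame({"year": [1952, 1957], "population": [500, 600]})

# Each country is divided by its own first-year population,
# so both lines start at 100 and can be compared directly.
us_line = plt.plot(us.year, us.population / us.population.iloc[0] * 100)
china_line = plt.plot(
    china.year, china.population / china.population.iloc[0] * 100
)

plt.legend(["United States", "China"])
plt.xlabel("year")
plt.ylabel("population growth (first year = 100)")  # assumed label text

plt.show()
```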