Placeholder Image

Subtitles section Play video

  • DAVID MALAN: We've seen how with languages like Python

  • can we implement business logic on a server and, even if we want,

  • generate web pages.

  • And we've seen, on the browser side, the so-called client side,

  • how you can render information or data to users.

  • And with CSS can we style it, and with JavaScript can

  • we even make it interactive.

  • But when building a business or any application that

  • is interacting with users, where is all that data being stored?

  • Well, if you're running your own small business,

  • you might simply be using a spreadsheet.

  • Of course, spreadsheets, whether they're implemented

  • as Microsoft Excel, or Google Spreadsheets, or Apple Numbers,

  • generally stores data in rows and columns.

  • And if you're like me, you probably use that first row

  • to represent the names of the columns and the types of the data

  • that you're storing.

  • And as you continue to add more and more data to that spreadsheet,

  • you probably, like me, continue to add row after row after row.

  • And meanwhile, if you've got so much data or so many different types of data

  • that it doesn't really belong in one sheet,

  • you might actually create a second sheet or a tab

  • along the bottom of the screen, for instance, in which you

  • can store additional data as well.

  • So indeed, Microsoft Excel allows you to store data relationally, so to speak.

  • You might have one sheet with customer info, another sheet with invoices,

  • another sheet with products, or any number of other types of data.

  • And you can relate those to another by somehow having some commonality

  • among them, some identifier, some name, some customer ID or the like.

  • And so data that tends to be stored in rows and columns

  • can have these relationships.

  • And you can do this certainly in the cloud these days

  • as well with Google Docs, simply a web-based version of the same.

  • But when designing your spreadsheets, or more

  • generally, your data's schema, the design thereof,

  • there's a number of questions you need to consider.

  • What data do you want to store?

  • And where do you want to store it?

  • And what type of data is it, in fact?

  • Because whether using Excel, or Numbers, or Google Spreadsheets,

  • odds are you sometimes configure the columns to be displayed or formatted

  • in some ways.

  • Numbers might be displayed with or without commas.

  • Numbers might be displayed with dollar signs or some other symbol.

  • You might separate one field from another

  • using special symbols or punctuation.

  • So these are all configuration options in a spreadsheet that might

  • be among the first decisions you make.

  • In fact, let's consider a sample spreadsheet

  • wherein I might want to represent users in some web application.

  • All of us have, of course, registered for websites or applications

  • these days.

  • And what are some of the questions you're asked?

  • Well, you might be asked for your user name by some application.

  • You might be asked for your actual name.

  • What else might you be asked for?

  • Perhaps your email address.

  • And perhaps if you're buying something, or registering

  • for something, or someone that needs to know a bit more about you,

  • perhaps they'll even ask you for your address, maybe your phone number,

  • or perhaps even your age.

  • So you can certainly imagine there being even other fields that you need.

  • But let's consider now what types of data each of these fields is.

  • Now, what should a user name be?

  • Well, by definition, this should be a unique value that belongs to you

  • and only you on a system.

  • For instance, if I'm registering for a website

  • and that website is storing some data in, for instance, a spreadsheet,

  • I might ideally try to claim Malan if it's available.

  • My name meanwhile, of course, will be just

  • David Malan, which someone else can certainly have as well.

  • Hopefully my email address is only mine.

  • And that, too, will be stored in a column of its own.

  • And then my address, for instance, here on campus, 33 Oxford Street, Cambridge,

  • Massachusetts, 02138.

  • A phone number by which I can be reached, 617-495-5000.

  • And then my age shall be--

  • well, we'll leave that blank just for now.

  • So notice how each of these pieces of data is just a little bit different.

  • Some of them seem to be numeric.

  • Some of them seem to be alphabetic.

  • Some of them even have some punctuation therein.

  • Now, you might or might not in a spreadsheet

  • care to display this data a bit differently.

  • Indeed, the only number that I'm sweeping under the rug

  • here is, in fact, my age.

  • But hopefully I don't need one or more commas in that value.

  • So there isn't really any need for special formatting here.

  • But it turns out that when you're actually

  • building a website or software-based application,

  • you're probably not storing your data ultimately in just spreadsheets.

  • You might graduate eventually from being a small shop to needing more than just

  • Microsoft Excel, or Numbers, or even something

  • cloud-based like Google Spreadsheets.

  • Why?

  • Well, you have more rows than those programs can generally handle.

  • In fact, on your own Mac or PC, odds are,

  • when opening up big files, whether a spreadsheet or any other document,

  • sometimes you might actually feel that.

  • The computer might start to slow, and you

  • might start to see a spinning beach ball or hourglass

  • because you start to hit the limits of what

  • a local computer can do with just client-side software,

  • like a spreadsheet tool.

  • And so eventually you might actually need to use a proper database.

  • And a database is really just a piece of software

  • that can absolutely run on your Mac or PC.

  • But very commonly it runs on a server or somewhere else

  • in the cloud to which your own software connects.

  • For instance, if you're building a website or an application

  • in a language like Python, you can, in Python,

  • write code that talks or communicates with that database to pull data down

  • and to send data back up.

  • But spreadsheets are wonderfully straightforward.

  • It's just so intuitive how you store the data in rows and columns

  • and just more and more rows as you have more and more data.

  • And so what's nice about some databases is that they mimic exactly this design.

  • There exists in the world what are called relational databases.

  • And indeed, this is among the most common ways to store data relationally.

  • The jargon is a bit different in the world of databases.

  • You no longer call these things spreadsheets.

  • We call them databases.

  • And we don't call the individual tabs sheets.

  • We call them tables.

  • But we continue to call the structure of the data therein

  • rows for each additional piece of data and columns

  • for the different types of data that we have.

  • But in a relational database, such as Oracle, Microsoft Access, SQL Server,

  • My SQL, Postgres, or something smaller and lighter weight called SQLlite,

  • the burden is on you, the designer of the database,

  • or the programmer, or the business person

  • to actually decide on what types of data you are going to store.

  • Because the database, in order to handle more and more data

  • than a typical spreadsheet can support, needs a bit of help from you,

  • needs to know what type of data you're storing so that it can search it

  • more efficiently and sort it more effectively

  • and make it easier for you ultimately to add and remove data

  • from that particular database.

  • So in other words, you can do all of the same operations.

  • But generally, in a database, you have so much more data,

  • you're going to need to have the database help you help yourself.

  • So what does that actually mean here?

  • Well, let's consider some of these fields here.

  • Were we to migrate my spreadsheet with many, many, many rows of users,

  • for instance, to a proper database relationally,

  • I'm going to have to make a few more decisions as well.

  • And it turns out that I need to choose various types for these columns.

  • And generally, in a database, you only have a limited menu of options.

  • So let's take a look at what those might be.

  • Here in many relational databases are just

  • some of the data types that are available to you;

  • an integer if you want to represent something like 1, 2, 3, or perhaps

  • even a negative, a real number, otherwise known as a floating point

  • value, for instance, in Python that actually

  • has a decimal point and perhaps some numbers thereafter,

  • numeric, which is more of a catch-all and might handle things like dates

  • and times that are numbers in some sense but have some more

  • formal structure to them, and then the more general text, when you just

  • have words or phrases, characters or whole paragraphs

  • or more that you might want to store as well in some column.

  • And then lastly, you might have the cutely named BLOB,

  • or binary large object, when you actually want to store zeros and ones,

  • that is to say binary data, like actual files in your database.

  • However, you needn't do this typically.

  • You can actually store files, of course, on a file system on your own hard drive

  • somewhere on the server, but that option exists for you.

  • Now, this happens to be the list of data types supported

  • by the simplest of relational databases, something called SQLite.

  • And as its name implies, it actually supports a very specific language