Subtitles section Play video Print subtitles DAVID MALAN: We've seen how with languages like Python can we implement business logic on a server and, even if we want, generate web pages. And we've seen, on the browser side, the so-called client side, how you can render information or data to users. And with CSS can we style it, and with JavaScript can we even make it interactive. But when building a business or any application that is interacting with users, where is all that data being stored? Well, if you're running your own small business, you might simply be using a spreadsheet. Of course, spreadsheets, whether they're implemented as Microsoft Excel, or Google Spreadsheets, or Apple Numbers, generally stores data in rows and columns. And if you're like me, you probably use that first row to represent the names of the columns and the types of the data that you're storing. And as you continue to add more and more data to that spreadsheet, you probably, like me, continue to add row after row after row. And meanwhile, if you've got so much data or so many different types of data that it doesn't really belong in one sheet, you might actually create a second sheet or a tab along the bottom of the screen, for instance, in which you can store additional data as well. So indeed, Microsoft Excel allows you to store data relationally, so to speak. You might have one sheet with customer info, another sheet with invoices, another sheet with products, or any number of other types of data. And you can relate those to another by somehow having some commonality among them, some identifier, some name, some customer ID or the like. And so data that tends to be stored in rows and columns can have these relationships. And you can do this certainly in the cloud these days as well with Google Docs, simply a web-based version of the same. But when designing your spreadsheets, or more generally, your data's schema, the design thereof, there's a number of questions you need to consider. What data do you want to store? And where do you want to store it? And what type of data is it, in fact? Because whether using Excel, or Numbers, or Google Spreadsheets, odds are you sometimes configure the columns to be displayed or formatted in some ways. Numbers might be displayed with or without commas. Numbers might be displayed with dollar signs or some other symbol. You might separate one field from another using special symbols or punctuation. So these are all configuration options in a spreadsheet that might be among the first decisions you make. In fact, let's consider a sample spreadsheet wherein I might want to represent users in some web application. All of us have, of course, registered for websites or applications these days. And what are some of the questions you're asked? Well, you might be asked for your user name by some application. You might be asked for your actual name. What else might you be asked for? Perhaps your email address. And perhaps if you're buying something, or registering for something, or someone that needs to know a bit more about you, perhaps they'll even ask you for your address, maybe your phone number, or perhaps even your age. So you can certainly imagine there being even other fields that you need. But let's consider now what types of data each of these fields is. Now, what should a user name be? Well, by definition, this should be a unique value that belongs to you and only you on a system. For instance, if I'm registering for a website and that website is storing some data in, for instance, a spreadsheet, I might ideally try to claim Malan if it's available. My name meanwhile, of course, will be just David Malan, which someone else can certainly have as well. Hopefully my email address is only mine. And that, too, will be stored in a column of its own. And then my address, for instance, here on campus, 33 Oxford Street, Cambridge, Massachusetts, 02138. A phone number by which I can be reached, 617-495-5000. And then my age shall be-- well, we'll leave that blank just for now. So notice how each of these pieces of data is just a little bit different. Some of them seem to be numeric. Some of them seem to be alphabetic. Some of them even have some punctuation therein. Now, you might or might not in a spreadsheet care to display this data a bit differently. Indeed, the only number that I'm sweeping under the rug here is, in fact, my age. But hopefully I don't need one or more commas in that value. So there isn't really any need for special formatting here. But it turns out that when you're actually building a website or software-based application, you're probably not storing your data ultimately in just spreadsheets. You might graduate eventually from being a small shop to needing more than just Microsoft Excel, or Numbers, or even something cloud-based like Google Spreadsheets. Why? Well, you have more rows than those programs can generally handle. In fact, on your own Mac or PC, odds are, when opening up big files, whether a spreadsheet or any other document, sometimes you might actually feel that. The computer might start to slow, and you might start to see a spinning beach ball or hourglass because you start to hit the limits of what a local computer can do with just client-side software, like a spreadsheet tool. And so eventually you might actually need to use a proper database. And a database is really just a piece of software that can absolutely run on your Mac or PC. But very commonly it runs on a server or somewhere else in the cloud to which your own software connects. For instance, if you're building a website or an application in a language like Python, you can, in Python, write code that talks or communicates with that database to pull data down and to send data back up. But spreadsheets are wonderfully straightforward. It's just so intuitive how you store the data in rows and columns and just more and more rows as you have more and more data. And so what's nice about some databases is that they mimic exactly this design. There exists in the world what are called relational databases. And indeed, this is among the most common ways to store data relationally. The jargon is a bit different in the world of databases. You no longer call these things spreadsheets. We call them databases. And we don't call the individual tabs sheets. We call them tables. But we continue to call the structure of the data therein rows for each additional piece of data and columns for the different types of data that we have. But in a relational database, such as Oracle, Microsoft Access, SQL Server, My SQL, Postgres, or something smaller and lighter weight called SQLlite, the burden is on you, the designer of the database, or the programmer, or the business person to actually decide on what types of data you are going to store. Because the database, in order to handle more and more data than a typical spreadsheet can support, needs a bit of help from you, needs to know what type of data you're storing so that it can search it more efficiently and sort it more effectively and make it easier for you ultimately to add and remove data from that particular database. So in other words, you can do all of the same operations. But generally, in a database, you have so much more data, you're going to need to have the database help you help yourself. So what does that actually mean here? Well, let's consider some of these fields here. Were we to migrate my spreadsheet with many, many, many rows of users, for instance, to a proper database relationally, I'm going to have to make a few more decisions as well. And it turns out that I need to choose various types for these columns. And generally, in a database, you only have a limited menu of options. So let's take a look at what those might be. Here in many relational databases are just some of the data types that are available to you; an integer if you want to represent something like 1, 2, 3, or perhaps even a negative, a real number, otherwise known as a floating point value, for instance, in Python that actually has a decimal point and perhaps some numbers thereafter, numeric, which is more of a catch-all and might handle things like dates and times that are numbers in some sense but have some more formal structure to them, and then the more general text, when you just have words or phrases, characters or whole paragraphs or more that you might want to store as well in some column. And then lastly, you might have the cutely named BLOB, or binary large object, when you actually want to store zeros and ones, that is to say binary data, like actual files in your database. However, you needn't do this typically. You can actually store files, of course, on a file system on your own hard drive somewhere on the server, but that option exists for you. Now, this happens to be the list of data types supported by the simplest of relational databases, something called SQLite. And as its name implies, it actually supports a very specific language