  • In a previous video, I've talked about linear systems of equations, and I sort of brushed

  • aside the discussion of actually computing solutions to these systems.

  • And while it's true that number-crunching is something we typically leave to the computers,

  • digging into some of these computational methods is a good litmus test for whether or not you

  • actually understand what's going on, since this is really where the rubber meets the

  • road.

  • Here I want to describe the geometry behind a certain method for computing solutions to

  • these systems, known as Cramer's rule.

  • The relevant background needed here is an understanding of determinants, dot products,

  • and of linear systems of equations, so be sure to watch the relevant videos on those

  • topics if you're unfamiliar or rusty.

  • But first!

  • I should say up front that Cramer's rule is not the best way to compute solutions

  • to linear systems of equations.

  • Gaussian elimination, for example, will always be faster.

  • So why learn it?

  • Think of this as a sort of cultural excursion; it's a helpful exercise in deepening your

  • knowledge of the theory of these systems.

  • Wrapping your mind around this concept will help consolidate ideas from linear algebra,

  • like the determinant and linear systems, by seeing how they relate to each other.

  • Also, from a purely artistic standpoint, the ultimate result is just really pretty to think

  • about, much more so than Gaussian elimination.

  • Alright, so the setup here will be some linear system of equations, say with two unknowns,

  • x and y, and two equations.

  • In principle, everything we're talking about will work for systems with a larger number of

  • unknowns, and the same number of equations.

  • But for simplicity, a smaller example is nicer to hold in our heads.

  • So as I talked about in a previous video, you can think of this setup geometrically

  • as a certain known matrix transforming an unknown vector, [x; y], where you know what

  • the output is going to be, in this case [-4; -2].

  • Remember, the columns of this matrix tell you how the matrix acts as a transform, each

  • one telling you where the basis vectors of the input space land.

  • So this is a sort of puzzle: what input [x; y] is going to give you this

  • output [-4; -2]?

  • Remember, the type of answer you get here can depend on

  • whether or not the transformation squishes all of space into a lower dimension.

  • That is, if it has zero determinant.

  • In that case, either none of the inputs land on our given output or there are a whole bunch

  • of inputs landing on that output.

  • But for this video we'll limit our view to the case of a non-zero determinant, meaning

  • the output of this transformation still spans the full n-dimensional space it started in;

  • every input lands on one and only one output and every output has one and only one input.

  • One way to think about our puzzle is that we know the given output vector is some linear

  • combination of the columns of the matrix; x*(the vector where i-hat lands) + y*(the

  • vector where j-hat lands), but we wish to compute what exactly x and y are.
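
To make the "linear combination of the columns" view concrete, here is a minimal sketch. The transcript does not give the matrix entries, so the columns and the input below are hypothetical numbers chosen only for illustration.

```python
# Hypothetical columns (not taken from the video):
col_i = (2.0, 0.0)    # where i-hat lands (first column of the matrix)
col_j = (-1.0, 1.0)   # where j-hat lands (second column of the matrix)

def transform(x, y):
    """Return x*(where i-hat lands) + y*(where j-hat lands)."""
    return (x * col_i[0] + y * col_j[0],
            x * col_i[1] + y * col_j[1])

# Going forward is easy: pick an input and read off the output.
output = transform(3.0, 2.0)
print(output)  # (4.0, 2.0)
```

The puzzle in a linear system runs the other way: the output and the columns are known, and x and y are what we want.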

  • As a first pass, let me show an idea that is wrong, but in the right direction.

  • The x-coordinate of this mystery input vector is what you get by taking its dot product

  • with the first basis vector, [1; 0].

  • Likewise, the y-coordinate is what you get by dotting it with the second basis vector,

  • [0; 1].

  • So maybe you hope that after the transformation, the dot products of the transformed version

  • of the mystery vector with the transformed versions of the basis vectors will also be

  • these coordinates x and y.

  • That'd be fantastic because we know the transformed versions of each of these vectors.

  • There's just one problem with this: it's not at all true!

  • For most linear transformations, the dot product before and after the transformation will be

  • very different.

  • For example, you could have two vectors generally pointing in the same direction, with a positive

  • dot product, which get pulled away from each other during the transformation, in such a

  • way that they then have a negative dot product.

  • Likewise, if things start off perpendicular, with dot product zero, like the two basis

  • vectors, there's no guarantee that they will stay perpendicular after the transformation,

  • preserving that zero dot product.

  • In the example we were looking at, dot products certainly aren't preserved.

  • They tend to get bigger since most vectors are getting stretched.
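
A quick numerical check of why the first-pass idea fails, as a sketch with a hypothetical matrix and input vector (the transcript does not list the on-screen numbers):

```python
def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def apply(matrix, v):
    """Multiply a 2x2 matrix (given as two rows) by a 2D vector."""
    (a, b), (c, d) = matrix
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

A = ((2.0, -1.0), (0.0, 1.0))   # hypothetical; columns are (2, 0) and (-1, 1)
i_hat, j_hat = (1.0, 0.0), (0.0, 1.0)
v = (3.0, 2.0)                  # hypothetical "mystery" input

x_before = dot(v, i_hat)                              # 3.0, the x-coordinate
x_after = dot(apply(A, v), apply(A, i_hat))           # 8.0, not the x-coordinate
basis_before = dot(i_hat, j_hat)                      # 0.0, perpendicular
basis_after = dot(apply(A, i_hat), apply(A, j_hat))   # -2.0, no longer perpendicular
```

For this particular matrix, dotting the output with the first column gives 8 rather than 3, and the transformed basis vectors are no longer perpendicular.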

  • In fact, transformations which do preserve dot products are special enough to have their

  • own name: Orthonormal transformations.

  • These are the ones which leave all the basis vectors perpendicular to each other with unit

  • lengths.

  • You often think of these as rotation matrices.

  • They correspond to rigid motion, with no stretching, squishing or morphing.

  • Solving a linear system with an orthonormal matrix is very easy: Since dot products are

  • preserved, taking the dot product between the output vector and all the columns of your

  • matrix will be the same as taking the dot products between the input vector and all

  • the basis vectors, which is the same as finding the coordinates of the input vector.

  • So, in that very special case, x would be the dot product of the first column with the

  • output vector, and y would be the dot product of the second column with the output vector.
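
Here is a small sketch of that special case, assuming a plain rotation matrix. The angle and the input coordinates are arbitrary choices for illustration.

```python
import math

theta = 0.7  # arbitrary rotation angle in radians (hypothetical)
col1 = (math.cos(theta), math.sin(theta))     # where i-hat lands
col2 = (-math.sin(theta), math.cos(theta))    # where j-hat lands

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# Build an output from a known input so the recovery can be checked.
x_true, y_true = 3.0, 2.0
output = (x_true * col1[0] + y_true * col2[0],
          x_true * col1[1] + y_true * col2[1])

# Because the columns are perpendicular unit vectors, dot products recover x and y.
x = dot(col1, output)
y = dot(col2, output)
print(math.isclose(x, x_true), math.isclose(y, y_true))  # True True
```

The comparison uses math.isclose because the sines and cosines introduce floating-point rounding.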

  • Now, even though this idea breaks down for most linear systems, it points us in the direction

  • of something to look for: Is there an alternate geometric understanding for the coordinates

  • of our input vector which remains unchanged after the transformation?

  • If your mind has been mulling over determinants, you might think of this clever idea: Take

  • the parallelogram defined by the first basis vector, i-hat, and the mystery input vector

  • [x; y].

  • The area of this parallelogram is its base, 1, times the height perpendicular to that

  • base, which is the y-coordinate of our input vector.

  • So, the area of this parallelogram is sort of a screwy roundabout way to describe the

  • vector's y-coordinate; it's a wacky way to talk about coordinates, but run with me.

  • Actually, to be more accurate, you should think of the signed area of this parallelogram,

  • in the sense described by the determinant video.

  • That way, a vector with negative y-coordinate would correspond to a negative area for this

  • parallelogram.

  • Symmetrically, if you look at the parallelogram spanned by the vector

  • and the second basis vector, j-hat, its area will be the x-coordinate of the vector.

  • Again, it's a strange way to represent the x-coordinate, but you'll see what it buys

  • us in a moment.
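
The "coordinates as signed areas" idea can be sketched with a 2x2 determinant. The helper name signed_area and the example vector are illustrative choices, not something from the video.

```python
def signed_area(u, v):
    """Signed area of the parallelogram spanned by u, then v.

    This is the determinant of the 2x2 matrix whose columns are u and v.
    """
    return u[0] * v[1] - u[1] * v[0]

i_hat, j_hat = (1.0, 0.0), (0.0, 1.0)
v = (3.0, -2.0)  # hypothetical vector with a negative y-coordinate

y = signed_area(i_hat, v)   # -2.0: i-hat first, then the vector
x = signed_area(v, j_hat)   # 3.0: the vector first, then j-hat
```

The order of the two vectors matters: swapping them flips the sign, which is why i-hat is listed first for y and j-hat is listed second for x.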

  • Here's what this would look like in three-dimensions: Ordinarily the way you might think of one

  • of a vector's coordinates, say its z-coordinate, would be to take its dot product with the

  • third standard basis vector, k-hat.

  • But instead, consider the parallelepiped it creates with the other two basis vectors,

  • i-hat and j-hat.

  • If you think of the square with area 1 spanned by i-hat and j-hat as the base of this guy,

  • its volume is the same as its height, which is the third coordinate of our vector.

  • Likewise, the wacky way to think about any other coordinate of this vector is to form

  • the parallelepiped between this vector and all the basis vectors other than the one you're

  • looking for, and get its volume.

  • Or, rather, we should talk about the signed volume of these parallelepipeds, in the sense

  • described in the determinant video, where the order in which you list the three vectors

  • matters and you're using the right-hand rule.

  • That way negative coordinates still make sense.
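
The three-dimensional version can be sketched the same way, with a 3x3 determinant as the signed volume. The function name det3 and the sample vector are assumptions made for this example. Each coordinate comes from putting the vector in the slot of the basis vector being left out.

```python
def det3(u, v, w):
    """Determinant of the 3x3 matrix whose columns are u, v, w (signed volume)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

i_hat, j_hat, k_hat = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
vec = (3.0, -2.0, 5.0)  # hypothetical vector

x = det3(vec, j_hat, k_hat)   # 3.0: vector takes the place of i-hat
y = det3(i_hat, vec, k_hat)   # -2.0: vector takes the place of j-hat
z = det3(i_hat, j_hat, vec)   # 5.0: vector takes the place of k-hat
```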

  • Okay, so why think of coordinates as areas and volumes like this?

  • As you apply some matrix transformation, the areas of the parallelograms don't stay the

  • same, they may get scaled up or down.

  • But(!), and this is a key idea of determinants, all these areas get scaled by the same amount.

  • Namely, the determinant of our transformation matrix.

  • For example, if you look at the parallelogram spanned by the vector where your first basis

  • vector lands, which is the first column of the matrix, and the transformed version of

  • [x; y], what is its area?

  • Well, this is the transformed version of that parallelogram we were looking at earlier,

  • whose area was the y-coordinate of the mystery input vector.

  • So its area will be the determinant of the transformation multiplied by that value.
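
As a sanity check on the scaling claim, this sketch compares the parallelogram's signed area before and after a transformation. The matrix columns and the input vector are hypothetical numbers, not values quoted in the transcript.

```python
def signed_area(u, v):
    """Determinant of the 2x2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]

col_i, col_j = (2.0, 0.0), (-1.0, 1.0)   # hypothetical columns of the matrix

def transform(p):
    """Send p = (x, y) to x*col_i + y*col_j."""
    return (p[0] * col_i[0] + p[1] * col_j[0],
            p[0] * col_i[1] + p[1] * col_j[1])

i_hat = (1.0, 0.0)
v = (3.0, 2.0)   # hypothetical input vector

det_A = signed_area(col_i, col_j)                           # 2.0
area_before = signed_area(i_hat, v)                         # 2.0, the y-coordinate
area_after = signed_area(transform(i_hat), transform(v))    # 4.0

# The area after the transformation is det(A) times the area before it.
print(area_after == det_A * area_before)  # True
```

Dividing area_after by det_A gives back 2.0, the y-coordinate of the input.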

  • So, the y-coordinate of our mystery input vector is the area of this parallelogram,

  • spanned by the first column of the matrix and the output vector, divided by the determinant

  • of the full transformation.

  • And how do you get this area?

  • Well, we know the coordinates for where the mystery input vector lands, that's the whole

  • point of a linear system of equations.

  • So, create a matrix whose first column is the same as that of our matrix, and whose

  • second column is the output vector, and take its determinant.

  • So look at that; just using data from the output of the transformation, namely the columns

  • of the matrix and the coordinates of our output vector, we can recover the y-coordinate of

  • our mystery input vector.

  • Likewise, the same idea can get you the x-coordinate.

  • Look at that parallelogram we defined earlier which encodes the x-coordinate of the mystery

  • input vector, spanned by the input vector and j-hat.

  • The transformed version of this guy is spanned by the output vector and the second column

  • of the matrix, and its area will have been multiplied by the determinant of the matrix.

  • So the x-coordinate of our mystery input vector is this area divided by the determinant of

  • the transformation.

  • Symmetric to what we did before, you can compute the area of that output parallelogram by creating

  • a new matrix whose first column is the output vector, and whose second column is the same

  • as the original matrix.

  • So again, just using data from the output space, the numbers we see in our original

  • linear system, we can recover the x-coordinate of our mystery input vector.

  • This formula for finding the solutions to a linear system of equations is known as Cramer's

  • rule.
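
Written out for the two-by-two case, the rule looks like the following. The symbols are notation introduced here rather than in the video: a_{ij} are the entries of the matrix and (v_1, v_2) is the known output vector.

```latex
\[
x = \frac{\det\begin{bmatrix} v_1 & a_{12} \\ v_2 & a_{22} \end{bmatrix}}
         {\det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}},
\qquad
y = \frac{\det\begin{bmatrix} a_{11} & v_1 \\ a_{21} & v_2 \end{bmatrix}}
         {\det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}},
\qquad
\text{provided } \det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \neq 0.
\]
```

In each numerator, the output vector replaces the column that belongs to the coordinate being solved for.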

  • Just to sanity check ourselves, let's plug in the numbers here.

  • The determinant of that top altered matrix is 4+2, which is 6, and the bottom determinant

  • is 2, so the x-coordinate should be 3.

  • And indeed, looking back at that input vector we started with, its x-coordinate is 3.

  • Likewise, Cramer's rule suggests the y-coordinate should be 4/2, or 2, and that is indeed the

  • y-coordinate of the input vector we started with here.
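
The same check can be run in code. The transcript does not show the on-screen system, so the sketch below assumes one system that is consistent with the numbers quoted above: 2x - y = 4 and 0x + y = 2. The function name cramer_2x2 is an illustrative choice.

```python
def det2(col1, col2):
    """Determinant of the 2x2 matrix with the given columns."""
    return col1[0] * col2[1] - col1[1] * col2[0]

def cramer_2x2(col1, col2, output):
    """Solve x*col1 + y*col2 = output with Cramer's rule (2x2 case only)."""
    d = det2(col1, col2)
    if d == 0:
        raise ValueError("zero determinant: no unique solution")
    x = det2(output, col2) / d   # output replaces the first column
    y = det2(col1, output) / d   # output replaces the second column
    return x, y

# Assumed system (not confirmed by the transcript): 2x - y = 4, 0x + y = 2
col1, col2, output = (2.0, 0.0), (-1.0, 1.0), (4.0, 2.0)
x, y = cramer_2x2(col1, col2, output)
print(x, y)  # 3.0 2.0
```

With these assumed numbers the top determinant for x is 4 + 2 = 6 and the bottom one is 2, matching the arithmetic in the narration.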

  • The case with three dimensions is similar, and I highly recommend you pause to think

  • it through yourself.

  • Here, I'll give you a little momentum.

  • We have this known transformation, given by a 3x3 matrix, and a known output vector, given

  • by the right side of our linear system, and we want to know what input vector lands on

  • this output vector.

  • If you think of, say, the z-coordinate of the input vector as the volume of this parallelepiped

  • spanned by i-hat, j-hat, and the mystery input vector, what happens to the volume of this

  • parallelepiped after the transformation?

  • How can you compute that new volume?

  • Really, pause and take a moment to think through the details of generalizing this to higher

  • dimensions; finding an expression for each coordinate of the solution to larger linear