In these first few videos, I want to lay the foundation of some core concepts you will
need throughout these video tutorials. Let's get started with the basics. So what is a
data structure? One definition that I really like is: a data structure is a way of organizing
data so that it can be used efficiently. And that's all a data structure really is: it is
a way of organizing data in some fashion so that later on it can be accessed, queried,
or perhaps even updated quickly and easily. So why data structures? Why are they important?
Well, they are essential ingredients in creating fast and powerful algorithms. Another good
reason might be that they help us manage and organize our data in a very natural way. And
this last point is more of my own making, and it's that they make code cleaner and easier
to understand. As a side note, one of the major distinctions that I have noticed between
bad, mediocre, and excellent programmers is that the ones who really excel are the ones who
fundamentally understand how and when to use the appropriate data structure for the task
they're trying to finish. Data structures can make the difference between having an
okay product and an outstanding one. It's no wonder that every computer science undergraduate
student is required to take a course in data structures. It may seem strange that before we even
begin talking about data structures, we need to talk about the abstraction of data
structures. What I'm talking about is the concept of an abstract data type. What is
an abstract data type, and how does it differ from a data structure? Well, the answer is
that an abstract data type is an abstraction of a data structure which provides only the
interface to which that data structure must adhere. The interface does not give any
specific details on how
a specific data structure should be implemented, or in what programming language. One particular
example I like to give is to suppose that your abstract data type is for a mode of transportation
to get from point A to point B. Well, as we both know, there are many modes of transportation
to get from one place to another. So which one do you choose? Some specific modes of
transportation might be walking or biking, perhaps even taking a train, and so on. These
specific modes of transportation would be analogous to the data structures themselves.
We want to get from one place to another through some mode of transportation; that was our
abstract data type. How did we do that, exactly? Well, that is the data structure. Here are
some examples of abstract data types on the left, and some possible underlying implementations
on the right-hand side. As you can see, a list can be implemented in two ways: you can
have a dynamic array or a linked list. They both provide ways of adding, removing, and
indexing elements in the list. Next, we have the queue and the map abstract data types, which
themselves can be implemented in a variety of ways. Notice that under the implementations
for a queue, I put a stack-based queue, because yes, you can have a queue which is implemented
using only stacks. This may not be the most efficient way of implementing a queue, but
it does work and it is possible. The point here is that the abstract data type only defines
how a data structure should behave and what methods it should have, but not the details
surrounding how those methods are implemented.
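To make that concrete, here is a minimal Java sketch of a queue backed only by two stacks. The class and method names are my own inventions, just to illustrate that the queue interface stays the same no matter what sits underneath; it is not code from the video.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // A queue ADT implemented using only two stacks (hypothetical names).
    public class TwoStackQueue<T> {
      private final Deque<T> inStack = new ArrayDeque<>();  // receives newly enqueued elements
      private final Deque<T> outStack = new ArrayDeque<>(); // serves elements in FIFO order

      // enqueue: O(1), just push onto the in-stack
      public void enqueue(T value) {
        inStack.push(value);
      }

      // dequeue: amortized O(1); refill the out-stack only when it is empty
      public T dequeue() {
        if (outStack.isEmpty()) {
          while (!inStack.isEmpty()) {
            outStack.push(inStack.pop());
          }
        }
        return outStack.pop(); // throws NoSuchElementException if the queue is empty
      }
    }

As the transcript notes, this is not the most efficient queue, but it satisfies the queue interface, which is the whole point of the ADT versus implementation distinction.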
Alright, now that we're done with abstract data types, we need to have a quick look at
the wild world of computational complexity, to understand the performance that our data
structures are providing. So as programmers,
we often find ourselves asking the same two questions over and over and over again. That
is, how much time does this algorithm need to finish, and also, how much space does this
algorithm need for my computation? So, if your program takes the lifetime of the universe
to finish, then it's no good. Similarly, if your program runs in constant time but requires
a space equal to the sum of all the bytes of all the files on the internet,
your algorithm is also useless. So, to standardize a way of talking about how much time and how
much space is required for an algorithm to run, theoretical computer scientists have
invented Big O notation, amongst other things like Big Theta, Big Omega, and so on. But
we're interested in Big O because it tells us about the worst case. Big O notation only
cares about the worst case. So if your algorithm sorts numbers, imagine the input is the worst
possible arrangement of numbers for your particular sorting algorithm. Or, as a concrete example,
suppose you have an unordered list of unique numbers, and you're searching for the number
seven, or the position where seven occurs in your list. The worst case isn't when seven
is at the beginning or in the middle of the list; it's when seven is the
very last element of the list. For that particular example, the time complexity would be linear
with respect to the number of elements in the size of your list, because you have to traverse
every single element until you find seven. The same concept applies to space, you can
just consider the worst possible amount of space your algorithm is going to need for
that particular input. There's also the fact that big O really only cares about what happens
when your input becomes arbitrarily large. We're not interested in what happens when
the input is small. For this reason, you'll see that we ignore things like constants and
multiplicative factors. So in our Big O notation, you'll often see the following complexities coming up
again and again. So, to clarify one thing:
when I say n, n is usually always going to be the size of the input coming into your algorithm;
there's always going to be some limitation on the size n. Constant time is referred to as
a one wrapped around a Big O: O(1). If your algorithm takes a logarithmic amount of time to finish,
we say that's Big O of log n. If it's linear, then we just say n. If it takes
quadratic time or cubic time, then we say that's n raised to that power. If it's exponential,
usually this is going to be something like two to the n, three to the n, or, in general, any number b greater
than one raised to the n. And then we also have n factorial. But we also have things in between
these, like square root of n, log log of n, n to the fifth, and so on. Actually, almost
any mathematical expression containing n can be wrapped around a Big O and is valid Big O
notation.
Now, we want to talk about some properties of Big O. To reiterate what we just saw
in the last two slides, Big O only really cares about what happens when the input becomes
really big. So we're not interested in what happens when n is small, only
in what happens when n goes to infinity. So this is how and why we get the first two properties.
The first is that we can simply remove constant values added to our Big O notation, because
if you're adding a constant to infinity, well, that's just infinity, and if you're multiplying
a constant by infinity, yeah, that's still infinity. So we can ignore constants. But
of course, this is all theoretical. In the real world, if your constant is of size two
billion, that's probably going to have a substantial impact on the running time of
your algorithm.
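Written symbolically (these are standard Big O facts rather than anything shown in the transcript), the two properties say that for any constant c greater than zero:

    O(f(n) + c) = O(f(n))
    O(c \cdot f(n)) = O(f(n))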
However, let us look at a function f, which I've defined over some input size n. If
f(n) is seven log of n cubed, plus 15 n squared, plus two n cubed, plus eight, well, Big O of f
of n is just n cubed, because n cubed is the biggest, most dominant term in this function.
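Written out as math (reading "seven log of n cubed" as seven times log(n) cubed, which is how this example is usually presented), that is:

    f(n) = 7 \log(n)^3 + 15 n^2 + 2 n^3 + 8
    O(f(n)) = O(n^3)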
All right, now let's
look at some concrete examples of how big O is used. So both of the following run in
constant time with respect to the input size, because they don't depend on n
at all.
So on the left, when we're just adding or doing some mathematical formula, yes, that's
constant time. And on the right, okay, we're doing a loop, but the loop itself doesn't
depend on n. So it runs also in a constant amount of time. So as our input size gets
arbitrarily large, well, that loop is still going to run in the same amount of time regardless.
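The slides themselves aren't visible in the transcript, so here is a hedged Java sketch of what those two constant-time snippets could look like:

    // Left: plain arithmetic; no dependence on n, so O(1).
    int a = 1;
    int b = 2;
    int c = a + 5 * b;

    // Right: a loop whose bound is a fixed constant (not n), so it is still O(1)
    // no matter how large the input gets.
    int i = 0;
    while (i < 11) {
      i = i + 1;
    }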
Now let's look at a linear example. So both of the following actually run in
linear time with respect to the input size, because we do a constant amount of work
n times.
So on the left, we're incrementing the counter i by one each time, so f(n) is n, and
clearly, when we wrap this in a Big O, we get O(n). On the right, it's a little bit more
complicated: we're not incrementing by one, we're incrementing by three, so we're going
to finish that loop three times faster. So f(n) is n over three, which is still O(n).
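Again the exact slide code isn't in the transcript, but the two loops described could look roughly like this in Java, assuming n is the input size:

    // Left: i goes up by one, so the body runs n times; f(n) = n, which is O(n).
    int i = 0;
    while (i < n) {
      i = i + 1;
    }

    // Right: i goes up by three, so the body runs about n / 3 times;
    // f(n) = n / 3, which is still O(n).
    int j = 0;
    while (j < n) {
      j = j + 3;
    }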
Our second example is two algorithms that run in quadratic time. The first one seems obvious: we do n work
n times, so n times n is O(n squared). But observe the second one, where I changed the
zero to an i. So pause the video and try to figure out why that's also O(n
squared). Okay, let's go over the solution. First, just focus on the second loop; the
first loop isn't as important. Since i goes from zero to n, the amount of looping done
in the inner loop is going to be directly related to what i is, and remember, i is changing from the outer
loop. So if we fix i to be zero, we do n work. If we fix it at one, we do n minus one work.
If we fix it at two, we do n minus two work, et cetera. So the question then becomes:
what is n, plus n minus one, plus n minus two, plus n minus three, and so on? Well, this is
a well-known identity, and it turns out to be n times n plus one, divided by two. So if we
wrap this in a Big O and split up the equation, we can see that this is O(n squared).
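Here is a hedged reconstruction of the two quadratic snippets being described, again assuming n is the input size; the exact slide code isn't in the transcript:

    // First: the inner loop always runs n times, so we do n work n times: O(n^2).
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        // constant amount of work
      }
    }

    // Second: the zero is changed to an i, so the inner loop runs n - i times.
    // Total work: n + (n-1) + (n-2) + ... + 1 = n(n+1)/2, which is O(n^2).
    for (int i = 0; i < n; i++) {
      for (int j = i; j < n; j++) {
        // constant amount of work
      }
    }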
Now let's look at perhaps a more complicated example. Earlier, you may have been wondering,
how do we ever get these logarithmic or linearithmic time complexities? So here's
a classic algorithm, binary search, which actually yields a logarithmic time complexity.
What this algorithm does is it starts by making two pointers, one at the very start
and one at the very end of the array. Then it selects a midpoint between the two and checks
whether the value we're looking for is found at the midpoint. It has either found the value
or not; if it has found it, it stops. Otherwise, it discards one half of the array and adjusts
either the high or the low pointer. Remark that even in the worst case, we're still discarding
half of the array each iteration, so very, very quickly we're going to run out of a
range to check. If you do the math, the worst case is you will do exactly log base two of
n iterations, meaning that binary search runs in logarithmic time. A very cool,
very powerful algorithm.
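As a sketch of the algorithm just described, here is my own Java version, assuming an array sorted in ascending order; the video's exact code isn't in the transcript:

    // Returns the index of target in the sorted array ar, or -1 if it isn't there.
    static int binarySearch(int[] ar, int target) {
      int lo = 0;              // pointer at the very start
      int hi = ar.length - 1;  // pointer at the very end
      while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;            // midpoint between the two pointers
        if (ar[mid] == target) return mid;       // found it, stop
        else if (ar[mid] < target) lo = mid + 1; // discard the lower half
        else hi = mid - 1;                       // discard the upper half
      }
      return -1; // not found; at most about log base two of n iterations were done
    }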
Here's a slightly different example worth going over. First, notice that there is an outer loop with a counter i that does n work. Then notice
that there are two inner loops, one that does 3n work and another that does 2n
work. The rule we use to determine the complexity of this algorithm is to multiply
loops on different levels and add those on the same level, generally speaking. So,
using that rule, we can see that it takes n work to do the outer loop, multiplied by
3n plus 2n for both inner loops, which gives us 5n squared, which is O(n squared).
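The loop structure being described would look something like this (a hedged sketch, not the slide's exact code):

    // Outer loop: n iterations. Inner loops: 3n and 2n iterations each pass.
    // f(n) = n * (3n + 2n) = 5n^2, which is O(n^2).
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < 3 * n; j++) {
        // constant amount of work
      }
      for (int j = 0; j < 2 * n; j++) {
        // constant amount of work
      }
    }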
All right, so this next one looks very similar, but it's actually quite different. On the
outer loop with i, we have i going from zero to 3n, so there's 3n work done
on the outside. But we have to multiply that by what's going on on the inside.
On the inside, j first goes from 10 to 50, so that does exactly 40 loops every time; that's a constant
amount of work. Then, however, there's a second loop where j is less than n cubed, but j is
incremented as j equals j plus two, so it's accelerated a little and we're going to finish that loop
a little faster. So on the inside we get 40 plus n cubed divided by two, but
we have to multiply that by 3n. If we wrap that in a Big O, then Big O of f(n)
is going to give us O(n to the fourth), because n to the fourth is the dominant term
in our function f(n).
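And a hedged sketch of this second structure:

    // Outer loop: 3n iterations. Per iteration, the inner work is
    // 40 (the j = 10..50 loop) plus roughly n^3 / 2 (the j += 2 loop up to n^3).
    // f(n) = 3n * (40 + n^3 / 2) = 120n + (3/2) n^4, which is O(n^4).
    for (int i = 0; i < 3 * n; i++) {
      for (int j = 10; j < 50; j++) {
        // constant amount of work (exactly 40 iterations)
      }
      for (long j = 0; j < (long) n * n * n; j += 2) {
        // roughly n^3 / 2 iterations
      }
    }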
Here are some other classic examples of Big O. If we have to find all the subsets of a set, that takes an exponential amount of time, because there are two to the
n subsets. Finding all permutations of a string takes O(n factorial). Another classic
one is merge sort: we have n times log n for merge sort. And if we have to
iterate over all the cells of an array of size n by m, we have O(nm) time.
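For instance, that last one is just a doubly nested loop over the grid (a trivial sketch, assuming dimensions n and m):

    // Visiting every cell of an n-by-m grid does n * m constant-time steps: O(nm).
    for (int row = 0; row < n; row++) {
      for (int col = 0; col < m; col++) {
        // constant amount of work per cell
      }
    }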
All right, let's talk about arrays, probably the most used data structure. This is part one
of two in the array videos. The reason the array is used so much is because it forms
a fundamental building block for all other data structures, so we end up seeing them everywhere.
With arrays and pointers alone, I'm pretty sure we can construct just about any data
structure. So, an outline for today's video: first, we're going to begin by having a discussion
about arrays and answer some fundamental questions, such as what, where, and how are arrays used?
Next, I will explain the basic structure of an array and the common operations we are
able to perform on them. Lastly, we will go over some complexity analysis and
look at some source code on how to construct a dynamic array using only static arrays.
Alright, discussion and examples. So what is a static array? A static array is a fixed-length container
containing n elements, which are indexable, usually on the range of zero, inclusive, to
n minus one, also inclusive. So a follow-up question is, what is meant by being indexable?
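As a tiny Java illustration of that index range (my own example, not from the video):

    int[] ar = new int[5]; // a static array with n = 5 slots: ar[0] through ar[4]
    ar[0] = 44;            // write to the first index, 0
    ar[4] = 7;             // write to the last index, n - 1
    int x = ar[2];         // read any index in between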