Subtitles section Play video Print subtitles In our previous lesson, we saw what binary search trees are, now in this lesson we are going to implement binary search tree. We will be writing some code for binary search tree. prerequisite for this lesson is that you must understand the concepts of pointers and dynamic memory allocation in C/C++. If you have already followed this series and seen our lessons on linked list, then implementation of binary search tree or binary tree in general is not going to be very different. We will have nodes and links here as well. Ok, so lets get started. Binary search tree or BST as we know is a binary tree in which for each node, value of all the nodes in left subtree is lesser or equal and value of all the nodes in right subtree is greater. We can draw BST as a recursive structure like this. Value of all the nodes in left subtree must be lesser or equal and value of all the nodes in right subtree must be greater and this must be true for all nodes and not just the root node. So, in this recursive definition here, both left and right subtrees must also be binary search trees. I have drawn a binary search tree of integers here. Now, the question is, how can we create this non-linear logical structure in computer's memory. I had talked about this briefly when we had discussed binary trees. The most popular way is - dynamically created nodes linked to each other using pointers or references just the way we do it for linked lists. Because in a binary search tree, or in a binary tree in general, each node can have at most 2 children, we can define node as an object with 3 fields something like what I'm showing here. We can have a field to store data, another to store address or reference of left child and another to store address or reference to right child. If there is no left or right child for a node, reference can be set as NULL. In C or C++, we can define node like this. There is a field to store data. Here the data type is integer, but it can be anything. There is one field which is pointer to node. Node asterisk means pointer to node. This one is to store the address of left child and we have another one to store the address of right child. This definition of node is very similar to definition of Node for doubly linked list. Remember in doubly linked list also, each node had two links - one to previous node and another to next node. but doubly linked list was a linear arrangement. This definition of node is for a binary tree. We could also name this something like BstNode, but Node is also fine, lets go with Node. Now, in our implementation, just like linked list, all the nodes will be created in dynamic memory or heap section of application's memory using malloc function in 'C' or new operator in C++. We can use malloc in C++ as well. Now, as we know any object created in dynamic memory or heap section of applications memory cannot have a name or identifier. It has to be accessed through a pointer. malloc or new operator return us pointer to the object created in heap. if you want to revise some of these concepts of dynamic memory allocation, you can check the description of this video for link to a lesson. Its really important that you understand this concept of stack and heap in applications memory really well. Now, for a linked list, if you remember, the information that we always keep with us is address of the head node. If we know the head node, we can access all other nodes using links. In case of trees, the information that we always keep with us is address of the root node. If we know the root node, we can access all other nodes in the tree using links. To create a tree, we first need to declare a pointer to BstNode. I'll rather call node BstNode here, BST for binary search tree. So, to create a tree, we first need to declare a pointer to BstNode that will always store the address of root node. I have declared a pointer to Node here named rootPtr - Ptr for pointer. In C, you can't just write BstNode asterisk rootPtr. You will have to write struct space BstNode asterisk, you will have to write struct here as well. I am gonna write C++ here, but anyway right now I'm trying to explain the logic. we will not bother about minute details of implementation. In this logical structure of tree that I'm showing here, each Node as you can see has 3 fields, 3 cells. leftmost cell is to store the address of left child and rightmost cell is to store the address of right child. Lets say root node is at address 200 in memory and I'll assume some random addresses for all other nodes as well. Now, I can fill in these left and right cells in each node with addresses of left and right children. In our definition, data is first field, but in this logical structure, I am showing data in the middle. Ok, so for each Node, I have filled in addresses for both left and right child. Address is 0 or NULL if there is no child. Now, as we were saying, identity of the tree is address of the root node. We need to have a pointer to Node in which we can store the address of the root node. We must have a variable of type pointer to node to store the address of root node. All these rectangles with 3 cells are Nodes. They are created using malloc or new operator and live in heap section of application's memory. We cannot have name or identifier for them. They are always accessed through pointers. This rootPtr, root pointer has to be a local or global variable. We will discuss this in little more detail in some time. Quite often, we like to name this root pointer, just root. We can do so, but we must not confuse. This is pointer to root and not the root itself. To create a BST, as I was saying, we first need to declare this pointer. Initially, we can set this pointer as NULL to say that the tree is empty. A tree with no node can be called empty tree and for empty tree, root pointer should be set as NULL. We can do this declaration and setting the root as NULL in main function in our program. Actually, lets just write this code in a real compiler. I am writing C++ here. As you can see, in the main function I have declared this pointer to Node which will always store the address of root node of my tree and I am initially setting this as NULL to say that the tree is empty. With this much of code, we have created an empty tree. But, whats the point of having an empty tree. We should have some data in it. So, what I want to do now is I want to write a function to be able to insert a node in the tree. I will write a function named Insert that will take address of the root node and data to be inserted as argument and this function will Insert a node with this data at appropriate position in the tree. In the main function, I'll make calls to this insert function passing it address of root and the data to be inserted, lets say first I want to insert number 15 and then I want to insert number 10 and then number 20. We can insert more, but lets first write the logic for Insert function. Before I write the logic for Insert function, I want to write a function to create a new Node in dynamic memory or heap. This function GetNewNode should take an integer, the data to be inserted as argument, create a node in heap using new or malloc and return back the address of this new node. I am creating a new node here using this new operator. the operator will return me the address of the newly created node which I am collecting in this variable of type pointer to BstNode. In C, instead of new operator, we will have to use malloc. We can use malloc in C++as well. C++ is only a super-set of C. malloc will also do here. Now, anything in dynamic memory or heap is always accessed through pointer. Now, using this pointer - newNode, we can access the fields of newly created Node. I will have to dereference this pointer using asterisk operator. i am writing asterisk newNode and now I can access the fields. We have 3 fields in node - data and 2 pointers to node , left and right;. i have set the data here. Instead of writing asterisk newNode dot data, we have this alternate syntax that we could use. We could simply write newNode->data and this will mean the same. We have used this syntax quite a lot in our lessons on linked list. Now, for the new node, we can set the left and right child as NULL and finally we can return the address of the new Node. Ok, coming back to the insert function. We can have couple of cases in insertion. First of all, tree may be empty. For this first insertion, when we are inserting this number 15, tree will be empty. if tree is empty, we can simply create a new node and set it as root. With this statement, root equal GetNewNode, I am setting root as address of the new node. But there is something not alright here. This root is local variable of Insert function and its scope is only within this function. We want this root, root in main to be modified. This guy is a local variable of main function. There are 2 ways of doing this. We can either return the address of the new root. So, return type of insert function will be pointer to BstNode and not void. And here, in the main function, we will have to write statement like root equal Insert and the arguments. So, we will have to collect the return and update our root in main function. Another way is that, we can pass the address of this root of main to the Insert function. This root is already a pointer to Node. So, its address can be collected in a pointer to pointer. So, Insert function , in insert function first argument will be a pointer to pointer and here, we can pass the address. We will say ampersand root to pass the address. We can name this argument root or we can name this argument rootPtr. We can name this whatever. Now, what we need to do is, we need to dereference this using asterisk operator to access the value in root of main and we can also set the value in root of main. So, here with this statement, we are setting the value and the return type now can be void. This pointer to pointer thing gets a little tricky. I'll go with the former approach. Actually, there is another way. instead of declaring root as a local variable in main function, we can declare root as global variable. Global variable, as we know has to be declared outside all the functions. if root would be global variable, it would be accessible to all the functions and we will not have to pass the address stored in it as argument. Anyway, coming back to the logic for insertion. As we were saying, if the tree is empty, we can simply create a new node and we can simply set it as root. At this stage, we wanted to insert 15. If we will make call to the insert function, address of root is 0 or NULL. NULL is only a macro for 0 and the second argument is the number to be inserted. In this call to Insert function, we will make call to get new Node. Lets say, we got this new Node at address 200. GetNewNode function will return us address 200 which we can set as root here, but this root is a local variable. We will return this address 200 back to the main function and in the main function, we are actually doing this root equal Insert. So, in the main function we are building this link. Ok, our next call in the main function was to insert number 10. At this stage, root is 200. the address in root is 200 and the value to be inserted is 10. Now, the tree is not empty. So, what do we do. If the tree is not empty, we can basically have 2 cases. If the data to be inserted is lesser or equal, we need to insert it in the left subtree of root and if the data to be inserted is greater, we need to insert it in right subtree of the root. So, we can reduce this problem in a self similar manner, in a recursive manner. Recursion is one thing that we are going to use almost all the time while working with trees. In this function, I'll say that if the data to be inserted is less than or equal to the data in root, then make a recursive call to insert data in left subtree. The root of the left subtree will be the left child. So, in this recursive call, we are passing address of left child and data as argument and after the data is inserted in left subtree, the root of the left subtree can change. Insert function will return the address of the new root of the left subtree and we need to set it as left child of the current node. In this example tree here, right now, both left and right subtree are empty. We are trying to insert number 10, so we have made call to this function Insert. From main function, we have called Insert passing it address 200 and value or data 10. Now, 10 is lesser than 15, so control will come to this line and a call will be made to Insert data in left subtree. Now, left subtree is empty. So, address of root for left subtree is 0. Data passed, data to be inserted passed as argument is 10. Now, this first insert call will wait for this insert below to finish and return. For this last, insert call, root is NULL. Lets say we got this node at address 150. Now, this Insert call will return back 150 and execution of first insert call will resume at this line and now this particular address will be set as 150. So, we will build this link and now this insert call can finish. It can return back the current root. Actually this return root should be there for all cases, so I am taking it out and I have it after all these conditions. Of course, we will have one more else here. If the data is greater, we need to go Insert in right subtree. the