
  • Hello everyone! Welcome to codeKarle. In this video we'll be talking about

  • possibly one of the most common system design interview questions.

  • So let's see how we would design a URL shortening service, something very similar to tinyurl.com.

  • Let's start with the functional requirements (FRs) and non-functional requirements (NFRs).

  • This is a fairly straightforward problem. You have these two functional requirements -

  • given a long URL, you get a short URL. So, when somebody wants to shorten the URL,

  • they'll give you a long URL and you need to return a shorter URL for that.

  • On the other side, when somebody hits a short URL then you basically redirect the

  • person to the longer URL, which is the second point.

  • On the non-functional side, this service needs to be highly available and it should work at a very

  • low latency. Because, let's just say, we were building it for a Facebook or a Twitter kind of a scenario,

  • it would just be a bad experience if it just takes a lot of time to respond.

  • And since this is a platform kind of component,

  • it needs to be always available and respond back in a very short time.

  • The very first thing to answer is - What should be the length of our short URL?

  • That depends upon the scale that we need to build this system for.

  • If we only ever had to shorten a couple of hundred URLs, maybe 2-3 characters would be more than enough.

  • But if you are building it at a Google scale or a Facebook scale,

  • then we need to think over what should be the length of our short URL.

  • Because we should not run out of the short URLs.

  • Another option is - we build a system agnostic of the length: we start from somewhere

  • and the length keeps on incrementing as needed.

  • But let's begin with a fixed length.

  • The very first question you need to ask your interviewer is - What is the traffic you are looking at?

  • How many unique URLs will come to you for shortening?

  • Per day, per month - some number at some granularity.

  • And for what duration do you need to support them?

  • Let's just assume that -

  • Once a URL is converted into a shorter URL, you need to save it for at least next ten years.

  • Sounds a bit too much, but let's just take it as a number, okay.

  • With that in mind, let's look at how we can come up with the potential length.

  • So, let's just say that your interviewer tells you that there are 'X'

  • number of requests that come to you every second.

  • So, (X * 60) is the number of requests that come to you in a minute. These are basically the number of short

  • URLs that you have to generate.

  • Multiplying by 60 again gives you the number of requests that come to you in an hour.

  • Multiplying by 24 gives you the number of requests in a day.

  • Multiplying by 365 gives you the number of requests in a year, and multiplying by 10 gives you the total over ten years.

  • Whatever this number would be, you should be able to build a system which can handle

  • this many unique requests.

  • Let's just say this number is 'Y'. Now, the next question to ask is -

  • What all characters can you include in the short URL? Should it be just numbers? Could it be alphabets?

  • Could it be capital alphabets and small case alphabets? What all is allowed?

  • Generally, people take [A to Z], [a to z] and numbers.

  • So maybe we can stick to that but you could ask your interviewer and then you could come up with

  • a set of, basically a character set that is allowed for short URLs.

  • For now, let us assume that we'll take [a to z], [A to Z] and numbers [0 to 9].

  • These are 62 characters which you can use in the short URL.

  • Now, you need to come up with a length which can handle

  • this many URLs, i.e., this many unique combinations of these 62 characters.

  • Let's just say if the length were 1, then you could support 62 URLs,

  • because each position has 62 possibilities.

  • If the length of the short URL is 2, then you can support (62 ^ 2) URLs, which is 3,844.

  • Basically, you need to come up with a number n where (62 ^ n) is greater than this number Y.

  • You solve this equation and you get the answer for n: n would be log62(Y) -

  • the ceiling of it.

  • Let's say, if this number comes down to be 4.5, then you take n = 5 so you will create short URLs of length 5.

  • As a general number, (62 ^ 6) is somewhere around 57 billion

  • and (62 ^ 7) is somewhere around 3.5 trillion. Both of these are very massive numbers.
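
A quick back-of-the-envelope sketch of this calculation, assuming a hypothetical traffic figure of X = 10,000 new URLs per second (the number itself is just an illustration):

    import math

    # Assumed traffic: X new URLs to shorten per second (hypothetical figure).
    X = 10_000

    # Y = total unique URLs we must support over ten years.
    Y = X * 60 * 60 * 24 * 365 * 10

    # Smallest n such that 62^n >= Y, i.e. the ceiling of log62(Y).
    n = math.ceil(math.log(Y, 62))

    print(f"Y = {Y:,} unique URLs over 10 years")    # ~3.15 trillion
    print(f"62^6 = {62**6:,}  62^7 = {62**7:,}")
    print(f"required short URL length n = {n}")      # 7 for this traffic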

  • Depending upon what is your use case, you can come up with some length.

  • Let's just say, for our use case, we will stick to 7 characters.

  • Generally, I have seen it's a standard that these short URLs use 7 characters, maybe more at times.

  • But we will stick to 7 characters and we will be able to support 3.5 trillion unique URLs.

  • That I am assuming is more than this number (Y).

  • If it so happens that this Y is even more, then you can probably take 8 characters,

  • 9 characters or whatever.

  • Now, let's look at one of the potential architectures that can solve this problem.

  • This is not a final version. We will iterate over it.

  • So, let's just say, there is this UI, which takes the long URL as input and gives back the short URL.

  • So, it will call the Short URL Service. There'll be multiple instances of this service, which somehow

  • generates a short URL and stores it in the database and returns the short URL to the customer.

  • On the other side, this could also be used when somebody hits a short URL wherein the Short URL Service

  • would fetch the long URLs from the database and then redirect the user to the longer URL.

  • This looks nice.

  • But then we have not covered one important point - how does this service actually generate a short URL?

  • Even though we have those 62 characters, for all further discussion we'll

  • just use plain numbers so that it's easy to calculate and reason about.

  • Let's just say, there are multiple instances of this service and all of them are getting some requests.

  • One instance could generate the number 111 and another instance could also generate 111,

  • and then we will end up with a collision.

  • If two services generate a same short URL for two requests then we have a problem.

  • In technical terms it's called a collision but for us it's a problem

  • because we cannot have one short URL pointing to two long URLs.

  • You could say that a possible solution is to first check in the database and then

  • retry on a clash. That would work, but the problem is that it's not really efficient.

  • What we need is a predictable way to generate a short URL knowing that there would be no collisions at all.

  • One very simple way to do that is using one of the features of Redis.

  • So let's just say, we use a Redis cluster and all of these services request a number from Redis.

  • Redis makes sure that it will always return a unique number. So basically what it will do is -

  • it will start the counting from 1 all the way up to billions, trillions, whatever.

  • And each time it gets a request, it increments the counter and responds back.

  • Redis will make sure that all of these services are getting unique numbers, and from that

  • unique base-10 number, they can convert to base 62 and generate the short URL.
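
A minimal sketch of this counter-based approach using the redis-py client, assuming a Redis instance on localhost and a counter key named url_counter (both illustrative choices); Redis's INCR command is atomic, so every caller sees a unique number:

    import redis

    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def to_base62(num: int) -> str:
        """Convert a base-10 counter value into a base-62 short code."""
        if num == 0:
            return ALPHABET[0]
        digits = []
        while num > 0:
            num, rem = divmod(num, 62)
            digits.append(ALPHABET[rem])
        return "".join(reversed(digits))

    # Hypothetical local Redis; INCR returns a strictly increasing integer.
    r = redis.Redis(host="localhost", port=6379)

    def shorten() -> str:
        unique_id = r.incr("url_counter")   # 1, 2, 3, ... across all service instances
        return to_base62(unique_id)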

  • Now that would work fine but then we have a problem.

  • The problem is that first of all, everybody now is connecting to Redis.

  • So, we have Redis under a huge amount of load.

  • Second problem is - remember one of the important key things of any system design interview,

  • you should never create a single point of failure.

  • What is happening is - in all the requests that have to generate a short URL, this Redis becomes a single point of failure

  • and there is no backup to it. If this goes down then there is no way we can recover, which is a problem.

  • You can argue that we can have multiple Redis but that's also tricky.

  • Moreover, if the scale goes beyond the capacity of one Redis machine, then we are in a much bigger problem.

  • Why? Because, let's say, if one Redis is not able to scale to the latency requirements that we wanted to,

  • this will again start choking the whole system.

  • Another alternative is to keep multiple Redis instances. That will give us high availability and

  • better performance, because now there are multiple machines - some of the service

  • machines will talk to one Redis and the rest will talk to the other Redis.

  • This would work just fine as long as these Redis instances don't start generating duplicate numbers.

  • If both Redis instances start from the same index, they'll again generate duplicates.

  • What we need to do is - somehow make sure that these two Redis instances never generate the same number.

  • One very simple way is - to give one series to the first Redis and another series to the second Redis to start with.

  • That would work fine. But what if you have to add a third Redis?

  • Now it becomes complicated. Now you need somebody to manage what series are being given to whom.

  • Plus once you have that management piece, we might as well try to get away from Redis.

  • Let's try to look at a slightly more advanced system which can generate unique URLs without using a Redis,

  • without having a single point of failure.

  • Let's look at one of the possible solutions which can solve this problem.

  • Remember, what we need. Whenever we get a long URL and we need to convert it to a short URL,

  • we basically need to make sure that each service machine is able to generate a unique number,

  • one which will never get repeated by any other instance of the service.

  • So, each of these service instances should generate a number which the other ones will never generate.

  • Convert that number from base 10 to base 62 and return back. This is the flow that we need.

  • What we've done is - going over the UI again,

  • the green box is basically the Long to Short UI,

  • from where somebody can give a long URL and expect to get a short URL as output.

  • It talks to a load balancer and behind it are multiple instances of Short URL Service.

  • What I am trying to now do is - I have introduced a Token Service.

  • The idea is that all of these service instances need to generate numbers,

  • while making sure that no two machines ever generate the same number.

  • One of the simple ways to do that is to assign ranges to each of the machines.

  • I can say that - this Short URL Service will request a token range from Token Service.

  • Each time the Token Service gets a request, it runs in a single-threaded model

  • and hands out a range to the requesting service instance.

  • Let's say, it says that, I am giving a range (1 - 1,000) to this machine.

  • Now, this machine on startup will also call Token Service.

  • So, all of these services call Token Service either when they start up

  • or when they are running out of the range.

  • Now, let's say, this service got a range from (1001 - 2000).

  • Similarly, other instances of the same service would also get some range.

  • This Token Service can be built on top of a MySQL, because it runs at a very low scale.

  • It will get a call once in a couple of hours, and it will just return the ranges.

  • I have taken these numbers as (1 - 1,000). Realistically, it will be a bit larger number

  • so that at least it can run for an hour or so.

  • So, now Token Service has given numbers to each of these services.

  • Now, the flow is very simple.

  • Let's say, a request reaches this service. It takes the first number 1001, converts that into base 62 and

  • returns the output. Obviously after saving it.

  • The next request comes, it increments it. It takes 1002, saves it in Cassandra, returns back.

  • Next request comes in, it uses 1003.

  • Let's say, it somehow gets lots of requests and gets to 2000. At that point in time - maybe a bit before that,

  • when we are nearing the end of the range - it will query Token Service and get the next range.
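
A small sketch of what this range-consuming counter could look like inside a Short URL Service instance; the fetch_range callable here is a stand-in for the actual call to the Token Service:

    import threading

    class TokenRangeCounter:
        """Hands out unique numbers from a range leased from the Token Service."""

        def __init__(self, fetch_range):
            # fetch_range() is assumed to call the Token Service and
            # return a (start, end) tuple such as (1001, 2000).
            self._fetch_range = fetch_range
            self._lock = threading.Lock()
            self._current, self._end = fetch_range()

        def next_id(self) -> int:
            with self._lock:
                if self._current > self._end:
                    # Range exhausted: lease a fresh one from the Token Service.
                    self._current, self._end = self._fetch_range()
                value = self._current
                self._current += 1
                return value

Each shorten request would then do something like to_base62(counter.next_id()) and save the mapping before returning the short URL.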

  • Let's say, the next range that this service got is (5001 - 6000), let's say.

  • So, now it will continue processing this one.

  • Now, when (5001 - 6000) is assigned to this service and another service instance makes a request,

  • that instance can possibly get (6001 - 7000), but it will never get this (5001 - 6000) range. Why?

  • Because that thing is maintained by Token Service and the way it would probably work is -

  • it will keep a range as a record and keep a flag, whether it is assigned or not.

  • In a simple MySQL, on a transactional basis, when you get a request, you take the

  • first unassigned token range, mark it as assigned, and return it.

  • And because it will then sit on top of a MySQL database, it will make sure that it is always unique.
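
A sketch of that allocation step, assuming a hypothetical token_ranges(range_start, range_end, assigned) table in MySQL and a PEP 249 style connection (e.g. from mysql-connector or PyMySQL):

    def allocate_range(conn):
        """Claim the first unassigned token range inside a single MySQL transaction."""
        cur = conn.cursor()
        # Lock the first free row so no concurrent request can hand out the same range.
        cur.execute(
            "SELECT range_start, range_end FROM token_ranges "
            "WHERE assigned = 0 ORDER BY range_start LIMIT 1 FOR UPDATE"
        )
        start, end = cur.fetchone()
        cur.execute(
            "UPDATE token_ranges SET assigned = 1 WHERE range_start = %s",
            (start,),
        )
        conn.commit()
        return start, end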

  • So, this becomes kind of our write flow.

  • Now, let's say, there is a massive amount of traffic.

  • At most, all we need to do is keep multiple instances of Token Service.

  • There are multiple ways to scale it. First of all, we can increase the length of the range

  • so that it doesn't get bombarded too often. So, instead of a thousand at a time, we can allocate

  • millions of tokens at once so it doesn't get too many requests.

  • And anyway, this service will be distributed in multiple geographies at least and multiple data centers

  • so that this doesn't become a single point of failure.

  • There are some problems in this.

  • Possible problems are - What if this service got this (5001 - 6000) range, started working for a while,

  • used a couple of tokens and then it kind of got shut down and died?

  • Let's say, there was an Out Of Memory error and this process got killed. What happens to those tokens?

  • So, those tokens go away. Go away as in, there is no track record of how many tokens are being used.

  • All that is happening is - the service is iterating over that range in memory and assigning one token to each request.

  • If this service dies, it will probably get started up again on the same machine,

  • and this time it will get another token range.

  • Let's say, third time, it gets a token range of (9001 - 10000).

  • Now, all the unused tokens of this (5001 - 6000) range, there is no record of them.

  • So, they are kind of gone forever.

  • If you look at it, even if this happens a couple of times in a day, how many tokens do we really lose?

  • With a length of 7 for the short URL, we are able to support 3.5 trillion unique numbers. That is a very massive number.

  • If you lose out a couple of thousands of tokens out of this range, it is like taking a bucket out of an ocean.

  • It doesn't really matter. So, even if you lose out some tokens, it's okay!!

  • But, if you start tracking those tokens, then it will become a fairly complicated system.

  • Just to make it easy for us to develop, we will say that - okay! when the system shuts down,

  • all those tokens go away. We don't need them. We will get a new range of tokens and work out of that.

  • That is one thing that we could do. This is basically the long URL to short URL path.

  • On the other side, when a short URL is hit, all you do is - you get a request with the short URL, you hit your database

  • and you get the longer URL. This service then does a redirect

  • and the user gets redirected to the original URL.
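
A minimal sketch of that redirect path, assuming a Flask front end and a hypothetical get_long_url() helper that looks the short code up in Cassandra:

    from flask import Flask, abort, redirect

    app = Flask(__name__)

    @app.route("/<short_code>")
    def expand(short_code: str):
        # get_long_url() is a placeholder for the Cassandra lookup by short code.
        long_url = get_long_url(short_code)
        if long_url is None:
            abort(404)
        # A 302 keeps every hit flowing through our service (useful for analytics);
        # a 301 would let browsers cache the redirect and bypass us on repeat visits.
        return redirect(long_url, code=302)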

  • One thing I have not answered is - Why did I use a Cassandra?

  • Ideally, I could have used any database here. All we need to do is - to keep a database which can handle

  • this many unique URLs.

  • A MySQL, at this number of records, would start giving some problems.

  • We can shard it probably and make it work.

  • Cassandra will, for sure, work. So, that's the reason I've kept Cassandra.

  • You can try using a MySQL here. With enough sharding and all of that, that would possibly also work fine.

  • That's not that big of a problem.

  • Plus this Token service is also using a MySQL. So, that could be shared across both the services as well.

  • Whatever we have built till now is good enough from a functional and non-functional standpoint

  • and it does what we were supposed to build.

  • But is it good enough? Definitely not!!

  • Why? Because this system, as we have built it, does not give us any metrics about how it is being used -

  • which URLs are the most used, what kind of geographies people come from -

  • none of the metrics that we would want to have out of this system.

  • For example, whenever a person is generating a short URL, wouldn't it be nice if we could tell them

  • what geographies their visitors come from, how many hits they are getting,

  • or what user agents or devices people are connecting from?

  • All of those things would be valuable information for any person who's generating a short URL.

  • Similarly for us, let's say, if we have multiple data centers.

  • Let's say, for example, we have 4 data centers and we want to choose any 2 of them as primary

  • and 2 of them as standby data centers - and they are spread across the globe.

  • If you have to decide which ones should be primary and which ones should be secondary,

  • normally, what companies do is - they just make a decision based on somebody's gut feeling.

  • But we could make a data-driven decision over here, depending upon what kind of traffic we are getting,

  • from what kind of geographies and wherever we are getting most amount of traffic from,

  • the data centers closer to those geographies could be made as primary data centers.

  • For doing all of these things, it would be good to have a decent amount of analytics emitted by this system.

  • Let's look at how we can do that.

  • Let's see how the analytics is being built.

  • The normal flow, the way it works is - whenever we get a request with a short URL,

  • let's say, a client sends us a request. We query our database, get the longer URL from the database

  • based on the short URL and return it back to the client, which then redirects the user to the longer URL.

  • Instead of just simply returning the URL, what we'll essentially do is - each time the request comes to us,

  • that request will come with a lot of attributes. It will give us some origin header saying what is the platform

  • from where this request has come in. So, let's just say, we posted this link on a Facebook or a LinkedIn

  • kind of a platform, it will have that information. It will also have information about the user agent

  • which is calling, let's say, if it's an Android application or iOS application or a browser.

  • It will have that kind of information as well, and it will also have a source IP address.

  • Before sending out the response, we will also put all of this information into Kafka,

  • which will then be used to power the analytics.

  • Based on the IP address, we'll also be able to come up with what country it is.

  • So, all of those things can be taken care of on the Analytics side.

  • But if we start putting into Kafka on each request when we get the request,

  • it will impact our non-functional requirement of latency.

  • Why? Because, now we are doing an additional step in the process and that will increase the latency.

  • So, I would not recommend doing that.

  • What we could do instead, as a fairly nice solution, is - make that Kafka request a parallel call.

  • So, you can have a different thread in which you send the Kafka write, and return back to the user.

  • The event then gets written to Kafka asynchronously.
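
A sketch of that fire-and-forget write using the kafka-python client; the broker address, topic name and event fields are assumptions. KafkaProducer.send() only appends to an in-memory buffer and a background I/O thread ships it to the broker, so the redirect response is not blocked on the write:

    import json
    from kafka import KafkaProducer

    # Hypothetical broker address and topic name.
    producer = KafkaProducer(
        bootstrap_servers="kafka:9092",
        value_serializer=lambda event: json.dumps(event).encode("utf-8"),
    )

    def emit_click_event(short_code, request):
        event = {
            "short_code": short_code,
            "origin": request.headers.get("Origin"),
            "user_agent": request.headers.get("User-Agent"),
            "source_ip": request.remote_addr,
        }
        # Non-blocking: the send is buffered and flushed by a background thread.
        producer.send("url-click-events", event)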

  • What is the problem in doing an asynchronous thing?

  • There is a possibility that the Kafka write could fail for whatever reason, and since

  • you have already returned to the user, you'll miss certain analytics events.

  • Now, in this kind of a system, because payment and all is not involved, it's just some simple analytics,

  • we should be okay if we are losing out on certain events. So, I think it's okay if we build it this way also.

  • Worst case, we'll just lose out on some analytics, which is fine.

  • But, can we improve it even further?

  • So, the problem with this approach is - each time we write to Kafka, it involves some I/O.

  • There's CPU involved, there's I/O involved, there's a network transfer involved.

  • Can we avoid doing all of this on each call?

  • Probably! So, what we could do is - instead of writing into Kafka on each request,

  • we could maybe aggregate that information locally in a machine.

  • So, maybe you can have a queue or some kind of data structure in which

  • you record that we got a request for this short URL with count 1.

  • When a request for it comes in again, you increment the count to 2.

  • Something of that sort you could implement. Or you could simply make it as a queue of all the requests.

  • Whenever the size of the data structure crosses some threshold -

  • or you could do it on a time basis also, say every 10 seconds -

  • you flush whatever is in that data structure into Kafka in one call.

  • Basically you are reducing the I/O that you are doing with each request and doing it as a batch write.
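
A sketch of that local aggregation, assuming the JSON-serializing KafkaProducer from the earlier snippet; the topic name, size threshold and flush interval are all illustrative choices:

    import threading
    import time
    from collections import Counter

    class ClickAggregator:
        """Counts hits per short code in memory and flushes them to Kafka in batches."""

        def __init__(self, producer, topic="url-click-counts",
                     max_entries=10_000, flush_interval_s=10):
            self._producer = producer
            self._topic = topic
            self._max_entries = max_entries
            self._flush_interval_s = flush_interval_s
            self._counts = Counter()
            self._lock = threading.Lock()
            threading.Thread(target=self._flush_periodically, daemon=True).start()

        def record(self, short_code):
            with self._lock:
                self._counts[short_code] += 1
                if len(self._counts) >= self._max_entries:
                    self._flush_locked()

        def _flush_periodically(self):
            while True:
                time.sleep(self._flush_interval_s)
                with self._lock:
                    self._flush_locked()

        def _flush_locked(self):
            # Caller must hold the lock. One Kafka write per batch, not per request.
            if not self._counts:
                return
            self._producer.send(self._topic,
                                {"counts": dict(self._counts),
                                 "window_end": time.time()})
            self._counts.clear()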

  • The benefit you get is - you can now drive much more performance out of that single machine,

  • thereby helping us with the non-functional requirements of high availability and low latency.

  • And secondly, it will improve the overall throughput of the system.

  • Again, the drawback here is - now you can lose out, not just one event but a lot of events.

  • Based on your conversation with your interviewer, you can decide on - if that's okay or not.

  • Personally, I would say it's a fairly okay thing if you lose out on 10, 20, 30 events. That's perfectly fine.

  • Now, we have put all the events into Kafka. Now what happens?

  • So, there are a bunch of ways you can implement Analytics.

  • One very simple approach is - you can dump all of this data into Hadoop and then build some

  • Hive queries on top of it, which will generate aggregate results for you.

  • Alternatively, you could also have a Spark Streaming job running, which comes up every 10 minutes, let's say,

  • and takes all the data from the last 10 minutes and does some aggregate analysis on top of it -

  • saying which URL it was, how many times it was hit, which geographies people came from -

  • and dumps the aggregate information into a data store,

  • which can then be used to power the various kinds of analytics that a user can see.
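
One way to sketch that aggregation is with Spark Structured Streaming, reading the click events back from Kafka and counting hits per short code and country in 10-minute windows; the topic, event schema and console sink are assumptions - a real job would write the aggregates to a data store such as Cassandra:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json, window
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("short-url-analytics").getOrCreate()

    # Assumed shape of the click events emitted by the Short URL Service.
    schema = StructType([
        StructField("short_code", StringType()),
        StructField("country", StringType()),
        StructField("event_time", TimestampType()),
    ])

    clicks = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "kafka:9092")
        .option("subscribe", "url-click-events")
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # Hits per short code and country over 10-minute windows.
    aggregates = clicks.groupBy(
        window(col("event_time"), "10 minutes"),
        col("short_code"),
        col("country"),
    ).count()

    query = (
        aggregates.writeStream
        .outputMode("update")
        .format("console")     # stand-in for a real sink such as Cassandra
        .start()
    )
    query.awaitTermination()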

  • So, you could implement it in either way you want, based on the conversation with your interviewer.

  • So yeah!! I think this should be it for a URL shortening system.
