
  • Hello, everyone! In this video, we'll look at how we can design a notification

  • service that is scalable enough.

  • This is never a standalone system; it is always embedded in some

  • other system design. I have used it myself in a lot of the other

  • system design videos that I have made. Let's say you are building an

  • e-commerce application or a booking system or anything of that sort.

  • You'll always have a notification service which will be used to notify your

  • consumers.

  • Let's look at how we can build a notification service.

  • Now let's look at some Functional (FRs) and Non Functional Requirements (NFRs)

  • that this platform should support.

  • The very obvious first thing is that it should be able to send notifications.

  • The next thing is that it should be pluggable. Now what does pluggable mean?

  • So let's say we want to support SMS and email as one form of notification. Now, tomorrow somebody

  • might want to say that I want to support in-app notifications as well.

  • So it should be easy enough to add that. It can be further extended to a lot of kinds of notifications.

  • So, for example, you could have something like a WhatsApp notification or anything of that sort.

  • So, extendability is something that should be taken care of.

  • The next thing is that it should be built as a SaaS product. Why SaaS? Because the main idea is

  • you should know who's sending what number of notifications.

  • And it should be possible for you to rate limit.

  • There are two use cases where you'll need that.

  • You'll need rate limiting as an offering if you are giving this out to other companies as a product to use.

  • But, even internally, you would need to do some kind of rate limiting.

  • Let's just say it's being used by a company like Amazon.

  • Now there are multiple business verticals.

  • If all of them send you notifications, then you'll end up getting hundreds of notifications in a day.

  • So that's a very bad user experience.

  • So the notification system should be able to put an overall rate limit across all the users and

  • across all the platforms saying a particular user should not get more than five notifications in a day.

  • There could be a certain amount of classification done.

  • So there are two kinds of notifications.

  • One is a transactional notification saying you have made an order, this much amount of money

  • has been deducted from your account, something of that sort.

  • So transactional notifications are ok to get.

  • But promotional notifications should always have a rate limit.

  • So if you're building this for use within your company, then you'll build user-level rate limiting.

  • But if you're giving it out externally to other companies as a product, then you might want to put

  • a rate limit saying how many requests they can make to this server.

  • Or maybe make billing tiers and price them accordingly.

  • So basically this is something that you need to capture at least and then act upon.

  • The next thing is prioritization. Again...

  • we'll support multiple priorities of messages.

  • So the idea of priorities is that some messages are low priority and certain messages are high priority.

  • Let's say, if you are sending a one time password (OTP), then

  • that's a very high priority message. Why? Because if the user wouldn't get that SMS or email or anything,

  • then they would not log into your platform and they'll not do the transaction that they wanted to do.

  • But if it's a promotional message, it is okay if it's low priority and if it gets delayed, if it goes to the

  • user let's say half an hour late also, it doesn't really matter.

  • So we'll have this prioritization, based on which you'll always process the high priority messages first

  • and then the low priority messages.

  • Coming to the non functional requirements (NFRs). This platform should always be available. Why?

  • Because, if you are planning to build it as a SaaS product which can be used

  • by other companies, then downtime would really cost us a lot.

  • Plus, it should be built in a way that it's easy enough to add more clients.

  • And whenever we want, we should be able to attribute saying

  • which clients have made how many requests.

  • Now, let's look at the overall architecture of the whole system. Just a disclaimer to start with,

  • if you are building it for a small enough use case like let's just say if you want to send out some

  • email notifications to some customers, given some criteria,

  • you can build all of this as one deployable service and put all the logic in one place.

  • But if you truly want to build it as a SaaS product wherein an enormous number of clients would be

  • using it for a lot of notifications, then this kind of a system would probably be worth it.

  • Let's just look at the starting point.

  • The starting point of the whole system is a couple of clients, i.e. Client 1, Client 2,

  • could be any number of clients, who want to send out a notification.

  • Now, there are 2-3 kinds of requests that they can send you. But all of those requests would come

  • into something called a Notification service, which is an interface for us to talk to the other

  • teams in the company, other companies or anything that we want.

  • There are mainly two kinds of requests. One is where they tell you that I want to send this

  • particular content to this particular user, let's say an email id kind of a thing, as an e-mail.

  • This is one of the requests. Other request could be saying, I have this user id and send them

  • this notification.

  • And you decide how you want to send it, whether as an email or as an SMS or whatever.

  • Normally the first kind of a model would be used by other companies where they want to decide that

  • they need to send as SMS or email or whatever. The second model would generally be used when you

  • are building this to be consumed within your own company.

  • But normally, as a SaaS product, it's good to have both interfaces. The idea

  • of the Notification service is that, for most kinds of requests,

  • it will just take the request, put it into Kafka and respond back to the client saying, I have taken

  • the request and I will send the notification in a couple of seconds at max.

  • You could make this transaction as a synchronous flow but that will take a bit more time

  • and that will keep the client blocked.

  • So, it's probably best to take the request and dump it into a Kafka and move on.

  • But let's just say if for a very critical scenario, you need it to be a synchronous flow.

  • You can basically make the whole process synchronous through API calls.

  • But assuming it's Kafka for now, let's look at what happens next.

  • This Notification service would just do a very basic set of validations saying that email id should not be

  • null or user id should not be null, content should not be null, something of that sort.
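
  • A minimal sketch of those basic null checks, assuming a hypothetical request shape (the class and field names here are illustrative and not from the talk):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NotificationRequest:
    # Hypothetical request shape; field names are assumptions for illustration.
    content: Optional[str]
    email_id: Optional[str] = None  # set when the client picks the address directly
    user_id: Optional[str] = None   # set when the platform resolves the address itself


def validate(request: NotificationRequest) -> List[str]:
    """Return a list of validation errors; an empty list means the request is acceptable."""
    errors = []
    if not request.content:
        errors.append("content must not be null or empty")
    if not request.email_id and not request.user_id:
        errors.append("either email_id or user_id must be provided")
    return errors
```

  • Returning a list of errors instead of raising on the first one lets the service report every problem to the client in a single response.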

  • There might be a lot of other validations, given certain kinds of events that you want to do.

  • All of those validations

  • would happen in this component which is called Notification Validator and Prioritizer.

  • It does a couple of things. Validation is one part of it.

  • The main thing it does is decide the priority of a message.

  • So based on some attribute within the message, let's say, we'll keep a message type identifier kind

  • of a thing in a request and based on that, it will decide the priority of a message.

  • Normally, OTP is something which will be high priority because if the user can't log in to the

  • system, they can't place an order and things like that.

  • The second most high priority thing would be a transactional message

  • saying your order has been placed, it will be delivered by so and so time.

  • The third, lowest priority one would be a promotional message

  • wherein you are sending out offers and coupons and stuff like that.

  • So, it decides the priority and puts the event into a Kafka topic, specific for each priority.

  • So there'll be a topic for high priority messages, there'll be a different topic for medium priority,

  • there'll be a different topic for low priority.

  • And while consuming, the consumers will first consume the high priority messages

  • and then the medium and low priority.

  • The idea is - we don't want any lag in the high priority messages, but we should be at times okay if

  • there's a spike in low priority messages and if it takes time, that's okay.
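
  • The routing and consumption order described above could be sketched as follows. In-memory deques stand in for the three Kafka topics, and the message type names are assumptions; only the three priority levels come from the talk.

```python
from collections import deque

# Hypothetical message types; in-memory deques stand in for the Kafka topics.
PRIORITY_BY_TYPE = {"OTP": "high", "TRANSACTIONAL": "medium", "PROMOTIONAL": "low"}
TOPICS = {"high": deque(), "medium": deque(), "low": deque()}


def publish(message: dict) -> str:
    """Pick a priority from the message type and append the message to that topic."""
    priority = PRIORITY_BY_TYPE.get(message["type"], "low")  # unknown types default to low
    TOPICS[priority].append(message)
    return priority


def consume_next():
    """Return the next message, always draining higher priority topics first."""
    for priority in ("high", "medium", "low"):
        if TOPICS[priority]:
            return TOPICS[priority].popleft()
    return None
```

  • Strict ordering like this can starve low priority messages under sustained high priority load, so a real consumer might give each topic a share of capacity instead.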

  • Now comes something called as a Rate Limiter. These two components can be interchanged

  • depending upon the conversation with your interviewer. But I'm keeping Rate Limiter

  • before it because the next component is a slightly heavier component.

  • The Rate Limiter does two kinds of rate limiting. 1) It checks whether the client who's calling me

  • is allowed to call me this many times. That is the first thing.

  • 2) For the user to whom I'm sending this notification,

  • am I supposed to send this notification to this user this many times?

  • Let's go into detail.

  • So there could be a subscription that we have with a certain client

  • that they can call us maybe 10 times in a second.

  • All of that would be one kind of rate limiting.

  • The other thing is there could be a configuration saying I cannot send more than three

  • promotional messages to a particular user in a day. That would be another kind of rate limiting.

  • Both kinds of rate limiting would be implemented in a very similar way.

  • There would be a key.

  • It could be a client id, user id, anything.

  • And whenever you get a request, you basically increment that key in Redis for a certain timestamp.

  • Or maybe the timestamp need not be an exact timestamp,

  • it could be day, it could be second,

  • it could be at a minute level or any level.

  • The moment you exceed that threshold is when you drop that request.
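
  • A minimal sketch of that counter, assuming a fixed time window per key. A Python dict stands in for Redis here, and the class and key names are hypothetical.

```python
import time
from collections import defaultdict
from typing import Optional


class FixedWindowRateLimiter:
    """Counts requests per key per time window; a dict stands in for Redis."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counters = defaultdict(int)  # assumption: Redis would hold these counters

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Increment the counter for the key's current window; reject once over the limit."""
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)  # buckets the timestamp (second, minute, day)
        bucket = f"{key}:{window}"
        self.counters[bucket] += 1
        return self.counters[bucket] <= self.limit
```

  • The same class covers both cases: a key like a client id with a one-second window for the subscription limit, or a user id with a one-day window for promotional messages. With Redis, the increment would typically be an INCR on the bucket key plus an expiry so that old windows clean themselves up.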

  • There's one more thing that it does, which is called request counting.

  • Let's just say with some of the clients you have a pay for use kind of a subscription model,

  • saying they can call you any number of times, there is no rate limiting as such

  • but as the number of calls increases, the amount of money we charge them also increases.

  • So they pay on a per-request basis.

  • For those kinds of clients, we'll implement it in a very similar way. Just that we will not restrict the

  • request. We'll keep incrementing the counter and then there can be reporting built on top of it.
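
  • Request counting could look like the sketch below. The flat per-request price and the function names are assumptions made up for illustration; the talk only says that requests are counted and reported on.

```python
from collections import Counter

# Hypothetical flat price, in millionths of a currency unit per request.
PRICE_PER_REQUEST_MICROS = 500

usage = Counter()  # client_id -> number of requests in the current billing period


def record_request(client_id: str) -> None:
    """Count the request; unlike the rate limiter, this never rejects anything."""
    usage[client_id] += 1


def usage_report() -> dict:
    """Build a simple per-client report that billing could be based on."""
    return {
        client: {"requests": count, "charge_micros": count * PRICE_PER_REQUEST_MICROS}
        for client, count in usage.items()
    }
```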

  • The next component is something called as a Notification Handler and User Preferences.

  • First let's look at what user preferences are. A user might give us a preference saying

  • don't send me SMS, send me emails instead.

  • Or a user could have said that I want to unsubscribe from all your promotional messages.

  • So all of those would be handled by this User Preference Service.

  • It has two components that it talks to. One is a Preferences DB, which is a single source of

  • truth for all the user preferences that we have in our system.

  • This would be mainly used when we are building it for our own use.

  • Let's say Amazon is building this notification system for their own use.

  • This will normally not be used when you're integrating with third parties.

  • Third parties would handle this piece on their end. The second thing it will talk to is a User service

  • which is basically - let's say you got a request saying to this user id 123, send

  • this particular text.

  • What is that user 123? It's something that you can search from user service

  • to get the email id or phone number or anything of that sort, given a user id.
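
  • As a rough sketch of how the handler might combine the two lookups: dictionaries stand in for the User service and the Preferences DB, and all names and data below are invented for illustration.

```python
from typing import Optional

# In-memory stand-ins for the User service and Preferences DB; all data is made up.
USERS = {"123": {"email": "user123@example.com", "phone": "+10000000000"}}
PREFERENCES = {"123": {"blocked_channels": {"sms"}, "promotional_opt_out": True}}
ADDRESS_FIELD = {"sms": "phone", "email": "email"}


def resolve(user_id: str, message_type: str, channel_order=("sms", "email")) -> Optional[dict]:
    """Return the first allowed channel and its address, or None if nothing may be sent."""
    user = USERS.get(user_id)
    if user is None:
        return None
    prefs = PREFERENCES.get(user_id, {})
    if message_type == "PROMOTIONAL" and prefs.get("promotional_opt_out"):
        return None  # the user unsubscribed from promotional messages
    for channel in channel_order:
        if channel in prefs.get("blocked_channels", set()):
            continue  # e.g. "don't send me SMS, send me emails instead"
        address = user.get(ADDRESS_FIELD[channel])
        if address:
            return {"channel": channel, "address": address}
    return None
```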

  • This basically could handle one more kind of rate limiting. A user might say that don't send me

  • more than three promotional messages in a week or one promotional message in a week.

  • So, if that kind of rate limiting also exists and you want to support that,

  • then this whole Rate Limiting module would come after this Notification Handler.

  • But because the Rate Limiter is very lightweight, while this component calls a DB and an external service,

  • that's the reason I have kept it after the Rate Limiter.

  • But based on your conversation with your interviewer, you could swap these two.

  • Once we have all of this finalized, we basically have a full fledged system

  • that we can actually use to notify.

  • We have a phone number,

  • we know that we'll send it as an SMS and we have the content or we have the email id,

  • we know we'll send it as an e-mail and we have the content.

  • All of that is put into another Kafka for basically sending it out.

  • Why do we need so many Kafkas?

  • Because let's just say we had a lot of SMSes and this SMS service is not able to handle it.

  • One way is to increase the number of consumers.

  • But if there are certain spikes from certain clients, it's probably not worth adding hardware that

  • runs all the time just for that.

  • So we might as well put it into Kafka and this SMS Handler could handle things at its own pace.

  • There are multiple handlers that sit on top of this Kafka. You could decide to build

  • all of these as one deployable unit, but again, with the SaaS (Software as a Service) thing in mind,

  • I'd rather keep each of them as a separate service.

  • So there are multiple handlers. Let's take a couple of examples.

  • So all the SMS requests that will come into this Kafka would be handled by this SMS Handler.

  • Plus, there could be different topics for each of these.

  • So email could have one topic, SMS could have one topic and stuff like that.

  • Now SMS Handler might integrate with multiple vendors for sending SMSes.

  • Let's say you are a global company. You have one vendor that deals with SMSes in Asia,

  • another vendor that deals with SMSes in the USA, another vendor for Europe, stuff like that.

  • So you could have multiple vendors that you are integrated with and all the SMS vendor integration

  • could be done through this SMS Handler.

  • Now, let's say you are in India and you have 5 vendors that work in India.

  • So what request should go to which vendor?

  • What is the priority?

  • How do we distribute requests amongst them?

  • That piece could also be handled within this handler itself. Same thing, you have for emails.
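
  • One possible way to make that vendor choice inside the SMS Handler is sketched below. The vendor names, regions, priorities, and the health flag are all hypothetical; the talk leaves the actual routing policy open.

```python
from dataclasses import dataclass


@dataclass
class Vendor:
    name: str      # hypothetical vendor name
    region: str
    priority: int  # lower number means preferred
    healthy: bool = True


VENDORS = [
    Vendor("vendor-a", "IN", priority=1),
    Vendor("vendor-b", "IN", priority=2),
    Vendor("vendor-c", "US", priority=1),
]


def pick_vendor(region: str, vendors=VENDORS) -> Vendor:
    """Choose the most preferred healthy vendor that serves the region."""
    candidates = [v for v in vendors if v.region == region and v.healthy]
    if not candidates:
        raise LookupError(f"no healthy SMS vendor for region {region}")
    return min(candidates, key=lambda v: v.priority)
```

  • Marking a vendor unhealthy makes traffic fall through to the next one in that region, which is one simple answer to the priority question. Splitting load by weight would be another.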

  • You have an Email Handler. And by the way, this communication between the handler and the vendor

  • could be a sync call.

  • Moving back to Email Handler. Email Handler takes all the e-mail requests and calls the email vendor

  • which will send out the real emails.

  • This email vendor could be a simple SMTP server that you have at your end.

  • Then you will have In-App Handler, which basically handles all the in-app notifications that you want to

  • send out or a push notification, kind of a thing.

  • You can use Firebase for Android and probably use the Apple Push Notification service for

  • sending out such notifications on iOS platform.

  • You could also have an IVRS Handler. A classic example would be, I don't know,

  • either Amazon or Flipkart, one of them.

  • When you place a very high value cash on delivery (CoD) order, they actually send you an IVRS call,

  • saying that we have got this order from you, are you really sure you have placed this as a cash on

  • delivery order and you can press 1 or 2 to kind of confirm or reject.

  • So all of those things could be handled by the IVRS Handler.

  • This is a very low throughput scenario and it would happen once in a while when compared to SMS

  • or emails or anything of that sort.

  • You can again scale these independently based on the kind of traffic you have.

  • Now coming to pluggability.

  • Let's say somebody comes and says that you need to add WhatsApp also as a notification mechanism.

  • So basically you just need to add a WhatsApp Handler and basically introduce a type called

  • WhatsApp Messaging at this layer and all of that can flow through all the way. The Kafka that the

  • Notification Handler writes to will have a new topic for WhatsApp messages and a

  • WhatsApp Handler will basically take care of sending those notifications out.

  • So that's how you can make it fairly pluggable.
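
  • The pluggability idea can be sketched as a handler registry, where adding a channel means registering one more function. The decorator, function names, and return strings are illustrative assumptions, and real handlers would call a vendor instead of returning text.

```python
from typing import Callable, Dict

HANDLERS: Dict[str, Callable[[dict], str]] = {}


def handler(channel: str):
    """Decorator that registers a function as the handler for one channel."""
    def register(func):
        HANDLERS[channel] = func
        return func
    return register


@handler("sms")
def send_sms(notification: dict) -> str:
    return f"SMS to {notification['address']}: {notification['content']}"


@handler("whatsapp")  # adding a new channel touches only this new handler
def send_whatsapp(notification: dict) -> str:
    return f"WhatsApp to {notification['address']}: {notification['content']}"


def dispatch(notification: dict) -> str:
    """Route a notification to the handler registered for its channel."""
    func = HANDLERS.get(notification.get("channel"))
    if func is None:
        raise ValueError(f"no handler for channel {notification.get('channel')!r}")
    return func(notification)
```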

  • Now coming to what this Notification Tracker is. You always need to keep track of all the

  • notifications you have sent. In case somebody sues you or there is some

  • auditing required, you need to know what all communication you have sent out.

  • So for all of that, you have a Notification Tracker which puts all the notifications that you have

  • sent out into its own Cassandra. This is normally a write-heavy store, which would be read

  • once in a while when there's an audit or something of that sort.

  • Depending upon your traffic and the throughput, you could decide to club some of these components.

  • So for example, you can decide to club all of these components as a single deployable service.

  • You could also club both these components into a single deployable service and

  • you can move the Prioritizer right there and

  • have fewer components, or try to club all of this into one service as well

  • if it's a very low throughput thing.

  • You can decide how you want to do that based on your conversation with your interviewer.

  • Now, let's do something different.

  • What if you want to send bulk notifications?

  • What if you want to do something like - for all the users who have ordered a TV in the last 24 hours,

  • send them a notification about an installation service or anything of that sort.

  • Or for all the users who ordered milk 3 days back, send them a notification asking

  • whether they want to order the same thing again or not.

  • So all of those things come under something called as a bulk notification.

  • Now, how do you want to do that?

  • So the very first thing is there would be a UI kind of a thing, which is represented as Bulk Notification UI,

  • which will talk to something called as a Bulk Notification service, which takes

  • a filter criteria and a notification and sends it out.

  • Filter criteria would be something like find all the users who have placed a milk order,

  • or anything, in the last three days, and send them a notification with certain text.

  • How does that work?

  • So there's something called as this User Transaction Data.

  • This basically abstracts out a lot of services which are outside the scope of this notification service

  • but are real business functional services.

  • For example, again, taking the example of Amazon, let's say, there are a couple of services that handle

  • their e-commerce orders.

  • There are a couple of services that handle their pantry orders. All of those services would be

  • putting their transactional information - that your order has been placed, it has been delivered,

  • it has been canceled - all of those kinds of attributes into various Kafka topics.

  • What we could do is basically build a search functionality on top of those topics.

  • Normally, in most companies, you do have those kind of products already built.

  • But assuming you don't have that, let's try to build that.

  • And if we have that, we can just leverage it. What will happen is - this will put information

  • into a lot of Kafkas, a lot of topics. I am abstracting it out as this one Kafka.

  • All of that information would be listened to by something called a Transaction Data Parser.

  • Now, this Transaction Data Parser basically... first of all, parses all of that information.

  • Now, because you have multiple clients

  • and multiple services that are sending data, they could be putting in multiple formats of JSON or XML.

  • So, for example, pantry (Amazon) could have a separate signature and regular e-commerce

  • could have a separate signature.

  • This data parser basically takes all of those messages, puts it in a format or converts it

  • in a format that it understands and then it puts it into its own data store.

  • It could be Elasticsearch or a MongoDB kind of a structure where you could do

  • complicated aggregations and nested queries.
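
  • A minimal sketch of the normalization step in the Transaction Data Parser. The two payload formats and every field name below are invented for illustration; a real parser would follow whatever each upstream service actually publishes.

```python
import json
import xml.etree.ElementTree as ET


def parse_event(raw: str, source: str) -> dict:
    """Convert a source-specific payload into one common record shape."""
    if source == "ecommerce":  # assumed here to publish JSON
        data = json.loads(raw)
        return {
            "user_id": str(data["userId"]),
            "item": data["product"],
            "status": data["orderStatus"],
            "source": source,
        }
    if source == "pantry":  # assumed here to publish XML
        root = ET.fromstring(raw)
        return {
            "user_id": root.findtext("customer"),
            "item": root.findtext("sku"),
            "status": root.findtext("state"),
            "source": source,
        }
    raise ValueError(f"unknown source: {source}")
```

  • Once every event has the same shape, it can be written to the data store and queried without caring which service produced it.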

  • On top of this data store, there's something called as a Query Engine. That Query Engine basically

  • takes a query, which could be like an aggregation plus a filter kind of a thing,

  • saying, find out all the users who are in Bangalore let's say, who have ordered some food item

  • in the last few days, or find all the people who are having some items in their cart but have not

  • placed the order or any kind of a filter criteria that you want to build that would be powered by this

  • Query Engine. It would have its own DSL (Domain Specific Language), which would be a format -

  • how do you kind of structure your query and that would be a signature that it exposes.

  • Now, this Query Engine would query that thing on this data store.

  • So it also understands the schema of this data store and it basically would return us a list of users

  • that match that criteria.
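
  • To make the Query Engine idea concrete, here is a sketch with a tiny made-up DSL of exact-match filters plus a recency window. A Python list stands in for the Elasticsearch or MongoDB store, and the records and query keys are assumptions.

```python
from datetime import datetime, timedelta

# Made-up normalized records standing in for the data store.
ORDERS = [
    {"user_id": "1", "item": "milk", "city": "Bangalore", "ordered_at": datetime(2024, 1, 9)},
    {"user_id": "2", "item": "tv", "city": "Bangalore", "ordered_at": datetime(2024, 1, 10)},
    {"user_id": "3", "item": "milk", "city": "Delhi", "ordered_at": datetime(2024, 1, 2)},
]


def run_query(query: dict, now: datetime, records=ORDERS) -> list:
    """Evaluate a hypothetical DSL: {'equals': {field: value}, 'within_days': n}."""
    # With no 'within_days', fall back to a cutoff far enough back to match everything.
    cutoff = now - timedelta(days=query.get("within_days", 36500))
    users = {
        record["user_id"]
        for record in records
        if record["ordered_at"] >= cutoff
        and all(record.get(field) == value for field, value in query.get("equals", {}).items())
    }
    return sorted(users)  # de-duplicated list of matching user ids
```

  • For example, the query {"equals": {"item": "milk"}, "within_days": 3} expresses "users who ordered milk in the last three days".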

  • Now this Bulk Notification service has a list of users that it has got from Query Engine,

  • it has a message that it got from this UI, and it basically calls this Notification service saying send

  • this notification to all of these users and these users, it would get from this Query Engine.

  • This Query Engine is not something which would just be used by this Notification service.

  • It could be used by a lot of services.

  • You could have a very simple Rule Engine which possibly says that if a user has cancelled 10

  • consecutive orders in the last 3 days, go deactivate their account.

  • They are probably, you know, just doing fraud.

  • Or, there could be a simple Fraud Engine running

  • which has some criteria and takes some action based on that.

  • All these condition and action kind of a thing, wherever you have a condition running

  • and action happening, all of those could be, you know, powered by this kind of a thing.

  • In fact, your Notification service itself is that kind of a thing.

  • So you have a filter criteria as your condition and action is basically the notification sending process.

  • So this could also be a part of your Rule Engine.

  • Something like 5 days after an order has been delivered, send a mail to consumers

  • for giving a product review. So something of that sort.
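
  • The condition and action pattern described above can be sketched as a list of rules. The rule names, record fields, and action strings are hypothetical, and a real engine would get its matching users from the Query Engine.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # decides whether the rule applies to a user record
    action: Callable[[dict], str]      # what to do when it applies


RULES: List[Rule] = [
    Rule(
        "review-reminder",
        lambda u: u["days_since_delivery"] == 5,
        lambda u: f"email review request to {u['user_id']}",
    ),
    Rule(
        "fraud-check",
        lambda u: u["consecutive_cancellations"] >= 10,
        lambda u: f"deactivate {u['user_id']}",
    ),
]


def evaluate(user: dict, rules=RULES) -> List[str]:
    """Run the action of every rule whose condition matches and collect the results."""
    return [rule.action(user) for rule in rules if rule.condition(user)]
```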

  • And this could then also be used to build a Search platform altogether.

  • But overall, if this is something that is already built in the company,

  • you could just leverage that and

  • have this Bulk Notification service talk to the querying layer of that platform.

  • So, yeah, I think that should be it for a Notification service.

  • Thanks for watching.
