SQL, NoSQL, Graph: A Commentary on Databases
Understand why there are new kinds of databases coming out
When I started my programming journey, I learned many languages like Python and web frameworks like Django quite in-depth. I didn’t care much about databases and how they store information or why they have certain design patterns.
But over the period of time, I have realized that it’s quite important to actually understand what’s going on behind the scenes.
When I ask people why they use certain database systems at their workplace, they don’t have much clue. Often these decisions are taken by higher-ups.
The objective of today’s newsletter is to educate you on how various database systems differ in terms of their purpose of use.
🧠 Food for thought! Why do we have to limit ourselves when dealing with database systems?
They are quite complex systems, I agree, but in a broader perspective they are just systems to store your data, right? You just need the most efficient way to store and retrieve your data. Then why limit yourselves to relational DB or a document DB?
Is every application required to use only one type of database? Are questions like “Which is the best database system in 2022?” even sensible to ask?
Think about it! Come back to this once you have read the whole article
In one of my previous issues, I talked about a few practices to optimize your database queries. But that’s just scratching the surface. It didn’t talk about database design patterns.
Why different databases? 🤔
From time to time, our data requirements evolve. Earlier the data wasn’t huge. The value of collecting data wasn’t realized then. Also, there wasn’t enough technology to capture this data either.
But over the period of time data evolved and grew in size. We had organized databases in the beginning like SQL, later on we had databases like MongoDB to store messy and inconsistent data. And in today’s connected world, we also have graph DBs which are good to model data that is intricately connected with each other.
Basically, this evolution shows that you need to optimize and choose the storage engine or database type based on what you want to do with the data (think OLAP vs OLTP 😉)
A Commentary on Document and Relational Databases 💭
Relational Databases
Relational DBs were the dominant player in the database world for a very long time. It successfully defended itself against others and was a de-facto for most applications, so much so that it would be the first and only database technology everyone would learn in their tech career.
What are relational DBs?
Relational databases would arrange your data in columns and rows called tables.
If there are more types of data to be added then you can split it into multiple tables and then link them through “keys”. Keys are identifiers used to reference to a particular row in another table.
This way of organizing data is what we call “referential integrity”. Because the data is not edited everywhere, but only once where it’s referenced. And since it’s fetched from the referenced area, you need to update it only at one place
Relational databases are not so easy to scale. I am not saying they can’t. Facebook uses relational databases. So they can scale. But it’s a bit challenging for the architects to do it.
Here’s a case study of Notion and how they sharded their databases as they expanded.
Relational databases are not very good where the amount of data is varying. Let’s take the example of a model where you need to store details of a candidate applying for a job.
A candidate might have worked at 10 different companies or maybe just 1. Or they could have gone to 2 schools and 0 universities but might have done great courses on Udemy. Such data is not quite ideal to have a schema attached to it, hence not so suitable for relational databases.
Also, if the data has too many types then it’s not ideal to create tables for each type since joins will induce their own write and read overheads.
Having a schema can be limiting. Relational DBs have constraints tied to them. For e.g. if two tables have a foreign key relationship, deleting a row in one table will delete the associated row in the related table.
So it’s useful in cases where the data needs to be rigid, but you will have to take care of such corner cases.
Multilevel joins can slow down your queries. When you query multiple tables for information, you create joins. When you have multi-level joins like this, it becomes slower and slower to query. If your APIs are getting slower, check the SQL queries and you may understand why.
So when will you choose relational databases?
If you want absolute consistency in your data,
You know that the data requirements won’t change too much in the future
You know that you will be able to scale it when the time comes
Document Databases
Document DBs were introduced to overcome some of these limitations that relational DBs had.
What are Document DBs?
Document DBs store information in JSON-like documents. Each row in relational DBs is akin to a document in document DBs. The most popular document DB is MongoDB. It’s quite handy in storing dynamic data.
It’s also quite similar to the kind of data your application works with. So there’s no requirement of an ORM(object-relational mapping). Because of this, the queries are fast!
If you are dealing with a large scale of data that can be quite dynamic in nature then document DBs are your go-to solution.
Another advantage of a document DB is that the queries are faster if you want to access the entire document, as compared to relational DB where you have to join multiple tables to access the same data.
But document DBs have their own issues.
First of all, it does not have support for joins which becomes a drawback while combining data.
In short, if the “connectedness” of data is high then it’s complicated to use document DBs.
Document DBs are good when it comes to one to many relations, but not when it’s many to one or many to many. Normalizing data in a document model is also not easy.
So when will you choose document databases?
If your data is going to be messy and inconsistent and maintaining a schema will be a pain to manage
If your use-case involves accessing entire documents of data
If your data points are self-contained and don’t require interacting with other types of data too much.
Graph databases are kinda amazing too! 🎉
In the age of social media, some use cases may not be optimum with traditional database approaches.
The connectedness of data has increased many folds. The number of parameters and data points (such as price, product color) associated with each entity(such as a product on an e-commerce site) has gone up.
Relational DBs are good for many to many but not if there are too many of them.
That’s where graph databases come in. They are pretty cool, check it out
Let’s take a scenario! 🖌
So, I want to query the following:
Fetch me all the friends of Rahul who are married and living in Delhi but also are a Python developer.
The horror of writing a SQL query for this would be unimaginable, and I am not talking about time to query yet.
This is where Graph databases shine.
In a graph database, you can answer this question if there are paths connecting them.
In my opinion, graph databases are simpler to visualize and easier to understand. They are basically nodes storing data and properties and edges showing the relation between the nodes.
Takeaway
There’s no one size fits all solution. If history is any evidence then there’s gonna be revolutionary ways of storing data yet to be discovered.
But the major takeaway is that you should deeply understand what data is your application storing and how it’s retrieving it. If you feel some kind of data in your application is supposed to be document type then having two databases in the same application is not wrong either. Like a hybrid solution!
Many big companies follow this practice.
This week I would like you to take a look at your data and ask, does if it fits well in your database?
If you liked my newsletter, please share it with your friends! It would mean the world to me
Connect with me!
You can drop me a “Hi” on Twitter ! Let’s talk tech, my DMs are open for you 🤓
If you need help with your service architecture, you can also email me 💌 : dennysam14@gmail.com
Thank you for your wonderful insights, Sir!