When to use SQL or NoSQL
-
Basically you use Relational databases (what people often call SQL because SQL is the standard language for talking to them) when strong relational handling is required or desired. You use non-relational (aka NoSQL) databases anytime that those are not needed. That's pretty much it. Relational databases have much more overhead and complexity so only use them when that is needed
-
Size is not a reason for either. It's just that making the decision to use one or the other matters more as you get big. When a database is tiny, relational overhead is trivial. But you should always write your code well, regardless of size, so you would always pick the right structure for your application's data no matter how big you expect it to be.
As a system admin, relational vs. non-relational is never really one for us to make. This decision exists purely with the developers and application architects.
-
Thanks for the info, it really helps a dev noob like me. I'm writing a web app in a couple months for someone and the past couple web applications I've developed I've done in PHP and mySQL. I really want get some experience with node.js and javascript, so I am hoping to deploy a MEAN stack for my next application and use MongoDB, but I've never touched a noSQL before.
-
Node.js doesn't have the best relational database drivers, so that's one big factor. MongoDB and Node.js have tons and tons of testing together, so they are very natural. NodeBB runs on NoSQL databases and in the case of MangoLassi uses MongoDB which we are loving. Can't wait to upgrade to the 3.x series. As we grow we will be adding Reddis for session data too.
But it really comes down to the needs of the app. If you have financial data, for example, you almost certainly want to be using a relational database. And pretty much that always means PostgreSQL.
NoSQL is a huge category defined only by what it isn't, not by what it is. So the places where you would use Reddis, for example, you would almost never use Cassandra. There are tons of NoSQL choices, far more than relational choices, and you really have to know what they do and what they are for before choosing one. MongoDB is one of the best "well rounded" choices as it does a lot of things well. But if you need really, really light data access in memory, Reddis rocks.
I manage Redis farms, Cassandra farms, MySQL, PostgreSQL, MongoDB and others all of the time
-
We deal with both MySQL and a few different NoSQL databases as well. It really depends on the scenario, but if you don't know, then you can just use a regular relational database.
In general though you don't need a lot of data to justify MongoDB or Cassandra (we use both, though we're slowly transitioning to only Cassandra), in the same way you don't need to be doing relational or financial things to justify MySQL or anything like that.
NoSQL works best for key-value store, as in you already know the information you want, it likely isn't directly related to anything else and so probably doesn't need to be joined (joins tend to be very expensive in NoSQL if supported at all), and you don't need to worry too much about the same level of consistency. So some examples:
- Sessions
- Logs or user activity
- Expensive but rarely rebuilt things from your relational database for permanent caching
There's also the issue of heavy writing, as NoSQL databases tend to be faster at writes than reads, which is why we use it for really write heavy things that either we do not need back right away or we allow the user to keep (when they make a comment, show it based on what they put, instead of getting it back, for example).
Relational databases can be used for all the same things as NoSQL databases and many actually do a fairly good job at it, and you don't really need to worry about the differences until you get hundreds of thousands or millions of requests (like us) per day.
Aside from that MySQL (and others) have been around for years, are highly tested in large environments, etc and anything which is relational, would need to be associated with things in other tables (foreign key scenario for example), dealing with financial information, etc.
When you don't know or don't have the environment or resources to test, then always just use your relational database, it's probably the right choice anyway.
If you are curious, in our environment for our adult content sites, we've got about 630 servers so and we use MySQL, MongoDB, Cassandra, and memcached (and Apache, nginx, and PHP), and our other products and services we use similar setups but usually either MongoDB or Cassandra, and like I said we're trying to transition to it as we've gotten more out of it and better performance for us. I've noticed a lot of issues with MongoDB and returning things in reverse order (even with a reverse index) and also paging this way, it's painfully slow compared to Cassandra (1 - 2 seconds compared to 400ms or so; doesn't seem like a lot, but keep in mind that's on top of other loading, so loading new comments or whatever on some of our older sites is up to 3 seconds, and most people start to feel things are "too slow" after about half a second.), so I suggest maybe not even bothering with MongoDB unless it fits some other scenario of yours.
-
Other things that are ideal for NoSQL include...... discussions in online communities! Think about it - we don't need tight atomic commits for posts, but they include a lot of data. So this type of data is very expensive for a relational database to handle and very efficient for a document database (a la MongoDB) to handle. We don't need the whole thread locked while someone is posting, if two people post at once and there isn't a really tight "who posted first" it doesn't really matter.
-
Blogs would be the same way. Many, many things work well without tight relationships. Accounts and financial data are very common and major exceptions.
-
@scottalanmiller said:
Other things that are ideal for NoSQL include...... discussions in online communities! Think about it - we don't need tight atomic commits for posts, but they include a lot of data. So this type of data is very expensive for a relational database to handle and very efficient for a document database (a la MongoDB) to handle. We don't need the whole thread locked while someone is posting, if two people post at once and there isn't a really tight "who posted first" it doesn't really matter.
Sure, we use NoSQL for video/photo/etc comments and also messaging as well.
-
@scottalanmiller said:
Blogs would be the same way. Many, many things work well without tight relationships. Accounts and financial data are very common and major exceptions.
When it comes to building financial reports it's definitely superior to use relational databases, but I think most things can probably just be in NoSQL databases, but the difference is that databases like MySQL, Microsoft SQL Server, etc come with long, tested histories and a reliability/availability you can't usually get out of a NoSQL database without a ton of effort, but we've noticed Cassandra sort of breaks that mould.
Having said the above though, we do keep transaction histories, payment histories, etc in Cassandra as well, and we simply cache harder to generate things that need to use that information (very rare that things do) and also we track, for example, the amount changes so we don't need to do anything like SUM(10,000 rows) like I've seen people do for some reason.
-
Sure, you can find ways to do things in NoSQL that you do with an RDBMS, but it takes more care and possibly moving some of the data protection into the application layer. RDBMSs are really ideal when those things are needed. Tons of testing, as you mentioned, and tons and tons of research into how to make those transactions efficient.
-
One thing that is interesting is that PostgreSQL now talks in native JSON like MongoDB does. This is a case of the NoSQL world pushing the relational world to change how they think and do new things.
-
@scottalanmiller said:
One thing that is interesting is that PostgreSQL now talks in native JSON like MongoDB does. This is a case of the NoSQL world pushing the relational world to change how they think and do new things.
Definitely, I think that's a pretty kick ass feature.
-
One of the most important things about NoSQL is that they are adding variety and forcing everyone to look at databases in a fresh light. We had a lot of this stuff back in the 1980s, heck I was doing NoSQL for Eastman Kodak in 1989, but everyone on the IT side seemed to have literally forgotten about non-relational databases since the mid-1990s.
-
@scottalanmiller said:
One of the most important things about NoSQL is that they are adding variety and forcing everyone to look at databases in a fresh light. We had a lot of this stuff back in the 1980s, heck I was doing NoSQL for Eastman Kodak in 1989, but everyone on the IT side seemed to have literally forgotten about non-relational databases since the mid-1990s.
Yeah you know when this stuff started taking off, it seemed a little weird to me, because I had dealt with similar databases years ago, and an unnamed large "ISP" I worked for used essentially a NoSQL key-value store for buddies, statuses, etc and this was in the late 90s. People were acting as if nobody had thought of this stuff before.
-
Exactly. I am pretty sure that an entire generation of IT actually doesn't know what databases are, what relational databases are, relational theory, the difference between just a database and a DBMS. Discussions, especially on some other communities, really highlight that databases are a complete mystery. Even IT people are often confused as to what is a database and what is an application, which is mind blowing to me. And people wanting to say that they are "database programmers" - what does that even mean? They certainly don't mean that they are writing databases and I've never heard of anyone, ever, who did nothing but "program" in a database language (a la SQL.) And so many people think that half of IT is DBAs even though none of them have ever met one.
Something so basic and core to our field yet so completely unknown is odd.
-
@scottalanmiller said:
Exactly. I am pretty sure that an entire generation of IT actually doesn't know what databases are, what relational databases are, relational theory, the difference between just a database and a DBMS.
You said it all right there. I see the questions similar to the OP's question fairly often, typically it's "which is better?" and it's sort of odd to me that they didn't first ask themselves "well, does it need to be relational?" instead it's a question of speed, which isn't even applicable for 99% of people. I think this is because databases aren't seen as anything other than places to store things, and their actual power is taken for granted. I blame the MTV!
-
The speed thing, I think, is an SMB problem. Only since being on SW so much have I come to realize the obsession with unmeasureable and often purely theoretical speed improvements that don't matter at all. Tiny, tiny tweaks that cause no end of issues and expense and no one can tell if they even work. At the scope of most SMBs, everything is about the same. You've got to really have something be bad, whether it is disk array, database type, application, protocol (SMB vs. NFS, Block vs. File storage, iSCSI vs. FC), networking you name it, for it to be a problem. SMBs putting 10GigE to the desktop and bonding four devices when they can't push GigE of data is crazy.
-
@scottalanmiller You just don't get it, I need GigE for my ancient refurbished Dell machines with Windows XP.
-
GiE is a must have for VoIP today! And I need QoS on my massive GigE switch or I won't be able to make my 100Kb/s phone calls!
-
@scottalanmiller said:
And people wanting to say that they are "database programmers" - what does that even mean?... And so many people think that half of IT is DBAs even though none of them have ever met one.
During the twenty-or-so contracts I've worked as a software engineer, I've met probably half a dozen DBAs, and only one of them actually wrote SQL for our application. The rest of it was done by developers. I'm sure somewhere there is a person doing nothing but write stored procedures and database functions, but I have yet to meet them.