Using In-Memory Databases, Anyone?



  • I have been listening to the Datanauts podcast quite a bit lately. One of the most interesting episodes was on on in-memory databases (http://packetpushers.net/podcast/podcasts/datanauts-085-understanding-memory-databases/). They specifically talked about Apache Geode on the show, mentioning that a 110 second transaction on an in-memory database was the equivalent to 115 days on spinning disk and 10 hours on SSDs.

    I have a few questions:

    • Has anyone seen this used in the wild?

    • Are the techniques for running databases in memory that Geode uses similar to those that SQL Server utilizes for tables you can select to run in memory? I have never had the license to turn on that specific SQL feature but always wondered about it.

    • They actually point out in the podcast that they had the best success with in-memory databases on local storage or SAN (NAS not a good choice) and that ethernet was the networking avenue of choice.

    I would love to hear anyone's thoughts on this topic because I find it fascinating.



  • @networknerd currently all sql dbs do in memory cache AFAIK therefore if your dataset is not really sparse you should be served well



  • @networknerd said in Using In-Memory Databases, Anyone?:

    • They actually point out in the podcast that they had the best success with in-memory databases on local storage or SAN (NAS not a good choice) and that ethernet was the networking avenue of choice.

    Not sure what this means... not that I know about this, but, if its in memory, its in... memory, right? So, what does it have to do with local storage, SAN, or NAS? 😕

    Anyway, other databases load what is used frequently in to memory so its quick to access. Its why lots of places would build an SQL server with a ton of RAM - so that SQL can put what is used often in to RAM (fast) and keep it available... right?



  • @networknerd said in Using In-Memory Databases, Anyone?:

    • They actually point out in the podcast that they had the best success with in-memory databases on local storage or SAN (NAS not a good choice) and that ethernet was the networking avenue of choice.

    Wha? In-memory database changing performance based on storage? IE: Performance changing because of something that is supposed to be taken out of the performance equation.

    I've not really delved into the in-memory databases and how they work differently than MySQL or MSSQL. Still, seems to miss the point to my admittedly limited knowledge on the subject.



  • @matteo-nunziati said in Using In-Memory Databases, Anyone?:

    @networknerd currently all sql dbs do in memory cache AFAIK therefore if your dataset is not really sparse you should be served well

    this



  • @jimmy9008 said in Using In-Memory Databases, Anyone?:

    @networknerd said in Using In-Memory Databases, Anyone?:

    • They actually point out in the podcast that they had the best success with in-memory databases on local storage or SAN (NAS not a good choice) and that ethernet was the networking avenue of choice.

    Not sure what this means... not that I know about this, but, if its in memory, its in... memory, right? So, what does it have to do with local storage, SAN, or NAS? 😕

    Anyway, other databases load what is used frequently in to memory so its quick to access. Its why lots of places would build an SQL server with a ton of RAM - so that SQL can put what is used often in to RAM (fast) and keep it available... right?

    Well, at some point you do have to write the data to disk (whether synchronously or asynchronously) to keep a less volatile copy of the data (at least that is what I remember from the podcast). It seems like synchronous data writes would be near impossible because you cannot keep it going at the same speed.

    I definitely do not claim to understand it all really well.



  • Yes, it is standard for any high performance application in modern times to use an in memory database for performance of at least part of its workload. Sodium is planning on this as part of its design.

    The most commonly known database for this is Redis.



  • Caching a traditional database to memory and having an in memory database do not behave the same. A pure in memory database has advantages to structure that databases writing to disk do not. There is no need to structure the data in preparation for writing to disk.



  • @quixoticjustin said in Using In-Memory Databases, Anyone?:

    Caching a traditional database to memory and having an in memory database do not behave the same. A pure in memory database has advantages to structure that databases writing to disk do not. There is no need to structure the data in preparation for writing to disk.

    Was wondering when you'd show up on this one.



  • @networknerd said in Using In-Memory Databases, Anyone?:

    @jimmy9008 said in Using In-Memory Databases, Anyone?:

    @networknerd said in Using In-Memory Databases, Anyone?:

    • They actually point out in the podcast that they had the best success with in-memory databases on local storage or SAN (NAS not a good choice) and that ethernet was the networking avenue of choice.

    Not sure what this means... not that I know about this, but, if its in memory, its in... memory, right? So, what does it have to do with local storage, SAN, or NAS? 😕

    Anyway, other databases load what is used frequently in to memory so its quick to access. Its why lots of places would build an SQL server with a ton of RAM - so that SQL can put what is used often in to RAM (fast) and keep it available... right?

    Well, at some point you do have to write the data to disk (whether synchronously or asynchronously) to keep a less volatile copy of the data (at least that is what I remember from the podcast). It seems like synchronous data writes would be near impossible because you cannot keep it going at the same speed.

    I definitely do not claim to understand it all really well.

    Not necessarily. Persistence to disk is not a guaranteed feature of in memory databases. It is common, for sure, but not guaranteed. But you should think of it as taking a backup of the database, not as storing it normally. That's the big difference between the two approaches.



  • @networknerd said in Using In-Memory Databases, Anyone?:

    @jimmy9008 said in Using In-Memory Databases, Anyone?:

    @networknerd said in Using In-Memory Databases, Anyone?:

    • They actually point out in the podcast that they had the best success with in-memory databases on local storage or SAN (NAS not a good choice) and that ethernet was the networking avenue of choice.

    Not sure what this means... not that I know about this, but, if its in memory, its in... memory, right? So, what does it have to do with local storage, SAN, or NAS? 😕

    Anyway, other databases load what is used frequently in to memory so its quick to access. Its why lots of places would build an SQL server with a ton of RAM - so that SQL can put what is used often in to RAM (fast) and keep it available... right?

    Well, at some point you do have to write the data to disk (whether synchronously or asynchronously) to keep a less volatile copy of the data (at least that is what I remember from the podcast). It seems like synchronous data writes would be near impossible because you cannot keep it going at the same speed.

    I definitely do not claim to understand it all really well.

    Sql DBs use various forms of write transaction logs to keep track of operations to do and write data async.
    They shortly write logs of activities to do. These are smaller then actual transactions and are used as sort of journal to keep data integrity. Then they write data on disk async.
    Issue is when ram is exausted and them has to flush ram on disks. At least this is an issue on ms sql. Never saturated ram with postgres.



  • @quixoticjustin said in Using In-Memory Databases, Anyone?:

    @networknerd said in Using In-Memory Databases, Anyone?:

    @jimmy9008 said in Using In-Memory Databases, Anyone?:

    @networknerd said in Using In-Memory Databases, Anyone?:

    • They actually point out in the podcast that they had the best success with in-memory databases on local storage or SAN (NAS not a good choice) and that ethernet was the networking avenue of choice.

    Not sure what this means... not that I know about this, but, if its in memory, its in... memory, right? So, what does it have to do with local storage, SAN, or NAS? 😕

    Anyway, other databases load what is used frequently in to memory so its quick to access. Its why lots of places would build an SQL server with a ton of RAM - so that SQL can put what is used often in to RAM (fast) and keep it available... right?

    Well, at some point you do have to write the data to disk (whether synchronously or asynchronously) to keep a less volatile copy of the data (at least that is what I remember from the podcast). It seems like synchronous data writes would be near impossible because you cannot keep it going at the same speed.

    I definitely do not claim to understand it all really well.

    Not necessarily. Persistence to disk is not a guaranteed feature of in memory databases. It is common, for sure, but not guaranteed. But you should think of it as taking a backup of the database, not as storing it normally. That's the big difference between the two approaches.

    Interesting.



  • @quixoticjeremy said in Using In-Memory Databases, Anyone?:

    @quixoticjustin said in Using In-Memory Databases, Anyone?:

    Caching a traditional database to memory and having an in memory database do not behave the same. A pure in memory database has advantages to structure that databases writing to disk do not. There is no need to structure the data in preparation for writing to disk.

    Was wondering when you'd show up on this one.

    🙂 Finally one I can talk about.

    With Sodium, for example, we use a "traditional" on disk database for the main data (well all data right now.) But we know that as we scale that that will not suffice for performance. Moving to in memory databases for things like session data is a standard model for application acceleration. Redis rules in this particular space, as it is a key value pair in memory cluster aware database system.

    Redis is not Sodium's guaranteed in memory option, but certainly on the short list.



  • @quixoticjustin you mean volatile session data?



  • Some, more or less, standard things about in memory databases...

    1. They require enough memory to load the entire database at "load time." There is no "making do with less memory." It just won't load. It's like a ramdisk for databases, there is no going to disk when you run out of memory, it just dies on you.
    2. They take longer to fire up because they need to get everything from disk to ram before they can do anything.
    3. While they might persist to disk at some point, that's not necessarily part of the database design. That's optional by product.
    4. Data integrity is normally handled by clustering rather than writing to disk. You rely on high availability and serve requests from multiple nodes.


  • @matteo-nunziati said in Using In-Memory Databases, Anyone?:

    @quixoticjustin you mean volatile session data?

    Yes, data that is needed for performance and consistent session behavior but if lost would not have a real impact.



  • One of the company I work with going to move SAP B1 HANA in production till the end of the year. It use a SuSe vm with 64Gb of ram, but I think it can scale to a lot more.



  • We have plans to add an in memory database to ML in the future when more performance is needed, as well.