Tonight's Platform Update

scottalanmiller

I'm a bit lost on Jason's question, at least with regard to the DB farm part. Unless you're going to put the DB on dedicated hardware or a dedicated cluster, the boxes would all be virtualized, right? So assuming you are providing the correct resources to each VM, would it make any difference if it's on a single host vs a cluster?

Standard design is to have the database layer on its own VMs without anything on them except the database code. So generally at least three MongoDB Shard servers.

Then you would have a layer of application servers, Node + NodeBB in this case, that run nothing but that.

Then a layer of Nginx reverse proxies in front of them that do nothing but that.

Then a load balancer pair sitting in front of all of that, spreading the load out across the proxies.

Given this design, you can scale any layer as needed to handle load.
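As a rough illustration of the layout described above, here is a minimal Python sketch of the tiers and of scaling one of them independently. The tier names follow the post. The node counts for the proxy and application tiers are made-up placeholders, and `Tier`, `build_stack` and `scale` are hypothetical names, not part of NodeBB or of the actual deployment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tier:
    """One layer of the stack; every node in it runs only that role."""
    name: str
    nodes: int


def build_stack() -> List[Tier]:
    # Ordered from the edge inward, as in the description above.
    return [
        Tier("load balancer", 2),        # the pair in front of everything
        Tier("nginx reverse proxy", 2),  # placeholder count (assumption)
        Tier("node + nodebb", 2),        # placeholder count (assumption)
        Tier("mongodb shard", 3),        # "at least three" per the post
    ]


def scale(stack: List[Tier], name: str, extra: int) -> None:
    """Add nodes to one tier without touching the others."""
    for tier in stack:
        if tier.name == name:
            tier.nodes += extra
            return
    raise KeyError(name)


if __name__ == "__main__":
    stack = build_stack()
    scale(stack, "node + nodebb", 2)  # e.g. the app servers are the bottleneck
    for tier in stack:
        print(f"{tier.name}: {tier.nodes}")
```

The point of the sketch is only that each layer has its own node count, so load on one layer can be answered by growing that layer alone.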

scottalanmiller

@Dashrender said:

At what point do you split the pieces out for performance?

When you outscale what you can do in a single box. A single box has performance advantages, quite large ones, but when you go beyond its practical limits you generally will grow best by having a full tiered approach.

scottalanmiller

@Dashrender said:

You can build some pretty beefcake servers giving the VMs some outrageous resources.

Not with Rackspace or some others. We are already pushing the capacity of Rackspace's largest appropriate VM type. Linode lets us go much larger, so we have a lot of breathing room, but we are already at the limits of "common" cloud nodes.

coliver

@scottalanmiller said:

@Dashrender said:

You can build some pretty beefcake servers giving the VMs some outrageous resources.

Not with Rackspace or some others. We are already pushing the capacity of Rackspace's largest appropriate VM type. Linode lets us go much larger, so we have a lot of breathing room, but we are already at the limits of "common" cloud nodes.

So where do you go from there, once you have outgrown the current capacity?

scottalanmiller

@coliver said:

@scottalanmiller said:

@Dashrender said:

You can build some pretty beefcake servers giving the VMs some outrageous resources.

Not with Rackspace or some others. We are already pushing the capacity of Rackspace's largest appropriate VM type. Linode lets us go much larger, so we have a lot of breathing room, but we are already at the limits of "common" cloud nodes.

So where do you go from there, once you have outgrown the current capacity?

That will depend on what we have available to us at the time that we hit that scale. We can go to the full split as I described above, which we are designed to do, so that would actually be quite easy. But likely, before that, we will do a full split geographically, with a full stack coming up in London and one in Singapore or Hong Kong to offload regional load. That will let the database shards do the heavy lifting while the physical location of the nodes provides improved latency for people in those regions.

Dashrender

@scottalanmiller said:

@Dashrender said:

You can build some pretty beefcake servers giving the VMs some outrageous resources.

Not with Rackspace or some others. We are already pushing the capacity of Rackspace's largest appropriate VM type. Linode lets us go much larger, so we have a lot of breathing room, but we are already at the limits of "common" cloud nodes.

Sure, but at what point does a colo make more sense than renting VM space from these vendors? Of course the problem with that is a single server, single point of failure.

scottalanmiller

@Dashrender said:

Sure, but at what point does a colo make more sense than renting VM space from these vendors? Of course the problem with that is a single server, single point of failure.

When we start getting to the point that 4-6 physical CPUs and 96GB+ of RAM are needed per site for performance. That is a long, long way off. The platform that we use is so efficient that we are handling close to 100,000 views a day and were only just starting to run into memory constraints on the old system, mostly because the system has grown over the last two years to have so much content that keeping stuff in memory was bogging things down.

The leap that we have made will likely carry us for more than another two years. Every aspect of the system is 300% or more faster or bigger than we had before. We have three times the cores now (300%), but also 15% more speed per core, and that adds up: 3 x 1.15 = 3.45, so that's like 345% of the old total CPU speed. That is a lot. And that wasn't our bottleneck. And we have heard numbers up to a 300% increase from the database update. And 300% faster disk IO!!! These things all add up. The CPU waits on the disks less, the database requires less of all of the resources, and more things are cached in memory. If we were capping out at 100,000 views a day (before people could notice some minor lag), we are guessing that the new system can handle a million or more.
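The CPU figures above compound by multiplication rather than addition. Here is a small Python sketch of that arithmetic, shown for two readings of "300%". The 345 figure matches three times the cores at 1.15x per core, which is 3.45x the old capacity, or a 245% increase over it. If the core count were instead four times the old one, the same math would give 4.6x. Both assume the workload scales linearly across cores, which is an idealization, and the function names are just illustrative.

```python
def combined_multiplier(*factors: float) -> float:
    """Multiply independent speedup factors (2.0 means twice as fast)."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total


def percent_increase(multiplier: float) -> float:
    """Convert a multiplier into a 'percent more than before' figure."""
    return (multiplier - 1.0) * 100.0


PER_CORE = 1.15  # 15% more speed per core, from the post

if __name__ == "__main__":
    # Reading 1: "300%" means three times the cores.
    three_x = combined_multiplier(3.0, PER_CORE)
    # Reading 2: "300% more" means four times the cores.
    four_x = combined_multiplier(4.0, PER_CORE)
    for label, m in (("3x cores", three_x), ("4x cores", four_x)):
        print(f"{label}: {m:.2f}x total, {percent_increase(m):.0f}% increase")
```

Either way the CPU side alone is several times the old capacity, before counting the database and disk IO gains.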

The amount of growth that we are prepared to handle with zero to trivial effort is pretty enormous. That we will need to consider anything else for a very long time is unlikely.

Chances are in two years we will want to revisit the architecture and, at that time, there is a very good chance that getting a 100% boost in memory size will be obvious and simple, that per core CPU speeds will have increased, etc. The platform naturally gets faster underneath us in many ways. So the need for moving to a completely different approach is much farther off than you would think. Because of the type of site that we are, the scale that can be handled from the current design is pretty extreme.

scottalanmiller

@Dashrender said:

Of course the problem with that is a single server, single point of failure.

With good database backups, we could mitigate that pretty easily. MongoDB does an export, compress, transport, decompress, restore with blinding speed. We just did it last night and it was insane how fast the entire site was able to be transported around.
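For anyone curious what an export, compress, transport, decompress, restore cycle can look like, here is a hedged Python sketch that only builds and prints the commands. The post does not say which tools were used. This assumes the stock `mongodump` and `mongorestore` utilities with their `--archive` and `--gzip` options (available in MongoDB 3.2 and later) plus `scp` for transport. The host names, database name and archive path are hypothetical.

```python
import shlex
from typing import List


def backup_commands(src_host: str, dst_host: str, db: str,
                    archive: str) -> List[List[str]]:
    """Return the three command lines for a dump, copy and restore cycle."""
    # Export and compress in one step into a single archive file.
    dump = ["mongodump", "--host", src_host, "--db", db,
            f"--archive={archive}", "--gzip"]
    # Transport the archive to the new machine.
    copy = ["scp", archive, f"{dst_host}:{archive}"]
    # Decompress and restore; this one is meant to run on the destination.
    restore = ["mongorestore", "--host", "localhost",
               f"--archive={archive}", "--gzip"]
    return [dump, copy, restore]


if __name__ == "__main__":
    # All four values below are placeholders, not the real deployment.
    for cmd in backup_commands("old-db.example", "new-db.example",
                               "nodebb", "/tmp/nodebb.archive.gz"):
        print(shlex.join(cmd))
```

The sketch prints rather than executes so it is safe to run anywhere; to actually run the steps you would pass each list to `subprocess.run(cmd, check=True)` on the appropriate machine.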

Dashrender

@scottalanmiller said:

Chances are in two years we will want to revisit the architecture and, at that time, there is a very good chance that getting a 100% boost in memory size will be obvious and simple, that per core CPU speeds will have increased, etc. The platform naturally gets faster underneath us in many ways. So the need for moving to a completely different approach is much farther off than you would think. Because of the type of site that we are, the scale that can be handled from the current design is pretty extreme.

Actually, this is exactly what I would expect. As the platform under the VMs gets better when old hardware is replaced, I would expect there to be less need to migrate to something new. Sure, you might need to assign more RAM. The underlying hardware might have more of it now because it was upgraded, but the VM won't pick that up on its own because, well, that's not how that works. But when the CPU and disk are improved, the VM just gets those gains because everything on the system gets those gains, not taking tiered storage into account.

scottalanmiller

@Dashrender said:

@scottalanmiller said:

Chances are in two years we will want to revisit the architecture and, at that time, there is a very good chance that getting a 100% boost in memory size will be obvious and simple, that per core CPU speeds will have increased, etc. The platform naturally gets faster underneath us in many ways. So the need for moving to a completely different approach is much farther off than you would think. Because of the type of site that we are, the scale that can be handled from the current design is pretty extreme.

Actually, this is exactly what I would expect. As the platform under the VMs gets better when old hardware is replaced, I would expect there to be less need to migrate to something new. Sure, you might need to assign more RAM. The underlying hardware might have more of it now because it was upgraded, but the VM won't pick that up on its own because, well, that's not how that works. But when the CPU and disk are improved, the VM just gets those gains because everything on the system gets those gains, not taking tiered storage into account.

Yup, and realistically we grow at a rather steady pace. It's not like we get a twentyfold increase in a month. So the underlying hardware tends to grow with us pretty steadily.

scottalanmiller

NodeBB 1.0.2 is out. Might as well upgrade. It has been a day of upheaval already.

scottalanmiller

That was ridiculously fast, lol.

scottalanmiller

We did a full update in under a minute!!

nadnerB

Huzzah! What did it fix/change?

scottalanmiller

@nadnerB said:

Huzzah! What did it fix/change?

No idea

dafyre

@scottalanmiller said:

@nadnerB said:

Huzzah! What did it fix/change?

No idea

There's this thing called Changelog... Generally one should read it before applying said updates...

scottalanmiller

A month in and Linode is still rocking it!