Where the SAM-SD Concept Originated
-
The SAM-SD concept, the idea of enterprise-class open storage, really started for me on Wall Street around late 2006, when we were faced with the need to provide large scale, high performance network filesystem storage to a massive compute cluster and found that nothing on the market met our needs.
At the time, our decision support cluster was ten thousand nodes, each with either two or four processors. That means between twenty and forty thousand physical processors and even more cores: between forty thousand and one hundred and sixty thousand cores in a single unified cluster in 2006, the largest in the financial industry. Each node had only enough local storage for the local operating system and scratch space for calculations. Everything else was stored centrally, via NFS filers. During testing, before the cluster was in production, the existing filers were adequate, if not exactly performant. The test system used a series of NAS devices from the largest NAS vendor of the day, using NFS v3.
As we moved towards production it became more and more apparent that the traditional commercial NAS devices that we had were not up to the task. We brought in the NAS vendor and, not too surprisingly, their recommendation was to buy more NAS devices and split the load while moving to their largest appliance models, at a whopping half million dollars per NAS device. This seemed foolish: the cost was ridiculously high given how many would be needed, managing so many would be cumbersome, and we did not feel that their smaller quarter million dollar devices were performing adequately, so rewarding them with huge profits for performing badly did not make any sense.
The engineering team proposed that the throughput needed was easily obtainable using our traditional server hardware and local disks; we could not see why the NAS had been brought in at all (it predated the formation of the cluster engineering team and was likely a knee jerk reaction by the storage team, for whom filers were the stock answer to a generic file sharing question). We proposed to management that we could build our own storage and solve the problem for a fraction of the cost using common sense design. It was so obvious to us probably because so much of the team was so young; we simply didn’t have the momentum of the industry telling us that buying storage appliances was the only possible answer, so our initial reaction to needing storage, being a UNIX engineering team by background, was to build UNIX file sharing systems. We were not opposed to appliances, but we were confused as to why expensive appliances were being used that were not performing as well as we would expect normal servers to perform. Of course we had the benefit of being young engineers in the era when the Sun Thumper had only just been released; we were some of the very few who had actually touched them, and we were already experienced with ZFS at the time. To us, open storage was not just an obvious answer but also the hot topic of the day.
After some discussion, we were given the hardware to do a real test. This turned into one of the more important testing projects of my career. We were not just tasked with building a test system and seeing if it would work; the team had to produce full results and go head to head with the world’s largest file storage vendor, who brought in their top hardware and their own engineers to ensure that it operated as well as possible. We also had to produce prediction papers stating why we felt our system could compete and how it would do it. The desire to ensure that the right decisions were being made, and that everyone understood why we felt we could compete with the “best in the business”, even led to bringing in the Sun ZFS team themselves to consult on the project, even though no Sun hardware or software would be used (we had requested it but were denied). Sun was excited about the project and their engineers were very sure that our designs were significantly more powerful than anything the appliance vendor could muster.
One of the key factors in the tests that we would run was that the commercial appliances against which we were competing were not just failing to perform adequately but would actually crash and restart when reaching their upper limits. Through careful testing we were able to identify the kernel they were using from this failure characteristic and to hypothesize how CPU and RAM constraints, along with their choice of operating system underpinnings, were leading to the performance and crashing issues. We knew to take a different approach. The appliance vendor was limited to the appliance designs that they sold, while with open storage designs we were free to configure our servers as necessary to meet the needs of the environment. We had far more flexibility to tune because we were not limited by hardware or software, not even the operating system.
The investment bank provided us with a three thousand node testing harness for demonstrating the potential of the two storage devices. In the end, the vendor had a $500,000 NAS appliance and a team of dedicated NAS specialists to ensure it ran as optimally as possible. We had a team of three engineers and a $20,000 investment in our open storage solution.
Our solution was an HP Proliant DL585 G2 4U server with four dual core AMD Opteron processors, 32GB of RAM, loaded with eight 15K SAS drives in hardware RAID 10 on a SmartArray P400 hardware RAID controller with 256MB of battery backed cache. Our operating system of choice was Red Hat’s RHEL 4, which was cutting edge at the time. Today these specifications sound anæmic at best, but at the time this was a mammoth server with vastly more computational power and speed and far more memory than any storage appliance on the market, by no small degree. The 15K SAS drives were as fast as we could obtain, the RAID card was the biggest and fastest available, and the system RAM far exceeded anything expected to be used for storage.
In the end our predictions were correct: the extra disks and special software of the NAS appliance could not make up for its lack of CPU and RAM and its kernel behaviour under extreme load. Even the much larger appliance buckled under the testing harness, falling over completely and failing to complete the test. Our performance graphs from the real world test turned out exactly as predicted. Our own solution performed so well under the same testing harness that not only did it complete the entire test without failing, but the performance graphs produced from the test showed that the device could have scaled beyond the three thousand node cluster and was not yet beginning to show signs of load degradation. The test that we had run, while killing the appliance, was not large enough to gauge the upper limits of the open storage solution!
The engineering test showed us that the power and flexibility of the open storage solution, at one twenty-fifth of the cost of the appliance, were the factors that mattered. Our proposal lowered the cost of the full scale cluster by many millions of dollars while providing a solution that was far more flexible and performant and would allow the cluster to perform better. Our solution also had the benefit of matching much of the technology in the compute cluster, so unlike the appliance solution, which introduced additional skills and technology to the cluster, our open solution kept the technology uniform throughout, lowering support costs.
This experience was extremely influential in the creation of the SAM-SD concept and continues to serve as a great example that an expensive product does not imply added value in all cases. It was the open nature of our solution that allowed it to be so cost effective and flexible enough to meet the specific needs of the cluster.
Just six months later, HP would release this design as the HP Proliant DL585 G2 Storage Server, modified to use Windows Storage Server 2003 rather than Red Hat Enterprise Linux. We sometimes refer to this early design as the SAM-SD Zero.
Originally on the SAM-SD Blog.