"Reading Back" Technology Purchases

scottalanmiller

It is pretty common for a consultant, and by extension someone in a technical community like this one as we are all volunteer consultants in a way, to be put into a position of having to "read back" decisions or objectives that someone had at the time that they purchased a system due to a lack of other information. It is extremely common to be brought into a situation such as all hardware having already been purchased but without clear guidelines as to the reasoning behind the purchase. In some cases we might know how the system was intended to be used so we can "read back" the business goals. In others we may not even know how the equipment was meant to be deployed and we might have to "read back" even more.

This is a forensic practice, of course, and not science. And it is difficult to read back a goal versus a compromise. To a large degree we are forced to guess, empathize, compare to other organizations, etc. But there is almost always a large amount to be learned from "reading back" purchases.

scottalanmiller

Example case:

Customer or poster has purchased two server nodes, two switches and a low end SAN along with a hypervisor license with extra money spent on the high availability licensing option.

Being brought in "after the fact" to determine what the purchaser was thinking is tough, but there are things that we can figure out. For example:

By adding up the cost of the gear and some assumptions about support agreements we can come to a rough expense for the system even in the absence of receipts. This gives us a rough figure that we can assume was deemed "acceptable" by budgetary standards. We don't know if more money could have been spent, we don't know why so much was spent, but we know that someone approved spending this much. This might seem obvious, but it is easily overlooked.
By looking at the storage we can determine the storage capacity that was deemed to be acceptable.
We can work out the IOPS that were considered to be enough for performance.
By looking at the architecture, the dependency chain, the failure domains and the resulting risk we can work out the reliability needs and risk aversion of the system.

There is much that can be learned from looking at what decisions have come before. This can be extremely handy. In our example case we might not know how much risk aversion there is, but we can be sure that if we propose an alternative that beats this one on risk, it is not just good enough, it is better. If we lower the cost, that isn't just cheap enough, it's better. So on and so forth.
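As a minimal sketch of the comparison above, the figures that were read back can be treated as a baseline that any proposal is checked against. Every name and number below is hypothetical and only for illustration; none of it comes from a real quote or inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """Figures for a design; for the existing one these are 'read back' from the purchase."""
    name: str
    cost: float          # total spend, any currency
    capacity_tb: float   # usable storage
    iops: float          # storage performance
    availability: float  # estimated fraction of time up, 0..1


def at_least_as_good(proposal: Design, baseline: Design) -> bool:
    """True if the proposal is no worse than the baseline on every axis."""
    return (
        proposal.cost <= baseline.cost
        and proposal.capacity_tb >= baseline.capacity_tb
        and proposal.iops >= baseline.iops
        and proposal.availability >= baseline.availability
    )


# Hypothetical numbers, for illustration only.
purchased = Design("2 servers + 2 switches + SAN", 60_000, 10, 5_000, 0.995)
proposed = Design("single server, local storage", 25_000, 10, 20_000, 0.999)

print(at_least_as_good(proposed, purchased))
```

If the purchased design was accepted, a proposal that passes this check is acceptable on the same terms. The check says nothing about whether either design is what the business actually needs.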

Very often when looking at solutions people forget to look at just how much can be determined from previous decisions and don't take advantage of the knowledge to which they have access.

scottalanmiller

The most common place that I see this is in risk. A system will be purchased that is at standard availability or lower, very often low availability. But the IT team or others will constantly feel like they need high availability solutions - yet their existing purchases or existing design decisions indicate that this has not been the case (this can, of course, change over time, but assuming recent decisions.)

One of the best places to see this is in the example above. In an architecture like that you can always improve the design by removing one server, both switches and the SAN, going from five components to just one. This saves money and improves reliability. Those are two pure wins. The single server approach always beats the design shown.

Yet many people will argue that a single server is not reliable enough... at the same time that they have demonstrated, seconds before, that they considered something less reliable to be more than adequate! It cannot be both ways. Either the original design was inadequate and unacceptable or it was at least "good enough." In both cases the single server approach is better. In the first case we don't have enough information to know if it is better enough to meet needs, but assuming the original design was at least adequate then we know for sure that the single server design is not just better than the alternative, but that it is also better than needed or better than good enough.
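A rough sketch of why the dependency chain matters, assuming independent failures, perfect failover between redundant parts, and made-up per-component availabilities. It also assumes the SAN is no more reliable than a server, which is what the comparison hinges on. None of these figures are measurements.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Every part must be up for the system to be up: availabilities multiply."""
    return prod(availabilities)


def redundant_pair(a: float) -> float:
    """Two interchangeable parts: down only if both are down (independent failures assumed)."""
    return 1 - (1 - a) ** 2


# Hypothetical per-component availabilities, for illustration only.
SERVER = 0.999
SWITCH = 0.999
SAN = 0.999  # assumption: no more reliable than a server

# Two servers and two switches are redundant, but everything hangs off one SAN.
pyramid = series(redundant_pair(SERVER), redundant_pair(SWITCH), SAN)
single_server = SERVER

print(f"inverted pyramid: {pyramid:.6f}")
print(f"single server:    {single_server:.6f}")
```

Under these assumptions the chain can never be more available than the single SAN at the bottom of it, so the redundancy above the SAN cannot lift the whole design past a lone server of equal reliability.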

scottalanmiller

This can be broken down into a simple logic example.

If solution A is good enough and solution B is better than solution A, then solution B is better than good enough.

scottalanmiller

Reading back is a very important tool to use, as very often it is all that we have to fill in critical knowledge about an infrastructure or system design. It is often very difficult to obtain the history of a design decision and so we need to look at the decision itself to determine what the design needs are.

dafyre

It is also highly likely that the people you are doing the read back on (for?) do not actually think the same as you do. For instance, if it were me, I would skip the SAN and build two servers with enough storage and RAM to handle everything and then replicate or use StarWind, and a good backup solution.

While somebody like @scottalanmiller would simply, most likely skip the StarWind solution as it adds extra complexity that may likely not be needed in a lot of situations.

scottalanmiller

@dafyre said:

It is also highly likely that the people you are doing the read back on (for?) do not actually think the same as you do. For instance, if it were me, I would skip the SAN and build two servers with enough storage and RAM to handle everything and then replicate or use StarWind, and a good backup solution.

While somebody like @scottalanmiller would simply, most likely skip the StarWind solution as it adds extra complexity that may likely not be needed in a lot of situations.

Well what I would say there is that when reading back, or ever doing a determination like this, you go to the minimum. There is nothing in their solution that suggests that anything more than a single server is warranted. Going to a dual server solution is not suggested by the previous design in any way. Going past what we have learned from "reading back" is dangerous because we move from interpreting their motives from their decisions or their tolerances from their decisions to pushing our own motives or needs onto them.

A two server solution with full fault tolerance is very often a great solution. But it doesn't apply in all cases and you should never use it, no matter what business you are, unless the cost of the fault tolerance is justified by the combination of risk and risk mitigation. It's not a solution that should ever be "jumped to", but it is certainly a commonly good choice.

In the example given, we know that they didn't see high availability as a requirement as they did not have it in their design (and were, in fact, below standard availability.) So all that we know for sure is that a single server design, delivering every feature that they had plus moving them up to standard availability while reducing cost is a guaranteed win relative to what they had.

We have no means to know if high availability or a second server would be financially warranted for them as we don't know their loss numbers. But we do know that they did not feel that they needed it before. So it isn't that we know that they shouldn't have your design, but we do know that it is not needed.
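The justification test described above (cost of fault tolerance versus the risk it mitigates) can be sketched like this. The loss-per-hour and availability figures are exactly the numbers we do not have when reading back, so everything below is hypothetical, and the function names are mine.

```python
HOURS_PER_YEAR = 24 * 365


def expected_annual_loss(availability: float, loss_per_hour: float) -> float:
    """Expected yearly cost of downtime at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR * loss_per_hour


def ha_is_justified(base_avail: float, ha_avail: float,
                    loss_per_hour: float, ha_annual_cost: float) -> bool:
    """HA pays for itself only if the loss it avoids exceeds what it costs."""
    avoided = (expected_annual_loss(base_avail, loss_per_hour)
               - expected_annual_loss(ha_avail, loss_per_hour))
    return avoided > ha_annual_cost


# Hypothetical figures: the same HA spend against two different loss rates.
print(ha_is_justified(0.999, 0.9999, loss_per_hour=500, ha_annual_cost=10_000))
print(ha_is_justified(0.999, 0.9999, loss_per_hour=5_000, ha_annual_cost=10_000))
```

The same second server can be a waste for one business and a bargain for another, which is why it cannot be read back from the hardware alone.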

Dashrender

I disagree that they might not have needed it before.
Why do I feel this way? Because just because a solution has been purchased/installed doesn't mean that a real IT person ever vetted the solution to ensure the desired effect.

Let's take a smallish company where the office admin doubles as IT support. This person has figured out that their downtime warrants spending the dollars to buy an inverted pyramid, but this person isn't really an IT person and BELIEVES they have designed a highly available solution, if for no other reason than they listened to a salesperson.

5 years go by, they have no major problems, but the system is getting long in the tooth so they decide it's time for an upgrade.

Now if Scott walked into this place he'd tell them they clearly didn't care about HA because of their previous decisions, when this was clearly not the case. Of course on this upgrade they find themselves lucky enough to have someone who is doing real research on a good replacement solution. Explaining why the customer really was more lucky than not with their old solution, and showing why the new, much less complicated design is better, will hopefully make this a better situation.

Dashrender

What I'm not really sure how to handle, at least well, is the problem presented in the thread this is based on. Assume it's true that the person setting it up was brought in after the purchases were made. What should he do? Should he walk away from the job, or tell the owners the full extent of the problems and offer the correct (best correct) option? If they approve, problem solved, though they could just as easily fire him or demand that he only use the parts they have.

Assuming he's a consultant in this case with his own reputation on the line, it's not really a clear choice. Considering what we already know, it's not outside the realm of possibility that if he quits the job, or something goes horribly wrong afterwards, the owners will trash him.

Sadly the owner's opinion will probably carry a lot of weight in his own circles, and only if the consultant already has a fairly solid rep will he weather that storm.

scottalanmiller

@Dashrender said:

I disagree that they might not have needed it before.
Why do I feel this way? Because just because a solution has been purchased/installed doesn't mean that a real IT person ever vetted the solution to ensure the desired effect.

One would assume that none did. Although it is important to note that IT knowledge or skill is not necessary to understand the issues with an inverted pyramid of doom, which is the example case. Basic business, logic or engineering knowledge would all make it very obvious that there is no redundancy, there is a single point of failure, that everything depends on the most fragile part, that there is a dependency chain, etc. If anything IT knowledge makes us realize that this design is so common that we often forget to stop and question it. It's the pressure of the IT industry to not expose our own past mistakes or the mistakes of others that does the most to make this acceptable socially within IT. In theory, non-IT people will not experience this effect.

scottalanmiller

@Dashrender said:

Let's take a smallish company where the office admin doubles as IT support. This person has figured out that their downtime warrants spending the dollars to buy an inverted pyramid, but this person isn't really an IT person and BELIEVES they have designed a highly available solution, if for no other reason than they listened to a salesperson.

Very possible. Although this tells us that the company did not feel that the system in question was important enough to get reviewed by someone who understands it. So that alone tells us much.

scottalanmiller

@Dashrender said:

What I'm not really sure how to handle, at least well, is the problem presented in the thread this is based on. Assume it's true that the person setting it up was brought in after the purchases were made. What should he do? Should he walk away from the job, or tell the owners the full extent of the problems and offer the correct (best correct) option? If they approve, problem solved, though they could just as easily fire him or demand that he only use the parts they have.

Assuming he's a consultant in this case with his own reputation on the line, it's not really a clear choice. Considering what we already know, it's not outside the realm of possibility that if he quits the job, or something goes horribly wrong afterwards, the owners will trash him.

Sadly the owner's opinion will probably carry a lot of weight in his own circles, and only if the consultant already has a fairly solid rep will he weather that storm.

This is a bigger problem in SMB IT - reputation is often based completely on the information of people who have no ability to judge your skill, integrity or the results of your actions. This is something that needs to be addressed in a completely different way and is very much outside of the scope of this thread but certainly worthy of its own. But in a system where we don't often work with our peers, how does reputation function?

Dashrender

@scottalanmiller said:

One would assume that none did. Although it is important to note that IT knowledge or skill is not necessary to understand the issues with an inverted pyramid of doom, which is the example case. Basic business, logic or engineering knowledge would all make it very obvious that there is no redundancy, there is a single point of failure, that everything depends on the most fragile part, that there is a dependency chain, etc. If anything IT knowledge makes us realize that this design is so common that we often forget to stop and question it. It's the pressure of the IT industry to not expose our own past mistakes or the mistakes of others that does the most to make this acceptable socially within IT. In theory, non-IT people will not experience this effect.

I'm not sure I agree with this either. You think that non-IT people are more willing to admit their mistakes than IT people?
There is a problem on both sides of the aisle. From what I've seen, SMB management hires IT professionals, most of whom probably aren't as thorough as you. I know that I'm about 5% closer to your level since I joined SW 6+ years ago. But most don't think in the terms you do. Is this a failing of IT? Nah, I think it's human nature or the human norm.

Dashrender

@scottalanmiller said:

@Dashrender said:

Let's take a smallish company where the office admin doubles as IT support. This person has figured out that their downtime warrants spending the dollars to buy an inverted pyramid, but this person isn't really an IT person and BELIEVES they have designed a highly available solution, if for no other reason than they listened to a salesperson.

Very possible. Although this tells us that the company did not feel that the system in question was important enough to get reviewed by someone who understands it. So that alone tells us much.

Sure, but this boils down to the 'You don't know what you don't know' situation.

Thinking critically isn't a normal thought process today. Those who do generally find themselves in very good positions in life.

scottalanmiller

@Dashrender said:

I'm not sure I agree with this either. You think that non-IT people are more willing to admit their mistakes than IT people?

No, there is no "admitting" in the given scenarios in this thread because this is about "reading back" as an alternative to having access to the source information.

What I was saying is that non-IT people can probably see the flaws in the inverted pyramid more easily than IT people can, because they don't have the constant barrage of people misusing redundancy as a proxy term for reliability to hide the flaws, or the industry pressure to socially accept this bad design, as pointing it out calls out easily more than half of all firms!

scottalanmiller

@Dashrender said:

Sure, but this boils down to the 'You don't know what you don't know' situation.

Maybe, but you do know if you are looking for advice and engineering insight or not. If not, you have to assume that the priority was not that high. If you want a house built and feel that the design is important, we all know that you hire a trained architect. If not and you just want to be cheap, you just buy a book of blueprints and hope for the best. But we know what we don't know.