Friday, May 6, 2016

Quality

On the outside is the irrational world of the users, a world that has evolved chaotically over generations. It is murky and grey, with a multitude of intertwining special cases. It is ambiguous. It is frequently changing, sometimes growing, other times just cycling round and round again. The users have turned to the computer to help them with this problem.

On the inside, deep down, is the formal logic of a discrete system. It is nearly black and white, pure, and it can be viewed through a rational, objective lens. There are plenty of theories, and although there are intrinsic trade-offs such as space vs. time, they can be quantified. If a system is built up well enough from this base, it is a solution waiting to be applied.

Software is a mapping between these two worlds. It is composed of two distinct pieces: the computer code and the equally important underlying structure of the data that is being collected and manipulated. The quality of a mapping depends on the combination of both code and structure. A good piece of software has found a mapping that is effective; useful. As time progresses that mapping should continue to improve.

Both sides of the mapping are tricky, although the outside also has that extra level of irrationality. This leaves considerable leeway for many different, but essentially equivalent, maps that would act as reasonable solutions. There is no perfect mapping, but there are certainly lots of ill-fitting ones.

To talk about quality in regard to software, we can assess this mapping. Mapping problems manifest as bugs, and they diminish the usability of the software and its collected data. A good mapping needs to be complete; it should have neither gaps nor overlaps. It doesn’t have to solve all of the problems outside, but where it attempts to tackle one it should do so reliably. This means that all of the greyness on the outside needs to be identified, categorized and consistently mapped down to the underlying formal mechanics. Solving only a part of a problem is just creating a new one, which essentially doubles the amount of work.
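
One way to picture completeness is to treat the outside special cases and the submaps that claim to handle them as sets, then look for gaps and overlaps. The following is only a toy sketch; the case names and submaps are hypothetical placeholders, not anything from a real system.

```python
# A toy illustration of gaps and overlaps in a mapping. The outside
# cases and the submaps that claim them are hypothetical placeholders.

outside_cases = {"new-order", "refund", "partial-refund", "back-order"}

submaps = {
    "billing": {"new-order", "refund"},
    "returns": {"refund", "partial-refund"},   # overlaps billing on "refund"
}

claimed = set().union(*submaps.values())
gaps = outside_cases - claimed                  # cases no submap handles
overlaps = {
    case for case in claimed
    if sum(case in cases for cases in submaps.values()) > 1
}

print(gaps)      # {'back-order'}  -- unhandled greyness
print(overlaps)  # {'refund'}      -- two pieces solving the same problem
```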

Because software development is so slow and time-intensive, we also need to consider how the mapping is growing. Having a good initial mapping is nearly useless if the pace of development rapidly degrades it. Mappings are easy to break, in that small imperfections tend to percolate throughout. The longer a problem has been around, the more work it requires to fix. For code, that work grows relative to adding more code, but for the structure it grows relative to adding more data, so imperfections keep getting worse even when there is no active development.

As a map, we can talk about the ‘fit’, and it is in this attribute that we can define quality. An ill-fitting map is qualitatively poor, but as noted there are many different maps that would fit with higher quality. If we decompose the map into submaps and assess their fit, we can get a linear metric for the overall system. The most obvious indicator of not fitting is a bug. That is, a system with 70% quality means that 30% of the code and structure has some form of noticeable deficiency in its usage. If you could decompose the code into 5 pieces and the structure into another 5 (ignoring weights between them), and 6 of these pieces are complete while 4 of them are really bad and need to be reworked, the quality is 60%. If, however, all ten pieces contained problems, then at least from the top-down perspective the overall quality of the system is 0%.
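
To make the arithmetic concrete, here is a minimal sketch in Python. The decomposition into ten equally weighted pieces, and the piece names, are just illustrative assumptions; real weights would come from the architecture.

```python
# A rough sketch of the decomposition metric, assuming the map has been
# split into equally weighted submaps (the piece names are hypothetical).

def quality(pieces):
    """pieces: list of (name, has_noticeable_deficiency) pairs."""
    if not pieces:
        return 0.0
    clean = sum(1 for _, broken in pieces if not broken)
    return clean / len(pieces)

# Five code submaps and five structure submaps, six of them complete.
system = [
    ("code-1", False), ("code-2", False), ("code-3", False),
    ("code-4", True),  ("code-5", True),
    ("structure-1", False), ("structure-2", False), ("structure-3", False),
    ("structure-4", True),  ("structure-5", True),
]
print(quality(system))                                  # 0.6, i.e. 60%

# If every piece contains a problem, the top-down quality is 0%.
print(quality([(name, True) for name, _ in system]))    # 0.0
```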

That may seem like an excessively brutal metric, but it is mitigated by the fact that we are only considering problems that directly bother the users, the future developers and the operations people. That is, these are bugs, not trade-offs. Thus if the system is slow because the code was badly written, that is a bug, but if it is slow because of a deliberate space trade-off made due to hardware limitations, then unless the hardware situation changes it is not really a bug.

Obviously this metric is similar to measuring a coastline, in that getting down to finer details will change it. If you decompose part of the system into a single module of 100 lines with a bug, then the quality for the module is 0%, but if you isolate only the one line that is bad, it seems to jump back up to 99%. Quite the difference. But that points to a significant fundamental property of the mapping: some parts of it are independent, while other parts are dependent. In the previous case of the 99 lines of code, if they are all dependent on the bad one then all 99 are tainted, thus the quality really is 0%. If only 49% are dependent, then the quality is 50%. A line of code is meaningless in isolation; it only has value when it is brought together with other lines within the system.
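
The dependency effect can be sketched the same way. The only rule assumed below is that a line which depends on a buggy line is itself tainted; the line numbers and dependency sets are hypothetical.

```python
# A rough sketch of how dependencies taint the metric, assuming that any
# line depending on a buggy line is itself tainted. The dependency sets
# here are hypothetical.

def quality(total_lines, buggy_lines, depends_on_buggy):
    """Fraction of lines that are neither buggy nor depend on a bug."""
    tainted = set(buggy_lines) | set(depends_on_buggy)
    return 1 - len(tainted) / total_lines

# A 100-line module where line 42 is bad.
others = [i for i in range(1, 101) if i != 42]

# If all 99 other lines depend on it, the quality really is 0%.
print(quality(100, [42], others))        # 0.0

# If only 49 of them depend on it, 50 of 100 lines are tainted: 50%.
print(quality(100, [42], others[:49]))   # 0.5
```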

What we have to assume with code, though, is that unless proven otherwise, there is a dependency. That again could be problematic. A programmer could see a huge code base with only 1 bug and assume that the quality is 0%. We do this quite naturally. But that illustrates the need for an architecture in a large code base that clearly delineates these dependencies. If the code can’t be easily decomposed, then the mapping is unlikely to be changeable, thus the quality really is lower, although probably not 0%.

Now that was for code, but the situation is even more dire for the underlying structure of the data. If you build a large system on a bad denormalized database, then those underlying bugs taint all of the code above it. Code written to collect bad data is inherently bad code. This is a reasonable proposition, since the code is 100% dependent on that broken structure. If the database is screwed, the value of the code is too.

Reuse and redundancy have an effect as well. If you have a tight engine for the main functionality of the system with 100K lines of code, and an administrative part, badly written, with 500K lines of spaghetti code, you might assess its quality incorrectly. Let’s use the Pareto rule and say that 80% of the usage is that core engine. If there are lots of bugs in the administrative stuff, then from an LOC perspective it might appear as if the overall quality was less than 20%. However, if the engine is great, then from a usage standpoint it is clearly at least 80%. This flip-flop shows that the value of any line of code is weighted by its functionality, and also by its reuse or redundancy. A line of code in the engine is worth 20x more than one in the administration. The extremeness of that weighting is not surprising; it should be captured by the architecture at a higher level, and driven by usage or business needs. That difference shows why good architectures are necessary to assess priorities properly: they help pinpoint both dependencies and weights.
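
That flip-flop is easy to see with a few lines of arithmetic. The line counts, usage split and per-piece quality below are the hypothetical numbers from the example above, with the engine assumed clean and the administrative code assumed fully broken.

```python
# A sketch of the LOC-weighted versus usage-weighted views, using the
# hypothetical 100K-line engine and 500K-line administrative code above.

engine = {"loc": 100_000, "usage": 0.80, "quality": 1.00}  # tight, clean
admin  = {"loc": 500_000, "usage": 0.20, "quality": 0.00}  # buggy spaghetti

# Per-line weight is the usage share spread over the lines of code.
per_line_engine = engine["usage"] / engine["loc"]
per_line_admin = admin["usage"] / admin["loc"]
print(per_line_engine / per_line_admin)  # 20.0 -- an engine line carries 20x the weight

# Weighted by raw LOC, the system looks worse than 20%...
total_loc = engine["loc"] + admin["loc"]
by_loc = (engine["quality"] * engine["loc"] + admin["quality"] * admin["loc"]) / total_loc
print(round(by_loc, 2))  # 0.17

# ...but weighted by usage it is at least 80%.
by_usage = engine["quality"] * engine["usage"] + admin["quality"] * admin["usage"]
print(by_usage)  # 0.8
```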

Another significant issue is trying to apply a static mapping to an inherently dynamic problem. If the essence of the problem keeps changing, then all of the collected static data is constantly out-of-date by definition. The mapping needs to keep pace with these changes, which means either a rapidly increasing amount of static coding work or embedding the dynamic elements in the map itself. The former approach might appear quicker initially, but will not scale appropriately. The latter is more difficult, but the scaling problems are mitigated. This comes across in the quality metric because the completeness attribute is always declining, making the structure out-of-date and then the associated code. Note that this happens whether or not the code is in active development. Dynamic problems gradually degrade the quality in operations, as do changes to any external dependency. They are both forms of rust.

Now, given this metric and its inherent weaknesses, it can be applied to any existing development project, with the obvious caveat that an outsider is mostly going to undervalue quality. That is, where someone doesn’t know the weights and doesn’t know the dependencies, the results will skew the effect of bugs, which every system has. That’s not as bad as it sounds, because what makes a project sustainable is the knowledge of this mapping. If that knowledge is lost, or blurry, or incorrect, then from a larger context the quality just isn’t there. If all of the original developers left and the new crew is just putting a detached bag on the side of the system instead of really enhancing it, then although this new work is mostly independent, it is also redundant and full of gaps or overlaps with the existing pieces, which reduces the quality. This gets caught quite nicely by the metric. If the knowledge is readily available, as it should be if the system is well-designed, then that feeds into both raising the quality and making the existing deficiencies less expensive to fix. In that sense a ball of mud really does have a quality of 0, even if it has a few well-written sections of code and a couple of decent structures. Good work on a small part of a bad system won’t change the fact that it is a bad system.

This metric is fairly easy to calculate, in that to get a reasonable estimate all you need to know is the number of lines of code, the architecture, the denormalization of the database, the current bug list and the current usage. Often, you can even get close by just playing around with the main features of the application and guessing at the underlying code organization from the user interface. That is, if the screens are a mess, there are plenty of places where the functionality should have been reused but wasn’t, and the structure of the data being captured is erratic, then you know that the overall quality is fairly low even if you’ve never seen the code or the database. Bad mappings propagate all of the way out to the interface. They can’t be hidden by a nice graphic design.

As Software Developers, we want to build great systems; it’s why we were drawn to computers. We can’t know if the work we are contributing is any good unless we have some means of assessing the quality. We can’t know if some work is really an improvement if we have no way of understanding how it affects the overall system. Because we care about doing a good job and we want to know that our efforts have made things better, we need a means of quantifying quality that is oriented to what’s important for the usage of the software. Having a notion such as technical debt is fine, but because it is so vague it is easily misused or ignored. Knowing that there are rapidly growing problems in the mapping, however, allows us to visualize the consequences of our shortcuts, disorganization and design decisions. Realizing that haphazard work or muddled thinking really does cause a significant drop in the quality gives us a much better chance of raising the bar. It also gives us a means of determining if a project is improving over time, or if it is just getting worse. What the world needs now is not more code or more poorly structured data, but reliable mappings that really solve people’s problems. Once we have those, programming is just the act of making them accessible.