Good Code, Bad Code.

This summer I'm interning at a local software company, working on Technical Debt. I think one of the hardest things I've come across while working there is not the work itself, but explaining what I do to my friends and family. I mean, at the outset, when I explain that I'm looking at older code, they can think of 100 better ways of spending their summer.

(I'll have you know I'm living out a nerdy fantasy doing this; it's one step down from being let loose in the 30TB of data that another company was offering for its summer internship.)

However, over the past couple of months I've managed to whittle it down to a brief set of loose terms. Here are a couple:

  1. In Software, where Bugfixing is fixing something that doesn't work, or doesn't work as expected, working on Technical Debt is fixing code that works as expected, but not in the best of ways.
  2. Think of the software people write as a purchase they make. If a company decides to get a piece of software working so that it can be shipped to consumers as fast as possible, it will accrue debt in the parts of the codebase that aren't finished, be it entirely, or to a standard deemed proper.

This isn't saying that some companies are bad at their code; I'll explain later that unavoidable factors can contribute to the accrual of debt. As Martin Fowler describes, Technical Debt is a metaphor that explains why certain decisions are sensible. Just as a business incurs some debt to take advantage of a market opportunity, developers may incur technical debt to hit an important deadline. It's when these developers let this debt get out of hand that it's a problem.

As you can possibly see, I've managed to go wildly off course, and I've probably gone and confused a bunch of you. But when I describe what I do, I'm generally met with the question "How is it possible to have bad code?" and that's what I hope to address today in this post.

I'm just going to warn you now, there is some code below, and I will briefly explain what happens in it, but overall, please don't be scared :)

Counting to 10.

Say I want to get the number ten in my code. 99.999% of programmers will use the following line to set myNumber to 10.
int myNumber = 10;

And, yeah, that seems reasonable. Now, how about I show you the multitude of different ways to do the EXACT same thing.
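
As a rough sketch (the class and method names here are made up purely for illustration), a couple of those alternatives might look something like this:

```java
// Hypothetical examples: two roundabout ways of ending up with 10.
class SillyTens {
    // Make the computer add two numbers together to get there.
    static int byAddition() {
        int five = 5;
        int myNumber = five + five;
        return myNumber;
    }

    // Make the computer divide one number by another to get there.
    static int byDivision() {
        int twenty = 20;
        int two = 2;
        int myNumber = twenty / two;
        return myNumber;
    }

    public static void main(String[] args) {
        System.out.println(byAddition()); // prints 10
        System.out.println(byDivision()); // prints 10
    }
}
```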

I mean, it's pretty obvious that these are just silly. Setting the computer to calculate these numbers isn't a particularly heavy task, but the idea is there. Now, how about I show you this next one. I'm going to use a method, think of it as a box where something goes in, stuff happens to it, then I return the value after the voodoo stuff has been done.
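
Something along these lines, as a minimal sketch (PassThrough and getNumber are names I've invented for illustration):

```java
// Hypothetical example: a method that hands back exactly what it was given.
class PassThrough {
    // The "box": a number goes in, nothing happens to it, and it comes back out.
    static int getNumber(int number) {
        return number;
    }

    public static void main(String[] args) {
        int myNumber = getNumber(10);
        System.out.println(myNumber); // prints 10
    }
}
```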
Hang on a minute. I did nothing to the number I passed in, so surely it's the same as when I just had int myNumber = 10, right? Not exactly. In a programming language like Java, every time you 'call' a method like I just did above, Java adds a new entry for that call on top of a pile known as the stack, holding what the method needs to do its job, and takes it off again once the method has finished. Once again, as this is a simple instruction, the extra time taken is practically impossible to notice, but you can see that the idea is there.
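
For the next one, here's a sketch of a version that uses a loop to get to ten (again, LoopToTen and countToTen are hypothetical names):

```java
// Hypothetical example: reaching 10 by counting up one step at a time.
class LoopToTen {
    static int countToTen() {
        int i = 0;
        // Keep going round for as long as i is less than 10.
        while (i < 10) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        int myNumber = countToTen();
        System.out.println(myNumber); // prints 10
    }
}
```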

This one's a bit more complicated. You see the while there? That's a while loop, meaning that it will keep looping around for as long as the condition holds, in our case i is less than 10, then finish. Once again, we've had to add another instruction onto the stack, AND then we add complexity by forcing the computer to loop around 10 times.

I mean, there are five ways there to show how easy it is to do one thing in a number of ways. Any sane person would do it the first way, but if we extrapolated this problem to a larger system, you can see why some decisions are made, and why some are pretty bad. Loops simplify your life, but when you have loops within loops, you can see how complexity quickly mounts up.
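
To put a rough number on that, here's a small sketch (NestedLoops and countSteps are made-up names) that simply counts how many times the innermost line runs when one loop sits inside another:

```java
// Hypothetical example: counting the work done by a loop within a loop.
class NestedLoops {
    static int countSteps(int n) {
        int steps = 0;
        for (int outer = 0; outer < n; outer++) {
            for (int inner = 0; inner < n; inner++) {
                steps++; // runs n times for every single pass of the outer loop
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(countSteps(10));  // prints 100
        System.out.println(countSteps(100)); // prints 10000
    }
}
```

Making the input ten times bigger makes the work a hundred times bigger, which is the sort of thing that goes unnoticed in a small test and hurts in a large system.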

Bad Developer, no bone for you!

A big part of programming is knowing what to name things. Naming a method int banana(double cardNumber) doesn't tell you anything about what that method does to a cardNumber. What if it is going to deduct money from your customer's card, or process a return? "Just look at the code" you might say, but the developer has conducted some blood magic down there, and along with their stupid naming has performed an ancient rite of baking a cake - well, that's what you can tell from their code. I mean, what if it isn't a cardNumber, and instead that number passed in is actually the phone number, or even the number of cards in the database?

Well, there goes 10 hours trying to dissect this lone method in this large system, and it's only your second day at work.
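
To make the contrast concrete, here's a sketch with made-up names (CardAccounts, refundToCard and the in-memory balance map are all hypothetical, not from any real system). Both methods do exactly the same thing, but only one of them tells you so:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: the same behaviour under a bad name and a good name.
class CardAccounts {
    // Balance held against each card number, in pence.
    private final Map<String, Long> balanceInPence = new HashMap<>();

    // Badly named: what is a banana, and what are x and y?
    long banana(String x, long y) {
        return balanceInPence.merge(x, y, Long::sum);
    }

    // Well named: adds a refund to the card's balance and returns the new balance.
    long refundToCard(String cardNumber, long amountInPence) {
        return balanceInPence.merge(cardNumber, amountInPence, Long::sum);
    }
}
```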

I think the one lesson that has haunted me whenever I name things is what my mentor told me on one of my first days at work:

"You have enough memory, name it bloody properly"

Which is true. We don't live in 1986 any more; our computers come with more than enough memory. A variable for a transmission doesn't have to be named Tx anymore, you can call it transmission, so other developers can know instantly what you have done.

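(A possible exception to this is Microcontroller programming in an interpreted language, where the source itself has to fit in a few kilobytes of flash memory, so you might skimp a little there. In compiled code, names generally don't make it into the finished program at all.)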

Another thing Developers can do to help one another is to document what their code actually does. Most, if not all, programming languages come with a marker that allows for the commenting of code. In some languages this is // or #, but either way the functionality is practically the same.

What these comment markers do is tell the computer to disregard these lines altogether, allowing for nice people to come along and tell other people what their code does.

It's all well and good me telling you what comments are meant to do, so how about an example?

It's your first day at a company, and you've been tasked with editing the server code.
They give you this.
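
As a stand-in, here's a hypothetical sketch of the kind of method I mean (GameServer, handleLogin and everything inside them are invented for illustration, not real company code):

```java
import java.util.HashMap;
import java.util.Map;

class GameServer {
    private final Map<String, Integer> playerScores = new HashMap<>();
    private int connectedPlayers = 0;

    String handleLogin(String playerName) {
        if (playerName == null || playerName.trim().isEmpty()) {
            return "REJECTED";
        }

        String message;
        if (!playerScores.containsKey(playerName)) {
            playerScores.put(playerName, 0);
            message = "WELCOME " + playerName;
        } else {
            message = "WELCOME BACK " + playerName;
        }

        connectedPlayers++;
        return message;
    }

    int getConnectedPlayers() {
        return connectedPlayers;
    }
}
```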

At first glance, what do you think it does? Luckily the developer who wrote it was nice enough to name all the objects well enough that they make sense, but figuring out where to put something is going to take you time as you run through the flow of the method.

Oh, but this is only a small method, it shouldn't take you long, right? What if the company gives you another 10 of these, because the previous developer has a bug somewhere in his code?

Now, here is the same method, now commented, albeit probably a tad too much, but infinitely more helpful all the same:
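
Again as a hypothetical sketch (GameServer and handleLogin are invented names, and the comments describe only this made-up example):

```java
import java.util.HashMap;
import java.util.Map;

class GameServer {
    // Every player we have ever seen, with their current score.
    private final Map<String, Integer> playerScores = new HashMap<>();
    // How many logins have been accepted so far.
    private int connectedPlayers = 0;

    // Handles a login request and returns the message to send back to the player.
    String handleLogin(String playerName) {
        // Step 1: validation. Refuse missing or blank names straight away.
        if (playerName == null || playerName.trim().isEmpty()) {
            return "REJECTED";
        }

        String message;
        if (!playerScores.containsKey(playerName)) {
            // Step 2a: player initial creation. A brand new player starts on zero.
            playerScores.put(playerName, 0);
            message = "WELCOME " + playerName;
        } else {
            // Step 2b: returning player. Their existing score is left alone.
            message = "WELCOME BACK " + playerName;
        }

        // Step 3: bookkeeping. Count the accepted login and reply.
        connectedPlayers++;
        return message;
    }

    // Reports how many logins have been accepted.
    int getConnectedPlayers() {
        return connectedPlayers;
    }
}
```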

Instantly, if your employer asks you to add to the player initial creation step, you know exactly where in this method it should go. In fact, if you have never coded before, these comments could easily show you how the code works.

Just as an aside, look at the whitespace in my code (the tabs, the spaces, the enters). It's an area of heated debate in software development (I could argue for hours on the subject of good whitespace. I even edited 500 lines of code to get the whitespace to work on my friend's code.), but it's a big part of readability to indent your code as you 'nest', or branch off your code into, say, an if statement.

This is the other end of the "Good Code, Bad Code" idea: half is the processes that the developers have created through their code, and the other half is how their code is laid out visually. Believe it or not, if code is laid out horribly, it can actually be a cause of software errors in your system, because you haven't read the code properly.

Spaghetti Code and The Big Ball of Mud

Both really technical terms, I know.

Before, I mentioned that bad code is almost unavoidable as your company becomes bigger. That's not only down to Technical Debt in terms of a faster route to market, but also something called developer turnover, and no, it is not a pastry. It's when one developer leaves, and a new one arrives with different ways of doing things. Over time, you can understand why a codebase may get this way.

A Big Ball of Mud is a "haphazardly structured, sprawling, sloppy, duct-tape-and-baling-wire spaghetti code jungle. These systems show unmistakable signs of unregulated growth and expedient repair." Basically, a piece of code that just looks like an absolute mess. And although undesirable, it's a common occurrence in a number of larger companies that have been around for a while, as developers and beliefs change, along with the sacrifice of form for functionality.

Basically, a large heap of crap. Left unattended, this Big Ball of Mud is just added to via Piecemeal Code, code that has been badly clumped on to the bad system, compounding the problem further. An even worse situation to be in goes by the also overly technical term "Sweeping it under the rug", and it's exactly that. The code is too hard, too spaghetti-like, for new developers to comprehend, so it's just left, or is added to until it reaches the point of no return, resulting in Reconstruction. I have a line in my notes from my lectures that just defines this as "It's so bad, you may as well set fire to it, and start again".

I've thrown the term Spaghetti Code around a couple of times in there. To explain what it is, unfortunately there are a couple of terms I need to go over - Cohesion and Coupling.

Cohesion is how strongly related the pieces of functionality within a single module are.

Coupling is how interdependent two modules of software are.

At a glance, these two sound almost exactly the same, however, they're not. In object-oriented programming, or the idea that everything you write is basically a box of functionality (The biggest brush-over in the history of blogging. You probably should read this if you're interested further), a system is built from a number of modules. Each of these modules should be Highly Cohesive, and have Low Coupling, meaning that the functionality of the software within each module is grouped together strongly (high cohesion), yet their reliance on other modules is kept to a minimum (low coupling). See the subtle difference? This basically means if a developer wants to remove an area of the system, or there is an error in a module, the system avoids something similar to a rolling blackout, known as a cascading error.

Phew, that's a quick overview of Software Evolution.
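
Here's a minimal sketch of what that can look like in Java. All the names (Notifier, InMemoryNotifier, OrderService) are hypothetical, and this is just one way of keeping coupling low, not the only one:

```java
import java.util.ArrayList;
import java.util.List;

// The only thing other modules are allowed to know about notifications.
interface Notifier {
    void send(String message);
}

// Highly cohesive: everything in here is about recording notifications, nothing else.
class InMemoryNotifier implements Notifier {
    final List<String> sent = new ArrayList<>();

    @Override
    public void send(String message) {
        sent.add(message);
    }
}

// Loosely coupled: OrderService relies on the Notifier interface alone, so the
// notifier behind it can be swapped out without touching this class.
class OrderService {
    private final Notifier notifier;

    OrderService(Notifier notifier) {
        this.notifier = notifier;
    }

    void placeOrder(String item) {
        notifier.send("Order placed: " + item);
    }
}
```

Because OrderService never mentions InMemoryNotifier by name, replacing it with a different Notifier only changes the line where the two are wired together.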

Conclusion

I'm not going to lie, this is perhaps the most difficult thing to understand. I came in here, thinking that showing people that code can be good, and code can be bad would be easy, however, here we are 2000 words later, and I haven't even broken the surface on this behemoth.

I guess it's a testament to how young Software Engineering as a discipline is. What you learnt last year has a high chance of being outdated, or superseded by a far superior practice. Add in some highly passionate Software Architects, like the ones I sit behind, and a large industry full of people who have their own idea on what is good and what is bad, and it's a daunting prospect to bring this under control.

Not to mention the rate of demand for software functionality is far outstripping our software abilities, and the development of bad code is unavoidable.

I think what I'm getting at is All Code has the ability to be Bad Code, and All Code has the ability to be Good Code, which is basically going against everything that I just blathered on about.

But think about it.

Software development today is completely different to what it was a decade ago, roughly when I started tinkering about with code - what used to be good is now hideously bad. A prime example of this (without going into too much detail) is the hash algorithm primarily used to hide and store passwords for much of the 1990s and early 2000s, known as MD5. MD5 was the standard, until a series of flaws found in 2005, 2006 and 2007 led a number of organisations to deem MD5 cryptographically broken and unsuitable for future use by 2008. Think about it - this was the time that the internet was taking off exponentially, and a large segment of it became insecure within a decade.

This post was never meant to get this long, and the sad thing is I could talk for thousands more words about what is Good Code and what is Bad Code. I have left a lot of loose ends here; hopefully I'll be able to revisit them in the future. As always, hit me up on Twitter, @rack_jobinson, and let me know if I've completely befuddled you.

In the coming weeks, I'm going to get to talk about a little project I've been working on, and show you more algorithms. Compression is coming too, but next week, I hope to give a go at reviewing some TV show I picked up while I was meant to be studying.

Happy New Year!