A long time ago I once worked with someone who said something along the lines of "If you are deleting code you are doing something right". Of course one might wonder why you should write something that needs to be deleted... but if you are not doing that, you are probably doing something wrong.
As you go through your code you should always keep an eye out for similarities - and once you find them get rid of them. The way I look at it is if your code isn't all the same you are doing something wrong, and once you notice it is the same if you don't make it all different you are doing something wrong (what?!).
Take the example that I just did in our code:
- Connect to server X and request a JSON result
- Parse the JSON result int a class (we use the Gson library from Google for that)
- Store the information for further processing
We also, for now, save some statistics such as the size and amount of time it took to get the result from the server so we can see if there are some patterns in what makes certain requests slower than others (once you have a lot of data you would be surprised what you can find out).
Each connection to a server is handled pretty much the same way: make a URL, connect to it, read the resulting stream. Each time we parse the JSON stream it is the same (except that the class we store into changes). The way the data is stored is the same. The way information is stored in the database is the same.
Take any request cycle above for any website (we have about 5 requests that we make on 3 different sites) and the code should be pretty much the same. If you were to blur the code (we will call this the "Blurry Code Stage") you would say that the code is all the same - the only difference would be some string constants, a class name, and, perhaps, some variable names.
It makes perfect sense that code that has to do something similar to something else looks pretty much the same. If it doesn't look the same then you have a big problem. Code that does the same sort of thing in a different way is harder to understand (now you need to understand 2 or more ways instead of just one) and you will probably have way more bugs in the code.
Once you are at the "Blurry Code Stage" you should realize that you have done the right thing: you have made the code the same.
Once you are at the "Blurry Code Stage" you should realize that you have done the wrong thing: you have made the code the same.
(WHAT?!????!!!!one!!!!!111?????)
The "Blurry Code Stage" is a turning point; you have done the right thing by making a lot of code follow the same pattern, but at the same time the pattern repeats time and time again which means you can consolidate the code. Consolidating means deleting code, and "If you are deleting code you are doing something right". You should strive to get your code to the "Blurry Code Stage" and then strive to remove as much code as possible.
The above example in our code probably winds up being about 100 lines in total (no, I am not going to go check the code and count the lines!). We have 5 sites. That means there are about 500 lines when there only needs to be about 100 (that will actually go up a bit because you probably need to introduce some variables and probably some additional logic, so lets' say 125 lines instead of 500).
If you are lucky you will notice after the 3rd one that the code is similar and you should refactor it. For example, I caught it after the 2nd one and managed to move all of the database code into one place. That meant that the other three requests automagically had the database code work with no further effort on my part. I noticed that the website request was the same after the 4th one and took about 120 lines of code.
Last night I noticed that I really have two ways of parsing the JSON data. I also noticed that I didn't write the code the same way, so I really have three different ways of writing it (and in reality there are four ways I have written that code). Grrrrrrr! So how to fix it? First up - parse the JSON data the same way. That will get me down to three ways. Next is to get three ways down to two... which will involve adding some new methods, some of which will do nothing (if they do nothing why add them?!, keep reading, I'll get to it).
Doing all of this in a Object Oriented language (I am using Java) is pretty easy. For example:
public class Add
{
public int add(final int a, final int b)
{
if(a < 0)
{
throw new IllegalArgumentException("a must be >= 0, was: " + a);
}
if(b < 0)
{
throw new IllegalArgumentException("b must be >= 0, was: " + b);
}
return (a + b);
}
}
public class Subtract
{
public int subtract(final int a, final int b)
{
if(a < 0)
{
throw new IllegalArgumentException("a must be >= 0, was: " + a);
}
if(b < 0)
{
throw new IllegalArgumentException("b must be >= 0, was: " + b);
}
return (a - b);
}
}
Now blur your eyes:
public class Add/Subtract
{
public int add/subtract(final int a, final int b)
{
if(a < 0)
{
throw new IllegalArgumentException("a must be >= 0, was: " + a);
}
if(b < 0)
{
throw new IllegalArgumentException("b must be >= 0, was: " + b);
}
return (a +/- b);
}
}
You see it is only Add/Subtract, add/subtract, and +/- that are different. We can easily fix that like this:
public abstract class Operation
{
public final int perform(final int a, final int b)
{
final int result;
if(a < 0)
{
throw new IllegalArgumentException("a must be >= 0, was: " + a);
}
if(b < 0)
{
throw new IllegalArgumentException("b must be >= 0, was: " + b);
}
result = doPerform(a, b);
return (result);
}
protected int doPerform(int a, int b);
}
public class Add
extends Operation
{
protected int doPeform(final int a, final int b)
{
return (a + b);
}
}
public class Subtract
extends Operation
{
protected int doPeform(final int a, final int b)
{
return (a - b);
}
}
Now if you want to add a new operation you just have to extend the Operation class and then override the doPerform method to do the right thing.
In the case of divide we might want to do something special when dividing by zero...
public class Divide
extends Operation
{
protected int doPeform(final int a, final int b)
{
// do not want a divide by zero error, but also don't necessarily want to handle the special case here...
return (a / b);
}
protected void checkNumbers(final int dividend, final int divisor)
{
if(divisor == 0)
{
throw new IllegalArgumentException("divisor cannot be 0");
}
}
}
now we need some way to call it:
public abstract class Operation
{
public final int perform(final int a, final int b)
{
final int result;
if(a < 0)
{
throw new IllegalArgumentException("a must be >= 0, was: " + a);
}
if(b < 0)
{
throw new IllegalArgumentException("b must be >= 0, was: " + b);
}
checkNumbers(a, b);
result = doPerform(a, b);
return (result);
}
protected int doPerform(int a, int b);
protected void checkNumbers(final int a, final int b)
{
// add and subtract don't do anything, they just inherit this empty method
}
}
Great... but now we don't pass the "Blurry Code Test" since the a < 0 and b < 0 are pretty close to the divisor == 0... so I'd do:
public abstract class Operation
{
public final int perform(final int a, final int b)
{
final int result;
checkNumbers(a, b);
result = doPerform(a, b);
return (result);
}
protected int doPerform(int a, int b);
protected void checkNumbers(final int a, final int b)
{
if(a < 0)
{
throw new IllegalArgumentException("a must be >= 0, was: " + a);
}
if(b < 0)
{
throw new IllegalArgumentException("b must be >= 0, was: " + b);
}
}
}
public class Divide
extends Operation
{
protected int doPeform(final int a, final int b)
{
// do not want a divide by zero error, but also don't necessarily want to handle the special case here...
return (a / b);
}
protected void checkNumbers(final int dividend, final int divisor)
{
// could do it this way, or use a doCheckNumbers like the other way
super.checkNumbers(dividend, divisor);
if(divisor == 0)
{
throw new IllegalArgumentException("divisor cannot be 0");
}
}
}
Add and Subtract remain unchanged.
Some newer programmers (and probably some older programmers) don't feel comfortable with this because now you have to jump around the code to follow it. Short answer: get over it. Your methods should be small, and pretty easy to understand.
The benefit is that you can add new operations (multiply for example) without having to do a lot of work. If that work consists of copy/paste then you are at the "Blurry Code Stage" and you really should stop and fix the code instead of continuing on.
In my case I want to add a sixth call to a website and I'd rather do that in about 10 lines rather than 100. Is it slower to change the code then just add the new one? In the short term yes it is slower.. but in the long term you get a huge pay off in terms of time when coding something new or when fixing a bug (you can fix the bug in one place not many).
While I swapped between this blog and the code I wound up doing the following:
1) getting it down to one single way of parsing instead of 4
2) normalizing the JSON I am parsing (I used to jump through some hoops to only parse certain bits, now all of the returned JSON is parsed in all cases)
3) fixed a semi-related bit of ugliness in the code - some times I used the raw JSON results, other times I used extra classes that augmented the results. Now I always use augmented results.
The last two are important because they make the code consistent (consistent code is probably the most important thing) and may lead me to more blurry code later.