Tuesday 15 May 2012

Hazards and pitfalls of equals() on class hierarchies

The problem

I've recently got bitten by some very nasty equals() hazards when extending classes. Here is a cautionary tale, that taught me the following: if you extend a class which implements its own equals() method, always carefully check the equals() method of that base class, and make sure to create a new equals() override in the subclass if that is required - and it will be, if you add new structural data to the subclass which affects the logical equality of instances of that class.
But even if your subclass doesn't change anything that affects the notion of logical equality, still remember to check the equals() method of the base class, to avoid any pitfalls. Merely extending a class may break the equals() contract! Recently I fell into one such big pitfall whilst working with Java on the DDT project. Let us look at the issue in detail. I had this code for the superclass:

class ScriptElementImageDescriptor
...
    public boolean equals(Object object) {       
        if (object == null || !ScriptElementImageDescriptor.class.equals(object.getClass())) {           
            return false;
        }
        ScriptElementImageDescriptor_Fix other= (ScriptElementImageDescriptor_Fix)object;
       
        return (fFlags == other.fFlags)
            && (fSize.equals(other.fSize)) && (fBaseImage.equals(other.fBaseImage));

     }


I created a somewhat trivial subclass of ScriptElementImageDescriptor, and the problem now was that instances of that subclass were no longer equal to any other instance whatsoever. This is due to the way that ScriptElementImageDescriptor.equals() does the check for the instance's runtime type:
    || !ScriptElementImageDescriptor.class.equals(object.getClass())
This check is not the same as the more common instanceof check. Rather, this check will fail unless the runtime type of object is exactly that of ScriptElementImageDescriptor. So subclasses will fail the check, just the same as unrelated classes, and that's what caused the bug.

And this bug was particularly dangerous because that descriptor class was not an actual domain model class for the application - it wasn't used in any direct user/application functionality, but rather it was related to performance internal functionality: it was mainly used as a descriptor for caches of image resource objects. Therefore, what this bug caused was a silent resource leak, one which was quite harder to trace than a direct application functionality bug (like a UI bug or crash). Only by sheer luck did I write some other unrelated code (code that was disposing and recreating a certain UI component 100 times more often than normal, to test flickering issues), which caused this bug to manifest clearly and early, very soon after the original change that introduced it. If this bug had shipped, it would have been very, very hard to replicate (it might take hours of normal application usage to hit the leak limits) and to diagnose (as even once the leak is hit, it wouldn't be clear at all what caused it!).


So the horridness of this bug ingrained in my mind that I should always check and remember how a superclass does equals() (and hashcode()). But my considerations didn't end with the ScriptElementImageDescriptor case - I started to think about this issue in a more general way, considering all sorts of cases. What should I do with my code (assuming I can also modify the superclass) when I create a subclass, that may or may not add structural data affecting its notion of equality (sometimes called an aspect) ?

The solutions

My first idea to address this issue was to change the superclass equals() to use the more common instanceof runtime type check, like this (I have now simplified the classes for the rest of the article):

class SuperClass {
    int fieldA;
    int fieldB;
   
    public boolean equals(Object object) {
        if (
!(object instanceof SuperClass))
            return false;

        SuperClass other = (SuperClass)object;
        return fieldA == other.fieldA && fieldB == other.fieldB;
    }
}


Now, if SubClass doesn't modify the notion of equals, everything is done. And if it does (say, a new field is added), then a Subclass.equals() override is created which does its own checks, but calls the superclass's equals() method to check the equality specifics relating to the superclass data/state:

class SubClass extends SuperClass {
    int subclassField;
    

    public boolean equals(Object object) {       
        if (!(object instanceof SubClass))

            return false;
       
        SubClass other = (SubClass)object;

        return subclassField == other.subclassField && super.equals(other);
    }


Hold on a second...

...is the above code correct? Well, not quite! That code was my first idea, but it was clear it had a problem: if it can be that SuperClass and SubClass objects end up being compared to each other, you will get different (and thus incorrect) results if the equals() method is called with a SuperClass receiver, with a SubClass parameter. So:
    SuperClass superClass = new SuperClass(...)
    SuperClass subClass = new SubClass(...)
   
superClass.equals(subClass)
may not return the same value as:
    subClass.equals(superClass)
because different equals() methods will be called. Returning a different value is a clear violation of the equals() symmetry contract in Java (and in other languages it's likely to also be a contract violation, or, at the very least, an ugly and brittle design). This problem is discussed in more detail in Items 7 and 8 of Joshua Bloch's Effective Java Programming Language Guide. He mentions that: "There is simply no way to extend an instantiable class and add an aspect while preserving the equals contract.", but I am not sure that is entirely correct. For example, a potential fix to this is to change equals() so that it requires that the runtime type of the objects being compared be exactly the same. That would be the second solution:



class SuperClass {
    int fieldA;
    int fieldB;
   
    public boolean equals(Object object) {

        if (this == object)
            return true;
        if (object == null || this.getClass() != object.getClass())
            return false;
       
        return equalsOther((SuperClass)object);
    }
   
    public boolean equalsOther(SuperClass other) {
        return fieldA == other.fieldA && fieldB == other.fieldB;
    }
}


class SubClass extends SuperClass {
    int subclassField;

    public boolean
equalsOther(SuperClass object) {       
        SubClass other = (SubClass)object;
        return subclassField == other.subclassField && super.equalsOther(other);
    }
}


This is now a correct implementation of equals(), useful without violating its contract. Also added is the if(this == object) check just as a minor optimization. But this solution still has a significant disadvantage. Because we reverted to the use of a direct class equality requirement:
    this.getClass() != object.getClass()
then other subclasses that *don't* change the notion of logical equality of their parent will not be equal to instances of their superclass, even though they should be logically equal (for those familiar with the concept, we forcibly cause *all* subclass instances to break the Liskov substitution principle). Whether this limitation is an actual problem or not, that depends on your code, and what you want to do with your subclassses, if you have any at all. This approach can be a significant problem more often than you think at first, though. For example, even if in your code there are no subclasses of such a base class, if you use ORM persistence frameworks like Hibernate, these use reflection and byte-code generation to dynamically create proxy subclasses of the classes you are trying to persist! So the above pattern totally breaks if you are using such frameworks, making it unequivocally unusable.

The final solution

There is a way to address this issue without any drawbacks (well, other than verbose code). Basically the approach I figured out uses a combination of both instanceof and class comparison. The main type check will be instanceof as in the first solution. However, after the instanceof check, we will do another type check on the other object to prevent symmetry problems: we will ask the other object what is the minimum class (most derived class) that the object can be equal to. This is done by introducing a new method called getEqualityClassRequirement(). See the code:



class SuperClass {
    int fieldA;
    int fieldB;
   
    public boolean equals(Object object) {
        if (this == object)
            return true;  
        if (!(object instanceof SuperClass))           
            return false;
       
       
SuperClass other = (SuperClass) object;
        return other.getEqualityClassRequirement() == this.getEqualityClassRequirement() 
            && equalsOther(other);
    }
   
   
protected Class<?> getEqualityClassRequirement() {
        return
SuperClass.class;
    }
   
    public boolean equalsOther(SuperClass other) {
        return fieldA == other.fieldA && fieldB == other.fieldB;
    }
}


Then, as result, subclasses of SuperClass that modify the notion of logical equality must override getEqualityClassRequirement to specify their own class type as a requirement:


class SubClass extends SuperClass {
    int subclassField;
 
    
    protected Class<?> getEqualityClassRequirement() {
        return
SubClass.class;
    }
    
    public boolean equalsOther(SuperClass object) {       
        SubClass other = (SubClass)object;
        return subclassField == other.subclassField && super.equalsOther(other);
    }
}

This is the best way I came up to fully address this issue in the Java language. If you use Eclipse JDT, this could be a good opportunity to create a code template to produce the boilerplate above.
I can see that a lot of verbosity-averse programmers will be cringing at the above code. If you have such aversions, or if you are just feeling particularly lazy whilst coding a certain class, you can still use the first solution (the lightweight version which just uses instanceof), as long as you make sure your equals() method is final. This will turn the first solution into a correct one, although with severe limitations (no subclasses will be able to change the notion of logical equality). This is actually fine for code that is not released externally (i.e., as a library component), because if you ever run into the limitation, you can just change the equals() code to one of the other solutions which allow subclasses to change the notion of equality. But this trick of using final is likely not appropriate if you release your code publicly and expect users might derive from your superclass - because obviously that prevents the users from being able to change the equals() code of your superclass if they want to remove the limitation.

Hashcode

It should go without saying that, if you are overriding equals(), then of course don't forget to update hashcode() as well. Fortunately, extending hashcode() is much simpler as none of the problems above arise. That's because hashcode() only deals with one object (the receiver), not two objects like equals(), and thus there are no issues with mixed types.

Other languages

I mentioned before I didn't find a sensible way to improve this pattern with the Java language. But this does leave open to consideration how this issue might be addressed and possibly improved in other languages. D in particular comes to mind, due to its meta-programming and generative code capabilities. Mixins for example could be used to alleviate some of the boilerplate that results in the Java case.
Going even further, one can bring this issue to the realm of language design. Perhaps the very way that one defines an equals/hashcode in classes could be changed in order to address this issue just as safely and comprehensively, but in a more pleasant and concise way (D follows the Java system pretty closely, as most OO languages do). But at the moment these are enough considerations for me. Thinking about this issue as applied to D or other evolving languages might be left for another article or discussion in the future, perhaps picked up further by others.