j.c.'s Blog

A Meaningful __eq__

Jul 15, 2008 | 3 minutes read

While investigating a bug in a recent project of mine I had the following perplexing experience in my python interpreter:

```python
>>> tokens[0] == retoks[0]
True
>>> tokens[0] != retoks[0]
True
```

At first glance this didn’t make any sense. The class in question, Token, defined an __eq__ method that checked the equality of a small subset of Token’s attributes. So how could something that basically checks that x == x and y == y report that two objects were both equal and not equal?

The answer lies in how rich comparison and equality actually work in Python, and those who are familiar with the peculiarity at work here are already laughing at me. Everything descended from object, which is to say every class you’ve ever written, has a built-in method called __eq__. If you don’t override it, it behaves much the same way as is: it compares memory locations. One might be tempted to assume that != is simply the negation of ==, so that whenever __eq__ returns False the two objects are not equal, and vice versa.
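To see the default in action, here is a minimal sketch using a hypothetical Point class that doesn’t override __eq__ (run under Python 3, where the default __eq__ still falls back to identity):

```python
class Point(object):
    """Hypothetical class that does not override __eq__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


a = Point(1, 2)
b = Point(1, 2)

# Same attribute values, but different objects, so the default == says no
print(a == b)  # False
# An object is always == to itself, just as `a is a` is True
print(a == a)  # True
print(a is b)  # False
```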

This is where another method, __ne__, comes in and mucks up your assumptions. __ne__ is the method that implements !=, and if you don’t override it, it defaults to returning the inverse of is. That explains the weird result in my interpreter. The two objects had the same values in the attributes __eq__ checks, but they weren’t the same object, so both __eq__ and __ne__ returned True.
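Python 3 changed this default: != now returns the opposite of == unless __eq__ returns NotImplemented, so the surprise above no longer happens on its own. The sketch below recreates the old Python 2 behavior by writing that identity-based __ne__ out by hand. Token here is a hypothetical stand-in, and its kind, text and position attributes are made up for illustration.

```python
class Token(object):
    """Hypothetical stand-in for the Token class described above."""

    def __init__(self, kind, text, position):
        self.kind = kind
        self.text = text
        self.position = position  # deliberately ignored by __eq__

    def __eq__(self, other):
        # Compare only a small subset of the attributes
        return self.kind == other.kind and self.text == other.text

    def __ne__(self, other):
        # Mimic the old Python 2 default: "not equal" means "not the same object"
        return self is not other


tokens = [Token("NAME", "spam", 0)]
retoks = [Token("NAME", "spam", 5)]

print(tokens[0] == retoks[0])  # True: kind and text match
print(tokens[0] != retoks[0])  # True: they are different objects
```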

Having two separate methods for equality and inequality is weird, but there’s a good reason for it. One could hypothetically have a class for which confirming equality required checking a bunch of things, but confirming inequality only needed a comparison of one piece of the object; in that case, it would be a waste to do everything __eq__ does. Of course, there’s a strong argument that more obvious and readable code is worth the lost performance, and the old maxim that programmer time is worth more than processor time would seem to apply. Still, at least there’s a good reason, right?

What there’s no good reason for is the lack of meaning in these two operations, and the fact that neither one defaults to the other. I’d much rather see the following defaults:

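```python
# The proposed defaults as a base class; the built-in object
# can't be patched, so "Object" stands in for it here
class Object(object):
    def __eq__(self, other_object):
        # Equal when same type with the same attribute names and values
        return (type(self) is type(other_object) and
                self.__dict__ == other_object.__dict__)

    def __ne__(self, other_object):
        # Inequality is always the exact opposite of equality
        return not self.__eq__(other_object)
```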

This, to my mind, has three advantages. One, the defaults for the two methods are much more meaningful than defaulting to is. Two, it follows the idea that there should generally be one obvious way to do something: equality of values lives in __eq__, and equality of memory location lives in is. Three, it follows a much more human definition of equality and inequality, in which one is always the opposite of the other.
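For what it’s worth, later versions of Python get fairly close to these defaults without touching object. The sketch below assumes Python 3.7+ and uses dataclasses.dataclass, which generates an __eq__ that compares the declared fields in order (and only between instances of the same class); != then comes for free, because Python 3’s default __ne__ inverts __eq__. The Token class and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical token; compare=False keeps position out of __eq__."""

    kind: str
    text: str
    position: int = field(default=0, compare=False)


a = Token("NAME", "spam", position=0)
b = Token("NAME", "spam", position=5)
c = Token("NUMBER", "42")

# Value-based equality from the generated __eq__, with != as its opposite
print(a == b, a != b)  # True False
print(a == c, a != c)  # False True
# Identity is still available through `is`
print(a is b)  # False
```

Marking position with compare=False plays the role of comparing only a subset of a token’s attributes.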

I’m curious what you all think. Drop a comment.