Find the Difference Between Two Lists with Python
Let’s say you have two lists in Python that have a lot of overlap between them.
a = ["abc", "def", "ghi"]
b = ["def", "ghi", "jkl"]
Now let’s say you want to determine what is in b that is not in a. In set theory, this is the set difference of b and a (also called the relative complement of a in b). The following simple code accomplishes this.
print(set(b) - set(a))
Now let’s say you want to know which elements appear in only one of the two lists. This is referred to as the symmetric difference. The following code serves this purpose and should give you “jkl” and “abc”.
c = set(a).union(set(b))
d = set(a).intersection(set(b))
print(c - d)
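The union-minus-intersection approach above can also be written in one step, because Python's built-in set type has a symmetric_difference method and an equivalent ^ operator. A minimal sketch, reusing the lists a and b from earlier (the names only_in_one and same_result are just for illustration):

```python
a = ["abc", "def", "ghi"]
b = ["def", "ghi", "jkl"]

# Elements that are in exactly one of the two lists.
only_in_one = set(a) ^ set(b)

# The method form accepts any iterable, so b does not need converting first.
same_result = set(a).symmetric_difference(b)

# Sets are unordered, so sort before printing to get a stable order.
print(sorted(only_in_one))
```

Both forms should give the same set as c - d in the code above.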
One thing to note is that the set class has union, intersection, and difference methods (difference can also be written in shorthand with a minus sign), which is why I converted the original lists to sets. You may want to convert a result back to a list using code such as the following.
e = list(d)
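Keep in mind that sets are unordered, so list(d) can return the items in any order. If the order matters, here is a small sketch of two options, assuming the same a and b as above (e_sorted and e_in_order are illustrative names):

```python
a = ["abc", "def", "ghi"]
b = ["def", "ghi", "jkl"]

d = set(a).intersection(b)

# Option 1: sorted() always returns a list, in sorted order.
e_sorted = sorted(d)

# Option 2: keep the order in which the items appear in the original list a.
e_in_order = [x for x in a if x in d]

print(e_sorted)
print(e_in_order)
```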
Java users must admit that this is much easier in Python than in Java.
The problem with calling set is that it will remove duplicates – set items are unique:
>>> set((1, 1, 3))
{1, 3}
This is sometimes not what you want. For example, you might want [1, 1, 3] - [3] = [1, 1], with the duplicate 1 preserved.
The most obvious way to spell this is:
def difference(l1, l2):
    return [x for x in l1 if x not in l2]
and the symmetric difference would be:
def symmetric_difference(l1, l2):
    return difference(l1, l2) + difference(l2, l1)
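Here is a runnable sketch of the two functions with sample input. The intermediate set named exclude is my own addition, not part of the code above: it makes each membership test fast, but it assumes the list items are hashable. Note that every occurrence of an item found in the other list is dropped, while duplicates of the remaining items are kept.

```python
def difference(l1, l2):
    """Items of l1 that do not appear in l2, keeping order and duplicates."""
    exclude = set(l2)  # assumption: items are hashable
    return [x for x in l1 if x not in exclude]


def symmetric_difference(l1, l2):
    """Items that appear in only one of the two lists, l1's items first."""
    return difference(l1, l2) + difference(l2, l1)


print(difference([1, 1, 3], [3]))               # [1, 1]
print(symmetric_difference([1, 1, 3], [3, 4]))  # [1, 1, 4]
```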
For syntactic nicety, you could have a class inherit from list and define __sub__ the same way as difference above; then symmetric_difference becomes:
def symmetric_difference(self, l2):
    return (self - l2) + (l2 - self)
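Put together, the subclass idea might look like the sketch below. The class name DiffList is hypothetical, and wrapping the argument in DiffList(other) is my own tweak so that a plain list can be passed in as well.

```python
class DiffList(list):  # hypothetical name, for illustration only
    def __sub__(self, other):
        # Same logic as difference(): keep items of self that are not in other.
        return DiffList(x for x in self if x not in other)

    def symmetric_difference(self, other):
        # Wrap other so that the - operator also works when it is a plain list.
        return (self - other) + (DiffList(other) - self)


x = DiffList([1, 1, 3])
y = DiffList([3, 4])

print(x - y)                           # [1, 1]
print(x.symmetric_difference([3, 4]))  # [1, 1, 4]
```

One design detail: x - y comes back as a DiffList, but list concatenation with + returns a plain list, so symmetric_difference returns a plain list here.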