Find the number with no duplicate

There problem is: you have a list of a bunch of numbers, and every number is in there twice, except for one. What's the number that only appears once? There's a silly functional solution to this problem.

Normally I would say something like:

from collections import counter

def not_duplicated(sequence):
    for n, count in Counter(sequence).items():
        if count == 1:
            return n

I like this solution. I think it's easy to get a sense of how and why it works without even reading it very carefully. It only has to iterate over the sequence about 1.5 times so it runs in O(n) time. It would be very easy to add a check that the input matches our assumptions about it. In real life, this is what I would go with 99.9% of the time.

... but the counter uses O(n) space, and I remembered another strategy from a previous programming problem:

def not_duplicated(sequence):
    accumulator = 0
    for n in sequence:
        accumulator ^= n  # (xor)

    return accumulator

The idea is that if you xor a number with itself, you get zero, so if you xor all these numbers together, the only thing left will be the number that didn't cancel itself out! Mathematically it's cool, plus it only uses constant space.

From functional programming, there's a pattern called a "fold": Start with some initial value, use a function to combine it with each successive value in a list, and return the result. Python has this but calls it "reduce", and it used to be in the builtins but got moved to the functools library in python 3 because Guido says functional programming is hard to read (lol).

With reduce I could define sum like this:

def sum(sequence):
    return reduce(operator.add, sequence)

This takes the first value in the list and adds the elements of sequence to it, one at a time. Similarly I could say:

def not_duplicated(sequence):
    return reduce(operator.xor, sequence)

... and this reminds me of a trick from Haskell! The standard foldy way to do these in haskell is:

sum sequence = foldl (+) 0 sequence

... where "sum" is the name of the function, "sequence" is its argument, "foldl" is the equivalent of "reduce" in python, "(+)" is the combination function, and "0" is the initial value. The Haskell version of not_duplicated looks very similar:

notDuplicated sequence = foldl xor 0 sequence

Fortunately we can make this even more ~~arcane~~ concise. In Haskell if you call a function with not enough arguments, it does something called "partial application" and returns a function that takes the remaining arguments and then actually does the thing. It's a weird leap but it means that we can remove the "sequence" argument from both the signature and the function definition:

-- notDuplicated sequence = foldl xor 0 sequence
notDuplicated = foldl xor 0

It turns out Python's functools also has "partial" that does partial application! so...

from functools import partial, reduce
from operator import xor

not_duplicated = partial(reduce, xor)
>>> not_duplicated([1, 2, 3, 4, 5, 1, 2, 4, 5])
3

:-D

links

social