Mean of a list 

In this post I’m going to dissect a solution to the mean of a list problem proposed as an exercise at the end of chapter 3 in Real World Haskell. The problem reads the following:

Write a function that computes the mean of a list, i.e. the sum of all elements in the list divided by its length. (You may need to use the fromIntegral function to convert the length of the list from an integer into a floating point number.)

I’ve tried to figure out a solution to this problem half a year ago, not long before I dropped the study of Haskell. The actual difficulty that I had wasn’t about how to solve it, but rather, how to solve it using only the functions presented up to that point in the book. Sure, you need a sum of all the numbers in the list, divided the the length of the list. I knew a length function was presented already in that chapter, but I didn’t remember about a sum function. Even after looking the previous chapters for sum, I couldn’t find it (but it was there). So, I said to myself that I should solve it without that particular function. Maybe the authors were up to teach me something, right?

I, therefore, created my helper sum function that would do just that, return the sum of elements in a list. Things were working smoothly until I hit myself against the typeclasses wall. I had no idea what type to return… Double, Float, Fractional. Honestly, it’s still a mess in my head with regard to types and typeclasses. So I went to the online version of the book to solve this issue. There, I looked for comments regarding my problem and one of these solutions actually caught my eye, but not for the way it solved the return type issue. It was Rob Grainger’s solution that used nothing more than functions presented in the book up to that chapter (but not sum or length). Here’s his solution:

mean :: [Double] -> Double
mean l = snd (inner l) / fst (inner l)
         where inner []     = (0,0)
               inner (x:xs) = (fst (inner xs) + 1, snd (inner xs) + x)

This is brilliant. He, basically, created a helper function (inner) that would return both the sum of the elements in the list as well as the length of the list. All in a 2-tuple data structure. Then he would extract the two elements in the tuple using the built-in fst and snd functions, and ultimately apply the division function (/) on them.

Now, the helper inner function is a recursive function that reduces the whole list of numbers to a 2-tuple of numbers. It starts from the first element in the list and uses this element for two purposes:

  1. increment a counter of the elements in the list
  2. add it to the second element in the tuple, which is the sum of elements in the list

For clarification, a list comprising of only three elements, [1, 2, 3], would generate the following execution trace:

inner [1, 2, 3]
    == (fst (inner [2, 3]) + 1, snd (inner [2, 3]) + 1)
    == (fst (fst (inner [3]) + 1, snd (inner [3]) + 2) + 1, snd (fst (inner [3]) + 1, snd (inner [3]) + 2) + 1)
    == (fst (fst (fst (inner []) + 1, snd (inner []) + 3) + 1, snd (fst (inner []) + 1, snd (inner []) + 3) + 2) + 1, snd (fst (fst (inner []) + 1, snd (inner []) + 3) + 1, snd (fst (inner []) + 1, snd (inner []) + 3) + 2) + 1)
    == (fst (fst (fst (0,0) + 1, snd (0,0) + 3) + 1, snd (fst (0,0) + 1, snd (0,0) + 3) + 2) + 1, snd (fst (fst (0,0) + 1, snd (0,0) + 3) + 1, snd (fst (0,0) + 1, snd (0,0) + 3) + 2) + 1)
    == (fst (fst (0 + 1, 0 + 3) + 1, snd (0 + 1, 0 + 3) + 2) + 1, snd (fst (0 + 1, 0 + 3) + 1, snd (0 + 1, 0 + 3) + 2) + 1)
    == (fst (fst (1, 3) + 1, snd (1, 3) + 2) + 1, snd (fst (1, 3) + 1, snd (1, 3) + 2) + 1)
    == (fst (1 + 1, 3 + 2) + 1, snd (1 + 1, 3 + 2) + 1)
    == (fst (2, 5) + 1, snd (2, 5) + 1)
    == (2 + 1, 5 + 1)
    == (3, 6)

Or, using some ASCII art, the execution tree would look along these lines (I have represented only the arguments that inner would receive for each application):

        [1,2,3]
         /   \
        /     \
       /       \
   [2,3]        [2,3]
    / \          / \
   /   \        /   \
 [3]   [3]    [3]   [3]
 / \   / \    / \   / \
[] [] [] []  [] [] [] []

As you can see, there’s a lot happening in there for such a small input list. The main reason is that, for every element in the list, the inner function is executed twice. It is doubly recursive, which, if I’m not wrong, it means the solution has a complexity of O(n2)? Honestly, big O notation is not my strong point, so I may be wrong about this.