When Big O Fools Ya
An Array Love Story
Big O notation is a great tool. It allows one to quickly make smart choices among various data structures and algorithms. But sometimes a casual Big O analysis can fool us if we don’t think carefully about the impact of constant factors. One such example comes up very often when programming on modern CPUs: choosing between an Array and a List or Tree type structure.
Memory, Slow Slow Memory
In the early 1980s, the time it took to get data from RAM and the time it took to do computation on that data were roughly at parity. You could use algorithms that hop randomly over the heap, grabbing data and working with it. Since that time, CPUs have gotten faster at a much higher rate than RAM has. Today, a CPU can compute on the order of 100 to 1000 times faster than it can get data from RAM. This means that when the CPU needs data from RAM, it has to stall for hundreds of cycles, doing nothing. Obviously this would be a useless situation, so modern CPUs have various levels of cache built in. Any time you request one piece of data from RAM, you also get chunks of contiguous memory pulled into the caches on the CPU. The result is that when you iterate over contiguous memory, you can access it about as fast as the CPU can operate, because you will be streaming chunks of data into the L1 cache. If you iterate over memory in random locations, you will often miss the CPU caches, and performance can suffer greatly. If you want to learn more about this, Mike Acton’s CppCon talk is a great starting point, and great fun too.
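As a rough illustration (a minimal sketch, not a rigorous benchmark; the array size and timing approach are arbitrary assumptions), the two loops below touch exactly the same elements, but one walks contiguous memory while the other hops through it in a shuffled order:

using System;
using System.Diagnostics;

class CacheDemo
{
    static void Main()
    {
        const int n = 10_000_000;
        var data = new int[n];

        // A shuffled index array forces the second loop to jump around the heap.
        var indices = new int[n];
        for (int i = 0; i < n; i++) indices[i] = i;
        var rng = new Random(42);
        for (int i = n - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }

        long sum = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++) sum += data[i];          // sequential: cache friendly
        Console.WriteLine($"sequential: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        for (int i = 0; i < n; i++) sum += data[indices[i]]; // random: frequent cache misses
        Console.WriteLine($"random:     {sw.ElapsedMilliseconds} ms");

        Console.WriteLine(sum); // keep the JIT from eliding the loops
    }
}

On typical hardware the random-order loop is several times slower, even though both loops do exactly the same arithmetic on exactly the same data.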
The consequence of this is that arrays have become the go-to data structure when performance is important, sometimes even when Big O analysis suggests they would be slower. Where you wanted a Tree before, you may now want a sorted array and a binary search. Where you wanted a Queue before, you may want a growable array, and so on.
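For example (a minimal sketch; the data here is made up), a sorted array plus Array.BinarySearch keeps the O(log n) lookups a tree would give you while keeping the data contiguous:

using System;

class SortedArrayLookup
{
    static void Main()
    {
        // Keep the data in one contiguous, sorted array instead of a tree.
        int[] ids = { 3, 8, 15, 23, 42, 99 };

        int index = Array.BinarySearch(ids, 23);
        Console.WriteLine(index >= 0 ? $"found at {index}" : "not found");

        // A miss returns the bitwise complement of the insertion point,
        // which is handy if you need to insert while keeping the array sorted.
        int miss = Array.BinarySearch(ids, 50);
        Console.WriteLine(~miss); // insertion point for 50
    }
}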
Linked List vs Array List
Once you are familiar with how important contiguous memory access is, it should be no surprise that if you want to iterate over a collection quickly, an array will be faster than a Linked List. Environments with clever allocators and garbage collectors may be able to keep Linked List nodes somewhat contiguous, some of the time, but they can’t guarantee it. Using a raw array usually involves quite a bit more complex code, especially if you want to be able to insert or add items, as you will have to deal with growing the array, shuffling elements around, and so on. Most languages have core libraries which include some sort of growable array data structure to help with this. In C++ you have vector, in C# you have List<T> (aliased as ResizeArray in F#), and in Java there is ArrayList. Usually these data structures expose the same, or a similar, interface as the Linked List collection. I will refer to such data structures as Array Lists from here on, but keep in mind all the C# examples are using the List<T> class, not the older ArrayList class.
So what if you need a data structure that you can insert items into and iterate over quickly? Let us assume for this example that we have a use case where we will insert into the front of a collection about 5 times more often than we iterate over it. Let us also assume that the Linked List and Array List in our environment have interfaces which are equally pleasant to work with for this task. All that remains to make a choice, then, is to determine which one performs better. In the interest of optimizing our own valuable time, one might turn to Big O analysis. Referring to the handy Big-O Cheat Sheet, the relevant time complexities for these two data structures are:
 | Iterate | Insert |
---|---|---|
Array List | O(n) | O(n) |
Linked List | O(n) | O(1) |
Array Lists are problematic for insertion. At a minimum, an insert has to copy every single element beyond the insertion point over by one to make space for the inserted element, making it O(n). Sometimes it will also have to reallocate a new, bigger backing array to make room. That doesn’t change the Big O time complexity, but it does take time and waste memory. So it seems that for our use case, where inserts happen 5 times more often than iterating, the best choice is clear. As long as n is large enough, the Linked List should perform better overall.
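A hand-rolled version of that insert makes the cost visible (a minimal sketch, not how List<T> is actually implemented, but the same shift-and-grow work happens inside it):

using System;

// Sketch of an array-backed list's Insert, only to show where the O(n)
// work comes from.
class TinyArrayList
{
    private int[] items = new int[4];
    public int Count { get; private set; }

    public void Insert(int index, int value)
    {
        if (Count == items.Length)
        {
            // Occasional reallocation: allocate a bigger array and copy everything over.
            var bigger = new int[items.Length * 2];
            Array.Copy(items, bigger, Count);
            items = bigger;
        }
        // Shift every element at or beyond the insertion point over by one: O(n).
        Array.Copy(items, index, items, index + 1, Count - index);
        items[index] = value;
        Count++;
    }

    public int this[int index] => items[index];
}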
Empiricism
But, to know things for sure, we always have to measure. So let us do an experiment in C#, using BenchmarkDotNet. C# provides the generic collections LinkedList<T>, a doubly linked list, and List<T>, an array-backed list, which are the two we will compare here.
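The test cases below reference a few fields (arrayList, linkedList, inserts) that are set up outside the snippets. A minimal sketch of the scaffolding they assume might look like this; the field names are taken from the snippets and the [Params] values mirror the result tables, but the original setup code is not shown, so treat it as an approximation:

using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Bench // matches "Type=Bench" in the environment info below
{
    private List<int> arrayList;
    private LinkedList<int> linkedList;

    [Params(100, 1000, 100000, 1000000, 10000000)]
    public int length;

    [Params(5)]
    public int inserts;

    [GlobalSetup] // the old 0.9.x versions of BenchmarkDotNet used [Setup]
    public void Setup()
    {
        // Fresh collections of the requested length.
        arrayList = new List<int>(new int[length]);
        linkedList = new LinkedList<int>(new int[length]);
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<Bench>();
}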
Host Process Environment Information:
BenchmarkDotNet.Core=v0.9.9.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-4712HQ CPU 2.30GHz, ProcessorCount=8
Frequency=2240910 ticks, Resolution=446.2473 ns, Timer=TSC
CLR=MS.NET 4.0.30319.42000, Arch=64-bit RELEASE [RyuJIT]
GC=Concurrent Workstation
JitModules=clrjit-v4.6.1590.0
Type=Bench Mode=Throughput
Test Cases:
[Benchmark(Baseline=true)]
public int ArrayTest()
{
//In C#, List<T> is an array backed list.
List<int> local = arrayList;
int localInserts = inserts;
int sum = 0;
for (int i = 0; i < localInserts; i++)
{
local.Insert(0, 1); //Insert the number 1 at the front
}
// For loops iterate over List<T> much faster than foreach
for (int i = 0; i < local.Count; i++)
{
sum += local[i]; //do some work here so the JIT doesn't elide the loop entirely
}
return sum;
}
[Benchmark]
public int ListTest()
{
LinkedList<int> local = linkedList;
int localInserts = inserts;
int sum = 0;
for (int i = 0; i < localInserts; i++)
{
local.AddFirst(1); //Insert the number 1 at the front
}
// Again, iterating the fastest possible way over this collection
var node = local.First;
for (int i = 0; i < local.Count; i++)
{
sum += node.Value;
node = node.Next;
}
return sum;
}
Results:
Method | Length | Inserts | Median |
---|---|---|---|
ArrayTest | 100 | 5 | 38.9983 us |
ListTest | 100 | 5 | 51.7538 us |
The Array List wins by a nice margin. But this is a small list; Big O only tells us about performance as n grows large, so we should see this trend eventually reverse as n grows larger.
Let’s try it:
Method | Length | Inserts | Median |
---|---|---|---|
ArrayTest | 100 | 5 | 38.9983 us |
ListTest | 100 | 5 | 51.7538 us |
ArrayTest | 1000 | 5 | 42.1585 us |
ListTest | 1000 | 5 | 49.5561 us |
ArrayTest | 100000 | 5 | 208.9662 us |
ListTest | 100000 | 5 | 312.2153 us |
ArrayTest | 1000000 | 5 | 2,179.2469 us |
ListTest | 1000000 | 5 | 4,913.3430 us |
ArrayTest | 10000000 | 5 | 36,103.8456 us |
ListTest | 10000000 | 5 | 49,395.0839 us |
Here we get the result that will be counterintuitive to many. No matter how large n gets, the Array List still performs better overall. For performance to get worse, the ratio of inserts to iterations has to change, not just the length of the collection. Note that this isn’t an actual failure of Big O analysis, it is merely a common human failure in our application of it. If you actually “did the math”, Big O would tell you that the two data structures here will grow at the same rate when there is a constant ratio of inserts to iterations.
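Back-of-the-envelope (the constants c_a, c_l, and c_1 are made-up names for the per-element cost of contiguous access, the per-node cost of pointer-chasing access, and the cost of one node insert): with r inserts for every full pass over n elements,

\[
T_{\text{array}}(n) \approx \underbrace{r \, c_a n}_{\text{inserts}} + \underbrace{c_a n}_{\text{iterate}} = (r+1)\, c_a n = O(n),
\qquad
T_{\text{list}}(n) \approx \underbrace{r \, c_1}_{\text{inserts}} + \underbrace{c_l n}_{\text{iterate}} \approx c_l n = O(n)
\]

Both totals are O(n) for any fixed r, so Big O cannot pick a winner; the race is between (r+1)·c_a and c_l, which is exactly where cache behavior and other constant factors live.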
Where the break even point occurs will depend on many factors, though a good rule of thumb suggested by Chandler Carruth at Google is that Array Lists will outperform Linked Lists until you are inserting about an order of magnitude more often than you are iterating. This rule of thumb works well in this particular case, as 10:1 is where we see Array List start to lose:
Method | Length | Inserts | Median |
---|---|---|---|
ArrayTest | 100000 | 10 | 328,147.7954 ns |
ListTest | 100000 | 10 | 324,349.0560 ns |
Devils in the Details
The reason the Array List wins here is that the integers being iterated over are lined up contiguously in memory. Each time an integer is requested from RAM, an entire cache line of integers is pulled into the L1 cache, so the next 64 bytes of data are ready to go. With the Linked List, each call to node.Next makes a pointer hop to the next node, and there is no guarantee that nodes will be contiguous in memory, so we will sometimes miss the cache. But we aren’t always iterating over value types like this; especially in OOP-oriented managed languages we often iterate over reference types. In that case, even with an Array List, while the pointers themselves are contiguous in memory, the objects they point to are not. The situation is still better than with a Linked List, where you will be making two pointer hops per iteration instead of one, but how does this affect the relative performance?
It narrows the gap quite a bit, depending on the size of the objects and the details of your hardware and software environment. Refactoring the example above to use Lists of small objects (12 bytes), the break-even point drops to about 4 inserts per iteration:
Method | Length | Inserts | Median |
---|---|---|---|
ArrayTestObject | 100000 | 0 | 674.1864 us |
ListTestObject | 100000 | 0 | 1,140.9044 us |
ArrayTestObject | 100000 | 2 | 959.0482 us |
ListTestObject | 100000 | 2 | 1,121.5423 us |
ArrayTestObject | 100000 | 4 | 1,230.6550 us |
ListTestObject | 100000 | 4 | 1,142.6658 us |
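The body of those Object benchmarks isn’t shown above; a sketch of what the ArrayTestObject variant might look like, following the pattern of ArrayTest (the Small class and the objectArrayList field are assumed names, not from the original):

// Hypothetical 12-byte reference type: three int fields, plus the usual
// object header, reached through a pointer stored in the List<T>.
class Small
{
    public int A, B, C;
}

[Benchmark]
public int ArrayTestObject()
{
    List<Small> local = objectArrayList; // assumed field, populated like arrayList
    int localInserts = inserts;
    int sum = 0;
    for (int i = 0; i < localInserts; i++)
    {
        local.Insert(0, new Small()); // allocate a new object and insert at the front
    }
    for (int i = 0; i < local.Count; i++)
    {
        sum += local[i].A; // the extra pointer hop to the object happens here
    }
    return sum;
}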
Managed C# code suffers a bit in this case because iterating over this Array List incurs some unnecessary array bounds checking. A C++ vector would likely fare better. If you were really aggressive about it, you could probably write a faster Array List class using unsafe C# code to avoid the array bounds checks. Also, the relative differences here will depend greatly on how your allocator and garbage collector manage the heap, how big your objects are, and other factors. In my environment, larger objects tended to improve the relative performance of the Array List. In the context of a complete application, the relative performance of the Array List might also improve as the heap gets more fragmented, but you will have to test to know for sure.
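As a taste of what unsafe code buys you here (a minimal sketch that sums a raw array rather than a full custom list class), pinning the array and walking a pointer avoids the per-element bounds check:

// Compile with /unsafe. Bounds-check-free iteration over an int array.
public static unsafe int SumUnsafe(int[] data)
{
    int sum = 0;
    fixed (int* p = data) // pin the array so the GC can't move it while we hold a pointer
    {
        int* cur = p;
        int* end = p + data.Length;
        while (cur < end)
        {
            sum += *cur; // no array bounds check on this access
            cur++;
        }
    }
    return sum;
}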
As an aside, if your objects are sufficiently small (16 to 32 bytes or less, depending on various factors), you should consider making them value types (struct in .NET) instead of objects. Not only will you benefit greatly from contiguous memory access, but you will potentially reduce garbage collection overhead as well, depending on how you use them:
Method | Length | Inserts | Median |
---|---|---|---|
ArrayTestObject | 100000 | 10 | 2,094.8273 us |
ListTestObject | 100000 | 10 | 1,154.3014 us |
ArrayTestStruct | 100000 | 10 | 792.0004 us |
ListTestStruct | 100000 | 10 | 1,206.0713 us |
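The struct variant only needs the element type changed (again a sketch with assumed names; the original body is not shown). Because a List<SmallStruct> stores the values themselves in its backing array, iteration stays contiguous with no pointer hop:

// Value-type version of the small object: the data lives directly in the
// List<T>'s backing array instead of behind a pointer.
struct SmallStruct
{
    public int A, B, C;
}

[Benchmark]
public int ArrayTestStruct()
{
    List<SmallStruct> local = structArrayList; // assumed field
    int localInserts = inserts;
    int sum = 0;
    for (int i = 0; i < localInserts; i++)
    {
        local.Insert(0, new SmallStruct()); // no heap allocation for the element itself
    }
    for (int i = 0; i < local.Count; i++)
    {
        sum += local[i].A; // read in place from contiguous memory
    }
    return sum;
}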
Java may handle this better since it does some automatic cleverness with small objects, or you may have to just use separate arrays of primitive types. Though onerous to type, this can sometimes be faster than an array of structs, depending on your data access patterns. Consider it when performance matters.
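The “separate arrays of primitive types” idea is often called structure-of-arrays. In C# it looks something like this (a sketch with made-up field names):

// Array-of-structs: one contiguous array where each element carries every field.
struct Particle
{
    public float X, Y, VelX, VelY;
}

static class SoaDemo
{
    public static void Advance(int n)
    {
        Particle[] aos = new Particle[n]; // array-of-structs layout, shown for contrast

        // Structure-of-arrays: one array per field. More onerous to type, but a
        // loop that only touches X and VelX only pulls those cache lines in.
        float[] xs = new float[n];
        float[] velXs = new float[n];

        for (int i = 0; i < n; i++)
        {
            xs[i] += velXs[i]; // the Y data never enters the cache
        }
    }
}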
Make Sure the Abstraction is Worth It
It is common for people to object to these sorts of considerations on the basis of code clarity, correctness, and maintainability. Of course each problem domain has its own priorities, but I feel strongly that when the clarity benefit of the abstraction is small and the performance impact is large, we should choose better performance as a rule. By taking the time to understand your environment, you will be aware of cases where a faster but equally clear option exists, as is often the case with Array Lists vs Linked Lists.
As some food for thought, here are eight different ways to add up a list of numbers in C#, with their run times and memory costs. Checked arithmetic is used in all cases to keep the comparison with Linq fair, as its Sum method uses checked arithmetic. Notice how much better the fastest option performs, and how expensive the most popular method (Linq) is. Notice also that the foreach abstraction works out well with raw Arrays, but not with the Array List or Linked List. Whatever your language and environment of choice, understand these details so you can make smart default choices.
Method | Length | Median | Bytes Allocated/Op |
---|---|---|---|
LinkedListLinq | 100000 | 990.7718 us | 23,192.49 |
RawArrayLinq | 100000 | 643.8204 us | 11,856.39 |
LinkedListForEach | 100000 | 489.7294 us | 11,909.99 |
LinkedListFor | 100000 | 299.9746 us | 6,033.70 |
ArrayListForEach | 100000 | 270.3873 us | 6,035.88 |
ArrayListFor | 100000 | 97.0850 us | 1,574.32 |
RawArrayForEach | 100000 | 53.0535 us | 1,574.84 |
RawArrayFor | 100000 | 53.1745 us | 1,577.77 |
[Benchmark(Baseline = true)]
public int LinkedListLinq()
{
var local = linkedList;
return local.Sum();
}
[Benchmark]
public int LinkedListForEach()
{
var local = linkedList;
int sum = 0;
checked
{
foreach (var node in local)
{
sum += node;
}
}
return sum;
}
[Benchmark]
public int LinkedListFor()
{
var local = linkedList;
int sum = 0;
var node = local.First;
for (int i = 0; i < local.Count; i++)
{
checked
{
sum += node.Value;
node = node.Next;
}
}
return sum;
}
[Benchmark]
public int ArrayListFor()
{
//In C#, List<T> is an array backed list
List<int> local = arrayList;
int sum = 0;
for (int i = 0; i < local.Count; i++)
{
checked
{
sum += local[i];
}
}
return sum;
}
[Benchmark]
public int ArrayListForEach()
{
//In C#, List<T> is an array backed list
List<int> local = arrayList;
int sum = 0;
checked
{
foreach (var x in local)
{
sum += x;
}
}
return sum;
}
[Benchmark]
public int RawArrayLinq()
{
int[] local = rawArray;
return local.Sum();
}
[Benchmark]
public int RawArrayForEach()
{
int[] local = rawArray;
int sum = 0;
checked
{
foreach (var x in local)
{
sum += x;
}
}
return sum;
}
[Benchmark]
public int RawArrayFor()
{
int[] local = rawArray;
int sum = 0;
for (int i = 0; i < local.Length; i++)
{
checked
{
sum += local[i];
}
}
return sum;
}