[SOLVED] C/C++ performance of static arrays vs dynamic arrays

Issue

When performance is essential to an application, should consideration be given whether to declare an array on the stack vs the heap? Allow me to outline why this question has come to mind.

Since arrays in C/C++ are not objects and decay to pointers, the compiler uses the provided index to perform pointer arithmetic to access elements. My understanding is that this procedure differs from a statically declared array to a dynamically declared array when going past the first dimension.

If I were to declare an array on the stack as follows:

  int array[2][3] = { 0, 1, 2, 3, 4, 5 };
  // In memory:     {  row 1 } { row 2 }

This array would be stored in Row Major format in memory since it is stored in a contiguous block of memory. This means when I try to access an element in the array, the compiler must perform some addition and multiplication in order to ascertain the correct location.

So if I were to do the following

  int x = array[1][2]; // x = 5

The compiler would then use this formula where:

i = row index, j = column index, n = number of elements in a single row (here n = 3)
array = pointer to first element

  *(array + (i*n) + j)
  *(array + (1*3) + 2)

This means if I were to loop over this array to access each of its elements, an additional multiplication step is performed for each access by index.
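To make the formula concrete, here is a minimal sketch that stores the same six values twice, once as a true 2D array and once as a flat one, and reads element (i, j) both ways. The names grid, flat, read_grid and read_flat are made up for illustration only.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 2, COLS = 3 };

/* The same six values, once as a true 2D array and once as a flat one. */
static const int grid[ROWS][COLS] = { { 0, 1, 2 }, { 3, 4, 5 } };
static const int flat[ROWS * COLS] = { 0, 1, 2, 3, 4, 5 };

/* Element (i, j) through ordinary 2D indexing. */
int read_grid(size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);
    return grid[i][j];
}

/* Element (i, j) through the explicit row-major formula i*n + j, n = COLS. */
int read_flat(size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);
    return flat[i * COLS + j];
}
```

Both read_grid(1, 2) and read_flat(1, 2) yield 5, since 1*3 + 2 = 5 is the flat position of the last element.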

Now, in an array declared on the heap, the paradigm is different and requires a multi stage solution. Note: I could also use the C++ new operator here, but I believe there is no difference in how the data is represented.

  int ** array;
  int rowSize = 2;
  // Create a 2 by 3 2d array on the heap
  array = malloc(2 * sizeof(int*));
  for (int i = 0; i < 2; i++) {
      array[i] = malloc(3 * sizeof(int));
  }

  // Populating the array
  int number = 0;
  for (int i = 0; i < 2; i++) {
      for (int j = 0; j < 3; j++) {
          array[i][j] = number++;
      }
  }

Since the array is now dynamic, its representation is a one dimensional array of one dimensional arrays. I will try to draw an ascii picture…

              int *        int int int
int ** array-> [0]          0   1   2
               [1]          3   4   5

This would imply that multiplication is no longer involved, right? If I were to do the following

  int x = array[1][1];

This would then perform indirection/pointer arithmetic on array[1] to access a pointer to the second row and then perform this once again to access the second element. Am I correct in saying this?

Now that there is some context, back to the question. If I am writing code for an application that requires crisp performance, like a game which has around 0.016 seconds to render a frame, should I think twice about using an array on the stack vs the heap? Now I realize there is a one time cost for using malloc or the new operator, but at a certain point (just like Big O analysis) when the data set becomes large, would one be better off iterating through a dynamic array to avoid row major indexing?

Solution

The following applies to “plain” C (not C++).

First let’s clear up some terminology

“static” is a keyword in C that drastically changes how a variable is allocated and accessed when it is applied to a variable declared within a function.

There are 3 places (regarding C) where a variable (including arrays) may sit:

  • Stack: these are function local variables without static.
  • Data section: space is allocated for these when the program starts. These are any global variables (be it static or not, there the keyword relates to visibility), and any function local variables declared static.
  • Heap: dynamically allocated memory (malloc() & free()) referred to by a pointer. You access this data only through pointers.
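The three places can be sketched in a few lines of C. This is only an illustration; all names here (global_table, next_id, sum_on_stack, sum_on_heap, read_global) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { COUNT = 4 };

/* Data section: a global array, allocated when the program starts. */
int global_table[COUNT] = { 1, 2, 3, 4 };

/* Bounds-checked read of the global array. */
int read_global(size_t i)
{
    assert(i < COUNT);
    return global_table[i];
}

/* Data section as well: a function-local static keeps its value
   between calls. */
int next_id(void)
{
    static int counter = 0;
    return ++counter;
}

/* Stack: a function-local array without static exists only while
   the function runs. */
int sum_on_stack(void)
{
    int local[COUNT] = { 1, 2, 3, 4 };
    int sum = 0;
    for (size_t i = 0; i < COUNT; i++)
        sum += local[i];
    return sum;
}

/* Heap: reachable only through the pointer returned by malloc(). */
int sum_on_heap(void)
{
    int *heap = malloc(COUNT * sizeof *heap);
    if (heap == NULL)
        return -1; /* allocation failed */
    int sum = 0;
    for (size_t i = 0; i < COUNT; i++) {
        heap[i] = global_table[i];
        sum += heap[i];
    }
    free(heap);
    return sum;
}
```

Calling next_id() twice returns 1 and then 2, because counter is not re-created on each call the way local is in sum_on_stack().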

Now let’s see how one dimensional arrays are accessed

If you access an array with a constant index (may be #defined, but not const in plain C), the compiler can calculate the offset of the element at compile time. If you have a true array in the Data section, it will be accessed without any indirection. If you have a pointer (Heap) or an array on the Stack, an indirection is always necessary. So arrays in the Data section may be very slightly faster with this type of access, but the difference is rarely large enough to matter.

If you access an array with an index variable, it essentially always decays to a pointer since the index may change (for example increment in a for loop). The generated code will likely be very similar or even identical for all types here.

Bring in more dimensions

If you declare a two or more dimensional array, and access it partially or fully by constants, an intelligent compiler may well optimize these constants out as above.

If you access by indices, note that the memory is linear. If the later dimensions of a true array are not a power of 2, the compiler will need to generate multiplications. For example in the array int arr[4][12]; the second dimension is 12. If you now access it as arr[i][j] where i and j are index variables, the linear memory has to be indexed as 12 * i + j. So the compiler has to generate code to multiply by a constant here. The complexity depends on how “far” the constant is from a power of 2. Here the resulting code will likely look somewhat like calculating (i<<3) + (i<<2) + j to access the element in the array.
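As a quick sanity check of that arithmetic, the sketch below computes the offset both ways for int arr[4][12]. The function names offset_mul and offset_shift are invented for this example, and what a real compiler emits depends on the target and optimization level.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 4, COLS = 12 };

/* Offset as written in the source: multiply by the constant 12. */
size_t offset_mul(size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);
    return COLS * i + j;
}

/* The same offset with the multiplication split into shifts and an
   add: 12*i = 8*i + 4*i = (i << 3) + (i << 2). */
size_t offset_shift(size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);
    return (i << 3) + (i << 2) + j;
}
```

For i = 3, j = 11 both functions return 47, the index of the last element of the 48-element block.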

If you build up the two dimensional “array” from pointers, the sizes of the dimensions do not matter since there are reference pointers in your structure. Here if you can write arr[i][j], that implies you declared it as for example int* arr[4], and then malloc()ed four chunks of memory of 12 ints each into it. Note that your four pointers (which the compiler can now use as bases) also consume memory, which a true array would not need. Also note that here the generated code will contain a double indirection: first the code loads a pointer by i from arr, then it loads an int from that pointer by j.
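A minimal sketch of such a pointer-built “array”, with the two loads written out as separate steps, might look like this. The helper names rows_alloc, rows_free and rows_get are assumptions made for the example, and the fill values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { ROWS = 4, COLS = 12 };

/* Allocates ROWS separate chunks of COLS ints and fills them with
   0..47. Returns 0 on success, -1 if any allocation fails. */
int rows_alloc(int *arr[ROWS])
{
    for (size_t i = 0; i < ROWS; i++) {
        arr[i] = malloc(COLS * sizeof *arr[i]);
        if (arr[i] == NULL) {
            while (i > 0)          /* release the rows already made */
                free(arr[--i]);
            return -1;
        }
        for (size_t j = 0; j < COLS; j++)
            arr[i][j] = (int)(i * COLS + j);
    }
    return 0;
}

/* Frees every row chunk. */
void rows_free(int *arr[ROWS])
{
    for (size_t i = 0; i < ROWS; i++) {
        free(arr[i]);
        arr[i] = NULL;
    }
}

/* arr[i][j] spelled out as the double indirection it is. */
int rows_get(int *const arr[ROWS], size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);
    int *row = arr[i];  /* first load: the row pointer, selected by i */
    return row[j];      /* second load: the int, selected by j */
}
```

Typical use would be to declare int *arr[ROWS];, call rows_alloc(arr), read with rows_get(arr, i, j), and finish with rows_free(arr). No multiplication by 12 appears in rows_get, only the two loads.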

If the lengths are “far” from powers of 2 (so complex “multiply by constant” code would have to be generated to access the elements), then using pointers may generate faster access code.

As James Kanze mentioned in his answer, in some circumstances the compiler may be able to optimize access for true multi-dimensional arrays. This kind of optimization is impossible for arrays composed from pointers, as the “array” is actually not a linear chunk of memory in that case.

Locality matters

If you are developing for usual desktop / mobile architectures (Intel / ARM 32 / 64 bit processors), locality also matters, that is, what is likely to be sitting in the cache. If your variables are already in the cache for some reason, they will be accessed faster.

In terms of locality the Stack is the winner, since it is used so frequently that it is very likely to sit in the cache. So small arrays are best put there.

Using true multi-dimensional arrays instead of composing one from pointers may also help on this ground, since a true array is always a linear chunk of memory, so it usually needs fewer cache blocks to load. A scattered pointer composition (that is, if using separately malloc()ed chunks) by contrast might need more cache blocks, and may cause cache line conflicts depending on how the chunks physically ended up on the heap.
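The “linear chunk” property can be checked directly. The sketch below is an assumption-laden illustration and not something from the question: it obtains a true 4 by 12 array from a single malloc() call through a pointer-to-row type, and measures the byte distance between consecutive rows. The names row_t, grid_alloc and grid_row_stride are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { ROWS = 4, COLS = 12 };

/* One row of the grid; a pointer to row_t can index a true 2D array. */
typedef int row_t[COLS];

/* Allocates the whole grid in one chunk and fills it with 0..47.
   Returns NULL if the allocation fails; release with free(). */
row_t *grid_alloc(void)
{
    row_t *grid = malloc(ROWS * sizeof *grid);
    if (grid == NULL)
        return NULL;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            grid[i][j] = (int)(i * COLS + j);
    return grid;
}

/* Byte distance from the start of row 0 to the start of row 1. */
size_t grid_row_stride(row_t *grid)
{
    assert(grid != NULL);
    return (size_t)((char *)grid[1] - (char *)grid[0]);
}
```

Here the stride is always COLS * sizeof(int), so each row starts right where the previous one ends. With separately malloc()ed rows no such guarantee exists, and the distance between rows is whatever the allocator happened to choose.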

Answered By – Jubatian

Answer Checked By – Marie Seifert (BugsFixing Admin)
