A Student’s C Book: 1.6. Arrays

Level 1. Introduction to C

1.6. Arrays

Hold my integers

Imagine a case where you need to work with tabular-like data of the ages of 50 people. How would you store these numbers in the computer’s memory? Would you type out 50 different variable names one by one, such as age1, age2, age3, etc., and then assign values to them? You could absolutely do that. However, there is a better way! It is by using an array…

An array is a group of elements that are stored consecutively in memory. That is to say, ordinary variables, such as age2 and age3, are not guaranteed to sit next to each other in memory, whereas every successive pair of elements in an array is guaranteed to be adjacent. You might ask, what is the advantage of using an array over multiple variables? Well, there are several advantages:

  • Instead of typing out 50 different variable names by hand, we now have to type only one (that is, the array variable);
  • Programmers like to keep things organized. If you have 50 names and 50 ages to work with in your C project, you might as well represent them as two separate blocks of memory;
  • Finally, pointer arithmetic can be used to jump between the array elements very conveniently.

An array in a C program can be declared by using a pair of square brackets as shown below:

unsigned int ages[50];

That will tell the computer to reserve room for 50 unsigned (i.e., non-negative) integers in consecutive order in memory. In fact, you can test this by printing the address of every element, since in most expressions the name ages is converted to the memory address of the first unsigned integer. This can be done as follows:

unsigned int ages[50] = {0}; // initialize all ages to 0
for(int i=0; i<50; i=i+1){
    ages[i] = 2*i + 1;  // ages[0]=1, ages[1]=3, ages[2]=5, ...
    printf("ages[%d] = %u and is at address %p\n", i, ages[i], (void *)&ages[i]); // %p expects a void*
}

Since the memory addresses are consecutive, we could easily replace &ages[i] with &ages[0]+i and ages[i] with *(&ages[0]+i), and everything would still work the same as before. Note that the difference between &ages[i] and &ages[0]+i is that the former is a pointer to the element at index i, and the latter is the result of the pointer arithmetic that reads as adding number i to the pointer that points to the first element of the array (i.e., the element at index 0). The two must be the same because of the fact that the array is consecutively stored in the memory. This can be simply visualized as follows:

addresses:       0   1  2  3   4  5  6  7 ... 53  54 55  56  57
memory cells:  ['a'][3][ ][-2][1][3][5][7]...[99][ ][ ]['?'][0]...
array index:                   0  1  2  3 ... 49
        the array begins here--^     ^--let's pick ages[2]=5

In the example shown above, the elements of our array (i.e., 1, 3, 5, 7, …, 99) are stored consecutively starting from the memory address 4. The first element of the array is always at index 0, so the address of the first element is &ages[0], and if you printed its value (given our hypothetical memory layout shown above), it would be 4. So, from this picture, it should be obvious that &ages[2] (which is 6) is also equal to &ages[0]+2 (which is 4+2=6). Moreover, this is true for any index, i.e., &ages[i] equals &ages[0]+i. (In real memory, each unsigned int usually occupies several bytes, so the printed addresses grow by that many bytes per element; pointer arithmetic accounts for this automatically.) Recall that, when used in an expression like this, the array name itself is converted to a pointer to the first element of the array, and therefore, ages is a memory address that is equal to &ages[0]. With that being said, we can rewrite the code snippet above as shown below.

unsigned int ages[50] = {0};
for(int i=0; i<50; i=i+1){
    *(ages+i) = 2*i+1;
    printf("ages[%d] = %u and is located at %p\n", i, *(ages+i), (void *)(ages+i));
}

Working with 50 numbers got easier with the help of arrays. Since we did not have any dimension other than age, we used a single-dimensional or 1D array. Now, let’s assume that we have to store the following tabular data by using an array:

Index  Age  Height
0      23   174
1      10   140
2      16   180
3      45   179
...    ...  ...
49     34   156

In this case, having just 50 integers won’t cut it; we have to allocate 100 integers, i.e., 50 to store ages and 50 for heights. That’s when we can use a two-dimensional or 2D array where the second dimension will be used to hold a pair of numbers (i.e., age and height). This can be done as follows:

unsigned int data[50][2] = {0}; // 50 * 2 = 100 integers in total
data[0][0] = 23; data[0][1] = 174;
data[1][0] = 10; data[1][1] = 140;
data[2][0] = 16; data[2][1] = 180;
data[3][0] = 45; data[3][1] = 179;
// ...
data[49][0] = 34; data[49][1] = 156;

In the code snippet above, each data[i] holds two integers (i.e., age and height, respectively) in a single row, as given by the last dimension (i.e., the [2] in unsigned int data[50][2]), while the first dimension is indexed by person. Since data[i] (where 0 ≤ i ≤ 49) refers to the (i+1)-th row, we can further access the age or the height by using the second index 0 ≤ j ≤ 1, that is, data[i][j]: data[i][0] is the age given in the (i+1)-th row and data[i][1] is the height given in the (i+1)-th row.

I should probably also tell you that we can change the order of the dimensions as long as we have storage for 100 integers. Let’s try to imagine the following declaration: unsigned int data[2][50]. By using this declaration, we could say that now the first dimension indicates the columns instead of the rows, and the second dimension indicates the rows of our table instead of the columns. data[0] holds all 50 ages, and data[1] holds all 50 heights, and to access/modify the individual age and height numbers, the second index must be used (data[0][j] refers to the age given in the (j+1)-th row and data[1][j] refers to the height given in the (j+1)-th row). Let’s see how we would initialize such an array in C:

unsigned int data[2][50] = {0}; // 2 * 50 = 100 integers in total
data[0][0] = 23; data[1][0] = 174;
data[0][1] = 10; data[1][1] = 140;
data[0][2] = 16; data[1][2] = 180;
data[0][3] = 45; data[1][3] = 179;
// ...
data[0][49] = 34; data[1][49] = 156;

You may already be getting a feel for arrays in C. They are useful for representing tabular data, and they can scale to as many dimensions as you want. You can create an n-dimensional array by putting n pairs of square brackets [ ] after the array variable (e.g., float arr[4][3][6][2][5]; for a 5D array that holds 4*3*6*2*5=720 floating-point numbers). However, there is a catch: an array can only hold elements of a single type, meaning that if you declare a float array, then it can only hold floating-point numbers and not characters, for example. So, you cannot mix different types in a single array. To illustrate the issue, let’s suppose that you should store the following table in the computer’s memory:

Index  Age  Height  Weight  First Name  Last Name
0      23   174     78.3    Bob         Surfers
1      10   140     59.9    Nick        Bostrom
2      16   180     65.8    Katherine   Cillian
3      45   179     95.5    Bill        Lohm
...    ...  ...     ...     ...         ...
49     34   156     80.0    Alex        Beckham

We can use an array of unsigned integers to store age and height information, an array of floats to store weight information, and an array of strings to store the first and last names. This could be done as shown below.

// table1[0] represents ages and table1[1] represents heights
unsigned int table1[2][50] = {0};

// table2 represents weights
float table2[50] = {0};

// table3[0] represents first names and table3[1] represents last names
// let's reserve 32 bytes per name: up to 31 characters plus the terminating '\0'
char table3[2][50][32] = {0};

With all this being said, there is one thing left to be mentioned in this section. That’s the array initialization. Instead of initializing an array to all 0s (i.e., by assigning them to {0} in the declaration), you can put any arbitrary numbers inside the curly braces to initialize your array however you want. Let’s see the examples shown below.

float weights[3] = {45.3, 66.7, 89.0};      // now weights[0] = 45.3; weights[1] = 66.7; weights[2] = 89.0
int numbers[] = {3, 6, -2, 0};              // same as numbers[4] = {...};
char first_name[] = {'B', 'o', 'b', '\0'};  // '\0' must be the last character of a string
char last_name[] = "Marley";                // same as last_name[] = {'M', 'a', 'r', 'l', 'e', 'y', '\0'};

The last two initializations are string[1] initializations, where we can put a bunch of characters together in a character (char) array to represent words or any text in general. When we do not indicate the size of the array (that is, the number of elements) inside the square brackets [] during the initialization, the compiler infers it automatically by looking at the elements inside the curly braces { ... }. For example, saying int numbers[] = {3, 6, -2, 0}; is perfectly understood by the compiler because we have provided all the elements of the numbers array inside the curly braces. Initializing strings is a bit different from initializing other built-in types, such as integers or floating-point numbers. In C, a string ends with a null-terminator. A null-terminator is the special character '\0', and functions such as printf rely on it to know where the string ends. When we initialize a string by using double quotes (e.g., char last_name[] = "Marley";), the compiler automatically puts the null-terminator at the end. You will also often see char* last_name = "Marley";, but that declares a pointer to a read-only string rather than an array of its own. Having a null-terminator in our string also implies that its size is actually one more than the number of visible characters. That is to say, the first_name string contains 4 characters instead of 3, and the last_name string contains 7 characters instead of 6.

The size of my array

An array in C does not store its own size, and whenever it is passed to a function, it is converted to a pointer to its first element. This means that just by knowing the first element or its address, you cannot know the number of elements in the whole array. For this reason, C programmers usually pass the size information along with the array pointer to functions that need it. A function can declare such a parameter either with square brackets (double arr[]) or as a pointer (double* arr); both forms mean the same thing. Here’s a quick example:

#include <stdio.h>

double sum(double arr[], unsigned int size){
    double out = 0.0;
    for(unsigned int i=0; i<size; i=i+1){
        out = out + arr[i];
    }
    return out;
}

double mult(double* arr, unsigned int size){
    double out = 1.0;
    for(unsigned int i=0; i<size; i=i+1){
        out = out * arr[i];
    }
    return out;
}

int main(){
    double numbers[] = {-2, 1, 3.5}; // must be an array, not a double*, for sizeof to see all 3 elements
    unsigned int size = sizeof(numbers) / sizeof(double); // size = 24 / 8 = 3
    printf("sum of [-2, 1, 3.5] is %lf\n", sum(numbers, size));
    printf("multiplication of [-2, 1, 3.5] is %lf\n", mult(numbers, size));
    return 0;
}

Notice the use of the sizeof operator in the main() function. This operator returns the size of its argument in bytes. For example, on a typical 64-bit machine, sizeof(char) gives us 1, sizeof(int) gives us 4, sizeof(float) gives us 4, sizeof(double) gives us 8, sizeof(char*) gives us 8 bytes, and so on. What it returns for an array depends on how that array is declared. If its argument is a variable that was declared as an array (e.g., double numbers[] = {...};), it returns the number of bytes the whole array contains (3 * 8 = 24 for numbers). On the other hand, if its argument is a pointer, even one that points to the first element of an array, then it returns the size of the pointer (i.e., 8 bytes on typical 64-bit machines) and not the whole array. That’s why, to get the actual number of elements in the numbers array, we had to divide its size in bytes (i.e., sizeof(numbers)) by the size of its element type (i.e., sizeof(double)). Moreover, we passed this size variable to the other functions because an array passed to a function as an argument is always treated as a pointer inside the function’s body and, hence, using the sizeof operator on an array parameter would return 8 bytes all the time instead of the array’s actual size. We can test this easily as follows:

#include <stdio.h>

void test(char* name){
    printf("sizeof(name) = %zu bytes in test()\n", sizeof(name));
}

int main(){
    char name[] = "Bob"; // an array of 4 characters - 'B', 'o', 'b', and '\0'; each character is 1 byte
    printf("sizeof(name) = %zu bytes in main()\n", sizeof(name));
    test(name);
    
    int x = 7;
    int* ptr = &x;
    printf("sizeof(int*) = %zu and sizeof(ptr) = %zu\n", sizeof(int*), sizeof(ptr));
    printf("sizeof(int) = %zu and sizeof(*ptr) = %zu\n", sizeof(int), sizeof(*ptr));
    return 0;
}

This would produce the following output on a typical 64-bit machine:

sizeof(name) = 4 bytes in main()
sizeof(name) = 8 bytes in test()
sizeof(int*) = 8 and sizeof(ptr) = 8
sizeof(int) = 4 and sizeof(*ptr) = 4

Since the ptr variable has the type of int*, sizeof(ptr) and sizeof(int*) are equal. Moreover, since dereferencing ptr (i.e., *ptr) gives us an int, sizeof(*ptr) and sizeof(int) are also equal. This works for any type in C.

[1] A string is a character array.
