A Student’s C Book: 1.7. Data Structures

Level 1. Introduction to C

1.7. Data Structures

All variables in one place

Being able to represent integers, floating-point numbers, characters, and pointers (i.e., memory addresses) is cool. But what if you wanted to represent more complex and/or abstract objects? To illustrate it with a practical example, let’s suppose that we want to represent a person by storing his/her age, gender, heigh, weight, and full name. As a side note, we already know that we can represent age and height as a couple of unsigned integers, gender as a single character, weight as a floating-point number, and full name as a string (i.e., character array). So, the following code snipped seems like a good fit.

unsigned int age, height;
char gender;
float weight;
char name[100];

It would be fair to use the variable declarations shown above if you wanted to represent a single person. What if you needed to deal with 50 different people’s data? You may say that we can just use arrays for each variable whose indices refer to different people. This can be written as shown below.

unsigned int age[50], height[50];
char gender[50];
float weight[50];
char names[50][100];

This would work well technically. However, it does not seem very readable or robust enough for future code modifications. I mean, the fact that we don’t have a single variable named person that holds all of the necessary information together is what makes it harder to work with. For example, let’s consider passing the person’s data to another function. With what we currently have in our hands, we would need to do something like this:

void func(unsigned int age, char gender, unsigned int height, float weight, char* name);

If you decided to remove the age field or add another field to the person object later on, you would need to modify the removed/added fields in all the places that used to use these fields (e.g., functions taking all the person’s data as arguments). This is just one of the dozens of problems with the current approach that we have taken with the representation of a person object. In order to avoid these annoying and error-prone issues, it would be better to have the ability to create one’s own custom data type named person.

It is indeed possible to create your own data type that can be used to represent more complex abstract objects that may have many internal details (such as the person object having age, gender, and so on). For this, we use data structures! The below is given the generic template to create your own data structure or a custom data type in other words:

struct TAG {
    TYPE VARIABLE;
    // 2nd variable with any type
    // 3rd variable with any type
    // ...
};

With the help of a custom data structure, we can now easily represent the person abstraction in C as shown below.

struct Person {
    unsigned int age, height;
    char gender;
    float weight;
    char name[100];
};

You can initialize an object of any custom type declared as a struct by initializing each of its fields separately. One simple way of doing it is as follows:

int main(){
    struct Person person;
    person.age = 13;
    person.height = 150;
    person.gender = 'm';
    person.weight = 65.7;
    strcpy(person.name, "Bob Marley");
    return 0;
}

There are certain data structures that are well-known because of how useful they have been in solving many real-world problems. Now, we are going to take a look at a couple of them and see how they can be implemented in C.

Example: Stack

Stack is a well-known data structure that represents a last-in-first-out (LIFO) operation on a blob of data. How LIFO stack works is similar to how you put dishes on top of each other in the kitchen as you wash them and how you always take the top dish when you need to use a new dish. The similarity between the two is that the last dish you put on top of the stack of clean dishes is the first one you are going to take when you need it, and likewise, you put elements on top of each other in the stack, and you can only take out the top one when you need to. In more technical terms, putting an element on top of the stack is called pushing into the stack, and taking the top element out is called popping from the stack. Let’s define the stack data structure now.

struct Stack{
    int nums[100];
    unsigned int size;
};

The data structure shown above represents a stack that can be used to push/pop integers into/from the nums array. Moreover, it has a certain predefined capacity, i.e., holding up to 100 integers. The stack has two types of essential operations as mentioned previously: pushing an element to the top and popping an element from the top. These operations can be implemented as C functions as shown below.

void stack_push(struct Stack* stk, int num){
    if(stk->size < 100){
        stk->nums[stk->size] = num;
        stk->size += 1;
    }
}

void stack_pop(struct Stack* stk){
    if(stk->size > 0){
        stk->size -= 1;
    }
}

stack_push() function pushes an integer (the second argument num) into the stack (the first argument stk). The elements of the stack are stored in an array named nums, where the index 0 denotes the bottom element in the stack’s array; therefore, the index size-1 denotes the element on the top. The top index (stk->size-1) is used to store an integer val, and then the size is incremented by one because the stack now has one new element. However, an element is pushed if and only if there is an empty space in the stack’s array, i.e., less than 100 elements already present in the stack’s array.

stack_pop() function removes an element from the top of the stack, i.e., removing an element whose index is stk->size-1. To make sure that the top element is removed from the stack, we just decrement the stack size by 1; so, the new top element is now pointed by the index old stack size minus two. It is also worth mentioning that an element can be removed from the stack if and only if there is at least one element already in it, and therefore, we have to check if stk->size > 0.

One last thing is to pay attention to the first argument of the stack functions being a stack pointer, not the stack object itself. If we passed the stack object itself, it would be copied to another memory location in your computer, and hence, making any modifications to the copied version would have no effect on the original stack object. That’s why we need to pass the memory address of the original stack to these functions. While it is true that with the current implementation, the memory addresses of the original stack object are getting copied into other locations in the memory, dereferencing the copied pointer values gives us the original stack object since the copied values are the original stack’s memory address.

Now, we can use our stack data structure as shown below. Do not forget to initialize your stack with the correct size information before using it.

#include <stdio.h>

struct Stack{
    int nums[100];
    unsigned int size;
};

int main(){
    struct Stack stack;
    stack.size = 0;      // you MUST initialize the stack size to 0 before using it
    
    stack_push(&stack, 3);    // stack now has 1 number:  [3]
    stack_push(&stack, 2);    // stack now has 2 numbers: [3, 2]
    stack_push(&stack, 92);   // stack now has 3 numbers: [3, 2, 92]
    
    printf("stack size: %u\n", stack.size);
    for(unsigned int i=0; i<stack.size; i+=1){
        printf("index %u: %d\n", i, stack.nums[i]);
    }

    stack_pop(&stack);    // stack now has 2 numbers: [3, 2]
    stack_pop(&stack);    // stack now has 1 number:  [3]

    printf("stack size: %u\n", stack.size);
    for(unsigned int i=0; i<stack.size; i+=1){
        printf("index %u: %d\n", i, stack.nums[i]);
    }

    stack_pop(&stk);    // stack now has no element: []
    stack_pop(&stk);    // no change has been made since stk.size = 0
    return 0;
}

Example: Naive Queue

Queue is another well-known data structure that works similarly to the stack in the sense that it is a container used to hold elements. Unlike the stack, the queue follows first-in-first-out (FIFO) logic. As LIFO is analogous to the dish-washing/using process, FIFO is analogous to waiting in line to make a purchase in the store. For simplicity, we assume that whoever enters the store first is also the first one on the line making a purchase. This is what a queue is. You push elements to the back of the queue, and you pop elements from the front. Let’s see how a naive queue data structure can be defined.

struct Queue{
    int nums[100];
    unsigned int size;
};

The Queue data structure shown above can be used to hold integers. Here, index 0 denotes the front of the queue and the last index size-1 denotes the front. As already mentioned, the two most important operations on queues are pushing an element to the back of the queue’s array and popping an element from the front of the queue’s array. Let’s see how these functions can be implemented in C.

void queue_push(struct Queue* que, int num){
    if(que->size < 100){
        que->nums[que->size] = num;
        que->nums += 1;
    }
}

void queue_pop(struct Queue* que){
    if(que->size > 0){
        for(unsigned int i=1; i<que->size; i+=1){
            que->nums[i-1] = que->nums[i];
        }
        que->size -= 1;
    }
}

Now, we can use our simple queue data structure in our C program. Do not forget to initialize the size of the queue before starting to push/pop numbers. Otherwise, your program may behave buggy, resulting in unexpected and weird outputs.

#include <stdio.h>

struct Queue{
    int nums[100];
    unsigned int size;
};

void queue_push(struct Queue* que, int num){
}

void queue_pop(struct Queue* que){
}

int main(){
    struct Queue que;
    que.size = 0;        // always initialize size to 0 before using the queue

    queue_push(&que, 30);  // que now has 1 element:  [30]
    queue_push(&que, 20);  // que now has 2 elements: [30, 20]
    queue_push(&que, 97);  // que now has 3 elements: [30, 20, 97]

    printf("queue size: %u\n", que.size);
    for(unsigned int i=0; i<que.size; i+=1){
        printf("index %u: %d\n", i, que.nums[i]);
    }
    
    queue_pop(&que);  // que now has 2 elements: [20, 97]
    queue_pop(&que);  // que now has 1 element:  [97]

    printf("queue size: %u\n", que.size);
    for(unsigned int i=0; i<que.size; i+=1){
        printf("index %u: %d\n", i, que.nums[i]);
    }

    queue_pop(&que);  // que now has no element: []
    queue_pop(&que);  // no change has been made since (que.size > 0) is false
    return 0;
}

Table of Contents

  1. Preface
  2. Level 1. Introduction to C
    1. Hello, World!
    2. Basics
      1. Your computer can memorize things
      2. Your computer can “talk” and “listen”
      3. Compiling and Running programs
    3. Functions
      1. I receive Inputs, You receive Output
      2. Simple pattern matching
      3. Function calling and Recursion
    4. Control Flow
      1. Branching on a condition
      2. Branching back is called Looping
    5. Pointers
      1. Memory address of my variable
      2. Pointer arithmetic
    6. Arrays
      1. Hold my integers
      2. Size of my array
    7. Data Structures ← you are here
      1. All variables in one place
      2. Example: Stack
      3. Example: Naive Queue
  3. Level 2. Where C normies stopped reading
    1. Data Types
      1. More types and their interpretation
      2. Union and Enumerator types
      3. Padding in Structs
    2. Bit Manipulations
      1. Big and Little Endianness
      2. Logical NOT, AND, OR, and more
      3. Arithmetic bit shifting
    3. File I/O
      1. Wait, everything is a file? Always has been!
      2. Beyond STDIN, STDOUT, and STDERR
      3. Creating, Reading, Updating, and Deleting File
    4. Memory Allocation and Deallocation
      1. Stack and Heap
      2. Static allocations on the stack
      3. Dynamic allocations on the heap
    5. Preprocessor Directives
    6. Compilation and Makefile
      1. Compilation process
      2. Header and Source files
      3. External Libraries and Linking
      4. Makefile
    7. Command-line Arguments
      1. Your C program is a function with arguments
      2. Environment variables
  4. Level 3. Becoming a C wizard
    1. Declarations and Type Definitions
      1. My pointer points to a function
      2. That function points to another function
    2. Functions with Variadic Arguments
    3. System calls versus Library calls
      1. User and Kernel modes
      2. Implementing a memory allocator
    4. Parallelism and Concurrency
      1. Multiprocessing
      2. Multithreading with POSIX
    5. Shared Memory
      1. Virtual Memory
      2. Creating, Reading, Updating, and Deleting shared memory
      3. Critical section
    6. Safety in Critical Sections
      1. Race conditions
      2. Mutual exclusion
      3. Semaphores
    7. Signaling
  5. Level 4. One does not simply become a C master

Leave a Reply

Your email address will not be published. Required fields are marked *