
Lab 2 - Memory Operations. Introduction to GDB

Task: Iterating through an Integer Array

You will solve this exercise starting from the iterate.c file located in the drills/tasks/iterate/support directory.

Here is the given piece of C code:

#include <stdio.h>

int main() {
int v[] = {0xCAFEBABE, 0xDEADBEEF, 0x0B00B135, 0xBAADF00D, 0xDEADC0DE};

return 0;
}

Display the addresses of the elements in the v array along with their values. Iterate through the addresses in v byte by byte, two bytes at a time, and four bytes at a time.

TIP: You can iterate through memory byte by byte starting from a specific address using a pointer of type unsigned char* (since the char type is represented by one byte).

unsigned char *char_ptr = (unsigned char *) v;

For displaying the address and the value, you can use:

printf("%p -> 0x%x\n", (void *) char_ptr, *char_ptr);

If you're having difficulties solving this exercise, go through this reading material.

Task: Deleting the First Occurrence of a Pattern from a String

You will solve this exercise starting from the delete-first.c file located in the drills/tasks/delete-first/support directory.

Given a string and a pattern, implement the delete_first(char *s, char *pattern) function that returns the string obtained by removing the first occurrence of the pattern in s.

NOTE: For s = "Ana are mere" and pattern = "re", the function should return the string "Ana a mere".

IMPORTANT: Warning

char *s = "Ana are mere"; // allocates the string in a read-only memory area (immutable content)
char s[] = "Ana are mere"; // allocates the string in a read-write memory area (modifiable content)

If you're having difficulties solving this exercise, go through this reading material.

Task: Pixels

You will solve this exercise starting from the pixels.c file located in the drills/tasks/pixels/support directory.

Consider the structure of a pixel and an image described in the pixel.h file:

typedef struct Pixel {
unsigned char R;
unsigned char G;
unsigned char B;
} Pixel;

typedef struct Picture {
int height;
int width;
Pixel **pix_array;
} Picture;

Implement the following:

  • The reverse_pic(Picture *pic) function, which takes a Picture as a parameter and returns the reversed image. By a reversed image, we mean the pix_array matrix of the Picture structure with its rows in reverse order.
  • The color_to_gray(Picture *pic) function, which takes a Picture as a parameter and returns the new image by converting each pixel to its grayscale value. The grayscale value of a pixel is calculated using the following formula:
p.R = 0.3 * p.R;
p.G = 0.59 * p.G;
p.B = 0.11 * p.B;

IMPORTANT: Accessing the elements of the pixel matrix will be done using pointer operations. Hint: For simplicity, you can use the following macro:

#define GET_PIXEL(a, i, j) (*(*((a) + (i)) + (j)))

If you're having difficulties solving this exercise, go through this reading material.

Task: Find Maximum in Array

You will solve this exercise starting from the find-max.c file located in the drills/tasks/find-max/support directory.

Implement the following function:

find_max(void *arr, int n, int element_size, int (*compare)(const void *, const void *))

which calculates the maximum element from an array based on a given comparison function:

compare(const void *a, const void *b)

If you're having difficulties solving this exercise, go through this reading material.

Task: Pointers

You will solve this exercise starting from the pointers.c file located in the drills/tasks/pointers/support directory.

Implement the functions memcpy, strcpy, and strcmp using pointer operations.

If you're having difficulties solving this exercise, go through this reading material.

Task: Data Inspection

You will solve this exercise starting from the inspect.c file located in the drills/tasks/inspect/support directory.

Given the following declarations:

#include <stdio.h>

int main() {
unsigned int a = 4127;
int b = -27714;
short c = 1475;
int v[] = {0xCAFEBABE, 0xDEADBEEF, 0x0B00B135, 0xBAADF00D, 0xDEADC0DE};

unsigned int *int_ptr = (unsigned int *) &v;

for (int i = 0 ; i < sizeof(v) / sizeof(*int_ptr) ; ++i) {
++int_ptr;
}

return 0;
}

Compile the source code and run the executable with GDB. Set a breakpoint at main and observe how the data is represented in memory. For this task, you will use the print and examine commands.

NOTE:

  • To display the value of a variable in hexadecimal, use p/x variable_name
  • To display the value from a pointer, use p *pointer_name, and to inspect the data at a memory address, use x memory_address.

If you're having difficulties solving this exercise, go through this reading material.

Pointers

In the C programming language, memory interaction is achieved through pointers. We remind you that a pointer is a variable that holds a memory address. The general declaration form is as follows: type *variable_name, where type can represent any valid data type in C.

WARNING: The asterisk (*) used in declaring a pointer denotes that it is a pointer and should not be confused with the dereference operator. These are two entirely different concepts represented by the same symbol. Declaring a pointer does not mean allocating a memory area to store data. A pointer is also a data type, whose value is a number representing a memory address. The size of the pointer data type is always the same, regardless of the type of data it points to, and is determined by the architecture and operating system on which the program was compiled (but usually 4 bytes on 32-bit systems and 8 bytes on 64-bit systems).

int *p = (int *) 0xCAFEBABE; /* Declaring a pointer */
int x = *p; /* The value at the address stored in p */

In C, a pointer can represent:

  • The address of data of a certain type
  • The address of a memory area
  • The address of a function
  • The address where data of an unknown type is held (void pointer)

TIP: The size of a pointer depends on the architecture and operating system on which the program was compiled. The size of a pointer is determined by sizeof(void*) and is not necessarily equal to the size of an int.
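The TIP above can be checked directly with sizeof. Below is a minimal sketch; the function names are hypothetical, chosen only for this illustration, and the results describe the platform the code is compiled on, not every platform.

```c
/* Returns 1 when the object pointer types below all share one size. */
int object_pointer_sizes_match(void)
{
    return sizeof(char *) == sizeof(void *)
        && sizeof(int *) == sizeof(void *)
        && sizeof(double *) == sizeof(void *);
}

/* Returns 1 when a pointer is wider than an int on this platform. */
int pointer_wider_than_int(void)
{
    return sizeof(void *) > sizeof(int);
}
```

On a typical 64-bit Linux system the second function is expected to return 1, which is why storing an address in an int is unsafe.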

Pointer Operations and Pointer Arithmetic

Arithmetic operations on pointers are slightly different from those on integer data types. The only valid operations are incrementing or decrementing a pointer, adding or subtracting an integer from a pointer, and subtracting two pointers of the same type. The behavior of these operations is influenced by the data type to which the pointers refer.

When incrementing a pointer to a data type T, the address is not increased by 1 but by sizeof(T), so that the pointer refers to the next object of the same type. Similarly, adding an integer n to a pointer p (the operation p + n) yields the address that lies n * sizeof(*p) bytes after p. For example:

char *char_ptr = (char *) 1000;
short *short_ptr = (short *) 2000;
int *int_ptr = (int *) 3000;

++char_ptr; /* char_ptr will point to address 1001 */
++short_ptr; /* short_ptr points to address 2002 */
++int_ptr; /* int_ptr points to address 3004 */

A diagram which visualizes arithmetic operations on pointers
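The step size described above can also be measured on real objects instead of made-up addresses. The sketch below is only an illustration (the function names are hypothetical): it casts to char * so that the subtraction counts bytes.

```c
#include <stddef.h>

/* Distance in bytes between p and p + 1 for a char pointer. */
size_t step_of_char(void)
{
    char buf[2];
    char *p = buf;
    return (size_t) ((p + 1) - p);   /* already in bytes: sizeof(char) is 1 */
}

/* Distance in bytes between p and p + 1 for a short pointer. */
size_t step_of_short(void)
{
    short buf[2];
    short *p = buf;
    /* Cast to char * so the subtraction counts bytes, not shorts. */
    return (size_t) ((char *) (p + 1) - (char *) p);
}

/* Distance in bytes between p and p + 1 for an int pointer. */
size_t step_of_int(void)
{
    int buf[2];
    int *p = buf;
    return (size_t) ((char *) (p + 1) - (char *) p);
}
```

Each function returns sizeof of its element type, which matches the 1001, 2002 and 3004 addresses in the example when short has 2 bytes and int has 4.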

Subtracting two pointers is possible only if both have the same type and point into the same array. The result is the difference between the two addresses divided by the size of the pointed-to type, that is, the number of elements between them. For example, calculating the length of a string:

char *s = "Learn IOCLA, you must!";
char *p = s;
for (; *p != 0; ++p); /* Iterating character by character until '\0' */

printf("%td", p - s); /** It will display 22, the length of the string
* referenced by `s`. %td is the format for a pointer difference. */
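The same idea can be wrapped in small helper functions. This is a sketch with hypothetical names; it shows that the result of a pointer subtraction is counted in elements, both for char and for int.

```c
#include <stddef.h>

/* Length of a string, computed as a pointer difference. */
ptrdiff_t string_length(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        ++p;
    return p - s;            /* number of chars between s and p */
}

/* Number of int elements between two pointers into the same array. */
ptrdiff_t ints_between(const int *first, const int *last)
{
    return last - first;     /* counted in elements, not bytes */
}
```

For an int array v, ints_between(&v[0], &v[4]) gives 4, even though the two addresses are 4 * sizeof(int) bytes apart.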

Interpreting Data in Memory

On most modern computers, the smallest unit of data that can be addressed is the byte (8 bits), meaning that we can view data in memory as a sequence of bytes, each with its own address. As mentioned in the previous lab, when we want to store information represented by multiple bytes, we need to consider the order imposed by the system architecture, called endianness. Below is the mechanism for extracting data from memory on a little-endian architecture:

int n = 0xCAFEBABE;
unsigned char first_byte = *((unsigned char*) &n); /* Extracting the first byte of n */
unsigned char second_byte = *((unsigned char*) &n + 1); /* Extracting the second byte of n */
printf("0x%x, 0x%x\n", first_byte, second_byte); /* It will display 0xbe, 0xba */

NOTE: For casted pointers, arithmetic operations are performed on the type to which they have been cast.
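To see which byte order your own machine uses, the byte-wise view can be generalized as below. This is a sketch; byte_at and is_little_endian are hypothetical helper names, and the fixed-width type uint32_t is used so that the object has exactly 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the byte stored at base + index, whatever type base points to. */
unsigned char byte_at(const void *base, size_t index)
{
    const unsigned char *bytes = base;   /* byte-wise view of the object */
    return bytes[index];
}

/* Returns 1 on a little-endian machine, 0 otherwise. */
int is_little_endian(void)
{
    uint32_t probe = 1;
    return byte_at(&probe, 0) == 1;      /* low byte stored first? */
}
```

On a little-endian machine, byte_at(&n, 0) for n = 0xCAFEBABE is 0xBE and byte_at(&n, 3) is 0xCA; on a big-endian machine the order is reversed.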

WARNING: Do not confuse *p++ with (*p)++. The first expression reads the value at p and then increments the pointer p itself, while the second increments the value stored at that address. Arithmetic on void pointers is not defined by the C standard, because they have no concrete data type to determine the step size (GCC accepts it as an extension and treats the size as 1).
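The difference between the two expressions is easier to see when each one is isolated in its own function. The following is a small sketch with hypothetical function names:

```c
/* Reads the current element, then advances the caller's pointer. */
int read_and_advance(int **cursor)
{
    int *p = *cursor;
    int value = *p++;     /* parsed as *(p++): the pointer moves */
    *cursor = p;
    return value;
}

/* Increments the element itself and leaves the pointer where it was. */
int bump_in_place(int *p)
{
    (*p)++;               /* the pointed-to value changes */
    return *p;
}
```

Starting with int v[] = {10, 20} and a cursor equal to v, read_and_advance returns 10 and leaves the cursor at v + 1 with v[0] unchanged, while bump_in_place(v) changes v[0] to 11.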

Pointers to Arrays

There is a very close relationship between pointers and arrays. In C, the name of an array behaves like a constant pointer (its address is fixed by the compiler and cannot be modified during execution) to the first element of the array: v == &v[0]. For example:

int v[10], *p;
p = v;
++p; /* Correct */
++v; /* ERROR */

Arrays are stored in a continuous block of memory, so pointer arithmetic works the same way for arrays as well. Here are some equivalences:

v[0] <==> *v
v[1] <==> *(v + 1)
v[n] <==> *(v + n)
&v[0] <==> v
&v[1] <==> v + 1
&v[n] <==> v + n

Additionally, an array also contains information about its length and the total size occupied in memory, so sizeof(v) will return the space occupied in memory (number of bytes), and sizeof(v) / sizeof(*v) will return the number of elements in v.
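The length information is available only where the array itself is visible. Once the array is passed to a function, only a pointer arrives, so the length must be passed along. A minimal sketch (ARRAY_LEN, sum and size_seen_by_callee are hypothetical names):

```c
#include <stddef.h>

/* Element count; only valid for a real array, not for a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof(*(a)))

/* The callee receives a pointer, so the length is a separate argument. */
long sum(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += *(v + i);     /* same as v[i] */
    return total;
}

/* sizeof applied to the parameter gives the size of a pointer. */
size_t size_seen_by_callee(const int *v)
{
    return sizeof(v);
}
```

A typical call is sum(v, ARRAY_LEN(v)), written in the scope where v is declared as an array.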

Using pointers, we can dynamically allocate memory. In this sense, dynamic allocation of a two-dimensional array (a matrix) can be done as follows:

The traditional method, where we allocate an array of pointers and then a separate block for each row:

int **array1 = malloc(nrows * sizeof(*array1));
for (i = 0; i < nrows; ++i)
array1[i] = malloc(ncolumns * sizeof(**array1));

If we want to keep the array in a continuous block of memory:

int **array2 = malloc(nrows * sizeof(*array2));
array2[0] = malloc(nrows * ncolumns * sizeof(**array2));
for (i = 1; i < nrows; ++i)
array2[i] = array2[0] + i * ncolumns;

Below is the difference between the two approaches:

A diagram which showcases the fact that the second approach keeps all the elements in a continuous block of memory, while the first fragments the lines in different places in memory

In both cases, the elements of the matrix can be accessed using the indexing operator []: arrayX[i][j]. Also, just like with vectors, we can replace indexing with pointer operations. Thus, arr[i][j] = (*(arr + i))[j] = *(*(arr + i) + j). The parentheses around *(arr + i) are required because [] binds tighter than the unary * operator.
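The contiguous approach can be packaged together with error checking and a matching release function. This is a sketch under stated assumptions: matrix_alloc and matrix_free are hypothetical names, and both dimensions are assumed to be non-zero.

```c
#include <stdlib.h>

/* Allocates an nrows x ncolumns matrix of int in one contiguous block.
 * Assumes nrows and ncolumns are non-zero. Returns NULL on failure. */
int **matrix_alloc(size_t nrows, size_t ncolumns)
{
    int **rows = malloc(nrows * sizeof(*rows));
    if (rows == NULL)
        return NULL;

    /* calloc also zero-initializes the elements. */
    rows[0] = calloc(nrows * ncolumns, sizeof(**rows));
    if (rows[0] == NULL) {
        free(rows);
        return NULL;
    }

    for (size_t i = 1; i < nrows; ++i)
        rows[i] = rows[0] + i * ncolumns;   /* row i starts i rows in */
    return rows;
}

/* Releases a matrix obtained from matrix_alloc. */
void matrix_free(int **rows)
{
    if (rows != NULL) {
        free(rows[0]);   /* the element block */
        free(rows);      /* the row-pointer array */
    }
}
```

Because the elements share one block, only two calls to free are needed, regardless of the number of rows.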

WARNING: Whenever you allocate memory using a pointer, use p = malloc(n * sizeof(*p)) instead of p = malloc(n * sizeof(int)). Using sizeof(*p) makes the code more robust and self-documenting, so anyone reading the code will see that the correct number of bytes is being allocated without needing to check the data type that p is pointing to.

Structures and Pointers to Structures

Structures are data types in which we can group multiple variables, possibly of different types (unlike arrays, which contain only data of the same type). A structure can be defined as follows:

struct struct_name {
field_declarations
};

For simplifying declarations, we can associate a structure with a data type name: typedef struct {field_declarations} struct_name;

typedef struct student {
char *name;
int year;
float grade;
} Student;

int main() {
Student s;
s.name = malloc(20 * sizeof(*s.name));
s.year = 3;
return 0;
}

Accessing members of a structure is done using the . operator.

In the case of pointers to structures, accessing members is done by dereferencing the pointers:

Student *s = malloc(sizeof(*s));
(*s).year = 3;
/* In practice, to ease writing, the "->" operator is used */
s->year = 4;

The size of a structure is not always equal to the sum of the sizes of its fields. This happens because of padding added by the compiler to ensure proper memory alignment. Padding is added after a structure member followed by another member with a larger size, or at the end of the structure.

struct A {
/* sizeof(int) = 4 */
int x;
/* Padding with 4 bytes */

/* sizeof(double) = 8 */
double z;

/* sizeof(short) = 2 */
short y;
/* Padding with 6 bytes */
};

printf("Size of struct: %zu", sizeof(struct A)); /* Will print 24 */

A diagram visualizing the padding of each structure field, 4 bytes for x, 6 for y, and 0 for z

The red portion represents the padding added by the compiler, and the green parts represent the structure's members.

However, we can prevent the compiler from adding padding by using __attribute__((packed)) when declaring the structure (More details about this in the Computer Communication Protocols course). Thus, for the previous example, the result would be 14.
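Padding can also be observed and reduced without packing, by looking at member offsets and by reordering members. The sketch below uses the standard offsetof macro; struct Reordered and the two helper functions are hypothetical names added for illustration, and the concrete numbers depend on the platform.

```c
#include <stddef.h>

struct A {
    int x;
    double z;
    short y;
};

/* Same members, ordered from the strictest alignment to the weakest. */
struct Reordered {
    double z;
    int x;
    short y;
};

/* Bytes of padding the compiler added to struct A. */
size_t padding_in_a(void)
{
    return sizeof(struct A) - (sizeof(int) + sizeof(double) + sizeof(short));
}

/* Offset of z, which shows the gap left after x. */
size_t offset_of_z(void)
{
    return offsetof(struct A, z);
}
```

On x86-64 Linux, offset_of_z() is 8 and padding_in_a() is 10, while sizeof(struct Reordered) is 16 instead of 24.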

NOTE: If you declare pointers to structures, don't forget to allocate memory for them before accessing the structure fields. Also, remember to allocate and initialize structure fields that are pointer types before using them. Also, pay attention to how you access structure fields.
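The NOTE above can be followed by pairing every allocation with a check and a matching release. This is a minimal sketch; student_create and student_destroy are hypothetical names, and the Student type is repeated so that the block compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

typedef struct student {
    char *name;
    int year;
    float grade;
} Student;

/* Allocates a Student and its name field; returns NULL on failure. */
Student *student_create(const char *name, int year)
{
    Student *s = malloc(sizeof(*s));
    if (s == NULL)
        return NULL;

    s->name = malloc(strlen(name) + 1);   /* +1 for the '\0' terminator */
    if (s->name == NULL) {
        free(s);
        return NULL;
    }
    strcpy(s->name, name);
    s->year = year;
    s->grade = 0.0f;
    return s;
}

/* Frees the name first, then the structure that holds it. */
void student_destroy(Student *s)
{
    if (s != NULL) {
        free(s->name);
        free(s);
    }
}
```

The order in student_destroy matters: freeing s first would make s->name unreachable.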

Void Pointers

Memory can be seen as an array of bytes, accessible through pointers. By the type of the pointer, the addressed memory area gains a certain interpretation, as discussed above. There are cases where we want to address a zone of this 'array' in a generic way, thus requiring void pointers.

A pointer to void is a pointer that does not have an associated type. Void pointers have a high flexibility because they can point to any type of data, but they also have a limitation in that they cannot be dereferenced, and to be used in pointer operations, they need to be converted to a known data type.

They are most commonly used in the implementation of generic functions. For example, the functions malloc() and calloc() return a pointer to void, allowing these functions to be used for memory allocation for any data type.

An example of using void pointers is as follows:

#include <stdio.h>

void increment(void *data, int element_size) {
/* Check if the data entered is a char */
if (element_size == sizeof(char)) {
/* As mentioned, to be dereferenced,
* a void pointer needs to be cast
*/
char *char_ptr = data;
++(*char_ptr);
}

if (element_size == sizeof(int)) {
int *int_ptr = data;
++(*int_ptr);
}
}

int main() {
char c = 'a';
int x = 10;

increment(&c, sizeof(c));
increment(&x, sizeof(x));

printf("%c, %d\n", c, x); /* Will print: b, 11 */
return 0;
}
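When a generic function does not need to know the element type at all, it can treat the data as raw bytes instead of testing element_size against known types. The following sketch (swap_bytes is a hypothetical name) swaps two objects of any type this way:

```c
#include <stddef.h>

/* Swaps two objects of element_size bytes each, one byte at a time. */
void swap_bytes(void *a, void *b, size_t element_size)
{
    unsigned char *pa = a;   /* void * converts implicitly to unsigned char * */
    unsigned char *pb = b;

    for (size_t i = 0; i < element_size; ++i) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```

A call such as swap_bytes(&x, &y, sizeof(x)) works for int, double or structures alike, provided both arguments have the same size.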

NOTE: In C, a void* pointer converts implicitly to a pointer of any object type T, so an explicit cast is not necessary when assigning it.

Example (Good practice):

int *array = malloc(sizeof(*array) * number_of_elements);

NOT like this:

int *array = (int*) malloc(sizeof(*array) * number_of_elements);

Pointers in Functions and Function Pointers

Within functions, pointers can be used for:

  • Passing results through arguments
  • Passing an address through the function's return
  • Passing other functions and subsequently using them

A function that needs to modify multiple values passed through arguments or that needs to transmit multiple calculated results within the function should use pointer arguments.

#include <stdio.h>

void swap(int *a, int *b) {
int c = *a;
*a = *b;
*b = c;
}

int main() {
int a = 3, b = 5;
swap(&a, &b);

printf("a = %d, b = %d\n", a, b); /* Will print a = 5, b = 3 */

return 0;
}

A function can return a pointer, but this pointer must not hold the address of a non-static local variable. Most of the time, the result is one of the arguments, possibly modified within the function. For example:

char* toUpper(char *s) {
/* Takes a string and returns the string in uppercase */
for (int i = 0 ; s[i] ; ++i) {
if (s[i] >= 'a' && s[i] <= 'z') {
s[i] -= 32;
}
}

return s;
}

A function may return the address of a local variable only if that variable is declared static. The lifetime of an ordinary (automatic) local variable ends when the execution of the function in which it was defined ends, and therefore the address of such a variable should not be passed outside the function.
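Two safe ways of returning an address are dynamically allocated memory and static storage. The sketch below illustrates both; upper_copy and call_counter are hypothetical names, and the standard toupper function replaces the manual subtraction of 32.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated uppercase copy of s, or NULL on failure.
 * The caller owns the result and must free() it. */
char *upper_copy(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return NULL;

    size_t i;
    for (i = 0; s[i] != '\0'; ++i)
        copy[i] = (char) toupper((unsigned char) s[i]);
    copy[i] = '\0';
    return copy;   /* heap memory outlives this function call */
}

/* Returns the address of a static counter; static storage also
 * outlives the call, but every caller shares the same object. */
int *call_counter(void)
{
    static int calls = 0;
    ++calls;
    return &calls;
}
```

Unlike toUpper, upper_copy leaves its argument untouched, so it can also be called on a string literal.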

The name of a function represents the memory address at which the function begins. A function pointer is a variable that stores the address of a function that can be called later through that pointer. Usually, function pointers are used to pass a function as a parameter to another function.

The declaration of a function pointer is done as follows: type (*pf) (formal_parameter_list)

Why is it necessary to use extra parentheses? Without them, we would be talking about a function that returns a pointer. Below are two examples of using function pointers:

#include <stdio.h>

int add(int a, int b) {
return a + b;
}

int subtract(int a, int b) {
return a - b;
}

int operation(int x, int y, int (*func) (int, int)) {
return func(x, y);
}

int main() {
int (*minus)(int, int) = subtract;
printf("%d", operation(10, 5, minus)); /* Will print 5 */

return 0;
}
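Function pointers can also be stored in data structures. A common pattern is a lookup table that maps a symbol to the function implementing it. The sketch below is only an illustration: struct operation_entry, multiply and apply are hypothetical names that do not appear in the lab skeleton.

```c
#include <stddef.h>

static int add(int a, int b)      { return a + b; }
static int subtract(int a, int b) { return a - b; }
static int multiply(int a, int b) { return a * b; }

/* Pairs an operator symbol with the function that implements it. */
struct operation_entry {
    char symbol;
    int (*func)(int, int);
};

static const struct operation_entry operations[] = {
    { '+', add },
    { '-', subtract },
    { '*', multiply },
};

/* Looks up symbol and applies the matching function.
 * Returns 0 on success and -1 when the symbol is unknown. */
int apply(char symbol, int x, int y, int *result)
{
    for (size_t i = 0; i < sizeof(operations) / sizeof(*operations); ++i) {
        if (operations[i].symbol == symbol) {
            *result = operations[i].func(x, y);
            return 0;
        }
    }
    return -1;
}
```

Adding a new operation only requires a new function and one more table entry; apply itself stays unchanged.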

The qsort() function from stdlib.h uses a function pointer as a comparator.

#include <stdio.h>
#include <stdlib.h>

int compare(const void *a, const void *b) {
return *(int *) a - *(int *)b;
}

int main() {
int v[] = {100, 5, 325, 1, 30};
int size = sizeof(v) / sizeof(*v);

qsort(v, size, sizeof(*v), compare);
for (int i = 0 ; i < size ; ++i) {
printf("%d ", v[i]);
}

return 0;
}
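The subtraction-based comparator is fine for small values, but a - b can overflow when the operands are far apart (for example, a large positive and a large negative number). A sketch of a comparator that avoids the subtraction is shown below; compare_ints and sort_ints are hypothetical names.

```c
#include <stdlib.h>

/* Three-way comparison that avoids the overflow of a - b. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);   /* -1, 0 or 1 */
}

/* Sorts v in ascending order with qsort and the comparator above. */
void sort_ints(int *v, size_t n)
{
    qsort(v, n, sizeof(*v), compare_ints);
}
```

The const-qualified casts also keep the comparator from accidentally modifying the elements it compares.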

Guide: Array vs. Pointer

To follow this guide, you'll need to use the array_vs_pointer.c file located in the guides/array_vs_pointer/support directory.

Compile and run the source from the skeleton.

The program simply declares an array of chars and a char pointer. We'll try to understand the difference between the two.

We can observe the fact that even though both of them point to the same sequence of characters, the sizeof operator returns different values: the number of bytes needed for the array (13), while for the pointer, it simply returns its size as a data type (4/8 on most systems).

sizeof(v): 13
sizeof(p): 8

We've previously learned that an array name behaves like a pointer to the first element of the array, so why would it be any different? The difference comes from the fact that the address represented by the array is constant and cannot be changed. This means that the size of the array can be determined at compile time, since it is not possible to make it refer to a different memory location. For a regular pointer like the one declared in the example, the address it points to can change at runtime, so it will not always point to an array of the same size, and we cannot even determine whether it points to an array at all (it could point to a single variable, for example).

The second difference appears when attempting to change the value of one of the characters in the sequence: we can't do it through the pointer, while we can do it through the array. This is because the pointer points to read-only memory (the string literal, which we'll later learn is stored in a memory area called .rodata), while the array has its own storage, which is writable.

GNU Debugger (GDB)

Starting GDB

GDB is a powerful tool for debugging programs. It allows you to inspect the state of a program at a certain point in its execution, set breakpoints, and step through the code, among other things. To start GDB, you need to run the following command:

gdb [program_name]

Running the Program

To run the program being debugged, there are two available commands:

  • r or run - this command will run the program
  • start - unlike run, this command will start the program but stop immediately after entering main; it is equivalent to setting a breakpoint at main and then running the program

Breakpoints

The essential element of GDB is the breakpoint. Essentially, setting a breakpoint at a certain instruction causes the program's execution to halt every time it reaches that point. Setting a breakpoint is done with the following command:

break [location]

or in short form:

b [location]

where location can represent the name of a function, the line number of the code, or even a memory address, in which case the address must be preceded by the symbol *. For example: break *0xCAFEBABE

Stepping through instructions

  • si or stepi - executes the current instruction
  • ni or nexti - similar to stepi, but if the current instruction is a function call, the debugger will not enter the function
  • c or continue - continues program execution until the next breakpoint or until it finishes
  • finish - continues program execution until leaving the current function

Inspecting Memory

  • p or print var - displays the value of var. Print is a very flexible command, allowing dereferencing of pointers, displaying addresses of variables, and indexing through arrays using *, & and []. The print command can be followed by the /f parameter specifying the display format (x for hex, d for decimal, s for string).
  • x or examine - Inspects the content at the given address. The usage of this command is as follows:
x/nfu address

where:

  • n is the number of displayed elements
  • f is the display format (x for hex, d for decimal, s for string, and i for instructions)
  • u is the size of each element (b for 1 byte, h for 2, w for 4, and g for 8 bytes)
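To build intuition for what a command such as x/4xb address shows, the same byte-wise view can be reproduced in C. The sketch below only approximates the idea, not GDB's exact output format (GDB also prints the address and separates the values with tabs); format_bytes is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes the n bytes starting at addr into out as "0xNN 0xNN ...".
 * Returns the number of characters written, or -1 if out is too small. */
int format_bytes(const void *addr, size_t n, char *out, size_t out_size)
{
    const unsigned char *bytes = addr;
    size_t used = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n; ++i) {
        int written = snprintf(out + used, out_size - used,
                               i == 0 ? "0x%02x" : " 0x%02x",
                               (unsigned int) bytes[i]);
        /* snprintf reports the length it needed; detect truncation. */
        if (written < 0 || (size_t) written >= out_size - used)
            return -1;
        used += (size_t) written;
    }
    return (int) used;
}
```

Calling it on an int variable prints the bytes in memory order, so on a little-endian machine the least significant byte appears first, just as in GDB.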

We recommend the article Debugging for further understanding of how to use GDB both in the CLI and through an IDE.

pwndbg

pwndbg is a GDB plugin that provides a number of useful features for debugging and exploiting binaries. It makes GDB easier to use and infinitely more powerful. It will become more useful as we progress through the lab sessions.

Cheatsheet gdb + pwndbg; pwndbg features

pwndbg> show context-sections
'regs disasm code ghidra stack backtrace expressions'
# for smaller terminals
pwndbg> set context-sections 'regs code stack'
# display memory area in hex + ASCII
pwndbg> hexdump $ecx
# display stack
pwndbg> stack
# permanently display memory dump of 8 bytes
pwndbg> ctx-watch execute "x/8xb &msg"

# recommended settings in .gdbinit
set context-sections 'regs code expressions'
set show-flags on
set dereference-limit 1

Guide: GDB Tutorial: Debugging a Segfault

To follow this guide, you'll need to use the segfault.c file located in the guides/segfault/support directory.

Compile and run the source code from the skeleton (if you are not using the Makefile, make sure to compile with the -g flag). In short, the program takes a number n, allocates a vector of size n, and initializes it with the first n numbers from the Fibonacci sequence. However, after running the program, you see: Segmentation fault (core dumped).

Start GDB with the executable:

gdb ./segfault

Once you have started GDB, all interaction happens through the GDB prompt. Run the program using the run command. What do you notice? GDB appears to hang because the program is waiting for input.

Set a breakpoint at main using the break main command. You will see the message in the prompt:

Breakpoint 1 at 0x7d3: file seg.c, line 21 /* The memory address may differ on your machine */

Next, we will step through the instructions one by one. To do this, use the next or n command (watch the GDB cursor to see the current instruction and repeat the process). You will notice that GDB hangs at scanf, so input a value for n and continue stepping through. If you have entered a large value for n and want to skip the iteration, use the continue command. Eventually, you will reach the line v[423433] = 3;, and GDB will display:

Program received signal SIGSEGV, Segmentation fault

Inspect the memory at v[423433] using x &v[423433] and you will receive the message:

Cannot access memory at address 0x5555558f3e94 /* The memory address may differ on your machine */

What happened? The index 423433 lies outside the allocated vector, so the program accessed a memory area with restricted access.
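One way to avoid this kind of crash is to validate the index before writing. The following is a minimal sketch, not the fix used in the lab skeleton; store_checked is a hypothetical helper, and n is assumed to be the number of elements allocated for v.

```c
#include <stddef.h>

/* Stores value at v[index] only when index is inside the n-element array.
 * Returns 0 on success and -1 when the index is out of range. */
int store_checked(int *v, size_t n, size_t index, int value)
{
    if (v == NULL || index >= n)
        return -1;
    *(v + index) = value;
    return 0;
}
```

With a 4-element vector, store_checked(v, 4, 423433, 3) returns -1 instead of touching memory outside the allocation.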