Lab 2 - Memory Operations. Introduction to GDB
Task: Iterating through an Integer Array
You will solve this exercise starting from the iterate.c
file located in the drills/tasks/iterate/support
directory.
Here is the given piece of C code:
#include <stdio.h>

int main() {
    int v[] = {0xCAFEBABE, 0xDEADBEEF, 0x0B00B135, 0xBAADF00D, 0xDEADC0DE};

    return 0;
}
Display the addresses of the elements in the v
array along with their values.
Iterate through the addresses in v
byte by byte, two bytes at a time, and four bytes at a time.
TIP: You can iterate through memory byte by byte starting from a specific address using a pointer of type
unsigned char *
(since the char
type is represented by one byte):
unsigned char *char_ptr = (unsigned char *) v;
For displaying the address and the value, you can use:
printf("%p -> 0x%x\n", char_ptr, *char_ptr);
If you're having difficulties solving this exercise, go through this reading material.
Task: Deleting the First Occurrence of a Pattern from a String
You will solve this exercise starting from the delete-first.c
file located in the drills/tasks/delete-first/support
directory.
Given a string and a pattern, implement the delete_first(char *s, char *pattern)
function that returns the string obtained by removing the first occurrence of the pattern in s
.
NOTE: For
s = "Ana are mere"
and pattern = "re"
, the function should return the string "Ana a mere".
IMPORTANT: Mind the difference between the following two declarations:
char *s = "Ana are mere"; // s points to a string literal, usually placed in a read-only memory area (immutable content)
char s[] = "Ana are mere"; // s is an array holding a copy of the literal, placed in a read-write memory area (modifiable content)
If you're having difficulties solving this exercise, go through this reading material.
Task: Pixels
You will solve this exercise starting from the pixels.c
file located in the drills/tasks/pixels/support
directory.
Consider the structure of a pixel and an image described in the pixel.h
file:
typedef struct Pixel {
    unsigned char R;
    unsigned char G;
    unsigned char B;
} Pixel;

typedef struct Picture {
    int height;
    int width;
    Pixel **pix_array;
} Picture;
Implement the following:
- The
reverse_pic(Picture *pic)
function, which takes a Picture as a parameter and returns the reversed image. By a reversed image, we mean the inversion of the rows of the pix_array
matrix in the Picture structure.
- The
color_to_gray(Picture *pic)
function, which takes a Picture as a parameter and returns the new image by converting each pixel to its grayscale value. The grayscale value of a pixel is calculated using the following formula:
p.r = 0.3 * p.r;
p.g = 0.59 * p.g;
p.b = 0.11 * p.b;
IMPORTANT: Accessing the elements of the pixel matrix will be done using pointer operations. Hint: For simplicity, you can use the following macro:
#define GET_PIXEL(a, i ,j) (*(*((a) + (i)) + (j)))
If you're having difficulties solving this exercise, go through this reading material.
Task: Find Maximum in Array
You will solve this exercise starting from the find-max.c
file located in the drills/tasks/find-max/support
directory.
Implement the following functions:
find_max(void *arr, int n, int element_size, int (*compare)(const void *, const void *))
which calculates the maximum element from an array based on a given comparison function:
compare(const void *a, const void *b)
If you're having difficulties solving this exercise, go through this reading material.
Task: Pointers
You will solve this exercise starting from the pointers.c
file located in the drills/tasks/pointers/support
directory.
Implement the functions memcpy, strcpy, and strcmp using pointer operations.
If you're having difficulties solving this exercise, go through this reading material.
Task: Data Inspection
You will solve this exercise starting from the inspect.c
file located in the drills/tasks/inspect/support
directory.
Given the following declarations:
#include <stdio.h>

int main() {
    unsigned int a = 4127;
    int b = -27714;
    short c = 1475;
    int v[] = {0xCAFEBABE, 0xDEADBEEF, 0x0B00B135, 0xBAADF00D, 0xDEADC0DE};
    unsigned int *int_ptr = (unsigned int *) &v;

    for (int i = 0 ; i < sizeof(v) / sizeof(*int_ptr) ; ++i) {
        ++int_ptr;
    }

    return 0;
}
Compile the source code and run the executable with GDB.
Set a breakpoint at main
and observe how the data is represented in memory.
For this task, you will use the print
and examine
commands.
NOTE:
- To display the value of a variable in hexadecimal, use
p/x variable_name
- To display the value from a pointer, use
p *pointer_name
, and to inspect the data at a memory address, use x memory_address
.
If you're having difficulties solving this exercise, go through this reading material.
Pointers
In the C programming language, memory interaction is achieved through pointers.
We remind you that a pointer is a variable that holds a memory address.
The general declaration form is as follows: type *variable_name
, where type
can represent any valid data type in C.
WARNING: The asterisk (
*
) used in declaring a pointer denotes that it is a pointer and should not be confused with the dereference operator. These are two entirely different concepts represented by the same symbol. Declaring a pointer does not mean allocating a memory area to store data. A pointer is also a data type, whose value is a number representing a memory address. On mainstream platforms, the size of a data pointer is the same regardless of the type of data it points to, and is determined by the architecture and operating system for which the program was compiled (usually 4 bytes on 32-bit systems and 8 bytes on 64-bit systems).
int *p = (int *) 0xCAFEBABE; /* Declaring a pointer; the address is purely illustrative */
int x = *p; /* The value at the address stored in p */
In C, a pointer can represent:
- The address of data of a certain type
- The address of a memory area
- The address of a function
- The address where data of an unknown type is held (void pointer)
TIP: The size of a pointer depends on the architecture and operating system for which the program was compiled. The size of a pointer can be obtained with
sizeof(void*)
and is not necessarily equal to the size of an int
.
Pointer Operations and Pointer Arithmetic
Arithmetic operations on pointers are slightly different from those on integer data types. The only valid operations are incrementing or decrementing a pointer, adding or subtracting an integer from a pointer, and subtracting two pointers of the same type. The behavior of these operations is influenced by the data type to which the pointers refer.
When incrementing a pointer related to a data type T
, the address is not increased by 1 but by the value sizeof(T)
, which ensures addressing the next object of the same type.
Similarly, adding an integer n
to a pointer p
(thus the operation p + n
) yields the address stored in p plus n * sizeof(*p) bytes
.
For example:
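char *char_ptr = (char *) 1000;    /* The addresses are purely illustrative */
short *short_ptr = (short *) 2000;
int *int_ptr = (int *) 3000;
++char_ptr; /* char_ptr will point to address 1001 */
++short_ptr; /* short_ptr will point to address 2002, assuming sizeof(short) == 2 */
++int_ptr; /* int_ptr will point to address 3004, assuming sizeof(int) == 4 */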
Subtracting two pointers is possible only if both have the same type and point into the same array. The result is the number of elements between them, that is, the difference between the two addresses divided by the size of the pointed-to type. For example, calculating the length of a string:
char *s = "Learn IOCLA, you must!";
char *p = s;
for (; *p != 0; ++p); /* Iterating character by character until '\0' */
printf("%td", p - s); /* It will display 22, the length of the string
                       * referenced by s; %td is the format for a pointer difference. */
Interpreting Data in Memory
On most modern computers, the smallest unit of data that can be addressed is the byte
(8 bits), meaning that we can view data in memory as a sequence of bytes, each with its own address.
As mentioned in the previous lab, when we want to store information represented by multiple bytes, we need to consider the order imposed by the system architecture, called endianness.
Below is the mechanism for extracting data from memory on a little-endian architecture:
int n = 0xCAFEBABE;
unsigned char first_byte = *((unsigned char*) &n); /* Extracting the first byte of n */
unsigned char second_byte = *((unsigned char*) &n + 1); /* Extracting the second byte of n */
printf("0x%x, 0x%x\n", first_byte, second_byte); /* It will display 0xbe, 0xba */
NOTE: For casted pointers, arithmetic operations are performed on the type to which they have been cast.
WARNING: Do not confuse
*p++
with (*p)++
. In the first case, the pointer p itself is incremented (the expression evaluates to the value it pointed to before the increment), while in the second case, the value stored at that address is incremented. Arithmetic on pointers of type void *
is not allowed in standard C due to the lack of a concrete data type they point to.
Pointers to Arrays
There is a very close relationship between pointers and arrays.
In C, the name of an array behaves in most expressions like a constant pointer (its address is fixed by the compiler and cannot be modified during execution) to the first element of the array: v == &v[0]
.
For example:
int v[10], *p;
p = v;
++p; /* Correct */
++v; /* ERROR */
Arrays are stored in a continuous block of memory, so pointer arithmetic works the same way for arrays as well. Here are some equivalences:
v[0] <==> *v
v[1] <==> *(v + 1)
v[n] <==> *(v + n)
&v[0] <==> v
&v[1] <==> v + 1
&v[n] <==> v + n
Additionally, an array also contains information about its length and the total size occupied in memory, so sizeof(v)
will return the space occupied in memory (number of bytes), and sizeof(v) / sizeof(*v)
will return the number of elements in v
.
Using pointers, we can dynamically allocate memory. In this sense, dynamic allocation of a two-dimensional array (a matrix) can be done as follows:
The traditional method, where we allocate an array of pointers (one per row) and then allocate each row separately:
int **array1 = malloc(nrows * sizeof(*array1));
for (i = 0; i < nrows; ++i)
    array1[i] = malloc(ncolumns * sizeof(**array1));
If we want to keep the array in a continuous block of memory:
int **array2 = malloc(nrows * sizeof(*array2));
array2[0] = malloc(nrows * ncolumns * sizeof(**array2));
for (i = 1; i < nrows; ++i)
    array2[i] = array2[0] + i * ncolumns;
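The difference between the two approaches lies in the memory layout: in the first, every row is a separate allocation, so the rows can end up anywhere on the heap; in the second, all elements sit in a single block and the row pointers point inside it.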
In both cases, the elements of the matrix can be accessed using the indexing operator []
: arrayX[i][j]
.
Also, just like with vectors, we can replace indexing with pointer operations.
Thus, arr[i][j] = (*(arr + i))[j] = *(*(arr + i) + j)
.
WARNING: Whenever you allocate memory using a pointer, use
p = malloc(n * sizeof(*p))
instead of p = malloc(n * sizeof(int))
. Using sizeof(*p)
makes the code more robust and self-documenting, so anyone reading the code will see that the correct number of bytes is being allocated without needing to check the data type that p
is pointing to.
Structures and Pointers to Structures
Structures are data types in which we can group multiple variables, possibly of different types (unlike arrays, which contain only data of the same type). A structure can be defined as follows:
struct struct_name {
field_declarations
};
To simplify declarations, we can give a structure a type name using typedef: typedef struct {field_declarations} type_name;
#include <stdlib.h>

typedef struct student {
    char *name;
    int year;
    float grade;
} Student;

int main() {
    Student s;

    s.name = malloc(20 * sizeof(*s.name));
    s.year = 3;

    free(s.name);
    return 0;
}
Accessing members of a structure is done using the .
operator.
In the case of pointers to structures, accessing members is done by dereferencing the pointers:
Student *s = malloc(sizeof(*s));
(*s).year = 3;
/* In practice, to ease writing, the "->" operator is used */
s->year = 4;
The size of a structure is not always equal to the sum of the sizes of its fields. This happens because of padding added by the compiler to ensure proper memory alignment. Padding is added after a structure member when the next member has a stricter alignment requirement (typically a larger basic type), or at the end of the structure so that its total size is a multiple of its alignment.
struct A {
    /* sizeof(int) = 4 */
    int x;
    /* Padding with 4 bytes */
    /* sizeof(double) = 8 */
    double z;
    /* sizeof(short) = 2 */
    short y;
    /* Padding with 6 bytes */
};

printf("Size of struct: %zu", sizeof(struct A)); /* Will print 24 on a typical 64-bit system */
The red portion represents the padding added by the compiler, and the green parts represent the structure's members.
However, we can prevent the compiler from adding padding by using __attribute__((packed))
when declaring the structure (More details about this in the Computer Communication Protocols course).
Thus, for the previous example, the result would be 14.
NOTE: If you declare pointers to structures, don't forget to allocate memory for them before accessing the structure fields. Also, remember to allocate and initialize structure fields that are pointer types before using them. Finally, pay attention to how you access the fields: use . on a structure variable and -> on a pointer to a structure.
Void Pointers
Memory can be seen as an array of bytes, accessible through pointers. By the type of the pointer, the addressed memory area gains a certain interpretation, as discussed above. There are cases where we want to address a zone of this 'array' in a generic way, thus requiring void pointers.
A pointer to void
is a pointer that does not have an associated type.
Void pointers have a high flexibility because they can point to any type of data, but they also have a limitation in that they cannot be dereferenced, and to be used in pointer operations, they need to be converted to a known data type.
They are most commonly used in the implementation of generic functions.
For example, the functions malloc()
and calloc()
return a pointer to void, allowing these functions to be used for memory allocation for any data type.
An example of using void pointers is as follows:
#include <stdio.h>

void increment(void *data, int element_size) {
    /* Check if the data entered is a char */
    if (element_size == sizeof(char)) {
        /* As mentioned, to be dereferenced,
         * a void pointer needs to be cast
         */
        char *char_ptr = data;
        ++(*char_ptr);
    }

    if (element_size == sizeof(int)) {
        int *int_ptr = data;
        ++(*int_ptr);
    }
}

int main() {
    char c = 'a';
    int x = 10;

    increment(&c, sizeof(c));
    increment(&x, sizeof(x));
    printf("%c, %d\n", c, x); /* Will print: b, 11 */

    return 0;
}
NOTE: In
C
, it is not necessary to explicitly cast a void*
pointer when assigning it to a pointer of type T.
Example (good practice):
int *array = malloc(sizeof(*array) * number_of_elements);
NOT like this:
int *array = (int*) malloc(sizeof(*array) * number_of_elements);
Pointers in Functions and Function Pointers
Within functions, pointers can be used for:
- Passing results through arguments
- Passing an address through the function's return
- Passing other functions and subsequently using them
A function that needs to modify multiple values passed through arguments or that needs to transmit multiple calculated results within the function should use pointer arguments.
#include <stdio.h>

void swap(int *a, int *b) {
    int c = *a;

    *a = *b;
    *b = c;
}

int main() {
    int a = 3, b = 5;

    swap(&a, &b);
    printf("a = %d, b = %d\n", a, b); /* Will print a = 5, b = 3 */

    return 0;
}
A function can return a pointer, but this pointer must not hold the address of a non-static local variable. Most of the time, the result is one of the arguments, possibly modified within the function. For example:
char* toUpper(char *s) {
    /* Takes a string and returns the string in uppercase */
    for (int i = 0 ; s[i] ; ++i) {
        if (s[i] >= 'a' && s[i] <= 'z') {
            s[i] -= 32;
        }
    }

    return s;
}
If a function returns the address of a local variable, that variable must be declared static. The lifetime of an ordinary (automatic) local variable ends when the execution of the function in which it was defined ends, and therefore the address of such a variable should not be passed outside the function.
The name of a function represents the memory address at which the function begins. A function pointer is a variable that stores the address of a function that can be called later through that pointer. Usually, function pointers are used to pass a function as a parameter to another function.
The declaration of a function pointer is done as follows: type (*pf) (formal_parameter_list)
Why is it necessary to use extra parentheses? Without them, we would be talking about a function that returns a pointer. Below are two examples of using function pointers:
#include <stdio.h>

int add(int a, int b) {
    return a + b;
}

int subtract(int a, int b) {
    return a - b;
}

int operation(int x, int y, int (*func) (int, int)) {
    return func(x, y);
}

int main() {
    int (*minus)(int, int) = subtract;

    printf("%d", operation(10, 5, minus)); /* Will print 5 */

    return 0;
}
The qsort() function from stdlib.h
uses a function pointer as a comparator.
#include <stdio.h>
#include <stdlib.h>

int compare(const void *a, const void *b) {
    return *(const int *) a - *(const int *) b;
}

int main() {
    int v[] = {100, 5, 325, 1, 30};
    int size = sizeof(v) / sizeof(*v);

    qsort(v, size, sizeof(*v), compare);
    for (int i = 0 ; i < size ; ++i) {
        printf("%d ", v[i]);
    }

    return 0;
}
Guide: Array vs. Pointer
To follow this guide, you'll need to use the array_vs_pointer.c
file located in the guides/array_vs_pointer/support
directory.
Compile and run the source from the skeleton.
The program simply declares an array of chars and a char pointer. We'll try to understand the difference between the two.
We can observe that even though both of them refer to the same sequence of characters, the sizeof operator returns different values: for the array, the number of bytes it occupies (13), while for the pointer, simply its size as a data type (4 or 8 on most systems).
sizeof(v): 13
sizeof(p): 8
We've previously learned that the name of an array can be used like a pointer to its first element, so why would it be any different? The compiler knows the full type of an array, including its length, and the array name can never be made to refer to a different memory location, so its size can be determined at compile time. A regular pointer like the one declared in the example only stores an address, which can change at runtime, so it will not always point to an array of the same size, and we cannot even determine whether it points to an array at all (it could point to a single variable, for example).
The second difference appears when attempting to change the value of one of the characters in the sequence: we can't do it using the pointer, while we can do it using the array.
This is a consequence of the fact that the pointer points to read-only memory (the string literal, which we'll later learn is stored in a memory area called .rodata
), while the array has its own memory, which is writable.
GNU Debugger (GDB)
Starting GDB
GDB is a powerful tool for debugging programs. It allows you to inspect the state of a program at a certain point in its execution, set breakpoints, and step through the code, among other things. To start GDB, you need to run the following command:
gdb [program_name]
Running the Program
To run the program being debugged, there are two available commands:
- r or run - this command will run the program
- start - unlike run, this command will start the program but stop immediately after entering main; it is equivalent to setting a breakpoint at main and then running the program
Breakpoints
The essential element of GDB is the breakpoint. Essentially, setting a breakpoint at a certain instruction causes the program's execution to halt every time it reaches that point. Setting a breakpoint is done with the following command:
break [location]
or in short form:
b [location]
where location can represent the name of a function, the line number of the code, or even a memory address, in which case the address must be preceded by the symbol *.
For example: break *0xCAFEBABE
Stepping through instructions
- si or stepi - executes the current instruction
- ni or nexti - similar to stepi, but if the current instruction is a function call, the debugger will not enter the function
- c or continue - continues program execution until the next breakpoint or until it finishes
- finish - continues program execution until leaving the current function
Inspecting Memory
- p or print var - displays the value of var. Print is a very flexible command, allowing dereferencing of pointers, displaying addresses of variables, and indexing through arrays using *, & and []. The print command can be followed by the /f parameter specifying the display format (x for hex, d for decimal, s for string).
- x or examine - inspects the content at the given address. The usage of this command is as follows:
x/nfu address
where:
- n is the number of displayed elements
- f is the display format (x for hex, d for decimal, s for string, and i for instructions)
- u is the size of each element (b for 1 byte, h for 2, w for 4, and g for 8 bytes)
We recommend the article Debugging for further understanding of how to use GDB both in the CLI and through an IDE.
pwndbg
pwndbg is a GDB plugin that provides a number of useful features for debugging and exploiting binaries. It makes GDB easier to use and infinitely more powerful. It will become more useful as we progress through the lab sessions.
Cheatsheet gdb + pwndbg; pwndbg features
pwndbg> show context-sections
'regs disasm code ghidra stack backtrace expressions'
# for smaller terminals
pwndbg> set context-sections 'regs code stack'
# display memory area in hex + ASCII
pwndbg> hexdump $ecx
# display stack
pwndbg> stack
# permanently display memory dump of 8 bytes
pwndbg> ctx-watch execute "x/8xb &msg"
# recommended settings in .gdbinit
set context-sections 'regs code expressions'
set show-flags on
set dereference-limit 1
Guide: GDB Tutorial: Debugging a Segfault
To follow this guide, you'll need to use the segfault.c
file located in the guides/segfault/support
directory.
Compile and run the source code from the skeleton (if you are not using the Makefile, make sure to compile with the -g flag). In short, the program takes a number n, allocates a vector of size n, and initializes it with the first n numbers from the Fibonacci sequence. However, after running the program, you see: Segmentation fault (core dumped).
Start GDB with the executable:
gdb ./segfault
Once you have started GDB, all interaction happens through the GDB prompt.
Run the program using the run
command.
What do you notice?
GDB appears to hang because the program is waiting for input.
Set a breakpoint at main
using the break main
command.
You will see the message in the prompt:
Breakpoint 1 at 0x7d3: file seg.c, line 21 /* The memory address may differ on your system */
Next, we will step through the instructions one by one.
To do this, use the next
or n
command (watch the GDB cursor to see the current instruction and repeat the process).
You will notice that GDB waits at scanf
, so input a value for n
and continue stepping through.
If you have entered a large value for n
and want to skip the iteration, use the continue
command.
Eventually, you will reach the line v[423433] = 3;
, and GDB will display:
Program received signal SIGSEGV, Segmentation fault
Inspect the memory at v[423433]
using x &v[423433]
and you will receive the message:
Cannot access memory at address 0x5555558f3e94 /* The memory address may differ on your system */
What happened? We accessed a memory area with restricted access.