We at Johnny’s Software Lab LLC are experts in performance. If performance is in any way concern in your software project, feel free to contact us.
GDB is the number one debugger and it can be used to debug all sorts of problems. There are however many other tools out there that can only debug a subset of issues, but they do it much more effectively than GDB.
Today we are going to talk about one such tool called STRACE. STRACE is a tool used to trace Linux system calls and signals. Before moving on to explain what kind of problems can STRACE detect, a short introduction to system calls and signals.
What are system calls and signals?
If you are already familiar with system calls and signals you can skip the next two sections. Otherwise carry on.
System calls
When developers write their programs, many times they will use functionalities provided by the standard C/C++ library or other libraries. But for some functionalities their program (or the library function) must ask the operating system to do it. When we say a program performs a system call, what we mean is that the program is asking the operating system to do one such service. Here are a few examples of system calls:
- Working with files or directories. Programs generally are not aware of how files and directories are physically organized on storage. Instead, the operating system exposes everything a program needs to work with files and directories using a few system calls. On Linux these calls are: open, close, read, write, stat (working with files), getdents (list files in a directory).
- Working with devices. A program might want to work with a device, e.g. a terminal or a keyboard. The program is not interested in how the device works internally, instead, the operating systems expose devices in a standard way to make programmer’s life easier. On Linux, system calls that are used to work with devices are: ioctl (control the device), read (read from the device), write (write to the device) etc.
- Managing processes. A program might want to get information about or control the running processes, including itself. On Linux this is achieved using system calls: fork (create an identical copy of the calling process), clone (create a thread), execve (execute a program), exit (exit the program), wait (wait for the child process to exit).
- Memory management. Programs can request or relinquish memory from the operating system using system calls. On Linux these are calls brk and mmap (allocate memory), munmap (deallocate memory) etc.
- Communication between processes. Programs that want to communicate with other processes can achieve this using system calls. On Linux these are calls: pipe (creates a communication pipe), shmget and mmap (creates a shared memory for two processes to communicate), futex (used to implement users-pace mutexes) etc.
- Network communication. The operating system hides all the complexities of network communication and exposes the network through a socket abstraction. System calls used to work with sockets on Linux are: socket (creates a socket), accept (accept new connections from a remote server), sendto (send data), rcvfrom (receive data) etc.
These are just the most common system calls and there are many others. Linux has more than 300 system calls. A full list can be found here.
A few notes on system calls
Rarely do the programmers work directly with system calls, instead, the functions in the standard library (and other libraries as well) expose system calls to developers. For example, working with files is done with fopen/fread/fwrite
API in C. In C++ the same is achieved using classes in fstream
header.
System calls are more expensive than regular function calls. When a system call is made, the program control flow leaves the user space and enters kernel space. The kernel (operating system) does some work for us and then the control returns to our program. Switch from user space to kernel space is not cheap like a regular function call and many unnecessary system calls can make your program slow.
Signals
Signals are notifications that are sent to your program to let it know something important has happened. Operating system can send a signal SIGBUS to your process to let it know that it has accessed an inaccessible memory. Another program (e.g. Out-Of-Memory Killer) can send a signal SIGKILL to your program when it detects that your program is using too much memory. GNU Debugger sends signals SIGSTOP to pause your program and SIGCONT to continue the execution of your program.
You now get the idea of what signals are. Signals are in their essence means of interprocess communication. But I have never seen two processes have a meaningful exchange of information using processes; normally, signals are used to inform the process that something extraordinary has happened.
Every time a program receives a signal, a default action will be performed automatically unless the program has explicitly overwritten it. Default action depends on the signal type and can be: Core (create the core dump and terminate the receiving process), Term (terminate the receiving process), Ign (ignore the received signal), Stop (pause the program) and Cont (resume the stopped program).
The developer can override the default action for most signals. We call these signal maskable. Some signals are maskable like SIGINT (SIGINT is sent to the program when the user presses Ctrl + C. The default action is to terminate the running program, but the programmer can override this so the program does something else). There are also non-maskable signals where the default action cannot be overridden. Signals SIGKILL (kill the process no-matter what) and SIGSTOP (pause the execution of the program) are the only two non-maskable signals.
Here is a short list of a few most important signals taken from edspresso:
Signal | Description | Default Action |
SIGINT | Interrupt from keyboard (Ctrl + C) | Term |
SIGKILL | Force the program to terminate | Term |
SIGTERM | Politely ask the program to terminate | Term |
SIGBUS | Access to memory that doesn’t exist | Core |
SIGSEGV | Access to forbidden part of the memory | Core |
SIGSTOP | Pauses the execution of the program | Stop |
SIGCONT | Continues the execution of the paused program | Cont |
SIGILL | Program executed illegal or non-existing instruction | Core |
SIGFPE | Arithmetic instruction caused the exception (e.g. division by zero etc.) | Core |
SIGTRAP | Program executed the trap instruction | Core |
Signals are sent to your program asynchronously, i.e. when a signal arrives, your program will stop doing whatever it is it was doing and start processing the signal.
You can overwrite the default signal handler. For example, you can overwrite SIGTERM in order to delete temporary files your program created before exiting the program.
So now you know what are system calls and signals, moving on to STRACE utility.
Do you need to discuss a performance problem in your project? Or maybe you want a vectorization training for yourself or your team? Contact us
Or follow us on LinkedIn , Twitter or Mastodon and get notified as soon as new content becomes available.
STRACE
As we already said in the introduction, STRACE is a Linux tool that is used to trace system calls made by your program and signals received by your program. Since programs generally communicate with the kernel a lot, this can prove to be an invaluable source of information. Also, you don’t need the source code of the program you’re investigating, and you can trace almost any program!
What kind of problem can you debug with STRACE?
Here is a list of problems that are typically debugged using STRACE:
- Check if the program is loading the correct library by monitoring open/mmap system calls.
- Check the name of the log file the program is using by monitoring open/write system calls.
- Check if the program is running other programs by monitoring execve system call.
- Debug messages like Connection refused. You can check what is the remote server your program is trying to connect to by monitoring socket call.
- Check how your program is allocating memory by monitoring mmap and brk.
- Debug messages like File not found. You can check what is the file your program is trying to open by monitoring open or stat system calls.
- Debug other weird messages without context produced by your programs.
- Check if the compiler is loading the correct headers by monitoring the open system call and filtering by header name.
- Check if
make
has missed a hidden dependency by monitoring open system call and then filtering only header files.
Please excuse the verbosity, but some of the ideas are simply brilliant. If you have other good ideas, feel free to leave a comment.
Now when we know what kind of information one can collect with STRACE, let’s move on to using this utility.
How to use STRACE?
Using STRACE is simple. Just prefix your command with strace
and you are ready to go. Let’s trace ls
command with strace.
$ strace ls
execve("/bin/ls", ["ls"], 0x7ffd909bfb40 /* 56 vars */) = 0
brk(NULL) = 0x559b54d55000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=105864, ...}) = 0
mmap(NULL, 105864, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0a30a58000
close(3) = 0
...
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=24, ws_col=80, ws_xpixel=0, ws_ypixel=0}) = 0
openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
getdents(3, /* 57 entries */, 32768) = 1840
getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
write(1, " badblocks.list Documents Mu"..., 67 badblocks.list Documents Music Projects Templates Videos
) = 67
write(1, " Desktop\t Downloads Pictures\t"..., 72 Desktop Downloads Pictures Public usr 'VirtualBox VMs'
) = 72
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
In the output above I removed a lot of the output and replaced it with …
The first time you run strace
you will notice that the output is very verbose. All system calls and signals are traced and everything is dumped on the screen. The output can really look intimidating, even for a simple program. This is a time when tools for filtering output like grep
really come in handy.
Let’s have a look at few interesting lines from the example above:
execve("/bin/ls", ["ls"], 0x7ffd909bfb40 /* 56 vars */) = 0
This system call execve
starts the program with appropriate parameters.
brk(NULL) = 0x559b54d55000
The system call brk
allocates memory for the heap.
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=105864, ...}) = 0
mmap(NULL, 105864, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0a30a58000
close(3) = 0
The above code maps a library into the memory space so the program can call functions from the library. First, it opens the library file /etc/ld.so.cache. The openat
system call returns a file descriptor 3 which used in the later system calls. System call ffstat
checks if /etc/ld.so.cache is a file by specifying its file descriptor. Then it makes this file available in the memory using mmap
and starting from address: 0x7f0a30a58000. And finally it closes the file handle.
openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
getdents(3, /* 57 entries */, 32768) = 1840
getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
The few above system calls read the content of the current directory. System call fstat
makes sure that the thing we want to list is actually a directory. System call getdents
(short for get directory entries) lists the content of the directory.
write(1, " badblocks.list Documents Mu"..., 67 badblocks.list Documents Music Projects Templates Videos
) = 67
write(1, " Desktop\t Downloads Pictures\t"..., 72 Desktop Downloads Pictures Public usr 'VirtualBox VMs'
) = 72
close(1) = 0
System call write
prints the files in the directory on the screen (file descriptor 1 is the output to the terminal).
As you can see, all sorts of information are available using strace
. It is for you, the interpreter of this information, to figure out what is actually useful and what should be discarded.
We suggest you get introduced to system calls on need by need basis; if you try to grasp all system calls at once, you might get overwhelmed.
Advanced STRACE
STRACE comes with a few useful switches, that either limit the amount of information STRACE is producing or provide more interesting information. Here are a few:
Limit the tracing
STRACE comes with a switch -e
to specify which system calls to trace. Here is an example on how to trace only open
and stat
system calls:
$ strace -e trace=open,stat ./my_program
You can trace a group of related calls. You can specify keywords file
, network
, process
, signal
, ipc
or memory
to trace system calls related to working with files, network, signals, inter-process communication or memory.
$ strace -e trace=file ./my_program
When you trace your program without any parameters, strace
will trace signals as well. If you only want to trace signals, just remove tracing of all the system calls, like this:
$ strace -e trace=!all ./my_program
You will see all the received signals, but you won’t be able to see who generated those signals. STRACE cannot tell you that, but there are other tools that can.
Follow child processes
If you run your program through strace
, it might happen that it starts children processes. By default, strace
will not trace children processes, only the parent. If you want to trace children processes as well, use -f
switch.
$ strace -f -e trace=network ./my_program
Process 87283 attached
...
[pid 87283] socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
Next to each system call in the log you will see the pid (process identifier) of the process that made the call.
Output everything to a file
By default, strace
prints all the information to the screen. You can use the -o
switch to print the information to a file, like this:
$ strace -o strace_output.txt ./my_program
Print time when the system call happened
If you want to print wallclock time (e.g. 15:59:04) when the system call or signal happened, use -t
switch:
$ strace -t ls
15:59:04 execve("/bin/ls", ["ls"], [/* 42 vars */]) = 0
...
If just seconds is not precise enough for you, you can also print milliseconds with -tt
switch:
$ strace -tt ls
16:01:00.965998 execve("/bin/ls", ["ls"], [/* 42 vars */]) = 0
...
Measure how much time did the system call take
To measure how much time did the system call take, use the -T
switch:
$ strace -T ls
execve("/bin/ls", ["ls"], [/* 42 vars */]) = 0 <0.000187>
...
You can also print the cumulative statistics of all system calls, you can use the -c
switch like this:
$ strace -c ./my_program
...
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
76.89 0.203417 20342 10 wait4
16.24 0.042960 3 14896 265 open
3.93 0.010394 1 13217 60 stat
1.15 0.003043 2 1397 getdents
1.15 0.003034 4 703 select
Print lines in code that made the system call
You can use the -k
switch to figure out where did the system call originate from. For this to work, strace
needs to be linked with libunwind library. An example:
$ strace -k ./my_program
...
write(1, "\tInstruction per cycle: 1.01006\n", 32) = 32
> /lib/x86_64-linux-gnu/libc-2.27.so(__write+0x14) [0x110154]
> /lib/x86_64-linux-gnu/libc-2.27.so(_IO_file_write+0x2d) [0x8b1bd]
> /lib/x86_64-linux-gnu/libc-2.27.so(_IO_do_write+0xb1) [0x8cf51]
> /lib/x86_64-linux-gnu/libc-2.27.so(_IO_file_overflow+0x103) [0x8d403]
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25(std::ostream::put(char)+0x8a) [0x11368a]
> /home/ivica/Projects/johnysswlab/2020-07-branches/counting(measure_time::~measure_time()+0x467) [0x10c57]
> /home/ivica/Projects/johnysswlab/2020-07-branches/counting(main+0x11d9) [0x3859]
> /lib/x86_64-linux-gnu/libc-2.27.so(__libc_start_main+0xe7) [0x21b97]
> /home/ivica/Projects/johnysswlab/2020-07-branches/counting(_start+0x2a) [0x491a]
...
You can see in the above example the stack backtrace which caused the system call.
Final Word
To really appreciate STRACE you will first need to understand what are the important system calls and how applications are structured around them. When you do, STRACE will become your best friend when it comes to debugging system calls and signals.
Do you need to discuss a performance problem in your project? Or maybe you want a vectorization training for yourself or your team? Contact us
Or follow us on LinkedIn , Twitter or Mastodon and get notified as soon as new content becomes available.
Further Read
10 Strace Commands for Troubleshooting and Debugging Linux Processes