We at Johnny’s Software Lab LLC are experts in performance. If performance is in any way a concern in your software project, feel free to contact us.
I had difficulty thinking of a name for this post that would underline the importance of record and replay debugging. Whatever name I picked seemed too “weak”, because I am about to write about tools so important that every developer should be aware of them, and yet all the names seemed too casual. Whatever, let’s move on.
When I first started working as a developer, back in 2010, and when I was getting acquainted with gdb, one of the thoughts I had in one of my debug sessions was: “Well, here is the wrong value, now if I could just put a watchpoint on this memory address and run the program backward, I would find the source of the problem in a minute”. This thought appeared, but it was soon dispersed by harsh reality. Ten years later, things have changed.
In 2014, the good people of Mozilla Foundation were debugging problems in Firefox, and apparently they had a lot of issues that were difficult to reproduce consistently. And while trying to solve them, they developed a tool that could record the execution of Firefox, including intermittent problems in it, and from that point on, reproduce it anytime and anywhere.
As an added benefit, this tool came with the ability to reverse execute your program. Since the whole execution of the program is saved, it is possible to go back in time to a specific instruction without restarting your program. You can put a breakpoint or a watchpoint, reverse-continue your program and watch the magic happen before your eyes. Sweet 🙂
Oh, yes, I forgot, the tool’s name is RR, which is short for record and replay.
How to use RR?
From the developer’s viewpoint, RR is super easy to use. First, you record the execution of your program using rr record. The output of the recording is a folder containing the test binary, the execution trace, etc. Once you’ve caught the bug (if the bug is difficult to reproduce), from that point on you can use rr replay to run your program again. Here is an example of how to record it:
$ rr record ./linked_list_test
rr: Saving execution to trace directory `/home/ivica/.local/share/rr/linked_list_test-2'.
...
To record it, you use the rr record command. The recording is saved in a folder; in my example it is /home/ivica/.local/share/rr/linked_list_test-2. To replay it, you just run rr replay.
$ rr replay
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
...
Reading symbols from /home/ivica/.local/share/rr/linked_list_test-2/mmap_hardlink_3_linked_list_test...done.
Really redefine built-in command "restart"? (y or n) [answered Y; input not from terminal]
Remote debugging using 127.0.0.1:10207
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.27.so...done.
done.
0x00007ff9faba7090 in _start () from /lib64/ld-linux-x86-64.so.2
(rr) c
Continuing.
This time, your program is not actually executing. Instead, it is running from a recording. For example, it will not read or write anything to a file; instead, these operations are simulated. All the addresses in your program stay the same between runs, which makes debugging easier.
In the replay stage, debugging your program with RR is the same as debugging it with GDB. Most GDB commands are supported. With RR, commands for reverse debugging such as reverse-continue (rc), reverse-next (rn) and reverse-step (rs) become much more useful.
When I say RR gives you the possibility to reverse execute a program, this is an understatement. Debugging a program in reverse is the most natural way to debug software. The cause of the bug is somewhere in the past, and the developer starts off from the bug manifestation. What the developer is actually doing is working backward, bit by bit, from the symptom to the origin of the bug.
If you look at it this way, you will understand why it is the most natural way to debug software:
- Set a watchpoint on the wrong or corrupted value and then reverse-continue.
- Set a breakpoint on the function you are debugging and then reverse-continue to reach its previous invocation.
- Reverse execute to the beginning of the caller function.
- If you get a stack corruption, you perform reverse-nexti (go back one instruction) and your corrupted stack magically uncorrupts.
The possibilities are endless. Here is an example of reverse execution:
Breakpoint 1, linked_list<test_struct<1>, 1>::remove_if<void run_test<1, 1, 1>(std::vector<int, std::allocator<int> >&)::{lambda(test_struct<1> const&)#1}>(void run_test<1, 1, 1>(std::vector<int, std::allocator<int> >&)::{lambda(test_struct<1> const&)#1}&&) (this=0x7ffe3a02ba20, condition=...) at linked_list.h:135
135 prev->next = next;
(rr) watch prev->next
Hardware watchpoint 2: prev->next
(rr) rc
Continuing.
Hardware watchpoint 2: prev->next
Old value = (linked_list<test_struct<1>, 1>::linked_list_node *) 0x55c304b88f20
New value = <unreadable>
0x000055c3036a314a in linked_list<test_struct<1>, 1>::remove_if<void run_test<1, 1, 1>(std::vector<int, std::allocator<int> >&)::{lambda(test_struct<1> const&)#1}>(void run_test<1, 1, 1>(std::vector<int, std::allocator<int> >&)::{lambda(test_struct<1> const&)#1}&&) (this=0x7ffe3a02ba20, condition=...) at linked_list.h:142
142 prev = current;
(rr)
These features are missing in a classical debugger, and I think this is precisely the reason why many developers prefer to debug using printfs instead of classical debuggers. With printfs you instrument your code, you run it and you collect the outputs. Most of the output will be uninteresting, but you can filter those out and see those that are interesting. This is what I call “going back in time” the old way. But the whole add output-compile-link-reproduce cycle can make it very inefficient.
Limitations
Unfortunately, RR has its own limitations. Here are the biggest ones:
- You need a reasonably new Intel processor. I didn’t have luck with this since my home laptop has an AMD processor.
- It works on Linux only. Windows has Time Travel Debugging, which is similar to RR, but I didn’t have the opportunity to try it out.
- You need Linux kernel version 3.11 or newer. At work we are using an old system which doesn’t have this kernel version, so I cannot use it there 🙁
- It doesn’t work in some virtual environments. In my case, it doesn’t work with VirtualBox (which I have) and Xen, but it works with VMware and KVM.
Under the hood
To record the execution of your program, RR doesn’t need to record the effect of every instruction. Recording in GDB works like that and it is very slow. It is slow to the point of being usable only for a few scenarios.
To record the execution of the program, RR does the following:
- Intercepts all system calls using the ptrace system call. It records the input and output values of the system calls. During replay, the recorded program forwards no actual calls to the kernel; instead, RR emulates them.
- Records asynchronous events using processor counters. During replay, information from the recording serves to replay the asynchronous events.
Because no actual system calls are made during replay, the recording has to contain all the data those calls returned, which means that RR recordings can be big. On the upside, no reading from or writing to files is done, and there is no actual communication over a socket. All the complexities of complicated setups go away once you’ve reproduced the problem.
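To make the idea of recording and emulating system calls more concrete, here is a minimal sketch of it in C++. It is a toy model and not RR's actual implementation or trace format: the names Tracer, Syscall and program are all made up for illustration, and a "system call" is modeled as any function whose result the program cannot predict.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical toy model, not rr's real implementation or trace format.
// A "system call" is any source of input the program cannot predict.
using Syscall = std::function<std::int64_t()>;

class Tracer {
public:
    enum class Mode { Record, Replay };

    explicit Tracer(Mode mode, std::vector<std::int64_t> trace = {})
        : mode_(mode), trace_(std::move(trace)) {
        // A recording always starts with an empty trace.
        assert(mode_ == Mode::Replay || trace_.empty());
    }

    // Record: run the real call and log its result.
    // Replay: never run the real call; return the logged result instead.
    std::int64_t intercept(const Syscall& real_call) {
        if (mode_ == Mode::Record) {
            const std::int64_t result = real_call();
            trace_.push_back(result);
            return result;
        }
        if (next_ >= trace_.size()) {
            throw std::runtime_error("replay diverged: trace exhausted");
        }
        return trace_[next_++];
    }

    const std::vector<std::int64_t>& trace() const { return trace_; }

private:
    Mode mode_;
    std::vector<std::int64_t> trace_;
    std::size_t next_ = 0;  // index of the next logged result to replay
};

// The "program under test": its result depends only on intercepted inputs,
// so replaying the same inputs reproduces the same result.
std::int64_t program(Tracer& tracer, const Syscall& read_input) {
    std::int64_t sum = 0;
    for (int i = 0; i < 3; ++i) {
        sum = sum * 31 + tracer.intercept(read_input);
    }
    return sum;
}
```

In this sketch, a Tracer constructed in Record mode collects the results, and a second Tracer constructed in Replay mode with that trace makes program return the same value without ever invoking the callback. The trace grows with every intercepted result, which is the same reason real recordings can get large.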
Slowdown incurred by RR will typically be around 2x, which is fine for most circumstances. There are a few more limitations, however, which are due to the way RR is designed:
- RR simulates only one CPU core. If your program uses parallelization to run fast, running it with RR will slow it down significantly. All the functionality within your program will be preserved, but it will just run slower. Recording on multiple cores is complicated and it was a design decision by the RR people not to do it. Please note that one type of bug, the kind related to weak memory ordering, cannot be reproduced with RR.
- If your program spawns other processes, RR will record them as well. However, if your process communicates with another process that is not in the same process tree using shared memory, RR will not be able to record it. This is not possible since it would require RR to trap every access to the shared memory. This means that there is a certain type of program you cannot debug. In my case, we have a user-space driver that communicates with the device using shared memory. RR is useless for debugging this type of program.
As you might have guessed, when you are reverse-executing your program, your program is not actually running in reverse. During replay, RR periodically saves the state of your program as a restartable checkpoint. When you run it in reverse, RR goes to the previous checkpoint N and runs the program forward until the current line. If any breakpoints, watchpoints, etc. fired along the way, RR stops at the last one of them. If it reaches the place where it started from without anything firing, it goes back to checkpoint N-1 and repeats the process.
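The checkpoint trick can be sketched in a few lines of C++. This is a toy model under my own assumptions, not RR's code: the "program" is a deterministic step function over a tiny State struct, a "watchpoint" fires whenever the watched field changes, and the names State, execute_step, make_checkpoints and reverse_continue are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of a replayed program: the state is one watched
// integer, and only some steps rewrite it.
struct State {
    std::size_t step = 0;  // number of steps executed so far
    int watched = 0;       // the value the watchpoint is set on
};

// Deterministic execution of one step, as replay would do it.
void execute_step(State& s) {
    if (s.step % 7 == 3) {
        s.watched = static_cast<int>(s.step);
    }
    ++s.step;
}

// Replays from the start and saves a checkpoint every `interval` steps.
std::vector<State> make_checkpoints(std::size_t total_steps, std::size_t interval) {
    assert(interval > 0);
    std::vector<State> checkpoints;
    State s;
    while (s.step < total_steps) {
        if (s.step % interval == 0) {
            checkpoints.push_back(s);
        }
        execute_step(s);
    }
    return checkpoints;
}

// Returns the index of the last step before `current` that changed the
// watched value, or -1 if the start of the recording is reached first.
long long reverse_continue(const std::vector<State>& checkpoints, std::size_t current) {
    std::size_t window_end = current;
    // Walk the checkpoints from newest to oldest.
    for (std::size_t n = checkpoints.size(); n-- > 0;) {
        State s = checkpoints[n];
        if (s.step >= window_end) continue;  // checkpoint is not in the past
        long long last_hit = -1;
        // Run forward from checkpoint n up to the end of the window,
        // remembering the most recent step at which the watchpoint fired.
        while (s.step < window_end) {
            const int before = s.watched;
            const std::size_t executing = s.step;
            execute_step(s);
            if (s.watched != before) last_hit = static_cast<long long>(executing);
        }
        if (last_hit >= 0) return last_hit;
        window_end = checkpoints[n].step;  // nothing fired: retry from n-1
    }
    return -1;
}
```

In this toy model the watched value changes at steps 3, 10, 17, 24 and so on. With checkpoints every 8 steps, reverse-continuing from step 30 needs only the short run from the checkpoint at step 24, while reverse-continuing from step 17 first searches the window starting at 16, finds nothing, and then falls back to the checkpoint at step 8.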
Advanced features
RR offers a few other features that are useful but not the core functionality of the program.
Portable traces
If you are running your program on one machine and you are reproducing it on another machine, the portable traces feature can be very useful. For example, if you need a complicated setup to reproduce your bug, or you are reproducing a problem observed by QA or a customer, portable traces can be indispensable.
When you have made the recording, run rr pack on it to create a portable version of the directory. You can then zip it and share it with others; everything you need will be in it. You don’t need any external libraries or programs to reproduce the bug, and you don’t even need to have the same Linux version.
$ rr pack /home/ivica/.local/share/rr/linked_list_test-5
rr: Packed trace directory `/home/ivica/.local/share/rr/linked_list_test-5'.
In our example, everything needed to reproduce the bug is now located inside /home/ivica/.local/share/rr/linked_list_test-5.
Chaos mode
If your bug is difficult to reproduce, RR offers chaos mode. When recording in chaos mode, RR will vary the priorities of threads in your program between different runs in order to reproduce those bugs that are due to race conditions or other non-deterministic behavior. To record in chaos mode, run rr record -h ./my_program
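To show why varying the schedule helps, here is a small C++ sketch of a lost-update race that shows up only under some interleavings. It is a single-threaded simulation and not how RR implements chaos mode: the two "threads", the explicit schedule vector and the names ToyThread, run_with_schedule and first_failing_run are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical toy model: each "thread" increments a shared counter in two
// separate steps (load, then store), which is where an update can get lost.
struct ToyThread {
    int local = 0;  // value loaded from the shared counter
    int pc = 0;     // 0 = about to load, 1 = about to store, 2 = done
};

// Runs two toy threads; schedule[i] says which thread executes its next step.
// Returns the final value of the shared counter (2 if no update was lost).
int run_with_schedule(const std::vector<int>& schedule) {
    int shared = 0;
    ToyThread threads[2];
    for (int id : schedule) {
        assert(id == 0 || id == 1);
        ToyThread& t = threads[id];
        if (t.pc == 0) {
            t.local = shared;      // load
            t.pc = 1;
        } else if (t.pc == 1) {
            shared = t.local + 1;  // store
            t.pc = 2;
        }
    }
    return shared;
}

// Tries up to max_runs randomly ordered schedules and returns the index of
// the first run that loses an update, or -1 if none did.
int first_failing_run(unsigned seed, int max_runs) {
    std::mt19937 rng(seed);
    std::vector<int> schedule = {0, 0, 1, 1};
    for (int run = 0; run < max_runs; ++run) {
        std::shuffle(schedule.begin(), schedule.end(), rng);
        if (run_with_schedule(schedule) != 2) {
            return run;
        }
    }
    return -1;
}
```

With the schedule {0, 0, 1, 1} each thread finishes its increment before the other starts and the counter ends at 2. With {0, 1, 0, 1} both threads load 0 before either stores, and the counter ends at 1. A scheduler that always picks the first kind of ordering hides the bug, which is why shaking up the order between runs makes it more likely to surface.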
Conclusion
In the past there were also debuggers that could do recording, replaying and reverse execution, but they never caught on. This is too bad, and it is time to change that.
RR is a game-changer. It is easy to use, it is free and very usable. If you invest time to learn it, RR will change the way you are debugging your programs, make you more productive and make your life much easier.