A few days ago I wrote a small app to illustrate one of the articles I was preparing. The program loaded a file from the hard disk, sorted its lines, and then wrote only the unique values to another file, omitting duplicates.
The function for writing unique values to a file looks like this:
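#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

void remove_duplicates_and_save(const std::vector<std::string>& lines,
                                const std::string& file_name) {
    std::ofstream myfile(file_name);
    if (lines.empty()) {
        return;  // nothing to write; lines[0] would be out of bounds
    }
    myfile << lines[0] << std::endl;
    // lines is already sorted, so duplicates are adjacent
    for (std::size_t i = 1; i < lines.size(); i++) {
        if (lines[i] != lines[i - 1]) {
            myfile << lines[i] << std::endl;
        }
    }
}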
As you can see, the function is simple enough and there is nothing special about it. The whole program took 2.3 seconds to complete on a file with 1 million lines. When I ran it through speedscope’s flamegraphs, I got the following output:

As the flamegraph shows, a lot of time is spent in the remove_duplicates_and_save function, and a closer look reveals that most of it goes to flushing. For those who don't know, flushing means pushing the data held in the stream's in-memory buffer out to the underlying file. That requires a call into the operating system, so it is a very expensive operation if done often. To keep this cost down, the C++ standard library normally flushes a file stream only when its internal buffer is full, when the stream is closed, or when the program explicitly asks for it.
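To make the difference between buffered writing and per-line flushing concrete, here is a minimal sketch that counts how often a stream asks its buffer to synchronize. The names CountingBuf and count_flush_requests are hypothetical and not part of the original program. The buffer is an in-memory std::stringbuf, so the sketch counts flush requests only; it does not measure real disk writes.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical helper: an in-memory buffer that counts flush requests.
// std::ostream::flush() calls pubsync(), which dispatches to sync().
class CountingBuf : public std::stringbuf {
public:
    int flush_requests = 0;

protected:
    int sync() override {
        ++flush_requests;
        return 0;  // report success; there is no external device to sync
    }
};

// Writes `lines` short lines and returns how many flushes were requested.
int count_flush_requests(int lines, bool flush_each_line) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; i++) {
        out << "line" << '\n';
        if (flush_each_line) {
            out.flush();  // explicit flush after every line
        }
    }
    return buf.flush_requests;
}
```

With a real file stream, each of those flush requests would turn into a write to the operating system, which is where the time goes when flushing happens once per line.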
I expected the remove_duplicates_and_save function to take less time than sort_lines, but this was not the case. A closer look revealed the culprit: according to the C++ standard, std::endl writes a newline character and then flushes the stream, so every line written to the file forced a flush. Replacing std::endl with '\n' gave the following flamegraph:

The overall runtime of the program went down from 2.3 seconds to 0.65 seconds. The remove_duplicates_and_save function almost disappeared from the flamegraph, which means its runtime is now very short. It is an unfortunate design choice in the C++ standard library, but std::endl is a very inefficient way to write a new line to a file. So use '\n' instead!
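For reference, the function with '\n' in place of std::endl might look like the sketch below. It takes its arguments by const reference and handles an empty input; those details are assumptions of this sketch, not necessarily the exact code that was measured.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Writes each distinct line of an already sorted vector to file_name.
void remove_duplicates_and_save(const std::vector<std::string>& lines,
                                const std::string& file_name) {
    std::ofstream myfile(file_name);
    for (std::size_t i = 0; i < lines.size(); i++) {
        // Emit a line only if it differs from its predecessor.
        if (i == 0 || lines[i] != lines[i - 1]) {
            myfile << lines[i] << '\n';  // newline only, no forced flush
        }
    }
}  // the ofstream destructor flushes once and closes the file
```

No data is lost by dropping std::endl: the stream still flushes when its buffer fills up and once more when it is closed at the end of the function.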
