We at Johnny’s Software Lab LLC are experts in performance. If performance is in any way a concern in your software project, feel free to contact us.
A few days ago I wrote a small app to illustrate one of the articles I was preparing. The program loaded a file from the hard disk, sorted its lines, and then wrote only the unique values to another file, omitting duplicates.
The function for writing unique values to a file looks like this:
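    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <vector>

    // Writes each distinct line of the sorted vector `lines` to `file_name`.
    void remove_duplicates_and_save(const std::vector<std::string>& lines, const std::string& file_name) {
        std::ofstream myfile(file_name);
        if (lines.empty()) return;  // nothing to write, and lines[0] would be invalid
        myfile << lines[0] << std::endl;
        for (std::size_t i = 1; i < lines.size(); i++) {
            // The input is sorted, so duplicates are always adjacent.
            if (lines[i] != lines[i - 1]) {
                myfile << lines[i] << std::endl;
            }
        }
    }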
As you can see, the function is simple enough and there is nothing special about it. The whole program took 2.3 seconds to complete on a file with 1 million lines. When I ran it through speedscope’s flamegraphs, I got the following output:
As you can see, a lot of time is spent in the remove_duplicates_and_save function, and if you look a little bit closer, it involves a lot of flushing! For those who don’t know, flushing means pushing the data held in the stream’s in-memory buffer out to the operating system so that it can be written to the file on disk, and it is a very expensive operation if done often. So to increase performance, the C++ standard library normally flushes a file stream only when its internal data buffer is full or when the stream is closed.
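To make "flushing too often" concrete, here is a minimal sketch that counts how many flush requests a stream issues. CountingBuf and count_flushes are hypothetical names made up for this illustration. The buffer lives in memory, so the sketch does not measure the cost of a flush. It only shows how many of them happen.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper for this sketch: an in-memory buffer that counts how
// many times the stream asks it to flush (std::ostream::flush calls sync()).
class CountingBuf : public std::stringbuf {
public:
    int flush_count = 0;

protected:
    int sync() override {
        ++flush_count;
        return std::stringbuf::sync();
    }
};

// Writes all lines and returns how many flushes were requested.
// flush_every_line = true mimics code that flushes after each line;
// false lets the data accumulate and flushes once at the end.
int count_flushes(const std::vector<std::string>& lines, bool flush_every_line) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (const std::string& line : lines) {
        out << line << '\n';
        if (flush_every_line) {
            out.flush();
        }
    }
    if (!flush_every_line) {
        out.flush();
    }
    return buf.flush_count;
}
```

With three input lines, count_flushes(lines, true) reports three flush requests and count_flushes(lines, false) reports one. With a real file stream, each of those requests hands data to the operating system, which is where the time goes.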
I expected the remove_duplicates_and_save function to take less time than sort_lines; however, this was not the case. Upon closer inspection, I found the culprit. According to the C++ standard, std::endl writes a newline and then flushes the stream, so using it after every line forces one flush per line and degrades performance. Replacing std::endl with '\n' gave the following flamegraph:
The program’s overall runtime went down from 2.3 seconds to 0.65 seconds. The remove_duplicates_and_save function almost disappeared from the flamegraph, which means its runtime is now very short. It is an unlucky design choice in the C++ standard library, but std::endl is a very inefficient way to write a new line to a file! So use '\n' instead!
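For reference, this is roughly what the function looks like after the change. Treat it as a sketch, not the exact code that was measured. The only change that matters for speed is '\n' in place of std::endl. The early return on empty input and the const references are defensive choices in this sketch, unrelated to the speedup.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Writes each distinct line of an already sorted vector to file_name.
// '\n' ends the line without forcing a flush, so the stream flushes only
// when its buffer fills up and once more when myfile is closed on return.
void remove_duplicates_and_save(const std::vector<std::string>& lines,
                                const std::string& file_name) {
    std::ofstream myfile(file_name);
    if (lines.empty()) {
        return;  // nothing to write; also avoids reading lines[0]
    }
    myfile << lines[0] << '\n';
    for (std::size_t i = 1; i < lines.size(); i++) {
        // The input is sorted, so duplicates are always adjacent.
        if (lines[i] != lines[i - 1]) {
            myfile << lines[i] << '\n';
        }
    }
}
```

The file contents are identical to the std::endl version. Only the number of flushes changes. If some output really must become visible immediately, for example a progress message, an explicit myfile.flush() at that point states the intent more clearly than std::endl does.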