The C++ standard library is a collection of classes and functions. These are written in C++ and are part of the C++ standard itself. All popular compiler toolchains ship with a C++ standard library. The popular ones are libstdc++ (GNU), libc++ (LLVM, also popularly known as libcxx), and msvc-stl (which appears to be derived from the Dinkumware C++ library and libc++, and was open-sourced in 2019). Needless to say, the standard library plays a very important role in the runtime performance of many systems.
Over time I have collected a list of performance opportunities. Some of them I found online, on mailing lists and Bugzilla. Others came from reading source code and from previous experience with the performance analysis of libstdc++ and libc++. Disclaimer: For some of the items below I do not have experimental numbers, and I’m mostly relying on what was reported in the references.
The algorithmic requirements on the STL, and the fact that all the popular libraries comply with them, make it easy to believe that performance couldn’t be improved further. The requirements are based on computational complexity theory (big O and friends). Because the constant factor isn’t taken into account (for good reasons), the realized performance depends on the actual implementation and the workload.
The iostream library is just slow; almost all of its interfaces are slow (probably except for the ones I optimized ;) ). The source code looks very much like a typical Java program (too many indirections and virtual methods).
There are four subsections to classify performance opportunities.
- iostream library
- sort, find
- std::vector
- std::vector has perf issues (See: Really bad codegen for libc++ vector). Use realloc whenever appropriate. Some improvements were proposed in (See: Improving std::vector).
- std::string: string::find.*of and string::rfind are still suboptimal (See: string).
- std::map
- std::iostream
- std::sort of clang/gcc may be slow depending on the workload (See: sort).
- std::find of libcxx is very slow compared to libstdc++ because LLVM does not unroll the loop automatically (See: find).

This will help the compiler reorganize basic blocks in the function.
Annotating pointers with restrict:
Annotating branches with builtin_likely:
Devirtualization will help the iostream library because it has too many virtual methods.
Constructors and destructors of STL containers like string, vector etc.
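The pointer and branch annotations mentioned above can be sketched roughly as follows. This is only an illustration, not code from any standard library: the SKETCH_* macros and both helper functions are hypothetical names, and __restrict and __builtin_expect are GCC/Clang extensions rather than standard C++.

```cpp
#include <cassert>
#include <cstddef>

// __restrict and __builtin_expect are compiler extensions (GCC/Clang).
// The SKETCH_* names are made up for this example.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_RESTRICT __restrict
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_RESTRICT
#define SKETCH_UNLIKELY(x) (x)
#endif

// Hypothetical helper: adds src into dst element by element.
// The restrict qualifiers promise that dst and src do not overlap,
// so the compiler does not have to assume a store to dst changes src.
void add_into(int* SKETCH_RESTRICT dst, const int* SKETCH_RESTRICT src,
              std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += src[i];
    }
}

// Hypothetical helper: sums the non-negative values and counts the
// negative ones. The negative case is hinted as rare, which lets the
// compiler place that block away from the hot path.
long sum_non_negative(const int* data, std::size_t n, std::size_t* negatives) {
    assert(negatives != nullptr);
    long sum = 0;
    *negatives = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (SKETCH_UNLIKELY(data[i] < 0)) {
            ++*negatives;
            continue;
        }
        sum += data[i];
    }
    return sum;
}
```

The hints do not change results, only what the optimizer is allowed to assume. Passing overlapping buffers to add_into would be undefined behavior, so this kind of annotation only fits internal helpers whose callers are known.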
Vectorize memcpy, memset etc. style loops. Also related to Loop Idiom Recognition.
Detect memcpy, memset, memchr, memcmp style loops.
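For reference, these are the kinds of hand-written loops that idiom recognition targets. The function names are hypothetical, and whether a given compiler actually rewrites each loop into the library call depends on the compiler, its version and the flags; the sketch only shows that each loop computes the same thing as its library counterpart.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// memset-style loop: stores the same byte into every element.
void zero_fill_loop(unsigned char* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        p[i] = 0;
    }
}

// memcpy-style loop: element-wise copy. The buffers must not overlap
// for this to be equivalent to memcpy.
void copy_loop(unsigned char* dst, const unsigned char* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// memchr-style loop: search with an early exit, returns nullptr if absent.
const unsigned char* find_byte_loop(const unsigned char* p, std::size_t n,
                                    unsigned char value) {
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] == value) {
            return p + i;
        }
    }
    return nullptr;
}

// Checks that each loop agrees with the corresponding library call.
bool loops_match_library_calls() {
    unsigned char src[16];
    for (std::size_t i = 0; i < 16; ++i) {
        src[i] = static_cast<unsigned char>(i + 1);
    }
    unsigned char a[16];
    unsigned char b[16];
    copy_loop(a, src, 16);
    std::memcpy(b, src, 16);
    if (std::memcmp(a, b, 16) != 0) return false;

    const unsigned char* hit =
        static_cast<const unsigned char*>(std::memchr(a, 7, 16));
    if (find_byte_loop(a, 16, 7) != hit) return false;
    if (find_byte_loop(a, 16, 200) != nullptr) return false;

    zero_fill_loop(a, 16);
    std::memset(b, 0, 16);
    return std::memcmp(a, b, 16) == 0;
}
```

The early exit in find_byte_loop is what makes memchr-style loops harder to recognize or vectorize than the plain store and copy loops.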
Even better, jump threading with auto-FDO. (See: Jump Threading Bug)
This helps bring in micro-optimizations and code layout tuned to our workloads.
std::find will benefit from:
- Loop unrolling (See Loop unroll opportunities)
- Needs loop rotation to make loop-unroll more effective (See: Loop rotation bug)
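To make the unrolling point concrete, here is a hand-unrolled linear search of the shape one would like the compiler to produce on its own. It is a minimal sketch over plain int pointers; find_unrolled is a hypothetical name and this is not the implementation of any particular library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: linear search unrolled by a factor of four.
// Returns a pointer to the first match, or last if there is none.
const int* find_unrolled(const int* first, const int* last, int value) {
    assert(first <= last);
    std::size_t n = static_cast<std::size_t>(last - first);

    // Main loop: four comparisons per iteration, one loop-bound check.
    for (; n >= 4; n -= 4, first += 4) {
        if (first[0] == value) return first;
        if (first[1] == value) return first + 1;
        if (first[2] == value) return first + 2;
        if (first[3] == value) return first + 3;
    }

    // Remainder loop: handles the last zero to three elements.
    for (; n > 0; --n, ++first) {
        if (*first == value) return first;
    }
    return last;  // first == last here
}
```

The gain comes from checking the loop bound once per four elements instead of once per element. A compiler can only do the same transformation if it can compute the trip count up front, which is where the loop rotation item comes in.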
CoroFrame does not pay attention to the lifetime markers (See CoroFrame)