halayli 4 hours ago [-]
The performance observation is real but the two approaches are not equivalent, and the article doesn't mention what you're actually trading away, which is the part that matters.
The C++11 thread-safety guarantee on static initialization is explicitly scoped to block-local statics. That's not an implementation detail, that's the guarantee.
The __cxa_guard_acquire/release machinery in the assembly is the standard fulfilling that contract. Move to a private static data member and you're outside that guarantee entirely. You've quietly handed that responsibility back to yourself.
Then there's the static initialization order fiasco, which is the whole reason the Meyers singleton with a local static became canonical. A block-local static initializes on first use: lazily, deterministically, thread-safely. A static data member initializes at startup in an order that is unspecified across translation units. If anything touches Instance() during its own static initialization from a different TU, you're in UB territory. The article doesn't mention this.
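To make the contrast concrete, here's a minimal sketch of the two forms being compared (Logger and Config are illustrative names, not from the article):

```cpp
#include <string>

// Meyers singleton: the block-local static is initialized on first call,
// and since C++11 that initialization is guaranteed thread-safe.
class Logger {
public:
    static Logger& Instance() {
        static Logger instance;   // lazy, deterministic, guarded
        return instance;
    }
private:
    Logger() = default;
};

// Static-data-member variant: instance_ is constructed during dynamic
// initialization at startup, in an order that is unspecified relative to
// globals in other translation units.
class Config {
public:
    static Config& Instance() { return instance_; }
    const std::string& value() const { return value_; }
private:
    Config() : value_("default") {}
    static Config instance_;
    std::string value_;
};
Config Config::instance_;   // normally lives in Config.cpp; if a global in
                            // another TU calls Config::Instance() during its
                            // own construction, link order decides whether
                            // this has run yet -- the init order fiasco.
```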
Real-world singleton designs also need: deferred/configuration-driven initialization, optional instantiation, state recycling, controlled teardown. A block-local static keeps those doors open. A static data member initializes unconditionally at startup: you've lost lazy init, you've lost the option not to initialize it at all, and configuration-based instantiation becomes awkward by design.
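One hedged sketch of what "keeping those doors open" can look like: a block-local static holding a pointer, with explicit configuration-driven init and controlled teardown (Service, Config, and the method names here are hypothetical):

```cpp
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>

class Service {
public:
    struct Config { std::string endpoint; };

    // Configuration-driven, deferred initialization: nothing is constructed
    // until someone explicitly calls Init().
    static void Init(Config cfg) {
        std::lock_guard<std::mutex> lk(Mutex());
        Ptr() = std::unique_ptr<Service>(new Service(std::move(cfg)));
    }
    static Service& Instance() {
        std::lock_guard<std::mutex> lk(Mutex());
        if (!Ptr()) throw std::logic_error("Service::Init not called");
        return *Ptr();
    }
    // Controlled teardown (and state recycling via Init() again).
    static void Shutdown() {
        std::lock_guard<std::mutex> lk(Mutex());
        Ptr().reset();
    }
    const std::string& endpoint() const { return cfg_.endpoint; }

private:
    explicit Service(Config cfg) : cfg_(std::move(cfg)) {}
    static std::unique_ptr<Service>& Ptr() {
        static std::unique_ptr<Service> p;  // block-local static: order-safe
        return p;
    }
    static std::mutex& Mutex() {
        static std::mutex m;
        return m;
    }
    Config cfg_;
};
```

A plain static data member can't express any of this: it exists unconditionally from startup to shutdown.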
Honestly, if you're bottlenecking on singleton access, that's design smell worth addressing, not the guard variable.
menaerus 2 hours ago [-]
> Honestly, if you're bottlenecking on singleton access, that's design smell worth addressing, not the guard variable.
There's a large group of engineers who are totally unaware of Amdahl's law, and they are consequently obsessed with the performance implications of what are usually the least important parts of the codebase.
I learned that being in the opposite group became (or maybe always has been) somewhat unpopular, because it breaks many of the myths that we have been taught for years, and on top of which many people have built their careers. This article may or may not be an example of that. I am not reading too much into it, but profiling and identifying the actual bottlenecks seems like a scarce skill nowadays.
PacificSpecific 34 minutes ago [-]
You essentially leveled up past a point a surprising number of people get stuck on.
I feel like the mindset you are describing is kind of this intermediate senior level. Sadly, a lot of programmers can get stuck there for their whole career. Even worse when they get promoted to staff/principal level and start spreading dogma.
I 100 percent agree. If you can't show me a real world performance difference you are just spinning your wheels and wasting time.
alex_dev42 4 hours ago [-]
Excellent points about the initialization order fiasco. I've been bitten by this in embedded systems where startup timing is critical.
One thing I'd add: the guard overhead can actually matter in high-frequency scenarios. I once profiled a logging singleton that was called millions of times per second in a real-time system - the atomic check was showing up as ~3% CPU. But your point stands: if you're hitting that bottleneck, you probably want to reconsider the architecture entirely.
The lazy initialization guarantee is usually worth more than the performance gain, especially since most singletons aren't accessed in tight loops. The static member approach feels like premature optimization unless you've actually measured the guard overhead in your specific use case.
halayli 3 hours ago [-]
Yes, I'm definitely not dismissing the lock overhead, but I wanted to bring attention to the implicit false equivalence made in the post. That said, I am surprised the lock check was showing up and not the logging/formatting functions.
csegaults 4 hours ago [-]
Err how does the static approach suffer from thread safety issues when the initialization happens before main even runs?
I might be responding to a llm so...
halayli 3 hours ago [-]
a real human. Threads can exist before main() starts. For example, you can include another TU which happens to launch a thread and call Instance(). Singletons used to be a headache before C++11, and it was common (maybe still is) to see macros in projects that expand to a singleton class definition to avoid the common pitfalls.
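A contrived but compilable illustration of the point (all names here are made up): a global with a nontrivial constructor spawns a thread during dynamic initialization, so Instance() is called before main() ever runs. The C++11 guard on the local static is what makes this well-defined.

```cpp
#include <thread>

struct Counter {
    static Counter& Instance() {
        static Counter c;   // guarded: safe even for pre-main callers
        return c;
    }
    int value = 42;
};

struct EarlyThread {
    EarlyThread() {
        // This runs during dynamic initialization, before main().
        std::thread t([] { (void)Counter::Instance(); });
        t.join();           // join so the global's constructor finishes cleanly
    }
};
EarlyThread early_thread;   // constructed before main() is entered
```

Swap the local static for an unguarded static data member touched by that thread and you're racing against startup.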
MaulingMonkey 1 hour ago [-]
In fact, Windows 10+ now uses a thread pool during process init well before main is reached.
It's a bit contrived, but a global with a nontrivial constructor can spawn a thread that uses another global, and without synchronization the thread can see an uninitialized or partially initialized value.
jibal 2 hours ago [-]
@dang There should be an HN guideline against such accusations.
aappleby 3 hours ago [-]
This is... not an example of good optimization.
Focusing on micro-"optimizations" like this one does absolutely nothing for performance (how many times are you actually calling Instance() per frame?) and skips over the absolutely mandatory PROFILE BEFORE YOU OPTIMIZE rule.
If a coworker asked me to review this CL, my comment would be "Why are you wasting both my time and yours?"
thomasmg 3 hours ago [-]
Optimizing requires a (performance) problem, and often needs a benchmark.
In my view, the article is not about optimizing, but about understanding how things work under the hood. Which is interesting for some.
delusional 2 hours ago [-]
> If a coworker asked me to review this CL, my comment would be "Why are you wasting both my time and yours?"
If a coworker submitted a patch to existing code, I'd be right there with you. If they submitted new code, and it just so happened to be using this more optimal strategy, I wouldn't blink twice before accepting it.
platinumrad 4 hours ago [-]
I haven't written C++ in a long time, but isn't the issue here that the initialization order of globals in different translation units is unspecified? Lazy initialization avoids that problem at very modest cost.
m-schuetz 5 hours ago [-]
I liked using singletons back in the day, but now I simply make a struct with static members which serves the same purpose with less verbose code. Initialization order doesn't matter if you add one explicit (and also static) init function, or a lazy initialization check.
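A minimal sketch of that approach, assuming C++17 inline statics (AppState and its fields are hypothetical names): no Instance() boilerplate, and an explicit Init() called from main() so startup order is under your control.

```cpp
#include <string>
#include <utility>

struct AppState {
    static inline std::string name;    // inline statics: one definition, no .cpp needed
    static inline int frame = 0;
    static inline bool ready = false;

    // Explicit init, called once from main() before any use,
    // sidestepping cross-TU initialization order entirely.
    static void Init(std::string n) {
        name = std::move(n);
        ready = true;
    }
};
// Usage: AppState::Init("demo"); then AppState::frame++, AppState::name, etc.
```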
procaryote 2 hours ago [-]
Yeah, I feel singletons are mostly a result of people learning globals are bad and wanting to pretend their global isn't a global.
A bit like how java people insisted on making naive getFoo() and setFoo() to pretend that was different from making foo public
Honestly, the guard overhead is a non-issue in practice; it's one atomic check after first init. The real problem with the static data member approach is initialization order across translation units. If singleton A touches singleton B during startup, you get fun segfaults that only show up in release builds with a different link order.
I ended up using std::call_once for those cases. More boilerplate but at least you're not debugging init order at 2am.
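For reference, one way the std::call_once pattern can look (Widget is a placeholder name): std::once_flag has a constexpr constructor and the raw pointer is zero-initialized, so both are set up before any dynamic initialization runs, leaving no cross-TU ordering to debug.

```cpp
#include <mutex>

class Widget {
public:
    static Widget& Instance() {
        // Exactly one caller runs the lambda; everyone else blocks until done.
        std::call_once(flag_, [] { instance_ = new Widget(); });
        return *instance_;
    }
    int id() const { return 7; }

private:
    Widget() = default;
    static std::once_flag flag_;
    static Widget* instance_;
};
std::once_flag Widget::flag_;               // constexpr-constructed: order-safe
Widget* Widget::instance_ = nullptr;        // zero-initialized before startup
```

More boilerplate than a Meyers singleton, as noted, but the initialization point is explicit.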
csegaults 4 hours ago [-]
Came here to say the same thing. Static is OK as long as the object has no dependencies but as soon as it does you're asking for trouble. Second the call_once approach. Another approach is an explicit initialization order system that ensures dependencies are set up in the right order, but that's more complex and only works for binaries you control.
stingraycharles 1 hour ago [-]
AI. Probably clawdbot.
uecker 2 hours ago [-]
It is strange to use lightdm and gdm as examples, which are both written in C (if nothing has changed recently).
signa11 5 hours ago [-]
i am not sure why this entire article is warranted :o) just use `std::call_once` and you are all set.
swaminarayan 4 hours ago [-]
Nice breakdown. I’m curious how often the guard check for a function-local static actually shows up in real profiles. In most codebases Instance() isn’t called in tight loops, so the safety of lazy initialization might matter more than a few extra instructions. Has anyone run into this being a real bottleneck in practice?