bob1029 9 hours ago [-]
I feel like there are cases where burning an entire core on busy waiting will result in lower overall power consumption than sleeping and yielding as much as possible.
Bringing a core back from the dead each time can be a very expensive operation. If you never yield to the operating system, your working set is ~guaranteed to be hot. Communicating information is substantially more expensive than processing information in terms of energy consumption. DRAM and Infinity Fabric don't work for free. One AMD CCX maxed out is nothing compared to saturating the memory subsystem.
Regarding your conclusion, I always figured WAITPKG seems kinda lame because the only reason they ported it to Core architectures was to make the heterogeneous CPUs possible. It works better on Atom. On Core it does almost nothing, as you note. AMD's ISA extension was written from the ground up for their high-performance server core, which might explain why it's actually useful.
jeffbee 16 hours ago [-]
I was psyched about UMWAIT and TPAUSE when Alder Lake arrived, but unfortunately Intel has never shipped a model where these features actually work right. There have been errata all over the topic of the monitors failing to fire. TPAUSE is also unfortunately useless: you have to read the TSC first, and that just takes too long.
dgacmu 16 hours ago [-]
This is very cool. Uh, no pun intended. For those looking for a tl;dr: recent instruction sets add a "pause in low power for x time" busy-wait instruction and a "pause for x time, or until this memory is touched" instruction. Particularly on E-cores and EPYC Rome, these instructions can save a good amount of power vs. traditional busy waiting while hitting wakeup time targets quite well.
Nifty stuff.
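To make the two wait flavors in the tl;dr concrete, here is a rough software model of their semantics only. It's a sketch, not the real instructions: the names `timed_pause` and `monitored_wait` are made up for illustration, the "monitored memory" is just a one-element Python list, and the loops spin at full power, which is exactly what the hardware versions are meant to avoid.

```python
import threading
import time

def timed_pause(deadline_ns):
    """Model of a timed pause: return once the deadline has passed.
    (Hypothetical helper; real hardware would idle in a low-power state.)"""
    while time.perf_counter_ns() < deadline_ns:
        pass

def monitored_wait(cell, deadline_ns):
    """Model of a monitored wait: return True if cell[0] is written with a
    new value before the deadline, False if the deadline expires first.
    (Hypothetical helper; the real thing watches a memory address in hardware.)"""
    seen = cell[0]
    while time.perf_counter_ns() < deadline_ns:
        if cell[0] != seen:
            return True
    return False

if __name__ == "__main__":
    cell = [0]
    # Another thread "touches the memory" after roughly 10 ms.
    writer = threading.Timer(0.01, lambda: cell.__setitem__(0, 1))
    writer.start()
    woke = monitored_wait(cell, time.perf_counter_ns() + 2_000_000_000)
    writer.join()
    print("woken by write" if woke else "timed out")
```

The point is just the control flow: the first variant only ever wakes on the deadline, while the second wakes on whichever comes first, the write or the deadline.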