Merged back during the Linux 6.17 merge window was an optimization geared toward ARM64 that could yield a "16x reduction" in the number of calls. Unfortunately, that commit ended up causing a rather significant regression on some systems, which has now been addressed...