  • 2
    Travis Downs (aka BeeOnRope on SO) wrote about this in the comments in this post and continued the discussion here. He found that each tier (scalar, AVX/AVX2, AVX-512) has "cheap" instructions (no FP, simple operations) and "heavy" instructions. Cheap instructions drop the frequency only to that of the next higher tier (e.g. cheap AVX-512 instructions use the AVX/AVX2 tier), even if used sparsely. Heavy instructions must be used more than once every ... Commented Jul 2, 2019 at 13:06
  • 2
    ... two cycles, and then drop the frequency according to their tier (e.g. heavy AVX-512 instructions drop the frequency to the AVX-512 base). Travis also shared the code he used to test here. You can find the behaviour of each instruction with a bit of patience or by his rule of thumb. Finally, note that this frequency scaling is a problem iff the ratio of vector to scalar instructions is low enough that the drop in frequency is not balanced by the greater width at which data is processed. Check the final binary to see if you really gained anything. Commented Jul 2, 2019 at 13:10
  • 1
    @HCSF You can make three builds, one without AVX, one with AVX/AVX2 and one with AVX-512 (if applicable) and profile them. Then take the fastest one. Commented Jul 2, 2019 at 14:52
  • 2
    Peter mentioned the -mpreferred-vector-width=256 option. I don't know if it prevents gcc from ever producing AVX-512 instructions (outside of direct intrinsic use), but it is certainly possible. However, I am not aware of any option that distinguishes between "heavy" and "light" instructions. Usually this isn't a problem: if you turn off AVX-512 and don't have a bunch of FP ops, you are probably targeting the L0 license level anyway, and AVX-512 light is still L1. Commented Jul 3, 2019 at 6:57
  • 1
    @HCSF important routines in libc are generally compiled multiple times for different ISAs and then the version appropriate for the current CPU is selected at runtime using the dynamic loader's IFUNC capability. So you'll usually get a version optimized for your CPU (unless your libc is quite old and your CPU quite new). Commented Jul 4, 2019 at 0:16
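
The trade-off raised in the comments above (a lower clock versus wider vectors) can be estimated with a back-of-the-envelope model before profiling. The sketch below is only an illustration under stated assumptions: an Amdahl-style split of the runtime into a vectorizable and a non-vectorizable share, and a reduced clock that applies to the whole program (pessimistic, since real CPUs switch license levels dynamically). The function names and the sample numbers (a 4x speedup and a 15% clock drop) are hypothetical and not measurements from any CPU.

```python
def relative_runtime(vector_fraction, vector_speedup, freq_ratio):
    """Runtime of the vectorized build relative to the scalar build.

    A result below 1.0 means the vectorized build is faster.

    vector_fraction: share of the scalar runtime that can be vectorized (0..1)
    vector_speedup:  per-clock speedup of that share (hypothetical, e.g. 4.0)
    freq_ratio:      reduced clock divided by scalar clock (e.g. 0.85),
                     assumed here to apply to the whole program
    """
    # Cycles needed, normalized so that the scalar build needs 1.0.
    cycles = (1.0 - vector_fraction) + vector_fraction / vector_speedup
    # Fewer cycles, but each cycle takes longer at the reduced clock.
    return cycles / freq_ratio


def break_even_fraction(vector_speedup, freq_ratio):
    """Vectorizable share at which both builds take the same time."""
    # Solve (1 - p) + p / s = r for p, giving p = (1 - r) / (1 - 1 / s).
    return (1.0 - freq_ratio) / (1.0 - 1.0 / vector_speedup)


if __name__ == "__main__":
    for share in (0.05, 0.25, 0.75):
        print(share, round(relative_runtime(share, 4.0, 0.85), 3))
    print("break-even share:", round(break_even_fraction(4.0, 0.85), 3))
```

With these made-up inputs the break-even point is a vectorizable share of about 20%: below it, the model predicts the frequency drop costs more than the wider vectors gain. It says nothing about light versus heavy instructions or about transition costs, so it does not replace building and profiling the variants as suggested above.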