close
Skip to main content
11 events
when toggle format what by license comment
Aug 17, 2015 at 20:47 comment added Peter Cordes This generates nice asm on clang 3.5 and later, but not on gcc (even 5.2) See my answer on this question, or a shorter version on stackoverflow.com/questions/776508/…. It's always branchless, but it's a sequence of several instructions. Godbolt: godbolt.org/g/wJBthc
Jul 20, 2015 at 15:42 comment added supercat ...*best-case* time of the shift routine. It's possible the multi-word (e.g. long long) shift routines used by some compilers would need tweaking, but an implementation of x>>y compliant with the above requirement would be cheaper than an implementation of x>>(y-1)>>1 which had to comply with C standard behavior when y was precisely equal to the bit size of x.
Jul 20, 2015 at 15:38 comment added supercat @ThomasMatthews: What would make things easier to read would be to amend the C standard to specify that x>>y, when y is precisely equal to the bit size of x, must yield either x>>(y-1)>>1 or x, chosen in arbitrary fashion. The vast majority of non-hyper-modern C platforms would naturally comply with such a standard; the only ones I know of where compliance would add cost are either those which use a computed jump into a routine that does repeated one-byte-shifts (for platforms without an n-bit shift instruction); the additional cost there would be slight compared with even the...
Jul 18, 2015 at 23:08 vote accept jww
Oct 7, 2015 at 20:06
Jul 13, 2015 at 16:27 comment added jww @MikeMB - I've got other code that stashes it away in a static const.
Jul 13, 2015 at 16:25 comment added MikeMB Just for readability I'd store sizeof(T)*8 in a local variable.
Jul 13, 2015 at 16:25 comment added jww "... in conjunction with branch predictors..." - I think that assumes the branch predictors work as {expected|desired}. That may not be the case on low end processors and boards. Plus, the guys advancing these attacks are very clever, so I want to avoid giving them a toehold.
Jul 13, 2015 at 16:18 comment added dyp I'm not 100% sure if it's well-defined, but I think you can simplify that to -y % (sizeof(T)*8). It seems to be better optimized by clang++ and g++
Jul 13, 2015 at 16:05 comment added Lightness Races in Orbit Yeah but portability!
Jul 13, 2015 at 16:03 comment added Thomas Matthews This is where I believe a simple inline assembly instruction would be easier to read. :-)
Jul 13, 2015 at 16:02 history answered Mark B CC BY-SA 3.0