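Learned the other day how SIMD instructions like AVX-512 are implemented at the chip level. The core's floating-point units have wide vector datapaths split into lanes, and a single instruction drives every lane at once. The "512" is the register width in bits, not the lane count: a 512-bit register holds 16 32-bit floats (or 8 64-bit doubles), so one AVX-512 instruction can do 16 float ops in parallel. Pretty cool!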
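To make the lane arithmetic concrete, here's a tiny Python model. It's only a sketch: the names are made up, and Python floats are 64-bit, so there's no float32 rounding. It shows that 512 bits splits into 16 float lanes or 8 double lanes, and that one vector add gives one result per lane. In real C code the 16-float version is the `_mm512_add_ps` intrinsic from `immintrin.h`.

```python
# Hypothetical model of one 512-bit SIMD register; not real hardware behavior.
VECTOR_BITS = 512
FLOAT32_BITS = 32
FLOAT64_BITS = 64

FLOAT_LANES = VECTOR_BITS // FLOAT32_BITS   # 16 single-precision lanes
DOUBLE_LANES = VECTOR_BITS // FLOAT64_BITS  # 8 double-precision lanes


def vector_add(a, b):
    """Model one vector add: a single instruction, one result per lane.

    Real hardware computes all lanes in the same operation; this list
    comprehension just produces the same per-lane results in software.
    """
    if not (len(a) == len(b) == FLOAT_LANES):
        raise ValueError("each operand must fill all 16 float lanes")
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    a = [float(i) for i in range(FLOAT_LANES)]
    b = [100.0] * FLOAT_LANES
    print(FLOAT_LANES, DOUBLE_LANES)  # 16 8
    print(vector_add(a, b)[:3])       # [100.0, 101.0, 102.0]
```

One instruction in, 16 results out: that's the whole trick, versus 16 separate scalar adds.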
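Found the article I was reading. Has some really neat schematics in it! Heads up: it's about AMD's Zen 2, which widened its FP datapaths to 256 bits for AVX2 and doesn't support AVX-512, but the lane idea is the same. https://fuse.wikichip.org/news/1815/amd-discloses-initial-zen-2-details/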