Tweets


  1. Pinned Tweet
    14 Nov 2018

    My KVM Forum talk on the past year's improvements to KVM nested virtualization. Much to present in a very short time slot. I highly recommend viewing the slides appendix for many more technical details and nVMX mechanisms! (slides: )

  2. Jan 31

    I later saw this great talk: that explains how RISC may achieve CISC-like perf with clever micro-arch tricks. Main concepts are Macro-Fusion, in which the decoder generates a single uOp for multiple MacroOps, & CISC having both 2- and 4-byte instructions.

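    As a concrete x86 illustration of the macro-fusion concept (my own minimal sketch, not from the talk): current Intel/AMD decoders fuse an adjacent CMP/TEST + conditional-branch pair into a single uOp, so the loop back-edge below typically costs one uOp rather than two.

```c
/* Illustration only (not from the talk): the loop back-edge typically compiles
 * to a "cmp/test + conditional jump" pair, which recent x86 decoders macro-fuse
 * into a single uOp. RISC-V proposals apply the same idea to other common
 * instruction pairs. */
int count_nonzero(const int *v, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)   /* back-edge: cmp i,n + jl -> one fused uOp */
        count += (v[i] != 0);     /* kept branchless so only the loop branch remains */
    return count;
}
```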
  3. Jan 18

    Hash-based RAP/XFG seems better. Interesting observation that 7% of indirect branches have >100 valid targets if hashing only the prototype. Thought: add a compiler annotation for non-cross-module func_ptrs & targets that makes the linker add a unique argument to this branch's target hash? (2/2) (Rough source-level sketch of the prototype-hash idea after this thread.)

  4. Jan 18

    Slides on Linux kernel CFI. Clang's jmp-table based CFI seems quite bad, as it adds many opcodes before the indirect branch, executes an additional jmp and requires global call-site visibility (e.g. doesn't work for cross-module branches). (1/2)

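    A rough source-level sketch of the prototype-hash idea from (2/2) above. This is not how RAP/XFG are actually implemented (they emit the tag and the check in the compiler, next to the function and at every indirect call site); the struct, names and hash constant below are made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical hash of the prototype "int (*)(int)"; a real scheme would
 * derive it from the type signature at compile time. */
#define PROTO_HASH_INT_INT 0x9e3779b97f4a7c15ULL

/* Emulate "hash stored next to the target": real RAP/XFG place the hash in
 * code/metadata right before the function itself. */
struct tagged_fn {
    uint64_t proto_hash;
    int (*fn)(int);
};

static int twice(int x) { return 2 * x; }
static const struct tagged_fn twice_tagged = { PROTO_HASH_INT_INT, twice };

/* Instrumented indirect call: only targets whose prototype hash matches the
 * call site's expected hash are allowed, shrinking the valid-target set. */
static int call_int_int(const struct tagged_fn *t, int arg)
{
    if (t->proto_hash != PROTO_HASH_INT_INT)
        abort();
    return t->fn(arg);
}
```

    The 7% observation is about exactly this: when the hash covers only the prototype, some call sites still admit >100 targets, hence the idea of salting the hash per branch for non-cross-module function pointers.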
  5. Jan 5

    Cool work using MLPDS to observe micro-architectural usage of the Load Port. Idea: signal the sibling thread to do some op to observe, sleep X cycles & then observe Load Port stale data with MLPDS. Slowly enlarge X to sequence the multiple loads done by a single observed op.

  6. 30 Dec 2019

    Awesome talk on how C# Array.Sort() was re-implemented using CPU vector operations (AVX2) to avoid the branch-misprediction perf penalty, resulting in a significantly faster implementation.

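    Not the talk's code, just a minimal sketch of the underlying trick: replace a data-dependent branch with a SIMD compare + mask, so there is nothing left for the branch predictor to mispredict. Compile with -mavx2.

```c
#include <immintrin.h>

/* Branchless count of how many of 8 ints exceed the pivot: one vector compare
 * plus a popcount instead of 8 unpredictable per-element branches. This is the
 * building block of a vectorized partition step in a quicksort. */
static int count_gt_pivot8(const int *v, int pivot)
{
    __m256i p    = _mm256_set1_epi32(pivot);
    __m256i x    = _mm256_loadu_si256((const __m256i *)v);
    __m256i gt   = _mm256_cmpgt_epi32(x, p);                    /* all-1s lanes where v[i] > pivot */
    int     mask = _mm256_movemask_ps(_mm256_castsi256_ps(gt)); /* top bit of each 32-bit lane */
    return __builtin_popcount(mask);
}
```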
  7. 30 Dec 2019

    Encountered by mistake this nice (but very old) PDF that compares the x86 and ARM architectures. Mostly from a userspace perspective, but I still found it interesting.

  8. 29 Dec 2019

    Also on ARM64, wmb()+writeX_relaxed() compared to writel() unnecessarily upgrades the dma_wmb() to a wmb(), as dma_wmb()==DMB(OSHST) is sufficient to flush the WCBs. I'm not sure if a write to the doorbell (UC Device memory) does an implicit wmb()==DSB(ST) anyway, as on x86 Intel. Any ARM expert here?.. (See the doorbell sketch further below.)

  9. 29 Dec 2019

    Having said that, I wonder if in these scenarios it's acceptable to just wmb()+writeX_relaxed() on the write to the doorbell, even though it executes an unnecessary SFENCE on Intel. Probably because it causes the implicit SFENCE on the write to UC to be much faster? This is all very weird... (3/3)

  10. 29 Dec 2019

    This applies to some NIC drivers I recently reviewed. They have a feature where the Tx descriptor is written to a PCI BAR mapped as WC (instead of to memory) to avoid one DMA read. Thus, only on AMD do they require wmb() before writing to the doorbell (UC). For example, the mlx4 BlueFlame feature. (2/3) (Sketch of the pattern after this thread.)

  11. 29 Dec 2019

    Encountered a strange x86 cache-coherency inconsistency: Intel guarantees to flush WCBs on reads/writes to UC memory, but AMD does so only for reads. If true, should Linux have a new flush_wcb_writeX() util that differs between CPU vendors? (1/3)

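    For context, a hedged sketch of the driver pattern this thread (and the ARM64 follow-up above) is about, loosely modeled on mlx4 BlueFlame but not actual mlx4 code: the Tx descriptor is copied into a WC-mapped BAR window and then a UC doorbell is rung; the open question is which barrier is really needed in between on Intel vs AMD vs ARM64.

```c
#include <linux/io.h>

/* Hedged sketch, not real driver code.
 * bf_reg: descriptor window in a PCI BAR mapped write-combining (ioremap_wc()).
 * db_reg: doorbell register mapped uncacheable (ioremap()). */
static void ring_tx_doorbell(void __iomem *bf_reg, void __iomem *db_reg,
			     const u64 *desc, int n_qwords, u32 db_val)
{
	int i;

	/* Write the descriptor into the WC window instead of into DMA memory,
	 * saving the device one DMA read of the descriptor. */
	for (i = 0; i < n_qwords; i++)
		__raw_writeq(desc[i], bf_reg + i * sizeof(u64));

	/* Flush the write-combining buffers before ringing the doorbell.
	 * The thread's question: Intel documents that the UC write below
	 * flushes WCBs anyway, AMD documents it only for UC reads, and on
	 * ARM64 dma_wmb() (DMB OSHST) would suffice and be cheaper than
	 * wmb() (DSB). */
	wmb();

	/* Ring the doorbell (UC). writel() would add its own barrier before
	 * the store; writel_relaxed() relies on the explicit wmb() above. */
	writel_relaxed(db_val, db_reg);
}
```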
  12. 17 Dec 2019

    Intel DSA: similar to Intel QuickData but with Scalable IOV support, CRC generate/test support & support for generating the delta of a memcmp. It's also the first time I see the new "posted write" interface (e.g. ENQCMD) being used, aimed at being a generic HW accelerator interface.

  13. 16 Dec 2019

    I also wonder how these entries are separated between "translation entries" and "paging-structure-cache entries". I didn't find this information specified anywhere. I do hope these numbers don't include the latter and that there are additional entries for that somewhere. :) (2/2)

  14. 16 Dec 2019

    Surprised by interesting numbers in the Intel Optimization Guide, section 2.5.5.2 L1 DCache. Turns out there are only 4 DTLB entries for 1GB pages! Crazy! So mapping all guest memory as 1GB pages in EPT may be less efficient? Worth benchmarking! (1/2)

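    A rough user-space sketch of the kind of benchmark meant above, as a host-side proxy for the EPT question. It assumes hugetlbfs pages of the chosen size have been reserved by the admin; the mmap flags are standard Linux ones, the working-set size and step count are arbitrary.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>
#include <linux/mman.h>   /* MAP_HUGE_2MB / MAP_HUGE_1GB */
#include <time.h>

#define WSET   (8UL << 30)   /* 8 GiB working set: far beyond 4 x 1 GiB DTLB entries */
#define STRIDE (2UL << 20)   /* touch one byte per 2 MiB region */
#define STEPS  (100UL * 1000 * 1000)

int main(int argc, char **argv)
{
    /* "./bench 1g" maps with 1 GiB pages, anything else with 2 MiB pages. */
    int huge = (argc > 1 && argv[1][0] == '1') ? MAP_HUGE_1GB : MAP_HUGE_2MB;
    uint8_t *mem = mmap(NULL, WSET, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | huge, -1, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }

    uint64_t x = 88172645463325252ULL, sum = 0;   /* xorshift64 state */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (uint64_t i = 0; i < STEPS; i++) {
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
        sum += mem[(x % (WSET / STRIDE)) * STRIDE];   /* pseudo-random region each step */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("sum=%llu elapsed=%.2fs\n", (unsigned long long)sum,
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}
```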
  15. 16 Dec 2019

    Given Intel DDIO provides the device with direct access to a limited set of LLC ways, I would also expect a non-temporal store instruction that not only writes directly to the LLC, but can be hinted to write to the DDIO-accessible LLC ways. E.g. to accelerate NIC/NVMe submissions. (3/3)

  16. 16 Dec 2019

    i.e. the Producer isn't expected to read the descriptors it writes to the submission queue. Thus, there is no need to load their cache lines into the Producer's L1/L2, which also hurts the Consumer's latency when reading them. Thoughts? (2/3)

  17. 16 Dec 2019

    Q: A Producer/Consumer ring is a common pattern for high-perf communication between 2 CPU cores, or between a CPU core & a device. Thus, I expected Intel to have a non-temporal store instruction that writes to the LLC without polluting L1/L2. Useful also with device DDIO. But MOVNT* also bypasses the LLC. Why? (1/3)

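    For reference, this is roughly what the existing non-temporal stores the thread refers to look like (my sketch, not from any driver): MOVNTDQ via _mm_stream_si128 keeps the descriptor out of the producer's L1/L2, but today it bypasses the LLC as well, which is exactly the complaint in (1/3).

```c
#include <immintrin.h>

/* Producer side of a ring: write one 64-byte descriptor with non-temporal
 * stores. dst must be 16-byte aligned (descriptor slots usually are).
 * These stores avoid polluting the producer's L1/L2, but current MOVNT*
 * semantics bypass the LLC too, instead of landing in (DDIO-reachable) LLC ways. */
static void write_desc_nontemporal(void *dst, const void *src)
{
    const __m128i *s = (const __m128i *)src;
    __m128i       *d = (__m128i *)dst;

    for (int i = 0; i < 4; i++)
        _mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));

    _mm_sfence();   /* order the NT stores before e.g. updating the ring tail */
}
```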
  18. 10 Dec 2019

    Attack on SGX: lower the CPU operating voltage via an undocumented MSR to cause complex instructions to produce wrong results. A malicious host can use this before ECALL to make the enclave's MUL & AES-NI instructions malfunction. Can lead to SGX leaking secrets to the host.

  19. 10 Dec 2019

    Nice and short blog post on when it's useful to use the C "restrict" keyword to limit the effects of pointer aliasing in order to aid compiler optimizations.

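    A standard example of the kind of case such posts cover (mine, not taken from the post): without restrict the compiler must assume dst and src may alias and stay conservative; with restrict it is free to keep values in registers and vectorize the loop.

```c
/* With "restrict" the compiler may assume dst and src never overlap, so it can
 * vectorize this loop and avoid re-checking memory after every store. Passing
 * overlapping pointers to restrict-qualified parameters is undefined behavior. */
void scale(float *restrict dst, const float *restrict src, float k, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```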
  20. 5 Dec 2019

    AWS Networking news: VPC traffic mirroring (+ LB integration), VPC ingress routing (Custom routes on IGW/VGW), multicast routing (Clone packet to vNICs group), VPC inter-region peering & accelerated site-to-site VPN (VPN to Edge -> Direct-Connect to AWS)

  21. 5 Dec 2019

    AWS Compute news: Nitro Enclaves, Graviton2, Inf1 instances, Compute Optimizer, Outpost + Rack-slot security-key holds PK, Local Zones (City-local EC2 servers), Wavelength (EC2 servers at 5G city aggregation center -> single-digit ms latency).

