Tweets by @Liran_Alon

  1. Pinned tweet
    14 Nov 2018

    - My talk at KVM Forum on the past year's improvements to KVM nested virtualization. Much to present in a very short time slot. I highly recommend viewing the slides appendix for many more technical details and nVMX mechanisms! (slides: )

  2. 31 Jan

    I later saw this great talk: that explains how RISC may achieve CISC-like perf with clever micro-arch tricks. The main concepts are Macro-Fusion, in which the decoder generates a single uOp for multiple MacroOps, & CISC having both 2- and 4-byte instructions.

  3. 18 Jan

    RAP/XFG based on hash seems better. Interesting observation that 7% of indirect branches have >100 valid targets if only the prototype is hashed. Thought: add a compiler annotation for non-cross-module func_ptr & targets that makes the linker add a unique arg to the hash of this branch's targets? (2/2)

  4. 18 Jan

    Slides on Linux kernel CFI. Clang jmp-table based CFI seems quite bad, as it adds many opcodes before the indirect branch, executes an additional jmp and requires global call-site visibility (e.g. doesn't work for cross-module branches). (1/2)

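
A minimal sketch of the prototype-hash idea from the thread above, emulated in portable C. Real RAP/XFG-style schemes keep the tag in the code stream next to the function; the `struct tagged_fn` wrapper, the hash constant, the salt and `checked_call` are all hypothetical stand-ins chosen for illustration, not any compiler's actual output.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int (*binop_fn)(int, int);

/* Stand-in for "hash stored next to the function" (assumption: a struct
 * is used only so the example stays in portable C). */
struct tagged_fn {
    uint32_t tag;
    binop_fn fn;
};

/* Made-up constant standing in for hash("int (int, int)"). */
#define PROTO_HASH_BINOP 0x5bd1e995u

/* Hypothetical per-pointer-class salt, as the tweet's annotation might add. */
#define SALT_MATH_OPS 0x1u

static int add(int a, int b) { return a + b; }
static int mul(int a, int b) { return a * b; }

/* Prototype-only tagging: every int(int, int) function shares one tag,
 * so each of them is a valid target for any such call site. */
static const struct tagged_fn plain_add = { PROTO_HASH_BINOP, add };
static const struct tagged_fn plain_mul = { PROTO_HASH_BINOP, mul };

/* Salted tagging: only functions annotated for this pointer class match. */
static const struct tagged_fn salted_add = { PROTO_HASH_BINOP ^ SALT_MATH_OPS, add };

/* Call site check: compare the target's tag with the expected one before
 * the indirect call. A real scheme would trap instead of returning false. */
static bool checked_call(const struct tagged_fn *t, uint32_t expected,
                         int a, int b, int *out)
{
    if (t->tag != expected)
        return false;
    *out = t->fn(a, b);
    return true;
}
```

With prototype-only tags, both `plain_add` and `plain_mul` pass the same check; a call site expecting `PROTO_HASH_BINOP ^ SALT_MATH_OPS` accepts `salted_add` and rejects `plain_mul`, which is the narrowing of the target set the tweet is asking about.
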
  5. 5 Jan

    : Cool work using MLPDS to observe micro-arch usage of the Load Port. Idea: signal the sibling thread to do some op to observe, sleep X cycles & then observe Load Port stale data with MLPDS. Slowly enlarge X to sequence multiple loads done by a single observed op.

  6. 30 Dec 2019

    Awesome talk by on how he re-implemented C# Array.Sort() using CPU vector operations (AVX2) to avoid the branch misprediction perf penalty, resulting in a significantly faster implementation.

  7. 30 Dec 2019

    : Stumbled by accident upon this nice (but very old) PDF that compares the x86 and ARM architectures. Mostly from a userspace perspective, but I still found it interesting.

  8. 29 Dec 2019

    Also on ARM64, wmb()+writeX_relaxed() compared to writel() will change dma_wmb() to wmb() unnecessarily, as dma_wmb()==DMB(OSHST) is sufficient to flush WCBs. I'm not sure if a write to the doorbell (UC Device mem) does an implicit wmb()==DSB(ST) anyway, as on x86 Intel. ARM expert here?..

  9. 29 Dec 2019

    Having said that, I wonder if in these scenarios it's good enough to just wmb()+writeX_relaxed() on the doorbell write, even though it executes an unnecessary SFENCE on Intel, because it probably makes the implicit SFENCE on the write to UC much faster? This is all very weird... (3/3)

  10. 29 Dec 2019

    This applies to some NIC drivers I recently reviewed. They have a feature where the Tx desc is written to a PCI BAR mapped as WC (instead of to mem) to avoid one DMA read. Thus, only on AMD do they require wmb() before writing to the doorbell (UC). For example, the mlx4 BlueFlame feature. (2/3)

  11. 29 Dec 2019

    Encountered a strange x86 cache-coherency inconsistency: Intel guarantees to flush WCBs on read/write of UC mem but AMD does so only for read. If true, should Linux have a new flush_wcb_writeX() util that differs between CPU vendors? (1/3)

  12. 17 Dec 2019

    Intel DSA: Similar to Intel QuickData but with ScalableIOV support, CRC generate/test support & memcmp delta generation support. It's also the first time I see the new "posted write" interface being used (e.g. ENQCMD), aimed to be a generic HW accelerator interface.

  13. 16 Dec 2019

    I also wonder how these entries are split between "translation entries" and "paging-structure-cache entries". I didn't find this specified anywhere. I do hope these numbers don't include the latter and that there are additional entries for that somewhere. :) (2/2)

  14. 16 Dec 2019

    Surprised by interesting numbers in Intel Optimisation Guide section 2.5.5.2 L1 DCache. Turns out there are only 4 DTLB entries for 1GB pages! Crazy! So mapping all guest memory as 1GB pages in EPT may be less efficient? Worth benchmarking! (1/2)

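
A back-of-the-envelope sketch of why 4 entries stands out: TLB reach is simply the number of entries times the page size. The helper name `tlb_reach_bytes` is hypothetical, and the only figure taken from the thread is the count of 4 DTLB entries for 1GB pages.

```c
#include <stdint.h>

/* TLB reach: how much memory the entries can map at once, i.e. how large
 * a working set can be touched without missing in this TLB level. */
static uint64_t tlb_reach_bytes(uint64_t entries, uint64_t page_size)
{
    return entries * page_size;
}
```

With the tweet's figure, `tlb_reach_bytes(4, 1ULL << 30)` is 4 GiB, so a guest whose hot working set is spread over more than 4 GiB could still miss in this DTLB level even when all of its memory is mapped with 1GB EPT pages. Whether that costs more than it saves is what the suggested benchmark would have to show.
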
  15. 16 Dec 2019

    Given Intel DDIO provides a device with direct access to a limited set of LLC ways, I would also expect a non-temporal store instruction that not only writes directly to LLC, but can be hinted to write to DDIO-accessible LLC ways. E.g. to accelerate NIC/NVMe submissions. (3/3)

  16. 16 Dec 2019

    i.e. the producer isn't expected to read the descriptors it writes to the submission queue. Thus, there's no need to load their cache-lines into the producer's L1/L2, which also hurts consumer latency when reading them. Thoughts? (2/3)

  17. 16 Dec 2019

    Q: Producer/Consumer ring is a common pattern for high perf comm between 2 CPU cores or a CPU core & a device. Thus, I expected Intel to have a non-temporal store instruction that writes to LLC without polluting L1/L2. Useful also with device DDIO. But MOVNT* also bypasses LLC. Why? (1/3)

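
For readers unfamiliar with the pattern the thread refers to, here is a minimal single-producer/single-consumer ring sketched with C11 atomics. The descriptor layout, `RING_SIZE` and the function names are assumptions made up for this example. The descriptor write in `ring_push` is an ordinary store, which is exactly the store that pulls the cache line into the producer's L1/L2 and that the thread wishes could target the LLC instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u  /* hypothetical capacity; must be a power of two */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

/* Hypothetical descriptor, loosely modelled on a NIC/NVMe submission entry. */
struct desc {
    uint64_t addr;
    uint32_t len;
    uint32_t flags;
};

struct spsc_ring {
    struct desc slots[RING_SIZE];
    _Atomic uint32_t head;  /* next slot the producer will write */
    _Atomic uint32_t tail;  /* next slot the consumer will read */
};

/* Producer side: returns false when the ring is full. */
static bool ring_push(struct spsc_ring *r, struct desc d)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;
    /* Plain store: the line is allocated in the producer's caches. */
    r->slots[head & (RING_SIZE - 1)] = d;
    /* Release: publish the descriptor before the new head becomes visible. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_pop(struct spsc_ring *r, struct desc *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[tail & (RING_SIZE - 1)];
    /* Release: hand the slot back only after the descriptor has been copied. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

The indices are free-running 32-bit counters masked on use, so full and empty are distinguished without sacrificing a slot. This sketch covers only the core-to-core case; a device consumer would additionally need the doorbell and barrier handling discussed elsewhere in this timeline.
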
  18. 10 Dec 2019

    Attack on SGX: Lower the CPU's operating voltage via an undocumented MSR to cause complex inst to produce wrong results. A malicious host can use this before ECALL to cause the enclave's MUL & AES-NI inst to malfunction. Can lead to SGX leaking secrets to host.

  19. 10 Dec 2019

    Nice and short blog post on when it's useful to use the C "restrict" keyword to limit the effects of pointer aliasing in order to aid compiler optimizations.

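
A small sketch of the kind of case such a post typically covers; the example is not taken from the linked post, and the function names are made up. Without `restrict`, the compiler must assume `val` may alias `a`, so it has to reload `*val` after the first store. With `restrict`, the caller promises there is no overlap and the value can be loaded once.

```c
/* No restrict: *val may alias *a, so *val is re-read after *a is updated. */
void add_twice(int *a, int *b, const int *val)
{
    *a += *val;
    *b += *val;
}

/* restrict: the caller promises the three pointers do not overlap, which
 * lets the compiler load *val once. Passing aliasing pointers here is
 * undefined behaviour. */
void add_twice_restrict(int *restrict a, int *restrict b,
                        const int *restrict val)
{
    *a += *val;
    *b += *val;
}
```

Calling `add_twice(&x, &y, &x)` is legal and the second addition sees the updated `x`; the same call to `add_twice_restrict` would break the promise, which is the price paid for the optimization.
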
  20. 5 Dec 2019

    AWS Networking news: VPC traffic mirroring (+ LB integration), VPC ingress routing (Custom routes on IGW/VGW), multicast routing (Clone packet to vNICs group), VPC inter-region peering & accelerated site-to-site VPN (VPN to Edge -> Direct-Connect to AWS)

  21. 5 Dec 2019

    AWS Compute news: Nitro Enclaves, Graviton2, Inf1 instances, Compute Optimizer, Outpost + Rack-slot security-key holds PK, Local Zones (City-local EC2 servers), Wavelength (EC2 servers at 5G city aggregation center -> single-digit ms latency).
