Tweets by @Liran_Alon
Pinned tweet
https://youtu.be/Pc7F-n5278w - My talk at KVM Forum on the past year's improvements to KVM nested virtualization. Much to present in a very short time slot. Highly recommend viewing the slides appendix for many more technical details and nVMX mechanisms! (slides: https://events.linuxfoundation.org/wp-content/uploads/2017/12/Improving-KVM-x86-Nested-Virtualization-Liran-Alon-Oracle.pdf …)
I later saw this great talk: https://www.youtube.com/watch?v=Ii_pEXKKYUg … that explains how RISC may achieve CISC-like perf with clever micro-arch tricks. Main concepts are Macro-Fusion, in which the decoder generates a single uOp for multiple MacroOps, & CISC having both 2- and 4-byte instructions. https://twitter.com/Liran_Alon/status/1223215852146786305 …
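A toy model of the macro-fusion idea, as a sketch only. The instruction tuples and the `FUSIBLE_PAIRS` table are assumptions made up for illustration; a real decoder also checks register dependencies and many other constraints that this model ignores. It only shows how fusing adjacent macro-ops lowers the uOp count.

```python
from typing import List, Tuple

# Made-up representation: (opcode, operand, ...).
Instr = Tuple[str, ...]

# Hypothetical fusion rules: (first opcode, second opcode).
FUSIBLE_PAIRS = {("lui", "addi"), ("slli", "add"), ("auipc", "jalr")}


def decode(stream: List[Instr]) -> List[Tuple[Instr, ...]]:
    """Group macro-ops into uOps, greedily fusing adjacent fusible pairs."""
    uops: List[Tuple[Instr, ...]] = []
    i = 0
    while i < len(stream):
        has_next = i + 1 < len(stream)
        if has_next and (stream[i][0], stream[i + 1][0]) in FUSIBLE_PAIRS:
            uops.append((stream[i], stream[i + 1]))  # one fused uOp
            i += 2
        else:
            uops.append((stream[i],))  # one uOp per macro-op
            i += 1
    return uops


program: List[Instr] = [
    ("lui", "a0", "0x12345"),
    ("addi", "a0", "a0", "0x678"),
    ("slli", "a1", "a1", "2"),
    ("add", "a1", "a1", "a2"),
    ("lw", "a3", "0(a1)"),
]
uops = decode(program)
print(len(program), "macro-ops ->", len(uops), "uOps")
```

Here five macro-ops decode into three uOps, because two of the adjacent pairs match the assumed fusion table.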
RAP/XFG based on hash seems better. Interesting observation that 7% of indirect branches have >100 valid targets if hashing only the prototype. Thought: Add a compiler annotation for non-cross-module func_ptr & targets that makes the linker add a unique arg to the hash of these branch targets? (2/2)
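A rough model of that thought, not how RAP or XFG are actually implemented. The function table, the `tag` argument and the truncated SHA-256 used as a type hash are all assumptions for illustration. It shows how mixing a unique per-group value into the prototype hash shrinks the set of valid targets for a call site.

```python
import hashlib
from typing import Dict, List, Tuple


def type_hash(prototype: str, tag: str = "") -> int:
    """Stand-in 32-bit hash over the prototype plus an optional unique tag."""
    data = f"{prototype}|{tag}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "little")


# Hypothetical functions: name -> (prototype, annotation tag).
FUNCS: Dict[str, Tuple[str, str]] = {
    "net_open": ("int(void*)", ""),
    "blk_open": ("int(void*)", ""),
    "tty_open": ("int(void*)", ""),
    "drv_probe": ("int(void*)", "drv_ops"),
    "drv_remove": ("int(void*)", "drv_ops"),
    "irq_handler": ("void(int)", ""),
}


def valid_targets(prototype: str, tag: str = "", use_tags: bool = True) -> List[str]:
    """Names whose hash matches the call site's expected hash."""
    want = type_hash(prototype, tag if use_tags else "")
    return sorted(
        name
        for name, (proto, ftag) in FUNCS.items()
        if type_hash(proto, ftag if use_tags else "") == want
    )


def checked_call(target: str, prototype: str, tag: str = "") -> str:
    """Model of the check a call site performs before an indirect branch."""
    proto, ftag = FUNCS[target]
    if type_hash(proto, ftag) != type_hash(prototype, tag):
        raise RuntimeError(f"CFI violation: {target}")
    return f"called {target}"


print(valid_targets("int(void*)", use_tags=False))  # prototype-only hashing
print(valid_targets("int(void*)", "drv_ops"))       # prototype + unique tag
```

With prototype-only hashing every `int(void*)` function is a valid target; with the assumed tag only the annotated group remains.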
https://outflux.net/slides/2020/lca/cfi.pdf … @kees_cook Slides on Linux kernel CFI. Clang jmp-table based CFI seems quite bad as it adds many opcodes before the indirect branch, execs an additional jmp and requires global call-site visibility (e.g. doesn't work for cross-module branches). (1/2)
https://gamozolabs.github.io/metrology/2019/12/30/load-port-monitor.html … : Cool work using MLPDS to observe micro-arch usage of the Load Port. Idea: Signal sibling thread to do some op to observe, sleep X cycles & then observe Load Port stale data with MLPDS. Slowly enlarge X to sequence multiple loads done by a single observed op.
Awesome talk by @damageboy on how he re-implemented C# Array.Sort() using CPU vector operations (AVX2) to avoid the branch misprediction perf penalty, resulting in a significantly faster implementation. https://twitter.com/damageboy/status/1209153363620835330 …
http://infocenter.arm.com/help/topic/com.arm.doc.dai0274b/DAI0274B_migrating_from_ia32_to_arm.pdf … : Stumbled by accident on this nice (but very old) PDF that compares x86 and ARM architectures. Mostly from a userspace perspective but I still found it interesting.
Also on ARM64, wmb()+writeX_relaxed() compared to writel() will change dma_wmb() to wmb() unnecessarily, as dma_wmb()==DMB(OSHST) is sufficient to flush WCBs. I'm not sure if a write to doorbell (UC Device mem) does an implicit wmb()==DSB(ST) anyway as on x86 Intel. ARM expert here?..
Having said that, I wonder if in these scenarios it's good enough to just do wmb()+writeX_relaxed() on write to doorbell even though it execs an unnecessary SFENCE on Intel, because it probably makes the implicit SFENCE on write to UC much faster? This is all very weird... (3/3)
This applies to some NIC drivers I recently reviewed. They have a feature where the Tx desc is written to a PCI BAR mapped as WC (instead of to mem) to avoid one DMA read. Thus, only on AMD do they require wmb() before writing to doorbell (UC). For example, the mlx4 BlueFlame feature. (2/3)
Encountered a strange x86 cache-coherency inconsistency: Intel guarantees to flush WCBs on read/write to UC mem but @AMD does so only for read. If true, should Linux have a new flush_wcb_writeX() util that differs between CPU vendors? (1/3) @fagiolinux @_msw_ @DanielMarcovit3 @_AlexGraf pic.twitter.com/kT3Tw6VBKz
https://01.org/blogs/2019/introducing-intel-data-streaming-accelerator … Intel DSA: Similar to Intel QuickData but with ScalableIOV support, generate/test CRC support & generate delta of memcmp support. It's also the first time I see the new "posted write" interface being used (e.g. ENQCMD), aimed to be a generic HW accelerator interface.
I also wonder how these entries are separated between "translation entries" and "paging-structure-cache entries". I didn't find this information specified anywhere. I do hope these numbers don't include the latter and that there are additional entries for that somewhere. :) (2/2)
Surprised by interesting numbers in Intel Optimisation Guide section 2.5.5.2 L1 DCache. Turns out there are only 4 DTLB entries for 1GB pages! Crazy! So mapping all guest memory as 1GB pages in EPT may be less efficient? Worth benchmarking! @_msw_ @Karim_Allah @rsinghal1 (1/2) pic.twitter.com/wmlyFJBp8z
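To put the 4-entry figure in perspective, a back-of-the-envelope sketch of TLB reach (entries times page size). Only the 4 entries for 1GB pages come from the tweet; the 64 GiB guest size is a made-up example, and the model ignores how combined guest/EPT translations are actually cached.

```python
GIB = 1 << 30


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space covered when every entry maps a distinct page."""
    return entries * page_size


def covered_fraction(entries: int, page_size: int, working_set: int) -> float:
    """Share of a working set that fits within the TLB reach (capped at 1.0)."""
    return min(1.0, tlb_reach(entries, page_size) / working_set)


reach = tlb_reach(4, GIB)  # 4 DTLB entries for 1GB pages, per the tweet
guest_ram = 64 * GIB       # hypothetical guest size, for illustration only

print(f"1GB-page DTLB reach: {reach // GIB} GiB")
print(f"share of a {guest_ram // GIB} GiB guest covered: "
      f"{covered_fraction(4, GIB, guest_ram):.4f}")
```

Whether this matters in practice depends on the access pattern, which is why the tweet's suggestion to benchmark is the right next step.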
Given Intel DDIO provides a device with direct access to a limited set of LLC ways, I would also expect to have a non-temporal store instruction that not only writes directly to LLC, but can be hinted to write to DDIO-accessible LLC ways. E.g. to accelerate NIC/NVMe submissions. (3/3)
i.e. Producer isn't expected to read descriptors it writes to the submission queue. Thus, no need to load their cache-lines into producer's L1/L2, which also hurts Consumer latency when reading them. Thoughts? (2/3)
Q: Producer/Consumer ring is a common pattern for high perf comm between 2 CPU cores or CPU core & device. Thus, I expected Intel to have a non-temporal store instruction that writes to LLC without polluting L1/L2. Useful also with device DDIO. But MOVNT* also bypasses LLC. Why? (1/3)
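For readers unfamiliar with the pattern, a minimal single-producer/single-consumer ring, as a sketch. The `Ring` class and its field names are hypothetical, and this single-threaded model has no memory barriers or cache behaviour. The point it illustrates is the access pattern: the producer only ever writes slots and `head`, and never reads a descriptor back.

```python
from typing import Any, List, Optional


class Ring:
    """Toy SPSC ring; size must be a power of two."""

    def __init__(self, size: int) -> None:
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.slots: List[Optional[Any]] = [None] * size
        self.mask = size - 1
        self.head = 0  # advanced only by the producer
        self.tail = 0  # advanced only by the consumer

    def push(self, desc: Any) -> bool:
        """Producer side: write-only access to the slot."""
        if self.head - self.tail == len(self.slots):
            return False  # ring is full
        self.slots[self.head & self.mask] = desc
        self.head += 1  # publish the descriptor
        return True

    def pop(self) -> Optional[Any]:
        """Consumer side: the only reader of slot contents."""
        if self.tail == self.head:
            return None  # ring is empty
        desc = self.slots[self.tail & self.mask]
        self.tail += 1
        return desc


ring = Ring(4)
accepted = [ring.push(f"desc{i}") for i in range(5)]
drained = [ring.pop() for _ in range(4)]
print(accepted, drained)
```

Because slot contents flow one way, from producer to consumer, caching them near the producer buys nothing, which is the motivation behind the question above.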
https://plundervolt.com/doc/plundervolt.pdf … Attack on SGX: Lower operating voltage of CPU via undocumented MSR to cause complex inst to produce wrong results. Malicious host can use this before ECALL to cause enclave's MUL & AES-NI inst to malfunction. Can lead to SGX leaking secrets to host.
https://nullprogram.com/blog/2019/12/09/ … Nice and short blog post on when it's useful to use C "restrict" keyword to limit effects of pointer aliasing in order to aid compiler optimizations.
https://youtu.be/xLbx-iqjZxk AWS Networking news: VPC traffic mirroring (+ LB integration), VPC ingress routing (Custom routes on IGW/VGW), multicast routing (Clone packet to vNICs group), VPC inter-region peering & accelerated site-to-site VPN (VPN to Edge -> Direct-Connect to AWS)
https://www.youtube.com/watch?v=Cqa1BHos1sg … AWS Compute news: Nitro Enclaves, Graviton2, Inf1 instances, Compute Optimizer, Outpost + Rack-slot security-key holds PK, Local Zones (City-local EC2 servers), Wavelength (EC2 servers at 5G city aggregation center -> single-digit ms latency).