Revisions
Eliot Eshelman revised this gist
Jul 2, 2018 . 1 changed file with 1 addition and 1 deletion.
@@ -68,7 +68,7 @@ Assumes a GPU clock frequency of 1GHz (NVIDIA Tesla GPUs range from 0.8~1.4GHz).
"Local" and "Remote" cache/memory values are from dual-socket Intel Xeon. Larger SMP systems have more hops.
GPU NVLink connections are not always 40GB. They range from 20GB to 150GB, depending upon the server platform design.

Credit
Eliot Eshelman revised this gist
Dec 30, 2016 . 1 changed file with 3 additions and 0 deletions.
@@ -30,6 +30,8 @@ Read 1MB sequentially from disk 5,000,000 ns 5,000 us 5 ms ~200MB/
Random Disk Access (seek+rotation)       10,000,000 ns   10,000 us   10 ms
Send packet CA->Netherlands->CA         150,000,000 ns  150,000 us  150 ms

Total CPU pipeline length?

NVIDIA Tesla GPU values
-----------------------
@@ -42,6 +44,7 @@ Transfer 1MB to/from PCI-E GPU 80,000 ns 80 us ~12GB/s
Floating-point add/mult operation?
Shift operation?
Atomic operation in GPU Global Memory?
Total GPU pipeline length?
Launch CUDA kernel (via dynamic parallelism)?
Eliot Eshelman revised this gist
Dec 30, 2016 . 1 changed file with 9 additions and 2 deletions.
@@ -9,7 +9,7 @@ L3 cache hit (shared line in another core) 25 ns 65 cycl
Mutex lock/unlock                                25 ns
L3 cache hit (modified in another core)          29 ns   75 cycles
L3 cache hit (on a remote CPU socket)            40 ns   100 ~ 300 cycles (40 ~ 116 ns)
QPI hop to another CPU (time per hop)            40 ns
64MB main memory reference (local CPU)           46 ns   TinyMemBench on "Broadwell" E5-2690v4
64MB main memory reference (remote CPU)          70 ns   TinyMemBench on "Broadwell" E5-2690v4
256MB main memory reference (local CPU)          75 ns   TinyMemBench on "Broadwell" E5-2690v4
@@ -35,10 +35,15 @@ NVIDIA Tesla GPU values
-----------------------
GPU Shared Memory access                         30 ns           30~90 cycles (bank conflicts will introduce more latency)
GPU Global Memory access                        200 ns           200~800 cycles, depending upon GPU generation and access patterns
Launch CUDA kernel on GPU                    10,000 ns   10 us   Host CPU instructs GPU to start executing a kernel
Transfer 1MB to/from NVLink GPU              30,000 ns   30 us   ~33GB/sec on NVIDIA 40GB NVLink
Transfer 1MB to/from PCI-E GPU               80,000 ns   80 us   ~12GB/sec on PCI-Express x16 link

Floating-point add/mult operation?
Shift operation?
Atomic operation in GPU Global Memory?
Launch CUDA kernel (via dynamic parallelism)?

Intel Xeon CPU values
---------------------
@@ -58,6 +63,8 @@ Notes
Assumes a CPU clock frequency of 2.6GHz (common for Xeon server CPUs). That's ~0.385ns per clock cycle.
Assumes a GPU clock frequency of 1GHz (NVIDIA Tesla GPUs range from 0.8~1.4GHz). That's 1ns per clock cycle.
"Local" and "Remote" cache/memory values are from dual-socket Intel Xeon. Larger SMP systems have more hops.
GPU NVLink connections are not always 40GB. They range from 20GB to 80GB, depending upon the server platform design.
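The cycle counts and nanosecond figures in this revision are linked by the clock rates assumed in the Notes (2.6GHz for the CPU, 1GHz for the GPU). The conversion can be sketched as follows; the function name `cycles_to_ns` and the constants are hypothetical, chosen only for illustration, and real hardware clocks vary with turbo and power states.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    # One cycle lasts 1/f seconds; with f expressed in GHz that is 1/f ns.
    return cycles / clock_ghz


CPU_GHZ = 2.6  # assumed Xeon clock, taken from the Notes
GPU_GHZ = 1.0  # assumed Tesla clock, taken from the Notes

if __name__ == "__main__":
    rows = [
        ("L1 cache hit", 4, CPU_GHZ),
        ("L3 cache hit (unshared cache line)", 42, CPU_GHZ),
        ("L3 cache hit (modified in another core)", 75, CPU_GHZ),
        ("GPU Shared Memory access", 30, GPU_GHZ),
    ]
    for label, cycles, ghz in rows:
        print(f"{label:<42}{cycles_to_ns(cycles, ghz):6.1f} ns")
```

At 2.6GHz a single cycle comes out to roughly 0.385 ns, so 4 cycles is about 1.5 ns and 42 cycles is about 16 ns, in line with the table entries.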
Eliot Eshelman revised this gist
Dec 30, 2016 . 1 changed file with 13 additions and 5 deletions.
@@ -1,13 +1,19 @@
Latency Comparison Numbers
--------------------------
L1 cache reference/hit                          1.5 ns          4 cycles
Floating-point add/mult/FMA operation           1.5 ns          4 cycles
L2 cache reference/hit                            5 ns          12 ~ 17 cycles
Branch mispredict                                 6 ns          15 ~ 20 cycles
L3 cache hit (unshared cache line)               16 ns          42 cycles
L3 cache hit (shared line in another core)       25 ns          65 cycles
Mutex lock/unlock                                25 ns
L3 cache hit (modified in another core)          29 ns          75 cycles
L3 cache hit (on a remote CPU socket)            40 ns          100 ~ 300 cycles (40 ~ 116 ns)
QPI hop to another CPU (time per hop)          . 40 . ns
64MB main memory reference (local CPU)           46 ns          TinyMemBench on "Broadwell" E5-2690v4
64MB main memory reference (remote CPU)          70 ns          TinyMemBench on "Broadwell" E5-2690v4
256MB main memory reference (local CPU)          75 ns          TinyMemBench on "Broadwell" E5-2690v4
256MB main memory reference (remote CPU)        120 ns          TinyMemBench on "Broadwell" E5-2690v4
Send 4KB over 100 Gbps HPC fabric             1,040 ns   1 us   MVAPICH2 over Intel Omni-Path / Mellanox EDR
Compress 1KB with Google Snappy               3,000 ns   3 us
Send 4KB over 10 Gbps ethernet               10,000 ns  10 us
@@ -65,6 +71,8 @@ Additional Data Gathered/Correlated from:
-----------------------------------------
Memory latency tool:         https://github.com/ssvb/tinymembench
CPU data from Agner Fog:     http://www.agner.org/optimize/
CPU cache and QPI data:      https://mechanical-sympathy.blogspot.com/2013/02/cpu-cache-flushing-fallacy.html
Intel performance analysis:  https://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf
Intel Broadwell CPU data:    http://users.atw.hu/instlatx64/GenuineIntel00306D4_Broadwell2_NewMemLat.txt
Intel SkyLake CPU data:      http://www.7-cpu.com/cpu/Skylake.html
MVAPICH2 fabric testing:     http://mug.mvapich.cse.ohio-state.edu/static/media/mug/presentations/2016/DK_Status_and_Roadmap_MUG16.pdf
Eliot Eshelman revised this gist
Dec 30, 2016 . 1 changed file with 1 addition and 1 deletion.
@@ -21,7 +21,7 @@ Read 4KB randomly from SATA SSD 500,000 ns 500 us DC S351
Round trip within same datacenter           500,000 ns      500 us           One-way ping across Ethernet is ~250us
Read 1MB sequentially from SATA SSD       1,818,000 ns    1,818 us    2 ms   ~550MB/sec DC S3510 SATA SSD
Read 1MB sequentially from disk           5,000,000 ns    5,000 us    5 ms   ~200MB/sec server hard disk (seek time would be additional latency)
Random Disk Access (seek+rotation)       10,000,000 ns   10,000 us   10 ms
Send packet CA->Netherlands->CA         150,000,000 ns  150,000 us  150 ms
Eliot Eshelman revised this gist
Dec 29, 2016 . 1 changed file with 8 additions and 4 deletions.
@@ -34,10 +34,13 @@ Transfer 1MB to/from NVLink GPU 30,000 ns 30 us ~33GB/s
Transfer 1MB to/from PCI-E GPU               80,000 ns   80 us   ~12GB/sec on PCI-Express x16 link

Intel Xeon CPU values
---------------------
Wake up from C1 state                           500 ns           varies from <0.5us to 2us
Wake up from C3 state                        15,000 ns   15 us   varies from 10us to 50us
Wake up from C6 state                        30,000 ns   30 us   varies from 20us to 60us
Warm up Intel SkyLake AVX units              14,000 ns   14 us   AVX units go to sleep after ~675 us

Notes
@@ -69,4 +72,5 @@ NVMe SSD: http://www.intel.com/content/dam/www/public/us/en/do
SATA SSD:                  http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ssd-dc-s3510-spec.pdf
GPU optimization:          https://www.olcf.ornl.gov/wp-content/uploads/2013/02/GPU_Opt_Fund-CW1.pdf
CPU/GPU data locality:     https://people.maths.ox.ac.uk/gilesm/cuda/lecs/lecs.pdf
GPU Memory Hierarchy:      https://arxiv.org/pdf/1509.02308&ved...qHEz78QnmcIVCSXvg&sig2=IdzxfrzQgNv8yq7e1mkeVg
Intel Xeon C-state data:   http://ena-hpc.org/2014/pdf/paper_06.pdf
Eliot Eshelman revised this gist
Dec 28, 2016 . 1 changed file with 1 addition and 1 deletion.
@@ -55,7 +55,7 @@ GPU NVLink connections are not always 40GB. They range from 20GB to 80GB, depend
Credit
------
Adapted from: https://gist.github.com/jboner/2841832
Original curator: http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers

Additional Data Gathered/Correlated from:
Eliot Eshelman revised this gist
Dec 28, 2016 . 1 changed file with 1 addition and 1 deletion.
@@ -13,7 +13,7 @@ Compress 1KB with Google Snappy 3,000 ns 3 us
Send 4KB over 10 Gbps ethernet               10,000 ns   10 us
Write 4KB randomly to NVMe SSD               30,000 ns   30 us   DC P3608 NVMe SSD (best case; QOS 99% is 500us)
Transfer 1MB to/from NVLink GPU              30,000 ns   30 us   ~33GB/sec on NVIDIA 40GB NVLink
Transfer 1MB to/from PCI-E GPU               80,000 ns   80 us   ~12GB/sec on PCI-Express x16 gen 3.0 link
Read 4KB randomly from NVMe SSD             120,000 ns  120 us   DC P3608 NVMe SSD (QOS 99%)
Read 1MB sequentially from NVMe SSD         208,000 ns  208 us   ~4.8GB/sec DC P3608 NVMe SSD
Write 4KB randomly to SATA SSD              500,000 ns  500 us   DC S3510 SATA SSD (QOS 99.9%)
Eliot Eshelman revised this gist
Dec 28, 2016 . 1 changed file with 3 additions and 2 deletions.
@@ -18,7 +18,7 @@ Read 4KB randomly from NVMe SSD 120,000 ns 120 us DC P360
Read 1MB sequentially from NVMe SSD         208,000 ns      208 us          ~4.8GB/sec DC P3608 NVMe SSD
Write 4KB randomly to SATA SSD              500,000 ns      500 us          DC S3510 SATA SSD (QOS 99.9%)
Read 4KB randomly from SATA SSD             500,000 ns      500 us          DC S3510 SATA SSD (QOS 99.9%)
Round trip within same datacenter           500,000 ns      500 us          One-way ping across Ethernet is ~250us
Read 1MB sequentially from SATA SSD       1,818,000 ns    1,818 us   2 ms   ~550MB/sec DC S3510 SATA SSD
Read 1MB sequentially from disk           5,000,000 ns    5,000 us   5 ms   ~200MB/sec server hard disk (seek time would be additional latency)
Disk Access (seek + rotation time)       10,000,000 ns   10,000 us  10 ms
@@ -54,11 +54,12 @@ GPU NVLink connections are not always 40GB. They range from 20GB to 80GB, depend
Credit
------
Adapted from: https://gist.github.com/jboner/2841832
Curated by Jeff Dean: http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers

Additional Data Gathered/Correlated from:
-----------------------------------------
Memory latency tool:         https://github.com/ssvb/tinymembench
CPU data from Agner Fog:     http://www.agner.org/optimize/
Intel Broadwell CPU data:    http://users.atw.hu/instlatx64/GenuineIntel00306D4_Broadwell2_NewMemLat.txt
Eliot Eshelman revised this gist
Dec 27, 2016 . 1 changed file with 23 additions and 2 deletions.
@@ -12,27 +12,45 @@ Send 4KB over 100 Gbps HPC fabric 1,040 ns 1 us MVAPICH
Compress 1KB with Google Snappy               3,000 ns        3 us
Send 4KB over 10 Gbps ethernet               10,000 ns       10 us
Write 4KB randomly to NVMe SSD               30,000 ns       30 us           DC P3608 NVMe SSD (best case; QOS 99% is 500us)
Transfer 1MB to/from NVLink GPU              30,000 ns       30 us           ~33GB/sec on NVIDIA 40GB NVLink
Transfer 1MB to/from PCI-E GPU               80,000 ns       80 us           ~12GB/sec on PCI-Express x16 link
Read 4KB randomly from NVMe SSD             120,000 ns      120 us           DC P3608 NVMe SSD (QOS 99%)
Read 1MB sequentially from NVMe SSD         208,000 ns      208 us           ~4.8GB/sec DC P3608 NVMe SSD
Write 4KB randomly to SATA SSD              500,000 ns      500 us           DC S3510 SATA SSD (QOS 99.9%)
Read 4KB randomly from SATA SSD             500,000 ns      500 us           DC S3510 SATA SSD (QOS 99.9%)
Round trip within same datacenter           500,000 ns      500 us
Read 1MB sequentially from SATA SSD       1,818,000 ns    1,818 us    2 ms   ~550MB/sec DC S3510 SATA SSD
Read 1MB sequentially from disk           5,000,000 ns    5,000 us    5 ms   ~200MB/sec server hard disk (seek time would be additional latency)
Disk Access (seek + rotation time)       10,000,000 ns   10,000 us   10 ms
Send packet CA->Netherlands->CA         150,000,000 ns  150,000 us  150 ms

NVIDIA Tesla GPU values
-----------------------
GPU Shared Memory access                         30 ns           30~90 cycles (bank conflicts will introduce more latency)
GPU Global Memory access                        200 ns           200~800 cycles, depending upon GPU generation and access patterns
Launch CUDA kernel on GPU                    10,000 ns   10 us
Transfer 1MB to/from NVLink GPU              30,000 ns   30 us   ~33GB/sec on NVIDIA 40GB NVLink
Transfer 1MB to/from PCI-E GPU               80,000 ns   80 us   ~12GB/sec on PCI-Express x16 link

Other useful values
------------------
Warm up Intel SkyLake AVX units              14,000 ns   14 us   AVX units go to sleep after ~675 us
Timings of C-state changes?

Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

Assumes a CPU clock frequency of 2.6GHz (common for Xeon server CPUs). That's ~0.385ns per clock cycle.
Assumes a GPU clock frequency of 1GHz (NVIDIA Tesla GPUs range from 0.8~1.4GHz). That's 1ns per clock cycle.
GPU NVLink connections are not always 40GB. They range from 20GB to 80GB, depending upon the server platform design.

Credit
------
@@ -47,4 +65,7 @@ Intel Broadwell CPU data: http://users.atw.hu/instlatx64/GenuineIntel00306D4_B
Intel SkyLake CPU data:    http://www.7-cpu.com/cpu/Skylake.html
MVAPICH2 fabric testing:   http://mug.mvapich.cse.ohio-state.edu/static/media/mug/presentations/2016/DK_Status_and_Roadmap_MUG16.pdf
NVMe SSD:                  http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ssd-dc-p3608-spec.pdf
SATA SSD:                  http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ssd-dc-s3510-spec.pdf
GPU optimization:          https://www.olcf.ornl.gov/wp-content/uploads/2013/02/GPU_Opt_Fund-CW1.pdf
CPU/GPU data locality:     https://people.maths.ox.ac.uk/gilesm/cuda/lecs/lecs.pdf
GPU Memory Hierarchy:      https://arxiv.org/pdf/1509.02308&ved...qHEz78QnmcIVCSXvg&sig2=IdzxfrzQgNv8yq7e1mkeVg
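The "Transfer 1MB" and "Read 1MB sequentially" rows in this revision appear to follow directly from the bandwidth quoted in their notes column. A minimal sketch of that arithmetic, assuming decimal units (1MB = 10^6 bytes, 1GB = 10^9 bytes) and ignoring any fixed per-transfer latency; the function name `transfer_time_us` is hypothetical:

```python
MB = 1_000_000       # assumption: decimal megabyte
GB = 1_000_000_000   # assumption: decimal gigabyte


def transfer_time_us(size_bytes: float, bandwidth_bytes_per_sec: float) -> float:
    """Time in microseconds to move size_bytes at a sustained bandwidth."""
    # time = size / bandwidth, then scale seconds to microseconds.
    return size_bytes / bandwidth_bytes_per_sec * 1e6


if __name__ == "__main__":
    links = [
        ("NVLink GPU (~33GB/sec)", 33 * GB),
        ("PCI-E GPU (~12GB/sec)", 12 * GB),
        ("NVMe SSD (~4.8GB/sec)", 4.8 * GB),
        ("SATA SSD (~550MB/sec)", 550 * MB),
        ("Hard disk (~200MB/sec)", 200 * MB),
    ]
    for label, bandwidth in links:
        print(f"{label:<26}{transfer_time_us(1 * MB, bandwidth):10,.0f} us")
```

Under these assumptions the SSD and disk rows are reproduced closely (about 208 us, 1,818 us and 5,000 us), while the GPU rows come out near 30 us and 83 us, which the table rounds to 30 us and 80 us.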
Eliot Eshelman revised this gist
Dec 27, 2016 . 1 changed file with 12 additions and 15 deletions.
@@ -8,21 +8,21 @@ L3 cache reference 16 ns 42 cycl
Mutex lock/unlock                                25 ns
64MB main memory reference                       46 ns                       TinyMemBench on "Broadwell" E5-2690v4
256MB main memory reference                      75 ns                       TinyMemBench on "Broadwell" E5-2690v4
Send 4KB over 100 Gbps HPC fabric             1,040 ns        1 us           MVAPICH2 over Intel Omni-Path / Mellanox EDR
Compress 1KB with Google Snappy               3,000 ns        3 us
Send 4KB over 10 Gbps ethernet               10,000 ns       10 us
Write 4KB randomly to NVMe SSD               30,000 ns       30 us           DC P3608 NVMe SSD (best case; QOS 99% is 500us)
Read 4KB randomly from NVMe SSD             120,000 ns      120 us           DC P3608 NVMe SSD (QOS 99%)
Read 1MB sequentially from NVMe SSD         208,000 ns      208 us           ~4.8GB/sec DC P3608 NVMe SSD
Write 4KB randomly to SATA SSD              500,000 ns      500 us           DC S3510 SATA SSD (QOS 99.9%)
Read 4KB randomly from SATA SSD             500,000 ns      500 us           DC S3510 SATA SSD (QOS 99.9%)
Round trip within same datacenter           500,000 ns      500 us
Read 1MB sequentially from SATA SSD       1,818,000 ns    1,818 us    2 ms   ~550MB/sec DC S3510 SATA SSD
Read 1MB sequentially from disk           5,000,000 ns    5,000 us    5 ms   ~200MB/sec server hard disk (seek time would be additional latency)
Disk Access (seek + rotation time)       10,000,000 ns   10,000 us   10 ms   20x datacenter roundtrip
Send packet CA->Netherlands->CA         150,000,000 ns  150,000 us  150 ms

Other useful values
------------------
Warm up Intel SkyLake AVX units              14,000 ns       14 us           AVX units go to sleep after ~675 us
Timings of C-state changes?
@@ -32,9 +32,6 @@ Notes
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

Assumes a CPU clock frequency of 2.6GHz (common for Xeon server CPUs). That's ~0.385ns per clock cycle.

Credit
Eliot Eshelman revised this gist
Dec 27, 2016 . 1 changed file with 23 additions and 9 deletions.
@@ -1,18 +1,23 @@
Latency Comparison Numbers
--------------------------
L1 cache reference                              1.5 ns                       4 cycles
Floating-point add/mult/FMA operation           1.5 ns                       4 cycles
L2 cache reference                                5 ns                       12 ~ 17 cycles
Branch mispredict                                 6 ns                       15 ~ 20 cycles
L3 cache reference                               16 ns                       42 cycles
Mutex lock/unlock                                25 ns
64MB main memory reference                       46 ns                       TinyMemBench on "Broadwell" E5-2690v4
256MB main memory reference                      75 ns                       TinyMemBench on "Broadwell" E5-2690v4
Send 4K bytes over 100 Gbps HPC fabric        1,040 ns        1 us           MVAPICH2 over Intel Omni-Path / Mellanox EDR
Compress 1K bytes with Google Snappy          3,000 ns        3 us
Send 1K bytes over 1 Gbps network            10,000 ns       10 us
Write 4K randomly to NVMe SSD                30,000 ns       30 us           DC P3608 NVMe SSD (best case; QOS 99% is 500us)
Read 4K randomly from NVMe SSD              120,000 ns      120 us           DC P3608 NVMe SSD (QOS 99%)
Read 1 MB sequentially from NVMe SSD        208,000 ns      208 us           ~4.8GB/sec DC P3608 NVMe SSD
Write 4K randomly to SATA SSD               500,000 ns      500 us           DC S3510 SATA SSD (QOS 99.9%)
Read 4K randomly from SATA SSD              500,000 ns      500 us           DC S3510 SATA SSD (QOS 99.9%)
Round trip within same datacenter           500,000 ns      500 us
Read 1 MB sequentially from SATA SSD      1,818,000 ns    1,818 us    2 ms   ~550MB/sec DC S3510 SATA SSD
Disk seek                                10,000,000 ns   10,000 us   10 ms   20x datacenter roundtrip
Read 1 MB sequentially from disk         20,000,000 ns   20,000 us   20 ms   80x memory, 20X SSD
Send packet CA->Netherlands->CA         150,000,000 ns  150,000 us  150 ms
@@ -36,4 +41,13 @@ Credit
------
Curated by Jeff Dean: http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers

Additional Data Gathered/Correlated from:
---------------------------------------
Memory latency tool:         https://github.com/ssvb/tinymembench
CPU data from Agner Fog:     http://www.agner.org/optimize/
Intel Broadwell CPU data:    http://users.atw.hu/instlatx64/GenuineIntel00306D4_Broadwell2_NewMemLat.txt
Intel SkyLake CPU data:      http://www.7-cpu.com/cpu/Skylake.html
MVAPICH2 fabric testing:     http://mug.mvapich.cse.ohio-state.edu/static/media/mug/presentations/2016/DK_Status_and_Roadmap_MUG16.pdf
NVMe SSD:                    http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ssd-dc-p3608-spec.pdf
SATA SSD:                    http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ssd-dc-s3510-spec.pdf
Eliot Eshelman revised this gist
Dec 23, 2016 . 1 changed file with 16 additions and 11 deletions.
@@ -1,11 +1,13 @@
Latency Comparison Numbers
--------------------------
L1 cache reference                              0.4 ns          1 cycle
Floating-point add/mult/FMA operation           1.5 ns          4 cycles
Branch mispredict                                 5 ns          15 ~ 20 cycles
L2 cache reference                                7 ns          14x L1 cache
Mutex lock/unlock                                25 ns
Main memory reference                           100 ns          20x L2 cache, 200x L1 cache
Send 1K bytes over 100 Gbps HPC fabric        1,100 ns   1 us   MVAPICH2 over Intel Omni-Path
Compress 1K bytes with Google Snappy          3,000 ns   3 us
Send 1K bytes over 1 Gbps network            10,000 ns  10 us
Read 4K randomly from SSD*                  150,000 ns 150 us   ~1GB/sec SSD
Read 1 MB sequentially from memory          250,000 ns 250 us
@@ -15,20 +17,23 @@ Disk seek 10,000,000 ns 10,000 us 10 ms 20x dat
Read 1 MB sequentially from disk         20,000,000 ns   20,000 us   20 ms   80x memory, 20X SSD
Send packet CA->Netherlands->CA         150,000,000 ns  150,000 us  150 ms

More useful values
------------------
Warm up Intel SkyLake AVX units              14,000 ns       14 us           AVX units go to sleep after ~675 us
Timings of C-state changes?

Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

Details
-------
Assumes a CPU clock frequency of 2.6GHz (common for Xeon server CPUs). That's ~0.385ns per clock cycle.

Credit
------
Curated by Jeff Dean: http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers
Much data from Agner Fog http://www.agner.org/optimize/
jboner revised this gist
Jan 15, 2016 . 1 changed file with 20 additions and 20 deletions.
@@ -1,25 +1,25 @@
Latency Comparison Numbers
--------------------------
L1 cache reference                              0.5 ns
Branch mispredict                                 5 ns
L2 cache reference                                7 ns                       14x L1 cache
Mutex lock/unlock                                25 ns
Main memory reference                           100 ns                       20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy                  3,000 ns        3 us
Send 1K bytes over 1 Gbps network            10,000 ns       10 us
Read 4K randomly from SSD*                  150,000 ns      150 us           ~1GB/sec SSD
Read 1 MB sequentially from memory          250,000 ns      250 us
Round trip within same datacenter           500,000 ns      500 us
Read 1 MB sequentially from SSD*          1,000,000 ns    1,000 us    1 ms   ~1GB/sec SSD, 4X memory
Disk seek                                10,000,000 ns   10,000 us   10 ms   20x datacenter roundtrip
Read 1 MB sequentially from disk         20,000,000 ns   20,000 us   20 ms   80x memory, 20X SSD
Send packet CA->Netherlands->CA         150,000,000 ns  150,000 us  150 ms

Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

Credit
------
@@ -28,7 +28,7 @@ Originally by Peter Norvig: http://norvig.com/21-days.html#answers

Contributions
-------------
Some updates from:       https://gist.github.com/2843375
'Humanized' comparison:  https://gist.github.com/2843375
Visual comparison chart: http://i.imgur.com/k0t1e.png
Animated presentation:   http://prezi.com/pdkvgys-r0y6/latency-numbers-for-programmers-web-development/latency.txt
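This revision adds a microsecond column alongside the nanosecond and millisecond ones, using the unit definitions in the Notes. One way to derive the three columns from a single nanosecond value is sketched below; the function name `latency_columns` and the rule of showing a unit only once the value reaches at least 1 of that unit are assumptions inferred from the table, not something the gist states.

```python
from typing import List

NS_PER_US = 1_000       # 1 us = 1,000 ns
NS_PER_MS = 1_000_000   # 1 ms = 1,000 us = 1,000,000 ns


def latency_columns(ns: float) -> List[str]:
    """Render a latency as the ns / us / ms columns used in the table."""
    # Sub-10 ns values keep their fractional part (e.g. 0.5 ns).
    cols = [f"{ns:,.0f} ns" if ns >= 10 else f"{ns:g} ns"]
    if ns >= NS_PER_US:
        cols.append(f"{ns / NS_PER_US:,.0f} us")
    if ns >= NS_PER_MS:
        cols.append(f"{ns / NS_PER_MS:,.0f} ms")
    return cols


if __name__ == "__main__":
    for name, ns in [
        ("L1 cache reference", 0.5),
        ("Compress 1K bytes with Zippy", 3_000),
        ("Read 1 MB sequentially from SSD*", 1_000_000),
        ("Send packet CA->Netherlands->CA", 150_000_000),
    ]:
        print(f"{name:<36}" + "".join(f"{c:>16}" for c in latency_columns(ns)))
```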
jboner revised this gist
Dec 13, 2015 . 1 changed file with 2 additions and 2 deletions.
@@ -17,8 +17,8 @@ Send packet CA->Netherlands->CA 150,000,000 ns 150 ms

Notes
-----
1 ns = 10^-9 seconds
1 ms = 10^-3 seconds
* Assuming ~1GB/sec SSD

Credit
jboner revised this gist
Jun 7, 2012 . 1 changed file with 18 additions and 9 deletions.
@@ -1,25 +1,34 @@
Latency Comparison Numbers
--------------------------
L1 cache reference                              0.5 ns
Branch mispredict                                 5 ns
L2 cache reference                                7 ns             14x L1 cache
Mutex lock/unlock                                25 ns
Main memory reference                           100 ns             20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy                  3,000 ns
Send 1K bytes over 1 Gbps network            10,000 ns   0.01 ms
Read 4K randomly from SSD*                  150,000 ns   0.15 ms
Read 1 MB sequentially from memory          250,000 ns   0.25 ms
Round trip within same datacenter           500,000 ns    0.5 ms
Read 1 MB sequentially from SSD*          1,000,000 ns      1 ms   4X memory
Disk seek                                10,000,000 ns     10 ms   20x datacenter roundtrip
Read 1 MB sequentially from disk         20,000,000 ns     20 ms   80x memory, 20X SSD
Send packet CA->Netherlands->CA         150,000,000 ns    150 ms

Notes
-----
1 ns = 10-9 seconds
1 ms = 10-3 seconds
* Assuming ~1GB/sec SSD

Credit
------
By Jeff Dean: http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers

Contributions
-------------
Some updates from: https://gist.github.com/2843375
Great 'humanized' comparison version: https://gist.github.com/2843375
Visual comparison chart: http://i.imgur.com/k0t1e.png
Nice animated presentation of the data: http://prezi.com/pdkvgys-r0y6/latency-numbers-for-programmers-web-development/
jboner revised this gist
Jun 7, 2012 . 1 changed file with 2 additions and 1 deletion.
@@ -21,4 +21,5 @@ By Jeff Dean (http://research.google.com/people/jeff/)
Originally by Peter Norvig (http://norvig.com/21-days.html#answers)
Some updates from: https://gist.github.com/2843375
Great 'humanized' comparison version: https://gist.github.com/2843375
Visual comparison chart: http://i.imgur.com/k0t1e.png
Nice animated presentation of the data: http://prezi.com/pdkvgys-r0y6/latency-numbers-for-programmers-web-development/
jboner revised this gist
Jun 2, 2012 . 1 changed file with 1 addition and 1 deletion.
@@ -5,7 +5,7 @@ Mutex lock/unlock 25 ns
Main memory reference                           100 ns             20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy                  3,000 ns
Send 1K bytes over 1 Gbps network            10,000 ns   0.01 ms
Read 4K randomly from SSD                   150,000 ns   0.15 ms
Read 1 MB sequentially from memory          250,000 ns   0.25 ms
Round trip within same datacenter           500,000 ns    0.5 ms
Read 1 MB sequentially from SSD           1,000,000 ns      1 ms   4X memory
jboner revised this gist
Jun 2, 2012 . 1 changed file with 3 additions and 2 deletions.
@@ -5,7 +5,7 @@ Mutex lock/unlock 25 ns
Main memory reference                           100 ns             20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy                  3,000 ns
Send 1K bytes over 1 Gbps network            10,000 ns   0.01 ms
SSD 4K random read                          150,000 ns   0.15 ms
Read 1 MB sequentially from memory          250,000 ns   0.25 ms
Round trip within same datacenter           500,000 ns    0.5 ms
Read 1 MB sequentially from SSD           1,000,000 ns      1 ms   4X memory
@@ -20,4 +20,5 @@ Assuming ~1GB/sec SSD
By Jeff Dean (http://research.google.com/people/jeff/)
Originally by Peter Norvig (http://norvig.com/21-days.html#answers)
Some updates from: https://gist.github.com/2843375
Great 'humanized' comparison version: https://gist.github.com/2843375
Visual comparison chart: http://i.imgur.com/k0t1e.png
jboner revised this gist
Jun 1, 2012 . 1 changed file with 2 additions and 1 deletion.
@@ -5,6 +5,7 @@ Mutex lock/unlock 25 ns
Main memory reference                           100 ns             20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy                  3,000 ns
Send 1K bytes over 1 Gbps network            10,000 ns   0.01 ms
SSD random read                             150,000 ns
Read 1 MB sequentially from memory          250,000 ns   0.25 ms
Round trip within same datacenter           500,000 ns    0.5 ms
Read 1 MB sequentially from SSD           1,000,000 ns      1 ms   4X memory
@@ -19,4 +20,4 @@ Assuming ~1GB/sec SSD
By Jeff Dean (http://research.google.com/people/jeff/)
Originally by Peter Norvig (http://norvig.com/21-days.html#answers)
Some updates from: https://gist.github.com/2843375
Great 'humanized' comparison version: https://gist.github.com/2843375
jboner revised this gist
Jun 1, 2012 . 1 changed file with 1 addition and 1 deletion.
@@ -10,7 +10,7 @@ Round trip within same datacenter 500,000 ns 0.5 ms
Read 1 MB sequentially from SSD           1,000,000 ns      1 ms   4X memory
Disk seek                                10,000,000 ns     10 ms   20x datacenter roundtrip
Read 1 MB sequentially from disk         20,000,000 ns     20 ms   80x memory, 20X SSD
Send packet CA->Netherlands->CA         150,000,000 ns    150 ms

1 ns = 10-9 seconds
1 ms = 10-3 seconds
jboner revised this gist
Jun 1, 2012 . 1 changed file with 6 additions and 6 deletions.
@@ -12,11 +12,11 @@ Disk seek 10,000,000 ns 10 ms 20x datacenter
Read 1 MB sequentially from disk         20,000,000 ns     20 ms   80x memory, 20X SSD
Send packet CA->Netherlands->CA         150,000,000 ns    150 ms

1 ns = 10-9 seconds
1 ms = 10-3 seconds

Assuming ~1GB/sec SSD

By Jeff Dean (http://research.google.com/people/jeff/)
Originally by Peter Norvig (http://norvig.com/21-days.html#answers)
Some updates from: https://gist.github.com/2843375
Great 'humanized' comparison version: https://gist.github.com/2843375
jboner revised this gist
Jun 1, 2012 . 1 changed file with 2 additions and 2 deletions.
@@ -13,8 +13,8 @@ Read 1 MB sequentially from disk 20,000,000 ns 20 ms 80x memory, 20X
Send packet CA->Netherlands->CA     150,000,000 ns   150 ms

By Jeff Dean (http://research.google.com/people/jeff/)
Originally by Peter Norvig (http://norvig.com/21-days.html#answers)
With some updates from Brendan (http://brenocon.com/dean_perf.html)

Assuming ~1GB/sec SSD
-
jboner revised this gist
Jun 1, 2012 . 1 changed file with 21 additions and 13 deletions.
@@ -1,14 +1,22 @@
L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
L2 cache reference                            7 ns             14x L1 cache
Mutex lock/unlock                            25 ns
Main memory reference                       100 ns             20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy              3,000 ns
Send 1K bytes over 1 Gbps network        10,000 ns   0.01 ms
Read 1 MB sequentially from memory      250,000 ns   0.25 ms
Round trip within same datacenter       500,000 ns   0.5 ms
Read 1 MB sequentially from SSD       1,000,000 ns   1 ms      4X memory
Disk seek                            10,000,000 ns   10 ms     20x datacenter roundtrip
Read 1 MB sequentially from disk     20,000,000 ns   20 ms     80x memory, 20X SSD
Send packet CA->Netherlands->CA     150,000,000 ns   150 ms

By Jeff Dean (http://research.google.com/people/jeff/)
With some updates from Brendan: http://brenocon.com/dean_perf.html
Comparisons from https://gist.github.com/2844130

Assuming ~1GB/sec SSD

1 ns = 10^-9 seconds
1 ms = 10^-3 seconds
-
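The comparison and millisecond columns introduced in this revision can be checked by dividing the nanosecond figures. What follows is a minimal Python sketch, assuming the values listed in this revision; the names `LATENCY_NS`, `ratio`, and `ns_to_ms` are illustrative and not part of the gist.

```python
# Latency figures in nanoseconds, copied from the table in this revision.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "Main memory reference": 100,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within same datacenter": 500_000,
    "Read 1 MB sequentially from SSD": 1_000_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from disk": 20_000_000,
}


def ratio(slow, fast):
    """Return how many times slower `slow` is than `fast` (hypothetical helper)."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


def ns_to_ms(ns):
    """Convert nanoseconds to milliseconds (1 ms = 10^6 ns)."""
    return ns / 1_000_000


if __name__ == "__main__":
    # Each pair mirrors one entry of the comparison column.
    pairs = [
        ("L2 cache reference", "L1 cache reference"),
        ("Main memory reference", "L2 cache reference"),
        ("Main memory reference", "L1 cache reference"),
        ("Read 1 MB sequentially from SSD", "Read 1 MB sequentially from memory"),
        ("Disk seek", "Round trip within same datacenter"),
        ("Read 1 MB sequentially from disk", "Read 1 MB sequentially from memory"),
        ("Read 1 MB sequentially from disk", "Read 1 MB sequentially from SSD"),
    ]
    for slow, fast in pairs:
        print(f"{slow}: {ratio(slow, fast):.1f}x {fast}")
    print(f"Round trip within same datacenter: {ns_to_ms(500_000)} ms")
```

From these figures, main memory against L2 works out to roughly 14x rather than the 20x shown, so that entry is best read as an approximation; the other ratios match the column.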
jboner revised this gist
May 31, 2012 . 1 changed file with 11 additions and 11 deletions.
@@ -1,14 +1,14 @@
L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
L2 cache reference                            7 ns
Mutex lock/unlock                            25 ns
Main memory reference                       100 ns
Compress 1K bytes with Zippy              3,000 ns
Send 2K bytes over 1 Gbps network        20,000 ns
Read 1 MB sequentially from memory      250,000 ns
Round trip within same datacenter       500,000 ns
Disk seek                            10,000,000 ns
Read 1 MB sequentially from disk     20,000,000 ns
Send packet CA->Netherlands->CA     150,000,000 ns

By Jeff Dean (http://research.google.com/people/jeff/):
-
jboner revised this gist
May 31, 2012 . 1 changed file with 3 additions and 3 deletions.
@@ -1,5 +1,3 @@
L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
L2 cache reference                            7 ns
@@ -11,4 +9,6 @@ Read 1 MB sequentially from memory 250,000 ns
Round trip within same datacenter       500,000 ns
Disk seek                            10,000,000 ns
Read 1 MB sequentially from disk     20,000,000 ns
Send packet CA->Netherlands->CA     150,000,000 ns

By Jeff Dean (http://research.google.com/people/jeff/):
-
jboner revised this gist
May 31, 2012 . 1 changed file with 1 addition and 1 deletion.
@@ -1,4 +1,4 @@
By Jeff Dean (http://research.google.com/people/jeff/):

L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
-
jboner revised this gist
May 31, 2012 . 1 changed file with 2 additions and 0 deletions.
@@ -1,3 +1,5 @@
By Jeff Dean:

L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
L2 cache reference                            7 ns
-
jboner revised this gist
May 31, 2012 . No changes.
-
jboner created this gist
May 31, 2012 .
@@ -0,0 +1,12 @@
L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
L2 cache reference                            7 ns
Mutex lock/unlock                            25 ns
Main memory reference                       100 ns
Compress 1K bytes with Zippy              3,000 ns
Send 2K bytes over 1 Gbps network        20,000 ns
Read 1 MB sequentially from memory      250,000 ns
Round trip within same datacenter       500,000 ns
Disk seek                            10,000,000 ns
Read 1 MB sequentially from disk     20,000,000 ns
Send packet CA->Netherlands->CA     150,000,000 ns
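The bulk-transfer rows of this original table each imply a throughput, which can make them easier to compare with hardware figures. Below is a rough Python sketch, assuming decimal units (1 MB = 10^6 bytes, 2K = 2,000 bytes), since the gist does not say whether decimal or binary units are meant; the function name `throughput_mb_per_s` is illustrative.

```python
def throughput_mb_per_s(num_bytes, latency_ns):
    """Implied throughput in MB/s for moving `num_bytes` in `latency_ns`.

    Assumes 1 MB = 10**6 bytes. The formula is
    (bytes / 10**6) / (ns / 10**9) = bytes * 1000 / ns.
    """
    return num_bytes * 1000 / latency_ns


if __name__ == "__main__":
    # (label, bytes moved, latency in ns), taken from the table above.
    rows = [
        ("Send 2K bytes over 1 Gbps network", 2_000, 20_000),
        ("Read 1 MB sequentially from memory", 1_000_000, 250_000),
        ("Read 1 MB sequentially from disk", 1_000_000, 20_000_000),
    ]
    for label, num_bytes, latency_ns in rows:
        print(f"{label}: {throughput_mb_per_s(num_bytes, latency_ns):.0f} MB/s")
```

Under these assumptions the memory row corresponds to about 4 GB/s, the disk row to about 50 MB/s, and the network row to about 100 MB/s (800 Mbps, slightly under the nominal 1 Gbps line rate).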