
hard drive - Why are 16 threads more efficient than 8 on an i7 with hyperthreaded 4 cores? (Robocopy)


In Windows 8.1, I am using Robocopy to save 2 servers' data onto a dedicated PC's storage space. The data volume is 147,314 files in 4,110 folders (66,841,845,760 bytes).


All 3 PCs involved have an i7 CPU with 4 cores and are on a 1 Gb network. The target's Storage Space (mirrored and striped on D:) is built on a 4 x 4 TB JBOD enclosure.


Because the CPUs have 4 cores with hyperthreading, I expected the Robocopy switch /MT:8 to work best, and more than 8 threads to be overkill because of the added thread-management overhead.


I tested this. I list the fourth test series' data here (duration in mm:ss):


 1 thread:  59:19
 2 threads: 39:12
 4 threads: 29:13
 8 threads: 24:36
16 threads: 24:19
32 threads: 24:27

Granted, the few seconds gained by using 16 threads are negligible, but the gain is consistent across all test series, i.e. it is not caused by extra load during the tests with fewer than 16 threads (unless that happened in all 4 test series). Also note that 32 threads are almost always a bit faster than 8 threads.
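For reference, the timings listed above can be turned into seconds and speed-up factors with a few lines of Python. This is only a sketch for reading the numbers; the helper name `to_seconds` is made up for illustration, and the durations are taken as mm:ss as stated above.

```python
# Durations from the fourth test series, keyed by Robocopy thread count (mm:ss).
timings = {1: "59:19", 2: "39:12", 4: "29:13", 8: "24:36", 16: "24:19", 32: "24:27"}

def to_seconds(mmss):
    """Convert an 'mm:ss' string to a number of seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)

seconds = {threads: to_seconds(t) for threads, t in timings.items()}

# Speed-up of each run relative to the single-threaded run.
speedups = {threads: seconds[1] / s for threads, s in seconds.items()}

for threads in sorted(seconds):
    print(f"{threads:>2} threads: {seconds[threads]:>4} s, speed-up {speedups[threads]:.2f}x")
```

The curve flattens at roughly 2.4x from 8 threads onward; the gap between 8 and 16 threads is 17 seconds, a little over 1% of the run.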


Question: what technical reason is responsible for using 16 threads being more efficient than 8 threads on an i7 with 4 hyperthreaded cores?



Answer



TL;DR version: if you were doing something highly CPU-intensive, such as transcoding video using Handbrake, then you wouldn't want to use more threads than cores, as there would be nowhere for the extra work to be done. In this case, where most threads will spend 90% of their time asleep waiting on reads or writes, having more threads works for you rather than against you.




Copying files is not a particularly CPU-bound task. While having more cores may help prevent other tasks from crowding out your copying tool, it is unlikely that any copying thread is running anywhere near 100% on its core.


Each copying thread will send a read request to the hard disk and then go to sleep while waiting for the read request to be fulfilled. Your spinning rust disk generally has a seek time of around 9 milliseconds, practically an eternity in CPU terms, and the copying task does not simply spin around asking "is it ready yet?". Doing so would lock that thread at 100% CPU and waste resources. Instead, the thread issues a read and is put to sleep until the read completes and the data is ready for the next step.
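To put that seek time in perspective, here is a rough back-of-the-envelope calculation. The 3 GHz clock speed is an assumption for illustration only (the question does not state the i7's clock), and the 9 ms figure is the typical seek time mentioned above, not a measurement of these disks.

```python
SEEK_MS = 9                  # typical HDD seek time quoted above
CLOCK_HZ = 3_000_000_000     # ASSUMED 3 GHz core clock, for illustration only

# CPU cycles that pass on one core while a single seek is in flight.
cycles_per_seek = CLOCK_HZ * SEEK_MS // 1000

# Upper bound on seek-bound reads per second for ONE thread that issues
# a read, sleeps until it completes, and only then issues the next one.
reads_per_second_one_thread = 1000 / SEEK_MS

print(f"cycles per seek: {cycles_per_seek:,}")
print(f"max seek-bound reads/s for one thread: {reads_per_second_one_thread:.0f}")
```

Under these assumptions a core could execute tens of millions of cycles during one seek, and a single thread that waits for each read in turn tops out at about a hundred seek-bound reads per second. That is why it pays to have several requests outstanding at once.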


In the meantime, another thread does the same, gets blocked on a read and is put to sleep. This happens for all 16 of your threads. (In reality, your reads and writes will happen at random times as the threads get out of sync, but you get the idea.)
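The effect can be reproduced in miniature with a toy program. This is a sketch, not a model of Robocopy: `time.sleep` stands in for a blocking read, and the 10 ms wait, the 64 "files" and the function names are all made-up values for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_IO_SECONDS = 0.01  # stand-in for one blocking read; not a measured value
TASKS = 64                   # hypothetical number of "files" to copy

def fake_copy(_):
    # A sleeping thread uses no CPU, much like a thread waiting on a disk read.
    time.sleep(SIMULATED_IO_SECONDS)

def run(workers):
    """Run all fake copies on `workers` threads and return the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_copy, range(TASKS)))
    return time.perf_counter() - start

if __name__ == "__main__":
    for workers in (1, 2, 4, 8, 16, 32):
        print(f"{workers:>2} threads: {run(workers):.3f} s")
```

Because the "work" here is pure waiting, the run time keeps dropping well past the number of CPU cores. In a real copy the improvement flattens out once the disk or the network is saturated, which matches the plateau between 8 and 32 threads in the question.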


Once one of the threads has its data ready, Windows reschedules it and it starts preparing that data to be written. As far as the thread is concerned, the process is the same. It says "write this data to file x at location y" and Windows takes the data and deschedules the thread. Windows does the background work to figure out where the file is, moves the data (potentially across the network, adding more milliseconds to the delay) and then returns control to the thread once the write has succeeded.


No single thread will be running on a CPU core all the time, so having more threads than you have cores is not a problem. No thread will be awake long enough for it to matter.


If you only had a single CPU core with lots of other threads running, then you could be bottlenecking on the CPU, but in a multicore system with this kind of workload I would be surprised if the CPU were the problem.


You are more likely to be bottlenecked on hard drive performance, hitting the queue depth limits of the drives' read or write queues. By using more threads you are pushing something to its limits, be it disk or network, and the only way to find out the best number of threads is to do what you have done and experiment.
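A quick sanity check on the figures from the question gives a feel for where the limit might be. This is a sketch: it uses the best run (16 threads, 24:19) and treats the 1 Gb link as 125 MB/s of raw capacity, ignoring protocol overhead.

```python
TOTAL_BYTES = 66_841_845_760         # data volume from the question
TOTAL_FILES = 147_314                # file count from the question
BEST_SECONDS = 24 * 60 + 19          # 16-thread run, 24:19
LINK_BITS_PER_SECOND = 1_000_000_000 # 1 Gb network, raw rate without protocol overhead

throughput_mb = TOTAL_BYTES / BEST_SECONDS / 1_000_000  # achieved MB/s
link_limit_mb = LINK_BITS_PER_SECOND / 8 / 1_000_000    # raw link MB/s
link_fraction = throughput_mb / link_limit_mb
avg_file_kb = TOTAL_BYTES / TOTAL_FILES / 1000          # average file size in kB
files_per_second = TOTAL_FILES / BEST_SECONDS

print(f"achieved: {throughput_mb:.1f} MB/s of {link_limit_mb:.0f} MB/s ({link_fraction:.0%})")
print(f"average file: {avg_file_kb:.0f} kB, about {files_per_second:.0f} files/s")
```

The best run moves roughly 46 MB/s, a bit over a third of the raw link rate, so the network is probably not saturated by raw bandwidth alone. With an average file size of around 450 kB, per-file overhead and seeks are plausible suspects, but these numbers alone cannot say which component is the limit.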


On a system copying from SSD to SSD, I suspect that a lower number of threads might be better, as there would be less latency than when reading from spinning rust HDDs, pushing the data across the network and writing to spinning rust again, but I have no evidence to support that supposition.

