
hard drive - Why are 16 threads more efficient than 8 on an i7 with hyperthreaded 4 cores? (Robocopy)


In Windows 8.1, I am using Robocopy to save 2 servers' data onto a dedicated PC's storage space. The data volume is 147,314 files in 4,110 folders (66,841,845,760 bytes).


All 3 involved PCs feature an i7 CPU with 4 cores and are in a 1 Gb network. The target's Storage Space (mirrored and striped on D:) is realized using a 4 x 4 TB JBOD case.


Given the CPUs' 4 cores with hyperthreading, I expected the Robocopy switch /MT:8 to work best, and more than 8 threads to be counterproductive because of the extra thread-management overhead.


I tested this. I list the fourth test series' data here (duration in mm:ss):


 1 thread:  59:19
 2 threads: 39:12
 4 threads: 29:13
 8 threads: 24:36
16 threads: 24:19
32 threads: 24:27

Granted, the few seconds gained with 16 threads are negligible, but they are consistent across all test series, i.e. not due to extra workload on the machines during the runs with fewer than 16 threads (unless that happened in all 4 test series). Also note that 32 threads are almost always a bit faster than 8 threads.
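To put the gap between the runs in proportion, the timings can be converted to seconds, throughput, and speedup relative to the single-threaded run. This is only a small sketch over the numbers listed above; the helper names `to_seconds` and `summarize` are made up for illustration.

```python
# Timings copied from the fourth test series (mm:ss per /MT thread count).
TOTAL_BYTES = 66_841_845_760
DURATIONS = {1: "59:19", 2: "39:12", 4: "29:13", 8: "24:36", 16: "24:19", 32: "24:27"}


def to_seconds(mmss: str) -> int:
    """Convert an 'mm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(durations: dict, total_bytes: int) -> list:
    """Return (threads, seconds, MB/s, speedup vs. 1 thread) for each run."""
    base = to_seconds(durations[1])
    rows = []
    for threads, mmss in sorted(durations.items()):
        secs = to_seconds(mmss)
        rows.append((threads, secs, total_bytes / secs / 1e6, base / secs))
    return rows


if __name__ == "__main__":
    for threads, secs, mb_per_s, speedup in summarize(DURATIONS, TOTAL_BYTES):
        print(f"{threads:>2} threads: {secs:>4} s  {mb_per_s:5.1f} MB/s  speedup x{speedup:.2f}")
```

Going from 8 to 16 threads saves 17 seconds out of 1476, a bit over 1%, while going from 1 to 8 threads makes the copy roughly 2.4 times faster.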


Question: what technical reason is responsible for using 16 threads being more efficient than 8 threads on an i7 with 4 hyperthreaded cores?



Answer



TL;DR version: if you were doing something highly CPU-intensive, such as transcoding video using HandBrake, then you wouldn't want to use more threads than you have CPU cores, as there would be nowhere for the extra work to be done. In this case, where most threads will spend 90% of their time asleep waiting on reads or writes, having more threads works for you rather than against you.




Copying files is not a particularly CPU-bound task. While having more cores may help prevent other tasks from crowding out your copying tool, it is unlikely that any copying thread is running anywhere near 100% on its core.


Each copying thread will send a read request to the hard disk and then go to sleep while waiting for the read request to be fulfilled. Your spinning rust disk generally has a seek time of around 9 milliseconds, practically an eternity in CPU terms, and the copying task does not simply spin around asking "is it ready yet?" and wasting CPU cycles. Doing so would lock that thread at 100% CPU and waste resources. No, what happens is that the thread issues a read and is put to sleep until the read completes and the data is ready for the next step.


In the meantime another thread does the same, gets blocked on a read and is put to sleep. This happens for all 16 of your threads. (In reality your reads and writes will be happening at random times as they get out of sync, but you get the idea.)
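The effect of threads sleeping on I/O can be mimicked with a rough simulation. This is not how Robocopy is implemented; it is a sketch that assumes each "file" costs a fixed wait, uses `time.sleep` as a stand-in for a blocking read or write, and uses invented names (`fake_copy`, `run_batch`) and an invented 20 ms latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_copy(latency_s: float) -> None:
    # Stand-in for a blocking read or write: the thread sleeps and uses no CPU.
    time.sleep(latency_s)


def run_batch(num_files: int, workers: int, latency_s: float = 0.02) -> float:
    """Process num_files simulated copies on a pool of worker threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_copy, [latency_s] * num_files))
    return time.perf_counter() - start


if __name__ == "__main__":
    for workers in (1, 2, 4, 8, 16):
        print(f"{workers:>2} workers: {run_batch(32, workers):.2f} s")
```

Because the workers only wait, adding threads keeps shortening the batch regardless of how many cores the machine has. Unlike the real copy, this toy has no disk or network limit, so it never levels off the way the measured times do.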


Once one of the threads has data ready for it, Windows reschedules it and it starts preparing the data to be written. As far as the thread is concerned the process is the same. It says "write this data to file x at location y" and Windows takes the data and deschedules the thread. Windows does the background work to figure out where the file is, moves the data (potentially across the network, adding more milliseconds to the delay) and then returns control to the thread once the write has succeeded.


No single thread will be burning CPU time continuously, so having more threads than CPUs is not a problem. No thread will be awake long enough for it to matter.
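As a back-of-the-envelope check, the average CPU demand can be estimated as the thread count times the fraction of time each thread is awake. The 90% asleep figure is the rough estimate from the TL;DR above, not a measurement, and `cpu_demand` is a hypothetical helper.

```python
LOGICAL_CPUS = 8  # 4 cores with hyperthreading, as in the question


def cpu_demand(threads: int, asleep_fraction: float) -> float:
    """Average number of logical CPUs kept busy if each thread is
    awake (1 - asleep_fraction) of the time."""
    return threads * (1.0 - asleep_fraction)


if __name__ == "__main__":
    for n in (8, 16, 32):
        demand = cpu_demand(n, 0.90)  # 0.90 is an assumed, illustrative value
        print(f"{n:>2} threads: about {demand:.1f} of {LOGICAL_CPUS} logical CPUs busy")
```

Under that assumption even 32 threads would keep only about 3 of the 8 logical CPUs busy on average.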


If you only had a single CPU with lots of other threads running then you could be bottlenecking on the CPU, but in a multicore system with this kind of workload I would be surprised if the CPU is the problem.


You are more likely to be bottlenecked on hard drive performance, hitting the queue depth limit of the drives' read or write buffers. By using more threads you are pushing something to its limits, be it disk or network, and the only way to find out the best number of threads is to do what you have done and experiment with it.
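Why the gains flatten out can be illustrated with a toy saturation model: each thread keeps one request in flight, so the offered request rate grows with the thread count until the device limit is reached. The 9 ms latency is the seek time mentioned earlier; the cap of 500 requests per second is an invented number chosen purely for illustration, not a property of the drives in the question.

```python
LATENCY_MS = 9.0    # seek time quoted earlier in this answer
DEVICE_CAP = 500.0  # hypothetical ceiling in requests/s, for illustration only


def requests_per_second(threads: int, latency_ms: float, device_cap: float) -> float:
    """Toy model: one outstanding request per thread, capped by the device."""
    offered = threads * 1000.0 / latency_ms
    return min(offered, device_cap)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        rate = requests_per_second(n, LATENCY_MS, DEVICE_CAP)
        print(f"{n:>2} threads: {rate:6.1f} requests/s")
```

In this model the rate rises almost linearly up to 4 threads and is flat from 8 threads on, which has the same general shape as the measured times. Where the real plateau sits depends on the actual disks and network, which is why experimenting is the only reliable way to find it.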


On a system with SSD to SSD copying I would suspect that a lower number of threads might be better as there would be less latency than copying files from spinning rust HDDs, pushing across the network and writing to spinning rust, but I have no evidence to support that supposition.

