
hard drive - Why are 16 threads more efficient than 8 on an i7 with 4 hyperthreaded cores? (Robocopy)


In Windows 8.1, I am using Robocopy to save 2 servers' data onto a dedicated PC's storage space. The data volume is 147,314 files in 4,110 folders (66,841,845,760 bytes).


All 3 PCs involved have an i7 CPU with 4 cores and are on a 1 Gbit/s network. The target's Storage Space (mirrored and striped, on D:) is built on a 4 x 4 TB JBOD enclosure.


Because the CPUs have 4 cores with hyperthreading (8 logical processors), I expected the Robocopy switch /MT:8 to work best, and more than 8 threads to be overkill because the extra thread management would not pay off.


I tested this. Here are the results of the fourth test series (duration in mm:ss):


 1 thread:  59:19
2 threads: 39:12
4 threads: 29:13
8 threads: 24:36
16 threads: 24:19
32 threads: 24:27

Granted, the few seconds saved with 16 threads are negligible, but the difference is consistent across all test series, i.e. it is not caused by extra load during the runs with fewer than 16 threads (unless that happened in all 4 test series). Also note that 32 threads are almost always a bit faster than 8 threads.
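To put the "few seconds" in perspective, the timings above can be turned into speedups and average throughput. The Python sketch below is an illustrative addition, not part of the original measurements; it assumes the full 66,841,845,760-byte data set was copied in every run and uses MB = 10^6 bytes.

```python
# Robocopy timings from the fourth test series above (mm:ss), keyed by /MT thread count
TIMINGS = {1: "59:19", 2: "39:12", 4: "29:13", 8: "24:36", 16: "24:19", 32: "24:27"}
TOTAL_BYTES = 66_841_845_760  # data volume quoted in the question


def to_seconds(mmss: str) -> int:
    """Convert an 'mm:ss' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(timings: dict, total_bytes: int) -> list:
    """Return (threads, seconds, speedup vs. 1 thread, average MB/s) rows, sorted by threads."""
    baseline = to_seconds(timings[1])
    rows = []
    for threads, mmss in sorted(timings.items()):
        secs = to_seconds(mmss)
        rows.append((threads, secs, baseline / secs, total_bytes / secs / 1e6))
    return rows


if __name__ == "__main__":
    for threads, secs, speedup, mb_s in summarize(TIMINGS, TOTAL_BYTES):
        print(f"{threads:>2} threads: {secs:>5} s  speedup {speedup:4.2f}x  {mb_s:5.1f} MB/s")
```

With these numbers, 16 threads finish 17 s (about 1%) ahead of 8 threads, and even the fastest run averages only about 46 MB/s, well under the roughly 125 MB/s a 1 Gbit/s link can carry in theory. That gap is at least consistent with latency (seeks, per-file overhead) rather than raw bandwidth being the limit.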


Question: what is the technical reason that 16 threads are more efficient than 8 threads on an i7 with 4 hyperthreaded cores?



Answer



TL;DR: if you were doing something highly CPU-intensive, such as transcoding video with HandBrake, you wouldn't want to use more threads than CPUs, because there would be nowhere for the extra work to run. In this case, where most threads spend around 90% of their time asleep waiting for reads or writes, having more threads works for you rather than against you.
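A common rule of thumb for sizing thread pools, from Brian Goetz's *Java Concurrency in Practice*, makes the TL;DR concrete: N_threads = N_cpu × U_cpu × (1 + W/C), where U_cpu is the target CPU utilization and W/C is the ratio of wait time to compute time. The Python sketch below only evaluates that formula; the 8 logical CPUs and the 9 ms waiting per 1 ms of computing (the "90% asleep" case) are illustrative assumptions, not measurements.

```python
import math


def pool_size(n_cpu: int, target_utilization: float, wait_ms: float, compute_ms: float) -> int:
    """Goetz's heuristic N_cpu * U_cpu * (1 + W/C), rounded up to a whole thread count."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if compute_ms <= 0:
        raise ValueError("compute_ms must be positive")
    return math.ceil(n_cpu * target_utilization * (1 + wait_ms / compute_ms))


if __name__ == "__main__":
    # CPU-bound work (e.g. video transcoding): no waiting, so threads == logical CPUs.
    print(pool_size(8, 1.0, wait_ms=0, compute_ms=1))  # 8
    # I/O-bound work: ~9 ms asleep for every 1 ms of CPU time.
    print(pool_size(8, 1.0, wait_ms=9, compute_ms=1))  # 80
```

By this estimate the CPUs would only become the bottleneck far above 16 or 32 threads; for a copy job the disks or the network give out long before that, so the formula is an upper bound here, not a prediction of the best /MT value.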




Copying files is not a particularly CPU-bound task. While having more cores may help prevent other tasks from crowding out your copying tool, it is unlikely that any thread is running anywhere near 100% of a core.


Each copying thread sends a read request to the hard disk and then goes to sleep while waiting for the request to be fulfilled. A spinning-rust hard disk typically has an average seek time of around 9 milliseconds, practically an eternity in CPU terms, so the copying task does not simply spin around asking "is it ready yet?" and wasting CPU cycles. Doing so would pin that thread at 100% CPU and waste resources. Instead, the thread issues a read and is put to sleep until the read completes and the data is ready for the next step.
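To make the "sleep instead of spin" point concrete, here is a small Python sketch (an illustration, not how Robocopy is implemented) in which a `threading.Timer` stands in for the disk completing a read after 200 ms. It compares the CPU time the process consumes when the waiting thread blocks on `threading.Event.wait()` versus when it busy-polls the same event.

```python
import threading
import time


def wait_blocking(ready: threading.Event) -> None:
    # The OS puts this thread to sleep until the event is set; it uses almost no CPU.
    ready.wait()


def wait_spinning(ready: threading.Event) -> None:
    # Busy-polling: keeps asking "is it ready yet?" and burns CPU the whole time.
    while not ready.is_set():
        pass


def cpu_seconds_while_waiting(waiter, latency_s: float = 0.2) -> float:
    """Return the process CPU time consumed while `waiter` waits for a simulated I/O completion."""
    ready = threading.Event()
    timer = threading.Timer(latency_s, ready.set)  # stands in for the disk finishing a read
    start = time.process_time()  # CPU time of the whole process, summed over all threads
    timer.start()
    waiter(ready)
    used = time.process_time() - start
    timer.join()
    return used


if __name__ == "__main__":
    print(f"blocking: {cpu_seconds_while_waiting(wait_blocking):.3f} s CPU")
    print(f"spinning: {cpu_seconds_while_waiting(wait_spinning):.3f} s CPU")
```

Both waits take the same wall-clock time, but only the spinning version shows up as CPU load, which is why sleeping copy threads leave the cores free for each other.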


In the meantime another thread does the same, gets blocked on a read and is put to sleep. This happens for all 16 of your threads. (In reality, your reads and writes will happen at random times as the threads drift out of sync, but you get the idea.)
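The effect of many threads all sleeping on I/O at once can be simulated in a few lines of Python. In this sketch `time.sleep` stands in for a blocking disk read, and the 50 ms latency and 32 requests are arbitrary assumptions chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY_S = 0.05  # assumption: stand-in for one seek plus transfer


def fake_read(block_id: int) -> int:
    # The thread sleeps here, using no CPU, just like a thread blocked on a real read.
    time.sleep(SIMULATED_LATENCY_S)
    return block_id


def elapsed_for(n_threads: int, n_requests: int = 32) -> float:
    """Run n_requests simulated reads on n_threads workers and return the wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(fake_read, range(n_requests)))  # wait for every request to finish
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (1, 4, 8, 16, 32):
        print(f"{n:>2} threads: {elapsed_for(n):.2f} s")
```

Because this imaginary "disk" can serve any number of requests in parallel, the elapsed time keeps dropping as threads are added, regardless of the core count. A real disk stops improving once its queue is saturated, which fits the measured curve flattening out somewhere between 8 and 32 threads.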


Once a thread's data is ready, Windows reschedules the thread and it starts preparing the data to be written. As far as the thread is concerned, the process is the same: it says "write this data to file x at location y", and Windows takes the data and deschedules the thread. Windows does the background work of figuring out where the file is and moving the data (potentially across the network, adding more milliseconds of delay), and then returns control to the thread once the write has succeeded.
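Seen from the thread's side, a copy is just a loop of blocking read and write calls, which is why it spends most of its life asleep. The following Python sketch of a single copying thread is an illustration only, not Robocopy's implementation, and the 1 MiB chunk size is an arbitrary assumption.

```python
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # assumption: 1 MiB per read/write request


def copy_one_file(src: str, dst: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst chunk by chunk and return the number of bytes copied."""
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            # Blocks (the thread sleeps) until the OS has delivered the data.
            chunk = fin.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            # Passes the data to the OS via Python's buffered writer; the call can
            # return before the data physically reaches the disk.
            fout.write(chunk)
            copied += len(chunk)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source.bin")
        dst = os.path.join(tmp, "target.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(3 * CHUNK_SIZE + 123))
        print(copy_one_file(src, dst), "bytes copied")
```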


No single thread will be burning CPU time continuously, so having more threads than CPUs is not a problem: no thread stays awake long enough for it to matter.


If you only had a single CPU with lots of other threads running, then you could be bottlenecked on the CPU, but on a multicore system with this kind of workload I would be surprised if the CPU were the problem.


You are more likely to be bottlenecked on hard drive performance, i.e. hitting the queue depth limits of the drives' read or write queues. By using more threads you are pushing something to its limits, be it the disks or the network, and the only way to find the best number of threads is to do what you have done: experiment.
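The experiment itself can be scripted. Robocopy's /MT:n switch is the real tool here; the Python harness below is only a hypothetical stand-in that copies a flat folder with n worker threads and times each run, so the shape of a thread-count sweep is easy to see. The function names, the non-recursive copy and the synthetic demo data are all assumptions made for illustration.

```python
import os
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def copy_flat(src_dir: str, dst_dir: str, n_threads: int) -> float:
    """Copy the regular files in src_dir (non-recursive) to dst_dir with n_threads workers.

    Returns the elapsed wall-clock time in seconds.
    """
    os.makedirs(dst_dir, exist_ok=True)
    names = [n for n in os.listdir(src_dir) if os.path.isfile(os.path.join(src_dir, n))]

    def copy_one(name: str) -> None:
        shutil.copyfile(os.path.join(src_dir, name), os.path.join(dst_dir, name))

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(copy_one, names))  # waits for every copy and re-raises worker errors
    return time.perf_counter() - start


def sweep(src_dir: str, dst_root: str, thread_counts=(1, 2, 4, 8, 16, 32)) -> dict:
    """Time one full copy per thread count, each into a fresh destination folder."""
    results = {}
    for n in thread_counts:
        dst = os.path.join(dst_root, f"mt{n}")
        results[n] = copy_flat(src_dir, dst, n)
        shutil.rmtree(dst)  # remove the copy so the next run starts from an empty folder
    return results


if __name__ == "__main__":
    # Demo on synthetic data in a temp folder; real tests need the real source and target.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(src)
        for i in range(200):
            with open(os.path.join(src, f"file{i:03d}.bin"), "wb") as f:
                f.write(os.urandom(64 * 1024))
        for n, secs in sweep(src, os.path.join(tmp, "dst")).items():
            print(f"{n:>2} threads: {secs:.3f} s")
```

On real hardware, be aware that after the first run the source files may be served from the OS file cache, which flatters later runs; repeating the whole series several times, as the question did, helps separate real differences from noise.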


For SSD-to-SSD copying I suspect that fewer threads might be better, as there would be less latency than when reading from spinning-rust HDDs, pushing the data across the network and writing it to spinning rust, but I have no evidence to support that supposition.

