I transferred a big file to a USB device (a flash memory card). The KDE GUI showed a progress bar going from 0% to 100% and then disappearing. After that, I picked “safely remove” in KDE Plasma’s menu in the bottom-right corner, but now it says:
Do not unplug yet! Files are still being transferred...
Clearly, the data was not really transferred while it showed the progress meter. But what was it doing then? How can data appear to be transferred yet not really be written?
There is no progress indicator this time, and the “circle animation” appears stuck. I’m not sure that it is actually doing anything to the device, but the physical unit’s own LED is blinking furiously, so I assume that it must be writing to it after all…
What to make of this?
(Right as I finished typing this post, it completed. So it wasn’t stuck forever.)
When you transferred the file to the device, the chipset on the device accepted and cached the file much quicker than it could write it. You just have to wait until the cached file is written.
It’s not about KDE’s file transfer progress being flawed, but about how file transfers work at the level of the OS. Almost all modern OSes cache the data sent to drives in RAM instead of actually writing it right away, for performance reasons.
But this means that when you have to ensure that data is actually written to the drive (e.g. when removing it from the system), you have to make sure that the cache is properly ‘flushed’ to the drive. The only problem is, you can’t actually determine how long it will take to sync all that cached data.
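To make the “flush” part concrete, here is a rough Python sketch (Linux, with made-up paths) of what an application has to do if it wants to be sure its own data has really reached the device - without the fsync() call, write() can return as soon as the data sits in the page cache in RAM:

```python
import os

src = "/home/user/big-file.iso"                  # hypothetical source file
dst = "/run/media/user/USBSTICK/big-file.iso"    # hypothetical USB mount point

with open(src, "rb") as fin, open(dst, "wb") as fout:
    while chunk := fin.read(4 * 1024 * 1024):    # copy in 4 MiB chunks
        fout.write(chunk)                        # lands in the page cache first
    fout.flush()                                 # push Python's userspace buffer to the kernel
    os.fsync(fout.fileno())                      # block until the kernel has written it to the device
```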
Even Windows uses the same system. Right up to Windows 8 (at least), it would show a notification saying something along the lines of ‘removing drive, wait until this operation is over before unplugging’ when a drive was ejected. But users are typically stubborn and disconnect devices before they are properly synced, so Microsoft disabled caching for removable drives (at the cost of some performance).
The problem KDE is having is actually caused by two separate things:
Its text tells the user exactly what’s happening (“Files are still being transferred”), which ironically confuses users who don’t actually understand the inner workings. Windows’ generic “Removing Drive” message makes the user think it’s taking time to finalize some unknown removal procedure, so they don’t question it that much.
KDE still uses caching for removable devices. This is still the configuration recommended by kernel developers, and KDE Plasma doesn’t currently artificially disable caching for them.
BTW, the reason the progress meter is an indeterminate progress bar (one that only shows that some progress is being made, not how much) is that it’s impossible to tell how much time is left or how much progress has been made when it comes to writing the cache to the drive.
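If you are curious, you can actually watch the kernel’s global dirty-page counters while that flush is happening. A rough sketch (Linux only; note that these counters cover all disks at once, which is part of why a proper per-device progress value can’t easily be derived from them):

```python
import time

def pending_writeback_kib():
    """Return the Dirty and Writeback counters from /proc/meminfo, in KiB."""
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            if key in ("Dirty", "Writeback"):
                values[key] = int(rest.split()[0])   # the value is given in kB
    return values.get("Dirty", 0), values.get("Writeback", 0)

# Poll while a large copy to a slow USB stick is being flushed.
while True:
    dirty, writeback = pending_writeback_kib()
    print(f"Dirty: {dirty} KiB, Writeback: {writeback} KiB")
    if dirty == 0 and writeback == 0:
        break
    time.sleep(1)
```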
Annoyingly, MS-Windows users are used to their OS not explaining things and are forced to trust the process, while Linux users expect the OS to actually explain what is happening, and then get confused and annoyed when the technical explanation raises more questions.
I guess what I don’t understand is why it would be “worse performance” to actually write the data immediately as opposed to first saving it to RAM and then writing it at some later point?
If both drives are on the same bus, then reading ahead before writing makes sense. It also makes sense if the drive you are reading from is faster than the drive you are writing to.
Sorry, I don’t follow this logic. Why save data to any other place instead of directly to the destination drive? How can it ever be faster? Especially as we’re talking USB, which surely isn’t faster than the internal SATA cables?
Even old tech like hard disk drives has onboard cache memory to provide this ability, and it has been around for a long time now. Even if you don’t understand it, you still don’t want to disconnect a drive until the system tells you that it is safe to do so. This is also why it is so critical to have a battery backup on any computer system: it is not there to keep you working, it is there to allow the system to shut down gracefully without losing data that is in the process of being written.
That data write could complete in milliseconds or it might take minutes.
It’s not just about speed. There are various considerations, for example:
You always have to buffer some data in RAM: operations are always either “read from drive to RAM” or “write from RAM to drive”. You can’t actually move data directly from drive to drive (unless both drives can do DMA, and even then I’m not sure).
Different drives have different optimal read or write sizes, and often the optimal size for reads differs from the optimal size for writes even on the same drive, so you ideally want to buffer data in RAM in a size that is a multiple of both drives’ optimal sizes, so that you can issue “full” read and write commands (see the sketch after this list).
As mentioned before, if both drives are on the same bus (for example when copying from one USB drive to another USB drive), then it becomes even more important to cache a large amount of data in RAM, as the switching between targets may take some time.
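As a very rough illustration of the block-size point above - the sizes here are invented, and a real tool would query the drives - with a synchronous flag thrown in to show the difference between O_SYNC writes and one explicit flush at the end:

```python
import math
import os

READ_BLOCK = 128 * 1024       # pretend optimal read size of the source drive
WRITE_BLOCK = 512 * 1024      # pretend optimal write size of the destination drive
CHUNK = math.lcm(READ_BLOCK, WRITE_BLOCK) * 8    # RAM buffer that is "full" for both

def copy_with_ram_buffer(src, dst, synchronous=False):
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if synchronous:
        flags |= os.O_SYNC                 # every write() blocks until the data is on the device
    out_fd = os.open(dst, flags, 0o644)
    try:
        with open(src, "rb") as fin:
            while chunk := fin.read(CHUNK):    # data always passes through RAM
                view = memoryview(chunk)
                while view:                    # os.write() may write less than asked
                    written = os.write(out_fd, view)
                    view = view[written:]
        if not synchronous:
            os.fsync(out_fd)                   # otherwise, one explicit flush at the very end
    finally:
        os.close(out_fd)
```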
All of that applies regardless of whether the copying application uses “synchronous writes” or not. With synchronous writes you can verify that the OS has finished writing to the drive as soon as the copying application has finished copying - but that may not yield the best performance: letting the kernel schedule the timing and the size of writes will generally perform better, because the kernel driver has the best knowledge and the best timing to manage the drive.
I’m not familiar with the new MS-Windows behavior (described above) of using synchronous writes for USB drives due to users’ habit of yanking the drive once the “copy dialog” has finished its progress bar, but it must be slower than letting the OS manage that. I would have suggested timing the two and comparing, but MS-Windows is generally lousy at I/O, so any file-copying comparison with a Linux system would have the MS-Windows system lose, regardless of which method it uses…
I’m consistently impressed by the level of technical knowledge possessed by various KDE users and contributors. It’s good that someone knows how this stuff works under the hood to ensure that the user-facing text hiding all the magic remains accurate.
Thank you for the clarification. I was somehow aware of the caching-related issues but this is a very useful summary that increases my level of understanding.
Yet, I believe that from a UI/UX perspective it is problematic to have a progress report (i.e. transfer complete) where the meaning of that “progress” is ambiguous to the user. The transfer is complete for the purpose of reading back the files, but not for the purpose of removing the media.
I guess the DE could monitor the cache sync status and keep the transfer notification active as long as write-back is not complete? Or it could provide a “force sync” button with a tooltip / reference to documentation that better explains what is going on.
I understand that exposing too much of the underlying complexity is always an issue, but I am not sure that having default behaviours that leave the system in an ambiguous state (files having been transferred and not transferred at the same time) is actually better.
Because some of the write caching is done on the controller built into the devices themselves, this has long been an issue for every OS. It doesn’t matter if it is Windows, UNIX, Linux, etc. The OS doesn’t know how long it might take the device to complete the write, because this info is not given to the OS. The controllers built into the devices such as SSDs might not be the same on two different drives of the same model number. The OS can only use the data given to it by the device.
One cannot do much if the controller does not expose the hardware cache state, but one could achieve something equivalent by running sync on demand and waiting for it to finish, I guess?
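For example, something as simple as this already gives a “sync on demand and wait” behaviour - on Linux, sync() only returns after the pending write-back has been pushed to the devices, so even just timing it is a crude indicator of how much was still in flight:

```python
import os
import time

# Trigger a flush of all cached writes and measure how long it takes.
start = time.monotonic()
os.sync()                     # wraps sync(2); on Linux this waits for write-back to finish
print(f"sync finished after {time.monotonic() - start:.1f} s")
```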
I get the point about hiding the magic, but giving the user a little more awareness could be beneficial. I see how there is no easy one-size-fits-all solution, though.
Each of these drive manufacturers buys the cheapest controllers they can get, just to keep their products competitively priced and hopefully functional. We are all at their mercy.
At the beginning of the SSD wars, the controllers were a big deal. Each new controller added features and functionality that another brand did not have. We tended to purchase drives according to those specs. Nowadays, all the controllers have pretty much the same standards in place. Why none of them added the ability to expose the cache state to the OS is beyond my reasoning. Perhaps it would chew into the sacred bit rate of the transfers.
Yes, probably the DE could call fsync() when copying a large file onto a USB stick, regardless of the kernel’s cache settings. IMHO that makes more sense than a copy that finishes in one second followed by an unmount that takes half a minute.
The problem is:
How to detect “a USB stick” (one that will be unmounted soon)? Or should we do that for all USB disks? Or perhaps add a switch to the Disks widget?
What to do when copying a lot of small files? fsync()ing every file would definitely hurt copy speed. sync() to flush all write caches on all disks doesn’t sound very good either.
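For the small-files case there is a middle ground between fsync()ing every file and a global sync(): syncfs(2) flushes only the one filesystem that a given file descriptor lives on. The Python standard library doesn’t wrap it, so a sketch has to go through libc (Linux/glibc only; the mount point below is made up):

```python
import ctypes
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)

def flush_filesystem(mount_point):
    """Flush pending writes for the filesystem mounted at mount_point only."""
    fd = os.open(mount_point, os.O_RDONLY)   # any fd on that filesystem will do
    try:
        if libc.syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)

flush_filesystem("/run/media/user/USBSTICK")  # hypothetical USB mount point
```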
I wrote a simple plasmoid for myself to have an indicator when there are pending transfers to removable drives, here is the link in case someone finds it useful.
Maybe something like that (but with actual good implementation) would be a good addition to the Disk & Devices applet?