The problem:
This is a problem I ran into when uploading large backup files (~1 TB) via FTP. I am running FileZilla FTP Server and was also using the FileZilla FTP client. Because of the connection speed, these uploads were taking multiple days, and at certain points the transfers had to be resumed. After a certain point, the largest files started failing with the error 550 can't access file. This didn't make much sense to me, as there was no chance of a permissions issue or of something else using the file. I restarted the FTP service and eventually even rebooted the server, but the issue persisted.
Tracking down the cause:
I ended up running Procmon (Sysinternals Process Monitor) on the FTP Server service to see what it was doing that was causing the error. This actually gave me an error code on WriteFile, which I was able to follow.
What is actually wrong:
So now I have an error, 0xc0000427, which translates to STATUS_FILE_SYSTEM_LIMITATION, an NTFS error. Unfortunately this is not as simple as filesystem damage or a path that is too long. I found this blog post from a Microsoft engineer that, while it doesn't mention this exact issue, pointed me in the right direction. Basically, when you upload a large file via FileZilla FTP Server, it doesn't allocate the space ahead of time; it just grows the file as needed. What this ends up doing is creating an extremely fragmented file. Every fragment has to be tracked in the file's metadata, so growing the file in this fashion fills up its NTFS Attribute List until no new child entries can be added. When this happens, any write attempt outside the already allocated space produces the 0xc0000427 error.
Fixing it:
So now I know what's wrong, but how do I fix it? Well, Microsoft released a hotfix for Server 2008 R2 and earlier that patches ntfs.sys to support 4 KB Base Records instead of the current 1 KB limit. The Base Record contains the Attribute List, so this allows it to grow larger to support... well... heavily fragmented files. The files that typically run into this limit are SQL databases and VHD(x) files, since by design they have to grow at a steady pace. So that's great, but it doesn't help me much. The hotfix is supposedly included as of Server 2012 R2, but even then, to take advantage of the 4 KB Base Record size you have to format the entire volume with the /L switch. That's useless to me, as I'm not going to offload 20 TB of data in order to do this. The article also mentions that running a defrag will not clean this up, as it will just move the fragments to be contiguous but not collapse the Attribute Lists.
TLDR:
The actual fix is so stupidly simple, yet I probably would not have thought to try it if I hadn't understood the problem.
Just create a copy of the file on the server, delete the old one, and rename the copy to the original name. There, that's it. Since the copy operation already knows the size of the file, it will allocate the copy at its full size, bypassing most fragmentation.
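If you would rather script the copy, delete, and rename steps than do them by hand, here is a minimal sketch in Python. The function name rewrite_file_in_place and the ".copy" suffix are made up for illustration. It is also an assumption on my part that Python's shutil.copyfile leaves you with a file as unfragmented as a native Windows copy would; if in doubt, do the copy step with Explorer or the copy command and use the script only as a checklist.

```python
import os
import shutil


def rewrite_file_in_place(path):
    """Copy `path` to a temporary sibling file, then swap it in for the original."""
    # Hypothetical temp name; kept in the same folder so the final rename
    # stays on the same volume.
    tmp_path = path + ".copy"
    if os.path.exists(tmp_path):
        raise FileExistsError(tmp_path)

    # Step 1: create a fresh copy of the file.
    shutil.copyfile(path, tmp_path)

    # Sanity check before touching the original.
    if os.path.getsize(tmp_path) != os.path.getsize(path):
        os.remove(tmp_path)
        raise OSError("copy size mismatch, original left untouched")

    # Steps 2 and 3: os.replace overwrites the original with the copy,
    # which covers both the delete and the rename in a single call.
    os.replace(tmp_path, path)
    return path
```

Keep in mind that the volume needs enough free space to hold a second copy of the file until the swap happens, and nothing else (such as the FTP service) should be writing to the file while this runs.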