Tuesday, June 30, 2015

Exchange 2010 OWA Search Broken, 0x80041606

Customer with two Exchange 2010 servers, one Hub/CAS and one Mailbox server. Users started receiving an error when searching in OWA: "The action couldn’t be completed. Please try again". I was told that this happened after a power failure (and bad backup batteries), so it sounded like something was corrupted.

The event viewer was also logging an error:
Content Indexing function 'CISearch::EcGetRowsetAndAccessor' received an unusual and unexpected error code from MSSearch.
Mailbox Database: Operations Database 1 
Error Code: 0x80041606

Rebuild the Content Indexes

This was my first step. I took the easy route of stopping the Microsoft Exchange Search Indexer service and deleting the catalogs. These folders are located in each mailbox database folder and their names start with CatalogData. It is perfectly safe to delete them and then start the service again. After the service starts you will see event viewer entries indicating that it is rebuilding the indexes. You can also run the PowerShell cmdlet Get-MailboxDatabaseCopyStatus to check progress; the ContentIndexState column will show Crawling while the rebuild is running and Healthy once it finishes.
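The steps above, roughly as I ran them from the Exchange Management Shell (the catalog path and service name below are from my environment, so adjust them to yours):

```powershell
# Stop the search indexer so the catalog files are released
Stop-Service MSExchangeSearch

# Delete the catalog folder(s); each mailbox database folder contains
# one whose name starts with "CatalogData" (example path, adjust to
# wherever your databases live)
Remove-Item "D:\ExchangeDatabases\DB1\CatalogData-*" -Recurse -Force

# Start the service again; it will begin rebuilding the indexes
Start-Service MSExchangeSearch

# Watch the rebuild: ContentIndexState shows Crawling, then Healthy
Get-MailboxDatabaseCopyStatus | Format-Table Name, ContentIndexState
```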

Repair Symlinks

I saw plenty of articles indicating that the symlinks needed to be repaired. In my case, this did nothing.

Reinstall Filter Pack

Ultimately the solution was to reinstall the Microsoft Office Filter Pack on the Mailbox server. This did require a reboot (and yes, I tested search prior to rebooting). Afterwards, search was working as expected.

Thursday, June 18, 2015

550 can't access file, 0xc0000427, and STATUS_FILE_SYSTEM_LIMITATION

The problem:

This is a problem I ran into when uploading large backup files (~1 TB) via FTP. I was running FileZilla FTP Server and also using the FileZilla FTP client. Due to the connection speed these uploads were taking multiple days, and at certain points the transfers had to be resumed. At some point, the largest files started failing with a 550 can't access file error. This didn't make much sense to me, as there was no chance of a permissions issue or of something else having the file open. I restarted the FTP service and eventually even rebooted the server, but the issue persisted.


Tracking down the cause:

I ended up running Procmon against the FTP Server service to see what it was doing when the error occurred. This gave me an error code on the failing WriteFile call, which I was able to follow up on.



What is actually wrong:

So now I have an error 0xc0000427, which translates to STATUS_FILE_SYSTEM_LIMITATION, an NTFS error. Unfortunately this is not as simple as filesystem damage or an overly long path. I found a blog post from a Microsoft engineer that, while it doesn't mention this exact issue, pointed me in the right direction. Basically, when you upload a large file via FileZilla FTP Server, it doesn't allocate the space ahead of time; it just grows the file as needed. This ends up creating an extremely fragmented file. Growing the file in this fashion fills up the NTFS attribute list for the file, eventually making it impossible to add new child entries. When that happens, any write attempt outside the already-allocated space produces the 0xc0000427 error.
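If you want to confirm that fragmentation is the culprit, you can inspect the file's on-disk extents. On newer versions of Windows (8 / Server 2012 and later, if I recall correctly), fsutil can list them; the path here is just an example:

```powershell
# Lists the on-disk extents (data runs) backing the file. A file
# hitting this limit will report an enormous number of extents.
# (Example path - adjust to the actual file.)
fsutil file queryextents D:\Backups\huge-backup.bak
```

Sysinternals' contig with the -a switch will also report a fragment count and works on older systems.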


Fixing it:

So now I know what's wrong; how do I fix it? Well, Microsoft released a hotfix for Server 2008 R2 and earlier that patches ntfs.sys to support 4 KB base records instead of the default 1 KB. The base record contains the attribute list, so this allows it to grow large enough to support... well... heavily fragmented files. The typical way people run into this limit is with SQL database and VHD(X) files, since those by design grow steadily over time. So that's great, but what do I do to fix it? This hotfix is supposedly included as of Server 2012 R2, but even then, to leverage the 4 KB base record size you have to format the entire volume with the /L switch. That's useless to me, as I'm not going to offload 20 TB of data in order to do this. The article also mentions that running a defrag will not clean this up; it just moves the fragments to be contiguous but does not collapse the attribute lists.
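For reference, the reformat route looks something like this. Obviously destructive (it wipes the volume), and the drive letter is just an example:

```powershell
# Destructive: reformats the volume with large (4 KB) file record
# segments. /L is the classic format.exe switch; on Server 2012 R2
# and later the Format-Volume cmdlet exposes the same thing as
# -UseLargeFRS. Drive letter E: is an example.
format E: /FS:NTFS /L

# PowerShell equivalent:
Format-Volume -DriveLetter E -FileSystem NTFS -UseLargeFRS
```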


TLDR:

The actual fix is so stupidly simple, yet I probably would not have thought to try it if I didn't understand the problem. Just create a copy of the file on the server, delete the original, and rename the copy. There, that's it. Since the copy operation knows the full size of the file up front, it allocates it all at once, bypassing most of the fragmentation.
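In PowerShell the whole fix is three commands. The path and file name are examples, and this should be done locally on the server, not over FTP:

```powershell
# Copy, delete the fragmented original, rename the copy back.
# Because the copy's final size is known up front, NTFS can allocate
# it in large contiguous runs, sidestepping the attribute list limit.
Copy-Item "D:\Backups\huge-backup.bak" "D:\Backups\huge-backup.bak.tmp"
Remove-Item "D:\Backups\huge-backup.bak"
Rename-Item "D:\Backups\huge-backup.bak.tmp" "huge-backup.bak"
```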