Friday, July 24, 2015

DFS-R SYSVOL Replication - Performing an Authoritative Restore

Many articles have the correct procedure for this; however, I have had a few of these cases come up recently, so I figured it would be best to aggregate the fixes I've had to put in place.

Failure Reasons

  • Unclean Shutdown - Microsoft changed the default behavior so that if DFS-R detects a dirty shutdown, it DOES NOT resume replication on its own. This is obviously very bad if you don't regularly check your event logs.
  • Last Contact Too Old - There is also a limit (MaxOfflineTimeInDays, 60 days by default) on how long the database can go without talking to another DFS-R peer. If you don't catch this in time, it will also prevent replication.


Helpful Commands

Please make sure you understand what you are doing before using these commands. These are tools to repair/workaround DFS-R issues, but can also introduce issues if used improperly.
  • If you are getting event log entries (event ID 2213) indicating that you need to resume replication, the entry should already contain the command. It will look something like this (with a different volumeGuid):
    wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid="B4A015E2-A116-11DE-89FB-806E6F6E6963" call ResumeReplication
  • If your event log shows ID 4012 saying that the server has been disconnected for too long, you may want to (temporarily) raise the limit:
    wmic.exe /namespace:\\root\microsoftdfs path DfsrMachineConfig set MaxOfflineTimeInDays=100
  • If you ran into this because of a dirty shutdown, you can let DFS-R resume automatically after recovery to prevent future issues:
    wmic /namespace:\\root\microsoftdfs path dfsrmachineconfig set StopReplicationOnAutoRecovery=FALSE
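If you end up running these on several servers, a small helper that rebuilds the command from the GUID in the event text can save some copy/paste mistakes. The Python below is only a sketch: the function names are hypothetical, it just builds the command strings (it does not run wmic), and the GUID check assumes the standard 8-4-4-4-12 hex format shown above.

```python
import re

# Volume GUIDs as they appear in the DFS-R event text, e.g.
# B4A015E2-A116-11DE-89FB-806E6F6E6963 (8-4-4-4-12 hex digits).
_GUID_RE = re.compile(r"^[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$")

# WMI namespace used by all three commands above (raw string keeps the backslashes).
NAMESPACE = r"\\root\microsoftdfs"


def resume_replication_cmd(volume_guid: str) -> str:
    """Build the wmic ResumeReplication command for one volume GUID."""
    guid = volume_guid.strip("{} ").upper()  # tolerate braces/whitespace from copy/paste
    if not _GUID_RE.match(guid):
        raise ValueError(f"not a volume GUID: {volume_guid!r}")
    return (
        f"wmic /namespace:{NAMESPACE} path dfsrVolumeConfig "
        f'where volumeGuid="{guid}" call ResumeReplication'
    )


def max_offline_cmd(days: int) -> str:
    """Build the command that raises MaxOfflineTimeInDays (the event 4012 case)."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (
        f"wmic.exe /namespace:{NAMESPACE} path DfsrMachineConfig "
        f"set MaxOfflineTimeInDays={days}"
    )
```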


Authoritative Restore Procedure

  1. Manually back up your SYSVOL folder; it's typically in C:\Windows\SYSVOL
  2. Throughout this process, BE PATIENT. I have seen some of these steps take 10 minutes to take full effect. If you don't see the events right away, do not start the process over; just wait.
  3. Always use an Administrator command prompt.
  4. Stop the DFSR service on all domain controllers (net stop DFSR)
  5. Run adsiedit.msc, connect to the Default Naming Context, and drill down to your Domain Controllers OU
  6. Pick a domain controller to be your authoritative restore source. This should be the one with the up-to-date copy of SYSVOL. Under that DC's object, drill down through CN=DFSR-LocalSettings to CN=Domain System Volume, then double-click CN=SYSVOL Subscription.
  7. Set msDFSR-Enabled to FALSE and msDFSR-Options to 1, then press OK.
  8. On all other domain controllers, set msDFSR-Enabled to FALSE the same way, but leave msDFSR-Options not set.
  9. On the authoritative domain controller:
    1. Force an AD replication (repadmin /syncall /AdP)
    2. Start DFSR (net start DFSR)
    3. Wait for event 4114 signaling that it has stopped replication
    4. Go back into adsiedit and, only on the authoritative domain controller, set msDFSR-Enabled to TRUE
    5. Force an AD replication (repadmin /syncall /AdP)
    6. Run dfsrdiag pollad
    7. Wait for event 4602 signaling that it started replication, confirm with net share that SYSVOL and NETLOGON are being shared.
  10. Then perform these steps on every other domain controller:
    1. Start DFSR (net start DFSR)
    2. Wait for event 4114 signaling that it has stopped replication
    3. Go back into adsiedit and set msDFSR-Enabled to TRUE for this domain controller only
    4. Force an AD replication (repadmin /syncall /AdP)
    5. Run dfsrdiag pollad
    6. Wait for events 4614 and 4604 signaling that SYSVOL has been initialized and has synchronized from a partner, then confirm with net share that SYSVOL and NETLOGON are being shared. Note that this step has taken a while for me, and I have had to re-run dfsrdiag pollad after 5-10 minutes for it to actually work.
  11. You should now be fully replicated. If your issues were caused by unclean shutdowns, consider setting StopReplicationOnAutoRecovery to FALSE (see Helpful Commands above) so that replication does not stop after future recoveries.
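To keep track of which object gets which values in steps 5-8, the DNs can be generated ahead of time. This Python sketch assumes the domain controllers' computer objects are in the default Domain Controllers OU (the DN will differ if yours have been moved); the class and function names, the DC names, and contoso.com are all hypothetical, and it only prints a plan, it does not touch AD.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SysvolSubscription:
    dn: str                  # DN of the DC's CN=SYSVOL Subscription object
    enabled: str             # msDFSR-Enabled value for steps 7-8
    options: Optional[str]   # msDFSR-Options: "1" on the authoritative DC, else not set


def subscription_dn(dc_name: str, domain_dns: str) -> str:
    """Build the DN of a DC's SYSVOL Subscription object (default OU assumed)."""
    domain_dn = ",".join(f"DC={label}" for label in domain_dns.split("."))
    return (
        "CN=SYSVOL Subscription,CN=Domain System Volume,CN=DFSR-LocalSettings,"
        f"CN={dc_name},OU=Domain Controllers,{domain_dn}"
    )


def restore_plan(dcs: List[str], authoritative: str, domain_dns: str) -> List[SysvolSubscription]:
    """Attribute values for steps 7-8, with the authoritative DC listed first."""
    if authoritative not in dcs:
        raise ValueError(f"{authoritative} is not in the DC list")
    ordered = [authoritative] + [dc for dc in dcs if dc != authoritative]
    return [
        SysvolSubscription(
            dn=subscription_dn(dc, domain_dns),
            enabled="FALSE",
            options="1" if dc == authoritative else None,
        )
        for dc in ordered
    ]


if __name__ == "__main__":
    # Hypothetical DC names and domain.
    for item in restore_plan(["DC02", "DC01", "DC03"], "DC01", "contoso.com"):
        print(item.dn, item.enabled, item.options or "(not set)")
```

The TRUE flips in steps 9 and 10 are still done by hand, one DC at a time, waiting for the events in between.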

Sunday, July 12, 2015

Port Forwarding on ADTRAN NetVanta Products with Redundant Uplinks

If you have to set up redundant connections on ADTRAN NetVanta products, there are a few different ways to accomplish it. If you are using route maps to policy-route traffic out different connections, you may run into issues with port forwarding. The issue comes up when the forwarded traffic arrives on an interface that is not currently the default route. You will receive a message on the console similar to:
2015.07.12 13:30:27 FIREWALL id=firewall time="2015-07-12 13:30:27" fw=FW1 pri=1 proto=3389/tcp src=1.2.3.4 dst=4.5.6.7 msg="Spoofing detected, dropping packet Src 53668 Dst 3389 from ISP2 policy-class on interface vlan 10" agent=AdFirewall

This is because the unit does a uRPF (Unicast Reverse Path Forwarding) check and sees that the packet came in on what it thinks is the wrong interface, since its route back to the source points out a different one. On equipment from most other vendors you have to disable this at a global level. On ADTRAN you disable it at the policy-class level, which is counter-intuitive but gives you more granular control. To disable it, issue this command on the policy class that's receiving the traffic:

FW1(config)#no ip policy-class ISP2 rpf-check
Here ISP2 is your policy class name. You will need to disable this on any policy class that may receive incoming traffic on an interface that is not the default route.
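To see why the packet is dropped, here is a generic strict-mode uRPF check in Python: look up the best (longest-prefix) route back to the source, and pass the packet only if that route points out the interface the packet arrived on. This is a sketch of the general technique, not ADTRAN's implementation; the routing table and the ISP1/ISP2 interface labels are hypothetical.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical routing table: (destination prefix, egress interface).
# The default route currently points out ISP1.
ROUTES: List[Tuple[ipaddress.IPv4Network, str]] = [
    (ipaddress.ip_network("0.0.0.0/0"), "ISP1"),
    (ipaddress.ip_network("192.168.10.0/24"), "vlan10"),
]


def route_interface(src: str, routes) -> Optional[str]:
    """Egress interface of the longest-prefix route back to src, or None."""
    addr = ipaddress.ip_address(src)
    matches = [(net, intf) for net, intf in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rpf_passes(src: str, arrival_intf: str, routes=ROUTES, rpf_check: bool = True) -> bool:
    """Strict uRPF: accept only if the route back to src uses the arrival interface."""
    if not rpf_check:  # equivalent of "no ip policy-class ... rpf-check"
        return True
    return route_interface(src, routes) == arrival_intf
```

With the default route out ISP1, a port forward from 1.2.3.4 arriving on ISP2 fails the check, which corresponds to the "Spoofing detected" message above; turning the check off for that policy class (rpf_check=False here) lets it through.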

Tuesday, June 30, 2015

Exchange 2010 OWA Search Broken, 0x80041606

A customer had two Exchange 2010 servers, one Hub/CAS and one Mailbox server. Users started receiving an error when searching in OWA: "The action couldn’t be completed. Please try again". I was told this started after a power failure (with bad backup batteries), so it sounded like something had been corrupted.

The event viewer was also logging an error:
Content Indexing function 'CISearch::EcGetRowsetAndAccessor' received an unusual and unexpected error code from MSSearch.
Mailbox Database: Operations Database 1 
Error Code: 0x80041606

Rebuild the Content Indexes

This was my first step. I took the easy route of stopping the Microsoft Exchange Search Indexer service and deleting the catalogs. These folders are located in each mailbox database folder and their names start with CatalogData. It is perfectly safe to delete them and then start the service again. After starting the service you will see event log entries indicating that the indexes are being rebuilt. You can also run the PowerShell cmdlet Get-MailboxDatabaseCopyStatus to check progress; the ContentIndexState column will show whether it's Crawling or Healthy.
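The catalog cleanup can be scripted if you have several databases. A minimal Python sketch, assuming the Microsoft Exchange Search Indexer service has already been stopped and that you pass in the path of one mailbox database folder; the function name is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def remove_catalogs(db_folder) -> List[str]:
    """Delete the CatalogData* folders inside one mailbox database folder.

    Assumes the Microsoft Exchange Search Indexer service is already stopped.
    Returns the names of the folders that were removed.
    """
    removed = []
    for child in Path(db_folder).glob("CatalogData*"):
        if child.is_dir():  # only catalog folders; the .edb and logs are left alone
            shutil.rmtree(child)
            removed.append(child.name)
    return sorted(removed)
```

Run it once per database folder, then start the service again and watch ContentIndexState as described above.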

Repair Symlinks

I saw plenty of articles saying that the symlinks needed to be repaired; this did nothing.

Reinstall Filter Pack

Ultimately the solution was to reinstall the Microsoft Office Filter Pack on the Mailbox server. This did require a reboot (and yes, I tested it prior to rebooting). Afterwards the search was working as expected.

Thursday, June 18, 2015

550 can't access file, 0xc0000427, and STATUS_FILE_SYSTEM_LIMITATION

The problem:

This is a problem I ran into when uploading large backup files (~1 TB) via FTP. I am running FileZilla FTP Server, and was also using the FileZilla FTP Client. Due to the connection speed these uploads were taking multiple days, and at certain points the transfers had to be resumed. Eventually, on the largest files, I started getting the error 550 can't access file. This didn't make much sense to me, as there was no chance of a permissions issue or of something else using the file. I restarted the FTP service and eventually even rebooted the server, but the issue persisted.


Tracking down the cause:

I ended up running Procmon against the FTP Server service to see what it was doing when the error occurred. This gave me an error code on WriteFile which I was able to follow.



What is actually wrong:

So now I have error 0xc0000427, which translates to STATUS_FILE_SYSTEM_LIMITATION, an NTFS error. Unfortunately this is not as simple as filesystem damage or a path that is too long. I found a blog post from a Microsoft engineer that, while it doesn't mention this exact issue, pointed me in the right direction. Basically, when you upload a large file via FileZilla FTP Server, it doesn't allocate the space ahead of time; it just grows the file as needed. This ends up creating an extremely fragmented file. Growing it this way fills up the NTFS attribute list for the file until no new child entries can be added. When this happens, any write attempt outside the already-allocated space will produce the 0xc0000427 error.


Fixing it:

So now I know what's wrong; how do I fix it? Microsoft released a hotfix for Server 2008 R2 and earlier that patches ntfs.sys to support 4 KB base records instead of the current 1 KB limit. The base record contains the attribute list, so this allows it to grow larger to support... well... heavily fragmented files. The typical cases where people hit this limit are SQL database files and VHD(X) files, since by design they keep growing over time. So that's great, but what do I do to fix it? This support is supposedly included as of Server 2012 R2, but even then, to use the 4 KB base record size you have to format the entire volume with the /L switch. That's useless to me, as I'm not going to offload 20 TB of data to do this. The article also mentions that running a defrag will not clean this up: it moves the fragments to be contiguous but does not collapse the attribute list.


TLDR:

The actual fix is so stupidly simple, yet I probably would not have thought to try it if I didn't understand the problem. Just create a copy of the file on the server, delete the old one, and rename the copy. That's it. Since the copy already knows the size of the file, it gets allocated at full size up front, which avoids most of the fragmentation.
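Doing the copy with Explorer or the copy command is all I did. If you'd rather script it, here is a Python sketch of the same copy/delete/rename idea. It sets the new file's final size before writing any data, on the assumption that this lets NTFS allocate the file in a few large extents instead of growing it piece by piece; I have not verified that this matches exactly what Windows' own copy does. It needs free space for a second full copy, and the file must not be in use. The function name and the ".copy" suffix are hypothetical.

```python
import os
import shutil

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB read/write buffer


def rewrite_preallocated(path: str) -> int:
    """Copy `path` into a new file sized up front, then swap it into place.

    Returns the number of bytes in the file.
    """
    tmp_path = path + ".copy"  # hypothetical temporary name next to the original
    size = os.path.getsize(path)
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        dst.truncate(size)  # extend the new file to its final length first
        dst.seek(0)         # then overwrite it from the start with the real data
        shutil.copyfileobj(src, dst, CHUNK_SIZE)
    # Delete-and-rename in one step: os.replace overwrites the old, fragmented file.
    os.replace(tmp_path, path)
    return size
```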

Friday, April 3, 2015

FortiGate Management - Filtering by IP

For most people, the solution to restricting admin access to a FortiGate firewall is to use Trusted Hosts on the admin logins. This works fine; however, any user (or bot) on the internet can still see the login prompt. This can also trigger alarms in security scans (such as for PCI compliance). The easy solution is to not allow admin access over the WAN interfaces at all, but if you need it, here is a better way to do it using local-in policies.

For mine, I wanted to allow pings from anyone, allow admin access from my remote subnets, and deny everything else. Here is the config; modify the interface names as needed. I used an address group ("Admin Subnets") for the allow rule.
config firewall local-in-policy
    edit 1
        set intf "wan1"
        set srcaddr "all"
        set dstaddr "all"
        set action accept
        set service "PING"
        set schedule "always"
    next
    edit 2
        set intf "wan1"
        set srcaddr "Admin Subnets"
        set dstaddr "all"
        set action accept
        set service "ALL"
        set schedule "always"
    next
    edit 3
        set intf "wan1"
        set srcaddr "all"
        set dstaddr "all"
        set action deny
        set service "ALL"
        set schedule "always"
    next
end



After implementing this you can remove your Trusted Hosts configuration, as it has become redundant. I feel this solution is much more flexible; the only downside is that you can't see or administer it through the web interface (yes, even if you enable the Local-In Policy feature, it's still not there).
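Local-in policies are checked top to bottom and the first match wins, which is why the catch-all deny has to come last. A simplified Python model of the three rules above, assuming the "Admin Subnets" group contains the hypothetical documentation ranges shown, and treating services as plain names (real FortiOS service objects match on protocol and port):

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

ALL = [ipaddress.ip_network("0.0.0.0/0")]
# Hypothetical members of the "Admin Subnets" address group.
ADMIN_SUBNETS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


@dataclass
class LocalInRule:
    intf: str
    srcaddr: list  # list of ip_network objects
    service: str   # "ALL" matches any service name
    action: str    # "accept" or "deny"


# Same order as edit 1, 2, 3 in the config above.
POLICY = [
    LocalInRule("wan1", ALL, "PING", "accept"),
    LocalInRule("wan1", ADMIN_SUBNETS, "ALL", "accept"),
    LocalInRule("wan1", ALL, "ALL", "deny"),
]


def local_in_action(intf: str, src: str, service: str, rules=POLICY) -> Optional[str]:
    """Return the action of the first matching rule, or None if no rule matches."""
    addr = ipaddress.ip_address(src)
    for rule in rules:  # top to bottom, first match wins
        if rule.intf != intf:
            continue
        if not any(addr in net for net in rule.srcaddr):
            continue
        if rule.service not in ("ALL", service):
            continue
        return rule.action
    return None  # not covered by these rules; normal admin access settings apply
```

A ping from anywhere is accepted by rule 1, HTTPS from an admin subnet by rule 2, and anything else arriving on wan1 falls through to rule 3.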

Friday, February 13, 2015

cPanel and Outlook Autodiscover

If you have a cPanel customer who also has an SSL certificate for their website, you may find that Outlook Autodiscover does not work properly (especially with older Outlook versions), because cPanel matches /Autodiscover/Autodiscover.xml and hands the client IMAP/SMTP settings instead of returning a 404 and letting it move on. You may even find that your requests are redirected to cpanelmaildiscovery.cpanel.net/autodiscover/autodiscover.xml, which is obviously not desired. There is a quick fix, though it's really more of a hack. This assumes you're using Apache.
  1. Edit /etc/httpd/conf/httpd.conf
  2. Search for autodiscover, you should find a ScriptAlias line referencing it
  3. Comment this line out or remove it completely
  4. Restart Apache (service httpd graceful)
  5. Edit /usr/local/cpanel/APACHE_CONFIG and find the same line and remove it
  6. The top of your httpd.conf should also explain how to make cPanel retain your changes. I ran /usr/local/cpanel/bin/apache_conf_distiller --update to save the changes to the template.
Obviously a cPanel update could overwrite these changes, but as far as I could tell this is the only way to get it to behave correctly.
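If you have to redo this after an update, the comment-out step can be scripted. A Python sketch that comments out any uncommented ScriptAlias (or ScriptAliasMatch) line mentioning autodiscover, case-insensitively, and leaves everything else alone. I'm assuming the directive sits at the start of its line as in step 2; you would still restart Apache and run the distiller afterwards. The function name is hypothetical.

```python
import re
from typing import Tuple

# An uncommented ScriptAlias/ScriptAliasMatch line that mentions autodiscover (any case).
_AUTODISCOVER_ALIAS = re.compile(
    r"^([ \t]*)(ScriptAlias(?:Match)?\b[^\n]*autodiscover[^\n]*)$",
    re.IGNORECASE | re.MULTILINE,
)


def comment_out_autodiscover(conf_text: str) -> Tuple[str, int]:
    """Prefix matching lines with '# ' (keeping indentation); return (new text, count)."""
    return _AUTODISCOVER_ALIAS.subn(r"\1# \2", conf_text)
```

Because already-commented lines no longer start with ScriptAlias, running it twice is harmless.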

Monday, February 9, 2015

Exchange Autodiscover after an On-Premise to Office 365 Migration

If for whatever reason you did a cutover or similar migration from on-premises Exchange to Office 365, you may find that Autodiscover can be a little sticky. The issue I ran into is that in this environment the computers are domain-joined, but we could not immediately uninstall Exchange to remove the organization info from Active Directory. Because of this, when Outlook starts up and looks for Autodiscover, it searches for an SCP (Service Connection Point) in Active Directory, which is still there and pointing to the old server. Best case, this makes Outlook take longer to start; worst case, it ends up pointing to the wrong server or throwing certificate errors. The solution, though it's a bit of a hack, is to just change the connection point. I did this with ADSI Edit. Note that I consider this a temporary workaround and not a permanent fix; once you uninstall Exchange, all of this is removed from AD.
  1. Open adsiedit.msc (typically from a Domain Controller)
  2. Connect to the Configuration context
  3. Navigate to Configuration > Services > Microsoft Exchange > (OrgName) > Administrative Groups > (Your Administrative Group Name) > Servers > (Server Name) > Protocols > Autodiscover
  4. Edit the entry for your server and scroll down to serviceBindingInformation
  5. Adjust the value to reference the proper URL for your Office 365 environment. If you don't know it, you can use the Autodiscover test at testconnectivity.microsoft.com to verify what your URL should be.
  6. Outlook should now detect it properly. You can also verify this with the connectivity tester built into Outlook: Ctrl+click the Outlook system tray icon and select Test E-mail AutoConfiguration.
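The reason the stale SCP wins is the order in which Outlook tries things. A simplified Python sketch of the classic lookup order for a domain-joined client: SCP results from AD first, then the HTTPS and HTTP URLs built from the e-mail domain, then the DNS SRV record. Newer Outlook versions add extra steps, so treat this as an approximation; the function name is hypothetical, and it only lists candidates, it doesn't query anything.

```python
from typing import List, Sequence


def autodiscover_candidates(email: str, scp_urls: Sequence[str] = ()) -> List[str]:
    """Simplified classic Outlook Autodiscover order for a domain-joined client."""
    domain = email.rsplit("@", 1)[1].lower()
    candidates = list(scp_urls)  # 1. SCP lookup in AD (domain-joined clients only)
    candidates += [
        f"https://{domain}/Autodiscover/Autodiscover.xml",               # 2. root domain
        f"https://autodiscover.{domain}/Autodiscover/Autodiscover.xml",  # 3. autodiscover host
        f"http://autodiscover.{domain}/Autodiscover/Autodiscover.xml",   # 4. HTTP, redirect only
        f"SRV _autodiscover._tcp.{domain}",                               # 5. DNS SRV record
    ]
    return candidates
```

As long as AD returns the old server's URL, it sits at the top of that list, which is why editing serviceBindingInformation changes what Outlook finds.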

Windows Server Unable to Make Outbound TCP Connections

I ran into an issue with a customer server where various things were just not working. A reboot always fixed it, but it had to be done every few days, which was not acceptable. The errors were things like "no logon servers are available" or being unable to open any Active Directory tools. DNS and network shares were working fine, I could RDP in without issues, restarting various services did not help, and ICMP/UDP were fine. I attempted to telnet out, but every outgoing connection was immediately aborted; Wireshark did not see the connection attempt, and all firewalls were off.

I ended up putting PuTTY on the server and tried to telnet from there, which gave a more specific "no buffer space available" error. Searching for that, I determined it was a Winsock error (WSAENOBUFS, 10055) related to not enough free handles being available. With that knowledge, I opened Task Manager, added the Handles column, and sorted by it. There it was: an HP plotter utility service with 6000+ open handles. I killed the process and everything immediately went back to normal.
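If you'd rather not eyeball Task Manager on every server, the same triage can be run on an export from PowerShell, e.g. Get-Process | Select-Object Name,Handles | Export-Csv. The Python sketch below assumes that two-column CSV layout; the function name is hypothetical, and the 5000-handle threshold is an arbitrary cut-off for illustration, not a Windows limit.

```python
import csv
from typing import List, Tuple


def handle_hogs(csv_text: str, threshold: int = 5000) -> List[Tuple[str, int]]:
    """Return (process name, handle count) pairs above threshold, highest first."""
    # Windows PowerShell's Export-Csv adds a "#TYPE ..." line unless
    # -NoTypeInformation is used; skip it so the header row is read correctly.
    lines = [ln for ln in csv_text.splitlines() if not ln.startswith("#TYPE")]
    hogs = []
    for row in csv.DictReader(lines):
        count = int(row["Handles"])
        if count > threshold:
            hogs.append((row["Name"], count))
    return sorted(hogs, key=lambda h: h[1], reverse=True)
```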