Friday 18 November 2016

Integration Is Not That Great Between Batch And Individual Mailbox Migration Tools

I was migrating a small batch of some 30 mailboxes from on-premise to Exchange Online in a hybrid configuration. I chose the option to suspend the migration, then complete it manually later.

While most mailboxes reached the AutoSuspended status, some have not even started to migrate (PercentComplete = 0). These few mailboxes appeared to be infinitely looping through statuses like TransientFailureSource, LoadingMessages, InitializingMove and the like:


I said to myself, I couldn't let these "transient" failures and apparent delays hold up the rest of the pack. Experience shows that re-migrating them from scratch usually gets them through.

I was thinking that at the end of the day a batch migration is just a bunch of  mailbox moves treated as one unit, and I should be able to manage individual move requests within the batch via *-MoveRequest commands. I went ahead and finalised the AutoSuspended moves with Resume-MoveRequest. Little did I know that I was about to upset the batch migration logic to the point where it could not recover, resulting in all mailbox moves being reported as "Failed".

To recap:
1. Most mailboxes in the batch finished initial sync and entered the AutoSuspended state.
2. Some mailboxes failed to sync, quoting transient failures and looping indefinitely through various states, never reaching the AutoSuspended state.
3. Using Resume-MoveRequest in remote PowerShell, I completed the AutoSuspended moves.
4. Using PowerShell, confirmed that all resumed migrations have reached the Completed status:


At this point I stopped the batch, removed the failing mailboxes, and restarted the batch. Then I chose "Complete the migration" in the action pane.

I was expecting that the batch will realise that the mailbox moved have been completed and will finish successfully. Well, good luck with that... All mailboxes that have completes successfully following Resume-MoveRequest, were reported as Failed in the GUI. Yes, Failed.

I checked individual reports. They confirmed that the mailboxes were indeed successfully migrated - and please note, these reports were returned by the GUI's very own Download the report for this user link:


The Migration Batch GUI report however showed them all as Failed:


Get-MigrationBatch seemed to be supporting its GUI sibling, returning a status of CompletedWithErrors:


An educated guess indicates that the batch migration wizard is retrieving the report from individual Get-MoveRequestStatistics -IncludeReport commands.

So should I be worried? In my experience, definitely not. Confirmed with users that the above discrepancy didn't affect their ability to connect to and use their migrated mailbox.

Takeaway: Integration between *-MigrationBatch and *-MoveRequest commands (and their siblings) isn't that great. Expect inconsistent reports when mixing them. If you started moving mailboxes using one tool then use the same tool until the migration is complete - don't mix them, no matter how tempting it may be. The inconsistent reports can be misleading but data integrity is maintained. The reports to rely on are those created by *-MoveRequest and *-MoveRequestStatistics. If these commands say that the move has been completed successfully then they have been completed successfully, regardless what other reports say.

Hope that one day Microsoft will improve reporting and integrate batch and individual mailbox migration tools better.

Happy reporting :-)

Saturday 12 November 2016

Removing Exchange 2010? Disable Performance Logs & Alerts!

In a recent upgrade (a.k.a. "transition") from Exchange 2010 to Exchange 2016 project I got to the point where I had to uninstall Exchange 2010. To my surprise it failed to uninstall, despite the fact that all services that I was aware of, such as antivirus and backup, were disabled or otherwise not affecting Exchange components.

The uninstall wizard displayed a long list of rundll32 processes that had open files and this prevented the removal of my Exchange 2010 server:


In order to see which process holds files open, I had to know what was loaded with rundll32. At this point, however, the only clue I had was the process ID, a.k.a. PID. Not very useful.

It is not all gloom and doom though. Task Manager is a very useful troubleshooting beast, with lots of information hidden from the default view.

In my case I had to see what was loaded with rundll32. Let's look specifically at PID 7172, the first in the pre-flight check error report.

In Task Manager, I enabled the Command Line column...


... which then showed me the path to the DLL that was loaded in the process:


Doing a quick Google search on pla.dll, it looked like the files were open by the Performance Logs & Alerts service. Indeed, this service was started:


I disabled _and_ stopped the service, ...


... retried the uninstall wizard, and voilĂ , the pre-flight check succeeded:


IMPORTANT: Stating the obvious, it is not sufficient to just _disable_ the service. You must also _stop_ it to make sure it releases all file locks and resources. Disabling it ensures that the service doesn't start and lock files while in the middle of uninstalling Exchange, while stopping it actually releases the file locks.

In a similar way, we can identify any process that may hold files open when uninstalling not only Exchange 2010 but other software too.

Takeaway: Add the Performance Logs & Alerts service to the list of services to disable when uninstalling Exchange 2010, or perhaps any other version of Exchange server as a matter of fact.



Saturday 5 November 2016

StalledDueToTarget_MdbCapacityExceeded

Intro

The Mailbox Migration Performance Analysis blog posted by the Exchange team back in 2014 provides a great deal of details about monitoring and measuring migration performance using the various states and the time a job spent in each state.

What it doesn't touch on, however, is what appears to be a rare condition: sometimes the Exchange Online mailbox database designated to store a particular on-boarded mailbox apparently runs out of space. At least that's what an educated guess made me to conclude.

Some Context

In one of my recent Office 365 migration projects one particular mailbox was stuck for hours in the StalledDueToTarget_MdbCapacityExceeded state.

Get-MoveRequest <Identity> showed that the request has been queued. Nice detail but in this case it wasn't very informative.

Displaying some more details, including the status report, returned no useful information either. In fact the report may be a bit misleading in that it may suggest that the mailbox is too big and it doesn't fit within the quota. Well, the figures indicated that both the mailbox and the archive were well below the 50Gb mailbox and unlimited archive quota.

Get-MoveRequest <Identity> | Get-MoveRequestStatistics -IncludeReport | fl DisplayName, StatusDetail, TotalMailboxSize, TotalArchiveSize,Report


I have never come across this particular condition and I turned to my omniscient technical advisor, GG (Google the Great). No luck there either. Not one hit. Am I the only one in the universe who's seen this so far? I started to speculate.

The error suggests that the mailbox in question has been designated to be stored on a specific mailbox database, however the database had no capacity to accommodate the mailbox.

Assuming that I am correct, I have no way of knowing how this happened. I can think of two possible scenarios:

1. I can imagine that under some rare conditions the database selection algorithm may over-provision mailbox databases. The system detects it down the line, but it fails to re-allocate the mailbox to another eligible database. and the mailbox move gets stuck and never completes. I suspect this is the most likely cause.

2. The maximum size of the target mailbox database has been administratively reduced *after* the move request has been queued. Unlikely but not impossible.

I may never find out, but it is irrelevant anyway.

After the troubled mailbox has been sitting in this state for over 5 hours, I said to myself that I have to do something about it. Maybe removing the mailbox from the batch and re-migrating it separately will put the mailbox into a roomier database. And it did!

The Fix

This worked for me:

1. Stop the migration batch. Mailboxes cannot be removed from a batch that is still running.

Stop-MigrationBatch "Batch Name" -Confirm:$false


2. Give it a couple of minutes and confirm that the batch has stopped. If it is still "stopping" then give it some more time.

Get-MigrationBatch "Batch-Name"


3. Find the user's primary SMTP address by running one of the following commands on the on-premise system:

((Get-Mailbox -Identity UserName).PrimarySmtpAddress).ToString()

or

Get-Mailbox -Identity UserName | fl PrimarySmtpAddress

4. Remove the mailbox from the batch. Substitute Primary_SMTP_Address with the one obtained above.

Remove-MigrationUser <Primary_SMTP_Address>


5. Restart the migration batch.

Start-MigrationBatch "Batch Name"


6. Now that the user has been removed from the migration batch, create a new batch and add the mailbox. This time Exchange Online will place the mailbox into a database with sufficient space and the mailbox will be moved successfully - or at least that's what happened in my case.

That's it. Enjoy your move!