
Topic: Deploying Exchange 2007 Multi-site CCR Clusters – Do’s and Don’ts

  
  1. #1

    Deploying Exchange 2007 Multi-site CCR Clusters – Do’s and Don’ts

Code:
    http://www.msexchange.org/articles_tutorials/exchange-server-2007/migration-deployment/deploying-exchange-2007-multi-site-ccr-clusters-part1.html

    PART-1


    Introduction

Back in 2006, from when Exchange 2007 RTM’d up until Exchange 2007 SP1 RTW’d, the Exchange Product Group considered Cluster Continuous Replication (CCR) not only a local datacenter high availability solution but also a viable site resilience solution. In fact, CCR was the only native functionality that could be used to provide site resilience. Of course, there are alternatives. You can use Single Copy Clusters (SCCs) combined with a software-based third-party replication solution, or have the storage vendor replicate data at the storage layer (typically block level). Several vendors also make it possible to combine CCR with storage-layer replication, but why not take advantage of native functionality rather than invest in an expensive third-party solution?
Approximately a year later came Exchange 2007 SP1 and with it a brand new feature known as Standby Continuous Replication (SCR). SCR was developed primarily for site resilience scenarios. But Exchange 2007 SP1 also improved the existing CCR feature by making it possible to locate the Windows failover cluster nodes on separate subnets. So although SCR does a pretty good job, some Exchange architects still want to deploy multi-site CCR clusters for one reason or another. This is especially true now that you no longer have to stretch the subnets. Quite frankly, a multi-site CCR cluster can be the right solution if you deploy it properly and have an infrastructure/topology that goes hand in hand with this type of resilience solution. But understand that a multi-site CCR cluster can be an absolute nightmare if it is deployed in an infrastructure that does not live up to the requirements, or if the solution is not deployed properly. To clear up any confusion, this article explains the ramifications of multi-site CCR clusters and provides best practice recommendations.
    Important:
Note that this four-part article isn’t a step-by-step guide on how to deploy a multi-site CCR cluster. For general CCR cluster step-by-step guides, see my Deploying CCR on Windows Server 2003/2008 article series.


    What Operating System should I use?

So on which operating system should you install your multi-site CCR cluster nodes? Had you asked a year ago, I would have been tempted to recommend Windows Server 2003, but Windows Server 2008 includes some interesting improvements with regard to CCR clusters. Also, now that we’re at Exchange 2007 Update Rollup 7 (RU7), most of the bugs/issues that existed when installing Exchange 2007 SP1 on Windows Server 2008 have been fixed.
Among the CCR-related improvements are support for multi-subnet failover clusters and faster log file shipping. As you may know, the Exchange Replication Service uses SMB for log file shipping. Windows Server 2008 introduces SMB v2, which does the job 30-40% faster than SMB v1 did.
The Windows Server 2008 Failover Cluster deployment experience is also much smoother than what we’re used to in Windows Server 2003. Moreover, you should note that Windows Server 2003 will retire within the next couple of years (for exact details, check out the Microsoft Support Lifecycle website for Windows Server 2003 here). Finally, keep in mind that should you decide to deploy Exchange 2007 on Windows Server 2003, you won’t be able to in-place upgrade those Exchange 2007 servers to Windows Server 2008 at a later point in time. Instead, you must replace the existing servers with new Windows Server 2008 based Exchange 2007 SP1 servers. Alternatively, you can uninstall Exchange 2007 from the existing machines, in-place upgrade them to Windows Server 2008, and then re-install Exchange 2007 SP1. When Exchange 2007 SP1 has been installed, you can restore mailboxes/public folders from backup, re-connect disconnected LUNs, copy the EDB files back to the Mailbox servers, or use whatever strategy you prefer.

For more information on how to migrate a Windows Server 2003 based Exchange 2007 server to Exchange 2007 SP1 on Windows Server 2008, see this article on Microsoft TechNet.
    Based on the above, my recommendation is to install multi-site CCR cluster nodes (and any other Exchange 2007 server roles for that matter) on Windows Server 2008 machines.
    Stretched or Non-Stretched Subnets?

If you’re installing the multi-site CCR cluster nodes on Windows Server 2003 based machines, the cluster nodes must be located on the same subnet. This is because Windows Server 2003 clusters don’t support locating cluster nodes on different subnets. In this case, you must stretch the subnets between the two datacenters for the private (heartbeat) network, the public network, and any other networks used by the cluster nodes. Stretching the subnets means that broadcast traffic will be sent over the WAN between the datacenters. Taking the folks in your organization’s network department into consideration, there’s a good chance this requirement is a showstopper. Personally though, I’ve worked on a few projects where this wasn’t an issue, as the organization had 10 Gbps links between the datacenters. We simply created a VLAN and stretched it between the datacenters.
With Windows Server 2008, stretched subnets are no longer a requirement, as Windows Server 2008 has full support for routed networks. As you can see in the Create Cluster Wizard in Figure 1, we now have the option of specifying two IP addresses for the Windows Server 2008 Failover Cluster – one belonging to the subnet in the primary datacenter and one to the subnet in the backup datacenter.

    Figure 1: Specifying an IP address for the Windows Server 2008 Cluster name on two subnets
    Note
If the interface for the private network doesn’t reside in a stretched subnet, you must use routing between the two different networks on each physical site. When this is the case, there are special requirements for configuring the interface used for the private network. Since Tim McMichael (Enterprise Messaging Support Professional with MSFT) has written a great blog post on this specific topic, I won’t go into the details in this article series.
In Figure 2, you can see a dependency tree with a dependency structure based on “OR” for the two IP addresses (located on separate subnets) assigned to the cluster network name resource. Traditional clusters only supported “AND” dependencies, but Windows Server 2008 Failover Clusters also support “OR”, which is used when multi-site clusters have IP addresses on separate subnets.

    Figure 2: Windows Server 2008 Failover Cluster Dependency tree
    To create a dependency report, right-click on the cluster network name resource and select Dependency Report in the context menu as shown in Figure 3.

    Figure 3: Creating a Dependency Report
When the Windows Server 2008 Failover Cluster has been created and you install the Exchange 2007 Active Clustered Mailbox role on the cluster node in the primary datacenter, you have the option of specifying the IP address that should be assigned to the Clustered Mailbox Server (CMS) on each subnet (Figure 4). Although I don’t recommend it, you can even use DHCP-assigned IP addresses and/or IPv6 addresses if you like.

    Figure 4: Specifying an IP address for the CMS on both Subnets

Even though you can install the Active and Passive Clustered Mailbox roles using the Exchange 2007 Setup wizard, you can run into a few issues when doing so. Instead, I recommend you use the command-line setup when you install a new CMS. To install the Active Clustered Mailbox role with Setup.com on the cluster node in the primary datacenter, use the following command:
Setup.com /Mode:Install /Roles:Mailbox /NewCMS /CMSName:<CMS name> /CMSIPv4Addresses:<IP1>,<IP2>

    Figure 5: Installing the Active Clustered Mailbox Role using Setup.com
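To make the syntax concrete, here is what the command might look like with hypothetical values; the CMS name and the two IP addresses (one per subnet, primary and backup datacenter) are placeholders for illustration only:

Setup.com /Mode:Install /Roles:Mailbox /NewCMS /CMSName:E2K7CMS01 /CMSIPv4Addresses:10.10.1.50,10.10.2.50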
    To install the Passive Mailbox role on the cluster node in the backup datacenter, use the following command:
    Setup.com /Roles:mailbox

    Figure 6: Installing the Passive Clustered Mailbox role
    Network Cards

A best practice recommendation is to have four network cards (or more) installed in each CCR cluster node. Usually, I install five network cards in each node: two network interfaces in a public team (configured in fault tolerance mode only; don’t load balance them, as experience has shown this can cause miscellaneous issues), two network interfaces for log shipping (to accomplish log shipping redundancy), and finally one network interface for backup (disabled for cluster use), since you don’t want to perform backups of databases over the public network. This provides us with three independent paths for replication and three independent paths for heartbeat communication between the cluster nodes.

When you have this many network cards installed in each cluster node, it’s highly recommended that you give them meaningful names in the Failover Cluster Management console, as shown in Figure 7. Simply right-click on them and select Rename in the context menu.

Figure 7: Naming the networks with more meaningful names in the WFC console
Although not specific to CCR clusters, I also recommend that you disable IPv6 on all Exchange servers and domain controllers (unless you’re one of the few who actually use IPv6!). To disable IPv6, see the IPv6 FAQ here.
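If you prefer to script the change instead, disabling IPv6 on Windows Server 2008 is typically done via the DisabledComponents registry value; the following is a sketch based on the method described in the IPv6 FAQ (KB 929852), and a reboot is required afterwards:

rem Disable all IPv6 components except the IPv6 loopback interface
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters /v DisabledComponents /t REG_DWORD /d 0xff /f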
You should also be aware of the impact the Scalable Networking Pack feature can have on any Windows 2003 based Exchange 2007 servers in your environment. For detailed information about the Scalable Networking Pack and how it affects Exchange, please see: You Had Me At EHLO... : Windows 2003 Scalable Networking pack and its possible effects on Exchange. It’s usually a good idea to disable the Scalable Networking Pack features on all Exchange 2007 servers in your organization.
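As a sketch of what disabling these features on a Windows Server 2003 SP2 based server can look like (verify the exact guidance in the EHLO post above before applying; the registry changes require a reboot):

rem Disable TCP Chimney offload
netsh int ip set chimney DISABLED
rem Disable Receive-Side Scaling and NetDMA
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v EnableRSS /t REG_DWORD /d 0 /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v EnableTCPA /t REG_DWORD /d 0 /f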
We have now reached the end of part 1, but you can look forward to part 2 in the very near future.






  2. #2
Code:
    http://www.msexchange.org/articles_tutorials/exchange-server-2007/migration-deployment/deploying-exchange-2007-multi-site-ccr-clusters-part2.html
    PART-2

    Introduction

    In part 1 of this four part article series, we took a look at what operating system to use for your multi-site CCR clusters. We looked at stretched and non-stretched subnets as well as the network card configurations.
In this part 2, we will continue where we left off in part 1. In particular, we will look at stretched Active Directory site strategies as well as recommended network latency and heartbeat timeout values.
    Stretched Active Directory site

No matter if you use Windows Server 2003 or Windows Server 2008 as the underlying operating system for your Exchange 2007 servers, the CCR cluster nodes must always be located in the same Active Directory (AD) site. This means that although the nodes can be located on separate subnets, you must still stretch the AD site they belong to between the two datacenters. Some of you may have heard that Windows Server 2008 Failover Clusters support nodes located in different AD sites, and although this is true, Exchange does not support locating CCR cluster nodes in separate AD sites.
Because the CCR cluster nodes must belong to the same AD site, the AD site needs to be stretched between the datacenters. With regard to the Hub Transport server role, this means that messages sent or received by users who have a mailbox stored on the CMS in the primary datacenter can theoretically be received from or sent to Hub Transport (HT) servers in the backup datacenter. The same is true for Exchange client requests such as Autodiscover, OWA, EAS, POP3, and IMAP requests/connections to Client Access Servers (CAS). In addition, LDAP/auth requests to Global Catalog servers (GCs) can also go to servers in the backup datacenter. As you can imagine, this can result in a lot of traffic between the datacenters. This is especially true because HT and CAS servers and Outlook clients use MAPI over RPC to communicate with Exchange 2007 Mailbox servers.
However, you can block connections/requests going to the servers in the backup datacenter. With regard to message submissions from the CMS to any HT servers in the AD site, you can use the Set-MailboxServer cmdlet with the -SubmissionServerOverrideList parameter to specify which Hub Transport servers should be used. This way you can exclude the HT servers located in the backup datacenter even though they belong to the same AD site.
    If/when a disaster strikes in the primary datacenter resulting in a failover to the backup datacenter, just update the submission override list, so that it only includes the HT servers in the backup datacenter.
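As a sketch with hypothetical server names (CMS01 is the CMS, HT01/HT02 are the HT servers in the primary datacenter, and HT03/HT04 those in the backup datacenter), this could look as follows in the Exchange Management Shell:

# Normal operation: only submit via the HT servers in the primary datacenter
Set-MailboxServer -Identity CMS01 -SubmissionServerOverrideList HT01,HT02
# After a failover: point the override list at the HT servers in the backup datacenter
Set-MailboxServer -Identity CMS01 -SubmissionServerOverrideList HT03,HT04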
In order to block CAS server requests/connections from hitting servers in the backup datacenter, you can take advantage of load balancing mechanisms. If you have a large environment, chances are you have implemented either a hardware-based or WNLB-based solution (see this previous article of mine on how to load balance CAS servers). If this is the case, here is what to do:

    • Create an NLB array which includes all CAS servers located in the primary datacenter. Configure the internal URL for Autodiscover, OWA, OAB, EWS and UM to point to the FQDN (namespace.company.com) of the load balancing solution. All requests that hit the specified FQDN will then go to CAS servers in the primary datacenter.
    • Now, create another NLB array for the CAS servers in the backup datacenter; use the same FQDN but a different IP address from the one you used for the first NLB array. In addition, make sure you only create a DNS record for the first NLB array; otherwise requests will be load balanced between the two NLB arrays.
    • If/when a disaster strikes in the primary datacenter resulting in a failover to the backup datacenter, just update the NLB record in DNS to point to the IP address of the NLB array created for the CAS servers in the backup datacenter (see the dnscmd sketch after this list). Then wait for the change to propagate to all DNS servers in your organization.
    • In regards to GCs, you can configure Outlook clients to use GCs located in the primary datacenter. The steps necessary to implement this behavior are explained in this KB article.
    • Although the above will block requests/connections from hitting the servers in the backup datacenter, it adds complexity to your environment. But fear not, there is another viable method, which I will describe in the next section.
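Regarding the DNS update in the failover step above, here is a hedged sketch of what it could look like with dnscmd; the DNS server, zone, record name, and IP addresses are hypothetical:

rem Remove the record pointing to the NLB array in the primary datacenter
dnscmd dns01 /RecordDelete company.com namespace A 10.10.1.100 /f
rem Add a record pointing to the NLB array in the backup datacenter
dnscmd dns01 /RecordAdd company.com namespace A 10.10.2.100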

    Using two Active Directory Sites in the Backup Datacenter

In order to eliminate the chance of having servers and clients in the primary datacenter communicate with Exchange 2007 servers and GCs in the backup datacenter, you can create an additional AD site in the backup datacenter and then move all servers, except the cluster node on which the passive Mailbox role is installed, to this AD site, as depicted in Figure 1.

    Figure 1: Two AD sites in the Backup Datacenter
When disaster strikes the primary datacenter, a failover to the backup datacenter occurs. What should be done next is to move the servers from the second AD site (AD site 2) to the stretched AD site (AD site 1), either by giving each server an IP address on the subnet of the stretched AD site or by changing the AD site definitions in the Active Directory Sites and Services MMC snap-in. Some of you may think it would be easier to simply move the CMS to the second AD site, but this would make it impossible for an HT server to re-submit messages to the CMS, which would result in data loss during a failover, and it is not a supported method.
    Network Latency and Heartbeat timeout values

When deploying a multi-site CCR cluster, you should at all times try to keep the network latency between the datacenters below 500 milliseconds (ms). If you are deploying the solution in a LORG (large organization) with full utilization of storage groups/databases and a lot of mailbox activity, it is recommended to keep the network latency under 50 milliseconds (ms). Otherwise, there is a chance of experiencing issues such as large copy queues. That said, you can adjust the aggressiveness of the heartbeat timeouts, which helps avoid unnecessary failovers during temporary network problems. By default, the tolerance for missed cluster heartbeats is set to 5 missed heartbeats, both for nodes located on the same subnet and for nodes on different subnets (Figure 2). When dealing with multi-site clusters, it is recommended that you change this setting to 10 missed heartbeats (approximately 12 seconds).

    Figure 2: Default value for Subnet Thresholds
To change the CrossSubnetThreshold property to ten missed heartbeats instead of the default of five, use the following command:
    cluster ClusterName /prop CrossSubnetThreshold=10
    You can verify the new heartbeat threshold values by entering the following command:
    Cluster.exe /cluster:<ClusterName> /prop

Figure 3: Subnet Thresholds changed to 10 missed heartbeats
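Since /prop lists every cluster property, you may want to filter the output. Something like the following should isolate the heartbeat settings (the cluster name is hypothetical):

Cluster.exe /cluster:E2K7CLUSTER /prop | findstr /i SubnetThreshold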

    Note:
Later on in the article series, we will pause the passive cluster node, which means that a failover will not happen automatically even though you have two votes (one cluster node and the File Share Witness (FSW)) available. If you do not use the pause-cluster method, the above heartbeat settings are strongly recommended if you do not want a failover to occur out of the blue.
    DNS Time to Live values

When the CMS is moved (via a planned handover or a failover) from a cluster node on one subnet to a cluster node on another, and the IP address and network name resource come online again, the failover cluster kicks off a timer. After the network name resource and IP address have been online on the cluster node for 10 minutes, a DNS record update is issued.
By default, the DNS Time to Live (TTL) value for the network name resource is 20 minutes. This means that when the 10 minute timer has elapsed, clients may need to wait up to another 20 minutes for the DNS record to expire from their caches (Figure 4). Add to this that the update must propagate to domain controllers throughout the organization. Moreover, the client-side resolver cache on the clients running Outlook also needs some time before the update is reflected.

Figure 4: Default DNS TTL value for the CMS in DNS Manager
And 30 minutes is considered a long time in most environments. The best practice recommendation is to change the DNS TTL value to 5 minutes. To do so, we first need to find the CMS cluster network name resource. This can be done by opening a command prompt on one of the cluster nodes and entering the following command:
    cluster /cluster:<Name of CMS> res


    Figure 5: Finding the CMS cluster network name resource name


Now that we have the name of the cluster network name resource, let us change the TTL to 5 minutes. We do so using the following command:
    Cluster.exe res <CMSNetworkNameResource> /priv HostRecordTTL=300

    Figure 6: Changing the TTL to 300 seconds (5 minutes)
Stop and start the Clustered Mailbox Server (CMS) using the Stop-ClusteredMailboxServer and Start-ClusteredMailboxServer cmdlets or the Manage Clustered Mailbox Server wizard.
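As a sketch with a hypothetical CMS name (note that Stop-ClusteredMailboxServer expects a reason for the stop, which is logged):

Stop-ClusteredMailboxServer -Identity CMS01 -StopReason "Applying new HostRecordTTL value"
Start-ClusteredMailboxServer -Identity CMS01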
    Note:
Although some may feel tempted to do so, do not try to change the DNS TTL via the property page of the DNS record in the DNS Manager, as the setting will be overwritten with the value configured for HostRecordTTL on the cluster nodes every time the DNS record is refreshed. The record is refreshed when the CMS is started, moved, or brought online after a failure or failover.
    Now let us verify that the TTL for the DNS record has been changed from 20 to 5 minutes. We do so by opening the property page of the CMS cluster network name resource DNS record in the DNS Manager on a DNS server as shown in Figure 7.

Figure 7: TTL value changed

We have reached the end of part 2 in the article series covering the do’s and don’ts of multi-site CCR clusters. Part 3 will be published in the near future. Until then, enjoy!





  3. #3
Code:
    http://www.msexchange.org/articles_tutorials/exchange-server-2007/migration-deployment/deploying-exchange-2007-multi-site-ccr-clusters-part3.html
    PART-3

    Introduction

In part 2 of this four part article series, we took a look at stretched Active Directory site strategies as well as recommended network latency and heartbeat timeout values.
In this part 3, we will continue where we left off in part 2. We will look at transport dumpster strategies as well as placement and configuration of the File Share Witness.
    Transport Dumpster Strategy

If disaster strikes the primary datacenter, all servers there become unavailable, and any messages held in the transport dumpster of the Hub Transport servers in the primary datacenter obviously cannot be re-submitted to the CMS after a (lossy) failover to the passive cluster node in the backup datacenter. Depending on how many log files were lost for each storage group during the failover, data loss will occur, with potential end-user complaints as the result. Note though, that it may be possible to move an HT server’s queue to an HT server in the backup datacenter and have the content of the queue re-submitted. Timing is all-important here: in order to have any messages held in the transport dumpster re-submitted, the queues must be moved before the cluster issues the transport dumpster flush. Otherwise, the messages in the transport dumpster cannot be re-submitted, as they will be lost.
    For more information on how to move message queues to another HT server, take a look at the following link.
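In broad strokes, moving a queue means moving the queue database (mail.que) between the HT servers; the sketch below assumes the default install path and is no substitute for the supported procedure described in the link above:

rem On the source HT server: stop the transport service so mail.que is released
net stop MSExchangeTransport
rem Copy mail.que (default location: C:\Program Files\Microsoft\Exchange Server\TransportRoles\data\Queue)
rem to the same folder on the target HT server, whose transport service must also be stopped
rem On the target HT server: start the transport service; the copied messages are then re-submitted
net start MSExchangeTransport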
    Placement of the File Share Witness

A CCR based cluster uses the Node Majority with File Share Witness (FSW) quorum model, which basically means that although a CCR cluster only has two cluster nodes, there is a third voter referred to as the File Share Witness. This is typically an HT server in the same AD site as the CCR cluster nodes. In this quorum model, two nodes alone are not enough to sustain the failure of a cluster node; there must be at least three devices that can be considered available, and the FSW acts as the third available device. This means that a two-node cluster using this quorum model can sustain the failure of a single cluster node. In addition, the FSW protects against the cluster “split brain” syndrome and a problem known as a “partition in time”.

What this means in practice is that the FSW must be available when a failover occurs to the passive node in the backup datacenter; otherwise you cannot bring the CMS online before you have created a new FSW on an HT server in the backup datacenter and re-configured the Windows Failover Cluster to point to this HT server. This complicates the failover process significantly. If possible, a better solution is to place the FSW in a third datacenter. This does not mean you must deploy an additional Exchange 2007 HT server in that third datacenter; using a Hub Transport server as the FSW is just a best practice recommendation when you deploy both CCR nodes in the same datacenter. You could, for instance, use a file server in that third datacenter.
You may have heard that using a CNAME record to point to the FSW is a good idea, since this makes it a much simpler process to point a CCR cluster to another FSW: all that needs to be done is to pre-create the FSW folder share with appropriate permissions and then update the CNAME record after disaster has hit the primary datacenter. But even though Microsoft used to support this method, the guidance was revised (read more in this blog post). Today, the recommended method for re-provisioning an FSW share on another server is to use the Cluster service’s built-in “force quorum” capabilities.
So when the FSW is located in the primary datacenter and a disaster takes down all servers in the primary datacenter, the failover to the passive node in the backup datacenter will not happen automatically. This is because two votes must be available when a failover occurs in an MNS-based cluster. In order to bring the cluster resources online on the passive node, first create a new FSW share on an HT server in the backup datacenter (if you have not pre-created it) and follow these instructions, then open a command prompt on the cluster node in the backup datacenter and type:
    NET START CLUSSVC /forcequorum
    This will force the cluster service to start on this cluster node. Now, open the Failover Cluster console and select the cluster name in the left pane. Then click More Actions in the Action pane and choose Configure Cluster Quorum Settings in the context menu as shown in Figure 1.
    Note:
In Windows Server 2003 the /forcequorum switch was a maintenance mode switch: basically, we told Windows to get the cluster service started. With Windows Server 2008 this is no longer so. When using the /forcequorum switch with Windows Server 2008 Failover Clusters, you are telling the cluster that the configuration on this node is now the master. This, in turn, means that when the other cluster node comes back up and converges with the cluster, it will replicate the configuration information back from this node. This is an important change!

    Figure 1: Selecting Configure Cluster Quorum Settings
    Click Next.

    Figure 2: Configure Cluster Quorum Wizard’s welcome page
    Select Node and File Share Majority (for clusters with special configurations) as shown in Figure 3 and click Next.

    Figure 3: Choosing the right quorum model
    Click the Browse button and enter the name of the HT server on which you created the FSW share. Click Show Shared Folders (Figure 4) and select the FSW share. Click OK.

    Figure 4: Selecting the new FSW share
    Click Next twice then Finish.

    Figure 5: Cluster Resources online on cluster node in the backup datacenter
When the primary datacenter is up again, create a new FSW share on the HT server that originally held the FSW share, then follow the above steps again to point the cluster to this FSW share.
When all involved servers are online in the primary datacenter, you can move the CMS back to the cluster node in the primary datacenter.
    Note:
    Since Exchange 2007 RTM’d, I have received many questions in regards to the FSW. One of them was whether it is possible to use FSW in combination with DFS, so that you could have multiple servers acting as FSWs. Although it’s an interesting idea, this is unsupported territory.







  4. #4
Code:
    http://www.msexchange.org/articles_tutorials/exchange-server-2007/migration-deployment/deploying-exchange-2007-multi-site-ccr-clusters-part4.html
    PART-4

    Introduction

    In part 3 of this four part article series, we took a look at transport dumpster strategies as well as placement and configuration of the file share witness.
In this, part 4, we will continue where we left off in part 3. We will take a look at failover between the cluster nodes and how a failover between nodes located in separate datacenters affects the end users. Finally, we will talk a bit about backup strategies when deploying a multi-site CCR cluster.

This is the last article in this multi-part series.


    Failover behavior between Cluster Nodes

If two votes are available, a failover of the CMS to the passive node in the backup datacenter will occur automatically. This means that, unlike SCR, no manual intervention is required. At first this sounds like an excellent idea, as that is one less step to perform during a site failover. But are you really sure you want to have the CMS fail over to the backup datacenter just like that? The answer depends on things such as the number of users, the bandwidth between the datacenters, and the number of storage groups/mailbox databases. Imagine that a minor unplanned network outage happens in the primary datacenter and triggers a failover of the CMS to the backup datacenter. In a worst case scenario, a failover of the CMS can take up to 30 minutes in some environments. On top of that, you suddenly have a situation where all Exchange HT and CAS servers, domain controllers/Global Catalog servers, as well as Outlook clients, must communicate with the CMS now in the backup datacenter. If you do not want an automatic failover of the CMS to occur in a situation where the IP address or the network name of the cluster is unavailable, consider pausing the cluster service on the passive node.
    Note:
The reason why you should pause and not stop the cluster service on the passive node is that when you pause a node, existing groups and resources stay online while additional groups and resources cannot be brought online on the node. Pausing therefore does not stop log file shipping between the cluster nodes. Stopping the cluster service on the passive node, however, would make the cluster stop functioning and break log file shipping until the Cluster service is restarted.
To pause the cluster service on a cluster node in a Windows Server 2008 failover cluster, open the Failover Cluster Management console, then expand the cluster and then Nodes. Now right-click on the passive node and choose Pause in the context menu, as shown in Figure 1 below.

    Figure 1: Pausing the Cluster service on the passive CCR cluster node
    You can also pause the cluster using cluster.exe if you like. To do so use the following command:
    CLUSTER.EXE <name of WFC> NODE <name of node> /PAUSE

    Figure 2: Passive Cluster Node Paused
When a disaster hits the primary datacenter and brings the active node down, you then need to fail over the cluster manually by first resuming the passive node, as shown in Figure 3. This is the case even though you have a third vote in the form of a File Share Witness available in the primary datacenter or a third datacenter.

    Figure 3: CMS is currently offline
In addition, if the passive cluster node in the backup datacenter did not own the cluster core resources when you paused it, you must force the cluster core resources online before resuming the node after the loss of the active cluster node in the primary datacenter. To see which node owns the cluster core resources, click on the cluster right under Failover Cluster Management and then expand Cluster Core Resources in the middle pane. Here you can see the current owner of the cluster core resources, as shown in Figure 4. To force the cluster core resources online on the paused node, right-click the IP address assigned to the paused node and select Bring this resource online in the context menu.

    Figure 4: Current owner of the cluster core resources
Although the passive node was paused, it became the current owner of the cluster/Exchange resources when the active node went down.
    To bring the resources online, right-click on the passive node and select Resume in the context menu (Figure 5).

    Figure 5: Resuming the cluster node
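The same can be done with cluster.exe if you prefer the command line (cluster and node names are placeholders):

CLUSTER.EXE <name of WFC> NODE <name of node> /RESUME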
    Now open the Exchange Management Shell and type the following command in order to bring the CMS online on the passive node:
    Start-ClusteredMailboxServer <CMS Name>
The cluster and Exchange resources will now be brought online, and Outlook clients will be able to re-connect to their mailboxes.

    Figure 6: CMS now online on the cluster node on the backup datacenter
    How a Failover affects an Outlook 2007 User

Okay, so now that we have configured our multi-site CCR cluster according to best practice recommendations, let us try to simulate a failover of the CMS to the backup datacenter and see how this affects Outlook users in the organization. In Figure 7 below, we have a screenshot of an Outlook 2007 client connected (in cached mode) to a mailbox stored on our CMS, which is currently online on the cluster node in the primary datacenter.

    Figure 7: Outlook client connected to a mailbox stored on our CMS currently online in the primary datacenter

    Let us now move the CMS to the passive cluster node in the backup datacenter. We can do this using the following command:
Move-ClusteredMailboxServer -Identity <CMS> -TargetMachine <passive cluster node> -MoveComment "Test"

    Figure 8: Moving the CMS to the passive cluster node in the backup datacenter
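With hypothetical names filled in, and a status check afterwards (Get-ClusteredMailboxServerStatus shows the state of the CMS and which node currently hosts it):

Move-ClusteredMailboxServer -Identity CMS01 -TargetMachine NODE2 -MoveComment "Test"
Get-ClusteredMailboxServerStatus -Identity CMS01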
As can be seen in Figure 9 below, the CMS is now unavailable, as it is currently being moved to the passive cluster node in the backup datacenter.

    Figure 9: Connection to Microsoft Exchange has been lost
When the CMS has been brought online, the DNS record must be updated with the IP address that was assigned to the CMS on the subnet in the backup datacenter. By default, this record has a TTL of 20 minutes, but as you saw earlier in the article, we changed the TTL value of the DNS record to 5 minutes. This does not mean that Outlook will pick up the change after 5 minutes, however. Outlook client convergence time is the sum of the 10 minutes a Windows Server 2008 multi-site cluster waits before issuing the record update to the DNS server, plus DNS server replication latency, plus the time remaining (or the full TTL) in the client-side resolver cache before the record expires. So expect between 10 and 30 minutes, the latter in large and/or slow environments with many DNS servers deployed.
The good thing is that when Outlook 2003/2007 is running in cached mode, the user can continue working and may not even notice that Outlook is offline. Also, end users do not have to restart the Outlook client; it will re-connect automatically.

    Figure 10: Connection to Microsoft Exchange has been restored

    As most of you know, Outlook 2007 makes use of the availability/autodiscover service. After the failover, an Outlook 2007 client will simply pick a CAS server in the backup datacenter for availability/autodiscover service purposes.
In Figure 11, we can see that when we test Autodiscover, Outlook cannot contact the two CAS servers in the primary datacenter and fails over to the CAS server in the backup datacenter.

    Figure 11: Testing Autodiscover to see if the CAS server in the backup datacenter is picked up
If you use Outlook 2003, OAB and free/busy lookups rely on system folders in the Public Folder database. If you have Outlook 2003 clients deployed, make sure you have a replica of the PF hierarchy in the backup datacenter.


    Performing a backup in the backup datacenter

Another thing you should make sure is included in your disaster recovery plan is backup of the CMS in the backup datacenter. Are you ready for this once the CMS has failed over? In the worst case, you have just lost your primary datacenter and are currently caught in a site-level single point of failure scenario. This makes those database backups extra important, as rebuilding the lost datacenter can take weeks or even months depending on the damage.


    Conclusion

As you have read throughout this article series, there are a lot of things to keep in mind when planning a multi-site CCR cluster deployment. The first thing you must decide is whether this is the right choice for your organization. In most situations you should consider using SCR instead of multi-site CCR clusters, but then again, environments are different. Who knows, maybe you have a very specific reason to go for multi-site CCR clusters? And as mentioned in the introduction, multi-site CCR clusters can be a good solution if you have the right infrastructure and you deploy them properly.





