Protecting information systems against an outage is a challenge for any company. With Site Recovery, a tool offered by the Azure cloud, we can set up a disaster recovery plan in a simple and reliable way.
In this article we show, step by step, how to activate this protection for our systems, specifically for virtual machines that run in Azure and do not have a high-availability (HA) setup based on load balancing.
We will perform this lab with an Apache server running on a standalone Linux VM in Azure.
It is worth remembering that this same Azure service can be applied to on-premises servers, physical or virtualized. In that case we must deploy additional components on the local infrastructure to process the data before sending it to Azure.
Recovery Services Vault (RSV)
We log in to the Azure portal and create a “Recovery Services Vault”:
We give the resource a name and select a resource group, or create a new one if necessary. We click the “Create” button:
Note: It is advisable to place the RSV in an Azure region different from that of the source VM, to protect ourselves against a regional outage of any Azure service.
We enable replication for the VM. In our case, this is the virtual machine that hosts the Apache server:
We will set up the replica in the “North Europe” region. We select the target resource group and network and click “Enable Replication”:
Note: If we want to assign a specific configuration to the resource group and the network, we must create them before this step.
Azure creates all the target resources (policy, resource group, network, disks, etc.) and maps the two networks to enable failover:
Once the tasks have completed successfully, the portal confirms it:
Now we have to wait for the initial replication to complete. Naturally, this is the longest one, since all the data must be transferred:
Once this first job has finished, the VM will be protected and we will be able to perform the failover:
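To get a feel for how long the initial replication may take, a back-of-the-envelope estimate can help. The sketch below is only an illustration: the function name, the disk size, and the throughput figure are assumptions, not values from this lab, and it ignores compression, throttling, and ongoing changes on the source disk.

```python
def estimate_initial_replication_hours(disk_gib: float, throughput_mbit_s: float) -> float:
    """Rough lower bound, in hours, for copying a full disk at a sustained rate.

    Hypothetical helper: it only divides the data volume by the throughput.
    """
    if throughput_mbit_s <= 0:
        raise ValueError("throughput must be positive")
    total_bits = disk_gib * 1024**3 * 8                     # GiB -> bits
    seconds = total_bits / (throughput_mbit_s * 1_000_000)  # Mbit/s -> bit/s
    return seconds / 3600


# Assumed example: a 128 GiB disk over a sustained 100 Mbit/s link.
hours = estimate_initial_replication_hours(128, 100)
print(f"{hours:.1f} h")
```

With these assumed numbers the first copy takes roughly three hours, while later replications only send the changes and are therefore much shorter.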
Configure Target Resources
Once the initial replication has finished, we can edit certain parameters of the target VM, such as its size and IP address:
The service is being provided from the original VM in West Europe:
We simulate an outage by shutting down the source VM:
We now trigger the failover so that the replica server goes into production. To do this, we open the RSV and select the corresponding replicated item:
As we can see, the item is in a Warning state because Site Recovery detects that the source VM is not running. We select the Failover option:
Note: If we have never run a test failover, a warning will say so. We must tick the checkbox confirming that we have understood the warning and continue:
We choose the recovery point from which we want the target machine to start, among those available. There are several options:
- Latest (Low RPO): the job first processes all the data that has been sent to the Site Recovery service. Processing the data creates a recovery point for the virtual machine. This option offers the lowest recovery point objective (RPO), because the virtual machine created after failover has all the data that was replicated to the Site Recovery service when the failover was triggered.
- Latest processed (Low RTO): this option fails over the virtual machine to the most recent recovery point that the Site Recovery service has already processed. Since no time is spent processing unprocessed data, this option offers a lower recovery time objective (RTO).
- Latest app-consistent: this option fails over the virtual machine to the most recent application-consistent recovery point that the Site Recovery service has processed.
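The difference between the three options can be sketched with a small model. This is not the Site Recovery API: the `RecoveryPoint` class, its fields, and the option names are hypothetical and only mirror the descriptions above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class RecoveryPoint:
    created: datetime
    processed: bool       # already processed by the service, ready to boot from
    app_consistent: bool  # taken as an application-consistent snapshot


def pick_recovery_point(points: List[RecoveryPoint], option: str) -> Optional[RecoveryPoint]:
    """Return the point a failover would start from under the given option."""
    if option == "latest":
        # Low RPO: pending data is processed first, so every point is a candidate.
        candidates = points
    elif option == "latest_processed":
        # Low RTO: only points that need no further processing.
        candidates = [p for p in points if p.processed]
    elif option == "latest_app_consistent":
        candidates = [p for p in points if p.processed and p.app_consistent]
    else:
        raise ValueError(f"unknown option: {option}")
    return max(candidates, key=lambda p: p.created, default=None)


# Assumed sample data: three points taken within the same hour.
points = [
    RecoveryPoint(datetime(2024, 1, 1, 10, 0), processed=True, app_consistent=True),
    RecoveryPoint(datetime(2024, 1, 1, 10, 30), processed=True, app_consistent=False),
    RecoveryPoint(datetime(2024, 1, 1, 10, 45), processed=False, app_consistent=False),
]
for option in ("latest", "latest_processed", "latest_app_consistent"):
    print(option, pick_recovery_point(points, option).created.time())
```

In this sample, "latest" lands on the 10:45 point at the cost of processing it first, "latest_processed" on 10:30, and "latest_app_consistent" on 10:00, which shows the trade-off between data loss and start-up time.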
In our case we choose Latest processed (Low RTO), since we know that the VM has not undergone changes, and this shortens the start-up time:
Once the failover has finished:
We open the VM in the target region to assign it a public IP and restore the service:
We assign a DNS name:
And we update the public DNS so that the Apache server's CNAME record points to the new name, lowering the TTL as much as possible so that the change propagates quickly.
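After the DNS change, it can be handy to poll the public name until the site answers again. The following is a minimal sketch using only the Python standard library; the helper names and the URL in the usage comment are assumptions for illustration.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on network or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False


def wait_until_serving(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call probe until it succeeds.

    Returns the 1-based number of the successful attempt, or -1 if none succeeded.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay_s)
    return -1


# Usage (hypothetical host name, replace with the real public name):
#   wait_until_serving(lambda: http_ok("http://www.example.com/"))
```

The probe is passed in as a function so the polling loop can be exercised without touching the network. Note that clients which cached the old record may keep using it until its previous TTL expires.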
The service is available again:
At this point of the failover we have several options:
- Change the recovery point
- Confirm the recovery point
Change Recovery Point
We can change the recovery point if, for instance, we need to recover from an accidental deletion at the source or from a ransomware attack and need to go further back in time:
We will briefly lose the service again, since the changes must be applied to move to the newly chosen point.
Once we have verified that the recovered VM is the correct one, we confirm the failover with the Commit button.
Now we move on to the next step, Re-protect, to protect our environment again.
This option can be selected before or after committing. It reverses the direction of replication, re-creating a disaster recovery environment.
By default, the data will be sent to the resource group and network from which it originally came:
Note: A restart of the VM may be necessary after re-enabling protection, as some changes must be made to the Mobility agent.
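The whole cycle described in this article can be summarized as a small state machine. The sketch below is a simplified model, not Site Recovery code: the class, state, and action names are hypothetical, and it only covers the path in which Re-protect is run after Commit.

```python
from enum import Enum


class State(Enum):
    PROTECTED = "protected"
    FAILED_OVER = "failed over"
    COMMITTED = "committed"


# Allowed actions per state, following the order of the steps in this article.
TRANSITIONS = {
    (State.PROTECTED, "failover"): State.FAILED_OVER,
    (State.FAILED_OVER, "change_recovery_point"): State.FAILED_OVER,
    (State.FAILED_OVER, "commit"): State.COMMITTED,
    (State.COMMITTED, "reprotect"): State.PROTECTED,
}


class ReplicatedVm:
    """Tracks the protection state and the current direction of replication."""

    def __init__(self, source_region: str, target_region: str) -> None:
        self.source_region = source_region
        self.target_region = target_region
        self.state = State.PROTECTED

    def apply(self, action: str) -> None:
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"'{action}' is not allowed while {self.state.value}")
        self.state = TRANSITIONS[key]
        if action == "reprotect":
            # Re-protect reverses the direction of replication.
            self.source_region, self.target_region = self.target_region, self.source_region


vm = ReplicatedVm("West Europe", "North Europe")
for action in ("failover", "change_recovery_point", "commit", "reprotect"):
    vm.apply(action)
print(vm.state.value, vm.source_region, "->", vm.target_region)
```

After the last step the VM is protected again, this time replicating from North Europe back to West Europe.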