Geo replication in Azure Service Bus

Published by Alexandre MARCEL
Category : Azure / Service Bus
07/05/2025

On June 10 and 11, 2024, I attended the “Integrate 2024” event organized by Kovai. During the event, Microsoft announced a new feature: Geo-replication for Azure Service Bus.

The public preview was initially scheduled for June 17, 2024, but eventually launched on June 25, 2024, as announced by Eldert Grootenboer, Senior Product Manager at Microsoft, in an article published on the Microsoft Tech Blog.

In this article, you will learn about the benefits of this new feature.

 

Geo-replication Principle

 

Prerequisites

 

First, note that this feature is only available on the Premium tier of Service Bus. Geo-replication also currently supports only one secondary region, a concept we will return to later.

Microsoft also specifies that this is a public preview, meaning it should not be used in production environments. Moreover, it is only available in selected regions for now. Availability will expand gradually, and you can track progress in the official Microsoft documentation.

 

Before: Geo-disaster Recovery

 

Before geo-replication, Azure offered Geo-disaster recovery for Service Bus. This feature ensures the integrity of metadata (entities, configurations, properties) by coupling a secondary namespace with the primary namespace currently in use.

In case of a failure or disaster, metadata is switched from the primary namespace to the secondary namespace. Replication is continuous, and failover is almost instantaneous.

To understand this method in more detail, refer to Microsoft’s documentation.

 

Now: Geo-replication

 

Concept

As mentioned in the previous section, Geo-disaster recovery protects only metadata. To extend its offering, Microsoft introduced geo-replication, which also protects data.

With this feature, the following are safeguarded:

  • Queues, topics, subscriptions, and filters
  • Entity data
  • State and property changes made to messages in a namespace
  • Namespace configuration

The core principles remain the same:

  • One primary region (active) and one secondary region (backup). Both regions share the same configuration for rapid promotion.
  • Continuous replication of both data and metadata between the two regions.
  • Near-instant failover to minimize service disruption.

Here is a diagram illustrating how geo-replication works:

 

Illustration of the geo replication in Azure Service Bus

 

When everything is functioning normally, producers and consumers of Service Bus messages connect to the primary region via a single namespace. This design allows users to configure their workflow using this namespace without needing to change it during failover.

If a failure or disaster occurs, the secondary region is promoted. The namespace is switched to point to the secondary region, which then becomes the new primary region. The former primary region is downgraded to secondary. Once the new secondary region is reset, it can be promoted again as needed.

The customer triggers the promotion of the secondary region to primary via a property. This gives you full control and visibility over issue resolution. Promotion can also be automated based on the associated metrics.

NB: It is not possible to read from or write to the secondary region.
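
To make the role swap concrete, here is a minimal Python sketch of the idea described above. It is a toy model, not Azure code: the class, the hostname, and the region names are hypothetical and only illustrate that the hostname stays fixed while the two regions exchange roles.

```python
from dataclasses import dataclass


@dataclass
class GeoReplicatedNamespace:
    """Toy model: one stable hostname, two regions with swappable roles."""
    hostname: str   # what producers and consumers connect to; never changes
    primary: str    # region currently serving reads and writes
    secondary: str  # passive replica; clients cannot read from or write to it

    def promote_secondary(self) -> None:
        # Promotion swaps the roles; the hostname clients use stays the same.
        self.primary, self.secondary = self.secondary, self.primary

    def serving_region(self) -> str:
        # All client traffic goes to whichever region is currently primary.
        return self.primary


ns = GeoReplicatedNamespace(
    hostname="contoso-orders.servicebus.windows.net",  # hypothetical name
    primary="westeurope",
    secondary="northeurope",
)
ns.promote_secondary()
print(ns.hostname, "->", ns.serving_region())
```

After the promotion, the same hostname is served from the former secondary region, and the former primary is now the secondary.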

 

Replication Modes

There are two replication modes: synchronous and asynchronous.

 

Synchronous Mode

  • All requests are replicated to the secondary region.
  • The secondary region must validate and confirm the operation before it is finalized in the primary region.
  • As a result, publishing speed depends on the time needed to publish, replicate, acknowledge, and validate.
  • Your application depends on both regions being available.
  • If the secondary region is delayed or unavailable:
    • Messages are not accepted or validated.
    • The primary region limits incoming requests.
  • Pros: Secure data replication since validation occurs in both regions.
  • Cons: Increased latency due to replication time.
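
The synchronous behavior can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Service Bus is implemented: `Region` and `publish_sync` are hypothetical names, and the primary region is assumed to be healthy.

```python
class Region:
    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.messages = []

    def store(self, message):
        # Returns True only if this region could persist the message.
        if not self.available:
            return False
        self.messages.append(message)
        return True


def publish_sync(primary, secondary, message):
    """Accept a message only once the secondary has confirmed it."""
    if not secondary.store(message):
        return False  # secondary did not confirm: the publish is rejected
    # Toy model: the primary is assumed healthy, so this finalizes the publish.
    return primary.store(message)


primary, secondary = Region("primary"), Region("secondary")
print(publish_sync(primary, secondary, "order-1"))  # True
secondary.available = False
print(publish_sync(primary, secondary, "order-2"))  # False
```

The second publish fails even though the primary region is healthy, which is the availability trade-off of this mode.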

 

Asynchronous Mode

  • Requests are immediately validated in the primary region.
  • The client receives an acknowledgment before replication to the secondary region.
  • Replication occurs asynchronously, and users can configure the maximum allowed lag between the primary and secondary region.
  • If this lag exceeds the configured value, the primary region starts limiting incoming requests.
  • Pros: Lower latency and no immediate impact if the secondary region is delayed or unavailable.
  • Cons: Potential data loss, as messages acknowledged in the primary region may not yet be replicated when a failover occurs.
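
Here is a comparable sketch for the asynchronous mode, again as a toy model with hypothetical names (`AsyncReplicator`, `max_lag_seconds`). It assumes the lag is measured as the age of the oldest message still waiting for replication, which is a simplification chosen for illustration.

```python
from collections import deque


class AsyncReplicator:
    def __init__(self, max_lag_seconds):
        self.max_lag_seconds = max_lag_seconds  # user-configured maximum lag
        self.pending = deque()   # (publish_time, message) not yet replicated
        self.replicated = []     # messages the secondary has received

    def lag(self, now):
        # Lag is the age of the oldest message still waiting for replication.
        return now - self.pending[0][0] if self.pending else 0

    def publish(self, message, now):
        """Acknowledge at once unless the lag exceeds the configured maximum."""
        if self.lag(now) > self.max_lag_seconds:
            return False  # the primary throttles incoming requests
        self.pending.append((now, message))
        return True  # acknowledged before replication happens

    def replicate_one(self):
        # The secondary catches up by one message.
        if self.pending:
            self.replicated.append(self.pending.popleft()[1])


r = AsyncReplicator(max_lag_seconds=5)
print(r.publish("m1", now=0))   # True: nothing pending, lag is 0
print(r.publish("m2", now=3))   # True: lag is 3, within the limit
print(r.publish("m3", now=10))  # False: lag is 10, above the limit
r.replicate_one()
r.replicate_one()
print(r.publish("m3", now=11))  # True: the secondary has caught up
```

Messages `m1` and `m2` were acknowledged before they reached the secondary, which is exactly the window in which a failover could lose data.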

 

Use Cases

 

Planned Replication

 

Without any external event affecting you, you may choose to migrate your Service Bus Namespace to another region. Below is a non-exhaustive list of reasons that might lead to this use of geo-replication:

  • Azure opens a new region that is geographically closer to your company’s or your users’ location. For performance optimization and better control, you may prefer migrating to this new nearby region.
  • Your other Azure resources are moved for an unrelated reason, such as a new feature you want to use that is not yet available in your Service Bus's current region. For consistency and performance, you can now move your Service Bus to this new location as well.

To do so, configure geo-replication on the existing namespace with the desired new region as the secondary region. Once synchronization is complete, start a planned promotion, which replicates any messages already published before switching over. After the promotion is complete, the old region becomes the new secondary region, and you can remove it. Your flows now run in the desired new primary region.

 

Unplanned Replication

 

Geo-replication is most relevant when an unexpected event occurs and impacts the usage of your Service Bus. This can include:

  • A regional outage
  • A disaster, such as a cyberattack or an issue at the physical regional level
  • A degradation in Service Bus performance within the region

Your secondary region must then be able to take over from the primary region you are using. Because geo-replication synchronizes the two regions continuously, promoting the secondary region keeps your Service Bus available and keeps disruption to a minimum.

Depending on the severity of the incident, two types of promotion are possible:

  • Planned promotion: Already published messages are replicated before the promotion is finalized
  • Forced promotion: The region failover is almost instantaneous; you simply need to wait for the promotion to complete
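
The difference between the two promotion types can be illustrated with a short Python sketch. It assumes that a forced promotion does not wait for replication to catch up, so messages that exist only in the primary region are lost. The function `promote` and the message names are hypothetical.

```python
def promote(primary_messages, secondary_messages, forced):
    """Return (messages on the new primary, messages lost)."""
    unreplicated = [m for m in primary_messages if m not in secondary_messages]
    if forced:
        # Forced: switch immediately; anything not yet replicated is lost.
        return list(secondary_messages), unreplicated
    # Planned: wait for replication to catch up before switching.
    return list(secondary_messages) + unreplicated, []


primary = ["m1", "m2", "m3"]
secondary = ["m1"]  # replication is two messages behind
print(promote(primary, secondary, forced=False))  # (['m1', 'm2', 'm3'], [])
print(promote(primary, secondary, forced=True))   # (['m1'], ['m2', 'm3'])
```

In a real outage a planned promotion may not be able to finish, because the primary region has to be reachable for replication to catch up. That is when a forced promotion becomes the practical option.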

 

Setting Up Geo-replication in Azure

 

Now that we’ve covered the theory, let’s see how it works in practice. Since geo-replication is still in public preview, check the Microsoft documentation for the latest updates.

 

Promotion Process

The promotion process follows these steps:

  • Initiating promotion:
    • The customer manually triggers the promotion.
    • The namespace enters read-only mode from the moment promotion is requested until it is completed.
    • Promotion types:
      • Planned: The system waits for replication lag to catch up before starting promotion.
      • Forced: The system immediately starts the promotion, risking data loss for unreplicated messages.
    • A forced promotion can be triggered at any time to speed up the failover process.
  • Host update:
    • The namespace’s hostname updates to point to the new primary region (this may take a few minutes).
    • You can verify the new primary region by pinging the fully qualified domain name (FQDN) of your namespace.
  • Client reconnection:
    • Clients automatically reconnect to the new primary region.

Automating promotion is possible using monitoring tools but requires additional planning.
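
As an alternative to pinging the FQDN by hand during the host update step, the same check can be scripted. The sketch below uses only Python's standard `socket` module. The helper names are hypothetical, and it assumes that the set of resolved IP addresses differs between the two regions, which you should confirm for your own namespace.

```python
import socket


def resolve_addresses(fqdn):
    """Return the sorted set of IP addresses the name currently resolves to."""
    infos = socket.getaddrinfo(fqdn, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def has_moved(before, after):
    # A changed address set suggests the hostname now points elsewhere.
    return before != after
```

You could call `resolve_addresses` with your namespace FQDN (of the form `<namespace>.servicebus.windows.net`) before starting the promotion, call it again a few minutes later, and pass both results to `has_moved`.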

 

geo replication configuration in Azure Portal

 

Monitoring Data Replication

You can track replication progress using metrics.

 

service bus metrics in Azure Portal

 

  1. Enable metric logs in your Service Bus namespace.
  2. Open Logs in the Monitoring section of the namespace to query the collected metrics.
  3. To check replication lag (in seconds), run the following query:

 

AzureMetrics 
| where TimeGenerated > ago(1h) 
| where MetricName == "ReplicationLagDuration"
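
If you want to act on this metric, for example as a first step toward automating promotion, one possible approach is to alert only when the lag stays high for several samples in a row. The Python sketch below shows that logic on plain numbers. The function name, the threshold, and the sample values are all hypothetical, and fetching the metric or triggering a promotion is deliberately left out.

```python
def should_alert(lag_samples, threshold_seconds, min_consecutive):
    """True if the lag stayed above the threshold for enough samples in a row."""
    streak = 0
    for lag in lag_samples:
        # Extend the streak on a high sample, reset it on a normal one.
        streak = streak + 1 if lag > threshold_seconds else 0
        if streak >= min_consecutive:
            return True
    return False


samples = [2, 3, 40, 45, 50, 4]  # hypothetical lag values in seconds
print(should_alert(samples, threshold_seconds=30, min_consecutive=3))  # True
print(should_alert(samples, threshold_seconds=30, min_consecutive=4))  # False
```

Requiring several consecutive high samples avoids reacting to a single spike, which matters if the alert ends up triggering something as disruptive as a promotion.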

 

Conclusion

 

Microsoft’s new geo-replication feature significantly improves on Geo-disaster recovery by also replicating data, not just metadata.

To use geo-replication effectively, choose the right replication mode:

  • Synchronous for the strongest protection against data loss.
  • Asynchronous for lower latency and higher availability.

Since geo-replication is still in public preview, keep an eye on Microsoft updates regarding supported regions.

On pricing, the Premium tier of Service Bus is billed per messaging unit. With geo-replication:

  • Secondary regions use the same number of messaging units as the primary region.
  • Additional bandwidth costs apply per secondary region.
  • During the early public preview, Microsoft waives these fees.

 

🚀 Ready, set, geo-replicate!