HANA Replication
SAP HANA HA and DR Series #5: System Replication
https://blogs.sap.com/2017/02/21/sap-hana-ha-and-dr-series-5-system-replication/
https://blogs.sap.com/2017/02/28/sap-hana-ha-and-dr-series-6-system-replication-operation-modes-parameters/
1. The original operation mode for SAP HANA System Replication is delta_datashipping, which is still available in all SAP HANA releases. As of SAP HANA SPS11, the operation mode logreplay is also available as a fully supported feature.
The underlying technology for both operation modes is (continuous) log shipping.
2. With delta_datashipping, the secondary node periodically receives delta data (every 10 minutes by default) in addition to the continuous redo log shipping. In case of a failover, only the redo logs received since the last delta data shipment need to be replayed.
Whenever the primary and secondary nodes are disconnected (for any reason, e.g. a network, hardware or service interruption), the replication falls out of sync. Once the connection is restored, system replication immediately initiates a delta shipping of the missing data (instead of a full data shipping) to get back in sync, which reduces the sync time between the primary and secondary hosts.
Log Replay
With operation mode logreplay, no delta shipping is required anymore. After system replication is initially set up with a full data sync, the continuous replication between the primary and secondary nodes is based purely on redo logs. The redo logs are replayed immediately on arrival, which makes the secondary node behave like a hot standby with a reduced RTO.
Because no delta shipping is required anymore, the amount of data that needs to be transferred to the secondary site is reduced significantly. The network requirements for this configuration are determined by your primary system’s transactional workload, resulting in reduced traffic and bandwidth demand between the two sites. Also, because the redo logs are applied as soon as they arrive, the RTO is even shorter than with delta_datashipping.
There is also a third operation mode, introduced with SAP HANA 2.0 SPS00: logreplay_readaccess. This active/active (read-enabled) configuration allows read-only access to column tables via SQL, with a delayed view of the data compared to the primary. According to SAP Note 2426477, starting with SAP S/4HANA 1610 FPS 1, ABAP applications consuming analytical ABAP CDS views through the analytical engine (ABAP’s InA interface) can be redirected automatically to the secondary if SAP HANA system replication is operating in mode logreplay_readaccess. Note that the operation mode “logreplay_readaccess” is not yet supported in systems running SAP HANA XS advanced.
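The operation mode is chosen when the secondary is registered against the primary. A minimal sketch using hdbnsutil; the hostname, instance number and site name below are placeholders of my own, not values from the article:

```shell
# Run on the secondary host as the <sid>adm user, with the secondary HANA
# instance stopped. "hanaprim", instance 00 and site name "SiteB" are
# placeholders; substitute your own values.
hdbnsutil -sr_register \
  --remoteHost=hanaprim \
  --remoteInstance=00 \
  --replicationMode=syncmem \
  --operationMode=logreplay \
  --name=SiteB
```

To use delta_datashipping or logreplay_readaccess instead, pass that value to --operationMode.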
5 Must Know System Replication Parameters
There are quite a few configuration aspects to System Replication; I believe the parameters below are very important, and especially useful if you need to maintain the stability of the connection from time to time:
datashipping_min_time_interval: This parameter defines the minimum time interval between two data shipping requests from the secondary system. The default value is 600 (seconds), and you may consider tuning this parameter depending on your primary system’s transactional workload. If datashipping_logsize_threshold (see the next parameter) is reached first, the data shipping request is sent before the time interval has elapsed.
datashipping_logsize_threshold: The minimum amount of log shipped between two data shipping requests from the secondary system. The default value is 5 GB.
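Both data shipping parameters live in the system_replication section of global.ini and can be changed via SQL on the primary. A sketch, assuming instance number 00 and the SYSTEM user as placeholders; the values themselves are arbitrary examples, with the threshold given in bytes:

```shell
# Raise the interval to 15 minutes and the threshold to 10 GB (in bytes).
# Only relevant when the operation mode is delta_datashipping.
hdbsql -i 00 -u SYSTEM -p '<password>' \
  "ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
     SET ('system_replication', 'datashipping_min_time_interval') = '900',
         ('system_replication', 'datashipping_logsize_threshold') = '10737418240'
     WITH RECONFIGURE"
```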
logshipping_timeout: This is the amount of time the primary node waits for an acknowledgement after sending a log buffer to the secondary node; the default value is 30 seconds. If the primary node does not receive the acknowledgement within the time defined by this parameter, it closes the connection to the secondary site so it can continue data processing. This parameter prevents the primary node from blocking transactional processing in case of a network failure between the two sites or a service interruption on the secondary node. If you are using System Replication as an HA solution for production systems, you may consider reducing this parameter so that, if the primary and secondary nodes are disconnected, the primary node (and your productive system) continues transactional processing sooner. When the connection between the two sites is restored, the primary node resumes sending redo logs based on the latest acknowledgement received from the secondary node.
reconnect_time_interval: If the primary and secondary nodes are disconnected, this parameter defines the interval between connection attempts from the secondary node to restore replication. The default value is 30 seconds.
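Both timeout parameters can be tuned the same way; a sketch with placeholder connection details and example values in seconds:

```shell
# Shorten the primary's wait for acknowledgements to 10 s and keep the
# secondary's reconnect attempts at the 30 s default.
hdbsql -i 00 -u SYSTEM -p '<password>' \
  "ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
     SET ('system_replication', 'logshipping_timeout') = '10',
         ('system_replication', 'reconnect_time_interval') = '30'
     WITH RECONFIGURE"
```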
enable_log_compression and enable_data_compression: These parameters enable the compression of log buffers / data pages before they are sent to the secondary node. If network bandwidth is the bottleneck in the system replication setup, log/data compression can improve log shipping performance because less data is sent over the network. However, both sites require additional time and processing power for compression and decompression. If you have a fast network, this could actually result in worse performance, so consider these parameters only if your network bandwidth is limited. The parameters are set on the secondary node and can be changed dynamically.
https://blogs.sap.com/2017/03/13/sap-hana-ha-and-dr-series-7-log-replication-modes/
SAP HANA HA and DR Series #7: Log Replication Modes
Log replication modes offered by SAP HANA
Synchronous in-memory (syncmem): This is the default log replication mode. In this mode, the primary node waits for the acknowledgement confirming the log has been received in memory by the secondary node before committing a transaction. As long as the replication status is ACTIVE for all services, there will not be any data loss on failover. However, if at least one service has a status other than ACTIVE, a failover to the secondary node might result in data loss, because there might be committed changes on the primary node that could not be sent to the secondary due to a connectivity or service failure.
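The per-service status can be checked on the primary via the monitoring view M_SERVICE_REPLICATION; any row returned by the sketch below means a failover could lose data (the connection details are placeholders):

```shell
# List every service whose replication is not fully in sync.
hdbsql -i 00 -u SYSTEM -p '<password>' \
  "SELECT HOST, PORT, REPLICATION_MODE, REPLICATION_STATUS
     FROM M_SERVICE_REPLICATION
    WHERE REPLICATION_STATUS <> 'ACTIVE'"
```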
This option offers two main advantages: a shorter transaction delay and better performance. Because the primary node waits only for the secondary node to “receive” the logs, the transaction delay on the primary node (essentially the log transmission time) is shorter than in the other synchronous replication modes. Also, because the primary node does not wait for any I/O or disk-writing activity on the secondary node, it generally performs better.
The waiting period of the primary system is determined by the logshipping_timeout parameter in global.ini. Its default value is 30 seconds, and if no acknowledgement is received from the secondary node within that time, the primary node continues without replicating the data.
This replication mode can be ideal for system replication setup as a high availability and disaster
recovery solution, especially if both nodes are in the same data center or very close to each other.
Synchronous (sync): In this mode, the primary node waits for the acknowledgement confirming the log has been received AND persisted by the secondary node before committing a transaction.
The key benefit of this option compared to syncmem is the consistency between the primary and secondary nodes: you know the primary node will not commit any transaction until the secondary node has received and persisted the logs.
As with syncmem, the waiting period of the primary system defaults to 30 seconds and is determined by the logshipping_timeout parameter in global.ini. If no acknowledgement is received from the secondary node within that time, the primary node continues without replicating the data.
This replication mode can be ideal for system replication setup as a high availability solution, especially if
both nodes are in the same data center.
Synchronous (full sync): Full sync replication was introduced with SPS08 as an additional option for the sync mode. This mode provides absolute zero data loss: the primary node waits until the secondary node has received the logs and persisted them on disk, and transaction processing on the primary node is blocked until the secondary system becomes available. This ensures no transaction can be committed on the primary node without shipping the logs to the secondary site.
This mode is activated via the parameter enable_full_sync in the system replication section of the global.ini file. When the parameter is disabled, full sync is not configured. If you enable the parameter in a running system, full sync is configured but not activated immediately, to prevent transaction blocking.
Full sync becomes completely active only when the parameter is enabled AND the secondary node is connected; you will then see the REPLICATION_STATUS become ACTIVE. This means that if there is a network connectivity issue between the two nodes, transactions on the primary node will be blocked until the secondary node is back.
This replication mode can be ideal for multitier system replication configurations, especially between tier 2 and tier 3 nodes for data protection, or for a system replication HA configuration when both nodes are in the same local area network and data protection is the number one priority.
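A sketch of switching the parameter on via SQL on the primary; the connection details and the boolean value 'true' are my assumptions, and as described above, full sync only becomes active once the secondary is connected:

```shell
# Configure full sync on the primary; it activates when the secondary connects.
hdbsql -i 00 -u SYSTEM -p '<password>' \
  "ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
     SET ('system_replication', 'enable_full_sync') = 'true'
     WITH RECONFIGURE"
```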
Asynchronous (async): In this option, the primary node does not wait for any acknowledgement or confirmation from the secondary node; it commits a transaction once the redo log has been written to the log file of the primary system, and ships the redo logs to the secondary node asynchronously. Obviously, this option provides the best performance of all four, as the primary node does not have to wait for any data transfer between the nodes or any I/O activity on the secondary node. However, async mode is more vulnerable to data loss than the other options: you should expect some data loss during (especially unplanned) failovers.
This replication mode can be ideal for system replication as a DR solution in a distant location, or for companies where system performance is the number one priority; because it still provides database consistency with near-zero data loss, it remains a favourable option. In a multitier system replication, async can also be used between tier 2 and tier 3 as a DR solution.
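For a multitier setup, the replication mode is again chosen when a node is registered; a sketch of registering a tier-3 DR node against the tier-2 secondary, with placeholder hostnames, instance number and site name:

```shell
# Run on the tier-3 host as the <sid>adm user, with its HANA instance
# stopped. "hanasec" (the tier-2 host), instance 00 and site name "SiteC"
# are placeholders.
hdbnsutil -sr_register \
  --remoteHost=hanasec \
  --remoteInstance=00 \
  --replicationMode=async \
  --operationMode=logreplay \
  --name=SiteC
```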