
Configuration Procedures and Considerations
# P0602477 Ver: 3.1.11 Page 273
The Failover/Failback Process
1. Upon startup in a standby configuration, each secondary TRIP attempts to
connect to all TMSs for which it is listed as secondary in the
$MPSHOME/common.standby/etc/tms/tms.cfg file. The primary components
controlling these TMSs reside on other nodes.
2. Upon receiving a signal from the TMS that it has lost its primary controller
(i.e., the primary component has failed), the TRIP informs the local RCD that
failover is required. RCD sets its state to “failover initiated” and sets the
secondary node's configuration to match that of the primary
node (using local backup copies of the primary node's MPS and COMMON
components). If the primary node has fewer components than the
secondary node, the unused components are pointed to an empty configuration
($MPSHOME/mps0.empty). Once RCD has initiated failover, all
subsequent failover requests from TRIPs are ignored (i.e., the secondary takes
over the first primary node whose failure it detects). After
switching to the primary node’s configuration, RCD restarts the
local COMMON and MPS components to force them to load the new
configuration.
3. After the components are restarted, RCD waits up to 5 minutes for the local
TRIPs to acquire their TMSs. If the TRIPs cannot acquire the TMSs, the TMSs may
need to be restarted. This waiting period is reported as the “grace period” in
the “Local MPS States” panel of the RCD status report. The grace period ends
when either:
• all local MPS components (excluding those loaded with the empty
configuration) have acquired their TMSs, or
• 5 minutes have elapsed.
4. Redundancy is implemented on a node-by-node basis. Once it enters the
active state, the secondary node protects all MPS components on the
failed primary node against current and future failures, but it does not
protect other primary nodes.
For example, suppose a primary node runs two MPS components. When one of
them fails, the secondary node enters the active state, takes over the
failed component, and monitors the second primary component
for failures. If the second component then fails, the
secondary node takes it over as well. The RCD status report shows whether each
secondary MPS component has acquired its TMS.
5. When the failed primary node is operational again, it can request failback
(see TRIP Failback on page 268) from the secondary node. When
failback is requested, the secondary TRIPs release their TMSs, allowing
the primary TRIPs to take control. Once all TMSs are released, the secondary
node returns to the standby state, again protecting all primary nodes in the
cluster.
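The failover and failback steps above behave like a small state machine inside the secondary node's RCD. The Python sketch below models that sequence for illustration only. The class `SecondaryRcd`, its methods, the component and file names, the position-based mapping of local components to primary components, and the choice to treat the end of the grace period as the moment RCD becomes active are all hypothetical assumptions, not the RCD's actual interface. It also assumes the secondary node has at least as many MPS components as the primary node, and that by the time `failback` is called the secondary TRIPs have already released all TMSs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

EMPTY_CONFIG = "$MPSHOME/mps0.empty"  # empty configuration named in the manual
GRACE_PERIOD_S = 5 * 60               # RCD waits up to 5 minutes after restart


class RcdState(Enum):
    STANDBY = "standby"
    FAILOVER_INITIATED = "failover initiated"
    ACTIVE = "active"


@dataclass
class SecondaryRcd:
    """Hypothetical model of a secondary node's RCD; not the product's code."""
    local_components: list[str]
    state: RcdState = RcdState.STANDBY
    protected_node: Optional[str] = None   # primary node that was taken over
    config: dict[str, str] = field(default_factory=dict)
    grace_start: Optional[float] = None

    def request_failover(self, primary_node: str,
                         primary_backups: list[str], now: float) -> bool:
        """A local TRIP reports that a TMS lost its primary controller.

        Returns True if this secondary handles (or already handles) the failure.
        """
        if self.state is not RcdState.STANDBY:
            # First detected node wins; requests for other nodes are ignored.
            # A later failure on the node already taken over is covered by the
            # configuration that is already loaded.
            return primary_node == self.protected_node
        self.state = RcdState.FAILOVER_INITIATED
        self.protected_node = primary_node
        # Mirror the primary's configuration (assumed: matched by position);
        # spare local components are pointed to the empty configuration.
        for i, comp in enumerate(self.local_components):
            self.config[comp] = (primary_backups[i] if i < len(primary_backups)
                                 else EMPTY_CONFIG)
        # Local COMMON/MPS restart is assumed to happen here; grace period starts.
        self.grace_start = now
        return True

    def update_grace_period(self, acquired: set[str], now: float) -> bool:
        """Return True once the grace period has ended (RCD treated as active)."""
        if self.state is not RcdState.FAILOVER_INITIATED:
            return self.state is RcdState.ACTIVE
        # Components loaded with the empty configuration are not waited for.
        needed = {c for c, cfg in self.config.items() if cfg != EMPTY_CONFIG}
        if needed <= acquired or now - self.grace_start >= GRACE_PERIOD_S:
            self.state = RcdState.ACTIVE
        return self.state is RcdState.ACTIVE

    def failback(self, primary_node: str) -> bool:
        """The recovered primary requests failback; return to standby."""
        if primary_node != self.protected_node:
            return False
        self.state = RcdState.STANDBY
        self.protected_node = None
        self.config.clear()
        self.grace_start = None
        return True
```

Keeping the taken-over node in a single field is what makes redundancy node-by-node in this sketch: a second failure on the same primary node is already covered by the loaded configuration, while a failure on any other primary node is ignored until failback returns the secondary to standby.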
Alarms are displayed as systems change state. See the MPS Alarm Message Reference
Manual for information regarding the alarms and their meanings.