Fault Tolerance and Emergency Access

In general, Hitachi ID Bravura Privilege should always be deployed with at least two database nodes, situated in at least two different locations. This minimizes the probability that all Bravura Privilege servers are concurrently offline, due to simultaneous hardware failures or simultaneous site disasters.
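
As a rough illustration of why additional nodes reduce this risk, the sketch below assumes that node outages are statistically independent. A site disaster affecting several nodes at once would violate that assumption, which is why the nodes should be in separate locations. The function name and the 1% unavailability figure are illustrative assumptions, not measurements of Bravura Privilege.

```python
def probability_all_offline(node_unavailability: float, node_count: int) -> float:
    """Probability that every node is down at once, assuming independent outages."""
    if not 0.0 <= node_unavailability <= 1.0:
        raise ValueError("node_unavailability must be between 0 and 1")
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Independent events: multiply the per-node probabilities together.
    return node_unavailability ** node_count


if __name__ == "__main__":
    # Hypothetical figure: each node is unavailable 1% of the time.
    for count in (1, 2, 3):
        print(count, probability_all_offline(0.01, count))
```

Under these assumptions each added node multiplies the chance of a total outage by the per-node unavailability, so the benefit depends heavily on the nodes not sharing a common point of failure.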

This arrangement still leaves open the possibility that Bravura Privilege is unreachable by some users due to failure of a single Bravura Privilege node or of connectivity to a single Bravura Privilege site. In fact, multiple modes of failure should be considered, as illustrated in the figure below.

Emergency Access to Privileged Accounts

In the figure, there are several possible situations:

  1. The user and the system he wishes to manage are on the same network segment.
    Work is possible.

  2. The user and the system he wishes to manage are on different network segments.

    1. The user is able to connect to the system he wishes to manage.
      Work is possible.

    2. The user is not able to connect to the system he wishes to manage.
      Work is not possible -- Bravura Privilege is not involved.

Once it has been established that work is possible, the next question is whether a privileged account is accessible:

  1. There is a Bravura Privilege database node on the same network segment as the user.

    1. A local Bravura Privilege application node is available.
      No problem.

    2. All local Bravura Privilege nodes are off-line.

      1. A remote Bravura Privilege server is available.
        No problem.
      2. No Bravura Privilege application node is accessible.
        The user can still reach a remote Bravura Privilege server by an alternate route: over a VPN, on the Extranet, using Hitachi ID Mobile Access on a phone, or by calling another user at a site where Bravura Privilege is accessible. Note that when one user has to get a password from another, the password itself must be disclosed; an automatically launched SSH or RDP session is not possible.

  2. There is no Bravura Privilege server on the same network segment as the user.

    1. A remote Bravura Privilege server is available.
      No problem.
    2. No Bravura Privilege server is accessible.
      The user can still reach a remote Bravura Privilege server by an alternate route: over a VPN, on the Extranet, using Mobile Access on a phone, or by calling another user at a site where Bravura Privilege is accessible. Note that when one user has to get a password from another, the password itself must be disclosed; an automatically launched SSH or RDP session is not possible.

In short, so long as the user is able to connect to the system he wishes to manage and so long as at least one Bravura Privilege application node is functional somewhere on the network (reachable or not), the user should be able to retrieve passwords -- directly or by asking for assistance from another user with better connectivity.
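
The two decision lists above can be condensed into a single function. This is a minimal sketch for reasoning about the cases, not part of Bravura Privilege: the enum, the function name, and the four boolean inputs are hypothetical names chosen for illustration.

```python
from enum import Enum


class AccessPath(Enum):
    NO_WORK_POSSIBLE = "target system unreachable; Bravura Privilege is not involved"
    LOCAL_NODE = "use a local node"
    REMOTE_NODE = "use a remote node"
    ALTERNATE_ROUTE = "use VPN, Extranet, Mobile Access, or ask a user at another site"
    UNAVAILABLE = "no functional node anywhere"


def choose_access_path(can_reach_target: bool,
                       local_node_available: bool,
                       remote_node_reachable: bool,
                       any_node_functional: bool) -> AccessPath:
    """Walk the cases in the same order as the lists above."""
    if not can_reach_target:
        # Without connectivity to the managed system, credentials do not help.
        return AccessPath.NO_WORK_POSSIBLE
    if local_node_available:
        return AccessPath.LOCAL_NODE
    if remote_node_reachable:
        return AccessPath.REMOTE_NODE
    if any_node_functional:
        # A node exists but is not directly reachable from the user's segment.
        return AccessPath.ALTERNATE_ROUTE
    return AccessPath.UNAVAILABLE


if __name__ == "__main__":
    print(choose_access_path(True, False, False, True).value)
```

Only the last branch, where no node is functional anywhere, leaves the user without a way to retrieve a password.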

"Break Glass" Option

Access controls and approval workflows in Bravura Privilege can be configured for "break glass" scenarios. Hitachi ID Systems customers must define who can request such access, who must approve it, for which systems, etc. These are customer-specific policy decisions, rather than generic product features.

For example, in an all-out-emergency scenario, a Hitachi ID customer may set a flag on the system indicating that no approvals are required for any check-out. When this is done, all requests will be auto-approved (but strictly audited). This is an extreme example of what's possible -- not a recommendation by Hitachi ID.
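
The effect of such a flag can be sketched as follows. This is a hypothetical model of the policy logic only: the class, its fields, and the log format are assumptions made for illustration and do not represent the Bravura Privilege configuration interface or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CheckoutPolicy:
    # Hypothetical stand-in for the "no approvals required" flag.
    emergency_auto_approve: bool = False
    audit_log: List[str] = field(default_factory=list)

    def decide(self, requester: str, account: str,
               approvals_received: int, approvals_required: int) -> bool:
        """Return True if the check-out may proceed; always record the decision."""
        if self.emergency_auto_approve:
            approved = True
            reason = "auto-approved (emergency flag set)"
        else:
            approved = approvals_received >= approvals_required
            reason = "approved by authorizers" if approved else "pending approval"
        # Every request is audited, whether or not the flag is set.
        self.audit_log.append(f"{requester} -> {account}: {reason}")
        return approved


if __name__ == "__main__":
    policy = CheckoutPolicy(emergency_auto_approve=True)
    print(policy.decide("alice", "root@db01", 0, 2))
    print(policy.audit_log)
```

The point of the sketch is that the flag bypasses the approval count but never the audit record.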

Overview of Emergency "Break Glass" Scenarios

Hitachi ID customers often ask whether Bravura Privilege supports access to privileged accounts in a "break glass" scenario -- i.e., during an emergency. While the short answer is simply "yes," a clearer understanding of how this is done must begin with a classification of the different types of emergencies (disasters) that might arise and how best to respond to each one.

The term "break glass" is ambiguous, as it does not specify what part of the infrastructure was damaged, so it shall not be used in the following description.

The basic disaster scenarios depend on what part of an organization's infrastructure was damaged, as follows:

  1. 1-Vault:
    A single Bravura Privilege node becomes inaccessible. All data centers remain operational.
  2. 1-DC, 0-vaults:
    An entire data center becomes inaccessible, but it does not contain a Bravura Privilege node.
  3. 1-DC, 1 vault:
    An entire data center becomes inaccessible, and it does contain a Bravura Privilege node.
  4. Multi-DCs, some vaults:
    Multiple data centers become inaccessible, including some but not all Bravura Privilege nodes.
  5. Multi-DCs, all vaults:
    Multiple data centers become inaccessible, including all Bravura Privilege nodes.
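
The classification above can be expressed as a small function. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and it assumes nodes are spread over at least two data centers, so the loss of one data center cannot take out every node. Combinations that the list does not describe return None.

```python
from typing import Optional


def classify_scenario(data_centers_lost: int, nodes_lost: int,
                      nodes_total: int) -> Optional[str]:
    """Map an outage to one of the five scenario labels, or None if none applies."""
    if nodes_total < 1 or data_centers_lost < 0 or not 0 <= nodes_lost <= nodes_total:
        raise ValueError("inconsistent outage description")
    if data_centers_lost == 0:
        # Only the single-node failure is listed for this case.
        return "1-Vault" if nodes_lost == 1 else None
    if data_centers_lost == 1:
        if nodes_lost == 0:
            return "1-DC, 0-vaults"
        # Losing every node with one data center contradicts the assumed layout.
        return "1-DC, 1 vault" if nodes_lost < nodes_total else None
    # Two or more data centers are inaccessible.
    if nodes_lost == nodes_total:
        return "Multi-DCs, all vaults"
    if nodes_lost > 0:
        return "Multi-DCs, some vaults"
    return None


if __name__ == "__main__":
    print(classify_scenario(data_centers_lost=1, nodes_lost=1, nodes_total=3))
```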

Hitachi ID strongly urges customers to deploy Bravura Privilege in an active-active, geographically distributed configuration. This means that there should always be at least two active Bravura Privilege nodes, installed in at least two data centers, with as much physical distance between them as practical.

How organizations respond to each scenario using Bravura Privilege is described below:

  1. 1-Vault:
    Simply continue using a vault at another location. There is ample time to rebuild the damaged Bravura Privilege node and there should be no service interruption.
  2. 1-DC, 0-vaults:
    Work to reconstruct the damaged data center can leverage access to privileged accounts and passwords using vaults at other locations.
  3. 1-DC, 1 vault:
    As above, even though one Bravura Privilege node was damaged, others still function and can be used to support data center reconstruction.
  4. Multi-DCs, some vaults:
    While the likelihood of such a wide-spread disaster is very low, Bravura Privilege remains operational with at least one functioning node. No special care is required to support continued access to privileged accounts.
  5. Multi-DCs, all vaults:
    This is essentially the end-of-the-world scenario. It requires that all major data centers and all Bravura Privilege nodes are damaged. When the dust settles and the time comes to rebuild, the contents of the Bravura Privilege credential database may be restored from any backup media that may have survived the cataclysm.

As can be seen above, it would take something like a global-scale disaster to cause service interruption to Bravura Privilege, due to its geographically distributed architecture, real-time and fault-tolerant data replication and multi-master, active-active operation. Such a disaster is not impossible, but it is not likely either.

A more practical consideration is whether access controls should be relaxed in the context of a disaster which incapacitates one or more data centers. For example, workflow approvals might be required under normal circumstances to check out access to some privileged accounts. Moreover, access to a directory might be required when users sign into Bravura Privilege.

To expedite access to privileged accounts during an emergency, a handful of local login IDs can be defined on the Bravura Privilege logical instance, with unrestricted access to some or all privileged accounts. In an emergency, these accounts may be enabled and their passwords distributed to the recovery team. This reduces "red tape" (and lowers security) and eliminates the dependency on an external directory.