3.7.2 Recoverable Errors

The MDFU Protocol defines mechanisms for detecting and recovering from errors. When a recoverable error occurs during an update, a defined series of subsequent operations can be used to correct it. Once the error has been corrected, the update can resume and complete successfully.

The following errors are considered recoverable errors:

  1. Commands that have been corrupted during transmission from the host to the client.
  2. Responses that have been corrupted during transmission from the client to the host.
  3. Commands that are completely lost due to corruption on the physical communication bus.
  4. Responses that are completely lost due to corruption on the physical communication bus.

These errors may be caused by sources such as noise-induced bit flips, which occur as the commands and responses are transported across a physical bus.

3.7.2.1 Detection

One of the responsibilities of each Transport Layer definition is to provide a command-response integrity check mechanism for determining if commands and responses have been corrupted when they are transmitted across the communications bus.

Commands and responses that are completely lost are detected using the command time-outs, which are learned during the Discovery phase and incorporated into the flow control algorithm.
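The two detection paths described above (an integrity check for corruption, a time-out for loss) can be sketched as follows. This is only an illustration: the actual integrity check is defined by each Transport Layer, so the CRC-32 trailer used here is an assumed stand-in, and the names `Reception`, `frame` and `classify_reception` are hypothetical.

```python
import zlib
from enum import Enum
from typing import Optional


class Reception(Enum):
    OK = "ok"
    CORRUPTED = "corrupted"  # integrity check failed
    LOST = "lost"            # nothing arrived before the time-out expired


def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 trailer (assumed stand-in for the transport's check)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def classify_reception(raw: Optional[bytes]) -> Reception:
    """Classify one receive attempt; None models an expired command time-out."""
    if raw is None:
        return Reception.LOST
    if len(raw) < 4:
        # Too short to even contain the trailer.
        return Reception.CORRUPTED
    payload, trailer = raw[:-4], raw[-4:]
    if zlib.crc32(payload).to_bytes(4, "little") != trailer:
        return Reception.CORRUPTED
    return Reception.OK
```

In this sketch the caller is expected to pass `None` when the time-out learned during the Discovery phase expires without any data arriving.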

3.7.2.2 Host Recovery Algorithm

In order to recover from corruption, hosts are responsible for:

  1. Resending commands when requested by the client.
  2. Resending commands when responses are not received within the command time-out period.
  3. Detecting response corruption and resending the command to recover from this corruption.

The host recovery algorithm is simple and is shown in the diagram below.

The host must provide a mechanism to log all detected errors and notify a user of them. The host must also have a configurable maximum retry count (MaxRetries) after which it terminates the update, to help make link problems obvious to the user.

Figure 3-9. Host Error Recovery Algorithm
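The host responsibilities listed in this section can be sketched as a single retry loop. This is a minimal illustration, not the algorithm from the figure: `Response`, `UpdateAborted`, `send_with_recovery` and the `transact` callback are hypothetical names, and treating MaxRetries as the number of resends allowed after the first attempt is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Response:
    resend: bool          # RESEND bit asserted by the client
    corrupted: bool       # host-side integrity check failed
    payload: bytes = b""


class UpdateAborted(Exception):
    """Raised when the retry budget is exhausted (hypothetical name)."""


def send_with_recovery(
    command: bytes,
    transact: Callable[[bytes], Optional[Response]],
    max_retries: int,
    error_log: List[str],
) -> Response:
    """Send a command, resending it on time-out, corruption or resend request.

    `transact` sends the command and returns the response, or None if the
    command time-out expired. Assumption: max_retries counts resends after
    the first attempt.
    """
    for attempt in range(max_retries + 1):
        response = transact(command)
        if response is None:
            error_log.append(f"attempt {attempt}: time-out waiting for response")
        elif response.corrupted:
            error_log.append(f"attempt {attempt}: corrupted response")
        elif response.resend:
            error_log.append(f"attempt {attempt}: client requested resend")
        else:
            return response
    raise UpdateAborted(f"no valid response after {max_retries} retries")
```

Every detected error is appended to `error_log` so it can be reported to the user, and the exception makes a persistently bad link visible instead of retrying indefinitely.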

3.7.2.3 Client Recovery Algorithm

In order to recover from corruption, clients are responsible for:

  1. Detecting command corruption and requesting the host resend corrupted commands.
  2. Implementing the receive sequence number filtering to ensure that commands are only executed once and are executed in the proper order.
  3. Resending a response to the host without triggering a second execution of a command when the host is recovering from a corrupted response.
    Important: After executing a command, a client must retain the response to that command until the client can conclude that the host will not request that specific response be resent. See the Client Sequence Number Processing section of the specification for more information on when a client can discard a response to an executed command.

The figure below shows how the client fulfills the three error recovery responsibilities described above.

Figure 3-10. Client Recovery Algorithm
Figure Notes
Note 1: This figure only highlights the client algorithm details that allow clients to recover from errors. See the Client Command Processing and Response Generation section for more details on the complete client command processing and response generation algorithm.
Note 2: See the Client Sequence Number Processing section of the specification for more information.
Note 3: The only responses resent by the client are responses to executed commands (see Note 4). Ephemeral Resend Request Responses (see Note 5) are never resent by the client.
Note 4: Responses to executed commands must be retained by the client until the client can conclude that the host will not request that the response be resent.
Note 5: Ephemeral Resend Request Responses are never resent by the client, and the client can discard them as soon as they have been sent to the host.
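The three client responsibilities can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm from the figure: the class and field names are hypothetical, `SEQ_MODULO` is an assumed sequence-number range, `"SUCCESS"` is a placeholder status, and the client is assumed to retain only the response to the most recently executed command. Sequence numbers other than the expected one or the last executed one are left to the Client Sequence Number Processing section and are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SEQ_MODULO = 32  # assumed sequence-number range, for illustration only


@dataclass(frozen=True)
class ClientResponse:
    seq: int
    resend: bool
    status: str
    data: bytes = b""


class Client:
    def __init__(self, execute: Callable[[bytes], bytes]) -> None:
        self._execute = execute
        self.next_seq_num = 0                             # next command expected
        self._retained: Optional[ClientResponse] = None   # last executed response

    def on_command(self, seq: int, body: bytes, intact: bool) -> ClientResponse:
        if not intact:
            # Responsibility 1: request a resend. This response is ephemeral,
            # so it is returned without being retained.
            return ClientResponse(self.next_seq_num, True, "COMMAND_NOT_EXECUTED")
        if seq == self.next_seq_num:
            # Responsibility 2: execute each command once, in order.
            data = self._execute(body)
            self._retained = ClientResponse(seq, False, "SUCCESS", data)
            self.next_seq_num = (seq + 1) % SEQ_MODULO
            return self._retained
        if self._retained is not None and seq == self._retained.seq:
            # Responsibility 3: resend the retained response, no re-execution.
            return self._retained
        raise ValueError("unexpected sequence number; not modeled in this sketch")
```

Because the retained response is only replaced when the next command executes, a host that resends the last command any number of times always gets the same response back.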

3.7.2.4 Detailed Recovery Diagrams

The following sections present detailed sequence diagrams that show how the host and client recovery algorithms recover from the following types of recoverable errors:

  • Corrupted Command
  • Corrupted Response
  • Corrupted Command followed by Corrupted Response
  • Corrupted Response followed by Corrupted Command
  • Lost Command
  • Lost Response

3.7.2.4.1 Recovering From a Corrupted Command

Upon detecting a corrupted command, the client sends a response with status COMMAND_NOT_EXECUTED and asserts the response sequence number RESEND bit. When the host sees the resend bit asserted, it resends the command to the client.

3.7.2.4.2 Recovering From a Corrupted Response

When the host detects a corrupted response, the host resends the last command sent.

The client recognizes the repeated sequence number of the resent command, does not execute the command a second time, and simply resends the corresponding response.

3.7.2.4.3 Recovering From a Corrupted Command Followed by a Corrupted Response

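In a situation where a command is corrupted and the response to the resent command is also corrupted, recovery takes two resends. The client first detects the corrupted command and requests a resend, so the host resends the command. Upon receiving the command intact, the client executes it and sends the response. Because that response is corrupted, the host resends the command again; the client sees that it has already executed this command and resends the response without executing the command again. This allows successful recovery from a corrupted command followed by a corrupted response.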

3.7.2.4.4 Recovering From a Corrupted Response Followed by a Corrupted Command

The figure below illustrates how a host recovers from a situation where a response is corrupted and the command that is resent by the host also gets corrupted. The client detects the command corruption and sends a resend request for the NextSeqNum command. The host still has not successfully received a response for the last command it sent so it resends the command a second time. The client sees it has already executed this command and simply resends the response to that command without executing the command again. This process enables recovery from a situation where a response and subsequent command are both corrupted.

3.7.2.4.5 Recovering From a Lost Command

On some transport interfaces, certain forms of command corruption can prevent the client from completely receiving a command. In these scenarios, the command is lost and the client never generates a response to it.

The host detects command loss when it times out waiting for a response from the client. At this point, the host can recover by resending the command to the client.

3.7.2.4.6 Recovering From a Lost Response

The host detects response loss when it times out waiting for a response from the client. At this point, the host can recover by resending the command to the client.
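The six recovery scenarios in this section can be replayed with a small simulation of one command crossing a faulty link. This is a sketch under stated assumptions: the function name, fault labels and trace strings are hypothetical, a single command with sequence number 0 is modeled, and MaxRetries is assumed to count resends after the first attempt.

```python
from typing import List, Optional, Tuple

FAULTS = {"ok", "cmd_corrupt", "rsp_corrupt", "cmd_lost", "rsp_lost"}


def simulate(faults: List[str], max_retries: int = 5) -> Tuple[int, List[str]]:
    """Send one command (sequence number 0) over a link with scripted faults.

    faults[i] names the fault on attempt i; attempts past the end of the
    list are fault-free. Returns (times the client executed, event trace).
    """
    if any(f not in FAULTS for f in faults):
        raise ValueError("unknown fault label")
    next_seq_num = 0                # client: next sequence number expected
    retained: Optional[str] = None  # client: response kept for resending
    executions = 0
    trace: List[str] = []

    for attempt in range(max_retries + 1):
        fault = faults[attempt] if attempt < len(faults) else "ok"

        # Command travels from host to client.
        if fault == "cmd_lost":
            trace.append("host: time-out, resend")
            continue
        if fault == "cmd_corrupt":
            # Ephemeral resend request; the host resends on seeing RESEND.
            trace.append("client: resend request")
            continue

        # Client received the command intact.
        if next_seq_num == 0:
            executions += 1
            retained = "response-0"
            next_seq_num = 1
            trace.append("client: executed")
        else:
            trace.append(f"client: resent {retained}")

        # Response travels from client to host.
        if fault == "rsp_lost":
            trace.append("host: time-out, resend")
            continue
        if fault == "rsp_corrupt":
            trace.append("host: corrupted response, resend")
            continue
        trace.append("host: response accepted")
        return executions, trace

    raise RuntimeError("MaxRetries exceeded; update terminated")
```

For example, `simulate(["rsp_corrupt", "cmd_corrupt"])` models a corrupted response followed by a corrupted command; in this model the command is executed exactly once however the faults are ordered, as long as the retry budget is not exhausted.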