Error Handling in VBScript - Part III

Strategy for Handling Errors

In this section I lay out a set of guidelines that I strongly recommend you follow if you are developing VBScripts.

Rule 0. You need a strategy for handling errors.

OK, if you are writing a script solely for your own convenience then you can decide to adopt a lazy policy: don't bother with error handling, and defer this work until the errors actually occur. However, in the worlds of enterprise system administration and ASP, the person who runs the script is not the same as its author. So it isn’t you who has to deal with errors when they occur; it’s the poor user. You therefore have a responsibility to the user to ask yourself, “What should my script do if an error occurs?”

Rule 1. Only attempt to process and recover from those errors where a clearly stated rule exists for doing so.

When you design your script there may be specific ‘errors’ that you anticipate and for which you can define a rule for handling the error and continuing processing. Appending to an error log is a good case in point. There is a small chance that the log open might fail because some other script which has also opened the log is running. Here the rule might be to retry the open after sleeping for a short period (up to a maximum number of times). Another example is reading a registry key which might not exist, where a default value is defined for that case.
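
The retry rule for the log open might be sketched as follows. This is only an illustration under stated assumptions: the helper name fnOpenLogWithRetry, the retry limit and the delay are all hypothetical, and WScript.Sleep is only available when running under Windows Script Host (not ASP).

```vbscript
Option Explicit

Const ForAppending   = 8
Const MAX_TRIES      = 5     ' assumed retry limit
Const RETRY_DELAY_MS = 500   ' assumed pause between attempts

' Hypothetical helper: returns an open TextStream, or Nothing if
' every attempt to open the log failed.
Function fnOpenLogWithRetry(strLogPath)
    Dim objFSO, objLog, intTry

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objLog = Nothing

    For intTry = 1 To MAX_TRIES
        On Error Resume Next
        Set objLog = objFSO.OpenTextFile(strLogPath, ForAppending, True)
        If Err.Number = 0 Then
            On Error Goto 0
            Exit For                 ' opened successfully
        End If
        On Error Goto 0              ' also clears Err before the next attempt
        Set objLog = Nothing
        If intTry < MAX_TRIES Then WScript.Sleep RETRY_DELAY_MS
    Next

    Set fnOpenLogWithRetry = objLog
End Function
```

The caller tests the result with `Is Nothing` and, if the open never succeeded, treats that as one of the errors covered by the rules that follow rather than carrying on without a log.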

Rule 2. Adopt a zero-tolerance policy for all other errors.

Please remember that we are writing scripts to automate system administration in a live production estate. The last thing that we should allow is a situation where the script engine detects an error but the script logic carries on processing in the hope that we can ignore the error, and therefore implicitly its root cause. In my view such an attitude is an absolute no-no.

For all errors other than those covered by Rule 1 there is only one safe policy: cleanly terminate the script as soon as practical, because once such an error has occurred, what else can you safely do?

Rule 3. Attempt to catch all errors before changing system state.

A script’s actions can be viewed either as read-only (for example reading a file or a registry value, or executing a read query through WMI) or as changing system state (writing to a file, the registry, etc.)(1). The situation that you must try to avoid is one where you:

  <change some state>
  <do some validation or processing which may result in throwing an error>
  <change some more state>

because if the middle step results in script termination, then you have left the system in a part-changed state. The goal should be all or nothing in terms of state change. This means that you must explicitly validate all inputs and parameters. Don’t assume correctness; check first. If necessary, execute the script in two or more phases: validation, then processing.
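
A minimal sketch of the two-phase approach, assuming a hypothetical script that copies one file into a folder. The argument layout, messages and exit codes are illustrative only. Every check in phase 1 is read-only, and the single state change happens only after all of them have passed.

```vbscript
Option Explicit

Dim objFSO, strSource, strTargetFolder, strTargetFile
Set objFSO = CreateObject("Scripting.FileSystemObject")

' --- Phase 1: validation (read-only) ---
If WScript.Arguments.Count <> 2 Then
    WScript.Echo "Usage: cscript example.vbs <source file> <target folder>"
    WScript.Quit 1
End If

strSource       = WScript.Arguments(0)
strTargetFolder = WScript.Arguments(1)

If Not objFSO.FileExists(strSource) Then
    WScript.Echo "Source file not found: " & strSource
    WScript.Quit 2
End If

If Not objFSO.FolderExists(strTargetFolder) Then
    WScript.Echo "Target folder not found: " & strTargetFolder
    WScript.Quit 3
End If

strTargetFile = objFSO.BuildPath(strTargetFolder, objFSO.GetFileName(strSource))
If objFSO.FileExists(strTargetFile) Then
    WScript.Echo "Target file already exists: " & strTargetFile
    WScript.Quit 4
End If

' --- Phase 2: processing (changes system state) ---
objFSO.CopyFile strSource, strTargetFile, False
```

Validation cannot rule out every failure (the copy itself can still fail), which is why the logging rules below are still needed.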

A corollary to this is always to use Option Explicit, which traps undeclared (typically misspelt) variable names; better that such errors surface early, where they can be easily picked up in testing.

Rule 4. Adopt a coherent and standard strategy for logging and auditing that ensures that all errors and context are recorded.

The primary purpose of logging is forensics. Errors do occur; operators do invoke scripts with the wrong parameters; external service failures such as network brown-out can cause scripts to fail in bizarre ways. Rule 3 is not perfect and there are circumstances when scripts can crater part-way through a set of changes. When this happens you should always log the exception and ensure that your support processes sweep up and analyse such exceptions, investigating the root cause of the error, fixing any incomplete state changes, and identifying any flaws in the script logic. In order to stand any reasonable chance of doing this properly you need a detailed record of what the inputs to the script were, what the context was (who, what, where, when), and what errors were detected.

Rule 5. Write the log file to a standard folder and use a standard unique naming convention for the log file name.

I know of at least one instance when a script failed halfway through, and the operator then retried it. Unfortunately the script wrote to a file called `error.log` and the second run overwrote the content of the first run. Moral: adopt a convention such as <script ID>-YYYYMMDDHHMMSS-NN.log.
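
One way to build such a name is sketched below. The helper names fnPad and fnLogFileName are hypothetical, and the sketch assumes that NN is a sequence number used to separate runs that start within the same second.

```vbscript
Option Explicit

' Left-pad a number with zeros to at least intWidth characters.
Function fnPad(intValue, intWidth)
    Dim strValue
    strValue = CStr(intValue)
    Do While Len(strValue) < intWidth
        strValue = "0" & strValue
    Loop
    fnPad = strValue
End Function

' Build <folder>\<script ID>-YYYYMMDDHHMMSS-NN.log, choosing the
' first NN for which no file exists yet.
Function fnLogFileName(strFolder, strScriptID)
    Dim objFSO, dtNow, strStamp, intSeq, strPath

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    dtNow = Now
    strStamp = Year(dtNow) & fnPad(Month(dtNow), 2) & fnPad(Day(dtNow), 2) & _
               fnPad(Hour(dtNow), 2) & fnPad(Minute(dtNow), 2) & fnPad(Second(dtNow), 2)

    intSeq = 0
    Do
        intSeq = intSeq + 1
        strPath = objFSO.BuildPath(strFolder, _
                  strScriptID & "-" & strStamp & "-" & fnPad(intSeq, 2) & ".log")
    Loop While objFSO.FileExists(strPath)

    fnLogFileName = strPath
End Function
```

There is still a small window between the FileExists check and the file actually being created, so this is a convention for avoiding overwrites rather than a guarantee.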

Rule 6. Write a begin and end record to the Application Event Log giving a summary of context (who, what, where) and status. Make sure a mechanism is implemented to replicate the Application Event Log entries to a central repository.

A record of every script execution must be maintained, both on the local machine where the script was executed and centrally. The central copy should be made asynchronously to script execution, to avoid denial of service through the script aborting if the network path browns out. Management systems such as MOM can be configured to do this automatically. A simple alternative is to spawn a standard script which uses WMI to query the log and post the entries to a central repository. Why bother to do this? Simply because system admins often administer servers by executing scripts on their local PC. In a large enterprise there may be many such PCs, and if something happens to a server, the only practical way you can determine who has been working on that server is to interrogate the central log.
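
The local begin and end records can be written with the LogEvent method of the WScript.Shell object, which writes to the Windows Application event log. The following is a sketch only: the subroutine name subLogRunRecord and the message layout are assumptions, and replication to a central repository is not shown.

```vbscript
Option Explicit

Const EVENT_SUCCESS     = 0
Const EVENT_ERROR       = 1
Const EVENT_INFORMATION = 4

Dim g_objShell, g_objNetwork
Set g_objShell   = CreateObject("WScript.Shell")
Set g_objNetwork = CreateObject("WScript.Network")

' Hypothetical helper: write one summary record (who, what, where, status).
Sub subLogRunRecord(intType, strPhase, strStatus)
    g_objShell.LogEvent intType, _
        WScript.ScriptName & " " & strPhase & _
        "; user=" & g_objNetwork.UserDomain & "\" & g_objNetwork.UserName & _
        "; host=" & g_objNetwork.ComputerName & _
        "; status=" & strStatus
End Sub

subLogRunRecord EVENT_INFORMATION, "BEGIN", "started"
' ... processing goes here ...
subLogRunRecord EVENT_SUCCESS, "END", "completed"
```

On a failure path the end record would be written with EVENT_ERROR and the error details instead.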

Rule 7. Update a standard global variable (e.g. g_strContext) with a description of the current processing as you move through your code. Use this when logging any fatal errors.

One of the features of VBScript error processing is that if an error is thrown it can be caught by a calling routine, but that routine will have little information about the context in which the error occurred. If you keep this global variable up to date, the error handling routine can use it to report what the script was doing when the error occurred.
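
A small sketch of the idea, assuming hypothetical routine names and an illustrative file path. The processing routine updates g_strContext before each step, so the caller can report what was being attempted when the error was raised.

```vbscript
Option Explicit

Const ForReading = 1

Dim g_strContext
g_strContext = "Initialising"

Sub subShowFirstLine(strPath)
    Dim objFSO, objFile
    On Error Goto 0

    g_strContext = "Opening " & strPath
    Set objFSO  = CreateObject("Scripting.FileSystemObject")
    Set objFile = objFSO.OpenTextFile(strPath, ForReading)

    g_strContext = "Reading first line of " & strPath
    WScript.Echo objFile.ReadLine
    objFile.Close
End Sub

Sub subMain()
    On Error Resume Next
    subShowFirstLine "C:\Temp\example-input.txt"   ' illustrative path
    If Err.Number <> 0 Then
        WScript.Echo "Fatal error " & Err.Number & " (" & Err.Description & _
                     ") while: " & g_strContext
        WScript.Quit 1
    End If
End Sub

subMain
```

Without the context string, the caller would know only the error number and description, not whether the open or the read had failed.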

Rule 8. Use the subroutine hierarchy to simplify and add structure to your error handling.

Within your subroutine hierarchy strictly separate out `processing routines`, where you carry out processing and can therefore create errors, from those `error handling routines` which provide the error processing logic.

Always start a `processing routine` with On Error Goto 0(2). With the exception of errors as detailed in Rule 1, do not attempt to process errors but allow the scripting engine to throw such errors up the call stack. Regularly update g_strContext as you move through the code.

Always start an `error handling routine` with On Error Resume Next. Move all processing into processing routines, and limit processing to invoking such routines and the subsequent error logic. Always structure your main routine as such an error handling routine. The basic structure of such routines should look like:

  On Error Resume Next
  ...
  subProcess1 parameters
  ' (see Note 1)
  If Err.Number <> 0 Then
     <Error Processing>
     ...
     WScript.Quit
  End If
  ...
  ' Note 1: you can safely fold error handling by adding further
  ' guarded calls at the point marked above, for example:
  If Err.Number = 0 Then subProcess2 parameters
  If Err.Number = 0 Then subProcess3 parameters
  ...
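
As a concrete illustration of this structure, the sketch below uses two hypothetical processing routines (subValidate and subReport) and a main routine written as an error handling routine. The routine names, the error number and the messages are assumptions made for the example.

```vbscript
Option Explicit

' Processing routine: raises an error rather than handling it.
Sub subValidate(strValue)
    On Error Goto 0
    If Not IsNumeric(strValue) Then
        Err.Raise vbObjectError + 1, "subValidate", "Value is not numeric: " & strValue
    End If
End Sub

' Processing routine: a zero value causes a division-by-zero
' run-time error, which is thrown up the call stack.
Sub subReport(strValue)
    On Error Goto 0
    WScript.Echo "100 / value = " & (100 / CDbl(strValue))
End Sub

' Error handling routine: only invokes processing routines and
' then deals with whatever error came back.
Sub subMain()
    On Error Resume Next
    subValidate "0"
    If Err.Number = 0 Then subReport "0"
    If Err.Number <> 0 Then
        WScript.Echo "Error " & Err.Number & " in " & Err.Source & ": " & Err.Description
        WScript.Quit 1
    End If
    WScript.Echo "Completed"
End Sub

subMain
```

Here "0" passes validation but fails in subReport; because the second call is guarded by Err.Number = 0, a failure in either routine lands in the same error block.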

Rule 9. Where a valid rule exists for handling a potential error, the statement that could raise the error should be bracketed using one of the following templates.

These ensure that any Err status is cleared and Resume Next is turned off.

  ' Template One
  On Error Resume Next
  <var> = <default expression>
  <var> = <expression that could error>
  On Error Goto 0

  ' Template Two
  On Error Resume Next
  <Statement that could error>
  If Err.Number <> 0 Then
     <if necessary latch Err properties>
     On Error Goto 0
     <Handle Error State>
  End If
  On Error Goto 0

  ' Template Three
  On Error Resume Next
  <Statement that could error>
  If Err.Number = <SOME_PREDICTED_ERR_CONST> Then
     On Error Goto 0
     <Handle Error State>
  ElseIf Err.Number <> 0 Then
     subUtilSaveError [or nasty code to latch Err properties]
     On Error Goto 0
     subUtilRaiseError [or a fully qualified Err.Raise statement]
  End If
  On Error Goto 0
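
As an illustration of Template One, here is a sketch of the registry read with a default value mentioned under Rule 1. The function name fnRegReadDefault and the registry path are hypothetical; RegRead is the standard WScript.Shell method, which raises an error when the value does not exist.

```vbscript
Option Explicit

' Hypothetical helper: return the registry value, or varDefault
' if the value cannot be read.
Function fnRegReadDefault(strValuePath, varDefault)
    Dim objShell
    Set objShell = CreateObject("WScript.Shell")

    On Error Resume Next
    fnRegReadDefault = varDefault
    fnRegReadDefault = objShell.RegRead(strValuePath)   ' skipped over if it errors
    On Error Goto 0
End Function

' Illustrative key path; not a real product setting.
WScript.Echo fnRegReadDefault("HKCU\Software\ExampleVendor\ExampleApp\Timeout", 30)
```

Note that Template One falls back to the default on any error, not just a missing value. Where the distinction matters, Template Three is the safer choice because it re-raises anything it did not predict.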

Rule 10. Better to have redundant error checking than too little.

Heard about the trader who sold 61,000 shares for $16 instead of 16,000 shares for $61? Well, this sort of mistake happens in real life when running administration scripts. Finger trouble can cause disasters, so where appropriate put in sensible double checks and cross checks. Better safe than sorry. For example, if you are running a script over a range of servers through some wildcard or an OU relationship, how many servers are you expecting? Maybe you want to apply an upper bound check, or have an extra parameter which is the number of servers ±10%, say. You can apply tolerances here because you are trying to catch the simple finger trouble that can cause serious damage. And always go for collateral redundancy rather than simple duplication or “Are you sure?” prompts. How many times have you clicked on “Yes” to such a dialogue only to say “B***er, I meant No!” two seconds later? Such redundancy means that you need two bits of finger trouble to damage the estate.
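
The server count cross-check might look something like the sketch below. The function name, the sample server list and the expected count are all made up for illustration; in a real script the list would come from the wildcard or OU query and the expected count from an operator-supplied parameter.

```vbscript
Option Explicit

' True if intActual is within +/- dblTolerance (0.1 = 10%) of intExpected.
Function fnCountWithinTolerance(intActual, intExpected, dblTolerance)
    fnCountWithinTolerance = _
        (Abs(intActual - intExpected) <= intExpected * dblTolerance)
End Function

Dim arrServers, intExpected, intActual

arrServers  = Array("SRV01", "SRV02", "SRV03")   ' stands in for the query result
intExpected = 20                                  ' stands in for the operator's parameter
intActual   = UBound(arrServers) + 1

If Not fnCountWithinTolerance(intActual, intExpected, 0.1) Then
    WScript.Echo "Aborting: expected about " & intExpected & _
                 " servers but the selection matched " & intActual
    WScript.Quit 1
End If

WScript.Echo "Server count check passed; continuing"
```

Because the check runs before any server is touched, it also fits the validate-first approach of Rule 3.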

Footnotes

(1) However, the convention is also adopted that writing to certain write-only mechanisms (such as the script's local log file) is outside normal system state and is therefore not treated as changing the system state.

(2) There is a bit of a dilemma here: on the one hand this statement has the side effect of clearing the Err object, but on the other the VBScript language definition does not document that this is the default state.