Internal Errors

Almost every piece of software has its own way of handling internal errors. Usually such an error is reported, and the software aborts. Take, as an example, the infamous blue screen of death. Is it possible to define the key principles for internal errors, so as not to confuse the end user? Let’s give it a try:

  • First of all, if an internal error is detected, this should mean the software has found a potential bug. Something in the flow of execution went so badly wrong that the program cannot safely continue running. That is exactly what internal errors are for: detecting a wrong execution path before it leads to devastating consequences.
  • Therefore, the program should abort after detecting and reporting the internal error.
  • The user should be told how to report the error as a bug. Ideally, the program offers a way to report the error automatically, for example via the internet; see Joel Spolsky’s excellent article on automatic crash reporting in the context of his FogBugz product. Unfortunately, this might be out of the question for software used by businesses that depend highly on intellectual property rights.
  • Every internal error should be uniquely identified, for example, using an error code, to ease communication between clients, the helpdesk and development. They can all talk about, say, error 146. The developers should even be able to immediately identify the line in the code where error 146 is trapped.
  • A hint might be given about what is wrong, although that could mislead the user into believing they can solve the problem themselves. However, since an internal error signals a potential bug, the source code itself is most probably wrong, which a user usually cannot fix. Moreover, you don’t want the user to work around the error: it is a software problem that you want to know about. This is a tough call: either give the user enough information to try to work around the problem, or leave those details out of the message reported by the software, to raise the chance of being notified of the problem (in particular when automatic reporting is not an option). In the second case, your helpdesk can be given hints for working around specific internal errors.
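The principles above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the helper `internal_check`, the function `format_report`, the example function `average` and the address `support@example.com` are all hypothetical, and the exit status 70 is borrowed from `EX_SOFTWARE` in BSD's `sysexits.h`. Each check carries a unique, hard-coded error code, the message tells the user how to report it without hinting at a workaround, and the program stops immediately.

```python
import sys

# Hypothetical reporting channel; replace with your own helpdesk address.
SUPPORT_CONTACT = "support@example.com"

# Exit status for internal software errors (EX_SOFTWARE in BSD sysexits.h).
EXIT_INTERNAL_ERROR = 70


def format_report(code: int) -> str:
    """Build the user-facing message for internal error `code`.

    It deliberately gives no hint about the cause: the user is asked to
    report the bug rather than to work around it.
    """
    return (
        f"Internal error {code}: the program detected a bug and has to stop.\n"
        f"Please report error {code} to {SUPPORT_CONTACT}."
    )


def internal_check(condition: bool, code: int) -> None:
    """Report internal error `code` and abort if `condition` does not hold."""
    if not condition:
        print(format_report(code), file=sys.stderr)
        sys.exit(EXIT_INTERNAL_ERROR)


def average(values: list[float]) -> float:
    # Callers are expected to filter out empty input, so an empty list here
    # means a bug elsewhere. Code 146 is used at this call site only, so
    # searching the source for "146" leads straight to this line.
    internal_check(len(values) > 0, 146)
    return sum(values) / len(values)
```

Calling `average([])` prints the report for error 146 to standard error and ends the process with exit status 70. Note that `sys.exit` raises `SystemExit`, so `finally` blocks still run; `os.abort()` would be the harsher alternative if no cleanup should happen at all.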

Some examples: a segmentation fault can actually be classified as an internal error, although it is detected not by the program itself but by the OS. On Linux, the kernel terminates the application, and the shell reports “Segmentation fault”. That seems reasonable: there is no information on how to contact a helpdesk, since there is no centralized Linux helpdesk to contact. I also see no reason to use error codes here. The Windows 9x blue screen gives you a choice: terminate the program and continue, or restart the OS. This is confusing for the user; in almost all cases, a restart is required to really recover from the problem. The Xbox 360 error screen, on the other hand, is a good example of an internal error reported without confusing the user.