Top 5 Reasons Why Desktops Get Corrupted


 * Why is my Desktop so Fragile and What can I do About it?

How many times have you heard or read about a user who has had to rebuild their Desktop, run MAKEINI to recover from corrupted INI files, use Alt-F1 during the boot process to return to the install Desktop or completely reinstall and recustomize OS/2 because of some sort of Desktop problem? I suspect that everyone has heard of multiple instances of one or more of the above and many have been the user with the problem. Why is this happening and what can a user do to ensure that they are not included in this unfortunate group?


 * Why is the Desktop so Fragile?

In my opinion, there are five primary reasons why so many Desktops have corruption problems:

1. The OS/2, PM, WPS environment is much more complex than virtually any user can appreciate. PM and WPS are both multithreaded applications, and any environment where various tasks are done asynchronously, which is the essence of multithreading, is much more complex than it appears on the surface. The reason for the complexity is that the number of different logic paths that can be taken through the code becomes large very quickly, since the logic path is very dependent on timing. When you add this basic complexity to the fact that PM and WPS have a very large number of threads, you end up with an extremely complex environment, to say the least. There are ways to address this problem, such as semaphores and other mechanisms, but it is also very easy to leave holes where unexpected events can occur.

2. Because of the complexity described above, it is not possible to adequately test the effect of changes that are made to the operating environment. IBM keeps an extensive suite of tests that they run on every new version of OS/2 and that they use to verify that a fix to one problem has not created a problem somewhere else. These tests can verify that the most likely logic paths will work correctly, but they cannot verify that every logic path is okay because it is not possible to identify every logic path and, even if it were possible, the resources necessary to test them all would far exceed what is available. Thus there will always be bugs left in the code, and some of these bugs may eventually cause a problem with one or more users' Desktops.

3. IBM has chosen to downplay the probability of having Desktop problems. Although the IBM manuals do describe how to make a Backup of the Desktop, it is placed way at the end of one of the manuals and is not stressed as being as important as I believe it is.

4. The response of the IBM support mechanisms to user Desktop problems often goes directly to the most extreme and difficult solutions. Very frequently, when a user contacts IBM for technical support because of a Desktop problem, three things occur. First, very little, if any, effort is made to figure out the actual cause of the problem, and a generic reason, such as INI File Corruption, is given instead. Second, the proposed solution is often to start over, which is easy for the technical support person and a lot of work for the user. Finally, the user is given no direction for preventing the problem in the future. The net effect is threefold: users are left with the vague feeling that they, not OS/2, PM or WPS, are the reason the Desktop is corrupted; they face a huge amount of work to rebuild their Desktop from scratch; and nothing has been done to prevent them from being in the exact same position again.

5. The Desktop can be corrupted by any application that runs in the OS/2 environment or any number of user actions. The parts of OS/2 that make up the Desktop are not protected from being changed by any application running under OS/2 or user modification. Once again, there are reasons for this. Providing the protection would significantly increase the size and complexity of OS/2 and would have a negative effect on performance. In addition, the added complexity would make maintenance of OS/2 that much more work. One could probably argue both sides of the issue, but it makes little difference as far as today's user is concerned. The fact is that everything that makes up the Desktop can be changed by any application in the system and, in my opinion, it is a testimony to the quality of OS/2 software that third party applications and user mistakes do not cause more problems than they do.
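
As a rough illustration of reason 1, the Python sketch below counts how many distinct orderings are possible when several threads each run a few sequential steps, and then walks through every ordering of two threads that each read a shared counter and write it back incremented, with no semaphore. The function names and the two-step "read, then write" model are assumptions made purely for illustration; nothing here is taken from PM or WPS.

```python
from itertools import permutations
from math import factorial


def interleaving_count(threads, steps):
    """Distinct orders in which `threads` threads of `steps` sequential steps can run."""
    # Multinomial coefficient: (threads * steps)! / (steps!) ** threads
    return factorial(threads * steps) // factorial(steps) ** threads


def unlocked_increment_outcomes():
    """Try every schedule of two threads (A and B) that each read, then write, a counter.

    Returns a dict mapping the final counter value to the number of schedules
    that produced it. The correct final value would be 2.
    """
    outcomes = {}
    for schedule in set(permutations("AABB")):
        shared = 0
        local = {}
        for thread in schedule:
            if thread not in local:
                local[thread] = shared       # first step of this thread: read
            else:
                shared = local[thread] + 1   # second step of this thread: write
        outcomes[shared] = outcomes.get(shared, 0) + 1
    return outcomes


if __name__ == "__main__":
    print(interleaving_count(2, 2))
    print(interleaving_count(4, 5))
    print(unlocked_increment_outcomes())
```

Even in this toy model, two threads of two steps each already allow 6 orderings, and only the 2 in which one thread finishes before the other starts give the right answer. Four threads of five steps each allow more than eleven billion orderings, which gives some feel for why exhaustive testing (reason 2) is out of reach.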

There are probably many other things that contribute to the problem and some of them are being addressed by IBM. For example, WPS-aware applications must currently run as part of the single WPS process, requiring close integration between all of these applications and WPS. This problem will be eased somewhat when WPS is modified to use DSOM rather than SOM.


 * What can I do about this Desktop Problem?

One of the most interesting aspects of this entire problem is that it is extremely easy to address. In the process of developing IniMaint, SysMaint, MultiMaint and AccessWPS I have clobbered the Desktop, the INI files, the Extended Attributes and other parts of my operating environment in more ways than almost anyone else who has worked with OS/2. I have never had to reinstall OS/2, never had to run MAKEINI to rebuild corrupted INI files, never had to use Alt-F1 to return to the Install Desktop and never had to rebuild my Desktop manually. The reason is that I have always had a Backup of my Desktop and a plan for how I would use that Backup to Restore my operating environment in case of a problem. It is as simple as that.

There are a number of applications that will make a complete Backup of the Desktop in less than a minute. These same applications can completely Restore the Desktop in about a minute, as long as PM is not running. The longest part of doing a Restore is creating an environment where OS/2 is running, but PM is not running. There are two ways that I know of to do this. The first is to boot from a floppy. The second is to interrupt the boot process before PM is started with something like ShiftRun. Once this environment is reached, the Desktop Restore is very quick. (NOTE FROM R.KUT: Warp users have the ability to hit the Alt-F1 keys when the OS/2 box appears in the upper left corner at boot time. This will bring up a menu of various choices. To get OS/2 running without PM running, as mentioned above, select option C from the menu. This will bring you to a command prompt, from which you can do whatever is required. When you are finished, type EXIT, and the system will flush all its caches and reboot. This process will replace the above two suggestions.)

If you do not have both a Backup of your Desktop and a clear plan for how you will use the Backup to Restore your environment, then you should place this at the very top of your priority list and get it done immediately. Both the Backup and the plan are critical. The time to develop a plan for how you are going to recover is not while you are staring at a black screen or a white screen with the clock sitting in the middle. If either of these happens to you, you will be upset and angry and will probably have difficulty thinking things through. Therefore, you need to make a Backup and then imagine yourself sitting in front of that blank screen and think through the exact steps you would take to recover. Here is my sequence:

1. I use ShiftRun, so if the boot got that far, I simply reboot. If the boot did not get that far, I boot from the Install Diskettes. If you are running OS/2 2.1 with the SP, be sure you have diskettes that you can boot from: the original diskettes will not work because of version problems, and the SP diskettes will not work because they will run the Service Program and the Shell. (NOTE FROM R. KUT: Warp users should do a right-click on the Desktop and select SYSTEM SETUP. In there you will find the icon CREATE UTILITY DISKETTES. You will need three HD floppies for this process. Keep the floppies in a safe place, and update them after applying any Corrective Service Diskettes (CSD). This step might seem unnecessary right now, but once you have crashed you will be glad that you stopped to do it.)

2. I then run the Restore CMD file created by my Backup and Restore application, SysMaint, specifying the latest Backup generation.

3. I then reboot OS/2 and I am back operating normally.

Total elapsed time, less than 5 minutes. Rate of failure, zero.

Even though I have never had a failure to Restore my Desktop, I am prepared for one, since my Backup application will automatically maintain as many generations of Backup as I ask it to and I have asked for 20 generations. This means I could go back one generation at a time to older and older Desktops.

So what should you do if you do not have a Backup of your Desktop?

1. Get a Desktop Backup and Restore application immediately.

2. Make a Backup of your current Desktop and put it somewhere where it will not get clobbered.

3. Sit down and think through exactly what you will do to Restore your Desktop if you have a problem and be sure it will work. In other words, if you are going to boot from a floppy, then get out the floppies and boot from them to ensure you can do it. If you are running ShiftRun, then interrupt the boot and make sure you know exactly what you would do from there.


 * When should I make a New Backup of my Desktop?

My answer to this question is that I make a new Backup anytime I make any changes to the position, content or settings of any Object on the Desktop. The Backup takes less than a minute and I always have at least 10 generations of Backup so there is no reason to expose myself to having to manually recreate things.

The key here is having multiple generations of Backup. If your Backup application does not have a facility for maintaining multiple generations then you must ensure, at the very minimum, that a new Backup will not clobber the current one, so you need to create your own mechanism for protecting the current Desktop Backup. If the process of maintaining multiple generations is a complex or time-consuming one, then you might decide to not make a Backup as often as I do.
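
For readers who need to build their own generation mechanism, here is a minimal Python sketch of the idea: each new Backup goes into a fresh numbered directory, so it can never clobber an earlier one, and only the oldest generations beyond a chosen limit are removed. The function name, the "gen_" directory naming and the `keep` parameter are all hypothetical, and the sketch assumes the Backup already exists as a directory of ordinary files. It does not attempt to decide which files make up a Desktop Backup, and a plain file copy like this may not preserve OS/2 Extended Attributes, so treat it as an illustration of the numbering scheme only.

```python
import shutil
from pathlib import Path

PREFIX = "gen_"  # hypothetical naming scheme for generation directories


def backup_generation(source, backup_root, keep=20):
    """Copy `source` into a new numbered generation and prune old generations.

    Returns the path of the generation just created.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    source = Path(source)
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)

    # Collect existing generations as (number, path), oldest first.
    generations = sorted(
        ((int(p.name[len(PREFIX):]), p)
         for p in backup_root.glob(PREFIX + "*")
         if p.name[len(PREFIX):].isdigit()),
        key=lambda item: item[0],
    )
    next_number = generations[-1][0] + 1 if generations else 1
    target = backup_root / f"{PREFIX}{next_number:04d}"

    # copytree raises an error if the target already exists,
    # so an existing generation is never overwritten.
    shutil.copytree(source, target)
    generations.append((next_number, target))

    # Remove everything except the newest `keep` generations.
    for _, old_path in generations[:-keep]:
        shutil.rmtree(old_path)
    return target
```

Because a generation is only deleted after the new copy has completed, a failed Backup leaves all of the earlier generations in place.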