Databook for OS/2 – Chapter 2 – Inside OS/2 Warp

Overview
IBM’s newest version of Warp, code-named Merlin and officially called Warp 4, has many important new features. Among the more important and interesting of these are integrated voice recognition, a help feature called WarpGuide, and a usability feature called the Warp Center. Warp 4 retains its predecessor’s Crash Protection(TM), which prevents a crashed program from causing the entire system to crash. It also keeps its powerful multitasking capabilities and the ability to run all of your current DOS and Windows 3.1 applications.

The workings of some of the major features of OS/2 Warp are discussed in a fair amount of detail in this chapter. All versions of Warp covered here (Warp 3, Warp Connect, Warp 4, and Warp Server) work in exactly the same way at this level because they share the same kernel.

The specific functions covered in this chapter are multitasking, the OS/2 ring protection mechanism, memory protection, and Crash Protection. The chapter explains what multitasking is and how it can benefit you, how the priority-based, preemptive multitasking of OS/2 works, and why it is important. It also covers memory protection and the ring protection mechanism of the 80x86 and Pentium processors, and how OS/2 uses them to provide crash protection.

The OS/2 Kernel
The kernel of any operating system provides the core functions for that operating system. It is the part of the operating system that performs basic functions such as allocating hardware resources like memory and CPU time. The OS/2 kernel functions reside in the OS2KRNL executable file which is loaded during boot. Note that the file name has no extension.

During the boot process, the kernel is responsible for loading and processing the CONFIG.SYS file. The OS/2 kernel also performs the following basic functions:
 * Memory management. The kernel allocates and deallocates memory and assigns physical memory locations based upon requests, either implicit or explicit, from application programs. In cooperation with the CPU, the kernel also manages access to memory to ensure that programs only access those regions of memory which have been assigned to them. Part of memory management includes managing the SWAPPER.DAT file and the movement of memory pages between RAM and the swapper file on the hard drive.
 * Task management. The OS/2 kernel manages the execution of all tasks running on the system. The scheduler portion of the kernel allocates CPU time to each running process based on its priority and whether it is capable of running. A task which is blocked – perhaps it is waiting for data to be delivered from the disk, or for input from the keyboard – does not receive CPU time. The OS/2 kernel will also preempt a lower priority task when a task with a higher priority becomes unblocked and capable of running.
 * Interprocess communication. Interprocess communication (IPC) is vital to any multitasking operating system. Many tasks must be synchronized or communicate with each other to ensure that their work is properly coordinated. The kernel manages a number of IPC methods. Shared memory is used when two tasks need to pass data between them. The OS/2 clipboard is a good example of shared memory. Data which is cut or copied to the clipboard is stored in shared memory. When the stored data is pasted into another application, that application looks for the data in the clipboard’s shared memory area.
 * Named pipes can be used to communicate data between two programs. Data can be pushed into the pipe by one program and the other program can pull the data out of the other end of the pipe. A program may collect data very quickly and push it into the pipe. Another program may take the data out of the other end of the pipe and either display it on the screen or store it to the disk, but it can handle the data at its own rate.
 * Semaphores can be used to coordinate the activity of two programs or two separate threads within a single program. When one task sets the semaphore, for example, the other task cannot proceed until the first has reset the semaphore.
 * Device management. The kernel manages access to the physical hardware through the use of device drivers. Access to physical devices must be managed carefully or more than one application might attempt to control the same device at the same time. The OS/2 kernel manages this so that only one program actually has control of or access to a device at any given moment. One example of this is a COM port. Only one program can communicate through a COM port at any given time. If you are using the COM port to get your e-mail from the Internet, for example, and try to start another program which attempts to use the same COM port, such as HyperAccess Lite, the OS/2 kernel detects that the COM port is already in use. The kernel then uses the hardware error handler (HARDERR.EXE) to display a message on the screen that the COM port is in use.
 * I/O Management. The kernel is also responsible for managing I/O devices. This includes parallel and serial port I/O, and file system I/O. The kernel does not actually handle physical access to the disk, but rather manages the requests for disk I/O submitted by the various running programs. It passes these requests on to the file system, whether it be FAT, HPFS, CDFS (CD-ROM file system), or NFS (Network file system), and manages the transfer of data between the file system and the requesting programs.
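
As a rough illustration of the pipe and semaphore ideas in the list above, the following Python sketch uses the standard queue and threading modules as stand-ins. It does not call the OS/2 named pipe or semaphore APIs, and the names producer, consumer, and run_demo are invented for this example. One thread pushes data into the "pipe" as fast as it can, the other pulls it out at its own rate, and a semaphore holds the main thread until the producer has finished.

```python
import queue
import threading


def producer(pipe, items, done):
    """Push items into the pipe as fast as possible, then signal completion."""
    for item in items:
        pipe.put(item)
    done.release()  # let the task waiting on the semaphore proceed


def consumer(pipe, count, out):
    """Pull items out of the other end of the pipe at the consumer's own pace."""
    for _ in range(count):
        out.append(pipe.get())


def run_demo(items):
    pipe = queue.Queue()           # stand-in for a named pipe (an assumption)
    done = threading.Semaphore(0)  # starts "set": acquire() blocks until released
    out = []
    t_prod = threading.Thread(target=producer, args=(pipe, items, done))
    t_cons = threading.Thread(target=consumer, args=(pipe, len(items), out))
    t_cons.start()
    t_prod.start()
    done.acquire()                 # cannot proceed until the producer resets it
    t_prod.join()
    t_cons.join()
    return out
```

Because the queue is first-in, first-out and there is a single producer and a single consumer, the data comes out of the pipe in the order it went in.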

Much of the code for actual implementation of these kernel level functions resides in dynamic link libraries such as DOSCALL1.DLL. The command processor, CMD.EXE, is also part of the kernel. Some basic command line commands are included in the kernel as part of the file CMD.EXE. These are called internal commands because they are a part of the kernel. The COPY and DEL commands are examples of internal commands.

Kernel Variations
Over the years, as new versions of OS/2 have appeared, the kernel of the operating system has changed. Most of these changes have been relatively minor to incorporate support for new features or functions.

The primary changes in the OS/2 kernel have been for Symmetric Multi Processing (SMP). All versions of Warp including Warp 3, Warp Connect, and Warp 4, have a uniprocessor kernel. Warp Server has a uniprocessor kernel. Warp Server SMP has an SMP kernel. Warp Server for e-business is shipped with both the uniprocessor and the SMP kernel. When installing Warp Server for e-business, you can select whether to install the uniprocessor or the SMP kernel.

Recommendation: When installing Warp Server for e-business, it is a valid choice to install the SMP kernel on a uniprocessor computer. However, because the uniprocessor kernel is somewhat faster on a uniprocessor system than the SMP kernel is, I recommend installing the uniprocessor kernel on a uniprocessor system.

Introduction to Multitasking
At any given moment in time, a personal computer microprocessor can execute only one instruction in one task. However, by dividing the microprocessor time between multiple tasks, it is possible to give the impression to relatively slow human senses that they are running simultaneously. This technique is called multitasking.

DOS is intrinsically a single tasking operating system. This means that it can only perform one task, or job, at a time. If you wish to run an application program other than one which is currently running, you must terminate the current application and then start the second application. This limitation, as much as the limitation on memory usage, is due to the historical roots of DOS. The original Intel 8088 microprocessor did not have the hardware support required to effectively implement a multitasking, virtual memory, or memory isolated operating system.

OS/2 utilizes the built-in multitasking capabilities of the Intel 80x86 family of microprocessors. These capabilities include the ability to address large amounts of real memory (up to 4 GB) and the ability to protect the system resources being used by one program from intrusion by another program. For this reason, the operating mode of the 80x86 microprocessors which provides for multitasking is known as protect mode.

The multitasking ability of OS/2 allows the system to perform more than one task at a time. You may have several different programs running at once, as well as multiple communications sessions. It is also possible to have one program performing several tasks simultaneously. The basic unit of program execution is the thread. Each program running under OS/2 must have at least one thread, but may have more than one if it has been so designed.

Processes and Threads
The minimal benefit of being able to switch from one program to another without terminating the first is something which everyone can appreciate. The real benefits of true multitasking become apparent when downloading several large files from a BBS, using FTP to transfer a file to the internet, spell checking a large document, and printing checks for accounts payable. The downloads, spell check, and check printing can be started and then you can work interactively with the word processor while the other tasks proceed in the background.

The benefits derived from using multiple threads in a single program are not so apparent, however. From the standpoint of the user, the primary benefit is that of faster perceived program response time. The programmer may provide one thread to allow for data input, another thread for updating the screen, and yet a third for performing calculations. In a spreadsheet environment, for example, as soon as a user has entered a number into a cell, the display can be updated with newly calculated values and while the user is entering additional data, the background thread can recalculate the rest of the spreadsheet. This results in a much faster perceived speed of operation because the user does not have to wait for the entire spreadsheet to recalculate before additional data is entered.

Each time a program is launched, Warp creates a new process. Every process (program) has at least one thread. If the programmer of an OS/2 program decides that multiple threads are required, he or she can use special function calls within OS/2 to create them. In general, programs have only one thread unless the programmer has overtly designed the program for more.
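
The spreadsheet scenario described above can be sketched in a few lines of Python. This is only an illustration of the idea of a background calculation thread, using Python's standard threading module rather than the OS/2 thread calls; the function names recalculate and enter_values are hypothetical.

```python
import threading


def recalculate(cells, result):
    """Background work: recompute a derived value from a snapshot of the cells."""
    result["total"] = sum(cells)


def enter_values(values):
    """Accept input on the calling thread while a worker thread recalculates."""
    cells = []
    result = {}
    worker = None
    for value in values:
        cells.append(value)        # data entry continues on this thread
        if worker is not None:
            worker.join()          # let the previous recalculation finish first
        worker = threading.Thread(target=recalculate, args=(list(cells), result))
        worker.start()             # the recalculation runs in the background
    if worker is not None:
        worker.join()
    return result.get("total", 0)
```

Waiting for the previous recalculation before starting the next one keeps the sketch deterministic. A real program would more likely cancel or coalesce recalculations that are no longer needed.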

Task Management
OS/2 multitasking is managed by the task dispatcher. The dispatcher handles the dispatching of tasks with a set of algorithms which are designed to allow multiple tasks to run simultaneously with as little effect upon each other as possible, while still allowing high priority tasks swift access to the processor when needed. OS/2 is a dynamic, priority based, preemptive multitasking system with round-robin scheduling.

In OS/2 Warp, the dispatcher dynamically modifies a thread’s time slice based on trigger actions that may have occurred. For instance, if a thread encounters a page fault during its time slice, it is given an extra time slice to process the data contained in the faulted page. Applications doing disk I/O are also given extra time slices if the data they are reading is in the disk cache. When the TIMESLICE= parameter is used as in versions 1.x and 2.x of OS/2, none of these dynamic modifications of a thread’s allocated time will occur. Instead, each thread will be given the minimum time slice of X, and its time slice will not be allowed to go beyond value Y. See also the TIMESLICE document for more details about this CONFIG.SYS file parameter.

Priority Based Multitasking
A priority based multitasking operating system is one in which each task is assigned a priority. The priority of each task is determined by the programmer. If the programmer makes no overt choice, the operating system, OS/2 in this case, assigns a default priority, regular class, priority level 0.

In any priority based task scheduling operating system, the task with the highest priority which is capable of running, that is, not blocked, will get the CPU. All other tasks must wait until that task becomes blocked before they may have CPU time. A task becomes blocked when it must wait for input from a keyboard or mouse, for example, or for data to be read from a disk, or for a communications buffer to fill. There are many other reasons why a task might become blocked. If a task of a higher priority than the current running task becomes unblocked, then OS/2 will preempt the running task to give the CPU to the task with the higher priority.
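
The selection rule described above can be sketched as follows. This is a minimal model and not the OS/2 dispatcher itself: the task records, the priority numbers, and the function names pick_next and should_preempt are assumptions made for the example.

```python
def pick_next(tasks):
    """Return the name of the highest-priority task that is not blocked, or None."""
    ready = [t for t in tasks if not t["blocked"]]
    if not ready:
        return None
    return max(ready, key=lambda t: t["priority"])["name"]


def should_preempt(running, unblocked):
    """A newly unblocked task preempts only if its priority is strictly higher."""
    return unblocked["priority"] > running["priority"]
```

For example, with a blocked communications task at priority 3, a word processor at priority 2, and a calculation at priority 1, pick_next chooses the word processor. As soon as the communications task becomes unblocked, should_preempt reports that it takes the CPU from the word processor.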

OS/2 has four priority classes:
 * 1) Time critical (800) A time critical task such as communications or networking will run in this priority class. Such programs need to have access to the CPU very quickly.

In communications, for example, when the data receive buffer fills up it is necessary to remove the data from it and store it in a less critical location until it can be processed by a lower priority task. When the receive buffer fills, the communications port hardware generates a hardware interrupt which is handled by the OS/2 device driver for that port. For an asynchronous communications port this is the COM.SYS device driver. The device driver sends a software interrupt to the application program which moves the data from the receive buffer and stores it in another location in memory.

The portion of the program which handles the communications buffer must be a very high priority so that it can get the data from the buffer before more data arrives and causes a receive buffer overrun. The rest of the communications program can run at a lower priority class. When overruns occur, the data must be resent which reduces the total data transfer rate.

Time critical tasks take a very small portion of the CPU’s time. Even after all time critical tasks receive all the CPU time they require, there is still plenty of time left for all the other tasks running on the computer.

 * 2) Server (400) Server class programs are used by Warp Server, for example, to process requests for data by client workstations and to get the data requested ready to transfer across the network. Processing these types of tasks at the Server class ensures that administrative tasks such as creating new user IDs do not interfere with them.
 * 3) Regular/normal (200) Regular, or normal class is the priority class at which most application programs run. This is also the default class at which OS/2 will run programs which do not overtly specify a priority class.
 * 4) Idle time (100) A program which runs at idle class will only receive CPU time when all other programs at higher priorities have become blocked. In reality, for most average systems, this can be quite a large amount of time. To see this, start the OS/2 Pulse program applet which comes with OS/2. It is located in the Productivity folder. Most of the time the processor is doing very little, if anything, so there is lots of time for idle class programs to run.

The Pulse program itself runs at idle priority. It simply counts the CPU cycles which it receives as a consequence of the fact that no other process needs them, and displays the result as a graph indicating the amount of CPU time being used by all the other programs on the system. To perform this task accurately, it cannot run at regular class or higher because it would always receive some CPU cycles in the normal round-robin sequence of task dispatching. This would skew the results.

A good example of a productive application which should be run at idle class would be a math intensive calculation. There is an OS/2 program which calculates the common math constants Pi and e (Euler’s number) to any number of decimal places in increments of 500 places. Suppose I want to calculate Pi to 1,000,000 decimal places. At the same time, I want to do all my normal daily tasks like word processing, Lotus Notes, network communications, and asynchronous communications.

Since this calculation takes a couple of days on even a fast system, I cannot afford to have it tie up my system. This program was written so that I can change the priority class, so I set it at idle class and start the calculation. Actually, this program was written so intelligently that I can change the priority class while it is calculating. With the Pi calculation going in the background in idle class, I can go do my regular daily work. Any CPU cycles not needed for my normal work are given by Warp to the Pi calculation.

Figure 1: The 4 priority classes of Warp each have 32 levels.

32 Priority levels per class
Figure 1, above, shows the four priority classes of OS/2 Warp. The boxes to the right of the priority class diagram represent threads of several processes which are running. Threads on the same line are at the same priority class and level. Threads shown with red rectangles are blocked; that is, they cannot run because they are waiting for some system activity, such as disk access to complete, or the printer to request more data. The threads indicated by yellow rectangles are capable of running, but are not currently running because another thread is. The one thread represented by a green rectangle is running. Only one thread can run at a time.

Each priority class under OS/2 has 32 priority levels numbered from 0 through 31. There are a total of 128 discrete priority levels at which tasks can run. This means that multiple tasks may run in regular class, for example, but one or more may still be at a higher priority level within the regular class. The numeric values for each priority class are given in Table 2.1.1.

Using PSTAT to determine the priority of a task
Warp has a utility called PSTAT which allows you to display or print a list of all the tasks and threads running on your system. PSTAT shows the process and thread IDs for each task as well as the priority level for each task. Figure 2 shows the output of PSTAT when run on the author’s system. The command below, issued from the command prompt, sends the output of the command to a file named C:\PSTAT.DAT. The /C parameter is used to display the current process and thread data. Other parameters allow display of memory, DLL, and semaphore information.

    PSTAT /C > C:\PSTAT.DAT

Process and Thread Information

         Parent
Process  Process  Session  Process                           Thread
ID       ID       ID       Name                              ID      Priority  Block ID  State

002F     0000     13       C:\IBMLAN\SERVICES\ALERTER.EXE    01      0200      FFFE004C  Block
002E     0000     13       C:\IBMLAN\SERVICES\TIMESRC.EXE    01      0200      FFFE004A  Block
002D     0000     13       C:\IBMLAN\NETPROG\RNS1.EXE        01      0200      FFFE0048  Block
                                                             02      0200      06880096  Block
                                                             03      0200      068800C2  Block
                                                             04      0200      068800EE  Block
002B     0000     13       C:\IBMLAN\SERVICES\NETLOGON.EXE   01      0200      FFFE0046  Block
                                                             02      0200      FFFD0072  Block
                                                             03      031F      FFFE0045  Block
                                                             04      0200      040054FE  Block
002A     0000     13       C:\IBMLAN\SERVICES\NETSERVR.EXE   01      0300      FFFE003E  Block
                                                             02      0300      FFFE003C  Block
                                                             03      0300      FDF2BB44  Block
                                                             04      0200      FDF254B8  Block
                                                             05      0300      FFFE003F  Block
0029     0000     13       C:\IBMLAN\SERVICES\NETSERVR.EXE   01      0301      FFF78083  Block
                                                             02      0300      FFFE003D  Block
                                                             03      0300      040054E6  Block
                                                             04      0300      FFFE0040  Block
                                                             05      0300      04004853  Block
                                                             06      0301      FFFE0041  Block
0026     0000     13       C:\MUGLIB\MUGLRQST.EXE            01      0200      040054C8  Block
                                                             02      0200      FE1ACE34  Block
0025     0000     13       C:\IBMLAN\SERVICES\MSRV.EXE       01      0200      040054BC  Block
                                                             02      0200      FFFE0037  Block
0023     0000     13       C:\IBMLAN\SERVICES\WKSTAHLP.EXE   01      031E      2260BE6C  Block
                                                             02      0200      225005FE  Block
                                                             03      031E      2250210E  Block
                                                             04      031E      2250061C  Block
0022     0000     13       C:\IBMLAN\SERVICES\WKSTA.EXE      01      0300      FFFE003A  Block
                                                             02      021F      FFFD005B  Block
                                                             03      0200      FFFE0033  Block
                                                             04      021F      FFFD0060  Block
0018     0000     00       C:\OS2\EPWMUX.EXE                 01      0200      FFFE0024  Block
0013     0000     00       C:\OS2\EPWMUX.EXE                 01      0200      FFFE001F  Block
0012     0000     00       C:\OS2\EPWPSI.EXE                 01      0200      06880012  Block
0011     0000     00       C:\OS2\EPWMP.EXE                  01      0200      FE03E834  Block
                                                             02      0200      FFFE0025  Block
                                                             03      0200      FFF78083  Block
000E     0000     00       C:\WAL\MEMMANIT.EXE               01      031F      FFFE0016  Block
000C     0000     00       C:\IBMLAN\NETPROG\ATKINIT.EXE     01      0302      FFFE0013  Block
                                                             02      0200      FFFE0014  Block
000B     0000     00       C:\IBMCOM\VLANINIT.EXE            01      0200      FFFE0017  Block
000A     0000     00       C:\OS2\EPWROUT.EXE                01      0200      FFFE0020  Block
                                                             02      0200      FFFE001A  Block
0009     0000     00       C:\OS2\SYSTEM\LOGDAEM.EXE         01      021F      04A00252  Block
0008     0000     00       C:\IBMLAN\NETPROG\LSDAEMON.EXE    01      0200      FFF78083  Block
0007     0000     00       C:\IBMCOM\PROTOCOL\LANDLL.EXE     01      030B      544F4B52  Block
0005     0000     00       C:\MPTN\BIN\CNTRL.EXE             01      0304      F2750002  Block
                                                             02      0304      F2750001  Block
                                                             03      0304      1450525A  Block
                                                             04      0304      14505272  Block
                                                             05      0304      F2750001  Block
                                                             06      0304      F2750002  Block
                                                             07      0304      FFF78083  Block
                                                             08      0304      F2750008  Block
0004     0000     00       C:\IBMCOM\LANMSGEX.EXE            01      0200      FFFEF636  Block
000D     0001     01       C:\OS2\PMSHELL.EXE                01      0200      FDF20F24  Block
                                                             02      0300      FFCA000D  Block
                                                             03      0300      FFFD001C  Block
                                                             04      0300      FFFD001D  Block
                                                             05      0300      04000E28  Block
                                                             06      0200      FE179B6C  Block
                                                             07      0200      FE17BBF0  Block
                                                             08      041F      FDF20F54  Block
                                                             09      0200      FFFE001B  Block
                                                             0A      0200      FFFD0006  Block
                                                             0B      0300      FFFE001D  Block
                                                             0C      0300      FFFE001E  Block
                                                             0D      0300      FFFE001C  Block
                                                             0E      0304      FFFD002A  Block
                                                             0F      0304      04000E1A  Block
                                                             10      0200      FFFD002C  Block
                                                             11      0301      FDF25078  Block
                                                             12      0200      FDEDAD40  Block
                                                             13      0300      04000E36  Block
                                                             14      0300      FE03DB74  Block
                                                             15      0300      FDF254F8  Block
                                                             16      0401      FE1E037C  Block
                                                             17      0401      FE1E0DA8  Block
00A0     000D     15       F:\IMPOS20\IMPOS.EXE              01      0200      FE1F16EC  Block
                                                             02      0200      0688011A  Block
0093     000D     1E       E:\DESCRIBE\OS2\DESCRIBE.EXE      01      021F      FE1E6E38  Block
                                                             02      0200      FE1E6BC8  Block
                                                             03      0200      FE1E6BF4  Block
                                                             04      021F      FE1E1980  Block
                                                             05      0200      FE1E200C  Block
                                                             06      0200      FE1E2A30  Block
008E     000D     1D       (kernel)                          01      0200      FE1E08D4  Block
0055     000D     1C       E:\UTILITY\FILESTAR\FILESTAR.EXE  01      0200      FDEF5C30  Block
                                                             03      0200      FDEF5990  Block
                                                             04      0200      FDF0FACC  Block
004B     000D     18       F:\NOTES\ILNOTES.EXE              01      0201      FE01ADF0  Block
004A     000D     1B       F:\NOTES\QNC.EXE                  01      0200      FDFE7F68  Block
0048     000D     1A       E:\IBMAV\IBMAVTIM.EXE             01      0200      FFF78083  Block
0047     000D     11       E:\WPC200\WPC200.EXE              01      0200      FE1C2BEC  Block
                                                             02      0200      FE1C29D0  Block
                                                             03      0200      FE1C2948  Block
                                                             04      0200      FE1C292C  Block
0034     000D     13       C:\OS2\CMD.EXE                    01      0200      FFCA0034  Block
00AA     0034     13       C:\OS2\PSTAT.EXE                  01      031F      00000000  Running
0032     000D     04       F:\NOTES\QNC.EXE                  01      0200      FE18C7B0  Block
0033     0032     04       F:\NOTES\ISERVER.EXE              01      0201      FFF78083  Block
                                                             02      0206      FFF78083  Block
                                                             03      0207      040054DA  Block
                                                             04      0207      040054D4  Block
                                                             05      0201      FFF78083  Block
                                                             06      0205      FE1B71F0  Block
0045     0033     04       F:\NOTES\QNC.EXE                  01      0206      FE1C4D88  Block
0046     0045     04       F:\NOTES\IADMINP.EXE              01      0201      FFF78083  Block
0041     0033     04       F:\NOTES\QNC.EXE                  01      0206      FDEFDC70  Block
0042     0041     04       F:\NOTES\IAMGR.EXE                01      0201      FFF78083  Block
0043     0042     04       F:\NOTES\QNC.EXE                  01      0201      FDEFDB64  Block
0044     0043     04       F:\NOTES\IAMGR.EXE                01      0201      FE136AAC  Block
003C     0033     04       F:\NOTES\QNC.EXE                  01      0206      FDFF3F70  Block
003D     003C     04       F:\NOTES\IUPDATE.EXE              01      0201      FFF78083  Block
003A     0033     04       F:\NOTES\QNC.EXE                  01      0206      FE1B0D58  Block
003B     003A     04       F:\NOTES\IROUTER.EXE              01      0201      FFF78083  Block
0038     0033     04       F:\NOTES\QNC.EXE                  01      0206      FE1C2864  Block
0039     0038     04       F:\NOTES\IREPLICA.EXE             01      0201      FFF78083  Block
001F     000D     19       E:\IBMAV\IBMAVSD.EXE              01      0200      FE040FCC  Block
001C     000D     17       E:\UTILITY\GTU30\SENTRY.EXE       01      0200      FDF1B27C  Block
                                                             02      0200      FFF78083  Block
                                                             03      0200      FFF78083  Block
                                                             04      0200      FFF78083  Block
                                                             05      0200      FFF78083  Block
001B     000D     16       E:\FAXWORKS\FAXWORKS.EXE          01      0200      FE10AECC  Block
                                                             02      0300      FDEDA968  Block
                                                             03      0200      0688006A  Block
                                                             04      0200      FFFE002F  Block
                                                             05      031F      00BC09E4  Block
0019     000D     14       C:\IBMLAN\NETPROG\NETMSG.EXE      01      0200      FDFF1CC0  Block
                                                             02      0200      FFFE0039  Block
                                                             03      0200      FE126BCC  Block
                                                             04      0200      FDFF1EE8  Block
0016     000D     12       C:\OS2\PMSHELL.EXE                01      0200      FE18CA20  Block
                                                             02      0300      FDEDA614  Block
                                                             03      030A      FDEDA5FC  Block
                                                             04      0300      FE18C7B4  Block
                                                             05      0300      FE17CE68  Block
                                                             06      0200      FDEDAD58  Block
                                                             07      0400      FE141FD8  Block
                                                             08      0200      FE122D34  Block
                                                             09      0200      FE122C30  Block
                                                             0B      0400      FE18C050  Block
                                                             0C      021F      43D82800  Ready
                                                             0D      0200      FFFE0026  Block
                                                             0E      0400      FE11FF54  Block
0014     000D     10       C:\OS2\PMSPOOL.EXE                01      0200      FFF78083  Block
                                                             02      0200      FE149BC4  Block
                                                             03      0200      4388000E  Block
                                                             04      0200      FE18DFF0  Block
                                                             05      0200      FE150970  Block
                                                             06      0200      FE1509A8  Block
000F     000D     00       C:\OS2\SYSTEM\HARDERR.EXE         01      0300      04000E44  Block
                                                             02      0300      04001158  Block
                                                             03      0300      0400117C  Block

Figure 2: The results of PSTAT on the author’s system.
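
When reading the Priority column in the listing above, it can help to split each value into its two parts. The sketch below assumes that the upper two hexadecimal digits identify the priority class and the lower two give the level within that class (0 through 31). This reading is an interpretation of the output that agrees with the base values of 200 and 400 quoted in this chapter, not documented PSTAT behavior, and the function names are hypothetical.

```python
LEVELS_PER_CLASS = 32  # levels 0 through 31 in each class, as stated in the text


def split_priority(value):
    """Split a PSTAT priority string such as '031F' into (class digits, level)."""
    number = int(value, 16)
    return number >> 8, number & 0xFF


def total_levels(class_count=4):
    """Total number of discrete priority levels across all classes."""
    return class_count * LEVELS_PER_CLASS
```

Under that assumption, split_priority("031F") yields class digits 3 at level 31, the priority shown for PSTAT.EXE itself, and total_levels() gives the 128 discrete levels mentioned earlier.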

Altering Priority Levels
OS/2 does not provide for user access to priority levels. A programmer, however, may provide access to priority levels within a program and provide the user with controls in the program to manipulate them. There are also some third party utilities available which allow the user to view and alter the priority levels of other running programs. One such utility is CPU Monitor Plus, by BonAmi Software.

Extreme care should be taken when altering the priority level of a process or thread, as the action taken may have the opposite of the desired effect. Consider, for example, the print spooler process. This process manages the transfer of print data from application programs, to the hard drive, and from there to the printer.

When I ask most people what priority they think that the spooler process should run at, they tell me it should be a very low priority process. In fact, it should be a high priority process. The reason for this is that printing, being an essentially mechanical process, is very slow compared to the speed of the processors in today’s computers. As a result, when the spooler sends data to the printer, it must then wait for a computer’s eternity for the printer to print the data and request more from the spooler. When this request for more data is received by the spooler, it is important to respond and get the data to the printer as quickly as possible to prevent delays to an already slow process.

The spooler is blocked most of the time, either waiting for a print job or waiting for the printer to request more data for a running print job. While blocked, it gets no CPU time, and all five spooler threads have priority level 200, the base level of the Regular priority class. When a print job is spooling, however, one of the threads is boosted to priority level 400, the base level of Server (a.k.a. Fixed High) class. This provides a high enough priority to ensure that printing gets done quickly but, because the spooler is blocked most of the time waiting for the printer to request more data, it does not affect the performance of the rest of the system.

The Spooler object in the System Setup folder allows users to adjust the priority of the print spooler. You should be careful with this, however. Decreasing the priority will slow the print process significantly. Increasing the priority will have no noticeable effects while small print jobs are printing because they complete so quickly, but can cause the rest of the system to slow considerably while very large print jobs are being printed. Another problem with increasing the priority too much is that the spooler priority becomes higher than that of other important tasks such as network functions. This can cause problems not only with the response time on the local workstation, but it can also create problems for networking tasks.

The default print spooler priority setting is perfect for most environments.

Preemptive Multitasking
In any multitasking operating system, there are some tasks which are extremely important and which must obtain access to the processor as quickly as possible at specific times. Such a task would be a communications or networking program. These tasks, or really a small portion of these tasks, the portion responsible for the time critical part of the process, should be run at a high priority. For communications programs this is usually at some high level in the time critical class.

When a program with a higher priority than the currently running program or task becomes unblocked, OS/2 preempts the lower priority task and gives the CPU to the higher priority task.

Round-Robin Scheduling
Round-robin scheduling simply means that all tasks which are running at the same priority will receive time slices in a round-robin fashion, so that each can have some time. The Warp task dispatcher manages this and preempts any task which has taken the maximum time slice.
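
A minimal simulation of round-robin dispatching among tasks at the same priority might look like this. The units of work, the max_slice parameter (assumed to be a positive number), and the function name round_robin are assumptions for illustration and do not correspond to actual OS/2 timing values.

```python
from collections import deque


def round_robin(work, max_slice):
    """Simulate same-priority tasks sharing the CPU.

    work maps a task name to the units of CPU time it still needs.
    Returns the (task, units run) slices in the order they were dispatched.
    """
    ready = deque(work.items())
    schedule = []
    while ready:
        name, remaining = ready.popleft()
        ran = min(remaining, max_slice)  # preempted at the maximum time slice
        schedule.append((name, ran))
        if remaining > ran:
            ready.append((name, remaining - ran))  # back of the queue
    return schedule
```

With task A needing 3 units, task B needing 1 unit, and a maximum slice of 2, A runs for 2 units and is preempted, B runs to completion, and A then finishes its last unit.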

Dynamic Priority
The priority level at which a task runs under OS/2 is dynamic; OS/2 can change the priority of a task depending upon its needs. There are three ways in which OS/2 uses its ability to dynamically alter the priority of applications, even while they are running.

CPU Starvation
It is possible that some tasks in a multitasking system might become starved for CPU time. Since most running tasks will be at regular priority, many will be at the same priority level. As these tasks run in round-robin fashion, the amount of time taken by the tasks which are not blocked may prevent one or more of the remaining tasks from obtaining CPU time within the amount of time specified in the MAXWAIT statement in CONFIG.SYS.

When a task becomes starved for CPU time in this manner, the OS/2 task scheduler boosts the priority of the starved task within its class by one priority level. This places this task at a higher priority level than the other tasks with which it was running. It will, therefore, receive a timeslice before those other programs. When it does receive a timeslice, it is allocated one minimum timeslice by the scheduler and then it is reduced to its base priority.

This dynamic priority scheduling allows each task to be sure it will receive at least some CPU time.
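
The starvation boost can be modeled roughly as shown below. The task records, the max_wait argument standing in for the MAXWAIT value, the strict greater-than comparison, and the cap at level 31 are all assumptions made for this sketch rather than details of the OS/2 scheduler.

```python
def apply_starvation_boost(tasks, now, max_wait):
    """Boost by one level any ready task that has waited longer than max_wait."""
    for t in tasks:
        if not t["blocked"] and now - t["last_run"] > max_wait:
            t["level"] = min(t["base_level"] + 1, 31)  # stay within the class
    return tasks


def after_time_slice(task, now):
    """Once the boosted task has had its slice, return it to its base priority."""
    task["level"] = task["base_level"]
    task["last_run"] = now
    return task
```

A blocked task is never boosted, because it could not use the CPU even if it were offered. Only tasks that are able to run but have been passed over for too long are raised one level.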

Foreground Boost
When running a program in a multitasking environment, especially one with which you will continually interact such as a word processing program, it is reasonable to expect that program to respond to user input quickly. One of the ways in which OS/2 ensures that this will happen is to provide the foreground task, that is the one with which the user is interacting and which has the keyboard focus, a boost in priority. In this case the boost is by a level of one within the task’s priority class.

Foreground I/O Boost
Setting PRIORITY_DISK_IO=YES in CONFIG.SYS provides a priority boost to I/O operations for the application running in the foreground. This ensures that a DOS program, or an OS/2 program which was not written with a separate thread for I/O operations, will perform its I/O tasks and return control to the user as quickly as possible.

OS/2 Crash Protection
This section discusses the term Crash Protection and what is meant by Crash Protection within the context of the OS/2 operating system. It also discusses why applications crash, the causes of OS/2 crashes, and methods which can be used to ensure that crashes of the operating system can be minimized.

Crash Protection
IBM began talking about OS/2 and its crash protection in early 1992, while OS/2 2.0 was in beta testing. IBM states on the OS/2 Warp package that “[OS/2's] Crash Protection helps prevent a single, wayward program from affecting the rest of your system.” Notice that the term Crash Protection is a trademark of the IBM Corporation. IBM thinks that this is so important that they want no one else to use the term. If Crash Protection is so important, and if OS/2 has it, why does OS/2 still crash?

Application crashes
Application programs can and do crash. In many cases in which an application actually crashes, the problem is a programming error of some type. The short C++ program in Figure 1 is an example of a program which may crash under some circumstances. This program is designed to request a name as input. The variable “Name” is defined as a three character string. Most users will type their own name, and names vary in length. The program will cause an access violation and SYS3175 error if the user enters a name with more than three characters. Thus it will fail for long names and not fail for short ones. The SYS3175 error message is displayed in a PM dialog box. It is also possible to get SYS3170, SYS3171, or SYS1808 errors depending upon the length of the name entered.

Figure 1: This C++ program can crash under the right circumstances.

In the case of a three letter name like “Don”, the program displays a text line which says, “The name you entered is: Don.” No error occurs. With a four letter name, like “Dave”, the program crashes and OS/2 displays a dialog box which says that a SYS3171 error has occurred. If the user chooses to view the register display from the dialog box choices, he or she will find that the error detail is “exception c0000005… insufficient stack space”.

Now assume that the user enters a fifteen letter name. The program crashes again and the user is presented with a dialog box which says that a SYS3175 error has occurred. This time viewing the register display shows that the error is an “…access violation…”. This program has also generated a SYS3186 error, which is a “privileged instruction” error, when run with the name “Jennifer”, for example.

In all cases, when the error occurs, the program crashes. In no case does OS/2 crash and all of the other applications running on the system continue to run as if nothing has happened. As far as the other programs running on your system are concerned, nothing has happened because they were protected from the crash of the defective program. This is what crash protection means and OS/2 fulfills this promise quite nicely. This program would also crash in a DOS or Windows environment, but the entire system would crash and recovery would usually necessitate rebooting the system.

There are some very interesting conclusions which can be drawn from this little experiment. The first is that a single bug in a program can cause a number of different symptoms. In this case, at least five different errors have been presented.

The second conclusion we can make is that a bug may not necessarily present itself and cause a symptom; at least not until the proper circumstances are present. This situation can occur in a real program in which a programmer did not fully consider the size of a data field. The name field in a payroll program may work fine for years – until a new employee with a very long name is hired. An error occurs while entering the new employee into the payroll program, and the payroll clerk calls the company support person with this “new” bug. In many cases, the assumption is that the program is still fine, but the hardware is at fault, or that OS/2, which was installed last week, is now the problem.

Application lockups
OS/2’s multitasking uses a priority based, round-robin, preemptive algorithm. This means that OS/2 will give the CPU to the task with the highest priority. If another task is running at a lower priority, that running task will be preempted by OS/2 in order to do so. OS/2 will not allow a lower priority task to run until the higher priority task has completed.

A poorly written OS/2 application can take up all the CPU time and cause the entire system to lock up. Figure 2 contains a C program which operates entirely within the rules of OS/2 and its programming APIs, but which will lock up the system as soon as it begins to run. This program boosts itself to the highest priority in the system and loops in a math operation. This prevents OS/2 from allowing any other application CPU time.

Figure 2: This program will lock up OS/2 by hogging all the CPU cycles.
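
The effect of such a program can be seen in a toy model of strict priority dispatching, written here in Python. The task names and priority numbers are hypothetical, and the model deliberately ignores everything except the rule that the highest priority ready task always gets the CPU.

```python
def run_ticks(tasks, ticks):
    """Strict priority dispatch: every tick goes to the highest-priority
    ready task. Returns how many ticks each task received."""
    received = {name: 0 for name in tasks}
    for _ in range(ticks):
        ready = [name for name, task in tasks.items() if task["ready"]]
        winner = max(ready, key=lambda name: tasks[name]["priority"])
        received[winner] += 1
    return received

# Hypothetical tasks; the numbers are illustrative, not OS/2 values.
tasks = {
    "hog":    {"priority": 31, "ready": True},   # never blocks, never yields
    "shell":  {"priority": 8,  "ready": True},
    "editor": {"priority": 8,  "ready": True},
}
usage = run_ticks(tasks, 1000)
```

Because the high priority task never blocks, it is the winner on every tick and the other tasks receive nothing, which is how a lockup of this kind looks from the user's side of the screen.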

On rare occasions, a program may enter a high priority section of code and get stuck there. This can cause a complete system lock up. In other cases, the program will stay in the high priority section for several seconds or even minutes. This is not good programming, but it happens. Waiting a few moments will bring the system back to normal when the program causing the problem reduces its priority back to normal or completes execution of the high priority thread.

It is important to note that code can be written which can cause any operating system to lock up or crash. In most real world applications, the priority of a task is boosted temporarily and other applications and tasks will not even be affected. Other applications use a separate thread for high priority activities. These threads only need to execute for a very short period of time and the effect on the rest of the programs running in the system is negligible.

The Single Input Queue Dilemma
Another cause of OS/2 crashes is the single input queue (SIQ) design of the Workplace Shell (WPS). The single input queue simply means that all of the mouse input and keystrokes sent to the WPS are sent to a single queue to wait until they can be processed by the application for which they were intended. This means that an application which was not written to properly respond to the queue and release it immediately can cause the entire desktop to lock up. Memory overcommitment of 100% or more (that is, when total system memory requirements are 200% or more of installed RAM) also seems to contribute to the single input queue lockup problem.
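
The overcommitment figure is simple arithmetic, shown here as a short Python sketch. The function name and the 16 MB and 32 MB values are assumptions picked for the example.

```python
def overcommit_percent(required_mb, installed_ram_mb):
    """Percentage by which memory demand exceeds installed RAM."""
    return (required_mb - installed_ram_mb) / installed_ram_mb * 100

# Hypothetical system: 16 MB of RAM with 32 MB of memory in use.
pct = overcommit_percent(required_mb=32, installed_ram_mb=16)
```

A system with 16 MB of RAM and 32 MB of total memory requirements is using 200% of its installed RAM, which is an overcommitment of 100%.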

In reality, OS/2 is not locked up; only the desktop is frozen. The processing of other applications continues so that if you have a print job spooling or a download from a BBS in process, those tasks continue to run. You can see this if you start the OS/2 System Clock and configure it to show the second hand. If your system locks because of the single input queue, the second hand on the clock will continue to run. You may wait until critical applications or downloads have finished before you reboot the system, if it actually comes to that.

The single input queue was designed into OS/2 by Microsoft back in the days of OS/2 1.1, the first version to have the Presentation Manager desktop environment. This was done because – so the story goes – the programmer responsible for the Presentation Manager did not know how to deal with multiple input queues. With multiple input queues, each application has a separate queue for mouse and keyboard activity, so that a single misbehaved application does not lock up the entire WPS. Microsoft has used a multiple input queue strategy in Windows NT and Windows 95 to overcome this problem.

IBM has not modified OS/2 to provide multiple input queues because they contend that would break existing OS/2 applications which depend upon the single input queue model. IBM has said, however, that they intend to fix this problem, and appears to have included the circumvention in Fixpak 17 which became available in late January, 1996. The fix is not multiple queues, but rather a watchdog timer on the single input queue which will release the queue automatically when one application refuses to respond.

There is a great deal of controversy over whether this approach to the solution results in a true fix or merely a workaround. If it works, however, the specific approach used is irrelevant. Fixpak 16 is also supposed to have had the SIQ fix, but has been withdrawn because it had some problems. If you have a copy of it, you should not install it. Fixpak 17 does contain the SIQ fix. The fix is described in the text file READ17_1.TXT which is included with the fixpak. The following line must be added to the CONFIG.SYS file to activate the fix.

SET PM_ASYNC_FOCUS_CHANGE=ON x

The x parameter specifies, in milliseconds, the amount of time which OS/2 waits before determining that the application is not responding. The default with no parameter is 2000 milliseconds (2 seconds). The suggested range is from 2 to 5 seconds (2000 to 5000 milliseconds). Once it has determined that a program is not responding to the queue, OS/2 flags the queue as bad and switches to the next application you are trying to use.

OS/2 continues to monitor the queue to see whether the application begins to respond by reading messages. If this occurs, OS/2 marks the queue as good and continues to operate normally. If the application does not respond to the queue, you can try to terminate the program, or just ignore it.
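
The watchdog behavior described above can be pictured with the following Python sketch. It is a toy model under stated assumptions, not IBM's implementation: the class and method names are hypothetical, and only the timeout rule and the good/bad queue flag come from the description above.

```python
class InputQueueWatchdog:
    """Toy model of a timeout on a shared input queue."""

    def __init__(self, timeout_ms=2000):
        self.timeout_ms = timeout_ms
        self.queue_ok = True
        self.last_read_ms = 0

    def app_read_message(self, now_ms):
        """The owning application read a message: the queue is healthy."""
        self.last_read_ms = now_ms
        self.queue_ok = True

    def check(self, now_ms, messages_pending):
        """Flag the queue as bad if input has waited past the timeout."""
        if messages_pending and now_ms - self.last_read_ms >= self.timeout_ms:
            self.queue_ok = False
        return self.queue_ok

# Hypothetical timeline with a 3 second timeout.
watchdog = InputQueueWatchdog(timeout_ms=3000)
watchdog.app_read_message(now_ms=0)
early = watchdog.check(now_ms=1000, messages_pending=True)   # still within limit
late = watchdog.check(now_ms=3500, messages_pending=True)    # timeout exceeded
watchdog.app_read_message(now_ms=4000)                       # application recovers
recovered = watchdog.check(now_ms=4100, messages_pending=True)
```

The queue is marked bad only while input sits unread past the timeout, and it is marked good again as soon as the application resumes reading messages.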

Recovering from hangs and lockups
Once you realize a lockup has occurred, you should spend a moment before taking any action to observe what is actually happening. Some apparent lockups are caused by normal activities like disk swapping. When launching a program with memory already significantly overcommitted, a large amount of swapper activity will occur. Since this activity must take priority over all else except communications, the system may seem to come to a halt. Heavy disk activity might indicate that the system is engaged in serious swapping rather than being locked up.

There are a couple of ways of recovering from OS/2 system hangs and lockups caused by the software types of problems we have discussed so far. First, press the Ctrl-Esc key combination repeatedly. This will normally result in a dialog box which prompts you to terminate the application. If you try this it may result in termination of the hung program and negate the need for a reboot. Be sure to wait for several minutes before proceeding with any more drastic actions because OS/2 can take a very long time to respond to Ctrl-Esc. Ten or fifteen minutes would not be unreasonable to wait on a slow (less than 486 33 MHz) system; five or six minutes would be appropriate on fast computers. Warp’s response to Ctrl-Esc seems much faster with Fixpak 16.

WatchCat
WatchCat is a shareware program written by a pair of programmers in Germany which can help in the recovery from some OS/2 lockup situations. When a computer with WatchCat installed locks up, the user generates a wake-up signal using one of several methods. A device driver is installed which allows WatchCat to respond to a user selected keyboard combination, or it can be configured during or after installation to respond to mouse or joystick input, or to switches connected to the serial or parallel ports, or other custom devices. Once WatchCat has control of the system in a full screen or windowed text mode session, the locked application can be terminated. By their nature, some types of applications cannot be terminated. The registered version can terminate more applications more “brutally”, and has more activation methods and other features.

WatchCat is intended to kill programs which are not responding to input, and therefore, to free the input queue. It seems to do a good job of this. WatchCat cannot, however, help with programs which hog the CPU by running at high priority. The program in Figure 2, for example, cannot be terminated because WatchCat cannot get any CPU cycles. I also found it necessary to use the key combinations several times to get WatchCat’s attention. The documentation suggests an alternative activation method when this occurs.

If terminating the locked program does not work, you can reboot using the Ctrl-Alt-Del key combination. This soft reboot is intercepted by the OS/2 kernel, which flushes the cache and buffers to disk before terminating OS/2 and performing the reboot. This is not a wonderful “recovery” but you should not lose any data and all files will be closed properly.

If a soft reboot does not work, the chances are very high that you do not have a software problem, but rather a hardware problem such as a lost interrupt. You can also tell when the system has entered a “hard lock” because the second hand of the OS/2 system clock will stop. Hardware causes of OS/2 crashes are discussed in following sections. When other attempts at recovery don’t work, you will have to reboot with the Big Red Switch.

Other causes of crashes
There are a number of other causes for system crashes. These are not unique to OS/2, however, and can affect any computer running under any operating system. The bus design of your system is important and the environment in which your system operates is also consequential.

System design
One area which can cause problems is the design of your system, particularly the system data bus. The original IBM PC data bus is called the ISA bus. ISA stands for Industry Standard Architecture – even though it is not an industry standard. The ISA bus was developed, along with the original IBM PC and DOS, based on the assumption that the PC would be used in a stand-alone, single tasking environment. As a result, the nature of the ISA bus is not conducive to a multitasking operating system.

In a true multitasking environment many different tasks can be under way at any given time. This can result in a large number of interrupts being generated. It is possible with the ISA bus for an interrupt to be lost if it happens to occur at the same time as another interrupt. A lost interrupt causes the system to hang or lock up. Many times, OS/2 hangs on an ISA bus system are symptomatic of old hardware technology.

IBM developed the Micro Channel bus for its PS/2 line of computers with a multitasking environment in mind. It is designed so that an interrupt cannot be lost, even if it coincides with another interrupt. Far fewer hangs occur on systems with Micro Channel than on ISA bus systems. The biggest problem with the Micro Channel bus is that IBM did not and does not know how to market personal computer products. The PCI bus which is becoming widespread in systems today is also designed to provide a better multitasking hardware platform than the ISA bus. Not only is it significantly faster than the ISA bus, it also helps to prevent lost interrupts.

If you have a choice, you should purchase a system with a Micro Channel bus or a PCI bus to eliminate lost interrupts. These two busses are also designed to reduce problems caused by Electromagnetic Interference (EMI).

Electromagnetic Interference
Environmental problems for computers are similar to environmental pollution for human beings and other living things. Electronic pollution is called Electromagnetic Interference, or EMI. Electromagnetic interference is caused by two types of electromagnetic fields: radio frequency fields and magnetic fields. This class of phenomena affects the system hardware, but many of the symptoms can appear to be the result of problems in the operating system. Any operating system can be affected, including OS/2, DOS, Windows NT, AIX, Unix, and Windows 95.

There are a number of different electromagnetic phenomena which can cause problems for computers and other electronic equipment. Electrostatic discharge (ESD) can occur in the autumn and winter. Radio frequency interference (RFI) can occur near radio and TV stations, radar installations, airports, and other locations as well. Poor grounding can cause problems of its own and it can aggravate other problems like ESD and RFI.

Electrostatic Discharge
Electrostatic discharge (ESD) begins to show up in the autumn as the moisture content of the air – relative humidity (RH) – decreases. During the summer the high relative humidity keeps ESD at bay by draining the electrostatic charges almost as quickly as they accumulate.

Electrostatic charges are created when two dissimilar materials are separated. The most commonly recognized method for people to accumulate a static charge is to walk across a carpet on a dry autumn or winter day. The static charge accumulated on such a day is not noticeable until it is discharged to another object – usually a door knob or the computer – with the crackle of a spark which causes an unpleasant jolt.

ESD can cause a computer to crash in many ways. You may find that your computer just hangs. You may experience parity errors in DOS or Trap errors in OS/2. Windows may present a general protection fault (GPF) as the result of ESD. The symptoms will vary and the true source of the problem will be very difficult to determine.

The case in which static is discharged from your body is probably the least common cause of problems for your computer. The charges accumulated are just not that high. The real culprit is your chair. A charge of up to 10,000 volts is generated when you get up out of the chair – remember the separation of dissimilar materials. The charge is retained by the chair because the casters on most chairs are rubber or plastic – both of which are nearly perfect insulators. When the chair touches or comes in close proximity to the desk or cart on which your computer sits, the resultant electrostatic discharge can and frequently does disrupt its operation.

Simple ways to prevent ESD
There are only a couple of things you can do to prevent ESD. You can also take steps to ensure that when ESD does occur, the results are as harmless as possible.

The primary and least expensive thing that anyone can do to reduce the occurrence of ESD is to prevent the buildup of static charges. The best way to do this is to keep the relative humidity in the computer room between 45% and 70%. 50% to 60% RH is ideal. The 45% to 70% relative humidity range drains away the static charge through the moisture in the air quickly enough that it does not build to a level high enough to cause a discharge.

Another way to prevent static buildup, particularly on your chair, is to use special static draining carpet and chairs with casters which are designed to drain static to the floor and from there to the ground. This is obviously a more expensive solution than keeping the relative humidity in the correct range. It may be necessary, however, to use this approach in buildings or offices in which there is no control over the relative humidity.

There is another very simple way to keep the air in your computer room moist as well as filtered. Green, growing plants add moisture to the air and filter it to remove harmful pollutants and toxins. Plants also remove undesirable ions from the air. These attributes of plants are good for people as well as computers.

Magnetic fields
The magnetic field created by your computer monitor can cause problems when the monitor is placed on top of or in close proximity to your system. All computer monitors use a CRT (cathode ray tube) to generate an image using a beam of electrons. This electron beam is swept across the face of the CRT by a powerful magnetic field which can interfere with the ability of the system to read data from the diskette drive or from any device which has a cable passing too close to the CRT. This can cause CRC – cyclic redundancy check – or other disk media errors.

Almost any type of electrical device can generate a magnetic field. Many of these devices are not computer related. The best way to prevent problems due to magnetic fields is to remove the source of the magnetic field. Move the system monitor away from other devices like the system unit. Keep the entire system away from large electric motors or CRT devices like television sets.

Radio Frequency Interference
Radio Frequency Interference – RFI – is any unwanted electronic signal that is transmitted or received by an electronic device. A computer can generate RFI that interferes with the operation of other electronic devices, just as other devices generate RFI that disrupts the computer. RFI can propagate through the air, as with radio waves, or through the power lines to the power plug of your computer. These unwanted and undesirable signals, whatever their source and however they are propagated, can crash your computer unexpectedly or initiate any number of unusual symptoms.

Symptoms of RFI can be lockups and hangs, trap and SYSxxxx errors of various types, repeated booting, CRC and disk media errors, and internal processing errors. In other words, many of the errors that can be caused by real hardware or software problems can also be caused by RFI problems that are just as real but harder to find, prove, and fix. Problems that cannot be traced or explained in any other way should make you suspicious of RFI.

Sources of RFI
Nearby radio and TV stations can generate powerful electromagnetic signals. These signals propagate through the air and can be picked up by a computer. The cables attached to a computer and to peripheral devices can act as excellent antennae. The keyboard cable, the printer cable, the cables to modems and other external devices all pick up the radiated signals from radio and TV stations. If your system is located close enough to one, you may experience problems.

Powerful radar sources can also affect your system. An airport or air station can be the source of ground-based radar as well as aircraft radars and other radio signals. These radar signals can be powerful and can cause problems similar to those caused by radio and TV signals. The system entry points are cables, which act as antennae for radar signals. Microwave relay towers and cellular phones and their relay towers can cause RFI problems with computers, too.

Minimizing RFI problems
RFI problems cannot be prevented entirely, but they can be minimized by taking certain simple precautions. If you are located next to a radar installation, for example, even proper grounding and all other measures may not be enough to prevent high power radar pulses and radio emissions from interfering with your computer. The following suggestions should help to minimize problems with RFI.

One very common set of entry points for radiated RFI is loose system covers and missing card slot covers. Replace any missing card slot covers. These covers may have been left off after removing a card from the system. Be sure to always replace them when a card is removed from a system. Be sure to install the covers on your system unit or attached peripheral devices if you currently have them off. Fasten them down securely with the latches or screws provided. A missing or improperly installed system cover allows significant RFI entry into the system.

Another common entry point for RFI is the set of cables that connect the various external peripherals to the system. These cables act as antennae and can pick up radio frequency signals and transmit them inside the system where they can cause problems. They include the keyboard cable, the mouse cable, serial communications and printer cables, parallel printer cables, audio cables if you have a sound card, data cables to external devices such as external SCSI hard drives or CD-ROM drives, and even the system’s own power cord.

To reduce RFI pickup on cables, ensure that each cable connector is seated properly in its receptacle at both ends of the cable and that the fasteners are properly installed and in use. If screws are used to hold the connector in place, ensure that they are tightened snugly. Where wire retaining clips are used, ensure that they are properly seated and latched. For printer cables that have separate ground wires, you should connect the ground wire to the screw or fastener provided on one end only. Connecting the ground wires on both ends can cause ground loop currents to flow that defeat the purpose of the ground wire. In the case of cables like this, the ground wire is used for shielding, and it makes its connection to the ground reference through the frame of the printer.

Check the ground
Most new homes and office buildings have adequate grounding for the proper operation of a personal computer system, but older homes and offices may not. Be aware, however, that even though a building is relatively new, there still may be problems such as loose connections and missing connections that increase the susceptibility of the system to RFI problems. If you even suspect an EMI problem, check the ground! Computers are much more susceptible to the effects of ESD and RFI when they are improperly grounded. A quick check of your computer’s ground can be accomplished with a simple electrical outlet ground checker available at most hardware or electrical supply stores. Even if the ground checks good, however, you could still have a grounding problem which can aggravate the effects of ESD.

The ideal ground wire installation is an insulated green wire at least the same size as the wires which supply the power to the outlet. The wire should connect only to a one inch diameter copper stake driven at least 10 feet into moist earth or to an equivalent copper water pipe. It should not connect to any other wire or bus at any point along its length. The connection to the ground stake should be made at a point no greater than twelve inches from the entry into the earth, and should be as close as possible to the earth.

Proper grounding is not difficult to achieve, but it can be expensive. You should definitely call a trained electrician to deal with this type of problem. Do NOT under any circumstances attempt to work on the electrical system of your home, office, or building yourself. It can kill you.

If you are having problems which no one can seem to fix, the important things to check are the relative humidity and the grounding of your computer. It would also be wise to consult with someone who specializes in resolving electromagnetic environmental problems.

OS/2 is crash protected
OS/2 does live up to the crash protected claims which IBM makes for it, but it is not crash-proof. Many of the causes of OS/2 crashes are really not problems with OS/2, but rather are the result of external factors which affect all operating systems equally. A true crash – as opposed to a single session lockup – which can be truly attributed to a problem in OS/2 is very rare. OS/2 is a solid platform which gets better with each release.

The Swap File
The swap file is an important part of OS/2 memory management. It allows OS/2 to overcommit memory so that more memory can be used by the operating system and application programs than there is RAM physically present in the computer. Pages of data are swapped out of RAM and onto the disk for storage until they are needed again, when they are swapped back into RAM.

The memory pages stored on disk are located in a file called SWAPPER.DAT, which is usually located in the \OS2\SYSTEM subdirectory on the OS/2 boot drive. The location of this file is specified in the CONFIG.SYS file by the following line, and can be changed if your boot drive is too full to provide sufficient space for the SWAPPER.DAT file to grow.

SWAPPATH=C:\OS2\SYSTEM 4096 5120

The SWAPPATH statement in the CONFIG.SYS file determines the location of the swap file, the amount of disk space to reserve so that the swap file cannot fill up the entire disk (4 MB in this example), and the initial size of the SWAPPER.DAT file (5 MB in this example). Both sizes are given in kilobytes.
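
To make the meaning of each field concrete, here is a small Python sketch which splits a SWAPPATH statement into its parts. The function name and the dictionary keys are hypothetical, and the sketch assumes a path without embedded spaces.

```python
def parse_swappath(line):
    """Split a SWAPPATH statement into its path and two sizes (in KB)."""
    key, _, value = line.strip().partition("=")
    if key.strip().upper() != "SWAPPATH":
        raise ValueError("not a SWAPPATH statement")
    path, min_free_kb, initial_kb = value.split()
    return {
        "path": path,
        "min_free_kb": int(min_free_kb),   # disk space to leave free
        "initial_kb": int(initial_kb),     # preallocated SWAPPER.DAT size
    }

cfg = parse_swappath(r"SWAPPATH=C:\OS2\SYSTEM 4096 5120")
initial_mb = cfg["initial_kb"] / 1024      # size of the swap file at boot
```

For the statement shown here, the swap file starts out at 5 MB in C:\OS2\SYSTEM and 4 MB of disk space is held in reserve.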

Swapping Strategy
The strategy used to implement swapping in OS/2 is completely misunderstood by most people. As a result they have certain expectations about how the swap file is supposed to behave, and believe that there is a problem when it does not behave as expected.

OS/2 starts by creating the SWAPPER.DAT file during the kernel initialization. If the swap file previously existed, as is almost always the case, it is erased and recreated. The initial size of the swap file is determined by the second numeric parameter in the SWAPPATH statement, above. Its location is determined by the path specified in the SWAPPATH statement.

Preallocating the swap file helps to ensure that it is contiguous. This is especially important for performance reasons on FAT drives. Preallocation also reduces the amount of time required to move data to the swap file because the space on the hard drive has already been allocated. Again, this is especially important on a FAT drive because it can take a great deal of time to allocate disk space. This is a result of the relatively primitive structure of the FAT file system.

Swap File Allocation
Part of the OS/2 swap strategy is to attempt to allocate or deallocate space for SWAPPER.DAT on the hard drive when no other activity is taking place. This helps to ensure that, when additional space needs to be allocated, it is not done at the very time when the data needs to be swapped to the file, thus slowing the actual swapping process.

When RAM is full and existing swap file space is nearly full, the memory management portion of the OS/2 kernel allocates more space on the hard drive for the swap file even though it is not yet needed. The trigger point is when 500 KB or less of free space remains in the swap file. When that point is reached, the memory manager watches the hard drive to determine a time when it is not busy responding to read or write requests. When the drive is not busy, the memory manager allocates more space for the swap file in 1 MB increments.

This strategy for swap file allocation not only prevents having to allocate disk space at the moment the swap file needs to be used, it also prevents interrupting other applications which are accessing the disk.
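
The allocation rule can be expressed as a short Python sketch. The 500 KB trigger and the 1 MB increment come from the description above; the function and constant names are hypothetical, and the real memory manager is certainly more involved than this.

```python
LOW_WATER_KB = 500      # grow when this much free swap space or less remains
GROW_STEP_KB = 1024     # grow in 1 MB increments

def maybe_grow(swap_size_kb, swap_used_kb, disk_busy):
    """Return the swap file size after one pass of the growth check."""
    free_kb = swap_size_kb - swap_used_kb
    if free_kb <= LOW_WATER_KB and not disk_busy:
        return swap_size_kb + GROW_STEP_KB   # allocate ahead of need
    return swap_size_kb                      # enough room, or disk is busy

# Hypothetical 5 MB swap file with 420 KB of free space remaining.
grown = maybe_grow(swap_size_kb=5120, swap_used_kb=4700, disk_busy=False)
deferred = maybe_grow(swap_size_kb=5120, swap_used_kb=4700, disk_busy=True)
```

The second call shows the point of the strategy: when the disk is busy, the growth is simply put off until a later, quieter moment.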

Swap File Deallocation
Deallocation of swap file space is less critical in terms of its impact on the swap process. Deallocation can occur any time. It does not have to be done as a prerequisite to enable some other task to be completed. It is also desirable to maintain much of the data in the swap file for a period of time after the file was closed or the program terminated. This can prevent having to read those same files from the disk if they are needed relatively soon.

Say, for example, that I have just finished a document in a word processor. After I print the document, and while the print spooler is sending the document to the printer and the printer is printing the document, I close the program. I am prone to make mistakes, as are most of us carbon based, humanoid life forms, and I find an error in the printed copy. Since I find the error after the file is closed and the program has been terminated, I have to restart the word processing program and reload the document. It does not take very long because the application and the document have both been retained in the swap file.

Dynamic Link Libraries (DLLs) are also retained in the swap file for a considerable period of time. Many DLLs are used by a lot of different components of OS/2 and by many application programs.

For these reasons swap file deallocation does not take place quickly. Many people I have talked to in my years of supporting OS/2 assumed that the swapper file should shrink as soon as a file was closed or a program terminated. When it did not, they assumed that there was a problem. This is an erroneous assumption.

Let us assume, then, that several files or programs have been closed and that the requisite amount of time has passed for the memory manager to begin the process of deallocating space from the SWAPPER.DAT file. When the disk is not busy, space belonging to the files and programs which have been least recently closed is marked as unused. It is not possible, however, just to whack a chunk off the end of the file. Most of the empty space in the file is now scattered throughout the SWAPPER.DAT file rather than all being nice and tidy at the end.

The task of the memory manager now becomes one of moving all of the unused space to the end of the swap file so that it can be deallocated. Over a period of time while the disk is not busy with productive tasks, the memory manager moves empty pages to the end of the swap file by moving in-use pages to the empty ones nearer the beginning of the file.
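The compaction step can be pictured with a small model. The sketch below treats the swap file as a list of page slots, with `None` marking an unused slot, and moves in-use pages from the tail into empty slots nearer the beginning. The function name and data layout are assumptions made for illustration; the real memory manager would also have to update its own records of where each page lives.

```python
def compact(slots):
    """Move in-use pages toward the front so free slots collect at the end.

    `slots` is a list in which None marks an unused page slot. The list is
    modified in place and the number of pages moved is returned.
    """
    moved = 0
    lo, hi = 0, len(slots) - 1
    while lo < hi:
        if slots[lo] is not None:
            lo += 1  # this slot is already in use, look further along
        elif slots[hi] is None:
            hi -= 1  # this tail slot is already empty
        else:
            # Fill the empty slot near the front with the page from the tail.
            slots[lo], slots[hi] = slots[hi], None
            moved += 1
            lo += 1
            hi -= 1
    return moved
```

For example, `["a", None, "b", None, None, "c", "d"]` becomes `["a", "d", "b", "c", None, None, None]` after two moves, leaving all of the unused space at the end of the file.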

When three conditions are met, the memory manager will deallocate the space.
 * The last 1 MB of space in the file must be unused.
 * There must be 1.5 MB of total unused space.
 * There must be no new pages swapped into the file for a specified period of time. This helps to ensure that there will be no immediate need to reallocate space in the swapper file. Why deallocate space which will be needed again soon anyway?
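The three conditions can be expressed as a single test. This is a sketch under stated assumptions: the text does not say how long the quiet period is, so `quiet_period` is a hypothetical parameter, and the function names are invented for illustration.

```python
KB = 1024
MB = 1024 * KB


def can_shrink(tail_unused, total_unused, idle_seconds, quiet_period):
    """Return True when all three deallocation conditions hold.

    tail_unused  -- unused bytes at the very end of the swap file
    total_unused -- unused bytes anywhere in the swap file
    idle_seconds -- time since a page was last swapped into the file
    quiet_period -- required idle time (value not given in the text)
    """
    return (
        tail_unused >= 1 * MB
        and total_unused >= 1536 * KB  # 1.5 MB
        and idle_seconds >= quiet_period
    )


def shrink_once(size, tail_unused, total_unused, idle_seconds, quiet_period):
    """Return the swap file size after one deallocation attempt."""
    if can_shrink(tail_unused, total_unused, idle_seconds, quiet_period):
        return size - 1 * MB
    return size
```

A 40 MB swap file with 2 MB unused, the last 1 MB of it empty, and no recent swap activity would drop to 39 MB; if any one condition fails, the size is unchanged.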

When these conditions are met, the last 1 MB of the SWAPPER.DAT file is deallocated leaving at least 528 KB of empty space available in case additional swapping needs to take place. Because of the need to meet all of these requirements and the time required to move the data around within the swap file, this process can take a long time. It does work, though. My swap file can be as large as 55 MB during a typical day. Although I do observe some shrinkage of SWAPPER.DAT during the day, it usually remains over 40 MB until I quit for the evening. By the next morning, it is back down to 25 MB or so, which is just a little larger than the 20 MB default preallocation size I have specified in CONFIG.SYS.

This is very complex, but a large part of this strategy is to perform operations on the swap file in such a way that they affect the productive tasks as little as possible.

Tuning
See “Tuning the SWAPPER.DAT file” for details of tuning the swap file.

Intel memory structure
The structure of memory in an Intel processor environment is a legacy of the design inherited from the original IBM PC back in 1981. The Intel 8088 processor used in the PC had a memory address range of 1 MB. At the time, 1 MB was many times the size of the address space provided in any other personal computer. Because the 8088 accessed memory through a segmented memory model, this 1 MB address space was divided into sixteen 64 KB segments. The lower ten of those segments, 640 KB, were provided as user memory to load an operating system and application programs. The remaining upper 384 KB was reserved for BIOS, video RAM, and future use.
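The arithmetic behind this layout is easy to check. The sketch below works through the segment counts given above and adds one piece of background that is not in the text: in the 8088's segmented model a physical address is formed as segment × 16 + offset, limited to 20 bits. The constant and function names are chosen for illustration only.

```python
KB = 1024

SEGMENT_SIZE = 64 * KB
ADDRESS_SPACE = 1024 * KB  # the 1 MB range of the 8088

TOTAL_SEGMENTS = ADDRESS_SPACE // SEGMENT_SIZE        # sixteen segments
USER_SEGMENTS = 10                                    # the lower ten segments
USER_MEMORY = USER_SEGMENTS * SEGMENT_SIZE            # 640 KB
RESERVED_MEMORY = ADDRESS_SPACE - USER_MEMORY         # 384 KB


def physical_address(segment, offset):
    """Real-mode address on the 8088: segment * 16 + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF
```

Segment value 0xA000 with offset 0 gives physical address 0xA0000, which is exactly the 640 KB boundary where the reserved area begins.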

In order to ensure backward compatibility for operating systems and programs with the early computers, all Intel based personal computers since have retained this memory structure. Even the 80x86 and Pentium class processors have an identical memory structure in the lower 1 MB. Above the 1 MB address, however, the address space is unbroken up to 4 GB and operating systems and applications can access all of it unencumbered – if the hardware will support that much memory.

It is important to understand that this memory structure is not a result of the design of the 80x86 class of processors; it is an artefact of the design of the original IBM PC.

Figure 1: Memory map of OS/2 in an Intel environment.

OS/2 memory usage
OS/2 Warp can access the entire 4 GB memory address space of the Intel 80x86 processor family. This includes the 80386, 80486, Pentium, Pentium Pro, and all the DX, SX and SLC variants of these processors. When Warp is loaded into RAM at boot time, the OS/2 kernel begins loading at the lower boundary of memory. It fills the lower 640 KB of RAM first, skips the 384 KB of reserved memory address space, and continues loading into the area beyond 1 MB. When OS/2 is running, it essentially ignores the address space between 640 KB and 1 MB.

Once the kernel is loaded, the rest of memory is available for use by user application programs. These can be real OS/2 programs, DOS programs, and Windows 3.1 programs.

Paged access to memory
Application programs do not access memory directly under OS/2. To ensure that application programs do not use memory which does not belong to them, memory can be accessed only through the operating system. This is accomplished through the use of page tables. Warp manages the page tables, and memory, with the support of the processor.

Under Warp, all memory is divided into fixed, non-movable blocks called pages. Warp’s memory manager keeps track of memory usage with the page tables. Whenever a program is loaded by Warp, the memory manager creates a page table for that program. The page table is a list of memory pages which belong to the program. The page table entries also contain information about how the memory is being used and are the basis for the memory protection mechanism of Warp.

Figure 2: OS/2 Warp accesses physical memory through the Page Table. The Page table entries also contain information about how the memory is being used and are the basis for the memory protection mechanism of Warp.

The most important thing to understand about memory management in OS/2 is not how physical addresses are generated; rather it is how each program’s virtual memory space is isolated from the rest of the programs. Each entry in the page table for a process contains an address pointer to the base location of the physical page of memory represented. It also contains attribute bits which describe how the physical memory is used and what access to the memory is permitted. The attribute bits which are important for this discussion are:
 * Commit. When this bit is on, the page of memory has been committed to physical storage, either RAM or fixed disk. The Commit and Decommit bits are mutually exclusive; only one can be on at a time, while the other must be off.
 * Decommit. When this bit is on, the memory page has been assigned to the program but, because the program has not yet stored any data in the page, or used it in any way, the page has not been committed to physical memory. In this way OS/2 only uses the physical memory resource which is actually needed by a program. This is called sparse allocation.
 * Execute. This bit, when on, means that the data stored in the page is executable code.
 * Read. When on, this bit means that the data in the page is readable. If off, the data cannot be read.
 * Write. When this bit is on, it means that the page can be written to.

Access to any page in a manner which is inconsistent with its attributes causes the Intel processor to generate an access fault. OS/2 has a kernel level process, HARDERR.EXE, which handles these hardware errors from the processor. The hardware error handler determines the cause of the access violation and displays an error message for the user to see. In some cases, where the error occurs in the kernel of OS/2, an Internal Processing Error (IPE) is generated and the system is stopped.
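A toy model may help make the attribute check concrete. The sketch below follows the attribute list given above; it is not the real 80386 page table entry layout, and every name in it (`Attr`, `PageEntry`, `AccessFault`, `check_access`) is hypothetical.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Attr(Flag):
    COMMIT = auto()
    DECOMMIT = auto()
    EXECUTE = auto()
    READ = auto()
    WRITE = auto()


class AccessFault(Exception):
    """Stands in for the access fault raised by the processor."""


@dataclass
class PageEntry:
    base: int    # base location of the physical page
    attrs: Attr  # attribute bits for the page

    def __post_init__(self):
        # Commit and Decommit are mutually exclusive: exactly one is on.
        committed = Attr.COMMIT in self.attrs
        decommitted = Attr.DECOMMIT in self.attrs
        if committed == decommitted:
            raise ValueError("exactly one of COMMIT and DECOMMIT must be set")


def check_access(page_table, page_number, wanted):
    """Return the page base if `wanted` access is permitted, else fault."""
    entry = page_table.get(page_number)
    if entry is None:
        raise AccessFault(f"page {page_number} does not belong to this program")
    if wanted not in entry.attrs:
        raise AccessFault(f"page {page_number} does not permit {wanted}")
    return entry.base
```

A page holding program code might carry `Attr.COMMIT | Attr.READ | Attr.EXECUTE`; reading it succeeds, while an attempt to write to it, or to touch a page number that is not in the table at all, raises the fault.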

Virtual Memory
OS/2 utilizes what is known as virtual memory. Virtual memory provides a protected address space to each program running on the computer. From the viewpoint of the program, as shown in Figure 3, it resides alone in a computer address space which is contiguous and sequential. Because this memory space is protected, other programs cannot obtain access to it. Virtual memory can also use available hard drive space as auxiliary storage when the computer’s RAM is all in use. What this means is that when you have 24 MB of RAM in your system, and you are running programs which together require 37.5 MB of memory for executable code and data, you need an additional 13.5 MB of RAM which you don’t have. In OS/2 Warp, the memory manager uses available hard disk space to substitute for that unavailable RAM.

Figure 3: Each Application in a virtual memory system appears to have sequential, contiguous, and isolated memory space.

In fact, the physical memory resource allocated to each application may be neither sequential nor contiguous, and may not even be in RAM. Figure 4 shows that some pages of memory for a given program may be located in RAM, while other memory pages may be located on the fixed disk drive. It is OS/2, or rather OS/2’s memory manager, which is responsible for mapping the virtual memory to physical memory resource. The OS/2 memory manager is also responsible for moving data from disk to RAM when it is needed for the computer to work on.

When physical RAM is filled up, and yet more RAM is needed by a program, the OS/2 memory manager transfers the contents of one or more pages of RAM into the SWAPPER.DAT file on the hard drive. Those pages of RAM are then allocated to the program which needs them. Entries for these memory pages are created in the page table for the program which needed them, and the entries show the location of the page in RAM.

When a program attempts to access a page of memory which has been swapped out to the hard drive, the processor generates a page fault and the OS/2 memory manager then retrieves the page from the hard drive and places it into RAM. If necessary to make room for this page transferred to RAM from the hard drive, the OS/2 memory manager will dispose of pages which have been marked as discardable, or will transfer pages from RAM to the hard disk.
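The swap-out and swap-in cycle can be simulated with a very small pager. This is a sketch only: the text does not say how OS/2 chooses which page to move out of RAM, so the least-recently-used choice here is an assumption, as are the class and attribute names.

```python
from collections import OrderedDict


class TinyPager:
    """A toy model of RAM backed by a swap file."""

    def __init__(self, ram_frames):
        self.ram_frames = ram_frames  # number of pages that fit in RAM
        self.ram = OrderedDict()      # page -> contents, least recent first
        self.swap = {}                # page -> contents held in the swap file
        self.page_faults = 0

    def touch(self, page):
        """Access a page, bringing it into RAM if it is not already there."""
        if page in self.ram:
            self.ram.move_to_end(page)  # mark as most recently used
            return self.ram[page]
        self.page_faults += 1
        if len(self.ram) >= self.ram_frames:
            # RAM is full: move the least recently used page to the swap file.
            victim, contents = self.ram.popitem(last=False)
            self.swap[victim] = contents
        # Retrieve the page from swap, or create it on first use.
        self.ram[page] = self.swap.pop(page, f"page-{page}")
        return self.ram[page]
```

With room for two pages in RAM, touching pages 1, 2, and 3 pushes page 1 out to the swap file; touching page 1 again brings it back and pushes page 2 out in its place.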

Figure 4: OS/2 maps virtual memory to a physical resource, either RAM or fixed disk.

When any application, as a result of a programmer error or an overt attempt by the programmer, attempts to access memory which does not belong to that program, the Intel processor generates an access violation. The processor knows that the memory does not belong to the program trying to access it because there is no entry in the page table for that program for the memory in question. Warp’s error handler interprets this access violation as a Trap 000D error. When a program accesses memory which does not belong to it, the only option which OS/2 allows the user is to terminate the errant program. See Figure 1 in the document “OS/2 Crash Protection” for an example of a program which can crash with a Trap 000D error because it attempts to access memory which does not belong to it.

For any kind of problem with a program which creates an access violation, the error handler first displays a SYS3175 error. To determine the specific problem, click on the button to see the registers. At that point the specific trap error is displayed.

Ring Protection mechanism
OS/2 uses a four level protection mechanism to protect itself, the operating system, from interference by other programs. This is not the same as the memory protection, actually memory isolation, which isolates programs’ memory spaces from each other. This ring protection mechanism, diagrammed in Figure 1, is used to ensure that applications cannot alter registers in the CPU or manipulate memory directly by accessing the page tables. The protection mechanism prevents this by refusing requests from programs other than a kernel level process to use the CPU instructions required to access the CPU registers and the page tables. The ring protection mechanism gets its name from the concentric rings used to diagram it.

The ring levels
OS/2 runs at ring level 0, which is the highest level of protection. Actually only the kernel of OS/2 and certain other kernel level functions run at this level. This is the most protected level of the CPU and is used only by an operating system. Running at ring 0 gives the operating system access to the privileged CPU instructions which are used to manipulate the CPU registers and the page tables. Access to these privileged instructions allows OS/2 to manage memory and to manage the operation of the CPU and the computer as a whole. It is important to note that the operating system, because it runs at ring 0, can use all of the CPU instructions available to the outer rings as well as those limited to programs running at ring 0.

Ring level 1 is not used in OS/2. This was a simple design decision on the part of the OS/2 architects who felt that providing access to this level would not offer any significant advantages to the operating system structure while adding unnecessary complexity. The kernel running at ring 0 can use any of the few instructions provided by this level.

Applications normally run at ring level 3. This is the least protected level, and programs which run at this level are the least trusted. That is, they are to be prevented from manipulating the hardware directly under any circumstances. In most cases, however, it is not necessary for applications to use hardware directly. The device driver structure of OS/2 provides program independent methods for accessing all hardware and so application programs normally have no need to do so.

IOPL
IOPL (I/O Privilege Level) programs run at ring 2. Ring 2 gives the few programs which need it a little more direct access to the hardware of the computer than they can have at ring 3. Application programs running at ring 2 can use sensitive instructions which exercise a little more control over parts of the computer like serial or parallel ports. Even with this additional level of access, however, OS/2 rigidly supervises the program using IOPL.
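The ring rules can be summarized as a simple lookup. The sketch below is a simplified model, not a description of how the processor is implemented. The instruction names are background from the x86 instruction set rather than from this text: loading the page table base register (`MOV CR3`), `LGDT`, and `HLT` are privileged, while `IN`, `OUT`, `CLI`, and `STI` are sensitive to the I/O privilege level, which is assumed here to be set to 2.

```python
PRIVILEGED = {"MOV CR3", "LGDT", "HLT"}      # ring 0 only
IO_SENSITIVE = {"IN", "OUT", "CLI", "STI"}   # allowed when ring <= IOPL

IOPL = 2  # assumed I/O privilege level, matching ring 2 in the text


def may_execute(instruction, ring):
    """Return True if code running at `ring` may execute `instruction`."""
    if instruction in PRIVILEGED:
        return ring == 0
    if instruction in IO_SENSITIVE:
        return ring <= IOPL
    return True  # ordinary instructions are available at every ring
```

In this model a ring 3 application is refused both `MOV CR3` and `OUT`, a ring 2 program may use `OUT` but not `MOV CR3`, and the kernel at ring 0 may use everything.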

Any program which requires access to IOPL must first tell OS/2 that it will access ring 2. The programmer’s API which is used for IOPL at ring 2 is tightly controlled by OS/2 to prevent applications from causing problems. Unfortunately, a program written to access I/O Privilege Level can take complete control of a device; this cannot happen at ring 3. If the program does not relinquish control of the device, other programs which need access to the device, including the operating system, may appear to lock up while they wait for access to be granted. Until the program running at ring 2 relinquishes the device, all other programs must wait. It is imperative, therefore, that programmers working with IOPL at ring 2 follow IBM guidelines about programming at that level and release the device as soon as possible.

The statement IOPL=YES is required in the CONFIG.SYS file to allow programs to access ring 2 IOPL level instructions. If this statement has been altered such that a particular program which requires IOPL cannot get access to it, an error is displayed which says, “The system is not configured to support this program.”

IOPL=YES Allows I/O Privilege Level access to be granted to all programs which require it. Protect ring level 2.

IOPL=list Allows I/O Privilege Level access to be granted to only those programs in the list. All other programs will be refused access to IOPL. The list consists of the names of executable files separated by commas.
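The effect of the two forms of the statement can be sketched as a small decision function. This is an illustration of the rule as described, not OS/2 code. The function name and the program names in the usage note are hypothetical, and the handling of IOPL=NO is an assumption which is not described in the text above.

```python
def iopl_allowed(setting, program):
    """Decide whether `program` may be granted I/O Privilege Level.

    `setting` is the text after "IOPL=" in CONFIG.SYS, either YES or a
    comma-separated list of executable file names.
    """
    value = setting.strip().upper()
    if value == "YES":
        return True
    if value == "NO":
        return False  # assumption: NO refuses every program
    allowed = {name.strip() for name in value.split(",") if name.strip()}
    return program.strip().upper() in allowed
```

With the setting `COMM.EXE,FXPRINT.EXE`, a program named `comm.exe` would be granted access and any program not in the list would see the configuration error message instead.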

Dynamic Link Libraries – DLL
OS/2 uses libraries of executable code called Dynamic Link Libraries, or DLLs. DLLs are used to improve the performance of OS/2 and application programs and to reduce the amount of disk space and RAM required to store and run programs.

Dynamic linking means that the DLLs are linked to the EXE code at load time or at run time rather than when the program is linked by the programmer. Static linking, by contrast, places all of the runtime code into a single large executable module which must be loaded into RAM in its entirety. Because many DLLs are designed to be loaded into RAM only when they are called by a program, dynamic linking can shorten load time so that programs begin to run sooner.

DLLs can be used by many programs rather than just one. Since many functions residing in a DLL are needed by many executable programs, a single DLL can take the place of static linking the same code multiple times into multiple executable files. This saves disk space and RAM because a single instance of a DLL in RAM can be used by multiple programs.

Dynamic linking can be accomplished in two ways under OS/2.

 * Load-time dynamic linking loads DLLs called from within a code segment at the time the segment is loaded into memory. This causes load time to be longer but can result in improved run-time performance, especially if the DLL is used frequently.
 * Run-time dynamic linking postpones loading the DLL until the code actually calls a function in the DLL. This reduces the time required to initially load the program and get it running, and also results in a smaller RAM footprint. This is appropriate for DLLs which are infrequently used because it can impact performance slightly as DLLs are loaded during execution of programs.

Warp Server SMP
Symmetric MultiProcessing (SMP) is used to increase the processing capability of a single computer. By using multiple processors the amount of work which can be performed by a single computer can be increased considerably. Warp Server SMP supports SMP computers with from one to sixty-four processors and provides a significant improvement in performance over computers with a single processor.

Warp Server SMP and Warp Server for e-business are both scalable up to 64 processors in a single computer. Warp Server is optimized for 4 processors and Warp Server for e-business is optimized for 8 processors.

Under Warp Server SMP and Warp Server for e-business, Symmetric MultiProcessing spreads all of the computing tasks among all of the processors in the computer as evenly as possible, so that each processor has some of the load.
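One simple way to picture spreading work evenly is to hand each new task to whichever processor currently has the least load. The sketch below shows that idea only; the text does not describe the actual scheduling algorithm used by Warp Server SMP, so the approach and the function name are assumptions.

```python
import heapq


def spread_evenly(task_costs, processors):
    """Assign each task to the processor with the least accumulated load.

    `task_costs` is a list of relative costs, one per task. Returns a
    dict mapping processor number to the list of task indexes it received.
    """
    heap = [(0, cpu) for cpu in range(processors)]  # (load, processor)
    heapq.heapify(heap)
    assignment = {cpu: [] for cpu in range(processors)}
    for task, cost in enumerate(task_costs):
        load, cpu = heapq.heappop(heap)
        assignment[cpu].append(task)
        heapq.heappush(heap, (load + cost, cpu))
    return assignment
```

Four tasks of equal cost on a four processor machine end up one per processor, so that no processor sits idle while another does all of the work.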

Windows NT Comparison
Windows NT, in contrast to the way Warp Server SMP works, loads up each processor completely before placing any load on the additional processors which may be available. This causes some processors to do most of the work, while other processors may not do anything at all.

Windows NT is, theoretically, capable of scaling up to 16 processors, but in reality, it can only use a maximum of 4 processors effectively. Even then, based on independent third party tests, Warp Server on a single processor computer outperforms Windows NT in a 4 processor SMP environment.