OS/2 Dynamic Link Libraries

You are in a maze of twisty little passages, all alike.
>USE DYNAMIC LINK LIBRARY
I see no DLL here.
>MAKE DLL
I don't know how to make a DLL.
>INVENTORY
You have:
 * an 80286 machine
 * sufficient memory
 * OS/2
 * An OS/2 toolkit. In the toolkit is:
    * a text editor
    * a C compiler
    * an assembler
    * a linker
>USE TOOLS IN TOOLKIT TO MAKE DLL.
Huh?
Like an ADVENTURE game, not knowing the keywords when you're trying to create a Dynamic Link Library can be very frustrating. Once you know the keywords, though, you can explore new areas of the game, and bring home new prizes and treasures.

The objective of this article is to help teach you some of the new keywords and key techniques necessary to understand and build a Dynamic Link Library of your own.

What is a Dynamic Link Library and Why Should I Care?
Well, I think the idea of Dynamic Link Libraries is one of the more important concepts OS/2 introduces. Although the principle of DLLs has been around for a while (usually called shareable libraries in other operating systems), OS/2 is one which makes such a shareable library an intrinsic part of the operating system, not just an added-on, "neat" idea. In fact, the system library functions themselves are DLLs in OS/2, with a clear separation by device type, allowing an easy upgrade route.

In MS-DOS, after you compile a program, you next link it with other portions of the program and with portions from a library of commonly used routines. The end result is a stand-alone piece of code which is loaded into memory, has its outstanding address references resolved, and is then executed. The physical file resulting from the link contains portions of the library it used: two programs which use the printf function will each contain a copy of the library functions which comprise that ubiquitous function.

In a single tasking operating system with a sufficiently large hard disk, this really isn't a problem. In a multi-tasking operating system which allows for shared memory usage, loading multiple copies of the same code seems wasteful. OS/2 obviates this need by allowing only one copy of a given function to be loaded in memory and to have this copy shared by any task which seeks its functionality. Since the function itself is not physically part of the program file, it is possible for the executable to be rather small, and to only update the library as required. The concept of separate overlay files (and complicated linkers) is no longer needed: just include the specific DLLs required and let the operating system do the rest.

But the advantages of using DLLs go far beyond the convenience: there is a functionality to DLLs which can be exploited in many different ways. One way includes the ability for two (or more) totally separate and distinct programs to be able to share memory by simply accessing the same run-time routine. This message passing, already an intrinsic part of OS/2, can be fine-tuned with DLLs to fit the exact needs you might have in a complicated environment.

This is not without a price though: there are some "tricks" to writing an operable DLL, and some cautions and caveats. I discovered some of them the hard way while preparing an example of DLL usage for this article.

Of course, if you wish to avoid the supposed complexity of using these useful techniques, there is nothing in OS/2 to prevent you from using the standard, and more familiar, linking techniques of the past. Except, perhaps, the knowledge that There Is A Better Way.

A Dynamic Link Library is not simply a new way of linking an old library. There are some intrinsic differences between the two techniques. A look at how statically linked libraries are linked into your code will help in understanding how the new DynaLink approach differs, and why it is a better way.

Static Linking
When you compile a standard C program, the resulting output of the compiler is an object file. The object file contains a number of different types of records. There are different record types for procedures and routines, externally accessible variables, local stack variables and so on. Each unique type of item has a unique record type associated with it.

There is also a unique record type which indicates where the code for a routine starts. Another unique record type has the name of the routine and a pointer to that routine's code. Some record types indicate that the routine requested is not found, and hence must be external to the object module. Other record types indicate that the object is an externally located data item. And so on.

The important thing is that each call to a routine which can not be resolved within the particular source module is changed into a parameter which simply involves an "external item" record. Helping out the compiler, you can specify the call type as being a near or far routine, or a near or far external item of some other type.

After you've finished compiling all of your various source modules into their object modules, you next link them together, along with some appropriate libraries, and end up with an executable piece of code. What does the linker actually do, though?

The linker examines each object module it sees (usually in the order in which they're presented) and keeps a list of all record types which either indicate a request for an external item or which define a global item. Then as it sees record types which indicate actual code routines, it determines that routine's placement and resolves all calls for it into an actual address within the eventual output file. Far calls, of course, indicate not just an offset within a 64K segment, but allow additional segments to be addressed, which in turn allow much larger programs to be created.

There is nothing intrinsically foreign to the compiler in the concept of mixed memory-model code as long as it knows how a routine will be called. The compiler will generate a far return for routines defined as far calls, and near returns for near calls. Addressing of "far" data items is resolved in a similar way: the compiler puts out a record type which the linker can understand and resolve into an actual segment and offset (there's an extra step for the actual loading and executing of the code, covered below).

Whatever items are not resolved within the linking of the various object modules are next searched for in the libraries. These libraries are, basically, object modules with nothing except local references resolved. A module in the library starts off as a simple object file usually, and then is stored and indexed into a library as an entire unit: it is not stored on a routine by routine basis, but rather on an object file by object file basis.

Once the appropriate routine or external item is found in the library, its module is pulled from the library and inserted into the executable. All references to it are resolved and the process continues. An important consideration is that the object module originally loaded into the library as one unit is pulled from it as one unit as well, even if only one of the functions in the module is referenced.

The end result is a totally self contained image out on disk. This image is loaded at some address (called the base address) when you run it, the base segment address is added to all of the other segment addresses throughout the code in the mysterious load routines, and then finally, with a simple call or jump, your program is executed.
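The "mysterious load routines" can be sketched in a few lines of C. This is only an illustration of the idea, not the actual MS-DOS loader: the Image structure, its field names and the relocate function are all hypothetical, and the image is reduced to an array of 16-bit words plus a list of positions which hold segment values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified program image: 16-bit words plus a table of
   the word positions that hold segment values needing adjustment. */
typedef struct {
    uint16_t *words;        /* the program image */
    const size_t *fixups;   /* indexes into words[] holding segment values */
    size_t fixup_count;     /* number of entries in fixups[] */
} Image;

/* Add the base (load) segment to every segment reference in the image. */
void relocate(Image *img, uint16_t base_segment)
{
    size_t i;

    for (i = 0; i < img->fixup_count; i++) {
        size_t where = img->fixups[i];
        img->words[where] = (uint16_t)(img->words[where] + base_segment);
    }
}
```

Words which are not listed in the fixup table (plain offsets and data) are left alone; only the segment values move with the load address.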

That's basically how static linking works, with the more technical details glossed over.

How, then, does Dynamic Linking differ from statically linking an already existing library? Well, the differences in the process are not all that substantial. The end result is, though. And because of that, the conceptual design of the Dynamic Link Library is different, as I describe below.

Dynamic Linking
With a "normal" library, you compile all of the object modules you'll need, then use a librarian program to create a library. The library itself is in some strange format, suitable only to linkers and librarian programs.

Things are a little different with DLLs, though. First, there are two separate link steps. You must link the constituent object file members which form the DLL together, and then link your own code with the resulting DLL. Creating the DLL itself, however, requires a bit of work:

After you've created the object files from your source, you use the normal linker to create the DLL (plus a special file described below), and its format is really no different than a normal EXE file (really: it even has the 'MZ' as its first two bytes!). It is therefore admirably suited for the standard system loader to load as if it were actually a program. Later on the system will, basically, do just that for the initialization routine. Typically, the new library will have an extension of DLL.
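If you'd like to check the 'MZ' claim for yourself, a few lines of C will do it. This is a minimal sketch: the function name has_mz_signature is my own, and it looks at nothing but the first two bytes of the file.

```c
#include <stdio.h>

/* Returns 1 if the file begins with the bytes 'M','Z',
   0 if it does not, and -1 if the file cannot be opened. */
int has_mz_signature(const char *path)
{
    unsigned char sig[2];
    size_t got;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    got = fread(sig, 1, sizeof sig, fp);
    fclose(fp);
    return got == sizeof sig && sig[0] == 'M' && sig[1] == 'Z';
}
```

Calling it on a freshly linked DLL and on an ordinary EXE should give the same answer for both.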

The special file, described fully below, is called a module definition file. It describes the external interface for each of the accessible routines: their public names and their attributes. Anything not specifically mentioned in the DEF file can not be accessed routinely by an outside program. This definition file is called the "export module definition file".

By running the export DEF file through a program called IMPLIB (Import Librarian), a special library file can be created. This library file is conceptually similar to the "standard" idea of a library, and hence has the LIB extension. (See Figure x)

An option to running the special file through the IMPLIB program is to create what is in essence the *inverse* of the export DEF file. Such a file is called an "import module definition" file. It, too, has the extension of DEF. (See Figure x)

When you link the DLL with your own code, the linker sees the special record format of the import library (the LIB file created by IMPLIB), or reads the import DEF file, and creates special records which are understood by OS/2's program load facilities.

The end result of a link which uses DLLs is a hybrid file. It can be considered a partial EXE and a partial OBJ at the same time. A compiled object module will resolve local variables and routines into a segment and an offset, leaving external references virtually undefined. In the same way, linking a Dynalink program results in an EXE whose external references to DLL routines are effectively unresolved.

At this point, I'm going to stop referring to 'segments' as such, and start calling them 'selectors': DLLs are applicable only in protected mode OS/2, after all.

Part of OS/2's program loader recognizes that the EXE it's about to load contains DLL calls. Finding these records causes a lookup on an internal table to determine if the DLL has already been loaded. Now, each module in the DLL can be defined as a "load at runtime" or a "load on demand" module. Regardless of this definition, a selector is allocated for each module and all references to those modules are now resolved into a selector and offset pair. If the module has been defined as a "load at runtime" module, then the actual code for the module is read from the file, loaded into memory and any outstanding linkages resolved.

[Sidebar on alternative "manual" approach to locating and calling DLL routines.]

A brief mention of why protected mode is a handy thing: consider what happens if a "load-on-demand" function is called before that selector points to valid code. A fault occurs (on the 80286 this is a segment-not-present fault, the segment-level counterpart of a page fault), and the memory management module can easily resolve what the problem is, load the appropriate code, and allow the program to continue operating as if nothing had happened! Subsequent calls to routines within the same selector would operate without a fault. Once this fault mechanism (an intrinsic part of OS/2 and protected mode applications) has been enabled, it is virtually transparent whether or not a requested segment exists in "real" memory or in virtual memory.

The 80286 and 80386 chips use a table, called the Local Descriptor Table (See Figure x), which holds an entry for each selector along with the characteristics of that selector. There is an LDT for each of the processes currently running. If a process attempts to access memory using a selector not within its LDT, then the hardware will cause a fault to occur: effective hardware protection of memory space.
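A selector is really just a 16-bit value with three fields packed into it: the low two bits hold the requested privilege level, the next bit says whether the selector refers to the LDT or the GDT, and the upper thirteen bits are the index into that table. The helper names below are my own, offered only as a sketch of how the fields come apart.

```c
#include <stdint.h>

/* Upper 13 bits: index of the entry within the descriptor table. */
unsigned selector_index(uint16_t sel)
{
    return (unsigned)(sel >> 3);
}

/* Bit 2: table indicator. 1 = Local Descriptor Table, 0 = Global. */
int selector_is_local(uint16_t sel)
{
    return (sel >> 2) & 1;
}

/* Low 2 bits: requested privilege level (Ring 0 through Ring 3). */
unsigned selector_rpl(uint16_t sel)
{
    return (unsigned)(sel & 3);
}
```

Thirteen index bits is where the limit of 8K entries per table comes from.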

The GDT, or Global Descriptor Table, is similar to the LDT, except that all tasks may access the selectors (and their associated memory) contained therein. Although this seems a simple way in which to make a selector and its data space accessible to multiple processes, OS/2 does not use the GDT for shared memory access. Instead it makes an entry into the LDT of each process.

When a request is made to OS/2 for memory allocation, the type of memory (shared or non-shared) is included in the request, and an entry made in the LDT for all processes allowed to share this memory.

Only the kernel (Ring 0) code may write to these tables, however. (Device drivers also run at Ring 0, so they'd have write access to the descriptor tables as well, but we'll save that for another article.)

So...What's the Big Deal?
Well, so far, functionally, a relatively efficient mechanism exists for linking in routines as required at run time instead of just once at link time. All automatically and transparently, of course, but what are the advantages of such an ability? There are quite a few. First, swapping the DLL routines in and out of memory becomes pretty easy: the LDT has a 'present' bit which indicates whether the requested segment is in memory or not.

If the segment is not in memory, a fault occurs as described above, and the swapped out DLL routine can be brought into 'real' memory. Since the selector itself is but an index into a table which contains real address information, the individual DLL modules can end up anywhere in memory. Transparent to your own code, of course.

Program code without some data space associated with it is a rarity: pure code can't manipulate items, although it is often useful in purely mathematical routines. The 8088/8086 family of processors used the data segment register to address its data space. The 80286/80386 family of chips requires data to be addressed through a selector as well. And, the information for the data selector is also stored in the LDT (See Figure x).

By setting the appropriate bits in the LDT entry for a given selector, its associated memory can be made private or publicly accessible, or it can be set so it may be written to or is a read-only piece of memory. Data selectors can even require a certain level of privilege in the code attempting to access them. Any "illegal" operation will cause a fault to occur, and OS/2 is able to deal with the faulting process as required.
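Most of these characteristics live in a single "access rights" byte within each table entry. The sketch below decodes that byte for ordinary code and data segments, following the 80286 bit layout; the SegAttrs structure and the decode_access function are names of my own invention, and system entries such as call gates are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical summary of one descriptor's access rights byte. */
typedef struct {
    int present;     /* 1 = segment is in memory */
    int dpl;         /* privilege level required (0 through 3) */
    int is_code;     /* 1 = executable segment, 0 = data segment */
    int conforming;  /* code segments only */
    int readable;    /* data is always readable; code only if marked so */
    int writable;    /* code is never writable; data only if marked so */
} SegAttrs;

/* Decode the access rights byte of a code or data segment entry
   (assumes bit 4, the "ordinary segment" bit, is set). */
SegAttrs decode_access(uint8_t access)
{
    SegAttrs a;

    a.present    = (access >> 7) & 1;
    a.dpl        = (access >> 5) & 3;
    a.is_code    = (access >> 3) & 1;
    a.conforming = a.is_code && ((access >> 2) & 1);
    a.readable   = a.is_code ? ((access >> 1) & 1) : 1;
    a.writable   = a.is_code ? 0 : ((access >> 1) & 1);
    return a;
}
```

For example, the byte 0xFA describes a present, Ring 3, readable, nonconforming code segment, while 0xD8 describes a present, Ring 2, execute-only code segment.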

This means that, with the LDT set properly, memory can be shareable between tasks, memory can be protected from illegal or erroneous access, and other interesting memory usage and control techniques can be enabled.

As such, the DLL can be controlled and fine tuned in a variety of different ways. This fine tuning is done through the DEF files mentioned above.

Defining the DEF File
There are two different types of DEF files. One, the EXPORT definition file, is used to let the world know what the various entry points and characteristics of these entry points are. The IMPORT definition file indicates what functions from the DLL will be used and therefore should be linked at run time. There is also the IMPORT library created by processing the EXPORT definition file through IMPLIB. Let's look at each piece separately, with a list of the available features and options handy (See Figure ?). All these options, by the way, must be entered in the appropriate DEF file in UPPERCASE.

The DEF Files: Showing Your Face to the Outside World

LIBRARY Statement

The EXPORT DEF file really only requires a few fields. The most important required field is the LIBRARY field. This defines that this is a DLL Export definition file, instead of a "normal" application DEF file.

LIBRARY [name][init_type]

The LIBRARY statement must be the first one in the DEF file, allowing the linker (and IMPLIB) to have a bit of a head start on what is about to come. The first argument [name] to the LIBRARY statement is the eventual output name for the created DLL. The extension of DLL is used unless you specify a different one.

When the DLL is first loaded, there may well be some things you'd want to be initialized (setting certain data items, assuring certain system resources are available, etc.). Each DLL can have an initialization routine which will be called when the DLL is first loaded, or upon each invocation of the DLL. [init_type] allows you to specify if you want the initialization routine called each time the DLL is invoked (INITINSTANCE) or only once, when the DLL is first loaded (INITGLOBAL - the default).

NAME [appname][apptype]

In a like manner, if the linker sees the NAME statement, it understands that you are creating an application and not a DLL. The NAME statement allows you to specify whether the application is WINDOWS compatible and, if so, whether it is capable of running in real mode or protected mode. If you specify WINDOWAPI as the second argument, then WINDOWS is required by this application in order to execute. Specifying WINDOWCOMPAT means that it is not only WINDOWS compatible, but can also run in its own screen group under OS/2. Finally specifying NOTWINDOWCOMPAT indicates that the application requires its own screen group when running, and is the default if you specify nothing.

NAME allows you to specify (with [appname]) the name the application shall have after linking. The default extension, naturally enough, is .EXE.

CODE [load][executeonly][iopl][conforming]

All code segments within the DLL will share a similar set of attributes unless otherwise specified. The default set of these attributes is set with the CODE statement.

There are several other optional parameters allowed (but ignored) in the CODE statement for compatibility with WINDOWS.

The [load] parameter indicates if you wish the segment to be automatically loaded upon DLL invocation (PRELOAD) or to wait until the segment is actually accessed with a call (LOADONCALL - the default). In an application which may have large areas of the code which might never be called, there is no real need any longer to load those library calls into memory all at once. If they're called, then they'll be loaded automatically if the LOADONCALL option is specified. Once the routine is loaded, it will stay loaded in memory (except for swapping out to disk, of course).

If you use the [executeonly] option to specify that other processes can not read this segment (by using EXECUTEONLY), then, even though the LDT marks the selector as accessible on a global basis, it may not be read (or treated like a data segment selector) by any process without the appropriate privilege level required. The default (EXECUTEREAD) allows the memory allocated to this selector to be read for purposes other than for execution.

Only code segments with a high enough privilege level may access the hardware directly. You may specify that a segment has this ability with the [iopl] parameter. The default (NOIOPL) makes sense: unless otherwise specified, an attempt to access the hardware (such as the comm port directly) will cause an immediate fault to be taken. In the case where you allow a segment to access hardware directly, include the IOPL parameter in your CODE line (it is probably a better idea to specify IOPL only when required; see the SEGMENTS statement below).

OS/2 still requires that you make a system call in order to request the privilege of hardware access.

A brief description of how the Privilege Level in OS/2 functions is important to understanding the implications of using the IOPL parameter.

The 80286/80386 chips prohibit direct transitions between code segments of differing levels of privilege. The default privilege level for application code in OS/2 is Ring 3. Ring 2 code segments may access hardware directly. The only way to transfer from one privilege level to another is through what is called a call gate. A call gate has a specific selector type in the LDT and an actual "destination selector" which is the selector belonging to the actual code segment of the privileged call. Additionally, the gateway has its own attendant privilege level and may only be called by code segments of the same privilege level.

Conceptually, when a call is made to a privileged routine, it passes through the call gate before passing to the privileged routine. Since the only way through the call gate is with either a CALL instruction (going into the routine) or a RET (coming back from the routine), the call gateway concept provides for an extra level of code security, but at the cost of some additional hardware overhead. Each transition via a gateway causes parameters on the stack to be copied to a new stack, another interesting security feature of the 80286, since a lower privileged program could manipulate the return address on the stack otherwise. See the EXPORTS statement below for more information on that requirement.
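The parameter copy is done by the processor itself, but its effect can be imitated in C to show why the number of words has to be known ahead of time. The function below is purely a simulation under my own names (gate_copy_params, GATE_MAX_WORDS); the limit of 31 reflects the five-bit word count field in an 80286 call gate.

```c
#include <stddef.h>
#include <stdint.h>

/* An 80286 call gate holds its word count in a 5-bit field. */
#define GATE_MAX_WORDS 31

/* Simulate the gate's parameter copy: move `pwords` 16-bit parameter
   words from the caller's stack to the privileged routine's own stack.
   Returns the number of words copied, or 0 if the request is refused. */
size_t gate_copy_params(const uint16_t *caller_stack, size_t pwords,
                        uint16_t *inner_stack, size_t inner_capacity)
{
    size_t i;

    if (pwords > GATE_MAX_WORDS || pwords > inner_capacity)
        return 0;
    for (i = 0; i < pwords; i++)
        inner_stack[i] = caller_stack[i];
    return pwords;
}
```

Once the copy is made, anything the less privileged caller does to its own stack no longer affects what the privileged routine sees.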

An IOPL'ed routine also uses up an additional slot in the LDT. Although the LDT has 8K possible entries in it (each LDT entry takes up eight bytes, so an entire segment has been allocated to the LDT in OS/2), 5K of those are reserved for OS/2 itself. That leaves you with only about 3K LDT entries. Probably enough for the foreseeable future.

The final parameter in the CODE statement allows you to specify whether the segment is a CONFORMING or NONCONFORMING segment. This also deals with the IOPL privilege level and can be pretty confusing at first. Consider it to be the inverse of the gateway approach.

Normally, a segment will execute with the privilege level of the calling segment. However, there are times when this might not be appropriate: consider a Ring 3 communications protocol checking routine called from a Ring 2 device driver. In this situation, you might not want to allow the protocol checker to operate with the higher privilege of its calling segment. The default case, the NONCONFORMING parameter, would cause the Ring 3 routine to execute at Ring 3. Set to CONFORMING, it would execute at the privilege level of the routine calling it: the device driver running at Ring 2.

Data Space Definitions
Just as code segments have a method of setting default parameters, the DATA segments also allow certain parameters to be set. This is done with the DATA statement.

The DATA statement shares some of its parameter list with the CODE statement. This makes a great deal of sense since these parameters describe how to make the default settings for each data selector in the LDT. The format of the DATA statement is therefore very similar to the CODE statement:

DATA [load][readonly][instance][iopl][shared]

[load], as above, indicates whether the data segment should be loaded upon first invocation or load should wait until the first access to the selector's address. The default condition is LOADONCALL; however, you can specify invocation load with PRELOAD.

[readonly] allows you to determine if the data segment is allowed to be written into (with the default parameter of READWRITE) or whether it should be protected against write access (READONLY). Attempts to write to a READONLY segment will cause a hardware fault.

[instance] allows you to specify whether or not this data segment (the DGROUP data segment in most cases) should be automatically allocated upon invocation, and if so whether there should be one copy allocated for the entire DLL (SINGLE, which is the default setting for DLLs), or whether each instance of DLL usage should have its own automatic data segment allocated (MULTIPLE, which is the default setting for applications). If no automatic allocation is required, then the parameter should be set to NONE.

Each data segment can also have its own IOPL level. This allows you to set the minimum privilege level required in order to access this data segment: setting the [iopl] parameter to IOPL means that only Ring 2 and more privileged levels are allowed access to the data segment. The default, NOIOPL, allows Ring 3 code segment routines to have access to the data affiliated with the data segment. This allows an interesting interface to be created between IOPL'ed segments and non-IOPL'ed segments through common shared memory: like passing a message through a keyhole.

Finally, [shared] allows you to determine whether a data segment marked as a READWRITE segment may be shared among different tasks. If it is marked as shareable, then only one segment is allocated at load time, and any process with privilege level sufficient to write to it may do so. The default, NONSHARED, does not allow write access to a common data segment and causes a separate copy to be loaded for each instance. If a data segment is marked as READONLY, then it is shareable by definition.

Segment by Segment Parameters
Unless otherwise specified, code and data segments have the attributes you set in the CODE and DATA statements as described above (or their pre-defined default values if you don't describe them).

However, using the SEGMENTS statement, you may specify the individual characteristics for a given named segment. Using the fields as specified above, the format for the SEGMENTS statement is:

SEGMENTS
    segmentname [CLASS 'classname'][load][readonly][executeonly][iopl][conforming][shared]

The [CLASS 'classname'] is an option which allows you to specify that the segmentname parameter (which is required) be assigned to the class specified. If you don't specify a classname, then the 'CODE' classname will be assigned.

Other arguments to the SEGMENTS statement are as outlined above.

EXPORTS Statement
The EXPORTS statement is the only method of letting the outside world know about the routines of the DLL (the EXPORTS statement is only applicable to DLLs; see the IMPORTS statement below for application requirements).

Unless specified by inclusion in the EXPORTS section of the DEF file, a DLL routine is invisible to applications. The full format of each line within the EXPORTS section is:

entryname [=internalname][@ordinal][RESIDENTNAME][pwords]

A name used internally within the DLL need not be the name the outside application world knows the routine by: you can specify the outside name as different from the internal name easily. This allows you to have a class of functions each serving a similar purpose and then to categorize them if you wish with a meaningful prefix.

If you wish, you can allow access to the function by its ordinal (or the routine's library "slot" number) instead of by its name, by specifying the desired ordinal (obviously unique for the DLL) preceded by an '@' sign. If you do, lookups will be faster at load time, and less space will be required for the in-memory search list.

If you do use the [@ordinal] option, then you may have to consider using the [RESIDENTNAME] option as well: normally, if an ordinal is used, then OS/2 will not keep the specified external name available. If you're not using the ordinal parameter, then OS/2 will keep the name resident in its search tables.

If you've included usage of any privileged functions in your routine, you'll have to let the linker know how many words to reserve for parameter copying by using the [pwords] variable. Since a calling task will have its own parameters copied as it passes through the gateway, you have to reserve that space now.
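Pulling the statements described so far together, an EXPORT definition file might look something like the fragment below. Treat it as a sketch only: the library name CHATLIB, the entry names and the internal name login_user are all made up for illustration, and the attributes were chosen simply to show one of each kind.

```
LIBRARY CHATLIB INITGLOBAL
CODE LOADONCALL
DATA SINGLE SHARED
EXPORTS
    CHATLOGIN=login_user @1
    CHATLOGOUT           @2
    CHATSEND             @3 RESIDENTNAME
```

Here CHATLOGIN is the name the outside world sees for the internal routine login_user, and CHATSEND keeps its name resident even though it also has an ordinal. None of these routines is privileged, so no [pwords] count is given.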

IMPORTS Statement
The IMPORTS section allows you to specify which external DLL routines you require in your application (although a DLL can import functions from another DLL, ad infinitum). The format of a line in the IMPORTS section is:

[name=]modulename.entryname

Again, like the EXPORTS lines, you can specify a name your routine uses when it is trying to resolve external routines. You could, therefore, create a debugging DLL and a "normal" DLL and be able to link between them only by changing the modulename or the entryname associated with the named routine. modulename is the name of the application or DLL which contains the desired entryname, which was specified in the EXPORTS statement for the DLL. The entryname can also be an ordinal number.

If the optional [name=] parameter is not specified, then the default name the routine will be "known" as will be the same as entryname. You must specify an internal name, however, if you've specified an ordinal number instead of an entryname.
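As a sketch of the application side, the fragment below imports three routines from a hypothetical DLL called CHATLIB. The application name CHAT, the local names Login and Send, and the entry names and ordinal are all invented for the example.

```
NAME CHAT WINDOWCOMPAT
IMPORTS
    Login=CHATLIB.CHATLOGIN
    Send=CHATLIB.3
    CHATLIB.CHATLOGOUT
```

The first line imports by name under a different local name, the second imports by ordinal (which is why a local name has to be supplied), and the third imports by name and keeps that name.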

Other Statements
There are a variety of other statements which can be included in the DEF file(s). They are described in Figure xDEF.

Using the DEF files
There are two specific ways in which the DEF files can be used: first, just including them on the command line to the linker, and second, passing them onto IMPLIB.

IMPLIB is the Import Library Manager utility, a standard part of the developer's toolkit in OS/2. If you're creating the DLL and the application to use the DLL, you don't have an absolute need for IMPLIB, since you can create the EXPORT and IMPORT library definition files as you desire. However, if you're creating a DLL for other applications to use (perhaps a commercial functions library, or perhaps a replacement for an already available product you produce), then IMPLIB should be part of your development cycle.

IMPLIB takes a definition file for input and produces what appears to be a simple LIB file for output. This then allows you to include the LIB file in the link step. It also allows you to combine multiple DEF files into one LIB file. Suppose you had two DLLs, called COM_INP.DLL and COM_OUT.DLL, each with its associated DEF file. You could specify:

IMPLIB COM_STUF.LIB COM_INP.DEF COM_OUT.DEF

and then simply distribute COM_STUF.LIB and the two DLLs, keeping the internal details of the DLLs to yourself.

A DLL Example
In attempting to create a DLL for this article, I ran into a number of difficulties. Some cannot, by the very nature of the DLL and multi-tasking software, be resolved: deadlock can occur in DLLs just as it can in other types of software.

Think of the required aspects of a simple multi-session process such as a "chat" facility: multiple copies of the same process running, each of which occasionally generates a message, which is added to some internal queue. Each message generated must be collected by all other processes before it can be erased from the queue of outstanding messages, and each such message must be displayed, eventually, by each process. Finally, each of the processes must be able to "login" or "logout" from the chat session, and each must have some type of unique identifier.

I've designed such a facility as a method of demonstrating some of the unique abilities and problem spots of using a DLL as the "glue" which holds a multi-process concept like this together. Is it useful? Well, perhaps not on a single-screen machine, but if the output were to a number of communications ports, it might be.

Er... one aspect of this code should be brought to your attention before you start reading. All of the problems inherent with this code design can be readily and easily solved using an approach which includes the OS/2 system resource of queues. Why wasn't that approach used for this article, then?

Primarily because it wouldn't have required the concept of using DLLs!

StepWise Design
One of the underlying advantages of DLLs which makes them useful in this application is the ability to not only have private and shared memory, but also to have separately compiled and executed tasks utilize the same code at the same time. In essence, there is nothing to prevent one of the "users" of this chat code from following the coding conventions I've created and creating their own user-friendly interface (the bane of spiffy-concept-designers everywhere). In fact, there is no reason why differently designed front ends couldn't be used for each session.

Starting with a concept like that, I designed this code using a majority of the capabilities in the DLL.

One of the abilities of the DLL is to provide for initialization code which will be executed either upon just the first invocation of the DLL, or upon each invocation. This initialization routine is called before the process itself starts to run. This DLL only calls its initialization routine the first time, so the EXPORT file for it contains the INITGLOBAL parameter. Since this is the default condition, it could be excluded, if you wished. The routine I use in this DLL is a simple one, merely setting certain default conditions and allocating some required queue space.

First, there is a login procedure. The login procedure must advise the library code that another consumer and provider of messages has suddenly appeared. To make things easier, the login procedure returns some user identifier to the process: it becomes useful to include an ID when generating new messages, when consuming old ones and, of course, when logging out.

When the DLL sees the login, it also allocates and assigns whatever global and local objects and structures are required for the new process. A choice had to be made in the design as to where the actual allocations of memory would be made, since the memory could be allocated either in the DLL (becoming, in essence, a hidden object from the "client" code) or in the per-process code itself. There are advantages to having a DLL routine allocate memory which is globally accessible to all processes but which only the DLL routines know about.

Additionally, a login causes each message already in the queue to appear unread to the newly logged in task. Later, when requests are made for an outstanding and unread message, these messages will be returned.

The general design of the DLL causes a sharing of the "cleanup" task on each call to the "get a message" routine. When a message is passed to the DLL, it is added to a queue - a structure which includes a flag word with one bit for each session. A mask word with a set bit for each empty task slot is used for the initial value of this flag word. The current task ID is then or'ed in, allowing the sender of the message to indicate it has already received the message.

When a process fetches a new message, it sets the bit in the flag word to indicate that it has fetched this message. Then, when that flag indicates all processes have gotten a copy of the message, the message can be removed from the global queue. Each process therefore has to have the ability to manipulate that queue directly or must call a routine which has that ability.
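The flag-word bookkeeping described above can be sketched in portable C. This is only an illustration under assumptions: a 16-bit flag word, and the names chat_login, flags_for_new_message, mark_fetched and message_is_done are hypothetical, not the ones used in DLL_CHAT. Locking is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SESSIONS 16            /* assumed: one bit per session in a 16-bit word */

/* Bit i set means slot i is empty (no session is logged in there). */
static uint16_t empty_mask = 0xFFFF;

/* Hypothetical login: claim the lowest empty slot and return its ID, or -1. */
int chat_login(void)
{
    int id;
    for (id = 0; id < MAX_SESSIONS; id++) {
        uint16_t bit = (uint16_t)(1u << id);
        if (empty_mask & bit) {
            empty_mask &= (uint16_t)~bit;   /* slot is now occupied */
            return id;
        }
    }
    return -1;                               /* every slot is taken */
}

/* Initial flag word for a message sent by `sender`: empty slots count as
   "already received", and so does the sender itself. */
uint16_t flags_for_new_message(int sender)
{
    assert(sender >= 0 && sender < MAX_SESSIONS);
    return (uint16_t)(empty_mask | (1u << sender));
}

/* Session `reader` has fetched the message: set its bit. */
uint16_t mark_fetched(uint16_t flags, int reader)
{
    assert(reader >= 0 && reader < MAX_SESSIONS);
    return (uint16_t)(flags | (1u << reader));
}

/* A message may leave the queue once every bit is set. */
int message_is_done(uint16_t flags)
{
    return flags == 0xFFFF;
}
```

With three sessions logged in, a message from session 0 starts with all bits set except those of sessions 1 and 2, and becomes removable once both have fetched it.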

I've opted for a more modular design: using a routine to specifically remove the message from the queue (or to add a message to the queue) allows me to isolate the queue itself. Although the queue resides in global memory at this point, perhaps in the future it might reside on some node on a network, or some memory device which might require a higher privilege level? Therefore, isolating the routine which physically modifies the queues is a good idea.

Since there isn't a human attached to each of the sessions, I have each session send a message only after a random amount of time has passed. And, just to keep things interesting, there is a suitable sleep period whilst the imaginary typist is "entering" his or her message. This allows messages to build up in the queue. Whenever the sender is not "typing" or "sending" a message, it is executing a loop which constantly polls the outstanding message count. Blocking on a null message count would prohibit the sender from sending a message. Of course, OS/2 provides the ability of having two different threads, one of which could block on a null message count within the DLL, but that is not within the scope of this article.

Displaying of messages received takes place on a per-process basis. This can cause problems when the session does not currently have screen access. Eventually, when the internal queue for the process fills up, not having access to the screen will cause it to block. When a process blocks, it stops fetching messages from the DLL queue. Eventually that queue will fill up. When it does, another session will block when it attempts to add a message to the queue. This condition can cascade until all sessions are blocked.

Therefore, before any session sends a message, it checks to determine if room exists in the queue. However, OS/2 is a multi-tasking operating system. Therefore, a routine must not be interrupted between the time it determines there is room on the queue and the process of actually adding the message to the queue. Two specific alternatives exist to get around this problem: the first is to call DosEnterCritSec (paired with DosExitCritSec), which suspends the other threads of the calling process while the critical section runs - rather drastic, inherently ugly, and no protection against threads belonging to other client processes.

The other, and the one I used in DLL_CHAT, was to set up a globally accessible RAM semaphore and to claim the semaphore immediately upon entry to the "add a message" routine. Other processes attempting to add a message would programmatically block on this flag and would wait in a loop for it to free up, or for a certain amount of time to pass. If the flag didn't change within the specified time-out period, then an error condition would be returned to the calling task.

I used a little trick here which the optimization of the MSC 5.0 compiler makes easy. I set the initial value of the RAM semaphore to binary 1111111111111110 (0xFFFE). With a simple right shift of one bit position, I can simultaneously read the current status of the semaphore (the bit shifted out) as well as reserve it for my own usage if it is not in use. When I grab the semaphore, I immediately set it to all 1's (since the right shift causes the topmost bit to be set to a zero in 80286 architecture), forcing subsequent right shifts to not only see the semaphore is in use, but to do so without having to "turn off interrupts" or indicate it is a critical section.

This will only work when the word is right shifted in place: if your C compiler does not generate this as an in-place shift, then this will not be a safe way for you to manipulate the semaphore. It's an easy operation to do with an assembler routine, though, in any case. [Reed - Which optimization switches cause the SHR directly to memory?]
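The bit logic of the shift trick can be modelled in portable C. This is a sketch of the logic only, under assumptions: the names sem_try_acquire and sem_release are hypothetical, and the two-step read-then-shift below is NOT atomic the way a single SHR on a memory operand is, so it must not be used as a real lock.

```c
#include <assert.h>
#include <stdint.h>

#define SEM_FREE 0xFFFEu   /* 1111111111111110: low bit clear means "free"  */
#define SEM_BUSY 0xFFFFu   /* all ones: the next shift sees "in use"        */

/* Model of "shift right by one and look at the bit that fell out".
   Returns 1 if the caller now owns the semaphore, 0 if it was in use. */
int sem_try_acquire(uint16_t *sem)
{
    unsigned shifted_out;

    assert(sem != 0);
    shifted_out = *sem & 1u;          /* the bit SHR would leave in the carry flag */
    *sem = (uint16_t)(*sem >> 1);     /* a zero enters at the top, as on the 80286 */

    if (shifted_out == 0) {
        *sem = SEM_BUSY;              /* owner immediately sets all ones */
        return 1;
    }
    return 0;
}

/* Owner gives the semaphore back by restoring the initial pattern. */
void sem_release(uint16_t *sem)
{
    assert(sem != 0);
    *sem = SEM_FREE;
}
```

One thing this model makes visible: every failed attempt also shifts a zero in at the top, so after sixteen failed polls the word would read as free. A real implementation along these lines would presumably need the waiter to restore the all-ones pattern after a failed attempt, or bound the number of polls with the time-out.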

Finally, the logout routine. When the session gets a quit command from the keyboard, it immediately passes control to the DLL logout routine. This sets the above mentioned RAM semaphore, then proceeds to loop through the outstanding message list. For each outstanding message, it sets the flag as if the process had already received the message. After each flag word has been so set, it is examined to determine if it has been read by all processes. If so, it is removed from the queue.

Each message on the queue is a member of a linked list, and its memory is allocated from the global memory pool. When removing a message from the queue, the neighbouring messages it points to are modified to point to each other, then the memory is deallocated.
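A minimal sketch of the unlink-and-free step, together with the logout sweep, might look like the following. It assumes a doubly linked list and a 16-bit flag word per message; struct msg, queue_add, queue_remove and chat_logout are hypothetical names, calloc/free stand in for OS/2 global memory allocation, and the semaphore handling is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct msg {
    struct msg *prev, *next;
    uint16_t    flags;        /* one bit per session; all ones = everyone has it */
    char        text[80];
};

struct msg_queue {
    struct msg *head, *tail;
    int         count;
};

/* Append a message with the given initial flag word; NULL if out of memory. */
struct msg *queue_add(struct msg_queue *q, uint16_t flags)
{
    struct msg *m = calloc(1, sizeof *m);
    if (m == NULL)
        return NULL;
    m->flags = flags;
    m->prev = q->tail;
    if (q->tail) q->tail->next = m; else q->head = m;
    q->tail = m;
    q->count++;
    return m;
}

/* Make the neighbours point at each other, then release the memory. */
void queue_remove(struct msg_queue *q, struct msg *m)
{
    assert(q != NULL && m != NULL);
    if (m->prev) m->prev->next = m->next; else q->head = m->next;
    if (m->next) m->next->prev = m->prev; else q->tail = m->prev;
    q->count--;
    free(m);
}

/* Logout sweep: mark every message as received by `id`, drop finished ones. */
void chat_logout(struct msg_queue *q, int id)
{
    struct msg *m = q->head;
    while (m != NULL) {
        struct msg *next = m->next;          /* save before m may be freed */
        m->flags |= (uint16_t)(1u << id);
        if (m->flags == 0xFFFF)
            queue_remove(q, m);
        m = next;
    }
}
```

Saving the next pointer before the removal is the detail that is easy to get wrong: once a message has been freed, its links can no longer be followed.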

Well, that is the basic design of the DLL_CHAT program. Now for the bad news.

Caveats and Warnings
It's not really as bad as all that, but there are a few things you have to be aware of when you're designing your DLLs.

Above I mentioned some extraordinary lengths I went to in the original design to assure that certain areas of the code are protected against two "competing" tasks attempting to access it at once.

This is a problem inherent in any multi-processing system. Typically, it's called a "re-entrancy" problem, that is, a piece of code being entered by a calling process before another process has finished with its call. Using semaphores, as I did, is effective in most circumstances. But, the method I chose was not the optimal method.

Consider what happens if the session currently executing the semaphore routine happens to be interrupted by some high priority event (perhaps a keystroke, or, if attached to a comm port, the modem losing carrier). There is no guarantee it will return to where it left off. Yet, if it doesn't return and finish the routine, then the semaphore will forever be marked as in use.

OS/2 does, however, provide an alternative if you use one of the system semaphores. The semaphore is created with a DosCreateSem call, which returns a semaphore handle (similar to a file handle). By using other semaphore calls, a process can effectively keep the re-entrancy problem from occurring. In the event that a process which "owns" the semaphore at that time (and therefore is blocking others waiting on it) gets killed for some reason, even unintentionally, the system will effectively call DosCloseSem, which will clear the semaphore if set and restore it as a system resource if there are no other references to it.

In this application, I chose not to use system semaphores, since there would be frequent system calls with heavy overhead, and the likelihood that a process would be killed unintentionally was pretty small. However, this also meant that I had to ensure that a client program "dying" would die only after relinquishing control of the semaphore.

Therefore, I use the OS/2 system call (DosExitList) to add my own specific routine to my exit list, that is, the list of routines which OS/2 will execute on my behalf between the time the client program dies, and the time it is buried. This routine simply calls the logout procedure, which in turn will reset the system-wide flagword, the bits in each message, and finally clean up the message base and any outstanding semaphores.
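The idea of an exit-list routine can be illustrated with the standard C atexit function, used here purely as a portable stand-in; it is not the OS/2 call, and it does not run in all the situations an OS/2 exit list covers. The names chat_register_exit_logout and chat_logout_current are hypothetical, and the logout body is reduced to bookkeeping.

```c
#include <assert.h>
#include <stdlib.h>

static int g_session_id   = -1;   /* -1 means "not logged in" */
static int g_logout_calls = 0;    /* how many real logouts have happened */

/* Hypothetical stand-in for the DLL logout routine. It is safe to call
   more than once: only the first call after a login does any work. */
void chat_logout_current(void)
{
    if (g_session_id >= 0) {
        g_logout_calls++;         /* real code would sweep the message queue */
        g_session_id = -1;        /* and release any semaphore it still held */
    }
}

/* Remember the session and arrange for logout at program termination.
   Returns 0 on success, -1 if the handler could not be registered. */
int chat_register_exit_logout(int session_id)
{
    static int registered = 0;

    assert(session_id >= 0);
    g_session_id = session_id;
    if (!registered) {
        if (atexit(chat_logout_current) != 0)
            return -1;
        registered = 1;
    }
    return 0;
}
```

Making the logout routine idempotent matters here: a client that logs out normally and then terminates will have the routine run a second time from the exit list, and that second call must be harmless.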

When designing a DLL, you should always keep in mind worst case scenarios: what would happen if this line of code were running while ten other processes were running *those* ten different lines of code. Since you cannot really control what the other processes might be doing as they start to execute common areas of code, it is better to design the code as modularly as possible, and be sure to semaphore around areas sensitive to multi-tasking happening at just the wrong time. Chances are that it will!

Remember that, not only must you program defensively against other processes using the DLL routines and their attendant data, but if you opt to use OS/2 threads, you'll have to protect against their re-entrant usage of the DLL routines (in fact, most of the considerations I'm advising you of regarding DLL's can also be of importance when designing a threaded program).

When speaking about unanticipated or asynchronous interruptions, you should be thinking about signal catching. And about not doing it in a DLL!

If you're going to use the system to set a routine to catch a particular asynchronous event (such as program termination, or control-C trapping done with the DosSetSigHandler system call), doing it in the DLL can be dangerous. The concept of "resource" is the one which plays a critical role here. The question is, who owns the "resource" of a signal catcher in a DLL? Remember that the code is re-entrant, and that trying to determine the death of the last client member for the DLL can be tricky: especially if the signal catcher for client process termination is within the DLL itself. [Why? What portion of OS/2 doesn't allow it?]

On a similar basis, it is probably a good idea to stay away from DosError (which allows a process to suspend hardware error processing) and DosSetVec (which lets your exception handler be called when certain conditions, such as attempts to execute an illegal opcode, occur).

If you must include such calls in your code, be sure to thoroughly isolate those portions of the code from other client members of the DLL's, and to preserve all aspects of your process state. Be sure to terminate "normally", too, not in some unique way, since DLL's have some special characteristics which are taken care of properly in automatic exit list processing upon client death.

Ramifications of what happens if you "signal out" of a DLL instead of "normal" termination include the possibility that the "active" count of the selectors which the DLL has used will not be updated properly. The DLL may still be considered by OS/2 to have some client members accessing it, since process termination was handled by a signal handler of your own design which doesn't know how to update the DLL active client count. [What signals apply on a global basis versus local? What happens to a signal catcher when its owner dies?]

Design Considerations
When designing your program to use DLLs, there are a few things you'll have to be careful of in your initial program design. First, access to all DLL routines is through a far call. So, although you can use the small memory models if you wish in the client section of your code, and in the DLL itself, the external definition of the DLL routines must indicate it is a far routine. As such, the routine itself must also indicate in its prototype that it is a far routine: otherwise the CALL and RET statement types won't match.

What about the data allocated in the DLL? That, too, must be addressed as far data from the client routines. Locally, within the DLL, it may be addressed as near or far as required.

Before your client code ever executes, the initialization routine for the DLL will have already executed. Expecting any initialization by the main routine in your client code would be premature. Therefore, your DLL initialization code should only access data within the DLL itself, since the startup code may not have even allocated memory as of yet! [Exactly where does the DLL init routine get called from?]

What of the differences between global and instance data items? Well, obviously, they can be confusing concepts, since each DLL module has no easy method of determining whether the data space it is using is private or common to all tasks. This can be tricky, since many programmers routinely use temporary pointers to objects which they place in "global" data space instead of allocating them on the stack for local usage.

It is important to recognize the differences here between global data (such as items defined and allocated outside the scope of any routine in 'C') and globally accessible data. In the first case it really is "local" data, that is, data local to the client process itself and not accessible to other clients of the DLL. In the second case it is accessible to all clients of the DLL. And that can fool you if you're not careful: you must be sure that globally accessible data items don't change value when you're not looking! Keeping items to a local stack frame is probably the safest bet. Items which are kept around without changing value are best kept in private client data space.

You can easily indicate through the DATA statement in the EXPORT file which data segments you wish to be allocated on a private per-client basis and which ones you wish globally accessible. If a segment is marked as READONLY, then it is globally accessible. The MSC data group named CONST should always be marked as READONLY: that allows for only one copy of literal strings to be loaded for the entire DLL.

This brings up another interesting topic: using a C compiler to create DLLs. There were some rumors floating about for a while that this was impossible since the stack segment (SS) did not equal the data segment (DS) upon entry into a DLL. Since the library has many routines which expect them to be equal, it at first appeared that creating DLLs in C was blatantly forbidden. Not so!

Using Microsoft C to Create and Use DLLs
There are several specific enhancements available in the MSC 5.1 C compiler which make writing C DLLs very easy.

A new pragma, #pragma data_seg, allows you to specify for any function that later loads its own data segment, exactly which data segment to use. By specifying the data segment as:

    #pragma data_seg (segment_name)

you not only make things easier for using DLLs, but you have more control over which data segment all initialized static and global data will reside in. The default data segment name if you don't specify one is the one used by DGROUP, which depends upon the memory model you use.

This is half of the solution of which data segment to use in the DLL. The other half is to specify the called function as one which uses the previously saved data segment with the _loadds keyword. Upon entry into a _loadds function, the current DS register is saved, the last one specified in the #pragma data_seg is written into it, the function executed, and the saved DS restored upon exit. This is not such a new concept, since you've had the ability to use /Au as a compiler option for quite some time now, but this allows you to specify some capability on a function by function basis.

In order for the compiler to know, in advance, that the routine is going to be part of a dynamic link library, the new keyword _export has been added. In particular, if the function is one with an IO privilege level associated with it, then the number of words to reserve for the privilege level transition stack copy operation can be easily calculated at compile time if the _export keyword is used. In fact, if you use the _export keyword as part of your function definition, the number of words to reserve as indicated in the DEF file is ignored. [Is this TRUE? I haven't figured out how to verify it yet...]
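Putting the pragma and the keywords together, an exported DLL routine might be declared roughly as below. This is a sketch under several assumptions: the segment name CHAT_DATA and the function names are hypothetical, the order of the far, _loadds and _export keywords should be checked against the MSC 5.1 documentation, and the M_I86 test assumes the compiler's usual predefined macro. On any other compiler the keywords are defined away so the fragment still builds as ordinary C.

```c
#include <assert.h>

#if defined(M_I86)
/* 16-bit Microsoft C: far entry point that loads its own DS and is exported. */
#define DLLENTRY far _loadds _export
#pragma data_seg (CHAT_DATA)      /* hypothetical segment name */
#else
/* Any other compiler: the keywords mean nothing here, so define them away. */
#define DLLENTRY
#endif

/* Initialized static data, so that it lands in the named data segment. */
static int login_count = 0;

/* Exported routine: record one more login and return the new count. */
int DLLENTRY chat_note_login(void)
{
    return ++login_count;
}

/* Exported routine: record a logout and return the remaining count. */
int DLLENTRY chat_note_logout(void)
{
    assert(login_count > 0);      /* a logout without a login is a caller bug */
    return --login_count;
}

/* Exported routine: report how many sessions are currently logged in. */
int DLLENTRY chat_login_count(void)
{
    return login_count;
}
```

The client side would need matching far prototypes for these routines, so that the CALL it generates agrees with the far RET in the DLL.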

When setting up the various data segments into their constituent types (SHARED, READONLY, etc), you should also take a look at the map file produced from the link: some additional segments might be created which you hadn't thought about. In particular, some NULL segments are created for each group as _CONST, and _BSS. In order not to confuse the linker, each member of the group should be specified within the SEGMENTS section of the EXPORT DEF file, and you need only mention the "special" segments: those with attributes different from the default setting of the DATA statement. [Is this true?]

Creating the Initialization Routine for Your C DLL
Remember that the DLL, once passed through the linker, looks much like an EXE file. In fact, the same load routine used for your own client module is used to load the DLL itself. And, if you've defined an initialization routine within the DLL, it will be executed almost as if it were a stand-alone routine: it is called immediately after the DLL is first loaded, and is called again upon subsequent loads of the DLL only if you specify so.

You can easily tell the loader where the initialization routine is located by including a small assembly language routine as part of your DLL, and linking it into the DLL when you do its link. In fact, it probably is not a bad idea to have a module similar to the one in Figure xINIT, and to always name your DLL initialization routine the same. The secret of the initialization routine? Simply the fact that the only "program" the loader will find is the one which is addressed by the 'END START' directive!

The MS C compiler throws a small monkey wrench in your path, as well. Meaning to be helpful, the compiler throws a usage of the _acrtused variable into each object module. This forces the linker to be sure to include some of the startup routines from the C run-time library into the eventual output of the linker (which the compiler thought was going to be a normal EXE file). To prevent this code from being loaded into your DLL, you should define the variable yourself, as external data in a 'solo' segment: int _acrtused = 0x1234; or some number of particular meaning to you.

Additionally, in order to have global data items show up in the named segment for the particular object module you're linking, they should either be initialized, or declared as static. Or both. [What about using const?]

When writing your own DLL, you'll also want to use the -Gs switch on the MSC compiler to disable stack checking. Aside from the slight added efficiency you'll gain (slightly smaller code and one less function call per function), this is a requirement for the DLL since the stack segment is different for each client process and the size of the stack may vary on a per client basis as well. In the few places where you really need to add stack checking, MSC provides you with an abundant set of #pragma's and routines.

MSC 5.1 also includes some very welcome additions to the run time libraries package. Three new libraries exist for working with programs requiring support for multi-thread, and for DLLs with single thread and multi thread. A couple of changes which will affect you are the subtle differences such as errno now being a macro which translates into a function call: a table must now be used somehow in the functions of the run-time to enable a single run-time package to handle errors from multiple sources. [Assuming there is a table, how is the offset into the table created? PID?]
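One way such an errno macro could work is sketched below. This is a guess at the general technique, not a description of the MSC 5.1 run-time: the table, the slot index and the names my_errno, my_errno_location and set_current_thread are all hypothetical, and the "current thread" is simulated with a plain variable rather than asked of the operating system.

```c
#include <assert.h>

#define MAX_THREADS 8                    /* assumed table size */

static int errno_table[MAX_THREADS];     /* one error slot per thread */
static int current_thread = 0;           /* simulated; a real library would
                                            ask the system who is running */

/* Test hook standing in for a thread switch. */
void set_current_thread(int id)
{
    assert(id >= 0 && id < MAX_THREADS);
    current_thread = id;
}

/* Return the address of the calling thread's private error slot. */
int *my_errno_location(void)
{
    return &errno_table[current_thread];
}

/* The "variable" is really a function call that yields an lvalue, so
   existing code that reads or assigns it keeps compiling unchanged. */
#define my_errno (*my_errno_location())
```

Because the macro expands to a dereferenced pointer, both `my_errno = 5;` and `if (my_errno == 5)` work as before, while each thread sees only its own value.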

Additionally, the new DLL run-time libraries allow you to use any of the functions you've grown accustomed to. Although I have not tried each and every function, I trust that MS would have specifically mentioned any there might be a problem with. [Er... hopefully this is so?]

By the look of things and how they operate, it is probably safe to assume semaphoring was used throughout the library - this to keep a call using a globally accessible variable from being clobbered by two client processes trying to use it simultaneously. This forces a heavy overhead in system calls on frequently called routines, but one about which there is little choice. Remember that the libraries had to be written under a worst case scenario, and you pay a penalty in speed and efficiency for the safety inherent in putting semaphores around the "dangerous" routines.

[SideBar/Figure of printf being interrupted with and without semaphores...]

Conclusion
With the introduction of DLLs in OS/2, another programming environment was created. Much like WINDOWS programming, it has its own strict rules. These rules, however, make a great deal of sense once the underlying design concept and limitations of both the chipset and of the appropriate portions of OS/2 are better understood.

You can avoid a lot of these sticky problems by piece-at-a-time programming: get as much of your program to work using routines in a more "normal" library (using the library utilities), then move the routines out into a DLL. Then by adding the additional functionality and safeguards required for shared memory access between sessions and re-entrancy problems, you'll be able to easily create a program which uses up less disk space, less memory space, and allows for inter-process communication in whatever manner *you* wish to design. Not a bad feature at all for a new operating system to be written around.

And, once you've been through the maze of twisty little passages once, the next time it isn't so hard to get through it rapidly and collect that treasure. The secret is just knowing a couple of key phrases. And thinking ahead before you enter the maze.

Figure xDEF

 * STUB 'filename': Allows you to specify the name of a DOS 3.x file to be run if this file is run under DOS instead of under OS/2.
 * PROTMODE: Indicates that this file can only be run in Protected Mode. An aid to the linker.
 * OLD: This statement allows you to preserve the names associated with ordinal numbers in a multi DLL environment. I haven't really figured out a use for it yet, either.
 * REALMODE: The opposite of PROTMODE, this indicates the program can only be run in real mode. An aid to the linker.
 * EXETYPE: Ensures that the specified operating system is the current one for the program. You can specify OS2, WINDOWS, or DOS4. DOS4?!?!? Yep. More on this in a later article.
 * HEAPSIZE: Determines how much local heap must be allocated within the automatic data segment.
 * STACKSIZE: Allows you to specify how much space should be reserved in the stack segment when the program is run.

Figure xINIT
        EXTRN   INITROUTINE:FAR
        ASSUME  CS:_TEXT
_TEXT   SEGMENT BYTE PUBLIC 'CODE'
START   PROC    FAR
        call    INITROUTINE     ; the real initialization routine
        ret
START   ENDP
_TEXT   ENDS
        END     START           ; defines auto-init entry point