OS/2 Miniaturization Contest

Secrets of the Programming Masters Revealed!

by Michal Necasek, September 2001

In August 2001 I came up with a crazy idea: I went on the comp.os.os2.programmer.misc newsgroup and announced an "OS/2 Miniaturization Contest", that is a contest for OS/2 programmers with the goal of creating the smallest possible program fulfilling a set of conditions. The important conditions were:
 * The program must be an OS/2 executable in the LX format (32-bit)
 * The program must print the message "I'm really small!" plus newline on the console
 * The program must run on OS/2 Warp 4 GA or newer without requiring any extra files

To make matters more interesting and attract programmers with various backgrounds, I announced originally two and later three categories:
 * Stock - any high level language, using only widely available tools. No assembly permitted.
 * High Octane Stock - same as above but assembly permitted.
 * Custom - anything is allowed as long as the program works and tools used in its creation will be published.

To minimize the amount of organization required and ensure fair contest, I had one idea which proved very fortunate: all contestants were to post only the sizes they achieved in given categories, requiring only the final winner to publish his or her source code and/or submit it to me. The sizes were posted continuously which ensured that everyone knew what the current minimum is and could (and did) try harder. I believe this resulted in executables which are as small as it gets.

Before I announced the contest I already had some ideas about how to create very small executables. But even I was surprised by the winning programs. The winners of the three categories were:
 * Stock - Knut St. Osmundsen, 276 bytes, using Watcom C, WLINK and LxLite.
 * High Octane Stock - Knut St. Osmundsen, 273 bytes, using ALP, ILINK and LxLite.
 * Custom - Martin Lafaix, 196 bytes, using mainly a hex editor.

Perhaps the most interesting result is that the C and assembly executables were not significantly different in size, contrary to initial expectations. This is mostly due to the fact that Watcom C provides very good control of generated object files, almost as good as assembly.

The most amazing is of course the Custom category winner. Especially if you realize that the size of the LX header is 196 bytes and the OS/2 loader will refuse to even try loading anything smaller than 196 bytes. This header-sized executable however prints a 17-byte message and returns to the system without crashing. I will not discuss this category in detail and instead refer readers to Martin Lafaix's dissection of his winning program (aptly named Fandango on Core).

Now that I'm done with the contest background, on to some juicy code. The code that I will present here is my own but it includes many ideas from the winners. I built all the code using Watcom 11.0c tools but it should be possible to adapt them for use with other assemblers/compilers, perhaps excluding the most advanced C code. I will explain all the compiler and linker switches that are not immediately obvious.

I assume that the reader has working knowledge of assembly, C, the x86 architecture and OS/2 executable format and execution environment. This is decidedly not beginner stuff although beginners might learn a trick or two, if they won't be too overwhelmed. If they are, well, come back in a year or two! I was once a beginner too.

High Octane Stock category
I will present the assembly category first as it is a lot more straightforward and doesn't use as many weird tricks as the C version. That is to say, in the assembly version everything is clearly visible whereas in the C version the tricks are hidden behind switches and pragmas.

Okay, so how do you print a message on the console using the OS/2 API? The first immediately obvious choice is DosWrite. But DosWrite has some serious drawbacks. It requires four parameters (ie. four DWORDs on stack) and it lives in DOSCALLS.DLL (the OS/2 kernel really) which has rather long name that will have to be embedded in the executable.

So let's look around some more. There happens to be another rarely used OS/2 API which we can use. It is DosPutMessage. It only has three parameters and even better, is located in MSG.DLL (short module name).

The entire source of a program (called asm1.asm) using DosPutMessage looks like this:  .386p EXTRN  DosPutMessage:BYTE

_DATA      SEGMENT BYTE PUBLIC USE32 'STACK' _msg: DB "I'm really small!",0aH _DATA      ENDS

_TEXT      SEGMENT BYTE PUBLIC USE32 'CODE' ASSUME CS:_TEXT, DS:_TEXT, SS:_TEXT

startup: push   offset flat:_msg push   12H push   1 call   near ptr flat:DosPutMessage add    esp,0CH ret _TEXT      ENDS

END startup  As you can see, this program is very short and doesn't use any tricks. The only "trick" is that it doesn't use DosExit and just RETs instead. That will in effect execute DosExit with default parameters. After assembling with WASM and linking with WLINK you'll end up with a 545 byte program. The exact commands I used were wasm asm1.asm wlink file asm1 lib os2386 option st=32k The only perhaps not entirely obvious switch is op st=32k which sets the stack to 32 kilobytes. Without this, the program would only have 18 bytes of stack (the message it should print) and would crash right when starting up.

While 545 bytes is not that bad in the age of bloatware, it's easy to improve the size a lot using LxLite: lxlite /T /ZS:512 asm1.exe The tricky part here is stripping the MS-DOS stub program (the /T and /ZS options) which is 128 bytes large and significantly contributes to program's size. OS/2 is quite capable of running programs without the stub. MS-DOS won't be able to run them of course but that was not a requirement of the contest. LxLite does some further optimizations too and the result is a 325 byte program. This is much better than 545 but still nowhere near the winners. The masters apparently have some aces up their sleeves. So let's take a look at them.

To achieve minimum sizes, it is necessary to attack on several fronts. The obvious one is to reduce program code. The non-obvious one is to reduce the LX structure overhead.

So let's start with something entirely different. There is one more way to reduce program size. While I believe there is no better way to print a message using the base OS/2 API, Warp 4 (the baseline for the contest) does provide one little known way which is even more suitable. Warp 4 comes with a C runtime DLL which is somewhat altered version of VisualAge C++ version 3 C runtime DLL. There are actually several versions of the DLL - LIBCS.DLL (single-threaded), LIBCM.DLL (multi-threaded) and LIBCN.DLL (subsystem). These provide vprintf and puts which both only require a single argument. I chose puts because it allows for further optimization in leaving out the newline from the message (puts prints newline automatically). However the ordinal of vprintf requires one less byte of storage in the import table of the resulting executable so the end result is the same. Anyway here's asm2.asm:  .386p EXTRN  puts:BYTE

_DATA      SEGMENT BYTE PUBLIC USE32 'STACK' _msg: DB "I'm really small!" _DATA      ENDS

_TEXT      SEGMENT BYTE PUBLIC USE32 'CODE' ASSUME CS:_TEXT, DS:_TEXT, SS:_TEXT

startup: mov    eax,offset flat:_msg jmp    near puts _TEXT      ENDS

END startup  Note that puts (and vprintf too) require a NULL terminated string, unlike DosPutMessage. But because we know (don't we?) that the data segment is guaranteed to be initialized with zeros, we can leave that out. Another important thing to remember is that puts uses the _Optlink calling convention with parameters passed via registers and not stack.

This program uses one other interesting optimization. Instead of calling puts, it jumps to it. When puts returns, it will return directly to the caller of our program which saves us a RET instruction. Even better, the fixup for puts is initialized with zeros in the executable image which allows the linker and/or LxLite to save further 4 bytes. The executable is now 318 bytes big. Enjoy the beauty of a program that is two instructions long and actually does something, without even crashing. Now the ride starts getting real interesting.

To get to the winning range, we need to get rid of some 45 bytes of excessive fat. This might seem impossible because the program already is quite small. So let's get nasty and do things little kids aren't allowed to. It's really simple but very non-obvious, as asm3.asm shows:  .386p EXTRN  puts:BYTE

_TEXT      SEGMENT BYTE PUBLIC USE32 'STACK' ASSUME CS:_TEXT, DS:_TEXT, SS:_TEXT _msg: DB "I'm really small!",0

startup: mov    eax,offset flat:_msg jmp    near puts _TEXT      ENDS

END startup  Yes, this program has only one segment! This segment doesn't have the executable flag but it runs just fine and that's important... what's more important, it's only 283 bytes. We're in the winning range! So where do we save more bytes? There are two ways. One of them is extremely simple: rename the executable! The module name is stored within the executable and to save precious bytes, you can use an empty name (.exe). Another way to save space is the first instruction of the program. The MOV takes five bytes in the executable whereas the JMP only takes one byte (because the zeros of the address fixup are left out).

We happen to know what the MOV instruction will put into EAX. It will be 10000H (64K) because that's the linear address where the executable will always be loaded. Knut St. Osmundsen came up with the most elegant method of loading 10000H into EAX in 3 bytes of code. This is the final version of the program source (asm4.asm):  .386p EXTRN  puts:BYTE

_TEXT      SEGMENT BYTE PUBLIC USE32 'STACK' ASSUME CS:_TEXT, DS:_TEXT, SS:_TEXT _msg: DB "I'm really small!",0

startup: dec    ax    inc     eax jmp    near puts _TEXT      ENDS

END startup  Here are the commands I used to build the executable: wasm asm4.asm wlink f asm4 n .exe imp puts LIBCS.362 op st=32k ren .exe asm4.exe lxlite /T /ZS:512 asm4.exe For those unfamiliar with WLINK directives, the imp puts LIBCS.362 directive tells the linker that symbol puts is imported from library LIBCS as ordinal 362. We could have used LIBCM just as well but the mini-program is single threaded so LIBCS works just fine. As for the ordinal, it is easy to find it with EXEHDR. It would have been possible to use an import library but using the import directive doesn't require one to actually have the import library. This way even the few poor souls who still don't have the Warp 4 Toolkit can build the program.

The trick is the n .exe directive (Name). That tells WLINK to use an empty module name. The final size is now 274 bytes! This is as far as the Watcom tools will go (at least I think so). Knut's winning program is one byte smaller but that's due to difference between executables produced with WLINK and ILINK, there is no way to control it. And hey, 274 bytes isn't bad either!

It is interesting to note that the entire program source code is very small and even the commands used to build it aren't very complex. But this is deceptive because the program implicitly relies on a number of features of the x86 architecture as well as the OS/2 32-bit execution environment.

Stock category
In the preceding section I revealed all the important trick used to achieve microscopic executable size. Now we will try to achieve similar results without using assembly. This requires rather good knowledge of compiler specifics, such as various obscure switches and pragmas. I will jump directly to the final code and then dissect it. Meet mini.c:  void puts(char *s);


 * 1) pragma data_seg("MYDATA", "STACK")
 * 2) pragma code_seg("MYDATA", "STACK")

char msg[] = "I'm really small!";

void _System startup(void) { puts(msg); }  This is built with the following commands: wcc386 -s -g=DGROUP mini.c wlink sys os2v2 name .exe f mini imp puts_ LIBCM.362 op start=startup,st=32k,nod ren .exe minic.exe lxLite.exe /T /ZS:512 minic.exe The resulting executable, minic.exe, is only 276 bytes large! How is that possible? Closer inspection reveals that the only difference between the C and assembly versions is the program code. The C program uses code equivalent to this: mov    eax,offset flat:_msg jmp    near puts Yes, the compiler is pretty damn clever and optimized the code as much as it could! Because the compiler has no idea where the code will finally end up, it cannot replace the offset to the message with a constant. Apart from this, the C version uses all the tricks explained in the assembly section.

The tricky part here is convincing the compiler and linker to build this program. To achieve that, two rarely used pragmas are needed. The #pragma data_seg and code_seg are fairly standard (in that most compilers support some equivalent functionality) and determine the segment where the data and generated code will be placed. In this case the code and data segments are the same of course. Why do they have class STACK? That's because WLINK requires a segment with that name to be present when linking the final executable.

Now for the compiler switches. The -s (turn off stack checking) is pretty obvious. Having stack checking on would increase code size and what's much worse, it would require the C runtime to be present. That's out of the question. The -g switch controls the group where the code segment will be placed. This has to be DGROUP, otherwise the linker couldn't merge the data and code segments. Fortunately Watcom is pretty flexible here.

The linker switches are similar to the ones used for the assembly program although two new ones were added. The NOD option is a shorthand for NODefaultlibs and turns off default library searches (there are special records in the object files generated by the compiler). Similar effect could be achieved using the -zl compiler switch.

But the key to success is the start option. In the assembly code we used the END directive to determine the program's entry point. There is no equivalent in C, although IBM C supports #pragma entry. Watcom C has no such pragma but allows us to specify the program entry point via the start=symbol option of WLINK.

The final version of the C code looks perhaps more complex than the assembly version but still not too complex. Rest assured that it is the result of many hours of intense fight with the compiler, trying to find the right mix of ingredients that would force the compiler and linker to do something it rather wouldn't do. Funnily enough, the result of the battle isn't even very big.

Conclusion
I hope this short treatise on program miniaturization has amused you or at least shocked you. In the best case you have learned some tricks that you can apply to your own programs, just like I did. The morale of the story is, if you try hard enough, you can achieve more than you could ever imagine (yeah, I know it sounds silly)! Oh and by the way, if you know some more tricks that can be used to achieve even smaller executable sizes, be sure to let me know at MichalN@prodigy.net.