The Problems and Challenges of the Graphical User Interface

Dr. Jim Thatcher
 * Manager, Interaction Technology
 * Mathematical Sciences Department
 * IBM Research


 * National Federation of the Blind
 * November 4, 1993

Introduction
Blind computer users have been worried about access to Graphical User Interfaces (GUIs) for several years. And who can blame them?

They have been shut out of access to graphics programs like Flight Simulator for DOS, and from certain parts of even their favorite text-based programs. The preview facility of WordPerfect and the graph facility of Lotus 1-2-3 both enter graphics mode and become inaccessible. But blind users may be willing to get along without access to that very small fragment of the computing environment.

The graphical user interfaces in Microsoft's Windows under DOS, in IBM's Presentation Manager under OS/2, X Windows under UNIX and the Apple Macintosh are quite a different story. In these environments all programs (including Lotus 1-2-3 and WordPerfect) run all the time in graphics mode. The environment is radically different, especially for blind users and the people who develop access technology for them.

In this article, I will first explain the difference between text-mode and graphics-mode computing. Then I will discuss the advantages of the graphical user interface - advantages shared by blind and sighted users alike. Together with these advantages, I will present my view of how a screen reader should respond to this new environment.

Turning to issues even more specific to screen readers, I will then discuss two problem areas that I see as critical to the continued evolution of screen readers for graphical user interfaces. The first relates to automatic announcement of error or status messages; the second is what I refer to as the "active point issue."

Text-Mode compared to Graphics-Mode computing
Let's take a look at the two different computing environments.

Text-mode computing is simple and accurate. It uses a model of the display that is a twenty-five by eighty array(1) of pairs of numbers. The first number in the pair is the ASCII value of the character that appears in that position of the screen, and the second number is the attribute - it gives the foreground and background colors and tells whether or not the character is blinking.

For example, if the background is blue and there is a white 'I' in the upper left corner of your text-mode display, then the first pair of numbers in the array will be <73, 31>. 73 is the ASCII value of capital I, and 31 is the number that represents white on blue.
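The pair <73, 31> can be decoded in a few lines. The following is a minimal sketch in Python, assuming the standard PC color text attribute layout (low four bits for the foreground color, next three bits for the background color, top bit for blinking). The function name decode_cell and the color-name table are illustrative only and are not taken from any screen reader.

```python
# Color names for the 16 standard PC text-mode colors, indexed by color number.
COLORS = [
    "black", "blue", "green", "cyan", "red", "magenta", "brown", "light gray",
    "dark gray", "light blue", "light green", "light cyan", "light red",
    "light magenta", "yellow", "white",
]

def decode_cell(char_code, attribute):
    """Decode one <character, attribute> pair from text-mode display memory.

    Assumed layout: bits 0-3 foreground, bits 4-6 background, bit 7 blink.
    """
    return {
        "char": chr(char_code),
        "foreground": COLORS[attribute & 0x0F],
        "background": COLORS[(attribute >> 4) & 0x07],
        "blinking": bool(attribute & 0x80),
    }

# The example from the text: a white 'I' on a blue background.
cell = decode_cell(73, 31)
```

Under these assumptions, 31 splits into foreground 15 (white) and background 1 (blue), with the blink bit off.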

In text-mode computing everything is stored in display memory and it is stored there in a useful form. This is what I mean by text-mode computing being accurate. It is accurate because the display hardware uses only display memory to form the image on the screen. Display memory actually contains this array of pairs of numbers. This is what gets displayed, nothing more, nothing less.

The ASCII numbers that represent characters are exactly what a screen reader sends to the synthesizer for it to translate ASCII text into speech. When 73 is sent to the text-to-speech device, it says 'I'.

For graphics-based computing, there still is display memory, but now the numbers in that display memory only represent pixels(2), which are just dots of color.

For example, a white on blue 'I' is made up of about 128 pixels - some are the color blue, others - comprising the I - are white. The display memory holds only pixels. It contains no ASCII values.
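To make the contrast concrete, here is a small sketch that assumes an 8 by 16 pixel character cell, one common size that yields the 128 pixels mentioned above. The glyph shape and the color names are invented for illustration. Nothing in the resulting pixel list identifies the character as an 'I'.

```python
# An invented 8x16 glyph for 'I': '#' marks a foreground pixel.
GLYPH_I = [
    "........",
    "........",
    ".######.",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    ".######.",
    "........",
    "........",
    "........",
]

def render(glyph, foreground, background):
    """Turn a glyph into the flat list of pixel colors that display memory would hold."""
    return [foreground if ch == "#" else background
            for row in glyph for ch in row]

pixels = render(GLYPH_I, "white", "blue")
pixel_count = len(pixels)             # 8 * 16 = 128 pixels
white_count = pixels.count("white")   # the pixels that make up the letter
```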

Several methods of reading graphics-mode display screens are being discussed.

The first method would be to use character recognition, or better, document recognition to figure out what is on that display. I believe that today character recognition is feasible for a static screen, but not for the constantly changing screens we find in a computing environment.

The second solution is to create what Berkeley Systems first called an Off-Screen-Model (OSM) when, in November 1989, they introduced OutSpoken for the Macintosh, the first screen reader for a graphical user interface.

To the best of my knowledge, all screen readers for graphical user interfaces use some form of OSM. The idea of the off-screen-model is to intercept everything that is going to the display before it becomes pictures, and record all relevant information in a separate data structure or database, called the off-screen-model. The information recorded there will include the text, its position, color, font, and window handle(3).

That is the minimum the OSM will contain. Different screen readers will contain more and different information depending, in part, on the level at which the drawing calls are intercepted.

Once you have the off-screen-model, a screen reader can be built that determines the text and icons on the display by consulting the OSM, rather than the display memory it used for text-mode computing.
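The off-screen-model idea can be sketched as a small data structure. The following Python sketch records the minimum information listed above (text, position, color, font, and window handle) and answers a simple screen reader query. The class and method names are hypothetical, and the step that actually intercepts drawing calls is assumed to exist elsewhere; a real OSM must also handle text that is erased, overwritten, or scrolled, which is omitted here.

```python
from dataclasses import dataclass, field

@dataclass
class TextRun:
    text: str
    x: int        # pixel position of the left edge of the text
    y: int        # pixel position of the top edge of the text
    color: str
    font: str
    window: int   # window handle: a tag that identifies the window

@dataclass
class OffScreenModel:
    runs: list = field(default_factory=list)

    def record_text(self, text, x, y, color, font, window):
        """Called by a (hypothetical) drawing-call hook before the text becomes pixels."""
        self.runs.append(TextRun(text, x, y, color, font, window))

    def text_in_window(self, window):
        """Return the text of one window in reading order: top to bottom, left to right."""
        runs = [r for r in self.runs if r.window == window]
        runs.sort(key=lambda r: (r.y, r.x))
        return [r.text for r in runs]

# Example: two windows drawing text in arbitrary order.
osm = OffScreenModel()
osm.record_text("Edit", 60, 4, "black", "System", window=1)
osm.record_text("File", 10, 4, "black", "System", window=1)
osm.record_text("Untitled", 10, 30, "black", "Courier", window=2)
menu_text = osm.text_in_window(1)
```

The screen reader never looks at pixels in this design; it asks the model which text belongs to which window and where that text sits.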

Advantages of GUIs
The idea of Common User Access (CUA, as IBM calls it) is good news for both sighted and blind users. Basically, one uses the same ways of navigating in many different applications. Text-mode programs were heading that way as they added menu bars, pull-down menus, dialogs, and the like. But navigation was still different in each of text-mode WordPerfect 5.1, Lotus 1-2-3, and Quicken. The GUI versions (OS/2 and Windows) of these applications do in fact have a common interface. The ways to get to menus, to move around menus, to pull down menus, and to interact with dialogs are all the same.

That common access is reflected in how the screen reader works. In addition to having a model of the display, the GUI screen reader is hooked into messages and actions of the GUI and so knows when menus are active or dialogs have appeared. Most screen readers for GUIs will speak all of these events automatically, without configuration of any kind.

What had become markedly complex and difficult for text-based computing (action bars, color bars, pop-ups) is now almost automatic. (In 1988, before OS/2 1.1 was released, I was showing a demonstration program that spoke menus, dialogs, entry fields, and window titles as the user moved around the Presentation Manager GUI. That is the easy part, and it can be done by any competent GUI programmer.) Not only are these standardized controls relatively easy for the screen reader; they are also the heart of common user access.

This common access includes help and documentation as well. In all applications I have seen for OS/2 and Windows, F1 will give context-sensitive help. That unification is truly welcome. Online documentation is the rule rather than the exception, and most applications use the GUI's information presentation facilities, so getting around that documentation will be familiar across different applications.

In summary, the use of standardized controls simplifies access for blind users and sighted users alike. In addition, that access is what had become so difficult with text-mode screen readers. These benefits, I believe, far outweigh the real and perceived difficulties in designing screen readers for the GUI environment, and blind users' concerns about mastering this new environment. It is my contention that the environment is, because of common user access, easier to master than was the text-based DOS environment.

Status Announcements - What was Easy is now Hard
It seems to be the nature of things: just when the really difficult parts of screen access get easy (menus, popup dialogs, and the like), the easy things, like status messages, become difficult.

For me to talk about this it is easiest to refer to the facilities of IBM Screen Reader. The precursor to IBM Screen Reader, called PC SAID, had, in 1984, a concept of Autospeak with which the user could, in a profile, specify any part of the screen (or any expression) to watch, and when there was a change in that part of the screen or the value of that expression, PC SAID would take specified action, usually an announcement.

By the time PC SAID came out as the IBM Screen Reader product, other screen readers had basically the same function.

The use of autospeaks in all screen readers is more or less the same. The blind user needs to be notified of spontaneous status messages or error messages that appear somewhere on the display; maybe the top, maybe the bottom, but certainly not where the user is currently focused. Some examples are these kinds of messages: "String not found," "Unknown Command," "The system will go down in five minutes," "Drive A not ready," and "WAIT." For most text-mode applications, the position where that message appeared was fixed for the application, and with relatively simple configuration activities, the user could both hear the message when it appeared and review it as well. Remember: the text-based model is simple and accurate, so the message was to be found in row 23, column 16, or row 1, column 76, or maybe two rows above the cursor (as examples).
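A text-mode autospeak of this kind can be sketched in a few lines. The Python sketch below watches a fixed region of a 25 by 80 screen and announces its contents whenever they change. The class name, the 1-based row and column convention, and the announce callback are assumptions made for illustration; this is not the PC SAID or IBM Screen Reader profile syntax.

```python
class Autospeak:
    """Watch a fixed screen region and announce it when its text changes."""

    def __init__(self, row, col, length, announce):
        self.row = row          # 1-based row, as in "row 23, column 16"
        self.col = col          # 1-based column
        self.length = length    # number of character cells to watch
        self.announce = announce
        self.last = None

    def check(self, screen):
        """Call after each screen update; screen is a list of 25 strings of 80 characters."""
        start = self.col - 1
        text = screen[self.row - 1][start:start + self.length].rstrip()
        if text != self.last:
            self.last = text
            if text:                      # stay silent when the region is blank
                self.announce(text)

# Example: a status message appears at row 23, column 16.
spoken = []
watch = Autospeak(row=23, col=16, length=40, announce=spoken.append)
screen = [" " * 80 for _ in range(25)]
watch.check(screen)                                   # region is blank: nothing spoken
screen[22] = " " * 15 + "String not found".ljust(65)
watch.check(screen)                                   # message appears: announced once
watch.check(screen)                                   # unchanged: not repeated
```

The whole scheme depends on the message living at a fixed row and column, which is exactly the assumption that fails in a GUI.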

The situation for the GUI is totally different. Those status or error messages are still there. But their location is quite another question. To take just one example, the concept of row 23, column 16 is no longer relevant. The number of rows or columns in a graphical screen depends on how much text you have put there. Maybe a status message appears at the bottom of the window, but when there is no status message, the 'last line' may be the text you are currently typing.

A status message may consistently appear at some pixel position on the display. But windows can be moved, and they come up in different positions depending on the order of invocation. And relative positions are not good either because windows can be resized.

I am aware that all of this may seem somewhat mysterious, and even alarming to blind users. But I want to discuss it here in order to explain the IBM Screen Reader/2 solution to this difficult problem because I believe it is the only one currently available for GUI screen readers.

Those Autospeaks that were the key innovation of the first IBM Screen Reader for DOS have been generalized to be able to watch the results of procedures. Those procedures in our Profile Access Language (PAL) can be defined to search the window tree of the application for the status message, and return the text of that message (for example) if it exists. In this way the status message can be announced.
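The general shape of this solution can be sketched as follows. The example is written in Python rather than PAL, whose syntax is not shown here, and every name in it (Window, find_status_text, ProcedureAutospeak, the "status" window class) is hypothetical. It shows an autospeak that watches the result of a procedure, where the procedure searches a window tree for a status message, so that the announcement does not depend on any screen position.

```python
from dataclasses import dataclass, field

@dataclass
class Window:
    window_class: str
    text: str = ""
    children: list = field(default_factory=list)

def find_status_text(window, status_class="status"):
    """Depth-first search of the window tree for a non-empty status window."""
    if window.window_class == status_class and window.text:
        return window.text
    for child in window.children:
        found = find_status_text(child, status_class)
        if found is not None:
            return found
    return None

class ProcedureAutospeak:
    """Announce the result of a procedure whenever that result changes."""

    def __init__(self, procedure, announce):
        self.procedure = procedure
        self.announce = announce
        self.last = None

    def check(self):
        value = self.procedure()
        if value != self.last:
            self.last = value
            if value:
                self.announce(value)

# Example: an application window with an edit area and a status line.
status = Window("status")
app = Window("frame", children=[Window("edit", "Dear Sir,"), status])
spoken = []
watch = ProcedureAutospeak(lambda: find_status_text(app), spoken.append)
watch.check()                    # no status message yet
status.text = "String not found"
watch.check()                    # message found in the tree and announced
```

Because the search is by position in the window tree rather than by pixel or row, moving or resizing the window does not affect it.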

As I said at the beginning of this section, it is not easy, but it can be done. I look forward to other innovative interactive methods for accomplishing this task as screen readers for GUIs evolve.

The Active Point Issue
This is a subject which is not discussed by application or operating system developers or planners. This is, I believe, an issue peculiar to screen readers and screen enlargement software.

Any screen reader must be able to follow and describe the active point. This is because the blind user must know what keyboard actions will do, what the enter key will do, and where characters will be placed when entered from the keyboard.

It seems to me to be easiest to describe the active point issue with reference to text-based DOS computing. The cursor in that environment is usually the active point. The information about the cursor (position, shape, color) for text-based DOS computing is held in registers in the display hardware. A screen reader in that environment can know the cursor position by reading those registers.

For contrast, in the mid 80's, IBM had a 3270 emulator that ignored the cursor hardware and instead highlighted a single character. The active point in that emulator was that single highlighted character. The software was inaccessible to screen readers of that time because the effort required to find that highlighted character was just too great. Note that many screen readers of that time, especially IBM Screen Reader for DOS, could find that cursor character, but not in a useful or practical way.

In the late 80's, more and more DOS text-based programs had a CUA-type interface, with action bars, pulldown menus, and dialogs. These introduced the active point issue into text-based computing. Screen readers adopted methods of watching for highlight bar or color bar changes so as to track the active point. Some applications (like Lotus 1-2-3) always had the hardware cursor follow the highlight or color bar in an invisible mode. In these cases tracking the active point was a lot easier. Initially these applications were difficult for screen readers, but because the required information was available to the screen reading program, those screen reading programs adapted.
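Highlight bar tracking in text mode can be sketched as a scan of the attribute values. The Python sketch below assumes the screen reader has been configured with the attribute value that the application uses for its highlight bar; the function name and the sample attribute values (31 for normal text, 112 for the highlight) are illustrative assumptions, not taken from any product.

```python
def find_highlight(chars, attrs, highlight_attr):
    """Return (row, col, text) for the first run of cells drawn with highlight_attr.

    chars is a list of strings (one per screen row); attrs is a parallel list
    of lists of attribute numbers. Rows and columns are 0-based. Returns None
    when no cell uses the highlight attribute.
    """
    for r, row_attrs in enumerate(attrs):
        for start, attr in enumerate(row_attrs):
            if attr == highlight_attr:
                end = start
                # Extend the run to the right while the attribute still matches.
                while end + 1 < len(row_attrs) and row_attrs[end + 1] == highlight_attr:
                    end += 1
                return r, start, chars[r][start:end + 1]
    return None

# Example: an action bar with "Edit" highlighted.
chars = ["File  Edit  View"]
attrs = [[31] * 6 + [112] * 4 + [31] * 6]
active = find_highlight(chars, attrs, 112)
```

This works only because display memory holds both the characters and their attributes in a known form.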

For the graphical user interface, the active point issue is of special concern. There may be absolutely no way for a screen reader to detect the active point. The reason is that the active point (the cursor, insertion bar, highlight, or selected item) is indicated with some graphical object: a line, a box, or a color change. The ways to draw such an object on the display are practically unlimited. The graphical screen reader depends on knowing the drawing method. This means that it is quite possible, as we know, for a screen reader to come out as a product, and the next application (one not yet tested) could be completely inaccessible because a cursor or highlight is drawn in a way not yet imagined by the screen reader developer.

I can conceive of the following approaches to address the active point issue.


 * 1) Test as many applications as possible prior to product release; try to generalize cursor and selector drawing methods found in tested applications.
 * 2) Make absolutely certain that the screen reader handles all cursors and selectors that are standard for the GUI. Standard means created by a CreateCursor type of call or found in standard text editing widgets.
 * 3) Get on application developers' beta test programs so that problems with active point tracking specific to those applications can be caught before they become products.
 * 4) Encourage application developers to use standard cursors and selectors in standard ways.
 * 5) Add a hookable call to the GUI,

ActivePoint(x, y, width, height, Type_flag);

This would be called by the standard cursor routines in the GUI, for example WinShowCursor in PM. But it would also be called by applications using unusual or non-standard cursors, if those applications wanted to be accessible. A screen reader that hooked this call would receive the active point information each time the call was made.

The previous suggestion could, in effect, be implemented independent of the operating system following the AccessAware ideas proposed by Berkeley Systems, and similar ideas proposed by Microsoft. Here a separate library would be provided and the non-standard applications would call ActivePoint( ... ) to inform the screen reader of their active point.

 * 6) Do research. For example, can artificial intelligence techniques be applied in such a way that a screen reader would be able to learn that certain graphical objects are to be interpreted as an insertion bar, cursor, or the active point?
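Approach 5, the hookable ActivePoint call, can be sketched as a simple notification mechanism. The Python sketch below is only an illustration of the idea: the hook registry, the hook_active_point function, the ScreenReader class, and the two type flag values are all hypothetical, since no values for Type_flag are defined above and no GUI currently provides such a call.

```python
# Hypothetical type flags; the proposal does not define values for Type_flag.
TYPE_CURSOR = 1
TYPE_HIGHLIGHT = 2

_active_point_hooks = []

def hook_active_point(callback):
    """Register a callback to be told about every active point change."""
    _active_point_hooks.append(callback)

def ActivePoint(x, y, width, height, type_flag):
    """Called by the GUI's cursor routines, or by an application drawing its own cursor."""
    for callback in _active_point_hooks:
        callback(x, y, width, height, type_flag)

class ScreenReader:
    def __init__(self):
        self.active_point = None

    def on_active_point(self, x, y, width, height, type_flag):
        # Remember where the active point is so it can be followed and described.
        self.active_point = (x, y, width, height, type_flag)

# Example: an application with a non-standard cursor reports it explicitly.
reader = ScreenReader()
hook_active_point(reader.on_active_point)
ActivePoint(120, 48, 2, 16, TYPE_CURSOR)
```

The point of the design is that the screen reader no longer needs to know how the cursor was drawn; it only needs to be told where it is.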

Conclusion
I believe that the off-screen-model concept solves the problem of the graphics screen for the graphical user interface. And I think that that technology is well understood by screen reader developers.

Common user access is a big plus for the computing environment, and with access to the GUI, this advantage is shared by blind users.

We need to work more on making it easy to configure screen readers to automatically announce all we would like, but that will, I think, come with advancing screen reader technology.

I mentioned the active point issue, because it is important and because I do not know what the solution will be. It must be remembered that most applications work just fine regarding this active point issue. The ill-behaved applications are the exception, not the rule.
 * 1) The size of this array can vary. For example, it can be 25 by 132 or 53 by 80.
 * 2) Pixels are also referred to as picture elements or PELs.
 * 3) A window handle is a tag that identifies the window.