Charles B. Kreitzberg, Ph.D.
President
Cognetics Corporation
This paper was presented to a symposium at the National Institute of Standards and
Technology.
My topic today is the re-engineering of legacy systems. A legacy is something handed
down from the past, and now that the computer has been used commercially for more than 40
years, we're finding that many of the old systems handed down from previous development
efforts need to be rebuilt and re-engineered to conform to current
standards.
Legacy systems re-engineering is not easy from either the software engineering or the
usability engineering perspective. One problem is that a lot of legacy systems have been
around for years and years and often the documentation has been lost, the original
programmers are gone, and the organization doesn't really know what it has. Yet these
systems may be critical to the operation of the organization.
Also, the investment in legacy systems may be huge. I have seen situations in which
corporations estimate that converting their legacy systems to more modern technology will
require tens of millions of dollars and years to complete.
Finally, from the usability engineer's perspective, the original design of the
human-computer interaction on legacy systems was constrained by the technology of its
time. These systems were built to operate on character-based terminals in which there was
little computing power at the user's side. This is far different from the
client-server models today which assume a powerful workstation. Changing the interaction
design to one which works on a more contemporary GUI may not be easy.
Why Organizations are Re-engineering
Why do corporations want to re-engineer legacy systems? Among the reasons are:
- The code is old and difficult to maintain. Documentation may be lost and the programmers
who built the system and understood it are no longer around.
- The cost of maintaining old terminals and expanding access of the system to new
locations and user groups requires a front-end which runs on a PC rather than old
character-based terminals.
- The business model implicit in the software may not fit current functional requirements.
- The communications architecture which links computer to terminal is obsolete and cannot
serve as a foundation for future development.
Legacy Architecture
Most legacy systems assumed a "dumb" terminal and a "smart" host.
With the availability of personal computers this has changed. Now there is computing power
on both sides of the transmission line, a model known as client-server architecture.
Client-server architecture itself has undergone significant change in the past few years.
The original model assumed that the software would reside on the client computer and the
data would reside on the server. But with the popularity of web architecture a new model,
sometimes called net-centric computing, has emerged.
In net-centric computing, the software is not run exclusively on the desktop but is
shared among the user's PC (the client) and a number of servers connected to the
user's PC by a network.
Because the workload is shared among several computers, the user's PC does not
need a complex operating system such as Windows 95. In principle, the workstation does
not even need a disk since all programs can be loaded from the server on demand (although
I must comment that it is probably short-sighted to buy computers without disks).
Net-centric computing is a fairly recent concept and lacks technical maturity. This
places re-engineering projects in the position of needing to evaluate the ultimate
importance of net-centric (distributed client-server) architecture and decide how to
accommodate it in the current effort.
Economic Incentives for Net-centric Computing
The requirement for net-centric computing has emerged, in part, from a realization of
the costs associated with maintaining the more traditional workstation. Since 1987, the
Gartner Group has published analyses of a metric they call the total cost of
ownership (TCO) of computers.
The Total Cost of Ownership (TCO) model looks at:
- Capital costs (hardware/software acquisition and upgrade), which account for 15% to 21%
of the TCO.
- Technical support, which accounts for 17% to 27% of the TCO.
- Administration costs (such as security, management of IT assets, contract negotiation,
and audits to verify that licensing agreements are being observed), which account for 9% to
13% of the TCO.
- End-user operations, the most expensive cost component, which account for 41% to 56% of
the total cost of ownership. These activities include such tasks as defining report and
spreadsheet formats, reformatting or cleaning up data, reading manuals, using online help,
recreating lost data, backing up files, and formatting documents.
Gartner suggests that the cost of end-user operations can be reduced by 40% if well
thought out policies and procedures are in place. (The figures in this section were
updated after the presentation from an article in Network Magazine, May 16, 1997 by Steve
Steinke).
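To make these figures concrete, here is a rough worked calculation. It assumes that the 40% reduction applies only to the end-user operations component and that the other cost components stay unchanged; the function name is purely illustrative.

```python
def tco_savings_fraction(component_share: float, reduction: float) -> float:
    """Fraction of total cost saved when one cost component is reduced.

    component_share: the component's share of TCO (0.41 means 41%).
    reduction: fractional cut applied to that component (0.40 means 40%).
    """
    return component_share * reduction


# End-user operations are 41% to 56% of TCO in the figures above.
low = tco_savings_fraction(0.41, 0.40)
high = tco_savings_fraction(0.56, 0.40)
print(f"Implied TCO reduction: {low:.1%} to {high:.1%}")
```

Under those assumptions, a 40% cut in end-user operations would lower the total cost of ownership by roughly 16% to 22%.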
The high cost of maintaining a user's computer has led many IT departments to
explore net-centric computing models to reduce administrative costs. This model assumes
that the operating environment on the workstation will be simplified. By reducing the
complexity of the software on the workstation, the workstation becomes a "thin
client" which, in principle, is easier to maintain.
Approaches to Legacy Re-engineering
The shift to net-centric models has created a new class of legacy systems since the
so-called "fat clients" must be re-engineered as "thin clients". The
task of re-engineering legacy systems into thin clients is made especially difficult by
chaos and the lack of definition of the net-centric computing environment. Some of this
ambiguity is a result of immaturity of the software while some is a result of
manufacturers trying to position themselves as providers of unique proprietary standards.
Until these factors are overcome, there will continue to be a chaotic element in the
decision to re-engineer.
What this means is that re-engineering a legacy system can now encompass three
different activities:
- Porting a mainframe/terminal transaction system to a "fat client" model.
This has traditionally been the most common form of legacy re-engineering and has the most
well-defined and mature software environment.
- Porting a mainframe/terminal transaction system to a "thin client" model.
Increasingly, this model is being favored. A number of software products to convert
transaction systems (such as IBM's CICS) to web-enabled front-ends are being offered
to developers.
- Porting a "fat client" system to a "thin client" architecture.
This type of legacy re-engineering is becoming more common as organizations attempt to
convert their software to net-centric models. Some of the legacy systems which are being
considered for conversion are only a few years old. Sun's Java and Microsoft's
ActiveX are two of the platforms on which this type of conversion is based.
Legacy Architecture
Let's look at the architecture of the typical legacy system. On the left side you
have the "heavy metal," the mainframe, which serves as a host. The
mainframe may not be far more powerful than a well-equipped workstation, although it
probably has a better ability to manage large amounts of input and output; this is one
area in which small PCs have a disadvantage because their bus structure does not
effectively support massive data flow, although that is certainly changing as well.
On the right side you have the terminal. The model for these terminals was the IBM
3270, although there are many other types manufactured. Because these terminals were
designed before microprocessors, they have little processing capacity. This created a need
for an architecture in which most of the processing took place at the host. This
architecture is typically called transaction-based. A transaction is the unit of
communication between the terminal and the host computer.
From the user design side, the screen is the basic unit around which interactions are
designed. Screens are character-based and the formatting needs to take place in the
mainframe. Formatted screens are transmitted to the terminal where they are stored in some
form of hardware buffer and displayed on the screen.
Problems of Integrating Multiple Applications
In many organizations, integration of previously separate software systems is a
critical strategic requirement. In a number of industries, corporate strategy has shifted
from a "silo" mentality with independent and self-contained operating
departments to a more integrated vision. One of the problems with the architecture of
transaction-based legacy systems is that they can only connect a limited number of hosts
to a terminal at one time. To shift to another application, the user may need to log out
of the session and re-connect to the new application. This can be a real problem as the
organization attempts to merge multiple systems into a single integrated environment. When
in January 1996 the IRS announced that it had had problems and was scrapping a four
billion dollar re-engineering effort, one of the reasons they cited for needing
re-engineering was that in the current environment there could be up to nine terminals on
a worker's desk.
One industry segment which has had to tackle the problems of integrating multiple
legacy systems is banking. Over the past decade, banks have been moving from an account
driven model to a relationship model. In the account model, the customer was seen as
owning a set of accounts, each of which was treated as a distinct entity. In the
relationship model, the customer is viewed as having one, multi-faceted account whose
components may be deposits, investments or loans.
Since the legacy systems were designed with the account model, each account might
reside on a different host, perhaps in different locations. Pulling together all the
account information to create a single summary display can be difficult or impossible. Not
integrating multiple legacy systems can also be expensive. On one banking system on which
I worked we decided to do some analysis of the costs associated with maintaining separate
legacy systems. What we found was that when a customer requested a transfer of funds from
Account A to Account B and the two accounts were on different mainframes, the user had to
log onto the first system, bring up the account, write the critical information on a paper
form and send it to a back office where somebody would then log onto the second system,
and re-enter the data. Our analysis found that this activity was costing the bank about
$12 million a year in labor costs.
Human Factors
Let's now turn to some of the human factors in interaction design. Legacy systems
were rarely designed to be user friendly. The users of legacy systems were generally
dedicated workers for whom ease of use was seen as a minor priority. The illustration
shows a screen that I pulled off a legacy customer service system for a magazine that
sells lots of ancillary products. It is a typical "green screen" design. The
good part is that the characters are easy to read and the monospaced font can help the
user locate critical items more easily than with a proportional font. But the organization
of information on the screen is quite confusing.

At the bottom of the screen is a menu of function keys. Function keys were efficient
ways for users to interact with the mainframe host. There is also a command entry area
available in which the user can enter a coded command of one or two characters.
Just for fun, I took this particular screen and redesigned it using the same
input/output constraints. My goal was to see if a legacy screen could be made more usable
or if the transaction architecture was innately confusing. As you can see from the revised
screen, the programmers could have built a much better and more comprehensible screen
than was actually produced. The point is that it is not simply the hardware architecture
that creates usability problems, but also the care with which the screens are designed.

To carry this exercise to the next logical stage, I reformulated the screen as a GUI
screen in Windows. It now has GUI elements like scroll bars, a spinner box and radio
buttons. Presumably I can click on these column headings and automatically sort the data
for easier viewing. This function, of course, would not have been available in the legacy
version because the hardware and architecture can't support it.

Approaches to Legacy Re-engineering
Now I want to turn to various approaches which can be taken to re-engineering legacy
systems. How to approach the problem is one of the most important decisions which will
determine the outcome of the project. The costs of legacy re-engineering are potentially
huge. In information-intensive companies the costs may be in the tens of millions of
dollars and span multiple years.
Because the costs are so great, there is a temptation to seek mechanistic solutions.
Too often, these solutions give the appearance of re-engineering without the benefits.
Other approaches are more expensive and labor-intensive. Done right, however, they will
yield a solid foundation for future computing.
Terminal Emulation
In terminal emulation, you take a capable PC and "dumb it down" so that it
looks and operates like a character terminal. This allows you to retire old and expensive
terminals and replace them with newer, less expensive PCs. It is not a particularly
attractive solution unless the corporation is completely happy with the original system
and just needs to install new hardware. One problem is that the keyboard of the PC may not
have the same configuration as the original terminal. This may mean that some of the
screen prompts seen by the user are wrong and need to be translated to other key
combinations.
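One way to picture the prompt problem is as a lookup table from terminal key names to PC key combinations. The sketch below is only an illustration: the key assignments in `KEY_MAP` are hypothetical, since each emulator defines its own mapping, and `translate_prompt` is a name invented for this example.

```python
import re

# Hypothetical mapping from terminal key names to PC key combinations.
# Real emulators define their own assignments.
KEY_MAP = {
    "PF13": "Shift+F1",
    "PA1": "Alt+1",
    "CLEAR": "Pause",
}


def translate_prompt(prompt: str, key_map: dict = KEY_MAP) -> str:
    """Rewrite terminal key names in a screen prompt as PC key combinations."""
    # Longest names first so that a longer key name is never shadowed by a shorter one.
    names = sorted(key_map, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda match: key_map[match.group(1)], prompt)


translated = translate_prompt("PF13=HELP  PA1=CANCEL")
print(translated)
```

Prompts that mention keys not listed in the table pass through unchanged, which is exactly the case in which the user sees a misleading instruction.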
Screen Scraping
A step more sophisticated than terminal emulation is screen-scraping. In screen
scraping, you link a terminal emulator with another piece of software which extracts (or
scrapes) the data from the screen and passes it to a client program which can display it
as a GUI. The program which controls the user-interface in a screen scraping design is
often written in a language like Visual Basic.
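As a minimal sketch of the scraping step itself, the example below pulls named values out of fixed positions on a character screen. It is written in Python for brevity rather than in the Visual Basic mentioned above, and the screen layout, field names and positions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    row: int     # zero-based row on the character screen
    col: int     # zero-based starting column
    length: int  # field width in characters


# Hypothetical layout for an imaginary account screen.
ACCOUNT_FIELDS = [
    Field("customer", 0, 10, 20),
    Field("balance", 1, 10, 12),
]


def scrape(screen, fields):
    """Extract named values from fixed positions on a character screen."""
    result = {}
    for field in fields:
        line = screen[field.row] if field.row < len(screen) else ""
        result[field.name] = line[field.col:field.col + field.length].strip()
    return result


# A two-line stand-in for the text captured by the terminal emulator.
sample_screen = [
    "CUSTOMER: JANE DOE",
    "BALANCE:  1,234.56",
]
record = scrape(sample_screen, ACCOUNT_FIELDS)
print(record)
```

Because the extraction depends entirely on row and column positions, any change to the layout of the host screen silently breaks the front end.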
Screen scraping is seen as attractive because it offers customers a "new
look" to their software without the need to actually create a great deal of new code.
Only a new front-end needs to be created.
Screen scraping is now being used to create internet (or intranet) access to legacy
systems. The software obtains the screens from the legacy host, reformats them into HTML
and sends them to a browser for display. As client-side computing tools such as Java and
ActiveX become more popular, the power of this approach will increase.
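The reformatting step can be pictured as follows. This is only a sketch: it assumes the fields have already been extracted into a dictionary, and the function name and page layout are invented for illustration.

```python
from html import escape


def fields_to_html(title, fields):
    """Render scraped field names and values as a simple HTML table."""
    rows = "\n".join(
        f"  <tr><th>{escape(name)}</th><td>{escape(value)}</td></tr>"
        for name, value in fields.items()
    )
    return f"<h1>{escape(title)}</h1>\n<table>\n{rows}\n</table>"


page = fields_to_html(
    "Account Summary",
    {"Customer": "DOE & SONS", "Balance": "1,234.56"},
)
print(page)
```

Escaping the values matters because host data may contain characters such as "&" or "<" that a browser would otherwise interpret as markup.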
However, there is no free lunch and screen scraping has significant limitations. At its
most minimal, the screens created will exactly mirror the original character-based screens
in terms of data presentation and flow. This is an approach which does not use any of the
advantages of the client-server architecture. And since GUI screens can be less efficient
for data entry than character-based screens, the net result can be a loss of productivity.
Scraping Multiple Screens
To break free from the constraint of the original presentation and navigation, the
programmer writing the new front end can retrieve and scrape several legacy screens before
constructing the GUI screen. For example, if a banking system presented checking, savings
and credit card accounts on separate screens, the programmer could create a more
integrated screen as follows:
- Retrieve checking screen and scrape it, holding the results in memory.
- Retrieve savings screen and scrape it, holding the results in memory.
- Retrieve credit screen and scrape it, holding the results in memory.
- Assemble a composite screen with data from the previous three screens and display it
with a GUI.
While in principle this type of approach may seem useful, it is limited in its
flexibility. The biggest problem is that the composite screen cannot be displayed until
all the underlying screens have been retrieved. This seriously damages response time. If
you assume that a legacy screen can be retrieved in 5 seconds, then a three-screen
composite will require at least 15 seconds to appear, since the host cannot offer
multithreaded retrieval to a single user session.
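The arithmetic and the sequential nature of the retrieval can be sketched as below. The 5-second figure is the assumption used in the text; `build_composite` and `fake_fetch` are hypothetical names, with `fake_fetch` standing in for a real retrieve-and-scrape call.

```python
SCREEN_LATENCY_SECONDS = 5.0  # the per-screen figure assumed in the text


def composite_delay(screen_count, per_screen=SCREEN_LATENCY_SECONDS):
    """Minimum wait when screens must be fetched one after another."""
    return screen_count * per_screen


def build_composite(fetch, screen_names):
    """Fetch each legacy screen in turn, then hand back the combined data."""
    composite = {}
    for name in screen_names:
        # Strictly sequential: a single session handles one request at a time.
        composite[name] = fetch(name)
    return composite


def fake_fetch(name):
    """Stand-in for retrieving and scraping one legacy screen."""
    return {"screen": name}


composite = build_composite(fake_fetch, ["checking", "savings", "credit"])
print(composite_delay(len(composite)))
```

Nothing can be shown to the user until the loop finishes, so the delay grows linearly with the number of screens in the composite.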
Composite Super Screens
One approach to overcoming the limitations of screen scraping is to create "super
screens" which are composites of multiple legacy screens. This addresses the problem
of response time by creating an environment in which only a single screen needs to be
retrieved by the client. Since the original screen will never be seen by the user, it can
be packed with data. When one of these super screens is retrieved and scraped, the
programmer creating the new front end has a lot more data to work with and may be able to
re-organize the presentation and flow to make it more productive.
Using the example of the composite banking screen, the legacy re-engineers would create
a new screen at the host which has checking, savings and credit information on it. They
would create a new transaction command to retrieve this screen. Then the programmer
working on the client would simply request this screen, scrape it and display a GUI
equivalent.
One problem with creating super screens is that it requires re-programming the legacy
application. Typically, legacy systems are controlled by very rigid standards of operation
and quality control and documentation, so reprogramming them is not a casual affair.
Making changes to a legacy system requires releasing a new version after making changes
and doing quality control. This requires advance planning and scheduling of resources.
Middleware
A more sophisticated approach to re-engineering than screen scraping is to place a
middleware server between the client and the host. The middleware server can talk to both
the client and the host. Because it has powerful computing capabilities, it can retrieve
data asynchronously from the host so it is ready when needed by the client. It can
reformat data and screens and it can store ancillary data to supplement the information
available on the host.
Middleware servers can also link multiple hosts, which reduces the problems of
integrating multiple systems. To decrease response time, the middleware programs can
anticipate data that will be needed and initiate retrieval in case it is needed. This
has the side effect of increasing network traffic but will decrease end-user response
time.
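A minimal sketch of this anticipatory retrieval is shown below, using a thread pool to fetch from several hosts in the background. The class and the `fetch_from_host` callable are hypothetical; a production middleware server would also need cache expiry, error handling and thread-safe bookkeeping.

```python
from concurrent.futures import ThreadPoolExecutor


class PrefetchingMiddleware:
    """Retrieves host data in the background so it is ready when requested."""

    def __init__(self, fetch_from_host, max_workers=4):
        self._fetch = fetch_from_host  # callable(host, key) -> data
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = {}             # (host, key) -> Future

    def prefetch(self, host, key):
        """Start retrieving data the client is likely to ask for next."""
        if (host, key) not in self._pending:
            self._pending[(host, key)] = self._pool.submit(self._fetch, host, key)

    def get(self, host, key):
        """Return the data, waiting only if the retrieval has not finished."""
        self.prefetch(host, key)
        return self._pending[(host, key)].result()

    def close(self):
        self._pool.shutdown(wait=True)


calls = []


def fake_host_fetch(host, key):
    """Stand-in for a slow request to a legacy host."""
    calls.append((host, key))
    return f"{key} data from {host}"


middleware = PrefetchingMiddleware(fake_host_fetch)
for host, key in [("hostA", "checking"), ("hostB", "savings"), ("hostC", "credit")]:
    middleware.prefetch(host, key)  # all three retrievals proceed in parallel
summary = [middleware.get("hostA", "checking"), middleware.get("hostB", "savings")]
middleware.close()
```

The credit data in this example is fetched but never requested, which illustrates the extra network traffic that anticipation costs.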
Data Warehousing
The final approach I want to discuss is the data warehouse. A data warehouse discards
much of the legacy programming and integrates all the data from multiple systems into a
single database. The mainframe which hosts the database is the data warehouse server.
Instead of requesting screens, applications can send a request to the data warehouse
server for data items needed.
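The difference between requesting screens and requesting data items can be sketched with an in-memory SQLite database standing in for the warehouse. The table, columns and sample rows are invented for illustration and do not describe any particular product.

```python
import sqlite3

# An in-memory database stands in for the data warehouse server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (customer_id INTEGER, kind TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        (1, "checking", 500.0),
        (1, "savings", 1500.0),
        (1, "credit", -200.0),
        (2, "checking", 75.0),
    ],
)


def customer_summary(connection, customer_id):
    """Return every account balance for one customer in a single request."""
    rows = connection.execute(
        "SELECT kind, balance FROM accounts WHERE customer_id = ? ORDER BY kind",
        (customer_id,),
    ).fetchall()
    return dict(rows)


summary = customer_summary(conn, 1)
print(summary)
```

One query returns the checking, savings and credit data together, with no screens to retrieve or scrape.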
Creating a data warehouse is a huge and expensive task. Done right, it can be a very
good strategic move, creating a foundation for re-engineering which is both powerful and
flexible. Since the data warehouse will be a critical component in terms of overall
network performance, a great deal of care must be taken to avoid technical environments in
which the data warehouse becomes bogged down.
Summary of Approaches
In summary, I look at legacy re-engineering options as falling into five approaches:
- terminal emulation
- simple screen-scraping
- screen scraping with composite super screens
- middleware
- data warehousing
None is fully satisfactory but the middleware approach is the most flexible in terms of
data navigation and presentation.
Let me briefly mention issues of security. Web security is central to re-engineering.
Because legacy systems were built on relatively proprietary and closed communications
architecture, most legacy systems were able to secure themselves against outside snooping
and attacks. However, the web architecture is more open and the threat is much greater. In
reality, the danger of someone intercepting packet data is low. Hostile applets and
viruses are a more real threat, as is the cracking of a site by hackers.
It is my belief that there are adequate security measures available for all but the
most sensitive systems. But where these are not enough, it is probably necessary to
isolate the system on a separate network with carefully controlled access.
Creating a Re-engineering Project
When you are beginning a re-engineering project, there are a number of key issues you
should consider. Perhaps the most important is how much code you are prepared to re-write
on the legacy side. The decision should be based on how flexible a platform you will need
going forward. A data warehouse will position you for client-server development. On the
other hand, if the system will be entirely replaced in a few years, it may not be
cost-effective to undertake a massive re-engineering project. There will inevitably be strong
pressure to maintain the legacy systems intact and simply change the front end.
You need to research the number and configuration of the hosts which will feed the new
system. Also determine which applications can be run simultaneously by a single user. Understanding
connectivity options is key to the design of the re-engineered system.
Study the navigation of the legacy system and design a navigational scheme for the new
system. Decisions made in this area can rule out screen-scraping. Think about what data
should be grouped together and what functions should be consolidated.
Most of all, do not fall into the trap of assuming that substituting GUI widgets for
characters will create a usable system. The new system needs to be designed from a
different perspective and to be bound by the obsolete architecture as little as possible.