Michael Coughlan
Legacy system stakeholders are gradually waking up to the problem. Since 2008, there has been a gradual
increase in awareness of the need to do something about it. COBOL vendors have encouraged academic training of a new crop of COBOL developers. Micro Focus does this through its Micro Focus Academic Program and Academic Alliance programs, and an IBM initiative in this area has resulted in COBOL being taught in 400 colleges and universities around the world.19 In addition, the training companies and in-house training groups that traditionally were the main source of COBOL developers are once more starting to take up the strain. For example, the US Postal Service will start its own COBOL training program as its COBOL programmers retire,20 and the Social Security Administration (SSA)20 in the United States is going the same route. Manta Technologies is reported to be developing a COBOL training series consisting of nine or ten courses.21 The company hopes to complete the series by the end of 2013. Some COBOL vendors like Veryant22 are also providing training courses.
Motivational speakers are often heard to say that the Chinese word for crisis is composed of two characters that represent danger and opportunity. Although there seems to be some doubt about the veracity of this claim, there is no doubt that in the coming years the crisis caused by the tsunami of retiring programmers represents a golden opportunity for those who can grasp it. The number of students earning computing degrees fell sharply after the year 2000, and this led to a programmer shortfall that has made it a seller’s market for computer skills. But student numbers are recovering; and as the job market gets more competitive, having COBOL on your résumé may be a very useful differentiating skill—especially if it is combined with knowledge of Java.
COBOL: The Hidden Asset
The numbers supporting the dominance of COBOL in the business application domain sound incredible. Certainly, a lot of skepticism has been voiced about them on the Internet and elsewhere. But much of the skepticism comes from those who have little or no knowledge of the mainframe arena, an area in which COBOL is strong, if not supreme.
You can gain an appreciation for the opposing points of view by reading Jeff Atwood’s post “COBOL: Everywhere and nowhere” and the associated comments. His comment that “I have never, in my entire so-called ‘professional’ programming career, met anyone who was actively writing COBOL code23” is indicative of the problem programmers often have when presented with statistics regarding the importance of COBOL. Many of the comments that followed Atwood’s post reflected that disbelief; but as one commentator remarked, “You want to see COBOL? Go look at a company that processes payroll, or handles trucking, food delivery, or shipping. Look at companies that handle book purchase orders or government disbursements or checking account reconciliation. There’s a huge ecosystem of code out there that’s truly invisible to those of us who work in and around the Internet.24”
Many programmers with a conspiracy-theory bent attempt to prove the impossibility of the COBOL statistics
by pointing to the number of lines of code that could be produced by programmers in the given time frame, or by pointing to the impossibility of maintaining the claimed number of lines with the estimated number of COBOL
programmers. There are a number of answers to these points.
One answer is that the COBOL code inventory has been hugely bulked out by fourth-generation languages
(4GLs) and other COBOL-generating software.25 4GLs were all the rage between the 1970s and 1990s, and many produced COBOL code instead of machine code. This was done to give buyers confidence that if the 4GL vendor failed, they would not be left high and dry. In many cases, the vendors did fail, and only the COBOL code was left. In other cases, the programmers took to maintaining the COBOL code directly, and it is now so divorced from the 4GL
that there is no point in trying to return to the 4GL code.
Another answer is that programmer productivity seems high because many programs are simply near-copies of
existing work. In a legacy system, the enterprise data is often trapped in a variety of storage technologies, from various kinds of database to direct access files and flat files. Nearly every user request to get at that data requires a COBOL
program to be written. But these programs are not written from scratch. A programmer creates the program by using the copy, paste, and amend method. The programmer simply copies a similar program, makes a few changes, and voilà: a new COBOL program and a big boost to apparent programmer productivity.
Chapter 1 ■ Introduction to COBOL
If the number of bugs found in legacy systems approached that found in newly minted systems, 2 million
programmers might find it very difficult to maintain upwards of 200 billion lines of code. The fact is, though, that unless an environmental change or a user request forces a modification of a legacy system, not much maintenance is required.
When a system has been in production for many tens of years, only the blue-moon bugs remain. There is an old joke that goes, “What’s the difference between computer hardware and computer software?” The answer is, “If you use hardware long enough, it breaks. But if you use software long enough, it works.” A real-world manifestation of David Brin’s26 practice effect, perhaps?
■ Note Blue-moon bugs are bugs that manifest themselves only as a result of the coincidence of an unusual set of circumstances.
A considerable amount of evidence points to the relatively bug-free status of legacy systems. For instance, when an inventory of software systems was taken in preparation for the Y2K conversion, it was discovered that it had been so long since some of the programs in the inventory had been modified that the source code had been lost. In the opinion of Chris Verhoef, “about 5% of the object code lacks its source code.27”
In his paper “Migrating from COBOL to Java,15” Harry Sneed mentions that 5 COBOL programmers were
responsible for 15,486 function points of legacy COBOL whereas 25 Java developers were responsible for 13,207
function points of Java code. Although it might suit COBOL advocates to believe that COBOL developers are five times more efficient than Java developers, a more realistic explanation is that the legacy system had settled into a largely bug-free equilibrium while the newly minted Java code was still awash with them.
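The arithmetic behind this comparison can be sketched in a few lines of Python. The function name is hypothetical, and the only inputs are the team sizes and function-point counts quoted above. Note that the result measures how much code each developer is responsible for, not how productive that developer is.

```python
def function_points_per_developer(function_points: int, developers: int) -> float:
    """Function points each developer is responsible for (hypothetical helper)."""
    return function_points / developers


# Figures quoted from Sneed's paper in the paragraph above.
cobol_load = function_points_per_developer(15_486, 5)
java_load = function_points_per_developer(13_207, 25)
ratio = cobol_load / java_load

print(f"COBOL: {cobol_load:.1f} function points per developer")
print(f"Java:  {java_load:.1f} function points per developer")
print(f"Ratio: {ratio:.1f}")
```

On these figures each COBOL programmer looks after roughly five to six times as many function points as each Java developer, which is the gap the paragraph above attributes to the maturity of the legacy code.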
COBOL definitely has a visibility problem. The hype that surrounds some computer languages would have you
believe that most of the production business applications in the world are written in Java, C, C++, or Visual Basic and that only a small percentage are written in COBOL. In reality, COBOL is arguably the major programming language for business applications.
One reason for COBOL’s low profile lies in the difference between the vertical and horizontal software markets.
To use a clothing analogy, an application created for the vertical software market is like a tailored, bespoke suit, whereas an application created for the horizontal software market is like a commodity, off-the-rack suit.
Advantages of Bespoke Software
Why should a company spend millions of dollars to create a bespoke application when it could buy a commercial off-the-shelf (COTS) package?
One reason is that because a bespoke application is specifically designed for an organization’s particular requirements, it can be tailored to fit in exactly with the way the business or organization operates. Another reason is that it can be customized to interface with other software the company operates, providing a fully integrated IT infrastructure across the whole organization. Yet another reason is that because the company “owns” the software, the company has control over it. But the primary reason for creating a bespoke application is that it can offer an enterprise a competitive advantage over its rivals. Because a bespoke application can incorporate the business processes and business rules that are specific to the company and that do not exist in any packaged solution, it can offer a considerable advantage over competing companies. Owens and Minor28-29 refer to the specific business rules
and processes embedded in their bespoke applications as their “secret sauce.”
An example of the effectiveness of bespoke software is the software that first allowed an airline to offer a frequent-flyer program (air miles). That software conferred such an advantage on the airline that competitors were forced to catch up, and frequent-flyer programs are now almost ubiquitous.
Characteristics of COBOL Applications
Software produced for the vertical software market has characteristics that distinguish it from the commodity software you are probably more familiar with. This section examines some characteristics of COBOL applications that you may find surprising.
COBOL Applications Can Be Very Large
Many COBOL applications consist of more than 1 million lines of code, and applications consisting of 6 million lines or more are not considered unusually large in many programming shops:
• In “Revitalizing modifiability of legacy assets,30” Niels Veerman mentions a banking company
that had “one large system of 2.6 million LOC in almost 1000 programs.”
• The Irish Life Group, Ireland’s leading life and pensions company, is reported31 to have
completed a legacy system migration project to rehost 3 million lines of COBOL code.
• A Microsoft case study reported that Simon & Schuster had a code inventory of some 5 million
lines of COBOL code.32
• The Owens and Minor case study mentioned earlier reported that “the company ran its
business on 10 million lines of custom COBOL/CICS code.29”
• In his paper “A Pilot Project for Migrating COBOL Code to Web Services,” Harry Sneed
reported a “legacy life insurance system with more than 20 million lines of COBOL code
running under IMS on the IBM mainframe.33”
• The authors of “Industrial Applications of ASF+SDF” talk about a large suite of
mainframe-based COBOL applications that consist of 25,000 programs and 30 million lines
of code.34
• An audit report by the Office of the Inspector General in 2012 noted that as of June 2010,
the US SSA had a COBOL code inventory of “over 60 million lines of COBOL code.35”
• The Bank of New York Mellon is quoted as having a software inventory of 112,500 COBOL
programs consisting of 343 million lines of code.2
• Kwiatkowski and Verhoef report a case study where “a Cobol software portfolio of a large
organization operating in the financial sector” consisted of over “18.2 million physical lines
of code (LOC).25”
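Three of the sources listed above give both a program count and a line count, so a rough average program size can be worked out. The following Python sketch does only that. The function name is hypothetical, and treating "almost 1000 programs" as exactly 1,000 is an assumption made for the sake of the arithmetic.

```python
def average_program_size(programs: int, lines_of_code: int) -> float:
    """Average lines of code per program (hypothetical helper)."""
    return lines_of_code / programs


# (description, number of programs, lines of code) as quoted in the list above.
# Assumption: "almost 1000 programs" is rounded to 1,000.
portfolios = [
    ("Banking system (Veerman)", 1_000, 2_600_000),
    ("Mainframe suite (ASF+SDF paper)", 25_000, 30_000_000),
    ("Bank of New York Mellon", 112_500, 343_000_000),
]

for name, programs, lines in portfolios:
    size = average_program_size(programs, lines)
    print(f"{name}: about {size:,.0f} lines per program")
```

The averages come out at a few thousand lines or fewer per program, which suggests that these inventories are large because they contain many programs rather than a few enormous ones.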
COBOL Applications Are Very Long-Lived
The huge investment in creating a software application consisting of millions of lines of COBOL code means the application cannot simply be discarded when a new programming language or technology appears. As a consequence, business applications between 10 and 30 years old are common, and some have been in existence for around 50 years.
A Microsoft case study on the Swedish company Stockholmshem noted that its computer system “was created
in 1963 and had been expanded over the years to include roughly 170 online Customer Information Control System (CICS)/COBOL programs and 370 batch COBOL programs.36”
Kwiatkowski and Verhoef25 published a version log (reproduced in Figure 1-1) for a module in the software portfolio of a large financial organization that illustrates the longevity of COBOL programs. Each line of the log is a comment that shows a version number, the name of a programmer, and the date the software was modified. The log shows that maintenance of this module started in 1975. Nor was this the oldest module found. That honor belonged to a program that had been written in 1967. For some readers of this book, the software in this portfolio started life long before they were born.
Figure 1-1. COBOL module version log. Published in “Recovering Management Information from Source Code,”
Kwiatkowski and Verhoef 25
The longevity of COBOL applications can also be held largely accountable for the predominance of COBOL
programs in the Y2K problem (12,000,000 COBOL applications versus 1,400,000 C++ applications in the United States alone).10 Many years ago, when programmers were writing these applications, they just did not anticipate that the software would last into this millennium.
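The text above does not spell out the mechanism, but the Y2K problem is commonly described as the result of storing years as two digits. The Python sketch below illustrates that general idea only. The function names and the sample dates are hypothetical, and the code is not taken from any real system.

```python
def years_between_two_digit(start_yy: int, end_yy: int) -> int:
    """Difference between two years stored as two digits only (hypothetical helper)."""
    return end_yy - start_yy


def years_between_four_digit(start_yyyy: int, end_yyyy: int) -> int:
    """The same calculation with the century stored explicitly."""
    return end_yyyy - start_yyyy


# A record created in 1975 and checked in 1999: both versions agree.
print(years_between_two_digit(75, 99))        # 24
print(years_between_four_digit(1975, 1999))   # 24

# The same record checked in 2000: the two-digit year rolls over to 00.
print(years_between_two_digit(75, 0))         # -75
print(years_between_four_digit(1975, 2000))   # 25
```

A calculation like this works without complaint for decades and fails only when the century changes, which is why long-lived applications were the ones most exposed.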
COBOL Applications Often Run in Critical Areas of Business
COBOL is used for mission-critical applications running in vital areas of the economy. Datamonitor reports that 75% of business data and 90% of financial transactions are processed in COBOL.37 The serious financial and legal consequences that can result from an application failure are one of the reasons for the near panic over the Y2K problem.
COBOL Applications Often Deal with Enormous Volumes of Data
COBOL’s forte is file and record processing. Single files or databases measured in terabytes are not uncommon.
The SSA system mentioned earlier, for instance, manages over 1 petabyte (1 petabyte = 1,000 terabytes = 1,000,000
gigabytes) of data,38 and “Terabytes of new data come in daily.39”
Characteristics of COBOL
Although COBOL is a high-level programming language, it is probably quite unlike any language you have ever used. A genealogical tree of programming languages usually places COBOL by itself with no antecedents and no descendants. Occasionally a tree might include FLOW-MATIC and COMTRAN or might show a connection to PL/I
(because that language incorporated some COBOL elements). By and large, though, COBOL is unique. So even
though COBOL supports the familiar elements of a programming language such as variables, arrays, procedures, and selection and iteration control structures, these familiar elements are implemented in an unfamiliar way. It’s like going to a foreign country and finding that your rental car uses a stick shift and people drive on the other side of the road: disconcerting.
This section examines some of the general characteristics of COBOL that distinguish it from languages with which you might be more familiar.
COBOL Is Self-Documenting
The most obvious characteristic of COBOL programs is their textual, rather than mathematical, orientation. One of the design goals for COBOL was to make it possible for non-programmers such as supervisors, managers, and users to read and understand COBOL code. As a result, COBOL contains such English-like structural elements as verbs, clauses, sentences, sections, and divisions. As it happens, this design goal was not realized. Managers and users nowadays do not read COBOL programs. Computer programs are just too complex for most nonprofessionals to understand, however familiar the syntactic elements. But the design goal and its effect on COBOL syntax had one important side effect: it made COBOL the most readable, understandable, and self-documenting programming language in use today. It also made it the most verbose.
It is easy for programmers unused to the business programming paradigm, where programming with a view to
ease of maintenance is very important, to dismiss the advantage of COBOL’s readability. Not only does this readability generally assist the maintenance process, but the older a program gets, the more valuable readability becomes.
When programs are new, both the in-program comments and the external documentation accurately reflect
the program code. But over time, as more and more revisions are applied to the code, it gets out of step with the documentation until the documentation is actually a hindrance to maintenance rather than a help. The self-documenting nature of COBOL means this problem is not as severe with COBOL as it is with other languages.
Readers who are familiar with C, C++, or Java might want to consider how difficult it becomes to maintain programs written in these languages. C programs you wrote yourself are difficult enough to understand when you return to them six months later. Consider how much more difficult it would be to understand a program that was written 15 years previously, by someone else, and which had since been amended and added to by so many others that the documentation no longer accurately reflected the program code. This is a nightmare awaiting maintenance programmers of the future, and it is already peeking over the horizon.
COBOL Is Stable
As a computer language, COBOL evolves with near-glacial slowness. The designers of COBOL do not jump on the bandwagon of every new, popular fad. Changes incorporating new ideas are made to the language only when the new idea has proven itself.
Since its creation in 1960, only four COBOL standards have been produced:
• ANS 68 COBOL: Resolved incompatibilities between different COBOL versions
• ANS 74 COBOL: Introduced the CALL verb and external subprograms
• ANS 85 COBOL: Introduced structured programming and internal subprograms
• ISO 2002 COBOL: Introduced object orientation to COBOL
Enterprises running mission-critical applications are unsurprisingly suspicious of change. Many of these organizations stay one version behind the very slow leading edge of COBOL. Only now that the 2002 version of COBOL has been specified will many start to move to the 1985 standard. This is one reason this book mainly adheres to the ANS 85 standard.
Conscious of the long life of COBOL applications, the ANSI COBOL Committee has made backward compatibility a major concern. Very few language elements have been dropped from the language. As a result, programs I wrote in the 1980s for the DEC VAX using VAX COBOL compile, with little or no alteration, on the Micro Focus Visual COBOL compiler. Java, although only created in 1995, is now on its seventh version and already has a very long list of obsolete, deprecated, and removed features. In the years since its creation, Java has removed more language features than COBOL has in the whole of its 50-year history.