Hidden Business Rules in Legacy Code
Large organizations are often hesitant to replace very old legacy applications that have served mission-critical business operations for many years. In fact, they are reluctant to modify the applications at all. Why is that?
Before answering that question, let’s answer this one:
Who Cares?
Looking back over the course of the “agile” movement, it seems to me that larger companies began to take a serious interest in “going agile” once the idea reached the Early Adopter phase of the diffusion of innovations curve, and looked as if it was not going to fizzle out. Since then, many large organizations have undertaken “agile” initiatives, and quite a few have made multiple attempts. The proliferation of “scaling frameworks” and “certifications” reflects the demand for agility on the part of deep-pocketed companies.
The success stories are all about the outer layers of the enterprise technical environment. “Full stack” means (merely) mobile and Web front-ends talking to web servers and, possibly, app servers and local databases, and then handing off the “real work” to back-end systems that are not included in the “agile” ecosystem. The mission-critical business rules are baked into the back-end systems. The “full stack agile” apps mostly pass messages back and forth between client devices and APIs that lead to back-end systems. The true “full stack” in these larger enterprises is not involved at all.
To call that a successful “agile” transformation is akin to saying planet Earth consists only of its crust. The crust is certainly the most interesting part to us humans, as it’s the part we can interact with. But there’s a whole lot more underneath, without which the crust would be uninhabitable.
In the early years, “agilifying” the crust was sufficient to give customers the sense that the company was moving forward and able to respond to the market effectively. On some level, however, that approach papers over the larger problem of what to do with business-critical core systems that live on older platforms, behind the APIs the newer applications call. Changes on those systems don’t fit into the canonical “two-week sprint” model, for reasons ranging from technical to procedural to structural to regulatory.
Things have progressed to the point that this has become the next “weak link” to strengthen. How is that to be done? Several options are possible:
- Option 1: Isolate the back end behind an API layer, leave it alone, and hope for the best.
  The challenge: Increasing risk as the platforms age out of vendor support and qualified technical staff retire. Both the probability and the impact of a failure will increase over time.
- Option 2: Bring the back end up to date, fold it into the larger “agile” environment, and stop operating the data center as if it were still 1985. IBM has never stopped evolving the mainframe platform and related technologies, so there are no technical barriers to doing this. As difficult as it already is to modernize these systems, the longer we wait, the more difficult and expensive it will become.
  The challenge: Price – not cost as in cost-benefit, but price. Activating the necessary features and capacity on zSeries systems is expensive. In some cases, the effort to modernize systems may also tie up significant company resources for two or more years. We should have started doing this 20 years ago, when it wasn’t so difficult; we have inherited the results of short-term thinking from that era.
- Option 3: Identify business-critical functions within legacy applications and carve them out into microservices or serverless functions.
  The challenge: The effort to extract and repackage selected routines written in legacy languages is at least as great as the effort to rewrite those applications from scratch. In addition, calling a subprogram written in a mainframe language via a foreign function interface (FFI) only works for the first invocation, due to differences in the conventions for passing arguments and return values. That makes it problematic to wrap “rescued” legacy routines in cloud-friendly scripts (yes, I’ve tried it). Maybe that’s okay for a FaaS or “serverless” setup, but it feels pretty clunky. There’s also the fact that once we have analyzed the legacy source code well enough to identify the key business rules hard-coded in it, we also understand those rules well enough to rewrite the application, and that is probably the easier task.
- Option 4: Rewrite key applications or replace them with off-the-shelf products or network-based services.
  The challenge: Ensuring that market-differentiating logic baked into the legacy code is not lost in translation.
This post addresses the key concerns around Option 4.
Reasons for Hesitation
Clients have expressed two main reasons to me for their reluctance to delve into longstanding legacy applications. First, they don’t have many (or any) technical staff left on board who can confidently work with the code. They worry that any change will lead to a regression that they won’t be able to correct easily, if at all.
Second, they worry that the mysterious source code, written in a mysterious old language, contains mysterious hard-coded business rules that provide some of the company’s competitive advantage, and that are not documented anywhere or understood fully by anyone. Replacing or attempting to update the old code may wipe out that special logic. I suspect the second reason is a corollary of the first.
Some companies are in more dire straits than others. I remember one company that had re-hired a retired employee to maintain their single most mission-critical application. At the age of 78, he was the last remaining human who could work on that system. They were paying him $250,000 per year to work 2 days a week. He had accomplished the Vulcan dream to live long and prosper, but what of the company’s future prospects?
On a side note: Why would a company ever build such a solution in the first place? Remember that in the years when large enterprises were first taking advantage of early computer systems, there were no commercial off-the-shelf (COTS) packages or Internet-based services. Everything had to be written from scratch, and all of it was proprietary, so programmers could not benefit from what others had learned.
That much is understandable. However, why would people go out of their way to design a solution so arcane and complicated that no one else could work with it? To seek deeper understanding of the phenomenon, scholars have created a cross-disciplinary field of study drawing from formal logic, psychology, and pharmacology known as It Must Have Seemed Like A Good Idea At The Time. That is out of scope for this piece, although we look forward to reading (or smoking) the research findings.
Unreadable Code?
Let’s examine the first reason organizations avoid changing old code: Current staff are not familiar with the programming language. For context, let me clarify that we’re primarily talking about applications originally developed on IBM mainframes in assembly language, COBOL, PL/I, or RPG, and on (then) Tandem NonStop systems using TAL or COBOL (Tandem was acquired by Compaq and then HP; the NonStop line now belongs to Hewlett Packard Enterprise). We’re also mainly considering large, established companies in the financial sector, which adopted computer technology very early in the game. These are the companies most likely to have legacy applications of that vintage still in production, with some core applications dating back to the 1970s.
It’s true that young professionals today are neither trained on nor interested in these older programming languages. They don’t want to be shunted into a multi-year conversion project that makes their marketable skills go stale. On the other hand, when we’re converting code from a language, we only need to be able to understand what the code is doing; we don’t have to learn the language in depth.
The good news is that COBOL accounts for nearly all the legacy applications we might wish to migrate or preserve. Even when written carelessly, COBOL is relatively understandable compared with assembly, TAL, or RPG (especially older RPG versions based on tabular input). PL/I is not inherently hard to read, but there was a tendency for people to write “clever” code; Perl Golf predates the invention of Perl in 1987, at least in spirit.
There was a time when over 97% of business application code in production worldwide was written in COBOL. You will probably not have to deal with other legacy languages that may be harder to understand than COBOL.
Why So Much Stylistic Variation?
Not only are legacy languages unfamiliar to younger colleagues, but the original authors did not always follow consistent conventions. The old code is full of surprises, some more fun than others. The optimistic appraisal of COBOL’s readability offers cold comfort to those who must cope with some of the more creative examples of one-off designs.
I’ve seen accidental “features” of COBOL exploited to the extent IBM could not fix the compiler without crashing hundreds of customers; in particular, the use of nested OCCURS DEPENDING ON clauses to achieve variable-length tables at run time, which is not a defined feature of COBOL and only worked by accident one fine day in an IBM compiler release; a primordial example of Hyrum’s Law. I’ve seen “clever” application generators based on assembly macros; a sort of Cretaceous-era version of the Rails scaffold command, only not as good an idea. I’ve seen COBOL code that wrote object code directly into a WORKING-STORAGE area and then passed it to an assembly subprogram to execute the code on the fly; in effect monkey-patching a static language.
It isn’t necessary for things to be that crazy for the code to be hard to follow. In some shops, people wanted to achieve design-time reuse. They depended on an IBM extension to COBOL that allows nested COPY REPLACING statements. Instead of one giant monolithic source file, their applications comprised a large number of code snippets filled with placeholder text. The code might look clean, as far as naming conventions and indentation are concerned, but its intent was obfuscated. Eyeballing a printed compiler listing, with all the COPYs expanded, might be the only way to examine the source code…and we could be talking about 70,000 to 90,000 lines of code.
Today, each programming language community has settled on certain conventions. For instance, we normally expect people to write method names in upper camel case in C#, lower camel case in Java, and snake case in Ruby, even though the compilers don’t care. There are numerous small conventions that people generally try to follow when working in a given programming language. It helps others read and reuse their code. It’s easy to learn the conventions because we work in an “open” world where we can find tutorials and examples and shared code bases easily.
The world of mainframe development in the second half of the 20th century was “closed,” in contrast to the “open” world of today. Developers crafted their own conventions and styles within each company, and were largely unaware of the conventions used in other companies. Those of us who moved from organization to organization frequently had the opportunity to see code designed and written in a multitude of different ways. On the whole, things were more chaotic then than they are now, even though we now work with many more different platforms, languages, and frameworks.
I think that’s a consequence of the “closed” world of the time. There were no such things as the World Wide Web, Stack Overflow, or open source. No one had a mainframe computer in their home. Developers learned on the job, and unless they changed jobs frequently, they learned exactly one way of doing things. The way they learned could be very different from the way things were done in the office building next door. The wheel was reinvented independently in thousands of companies, thousands of times over. And there are a lot of ways to make a wheel.
Finding the Hidden Business Rules
Fortunately, we don’t have to read every line of source code. With a little practice, we can visually scan the source for patterns that suggest something interesting is going on. For purposes of migrating solutions, we’re particularly interested in spotting code that looks convoluted or, to use the proper technical term, “squirrelly.”
As with any programming language, when you see long swathes of uninterrupted code, deeply nested conditional statements, or excessive source comments trying to compensate for code that doesn’t reveal its intent, you know the program may be doing something that is not straightforward. After all, COBOL was designed to be relatively readable. If the intent of the code isn’t apparent, then something is definitely squirrelly. If there are any valuable hidden business rules in the code, they will be hiding there, among the squirrels.
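The patterns described above are mechanical enough that a short script can shortlist candidates before anyone starts reading. What follows is a minimal, hypothetical Python sketch, not a feature of any product: it assumes fixed-format COBOL source that has already been transferred off the mainframe as plain text, splits it into paragraphs, and reports the deepest IF/EVALUATE nesting, the number of GO TO statements, and the number of comment lines in each. The names `scan` and `is_squirrelly` and the thresholds are illustrative assumptions, and the parsing is deliberately naive (it ignores string literals, continuation lines, and copybooks).

```python
import re
from dataclasses import dataclass

# Fixed-format COBOL: columns 1-6 sequence area, column 7 indicator,
# columns 8-11 Area A, columns 12-72 Area B. Slices below are 0-based.
PARAGRAPH_RE = re.compile(r"^[A-Z0-9][A-Z0-9-]*\.$")          # label alone on its line
OPEN_RE = re.compile(r"(?<![\w-])(?:IF|EVALUATE)(?![\w-])")     # skips END-IF, IF-FLAG, etc.
CLOSE_RE = re.compile(r"(?<![\w-])END-(?:IF|EVALUATE)(?![\w-])")
GOTO_RE = re.compile(r"(?<![\w-])GO\s+TO(?![\w-])")


@dataclass
class ParagraphStats:
    name: str
    code_lines: int = 0
    comment_lines: int = 0
    max_depth: int = 0   # deepest IF/EVALUATE nesting seen in the paragraph
    go_tos: int = 0


def scan(source: str) -> list[ParagraphStats]:
    """Split fixed-format COBOL into paragraphs and collect rough metrics."""
    paragraphs = []
    current = ParagraphStats("(prologue)")
    depth = 0
    for raw in source.splitlines():
        line = raw.ljust(7)
        indicator, code = line[6], line[7:72].upper()
        if indicator in "*/":                                # comment line
            current.comment_lines += 1
            continue
        text = code.strip()
        if not text:
            continue
        if code[:4].strip() and PARAGRAPH_RE.match(text):   # new paragraph label in Area A
            paragraphs.append(current)
            current = ParagraphStats(text[:-1])
            depth = 0
            continue
        current.code_lines += 1
        current.go_tos += len(GOTO_RE.findall(text))
        depth += len(OPEN_RE.findall(text))
        current.max_depth = max(current.max_depth, depth)
        depth = max(0, depth - len(CLOSE_RE.findall(text)))
        if text.endswith("."):        # a period ends every open IF/EVALUATE
            depth = 0
    paragraphs.append(current)
    return [p for p in paragraphs if p.code_lines or p.comment_lines]


def is_squirrelly(p: ParagraphStats, depth_limit: int = 3, go_to_limit: int = 2,
                  comment_ratio_limit: float = 0.5) -> bool:
    """Arbitrary thresholds; tune them against your own code base."""
    total = p.code_lines + p.comment_lines
    comment_ratio = p.comment_lines / total if total else 0.0
    return (p.max_depth >= depth_limit or p.go_tos > go_to_limit
            or comment_ratio > comment_ratio_limit)


SAMPLE = """\
       2100-SIMPLE.
           MOVE 'A' TO WS-X.
       2200-SQUIRRELLY.
      * CHG0001 88-01-01 SOMEONE ADDED A HACK
           IF WS-A = '1'
             IF WS-B = '2'
               IF WS-C = '3'
                 GO TO 2299-EXIT.
           GO TO 2299-EXIT.
       2299-EXIT.
           EXIT.
"""

if __name__ == "__main__":
    for p in scan(SAMPLE):
        flag = "CHECK" if is_squirrelly(p) else "ok"
        print(f"{flag:5} {p.name}: depth={p.max_depth} go_tos={p.go_tos} "
              f"comments={p.comment_lines}")
```

The point is triage, not proof: a paragraph with three levels of nested IFs, or a pile of GO TOs, trips a threshold, while a tidy EVALUATE with one nested level does not. Anything flagged still needs a human to read it.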
Static Code Analysis
Finding patterns like complicated conditional logic sounds like a job for static code analysis. There are products that support COBOL, such as SonarCOBOL from the well-known code analysis company SonarSource, and Fortify, which came under the ownership of Micro Focus, long the leading cross-platform COBOL provider (Micro Focus has since been acquired by OpenText).
Unusual business rules almost always correlate with complicated conditional logic. Set up the tools to highlight source files that contain patterns like the ones shown in the examples below. Also look for programs or subprograms that are invoked very frequently, and examine the source for those programs visually.
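For the second suggestion, a rough first cut is to count static call sites across the code base. Here is a minimal Python sketch under stated assumptions: the sources are available as text files (already converted from EBCDIC) with a hypothetical `.cbl` extension, and only literal `CALL 'PROGRAM'` statements are counted, skipping fixed-format comment lines. Call sites are only a proxy for how often a subprogram runs; dynamic calls through a data name (`CALL WS-PGM-NAME`) are missed, and true invocation frequencies would have to come from production run-time data.

```python
import re
from collections import Counter
from pathlib import Path

# Static CALL with a literal program name, e.g.  CALL 'CARD1' USING WS-REC
CALL_RE = re.compile(r"(?<![\w-])CALL\s+['\"]([A-Z0-9$#@-]+)['\"]", re.IGNORECASE)


def count_static_calls(sources: dict[str, str]) -> Counter:
    """Count literal CALL targets across programs, skipping comment lines."""
    counts: Counter = Counter()
    for text in sources.values():
        for raw in text.splitlines():
            line = raw.ljust(7)
            if line[6] in "*/":            # fixed-format comment indicator (column 7)
                continue
            for target in CALL_RE.findall(line[7:72]):
                counts[target.upper()] += 1
    return counts


def load_sources(root: str, pattern: str = "*.cbl") -> dict[str, str]:
    """Read every matching file under root (assumes EBCDIC already converted)."""
    return {str(p): p.read_text(errors="replace") for p in Path(root).rglob(pattern)}
```

Typical use would be `count_static_calls(load_sources("src")).most_common(20)` to list the twenty most-called subprograms; the `src` directory is a placeholder. The programs at the top of that list are the ones worth opening first.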
Expect False Positives
Most of the time, you will not find anything worth preserving. Typically, the “special” rules embedded in old source code amount to hacks or workarounds that people came up with long ago, before things were standardized. There is more visceral fear than objective reason about the risk of missing something important when replacing a legacy system.
Here’s a common scenario: A financial institution processes credit card account numbers. They need a way to test their code. If they test using real account numbers, all kinds of undesirable things can happen. Besides, they’re not supposed to have access to real account numbers. There’s that Sarbanes-Oxley thing, you know, as well as rules about personally identifiable information.
I’m reminded of a time when a colleague working on a back-end credit authorization system used his own Visa card to test the code. When his account showed a balance of $1,000,000 while his credit limit was $25,000, Visa were not amused. Fortunately, he was able to talk his way out of it.
Not everyone is equally adept at talking their way out of things, so people invented schemes to identify fake account numbers. Living in a “closed” world as we did, everyone invented a different scheme.
Throughout all the application code, any references to an account number had to take the fake numbers into consideration. Yes, I know, that means the production code contained baked-in knowledge of testing. That was then, this is now. Just roll with it.
Here’s a contrived snippet of COBOL code that identifies credit card issuers based on the account number.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CARD1.
       (code omitted)
       DATA DIVISION.
       (code omitted)
       WORKING-STORAGE SECTION.
       (code omitted)
       01  WS-RECORD-AREAS.
           05  WS-INPUT-RECORD.
               10  IN-ACCOUNT-NUMBER.
                   15  FILLER              PIC X(02).
                       88  TEST-ACCOUNT    VALUE '99'.
                   15  FILLER              PIC X(14).
               10  FILLER                  PIC X(174).
           05  WS-OUTPUT-RECORD.
               10  OUT-ACCOUNT-NUMBER      PIC X(16).
               10  OUT-MESSAGE             PIC X(184).
       (code omitted)
       PROCEDURE DIVISION.
       (code omitted)
       2200-IDENTIFY-CARD-TYPE.
           EVALUATE TRUE
               WHEN TEST-ACCOUNT
                   EVALUATE TRUE
                       WHEN IN-ACCOUNT-NUMBER(5:1) IS EQUAL TO '3'
                           MOVE 'AMERICAN EXPRESS' TO WS-CARD-TYPE
                       WHEN IN-ACCOUNT-NUMBER(5:1) IS EQUAL TO '4'
                           MOVE 'VISA' TO WS-CARD-TYPE
                       WHEN IN-ACCOUNT-NUMBER(5:1) IS EQUAL TO '5'
                           MOVE 'MASTERCARD' TO WS-CARD-TYPE
                       WHEN IN-ACCOUNT-NUMBER(5:1) IS EQUAL TO '6'
                           MOVE 'DISCOVER' TO WS-CARD-TYPE
                       WHEN IN-ACCOUNT-NUMBER(5:1) IS EQUAL TO '7'
                           MOVE 'DINERS CLUB' TO WS-CARD-TYPE
                       WHEN IN-ACCOUNT-NUMBER(5:1) IS EQUAL TO '8'
                           MOVE 'JAPAN CREDIT BUREAU' TO WS-CARD-TYPE
                       WHEN OTHER
                           MOVE 'UNKNOWN' TO WS-CARD-TYPE
                   END-EVALUATE
               WHEN IN-ACCOUNT-NUMBER(1:1) IS EQUAL TO '4'
                   MOVE 'VISA' TO WS-CARD-TYPE
               WHEN IN-ACCOUNT-NUMBER(1:2) IS >= '51'
                AND IN-ACCOUNT-NUMBER(1:2) IS < '56'
                   MOVE 'MASTERCARD' TO WS-CARD-TYPE
               WHEN IN-ACCOUNT-NUMBER(1:2) = '36'
               WHEN IN-ACCOUNT-NUMBER(1:2) = '38'
                   MOVE 'DINERS CLUB' TO WS-CARD-TYPE
               WHEN IN-ACCOUNT-NUMBER(1:4) = '6011'
               WHEN IN-ACCOUNT-NUMBER(1:2) = '65'
                   MOVE 'DISCOVER' TO WS-CARD-TYPE
               WHEN IN-ACCOUNT-NUMBER(1:2) = '34'
               WHEN IN-ACCOUNT-NUMBER(1:2) = '37'
                   MOVE 'AMERICAN EXPRESS' TO WS-CARD-TYPE
               WHEN IN-ACCOUNT-NUMBER(1:2) = '35'
                   MOVE 'JAPAN CREDIT BUREAU' TO WS-CARD-TYPE
               WHEN OTHER
                   MOVE 'UNKNOWN' TO WS-CARD-TYPE
           END-EVALUATE
           MOVE IN-ACCOUNT-NUMBER TO OUT-ACCOUNT-NUMBER
           MOVE WS-MESSAGE TO OUT-MESSAGE
           .
       (code omitted)
The basic idea is that the code has to recognize test account numbers and handle them differently from real ones. This is typical of the “hidden business logic” that legacy applications contain. It isn’t a special way of processing that distinguishes our company from the competition, or a valuable trade secret we might lose if we replace the existing application. Far from creating competitive advantage, it’s just a workaround for the fact that test data was difficult to manage. When we migrate the solution to a different platform or language, we won’t carry that sort of thing over.
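To make that concrete, here is roughly what the card-type routine might shrink to in a rewrite, sketched in Python purely for illustration. It keeps only the prefix rules from the contrived CARD1 example and drops the test-account branch entirely; test data would be handled by the test environment instead. The prefix table mirrors the example, not the full or current set of issuer ranges, and the names `PREFIX_RULES` and `identify_card_type` are hypothetical.

```python
# Prefix rules transcribed from the contrived CARD1 example; real issuer
# identification number (IIN) ranges are broader and change over time.
PREFIX_RULES: list[tuple[str, str]] = [
    ("4", "VISA"),
    *[(str(p), "MASTERCARD") for p in range(51, 56)],   # 51 through 55
    ("36", "DINERS CLUB"),
    ("38", "DINERS CLUB"),
    ("6011", "DISCOVER"),
    ("65", "DISCOVER"),
    ("34", "AMERICAN EXPRESS"),
    ("37", "AMERICAN EXPRESS"),
    ("35", "JAPAN CREDIT BUREAU"),
]


def identify_card_type(account_number: str) -> str:
    """Return the card type for an account number, or 'UNKNOWN'.

    Deliberately has no branch for the legacy '99' test-account scheme:
    test data belongs in the test environment, not in production logic.
    """
    for prefix, card_type in PREFIX_RULES:
        if account_number.startswith(prefix):
            return card_type
    return "UNKNOWN"


if __name__ == "__main__":
    for number in ("4111111111111111", "5500000000000004", "9940000000000000"):
        print(number, identify_card_type(number))
```

Because none of the prefixes in this table overlap, the order of the rules doesn't affect the result; a number beginning with `99` simply falls through to `UNKNOWN`.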
Not all legacy code is as clean as the example above. Here’s an equivalent routine based on some real-world examples I’ve seen. Real legacy code may be cluttered with comments documenting the full change history of the program, as we didn’t have very good version control systems in those days. This example uses IF/ELSE with inconsistent indentation instead of EVALUATE, and uses the older convention of ending each statement with a period, which leads to not-so-pretty ways to break out of the conditional block. This example also has some commented-out “debugging” code such as you might find in legacy programs, and features out-of-date comments at the top of the routine. I think you can still parse it even if you aren’t familiar with COBOL.
       2200-IDENTIFY-CARD-TYPE.
      ************************************************************
      *  CARD TYPE STARTS WITH                                   *
      *    3 = AMEX                                              *
      *    4 = VISA                                              *
      ************************************************************
           SET UNKNOWN-CARD TO TRUE.
      *    DISPLAY '2200 ACCT ' IN-ACCOUNT-NUMBER.
      * CHG0826 88-09-05 L FORTUNATA ADD TEST ACCT LOGIC
           MOVE '0' TO WS-TEST-FLAG.
           IF IN-ACCOUNT-NUMBER(1:2) = '34'
      * CHG0313 84-02-12 JK FRITZ ADD 37 FOR AMEX
              OR IN-ACCOUNT-NUMBER(1:2) = '37'
      * CHG0826 88-09-05 L FORTUNATA ADD TEST ACCT LOGIC
      * CHG1144 88-09-06 JK FRITZ ADD EXIT PARA
             IF TEST-ACCOUNT
                MOVE '1' TO WS-TEST-FLAG
                PERFORM 2295-POP-TST-MSG
      *         DISPLAY 'AFTER POP TST MSG ' OUT-MESSAGE
                GO TO 2299-EXIT
             ELSE
             SET AMERICAN-EXPRESS-CARD TO TRUE
      * CHG0555 83-05-19 P MILLS MOVE POPULATE MESSAGE TO OWN PARA
                PERFORM 2290-POPULATE-MESSAGE
                  GO TO 2299-EXIT.
      * CHG1090 88-06-26 T NGUYEN DISCOVER
           IF IN-ACCOUNT-NUMBER(1:2) = '65'
      * CHG1128 88-06-28 T NGUYEN FORGET CHECK 6011
      * CHG1184 88-06-30 T NGUYEN CHECK 4 BITE
           OR IN-ACCOUNT-NUMBER(1:4) = '6011'
      *    DISPLAY 'GOT DISCOVER'
      * CHG0826 88-09-05 L FORTUNATA ADD TEST ACCT LOGIC
      * CHG1144 88-09-06 JK FRITZ ADD EXIT PARA
               IF TEST-ACCOUNT
                   MOVE '1' TO WS-TEST-FLAG
                   PERFORM 2295-POP-TST-MSG
                   GO TO 2299-EXIT
               ELSE
                   SET DISCOVER-CARD TO TRUE
      * CHG0555 83-05-19 P MILLS MOVE POPULATE MESSAGE TO OWN PARA
                   PERFORM 2290-POPULATE-MESSAGE
                   GO TO 2299-EXIT.
           IF IN-ACCOUNT-NUMBER(1:2) >= '51' AND
              IN-ACCOUNT-NUMBER(1:2) <= '55'
      * CHG0826 88-09-05 L FORTUNATA ADD TEST ACCT LOGIC
      * CHG1144 88-09-06 JK FRITZ ADD EXIT PARA
             IF TEST-ACCOUNT
               MOVE '1' TO WS-TEST-FLAG
               PERFORM 2295-POP-TST-MSG
               GO TO 2299-EXIT
             ELSE
               SET MASTERCARD-CARD TO TRUE
      * CHG0555 83-05-19 P MILLS MOVE POPULATE MESSAGE TO OWN PARA
               PERFORM 2290-POPULATE-MESSAGE
               GO TO 2299-EXIT.
           IF IN-ACCOUNT-NUMBER(1:1) = '4'
      * CHG0826 88-09-05 L FORTUNATA ADD TEST ACCT LOGIC
      * CHG1144 88-09-06 JK FRITZ ADD EXIT PARA
              IF TEST-ACCOUNT
                MOVE '1' TO WS-TEST-FLAG
                PERFORM 2295-POP-TST-MSG
                GO TO 2299-EXIT
              ELSE
                SET VISA-CARD TO TRUE
      * CHG0555 83-05-19 P MILLS MOVE POPULATE MESSAGE TO OWN PARA
                PERFORM 2290-POPULATE-MESSAGE
                GO TO 2299-EXIT.
      * CHG0492 86-07-15 P LING DINER
           IF IN-ACCOUNT-NUMBER(1:2) = '36' OR
              IN-ACCOUNT-NUMBER(1:2) = '38'
      *       DISPLAY WS-INPUT-RECORD
      * CHG0826 88-09-05 L FORTUNATA ADD TEST ACCT LOGIC
      * CHG1144 88-09-06 JK FRITZ ADD EXIT PARA
           IF TEST-ACCOUNT
             MOVE '1' TO WS-TEST-FLAG
             PERFORM 2295-POP-TST-MSG
             GO TO 2299-EXIT
           ELSE
             SET DINERS-CLUB-CARD TO TRUE
             PERFORM 2290-POPULATE-MESSAGE.
      * CHG1345 90-01-16 BLAKE J REMOVED EXTRA PERFORM
      * CHG1550 90-01-16 T CONNOR RESTORED PERFORM ELSE SOMETIMES FAILS
      * CHG1601 90-01-23 BLAKE J SHOULD WORK
      * CHG1646 90-02-04 T CONNOR WTF BLAKE LEAVE IT
      *    DISPLAY 'Before 2290 call'.
           PERFORM 2290-POPULATE-MESSAGE.
      *    DISPLAY 'After 2290 call'.
       2299-EXIT.
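Much of the noise in that second example is comments. A small preprocessing step can separate the change history from the logic before you start reading. This is a hypothetical Python sketch, assuming fixed-format source (comment indicator `*` or `/` in column 7) and the `CHGnnnn YY-MM-DD` change-log convention used in the example; other shops used other conventions, so the regular expression would need adjusting.

```python
import re
from typing import NamedTuple

# Matches change-log comments like:  * CHG0826 88-09-05 L FORTUNATA ADD TEST ACCT LOGIC
CHANGE_RE = re.compile(r"^\*\s*(CHG\d+)\s+(\d{2}-\d{2}-\d{2})\s+(.*)$")


class ChangeEntry(NamedTuple):
    change_id: str
    date: str   # YY-MM-DD, exactly as written in the comment
    note: str


def declutter(source: str) -> tuple[list[str], list[ChangeEntry]]:
    """Split fixed-format COBOL into code lines and a change log.

    Every comment line (indicator '*' or '/' in column 7) is dropped from the
    code, including commented-out DISPLAY statements; lines that follow the
    CHGnnnn convention are kept separately as ChangeEntry records.
    """
    code: list[str] = []
    changes: list[ChangeEntry] = []
    for raw in source.splitlines():
        line = raw.rstrip()
        if len(line) > 6 and line[6] in "*/":
            match = CHANGE_RE.match(line[6:72])
            if match:
                changes.append(ChangeEntry(*match.groups()))
            continue
        if line[7:72].strip():           # skip blank lines
            code.append(line[:72])       # drop anything past column 72
    return code, changes
```

What remains in `code` is the bare IF/ELSE skeleton, which is much easier to compare against the EVALUATE version; the `changes` list keeps the history available if you need to ask why a branch exists.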
Even if you’re not familiar with COBOL, you probably find the first example easier to follow than the second. And yet, you can see there’s a pattern in the code that warrants a closer look: A complicated if/else structure. In many cases, there will be just one or two paragraphs or blocks in a program that you really need to study. It isn’t too daunting to identify valuable logic in legacy code.
There are many other reasons people may have created hacks for one thing or another; it isn’t always, or only, to enable testing. The point is that most of this sort of thing doesn’t prevent us from migrating the application or replacing it entirely.
I’ll reiterate that it is possible that an old program contains genuinely valuable business rules that aren’t documented anywhere else. But it’s far less likely than most people assume.