Speed Kills – Unless You Drive HostBridge
March 26, 2009 (Updated September 28, 2015)
Greetings from the CICS integration fast lane,
A few weeks ago we received an urgent technical support call from a HostBridge customer in the automotive industry – a manufacturer in the top 50 of the Fortune Global 500. Our customer opened the case by saying, “I don’t think this is really a HostBridge issue, but we can’t figure out what’s going on. The problem only happens when we drive work through HostBridge.”
Whenever we hear those words, we know things are going to be interesting. They were…
Now that you know the lay of the land, here’s the problem as reported by the customer: “Whenever a ‘post thru bill’ is generated directly by a CSR using a 3270 terminal/emulator, all the VSAM files are updated correctly. However, when the ‘post thru bill’ is generated using Siebel and HostBridge, the account data in one of the VSAM files is sometimes corrupted, which results in an ‘out of balance’ condition.”
After our initial research it was clear that the problem (a) occurred inconsistently, (b) was not related to processing load, and (c) occurred in production but could not be reproduced in a test environment. These are all things a software vendor doesn’t want to hear.
The first step in resolving complex problems like this one is to weed out the “red herrings” – determine all the factors that are irrelevant (but tempting to consider nonetheless) so you can focus on the factors that are relevant. In complex integration scenarios like this one, this step must be done with discipline and patience (and minimal finger pointing). After walking through this process, it seemed to us (HostBridge) that the situation had most of the characteristics of problems that stem from “over-driving” a CICS application – that is, driving a CICS application harder or faster than was ever envisioned by the original authors.
Many CICS applications were written quite a while ago and were designed around a perfectly valid set of assumptions… at the time. For example, some were designed to operate at human speeds, and according to human limitations. However, when you drive these applications via a high-performance service composition tool like HostBridge, such assumptions can be violated and the results unexpected. Consider a CICS transaction that writes audit records to a keyed VSAM file. Assume the key of the file is a 6-digit account number followed by a date/time stamp in format “YYYYMMDDHHMMSS.” What happens if the transaction is executed more than once within the same second? That’s right… bad stuff.
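To make the audit-record example concrete, here is a minimal sketch of how a one-second time stamp produces duplicate keys. The `pad` and `auditKey` helpers and the sample account number are hypothetical, invented only for illustration; they are not taken from any real CICS application.

```javascript
// Hypothetical helper: left-pad a number with zeros to a fixed width.
function pad(n, width) {
  return String(n).padStart(width, '0');
}

// Hypothetical key builder: 6-digit account number followed by a
// YYYYMMDDHHMMSS time stamp (one-second resolution), as in the example above.
function auditKey(account, when) {
  return pad(account, 6) +
    pad(when.getUTCFullYear(), 4) +
    pad(when.getUTCMonth() + 1, 2) +
    pad(when.getUTCDate(), 2) +
    pad(when.getUTCHours(), 2) +
    pad(when.getUTCMinutes(), 2) +
    pad(when.getUTCSeconds(), 2);
}

// Two executions 800 milliseconds apart, inside the same second.
const first = new Date(Date.UTC(2009, 2, 26, 14, 30, 15, 100));
const second = new Date(Date.UTC(2009, 2, 26, 14, 30, 15, 900));

const keyA = auditKey(123456, first);
const keyB = auditKey(123456, second);

// Both keys are identical, so the second write to a keyed file would
// collide with the first.
console.log(keyA, keyB, keyA === keyB);
```

A human at a terminal would rarely complete the same transaction twice in one second, which is why a key like this can survive for years before an automated caller exposes it.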
Given the significance of the customer’s problem, the day soon arrived for the “mother of all conference calls” – involving a dozen people scattered all across the U.S. and two people internationally (me for one).
Fortunately, the vendor of the lease management software package was on the call, and we were able to discuss how the application operated. In fact, the application does generate VSAM record keys that include a date/time stamp. However, the time stamp goes to a resolution of hundredths-of-seconds. And as a further protection, if a record already exists with the generated key, the application increments the hundredths-of-seconds value by 1 and attempts to write the record again (it will continue this until the write operation is successful). This technique looked pretty solid so (at least in theory) it didn’t seem that the sheer volume of the transactions would cause problems. However, this led us to another theory. What if the application had other assumptions or dependencies related to concurrency?
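The retry technique the vendor described can be sketched roughly as follows. This is only an illustration under stated assumptions: a `Map` stands in for the keyed VSAM file, the time stamp is simplified to a plain count of hundredths of a second, and `writeUnique` is a hypothetical name, not the vendor's routine.

```javascript
// Stand-in for a keyed file. A Map would silently overwrite, so the sketch
// checks for an existing key before writing, the way a duplicate-key
// condition would be detected on a real keyed file.
const file = new Map();

// Hypothetical write routine. `stamp` is a time in hundredths of a second.
function writeUnique(account, stamp, record) {
  let attempt = stamp;
  while (file.has(account + ':' + attempt)) {
    attempt += 1; // key already taken: bump by one hundredth and try again
  }
  const key = account + ':' + attempt;
  file.set(key, record);
  return key;
}

// Three writes for the same account carrying the same time stamp.
const k1 = writeUnique('123456', 1000, 'first');
const k2 = writeUnique('123456', 1000, 'second');
const k3 = writeUnique('123456', 1000, 'third');

console.log(k1, k2, k3); // 123456:1000 123456:1001 123456:1002
```

Note that this only guarantees unique keys. It says nothing about whether two different transactions can safely touch the same account at the same time, which is the question the call turned to next.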
Normally, the productivity of large conference calls is inversely related to the number of attendees. However, this time it worked. By having the Siebel application architects, the lease management software developer, the customer, and HostBridge on the call, we were able to analyze the entire operational path end-to-end. As we did, certain things came to light.
As described above, the CSR interacts with the Siebel application, and Siebel interacts with the CICS-based lease management application through a set of HostBridge services (scripts). It turned out that in certain scenarios, the Siebel application would cause two HostBridge services to be invoked simultaneously in reference to the same account (one of these services being “LeasePostThruBill”). From a business process perspective, this made perfect sense – and HostBridge doesn’t care.
However, these two HostBridge services exercised two different CICS transactions (I’ll refer to them as A and B). Fortunately, this caught the ear of the lease management software vendor. It turned out that the customer was using a version of their software that contained a design assumption (bug?) that transactions A and B would never be executed at the same time for the same account (after all, why or how would a human operator ever do that?). If they were, data corruption in the VSAM file could occur. Bingo!
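The exact failure inside the vendor's code was not spelled out, so the following is not a reconstruction of it. It is a generic sketch of one common way two overlapping updates to the same record can go wrong (a "lost update"), assuming each transaction reads the record, pauses, and then rewrites it. All names and amounts are hypothetical, and a generator `yield` stands in for the point where one transaction waits and another gets to run.

```javascript
// Hypothetical transaction: read the account record, pause, then rewrite it.
function* adjustBalance(file, account, delta) {
  const rec = { ...file.get(account) }; // read a private copy of the record
  yield;                                // another transaction may run here
  rec.balance += delta;
  file.set(account, rec);               // rewrite from the (possibly stale) copy
}

// Round-robin scheduler: advance every unfinished task one step at a time.
function runInterleaved(tasks) {
  let live = tasks;
  while (live.length > 0) {
    live = live.filter(task => !task.next().done);
  }
}

// Run a +50 and a -30 adjustment against a starting balance of 100.
function demo(run) {
  const file = new Map([['123456', { balance: 100 }]]);
  run([
    adjustBalance(file, '123456', 50),
    adjustBalance(file, '123456', -30),
  ]);
  return file.get('123456').balance;
}

const interleaved = demo(runInterleaved);
const oneAtATime = demo(tasks => tasks.forEach(task => runInterleaved([task])));

// Interleaved, the second rewrite overwrites the first; one at a time, both apply.
console.log(interleaved, oneAtATime); // 70 120
```

Run one at a time, as a human operator would, the two updates yield the correct balance. Run together, one update vanishes and the account is out of balance.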
The diagram below summarizes this scenario and describes the problem:
So… how did we solve the problem? Two options were first considered: (a) change the Siebel application to ensure that HostBridge services 1 and 2 would not be executed in parallel (but this really didn’t make sense in terms of the business process and would require significant changes); or (b) change the lease management application to correct the assumption/bug so that transactions A and B could be executed simultaneously for the same account number (in fact, the vendor had already addressed this issue in a more recent release of their software).
Given the nature of these options and the degree of testing that would be required, neither offered a fast fix. And the customer needed to solve the problem today!
That led us to consider how HostBridge might be able to circumvent the problem by compensating for the characteristics of the CICS application. We explored two alternatives, both trivial. Ultimately we decided to serialize the execution of scripts 1 and 2 when processing the same account. To accomplish this, all we had to do was insert the following lines at the beginning of each script:
resource = 'HBSerializeAccount:' + account;
And this line at the end of each script:
That’s it. Add three lines to each script and HostBridge can compensate for the assumption/bug of the application. This allowed the customer to overcome their production problem immediately. In the final analysis, this turned out to be a case study of how CICS terminal-oriented applications were sometimes written to assume that they were being driven by a human operator. When they are operating according to human limitations regarding speed and concurrency, they behave just fine. However, when these same applications are driven by a high-performance integration tool like HostBridge, the design flaws begin to show up. While the diagnosis of this problem was excruciating (think House, M.D.), the solution/circumvention was trivial (think House, M.D. again).
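For readers who want to see the idea of per-account serialization in action, here is a generic sketch in plain JavaScript. It does not use or reproduce the HostBridge calls from the scripts above: the `held` lock table, the `serialized` wrapper, the `adjustBalance` transaction, and the scheduler are all hypothetical stand-ins, and only the resource name format comes from the script line shown earlier.

```javascript
// Hypothetical lock table: the set of resource names currently held.
const held = new Set();

// Hypothetical wrapper: wait until the named resource is free, hold it while
// the body runs, and always release it afterwards.
function* serialized(resource, body) {
  while (held.has(resource)) yield; // someone else holds it: wait a turn
  held.add(resource);
  try {
    yield* body();
  } finally {
    held.delete(resource);
  }
}

// Hypothetical transaction: read the account record, pause, then rewrite it.
function* adjustBalance(file, account, delta) {
  const rec = { ...file.get(account) };
  yield; // another task may run here
  rec.balance += delta;
  file.set(account, rec);
}

// Round-robin scheduler: advance every unfinished task one step at a time.
function runInterleaved(tasks) {
  let live = tasks;
  while (live.length > 0) {
    live = live.filter(task => !task.next().done);
  }
}

const file = new Map([['123456', { balance: 100 }]]);
const account = '123456';
const resource = 'HBSerializeAccount:' + account;

// Both tasks are started together, but the second cannot begin its
// read-modify-write until the first has released the account's resource.
runInterleaved([
  serialized(resource, () => adjustBalance(file, account, 50)),
  serialized(resource, () => adjustBalance(file, account, -30)),
]);

console.log(file.get(account).balance); // 120
```

Because the resource name includes the account number, work on different accounts is never held up; only two scripts touching the same account take turns.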
HostBridge became the solution that allowed our customer’s Siebel CRM system to drive their existing CICS-based leasing application faster than ever. Or as we now say around the office…
Speed kills – unless you drive HostBridge.
Until next time, remember to buckle in with the ultimate integration driving machine.