Head, Apply Wall To It Several Times!

If you don’t like that title, how about “How to out smart yourself and shoot yourself in the foot, all at the same time!” or “No smart deed goes unpunished!”.

In the summer of 2014, I decided to clean up up some compiler warning messages that certain compilers (HP-UX cc comes to mind) were complaining about. Note: The idea was a good one, it is the extra “smart” stuff I did that came back to bit me in the ass.

The compiler was complaining about assigning “size_t” to an “int” for the strlen() function. I was doing:

int   len = strlen( pStr );

So, I made an executive decision (I am the CTO) to change any and all variables that were used for strlen() in 14 commercial products, 6 ‘licensed as free’ products and 9 ‘open source’ projects to use “size_t” rather than “int”. Simple idea and nothing bad could come of it, right.

So, along the way, I decided that some for-loop counters could also be “size_t” rather than “int”. So, subroutine RemoveTrailingBlanks (along with a lot of other code) change from:

void RemoveTrailingBlanks( char *pStr )
{
   int      i;
   int      len = strlen( pStr );

   if (len <= 0)
      return;

   for ( i = (len - 1 ); i >= 0 ; i-- )
   {
      if ( *(pStr + i) != ' ' )     /* non-blank? */
      {
         break;
      }
   }

   /* Set trailing blanks to nulls. */
   memset( pStr + i + 1, '\0', (len - i) );
}

To:

void RemoveTrailingBlanks( char *pStr )
{
   size_t      i;
   size_t      len = strlen( pStr );

   if (len <= 0)
      return;

   for ( i = (len - 1 ); i >= 0 ; i-- )
   {
      if ( *(pStr + i) != ' ' )     /* non-blank? */
      {
         break;
      }
   }

   /* Set trailing blanks to nulls. */
   memset( pStr + i + 1, '\0', (len - i) );
}

After I updated many programs and subroutines, I thoroughly tested everything out and everything worked.

Now, if you can tell how I shot myself in the foot without reading further then you get a gold star because I missed it and even after I knew there was an issue, it took me more than 30 minutes to figure it out.

So, the other day a customer emailed me saying that they did an upgrade of MQAUSX on IBM i (OS/400) and the channel crashed. They sent me the IBM i joblog and the MQAUSX logfile. Here’s a snippet of the joblog:

From module . . . :   MQAUSXO
From procedure  . :   RemoveTrailingBlanks
Statement . . . . :   5
To module . . . . :   MQAUSXO
To procedure  . . :   RemoveTrailingBlanks
Statement . . . . :   5
Thread  . . . . . :   00000141
Message   . . . . :   Pointer not set for location referenced.
Cause   . . . . . :   A pointer was used, either directly or as a basing pointer, that has not been set to an address.

I looked at it and said WTF!?! RemoveTrailingBlanks is a world’s simplest routine that I wrote something like 15 years ago. So, I opened the code in Eclipse and stared at RemoveTrailingBlanks and nothing jumped out at me. After a few minutes, I remembered my “size_t” task I did 16 months ago but still nothing jumped out.

The IBM i joblog says it is line # 5 which is the “if” statement. For those of you counting lines and saying I don’t know how to count, the first line of “size_t i” is a declaration and not a statement. I’m staring at the “if” statement thinking how could it work for 15 years and suddenly not work anymore. Then I thought, what did I change back in June 2014? I looked in my source code repository and reviewed what changed. If you look above you can see that I only changed the first 2 lines of RemoveTrailingBlanks subroutine. I changed both lines to declare the variables as “size_t” rather than “int”. Again, I stared at it thinking there is nothing wrong with the code.

And then it hit me like a freight train, “size_t” is just a “typedef” for “unsigned int”. So, do you get it yet?

No, let me explain more. An “unsigned int” variable means that it will contain whole positive numbers. So, do you get it yet?

No, let me explain more. Look carefully at the for-loop. What is the code testing for? A number great than or equal to zero (“i >= 0”). So, do you get it yet?

No, let me explain more. Look at the last part of for-loop, the code is decrementing the counter (“i–“). So, do you get it yet?

No, let me explain more. What happens when “i” is zero and it is decremented by 1? Remember, “i” was declared as “size_t” and not “int”. The next number after 0 is 4294967296 and not -1. Remember, we are dealing with whole positive numbers. Therefore, “i” will never be negative, hence, the for-loop will go off into la la land until it finds a non-blank character.

So, why didn’t this show up in my testing, well luckily or unluckily, it is a rare case where the entire string that is passed into RemoveTrailingBlanks is all blanks which happened to be what happened with this customer. 99.99% of the time, the incoming string will have at least 1 non-blank character.

Now back to the regularly scheduled banging of my head against the wall.

Regards,
Roger Lacroix
Capitalware Inc.

This entry was posted in C, Capitalware, IBM i (OS/400), IBM MQ, Licensed As Free, Linux, macOS (Mac OS X), Open Source, Programming, Unix, Windows, z/OS.

Comments are closed.