Telekom Malaysia Bhd Billing Bug Explained

A lesson for Computer Science Students

WARNING: this page is the creation of a wandering mind thinking of ways to make bugs in their own software more real to Computer Science Students.

It is amazing how fast modern computers are. Just think, in one second, a modern computer can multiply over a million times, a single human mistake. And if it gets on the Internet, no matter how embarrassing it is, it will never be forgotten.
Gilbert Healton

An Embarrassing Software Bug

The Associated Press ran a story in 2006-April about a Malaysian man whose father had died the previous December. The son, Yahaya Wahab, settled with Telekom Malaysia Bhd. in January to close his father's account. But the company's computers decided more money was due: 806,400,000,000,000.01 ringgit (8.064 x 10^20 ringgit, or $218 trillion), to be exact. The final touch was that he had 10 days to pay up or face collection action. For scale, the CIA Factbook lists Malaysia's 2005 Gross Domestic Product as $287 billion, that of the United States as $12.4 trillion, and that of the world as a whole as $60.6 trillion.

Thinking About The Problem

As a professional software developer of 30 years with an interest in time and date problems (see The Best Of Dates, The Worst Of Dates for hints), I've been wondering what might cause this particular problem. High on my list was the fact that 2005-December-31 had a leap second. Some operating systems handle these correctly, others don't. Some applications handle these correctly, others don't. If you run a problem application under an OS that knows what it is doing, you are asking for trouble. Regardless of the cause, it is a good example of how easy it is to write software bugs.
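To make the leap second mismatch concrete, here is a minimal sketch of POSIX-style timekeeping, which treats every day as exactly 86400 seconds and so has no slot for 23:59:60. The helper names days_from_civil and posix_seconds are made up for this illustration; they are not part of any standard library, and nothing here is claimed to be what any phone company runs.

```c
#include <stdint.h>

/* Days from 1970-01-01 to the given Gregorian date.
   Hypothetical helper, written only for this illustration. */
int64_t days_from_civil(int64_t y, int m, int d)
{
    y -= (m <= 2);                      /* treat Jan/Feb as end of prior year */
    int64_t era = (y >= 0 ? y : y - 399) / 400;
    int64_t yoe = y - era * 400;        /* year within the 400-year era */
    int64_t doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* POSIX-style seconds since the epoch: every day is exactly
   86400 seconds long, so leap seconds simply do not exist here. */
int64_t posix_seconds(int64_t y, int m, int d, int hh, int mm, int ss)
{
    return days_from_civil(y, m, d) * 86400
         + (int64_t)hh * 3600 + (int64_t)mm * 60 + ss;
}
```

By this arithmetic, posix_seconds(2006, 1, 1, 0, 0, 0) minus posix_seconds(2005, 12, 31, 23, 59, 59) is 1, even though two seconds really elapsed because of the leap second at 23:59:60. Software that counts the leap second and software that does not will disagree by one second across that midnight.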

Letting my mind wander over this problem, I came up with the following tidbits beyond the leap second issue:

Recreating The Problem

Putting all of the above together I came up with a C program to show my way of forcing the error.

Computer Science students are urged to understand what every part of this program does. I've seen similar problems in countless programs. The "carry" condition is a repeat offender, in and out of dates. If you don't understand the bug herein you are very likely to repeat it, somewhere.

/* reproducing the Telekom Malaysia bug */

  /* DISCLAIMER: 
     I have no idea if something like this was the 
     problem, and have not checked. Just having fun 
     showing software people how easy it is to make 
     date bugs. */

  /* NOTE: this program was not written until it was noticed
     that the billed amount (RINGGIT) was close to 2 to the 64
     (2^64 is shorthand used in documentation, though it is
     NOT VALID C!). As this is very close to time_t values 
     on some newer computers I began to wonder about it.  */

#include <stdio.h>
#include <stdlib.h>

#define RINGGIT 8.064e20             /* amount billed */

typedef unsigned long long time_64t; /* simulate 64-bit time_t */
   /* as a 64-bit time_t is needed to express this bug, but
      the compiler/OS this sample is being run on may still be 
      using the historic 32-bit version, we make our own typedef 
      that forces 64-bits. */

int main ( void )
{

            /* starting and ending times of phone call */
            /* assuming a call started close to midnight 
               of 2005-Dec-31 and then went just one 
               second into 2006-Jan-01. A one-second interval
               immediately before or after midnight might 
               have been the cause of the bug. Thus this 
               program only looks at that one second,
               ignoring call time on the other
               side of midnight. */
    time_64t     startt   = 1;
    time_64t     endt     = 0;
               /* (assume leap second did something silly) */

          /* calculate length of call we are interested in */
    time_64t      minutes = endt - startt;
          /* unsigned numbers are often used in integer
             calculations that are known to be non-negative 
             as they allow values twice as large as their signed 
             counterparts to be used without any additional cost.

             Such unsigned values work great unless some special 
             exception, such as a leap second, does something 
             strange to result in a calculation that should yield
             a negative result (e.g., 0 - 1). But for unsigned 
             values the resulting carry wraps around and produces
             a really huge positive value (here 2^64 - 1). */

			/* calculate a cost per-minute from givens */
    float     costPerMin = RINGGIT / minutes;

    float     bill;		/* calculated billed amount */

    bill = minutes  * costPerMin;	/* calculate bill */

    printf( "costPerMin %.3f\n", costPerMin );
    printf( "Minutes %llu (0x%llx)\n", 
                       minutes, 
                              (unsigned long long)minutes );
    printf( "Please pay %.2f\n", bill );

    exit( EXIT_SUCCESS );
}

The exact output you get on the Please Pay line depends on the run-time library of the compiler being used. Some will be smart enough to know that a float just doesn't have that many significant digits in it and start throwing zeros out once the available precision is reached. Others are not that smart and keep doing binary to decimal conversions on the floating point number even though only garbage is being produced. It looks like Telekom Malaysia has a smarter library.
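As a rough illustration of the precision point, the sketch below checks whether a value survives being squeezed into a float. It assumes float is IEEE 754 single precision (a 24-bit significand, roughly 7 decimal digits), which is typical but not guaranteed by the C standard. The function names fits_in_float and print_both are made up for this example.

```c
#include <stdio.h>

/* Returns 1 if the value is unchanged after a round trip through
   float, 0 if float could not hold all of its significant bits. */
int fits_in_float(double value)
{
    float narrowed = (float)value;
    return (double)narrowed == value;
}

/* Print the same amount as stored in a float and in a double.
   Digits past the float's precision depend on the nearest
   representable value and on how the library converts it. */
void print_both(double value)
{
    printf("as float:  %.2f\n", (double)(float)value);
    printf("as double: %.2f\n", value);
}
```

Under that assumption, fits_in_float(8.064e20) returns 0: the billed amount needs far more significant bits than a float has, so the trailing digits of the printed bill cannot be trusted. Calling print_both(8.064e20) on your own system shows how your run-time library treats those digits.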

I'll bet that even if Yahaya Wahab had paid up, the computers at both the phone company and the banks would have expressed their own bugs over the amount.

How To Avoid Such Errors

First, be sure you fully understand date and time processing. Just because you use dates and times every day of your life does not mean you understand them, especially to the detailed levels used by computer software.

Second, be sure your application software handles leap years and leap seconds in the same way the OS it runs on does.

Test test test. Especially the boundary conditions. A five second range is best: two seconds before, one second before, right on, one second after, and two seconds after. Start and stop times must use all permutations and combinations of these.
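One way to sketch such a boundary test in C is shown below. The checked subtraction call_seconds and the test driver boundary_failures are hypothetical names invented for this illustration; the driver walks all 25 start/stop pairs within two seconds of a boundary and counts any pair where the result is not what simple arithmetic says it should be.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical checked subtraction: reports failure instead of
   letting an unsigned subtraction wrap around. */
bool call_seconds(uint64_t startt, uint64_t endt, uint64_t *out)
{
    if (endt < startt)
        return false;           /* call "ended" before it started */
    *out = endt - startt;
    return true;
}

/* Try every start/stop pair from two seconds before the boundary
   to two seconds after it. Returns the number of pairs that gave
   a wrong answer. The boundary must be at least 2. */
int boundary_failures(uint64_t boundary)
{
    int failures = 0;
    for (unsigned i = 0; i < 5; i++) {
        for (unsigned j = 0; j < 5; j++) {
            uint64_t startt = boundary - 2 + i;
            uint64_t endt   = boundary - 2 + j;
            uint64_t secs   = 0;
            bool ok        = call_seconds(startt, endt, &secs);
            bool expect_ok = (j >= i);   /* stop must not precede start */
            if (ok != expect_ok)
                failures++;
            else if (ok && secs != (uint64_t)(j - i))
                failures++;
        }
    }
    return failures;
}
```

A call such as boundary_failures(1136073600), the POSIX timestamp for midnight starting 2006-Jan-01 UTC, should return 0. Replacing call_seconds with a bare endt - startt makes the pairs where the stop precedes the start fail, which is exactly the case the test exists to catch.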

For calculations that matter, such as a customer's bill, have the software trap values exceeding some sanity threshold and hold them for manual review.
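A sanity check of this kind can be very small. The sketch below is only an assumption about what one might look like: the limit MAX_SANE_BILL and the function needs_manual_review are invented for this example, and a real limit would have to come from the business, not from a programmer's guess.

```c
#include <stdbool.h>

/* Hypothetical upper limit for a single phone bill, in ringgit. */
#define MAX_SANE_BILL 10000.0

/* Returns true when a bill should be held for a human to look at.
   The test is written as a negated "is in range" check so that a
   NaN, which fails every comparison, is also sent for review. */
bool needs_manual_review(double bill)
{
    return !(bill >= 0.0 && bill <= MAX_SANE_BILL);
}
```

With a check like this in front of the printer, needs_manual_review(8.064e20) returns true and the bill goes to a clerk instead of to the customer. Negative amounts are caught the same way.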

Think.

http://www.exit109.com/~ghealton/y2k/TelekomBug.html
$Id: TelekomBug.hmac,v 1.4 2007/08/26 22:16:41 ghealton Exp $
Last formatted 2007-08-26