Thursday, March 1, 2018

Use of localtime(), localtime_r() and their performance implications

It's about time

I recently troubleshot a multi-threaded program suffering from performance issues. Specifically, threads that were supposed to be busy-spinning and clocking 100% CPU were only doing about 30%. It turned out that calls to localtime() were the culprit, and the problem was resolved by replacing them with localtime_r().

I wrote a small test program to illustrate the scale of the problem. The program creates 3 threads, each of which is pinned to a CPU core and calls localtime() or localtime_r() 100K times. The latter is more than 4x faster. Here is the code and output:
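(The original listing was not preserved in this copy of the post; the sketch below is a reconstruction from the description above. The function names, the `_r` command-line argument convention, and the affinity handling are my assumptions.)

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 3
#define ITERS    100000

static int use_r;   /* 1 => localtime_r(), 0 => localtime() */

static void *worker(void *arg)
{
    /* Pin this thread to the core index passed in arg
       (failure is ignored, e.g. on a VM with fewer cores). */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET((int)(long)arg, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    time_t now = time(NULL);
    struct tm buf;
    for (int i = 0; i < ITERS; i++) {
        if (use_r)
            localtime_r(&now, &buf);
        else
            localtime(&now);
    }
    printf("thread exiting ...\n");
    return NULL;
}

/* Spawn NTHREADS pinned workers and wait for them all. */
void run_benchmark(int use_localtime_r)
{
    use_r = use_localtime_r;
    pthread_t tids[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    printf("main exiting... \n");
}
```

A main() that checks for an `_r` first argument and calls run_benchmark() accordingly would reproduce the two runs shown below.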




vagrant@ubuntu-xenial:/vagrant_data/temp$ time ./localtime 
thread exiting ...
thread exiting ...
thread exiting ...
main exiting... 

real 0m0.894s
user 0m0.384s
sys 0m0.724s

vagrant@ubuntu-xenial:/vagrant_data/temp$ time ./localtime _r
thread exiting ...
thread exiting ...
thread exiting ...
main exiting... 

real 0m0.193s
user 0m0.264s
sys 0m0.112s



The Investigation


It was puzzling that localtime() should be so much slower than its thread-safe counterpart. Indeed, the man page excerpt for localtime() and localtime_r() below doesn't suggest any such difference. If anything, the observation that localtime_r() runs faster seemed counter-intuitive at first: shouldn't the thread-safe version run a bit slower than its non-thread-safe sibling due to added serialization? (Granted, its thread safety comes from a user-provided result buffer rather than a global one, not from locking.)


The localtime() function converts the calendar time timep to broken-down time representation, expressed relative to the user's specified timezone. The function acts as if it called tzset(3) and sets the external variables tzname with information about the current timezone, timezone with the difference between Coordinated Universal Time (UTC) and local standard time in seconds, and daylight to a nonzero value if daylight savings time rules apply during some part of the year. The return value points to a statically allocated struct which might be overwritten by subsequent calls to any of the date and time functions. The localtime_r() function does the same, but stores the data in a user-supplied struct. It need not set tzname, timezone, and daylight.
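The "statically allocated struct" in the excerpt is the crux of the API difference. As a quick aside, the helper below (my own illustration, not from the post) shows on glibc that two localtime() calls hand back the same buffer, so the second call clobbers the first result, while localtime_r() keeps results apart:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Returns 1 if the behaviour matches the man page's description:
   localtime() reuses one static struct tm, localtime_r() does not. */
int static_buffer_demo(void)
{
    time_t t1 = 0;          /* 1970-01-01 00:00:00 UTC */
    time_t t2 = 86400;      /* one day later */

    struct tm *p1 = localtime(&t1);
    int day1 = p1->tm_mday;
    struct tm *p2 = localtime(&t2);   /* overwrites *p1 */

    struct tm a, b;                   /* caller-owned buffers */
    localtime_r(&t1, &a);
    localtime_r(&t2, &b);

    return p1 == p2                   /* same static struct tm */
        && p1->tm_mday != day1        /* first result was clobbered */
        && a.tm_mday != b.tm_mday;    /* _r results stay distinct */
}
```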


The answer, of course, lies in the source code! Both functions call __tz_convert(), which is the source of the problem. Here is the relevant source code: localtime.c and tzset.c.
First, there is a critical section in __tz_convert() protected by a mutex (__libc_lock). Who'd have thought that!


  __libc_lock_lock (tzset_lock);
 
  /* Update internal database according to current TZ setting.
     POSIX.1 8.3.7.2 says that localtime_r is not required to set tzname.
     This is a good idea since this allows at least a bit more parallelism.  */
  tzset_internal (tp == &_tmbuf && use_localtime);
 
  if (__use_tzfile)
    __tzfile_compute (*timer, use_localtime, &leap_correction,
                      &leap_extra_secs, tp);
  else
    {
      if (! __offtime (timer, 0, tp))
        tp = NULL;
      else
        __tz_compute (*timer, tp, use_localtime);
      leap_correction = 0L;
      leap_extra_secs = 0;
    }
 
  __libc_lock_unlock (tzset_lock);


Second, the critical section calls a function tzset_internal(), which is quite expensive. Both functions reach it, but in the snippet below the always flag is true for localtime() and false for localtime_r(): for localtime_r() the function returns immediately after the first initialization, while localtime() re-processes the TZ setting on every single call.


/* Interpret the TZ envariable.  */
static void
tzset_internal (int always)
{
  static int is_initialized;
  const char *tz;
 
  if (is_initialized && !always)
    return;
  is_initialized = 1;
  
  ...
  ...
}


At this point, it is easy to see why localtime() runs slower - more code is executed on every call - and why the threads' CPU usage drops so sharply - a longer critical section leads to more lock contention. Looking back at the man page description, the hint was there all along: localtime_r() "need not set tzname, timezone, and daylight".

Lesson learnt


Latency-sensitive multi-threaded applications may want to do away with localtime_r() entirely, given the mutex it takes internally. If timezone conversion is needed, the application can call localtime_r() once upon initialization to obtain a local time T0, then on each subsequent query call the much cheaper time() to obtain a delta and apply that delta to T0.
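A minimal sketch of that idea, assuming glibc's tm_gmtoff extension for the cached offset (the function names are mine, and the cached offset goes stale across a DST transition, so it must be refreshed periodically):

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

static long tz_offset_sec;          /* cached UTC offset, seconds east */

/* Call once at startup, and again whenever TZ/DST may have changed. */
void fast_localtime_init(void)
{
    time_t t0 = time(NULL);
    struct tm tm0;
    localtime_r(&t0, &tm0);
    tz_offset_sec = tm0.tm_gmtoff;  /* glibc/BSD extension */
}

/* Pure-arithmetic conversion: no locks, no TZ database lookup.
   Fills only the date and time-of-day fields of *out. */
void fast_localtime(time_t t, struct tm *out)
{
    int64_t secs = (int64_t)t + tz_offset_sec;
    int64_t days = secs / 86400;
    int rem = (int)(secs % 86400);
    if (rem < 0) { rem += 86400; days--; }
    out->tm_hour = rem / 3600;
    out->tm_min  = (rem % 3600) / 60;
    out->tm_sec  = rem % 60;

    /* days-since-epoch -> civil date (Howard Hinnant's algorithm) */
    int64_t z = days + 719468;
    int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    unsigned doe = (unsigned)(z - era * 146097);
    unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    int64_t y = (int64_t)yoe + era * 400;
    unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    unsigned mp = (5 * doy + 2) / 153;
    unsigned d = doy - (153 * mp + 2) / 5 + 1;
    unsigned m = mp < 10 ? mp + 3 : mp - 9;
    y += (m <= 2);
    out->tm_year = (int)y - 1900;
    out->tm_mon  = (int)m - 1;
    out->tm_mday = (int)d;
}
```

After one fast_localtime_init() at startup, every hot-path conversion is plain integer arithmetic on the thread's own buffer, with no shared state to contend on.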

1 comment:

  1. Hey Jia, you are not comparing equivalent things though. Since localtime_r() does not call tzset() internally, we need to call tzset() ourselves before calling localtime_r() if we want the same result as localtime(). The question is whether it is necessary to call tzset() every time we call localtime_r(), or whether it is sufficient to do so once at the start of the application.

