One annoying thing about DTrace is that there is now generally no good excuse for not finding out what is really going on whenever a puzzling system statistic rears its ugly head. In the past I would just shrug off such gaps in my understanding and move on to something “more important” (i.e. less soul-destroying). With the old tools the “return on investment” was often just too low, but with DTrace the ROI really can be significant: not only is my understanding improving, but I’m finding new stuff to “go fix” all the time!
Jon Haslam and I were sipping coffee in a Washington DC hotel lobby, hacking together some examples for our demo session at the SUPerG 2005 conference. I was simulating a thundering herd scenario on my Acer Ferrari laptop by having 100 threads in a process doing this:
for (;;)
        poll(0, 0, 10);
I’d done this kind of thing many times before, but this time I happened to notice that I wasn’t getting my expected number of syscalls or context switches, as reported by good old vmstat(1M). By some strange quirk I was seeing only about 6,000 additional syscalls per second, but context switches were up by 10,000 per second (where I naively expected the syscalls to be). Of course, I now realise that I should have expected no more than 5,000 poll(2) syscalls per second.
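These days it only takes a few lines of D to get exact per-process numbers rather than eyeballing vmstat deltas. Something like this sketch would do — hedging duly: “herd” is just a hypothetical name for the test binary, on my Solaris 10 system poll(2) actually enters the kernel as the pollsys syscall (older releases show a plain poll syscall), and sched:::off-cpu counts every trip off CPU, which only roughly tracks vmstat’s cs column:

#!/usr/sbin/dtrace -qs

/* per-second counts of poll(2) syscalls and trips off CPU for "herd" */

syscall::pollsys:entry
/execname == "herd"/
{
        @polls = count();
}

sched:::off-cpu
/execname == "herd"/
{
        @switches = count();
}

profile:::tick-1sec
{
        printa("polls/sec %@d   off-cpu/sec %@d\n", @polls, @switches);
        clear(@polls);
        clear(@switches);
}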
I will presume the reader is pretty used to the idea that Solaris rounds most timers and timeouts (I’m not going to cover CLOCK_HIGHRES timers here) up to the next clock tick. The default clock frequency is 100Hz (and has been for as long as I can remember, although from Solaris 7 onwards this has been configurable), which equates to a tick interval of 10ms.
This means that the following code segment running in a single thread will generate no more than 100 poll(2) syscalls per second:
for (;;)
        poll(0, 0, 9);
Although we’re asking for 9ms timeouts, these will get rounded up to the next 10ms tick. On a quiet system we can expect the first call to synchronise us with the clock after one or two ticks. From then on all subsequent calls should only require one tick each (with nearly 1ms to spare for call and implementation overhead).
Likewise we can fairly safely predict that the following code segment will generate no more than 50 syscalls per second, since an 11ms timeout must be rounded up past one tick to the second, i.e. to at least 20ms:
for (;;)
        poll(0, 0, 11);
When we come to a 10ms timeout, it is fairly safe to predict (based on the above) that the following code should generate somewhere between 50 and 100 syscalls per second:
for (;;)
        poll(0, 0, 10);
Of course, we could only hit 100 syscalls per second if the C language overhead and the poll(2) syscall implementation overhead were both zero. On reflection, this obviously cannot be the case!
A second guess would be that all 10ms sleeps must take more than 10ms elapsed time, so we might expect just 50 syscalls per second at most. However, in the single-threaded case I observe about 60 on my system (and this accords with the 6,000 I saw from 100 threads).
At this point it is important to add that I have verified that none of the poll(2) syscalls are returning early, i.e. all are taking at least 10ms.
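For the record, the check was a throwaway D sketch along these lines (again assuming a hypothetical test binary called “herd”, with poll(2) appearing as pollsys on my system); run it for a while, interrupt it, and the quantized distribution should show nothing below 10ms:

#!/usr/sbin/dtrace -qs

syscall::pollsys:entry
/execname == "herd"/
{
        self->ts = timestamp;
}

syscall::pollsys:return
/self->ts/
{
        /* nanoseconds spent in each poll(2) call */
        @latency = quantize(timestamp - self->ts);
        self->ts = 0;
}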
Right now my best guess is that the gap between two consecutive clock ticks reaching the timer callout queue sometimes exceeds 10ms. Of course if this happens it is very likely that the very next tick will reach the callout queue in less than 10ms, but this too would be consistent with my observations.
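If that guess is right, the intervals between consecutive wakeups in a single thread should mostly be about 20ms, with the occasional 10ms interval whenever a late tick lets the very next one satisfy the timeout. A sketch to test this from userland (same hypothetical “herd” binary as above; clauses for the same probe fire in program order, so the gap is recorded before the timestamp is updated):

#!/usr/sbin/dtrace -qs

syscall::pollsys:return
/execname == "herd" && self->last/
{
        /* milliseconds between consecutive wakeups in this thread */
        @gaps = lquantize((timestamp - self->last) / 1000000, 0, 40, 1);
}

syscall::pollsys:return
/execname == "herd"/
{
        self->last = timestamp;
}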
To summarise, I currently believe that something like this is happening:
         9ms      10ms     11ms
poll()
tick0
tick1    wakeup
tick2    wakeup   wakeup   wakeup
tick3    wakeup
tick4    wakeup   wakeup   wakeup
tick5    wakeup   wakeup
tick6    wakeup            wakeup
tick7    wakeup   wakeup
In the 10ms column above I’m guessing that the gap between tick4 and tick5 arriving at the timer callout queue is slightly more than 10ms. This results in two consecutive 10ms wakeups. Since I’m seeing 60 syscalls where I would expect 50, I can further surmise that this kind of thing happens about 20% of the time.
Whilst I did use some trivial DTrace scripts to verify some of the above data (e.g. the number of actual poll(2) syscalls per second), most of this post is intended to set the scene and whet your appetite for Part 2.
Next time I hope to explain how I used DTrace to find the reason for the excessive context switches I observed. This led to me filing a performance bug, and opened up a host of additional questions to do with timers, timeouts and ticks, and how they relate to the time of day.