
Number of Flows

We measured the context-switching performance of four different implementations of flows of control.

We ran our experiments on a variety of machines. We report context switch times as the time per flow of control per context switch.
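The following is a minimal sketch of this measurement for user-level threads. It assumes POSIX `ucontext` (`getcontext`/`makecontext`/`swapcontext`) as a stand-in for Cth, whose interface this section does not show. The `ucontext` calls were removed from POSIX.1-2008 but are still provided by glibc. The sketch arranges N flows in a ring, and each flow hands the processor to its successor K times. The total elapsed time divided by N×K then gives one reading of "time per flow of control per context switch", namely the average cost of a single switch. The stack size and the names `bench_user_threads`, `flow_body` and `switch_count` are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <time.h>
#include <ucontext.h>

/* Hypothetical per-flow stack size chosen for this sketch. */
#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx; /* context of the benchmark driver */
static ucontext_t *ctx;     /* one saved context per user-level thread */
static int nflows;          /* number of flows in the ring */
static long iters;          /* switches performed by each flow */
long switch_count;          /* total swapcontext calls made by the flows */

/* Each flow hands the CPU to its successor in the ring `iters` times.
 * Only flow 0 runs past its loop (it is resumed by the final switch of
 * flow n-1); its return follows uc_link back to main_ctx. */
static void flow_body(int id) {
    for (long k = 0; k < iters; k++) {
        switch_count++;
        swapcontext(&ctx[id], &ctx[(id + 1) % nflows]);
    }
}

/* Runs n flows for k switches each; returns the average nanoseconds per
 * context switch, or -1.0 on bad arguments or allocation failure. */
double bench_user_threads(int n, long k) {
    struct timespec t0, t1;
    char **stacks;
    int ok = 1;
    double ns = -1.0;

    if (n <= 0 || k <= 0) return -1.0;
    nflows = n;
    iters = k;
    switch_count = 0;
    ctx = malloc(sizeof *ctx * (size_t)n);
    stacks = calloc((size_t)n, sizeof *stacks); /* NULL-filled for cleanup */
    if (ctx == NULL || stacks == NULL) ok = 0;

    for (int i = 0; ok && i < n; i++) {
        stacks[i] = malloc(STACK_SIZE);
        if (stacks[i] == NULL || getcontext(&ctx[i]) != 0) {
            ok = 0;
            break;
        }
        ctx[i].uc_stack.ss_sp = stacks[i];
        ctx[i].uc_stack.ss_size = STACK_SIZE;
        ctx[i].uc_link = &main_ctx;
        makecontext(&ctx[i], (void (*)(void))flow_body, 1, i);
    }

    if (ok) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        swapcontext(&main_ctx, &ctx[0]); /* returns when flow 0 finishes */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        assert(switch_count == (long)n * k);
        ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
           + (double)(t1.tv_nsec - t0.tv_nsec);
        ns /= (double)n * (double)k;
    }

    /* Flows 1..n-1 stay suspended in their last switch; just free them. */
    for (int i = 0; stacks != NULL && i < n; i++) free(stacks[i]);
    free(stacks);
    free(ctx);
    ctx = NULL;
    return ns;
}
```

You could call `bench_user_threads(n, 1000)` for increasing `n` to trace a curve shaped like the user-level-thread lines in Figures 4 to 8. Absolute numbers will likely exceed Cth's, because on glibc `swapcontext()` also saves and restores the signal mask with a system call.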

Our experiments show wide variation in the limitations and performance of these methods across machines. In general, user-level threads (Cth) have the fastest context-switch time on most of these machines; the exceptions are the IBM SP and Alpha machines. On all machines except the IBM SP, the context-switch time of user-level threads increases slowly as the number of flows increases.
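For comparison, the kernel-thread measurement can be sketched as follows. The sketch assumes the pattern implied by the caption of Figure 7, which mentions repeated `sched_yield()` calls. n pthreads each yield k times, and the total elapsed time divided by n×k gives a per-switch figure. The names are hypothetical. On a multi-core node, threads may run on different cores, where `sched_yield()` can return without switching. That behavior would produce the unrealistically low numbers the captions of Figures 7 and 8 warn about. Pinning all threads to one CPU (for example with Linux's `sched_setaffinity`) is one way to avoid it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <time.h>

static pthread_mutex_t go_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t go_cv = PTHREAD_COND_INITIALIZER;
static int go;            /* set once every thread has been created */
static long yields_each;  /* sched_yield() calls per thread */
int threads_finished;     /* sanity counter, updated under go_mu */

/* Waits for the start signal, then yields the processor repeatedly. */
static void *yield_body(void *arg) {
    (void)arg;
    pthread_mutex_lock(&go_mu);
    while (!go)
        pthread_cond_wait(&go_cv, &go_mu);
    pthread_mutex_unlock(&go_mu);
    for (long k = 0; k < yields_each; k++)
        sched_yield();
    pthread_mutex_lock(&go_mu);
    threads_finished++;
    pthread_mutex_unlock(&go_mu);
    return NULL;
}

/* Runs n kernel threads that each call sched_yield() k times; returns the
 * average nanoseconds per yield, or -1.0 if arguments or setup fail. */
double bench_kernel_threads(int n, long k) {
    struct timespec t0, t1;
    int created = 0;

    if (n <= 0 || k <= 0) return -1.0;
    pthread_t *tids = malloc(sizeof *tids * (size_t)n);
    if (tids == NULL) return -1.0;
    go = 0;
    yields_each = k;
    threads_finished = 0;
    while (created < n &&
           pthread_create(&tids[created], NULL, yield_body, NULL) == 0)
        created++;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_mutex_lock(&go_mu);
    go = 1;                          /* release all threads at once */
    pthread_cond_broadcast(&go_cv);
    pthread_mutex_unlock(&go_mu);
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(tids);
    assert(threads_finished == created);

    if (created < n) return -1.0;    /* hit a thread limit (cf. Table 2) */
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / ((double)n * (double)k);
}
```

The timed interval also includes waking and joining the threads. That overhead only becomes negligible when k is large.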


Table 2: Approximate practical limitations (on stock systems) for various methods to implement flow of control.

Flow of control      Limiting Factor   Linux    Sun      IBM SP   Alpha    Mac OS   IA-64
Process              ulimit/kernel     8000     25000    100      1000     500      50000+
Kernel Threads       kernel            250      3000     2000     90000+   7000     30000+
User-level Threads   memory            90000+   90000+   15000    90000+   90000+   50000+


Figure 4: Context switching time vs. number of flows on an x86 Linux machine.

Figure 5: Context switching time vs. number of flows on an Apple Mac G5 machine.

Figure 6: Context switching time vs. number of flows on a Sun Solaris machine.

Figure 7: Context switching time vs. number of flows on an IBM SP machine. This is a 16-way SMP node. We believe the low times for processes and threads are due to the OS ignoring our repeated sched_yield() calls.

Figure 8: Context switching time vs. number of flows on an Alpha machine. This is a 4-way SMP node. Again, process and thread switching numbers may be unrealistically low.

Table 2 illustrates approximate practical limitations on stock systems. It shows the approximate maximum number of processes a user can create on a processor and the maximum number of threads a user can create in a process. An unmodified Red Hat Linux 9 machine can spawn fewer than 256 pthreads in one process, while the per-user process limit on our IBM SP was only 100 processes. Both limits can be raised with a small amount of system administrator effort, but that effort is likely beyond the reach of a typical parallel user. In general, processes and kernel threads were limited to a few thousand. The main exceptions were the Alpha, which allowed more than 90000 kernel threads, and IA-64, which showed no such limitation. By contrast, we could easily create tens of thousands of user-level threads on all platforms.
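The per-process thread limit in Table 2 might be probed as in the following sketch, which assumes POSIX threads with default attributes. The probe keeps creating threads that block on a mutex until `pthread_create()` fails. A caller-supplied cap (a hypothetical safeguard) stops it from exhausting a shared machine. The sketch also reads the soft `RLIMIT_NPROC` value, which is what bash's `ulimit -u` reports and which corresponds to the "ulimit" limiting factor for processes. On Linux this limit counts threads as well as processes. The function names are illustrative and not taken from the paper. With default attributes, each thread's stack reservation can also bound the result.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/resource.h>

static pthread_mutex_t hold = PTHREAD_MUTEX_INITIALIZER;

/* Probe threads block on `hold` so that all of them stay alive at once. */
static void *parked(void *arg) {
    (void)arg;
    pthread_mutex_lock(&hold);
    pthread_mutex_unlock(&hold);
    return NULL;
}

/* Creates threads until pthread_create() fails or `cap` is reached, then
 * releases and joins them. Returns how many were alive simultaneously,
 * or -1 if the bookkeeping array cannot be allocated. */
int probe_thread_limit(int cap) {
    if (cap <= 0) return 0;
    pthread_t *tids = malloc(sizeof *tids * (size_t)cap);
    if (tids == NULL) return -1;
    int n = 0;
    pthread_mutex_lock(&hold);       /* keep every new thread parked */
    while (n < cap && pthread_create(&tids[n], NULL, parked, NULL) == 0)
        n++;
    pthread_mutex_unlock(&hold);     /* let them all finish */
    for (int i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    assert(n <= cap);
    return n;
}

/* Soft per-user process limit (what `ulimit -u` reports in bash), or -1
 * if it is unlimited or RLIMIT_NPROC is unavailable on this system. */
long soft_nproc_limit(void) {
#ifdef RLIMIT_NPROC
    struct rlimit rl;
    if (getrlimit(RLIMIT_NPROC, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
        return (long)rl.rlim_cur;
#endif
    return -1;
}
```

A large cap (say 100000) turns `probe_thread_limit` into a rough measurement of the kind of per-process pthread limit reported in Table 2. The process limit could be probed the same way with `fork()`, though that is riskier on a shared node.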


Gengbin Zheng 2006-03-18