When a program consists of many concurrent tasks, it is often simplest to represent these tasks with independent flows of control. Concurrency is required by many applications.
In this paper, we will focus on high-concurrency parallel programs, and use the term ``flows'' instead of ``threads-of-control'' to avoid confusion with threads, which are a particular mechanism for encapsulating a control flow.
One processor can only execute one flow of control at a time. A flow-of-control may suspend its execution and give control to another flow either because of external reasons, such as an interrupt indicating the end of a time slice; or because of internal reasons, such as an explicit yield or a wait on an event or resource. A suspended flow-of-control can be resumed at a later time.
There are several methods for supporting multiple flows of control. The oldest example of this sort of concurrent programming is coroutines [27]. Some methods are supported by the OS kernel itself, including separate processes as described in Section 2.1, and kernel threads in Section 2.2. Other methods are like coroutines in that they are defined and supported entirely by user code, such as user-level threads of Section 2.3 and event-driven objects of Section 2.4. We analyze the performance of these methods in Section 4.
On multiprocessor systems, distributing execution of flows-of-control over multiple processors can exploit parallelism and thus achieve improved performance. However, the load in parallel applications may change over time, and to keep overloaded processors from bottlenecking the overall computation, we must perform load balancing [35]. Although migrating flows-of-control between processors is challenging, it presents a useful way to implement application-independent load balancing [41]. Section 3 describes obstacles to and techniques for migrating the flows of control across processors.