How do you assign work to threads? This episode explains several ways you can think about this and when to use them.
You’ll learn about the following four ways to divide work to be done by your threads:
- When you have small work items arriving at different times, then assign a task to a thread as needed. You’ll probably want to use a thread pool so that you can avoid the extra overhead of creating and destroying threads for each small task.
- When you have a bunch of small tasks that are ready to be processed, just divide them up into smaller groups and assign each group to a thread.
- If it’s not clear how to divide a task or when you have several tasks that could take different amounts of work to complete, then consider a queue where a thread can initially take a piece of work from the queue and either decide to complete the task or divide the work into smaller tasks and put each smaller part back into the queue.
- And if you have large work items arriving at different times, then consider splitting the work itself into fixed stages and assigning the tasks to a queue for the first stage. Then a thread can take a task out of the first stage queue, perform just its part related to the first stage, and then put the task into another queue for the second stage.
Listen to the full episode or you can also read the full transcript below.
Transcript
You’re all eager to use multiple threads but for what purpose? Assuming you have enough work for multiple threads, how do you divide the work between your threads? You have some choices that will depend a lot on your specific circumstance.
Here’s a description of various scenarios and how you might approach dividing the work among threads for each scenario. This will be a short episode focused on four different scenarios and suggestions for each about how to manage your threads.
Let’s say that you have a line of messengers waiting to deliver internal memos and collect other memos for delivery. Maybe this is the mail room for a large company located in a high-rise office building. Every so often, you send one of the messengers to a specific floor to collect any memos that need to be sent to another employee. The messenger comes back with a handful of papers. You then start handing one memo at a time to a waiting messenger.
You can use this approach with threads too. If you have a series of small tasks that keep arriving, then hand each task to a waiting thread. Usually, you’ll want to use a thread pool for this instead of creating new threads each time and destroying them when done with each task. A thread pool acts like that line of messengers and when a thread is done with its task, it goes back into the pool of waiting threads.
Other times, you might know up-front exactly how many tasks you need to perform. Let’s say you have a thousand tasks to perform. This is sort of like the previous example only instead of the tasks arriving bit by bit, you have all of them to work with at once. This scenario allows you to plan how many threads to use based on the available number of cores. If you know that you can efficiently run four threads at the same time, then give the first 250 tasks to the first thread, the second 250 to the second thread, etc. This will divide all thousand tasks between the four threads. You may not need to use a thread pool in this case since you’re taking a more active role in calculating how many threads you want and giving each thread a major piece of work to perform.
I’ll explain two more scenarios right after this message from our sponsor.
( Message from Sponsor )
The next scenario is a bit like the second where you have all the items available. The difference is that in this case, the tasks aren’t in easily identifiable pieces. In other words, you may not have 1000 separate tasks where you can just split them into even groups. Actually, you may want to follow this suggestion even if you do have separate tasks whenever the time required for those tasks could vary.
Imagine you’re one of four dishwashers assigned to wash the pots and pans for several chefs. Your supervisor notices that since there are four dishwashers and four chefs, it seems like a great idea to assign one dishwasher to each chef. Seems reasonable, right? It would be if your chef would only stop burning every single pot and pan. You quickly figure out that you got a bad deal in this arrangement. This third suggestion could help.
The idea here is to start dividing the work into smaller chunks and putting those chunks into a queue. Then as threads are available, they can take one of the chunks and either do the work or decide to split it into smaller chunks and put the smaller chunks back into the queue.
This allows you to recursively divide the work without creating a new thread for each division. It keeps a limited number of threads busy and at the same time reallocates the work so that you don’t have some threads sitting idle at the end while others are struggling to finish their task.
And the last technique that I’ll describe is great when the tasks themselves are each long and involve many different steps. Just look at any modern car manufacturer to get an idea of where this suggestion is going. Building a car from start to finish involves many different and specialized steps. So instead of assigning one or more workers to build a single car, a more efficient method is to use an assembly line. This moves unfinished cars from one step to another. the workers at each step perform their specialized task and the car moves along to the next step.
You can do the same thing with threads and this approach is also great when you don’t know how many tasks you’re eventually going to need to complete.
Each time you receive a task, just add it to a queue for the first step. Then a thread, maybe from a thread pool, will take that task out of the queue, perform the first set of calculations or the first piece of work on the task and put it into a queue for the next step. This frees up the thread to go get another task out of the first queue. You can have separate thread pools for each step or a single pool of threads that can change their behavior to match the type of work that needs to be done. Or you can keep it simple and have just a single thread for each stage.
This arrangement is called an assembly line in real life. In programming terms, it’s often called a pipeline. Modern computer processors use pipelines to increase their performance. A processor can do specific tasks during each clock period. But that doesn’t stop them from preparing to fetch one memory value, reading in another memory value, figuring out what to do with the memory value read during the last clock, and actually running the instruction read two clock period ago. By doing all these things at the same time, a modern processor can execute a new instruction at almost every clock period.