In this example i've got a massive dataset stored (once) in memory with a year's worth of data. I want to get totals for each day's worth of this. So, what i have is 365 (or 366) discrete 'tasks' that need doing, and these are a prime candidate for multithreading because they only *read* from a shared data pool. If they wrote to a shared data pool then it'd be more complicated and possibly not as worthwhile. So here's how I implemented the 'to-do list' queue. First up, here's a class that represents a single task:
class Task
{
  public byte[] sharedData;
  public DateTime day;
  public void Run()
  {
    ... do the processing here, using the shared data and the day ...
  }
}
Then i'll need a class that maintains a list of tasks:
class ToDoList
{
  public byte[] sharedData;
  List<Task> toDoList;

  public void Go()
  {
    // Fill the to-do list with a task for every day in the year
    toDoList = new List<Task>();
    for (DateTime d = new DateTime(2008, 1, 1);
         d.Year == 2008;
         d = d.AddDays(1))
    {
      Task t = new Task();
      t.day = d; // The one particular day we want
      t.sharedData = sharedData; // The year's worth of data
      toDoList.Add(t); // Add the task to the to-do list
    }

    // Run 2 threads per logical processor (a physical cpu with
    // hyperthreading enabled counts as 2 logical processors)
    Thread[] threads = new Thread[Environment.ProcessorCount * 2];
    for (int t = 0; t < threads.Length; t++)
    {
      threads[t] = new Thread(Runner); // Make the new thread
      threads[t].Start(); // Start it running
    }

    // Wait for the threads to finish all the tasks
    for (int t = 0; t < threads.Length; t++)
    {
      threads[t].Join();
    }
  }

  /// <summary>
  /// This is the thread runner that does the work
  /// It basically eats its way through the to-do list,
  /// returning when there's nothing left on the list.
  /// </summary>
  void Runner()
  {
    while (true)
    {
      Task task;

      // Try find a task to do
      lock (toDoList) // Lock the task list while grabbing
                      // a task so other threads cant touch it
      {
        if (toDoList.Count == 0) return; // nothing left to do so quit this thread
        task = toDoList[0]; // grab it from the head of the list
        toDoList.RemoveAt(0); // remove it from the list
      }

      // Run the task just grabbed from the list
      task.Run();
    }
  }
}
The 'Go' function creates a to-do list, fills it with tasks, then starts all the threads that start eating through the tasks. Each of these threads runs as the 'Runner' function, which takes an task off the list, and runs it, then repeats until there's nothing left. Notice that the only 'lock' needed this way is on the to-do list, which keeps things simple. You may notice that i'm 'overdriving' the processors - with 2 threads per processor. I'm doing this just for the hell of it really. And here's how i call it:
class Program
{
  static void Main(string[] args)
  {
    byte[] data = File.ReadAllBytes("MyYearsData.dat");
    
    ToDoList todo = new ToDoList();
    todo.sharedData = data;
    todo.Go();
  }
}
Further thoughts: You may also consider if each task has an IO bound section, and then after the IO, it becomes CPU bound for a while. If this is the case, then you probably want multiple threads per processor to take full advantage of everything.

Thanks for reading! And if you want to get in touch, I'd love to hear from you: chris.hulbert at gmail.

Chris Hulbert

(Comp Sci, Hons - UTS)

Software Developer (Freelancer / Contractor) in Australia.

I have worked at places such as Google, Cochlear, Assembly Payments, News Corp, Fox Sports, NineMSN, FetchTV, Coles, Woolworths, Trust Bank, and Westpac, among others. If you're looking for help developing an iOS app, drop me a line!

Get in touch:
[email protected]
github.com/chrishulbert
linkedin



 Subscribe via RSS