Here's a fun project I worked on to keep some of my skills sharp. Conway's Game of Life is a simple simulation of sorts that operates on a few very basic rules. I've been fascinated with it for a while because of how versatile it is and the things people create with it. I knew the rules were simple, so I decided to take a shot at writing my own implementation. The original version was very easy to write, but I wanted to take it a step further and set it up to handle much larger grids.

Instead of using a single thread to process the whole array, the next step was to decompose the problem across multiple threads. My first thought was to create multiple threads each frame and explicitly tell each thread which portion of the grid to work on. While this does work, the overhead of spawning multiple threads each frame caused some serious performance issues. In fact, it was slower than the single-threaded version. I haven't done a lot of multi-threading work in C#, and this was a good reminder that the underlying system calls take just as long in a high-level language.

So instead of creating new threads each frame, I needed a rudimentary job system with semi-general-purpose worker threads. Originally I toyed with mutex locks to keep the workers and the control thread synchronized, but I couldn't readily come up with a good design for that. The solution I did come up with was much easier to implement and very straightforward. During my parallel programming class we had an assignment to implement a basic ecosystem simulation where different parts of the ecosystem needed to be simulated at different stages in each frame. We used wait barriers to keep everything synchronized, and that problem was similar enough to this one that wait barriers worked really well here too. With this design I only need one mutex lock for the work queue, and three barriers for each processing stage.
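
To make the design a little more concrete, here is a minimal sketch of a barrier-synchronized job system in C#. It is not the code from the article: the class name `LifeJobSystem`, the row-band work units, the dead (non-wrapping) grid borders, and the use of two barriers (frame start and frame end) rather than three are all assumptions made to keep the example short. It relies only on `System.Threading.Barrier`, `Thread`, and the `lock` statement.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch; names and structure are illustrative, not the author's.
public sealed class LifeJobSystem : IDisposable
{
    private readonly object queueLock = new object();   // the one lock, guarding the work queue
    private readonly Queue<(int Start, int End)> work = new Queue<(int Start, int End)>();
    private readonly Barrier frameStart;                 // workers + control thread
    private readonly Barrier frameDone;                  // workers + control thread
    private readonly Thread[] workers;
    private bool[,] current, next;                       // double-buffered grid
    private volatile bool running = true;

    public LifeJobSystem(bool[,] initial, int workerCount)
    {
        current = (bool[,])initial.Clone();
        next = new bool[initial.GetLength(0), initial.GetLength(1)];
        frameStart = new Barrier(workerCount + 1);
        frameDone = new Barrier(workerCount + 1);
        workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            // Threads are created once here, not once per frame.
            workers[i] = new Thread(WorkerLoop) { IsBackground = true };
            workers[i].Start();
        }
    }

    public bool[,] Grid => current;

    // Control thread: queue row bands, release the workers, wait for them, swap buffers.
    public void Step(int rowsPerUnit)
    {
        int rows = current.GetLength(0);
        lock (queueLock)
        {
            for (int r = 0; r < rows; r += rowsPerUnit)
                work.Enqueue((r, Math.Min(r + rowsPerUnit, rows)));
        }
        frameStart.SignalAndWait();   // workers begin pulling work units
        frameDone.SignalAndWait();    // every worker has found the queue empty
        (current, next) = (next, current);
    }

    private void WorkerLoop()
    {
        while (true)
        {
            frameStart.SignalAndWait();
            if (!running) return;
            // Keep pulling until the queue is empty, so there can be
            // more work units than worker threads.
            while (true)
            {
                (int Start, int End) unit;
                lock (queueLock)
                {
                    if (work.Count == 0) break;
                    unit = work.Dequeue();
                }
                UpdateRows(unit.Start, unit.End);
            }
            frameDone.SignalAndWait();
        }
    }

    // Standard Life rules applied to rows [start, end); cells outside the grid count as dead.
    private void UpdateRows(int start, int end)
    {
        int rows = current.GetLength(0), cols = current.GetLength(1);
        for (int r = start; r < end; r++)
            for (int c = 0; c < cols; c++)
            {
                int n = 0;
                for (int dr = -1; dr <= 1; dr++)
                    for (int dc = -1; dc <= 1; dc++)
                    {
                        if (dr == 0 && dc == 0) continue;
                        int rr = r + dr, cc = c + dc;
                        if (rr >= 0 && rr < rows && cc >= 0 && cc < cols && current[rr, cc]) n++;
                    }
                next[r, c] = n == 3 || (current[r, c] && n == 2);
            }
    }

    public void Dispose()
    {
        running = false;
        frameStart.SignalAndWait();   // wake the parked workers so they can exit
        foreach (var t in workers) t.Join();
        frameStart.Dispose();
        frameDone.Dispose();
    }
}
```

In this sketch the control thread would call `Step` once per frame, for example `system.Step(16)` to hand out bands of 16 rows. Each worker only ever writes to the rows of the band it dequeued, so the grid itself needs no locking; the barriers guarantee that nobody reads the `next` buffer until every band has been written.
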
Some careful loop placement in the worker threads allows for more work units on the queue than there are worker threads, and my basic job system works very well! Getting rid of the many system calls per frame while still using multiple threads brought the performance back up to where I wanted it, and now the simulation runs well at larger scales. Check out the video and the code in the article!