Concurrency in games

I’ve been programming next-gen consoles for a while now, and I have to say: it’s not getting any easier to write games to take advantage of multi-core systems. The conventional wisdom at my company is: writing multithreaded code is hard, so “ordinary programmers” should not be doing it – let’s leave it to the senior engineers who write the systems code to get it right. Ordinary gameplay/AI/animation/etc engineers shouldn’t have to worry themselves with this stuff.

I’m currently a lead engineer (which puts me in the “senior” category – although technically at my company “lead” is not a job rank per se). Lead engineers typically have a long history in the industry, and usually have a background in tackling the hard tasks of engineering lower-level systems in games. Many leads have done rendering, some have physics experience, many have experience of architecting games, and a few (myself included) also have a networking background. Basically, these correspond pretty well to the skills that one can’t hire for love nor money. As the highest-ranking engineers who still actually write code, it falls to us to provide technical direction for the projects in our studio and our company.

I can partly see why “leave it to the senior engineers” is a reasonable stance – for example, dealing with the intricacies of memory barriers is a task I wouldn’t wish on anyone. This stuff is hard enough to get right that the fewer people who have to touch it, the better. But on the other hand, I don’t think we’re doing enough to get safe forms of concurrency into the hands of the engineers who write the game code.

We (both internally, and I think, in the wider industry) are making some progress on some systems code being multithreaded. However, it’s not great progress. Of course, one problem is that PCs are still lagging behind next-gen consoles: the min spec target, even for games looking at shipping this time next year, is still a single-core P4 based system. Hyperthreaded if we’re lucky. Multicore PCs are out there, and are getting more numerous, but don’t yet have the market share to support dropping the lower end. And of course both Xbox 360 and PS3 still have more cores than a current top-end PC (6 hardware threads and 2 PPU hardware threads + 6 SPUs respectively).

For the most part, we are still moving single systems to other threads – there is nothing general about our approach to multithreading yet, and it’s not going to scale except in a very few cases. We have managed to separate “game” and “render” threads, but both of them are still fairly monolithic. There is some hope for physics simulation being threaded, and this is mostly being worked on by companies like Havok and Ageia. There is also some work being done on lower-level AI tasks like pathfinding. Some game tasks (like culling objects against the view frustum) can be relatively easily split onto another thread. And some game tasks continue to be threaded, the way they have been for years (audio and networking spring to mind).

But the major problem area for multithreading is where the actual gameplay happens: the intersection of animation, “high-level” AI, player control and to some extent physics. When I talk about “high-level” AI I mean the decision-making and glue code that controls what entities in the game do. The main game loop of object updates has multiple ordering dependencies among these systems and updating one system for a single object often causes cascading requests to other systems. For example: animation must be posed before physics is resolved (because posing animation may cause an arm to stick through a wall that a character is standing close to, and the physics system will then apply constraints to push the arm back outside the wall). Another example is that AI will decide to change what a character is doing and therefore will drive animation blending. A third example is that AI code will often want to know whether two characters can see each other in order to decide what to do. This involves asking the physics system to do a line-of-sight trace between the objects, and the result is needed to make the AI decision, so the call is typically synchronous.

So, animation, AI and physics are often tightly coupled in the update of a single object, and the update of a single object will often touch the data of half a dozen other objects (e.g. the weapon I’m carrying, what my enemy is doing, what state my buddies are in, etc). It doesn’t take a genius to realise that this is a pretty bad state of affairs if we’re trying to split parts of these tasks on to multiple threads. And these are not lightweight tasks either: posing animation involves evaluating state machines and a bunch of matrices and quaternion calculations; a physics trace often involves a heavyweight data structure traversal (octtree, BSP or similar); AI involves complex state machines, sometimes interpreted script, and a lot of conditions (something, incidentally, that the PowerPC architecture of the Xbox 360 and PS3 is quite poor at).

So what’s to be done? Well, I have a couple of ideas designed to make it easier for “ordinary engineers” to take advantage of multiprocessing in at least a semi-safe way. The first idea is Software Transactional Memory or STM. You’ll find the literature if you care to look around the web. I think this is the only practical solution to achieving some concurrency in updating the thousands of objects that we need to, each of which touches half a dozen others. And I think we should apply it at the object level. This will require a bit of extra memory, but if done right, the performance gains could be significant. And importantly, it will scale. For the first time, I think we have the potential to update thousands of objects across as many cores as we have. Of course, I don’t yet have any practical experience, and we may find that a new bottleneck (e.g. memory access) springs up.

The second idea I have to enable more multiprocessing of mid- and systems-level code (and perhaps some higher-level code) by functional programming. I bet that more than 90% of the loops we write can be implemented in terms of map and fold. The challenge is finding a way to abstract this in C++ (much as I would like to write games in Haskell, I don’t see it being adopted any time soon – but I may well experiment with a scripting language embedded in Haskell). Imagine if AI and player control engineers could, instead of writing a synchronous for-loop to find the closest enemy, write that same functionality in terms of map and fold (which is easy) and automatically have it be distributed across cores? The beauty of this idea is that the concurrency work is done once to make map and fold themselves multicore, and then (with appropriate forethought and data constraints) everyone can take advantage of that.

So the only thing I need to do now is actually write this code, get some preliminary use cases together, try it out, and then introduce it to my team. Wish me luck…

11 Responses to “Concurrency in games”

  1. A link to STM:

    For parallelizing for loops something like Qt Concurrent is probably something you would enjoy reading:

  2. Nick says:

    How about something like Thread Building Blocks or OpenMP style parallelism as a slightly less functional alternative?

    Although I don’t know off hand if anything like this is available/viable for something like the PS3

  3. First of all: I am not a game programmer, so take my comments with a grain of salt. I know my way around concurrency, however.

    Functional decomposition will not scale far, thats for sure. Data decomposition scales way better and it looks like you have already figured that out. What I don’t understand is, why the first thing you consider when you think about data decomposition is memory barriers and transactional memory. What about the traditional techniques, such as plain old locking? Have you tried them? They are usually easier to get right and available right now (compare that to STM, which I have yet to see a production-ready system – at least if you don’t count Haskell).

    And if you think locks won’t scale, I have written about that just last week as well…

    Oh, and by the way, implementing map and fold in C++ is really easy :-), I have done so using OpenMP and the Intel Threading Building Blocks contain an implementation as well.

  4. The Beast in Black says:

    Good luck. Anyone who’s working on making better games has my respect and blessings.

  5. Michael B says:

    It just dawned on me while I was reading your post: game development is forever doomed to suffer in the dark ages of software development.

    Unlike the typical game, the lifetime of business software tends to be measured in decades. Investments in code reuse, modularity, design, clarity, portability, bug tracking, revision control and security auditing appreciate in value over the project life-cycle. In fact, the only design goal that depreciates in value is optimization.

    Every dollar paid to rewrite routines in assembler or aggressively pre-cache or preprocess datasets or tweaking/swapping algorithms is shedding value with every passing day. If Moore’s law says processing power doubles every 18 months, it means every dollar invested in performance is worth 50 cents in 18 months.

    Game development is something else. The project life-cycles are short and the game is only economically viable for a few years. Performance and features are king. Nothing else justifies the investment.

    You guys need a different kind of science, since the body of best development practices available now just does not apply to you.

    Good luck. 😉

  6. elbeno says:

    Yes, we already use plain old locks to control multithreading for the most part. You’re right, they are easier to get right, but I don’t think they’re easy enough for general use among gameplay programmers. That’s the solution I’m looking for, and perhaps it doesn’t really exist, but my hope is that STM and/or a functional style will add something useful to our current mix of techniques. I’m not looking to throw the baby out with the bathwater.

    Thanks for the wisdom – I’ll be keeping your locking article in mind as I deal with locks.

  7. Erik says:

    AI involves complex state machines, sometimes interpreted script, and a lot of conditions (something, incidentally, that the PowerPC architecture of the Xbox 360 and PS3 is quite poor at).

    Perhaps it would be more accurate to say “something that the in-order-execution cpu architecture of the Xbox 360 and PS3 is quite poor at.”

  8. Dru Nelson says:

    2 things.

    1st. your comment form doesn’t have labels for us non-openid enabled types.

    2nd. A long long time ago, I was writing a game and was lured by threads and the idea that the code should look clean. In the end, I
    discovered the hard way that actual games didn’t have concurrency or
    even good timer support. Your post tells me that things haven’t changed
    much. IHMO, functional might get you so far, but I think we’re going
    to see some changes hardware wise that will make things easier to
    scale. For example, I think we’ll see more support for message
    passing in hardware. At any rate, these are interesting and exciting
    problems to be thinking about.

  9. Anwar Ghuloum says:

    Very interesting article…this echos a keynote that Tim Sweeney (Epic) gave at POPL 2006. Have you seen that?

    Looking at what patterns of parallelism prevail in these applications is a good guide to where we should invest our efforts in parallel programming development in the near term (which will be a major effort).

    We’ve got a couple of blogs at Intel around technologies that we’re developing to address this problem:

  10. John says:

    Interesting comments you make. What you’re saying mirrors the sentiments of Tim Sweeney (the Unreal Engine guy) in a talk he gave last year at POPL 2006 titled “The Next Mainstream Programming Language”. You can take a look at the accompanying powerpoint presentation here:

    Most of the contents of that presentation revolve around what new features are needed for game development. As such, it would be worth your while to read.


  11. Douglas Brebner says:

    Isn’t part of this problem due to using non-concurrent languages with concurrency crudely grafted on to them?
    It might be better to look at languages designed for this, like Erlang.

Leave a Reply