XNA + Reactive Extensions = Input handling like you’ve always dreamed of

Implementing input handling in a generic, extensible way is a deceptively tough cookie. Sure, you can decouple your inputs from your commands, and maybe you can detect when keys are held down at the same time… But the more complex your requirements get, the more frustrating the code can be. Several times I have tried to define a class hierarchy based on composition and the decorator pattern in order to ‘process’ and then ‘match’ different combinations of key presses. Every time I have ended up with something that feels half-baked, that doesn’t quite fit my requirements or might break under certain circumstances… But not any more! Let me introduce to you… Reactive Extensions!

Rx: Why you should be excited

The first thing to know about Reactive Extensions (Rx) is that it is a different model of programming to the OOP you may know already. In fact, scratch that, it’s a different model of thinking. It borrows heavily from Functional Programming, and has been described by the team behind it as the ‘dual’ to the IEnumerable monad of .NET. At first glance, yes, it can be horribly confusing – but gradually, as you begin to see the light, you will become overawed with wonder at this joyously simple model of event processing. It takes time to learn: take that time. Make the investment. You won’t be sorry.

To get your mouth watering, here is some sample code from my input module that rotates the camera when Shift and RMB are held and the mouse is moved left or right – note that any consumer of the InputStreamProvider can create input commands like this:

var keys = inputStream.KeyboardStream;
var mouse = inputStream.MouseButtonStream;
var mouseMove = inputStream.MouseMoveStream;
var shift = keys.Key(Keys.LeftShift);
var mouseRight = mouse.Button(MouseButton.MouseRight);

// here's the moneyshot:
var shiftAndRMBDrag = shift.Held(mouseRight.Held(mouseMove));

shiftAndRMBDrag
    .Subscribe(mm =>
    {
        var camera = cameraService.Active;
        camera.Rotation += mm.Delta.X * 0.01;
    });

It’s an incredibly powerful paradigm: in just one line – shift.Held(mouseRight.Held(mouseMove)) – I have described a behaviour that would otherwise involve all sorts of state-based processing. “Described” is a key word here – this is declarative programming at its finest.

Now that you’re foaming at the mouth, I’ll explain my approach. Please note that I am not going to attempt to explain Reactive Extensions – just how I used them to implement the above. To explain it fully would take far more than a blog post – if you aren’t familiar with Rx, please get to grips with it through the multitude of resources available on the internet. I learnt all of this in my spare time over roughly two weeks – it’s easily doable. Read more:

http://www.introtorx.com/
http://stackoverflow.com/questions/tagged/system.reactive
http://rxwiki.wikidot.com/101samples
http://sugarpillstudios.com/wp/?page_id=279

2 weeks later…

Welcome back – I assume you have now consumed all of the Rx knowledge that you can possibly acquire and are ready to dive in! I am developing on XNA in these samples, which uses poll-based input (i.e. you have to ask the game for the state of the input). Rx can easily work with either poll-based or event-based frameworks.

I will cover the keyboard processing from start to finish here – it’s not even a great deal of code. Mouse processing happens in exactly the same way, with added input events for movement and scrolling.

KeyboardStateProvider

This is a simple wrapper around Keyboard.GetState() for the purposes of unit testing and encapsulation. Ready? Here it is:

    public class KeyboardStateProvider : IDeviceStateProvider<KeyboardState>
    {
        public KeyboardState GetState()
        {
            return Keyboard.GetState();
        }
    }

“Wow”. The reason I’m showing you this is that it implements what I termed an IDeviceStateProvider<T>, which is used in the following sample. This is where the real ‘meat’ of Rx becomes apparent.

RawInputStreamProvider

The responsibility of this class is to provide a stream of IEnumerable<Keys> that we can operate on later. I have removed the mouse processing stuff so you can get a good look at the logic that the class is performing on the keyboard state – it’s really not that much, but there are some Rx concepts to note that I have marked with numbers. Everything else is basic C#:

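public class RawInputStreamProvider
{
    private readonly IDeviceStateProvider<KeyboardState> keyboardStateProvider;

    public RawInputStreamProvider(IDeviceStateProvider<KeyboardState> keyboardStateProvider)
    {
        this.keyboardStateProvider = keyboardStateProvider;
    }

    public IObservable<IEnumerable<Keys>> KeyboardStream { get; private set; }

    private IObservable<T> CreateRawStateStream<T>(IScheduler scheduler, IDeviceStateProvider<T> stateProvider)
        where T : new()
    {
        // (1) Poll the device on a timer, forever.
        return Observable.Generate(
                new T(),
                pressed => true,
                pressed => stateProvider.GetState(),
                pressed => pressed,
                pressed => TimeSpan.FromMilliseconds(20),
                scheduler)
            // (2) Share a single polling loop between all subscribers.
            .Publish()
            .RefCount();
    }

    // (3) Call once the Game instance (and therefore the scheduler) exists.
    public void Initialize(IScheduler scheduler)
    {
        var keyboardStateStream = CreateRawStateStream(scheduler, keyboardStateProvider);

        // (4) Project each state to its pressed keys and drop repeats.
        // Note: GetPressedKeys() returns a new array on every call and arrays
        // compare by reference, so pass a sequence-comparing IEqualityComparer
        // to DistinctUntilChanged if repeats really need to be suppressed here.
        KeyboardStream = keyboardStateStream
            .Select(kbs => kbs.GetPressedKeys())
            .DistinctUntilChanged();
    }
}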
  1. This is where we create the observable. The Generate function takes more arguments than we really need, so most of them are trivial. Our initial state is a new T() – this is just a blank KeyboardState. We want the sequence to be infinite, so we simply return true for the condition parameter. The third parameter produces the next state on every iteration – we want to return the call to GetState() on our device provider. The fourth parameter, the result selector, just passes the state through unchanged. The fifth parameter represents how frequently we poll the keyboard – I have hardcoded 20 milliseconds here, just cuz. Finally, we provide a scheduler – this is important because XNA input calls must be made on the thread on which the Game instance was created. You can get this scheduler by calling
    var scheduler = new SynchronizationContextScheduler(SynchronizationContext.Current);
    in your Game’s constructor.
  2. Now that we have our stream, we want to publish it, otherwise the GetState() call will be made for every subscriber – pointless when it will return the same value. We then call RefCount() so that the shared stream connects to the source when the first subscriber arrives and stays “hot” for as long as anyone is subscribed – see here for more info.
  3. We can’t actually subscribe to these events until we have the scheduler, and we don’t have the scheduler until the Game instance is created – so an Initialize call is provided to set up the streams.
  4. We translate every KeyboardState event into an enumeration of its keys via the Select method, and discard duplicates (we only react when there’s a change).
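
To see what point 2 buys us, here is a minimal sketch that counts how often the underlying source gets subscribed to, with and without Publish().RefCount(). It assumes the System.Reactive package is referenced; SharingSketch and CountSourceSubscriptions are names made up for this illustration, and the Observable.Create source is only a stand-in for the polling stream:

```csharp
using System;
using System.Reactive.Disposables;
using System.Reactive.Linq;

public static class SharingSketch
{
    // Returns how many times the underlying source was subscribed to
    // when two observers attach, with or without Publish().RefCount().
    public static int CountSourceSubscriptions(bool share)
    {
        var sourceSubscriptions = 0;

        // Stand-in for the polling stream: every subscription to this cold
        // observable would start its own polling loop.
        IObservable<int> source = Observable.Create<int>(observer =>
        {
            sourceSubscriptions++;
            return Disposable.Empty;
        });

        var stream = share ? source.Publish().RefCount() : source;

        using (stream.Subscribe(_ => { }))
        using (stream.Subscribe(_ => { }))
        {
            return sourceSubscriptions;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountSourceSubscriptions(share: false)); // 2
        Console.WriteLine(CountSourceSubscriptions(share: true));  // 1
    }
}
```

With sharing switched on, the second subscriber piggybacks on the connection the first one opened, which is why the device only gets polled once per tick.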

So now we have an observable that is churning out collections of pressed keys, which we can subscribe to and act on. That’s pretty nice, but I think we can go further…

InputEvents

I define an InputEvent to be either:

  • A key being pressed
  • or a key being released…!

To that end, I defined this simple structure:

public enum InputEventType { Press, Release }

public class KeyboardEvent
{
    public Keys Key { get; private set; }
    public InputEventType Type { get; private set; }

    public KeyboardEvent(Keys key, InputEventType type)
    {
        Key = key;
        Type = type;
    }
}

We want to produce one of these structures whenever an input occurs. To do this, we need to do some more processing on the event pipeline.

Creating KeyboardEvents

In this next piece of code, we have taken the RawInputStreamProvider as an argument (rawInput):

    KeyboardStream = rawInput.KeyboardStream
        .Zip(rawInput.KeyboardStream.Skip(1), DeriveKeyboardEvents)
        .SelectMany(i => i)
        .Publish()
        .RefCount();

Bam. There it is. Well, that’s a lie – that’s not quite all of it, I am using a helper function too:

private IEnumerable<KeyboardEvent> DeriveKeyboardEvents(IEnumerable<Keys> previous, IEnumerable<Keys> current)
{
    var pressed = current.Except(previous)
        .Select(i => new KeyboardEvent(i, InputEventType.Press));

    var released = previous.Except(current)
        .Select(i => new KeyboardEvent(i, InputEventType.Release));

    return pressed.Concat(released);
}

I’ll now explain the above:

  1. We take the existing stream from the RawInputStreamProvider
  2. We then pair each yielded collection with its successor, and pass the resulting two key collections to DeriveKeyboardEvents. This function uses traditional LINQ to Objects to figure out which keys were pressed and which were released, creating the respective KeyboardEvent for each and then concatenating the results. What we get out is an IEnumerable<KeyboardEvent> which may contain Presses, Releases or both.
  3. We use the powerful SelectMany operator to convert each collection into a sequence and flatten the sequence of sequences.
  4. We use the same Publish/RefCount trick as earlier to stop multiple subscribers from re-evaluating every function.
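
If the pairing in step 2 is hard to picture, the same logic can be tried on plain arrays with no Rx or XNA at all. This is only a sketch: the Key enum is a stand-in for XNA's Keys, DiffSketch and Diff are hypothetical names, and strings replace KeyboardEvent to keep it short:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for XNA's Keys enum so the sketch runs without XNA.
public enum Key { A, D, LeftShift }

public static class DiffSketch
{
    // Same set logic as DeriveKeyboardEvents, returning plain strings.
    public static List<string> Diff(IEnumerable<Key> previous, IEnumerable<Key> current)
    {
        var pressed = current.Except(previous).Select(k => "Press " + k);
        var released = previous.Except(current).Select(k => "Release " + k);
        return pressed.Concat(released).ToList();
    }

    public static void Main()
    {
        // Three consecutive polls: Shift+A held, then A swapped for D, then nothing.
        var polls = new[]
        {
            new[] { Key.LeftShift, Key.A },
            new[] { Key.LeftShift, Key.D },
            new Key[0],
        };

        // Zip with Skip(1) pairs each poll with its successor, as the stream does.
        foreach (var events in polls.Zip(polls.Skip(1), (prev, cur) => Diff(prev, cur)))
            Console.WriteLine(string.Join(", ", events));
        // Press D, Release A
        // Release LeftShift, Release D
    }
}
```

Three polls give two pairs, and each pair yields a small batch of events; SelectMany is what then flattens those batches into one continuous stream.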

That’s It.

Yep. The relatively minuscule amount of code above is all that’s needed to form the basis of an exceptionally powerful event handling paradigm. The methods I showed you at the top are simple extensions that could be defined like so:

public static IObservable<KeyboardEvent> Press(this IObservable<KeyboardEvent> source)
{
    return source.Where(i => i.Type == InputEventType.Press);
}

public static IObservable<KeyboardEvent> Release(this IObservable<KeyboardEvent> source)
{
    return source.Where(i => i.Type == InputEventType.Release);
}

public static IObservable<KeyboardEvent> Key(this IObservable<KeyboardEvent> source, Keys key)
{
    return source.Where(i => i.Key == key);
}

public static IObservable<KeyboardEvent> Held(this IObservable<KeyboardEvent> held, IObservable<KeyboardEvent> whenHeld)
{
    return held.Select(i => i.Type)
        .CombineLatest(whenHeld, (Held, WhenHeld) => new { Held, WhenHeld })
        .Where(x => x.Held == InputEventType.Press)
        .Select(x => x.WhenHeld);
}

…which can then be chained together to form the powerful expressions seen in the first example.
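
To get a feel for how Held behaves over time, it helps to push values through it by hand. The sketch below assumes the System.Reactive package and uses Subjects in place of the real input streams; HeldSketch, Run and the generic signature are made up for this illustration, with strings standing in for mouse-move events:

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public enum InputEventType { Press, Release }

public static class HeldSketch
{
    // Same CombineLatest/Where/Select shape as the Held extension above,
    // made generic so any payload can stand in for the "when held" stream.
    public static IObservable<T> Held<T>(this IObservable<InputEventType> held, IObservable<T> whenHeld)
    {
        return held
            .CombineLatest(whenHeld, (h, w) => new { Held = h, WhenHeld = w })
            .Where(x => x.Held == InputEventType.Press)
            .Select(x => x.WhenHeld);
    }

    public static List<string> Run()
    {
        var shift = new Subject<InputEventType>();
        var moves = new Subject<string>();
        var seen = new List<string>();

        using (shift.Held(moves).Subscribe(m => seen.Add(m)))
        {
            shift.OnNext(InputEventType.Press);   // nothing yet: no move has arrived
            moves.OnNext("move 1");               // emitted: shift is down
            shift.OnNext(InputEventType.Release); // filtered out
            moves.OnNext("move 2");               // filtered out: shift is up
            shift.OnNext(InputEventType.Press);   // re-emits the latest move ("move 2")
            moves.OnNext("move 3");               // emitted: shift is down again
        }

        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // move 1, move 2, move 3
    }
}
```

Note the fifth step: because CombineLatest fires whenever either side produces a value, pressing the held key replays the most recent event from the other stream, even if that event happened while the key was up. Whether that matters depends on the command being driven.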

If you’re not quite there yet, don’t worry. It took me a while to truly get a grasp on things, and I’m still trying things out, learning what works and what doesn’t. Fortunately for us, a whole wave of Rx experts came before us and thought all of this stuff through very thoroughly, and a fair few of them visit stackoverflow. Help is at your fingertips!

Think bigger

This is only the tip of the iceberg for the use of Rx in games. Think of all the complex logic handling that can be done by treating events not as one-off occurrences but as repeated streams – collision reactions, scripting, AI… All can be modelled as elegantly as the input handling above, and more excitingly, they can be combined together in fantastic ways!

A hybrid Entity-System-Component architecture

Component-Based Architectures

Most people who are interested in engine design and architecture have encountered the concept of ‘Component’-based design, the general principle of which is “favour composition over inheritance”. This is not a game-specific principle, but one of the central tenets of object-oriented design.

If you are not familiar with the idea, consider the ‘traditional’ game hierarchy in which everything might derive from a GameObject class. Directly inheriting from this class, you may have a MoveableObject class and a DrawableObject class. But what if you want a moveable, drawable object? You now need to inherit from one of the two subclasses to make a MoveableDrawableObject. The more aspects of functionality your object gains, the deeper your hierarchy goes, until you have something that looks like GameObject->DrawableGameObject->MoveableDrawableGameObject->Vehicle->VehicleWithAWeapon->Tank. That’s already a pretty deep hierarchy just to describe something as generic as a tank – we haven’t even gone into any kind of specifics yet.

The proposed solution is to compose objects rather than build them using class inheritance. While this brings a certain level of indirection, it also greatly simplifies and flattens the hierarchy. There are also other benefits, such as being able to dynamically add or remove functionality at runtime: you want your menu buttons to bounce around when the game is idle? Simply add a PhysicsComponent to each of them and let the engine do its work. The ‘seminal’ texts on this subject are Evolve Your Hierarchy and the T=Machine series – check these out for a grounding in the basics.

It’s a powerful paradigm, but many people get stuck on implementation. Over the years, I have written and re-written several iterations of an engine based on this approach, in C++, Python and C#. Every time I seem to get a little bit further, but then the nuances of a problem catch me out and it’s back to the drawing board. The chief culprit tends to be ‘how do components interact with each other?’ which is what I hope to address here.

Dependency Hell and other minor inconveniences

The first approach was a fairly simple one, as described in the Cowboy Programming and T=Machine articles: each aspect of functionality is contained within a component, and components are then brought together as an Entity (a GameObject). There is no inheritance hierarchy, and the Entity class is simply a list of components with an ID string attached. So, for a simple platformer, we might have:

  • PositionComponent
  • RenderComponent
  • PhysicsComponent
  • InputComponent
  • CollisionComponent

This looks fine, but look a little deeper and you will notice all kinds of dependencies between these components:

  • Collision, Rendering and Physics all rely on the Position component
  • Collision and Physics are related but perhaps separate enough to warrant their own component (perhaps you don’t always want a physical collision response)
  • Input may affect Physics (hitting the Jump button)

The more components you have, the more complex these dependencies become. You could create all sorts of rules like “if Entity has a RenderComponent it must also have a PositionComponent”, but this approach is hardly manageable – the whole point of the component architecture is to keep your code modular, right?

If in doubt, extract it out

After wrestling with these dependencies, it became more and more clear to me that the solution would be to extract the behaviour out of components and into an external manipulator. Apparently I was not the only one who had been thinking this way: after scouring the internet for more resources, I found this thread on a Java-based framework named Artemis that seemed to be doing exactly what I had been. While a little miffed that I had been beaten to it, I at least felt comfortable that my problems were shared and that I had come up with a similar answer.

So now we are using a data-driven approach. The difference is this: Components are now simply data which external Systems control (this is a fairly poor, undescriptive term – but interestingly one that people seem to come up with independently. I have also seen “Attribute/Behaviour” which I think are more descriptive but for the sake of canon I will use “System”).

The RenderComponent becomes a SpriteComponent, which the RenderSystem will use along with a PositionComponent to render the entity to screen. It is still possible to do everything implicitly – an Entity qualifies for rendering if it has these two components available, and the RenderSystem will check every new entity (and listen for component changes). Similarly the PhysicsComponent now merely has physics data (velocity, acceleration, forces) which is manipulated by the PhysicsSystem. Hooray, we have completely separated our concerns!
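
As a rough sketch of what this looks like in code (not my actual engine, just the idea boiled down): components are plain data, an entity is an ID plus a bag of components, and a system picks out the entities that happen to carry what it needs. Every name here apart from PositionComponent and PhysicsComponent is invented for the example:

```csharp
using System;
using System.Collections.Generic;

public interface IComponent { }

// Components are plain data.
public class PositionComponent : IComponent { public float X, Y; }
public class PhysicsComponent : IComponent { public float VelocityX, VelocityY; }

// An entity is an ID plus a bag of components, keyed by component type.
public class Entity
{
    private readonly Dictionary<Type, IComponent> components = new Dictionary<Type, IComponent>();

    public string Id { get; private set; }

    public Entity(string id) { Id = id; }

    public Entity Add(IComponent component)
    {
        components[component.GetType()] = component;
        return this;
    }

    // Returns null when the entity has no component of exactly this type.
    public T Get<T>() where T : class, IComponent
    {
        IComponent found;
        return components.TryGetValue(typeof(T), out found) ? (T)found : null;
    }
}

// The system owns the behaviour and selects entities by the components they have.
public class PhysicsSystem
{
    public void Update(IEnumerable<Entity> entities, float dt)
    {
        foreach (var entity in entities)
        {
            var position = entity.Get<PositionComponent>();
            var physics = entity.Get<PhysicsComponent>();
            if (position == null || physics == null) continue; // does not qualify

            position.X += physics.VelocityX * dt;
            position.Y += physics.VelocityY * dt;
        }
    }
}

public static class EcsSketch
{
    public static void Main()
    {
        var ball = new Entity("ball")
            .Add(new PositionComponent())
            .Add(new PhysicsComponent { VelocityX = 2f });
        var label = new Entity("label").Add(new PositionComponent()); // no physics data

        new PhysicsSystem().Update(new[] { ball, label }, 0.5f);

        Console.WriteLine(ball.Get<PositionComponent>().X);  // 1
        Console.WriteLine(label.Get<PositionComponent>().X); // 0
    }
}
```

The label never moves because it simply lacks a PhysicsComponent; no rule about which components must appear together is needed.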

Do I know you…?

But wait… Isn’t all of this a little familiar? Separation of data and logic feels a bit procedural, no? We seem to have completely eschewed OOP in favour of its predecessor! Well, that is not quite true. Our systems can still benefit from OOP principles – the InputSystem will need to rely on an InputProvider and probably an InputMap and a Command structure etc, etc. It’s just that our game entities themselves will be split up into different systems and components. This is actually another OO principle: the single-responsibility principle. Having one megaclass to represent the rendering, movement, AI decisions, amount of HP, etc. is poor OOP!

The Answer To Life, The Universe And Everything

Now, this is where most people stop – the Entity-System-Component (ESC) architecture is generally implemented as above, where Systems are pure logic and Components are pure data with a flat hierarchy. But I have found a more appealing idea in a hybrid approach. Components themselves may have a (small) inheritance tree and therefore perform some logic.

My favourite example is rendering. In normal ESC, how do you differentiate between rendering a sprite and rendering some text? Would you have a RenderableSpriteComponent and a RenderableTextComponent? Would you need a different system for rendering sprites and text? That sounds slightly ridiculous!

My solution is to have a RenderableBaseComponent, which looks like this:

public abstract class RenderableBaseComponent : IComponent
{
    public abstract void Render(SpriteBatch spriteBatch, Vector2 position);
}

The base class then has two subclasses:

public class RenderableSpriteComponent : RenderableBaseComponent
{
    public Texture2D Sprite { get; set; }

    public override void Render(SpriteBatch spriteBatch, Vector2 position)
    {
        // Render the sprite to the spritebatch at position
    }
}

public class RenderableTextComponent : RenderableBaseComponent
{
    public string Text { get; set; }

    public override void Render(SpriteBatch spriteBatch, Vector2 position)
    {
        // Render the text to the spritebatch at position
    }
}

The RenderSystem simply looks for any entity with a RenderableBaseComponent and a PositionComponent, passing the relevant arguments into the overridden Render() method. In this way we can use all the power of object-oriented programming, while still retaining the flexible, modular ESC architecture. Neat, huh?
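
Here is roughly how that lookup could go, as a sketch that runs without XNA. The assumptions are loud ones: a List<string> "canvas" stands in for SpriteBatch, an entity is reduced to a bare list of components, and the shortened class names (RenderableBase, RenderableSprite, RenderableText, RenderSketch) exist only in this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IComponent { }
public class PositionComponent : IComponent { public float X, Y; }

// Stand-in for RenderableBaseComponent: draws into a list of strings
// instead of an XNA SpriteBatch so the sketch runs anywhere.
public abstract class RenderableBase : IComponent
{
    public abstract void Render(List<string> canvas, PositionComponent position);
}

public class RenderableSprite : RenderableBase
{
    public string SpriteName = "";

    public override void Render(List<string> canvas, PositionComponent position)
    {
        canvas.Add("sprite '" + SpriteName + "' at " + position.X + "," + position.Y);
    }
}

public class RenderableText : RenderableBase
{
    public string Text = "";

    public override void Render(List<string> canvas, PositionComponent position)
    {
        canvas.Add("text '" + Text + "' at " + position.X + "," + position.Y);
    }
}

public static class RenderSystem
{
    // An entity here is just a list of components.
    public static void Render(IEnumerable<List<IComponent>> entities, List<string> canvas)
    {
        foreach (var components in entities)
        {
            // OfType matches subclasses too, so one system covers sprites and text.
            var renderable = components.OfType<RenderableBase>().FirstOrDefault();
            var position = components.OfType<PositionComponent>().FirstOrDefault();
            if (renderable == null || position == null) continue; // does not qualify

            renderable.Render(canvas, position);
        }
    }
}

public static class RenderSketch
{
    public static List<string> Run()
    {
        var entities = new[]
        {
            new List<IComponent> { new PositionComponent { X = 1, Y = 2 }, new RenderableSprite { SpriteName = "tank" } },
            new List<IComponent> { new PositionComponent { X = 3, Y = 4 }, new RenderableText { Text = "Score" } },
            new List<IComponent> { new RenderableText { Text = "no position" } },
        };

        var canvas = new List<string>();
        RenderSystem.Render(entities, canvas);
        return canvas;
    }

    public static void Main()
    {
        foreach (var line in Run()) Console.WriteLine(line);
        // sprite 'tank' at 1,2
        // text 'Score' at 3,4
    }
}
```

The system never asks which kind of renderable it has; the virtual call decides, and the third entity is skipped because it has no position.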

Step 1: Start a Blog

As a hobbyist game developer with little time to sit down and write code, a productive use of otherwise idle time (the hour-long commute to & from work) is to contemplate whatever problem I’ve been looking at recently. This time around the problem is: how do I organise my thoughts on a subject in order to solve a problem? While having a notepad is invaluable for sketching out ideas, I often find that the best way to address a problem is to try and explain it. This is a recognised phenomenon, known as Rubber Ducking. It is noticeable when asking questions on Q&A sites such as StackExchange (useful when the Rubber Duck has no answer).

Starting a blog is perhaps the high-tech equivalent of talking to a small polymeric waterfowl, with the added benefit of recording your thoughts and maybe even inspiring some discussion on the subjects that interest you. Kind of makes me wonder why I haven’t done this before now..!

As such, this blog will be a tool to discuss the various problems and ideas I have during my dabblings in game design and development: My rubber duck.