I've written an R package implementing DeepMind’s multidimensional Elo (mELO) rating approach for evaluating agents. Unlike standard Elo, mELO can model cyclic, non-transitive interactions such as rock-paper-scissors dynamics. It is also better behaved in the presence of redundant copies of agents or tasks.
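
To make the rock-paper-scissors point concrete, here is a minimal sketch of the mELO win-probability formula for the simplest case (one pair of extra dimensions per agent). It is written in plain Python purely for illustration and is not the package's API: the function name `melo_win_prob`, the helper `on_circle`, the hand-picked vectors, and the use of a natural-log sigmoid instead of the usual Elo 400-point scale are all my own assumptions.

```python
import math

def melo_win_prob(r_i, r_j, c_i, c_j):
    """P(i beats j) = sigmoid(r_i - r_j + c_i^T Omega c_j), with Omega = [[0, 1], [-1, 0]].

    r_* are ordinary scalar ratings; c_* are 2-d vectors capturing cyclic structure.
    Assumption: a plain logistic sigmoid is used rather than the Elo 400-point scale.
    """
    # c_i^T Omega c_j expands to this antisymmetric term.
    cyclic = c_i[0] * c_j[1] - c_i[1] * c_j[0]
    return 1.0 / (1.0 + math.exp(-(r_i - r_j + cyclic)))

def on_circle(degrees, scale=math.sqrt(2.0)):
    """Hypothetical helper: place an agent's c vector on a circle of radius `scale`."""
    theta = math.radians(degrees)
    return (scale * math.cos(theta), scale * math.sin(theta))

# Hand-picked (not fitted) vectors, 120 degrees apart, so each agent beats the next.
agents = {
    "rock": on_circle(0),
    "scissors": on_circle(120),
    "paper": on_circle(240),
}

# All three agents share the same scalar rating, so plain Elo would predict 0.5 everywhere.
for winner, loser in [("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")]:
    p = melo_win_prob(0.0, 0.0, agents[winner], agents[loser])
    print(f"P({winner} beats {loser}) = {p:.2f}")
```

With equal scalar ratings, the only thing separating the agents is the antisymmetric term, which is what lets the model express a cycle that a single rating per agent cannot.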