OpenAI booth at NeurIPS 2019 in Vancouver, Canada

OpenAI proposes using reciprocity to encourage AI agents to work together

Many real-world problems require complex coordination between multiple agents — e.g., people or algorithms. A machine learning technique called multi-agent reinforcement learning (MARL) has shown success in this area, mainly in two-player and two-team zero-sum games like Go, DOTA 2, StarCraft, hide-and-seek, and capture the flag. But the human world is far messier than these games. Humans face social dilemmas at multiple scales, from the interpersonal to the international, and they must decide not only how to cooperate but when to cooperate.

To address this challenge, researchers at OpenAI propose training AI agents with what they call randomized uncertain social preferences (RUSP), an augmentation that expands the distribution of environments in which reinforcement learning agents train. During training, agents share varying amounts of reward with one another, but each agent has only an independent, noisy view of those relationships. This asymmetry, the researchers hypothesize, pressures agents to learn socially reactive behaviors.
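The mechanism can be sketched roughly as follows. This is an illustrative reconstruction, not OpenAI's code: the function names, the softmax-based sharing matrix, and the Gaussian observation noise are all assumptions made for the sake of a concrete example. The core idea it demonstrates is that each agent's training reward is a random mix of all agents' environment rewards, while each agent only observes a noisy, independently corrupted estimate of the mixing matrix.

```python
import numpy as np

def sample_rusp(n_agents, noise_scale=0.5, rng=None):
    """Sample a random reward-sharing matrix and each agent's noisy view of it.

    Hypothetical sketch of randomized uncertain social preferences (RUSP);
    parameter choices are illustrative assumptions, not OpenAI's implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Random sharing matrix T: row i gives the weights mixing all agents'
    # environment rewards into agent i's training reward (rows sum to 1).
    logits = rng.normal(size=(n_agents, n_agents))
    T = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Each agent observes T corrupted by its own independent noise, so
    # agents are asymmetrically uncertain about their relationships.
    observed = T[None, :, :] + rng.normal(
        scale=noise_scale, size=(n_agents, n_agents, n_agents)
    )
    return T, observed

def shared_rewards(T, env_rewards):
    # Training reward for agent i: r_i = sum_j T[i, j] * env_rewards[j]
    return T @ np.asarray(env_rewards)

T, observed = sample_rusp(3, rng=np.random.default_rng(0))
print(shared_rewards(T, [1.0, 0.0, -1.0]))
```

Because each row of the sharing matrix is a convex combination, an agent whose weights lean toward a teammate is directly rewarded for that teammate's success — but since its view of the matrix is noisy, it cannot be certain who its allies are, which is the uncertainty the technique exploits.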
