Axelrod

Duels

19th Apr - 22nd Apr 2022

The goal of this contest is to write a python function that takes in a string that contains the opponent's source code, and returns either 'cooperate' or 'defect'. You can use this template as a starting point for your code. This is a question from open source game theory, where open source refers to the fact that the opponent's source code is open, and game theory is the study of interactions between agents, like the one you'll write. This problem is derived from a classic problem in game theory - Prisoner's dilemma. For a more thorough description of the problem, we encourage you to read the sections below.

The prisoner's dilemma

There is a famous example in game theory, where two prisoners are arrested, and given two options - stay silent, or betray the other prisoner. If both of them stay silent (i.e., cooperate with each other), they get 1 year of prison time each. If either of them betrays the other (i.e., defects), the defector can go free, but the prisoner who stays silent gets 3 years. If both of them betray each other, they get 2 years each.

Here's a table of the possible options and outcomes:

A cooperates A defects
B cooperates 1 year prison time each 0 years for A, 3 for B
B defects 3 years for A, 0 for B 2 years each

We can see that in any case except the one where both of them defect, one can get a better outcome by defecting instead of cooperating. So, a rational individual will always defect, which means at equilibrium (a point where no-one can take an action that can improve the situation for themselves), both of them will get 2 years each in prison. This video about Prisoner's dilemma and Nash equilibrium is a good introduction to this if you aren't already familiar.

Open source game theory

Now consider the scenario where A and B are computer programs, and they can read the other's source code. This is a classic example in open source game theory, which covers the cases when agents can look at their opponent's source code to figure out the best course of action.

Here are a few strategies such agents can take in the original problem of prisoner's dilemma:

  1. Always cooperate:

    def cooperate_strategy(opponent_source: str) -> str:
        return 'cooperate'

    This agent will always cooperate in all cases, which gives us a chance at achieving the optimal outcome.

  2. Always defect:

    def defect_strategy(opponent_source: str) -> str:
        return 'defect'

    This agent will always defect in all cases, which represents the optimal strategy when you cannot see your opponent's source code.

  3. Choose a strategy randomly:

    from random import random
    def random_strategy(opponent_source: str) -> str:
        if random() > 0.5:  # or, any some threshold in place of 0.5
            return 'cooperate'
        else:
            return 'defect'

    This agent will defect or cooperate with equal probability.

  4. Cooperate if you are sure that the opponent is going to cooperate, defect otherwise: This is a very interesting case. You try to prove that the opponent is going to cooperate, and cooperate only if you can prove that the opponent will cooperate. The pseudocode here could look like:

    def proof_strategy(opponent_source: str) -> str:
        # proof_will_cooperate isn't specified here, as this is one solution to this problem
        if proof_will_cooperate(opponent_source, self_source):
            return 'cooperate'
        else:
            return 'defect'

    Let us look at some cases where an agent follows proof_strategy:

    • proof_strategy(cooperate_strategy) will return cooperate, as we can be sure that the opponent is going to cooperate.

    • proof_strategy(defect_strategy) will return defect, as we can be sure that the opponent is going to defect.

    • proof_strategy(random_strategy) will return defect, as we can't be sure if the opponent is going to cooperate.

    • proof_strategy(proof_strategy) is a very interesting scenario. You might be tempted to say that this program will never halt, as both strategies will continue evaluating each other for infinitely many times (or, until a stack overflow). However, our strategy does not necessarily need to evaluate the opponent's strategy, just prove its cooperation. I encourage you to pause here and give this some thought. The correct solution follows:

      Suppose you could somehow look at the future. So, you decide to cooperate, and look at how the future plays out. Since you cooperated, the opponent cooperated as well. So, your decision to cooperate was correct. If you had decided to defect, your opponent would have defected as well. So, your decision would have been correct. Here, we can see that no matter what your decision is, the future pans out favorably to you. Therefore, we can prove opponent's cooperation conditioned on us cooperating, and thus we return cooperate.

      A more formal proof for this can be constructed using Löb's theorem, but completely understanding it isn't a requirement in the current context.

Rules

  1. The contest consists of 'rounds', where each round is a one on one match between your program and an opponent's program. Your program will run against all other participants, and your score will be the sum of your scores in each round (calculated as described in #8).

  2. You need to submit python code. It should have a function called strategy with the following signature:

    def strategy(opponent_source: str) -> str
  3. You can import and use scikit, pytorch, and tensorflow (CPU compute only). For a complete list of installed packages, see this page. If you'd like some other library to be included in the evaluation environment, send us an email with your reasons, and we'll consider your request.

  4. Your program is not allowed to write anything to the disk, or connect to the internet.

  5. Your program must exit in 6 seconds. If it does not exit, it will be terminated forcefully, and you will forfeit that round.

  6. The evaluation platform will have 6 cores/12 threads or better, and 12GiB memory or better. If your program needs more memory/compute, you can contact us and we'll consider your request.

  7. You are allowed to find and exploit any loopholes in the problem itself, but you are not allowed to use language-level exploits in python, or other exploits that go against the spirit of the contest. If something is fishy, you might be disqualified at the judge's discretion.

  8. Scoring:

    • +5 if you and opponent cooperate
    • +6 if you defect while opponent cooperates
    • +1 if you and opponent defect
    • 0 if you cooperate but opponent defects
    • -4 if your program times out, and +4 for your opponent. -4 for both if both time out
    • -4 if your program crashes, or doesn't return a valid answer (by printing it to out), and +4 for your opponent. -4 for both if both crash
  9. Eligibility: Although everyone is welcome to participate, we cannot send any prizes (except digital certificates) outside India.

  10. Prizes:

  • 1st place: ₹50,000
  • 2nd place: ₹25,000
  • 3rd place: ₹12,500
  • 4th place: ₹6,250
  • 5th place: ₹3,125
  • 6th place: ₹1,562.5
  • 7th place: ₹781.25
  • 8th place: ₹390.63
  • 9th place: ₹390.62
  1. The top 10 winners will be invited to interview to get an all-expenses paid trip to San Francisco, California. Depending on the quality of submissions, at the judge's discretion, more than 10 winners might be invited to the interviews. So, we encourage you to submit code with comments. Alternatively, you can send a write-up explaining your thought process and solution to pranavgade20@gmail.com after the contest ends.
  1. We recommend you use this script to validate your code before submitting it. You will not see the outcome of individual duels, only the total score against all opponents. The leaderboard will be updated around once every hour, so you might have to wait for a while before your score is updated. We will only consider your last submission when dueling.
  1. You can discuss the problem/ask for clarifications in our discord - https://discord.gg/xsnvJXMd