Posted February 9, 2013 by Hunter in Python Programming

Code A Learn By Mistake Artificial Intelligence Opponent in Python For a Tic Tac Toe Game


I was going to write about how to get SkyNet up and operational, but then I decided that I would like the human race to continue. So, instead, I thought I would introduce a very simple strategy AI that can learn how to play any strategy-based game or puzzle.


An artificial intelligence that can be adapted for any turn-based strategy game or static puzzle. The specific example shown here will for this AI learning how to play tic-tac-toe so well that it becomes impossible to beat him – I mean ‘it.’


Through failure. The AI starts out knowing nothing, making its decisions systematically. When the AI fails, it looks at the most recent decision it made, including the state of the game or puzzle at the time of the decision-making, and remembers not to make that exact decision again. For example, if the AI were learning how to get through a maze, it would take every left until it came to a dead end. It will remember the state of the puzzle (how many Lefts and Rights has it made) and the last decision (the final Left before it hit the dead end), and it will not make that decision again when in the same state. It would remember, “Do not make Left if Left, Left, Left.” Then, next time it runs the maze, it will do Left, Left, Left Right… and then Left until it meets another dead end, and remember not to do that, and run the maze again, and so on. Eventually it will reach the end.

This is a lot like running all the permutations of a puzzle, but the point about this is that the AI starts out knowing nothing about what to do, and remembers the wrong moves, so that in a game of tic-tac-toe or (if you have a small enough board) Go, if you, a human (I assume) plays against this AI many times, it will become better and better and eventually play perfectly.


Because I like to party, okay?


Right now:

import random
import re

## Game state current
gsp = []
## Game state previous
gsc = []
## Last move made
losing_move = 'hi'

def win_conditions():
    ## These conditions will be unique to whatever game the AI
    ## is adapted for.  Since win conditions are of a small
    ## enough number for tic-tac-toe, I can simply enumerate
    ## them all.
    global gsc
    global xoro
    if (gsc[0] == xoro) & (gsc[1] == xoro) & (gsc[2] == xoro):
        return 1
    elif (gsc[3] == xoro) & (gsc[4] == xoro) & (gsc[5] == xoro):
        return 1
    elif (gsc[6] == xoro) & (gsc[7] == xoro) & (gsc[8] == xoro):
        return 1
    elif (gsc[0] == xoro) & (gsc[3] == xoro) & (gsc[6] == xoro):
        return 1
    elif (gsc[1] == xoro) & (gsc[4] == xoro) & (gsc[7] == xoro):
        return 1
    elif (gsc[2] == xoro) & (gsc[5] == xoro) & (gsc[8] == xoro):
        return 1
    elif (gsc[0] == xoro) & (gsc[4] == xoro) & (gsc[8] == xoro):
        return 1
    elif (gsc[2] == xoro) & (gsc[4] == xoro) & (gsc[6] == xoro):
        return 1
        return 0

## Output.  The display isn't much, but this isn't about
## graphics.
def print_board():
    global gsc
    print (gsc[0]+' '+gsc[1]+' '+gsc[2])
    print (gsc[3]+' '+gsc[4]+' '+gsc[5])
    print (gsc[6]+' '+gsc[7]+' '+gsc[8])

## Turns the game state into a string to be checked against the
## strings stored in the text file used as the AI's memory.
def makereadable(gsc,move):
    readable = ''
    for i in gsc:
        readable += i
    readable += str(move)
    return readable

def ai_turn():
    global gsp
    global gsc
    global losing_move
    global gameon
    global textbook

    ## Position is a value representing a move.
    ## For tic-tac-toe, it is an integer 0-8,
    ## with each number representing a box on the
    ## board.
    position = 0
    ## First try to make one of the nine valid
    ## moves in tic-tac-toe.
    while position < 9:
        ai_move = position
        ## Checks this move against the record of bad moves
        if makereadable(gsc,ai_move) not in textbook:
            ## If the space is still open
            ## This condition will be unique for whatever
            ## game the AI is playing
            if gsc[ai_move] == '-':
                ## Set the game state previous as
                ## the game state current...
                gsp = []
                for i in gsc:
                ## ...then modify the game state to
                ## reflect the move
                gsc[ai_move] = 'x'
                gsp[ai_move] = '-'
                ## Record the move
                losing_move = ai_move
            ## If the move is not a valid move at the time...
                position += 1
        ## If the move has previously resulted in a loss...
            position += 1
    ## If there are no valid moves left, call the game off
    if position == 9:
        gameon = 0

## The artificially stupid computer,
## here to make quick moves against
## the AI so that the AI can learn
## quickly.
def as_turn():
    global gsc
    if '-' not in gsc:
        hold = 0
        hold = 1
    while hold == 1:
        as_move = random.randrange(0,9)
        if gsc[as_move] == '-':
            hold = 0
            gsc[as_move] = 'o'

## Opens or creates the file in which to store
## bad moves.
f = open('/Users/myusername/Desktop/school','a+')
textbook = f.read()

coursecredits = 0

while coursecredits < 100000:

    ## Sets the initial conditions of the game
    gsc = ['-','-','-','-','-','-','-','-','-']

    ## Sets it as x's turn
    xoro = 'x'
    gameon = 1

    ## Runs for one full game
    while gameon == 1:
        ## If it is x's turn
        if xoro == 'x':
            if gameon == 0:
            ## If the game didn't end, check to see
            ## if this most recent move made
            ## x win
            elif win_conditions() == 1:
                gameon = 0
            ## If the game didn't end one way or
            ## another, give the turn over to o
                xoro = 'o'
        ## If it is o's turn
            ## If o won - that is, if x, the AI,
            ## lost, write to the file the game state
            ## before x made its last move, and
            ## what that last move was
            if win_conditions() == 1:
                gameon = 0
                xoro = 'x'

    coursecredits += 1
    if coursecredits % 5000 == 0:
        print ('HAL has earned',coursecredits,'credit hours towards his degree.')


Run this once and it runs quickly, and the AI plays 10,000 games, recording each loss – then when you, not a random-move computer, plays against the AI, it will be smart. For this simple tic-tac-toe game, just replace the as_turn() function with this: ( just replace the body of the method, that is leave the as_turn method name the same, and just paste the body of human_turn there. as_turn is used for the AI, human_turn is used for human players)

def human_turn():
    global gsc
    hold = 1
    while hold == 1:
        minihold = 1
        while minihold == 1:
            human_move = input('Your turn: ')
                human_move = int(human_move)
                minihold = 0
            except ValueError:
                print ('Invalid input.  Must be an integer from 0 to 8.')
        if (human_move != 0) & (human_move != 1) & (human_move != 2) & (human_move != 3) & (human_move != 4) & (human_move != 5) & (human_move != 6) & (human_move != 7) & (human_move != 8):
                print ('Invalid input.  Must be an integer from 0 to 8.')
            if gsc[human_move] != '-':
                print('Invalid choice.  That position has already been played.')
                gsc[human_move] = 'o'
                hold = 0

Now run it again, and you will play as o’s. You’ll need to make use of the print_board() function to know what moves the computer is making, and you might want to add some kind of “Game over, do you want to play again?” when someone wins. To select a position to play, enter an integer 0-8. 0 is the top left, 1 is the top middle, 2 is top right, 3 is middle left, and so on, with 8 as bottom right. Try to beat the computer. I dare you.

But what we all wanted was to actually witness the AI learning. If you want to do that, don’t run the AI against a computer for 10,000 games. Instead, just start it out with you. To watch the AI learn, try this:

AI: Top left (the AI is systematic, so it will always place in this order)

You: Top Right

AI: Top center

You: Middle center

AI: Middle left

You: Bottom left, win

Then the second game:

AI: Top left

You: Top Right

AI: Top center

You: Middle center

AI: Middle right

You: Bottom left, win

Notice the AI took the middle right instead of the middle left for its last move. That’s because last time the board looked like it did and it took the middle left, it lost. So it skipped that. Then it looked at the middle center, but you were there, so it went to the middle right. Now play it again…

AI: Top left

You: Top Right

AI: Top center

You: Middle center

AI: Bottom left

You: Middle left, to block

AI: Middle right, cat’s game

Ah! The AI is getting a handle on this game!

I used the XKCD chart of optimal tic-tac-toe moves to play a perfect game against this AI once it had played 10,000 games against the AS (artificial stupidity, the random-move computer opponent). The AI blocked my every move and forced a cat’s game. The AI had, in fact, played a perfect game itself.

The structure of this code can be adapted to fit any turn-based strategy game. Change the win conditions or add loss conditions for a single-player game, redefine what the ‘positions’ are – you might need to make them two-dimensional, say for a game like checkers where you need to know WHAT piece moves WHERE – and give it an appropriate output, and you’ve got an AI for whatever you want.