Description

Reinforcement Learning using the Q Learning Algorithm.

Implements Q-Learning, a model-free form of reinforcement learning, described in work by Strehl, Li, Wiewiora, Langford & Littman (2006) <doi:10.1145/1143844.1143955>.

QLearning

This repository implements Q-Learning, a model-free form of reinforcement learning, in R.
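For reference, the standard one-step Q-learning update is shown below, with learning rate α, discount factor γ, observed reward r, and next state s'. The package's particular choices of α, γ and exploration are not described on this page, so treat this as the textbook form rather than the package's exact rule. When a play-through ends immediately after the decision, as in the card game example below, there is no next state and the target reduces to the reward alone (second line).

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r - Q(s,a) \right] \quad \text{(terminal after one decision)}
```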

To install, type the following in your R console:

install.packages("devtools")
library(devtools)
install_github("labressler/QLearning")
library(QLearning)

Currently, a game function must meet two requirements to work with this framework:

- A line where the player chooses an action, written in the form "player <- 'Choose'". For example, if we store a player's action in a variable called "player1", the line that chooses that action must be:

player1 <- 'Choose'

- A return value giving the "reward" for each play-through. This can be as simple as "1 for a win, 0 for a loss".
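As an illustration of the two requirements, here is a hypothetical game (not part of the package) in which we see one die roll and guess whether a second, hidden roll will be "Higher" or "Lower". The name higherlower and its variables are made up for this sketch, and the qlearn call in the comment only mirrors the argument names used in the card game example below; it has not been run.

```r
# Hypothetical game used only to illustrate the two requirements
higherlower <- function()
{
  firstroll <- sample(1:6, 1)   # visible to us: a candidate state variable
  secondroll <- sample(1:6, 1)  # hidden until after we choose
  player1 <- 'Choose'           # requirement 1: the line where our action is chosen
  reward <- 0                   # default is a loss (also the result while player1 is still 'Choose')
  if (player1 == "Higher" && secondroll > firstroll) reward <- 1
  if (player1 == "Lower" && secondroll < firstroll) reward <- 1
  return(reward)                # requirement 2: the reward for this play-through
}

# Calling the game directly leaves player1 as 'Choose', so the reward is 0:
higherlower()

# Training would presumably follow the card game example (not run here):
# qlearn(game="higherlower", statevars="firstroll",
#        possibleactions=c("Higher","Lower"), playername="player1", numiter=10000)
```

Calling the function by hand like this is a cheap way to check that it runs and returns a number before handing it to qlearn.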

Example:

Let's set up a simple card game in which each of four players is dealt one card from an 8-card deck. Each player antes 1 dollar, then each player can either "call" by putting in 2 more dollars or "fold" at no extra cost, and we act last. Among the players who didn't fold, the one with the highest card wins the entire pot. We also assume that every other player calls with a 2 or higher.

cardgame <- function()
{
  playercards <- sample(1:8,4) #distribute the cards, we're player one
  ourcard <- playercards[1] #our card
  playertotals <- rep(-1,4) #including the antes
  playersinpot <- vector()
  for (player in 2:4) #other 3 players go first
  {
    if (playercards[player]>=2)
    {
      playertotals[player] <- (-3)
      playersinpot <- append(playersinpot,player)
    }
  }
  #the next line is where we want to choose our action
  player1 <- 'Choose' 
  if (player1=="Call")
  {
    playertotals[1] <- (-3)
    playersinpot <- append(playersinpot,1)
  }
  potsize <- -1*(sum(playertotals)) #the amount in the pot is how much the players put in
  playercards[!(1:4 %in% playersinpot)] <- 0 #get rid of everyone who folded
  winner <- which.max(playercards) #winner is the person with the highest card who didn't fold
  playertotals[winner] <- playertotals[winner]+potsize
  return(playertotals[1]) #return how much we won
}
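Because the deck has only 8 cards, the exact expected payoff of each action can be enumerated, which gives a reference point for checking what training learns. The sketch below restates the rules coded in cardgame() as a hypothetical helper, hand_payoff() (not part of the package), and averages over all 35 equally likely sets of cards the other three players can hold. Which opponent holds which card does not change our payoff, so combinations are enough.

```r
# Payoff of one hand for us (player 1), restating the rules coded in cardgame():
# everyone antes 1, a call adds 2 more, the other players call with a 2 or higher,
# and the highest card among the players who called wins the pot.
hand_payoff <- function(cards, action)
{
  calls <- c(action == "Call", cards[2:4] >= 2)  # cards[1] is ours
  totals <- ifelse(calls, -3, -1)                # each player's net contribution
  pot <- -sum(totals)
  live <- ifelse(calls, cards, 0)                # folded hands cannot win
  winner <- which.max(live)
  totals[winner] <- totals[winner] + pot
  totals[1]                                      # our net result
}

# Average over every 3-card set the other players can hold (35 per card we hold).
expected <- t(sapply(1:8, function(ourcard) {
  others <- combn(setdiff(1:8, ourcard), 3)
  sapply(c("Call", "Fold"), function(action)
    mean(apply(others, 2, function(o) hand_payoff(c(ourcard, o), action))))
}))
rownames(expected) <- 1:8
round(expected, 3)
```

Working the same cases by hand: folding always loses the 1-dollar ante, and calling wins only when our card is higher than all three others, netting 7 dollars if one of them holds the 1 (and therefore folded) and 9 if all three called. That makes calling the better choice only with a 6, 7 or 8; for example, a 6 wins in 10 of the 35 cases, for an expected payoff of 3/35 of a dollar.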

We want to estimate the expected payoff of "Call" and "Fold" for each card we might hold (the variable "ourcard"). Our state is therefore the card we hold, and our possible actions are "Call" and "Fold". Using this package, let's train our player over numiter=25000 iterations of the game:

strat <- qlearn(game="cardgame",statevars="ourcard",possibleactions=c("Call","Fold"),playername="player1",numiter=25000) #make sure each function and variable name is a string


strat #output

The row names of the output are the card our player is dealt, and the column names are the possible actions.
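To read a strategy table like this as a decision rule, one can take, for each row, the column with the largest value. The helper below is a hypothetical sketch (best_action is not part of the package) and assumes strat is a matrix or data frame whose rows are cards and whose columns are actions, as described above, with larger values meaning a better expected payoff.

```r
# Hypothetical helper: for each state (row), pick the action (column) with the
# highest estimated payoff. which.max() returns the first column on ties.
best_action <- function(strategy)
{
  strategy <- as.matrix(strategy)
  actions <- colnames(strategy)[apply(strategy, 1, which.max)]
  names(actions) <- rownames(strategy)  # label each choice with its state
  actions
}

# Usage with the strategy trained above (requires the package):
# best_action(strat)
```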

Now that we have a reasonable strategy, we can train for another numiter=25000 iterations, this time passing our previous strategy in as a baseline (prevstrategy=strat):

qlearn(game="cardgame",statevars="ourcard",possibleactions=c("Call","Fold"),playername="player1",numiter=25000,prevstrategy=strat) 
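To see what training and continuing from a previous strategy amount to, here is a from-scratch sketch of tabular Q-learning on the same card game. It does not call the package and is not the package's implementation: the function names hand_payoff and train_q, the epsilon-greedy exploration, and the 1/n step size are assumptions chosen for illustration. Because each hand ends right after our single decision, there is no next state to bootstrap from, so each Q-value becomes a running average of the payoffs observed for that card and action. Passing an earlier result back in continues from its table, which mirrors what prevstrategy is described as doing.

```r
# Payoff of one hand for us (player 1), restating cardgame()'s rules.
hand_payoff <- function(cards, action)
{
  calls <- c(action == "Call", cards[2:4] >= 2)  # others call with a 2 or higher
  totals <- ifelse(calls, -3, -1)                # ante 1, call 2 more
  pot <- -sum(totals)
  live <- ifelse(calls, cards, 0)                # folded hands cannot win
  winner <- which.max(live)
  totals[winner] <- totals[winner] + pot
  totals[1]
}

# Hypothetical trainer: tabular Q-learning with epsilon-greedy exploration and
# a 1/n step size. 'prev' continues from an earlier result (a warm start).
train_q <- function(numiter, prev = NULL, epsilon = 0.1)
{
  actions <- c("Call", "Fold")
  if (is.null(prev)) {
    q <- matrix(0, 8, 2, dimnames = list(1:8, actions))  # estimated payoffs
    n <- q                                               # visit counts
  } else {
    q <- prev$q
    n <- prev$n
  }
  for (i in seq_len(numiter)) {
    cards <- sample(1:8, 4)   # deal; cards[1] is ours and is the state
    s <- cards[1]
    a <- if (runif(1) < epsilon) sample(actions, 1) else actions[which.max(q[s, ])]
    r <- hand_payoff(cards, a)
    n[s, a] <- n[s, a] + 1
    # one-step episode: no next state, so the target is the reward itself
    q[s, a] <- q[s, a] + (r - q[s, a]) / n[s, a]
  }
  list(q = q, n = n)
}

set.seed(1)
first <- train_q(25000)                 # initial training
second <- train_q(25000, prev = first)  # continue from the first table
round(second$q, 2)
```

The 1/n step size makes each entry an exact sample average; a constant step size is the usual alternative when payoffs can change over time, for instance when opponents adapt.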
Metadata

Version

0.1.1

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows