\subtitle{Evaluating Performance}
\date{Monday (AM), 14 May 2018}
% What do we want?
% What do we measure?
% How do we measure it?
Collection of events that \textbf{occur} to the player \textbf{during} the game
\note{Should be clear - it is only the events that occur because of the game that are important}
\begin{frame}{What is Player Experience?}
Jeffrey is playing an online RTS game, and he is playing with a friend online against two other people.
Which of these are a part of the player experience and which are not?\note{All happen while the person is playing a game}
\item<2->{Losing a unit} \uncover<7->{Yes}
\item<3->{Laundry finishing} \uncover<8->{No}
\item<4->{Collecting resource} \uncover<9->{Yes}
\item<5->{New message in chat window} \uncover<10->{Yes}
\item<6->{Unit moving} \uncover<11->{Yes}
\note{\\ Anything that occurs during the game and as part of the game is part of the player experience. Which of these can be detected by an AI?}
Collect data on how players/bots work
What kinds of features can we collect?
\begin{frame}{Data from humans}
\item{High-level human experience}
\item Final game scores?
\item How long did they play for?
\item Where did they look?
\item Galvanic skin response
\item BCI
\item{Surveys and interviews}
\item Likert Scales
\item Why did you feel that way?
\item Internal State
\item Will depend on bot architecture
\item Measure state visits in FSM
\item Did the game make full use of the AI?
\item How many times does a bot face a difficult choice?
\item What is a difficult choice? \note{Difficult Choice: MCTS - near identical branches, GA - No Convergence}
\note{Some things can be measured regardless of whether a human or an AI is playing \begin{itemize}}
\item Final Score distribution\note{\item How high, variation?}
\item Game Duration \note{\item Length, range of lengths}
\item Score ``Drama'' \note{\item Runaway victory? Lead keeps changing hands? Loops?}
\item Statistical distribution of states \note{\item Some states not used at all? Some overused?}
\item Degree of challenge \note{\item How to measure this?}
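A few of the metrics listed above can be computed straight from game logs. The following is a minimal sketch, not a definitive measure: counting lead changes is only one possible proxy for score ``drama'', the function name \texttt{lead\_changes} is hypothetical, and the score traces are made up purely for illustration.

```python
import statistics


def lead_changes(score_a, score_b):
    """Count how often the leader switches, given per-step scores for two players."""
    changes = 0
    leader = 0  # +1 if A leads, -1 if B leads, 0 before anyone has led
    for a, b in zip(score_a, score_b):
        current = (a > b) - (a < b)  # sign of the score difference
        if current != 0:
            if leader != 0 and current != leader:
                changes += 1
            leader = current  # ties leave the previous leader unchanged
    return changes


# Made-up score traces (assumption, not data from the slides).
runaway = lead_changes([1, 3, 5, 8], [0, 1, 1, 2])  # A leads throughout
seesaw = lead_changes([1, 1, 3, 3], [0, 2, 2, 4])   # lead alternates every step

# Spread of final scores across several games (also made up).
final_scores = [12, 15, 9, 15, 14]
spread = statistics.pstdev(final_scores)

print(runaway, seesaw, round(spread, 2))
```

A runaway victory gives zero lead changes, while a see-saw game gives many. The same counting works whether the players are humans or bots.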
\begin{frame}{Data from populations}
Variability of scores, skill-depth
\section{Action Sequences}
\begin{frame}{Data from either}
Actions taken: record the sequence of button pushes
\item Sometimes used to \textbf{interpret} aspects of player experience
\item $H(X) = \sum_{i=1}^{n} P(x_{i})I(x_{i}) = -\sum_{i=1}^{n}P(x_{i})\log_{b}P(x_{i})$ \note{\item We won't worry too much about the middle definition}
\item Take a fair coin - how much entropy?
\item $H(fairCoin) = -\sum_{i=1}^{2}(\frac{1}{2})\log_{2}(\frac{1}{2}) = -\sum_{i=1}^{2}(\frac{1}{2}) \times (-1) = 1 $ \note{\item Because it is a fair coin, each toss is maximally uncertain: previous tosses tell us nothing about the next}
\item How about an unfair coin? What is the entropy for a coin of probability 0.9?
\note{\item Whiteboard time if students stuck: \begin{itemize}}
\note{\item Answer is: $ H(dodgyCoin) = -\sum_{i=1}^{2}P(x_{i})\log_{2}P(x_{i}) = $}
\note{\item Continued: $ -\Big( (0.9 \log_{2}0.9) + (0.1 \log_{2}0.1) \Big) \approx 0.47 $}
\uncover<6->{\includegraphics[scale=0.4]{entropy}\footnote<6->{Borrowed from \href{}{Wikipedia}}}
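The coin calculations can be checked with a few lines of Python. This is a minimal sketch of the entropy definition given earlier; the function name \texttt{entropy} is our own choice, not part of any library.

```python
import math


def entropy(probs, base=2):
    """Shannon entropy H(X) = -sum(p * log_b(p)) of a discrete distribution."""
    # Outcomes with p == 0 are skipped: 0 * log(0) is taken as 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)


fair = entropy([0.5, 0.5])   # fair coin
dodgy = entropy([0.9, 0.1])  # coin that lands heads 90% of the time

print(f"fair coin:  {fair:.2f} bits")
print(f"dodgy coin: {dodgy:.2f} bits")
```

The fair coin gives 1 bit and the biased coin roughly 0.47 bits, matching the hand calculation: the more predictable the outcome, the lower the entropy.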
\begin{frame}{A Game Example}
\note{\item Some sample 2D location visit counts}
\begin{tabularx}{\linewidth}{l | l | l | l}
loc & 0 & 1 & 2 \\
0 & 10 & 20 & 15 \\
1 & 12 & 35 & 13 \\
2 & 15 & 20 & 10 \\
\note{\item Each visit count is divided by the total number of visits (150) to give the probability of having visited that location}
\begin{tabularx}{\linewidth}{l | l | l | l}
loc & visits & $p(loc) = \frac{visits}{150}$ & calc \\
0,0 & 10 & 0.067 & $0.067\log_{2}(0.067)$\\
\onslide<2>{0,1 & 12 & 0.08 & $0.08\log_{2}(0.08)$ \\
0,2 & 15 & 0.1 & $0.1\log_{2}(0.1)$ \\
1,0 & 20 & 0.133 & $0.133\log_{2}(0.133)$ \\
1,1 & 35 & 0.233 & $0.233\log_{2}(0.233)$ \\
1,2 & 20 & 0.133 & $0.133\log_{2}(0.133)$ \\
2,0 & 15 & 0.1 & $0.1\log_{2}(0.1)$ \\
2,1 & 13 & 0.0867 & $0.0867\log_{2}(0.0867)$ \\
2,2 & 10 & 0.067 & $0.067\log_{2}(0.067)$ \\
& & Total: & \\
\note{\item Then we just perform the math as a giant summation. Computers are good at this}
\note{\item Except computers are not keen on 0's: $\log_{2}(0)$ is undefined, so skip locations with zero visits (treat $0\log_{2}0$ as $0$)}
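The summation over the visit-count table can be sketched in Python as follows. The grid holds the nine counts from the table; the function name \texttt{visit\_entropy} is hypothetical, and skipping zero counts is the usual convention rather than something specific to these slides.

```python
import math

# Visit counts for the 3x3 grid of locations (150 visits in total).
visits = [
    [10, 20, 15],
    [12, 35, 13],
    [15, 20, 10],
]


def visit_entropy(grid):
    """Entropy (in bits) of the location distribution implied by visit counts."""
    counts = [c for row in grid for c in row]
    total = sum(counts)
    # Skip zero counts: log2(0) is undefined, and 0 * log(0) is taken as 0.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


h = visit_entropy(visits)
h_max = math.log2(9)  # entropy if all 9 locations were visited equally often

print(f"H = {h:.2f} bits (maximum {h_max:.2f})")
```

Comparing the result with the maximum possible entropy gives a rough sense of how evenly the player explored the map. A value close to the maximum means no location dominates.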
% Simon's raw vs computed metrics.
% Evaluating skill depth
\begin{frame}{Skill Ratings}
\item How \textbf{good} is a player? \note{\item And how do we represent this?}
\item What is the \textbf{issue} with win rates? \note{\item They are based on observations: were there enough of them? Would you watch F1 at one track and use those observations for another?}
\item If $A > B$ and $B > C$, is $A > C$? \note{\item Usually this is the case in games}
\note{\item Do close win rates prove a lack of skill depth? No, the current set of players may simply not demonstrate it. Like me and Joe playing pool}
\begin{frame}{Elo Ratings}
\item Elo is based on probability \note{\item Designed for chess}
\item $P(A$ beats $B) = \frac{1}{1 + 10^{(Elo(B) - Elo(A))/400}}$ \note{\item The point difference between players determines the probability of winning}
\note{\item Advantage of 100 points = 64\% chance of winning
\item Advantage of 200 points = 76\% chance of winning
\item Works by taking points from the loser and giving them to the winner. The number transferred depends on the rating difference: an upset win transfers more points than an expected win
\only<3>{\includegraphics[scale=0.5]{elo}\footnote{Borrowed from \href{}{Liquipedia}}}
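The 64\% and 76\% figures in the notes follow from the standard Elo expected-score formula, which the sketch below implements along with a simple rating update. The K-factor of 32 is an assumption (a common choice, not taken from these slides), and the function names are our own.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(winner, loser, k=32):
    """Return new (winner, loser) ratings after one decisive game.

    k caps the points transferred per game; 32 is an assumed value.
    """
    # The less expected the win, the more points change hands.
    transfer = k * (1.0 - expected_score(winner, loser))
    return winner + transfer, loser - transfer


print(f"100-point advantage: {expected_score(1600, 1500):.2f}")
print(f"200-point advantage: {expected_score(1700, 1500):.2f}")
print("equal ratings:", update(1500, 1500))
print("upset win:    ", update(1500, 1700))
```

Because the transfer scales with how surprising the result was, a favourite gains little from beating a weaker player, while an underdog gains a lot from an upset.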