Computational modeling of epiphany learning, Proceedings of the National Academy of Sciences 2017, 114(18), 4637-4642
Abstract: Models of reinforcement learning (RL) are prevalent in the decision-making literature, but not all behavior seems to conform to the gradual convergence that is a central feature of RL. In some cases learning seems to happen all at once. Limited prior research on these “epiphanies” has shown evidence of sudden changes in behavior, but it remains unclear how such epiphanies occur. We propose a sequential-sampling model of epiphany learning (EL) and test it using an eye-tracking experiment. In the experiment, subjects repeatedly play a strategic game that has an optimal strategy. Subjects can learn over time from feedback but are also allowed to commit to a strategy at any time, eliminating all other options and opportunities to learn. We find that the EL model is consistent with the choices, eye movements, and pupillary responses of subjects who commit to the optimal strategy (correct epiphany) but not always of those who commit to a suboptimal strategy or who do not commit at all. Our findings suggest that EL is driven by a latent evidence accumulation process that can be revealed with eye-tracking data.
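The abstract describes epiphany learning as a latent evidence accumulation process that ends in commitment. The following is a minimal generic sketch of that idea, not the model estimated in the paper: one noisy evidence sample arrives per round of feedback, and the simulated subject commits once the running total crosses a bound. The function name `simulate_commitment` and the parameters `drift`, `noise_sd`, `threshold`, and `max_rounds` are hypothetical.

```python
import random
from typing import Optional, Tuple


def simulate_commitment(drift: float, noise_sd: float, threshold: float,
                        max_rounds: int,
                        seed: Optional[int] = None) -> Tuple[Optional[str], int]:
    """Generic sequential-sampling sketch of commitment (illustrative only).

    A latent accumulator gains one noisy evidence sample per round of feedback.
    The simulated subject commits to the optimal strategy when the accumulator
    reaches +threshold, to a suboptimal one at -threshold, and never commits if
    neither bound is reached within max_rounds. All parameters are hypothetical.
    """
    rng = random.Random(seed)
    evidence = 0.0
    for round_no in range(1, max_rounds + 1):
        # Positive drift means feedback tends to favor the optimal strategy.
        evidence += drift + rng.gauss(0.0, noise_sd)
        if evidence >= threshold:
            return "optimal", round_no
        if evidence <= -threshold:
            return "suboptimal", round_no
    return None, max_rounds


# Hypothetical parameter values, chosen only to show a typical call.
print(simulate_commitment(drift=0.2, noise_sd=0.5, threshold=3.0, max_rounds=50, seed=1))
```

Because commitment happens at a single crossing round, observed behavior switches abruptly even though the latent evidence grew gradually. Setting `noise_sd=0` makes the sketch deterministic, which is convenient for checking the threshold logic.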
Bounded Memory, Inertia, Sampling and Weighting Model for Market Entry Games, Games 2011, 2(1), 187-199
Abstract: This paper describes the “Bounded Memory, Inertia, Sampling and Weighting” (BI-SAW) model, which won the Market Entry Prediction Competition (http://sites.google.com/site/gpredcomp/) in 2010. The BI-SAW model refines the I-SAW model (Erev et al.) by adding the assumption of a limited memory span. In particular, we assume that when players draw a small sample of past experiences to weigh against the average payoff of all past experience, they can recall only 6 trials. At the same time, we retain all other key features of the I-SAW model: (1) reliance on a small sample of past experiences, (2) strong inertia and recency effects, and (3) surprise triggers change. We estimate this model using the results of the first set of experiments run by the competition organizers and use it to predict the results of a second set of similar experiments later run by the organizers. We find a significant improvement in out-of-sample predictive accuracy over the I-SAW model, in terms of a smaller mean normalized MSD, and this result is robust to resampling the predicted game set and to reversing the roles of the two sets of experimental results. Our model performed best among all participants.
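The BI-SAW choice rule can be sketched roughly as follows, under loudly stated assumptions rather than as the estimated model. The sketch reads the 6-trial memory span as the 6 most recent trials, draws the small sample with replacement from that window while the grand mean still uses every trial, assumes the inertia probability takes the form `base ** surprise` with surprise scaled to [0, 1], and omits the exploration mode of I-SAW. The names `inertia_probability`, `estimated_value`, `choose`, and the parameters `sample_size`, `weight`, `base`, and `surprise` are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List, Optional

MEMORY_SPAN = 6  # assumed: the small sample comes only from the 6 most recent trials


def inertia_probability(base: float, surprise: float) -> float:
    """Assumed form base ** surprise, with base and surprise in [0, 1].

    Higher surprise lowers the chance of repeating the last choice.
    """
    return base ** surprise


def estimated_value(payoffs: List[float], sample_size: int, weight: float,
                    rng: random.Random) -> float:
    """Weigh a small sample of recent payoffs against the mean of all payoffs.

    `payoffs` is one option's payoff history, oldest first. The sample is drawn
    with replacement from the last MEMORY_SPAN entries only, while the grand
    mean still uses every past trial.
    """
    recent = payoffs[-MEMORY_SPAN:]
    sample = [rng.choice(recent) for _ in range(sample_size)]
    return (1 - weight) * mean(sample) + weight * mean(payoffs)


def choose(history: Dict[str, List[float]], last_choice: Optional[str],
           p_inertia: float, sample_size: int, weight: float,
           rng: random.Random) -> str:
    """Repeat the last choice with probability p_inertia (inertia); otherwise
    pick the option with the highest estimated value."""
    if last_choice is not None and rng.random() < p_inertia:
        return last_choice
    values = {option: estimated_value(p, sample_size, weight, rng)
              for option, p in history.items()}
    return max(values, key=values.get)


# Hypothetical payoff histories for a two-option market entry decision.
rng = random.Random(0)
history = {"enter": [10.0, -5.0, 8.0, 2.0, 4.0, 6.0, 1.0], "stay_out": [0.0] * 7}
p = inertia_probability(base=0.5, surprise=0.3)
print(choose(history, last_choice="stay_out", p_inertia=p,
             sample_size=3, weight=0.5, rng=rng))
```

The bounded-memory assumption only touches the sampling step: old payoffs still shape the grand mean, but they can no longer be drawn into the small sample.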
Extended Abstract: Although it is essential to understand how people make decisions, we still do not fully understand the latent evidence accumulation process in simple value-based decision-making. Prior research has shown that eye movements influence the choice process by boosting the evidence for the fixated item. Other research has demonstrated links between pupil dilation and information processing. In particular, it has been argued that pupil dilation may reflect the decision threshold in the drift-diffusion model (DDM). Here, we test this hypothesis and more generally investigate the relationship between pupil dilation and simple value-based decision-making.
A total of 44 undergraduate students from The Ohio State University participated in the experiment. Each session consisted of two stages. First, subjects rated 139 snack foods on a scale from -10 to +10 according to how much they would like to eat each item. In the second stage, subjects made 200 incentivized binary choices between different food items. During both stages, subjects’ eye movements and pupil dilation were monitored with an eye tracker. Crucially, all food images used in the experiment were made isoluminant to avoid light-reflex effects on pupil dilation. Consistent with prior research, we found that subjects chose in line with their ratings but were also biased toward the last item they looked at, further confirming that attention biases choice. As for the pupillometry results, subjects’ pupil size during the decision was negatively correlated with reaction time (RT), while post-decision pupil size was positively correlated with RT. Neither of these pupil measures, however, was correlated with choice accuracy. Further analysis revealed that pupil dilation influenced RT by modulating the theta parameter in the attentional drift-diffusion model (aDDM).
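To show where the theta parameter enters the aDDM, the sketch below simulates one binary choice using the update rule as it is commonly written: while the left item is fixated, the relative decision value moves by d·(r_left - θ·r_right) plus Gaussian noise; while the right item is fixated, by d·(θ·r_left - r_right); the choice is made when the value reaches ±1. It does not model any pupil effect, and the default values of `d`, `theta`, and `sigma`, as well as the fixed fixation schedule passed in `fixations`, are placeholders rather than estimates from this study.

```python
import itertools
import random
from typing import List, Optional, Tuple


def simulate_addm(r_left: float, r_right: float,
                  fixations: List[Tuple[str, int]],
                  d: float = 0.0002, theta: float = 0.3, sigma: float = 0.02,
                  barrier: float = 1.0, seed: Optional[int] = None,
                  max_steps: int = 100_000) -> Tuple[str, int]:
    """Simulate one binary choice under the attentional drift-diffusion model.

    r_left, r_right: item ratings. fixations: (item, duration-in-steps) pairs,
    cycled until a barrier is hit. theta in [0, 1] discounts the unfixated
    item's value. Returns (choice, number of steps); choice is "none" if
    max_steps elapse first. Default parameter values are placeholders.
    """
    if not fixations or any(steps <= 0 for _, steps in fixations):
        raise ValueError("need at least one fixation, all with positive duration")
    rng = random.Random(seed)
    v = 0.0  # relative decision value; positive favors the left item
    step = 0
    for item, steps in itertools.cycle(fixations):
        for _ in range(steps):
            step += 1
            # The fixated item counts fully; the other is discounted by theta.
            if item == "left":
                drift = d * (r_left - theta * r_right)
            else:
                drift = d * (theta * r_left - r_right)
            v += drift + rng.gauss(0.0, sigma)
            if v >= barrier:
                return "left", step
            if v <= -barrier:
                return "right", step
            if step >= max_steps:
                return "none", step
    raise AssertionError("unreachable: itertools.cycle never ends")


# Hypothetical ratings and fixation schedule (durations in time steps).
print(simulate_addm(6, 4, [("left", 300), ("right", 500)], seed=0))
```

With theta = 1 attention has no effect on the drift; with theta < 1 the fixated item gains an advantage, which is how the model produces the bias toward the last-fixated item described above.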
Abstract: We studied how to improve learning that corrects irrational choices in probabilistic settings by conducting a series of laboratory experiments using the Monty Hall Problem (MHP) and a modified version of it. In the first experiment, we showed that subjects who had first experienced a simplified version of the standard MHP (the 100-door version) performed better in the standard MHP than a control group that experienced only the standard version. However, even our control group performed much better than reported in the previous literature. We suspect this is partly because some subjects were already experienced with the problem, or because subjects were employing a heuristic called “Irrelevant Therefore Invariant” (ITI), which coincidentally suggests the same optimal strategy as Bayes’ rule in the standard MHP. Therefore, in the second experiment, we designed a modified version of the MHP that separates ITI from Bayes’ rule and showed that the performance of control-group subjects did decrease significantly, yet the 100-door version still improved subjects’ behavior in the 2-door version. Lastly, we identified 33% of our subjects as epiphany learners.
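Bayes’ rule for the standard (3-door) and 100-door versions can be checked with exact fractions. This sketch assumes the usual host protocol (the host knows where the car is, never reveals it, and opens every door except the subject’s pick and one other) and treats ITI as the belief that the picked door keeps its prior of 1/n. Under these assumptions both routes give the same switching probability, which illustrates why ITI and Bayes’ rule cannot be told apart in the standard MHP; the modified version used in the second experiment is not reproduced here. The function names are hypothetical.

```python
from fractions import Fraction


def p_switch_wins_bayes(n_doors: int) -> Fraction:
    """P(car behind the other closed door | host's reveal), by Bayes' rule.

    Assumes the host knows where the car is, never reveals it, and opens all
    but one of the doors the subject did not pick; when the subject's door hides
    the car, the host leaves each other door closed with equal probability.
    """
    if n_doors < 2:
        raise ValueError("need at least 2 doors")
    prior = Fraction(1, n_doors)
    lik_car_at_pick = Fraction(1, n_doors - 1)  # host's choice of door to keep closed is random
    lik_car_at_other = Fraction(1)              # host is forced to keep that door closed
    # If the car were behind any other door, the host could not leave this door
    # closed, so those hypotheses contribute zero likelihood.
    evidence = prior * lik_car_at_pick + prior * lik_car_at_other
    return prior * lik_car_at_other / evidence


def p_switch_wins_iti(n_doors: int) -> Fraction:
    """ITI reading (an assumption): the picked door keeps its prior 1/n, so the
    single other closed door absorbs the remaining probability."""
    return 1 - Fraction(1, n_doors)


for n in (3, 100):
    print(n, p_switch_wins_bayes(n), p_switch_wins_iti(n))
# prints "3 2/3 2/3", then "100 99/100 99/100"
```

The 100-door version makes the same posterior far more extreme (99/100 instead of 2/3), which is one intuition for why it may help subjects in the standard version.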
Abstract: People often fail to make optimal choices initially but learn to act optimally in the long run. Investigating learning behavior is therefore key to understanding human behavior. In economics, reinforcement learning (RL) has received much attention in the literature. However, not all behavior appears to conform to the gradual behavioral convergence that is a central feature of RL. In some cases, learning appears to happen all at once. Previous studies have documented this phenomenon in various experimental settings and call this type of learning epiphany learning (EL). Here, we investigate EL in a two-person beauty contest (2BC) experiment, using a novel mixed-strategy method for measuring sudden shifts in subjects’ beliefs, and apply a model of EL that we previously proposed. We find that this model predicts the choice distributions (a proxy for beliefs) of 45% of subjects better than the RL model does.
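The abstract does not spell out the 2BC rules. Assuming the common formulation (each of two players picks an integer from 0 to 100 and the number closest to p times the average wins, with p = 2/3 here as a placeholder), the sketch below shows why the game has a clear optimal strategy: for any p < 1 the target lies below the midpoint of the two guesses, so the lower number always wins and choosing 0 never loses. It says nothing about the mixed-strategy elicitation method or the EL and RL models themselves; the name `winner` and the constant `P` are hypothetical.

```python
from fractions import Fraction

P = Fraction(2, 3)  # assumed target multiplier; the abstract does not state it


def winner(x: int, y: int, p: Fraction = P) -> str:
    """Two-person beauty contest: the guess closer to p * average wins."""
    target = p * (x + y) / 2
    dist_x, dist_y = abs(x - target), abs(y - target)
    if dist_x < dist_y:
        return "player 1"
    if dist_y < dist_x:
        return "player 2"
    return "tie"


# For p < 1 the lower guess always wins, so guessing 0 never loses
# (it is weakly dominant): it wins against any positive guess and ties against 0.
assert all(winner(0, y) in ("player 1", "tie") for y in range(101))
print(winner(50, 30))  # prints: player 2
```

Because the optimal strategy is simple once seen, a sudden switch to 0 is a natural behavioral signature of an epiphany in this game.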