Interaction dynamics of two reinforcement learners

Article ID:	iaor20073048
Country:	Germany
Volume:	14
Issue:	1
Start Page Number:	59
End Page Number:	86
Publication Date:	Mar 2006
Journal:	Central European Journal of Operations Research
Authors:	Gutjahr Walter J.

Abstract:

The paper investigates a stochastic model in which two agents (persons, companies, institutions, states, software agents or others) learn interactive behavior in a series of alternating moves. Each agent is assumed to perform ‘stimulus–response–consequence’ learning, as studied in psychology. In the presented model, the response of one agent to the other agent's move is both the stimulus for the other agent's next move and part of the consequence for the other agent's previous move. After deriving general properties of the model, especially concerning convergence to limit cycles, we concentrate on an asymptotic case where the learning rate tends to zero (‘slow learning’). In this case, the dynamics can be described by a system of deterministic differential equations. For reward structures derived from 2×2 bimatrix games, fixed points are determined. For the special case of the prisoner's dilemma, the dynamics are analyzed in more detail under two assumptions: that both agents start with the same reaction probabilities, and that they start with different ones.
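
The alternating-move structure described in the abstract can be illustrated with a small simulation. This is only a sketch under assumptions: the abstract does not state the learning rule or the payoff values, so the proportional-reinforcement update in `reinforce`, the `PAYOFF` table, and all function and parameter names below are hypothetical, and each response is treated as the whole consequence of the previous move rather than just part of it.

```python
import random

# Hypothetical prisoner's dilemma payoffs scaled to [0, 1]; not taken from the paper.
# PAYOFF[(own_move, other_move)] is the reward to the agent who played own_move.
PAYOFF = {("C", "C"): 0.6, ("C", "D"): 0.0, ("D", "C"): 1.0, ("D", "D"): 0.2}


def choose(p_coop, stimulus, rng):
    """Sample a move: cooperate with the reaction probability for this stimulus."""
    return "C" if rng.random() < p_coop[stimulus] else "D"


def reinforce(p_coop, stimulus, action, reward, rate):
    """Assumed update rule: shift the reaction probability for `stimulus`
    toward the action actually taken, in proportion to rate * reward."""
    step = rate * reward
    if action == "C":
        p_coop[stimulus] += step * (1.0 - p_coop[stimulus])
    else:
        p_coop[stimulus] -= step * p_coop[stimulus]


def simulate(steps, rate, p_start=0.5, seed=0):
    """Run `steps` alternating moves and return both agents' reaction probabilities."""
    rng = random.Random(seed)
    # agents[i][s] = probability that agent i cooperates after seeing move s.
    agents = [{"C": p_start, "D": p_start}, {"C": p_start, "D": p_start}]
    # pending[i] = (stimulus, action) of agent i's last move, awaiting its consequence.
    pending = [None, None]
    last_move = "C"  # arbitrary initial stimulus (an assumption)
    for t in range(steps):
        mover, other = t % 2, 1 - t % 2
        action = choose(agents[mover], last_move, rng)
        # The mover's response is the consequence of the other agent's previous move...
        if pending[other] is not None:
            stim, prev_action = pending[other]
            reward = PAYOFF[(prev_action, action)]
            reinforce(agents[other], stim, prev_action, reward, rate)
        # ...and the stimulus for the other agent's next move.
        pending[mover] = (last_move, action)
        last_move = action
    return agents


if __name__ == "__main__":
    for rate in (0.2, 0.01):
        print(rate, simulate(steps=20000, rate=rate, seed=1))
```

A small `rate` corresponds loosely to the ‘slow learning’ regime mentioned in the abstract, where individual random moves change the reaction probabilities only slightly; passing different `p_start` values per agent would require a small extension of `simulate`.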
