

Reputation, Trust, and Reputation Games: Exploring the Dynamics of Reputation in Game-Theory Contexts

From Fintech Lab Wiki
Revision as of 16:05, 11 June 2023 by 3122188


Contribution of SIMONE GOZZINI

Introduction

Trust and reputation are different but interrelated concepts. The former can be defined as “a particular level of the subjective probability with which an agent assesses that another agent or group of agents will perform a particular action, both before he can monitor such action (or independently of his capacity ever to be able to monitor it) and in a context in which it affects his own action” (Gambetta, 2000, p. 5), while the latter can be defined as “a situation when agents believe a particular agent to be something” (Cabral, 2005, p. 3). Trust is a fundamental human sentiment that plays a crucial role in fostering cooperation among individuals. It serves as a catalyst for positive outcomes in various domains, including stock market participation, firm performance, and efficient market transactions. Trust creates an environment where individuals feel secure and confident in engaging in economic exchanges, leading to smoother and more fluid interactions. In fact, trust is considered a vital ingredient that allows complex modern societies not only to exist but also to evolve and thrive (Popitz, 1980). Its influence permeates different aspects of human interaction, enabling cooperation, economic growth, and societal development. However, trust involves risk and uncertainty about the other party’s behavior, given that perfect monitoring of what the other agent is doing is not possible: trust and trustworthiness are not easy to develop. Throughout history, various solutions have been devised to address this problem, including the use of physical coercion, contract law, and reputation. Among these solutions, reputation holds significant importance in enhancing trust and trustworthiness by reducing uncertainty, given that it “establish(es) links between past behavior and expectations of future behavior” (Mailath & Samuelson, 2015, p. 166).
By leveraging reputation, individuals can make more informed judgments about the trustworthiness of others, thereby reducing the risks associated with trust. As reputation serves as a bridge between past actions and future expectations, it plays a pivotal role in fostering trust and creating a more conducive environment for cooperative interactions.

Mailath and Samuelson (2006) extensively study the effects of reputation within the environment of repeated games. This environment has proved effective because repeated games provide a clear mathematical framework to describe both the short-term incentives that encourage opportunistic behavior and, through well-defined specifications of future actions, rewards, and punishments, the incentives that discourage it.

The paper is organized as follows: section 2 offers contextual information on the connection between trust and reputation. It discusses the interplay between these concepts and their significance in various domains; section 3 provides a mathematical framework for stage-games (single round interactions) and repeated-games (interactions occurring over multiple rounds), focusing both on the case of perfect and imperfect monitoring; section 4 deals with the concept of reputation games within the context of both perfect and imperfect monitoring, in the case of one long-lived and one short-lived player; finally, section 5 presents the various specifications of repeated games in the case where the two players are both long-lived.

Reputation and Trust

Extensive research in the literature has thoroughly examined the relationship between trust and reputation across various applications and contexts.

Diekmann and Przepiorka (2021) explore the problem of trust in economic transactions, which they define as "the uncertainty regarding the trustworthiness and/or competence of the trustee that the truster faces" (Diekmann & Przepiorka, 2021, p. 132). Uncertainty can hinder the occurrence of efficient exchanges, as it creates a situation where one party is unsure whether the other party will fulfill their promises. This uncertainty may manifest in various ways, such as doubts regarding whether the other agent will exchange the agreed-upon amount, deliver the promised product at satisfactory quality, or engage in the exchange at all. Reputation, which creates the possibility of engaging in long-lasting relationships, can be a solution, given that it is a "shadow of the past" (Diekmann & Przepiorka, 2021, p. 134) behavior of the trustee. Trustworthiness can arise because reputation makes the past transaction history of the trustee known to the trustor, who, by having access to this information, can gain insights into the characteristics and behavior of his counterpart, thereby reducing uncertainty. This is especially true in online markets, where buyer and seller are usually anonymous and likely to engage in one-time-only transactions: decentralized reputation systems such as ratings play an indispensable role in ensuring the viability of these markets, given that buyers are more likely to buy from sellers with a good reputation, so sellers have an incentive to maintain a good reputation by providing good service. Similarly, Einwiller (2003), who conducted an empirical study involving 473 German internet users, finds that reputation systems play a critical role in building trust within online markets. This is especially significant for consumers who are less familiar with such markets.

Xiong and Liu (2004) study the problem of reputation and trust in peer-to-peer online communities, which are platforms or networks that facilitate direct interaction and collaboration among individuals without relying on a centralized authority or server. To overcome the uncertainty and the threats related to decentralized networks, they develop a reputation-based trust supporting framework called PeerTrust, which aims to assess the trustworthiness of peers within these communities. This system is based on the following features: the feedback a peer receives from other peers, the total number of transactions a peer performs, the credibility of the feedback sources, the transaction context factor, and the community context factor. Once again, it is evident that building a reputation plays a pivotal role in enhancing trust and facilitating efficient exchanges. Similar conclusions are also achieved by Selcuk et al. (2004).

At a more granular level, the reputation of individual firms (and therefore their perceived trustworthiness) holds significant importance. Companies that foster a positive culture, underpinned by a well-defined code of ethics, possess the ability to cultivate a favorable reputation. This, in turn, bolsters the trust vested in them by their stakeholders (Webley, 2004). Consequently, these firms experience enhanced profitability as they attract high-quality employees, cultivate loyal customers, and are perceived as more valuable, affording them the opportunity to command premium pricing for their products (Eccles et al., 2007). Furthermore, reputation enhances the persistence and sustainability of these positive performances over time (Roberts & Dowling, 2002). As such, reputation emerges as an indispensable intangible asset (Dowling, 1993).

Definitions

To delve into the realm of game theory, it is essential to establish a clear understanding of the mathematical frameworks underlying stage games and repeated games.

Stage games with perfect monitoring

Repeated games can be defined starting from stage games. There are n players, and each player i chooses an action aᵢ from a set of available pure actions Aᵢ. The set of pure action profiles is A ≡ ∏ᵢ Aᵢ. Payoffs are given by a continuous function u: ∏ᵢ Aᵢ → ℝⁿ. The set of mixed actions for player i is Δ(Aᵢ), with typical element αᵢ, while the set of mixed profiles is ∏ᵢ Δ(Aᵢ). The set of stage-game payoffs generated by pure action profiles in A is F ≡ {v ∈ ℝⁿ : ∃a ∈ A such that v = u(a)}, and the set of feasible payoffs is F† ≡ co F, the convex hull of F. A payoff v ∈ F† is inefficient if there exists v′ ∈ F† such that v′ᵢ > vᵢ for all i; otherwise it is efficient. According to Nash (1951), if the stage game is finite, it has a Nash equilibrium; and because the payoffs are given by a continuous function, infinite stage games (under the assumptions below) also have Nash equilibria. Further assumptions are made:

  • Aᵢ is either finite, or a compact and convex subset of the Euclidean space ℝᵏ, for some k.
  • If Aᵢ is a continuum action space, then u: A → ℝⁿ is continuous and uᵢ is quasiconcave in aᵢ.

It is important to characterize the worst payoff that an individual can obtain while behaving optimally. The worst outcome for player i, consistent with player i behaving optimally, is reached when the other players choose the profile a₋ᵢ ∈ A₋ᵢ ≡ ∏_{j≠i} Aⱼ that minimizes the payoff i obtains when he plays a best response to a₋ᵢ. This is called the minmax payoff and is defined by v̱ᵢᵖ ≡ min_{a₋ᵢ∈A₋ᵢ} max_{aᵢ∈Aᵢ} uᵢ(aᵢ, a₋ᵢ). The corresponding profile âⁱ = (âᵢⁱ, â₋ᵢⁱ) is the minmax profile for player i. A payoff vector v = (v₁, …, vₙ) is weakly individually rational if vᵢ ≥ v̱ᵢᵖ for all i, and strictly individually rational if vᵢ > v̱ᵢᵖ for all i.
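As an illustration, the pure-action minmax payoffs of a two-player game can be computed directly from the payoff matrices. The sketch below (our own, not from the book) uses the payoffs of the product-choice game discussed later in this article:

```python
# Stage-game payoffs of the product-choice game discussed later:
# rows are player 1's actions (H, L), columns are player 2's actions (h, l).
U1 = [[2, 0], [3, 1]]  # player 1's payoffs u_1(a1, a2)
U2 = [[3, 2], [0, 1]]  # player 2's payoffs u_2(a1, a2)

def minmax_player1(u1):
    # v_1^p = min over a2 of (max over a1 of u1(a1, a2))
    return min(max(u1[a1][a2] for a1 in range(len(u1)))
               for a2 in range(len(u1[0])))

def minmax_player2(u2):
    # v_2^p = min over a1 of (max over a2 of u2(a1, a2))
    return min(max(row) for row in u2)

print(minmax_player1(U1), minmax_player2(U2))
```

Here both minmax payoffs equal 1: player 2 holds player 1 down to 1 by choosing l, and player 1 holds player 2 down to 1 by choosing L.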

Repeated games with perfect monitoring

A repeated game is a stage game that is repeated in each period t ∈ {0, 1, …}. Behavior in this kind of game is called a strategy, and it is a collection of actions. The authors first deal with perfect monitoring games, so at the end of each period t every player observes the actions taken by all the other players. The set of period-t histories is Aᵗ, the t-fold product of A; a history hᵗ ∈ Aᵗ is thus a list of t action profiles, identifying the actions played in periods 0 through t − 1. The set of all possible histories is H ≡ ⋃_{t=0}^∞ Aᵗ. A pure strategy is a map σᵢ: H → Aᵢ, while a mixed strategy (also called a behavior strategy) is σᵢ: H → Δ(Aᵢ). A continuation game is the infinitely repeated game that begins in period t, following history hᵗ; the continuation strategy induced by hᵗ is denoted σᵢ|hᵗ. The continuation game associated with each history is a subgame identical to the original game, which means that repeated games have a recursive structure. An outcome path in the infinitely repeated game is an infinite sequence of action profiles a ≡ (a⁰, a¹, a², …) ∈ A^∞. Outcome paths differ from histories because histories have a finite length. The first t periods of an outcome path are denoted aᵗ = (a⁰, a¹, …, a^{t−1}); thus aᵗ is the history in Aᵗ corresponding to the outcome path a. A pure strategy profile σ induces an outcome a(σ) and, analogously, a behavior strategy profile induces a path of play; for a pure strategy profile, the induced path of play and the induced outcome coincide. In period t, the induced action profile aᵗ(σ) yields the payoff uᵢ(aᵗ(σ)). An outcome a(σ) thus induces an infinite stream of stage-game payoffs (uᵢ(a⁰(σ)), uᵢ(a¹(σ)), uᵢ(a²(σ)), …), which are discounted with a discount factor δ ∈ [0, 1). The payoff from a pure strategy profile σ is therefore Uᵢ(σ) = (1 − δ) Σ_{t=0}^∞ δᵗ uᵢ(aᵗ(σ)). The authors assume that long-lived players share a common discount factor δ.
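The normalized discounted payoff Uᵢ(σ) = (1 − δ) Σₜ δᵗ uᵢ(aᵗ(σ)) can be approximated numerically by truncating the infinite stream; a short illustrative sketch (not from the book):

```python
def discounted_payoff(stream, delta):
    """Normalized discounted value (1 - delta) * sum_t delta^t * u_t of a
    payoff stream; `stream` is a long finite truncation of the infinite path."""
    return (1 - delta) * sum(u * delta**t for t, u in enumerate(stream))

# A constant stream of 2 is worth (approximately) 2 after normalization:
# the (1 - delta) factor puts repeated-game payoffs on the same scale
# as stage-game payoffs.
print(discounted_payoff([2] * 2000, 0.9))

# Sacrificing one period (payoff 0 today, 2 forever after) is worth 2*delta.
print(discounted_payoff([0] + [2] * 2000, 0.9))
```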

A Nash equilibrium is a strategy profile in which each player optimally responds to the strategies of the others. Formally, σ is a Nash equilibrium if, for all players i and all strategies σ′ᵢ, Uᵢ(σ) ≥ Uᵢ(σ′ᵢ, σ₋ᵢ). Furthermore, a strategy profile σ is a subgame-perfect equilibrium if, for all histories hᵗ ∈ H, the continuation profile σ|hᵗ is a Nash equilibrium of the repeated game.

The authors further define the concept of a one-shot deviation: for player i and strategy σᵢ, a one-shot deviation is a strategy σ̂ᵢ ≠ σᵢ with the property that there is a unique history h̃ᵗ ∈ H such that, for all hᵗ ≠ h̃ᵗ, σᵢ(hᵗ) = σ̂ᵢ(hᵗ). A one-shot deviation σ̂ᵢ is profitable if, for a fixed opponent strategy σ₋ᵢ, at the history h̃ᵗ for which σ̂ᵢ(h̃ᵗ) ≠ σᵢ(h̃ᵗ), Uᵢ(σ̂ᵢ|h̃ᵗ, σ₋ᵢ|h̃ᵗ) > Uᵢ(σ|h̃ᵗ). A strategy profile is subgame perfect if and only if there are no profitable one-shot deviations. However, the absence of profitable one-shot deviations along the equilibrium path alone is not sufficient to establish a Nash equilibrium, even though on the equilibrium path of a Nash equilibrium there can be no profitable one-shot deviations. The one-shot deviation principle is useful because it reduces the set of strategies to check when evaluating subgame perfection. Another simplification is to represent repeated-game strategies as automata. An automaton (W, w⁰, f, τ) consists of a set of states W, an initial state w⁰ ∈ W, a decision function f: W → ∏ᵢ Δ(Aᵢ) associating mixed action profiles with states, and a transition function τ: W × A → W, which identifies the next state of the automaton given its current state and the realized stage-game pure action profile. Any strategy profile can be represented by an automaton: a single strategy σᵢ is represented by the automaton (Wᵢ, wᵢ⁰, fᵢ, τᵢ), the strategy profile σ by (W, w⁰, f, τ), and a continuation strategy by (W, τ(w⁰, hᵗ), f, τ). The strategy profile represented by an automaton (W, w⁰, f, τ) is a subgame-perfect equilibrium if and only if, for every state w ∈ W accessible from w⁰, the strategy profile induced by (W, w, f, τ) is a Nash equilibrium of the repeated game.
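As a concrete illustration of the automaton representation, the following sketch (our own, not from the book) encodes a grim-trigger profile for the product-choice game used later in the article: play (H, h) until any deviation, then play (L, l) forever.

```python
# A grim-trigger profile represented as an automaton (W, w0, f, tau).
W = {"cooperate", "punish"}                           # state set W
w0 = "cooperate"                                      # initial state
f = {"cooperate": ("H", "h"), "punish": ("L", "l")}   # decision function

def tau(w, action_profile):
    # Transition function tau: W x A -> W.
    # Any deviation from (H, h) moves play to the absorbing punishment state.
    if w == "cooperate" and action_profile == ("H", "h"):
        return "cooperate"
    return "punish"

def run(state, realized_profiles):
    """Return the action profiles the automaton prescribes along a history."""
    prescribed = []
    for a in realized_profiles:
        prescribed.append(f[state])
        state = tau(state, a)
    return prescribed

# A deviation in period 1 triggers (L, l) from period 2 onward.
print(run(w0, [("H", "h"), ("L", "h"), ("H", "h")]))
```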

Equilibria

To discuss the notion of equilibrium, the authors describe the notions of enforceability and pure-action decomposability:

  • Enforceability: a pure action profile a* is enforceable on W ⊆ ℝⁿ if there exists a specification of continuation promises (i.e., promised continuation payoffs for the future stages of the game) γ: A → W such that, for all players i and all aᵢ ∈ Aᵢ, (1 − δ)uᵢ(a*) + δγᵢ(a*) ≥ (1 − δ)uᵢ(aᵢ, a*₋ᵢ) + δγᵢ(aᵢ, a*₋ᵢ).
  • Decomposability: a payoff v ∈ F† is pure-action decomposable on W if there exists a pure action profile a* enforceable on W such that vᵢ = (1 − δ)uᵢ(a*) + δγᵢ(a*), where γ is a function enforcing a*.

Any set of payoffs W ⊆ F† with the property that every payoff in W is pure-action decomposable on W is a set of pure-strategy subgame-perfect equilibrium payoffs; such a set W is called pure-action self-generating.
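These conditions can be checked mechanically in a finite game. The sketch below (illustrative, not from the book) verifies enforceability of (H, h) in the product-choice game used later in the article, with continuation promises drawn from W = {(2, 3), (1, 1)}:

```python
# Product-choice payoffs (player 1, player 2), indexed by (a1, a2).
u = {("H", "h"): (2, 3), ("H", "l"): (0, 2), ("L", "h"): (3, 0), ("L", "l"): (1, 1)}
A1, A2 = ("H", "L"), ("h", "l")

def enforceable(a_star, gamma, delta):
    """Check the enforceability inequalities for the pure profile a_star,
    given continuation promises gamma: A -> payoff vectors."""
    a1s, a2s = a_star
    deviations = {0: [(a1, a2s) for a1 in A1],   # player 1's unilateral deviations
                  1: [(a1s, a2) for a2 in A2]}   # player 2's unilateral deviations
    for i, devs in deviations.items():
        target = (1 - delta) * u[a_star][i] + delta * gamma[a_star][i]
        for a in devs:
            if target < (1 - delta) * u[a][i] + delta * gamma[a][i] - 1e-12:
                return False
    return True

# Promise the "good" continuation (2, 3) after (H, h) and the static Nash
# payoff (1, 1) after anything else (a grim-trigger-style promise).
gamma = {a: ((2, 3) if a == ("H", "h") else (1, 1)) for a in u}
print(enforceable(("H", "h"), gamma, 0.5), enforceable(("H", "h"), gamma, 0.4))
```

With these promises, (H, h) is enforceable exactly when δ ≥ 1/2, matching the δ ≥ 1/2 threshold that reappears in the product-choice example later in the article.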

Simple strategy and penal code

Simple strategy profile: given (n + 1) outcomes {a(0), a(1), …, a(n)}, the associated simple strategy profile σ(a(0), a(1), …, a(n)) is given by the automaton W = {0, 1, …, n} × {0, 1, 2, …}, w⁰ = (0, 0), f(j, t) = aᵗ(j), and τ((j, t), a) = (i, 0) if aᵢ ≠ aᵢᵗ(j) and a₋ᵢ = a₋ᵢᵗ(j), and τ((j, t), a) = (j, t + 1) otherwise. A simple strategy profile therefore consists of a prescribed outcome path a(0) and a punishment outcome path a(i) for each player i. Play follows the outcome path a(0), but if player i deviates unilaterally, the other players respond with player i's punishment path a(i). It is important to underline that the punishment for a deviation is independent of when the deviation occurs and of its nature. The simple strategy profile σ(a(0), a(1), …, a(n)) is a subgame-perfect equilibrium if and only if Uᵢᵗ(a(j)) ≥ max_{aᵢ∈Aᵢ} [(1 − δ)uᵢ(aᵢ, a₋ᵢᵗ(j)) + δUᵢ⁰(a(i))], for i = 1, …, n, j = 0, 1, …, n, and t = 0, 1, ….

Then, a definition of an optimal penal code is given. Let {a(i) : i = 1, …, n} be n outcome paths satisfying Uᵢ⁰(a(i)) = v̱ᵢ*, for i = 1, …, n. The collection σ(i) = σ(a(i), a(1), …, a(n)) is an optimal penal code if σ(i) ∈ 𝒫, i = 1, …, n, where 𝒫 denotes the set of pure-strategy subgame-perfect equilibria.

Long-lived and short-lived players

The authors introduce short-lived players, who differ from long-lived players in that the latter live throughout the game. Short-lived players, in contrast, are concerned only with current-period payoffs (therefore they do not discount) and are called myopic. There are two interpretations for them:

  • In each period a collection of short-lived players enters the game and, after one period, leaves.
  • Each of them represents a continuum of long-lived agents.

Small players are assumed to be anonymous (i.e., a change in the behavior of a single member of the continuum does not affect the distribution of play, and therefore does not affect the behavior of the other players), and each of them observes only the public history hᵗ ∈ Aᵗ. The notion of one-shot deviation also applies in games with long-lived and short-lived players. Moreover, being myopic, the short-lived players play a Nash equilibrium of the induced stage game, given the actions of the long-lived players.

It is possible to generalize the concept of minmax payoff to games with short-lived players. Let B: ∏_{i=1}^{n} Δ(Aᵢ) → ∏_{i=n+1}^{N} Δ(Aᵢ) be the correspondence that maps any mixed-action profile for the long-lived players into the corresponding set of static Nash equilibria for the short-lived players. For each long-lived player i, the payoff v̱ᵢ = min_{α∈graph(B)} max_{aᵢ∈Aᵢ} uᵢ(aᵢ, α₋ᵢ) is player i's (mixed-action) minmax payoff with short-lived players, and it is a lower bound on the payoff that player i can obtain in an equilibrium of the repeated game.

Finally, repeated games with short-lived players have some features that must be underlined:

  • There are some restrictions on the payoffs that can be achieved by the long-lived players. In particular, short-lived players impose restrictions on the set of equilibrium payoffs that go beyond the specification of minmax payoffs.
  • There are restrictions on the structure of the equilibrium.

Let i be a long-lived player and define v̄ᵢ = sup_{α∈graph(B)} min_{aᵢ∈supp(αᵢ)} uᵢ(aᵢ, α₋ᵢ), the largest payoff that can be constructed when player i's realized behavior may be any action in the support of his mixed action αᵢ. In every subgame-perfect equilibrium, player i's payoff is at most v̄ᵢ.

Games with imperfect monitoring

The focus so far has been on perfect monitoring games, where deviations from the equilibrium can be easily detected and punished, thus providing incentives for players not to myopically optimize. The authors now turn to games with imperfect monitoring, that is, games where players have only noisy information about past play, which makes deviations more difficult to detect. It is still possible for players not to myopically optimize, since punishments remain possible. When the noisy signals are observed by all players, we are in the case of imperfect public monitoring; if some signals are observed by some players but not others, we are in the case of private monitoring. The following sections focus on imperfect public monitoring, but the results can be extended to private monitoring.

Stage games

The specification of these games is very similar to the stage games with perfect monitoring seen in section 3.1. The main difference lies in the presence of a public signal y, drawn from the finite signal space Y. ρ(y|a) is the probability that signal y is realized, given the action profile a ∈ A ≡ ∏ᵢ Aᵢ. The function ρ: Y × A → [0, 1] is continuous, and ρ has full support if ρ(y|a) > 0 for all y ∈ Y and a ∈ A. Player i's payoff at the end of each period, given the realization (y, aᵢ), is uᵢ*(y, aᵢ), while the ex-ante stage-game payoffs are uᵢ(a) = Σ_{y∈Y} uᵢ*(y, aᵢ) ρ(y|a). The other assumptions are maintained.
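The mapping from ex-post to ex-ante payoffs is just an expectation over signals. The following sketch uses a hypothetical two-signal monitoring structure (all numbers are assumptions chosen for illustration, not taken from the book):

```python
# Ex-ante stage payoff u_i(a) = sum_y u_i*(y, a_i) * rho(y | a),
# for a hypothetical two-signal space Y = {"g", "b"} (good / bad signal).
Y = ("g", "b")

def rho(y, a):
    # Assumed signal technology: high effort by player 1 makes "g" likely.
    p_good = 0.9 if a[0] == "H" else 0.3
    return p_good if y == "g" else 1 - p_good

def u_star(y, a1):
    # Assumed ex-post payoff of player 1: a reward paid only on a good
    # signal, minus the cost of high effort.
    return (2 if y == "g" else 0) - (1 if a1 == "H" else 0)

def ex_ante_u1(a):
    return sum(u_star(y, a[0]) * rho(y, a) for y in Y)

print(ex_ante_u1(("H", "h")), ex_ante_u1(("L", "h")))
```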

Repeated games

Repeated games with imperfect monitoring have a structure similar to the games presented in section 3.2. The only public information available in period t is the t-period history of public signals hᵗ ≡ (y⁰, y¹, …, y^{t−1}). The set of public histories is H ≡ ⋃_{t=0}^∞ Yᵗ. A history for a long-lived player includes the public history and the history of the actions he has taken, while each short-lived player in period t is assumed to observe only the public history hᵗ. A pure strategy profile does not induce a deterministic outcome path, since the public signals may be random. Public monitoring games include perfect monitoring games as the special case where Y = A and ρ(y|a) = 1 if y = a, and 0 otherwise.

Long-lived players have private information (the knowledge of their own past actions), so their information sets are isomorphic to private histories, not to public histories. A behavior strategy σᵢ is public if, in every period t, it depends only on the public history hᵗ ∈ Yᵗ and not on i's private history: for all private histories hᵢᵗ, ĥᵢᵗ satisfying y^τ = ŷ^τ for all τ ≤ t − 1, σᵢ(hᵢᵗ) = σᵢ(ĥᵢᵗ). Otherwise, it is private. If all players other than i play public strategies, then player i has a public strategy as a best reply. Two strategies σᵢ and σ̂ᵢ are realization equivalent if, for all strategies σ₋ᵢ of the other players, the distributions over outcomes induced by (σᵢ, σ₋ᵢ) and (σ̂ᵢ, σ₋ᵢ) are the same. It can then be shown that every pure strategy in a public monitoring game is realization equivalent to a public pure strategy.

Finally, it is possible to define a perfect public equilibrium (PPE) as a profile of public strategies σ such that, for every t and every public history hᵗ ∈ Yᵗ, the continuation profile σ|hᵗ is a Nash equilibrium of the continuation game. Moreover, σ is a PPE if and only if there are no profitable one-shot deviations.

Reputation games

Reputation can be considered a specification of the concept of trust. It can be defined as "a link between past behavior and expectations of future behavior" (Mailath & Samuelson, 2006, p. 459) that arises after multiple interactions among agents. Repeated games are therefore a natural environment in which to study reputation.

There are two kinds of approaches:

  • In the first, an equilibrium of the repeated game is selected whose equilibrium path is not a Nash equilibrium of the stage game. Players who follow the equilibrium are said to maintain a reputation; a deviation triggers a punishment, which consists in losing that reputation. The link between past and future behavior, i.e., reputation, is an equilibrium phenomenon.
  • In the second (the adverse selection approach), each player is uncertain about the characteristics of his opponent. This incomplete information introduces a link between past and future behavior, and thus the concept of reputation arises. Here, the reputation approach does not describe a particular equilibrium, but places constraints on the set of available equilibria.

Commitment types

The authors first consider games with one long-lived player and one short-lived player. Player 2 does not know the type of player 1, but he has a prior belief μ over player 1's type ξ, which is drawn from a countable set Ξ. There are two possible categories of types for player 1:

  • the payoff types Ξ₁, who maximize the average discounted value of their payoffs. Of particular interest in this category is the normal type ξ₀ ∈ Ξ₁, who has a stationary payoff function.
  • the commitment types Ξ₂, who do not have payoffs but play a particular repeated-game strategy. If the strategy consists of playing the same stage-game action in every period, regardless of past history, the type is a simple commitment type.

Usually, the probability μ(ξ0) is large.

Player 1’s pure-action Stackelberg payoff is defined as v₁* = sup_{a₁∈A₁} min_{α₂∈B(a₁)} u₁(a₁, α₂), where B(a₁) is the set of player 2's myopic best replies to a₁. If the supremum is achieved by some action a₁*, that action is a Stackelberg action, a₁* ∈ argmax_{a₁∈A₁} min_{α₂∈B(a₁)} u₁(a₁, α₂). This is a pure action that player 1 would commit to if given the opportunity to do so; the name "Stackelberg action" arises because such a commitment elicits a best response from player 2. The Stackelberg type of player 1 plays a₁* and is denoted by ξ(a₁*) ≡ ξ*.
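For a finite game, the pure-action Stackelberg payoff is a direct max–min computation. A sketch for the product-choice game discussed below, restricting player 2 to pure best replies:

```python
# Product-choice payoffs (player 1, player 2), indexed by (a1, a2).
u = {("H", "h"): (2, 3), ("H", "l"): (0, 2), ("L", "h"): (3, 0), ("L", "l"): (1, 1)}
A1, A2 = ("H", "L"), ("h", "l")

def best_replies(a1):
    # B(a1): player 2's myopic (pure) best replies to a1.
    best = max(u[(a1, a2)][1] for a2 in A2)
    return [a2 for a2 in A2 if u[(a1, a2)][1] == best]

def stackelberg():
    # v1* = max over a1 of (min over a2 in B(a1) of u1(a1, a2)).
    value = {a1: min(u[(a1, a2)][0] for a2 in best_replies(a1)) for a1 in A1}
    a1_star = max(value, key=value.get)
    return a1_star, value[a1_star]

print(stackelberg())
```

Committing to H elicits the best reply h, so the Stackelberg action is H with payoff v₁* = 2.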

Perfect monitoring games

The authors first focus on repeated games with perfect monitoring. The action set A₂ of player 2 is assumed to be finite, and each player plays a pure strategy. The basic reputation result establishes a lower bound on the equilibrium payoffs of the normal long-lived player. H is the set of public histories, both in the complete information game and in the incomplete information game, and a history for player 1 is an element of Ξ × H, specifying player 1's type and the public history. A behavior strategy for player 1 is σ₁: H × Ξ → Δ(A₁) such that, for all commitment types ξ(σ̂₁) ∈ Ξ₂, σ₁(hᵗ, ξ(σ̂₁)) = σ̂₁(hᵗ) for all hᵗ ∈ H. A behavior strategy for player 2 is σ₂: H → Δ(A₂).

Let U₁(σ, ξ) be the type-ξ long-lived player's payoff. A strategy profile (σ̃₁, σ̃₂) is a Nash equilibrium of the reputation game with perfect monitoring if, for all ξ ∈ Ξ₁, σ̃₁ maximizes U₁(σ₁, σ̃₂, ξ) over player 1's repeated-game strategies, and if, for all t and all hᵗ ∈ H that have positive probability under (σ̃₁, σ̃₂) and μ, E[u₂(σ̃₁(hᵗ, ξ), σ̃₂(hᵗ)) | hᵗ] = max_{a₂∈A₂} E[u₂(σ̃₁(hᵗ, ξ), a₂) | hᵗ].

The first goal is to demonstrate that, when player 2 assigns some probability to player 1 being the simple type ξ(a₁), if the normal player 1 repeatedly plays a₁, then player 2 will place a high probability on a₁ being played in the future. Obviously, the reputation for playing a₁ does not arise instantaneously, and building it can be costly; however, the cost is negligible if player 1 is sufficiently patient. If this action is the Stackelberg action a₁*, then, when player 1 is sufficiently patient, the resulting lower bound on player 1's payoff is close to his Stackelberg payoff v₁*. Let Ω ≡ Ξ × (A₁ × A₂)^∞ be the space of outcomes, with ω a specific outcome. A strategy profile (σ₁, σ₂), together with μ, induces a probability measure P ∈ Δ(Ω) on the set of outcomes. Let Ω̂ be the event that action a₁ is chosen in every period, and let qᵗ be the probability that a₁ is chosen in period t conditional on hᵗ, that is, qᵗ ≡ P(a₁ᵗ = a₁ | hᵗ); qᵗ is a random variable. The normal player 1 receives a payoff of at least min_{a₂∈B(a₁)} u₁(a₁, a₂) in any period t in which qᵗ is large enough to induce player 2 to best respond to a₁ and player 1 actually plays a₁. Since player 1 can always play a₁, the payoff generated by always playing a₁ is a lower bound on his payoff in any Nash equilibrium.

Let n_ζ: Ω → ℕ ∪ {∞} count the number of random variables qᵗ for which qᵗ ≤ ζ, and let Ω̂ denote the event that a₁ is played in every period. Fix ζ ∈ [0, 1) and suppose μ(ξ(a₁)) ≥ μ̲ for some μ̲ > 0 and a₁ ∈ A₁. Then, for every profile (σ₁, σ₂), P(n_ζ > ln μ̲ / ln ζ | Ω̂) = 0 and, for every ω ∈ Ω̂ such that all histories (hᵗ(ω))_{t=0}^∞ have positive probability under P, P(ξ(a₁) | hᵗ(ω)) is nondecreasing in t. Therefore, the more often player 2 observes player 1 playing a₁, the higher the probability he attaches to a₁ being played in the future. The bound on n_ζ is independent of P; in particular, the result does not say that the posterior probability attached to the simple type converges to 1 as t increases, leaving open the possibility that player 1 is a normal type playing like the simple type.
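The nondecreasing-posterior property can be illustrated with a stylized two-type example (all numbers are assumptions for illustration): the commitment type ξ(a₁) plays a₁ for sure, while the normal type plays a₁ with some fixed probability p in each period.

```python
def posterior_path(mu0, p, T):
    """Posterior probability of the simple commitment type after observing
    a1 for t = 0, ..., T periods, in a stylized two-type model: the
    commitment type plays a1 with probability 1, the normal type with p."""
    posteriors = [mu0]
    mu = mu0
    for _ in range(T):
        # Bayes' rule after one more observation of a1.
        mu = mu / (mu + (1 - mu) * p)
        posteriors.append(mu)
    return posteriors

# Starting from a small prior, repeated observation of a1 pushes the
# posterior monotonically upward (though never discontinuously to 1).
print(posterior_path(0.1, 0.8, 25)[-1])
```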

By committing to action a₁, player 1 can guarantee the payoff v₁*(a₁) ≡ min_{a₂∈B(a₁)} u₁(a₁, a₂), the one-shot bound from a₁. Let v̱₁(ξ₀, μ, δ) be the infimum of the normal player 1's payoff over the set of Nash equilibria. The reputation result establishes a lower bound on the equilibrium payoff of player 1. In particular, let A₂ be finite and μ(ξ₀) > 0, and suppose Â₁ is a finite subset of A₁ with μ(ξ(a₁)) > 0 for all a₁ ∈ Â₁. Then there exists k such that v̱₁(ξ₀, μ, δ) ≥ δᵏ max_{a₁∈Â₁} v₁*(a₁) + (1 − δᵏ) min_{a∈A} u₁(a). If the set of commitment types is sufficiently rich, the lower bound on the normal player 1's payoff is the Stackelberg payoff. Indeed, if there is a Stackelberg action and the Stackelberg type has positive prior probability, the normal player 1 effectively builds a reputation for playing like the Stackelberg type, thus receiving a payoff no less than v₁* despite the presence of other possible commitment types. However, the result does not say whether it is optimal for player 1 to play the Stackelberg action in each period. Reputation effects resulting from pure-action commitment types in perfect monitoring games impose a lower bound on player 1's equilibrium payoffs, which can be considerably high; however, unlike mixed-action commitment types or imperfect monitoring games in general, they do not introduce the possibility of new payoffs. In this context, discounting plays a dual role:

  • It makes future payoffs relatively more important.
  • It diminishes the significance of the initial sequence of periods in which player 1 may incur costs to imitate the commitment type.

Example: product-choice game

Let’s consider the following product-choice game (i.e. a game where the first agent is a firm that decides whether to exert high (H) or low (L) effort in producing its output, while the second agent is a consumer that can buy a high-priced (h) or low-priced (l) product).

               Player 2
               h       l
Player 1  H    2,3     0,2
          L    3,0     1,1

Player 1 is a long-lived player, while player 2 is a short-lived player. Player 1 can build a reputation for playing H by persistently doing so. At the beginning this could be costly for player 1, since it may take time for player 2 to be convinced (and in the meantime player 2 will play l), but the subsequent payoffs can make the initial investment rewarding if player 1 is sufficiently patient. However, nothing in the complete-information repeated game captures this feature. Therefore, it is necessary to introduce a probability μ̂ > 0 that player 2 assigns to the possibility that player 1 is a commitment type (i.e., has some hidden characteristic that ensures he will exert high effort) rather than a normal type (a player without that hidden characteristic). This introduces a link between past and expected future play of H: player 2 can choose his action after having observed the behavior of player 1. Therefore, (L, l), which would have been the unique Nash equilibrium outcome in the perfect monitoring game of complete information, is no longer the necessary outcome. For example, if the product-choice game is played twice in this setting, the response of player 2 will depend on what player 1 does (assuming player 1 is the normal type). If player 1 plays L, player 2 will subsequently play l, believing that he is facing the normal type. On the other hand, if player 1 plays H, player 2 may conclude that he is facing the commitment type, and thus best respond with h. Player 1, who has behaved like the commitment type, can sacrifice current payoff to get more in the following period.
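The claim that (L, l) is the unique Nash equilibrium of the stage game can be verified by brute force over the four pure profiles (a sketch, pure strategies only):

```python
# Product-choice payoffs (player 1, player 2), indexed by (a1, a2).
u = {("H", "h"): (2, 3), ("H", "l"): (0, 2), ("L", "h"): (3, 0), ("L", "l"): (1, 1)}
A1, A2 = ("H", "L"), ("h", "l")

def pure_nash():
    # A profile is a Nash equilibrium if neither player gains from a
    # unilateral deviation.
    eq = []
    for a1 in A1:
        for a2 in A2:
            best1 = all(u[(a1, a2)][0] >= u[(d, a2)][0] for d in A1)
            best2 = all(u[(a1, a2)][1] >= u[(a1, d)][1] for d in A2)
            if best1 and best2:
                eq.append((a1, a2))
    return eq

print(pure_nash())
```

L is strictly dominant for player 1 in the stage game, and l is player 2's best reply to L, so only (L, l) survives.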

Continuing with the example, the pure Stackelberg type of player 1 chooses H, with a payoff of 2. Suppose Ξ = {ξ₀, ξ*, ξ(L)}. If δ ≥ 1/2, always playing Hh is a subgame-perfect equilibrium of the complete information game. Adapting the profile to the incomplete information game, we have

σ₁(hᵗ, ξ) = H if ξ = ξ*, or if ξ = ξ₀ and a^τ = Hh for all τ < t; and L otherwise;

σ₂(hᵗ) = h if a^τ = Hh for all τ < t, and l otherwise.

This is a Nash equilibrium for δ ≥ 1/2 and μ(ξ(L)) < 1/2. Player 2 finds it optimal to play h in period 0. If he observes L, he places probability 1 on player 1 being either type ξ(L) or the normal type, and optimally punishes player 1; this ensures that the normal player 1 finds it optimal to play H. After observing a₁⁰ = H in period 0, player 2 assigns probability 0 to ξ(L). It is also possible to show that Nash equilibria with a low payoff for the normal player 1 cannot arise: if μ(ξ*) < 1/3 and μ(ξ(L)) < 1/3, the normal player 1's payoff in any pure-strategy Nash equilibrium is bounded below by 2δ and above by 2.

Now, let a₁ = H = a₁*, the Stackelberg action. Let ξ̃ₜ be a commitment type who plays H in every period τ < t and L afterwards; ξ̃₀ is then ξ(L), the simple commitment type who always plays L, and ξ̃ₜ for t ≥ 1 are nonsimple commitment types. Finally, let ξ̂ be the type that plays H in period 0, and in every subsequent period if player 2 plays h in period 0. Consider the strategy profile in which the normal player 1 always plays H and player 2 always plays h, and let Ω̂ be the set of outcomes in which player 1 always plays H. Then q⁰ = 1 − μ(ξ̃₀). In period 1, Hh is the only history consistent with Ω̂ that has nonzero probability, and Bayes' rule gives q¹(Hh) = (1 − μ(ξ̃₀) − μ(ξ̃₁)) / (1 − μ(ξ̃₀)). If instead player 1 still always plays H, but player 2 plays h with probability 1/2 and l with probability 1/2 in the first period and then always plays h, the calculation of q⁰ does not change, but now both Hl and Hh are consistent with Ω̂. So q¹(Hh) = (1 − μ(ξ̃₀) − μ(ξ̃₁)) / (1 − μ(ξ̃₀)) and q¹(Hl) = (1 − μ(ξ̂) − μ(ξ̃₀) − μ(ξ̃₁)) / (1 − μ(ξ̃₀)).
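These Bayes-rule computations can be reproduced numerically; the prior weights below are hypothetical, chosen only to make the formulas concrete.

```python
# q^t = P(a1^t = H | h^t), computed from hypothetical prior weights on the
# commitment types; the remaining mass is on types that always play H
# under the profile considered (the normal type among them).
mu = {"xi0_tilde": 0.05,  # always plays L
      "xi1_tilde": 0.05,  # plays H in period 0, then L forever
      "xi_hat": 0.10}     # keeps playing H only if player 2 played h in period 0

q0 = 1 - mu["xi0_tilde"]
# After Hh: only xi1_tilde drops out of the H-players among the types
# consistent with period-0 play of H.
q1_Hh = (1 - mu["xi0_tilde"] - mu["xi1_tilde"]) / (1 - mu["xi0_tilde"])
# After Hl: xi_hat also stops playing H, since player 2 did not play h.
q1_Hl = (1 - mu["xi_hat"] - mu["xi0_tilde"] - mu["xi1_tilde"]) / (1 - mu["xi0_tilde"])

print(q0, q1_Hh, q1_Hl)
```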

Imperfect monitoring games

The authors now study imperfect-monitoring games, focusing first on the case with one long-lived and one short-lived player. Aᵢ is the action space and Zᵢ the finite signal space. In each stage of the repeated game, each player only learns the realized value of his private signal. Let π(z|a) be the distribution over private signals z = (z₁, z₂) for each action profile a, and let uᵢ*(zᵢ, aᵢ) be the ex-post payoff of normal player 1 and of player 2 after the realization of z and a. Ex-ante stage-game payoffs are uᵢ(a) ≡ Σ_z uᵢ*(zᵢ, aᵢ) π(z|a). Ξ is as in the previous section. The set of private histories for player 1, excluding his type, is H₁ ≡ ∪_{t=0}^∞ (A₁ × Z₁)ᵗ, and a behavior strategy for player 1 is σ₁ : H₁ × Ξ → Δ(A₁), such that for all ξ(σ̂₁) ∈ Ξ₂, σ₁(h₁ᵗ, ξ(σ̂₁)) = σ̂₁(h₁ᵗ) for all h₁ᵗ ∈ H₁. The set of private histories for player 2 is H₂ ≡ ∪_{t=0}^∞ (A₂ × Z₂)ᵗ, with a behavior strategy σ₂ : H₂ → Δ(A₂). A strategy profile (σ̃₁, σ̃₂) is a Nash equilibrium of the reputation game with imperfect monitoring if, for all ξ ∈ Ξ₁, σ̃₁ maximizes U₁(σ₁, σ̃₂, ξ) (type ξ long-lived player's payoff) over player 1's repeated-game strategies, and if for all t and all h₂ᵗ ∈ H₂ that have positive probability under (σ̃₁, σ̃₂) and μ,

E[u₂(σ̃₁(h₁ᵗ, ξ), σ̃₂(h₂ᵗ)) | h₂ᵗ] = max_{a₂∈A₂} E[u₂(σ̃₁(h₁ᵗ, ξ), a₂) | h₂ᵗ].

Since monitoring is not perfect in this environment, the best responses obtainable by playing a₁ are not simply the actions in B(a₁). Consider the set of possible best responses of player 2 to player 1's mixed action α₁. An action α₂ is an ε-confirmed best response to α₁ if there exists α₁′ such that α₂(a₂) > 0 implies a₂ ∈ argmax_{a₂′} u₂(α₁′, a₂′), and |π₂(·|α₁, α₂) − π₂(·|α₁′, α₂)| ≤ ε. B_ε(α₁) is the set of ε-confirmed best responses to α₁. For private-monitoring games, let B*_ε(α̂₁) ≡ {α₂ : supp(α₂) ⊂ B_ε(α̂₁)}. The authors' results show that if player 2 assigns positive probability to a simple type ξ(α₁), a patient normal player 1's payoff in every Nash equilibrium (up to an ε > 0 approximation) can be no lower than v̄₁(α₁) ≡ min_{α₂∈B*₀(α₁)} u₁(α₁, α₂). Taking the supremum over α₁, the payoff bound is v₁** ≡ sup_{α₁} min_{α₂∈B*₀(α₁)} u₁(α₁, α₂).

Further assumptions are necessary. In particular, it must be that for all a₂ ∈ A₂ (and all α₂ ∈ Δ(A₂)), the collection {π₂(·|(a₁, a₂)) : a₁ ∈ A₁} is linearly independent; that is, for any action of player 2, no two actions of player 1 generate the same distribution of signals. For the private-monitoring game, this implies that B(α₁) = B*₀(α₁) and that v₁** equals the mixed-action Stackelberg payoff.

It is possible to demonstrate the following proposition. Let ξ̂ be the simple commitment type that plays α̂₁ ∈ Δ(A₁) (or α̂₁ ∈ A₁). Suppose μ(ξ₀), μ(ξ̂) > 0. In the private-monitoring game (or canonical public-monitoring game), for all ε > 0 there exists K such that for all δ,

v̲₁(ξ₀, μ, δ) ≥ (1 − ε)δ^K inf_{α₂∈B*_ε(α̂₁)} u₁(α̂₁, α₂) + (1 − (1 − ε)δ^K) min_{a∈A} u₁(a).

This means that, as in perfect-monitoring games, the normal player 1 manages to establish a reputation for consistently playing like a simple type. This phenomenon persists even when numerous other potential commitment types are present. Even if we cannot be certain about player 2's ultimate beliefs regarding player 1, we can be confident that player 2's beliefs will converge to something.

The same results obtain when the short-lived player is interpreted as a continuum of small and anonymous long-lived players. Each small player receives a private signal from the finite set Z₂, while the large player observes a private signal z₁ ∈ Z₁ of the aggregate behavior of the small players. If the private signal received by the small players is common across all of them, the model is identical to the one just described. If each small player observes a different realization of the private signal (idiosyncratic signals), the model changes. In this case, let ξ̂ denote the simple commitment type that always plays α̂₁ ∈ Δ(A₁) (if A₁ is finite) or α̂₁ ∈ A₁ (if A₁ is infinite). Suppose ξ₀, ξ̂ ∈ Ξ. For all ε > 0 there exists K such that for all δ,

v̲₁(ξ₀, μ, δ) ≥ (1 − ε)δ^K inf_{α₂∈B*_ε(α̂₁)} u₁(α̂₁, α₂) + (1 − (1 − ε)δ^K) min_{a∈A} u₁(a).

Example: product-choice game with public monitoring

Player 1's actions are not public. There is a public signal that takes values ȳ and y̲, with distribution

ρ(ȳ|a) = p, if a₁ = H; q, if a₁ = L,

with 0 < q < p < 1. Player 2's actions are public, and α̂₁ is the mixed action of player 1 that randomizes equally between his two actions. For every ε ≥ 0, every pure or mixed action of player 2 is in B_ε(α̂₁). Therefore min_{α₂∈B₀(α̂₁)} u₁(α̂₁, α₂) = 1/2 and v₁** = 5/2. The resulting payoff is the mixed-action Stackelberg payoff, and it is higher than 2 − (1 − p)/(p − q) < 2, which can be shown to be the upper bound on player 1's payoff in the public-monitoring game with complete information.
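The numbers in this example can be verified directly. The sketch below assumes the standard product-choice payoffs for player 1 (u₁(H,h) = 2, u₁(H,l) = 0, u₁(L,h) = 3, u₁(L,l) = 1) and illustrative signal probabilities p, q:

```python
# Numeric check of the example: against the equal mixture the worst response
# gives 1/2; mixing just above 1/2 on H approaches the mixed Stackelberg
# payoff 5/2, which exceeds the complete-information upper bound
# 2 - (1 - p)/(p - q). Payoffs and (p, q) are assumed for illustration.
u1 = {('H', 'h'): 2, ('H', 'l'): 0, ('L', 'h'): 3, ('L', 'l'): 1}

def u1_mixed(x, beta):
    """Player 1's payoff when he plays H with prob x and 2 plays h with prob beta."""
    return (x * beta * u1[('H', 'h')] + x * (1 - beta) * u1[('H', 'l')]
            + (1 - x) * beta * u1[('L', 'h')] + (1 - x) * (1 - beta) * u1[('L', 'l')])

print("u1(alpha1_hat, l) =", u1_mixed(0.5, 0.0))    # worst response: 1/2
print("u1(x=0.51, h)    =", u1_mixed(0.51, 1.0))    # close to 5/2
p, q = 0.9, 0.2                                     # assumed, 0 < q < p < 1
upper_bound = 2 - (1 - p) / (p - q)
print("complete-information upper bound:", round(upper_bound, 3))
```

For these illustrative signal probabilities the complete-information bound is about 1.86, strictly below the reputation bound of 5/2.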

Temporary reputations

Under imperfect monitoring, the authors show that player 2 must eventually learn player 1's type. While the normal and commitment types may exhibit similar behavior for an extended period, the normal type will inevitably have an incentive to deviate at least slightly from the commitment strategy. This contradicts the belief held by player 2 that player 1 will consistently play the commitment strategy. Consequently, player 2 will eventually learn player 1's true type.

The environment is that of incomplete-information private-monitoring games. The authors assume that for i = 1, 2, all a ∈ A, and all zᵢ ∈ Zᵢ, πᵢ(zᵢ|a) > 0, and that for all a₁ ∈ A₁ the collection of probability distributions {π₁(·|(a₁, a₂)) : a₂ ∈ A₂} is linearly independent. This assumption implies that player i can correctly identify any fixed stage-game action of player j. The focus is on one simple commitment type for player 1. The normal type's strategy is denoted σ̃₁, the commitment type's σ̂₁. Let P ∈ Δ(Ω) be the unconditional probability measure induced by μ and the strategy profile, and let P̂ be the measure induced by conditioning on ξ̂. For player 2, σ̂₂ is the unique stage-game best response to σ̂₁, and (σ̂₁, σ̂₂) is not a stage-game Nash equilibrium. Suppose there is a Nash equilibrium of the incomplete-information game in which player 1 can be either the normal or the commitment type. Player 2 cannot distinguish the signals generated by the two types, and thus believes that both are playing the same strategy. Player 2 plays a best response to this strategy, which is a best response to the commitment type. Since it is not also a best response for player 1, he will find it optimal to deviate, contradicting player 2's beliefs. This means that in any Nash equilibrium of the game with incomplete information, μ̂ₜ ≡ P({ξ̂}|𝒢₂ᵗ) → 0, where 𝒢₂ᵗ is the filtration on Ω generated by player 2's histories.

It can also be shown that there are no equilibria in which uncertainty about player 1's type survives beyond some period T: for all ε > 0 there exists T such that, for any Nash equilibrium of the game with incomplete information, P̃(μ̂ₜ < ε, for all t > T) > 1 − ε.
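The mechanics of this belief decay can be illustrated with a small simulation. The signal probabilities below are assumptions: player 1 is actually the normal type, whose play induces a signal distribution different from the one the commitment strategy would induce, so Bayes updating drives the posterior on the commitment type toward zero.

```python
# Illustrative simulation (assumed Bernoulli signal probabilities) of the
# temporary-reputation result: when the normal type's play generates a
# different signal distribution, player 2's posterior on the commitment
# type converges to zero.
import random

random.seed(1)
p_commit, p_normal = 0.9, 0.6       # P(good signal) under each type (assumed)
mu = 0.5                            # prior on the commitment type
for t in range(200):
    good = random.random() < p_normal            # signal drawn from normal type
    l_c = p_commit if good else 1 - p_commit     # likelihood under commitment
    l_n = p_normal if good else 1 - p_normal     # likelihood under normal
    mu = mu * l_c / (mu * l_c + (1 - mu) * l_n)

print(f"posterior on the commitment type after 200 periods: {mu:.2e}")
```

The posterior odds form a martingale whose log decreases on average by the Kullback–Leibler divergence between the two signal distributions, which is why the belief vanishes almost surely.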

Reputation with long-lived players

This section deals with perfect-monitoring games with two long-lived players. The goal is to demonstrate that when player 1 consistently plays the Stackelberg action and there exists a type of player 1 committed to that action, player 2 will eventually assign a high probability to the Stackelberg action being played in future rounds. However, since player 2 is now long-lived, he may not play a best response to the Stackelberg type. With two long-lived players, a crucial step is to determine the conditions under which player 2, as his conviction that the Stackelberg action will appear grows, ultimately chooses to play a best response to that action. The following consideration applies: as long as player 2 values future payoffs (thus discounting), any losses incurred by not playing a current best response must be compensated within a finite duration. However, if player 2 holds a strong conviction that the Stackelberg action will be played not only in the present but also in numerous subsequent periods, there will be no chance to accumulate future gains; consequently, player 2 may find it advantageous to simply play a stage-game best response. If this is the case, player 1 receives almost the Stackelberg payoff in each period, which puts a lower bound on his payoff if he is sufficiently patient. Here, however, lies the difference from the setting with short-lived players: player 2 might select something other than a best response to the Stackelberg action out of concern that playing a current best response triggers a future punishment. This punishment would not occur if player 2 were facing the Stackelberg type, but player 2 can only be confident that he is facing the Stackelberg action, not the Stackelberg type. Short-lived players have the same fear, but this uncertainty does not affect their behavior, given their time horizon.

Perfect monitoring

Consider a perfect monitoring repeated game with two long-lived players. The two players have different discount factors δi and the characteristics of player 2 are known. The remaining environment is similar to the case with a short-lived player 2.

The focus is now on a commitment type that minmaxes player 2. It is possible to show that if there is a pure action a₁ that mixed-action minmaxes player 2, and a positive probability that player 1 is the simple type ξ(a₁), then a sufficiently patient normal player 1 gets a payoff arbitrarily close to v₁*(a₁), the one-shot bound on player 1's payoff when he commits to a₁. The following definition applies: the stage game has conflicting interests if a pure Stackelberg action a₁* mixed-action minmaxes player 2. The highest reputation bound is obtained when the game has conflicting interests. Let v̲₁(ξ₀, μ, δ₁, δ₂) be the infimum of the normal player 1's payoffs. Suppose μ(ξ(a₁)) > 0 for some pure action a₁ that mixed-action minmaxes player 2. Then there exists a value k, independent of δ₁ (but depending on δ₂), such that

v̲₁(ξ₀, μ, δ₁, δ₂) ≥ δ₁ᵏ v₁*(a₁) + (1 − δ₁ᵏ) min_a u₁(a).

Therefore, only in at most k periods can player 2 play something other than the best response to a₁, and if player 1 is patient enough these k periods have a small effect on his payoffs; eventually, player 2 plays the best response to a₁. Moreover, for all ε > 0 there exists δ̲₁ ∈ (0, 1) such that for all δ₁ ∈ (δ̲₁, 1), v̲₁(ξ₀, μ, δ₁, δ₂) > v₁*(a₁) − ε. If player 2's equilibrium strategy yields him a payoff lower than his minmax payoff when a₁ is always played, this implies that player 2 does not anticipate a₁ being played consistently.
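The role of patience in this bound is easy to see numerically. The values below (v₁* = 2, a minimum payoff of 0, k = 10) are illustrative assumptions:

```python
# Numeric illustration of the conflicting-interests bound
# delta1^k * v1*(a1) + (1 - delta1^k) * min u1(a): it converges to the
# one-shot bound v1*(a1) as delta1 -> 1. All parameter values are assumed.

def reputation_bound(delta1, k=10, v1_star=2.0, v1_min=0.0):
    return delta1**k * v1_star + (1 - delta1**k) * v1_min

for delta1 in (0.9, 0.99, 0.999):
    print(f"delta1={delta1}: bound = {reputation_bound(delta1):.4f}")
```

The k "bad" periods are a fixed cost, so their weight δ₁ᵏ → 1 as player 1 becomes patient, and the bound approaches v₁*(a₁).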

Example: the product-choice game (same structure as in section 1). This game is not one of conflicting interests. When player 1 takes the Stackelberg action H, it elicits a best response h from player 2, resulting in a payoff of 3 for player 2, above his minmax payoff of 1. In contrast to games with conflicting interests, both the normal player 1 and player 2 do better when player 1 chooses the Stackelberg action (with player 2 best responding) than in the stage-game Nash equilibrium.

It is also possible that not only player 1's type but also player 2's type is unknown; each player's type is drawn from a countable set before the game begins. Let λ₀ > 0 be the probability that player 2 is normal. In this setting too there is a maximum number of periods in which the normal player 2 can play something other than the best response to a₁. Suppose μ(ξ(a₁)) > 0 for some action a₁ minmaxing player 2. Then there exists a constant k, independent of player 1's discount factor, such that the normal player 1's payoff in any Nash equilibrium of the repeated game is at least

λ₀ δ₁ᵏ v₁*(a₁) + (1 − λ₀ δ₁ᵏ) min_a u₁(a).

All in all, to establish a reputation it is not important that incomplete information be one-sided, but rather that player 1 is sufficiently patient and that μ(ξ(a₁)) > 0. This ensures that player 2 will eventually play an optimal response to a₁, and not in a very distant period.

Other actions

If action a₁ does not minmax player 2, the number of periods in which player 2 does not respond optimally can no longer be bounded. It is, however, possible to bound the number of periods in which player 2 can expect a continuation payoff lower than his minmax value. Player 1's payoff bound is v₁†(a₁) ≡ min_{α₂∈D(a₁)} u₁(a₁, α₂), where D(a₁) = {α₂ ∈ Δ(A₂) : u₂(a₁, α₂) ≥ v̲₂} is the set of player 2's actions that give him at least his minmax utility. It is possible to show that, fixing δ₂ ∈ [0, 1) and a₁ ∈ A₁ with μ(ξ(a₁)) > 0, for all ε > 0 there exists δ̲₁ < 1 such that for all δ₁ ∈ (δ̲₁, 1), v̲₁(ξ₀, μ, δ₁, δ₂) ≥ v₁†(a₁) − ε. If a₁ does not minmax player 2, then v₁†(a₁) < v₁*(a₁), the one-shot bound. Moreover, it is not necessarily the Stackelberg action that maximizes v₁†(a₁), and the bound holds for all actions: if μ > 0 for all simple pure commitment types, a normal player 1 gets a payoff close to max_{a₁∈A₁} v₁†(a₁).
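For the product-choice game this bound can be computed by brute force over player 2's mixtures. The sketch below assumes the standard product-choice payoffs and v̲₂ = 1 (player 2's minmax payoff in that game, as noted in the example above):

```python
# Sketch computing D(a1) and v1^dagger(a1) on a grid of player 2's mixed
# actions, under assumed product-choice payoffs and v2_minmax = 1.
u1 = {('H', 'h'): 2, ('H', 'l'): 0, ('L', 'h'): 3, ('L', 'l'): 1}
u2 = {('H', 'h'): 3, ('H', 'l'): 2, ('L', 'h'): 0, ('L', 'l'): 1}
v2_minmax = 1.0

def v1_dagger(a1, grid=1000):
    best = None
    for i in range(grid + 1):
        beta = i / grid                            # prob that player 2 plays h
        payoff2 = beta * u2[(a1, 'h')] + (1 - beta) * u2[(a1, 'l')]
        if payoff2 >= v2_minmax:                   # alpha2 belongs to D(a1)
            payoff1 = beta * u1[(a1, 'h')] + (1 - beta) * u1[(a1, 'l')]
            best = payoff1 if best is None else min(best, payoff1)
    return best

print("v1_dagger(H) =", v1_dagger('H'))   # D(H) = all mixtures, so the min is 0
print("v1_dagger(L) =", v1_dagger('L'))   # D(L) = {l}, giving u1(L,l) = 1
```

Since H does not minmax player 2, D(H) contains l and the bound from committing to H is weak, consistent with the observation that the product-choice game lacks conflicting interests.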

Imperfect Public Monitoring

In this section, there are two long-lived players who play an imperfect public-monitoring game. A₁, A₂ are finite action sets. ρ is the public-monitoring distribution and it has full support: for all y ∈ Y and a ∈ A₁ × A₂, ρ(y|a) > 0. Player 2's actions are imperfectly monitored by player 1, and player 2 updates his beliefs based on what he observes about player 1. It is therefore also assumed that for every mixed action α₂ ∈ Δ(A₂), ρ(·|(α₁, α₂)) = ρ(·|(α₁′, α₂)) implies α₁ = α₁′. As usual, δᵢ is player i's discount factor and μ is the prior with support Ξ. Player 1's set of histories is H₁ = ∪_{t=0}^∞ (A₁ × Y)ᵗ, with a behavior strategy σ₁ : H₁ × Ξ → Δ(A₁), while player 2's set of histories is H₂ = ∪_{t=0}^∞ (A₂ × Y)ᵗ, with a behavior strategy σ₂ : H₂ → Δ(A₂). A Nash equilibrium σ = (σ₁, σ₂) and μ induce a measure P over the set of outcomes Ω ≡ Ξ × (A₁ × A₂ × Y)^∞. It is possible that player 1 is committed to a nonsimple strategy. Letting G^N(δ₂) be the complete-information finitely repeated game that plays the complete-information stage game N times, payoffs are defined as follows: for player 1, (1/N) Σ_{t=0}^{N−1} u₁(aᵗ); for player 2, ((1 − δ₂)/(1 − δ₂^N)) Σ_{t=0}^{N−1} δ₂ᵗ u₂(aᵗ). σ₁^N is a strategy in the infinitely repeated game or in a finitely repeated game whose length is N or an integer multiple of N. The target for player 1's payoff is the maximum payoff achievable by the strategies in Ξ, the support of μ, within the corresponding finitely repeated game, when player 1 is arbitrarily patient. The set of player 1 payoffs is

V₁(δ₂, Ξ) ≡ {v₁ : for all ε > 0, there exist N and ξ(σ₁^N) ∈ Ξ s.t. for all σ₂^N ∈ B^N(σ₁^N; δ₂), U₁^N(σ₁^N, σ₂^N) ≥ v₁ − ε},

and set v₁(δ₂, Ξ) = sup V₁(δ₂, Ξ). If ξ(α₁) ∈ Ξ, then v₁(δ₂, Ξ) ≥ v₁*(α₁). If Ξ contains only simple commitment types, then v₁(δ₂, Ξ) = v₁**; in general, v₁ may be much higher than v₁**. It is possible to show that for all η > 0 and δ₂, there exists δ̲₁ < 1 such that for all δ₁ ∈ (δ̲₁, 1), v̲₁(ξ₀, μ, δ₁, δ₂) ≥ v₁(δ₂, Ξ) − η. Moreover, for all η > 0 and δ₂, there exist N, δ₁, ε, and a strategy σ₁^N for G^N(δ₂), with ξ(σ₁^N) ∈ Ξ, such that if player 2 plays an ε-best response to σ₁^N in G^N(δ₂), then player 1's δ₁-discounted payoff in G^N(δ₂) is at least v₁(δ₂, Ξ) − η/2.
It can also be shown that for all ε > 0 there exist N, σ₁^N, and γ > 0 such that for all σ̃₁^N, if |ρ^N(·|(σ₁^N, σ₂^N)) − ρ^N(·|(σ̃₁^N, σ₂^N))| < γ for σ₂^N an ε-best response to σ₁^N in G^N(δ₂), then σ₂^N is a 2ε-best response to σ̃₁^N.

Given G^N(δ₂), divide the infinitely repeated game into blocks of length N. Incomplete-information repeated games have a prior μ ∈ Δ(Ξ) and posteriors in Δ(Ξ). Given a strategy profile σ and a prior μ, let ρ^N_{σ,μ}(·|h₂^{Nk}) be player 2's "one-block-ahead" prediction of the distribution over signals in block G^{N,k} (the k-th block of N periods) for any private history h₂^{Nk} ∈ H₂^{Nk}. P^{(σ₁^N, σ₂)} is the probability measure over Ω implied by σ₂, conditioning on the event ξ(σ₁^N) ∈ Ξ. It can be shown that, fixing λ, μ̄ ∈ (0, 1), γ > 0, an integer N, and a strategy σ₁^N, there exists an integer L such that for all (σ₁, σ₂) and all μ ∈ Δ(Ξ) with μ(ξ(σ₁^N)) ≥ μ̄,

P^{(σ₁^N, σ₂)}( |{k ≥ 0 : |ρ^N_{(σ₁^N, σ₂),μ}(·|h₂^{Nk}) − ρ^N_{(σ₁, σ₂),μ}(·|h₂^{Nk})| ≥ γ}| ≤ L ) ≥ 1 − λ.

Commitment types who punish

A similar result can be obtained by adding, in the perfect-monitoring environment, commitment types who punish player 2 for not behaving appropriately. Player 2 then effectively knows the features of player 1 that previously remained hidden; this uncertainty was the reason why player 2 did not play a best response to the Stackelberg type in the simple perfect-monitoring environment.

Let a₁ ∈ A₁ be an action for player 1, a₂ the best response of player 2 for which u₂(a₁, a₂) > v₂^p, and â₁ the action of player 1 that minmaxes player 2. Player 1's commitment type plays the strategy σ̂₁, which consists of phases: in a punishment phase he plays â₁, and otherwise he plays a₁; if player 2 does not play a₂, player 1 punishes him. It is possible to show that, fixing an integer K > 0 and η > 0, there exists an integer T(K, η, μ̂⁰) such that for every pure strategy σ₂ and every ω ∈ Ω̂, there are no more than T(K, η, μ̂⁰) periods t in which player 2 attaches probability no greater than 1 − η to the event that player 1 plays according to σ̂₁ in periods t, …, t + K, given that player 2 plays σ₂. Moreover, fixing ε > 0 and letting Ξ contain σ̂₁, for some action profile a with u₂(a) > v₂^p there exists δ̲₂ < 1 such that for all δ₂ ∈ (δ̲₂, 1) there exists δ̲₁ such that for all δ₁ ∈ (δ̲₁, 1), v̲₁(ξ₀, μ, δ₁, δ₂) ≥ u₁(a) − ε. If a₁ is player 1's Stackelberg action, this result gives player 1's Stackelberg payoff as a lower bound on his equilibrium payoff in the game of incomplete information. The bound for the normal player 1 is obtained by showing that player 2 faces only a finite number of punishments from player 1's commitment type.

Temporary reputations with two long-lived players

The results obtained in section 4.4.2 generalize to the case of two long-lived players. Considering the case in which player 1's type is unknown, it is possible to give conditions under which player 2 effectively learns player 1's type. The authors assume a commitment type who plays σ̂₁, a strategy with no long-run credibility, for which:

  • σ̂₂ is player 2's best response to σ̂₁ and is unique on the equilibrium path.
  • there exists T₀ such that for all t > T₀, the normal player 1 is likely to deviate from σ̂₁, given σ̂₂.

The set of player 2's best responses to σ₁ in the game of complete information is B(σ₁) ≡ {σ₂ : U₂ᵗ(σ₁, σ₂) ≥ U₂ᵗ(σ₁, σ₂′) for all σ₂′}, where Uᵢᵗ is player i's continuation value in period t.

Let π be the monitoring distribution and let the commitment type's strategy σ̂₁ be public and without long-run credibility. Then in any Nash equilibrium of the game with incomplete information, μ̂ₜ → 0 P̃-almost surely. Since σ̂₁ is public, player 1 can anticipate player 2's optimal response to it. It is important to underline that a long-lived player 2 will best respond to the commitment type once he is convinced that he is almost certainly facing the commitment strategy; moreover, the normal type's deviations from the commitment strategy occur only in a finite number of periods.

References

  • Cabral, L. M. B. (2005). The economics of trust and reputation: A primer [Last access June 8, 2023].
  • Diekmann, A., & Przepiorka, W. (2021). Trust and reputation in historical markets and contemporary online markets. In A. Maurer (Ed.), Handbook of economic sociology for the 21st century: New theoretical approaches, empirical studies and developments (pp. 131–145). Springer International Publishing.
  • Dowling, G. (1993). Developing your company image into a corporate asset. Long Range Planning, 26, 101–109.
  • Eccles, R., Newquist, S., & Schatz, R. (2007). Reputation and its risks. Harvard Business Review [Last access June 10, 2023].
  • Einwiller, S. (2003). When reputation engenders trust: An empirical investigation in business-to-consumer electronic commerce. Electronic Markets, 13 (3), 196–209.
  • Gambetta, D. (2000). Can we trust trust? Trust: Making and Breaking Cooperative Relations, electronic edition, Department of Sociology, University of Oxford, 213–237.
  • Mailath, G. J., & Samuelson, L. (2006). Repeated Games and Reputations: Long-Run Relationships. Oxford: Oxford University Press.
  • Mailath, G. J., & Samuelson, L. (2015). Chapter 4 - reputations in repeated games. In H. P. Young & S. Zamir (Eds.), Handbook of Game Theory with Economic Applications. Elsevier. 165-238.
  • Nash, J. (1951). Non-cooperative games. Annals of Mathematics, 54 (2), 286–295.
  • Popitz, H. (1980). Die normative Konstruktion von Gesellschaft. Tübingen: Mohr Siebeck.
  • Roberts, P. W., & Dowling, G. R. (2002). Corporate reputation and sustained superior financial performance. Strategic Management Journal, 23 (12), 1077–1093.
  • Selcuk, A., Uzun, E., & Pariente, M. (2004). A reputation-based trust management system for p2p networks. IEEE International Symposium on Cluster Computing and the Grid, 2004. CCGrid 2004., 251–258.
  • Webley, S. (2004). Risk, reputation and trust. Journal of Communication Management, 8, 9–12.
  • Xiong, L., & Liu, L. (2004). Peertrust: Supporting reputation-based trust for peer-to-peer electronic communities. IEEE Transactions on Knowledge and Data Engineering, 16 (7), 843–857.