

Reputation, Trust, and Reputation Games: Exploring the Dynamics of Reputation in Game-Theory Contexts

From Fintech Lab Wiki
Revision as of 16:05, 11 June 2023 by 3122188


Contribution of SIMONE GOZZINI

Introduction

Trust and reputation are different but interrelated concepts. The former can be defined as “a particular level of the subjective probability with which an agent assesses that another agent or group of agents will perform a particular action, both before he can monitor such action (or independently of his capacity ever to be able to monitor it) and in a context in which it affects his own action” (Gambetta, 2000, p. 5), while the latter can be defined as “a situation when agents believe a particular agent to be something” (Cabral, 2005, p. 3). Trust is a fundamental human sentiment that plays a crucial role in fostering cooperation among individuals. It serves as a catalyst for positive outcomes in various domains, including stock market participation, firm performance, and efficient market transactions. Trust creates an environment where individuals feel secure and confident in engaging in economic exchanges, leading to smoother and more fluid interactions. In fact, trust is considered a vital ingredient that allows complex modern societies not only to exist but also to evolve and thrive (Popitz, 1980). Its influence permeates different aspects of human interaction, enabling cooperation, economic growth, and societal development. However, trust involves risk and uncertainty about the other party’s behavior, given that perfect monitoring of what the other agent is doing is not possible: trust and trustworthiness are not easy to develop. Throughout history, various solutions have been devised to address this problem, including the use of physical coercion, contract law, and reputation. Among these solutions, reputation holds significant importance in enhancing trust and trustworthiness by reducing uncertainty, given that it “establish(es) links between past behavior and expectations of future behavior” (Mailath & Samuelson, 2015, p. 166).
By leveraging reputation, individuals can make more informed judgments about the trustworthiness of others, thereby reducing the risks associated with trust. As reputation serves as a bridge between past actions and future expectations, it plays a pivotal role in fostering trust and creating a more conducive environment for cooperative interactions.

Mailath and Samuelson (2006) extensively study the effects of reputation within the environment of repeated games. This environment has proved effective because repeated games provide a clear mathematical framework to describe both the short-term incentives that encourage opportunistic behavior and, through well-defined specifications of future actions, rewards, and punishments, the incentives that discourage it.

The paper is organized as follows: section 2 offers contextual information on the connection between trust and reputation. It discusses the interplay between these concepts and their significance in various domains; section 3 provides a mathematical framework for stage-games (single round interactions) and repeated-games (interactions occurring over multiple rounds), focusing both on the case of perfect and imperfect monitoring; section 4 deals with the concept of reputation games within the context of both perfect and imperfect monitoring, in the case of one long-lived and one short-lived player; finally, section 5 presents the various specifications of repeated games in the case where the two players are both long-lived.

Reputation and Trust

Extensive research in the literature has thoroughly examined the relationship between trust and reputation across various applications and contexts.

Diekmann and Przepiorka (2021) explore the problem of trust in economic transactions, which they define as "the uncertainty regarding the trustworthiness and/or competence of the trustee that the truster faces" (Diekmann & Przepiorka, 2021, p. 132). Uncertainty can hinder the occurrence of efficient exchanges, as it creates a situation where one party is unsure whether the other party will fulfill their promises. This uncertainty may manifest in various ways, such as doubts regarding whether the other agent will exchange the agreed-upon amount, deliver the promised product at satisfactory quality, or engage in the exchange at all. Reputation, which creates the possibility of engaging in long-lasting relationships, can be a solution, given that it is a "shadow of the past" (Diekmann & Przepiorka, 2021, p. 134) behavior of the trustee. Trustworthiness can arise because reputation makes the past transaction history of the trustee known to the trustor, who, by having access to this information, can gain insights into the characteristics and behavior of his counterpart, thereby reducing uncertainty. This is especially true in online markets, where buyer and seller are usually anonymous and likely to engage in one-time-only transactions: decentralized reputation systems such as ratings play an indispensable role in ensuring the viability of these markets, given that buyers are more likely to buy from sellers with a good reputation, so sellers have an incentive to maintain a good reputation by providing good service. Similarly, Einwiller (2003), who conducted an empirical study involving 473 German internet users, finds that reputation systems play a critical role in building trust within online markets. This is especially significant for consumers who are less familiar with such markets.

Xiong and Liu (2004) study the problem of reputation and trust in peer-to-peer online communities, which are platforms or networks that facilitate direct interaction and collaboration among individuals without relying on a centralized authority or server. To overcome the uncertainty and the threats related to decentralized networks, they develop a reputation-based trust supporting framework called PeerTrust, which aims to assess the trustworthiness of peers within these communities. This system is based on the following features: the feedback a peer receives from other peers, the total number of transactions a peer performs, the credibility of the feedback sources, the transaction context factor, and the community context factor. Once again, it is evident that building a reputation plays a pivotal role in enhancing trust and facilitating efficient exchanges. Similar conclusions are also achieved by Selcuk et al. (2004).

At a more granular level, the reputation of individual firms (and therefore their perceived trustworthiness) holds significant importance. Companies that foster a positive culture, underpinned by a well-defined code of ethics, possess the ability to cultivate a favorable reputation. This, in turn, bolsters the trust vested in them by their stakeholders (Webley, 2004). Consequently, these firms experience enhanced profitability as they attract high-quality employees, cultivate loyal customers, and are perceived as more valuable, affording them the opportunity to command premium pricing for their products (Eccles et al., 2007). Furthermore, reputation enhances the persistence and sustainability of these positive performances over time (Roberts & Dowling, 2002). As such, reputation emerges as an indispensable intangible asset (Dowling, 1993).

Definitions

To delve into the realm of game theory, it is essential to establish a clear understanding of the mathematical frameworks underlying stage games and repeated games.

Stage games with perfect monitoring

Repeated games can be defined starting from stage games. There are n players, and each player i chooses an action aᵢ from a set of available pure actions Aᵢ. The set of pure action profiles is A ≡ ∏ᵢ Aᵢ. Payoffs are given by a continuous function u: ∏ᵢ Aᵢ → ℝⁿ. The set of mixed actions for player i is Δ(Aᵢ), with typical element αᵢ, while the set of mixed profiles is ∏ᵢ Δ(Aᵢ). The set of stage-game payoffs generated by pure action profiles in A is F ≡ {v ∈ ℝⁿ : ∃a ∈ A such that v = u(a)}, and the set of feasible payoffs is F† ≡ co F, the convex hull of F. A payoff v ∈ F† is inefficient if there exists v′ ∈ F† such that v′ᵢ > vᵢ for all i; otherwise it is efficient. According to Nash (1951), if the stage game is finite, it has a Nash equilibrium; and because the payoffs are given by a continuous function, infinite stage games (under the assumptions below) also have Nash equilibria. Further assumptions are made:

  • Aᵢ is either finite, or a compact and convex subset of the Euclidean space ℝᵏ, for some k.
  • If Aᵢ is a continuum action space, then u: A → ℝⁿ is continuous and uᵢ is quasiconcave in aᵢ.

It is important to characterize the worst payoff that an individual can obtain while behaving optimally. The worst outcome for player i, consistent with player i behaving optimally, is reached when the other players choose the profile a₋ᵢ ∈ A₋ᵢ ≡ ∏_{j≠i} Aⱼ that minimizes the payoff i obtains when he plays a best response to a₋ᵢ. This is called the minmax payoff and is defined by v̱ᵢᵖ ≡ min_{a₋ᵢ∈A₋ᵢ} max_{aᵢ∈Aᵢ} uᵢ(aᵢ, a₋ᵢ). The corresponding profile âⁱ = (âᵢⁱ, â₋ᵢⁱ) is the minmax profile for player i. A payoff vector v = (v₁, …, vₙ) is weakly individually rational if vᵢ ≥ v̱ᵢᵖ for all i, and strictly individually rational if vᵢ > v̱ᵢᵖ for all i.
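As an illustration, the pure-action minmax payoffs of a two-player game can be computed directly from the payoff matrices. The sketch below (our own, not from the book) uses the payoffs of the product-choice game discussed later in this article:

```python
# Stage-game payoffs of the product-choice game discussed later:
# rows are player 1's actions (H, L), columns are player 2's actions (h, l).
U1 = [[2, 0], [3, 1]]  # player 1's payoffs u_1(a1, a2)
U2 = [[3, 2], [0, 1]]  # player 2's payoffs u_2(a1, a2)

def minmax_player1(u1):
    # v_1^p = min over a2 of (max over a1 of u1(a1, a2))
    return min(max(u1[a1][a2] for a1 in range(len(u1)))
               for a2 in range(len(u1[0])))

def minmax_player2(u2):
    # v_2^p = min over a1 of (max over a2 of u2(a1, a2))
    return min(max(row) for row in u2)

print(minmax_player1(U1), minmax_player2(U2))
```

Here both minmax payoffs equal 1: player 2 holds player 1 down to 1 by choosing l, and player 1 holds player 2 down to 1 by choosing L.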

Repeated games with perfect monitoring

A repeated game is a stage game that is repeated in each period t ∈ {0, 1, …}. Behavior in this kind of game is called a strategy, and it is a collection of actions. The authors first deal with perfect monitoring games, so at the end of each period t every player observes the actions taken by all the other players. The set of period-t histories is Aᵗ, the t-fold product of A; a history hᵗ ∈ Aᵗ is thus a list of t action profiles, identifying the actions played in periods 0 through t − 1. The set of all possible histories is H ≡ ⋃_{t=0}^∞ Aᵗ. A pure strategy is a map σᵢ: H → Aᵢ, while a mixed strategy (also called a behavior strategy) is σᵢ: H → Δ(Aᵢ). A continuation game is the infinitely repeated game that begins in period t, following history hᵗ; the continuation strategy induced by hᵗ is denoted σᵢ|hᵗ. The continuation game associated with each history is a subgame identical to the original game, which means that repeated games have a recursive structure. An outcome path in the infinitely repeated game is an infinite sequence of action profiles a ≡ (a⁰, a¹, a², …) ∈ A^∞. Outcome paths differ from histories because histories have a finite length. The first t periods of an outcome path are denoted aᵗ = (a⁰, a¹, …, a^{t−1}); thus aᵗ is the history in Aᵗ corresponding to the outcome path a. A pure strategy profile σ induces an outcome a(σ) and, analogously, a behavior strategy profile induces a path of play; for a pure strategy profile, the induced path of play and the induced outcome coincide. In period t, the induced action profile aᵗ(σ) yields the payoff uᵢ(aᵗ(σ)). An outcome a(σ) thus induces an infinite stream of stage-game payoffs (uᵢ(a⁰(σ)), uᵢ(a¹(σ)), uᵢ(a²(σ)), …), which are discounted with a discount factor δ ∈ [0, 1). The payoff from a pure strategy profile σ is therefore Uᵢ(σ) = (1 − δ) Σ_{t=0}^∞ δᵗ uᵢ(aᵗ(σ)). The authors assume that long-lived players share a common discount factor δ.
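The normalized discounted payoff Uᵢ(σ) = (1 − δ) Σₜ δᵗ uᵢ(aᵗ(σ)) can be approximated numerically by truncating the infinite stream; a short illustrative sketch (not from the book):

```python
def discounted_payoff(stream, delta):
    """Normalized discounted value (1 - delta) * sum_t delta^t * u_t of a
    payoff stream; `stream` is a long finite truncation of the infinite path."""
    return (1 - delta) * sum(u * delta**t for t, u in enumerate(stream))

# A constant stream of 2 is worth (approximately) 2 after normalization:
# the (1 - delta) factor puts repeated-game payoffs on the same scale
# as stage-game payoffs.
print(discounted_payoff([2] * 2000, 0.9))

# Sacrificing one period (payoff 0 today, 2 forever after) is worth 2*delta.
print(discounted_payoff([0] + [2] * 2000, 0.9))
```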

A Nash equilibrium is a strategy profile in which each player optimally responds to the strategies of the others. Formally, σ is a Nash equilibrium if, for all players i and all strategies σ′ᵢ, Uᵢ(σ) ≥ Uᵢ(σ′ᵢ, σ₋ᵢ). Furthermore, a strategy profile σ is a subgame-perfect equilibrium if, for all histories hᵗ ∈ H, the continuation profile σ|hᵗ is a Nash equilibrium of the repeated game.

The authors further define the concept of a one-shot deviation: for player i and strategy σᵢ, a one-shot deviation is a strategy σ̂ᵢ ≠ σᵢ with the property that there is a unique history h̃ᵗ ∈ H such that, for all hᵗ ≠ h̃ᵗ, σᵢ(hᵗ) = σ̂ᵢ(hᵗ). A one-shot deviation σ̂ᵢ is profitable if, for a fixed opponent strategy σ₋ᵢ, at the history h̃ᵗ for which σ̂ᵢ(h̃ᵗ) ≠ σᵢ(h̃ᵗ), Uᵢ(σ̂ᵢ|h̃ᵗ, σ₋ᵢ|h̃ᵗ) > Uᵢ(σ|h̃ᵗ). A strategy profile is subgame perfect if and only if there are no profitable one-shot deviations. However, the absence of profitable one-shot deviations along the equilibrium path alone is not sufficient to establish a Nash equilibrium, even though on the equilibrium path of a Nash equilibrium there can be no profitable one-shot deviations. The one-shot deviation principle is useful because it reduces the set of strategies to check when evaluating subgame perfection. Another simplification is to represent repeated-game strategies as automata. An automaton (W, w⁰, f, τ) consists of a set of states W, an initial state w⁰ ∈ W, a decision function f: W → ∏ᵢ Δ(Aᵢ) associating mixed action profiles with states, and a transition function τ: W × A → W, which identifies the next state of the automaton given its current state and the realized stage-game pure action profile. Any strategy profile can be represented by an automaton: a single strategy σᵢ is represented by the automaton (Wᵢ, wᵢ⁰, fᵢ, τᵢ), the strategy profile σ by (W, w⁰, f, τ), and a continuation strategy by (W, τ(w⁰, hᵗ), f, τ). The strategy profile represented by an automaton (W, w⁰, f, τ) is a subgame-perfect equilibrium if and only if, for every state w ∈ W accessible from w⁰, the strategy profile induced by (W, w, f, τ) is a Nash equilibrium of the repeated game.
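As a concrete illustration of the automaton representation, the following sketch (our own, not from the book) encodes a grim-trigger profile for the product-choice game used later in the article: play (H, h) until any deviation, then play (L, l) forever.

```python
# A grim-trigger profile represented as an automaton (W, w0, f, tau).
W = {"cooperate", "punish"}                           # state set W
w0 = "cooperate"                                      # initial state
f = {"cooperate": ("H", "h"), "punish": ("L", "l")}   # decision function

def tau(w, action_profile):
    # Transition function tau: W x A -> W.
    # Any deviation from (H, h) moves play to the absorbing punishment state.
    if w == "cooperate" and action_profile == ("H", "h"):
        return "cooperate"
    return "punish"

def run(state, realized_profiles):
    """Return the action profiles the automaton prescribes along a history."""
    prescribed = []
    for a in realized_profiles:
        prescribed.append(f[state])
        state = tau(state, a)
    return prescribed

# A deviation in period 1 triggers (L, l) from period 2 onward.
print(run(w0, [("H", "h"), ("L", "h"), ("H", "h")]))
```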

Equilibria

To discuss the notion of equilibrium, the authors describe the notions of enforceability and pure-action decomposability:

  • Enforceability: a pure action profile a* is enforceable on W ⊆ ℝⁿ if there exists a specification of continuation promises (i.e., promised continuation payoffs for the future stages of the game) γ: A → W such that, for all players i and all aᵢ ∈ Aᵢ, (1 − δ)uᵢ(a*) + δγᵢ(a*) ≥ (1 − δ)uᵢ(aᵢ, a*₋ᵢ) + δγᵢ(aᵢ, a*₋ᵢ).
  • Decomposability: a payoff v ∈ F† is pure-action decomposable on W if there exists a pure action profile a* enforceable on W such that vᵢ = (1 − δ)uᵢ(a*) + δγᵢ(a*), where γ is a function enforcing a*.

Any set of payoffs W ⊆ F† with the property that every payoff in W is pure-action decomposable on W is a set of pure-strategy subgame-perfect equilibrium payoffs; such a set W is called pure-action self-generating.
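These conditions can be checked mechanically in a finite game. The sketch below (illustrative, not from the book) verifies enforceability of (H, h) in the product-choice game used later in the article, with continuation promises drawn from W = {(2, 3), (1, 1)}:

```python
# Product-choice payoffs (player 1, player 2), indexed by (a1, a2).
u = {("H", "h"): (2, 3), ("H", "l"): (0, 2), ("L", "h"): (3, 0), ("L", "l"): (1, 1)}
A1, A2 = ("H", "L"), ("h", "l")

def enforceable(a_star, gamma, delta):
    """Check the enforceability inequalities for the pure profile a_star,
    given continuation promises gamma: A -> payoff vectors."""
    a1s, a2s = a_star
    deviations = {0: [(a1, a2s) for a1 in A1],   # player 1's unilateral deviations
                  1: [(a1s, a2) for a2 in A2]}   # player 2's unilateral deviations
    for i, devs in deviations.items():
        target = (1 - delta) * u[a_star][i] + delta * gamma[a_star][i]
        for a in devs:
            if target < (1 - delta) * u[a][i] + delta * gamma[a][i] - 1e-12:
                return False
    return True

# Promise the "good" continuation (2, 3) after (H, h) and the static Nash
# payoff (1, 1) after anything else (a grim-trigger-style promise).
gamma = {a: ((2, 3) if a == ("H", "h") else (1, 1)) for a in u}
print(enforceable(("H", "h"), gamma, 0.5), enforceable(("H", "h"), gamma, 0.4))
```

With these promises, (H, h) is enforceable exactly when δ ≥ 1/2, matching the δ ≥ 1/2 threshold that reappears in the product-choice example later in the article.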

Simple strategy and penal code

Simple strategy profile: given (n + 1) outcomes {a(0), a(1), …, a(n)}, the associated simple strategy profile σ(a(0), a(1), …, a(n)) is given by the automaton W = {0, 1, …, n} × {0, 1, 2, …}, w⁰ = (0, 0), f(j, t) = aᵗ(j), and τ((j, t), a) = (i, 0) if aᵢ ≠ aᵢᵗ(j) and a₋ᵢ = a₋ᵢᵗ(j), and τ((j, t), a) = (j, t + 1) otherwise. A simple strategy profile therefore consists of a prescribed outcome path a(0) and a punishment outcome path a(i) for each player i. Play follows the outcome path a(0), but if player i deviates unilaterally, the other players respond with player i's punishment path a(i). It is important to underline that the punishment for a deviation is independent of when the deviation occurs and of its nature. The simple strategy profile σ(a(0), a(1), …, a(n)) is a subgame-perfect equilibrium if and only if Uᵢᵗ(a(j)) ≥ max_{aᵢ∈Aᵢ} [(1 − δ)uᵢ(aᵢ, a₋ᵢᵗ(j)) + δUᵢ⁰(a(i))], for i = 1, …, n, j = 0, 1, …, n, and t = 0, 1, ….

Then, a definition of an optimal penal code is given. Let {a(i) : i = 1, …, n} be n outcome paths satisfying Uᵢ⁰(a(i)) = v̱ᵢ*, for i = 1, …, n. The collection σ(i) = σ(a(i), a(1), …, a(n)) is an optimal penal code if σ(i) ∈ 𝒫, i = 1, …, n, where 𝒫 denotes the set of pure-strategy subgame-perfect equilibria.

Long-lived and short-lived players

The authors introduce short-lived players, who differ from long-lived players in that the latter live throughout the game. Short-lived players, in contrast, are concerned only with current-period payoffs (therefore they do not discount) and are called myopic. There are two interpretations for them:

  • In each period a collection of short-lived players enters the game and, after one period, leaves.
  • Each of them represents a continuum of long-lived agents.

Small players are assumed to be anonymous (i.e., a change in the behavior of a single member of the continuum does not affect the distribution of play, and therefore does not affect the behavior of the other players), and each of them observes only the public history hᵗ ∈ Aᵗ. The notion of one-shot deviation also applies in games with long-lived and short-lived players. Moreover, being myopic, the short-lived players play a Nash equilibrium of the induced stage game, given the actions of the long-lived players.

It is possible to generalize the concept of minmax payoff to games with short-lived players. Let B: ∏_{i=1}^{n} Δ(Aᵢ) → ∏_{i=n+1}^{N} Δ(Aᵢ) be the correspondence that maps any mixed-action profile for the long-lived players into the corresponding set of static Nash equilibria for the short-lived players. For each long-lived player i, the payoff v̱ᵢ = min_{α∈graph(B)} max_{aᵢ∈Aᵢ} uᵢ(aᵢ, α₋ᵢ) is player i's (mixed-action) minmax payoff with short-lived players, and it is a lower bound on the payoff that player i can obtain in an equilibrium of the repeated game.

Finally, repeated games with short-lived players have some features that must be underlined:

  • There are some restrictions on the payoffs that can be achieved by the long-lived players. In particular, short-lived players impose restrictions on the set of equilibrium payoffs that go beyond the specification of minmax payoffs.
  • There are restrictions on the structure of the equilibrium.

Let i be a long-lived player and define v̄ᵢ = sup_{α∈graph(B)} min_{aᵢ∈supp(αᵢ)} uᵢ(aᵢ, α₋ᵢ), the largest payoff that can be constructed when player i's realized behavior may be any action in the support of his mixed action αᵢ. In every subgame-perfect equilibrium, player i's payoff is at most v̄ᵢ.

Games with imperfect monitoring

The focus so far has been on perfect monitoring games, where deviations from the equilibrium can be easily detected and punished, thus providing incentives for players not to myopically optimize. The authors now turn to games with imperfect monitoring, that is, games where players have only noisy information about past play, which makes deviations more difficult to detect. It is still possible for players not to myopically optimize, since punishments remain possible. When the noisy signals are observed by all players, we are in the case of imperfect public monitoring; if some signals are observed by some players but not others, we are in the case of private monitoring. The following sections focus on imperfect public monitoring, but the results can be extended to private monitoring.

Stage games

The specification of these games is very similar to the stage games with perfect monitoring seen in section 3.1. The main difference lies in the presence of a public signal y, drawn from the finite signal space Y. ρ(y|a) is the probability that signal y is realized, given the action profile a ∈ A ≡ ∏ᵢ Aᵢ. The function ρ: Y × A → [0, 1] is continuous, and ρ has full support if ρ(y|a) > 0 for all y ∈ Y and a ∈ A. Player i's payoff at the end of each period, given the realization (y, aᵢ), is uᵢ*(y, aᵢ), while the ex-ante stage-game payoffs are uᵢ(a) = Σ_{y∈Y} uᵢ*(y, aᵢ) ρ(y|a). The other assumptions are maintained.
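The mapping from ex-post to ex-ante payoffs is just an expectation over signals. The following sketch uses a hypothetical two-signal monitoring structure (all numbers are assumptions chosen for illustration, not taken from the book):

```python
# Ex-ante stage payoff u_i(a) = sum_y u_i*(y, a_i) * rho(y | a),
# for a hypothetical two-signal space Y = {"g", "b"} (good / bad signal).
Y = ("g", "b")

def rho(y, a):
    # Assumed signal technology: high effort by player 1 makes "g" likely.
    p_good = 0.9 if a[0] == "H" else 0.3
    return p_good if y == "g" else 1 - p_good

def u_star(y, a1):
    # Assumed ex-post payoff of player 1: a reward paid only on a good
    # signal, minus the cost of high effort.
    return (2 if y == "g" else 0) - (1 if a1 == "H" else 0)

def ex_ante_u1(a):
    return sum(u_star(y, a[0]) * rho(y, a) for y in Y)

print(ex_ante_u1(("H", "h")), ex_ante_u1(("L", "h")))
```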

Repeated games

Repeated games with imperfect monitoring have a structure similar to the games presented in section 3.2. The only public information available in period t is the t-period history of public signals hᵗ ≡ (y⁰, y¹, …, y^{t−1}). The set of public histories is H ≡ ⋃_{t=0}^∞ Yᵗ. A history for a long-lived player includes the public history and the history of the actions he has taken, while each short-lived player in period t is assumed to observe only the public history hᵗ. A pure strategy profile does not induce a deterministic outcome path, since the public signals may be random. Public monitoring games include perfect monitoring games as the special case where Y = A and ρ(y|a) = 1 if y = a, and 0 otherwise.

Long-lived players have private information (the knowledge of their own past actions), so their information sets are isomorphic to private histories, not to public histories. A behavior strategy σᵢ is public if, in every period t, it depends only on the public history hᵗ ∈ Yᵗ and not on i's private history: for all private histories hᵢᵗ, ĥᵢᵗ satisfying y^τ = ŷ^τ for all τ ≤ t − 1, σᵢ(hᵢᵗ) = σᵢ(ĥᵢᵗ). Otherwise, it is private. If all players other than i play public strategies, then player i has a public strategy as a best reply. Two strategies σᵢ and σ̂ᵢ are realization equivalent if, for all strategies σ₋ᵢ of the other players, the distributions over outcomes induced by (σᵢ, σ₋ᵢ) and (σ̂ᵢ, σ₋ᵢ) are the same. It can then be shown that every pure strategy in a public monitoring game is realization equivalent to a public pure strategy.

Finally, it is possible to define a perfect public equilibrium (PPE) as a profile of public strategies σ such that, for every t and every public history hᵗ ∈ Yᵗ, the continuation profile σ|hᵗ is a Nash equilibrium of the continuation game. Moreover, σ is a PPE if and only if there are no profitable one-shot deviations.

Reputation games

Reputation can be considered a specification of the concept of trust. It can be defined as "a link between past behavior and expectations of future behavior" (Mailath & Samuelson, 2006, p. 459) that arises after multiple interactions among agents. Repeated games are therefore a natural environment in which to study reputation.

There are two kinds of approaches:

  • In the first, an equilibrium of the repeated game is selected whose equilibrium path is not a Nash equilibrium of the stage game. Players who follow the equilibrium are said to maintain a reputation; a deviation triggers a punishment, which consists in losing that reputation. The link between past and future behavior, i.e., reputation, is an equilibrium phenomenon.
  • In the second (the adverse selection approach), each player is uncertain about the characteristics of his opponent. This incomplete information introduces a link between past and future behavior, and thus the concept of reputation arises. Here, the reputation approach does not describe a particular equilibrium, but places constraints on the set of available equilibria.

Commitment types

The authors first consider games with one long-lived player and one short-lived player. Player 2 does not know the type of player 1, but he has a prior belief μ over player 1's type ξ, which is drawn from a countable set Ξ. There are two possible categories of types for player 1:

  • the payoff types Ξ₁, who maximize the average discounted value of their payoffs. Of particular interest in this category is the normal type ξ₀ ∈ Ξ₁, who has a stationary payoff function.
  • the commitment types Ξ₂, who do not have payoffs but play a particular repeated-game strategy. If the strategy consists of playing the same stage-game action in every period, regardless of past history, the type is a simple commitment type.

Usually, the probability μ(ξ0) is large.

Player 1’s pure-action Stackelberg payoff is defined as v₁* = sup_{a₁∈A₁} min_{α₂∈B(a₁)} u₁(a₁, α₂), where B(a₁) is the set of player 2's myopic best replies to a₁. If the supremum is achieved by some action a₁*, that action is a Stackelberg action, a₁* ∈ argmax_{a₁∈A₁} min_{α₂∈B(a₁)} u₁(a₁, α₂). This is a pure action that player 1 would commit to if given the opportunity to do so; the name "Stackelberg action" arises because such a commitment elicits a best response from player 2. The Stackelberg type of player 1 plays a₁* and is denoted by ξ(a₁*) ≡ ξ*.
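For a finite game, the pure-action Stackelberg payoff is a direct max–min computation. A sketch for the product-choice game discussed below, restricting player 2 to pure best replies:

```python
# Product-choice payoffs (player 1, player 2), indexed by (a1, a2).
u = {("H", "h"): (2, 3), ("H", "l"): (0, 2), ("L", "h"): (3, 0), ("L", "l"): (1, 1)}
A1, A2 = ("H", "L"), ("h", "l")

def best_replies(a1):
    # B(a1): player 2's myopic (pure) best replies to a1.
    best = max(u[(a1, a2)][1] for a2 in A2)
    return [a2 for a2 in A2 if u[(a1, a2)][1] == best]

def stackelberg():
    # v1* = max over a1 of (min over a2 in B(a1) of u1(a1, a2)).
    value = {a1: min(u[(a1, a2)][0] for a2 in best_replies(a1)) for a1 in A1}
    a1_star = max(value, key=value.get)
    return a1_star, value[a1_star]

print(stackelberg())
```

Committing to H elicits the best reply h, so the Stackelberg action is H with payoff v₁* = 2.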

Perfect monitoring games

The authors first focus on repeated games with perfect monitoring. The action set A₂ of player 2 is assumed to be finite, and each player plays a pure strategy. The basic reputation result establishes a lower bound on the equilibrium payoffs of the normal long-lived player. H is the set of public histories, both in the complete information game and in the incomplete information game, and a history for player 1 is an element of Ξ × H, specifying player 1's type and the public history. A behavior strategy for player 1 is σ₁: H × Ξ → Δ(A₁) such that, for all commitment types ξ(σ̂₁) ∈ Ξ₂, σ₁(hᵗ, ξ(σ̂₁)) = σ̂₁(hᵗ) for all hᵗ ∈ H. A behavior strategy for player 2 is σ₂: H → Δ(A₂).

Let U₁(σ, ξ) be the type-ξ long-lived player's payoff. A strategy profile (σ̃₁, σ̃₂) is a Nash equilibrium of the reputation game with perfect monitoring if, for all ξ ∈ Ξ₁, σ̃₁ maximizes U₁(σ₁, σ̃₂, ξ) over player 1's repeated-game strategies, and if, for all t and all hᵗ ∈ H that have positive probability under (σ̃₁, σ̃₂) and μ, E[u₂(σ̃₁(hᵗ, ξ), σ̃₂(hᵗ)) | hᵗ] = max_{a₂∈A₂} E[u₂(σ̃₁(hᵗ, ξ), a₂) | hᵗ].

The first goal is to demonstrate that, when player 2 assigns some probability to player 1 being the simple type ξ(a₁), if the normal player 1 repeatedly plays a₁, then player 2 will place a high probability on a₁ being played in the future. Obviously, the reputation for playing a₁ does not arise instantaneously, and building it can be costly; however, the cost is negligible if player 1 is sufficiently patient. If this action is the Stackelberg action a₁*, then, when player 1 is sufficiently patient, the resulting lower bound on player 1's payoff is close to his Stackelberg payoff v₁*. Let Ω ≡ Ξ × (A₁ × A₂)^∞ be the space of outcomes, with ω a specific outcome. A strategy profile (σ₁, σ₂), together with μ, induces a probability measure P ∈ Δ(Ω) on the set of outcomes. Let Ω̂ be the event that action a₁ is chosen in every period, and let qᵗ be the probability that a₁ is chosen in period t conditional on hᵗ, that is, qᵗ ≡ P(a₁ᵗ = a₁ | hᵗ); qᵗ is a random variable. The normal player 1 receives a payoff of at least min_{a₂∈B(a₁)} u₁(a₁, a₂) in any period t in which qᵗ is large enough to induce player 2 to best respond to a₁ and player 1 actually plays a₁. Since player 1 can always play a₁, the payoff generated by always playing a₁ is a lower bound on his payoff in any Nash equilibrium.

Let n_ζ: Ω → ℕ ∪ {∞} count the number of random variables qᵗ for which qᵗ ≤ ζ, and let Ω̂ denote the event that a₁ is played in every period. Fix ζ ∈ [0, 1) and suppose μ(ξ(a₁)) ≥ μ̲ for some μ̲ > 0 and a₁ ∈ A₁. Then, for every profile (σ₁, σ₂), P(n_ζ > ln μ̲ / ln ζ | Ω̂) = 0 and, for every ω ∈ Ω̂ such that all histories (hᵗ(ω))_{t=0}^∞ have positive probability under P, P(ξ(a₁) | hᵗ(ω)) is nondecreasing in t. Therefore, the more often player 2 observes player 1 playing a₁, the higher the probability he attaches to a₁ being played in the future. The bound on n_ζ is independent of P; in particular, the result does not say that the posterior probability attached to the simple type converges to 1 as t increases, leaving open the possibility that player 1 is a normal type playing like the simple type.
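The nondecreasing-posterior property can be illustrated with a stylized two-type example (all numbers are assumptions for illustration): the commitment type ξ(a₁) plays a₁ for sure, while the normal type plays a₁ with some fixed probability p in each period.

```python
def posterior_path(mu0, p, T):
    """Posterior probability of the simple commitment type after observing
    a1 for t = 0, ..., T periods, in a stylized two-type model: the
    commitment type plays a1 with probability 1, the normal type with p."""
    posteriors = [mu0]
    mu = mu0
    for _ in range(T):
        # Bayes' rule after one more observation of a1.
        mu = mu / (mu + (1 - mu) * p)
        posteriors.append(mu)
    return posteriors

# Starting from a small prior, repeated observation of a1 pushes the
# posterior monotonically upward (though never discontinuously to 1).
print(posterior_path(0.1, 0.8, 25)[-1])
```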

By committing to action a₁, player 1 can guarantee the payoff v₁*(a₁) ≡ min_{a₂∈B(a₁)} u₁(a₁, a₂), the one-shot bound from a₁. Let v̱₁(ξ₀, μ, δ) be the infimum of the normal player 1's payoff over the set of Nash equilibria. The reputation result establishes a lower bound on the equilibrium payoff of player 1. In particular, let A₂ be finite and μ(ξ₀) > 0, and suppose Â₁ is a finite subset of A₁ with μ(ξ(a₁)) > 0 for all a₁ ∈ Â₁. Then there exists k such that v̱₁(ξ₀, μ, δ) ≥ δᵏ max_{a₁∈Â₁} v₁*(a₁) + (1 − δᵏ) min_{a∈A} u₁(a). If the set of commitment types is sufficiently rich, the lower bound on the normal player 1's payoff is the Stackelberg payoff. Indeed, if there is a Stackelberg action and the Stackelberg type has positive prior probability, the normal player 1 effectively builds a reputation for playing like the Stackelberg type, thus receiving a payoff no less than v₁* despite the presence of other possible commitment types. However, the result does not say whether it is optimal for player 1 to play the Stackelberg action in each period. Reputation effects resulting from pure-action commitment types in perfect monitoring games impose a lower bound on player 1's equilibrium payoffs, which can be considerably high; however, unlike mixed-action commitment types or imperfect monitoring games in general, they do not introduce the possibility of new payoffs. In this context, discounting plays a dual role:

  • It makes future payoffs relatively more important.
  • It diminishes the significance of the initial sequence of periods in which player 1 may incur costs to imitate the commitment type.

Example: product-choice game

Let’s consider the following product-choice game (i.e. a game where the first agent is a firm that decides whether to exert high (H) or low (L) effort in producing its output, while the second agent is a consumer that can buy a high-priced (h) or low-priced (l) product).

               Player 2
               h       l
Player 1  H    2,3     0,2
          L    3,0     1,1

Player 1 is a long-lived player, while player 2 is a short-lived player. Player 1 can build a reputation for playing H by persistently doing so. At the beginning this could be costly for player 1, since it may take time for player 2 to be convinced (and in the meantime player 2 will play l), but the subsequent payoffs can make the initial investment rewarding if player 1 is sufficiently patient. However, nothing in the complete-information repeated game captures this feature. Therefore, it is necessary to introduce a probability μ̂ > 0 that player 2 assigns to the possibility that player 1 is a commitment type (i.e., has some hidden characteristic that ensures he will exert high effort) rather than a normal type (a player without that hidden characteristic). This introduces a link between past and expected future play of H: player 2 can choose his action after having observed the behavior of player 1. Therefore, (L, l), which would have been the unique Nash equilibrium outcome in the perfect monitoring game of complete information, is no longer the necessary outcome. For example, if the product-choice game is played twice in this setting, the response of player 2 will depend on what player 1 does (assuming player 1 is the normal type). If player 1 plays L, player 2 will subsequently play l, believing that he is facing the normal type. On the other hand, if player 1 plays H, player 2 may conclude that he is facing the commitment type, and thus best respond with h. Player 1, who has behaved like the commitment type, can sacrifice current payoff to get more in the following period.
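The claim that (L, l) is the unique Nash equilibrium of the stage game can be verified by brute force over the four pure profiles (a sketch, pure strategies only):

```python
# Product-choice payoffs (player 1, player 2), indexed by (a1, a2).
u = {("H", "h"): (2, 3), ("H", "l"): (0, 2), ("L", "h"): (3, 0), ("L", "l"): (1, 1)}
A1, A2 = ("H", "L"), ("h", "l")

def pure_nash():
    # A profile is a Nash equilibrium if neither player gains from a
    # unilateral deviation.
    eq = []
    for a1 in A1:
        for a2 in A2:
            best1 = all(u[(a1, a2)][0] >= u[(d, a2)][0] for d in A1)
            best2 = all(u[(a1, a2)][1] >= u[(a1, d)][1] for d in A2)
            if best1 and best2:
                eq.append((a1, a2))
    return eq

print(pure_nash())
```

L is strictly dominant for player 1 in the stage game, and l is player 2's best reply to L, so only (L, l) survives.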

Continuing with the example, the pure Stackelberg type of player 1 chooses H, with a payoff of 2. Suppose Ξ = {ξ₀, ξ*, ξ(L)}. If δ ≥ 1/2, always playing Hh is a subgame-perfect equilibrium of the complete information game. Adapting the profile to the incomplete information game, we have

σ₁(hᵗ, ξ) = H if ξ = ξ*, or if ξ = ξ₀ and a^τ = Hh for all τ < t; and L otherwise;

σ₂(hᵗ) = h if a^τ = Hh for all τ < t, and l otherwise.

This is a Nash equilibrium for δ ≥ 1/2 and μ(ξ(L)) < 1/2. Player 2 finds it optimal to play h in period 0. If he observes L, he places probability 1 on player 1 being either type ξ(L) or the normal type, and optimally punishes player 1; this ensures that the normal player 1 finds it optimal to play H. After observing a₁⁰ = H in period 0, player 2 assigns probability 0 to ξ(L). It is also possible to show that Nash equilibria with a low payoff for the normal player 1 cannot arise: if μ(ξ*) < 1/3 and μ(ξ(L)) < 1/3, the normal player 1's payoff in any pure-strategy Nash equilibrium is bounded below by 2δ and above by 2.

Now, let a₁ = H = a₁*, the Stackelberg action. Let ξ̃ₜ be a commitment type who plays H in every period τ < t and L afterwards; ξ̃₀ is then ξ(L), the simple commitment type who always plays L, and ξ̃ₜ for t ≥ 1 are nonsimple commitment types. Finally, let ξ̂ be the type that plays H in period 0, and in every subsequent period if player 2 plays h in period 0. Consider the strategy profile in which the normal player 1 always plays H and player 2 always plays h, and let Ω̂ be the set of outcomes in which player 1 always plays H. Then q⁰ = 1 − μ(ξ̃₀). In period 1, Hh is the only history consistent with Ω̂ that has nonzero probability, and Bayes' rule gives q¹(Hh) = (1 − μ(ξ̃₀) − μ(ξ̃₁)) / (1 − μ(ξ̃₀)). If instead player 1 still always plays H, but player 2 plays h with probability 1/2 and l with probability 1/2 in the first period and then always plays h, the calculation of q⁰ does not change, but now both Hl and Hh are consistent with Ω̂. So q¹(Hh) = (1 − μ(ξ̃₀) − μ(ξ̃₁)) / (1 − μ(ξ̃₀)) and q¹(Hl) = (1 − μ(ξ̂) − μ(ξ̃₀) − μ(ξ̃₁)) / (1 − μ(ξ̃₀)).
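These Bayes-rule computations can be reproduced numerically; the prior weights below are hypothetical, chosen only to make the formulas concrete.

```python
# q^t = P(a1^t = H | h^t), computed from hypothetical prior weights on the
# commitment types; the remaining mass is on types that always play H
# under the profile considered (the normal type among them).
mu = {"xi0_tilde": 0.05,  # always plays L
      "xi1_tilde": 0.05,  # plays H in period 0, then L forever
      "xi_hat": 0.10}     # keeps playing H only if player 2 played h in period 0

q0 = 1 - mu["xi0_tilde"]
# After Hh: only xi1_tilde drops out of the H-players among the types
# consistent with period-0 play of H.
q1_Hh = (1 - mu["xi0_tilde"] - mu["xi1_tilde"]) / (1 - mu["xi0_tilde"])
# After Hl: xi_hat also stops playing H, since player 2 did not play h.
q1_Hl = (1 - mu["xi_hat"] - mu["xi0_tilde"] - mu["xi1_tilde"]) / (1 - mu["xi0_tilde"])

print(q0, q1_Hh, q1_Hl)
```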

Imperfect monitoring games

The authors now study imperfect-monitoring games, focusing first on the case with one long-lived and one short-lived player. Aᵢ is the action space and Zᵢ the finite signal space. In each stage of the repeated game, each player only learns the realized value of his private signal. Let π(z|a) be the distribution over private signals z = (z₁, z₂) for each action profile a, and let uᵢ*(zᵢ, aᵢ) be the ex-post payoff of normal player 1 and of player 2 after the realization of z and a. Ex-ante stage-game payoffs are uᵢ(a) ≡ Σ_z uᵢ*(zᵢ, aᵢ) π(z|a). Ξ is as in the previous section. The set of private histories for player 1, excluding his type, is H₁ ≡ ∪_{t=0}^∞ (A₁ × Z₁)ᵗ, and a behavior strategy for player 1 is σ₁ : H₁ × Ξ → Δ(A₁), such that for all ξ(σ̂₁) ∈ Ξ₂, σ₁(h₁ᵗ, ξ(σ̂₁)) = σ̂₁(h₁ᵗ) for all h₁ᵗ ∈ H₁. The set of private histories for player 2 is H₂ ≡ ∪_{t=0}^∞ (A₂ × Z₂)ᵗ, with a behavior strategy σ₂ : H₂ → Δ(A₂). A strategy profile (σ̃₁, σ̃₂) is a Nash equilibrium of the reputation game with imperfect monitoring if, for all ξ ∈ Ξ₁, σ̃₁ maximizes U₁(σ₁, σ̃₂, ξ) (type ξ long-lived player's payoff) over player 1's repeated-game strategies, and if for all t and all h₂ᵗ ∈ H₂ that have positive probability under (σ̃₁, σ̃₂) and μ,

E[u₂(σ̃₁(h₁ᵗ, ξ), σ̃₂(h₂ᵗ)) | h₂ᵗ] = max_{a₂∈A₂} E[u₂(σ̃₁(h₁ᵗ, ξ), a₂) | h₂ᵗ].

Since monitoring is not perfect in this environment, the best responses obtainable by playing a₁ are not simply the actions in B(a₁). Consider the set of possible best responses of player 2 to player 1's mixed action α₁. An action α₂ is an ε-confirmed best response to α₁ if there exists α₁′ such that α₂(a₂) > 0 implies a₂ ∈ argmax_{a₂′} u₂(α₁′, a₂′), and |π₂(·|α₁, α₂) − π₂(·|α₁′, α₂)| ≤ ε. B_ε(α₁) is the set of ε-confirmed best responses to α₁. For private-monitoring games, let B*_ε(α̂₁) ≡ {α₂ : supp(α₂) ⊂ B_ε(α̂₁)}. The authors' results show that if player 2 assigns positive probability to a simple type ξ(α₁), a patient normal player 1's payoff in every Nash equilibrium (up to an ε > 0 approximation) can be no lower than v̄₁(α₁) ≡ min_{α₂∈B*₀(α₁)} u₁(α₁, α₂). Taking the supremum over α₁, the payoff bound is v₁** ≡ sup_{α₁} min_{α₂∈B*₀(α₁)} u₁(α₁, α₂).

Further assumptions are necessary. In particular, it must be that for all a₂ ∈ A₂ (and all α₂ ∈ Δ(A₂)), the collection {π₂(·|(a₁, a₂)) : a₁ ∈ A₁} is linearly independent; that is, for any action of player 2, no two actions of player 1 generate the same distribution of signals. For the private-monitoring game, this implies that B(α₁) = B*₀(α₁) and that v₁** equals the mixed-action Stackelberg payoff.

It is possible to demonstrate the following proposition. Let ξ̂ be the simple commitment type that plays α̂₁ ∈ Δ(A₁) (or α̂₁ ∈ A₁). Suppose μ(ξ₀), μ(ξ̂) > 0. In the private-monitoring game (or canonical public-monitoring game), for all ε > 0 there exists K such that for all δ,

v̲₁(ξ₀, μ, δ) ≥ (1 − ε)δ^K inf_{α₂∈B*_ε(α̂₁)} u₁(α̂₁, α₂) + (1 − (1 − ε)δ^K) min_{a∈A} u₁(a).

This means that, as in perfect-monitoring games, the normal player 1 manages to establish a reputation for consistently playing like a simple type. This phenomenon persists even when numerous other potential commitment types are present. Even if we cannot be certain about player 2's ultimate beliefs regarding player 1, we can be confident that player 2's beliefs will converge to something.

The same results obtain when the short-lived player is interpreted as a continuum of small and anonymous long-lived players. Each small player receives a private signal from the finite set Z₂, while the large player observes a private signal z₁ ∈ Z₁ of the aggregate behavior of the small players. If the private signal received by the small players is common across all of them, the model is identical to the one just described. If each small player observes a different realization of the private signal (idiosyncratic signals), the model changes. In this case, let ξ̂ denote the simple commitment type that always plays α̂₁ ∈ Δ(A₁) (if A₁ is finite) or α̂₁ ∈ A₁ (if A₁ is infinite). Suppose ξ₀, ξ̂ ∈ Ξ. For all ε > 0 there exists K such that for all δ,

v̲₁(ξ₀, μ, δ) ≥ (1 − ε)δ^K inf_{α₂∈B*_ε(α̂₁)} u₁(α̂₁, α₂) + (1 − (1 − ε)δ^K) min_{a∈A} u₁(a).

Example: product-choice game with public monitoring

Player 1's actions are not public. There is a public signal that takes values ȳ and y̲, with distribution

ρ(ȳ|a) = p, if a₁ = H; q, if a₁ = L,

with 0 < q < p < 1. Player 2's actions are public, and α̂₁ is the mixed action of player 1 that randomizes equally between his two actions. For every ε ≥ 0, every pure or mixed action of player 2 is in B_ε(α̂₁). Therefore min_{α₂∈B₀(α̂₁)} u₁(α̂₁, α₂) = 1/2 and v₁** = 5/2. The resulting payoff is the mixed-action Stackelberg payoff, and it is higher than 2 − (1 − p)/(p − q) < 2, which can be shown to be the upper bound on player 1's payoff in the public-monitoring game with complete information.
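The numbers in this example can be verified directly. The sketch below assumes the standard product-choice payoffs for player 1 (u₁(H,h) = 2, u₁(H,l) = 0, u₁(L,h) = 3, u₁(L,l) = 1) and illustrative signal probabilities p, q:

```python
# Numeric check of the example: against the equal mixture the worst response
# gives 1/2; mixing just above 1/2 on H approaches the mixed Stackelberg
# payoff 5/2, which exceeds the complete-information upper bound
# 2 - (1 - p)/(p - q). Payoffs and (p, q) are assumed for illustration.
u1 = {('H', 'h'): 2, ('H', 'l'): 0, ('L', 'h'): 3, ('L', 'l'): 1}

def u1_mixed(x, beta):
    """Player 1's payoff when he plays H with prob x and 2 plays h with prob beta."""
    return (x * beta * u1[('H', 'h')] + x * (1 - beta) * u1[('H', 'l')]
            + (1 - x) * beta * u1[('L', 'h')] + (1 - x) * (1 - beta) * u1[('L', 'l')])

print("u1(alpha1_hat, l) =", u1_mixed(0.5, 0.0))    # worst response: 1/2
print("u1(x=0.51, h)    =", u1_mixed(0.51, 1.0))    # close to 5/2
p, q = 0.9, 0.2                                     # assumed, 0 < q < p < 1
upper_bound = 2 - (1 - p) / (p - q)
print("complete-information upper bound:", round(upper_bound, 3))
```

For these illustrative signal probabilities the complete-information bound is about 1.86, strictly below the reputation bound of 5/2.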

Temporary reputations

Under imperfect monitoring, the authors show that player 2 must eventually learn player 1's type. While the normal and commitment types may exhibit similar behavior for an extended period, the normal type will inevitably have an incentive to deviate at least slightly from the commitment strategy. This contradicts the belief held by player 2 that player 1 will consistently play the commitment strategy. Consequently, player 2 will eventually learn player 1's true type.

The environment is that of incomplete-information private-monitoring games. The authors assume that for i = 1, 2, all a ∈ A, and all zᵢ ∈ Zᵢ, πᵢ(zᵢ|a) > 0, and that for all a₁ ∈ A₁ the collection of probability distributions {π₁(·|(a₁, a₂)) : a₂ ∈ A₂} is linearly independent. This assumption implies that player i can correctly identify any fixed stage-game action of player j. The focus is on one simple commitment type for player 1. The normal type's strategy is denoted σ̃₁, the commitment type's σ̂₁. Let P ∈ Δ(Ω) be the unconditional probability measure induced by μ and the strategy profile, and let P̂ be the measure induced by conditioning on ξ̂. For player 2, σ̂₂ is the unique stage-game best response to σ̂₁, and (σ̂₁, σ̂₂) is not a stage-game Nash equilibrium. Suppose there is a Nash equilibrium of the incomplete-information game in which player 1 can be either the normal or the commitment type. Player 2 cannot distinguish the signals generated by the two types, and thus believes that both are playing the same strategy. Player 2 plays a best response to this strategy, which is a best response to the commitment type. Since it is not also a best response for player 1, he will find it optimal to deviate, contradicting player 2's beliefs. This means that in any Nash equilibrium of the game with incomplete information, μ̂ₜ ≡ P({ξ̂}|𝒢₂ᵗ) → 0, where 𝒢₂ᵗ is the filtration on Ω generated by player 2's histories.

It can also be shown that there are no equilibria in which uncertainty about player 1's type survives beyond some period T: for all ε > 0 there exists T such that, for any Nash equilibrium of the game with incomplete information, P̃(μ̂ₜ < ε, for all t > T) > 1 − ε.
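The mechanics of this belief decay can be illustrated with a small simulation. The signal probabilities below are assumptions: player 1 is actually the normal type, whose play induces a signal distribution different from the one the commitment strategy would induce, so Bayes updating drives the posterior on the commitment type toward zero.

```python
# Illustrative simulation (assumed Bernoulli signal probabilities) of the
# temporary-reputation result: when the normal type's play generates a
# different signal distribution, player 2's posterior on the commitment
# type converges to zero.
import random

random.seed(1)
p_commit, p_normal = 0.9, 0.6       # P(good signal) under each type (assumed)
mu = 0.5                            # prior on the commitment type
for t in range(200):
    good = random.random() < p_normal            # signal drawn from normal type
    l_c = p_commit if good else 1 - p_commit     # likelihood under commitment
    l_n = p_normal if good else 1 - p_normal     # likelihood under normal
    mu = mu * l_c / (mu * l_c + (1 - mu) * l_n)

print(f"posterior on the commitment type after 200 periods: {mu:.2e}")
```

The posterior odds form a martingale whose log decreases on average by the Kullback–Leibler divergence between the two signal distributions, which is why the belief vanishes almost surely.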

Reputation with long-lived players

This section deals with perfect-monitoring games with two long-lived players. The goal is to demonstrate that when player 1 consistently plays the Stackelberg action and there exists a type of player 1 committed to that action, player 2 will eventually assign a high probability to the Stackelberg action being played in future rounds. However, since player 2 is now long-lived, he may not play a best response to the Stackelberg type. With two long-lived players, a crucial step is to determine the conditions under which player 2, as his conviction that the Stackelberg action will appear grows, ultimately chooses to play a best response to that action. The following consideration applies: as long as player 2 values future payoffs (thus discounting), any losses incurred by not playing a current best response must be compensated within a finite duration. However, if player 2 holds a strong conviction that the Stackelberg action will be played not only in the present but also in numerous subsequent periods, there will be no chance to accumulate future gains; consequently, player 2 may find it advantageous to simply play a stage-game best response. If this is the case, player 1 receives almost the Stackelberg payoff in each period, which puts a lower bound on his payoff if he is sufficiently patient. Here, however, lies the difference from the setting with short-lived players: player 2 might select something other than a best response to the Stackelberg action out of concern that playing a current best response triggers a future punishment. This punishment would not occur if player 2 were facing the Stackelberg type, but player 2 can only be confident that he is facing the Stackelberg action, not the Stackelberg type. Short-lived players have the same fear, but this uncertainty does not affect their behavior, given their time horizon.

Perfect monitoring

Consider a perfect monitoring repeated game with two long-lived players. The two players have different discount factors δi and the characteristics of player 2 are known. The remaining environment is similar to the case with a short-lived player 2.

The focus is now on a commitment type that minmaxes player 2. It is possible to show that if there is a pure action a₁ that mixed-action minmaxes player 2, and a positive probability that player 1 is the simple type ξ(a₁), then a sufficiently patient normal player 1 gets a payoff arbitrarily close to v₁*(a₁), the one-shot bound on player 1's payoff when he commits to a₁. The following definition applies: the stage game has conflicting interests if a pure Stackelberg action a₁* mixed-action minmaxes player 2. The highest reputation bound is obtained when the game has conflicting interests. Let v̲₁(ξ₀, μ, δ₁, δ₂) be the infimum of the normal player 1's payoffs. Suppose μ(ξ(a₁)) > 0 for some pure action a₁ that mixed-action minmaxes player 2. Then there exists a value k, independent of δ₁ (but depending on δ₂), such that

v̲₁(ξ₀, μ, δ₁, δ₂) ≥ δ₁ᵏ v₁*(a₁) + (1 − δ₁ᵏ) min_a u₁(a).

Therefore, only in at most k periods can player 2 play something other than the best response to a₁, and if player 1 is patient enough these k periods have a small effect on his payoffs; eventually, player 2 plays the best response to a₁. Moreover, for all ε > 0 there exists δ̲₁ ∈ (0, 1) such that for all δ₁ ∈ (δ̲₁, 1), v̲₁(ξ₀, μ, δ₁, δ₂) > v₁*(a₁) − ε. If player 2's equilibrium strategy yields him a payoff lower than his minmax payoff when a₁ is always played, this implies that player 2 does not anticipate a₁ being played consistently.
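The role of patience in this bound is easy to see numerically. The values below (v₁* = 2, a minimum payoff of 0, k = 10) are illustrative assumptions:

```python
# Numeric illustration of the conflicting-interests bound
# delta1^k * v1*(a1) + (1 - delta1^k) * min u1(a): it converges to the
# one-shot bound v1*(a1) as delta1 -> 1. All parameter values are assumed.

def reputation_bound(delta1, k=10, v1_star=2.0, v1_min=0.0):
    return delta1**k * v1_star + (1 - delta1**k) * v1_min

for delta1 in (0.9, 0.99, 0.999):
    print(f"delta1={delta1}: bound = {reputation_bound(delta1):.4f}")
```

The k "bad" periods are a fixed cost, so their weight δ₁ᵏ → 1 as player 1 becomes patient, and the bound approaches v₁*(a₁).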

Example: the product-choice game (same structure as in section 1). This game is not one of conflicting interests. When player 1 takes the Stackelberg action H, it elicits a best response h from player 2, resulting in a payoff of 3 for player 2, above his minmax payoff of 1. In contrast to games with conflicting interests, both the normal player 1 and player 2 do better when player 1 chooses the Stackelberg action (with player 2 best responding) than in the stage-game Nash equilibrium.

It is also possible that not only player 1's type but also player 2's type is unknown; each player's type is drawn from a countable set before the game begins. Let λ₀ > 0 be the probability that player 2 is normal. In this setting too there is a maximum number of periods in which the normal player 2 can play something other than the best response to a₁. Suppose μ(ξ(a₁)) > 0 for some action a₁ minmaxing player 2. Then there exists a constant k, independent of player 1's discount factor, such that the normal player 1's payoff in any Nash equilibrium of the repeated game is at least

λ₀ δ₁ᵏ v₁*(a₁) + (1 − λ₀ δ₁ᵏ) min_a u₁(a).

All in all, to establish a reputation it is not important that incomplete information be one-sided, but rather that player 1 is sufficiently patient and that μ(ξ(a₁)) > 0. This ensures that player 2 will eventually play an optimal response to a₁, and not in a very distant period.

Other actions

If action a₁ does not minmax player 2, the number of periods in which player 2 does not respond optimally can no longer be bounded. It is, however, possible to bound the number of periods in which player 2 can expect a continuation payoff lower than his minmax value. Player 1's payoff bound is v₁†(a₁) ≡ min_{α₂∈D(a₁)} u₁(a₁, α₂), where D(a₁) = {α₂ ∈ Δ(A₂) : u₂(a₁, α₂) ≥ v̲₂} is the set of player 2's actions that give him at least his minmax utility. It is possible to show that, fixing δ₂ ∈ [0, 1) and a₁ ∈ A₁ with μ(ξ(a₁)) > 0, for all ε > 0 there exists δ̲₁ < 1 such that for all δ₁ ∈ (δ̲₁, 1), v̲₁(ξ₀, μ, δ₁, δ₂) ≥ v₁†(a₁) − ε. If a₁ does not minmax player 2, then v₁†(a₁) < v₁*(a₁), the one-shot bound. Moreover, it is not necessarily the Stackelberg action that maximizes v₁†(a₁), and the bound holds for all actions: if μ > 0 for all simple pure commitment types, a normal player 1 gets a payoff close to max_{a₁∈A₁} v₁†(a₁).
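For the product-choice game this bound can be computed by brute force over player 2's mixtures. The sketch below assumes the standard product-choice payoffs and v̲₂ = 1 (player 2's minmax payoff in that game, as noted in the example above):

```python
# Sketch computing D(a1) and v1^dagger(a1) on a grid of player 2's mixed
# actions, under assumed product-choice payoffs and v2_minmax = 1.
u1 = {('H', 'h'): 2, ('H', 'l'): 0, ('L', 'h'): 3, ('L', 'l'): 1}
u2 = {('H', 'h'): 3, ('H', 'l'): 2, ('L', 'h'): 0, ('L', 'l'): 1}
v2_minmax = 1.0

def v1_dagger(a1, grid=1000):
    best = None
    for i in range(grid + 1):
        beta = i / grid                            # prob that player 2 plays h
        payoff2 = beta * u2[(a1, 'h')] + (1 - beta) * u2[(a1, 'l')]
        if payoff2 >= v2_minmax:                   # alpha2 belongs to D(a1)
            payoff1 = beta * u1[(a1, 'h')] + (1 - beta) * u1[(a1, 'l')]
            best = payoff1 if best is None else min(best, payoff1)
    return best

print("v1_dagger(H) =", v1_dagger('H'))   # D(H) = all mixtures, so the min is 0
print("v1_dagger(L) =", v1_dagger('L'))   # D(L) = {l}, giving u1(L,l) = 1
```

Since H does not minmax player 2, D(H) contains l and the bound from committing to H is weak, consistent with the observation that the product-choice game lacks conflicting interests.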

Imperfect Public Monitoring

In this section, there are two long-lived players who play an imperfect public-monitoring game. A₁, A₂ are finite action sets. ρ is the public-monitoring distribution and it has full support: for all y ∈ Y and a ∈ A₁ × A₂, ρ(y|a) > 0. Player 2's actions are imperfectly monitored by player 1, and player 2 updates his beliefs based on what he observes about player 1. It is therefore also assumed that for every mixed action α₂ ∈ Δ(A₂), ρ(·|(α₁, α₂)) = ρ(·|(α₁′, α₂)) implies α₁ = α₁′. As usual, δᵢ is player i's discount factor and μ is the prior with support Ξ. Player 1's set of histories is H₁ = ∪_{t=0}^∞ (A₁ × Y)ᵗ, with a behavior strategy σ₁ : H₁ × Ξ → Δ(A₁), while player 2's set of histories is H₂ = ∪_{t=0}^∞ (A₂ × Y)ᵗ, with a behavior strategy σ₂ : H₂ → Δ(A₂). A Nash equilibrium σ = (σ₁, σ₂) and μ induce a measure P over the set of outcomes Ω ≡ Ξ × (A₁ × A₂ × Y)^∞. It is possible that player 1 is committed to a nonsimple strategy. Letting G^N(δ₂) be the complete-information finitely repeated game that plays the complete-information stage game N times, payoffs are defined as follows: for player 1, (1/N) Σ_{t=0}^{N−1} u₁(aᵗ); for player 2, ((1 − δ₂)/(1 − δ₂^N)) Σ_{t=0}^{N−1} δ₂ᵗ u₂(aᵗ). σ₁^N is a strategy in the infinitely repeated game or in a finitely repeated game whose length is N or an integer multiple of N. The target for player 1's payoff is the maximum payoff achievable by the strategies in Ξ, the support of μ, within the corresponding finitely repeated game, when player 1 is arbitrarily patient. The set of player 1 payoffs is

V₁(δ₂, Ξ) ≡ {v₁ : for all ε > 0, there exist N and ξ(σ₁^N) ∈ Ξ s.t. for all σ₂^N ∈ B^N(σ₁^N; δ₂), U₁^N(σ₁^N, σ₂^N) ≥ v₁ − ε},

and set v₁(δ₂, Ξ) = sup V₁(δ₂, Ξ). If ξ(α₁) ∈ Ξ, then v₁(δ₂, Ξ) ≥ v₁*(α₁). If Ξ contains only simple commitment types, then v₁(δ₂, Ξ) = v₁**; in general, v₁ may be much higher than v₁**. It is possible to show that for all η > 0 and δ₂, there exists δ̲₁ < 1 such that for all δ₁ ∈ (δ̲₁, 1), v̲₁(ξ₀, μ, δ₁, δ₂) ≥ v₁(δ₂, Ξ) − η. Moreover, for all η > 0 and δ₂, there exist N, δ₁, ε, and a strategy σ₁^N for G^N(δ₂), with ξ(σ₁^N) ∈ Ξ, such that if player 2 plays an ε-best response to σ₁^N in G^N(δ₂), then player 1's δ₁-discounted payoff in G^N(δ₂) is at least v₁(δ₂, Ξ) − η/2.
It can also be shown that for all ε > 0 there exist N, σ₁^N, and γ > 0 such that for all σ̃₁^N, if |ρ^N(·|(σ₁^N, σ₂^N)) − ρ^N(·|(σ̃₁^N, σ₂^N))| < γ for σ₂^N an ε-best response to σ₁^N in G^N(δ₂), then σ₂^N is a 2ε-best response to σ̃₁^N.

Given G^N(δ₂), divide the infinitely repeated game into blocks of length N. Incomplete-information repeated games have a prior μ ∈ Δ(Ξ) and posteriors in Δ(Ξ). Given a strategy profile σ and a prior μ, let ρ^N_{σ,μ}(·|h₂^{Nk}) be player 2's "one-block-ahead" prediction of the distribution over signals in block G^{N,k} (the k-th block of N periods) for any private history h₂^{Nk} ∈ H₂^{Nk}. P^{(σ₁^N, σ₂)} is the probability measure over Ω implied by σ₂, conditioning on the event ξ(σ₁^N) ∈ Ξ. It can be shown that, fixing λ, μ̄ ∈ (0, 1), γ > 0, an integer N, and a strategy σ₁^N, there exists an integer L such that for all (σ₁, σ₂) and all μ ∈ Δ(Ξ) with μ(ξ(σ₁^N)) ≥ μ̄,

P^{(σ₁^N, σ₂)}( |{k ≥ 0 : |ρ^N_{(σ₁^N, σ₂),μ}(·|h₂^{Nk}) − ρ^N_{(σ₁, σ₂),μ}(·|h₂^{Nk})| ≥ γ}| ≤ L ) ≥ 1 − λ.

Commitment types who punish

A similar result can be obtained by adding, in the perfect-monitoring environment, commitment types who punish player 2 for not behaving appropriately. Player 2 then effectively knows the features of player 1 that previously remained hidden; this uncertainty was the reason why player 2 did not play a best response to the Stackelberg type in the simple perfect-monitoring environment.

Let a₁ ∈ A₁ be an action for player 1, a₂ the best response of player 2 for which u₂(a₁, a₂) > v₂^p, and â₁ the action of player 1 that minmaxes player 2. Player 1's commitment type plays the strategy σ̂₁, which consists of phases: in a punishment phase he plays â₁, and otherwise he plays a₁; if player 2 does not play a₂, player 1 punishes him. It is possible to show that, fixing an integer K > 0 and η > 0, there exists an integer T(K, η, μ̂⁰) such that for every pure strategy σ₂ and every ω ∈ Ω̂, there are no more than T(K, η, μ̂⁰) periods t in which player 2 attaches probability no greater than 1 − η to the event that player 1 plays according to σ̂₁ in periods t, …, t + K, given that player 2 plays σ₂. Moreover, fixing ε > 0 and letting Ξ contain σ̂₁, for some action profile a with u₂(a) > v₂^p there exists δ̲₂ < 1 such that for all δ₂ ∈ (δ̲₂, 1) there exists δ̲₁ such that for all δ₁ ∈ (δ̲₁, 1), v̲₁(ξ₀, μ, δ₁, δ₂) ≥ u₁(a) − ε. If a₁ is player 1's Stackelberg action, this result gives player 1's Stackelberg payoff as a lower bound on his equilibrium payoff in the game of incomplete information. The bound for the normal player 1 is obtained by showing that player 2 faces only a finite number of punishments from player 1's commitment type.

Temporary reputations with two long-lived players

The results obtained in section 4.4.2 generalize to the case of two long-lived players. Considering the case in which player 1's type is unknown, it is possible to give conditions under which player 2 effectively learns player 1's type. The authors assume a commitment type who plays σ̂₁, a strategy with no long-run credibility, for which:

  • σ̂₂ is player 2's best response to σ̂₁ and is unique on the equilibrium path.
  • there exists T₀ such that for all t > T₀, the normal player 1 is likely to deviate from σ̂₁, given σ̂₂.

The set of player 2's best responses to σ₁ in the game of complete information is B(σ₁) ≡ {σ₂ : U₂ᵗ(σ₁, σ₂) ≥ U₂ᵗ(σ₁, σ₂′) for all σ₂′}, where Uᵢᵗ is player i's continuation value in period t.

Let π be the monitoring distribution and let the commitment type's strategy σ̂₁ be public and without long-run credibility. Then in any Nash equilibrium of the game with incomplete information, μ̂ₜ → 0 P̃-almost surely. Since σ̂₁ is public, player 1 can anticipate player 2's optimal response to it. It is important to underline that a long-lived player 2 will best respond to the commitment type once he is convinced that he is almost certainly facing the commitment strategy; moreover, the normal type's deviations from the commitment strategy occur only in a finite number of periods.

References

  • Cabral, L. M. B. (2005). The economics of trust and reputation: A primer [Last access June 8, 2023].
  • Diekmann, A., & Przepiorka, W. (2021). Trust and reputation in historical markets and contemporary online markets. In A. Maurer (Ed.), Handbook of economic sociology for the 21st century: New theoretical approaches, empirical studies and developments (pp. 131–145). Springer International Publishing.
  • Dowling, G. (1993). Developing your company image into a corporate asset. Long Range Planning, 26, 101–109.
  • Eccles, R., Newquist, S., & Schatz, R. (2007). Reputation and its risks. Harvard Business Review [Last access June 10, 2023].
  • Einwiller, S. (2003). When reputation engenders trust: An empirical investigation in business-to-consumer electronic commerce. Electronic Markets, 13 (3), 196–209.
  • Gambetta, D. (2000). Can we trust trust? Trust: Making and Breaking Cooperative Relations, electronic edition, Department of Sociology, University of Oxford, 213–237.
  • Mailath, G. J., & Samuelson, L. (2006). Repeated Games and Reputations: Long-Run Relationships. Oxford: Oxford University Press.
  • Mailath, G. J., & Samuelson, L. (2015). Chapter 4 - reputations in repeated games. In H. P. Young & S. Zamir (Eds.), Handbook of Game Theory with Economic Applications. Elsevier. 165-238.
  • Nash, J. (1951). Non-cooperative games. Annals of Mathematics, 54 (2), 286–295.
  • Popitz, H. (1980). Die normative Konstruktion von Gesellschaft. Tübingen: Mohr Siebeck.
  • Roberts, P. W., & Dowling, G. R. (2002). Corporate reputation and sustained superior financial performance. Strategic Management Journal, 23 (12), 1077–1093.
  • Selcuk, A., Uzun, E., & Pariente, M. (2004). A reputation-based trust management system for p2p networks. IEEE International Symposium on Cluster Computing and the Grid, 2004. CCGrid 2004., 251–258.
  • Webley, S. (2004). Risk, reputation and trust. Journal of Communication Management, 8, 9–12.
  • Xiong, L., & Liu, L. (2004). Peertrust: Supporting reputation-based trust for peer-to-peer electronic communities. IEEE Transactions on Knowledge and Data Engineering, 16 (7), 843–857.