I'm a Postdoc at Max Planck Institute for Software Systems (MPI-SWS), hosted by Rupak Majumdar. My research interests lie in computational game theory, multi-agent systems, and artificial intelligence. My work also has many connections with microeconomics and operations research. The main task of my research is to find ways to compute optimal strategies to play with intelligent agents, in real-world scenarios with all sorts of complexities. I'm broadly interested in algorithmic problems arising from such strategic interactions.
Prior to MPI, I obtained by DPhil (PhD) degree from the University of Oxford, where I was supervised by Edith Elkind and Michael Wooldridge. Even before that, I worked as a research associate in Nanyang Technological University in Singapore. I obtained my master's degree at the University of Chinese Academy of Sciences & the Institute of Computing Technology, CAS, where I was supervised by Bo An; and my bachelor's degree at East China Normal University.
* Asterisks denote alphabetical author order.
We study a dynamic model of Bayesian persuasion in sequential decision-making settings. An informed principal observes an external parameter of the world and advises an uninformed agent about actions to take over time. The agent takes actions in each time step based on the current state, the principal's advice/signal, and beliefs about the external parameter. The action of the agent updates the state according to a stochastic process. The model arises naturally in many applications, e.g., an app (the principal) can advice the user (the agent) on possible choices between actions based on additional real-time information the app has. We study the problem of designing a signaling strategy from the principal's point of view. We show that the principal has an optimal strategy against a myopic agent, who only optimizes their rewards locally, and the optimal strategy can be computed in polynomial time. In contrast, it is NP-hard to approximate an optimal policy against a far-sighted agent. Further, we show that if the principal has the power to threaten the agent by not providing future signals, then we can efficiently design a threat-based strategy. This strategy guarantees the principal's payoff as if playing against an agent who is far-sighted but myopic to future signals.
Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game, are susceptible to manipulation by the follower. Such a learning algorithm operates by querying the best responses or the payoffs of the follower, who consequently can deceive the algorithm by responding as if his payoffs were much different than what they actually are. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the payoffs that would make the learning algorithm compute a commitment so that best responding to it maximizes the follower's utility, according to his true payoffs. While this problem has been considered before, the related literature only focused on the simplified scenario in which the payoff space is finite, thus leaving the general version of the problem unanswered. In this paper, we fill in this gap, by showing that it is always possible for the follower to compute (near-)optimal payoffs for various scenarios about the learning interaction between leader and follower.
Complexity of voting manipulation is a prominent topic in computational social choice. In this work, we consider a two-stage voting manipulation scenario. First, a malicious party (an attacker) attempts to manipulate the election outcome in favor of a preferred candidate by changing the vote counts in some of the voting districts. Afterwards, another party (a defender), which cares about the voters' wishes, demands a recount in a subset of the manipulated districts, restoring their vote counts to their original values. We investigate the resulting Stackelberg game for the case where votes are aggregated using two variants of the Plurality rule, and obtain an almost complete picture of the complexity landscape, both from the attacker's and from the defender's perspective.
(A preliminary version of this paper appeared in the Proceedings of IJCAI '19. This full version includes all proofs that were omitted from the conference version as well as additional examples and algorithmic results, such as a pseudo-polynomial time algorithm for the weighted version of the recounting problem for Plurality over Districts when the attacker is limited to regular manipulations, and a polynomial time algorithm for the unweighted version of the recounting problem for Plurality over Districts under an additional technical assumption.)
Recent work studied Stackelberg security games with multiple defenders, in which heterogeneous defenders allocate security resources to protect a set of targets against a strategic attacker. Equilibrium analysis was conducted to characterize outcomes of these games when defenders act independently. Our starting point is the observation that the use of resources in equilibria may be inefficient due to lack of coordination. We explore the possibility of reducing this inefficiency by coordinating the defenders—specifically, by pooling the defenders’ resources and allocating them jointly. The defenders’ heterogeneous preferences then give rise to a collective decision-making problem, which calls for a mechanism to generate joint allocation strategies. We seek a mechanism that encourages coordination, produces efficiency gains, and incentivizes the defenders to report their true preferences and to execute the recommended strategies. Our results show that, unfortunately, even these basic properties clash with each other and no mechanism can achieve them simultaneously, which reveals the intrinsic difficulty of achieving meaningful defense coordination in security games. On the positive side, we put forward mechanisms that fulfill some of these properties and we identify special cases of our setting where more of these properties are compatible.
In Stackelberg security games when information about the attacker's payoffs is uncertain, algorithms have been proposed to learn the optimal defender commitment by interacting with the attacker and observing their best responses. In this paper, we show that, however, these algorithms can be easily manipulated if the attacker responds untruthfully. As a key finding, attacker manipulation normally leads to the defender learning a maximin strategy, which effectively renders the learning attempt meaningless as to compute a maximin strategy requires no additional information about the other player at all. We then apply a game-theoretic framework at a higher level to counteract such manipulation, in which the defender commits to a policy that specifies her strategy commitment according to the learned information. We provide a polynomial-time algorithm to compute the optimal such policy, and in addition, a heuristic approach that applies even when the attacker's payoff space is infinite or completely unknown. Empirical evaluation shows that our approaches can improve the defender's utility significantly as compared to the situation when attacker manipulation is ignored.
We consider the house allocation problem, where m houses are to be assigned to n agents so that each agent gets exactly one house. We present a polynomial-time algorithm that determines whether an envy-free assignment exists, and if so, computes one such assignment. We also show that an envy-free assignment exists with high probability if the number of houses exceeds the number of agents by a logarithmic factor.
Information uncertainty is one of the major challenges facing applications of game theory. In the context of Stackelberg games, various approaches have been proposed to deal with the leader's incomplete knowledge about the follower's payoffs, typically by gathering information from the leader's interaction with the follower. Unfortunately, these approaches rely crucially on the assumption that the follower will not strategically exploit this information asymmetry, i.e., the follower behaves truthfully during the interaction according to their actual payoffs. As we show in this paper, the follower may have strong incentives to deceitfully imitate the behavior of a different follower type and, in doing this, benefit significantly from inducing the leader into choosing a highly suboptimal strategy. This raises a fundamental question: how to design a leader strategy in the presence of a deceitful follower? To answer this question, we put forward a basic model of Stackelberg games with (imitative) follower deception and show that the leader is indeed able to reduce the loss due to follower deception with carefully designed policies. We then provide a systematic study of the problem of computing the optimal leader policy and draw a relatively complete picture of the complexity landscape; essentially matching positive and negative complexity results are provided for natural variants of the model. Our intractability results are in sharp contrast to the situation with no deception, where the leader's optimal strategy can be computed in polynomial time, and thus illustrate the intrinsic difficulty of handling follower deception. Through simulations we also examine the benefit of considering follower deception in randomly generated games.
We consider strategic games that are inspired by Schelling's model of residential segregation. In our model, the agents are partitioned into k types and need to select locations on an undirected graph. Agents can be either stubborn, in which case they will always choose their preferred location, or strategic, in which case they aim to maximize the fraction of agents of their own type in their neighborhood. We investigate the existence of equilibria in these games, study the complexity of finding an equilibrium outcome or an outcome with high social welfare, and also provide upper and lower bounds on the price of anarchy and stability. Some of our results extend to the setting where the preferences of the agents over their neighbors are defined by a social network rather than a partition into types.
Stackelberg security games have received much attention in recent years. While most existing work focuses on single-defender settings, there are many real-world scenarios that involve multiple defenders (e.g., multi-national anti-crime actions in international waters, different security agencies patrolling the same area). In this paper, we consider security games with uncoordinated defenders who jointly protect a set of targets, but may have different valuations for these targets; each defender schedules their own resources and selfishly optimizes their own utility. We generalize the standard (single-defender) model of Stackelberg security games to this setting and formulate an equilibrium concept that captures the nature of strategic interaction among the players. We argue that an exact equilibrium may fail to exist, and, in fact, deciding whether it exists is NP-hard. However, under mild assumptions, every multi-defender security game admits an ϵ-equilibrium for every ϵ > 0, and the limit points corresponding to ϵ → 0 can be efficiently approximated.
Most existing models of Stackelberg security games ignore the underlying topology of the space in which targets and defence resources are located. As a result, allocation of resources is restricted to a discrete collection of exogenously defined targets. However, in many practical security settings, defense resources can be located on a continuous plane. Better defense solutions could therefore be potentially achieved by placing resources in a space outside of actual targets (e.g., between targets). To address this limitation, we propose a model called Security Game on a Plane (SGP) in which targets are distributed on a 2-dimensional plane, and security resources, to be allocated on the same plane, protect targets within a certain effective distance. We investigate the algorithmic aspects of SGP. We find that computing a strong Stackelberg equilibrium of an SGP is NP-hard even for zerosum games, and these are inapproximable in general. On the positive side, we find an exact solution technique for general SGPs based on an existing approach, and develop a PTAS (polynomial-time approximation scheme) for zero-sum SGP to more fundamentally overcome the computational obstacle. Our experiments demonstrate the value of considering SGP and effectiveness of our algorithms.
Paul-Ehrlich-Straße G 26, 67663 Kaiserslautern, Germany
© Jiarui Gan 2020 | Powered by Skeleton