Research Topics
We are researching sequential decision-making problems in the field of machine learning.
These are problems where optimal actions must be selected while dynamically collecting data,
especially in situations where sufficient prior information is unavailable.
Such problems arise in a wide range of applications, from clinical trials for new drugs to recommendation systems and product pricing.
Our goal is to mathematically model these challenges and develop efficient algorithms through both theoretical analysis and empirical studies.
Algorithms and Theory of Bandit Problems
The multi-armed bandit problem is one of the most fundamental frameworks for sequential decision-making.
It models a gambler playing multiple slot machines (or "arms") with unknown reward distributions.
The goal is to estimate these distributions sequentially while deciding which arm to play next, in order to maximize the total reward or to identify the arm with the highest expected reward.
This problem has been widely applied in recommendation systems, where items with unknown click-through or purchase rates are presented to users.
Other applications include channel selection in wireless communication, agricultural condition optimization, and more.
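As a concrete illustration of the stochastic version of this problem, the following minimal sketch implements the classical UCB1 index policy (Auer et al., 2002) on Bernoulli arms; the reward probabilities, horizon, and seed are illustrative assumptions, not parameters from any application above.

```python
import math
import random

def ucb1(means, horizon=10_000, seed=0):
    """Minimal UCB1 sketch on Bernoulli arms (illustrative parameters only)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # play each arm once to initialize
        else:
            # pick the arm with the largest upper confidence bound
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

reward, pulls = ucb1([0.3, 0.5, 0.7])
print(reward, pulls)   # pulls should concentrate on the 0.7 arm over time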
Research on this topic dates back to the 1950s.
We develop practical algorithms for this problem through both theoretical and empirical approaches, drawing on tools from probability theory, statistics, optimization, and information theory.
We also aim to theoretically clarify the fundamental limits of achievable performance in this setting.
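One canonical example of such a fundamental limit, stated here as standard background rather than as a result of ours, is the Lai-Robbins lower bound for the stochastic setting: under regularity conditions, any uniformly good policy must incur expected regret satisfying

```latex
\liminf_{T \to \infty} \frac{\mathbb{E}[R_T]}{\log T}
  \;\ge\; \sum_{i:\, \Delta_i > 0} \frac{\Delta_i}{\mathrm{KL}(\nu_i, \nu^*)},
```

where $R_T$ is the regret after $T$ rounds, $\Delta_i$ is the gap between the expected reward of arm $i$ and that of the best arm, and $\mathrm{KL}(\nu_i, \nu^*)$ is the Kullback-Leibler divergence between the reward distribution of arm $i$ and that of the best arm.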
Two primary settings are commonly considered: the stochastic setting and the adversarial setting.
In the stochastic setting, rewards are drawn from fixed probability distributions, whereas in the adversarial setting, an adversary generates rewards to minimize the player's gain.
Our research has mainly focused on designing algorithms and building theoretical foundations within the stochastic setting.
However, best-of-both-worlds strategies—those that achieve optimal performance in both settings—have recently attracted significant attention, and we are actively working on developing such algorithms as well.
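To make the contrast between the two settings concrete, the sketch below implements EXP3 (Auer et al., 2002), a standard algorithm for the adversarial setting; it is a minimal illustration under the assumption of rewards in [0, 1], not one of the best-of-both-worlds methods mentioned above, and the exploration rate and toy adversary are hypothetical.

```python
import math
import random

def exp3(reward_fn, k, horizon, gamma=0.1, seed=0):
    """Minimal EXP3 sketch for adversarial bandits with rewards in [0, 1].

    reward_fn(t, arm) may be chosen adversarially; gamma is an
    illustrative exploration rate, not a tuned value.
    """
    rng = random.Random(seed)
    weights = [1.0] * k
    total = 0.0
    for t in range(horizon):
        w_sum = sum(weights)
        # mix the weight-proportional distribution with uniform exploration
        probs = [(1 - gamma) * w / w_sum + gamma / k for w in weights]
        arm = rng.choices(range(k), weights=probs)[0]
        reward = reward_fn(t, arm)
        total += reward
        # importance-weighted estimate keeps the reward estimate unbiased
        est = reward / probs[arm]
        weights[arm] *= math.exp(gamma * est / k)
        # rescale to avoid numerical overflow; probs are unchanged
        m = max(weights)
        weights = [w / m for w in weights]
    return total

# toy adversary: arm 0 is good in even rounds, arm 1 in odd rounds
print(exp3(lambda t, a: float(a == t % 2), k=2, horizon=10_000))
```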
Modeling of Various Decision-making Problems
In practice, only a few real-world sequential decision-making problems can be formulated as simple bandit problems.
In many cases, the relationship between the available actions and the information obtained is much more complex.
For example, in dynamic pricing problems, a buyer's valuation of a product cannot be directly observed; instead, only whether the buyer chooses to purchase the product at a given price is observable.
Our goal is to develop practical algorithms by appropriately modeling such real-world problems, which often involve diverse and complex characteristics.
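One simple way to cast the pricing example above as a bandit problem, sketched below under hypothetical assumptions (a fixed price grid and a logistic demand curve), is to treat each candidate price as an arm, observe only the buy/no-buy outcome at the posted price, and run Thompson sampling on each price's unknown purchase probability.

```python
import math
import random

def thompson_pricing(prices, buy_prob, horizon=5_000, seed=0):
    """Thompson-sampling pricing sketch: each candidate price is an arm,
    and only the buy/no-buy outcome at the posted price is observed."""
    rng = random.Random(seed)
    k = len(prices)
    alpha = [1.0] * k   # Beta(1, 1) prior on each price's purchase rate
    beta = [1.0] * k
    counts = [0] * k
    for _ in range(horizon):
        # sample a purchase rate per price, then post the price that
        # maximizes sampled expected revenue (price * sampled rate)
        samples = [rng.betavariate(alpha[j], beta[j]) for j in range(k)]
        i = max(range(k), key=lambda j: prices[j] * samples[j])
        bought = rng.random() < buy_prob(prices[i])
        alpha[i] += bought        # Beta posterior update from the
        beta[i] += 1 - bought     # binary purchase observation
        counts[i] += 1
    return counts

# illustrative demand curve: purchase probability decays with price
demand = lambda p: 1.0 / (1.0 + math.exp(2.0 * (p - 5.0)))
print(thompson_pricing([2.0, 3.0, 4.0, 5.0, 6.0, 7.0], demand))
```

The counts should concentrate on prices near the revenue-maximizing point of the demand curve, even though the buyer's valuation itself is never observed.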
Decision Making by Reinforcement Learning
Reinforcement learning (RL) is widely used to acquire effective decision-making policies in complex environments such as game AI and robot control.
As is often noted in textbooks, bandit problems can, at least in principle, be viewed as one of the simplest classes of tasks in RL.
In practice, however, there is a certain division between the two research communities: bandit research tends to focus on relatively simple tasks with strong theoretical guarantees, whereas RL research typically tackles more complex tasks, for which theoretical guarantees are difficult to establish and experimental approaches are more commonly employed.
We aim to bridge this gap by applying RL to problems that have traditionally been studied through theoretical methods, such as clinical trials for new drugs, and by examining the respective advantages and limitations of bandit-based and RL-based approaches.
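As a minimal illustration of the textbook relationship noted above, tabular Q-learning on a single-state environment reduces to incremental estimation of each arm's mean reward, i.e., to a bandit algorithm; the Bernoulli environment and the epsilon-greedy exploration rate below are illustrative assumptions.

```python
import random

def q_learning_one_state(means, horizon=10_000, eps=0.1, seed=0):
    """Q-learning with one state and discount zero: the update
    Q[a] += alpha * (r - Q[a]) with alpha = 1/n(a) is exactly an
    incremental sample mean, so the agent is solving a bandit problem."""
    rng = random.Random(seed)
    k = len(means)
    q = [0.0] * k
    n = [0] * k
    for _ in range(horizon):
        if rng.random() < eps:
            a = rng.randrange(k)                   # explore
        else:
            a = max(range(k), key=q.__getitem__)   # exploit
        r = 1.0 if rng.random() < means[a] else 0.0
        n[a] += 1
        q[a] += (r - q[a]) / n[a]   # alpha = 1/n(a): sample-mean update
    return q

print(q_learning_one_state([0.3, 0.5, 0.7]))  # q approaches the true means
```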