Resetting in Stochastic Optimal Control
Benjamin De Bruyne and Francesco Mori
PHYSICAL REVIEW RESEARCH 5, 013122 (2023)
(Received 11 May 2022; revised 6 July 2022; accepted 30 October 2022; published 16 February 2023)
“When in a difficult situation, it is sometimes better to give up and start all over again.” While this empirical
truth has been regularly observed in a wide range of circumstances, quantifying the effectiveness of such a
heuristic strategy remains an open challenge. In this paper, we combine the notions of optimal control and
stochastic resetting to address this problem. The emerging analytical framework allows one not only to measure
the performance of a given restarting policy, but also to obtain the optimal strategy for a wide class of dynamical
systems. We apply our technique to a system with a final reward and show that the reward value must be larger
than a critical threshold for resetting to be effective. Our approach, analogous to the celebrated Hamilton-Jacobi-
Bellman paradigm, provides the basis for the investigation of realistic restarting strategies across disciplines. As
an application, we show that the framework can be applied to an epidemic model to predict the optimal lockdown
policy.
DOI: 10.1103/PhysRevResearch.5.013122
Finding the optimal strategy to operate a complex system is a longstanding problem and has attracted a lot of attention in recent decades. Since the seminal works of Pontryagin [1] and Bellman [2], optimal control theory has received renewed interest due to its applications in a wide range of contexts, such as artificial intelligence [3] and finance [4]. In a typical setting, optimal control considers a system whose state at time t can be represented by a d-dimensional vector x(t). For instance, the state x(t) could correspond to the degrees of freedom of an autonomous robot or the asset values in a financial portfolio. The system typically evolves in time following a deterministic law, e.g., the laws of motion for mechanical systems or the law of supply and demand for financial markets. Oftentimes, the mathematical modeling of these laws is prohibitively expensive and one introduces a stochastic contribution to account for the missing information on the environment. Given the laws of motion, optimal control aims to operate the system in the best possible way by using an external control, e.g., actuators for robots or market orders in finance.

One of the simplest ways to describe analytically the evolution in time of the system x(t) is a first-order differential equation of the form ẋ(t) = f(x, t). This law is often a simplified description of the system and a source of Gaussian white noise η(t) is introduced to capture the fluctuations around the deterministic model. In addition, the external control on the system is usually modeled as a drift u(x, t). Summing up these contributions, the full mathematical description of the system is given by

ẋ(t) = f(x, t) + √(2D) η(t) + u(x, t),   (1)

where 2D is the strength of the noise. The external control u(x, t) can be tuned to achieve a given goal, e.g., performing a task for a robot or generating profits in finance. Of course, controlling the system will generate operating costs, such as electrical consumption or transaction fees. Optimally controlling the system requires balancing a trade-off between high rewards, measured over time by a function R(x, t), and low operating costs, often taken to be proportional to u²(x, t). To be precise, for a system located at position x at time t, the reward in a small time interval dt is R(x, t)dt and the cost is u²(x, t)dt/2.

In principle, solving this optimization problem is intrinsically difficult due to the high dimensionality of the space of solutions. Remarkably, Bellman introduced a general way to solve this problem, known as dynamical programming, which consists in breaking down the optimization into simpler subproblems in a recursive manner such that the present action is taken to maximize the future outcome. In doing so, the key quantity to keep track of is the optimal payoff J(x, t), defined as the expected payoff for an optimally controlled system located at x at time t. Using this framework, one can show that the optimal control is simply given by u*(x, t) = ∇_x J(x, t), driving the system towards the direction in which the payoff increases the most. The optimal payoff J(x, t) satisfies the celebrated Hamilton-Jacobi-Bellman (HJB) equation [5],

−∂_t J = D Δ_x J + f · ∇_x J + (1/2)(∇_x J)² + R,   (2)

where Δ_x and ∇_x are, respectively, the Laplacian and the gradient operators. Here, for convenience, we dropped the x and t dependence in the functions in Eq. (2). The quadratic term (∇_x J)² renders this equation nonlinear and difficult to solve for arbitrary reward functions. Nevertheless, there exist few analytically solvable cases. For instance, in the case of d = 1, where f(x, t) = 0 and R(x, t) = −α(x − x_f)² δ(t − t_f)/2, the optimal control has the simple form u*(x, t) = −α(x − x_f)/[1 + α(t_f − t)], which, in the limit α → ∞, is reminiscent of the effective force to generate bridge Brownian motion [6]. This optimal control continuously drives the system to maximize the final reward by arriving close to the target x_f at time t_f. In more realistic systems, one has to rely on numerical methods to solve Eq. (2) [7].
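As a concrete illustration of this solvable case, the following minimal sketch integrates Eq. (1) with f = 0 by an Euler-Maruyama scheme under the closed-form control u*(x, t) = −α(x − x_f)/[1 + α(t_f − t)] quoted above, and checks that the controlled trajectories end near the target x_f. The parameter values and the diagnostic printed at the end are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch (assumed parameters): Euler-Maruyama integration of Eq. (1) with f = 0
# under the closed-form optimal control for the quadratic final reward quoted in the text.
rng = np.random.default_rng(0)
D, alpha, x_f, t_f, dt = 1.0, 50.0, 1.0, 1.0, 1e-3

def optimal_control(x, t):
    # u*(x, t) = -alpha (x - x_f) / [1 + alpha (t_f - t)]
    return -alpha * (x - x_f) / (1.0 + alpha * (t_f - t))

def final_position(x0=0.0):
    x, t = x0, 0.0
    while t < t_f:
        x += optimal_control(x, t) * dt + np.sqrt(2.0 * D * dt) * rng.standard_normal()
        t += dt
    return x  # close to x_f when alpha is large (bridge-like behavior)

finals = [final_position() for _ in range(200)]
print(f"mean final position: {np.mean(finals):.3f} (target x_f = {x_f})")
```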
Ideas from optimal control have also been proven successful in different areas of physics [8–10]. Moreover, stochastic optimal control has been applied to a variety of systems, such as supply-chain planning [11], swarms of drones [12], and fluctuating interfaces [13]. These systems all have in common that the optimal control can be orchestrated as a coordinated sequence of infinitesimal local changes. However, numerous systems do not fall in this class and require global changes to be optimally managed. A timely example of such circumstances is the Covid-19 crisis, during which the main control policies have been global measures such as regional and national lockdowns. Other instances arise in the context of search processes, both in the cases of computer algorithms [14] and time-sensitive rescue missions [15]. In the latter situations, a common and widely observed strategy is to stop and resume the operations from a predefined location. Such situations particularly arise in the context of search processes [16,17], chemical reactions [18], and catastrophes in population dynamics [19,20]. Unfortunately, the HJB framework is not well suited to study such resetting protocols. Indeed, resetting is known to be quite different from a local force and exhibits interesting features, including out-of-equilibrium dynamics [21–23], dynamical phase transitions [24–26], and nontrivial first-passage properties [16,17] (for a recent review, see [27]). This observation naturally called into question the existence of an analytical framework to devise the optimal resetting control policy.

In this paper, we combine the notions of stochastic resetting and optimal control into resetting optimal control, which provides a natural framework to operate systems through stochastic restarts. Our goal is not to provide an accurate description of a specific system, but rather to consider a minimal model to explore resetting in optimal control. To model resetting policies, we exchange the control force u(x, t) for a resetting rate r(x, t). In a small time interval dt, the state of the system is reset to a predefined location x_res with probability p = r(x, t)dt and evolves freely otherwise. In sum, the dynamical system evolves according to

x(t + dt) = { x_res                                    with prob. p,
              x(t) + f(x, t)dt + √(2D) η(t)dt          with prob. 1 − p,     (3)

where the subscript "res" in x_res stands for "resetting location." Similarly to the HJB framework, we aim to find the optimal resetting rate r, as a function of x and t, that balances the trade-off between high rewards, measured over time by the function R(x, t), and low operating costs, which depend on r(x, t). To mathematically pose the optimization problem, we naturally extend the HJB paradigm and define the following payoff functional:

F_{x_0,t}[r] = ⟨ ∫_t^{t_f} dτ [R(x(τ), τ) − c(x(τ), τ) r(x(τ), τ)] ⟩_{x_0},   (4)

where c(x, τ) is the cost associated to resetting, t_f is the time horizon up to which the system is controlled, and the symbol ⟨·⟩_{x_0} indicates the average over all stochastic trajectories starting from x_0 at time t and evolving according to Eq. (3). Note that the payoff F is a functional and depends on the full function r.
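For a fixed trial policy r(x, t), the payoff functional (4) can be estimated directly by a Monte Carlo average over realizations of Eq. (3). The sketch below does exactly this; the drift, reward, cost, and trial resetting rate chosen there are illustrative assumptions and not the paper's optimal policy.

```python
import numpy as np

# Minimal sketch (assumed model): Monte Carlo estimate of the payoff functional of Eq. (4)
# under the resetting dynamics of Eq. (3) for a hand-picked trial resetting rate r(x, t).
rng = np.random.default_rng(1)
D, dt, t_f, x_res = 1.0, 1e-3, 1.0, 0.0

f = lambda x, t: 0.0                                 # drift: free diffusion (assumption)
R = lambda x, t: -x**2                               # running reward (assumption)
c = lambda x, t: 0.5                                 # cost per resetting event (assumption)
r = lambda x, t: 2.0 if abs(x) > 1.0 else 0.0        # trial resetting rate (assumption)

def payoff(x0, t0=0.0):
    """One realization of the integrand of Eq. (4) along a trajectory of Eq. (3)."""
    x, t, F = x0, t0, 0.0
    while t < t_f:
        F += R(x, t) * dt
        if rng.random() < r(x, t) * dt:              # reset with probability r(x, t) dt
            F -= c(x, t)                             # pay the resetting cost
            x = x_res
        else:                                        # otherwise evolve freely
            x += f(x, t) * dt + np.sqrt(2.0 * D * dt) * rng.standard_normal()
        t += dt
    return F

print(np.mean([payoff(0.0) for _ in range(500)]))    # estimate of F_{x0=0, t=0}[r]
```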
Remarkably, we find that the optimal resetting policy r* that maximizes F is bang-bang, resulting in an impulsive control strategy [5]: the system is reset with probability one if its state is outside of a time-dependent domain Ω(t) and evolves freely otherwise:

r*(x, t)dt = { 0  if x ∈ Ω(t),
               1  if x ∉ Ω(t).     (5)

The domain Ω(t) evolves according to

Ω(t) = {x : J(x, t) ≥ J(x_res, t) − c(x, t)},   (6)

where the optimal payoff function J(x, t) = max_r F_{x,t}[r] is the solution of the differential equation

−∂_t J = D Δ_x J + f · ∇_x J + R,   x ∈ Ω(t).   (7)

Equation (7) must be solved jointly with Eq. (6) starting from the final condition J(x, t_f) = 0 and evolving backward in time with the Neumann boundary condition ∇_x J(x, t) · n(x) = 0, where n(x) is the normal unit vector to the boundary. Outside of the domain Ω(t), the solution is given by J(x, t) = J(x_res, t) − c(x, t). The definition of the domain Ω(t) in Eq. (6) has a clear interpretation: at any given time, the optimal policy is to restart the system if its expected payoff is less than the one at the resetting location minus the cost incurred for a restart. The emerging framework outlined in Eqs. (5)–(7) is the main result of this paper and provides a general method to optimally control stochastic systems through restarts. The derivation of this result is presented in Appendix A.

Going from Eq. (4) to Eq. (7), we have reduced a functional optimization problem to a partial differential equation, which is often easier to solve. Note, however, that the mathematical problem in Eqs. (6) and (7) is of a special kind, as the evolution of the domain of definition Ω(t) is coupled to the solution J(x, t) of the differential equation. This kind of equation belongs to a class known as Stefan problems [28], which often arise in the field of heat transfer, where one studies the evolution of an interface between two phases, e.g., ice and water on a freezing lake. In this context, one must solve the heat equation for the temperature profile with an interface between the water and ice phases, which moves according to the temperature gradient. The interface is therefore to be considered as an additional unknown function, which must be jointly solved with the differential equation. To draw an analogy with our case, the optimal payoff function J(x, t) plays the role of the temperature and the boundary of the domain Ω(t) corresponds to the water-ice interface. Note, however, that the two Stefan problems have different boundary conditions. The domain Ω(t) can be obtained by solving the Stefan problem in Eqs. (6) and (7) numerically. This can be achieved, for instance, by using an Euler explicit scheme with a space-time discretization and updating the domain Ω(t) at each time step according to Eq. (6).
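The sketch below spells out one such explicit scheme in d = 1 for the final-reward example discussed next: Eq. (7) is stepped backward in time on a grid and, after every step, grid points violating the condition of Eq. (6) are projected onto the reset value J(x_res, t) − c. The narrow Gaussian standing in for the δ reward, the projection treatment of the moving boundary (rather than an explicit Neumann condition), and all parameter values are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumed setup): explicit Euler integration of Eq. (7) backward in time
# in d = 1 with f = 0, updating the no-reset domain Omega(t) from Eq. (6) at each step.
D, c, alpha, x_f, t_f = 1.0, 1.0, 10.0, 1.0, 1.0
x_res, sigma = 0.0, 0.05                      # reset point; width regularizing the delta reward
dx, dt = 0.02, 1e-4                           # explicit scheme is stable since D*dt/dx**2 <= 1/2
x = np.arange(-3.0, 3.0 + dx, dx)
i_res = int(np.argmin(np.abs(x - x_res)))

# Final condition: J(x, t_f) given by a narrow Gaussian standing in for alpha*delta(x - x_f).
J = alpha * np.exp(-(x - x_f) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

t = t_f
while t > 0:
    lap = np.zeros_like(J)
    lap[1:-1] = (J[2:] - 2 * J[1:-1] + J[:-2]) / dx**2
    J = J + dt * D * lap                      # one backward step of -dJ/dt = D d^2J/dx^2
    J[J < J[i_res] - c] = J[i_res] - c        # outside Omega(t): J = J(x_res, t) - c, cf. Eq. (6)
    t -= dt

inside = J > J[i_res] - c                     # interior of the no-reset domain at t = 0
print("Omega(0) spans roughly", round(x[inside].min(), 2), "to", round(x[inside].max(), 2))
```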
The domain Ω(t) is illustrated in Fig. 1 for the case of a one-dimensional random walk with a final reward. Such a situation corresponds to rewarding the system by a finite amount for arriving at some target location at the final time, while penalizing it with a unit cost for each resetting event.
More precisely, we take a Dirac δ final reward R(x, t) = αδ(x − x_f)δ(t − t_f) (see Appendix C), which rewards the system by an amount α for being at the target position x_f at time t_f, while penalizing it with a constant cost c for each resetting event. Note that the system needs to arrive at the location x_f exactly at time t_f to get the reward, while earlier visits do not provide any additional benefits (as for a rendezvous, where one needs to be at the right place at the right time). The δ reward function has to be understood as the continuum limit of a Gaussian function whose width tends to zero [32]. Before presenting the optimal policy for this problem, it is instructive to consider two limiting cases. For α → 0, one should never reset as the reward is not worth the resetting cost. On the other hand, for α → ∞, the cost of resetting becomes negligible and the optimal strategy is to reset if restarting would bring the system closer to x_f, independently of time. Interestingly, we observe that the crossover between these two regimes occurs as a sharp transition at the critical value α = α_c, where α_c = x_f c √(2πe) ≈ 4.13273 x_f c, which we predict analytically (see Appendix C). For α < α_c, the optimal policy is to never reset, corresponding to Ω(t) = ℝ for all 0 ≤ t ≤ t_f. The situation is more subtle for α > α_c, where resetting is only favorable in a specific time window before t_f. To describe this window, it is convenient to introduce the backward time τ = t_f − t. We find that no boundary is present for τ < τ*, where τ* is the smallest positive solution of the transcendental equation α e^{−x_f²/(4Dτ*)} = c √(4πDτ*). At τ = τ*, a boundary appears and one must resort to numerical integration techniques to find the solution for τ > τ* (see Fig. 1). We observe numerically that the boundary evolves with τ, i.e., backward in time, in a nonmonotonic way and eventually disappears. This optimal policy can be understood as follows. Close to t_f, where τ < τ*, it is unlikely for the system to reach the target location x_f from the origin in the remaining time. Thus, it is not convenient to reset. On the other hand, for very early times, it is not yet necessary to reset since, after resetting, the system would typically evolve away from the target.
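Both the critical reward α_c and the onset time τ* follow from elementary numerics. The sketch below evaluates α_c = x_f c √(2πe) and finds the smallest positive root of the transcendental equation by bracketing it below τ = x_f²/(2D), where the ratio of the two sides of the equation is maximal; the specific α, D, c, and x_f used are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq

# Minimal sketch (assumed parameters): the critical reward alpha_c and the onset time tau*
# of the resetting window, from the formulas quoted in the text.
D, c, x_f = 1.0, 1.0, 1.0
alpha_c = x_f * c * np.sqrt(2 * np.pi * np.e)
print(f"alpha_c ~ {alpha_c:.5f} x_f c")                       # ~ 4.13273 x_f c

def g(tau, alpha):
    # alpha * exp(-x_f^2 / (4 D tau)) - c * sqrt(4 pi D tau): tau* is its smallest positive zero.
    return alpha * np.exp(-x_f**2 / (4 * D * tau)) - c * np.sqrt(4 * np.pi * D * tau)

alpha = 10.0                                                  # > alpha_c, so a root exists
tau_peak = x_f**2 / (2 * D)                                   # where the ratio of the two sides peaks
tau_star = brentq(g, 1e-6, tau_peak, args=(alpha,))
print(f"tau* ~ {tau_star:.4f} for alpha = {alpha}")
```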
Our framework can be easily generalized to other resetting dynamics. For instance, in light of the recent experiments of stochastic resetting on colloidal particles [33–35], it is relevant to consider the case where the new state of the system x′ after a resetting event is drawn at random from some probability distribution P_R(x′|x), which can eventually depend on the current x of the system. In practice, it is physically impossible to reset the particles to a fixed location since resetting is performed by using an optic trap. In this case, our main results in Eqs. (5) and (7) remain valid, while the definition of the domain Ω(t) is modified as

Ω(t) = { x : J(x, t) ≥ ∫ dx′ [J(x′, t) P_R(x′|x)] − c(x, t) }.   (11)

Note that the case of a fixed resetting location x_res corresponds to P_R(x′|x) = δ(x′ − x_res). Similarly, one can extend the framework to describe situations in which a finite amount of time is required for each resetting event.

Finally, we illustrate the generality of our framework by applying it to the problem of finding the optimal lockdown policy to navigate a pandemic. We do not aim to provide an accurate description of a specific pandemic, but rather to illustrate the generality of our framework and to gain interesting insights. We consider a population of N individuals and we model the evolution of the epidemic with the susceptible-infectious-recovered (SIR) model, which is one of the simplest compartmental models in epidemiology [36]. Previous works have approached this problem by continuously controlling the infection rate [37]. However, in real situations, one cannot adapt the restrictions in a continuous way and our framework is well suited to describe discontinuous policies, such as lockdowns. In Appendix D, we show that our technique can be directly applied to this problem to find the optimal time to impose a lockdown.
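Appendix D is not reproduced in this text, so the following is only a stand-in illustration of the kind of question asked there: a deterministic SIR model in which a single lockdown of fixed duration rescales the infection rate, with the lockdown start time chosen by a brute-force scan. The objective (minimizing the peak infected fraction), the lockdown model, and all parameter values are assumptions made for this example, not the paper's.

```python
import numpy as np

# Stand-in illustration (all modeling choices assumed): SIR dynamics with a single lockdown
# of fixed duration that multiplies the infection rate by a reduction factor, scanned over
# the lockdown start time to flatten the infection peak.
N, beta, gamma, dt, T = 1.0, 0.4, 0.1, 0.01, 300.0
lock_duration, lock_factor = 30.0, 0.3

def peak_infected(t_lock):
    S, I, Rec = 0.99, 0.01, 0.0
    peak, t = I, 0.0
    while t < T:
        b = beta * lock_factor if t_lock <= t < t_lock + lock_duration else beta
        dS = -b * S * I / N * dt
        dRec = gamma * I * dt
        dI = -dS - dRec
        S, I, Rec = S + dS, I + dI, Rec + dRec
        peak, t = max(peak, I), t + dt
    return peak

t_locks = np.arange(0.0, 100.0, 2.0)
best = min(t_locks, key=peak_infected)
print(f"lockdown start minimizing the infection peak: t = {best:.0f}")
```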
In sum, we combined optimal control and stochastic resetting to address the effectiveness of restarting policies. The emerging framework, contained in Eqs. (6) and (7), provides a unifying paradigm to obtain the optimal resetting strategy for a wide class of dynamical systems. Our method can be generalized to discrete-time systems and quadratic costs associated with resetting (see Appendix E). Furthermore, it is a simple exercise to include a continuous control policy in addition to resetting to account for realistic systems. In addition, one would need to explore ways to solve the moving boundary problem in high dimensions, which might require approximation schemes. It would be interesting to investigate extensions to optimal stopping problems and to study cost functions that are first-passage functionals [38], for instance where the time horizon is a first-passage time. This would be particularly relevant in the context of search processes.

We warmly thank F. Aguirre-Lopez, S. N. Majumdar, S. Redner, A. Rosso, G. Schehr, E. Trizac, and L. Zadnik for fruitful comments and feedback. This work was partially supported by the Luxembourg National Research Fund (FNR) (App. No. 14548297).

APPENDIX A: DERIVATION OF EQS. (6) AND (7)

In this Appendix, we derive our main result given in Eqs. (6) and (7) of the main text. This derivation is based on a dynamical programming argument (also known as a backward approach). We consider evolving the system from time t to t + dt. According to the equations of motion in Eq. (3) of the main text, the state of the system either (i) evolves from x to x + f(x, t)dt + √(2D) η(t)dt with probability 1 − r(x, t)dt or (ii) is reset to position x_res with probability r(x, t)dt. In the time interval dt, the payoff in Eq. (4) of the main text changes by the amount R(x, t)dt − c(x, t)r(x, t)dt. For the subsequent evolution of the process from t + dt to t_f, the new initial value is either x + f(x, t)dt + √(2D) η(t)dt in the former case or x_res in the latter case. Following this argument and using the notation F[r | x, t] ≡ F_{x,t}[r], we obtain

F[r | x, t] = R(x, t)dt − c(x, t)r(x, t)dt + ⟨F[r | x + f(x, t)dt + √(2D) η(t)dt, t + dt]⟩ × [1 − r(x, t)dt] + F[r | x_res, t + dt] r(x, t)dt,   (A1)

where ⟨·⟩ is an average over the noise realizations η(t). The goal is now to maximize both sides over the full function r. Note that the functional F on the right-hand side of Eq. (A1) does not depend on the value of the function r(x, t) at time t.
(A2)

where we used the definition of J(x, t), expanded to first order in dt, and averaged over η(t). We are left with maximizing a linear function of the scalar variable r(x, t). Given that r(x, t) is a Poissonian resetting rate, it must be positive and less than 1/dt since r(x, t)dt is a probability. This immediately gives the expression for the bang-bang optimal control policy r*(x, t) given in Eq. (5) in the main text along with the domain Ω(t). Plugging this back into Eq. (A2), we obtain the moving boundary problem introduced in the main text.

The Neumann boundary condition ∇_x J(x, t) · n(x) = 0 can be obtained upon a careful analysis of Eq. (A1) close to the boundary. We start from Eq. (A1) and write

F[r | x, t] = R(x, t)dt − c(x, t)r(x, t)dt + ⟨F[r | x + f(x, t)dt + √(2D) η(t)dt, t + dt]⟩ × [1 − r(x, t)dt] + F[r | x_res, t + dt] r(x, t)dt.   (A3)

We set x ∈ ∂Ω(t) and we expand Eq. (A3) to the first nontrivial order in dt. The main difference with the case x ∈ Ω(t), discussed above, is that the linear term in η(t) does not vanish as the gradient of the payoff functional with respect to x vanishes outside of the domain Ω(t). We obtain

0 = √(2D) ∇_x F[r | x, t] · ⟨η(t)dt 1[x + √(2D) η(t)dt ∈ Ω(t)]⟩,   (A4)

where 1 is the indicator function and the average over η(t) is of the order of O(√dt). By decomposing η(t) over the parallel and perpendicular components of the normal direction n(x) to the boundary at x, Eq. (A4) immediately gives the boundary condition ∇_x F[r | x, t] · n(x) = 0, which is, by definition, also valid for J.
APPENDIX B: FIRST-PASSAGE RESETTING IN A FINITE INTERVAL

In this Appendix, we show that the optimal policy in the special case of discounted payoffs can be obtained within the framework of first-passage resetting. We consider a one-dimensional diffusive particle with diffusion coefficient D whose position x belongs to the finite interval [−L, L] with L > 0. We assume that when the particle reaches any of the interval boundaries, it is instantaneously reset to the origin x_res = 0. We assume that at the initial time, the particle is at position x_0, with −L < x_0 < L. In analogy with the discounted payoff in the main text, we associate a discounted reward R(x, t) = e^{−βt} R(x) and a discounted cost of resetting, c(x, t) = e^{−βt} C(x), to the trajectory of the particle. Considering an infinite time horizon, the average payoff is therefore given by

J(x_0|L) = ⟨ ∫_0^∞ dτ e^{−βτ} R(x(τ)) − Σ_{i=1}^∞ e^{−βt_i} C(x(t_i)) ⟩_{x_0},   (B1)

where t_i is the time of the ith resetting event and the average is computed over all trajectories {x(τ)} with x(0) = x_0. In particular, for R(x) = −αx² and C(x) = c, we obtain

J(x_0|L) = −α ∫_0^∞ dτ e^{−βτ} ⟨x(τ)²⟩_{x_0} − c ⟨ Σ_{i=1}^∞ e^{−βt_i} ⟩_{x_0}.   (B2)

The two average quantities can be computed by standard renewal techniques and one obtains

J(x_0|L) = −α [ 2D/β² + x_0²/β − (L²/β) cosh(x_0√(β/D)) / (cosh(L√(β/D)) − 1) ] − c cosh(x_0√(β/D)) / (cosh(L√(β/D)) − 1).   (B3)

Upon minimizing over L and setting β = D = 1, we recover the expression in Eqs. (9) and (10) in the main text. The optimal payoff in Eq. (9) in the main text is shown as a function of the cost-reward ratio in Fig. 3. The optimal payoff decreases monotonically as the cost-reward ratio is increased.

FIG. 3. The rescaled optimal expected payoff J(x) for a one-dimensional random walk starting at x = 0 for the discounted reward R(x) = −αx² and cost C(x) = c as a function of the cost-reward ratio v = c/α.
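As a quick numerical companion to this appendix, the sketch below evaluates Eq. (B3) at x_0 = 0 for β = D = 1 on a grid of half-widths L and selects the value of L that optimizes the (negative) payoff, i.e., makes it least negative; the particular α, c, and grid are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumed parameters): evaluate the discounted payoff of Eq. (B3) at x_0 = 0
# with beta = D = 1 and pick the interval half-width L with the largest (least negative) payoff.
beta = D = 1.0
alpha, c, x0 = 1.0, 0.5, 0.0

def payoff(L):
    ratio = np.cosh(x0 * np.sqrt(beta / D)) / (np.cosh(L * np.sqrt(beta / D)) - 1.0)
    return -alpha * (2 * D / beta**2 + x0**2 / beta - (L**2 / beta) * ratio) - c * ratio

Ls = np.linspace(0.05, 20.0, 4000)
vals = payoff(Ls)
i = int(np.argmax(vals))
print(f"optimal half-width L ~ {Ls[i]:.3f}, payoff ~ {vals[i]:.4f} at cost-reward ratio v = {c/alpha}")
```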
APPENDIX C: DIRAC δ FINAL REWARD

In this Appendix, we consider the case of a Dirac δ final reward in d = 1, corresponding to R(x, t) = αδ(x − x_f)δ(t − t_f), where α > 0 is the magnitude of the reward and x_f is the target location. We also assume that f(x, t) = 0, c(x, t) = c > 0, and we consider the optimal payoff in the time-reversed dynamics, defined as I(x, τ) = J(x, t_f − τ). It is easy to show that I(x, τ) satisfies the diffusion equation,

∂_τ I(x, τ) = D ∂_{xx} I(x, τ),   (C1)

with initial condition I(x, τ = 0) = αδ(x − x_f). Assuming that no boundary appears for small τ (to be verified a posteriori), the optimal payoff function is the usual Gaussian profile,

I(x, τ) = α e^{−(x−x_f)²/(4Dτ)} / √(4πDτ).   (C2)
with c(x, τ) ≥ 0. Following the procedure outlined in the main text, we obtain the corresponding moving boundary problem. The differential equation (E2) must be solved by imposing the continuity of the solution and its derivative at the boundary of Ω(t). The optimal resetting rate r*(x, t) is no longer bang-bang and is given by

r*(x, t) = { 0                                   if x ∈ Ω(t),
             [J(x_res, t) − J(x, t)]/c(x, t)     if x ∉ Ω(t).     (E4)
where the control is the sequence of resetting probabilities, {p_m, . . . , p_n}, and the average on the right-hand side is taken over all system trajectories starting from x_m at step m. Following a similar approach as in the previous sections, we obtain that the optimal policy is given by

p_m(x) = { 0  if x ∈ Ω_m,
           1  if x ∉ Ω_m,     (F3)

where Ω_m ⊆ R^d is a time-dependent domain. The corresponding discrete moving boundary problem for the optimal payoff function, J(x, m) = max_{{p_m,...,p_n}} F[{p_m, . . . , p_n} | x, m], is given by

J(x, m) = R(x, m) + max{ J(x_res, m + 1) + c(x, m), ∫_{−∞}^{∞} dη q(η) J[x + f_m(x) + η, m + 1] },   (F4)

with

Ω_m = { x : ∫_{−∞}^{∞} dη p(η) J[x + f_m(x) + η, m + 1] ≥ J(x_res, m + 1) + c(x, m) }.   (F5)
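A grid implementation of this discrete-time recursion is straightforward; the sketch below assumes d = 1, f_m = 0, Gaussian step noise, a quadratic final reward, and, for the value of resetting, the sign convention of Eq. (6) in the main text (the cost is subtracted), whereas Eqs. (F4) and (F5) above are printed with +c(x, m). All parameter choices are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumed setup): backward dynamic programming for Eqs. (F3)-(F5) on a grid,
# with d = 1, f_m = 0, Gaussian step noise, and the reset value J(x_res, m+1) minus the cost.
n_steps, sigma, cost, x_res, x_f = 50, 0.2, 0.3, 0.0, 1.0
x = np.linspace(-3.0, 3.0, 301)
i_res = int(np.argmin(np.abs(x - x_res)))

# Row-stochastic Gaussian kernel approximating the integral over the step noise eta.
kernel = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma**2))
kernel /= kernel.sum(axis=1, keepdims=True)

J = -(x - x_f) ** 2                  # final payoff at the last step (assumed quadratic reward)
outside = np.zeros_like(x, dtype=bool)
for m in range(n_steps - 1, -1, -1):
    continue_value = kernel @ J                           # expected continuation value, cf. Eq. (F4)
    reset_value = J[i_res] - cost                         # value of resetting (assumed sign convention)
    outside = continue_value < reset_value                # complement of Omega_m, cf. Eqs. (F3) and (F5)
    J = np.where(outside, reset_value, continue_value)    # R(x, m) = 0 for m < n (assumption)

print(f"fraction of the grid outside Omega_0: {outside.mean():.2f}")
```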
[1] L. Pontryagin, V. Boltyanskii, R. Gamkrelidze, and E. Mishchenko, The Mathematical Theory of Optimal Processes (Interscience, New York, 1962).
[2] R. Bellman and R. E. Kalaba, Dynamic Programming and Modern Control Theory (Academic Press, New York, 1965).
[3] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach (Prentice Hall, Englewood Cliffs, NJ, 2003).
[4] H. Pham, Continuous-time Stochastic Control and Optimization with Financial Applications (Springer Science & Business Media, Berlin, 2009).
[5] R. Stengel, Optimal Control and Estimation (Dover, New York, 1993).
[6] S. N. Majumdar and H. Orland, Effective Langevin equations for constrained stochastic processes, J. Stat. Mech. (2015) P06039.
[7] H. J. Kappen, Linear theory for control of nonlinear stochastic systems, Phys. Rev. Lett. 95, 200201 (2005).
[8] T. Schmiedl and U. Seifert, Optimal finite-time processes in stochastic thermodynamics, Phys. Rev. Lett. 98, 108301 (2007).
[9] L. Bertini, A. De Sole, D. Gabrielli, G. Jona-Lasinio, and C. Landim, Macroscopic fluctuation theory, Rev. Mod. Phys. 87, 593 (2015).
[10] U. Boscain, M. Sigalotti, and D. Sugny, Introduction to the Pontryagin maximum principle for quantum optimal control, PRX Quantum 2, 030203 (2021).
[11] A. Dolgui, D. Ivanov, S. P. Sethi, and B. Sokolov, Scheduling in production, supply chain and Industry 4.0 systems by optimal control: fundamentals, state-of-the-art and applications, Intl. J. Product. Res. 57, 411 (2019).
[12] V. Gómez, S. Thijssen, A. Symington, S. Hailes, and H. J. Kappen, Real-time stochastic optimal control for multi-agent quadrotor systems, in Twenty-Sixth International Conference on Automated Planning and Scheduling, edited by A. Coles, A. Coles, S. Edelkamp, D. Magazzeni, and S. Sanner (AAAI Press, Palo Alto, CA, 2016).
[13] K. Khanin, S. Nechaev, G. Oshanin, A. Sobolevski, and O. Vasilyev, Ballistic deposition patterns beneath a growing KPZ interface, Phys. Rev. E 82, 061107 (2010).
[14] A. Montanari and R. Zecchina, Optimizing searches via rare events, Phys. Rev. Lett. 88, 178701 (2002).
[15] M. F. Shlesinger, Search research, Nature (London) 443, 281 (2006).
[16] M. R. Evans and S. N. Majumdar, Diffusion with stochastic resetting, Phys. Rev. Lett. 106, 160601 (2011).
[17] M. R. Evans and S. N. Majumdar, Diffusion with optimal resetting, J. Phys. A: Math. Theor. 44, 435001 (2011).
[18] S. Reuveni, M. Urbakh, and J. Klafter, Role of substrate unbinding in Michaelis-Menten enzymatic reactions, Proc. Natl. Acad. Sci. USA 111, 4391 (2014).
[19] A. Economou and D. Fakinos, A continuous-time Markov chain under the influence of a regulating point process and applications in stochastic models with catastrophes, Eur. J. Operat. Res. 149, 625 (2003).
[20] P. Visco, R. J. Allen, S. N. Majumdar, and M. R. Evans, Switching and growth for microbial populations in catastrophic responsive environments, Biophys. J. 98, 1099 (2010).
[21] V. Méndez and D. Campos, Transport properties and first-arrival statistics of random motion with stochastic reset times, Phys. Rev. E 93, 022106 (2016).
[22] S. Eule and J. J. Metzger, Nonequilibrium steady states of stochastic processes with intermittent resetting, New J. Phys. 18, 033006 (2016).
[23] F. Mori, S. N. Majumdar, and G. Schehr, Distribution of the time of the maximum for stationary processes, Europhys. Lett. 135, 30003 (2021).
[24] L. Kuśmierz, S. N. Majumdar, S. Sabhapandit, and G. Schehr, First order transition for the optimal search time of Lévy flights with resetting, Phys. Rev. Lett. 113, 220602 (2014).
[25] S. N. Majumdar, S. Sabhapandit, and G. Schehr, Dynamical transition in the temporal relaxation of stochastic processes under resetting, Phys. Rev. E 91, 052131 (2015).
[26] D. Campos and V. Méndez, Phase transitions in optimal search times: How random walkers should combine resetting and flight scales, Phys. Rev. E 92, 062115 (2015).
[27] M. R. Evans, S. N. Majumdar, and G. Schehr, Stochastic resetting and applications, J. Phys. A: Math. Theor. 53, 193001 (2020).
[28] J. Crank, Free and Moving Boundary Problems (Oxford University Press, New York, 1984).
[29] W. H. Fleming and H. M. Soner, Controlled Markov Processes and Viscosity Solutions (Springer Science & Business Media, New York, 2006).
[30] B. De Bruyne, J. Randon-Furling, and S. Redner, Optimization in first-passage resetting, Phys. Rev. Lett. 125, 050602 (2020).
[31] B. De Bruyne, J. Randon-Furling, and S. Redner, Optimization and growth in first-passage resetting, J. Stat. Mech. (2021) 013203.
[32] This is the same way as for the standard deviation of the white noise.
[33] O. Tal-Friedman, A. Pal, A. Sekhon, S. Reuveni, and Y. Roichman, Experimental realization of diffusion with stochastic resetting, J. Phys. Chem. Lett. 11, 7350 (2020).
[34] B. Besga, A. Bovon, A. Petrosyan, S. N. Majumdar, and S. Ciliberto, Optimal mean first-passage time for a Brownian searcher subjected to resetting: Experimental and theoretical results, Phys. Rev. Res. 2, 032029(R) (2020).
[35] B. Besga, F. Faisant, A. Petrosyan, S. Ciliberto, and S. N. Majumdar, Dynamical phase transition in the first-passage probability of a Brownian motion, Phys. Rev. E 104, L012102 (2021).
[36] W. Kermack and A. G. McKendrick, A contribution to the mathematical theory of epidemics, Proc. R. Soc. London A 115, 700 (1927).
[37] J. Samuel and S. Sinha, Optimal control in pandemics, Phys. Rev. E 103, L010301 (2021).
[38] S. N. Majumdar, Persistence in nonequilibrium systems, Curr. Sci. 89, 2076 (2005).