Deep Learning Matching Market
Sai Srivatsa Ravindranath^a, Zhe Feng^a, Shira Li^c, Jonathan Ma^b, Scott D. Kominers^c, and David C. Parkes^a

^a John A. Paulson School of Engineering and Applied Sciences, Harvard University
saisr, zhe_feng, parkes@g.harvard.edu
^b Harvard College
jonathan.q.ma@gmail.com
^c Harvard Business School
shirali@g.harvard.edu, kominers@fas.harvard.edu
Abstract
We initiate the study of deep learning for the automated design of two-sided matching
mechanisms. What is of most interest is to use machine learning to understand the possibility
of new tradeoffs between strategy-proofness and stability. These properties cannot be achieved
simultaneously, but the efficient frontier is not understood. We introduce novel differentiable
surrogates for quantifying ordinal strategy-proofness and stability and use them to train differen-
tiable matching mechanisms that map discrete preferences to valid randomized matchings. We
demonstrate that the efficient frontier characterized by these learned mechanisms is substantially
better than that achievable through a convex combination of baselines of deferred acceptance
(stable and strategy-proof for only one side of the market), top trading cycles (strategy-proof
for one side, but not stable), and randomized serial dictatorship (strategy-proof for both sides,
but not stable). This gives a new target for economic theory and opens up new possibilities for
machine learning pipelines in matching market design.
1 Introduction
Two-sided matching markets, classically used for settings such as high-school matching, medical
residents matching, and law clerk matching, and more recently used in online platforms such as
Uber, Lyft, Airbnb, and dating apps, play a significant role in today's world. As a result, there is considerable interest in designing better mechanisms for two-sided matching.
The seminal work of Gale and Shapley [1962] introduces a simple mechanism for stable, one-to-one
matching in two-sided markets—deferred-acceptance (DA)—which has been applied in many settings,
including doctor-hospital matching [Roth and Peranson, 1999], school choice [Abdulkadiroğlu and
Sönmez, 2003, Pathak and Sönmez, 2008, Abdulkadiroğlu et al., 2009], and cadet-matching [Sönmez
and Switzer, 2013, Sönmez, 2013]. The DA mechanism is stable, i.e., no pair of participants prefer
each other to their match (or to being unmatched, if they are unmatched in the outcome). However,
the DA mechanism is not strategy-proof (SP), and a participant can sometimes misreport their
preferences to obtain a better outcome (although it is SP for participants on one side of the market).
Although the DA mechanism is widely used, its failure of SP presents a challenge for two main reasons. First, it can lead to unfairness, where better-informed participants gain an advantage by knowing which misreport strategies can be helpful. Second, strategic behavior can lead to lower-quality and unintended outcomes, and to outcomes that are unstable with respect to true preferences.
In general, it is well-known that there must necessarily be a tradeoff between stability and
strategy-proofness: it is provably impossible for a mechanism to achieve both stability and strategy-
proofness [Dubins and Freedman, 1981, Roth, 1982]. A second example of a matching mechanism is
random serial dictatorship (RSD) [Abdulkadiroglu and Sönmez, 1998], which is typically adopted
for one-sided assignment problems rather than two-sided matching. When adapted to two-sided
matching, RSD is SP but not stable. In fact, a participant may even prefer to remain unmatched rather than accept their assigned match. A third example of a matching mechanism is the
top trading cycles (TTC) mechanism [Shapley and Scarf, 1974], also typically adopted for one-sided
assignment problems rather than problems of two-sided matching. In application to two-sided
matching, TTC is neither SP nor stable (although it is SP for participants on one side of the market).
There have been various research efforts to circumvent this impossibility result. Some relax the
definition of strategyproofness [Mennle and Seuken, 2021] while others characterize the constraints
under which stability and strategyproofness are achieved simultaneously [Kamada and Kojima,
2018, Hatfield et al., 2021, Hatfield and Milgrom, 2005]. The tradeoff between these desiderata
remains poorly understood beyond the existing point solutions of DA, RSD, and TTC. However, we
argue that real-world scenarios demand a more nuanced approach that considers both properties.
The case of the Boston school choice mechanism highlights the negative consequences of a lack of strategy-proofness, with some parents gaining an unfair advantage through manipulation [Abdulkadiroglu et al., 2006].
At the same time, the importance of stability in matching markets is well understood [Roth, 1991].
Recognizing this, and inspired by the success of deep learning in the study of revenue-optimal
auction design [Duetting et al., 2019, Curry et al., 2020, Shen et al., 2019, Rahme et al., 2020],
we initiate the study of deep learning for the design of two-sided matching mechanisms. We ask whether deep learning frameworks can enable a systematic study of this tradeoff. By answering this question affirmatively, we open up the possibility of using machine learning pipelines to create new opportunities for economic theory—seeking theoretical characterizations of mechanisms that
can strike a new balance between strategyproofness and stability.
We use a neural network to represent the rules of a matching mechanism, mapping preference
reports to a distribution over feasible matchings, and show how we can use an unsupervised learning
pipeline to characterize the efficient frontier for the design tradeoff between stability and SP. The
main methodological challenge in applying neural networks to two-sided matching comes from
handling the ordinal preference inputs (the corresponding inputs are cardinal in auction design) and
identifying suitable, differentiable surrogates for approximate strategy-proofness and approximate
stability.
We work with randomized matching mechanisms, for which the strongest SP concept is ordinal
strategy-proofness. This aligns incentives with truthful reporting, whatever an agent’s utility function
(i.e., for any cardinal preferences consistent with an agent’s ordinal preferences). Ordinal SP is
equivalent to the property of first-order stochastic dominance (FOSD) [Erdil, 2014], which suitably
defines the property that an agent has a better chance of getting their top, top-two, top-three, and
so forth choices when they report truthfully. As a surrogate for SP, we quantify during training the
degree to which FOSD is violated. For this, we adopt an adversarial learning approach, augmenting
the training data with defeating misreports that reveal the violation of FOSD. We also define a
suitable surrogate to quantify the degree to which stability is violated. This surrogate aligns with
the notion of ex ante stability—the strongest stability concept for randomized matching.
We train neural network representations of matching mechanisms by adopting stochastic gradient
descent (SGD) for loss functions that are defined on different convex combinations of the two
surrogate quantities, and construct the efficient frontier for stability and strategy-proofness for
different market settings. A further challenge with ordinal preference inputs as opposed to cardinal
inputs arises because ordinal preferences are discrete. In the present work, we simply enumerate the
possible misreports of an agent as a step when evaluating the derivative of error for a particular
training example. In contrast, the use of deep-learning approaches for the design of revenue-optimal
auctions works with a continuous space of agent valuations and gradient-ascent in this adversarial
step of identifying useful misreports [Duetting et al., 2019]. As discussed at the end of the paper,
the challenge that this presents in scaling to large numbers of agents can be resolved by assuming a
suitable structure on preference orderings in the domain, or on the language that is made available
to agents in reporting their preferences to the mechanisms.
Our main experimental results demonstrate that this novel use of deep learning can strike a
much better trade-off between stability and SP than that achieved by a convex combination of
the DA, TTC, and RSD mechanisms. Taken as a whole, these results suggest that deep learning
pipelines can be used to identify new opportunities for matching theory. For example, we identify
mechanisms that are provably almost as stable as DA and yet considerably more strategy-proof. We
also identify mechanisms that are provably almost as strategy-proof as RSD and yet considerably
more stable. These discoveries raise opportunities for future work in economic theory, in regard to
understanding the structure of these two-sided matching mechanisms as well as characterizing the
preference distributions for which this is possible.
2 Related Work
Dubins and Freedman [1981] and Roth [1982] show the impossibility of achieving both stability and
SP in two-sided matching. Alcalde and Barberà [1994] also show the impossibility of individually
rational, Pareto efficient, and SP allocation rules, and this work has been extended to randomized
matching [Alva and Manjunath, 2020]. RSD is SP but may not be stable or even individually
rational (IR) [Abdulkadiroglu and Sönmez, 1998]. We will see that the top trading cycles (TTC)
mechanism [Shapley and Scarf, 1974], when applied in a two-sided context, is only SP for one side,
and is neither stable nor IR. The DA mechanism [Gale and Shapley, 1962] is stable but not SP; see
also [Roth et al., 1993], who study the polytope of stable matchings. The stable improvement cycles
mechanism [Erdil and Ergin, 2008] achieves as much efficiency as possible on top of stability but
fails to be SP even for one side of the market. Finally, a series of results show that DA becomes SP
for both sides of the market in large-market limit contexts [Immorlica and Mahdian, 2015, Kojima
and Pathak, 2009, Lee, 2016].
Aziz and Klaus [2019] discuss different stability and no envy concepts. We focus on ex ante
stability [Kesten and Ünver, 2015], also discussed by [Roth et al., 1993] as strong stability. Mennle and
Seuken [2021] discuss different notions of approximate strategy-proofness in the context of matching
and allocation problems. In this work, we focus on ordinal SP and its analog of FOSD [Erdil, 2014].
This is a strong and widely used SP concept in the presence of ordinal preferences. Other desiderata, such as efficiency, are also incompatible with strategy-proofness. Mennle and Seuken [2017] study this trade-off through hybrid mechanisms, which are convex combinations of a mechanism with good incentive properties and another that is efficient. In the context of social choice, other work studies the trade-off between approximate SP and other desiderata for voting rules such as plurality and veto [Mennle and Seuken, 2016].
Conitzer and Sandholm [2002, 2004] introduced the automated mechanism design (AMD) ap-
proach that framed problems as a linear program. However, this approach faces severe scalability
issues as the formulation scales exponentially in the number of agents and items [Guo and Conitzer,
2010]. Overcoming this limitation, more recent work seeks to use deep neural networks to address
problems of economic design [Duetting et al., 2019, Feng et al., 2018, Golowich et al., 2018, Curry
et al., 2020, Shen et al., 2019, Rahme et al., 2020, Duan et al., 2022, Ivanov et al., 2022], but not
until now to matching problems. As discussed in the introduction, two-sided matching brings about
new challenges, most notably in regard to working with discrete, ordinal preferences and adopting
the right surrogate loss functions for approximate SP and approximate stability. Other work has
made use of support vector machines to search for stable mechanisms, but without considering
strategy-proofness [Narasimhan et al., 2016]. A different line of research considers stable matching together with bandit problems, where agent preferences are unknown a priori [Das
and Kamenica, 2005, Liu et al., 2020, Dai and Jordan, 2021, Liu et al., 2022, Basu et al., 2021,
Sankararaman et al., 2021, Jagadeesan et al., 2021, Cen and Shah, 2022, Min et al., 2022].
There have also been other recent efforts that leverage deep learning for matching (in the context
of online bipartite matching [Alomrani et al., 2022]) and other related combinatorial optimization
problems [Bengio et al., 2021]. Most of these papers adopt a reinforcement learning based approach
to compute their solutions. Our approach, on the other hand, is not sequential but rather end-to-end
differentiable, and our parameter weights are updated through a single backward pass. Additionally,
the focus of our work is on matching markets and mechanism design, and is concerned with capturing
core economic concepts within a machine learning framework and balancing the trade-offs between
stability and strategy-proofness.
3 Preliminaries
Let W denote a set of n workers and F denote a set of m firms. A feasible matching, µ, is a
set of (worker, firm) pairs, with each worker and firm participating in at most one match. Let B
denote the set of all matchings. If (w, f ) ∈ µ, then µ matches w to f , and we write µ(w) = f
and µ(f ) = w. If a worker or firm remains unmatched, we say it is matched to ⊥. We also
write (w, ⊥) ∈ µ (resp. (⊥, f) ∈ µ). Each worker has a strict preference order, ≻_w, over the set F̄ = F ∪ {⊥}. Each firm has a strict preference order, ≻_f, over the set W̄ = W ∪ {⊥}. Worker w (firm f) prefers remaining unmatched to being matched with a firm (worker) ranked below ⊥ (the agents ranked below ⊥ are said to be unacceptable). If worker w prefers firm f to f′, then we write f ≻_w f′, and similarly for a firm's preferences. Let P denote the set of all preference profiles, with ≻ = (≻_1, …, ≻_n, ≻_{n+1}, …, ≻_{n+m}) ∈ P denoting a preference profile comprising the preference orders of the n workers followed by those of the m firms.
A pair (w, f ) forms a blocking pair for matching µ if w and f prefer each other to their partners
in µ (or ⊥ in the case that one or both are unmatched). A matching µ is stable if and only if there
are no blocking pairs. A matching µ is individually rational (IR) if and only if it is not blocked by
any individual; i.e., no agent finds its match unacceptable and prefers ⊥.¹
For notational simplicity, we write g_{wf}(≻) for the marginal probability of matching worker w (or ⊥) and firm f (or ⊥).

¹ Stability precludes empty matchings. For example, if a matching µ leaves a worker w and a firm f unmatched, where w finds f acceptable and f finds w acceptable, then (w, f) is a blocking pair to µ.
Theorem 1 (Birkhoff–von Neumann). Given any randomized matching r, there exists a distribution
on matchings, ∆(B), with marginal probabilities equal to r.
The following definition is standard [Budish et al., 2013], and generalizes stability to randomized
matchings.
Definition 2 (Ex ante justified envy). A randomized matching r causes ex ante justified envy if:
(1) some worker w prefers firm f over some firm f′ to which w is fractionally matched (including f′ = ⊥), and firm f prefers w over some worker w′ to which f is fractionally matched (including w′ = ⊥) (“w has envy towards w′” and “f has envy towards f′”), or
(2) some worker w finds a firm f′ ∈ F to which it is fractionally matched unacceptable, i.e., r_{wf′} > 0 and ⊥ ≻_w f′, or some firm f finds a worker w′ ∈ W to which it is fractionally matched unacceptable, i.e., r_{w′f} > 0 and ⊥ ≻_f w′.
A randomized matching r is ex ante stable if and only if it does not cause any ex ante justified
envy. Ex ante stability reduces to the standard concept of stability for deterministic matching.
Part (1) of the definition includes non-wastefulness: for any worker w, we should have r_{w⊥} = 0 if there exists some firm f′ ∈ F for which r_{w′f′} > 0 and w ≻_{f′} w′, and for any firm f, we need r_{⊥f} = 0 if there exists some worker w′ ∈ W for which r_{w′f′} > 0 and f ≻_{w′} f′. Part (2) of the definition captures IR: for any worker w, we should have r_{wf′} = 0 for all f′ ∈ F for which ⊥ ≻_w f′, and for any firm f, we need r_{w′f} = 0 for all w′ ∈ W for which ⊥ ≻_f w′.
To define strategy-proofness, we say that u_w : F̄ → R is a ≻_w-utility for worker w if u_w(f) > u_w(f′) if and only if f ≻_w f′, for all f, f′ ∈ F̄. We similarly define a ≻_f-utility for a firm f.
The following concept of ordinal SP is standard [Erdil, 2014], and generalizes SP to randomized matchings.

Definition 3 (Ordinal strategy-proofness). A matching mechanism g is ordinal strategy-proof (ordinal SP) if, for every agent, every preference profile ≻, every utility function consistent with the agent's preference order, and every possible misreport, the agent's expected utility is weakly greater when reporting truthfully than when misreporting, fixing the reports of the other agents.

By this definition, no worker or firm can improve their expected utility (for any utility function consistent with their preference order) by misreporting their preference order. For a deterministic mechanism, ordinal SP reduces to standard SP. Erdil [2014] shows that first-order stochastic dominance (FOSD) is equivalent to ordinal SP.
Definition 4 (First-order stochastic dominance (FOSD)). A matching mechanism g satisfies FOSD if, for every worker w, every preference profile ≻, every misreport ≻′_w, and every firm f′ with f′ ≻_w ⊥,

    Σ_{f : f ⪰_w f′} g_{wf}(≻_w, ≻_{−w}) ≥ Σ_{f : f ⪰_w f′} g_{wf}(≻′_w, ≻_{−w}),

and similarly for every firm f.

FOSD states that, whether looking at its most preferred firm, its two most preferred firms, and so forth, worker w achieves a weakly higher probability of matching on that set of firms for its true report than for any misreport. We make use of a quantification of the violation of this condition to provide a surrogate for the failure of SP during learning.
Theorem 5 ([Erdil, 2014]). A two-sided matching mechanism is ordinal SP if and only if it satisfies
FOSD.
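To make the FOSD condition concrete, the following sketch checks it for a single worker against one fixed misreport; the function name and the list-based preference encoding are illustrative choices of ours, not notation from the paper.

```python
def fosd_violation(true_order, p_truth, p_misreport):
    """Max amount by which a misreport beats truth on some top-k prefix.

    true_order: firms in decreasing order of the worker's true preference,
                truncated at the outside option (acceptable firms only).
    p_truth[f], p_misreport[f]: marginal match probability for firm f under
                the truthful report and under one fixed misreport.
    Returns 0.0 if FOSD holds against this misreport.
    """
    violation, cum_truth, cum_mis = 0.0, 0.0, 0.0
    for f in true_order:  # sweep prefixes: top-1, top-2, ...
        cum_truth += p_truth[f]
        cum_mis += p_misreport[f]
        violation = max(violation, cum_mis - cum_truth)
    return violation

# Example: truth gives the top choice w.p. 0.5, a misreport w.p. 0.6.
print(fosd_violation([2, 0], {2: 0.5, 0: 0.3}, {2: 0.6, 0: 0.1}))  # 0.1
```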
3.2 Deferred Acceptance, RSD, and TTC
We consider three benchmark mechanisms: the stable but not SP deferred-acceptance (DA) mecha-
nism, the SP but not stable randomized serial dictatorship (RSD) mechanism, and the Top Trading
Cycles (TTC) mechanism, which is neither SP nor stable. The DA and TTC mechanisms are ordinal
SP for the proposing side of the market but not for agents on both sides of the market.
Definition 6 (Deferred acceptance (DA)). In the worker-proposing version, proceed in rounds until no new proposals are made:

• ∀w ∈ W: if w is not tentatively accepted by any firm, w proposes to its most preferred acceptable firm that has not yet rejected it (if any such firm remains).

• ∀f ∈ F: f tentatively accepts its best proposal (if any), including any proposal it currently holds, and rejects the rest.

On termination, tentative matches become final and all other agents remain unmatched. The firm-proposing version is defined symmetrically.
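For reference, a minimal sketch of worker-proposing DA, assuming an encoding in which each agent's preference list contains only its acceptable partners in decreasing order of preference (the encoding and function name are ours):

```python
def worker_proposing_da(worker_prefs, firm_prefs):
    """worker_prefs[w] / firm_prefs[f]: acceptable partners, best first.
    Returns a dict mapping matched workers to firms."""
    # rank[f][w]: position of worker w in firm f's list; absent if unacceptable
    rank = [{w: i for i, w in enumerate(fp)} for fp in firm_prefs]
    next_prop = [0] * len(worker_prefs)  # next list position each worker tries
    held = {}                            # firm -> tentatively accepted worker
    free = list(range(len(worker_prefs)))
    while free:
        w = free.pop()
        while next_prop[w] < len(worker_prefs[w]):
            f = worker_prefs[w][next_prop[w]]
            next_prop[w] += 1
            if w not in rank[f]:
                continue                 # f finds w unacceptable: rejection
            if f not in held:
                held[f] = w              # f tentatively accepts its proposal
                break
            if rank[f][w] < rank[f][held[f]]:
                free.append(held[f])     # f rejects its current worker for w
                held[f] = w
                break
        # if w exhausts its list, it remains matched to ⊥
    return {w: f for f, w in held.items()}
```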
Theorem 7 (see [Roth and Sotomayor, 1990]). DA is stable but not ordinal SP.
Definition 8 (Randomized serial dictatorship (RSD)). In the two-sided version of RSD, we first sample a priority order, π, on the set W ∪ F, uniformly at random, such that π = (π_1, π_2, …, π_{m+n}) is a permutation on W ∪ F in decreasing order of priority. For the one-sided version, we sample a priority order π on either W or F.
Proceed as follows:

• In round k = 1, …, |π|:
  – If π_k is not yet matched in µ, then add to µ the match between π_k and its most preferred unmatched agent, or ⊥ if all remaining agents are unacceptable to π_k.
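A minimal sketch of the two-sided version under the same illustrative list encoding; note that the selected partner's own preferences are never consulted, which is exactly why RSD can violate IR:

```python
import random

def two_sided_rsd(worker_prefs, firm_prefs, rng=random):
    """One draw of two-sided RSD; returns a set of (worker, firm) matches."""
    n, m = len(worker_prefs), len(firm_prefs)
    order = [("W", w) for w in range(n)] + [("F", f) for f in range(m)]
    rng.shuffle(order)                     # uniformly random priority order pi
    matched_w, matched_f, mu = set(), set(), set()
    for side, i in order:
        if (i in matched_w) if side == "W" else (i in matched_f):
            continue                       # already matched in an earlier round
        prefs = worker_prefs[i] if side == "W" else firm_prefs[i]
        taken = matched_f if side == "W" else matched_w
        for j in prefs:                    # most preferred unmatched acceptable agent
            if j not in taken:             # j's own preferences are ignored here
                w, f = (i, j) if side == "W" else (j, i)
                mu.add((w, f))
                matched_w.add(w)
                matched_f.add(f)
                break                      # if no j is found, i stays matched to ⊥
    return mu
```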
Theorem 9. RSD satisfies FOSD—and thus is ordinal SP by Theorem 5—but is not stable.
Definition 10 (Top trading cycles (TTC)). Proceed in rounds while unmatched agents remain:

• Form a directed graph with each unmatched agent pointing to its most preferred option. An agent points at itself if there are no acceptable options available. Every worker that is part of a cycle is matched to the firm it points to (or remains unmatched, if the worker is pointing at itself). The unmatched agents remove from their lists every agent matched in this round.
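The cycle step can be implemented by following pointers: since every node has out-degree one, every forward walk must eventually repeat a node. The sketch below (our own encoding, with agents as keys of a pointer dictionary) returns the agents matched in one round.

```python
def ttc_cycles(pointer):
    """pointer[i]: agent i's most preferred remaining agent, or i itself.
    Returns the set of agents lying on cycles; in TTC, each worker on a
    cycle is matched to the firm it points to."""
    on_cycle = set()
    for start in pointer:
        path, seen = [], set()
        node = start
        while node not in seen and node not in on_cycle:
            seen.add(node)
            path.append(node)
            node = pointer[node]           # out-degree one: must hit a repeat
        if node in seen:                   # the repeat closes a new cycle
            on_cycle.update(path[path.index(node):])
    return on_cycle

# Round 1 of Example 17 (Appendix B): w1->f1->w2->f2->w1 is a cycle.
print(ttc_cycles({"w1": "f1", "w2": "f2", "f1": "w2", "f2": "w1"}))
```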
Theorem 11. TTC is neither strategy-proof nor stable when considering both sides of the market.

Proof. Example 17 in Appendix B shows that TTC is neither strategy-proof nor stable.
Remark 12. TTC, like RSD, is usually used in one-sided assignment problems, where it is SP, and where the notion of stability, which is an important consideration in two-sided matching, is not a concern.
(Figure 1, schematic: the inputs p, q feed R hidden layers h^(1), …, h^(R); two softplus (σ_+) heads output score matrices s̄ (with an added ⊥ row) and s̄′ (with an added ⊥ column); each is multiplied by the mask β and then normalized column-wise and row-wise, respectively, giving ŝ and ŝ′; the output is r_{wf} = min{ŝ_{wf}, ŝ′_{wf}}.)
Figure 1: Matching network g for a set of n workers and m firms. Given inputs p, q ∈ Rn×m the matching
network is a feed-forward network with R hidden layers that uses softplus activation to generate non-negative
scores and normalization to compute the randomized matching. We additionally generate a Boolean mask
matrix, β, and multiply it with the score matrix before normalization to ensure IR by making the probability
of matches that are unacceptable zero.
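As a concrete sketch, the following PyTorch module implements the computation in Figure 1; the hidden width, the number of layers R, and the ReLU trunk activation are illustrative choices of ours (the figure specifies only the softplus output activation, the mask, the two normalizations, and the final element-wise min):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatchingNetwork(nn.Module):
    def __init__(self, n, m, hidden=256, R=4):
        super().__init__()
        self.n, self.m = n, m
        layers, d = [], 2 * n * m
        for _ in range(R):                        # R hidden layers h^(1)..h^(R)
            layers += [nn.Linear(d, hidden), nn.ReLU()]
            d = hidden
        self.trunk = nn.Sequential(*layers)
        self.head_s = nn.Linear(d, (n + 1) * m)   # scores with an extra ⊥ row
        self.head_t = nn.Linear(d, n * (m + 1))   # scores with an extra ⊥ column

    def forward(self, p, q):                      # p, q: (batch, n, m)
        b = p.shape[0]
        # mask beta: positive p and q entries mark mutually acceptable pairs
        beta = ((p > 0) & (q > 0)).float()
        x = self.trunk(torch.cat([p.flatten(1), q.flatten(1)], dim=-1))
        s = F.softplus(self.head_s(x)).view(b, self.n + 1, self.m)
        t = F.softplus(self.head_t(x)).view(b, self.n, self.m + 1)
        s = s * torch.cat([beta, beta.new_ones(b, 1, self.m)], dim=1)
        t = t * torch.cat([beta, beta.new_ones(b, self.n, 1)], dim=2)
        s_hat = s / s.sum(dim=1, keepdim=True)    # column-wise normalization
        t_hat = t / t.sum(dim=2, keepdim=True)    # row-wise normalization
        # element-wise min keeps rows and columns weakly doubly stochastic
        return torch.min(s_hat[:, : self.n, :], t_hat[:, :, : self.m])
```

Because the ⊥ row and column are never masked and softplus scores are strictly positive, the normalizations are well defined, and masked pairs receive zero probability, consistent with the IR construction.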
The inputs p^≻ and q^≻ give a cardinal representation of the preference order of a worker and firm, respectively. We define p^≻_{w⊥} = 0 and q^≻_{⊥f} = 0.
Formally, we have:

    p^≻_{wj} = (1/m) ( 1_{j ≻_w ⊥} + Σ_{j′=1}^{m} [ 1_{j ≻_w j′} − 1_{⊥ ≻_w j′} ] ),    (3)

    q^≻_{if} = (1/n) ( 1_{i ≻_f ⊥} + Σ_{i′=1}^{n} [ 1_{i ≻_f i′} − 1_{⊥ ≻_f i′} ] ).    (4)
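A direct transcription of (3) for a single worker (the firm-side representation (4) is symmetric); the list encoding with None standing for ⊥ is our illustrative choice:

```python
import numpy as np

def worker_representation(pref, m):
    """Equation (3): pref lists all m firms best-to-worst, with None marking
    the position of ⊥. Entries are positive for acceptable firms and
    negative otherwise."""
    pos = {a: i for i, a in enumerate(pref)}          # rank of each firm and of ⊥
    p = np.zeros(m)
    for j in range(m):
        above_bot = 1.0 if pos[j] < pos[None] else 0.0               # 1{j ≻_w ⊥}
        worse = sum(1.0 for jp in range(m) if pos[j] < pos[jp])      # Σ 1{j ≻_w j′}
        unacc = sum(1.0 for jp in range(m) if pos[None] < pos[jp])   # Σ 1{⊥ ≻_w j′}
        p[j] = (above_bot + worse - unacc) / m
    return p

# f0 ≻ f1 ≻ ⊥ ≻ f2 gives the equi-spaced values [2/3, 1/3, -1/3]
print(worker_representation([0, 1, None, 2], m=3))
```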
We have r_{wf} = 0 whenever β_{wf} = 0, ensuring that every matching in the support of the distribution will be IR. Based on our construction, the allocation matrix r is weakly doubly stochastic, with rows and columns summing to at most 1. Budish et al. [2013] show that any weakly doubly stochastic matrix can be decomposed into a convex combination of 0-1, weakly doubly stochastic matrices.
This captures the first kind of ex ante justified envy in Definition 2. We can omit the second
kind of ex ante justified envy because the learned mechanisms satisfy IR through the use of the
Boolean mask matrix (and thus, there are no violations of the second kind).
The average stability violation (or just stability violation) of mechanism g^θ on profile ≻ is

    stv(g^θ, ≻) = (1/2) (1/n + 1/m) Σ_{w=1}^{n} Σ_{f=1}^{m} stv_{wf}(g^θ, ≻).

We define the expected stability violation as STV(g^θ) = E_≻ [stv(g^θ, ≻)]. We also write stv(g^θ) to denote the average stability violation on the training data.

Theorem 13. A matching mechanism g^θ is ex ante stable, except on zero-measure events, if and only if STV(g^θ) = 0.
θ θ
Ordinal SP violation. We turn now to quantifying the degree of approximation to ordinal SP. Let ≻_{−i} = (≻_1, …, ≻_{i−1}, ≻_{i+1}, …, ≻_{n+m}). For a preference profile ≻ ∈ P and a mechanism g^θ, let Δ_{wf}(g^θ, ≻′_w, ≻) = g^θ_{wf}(≻′_w, ≻_{−w}) − g^θ_{wf}(≻_w, ≻_{−w}). The regret to worker w (firm f) is defined as:

    regret_w(g^θ, ≻) = max_{≻′_w ∈ P}  max_{f′ : f′ ≻_w ⊥}  Σ_{f : f ⪰_w f′} Δ_{wf}(g^θ, ≻′_w, ≻),    (6)

    regret_f(g^θ, ≻) = max_{≻′_f ∈ P}  max_{w′ : w′ ≻_f ⊥}  Σ_{w : w ⪰_f w′} Δ_{wf}(g^θ, ≻′_f, ≻).

The expected regret is RGT(g^θ) = E_≻ [regret(g^θ, ≻)]. We also write rgt(g^θ) to denote the average regret on training data.
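In small markets the regret in (6) can be computed exactly by brute-force enumeration of the (m + 1)! misreports; the helper below is a sketch, where g is assumed to map a full preference profile to the (n, m) matrix of marginal match probabilities:

```python
from itertools import permutations

def regret_worker(g, w, profile):
    """Brute-force evaluation of (6) for worker w. profile[i] is agent i's
    reported order over the other side plus None (standing for ⊥)."""
    truth = profile[w]
    p_true = g(profile)[w]                   # w's marginals under truthful report
    acceptable = truth[: truth.index(None)]  # firms ranked above ⊥, best first
    best = 0.0
    for rep in permutations(truth):          # all (m + 1)! possible misreports
        p_mis = g(profile[:w] + [list(rep)] + profile[w + 1:])[w]
        prefix, worst = 0.0, 0.0
        for f in acceptable:                 # sweep top-k sets of the TRUE order
            prefix += p_mis[f] - p_true[f]
            worst = max(worst, prefix)
        best = max(best, worst)
    return best
```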
Theorem 14. The regret to a worker (firm) for a given preference profile is the maximum amount by which the worker (firm) can increase their expected normalized utility through a misreport, fixing the reports of others.

Proof. The increase in expected utility for worker w under misreport ≻′_w, when the utility function is u_w, is given by Σ_{f∈F} u_w(f) Δ_{wf}(g^θ, ≻′_w, ≻). The maximum amount by which worker w can increase their expected normalized utility through misreport ≻′_w is given by the objective max_{u_w ∈ U_w} Σ_{f∈F} u_w(f) Δ_{wf}(g^θ, ≻′_w, ≻), where U_w denotes the set of normalized ≻_w-utilities and f_1 ≻_w ⋯ ≻_w f_k ≻_w ⊥ ≻_w f_{k+1} ≻_w ⋯ ≻_w f_m indexes the firms in w's true preference order, with f_1, …, f_k being the acceptable firms. Thus, we can simplify our search space by only considering u_w ∈ U_w where u_w(f_{k+1}) = ⋯ = u_w(f_m) = 0.
Define δ_k = u_w(f_k), δ_{k−1} = u_w(f_{k−1}) − u_w(f_k), …, δ_1 = u_w(f_1) − u_w(f_2). This objective can thus be rewritten as:

    max_{δ}  Σ_{f=1}^{k} Σ_{i=f}^{k} δ_i Δ_{wf}(g^θ, ≻′_w, ≻)    (9)

    such that  Σ_{i=1}^{k} δ_i ≤ 1  and  δ_1, …, δ_k ≥ 0.    (10)

This objective is of the form max_{∥x∥_1 ≤ 1} xᵀy, and its solution is given by ∥y∥_∞. Thus, the solution to the above maximization problem is max_{i∈[k]} Σ_{f=1}^{i} Δ_{wf}(g^θ, ≻′_w, ≻). But this is the same as max_{f′ : f′ ≻_w ⊥} Σ_{f : f ⪰_w f′} Δ_{wf}(g^θ, ≻′_w, ≻). Computing the maximum possible increase over all such misreports gives max_{≻′_w ∈ P} max_{f′ : f′ ≻_w ⊥} Σ_{f : f ⪰_w f′} Δ_{wf}(g^θ, ≻′_w, ≻). This quantity is exactly regret_w(g^θ, ≻). The proof follows similarly for any firm f. ∎

Theorem 15. A matching mechanism g^θ is ordinal SP, except on zero-measure events, if and only if RGT(g^θ) = 0.
Given a training set of preference profiles {≻^{(1)}, …, ≻^{(L)}}, we train the network parameters θ by minimizing a convex combination of the two surrogate violations,

    min_θ  λ · stv(g^θ) + (1 − λ) · rgt(g^θ),    (13)

where λ ∈ [0, 1] controls the tradeoff between approximate stability and approximate SP. We use SGD to solve (13). The gradient of the degree of violation of stability with respect to the network parameters is straightforward to calculate. The gradient of regret is complicated by the nested maximization in the definition of regret. In order to compute the gradient, we first solve the inner maximization by checking possible misreports. Let ≻̂_i^{(ℓ)} denote the defeating preference report for agent i (a worker or firm) at preference profile ≻^{(ℓ)} that maximizes regret_i(g^θ, ≻^{(ℓ)}). Given this, we obtain the derivative of regret for agent i with respect to the network parameters, fixing the misreport to the defeating report and adopting truthful reports for the others.
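Schematically, one SGD step on (13) looks as follows; encode, stability_violation, and regret_with_defeating_misreports are hypothetical helpers standing in for the representations (3)-(4) and the two surrogates, and are assumed to be written with differentiable torch operations:

```python
def training_step(net, optimizer, profiles, lam):
    p, q = encode(profiles)            # hypothetical: build (3)-(4) representations
    r = net(p, q)
    stv = stability_violation(r, p, q).mean()
    # inner maximization: enumerate misreports to find each agent's defeating
    # report, then differentiate the induced regret w.r.t. the network weights
    rgt = regret_with_defeating_misreports(net, profiles).mean()
    loss = lam * stv + (1.0 - lam) * rgt
    optimizer.zero_grad()
    loss.backward()                    # a single backward pass
    optimizer.step()
    return float(loss)
```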
For the case of TTC and RSD, we also need to quantify the IR violation (this is necessarily zero for the other mechanisms). We define the IR violation at profile ≻ as:

    irv(g, ≻) = (1/(2m)) Σ_{w=1}^{n} Σ_{f=1}^{m} g_{wf}(≻) · max{−q^≻_{wf}, 0}  +  (1/(2n)) Σ_{w=1}^{n} Σ_{f=1}^{m} g_{wf}(≻) · max{−p^≻_{wf}, 0}.    (14)
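A direct transcription of (14), with g_prob, p, and q as (n, m) arrays (the names are ours):

```python
import numpy as np

def ir_violation(g_prob, p, q):
    """Equation (14): expected normalized utility lost to matches that one
    side ranks below ⊥ (negative entries of p or q mark unacceptability)."""
    n, m = g_prob.shape
    firm_term = (g_prob * np.maximum(-q, 0.0)).sum() / (2 * m)
    worker_term = (g_prob * np.maximum(-p, 0.0)).sum() / (2 * n)
    return firm_term + worker_term
```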
Figure 2: Comparing stability violation and strategy-proofness violation from the learned mechanisms for
different choices of λ (red dots) with the best of worker- and firm-proposing DA, as well as TTC, and RSD, in
4 × 4 two-sided matching, and considering uncorrelated preference orders as well as markets with increasing
correlation (pcorr ∈ {0.25, 0.5, 0.75}). The stability violation for TTC and RSD includes IR violations.
For the case of RSD, we also include the average IR violation on test data when reporting the
stability violation.
5 Experimental Results
We report the results on a test set of 204,800 preference profiles, and use the Adam optimizer to
train our models for 50,000 mini-batch iterations, with mini-batches of 1024 profiles. We use the
PyTorch deep learning library, and all experiments are run on a cluster of NVIDIA GPUs.
We study the following market settings:
• For uncorrelated preferences, for each worker or firm, we sample uniformly at random from
all preference orders, and then, with probability 0.2 (truncation probability), we choose at
random a position at which to truncate this agent’s preference order.
Specifically, we consider matching problems with n = 4 workers and m = 4 firms, with uncorrelated preferences as well as correlated preferences with varying correlation probability pcorr ∈ {0.25, 0.5, 0.75}.
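For concreteness, a sketch of the uncorrelated sampler follows; the distribution over truncation positions is not pinned down in the text, so the uniform choice below is our assumption, and the correlated sampler is analogous:

```python
import random

def sample_uncorrelated_preference(m, p_trunc=0.2, rng=random):
    """One agent's preference order: a uniformly random order over the m
    partners, truncated at a random position with probability p_trunc.
    Encoding: None marks ⊥; partners after None are unacceptable."""
    order = list(range(m))
    rng.shuffle(order)
    cut = rng.randrange(m + 1) if rng.random() < p_trunc else m
    return order[:cut] + [None] + order[cut:]
```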
We compare the performance of our mechanisms, varying parameter λ between 0 and 1, with the best of worker- and firm-proposing DA and TTC (as determined by average SP violation over the test data) and RSD.³ We also compare against convex combinations of DA, TTC, and RSD. We plot the resulting frontier on stability violation (stv(g^θ)) and SP violation (rgt(g^θ)) in Figure 2.

³ We only plot the performance of one-sided RSD, as it achieves lower stability violation than the two-sided version.
Figure 3: Left and Middle: Comparing welfare per agent of the learned mechanisms for different values
of the tradeoff parameter λ with the best of the firm- and worker- proposing DA, as well as TTC, and RSD.
Right, Top: Comparing the average instance-wise max similarity scores (sim(g θ )) of the learned mechanisms
with worker- and firm-proposing DA. Right, Bottom: Normalized entropy of the learned mechanisms for
different values of the tradeoff parameter λ. The results are shown for uncorrelated preferences as well as an
increasing correlation between preferences (pcorr ∈ {0.25, 0.5, 0.75}).
As explained above, because TTC and RSD mechanisms do not guarantee IR, we include the IR
violations in the reported stability violation (none of the other mechanisms fail IR).
At λ = 0.0, we learn a mechanism that has very low regret (≈ 0) but poor stability. This
performance is similar to that of RSD. For large values of λ, we learn a mechanism that approximates
DA. For intermediate values, we find solutions that dominate the convex combination of DA, TTC,
and RSD, and find novel and interesting tradeoffs between SP and stability. Notably, for lower levels of correlation, we see substantially better SP than DA along with very little loss in stability. Given the importance of stability in practice, this is a very intriguing discovery. For higher levels of correlation, we see substantially better stability than RSD along with very little loss in SP. It is
also interesting to see that TTC itself has intermediate properties, between those of DA and RSD.
Comparing the scale of the y-axes, we can also see that increasing correlation tends to reduce the
opportunity for strategic behavior across both the DA and the learned mechanisms.
Figure 3 shows the expected welfare for the learned mechanisms, measured here for the equi-
spaced utility function (the function used in the input representation). We define the welfare of a
mechanism g (for the equi-spaced utility function) on a profile ≻ as:
    welfare(g, ≻) = (1/2) (1/n + 1/m) Σ_{w∈W} Σ_{f∈F} g_{wf}(≻) ( p^≻_{wf} + q^≻_{wf} ).    (15)
We compare against the maximum of the expected welfare achieved by the worker- and firm-proposing DA and TTC mechanisms, as well as that from RSD. As we increase λ, and the learned
mechanisms come closer to DA, the welfare of the learned mechanisms improves. It is notable that for
choices of λ in the range 0.8 and higher, i.e., the choices of λ that provide interesting opportunities
for improving SP relative to DA, we also see good welfare. We also see that TTC, and especially RSD, have comparably lower welfare. It bears emphasis that when a mechanism is not fully SP,
as is the case for all mechanisms except RSD, this is an idealized view of welfare since it assumes
truthful reports. In fact, we should expect welfare to be reduced through strategic misreports and
the substantially improved SP properties of the learned mechanisms relative to DA (Figure 2) would
be expected to further work in favor of improving the welfare in the learned mechanisms relative to
DA.⁴ Lastly, we observe that for small values of λ the learned mechanisms have relatively low welfare compared to RSD. This is interesting and suggests that achieving IR together with SP (recall that RSD is not IR!) is very challenging in two-sided markets.

⁴ In fact, the same is true for the stability of a non-SP mechanism such as DA, but it has become standard to assume truthful reports to DA in considering the stability properties of DA.

In interpreting the rules of the learned
mechanisms, and considering the importance of DA, we can also compare their functional similarity
with DA. For this, let w-DA and f -DA denote the worker- and firm-proposing DA, respectively. For
a given preference profile, we compute the similarity of the learned rule with DA as
    sim(g^θ, ≻) = max_{M ∈ {w-DA, f-DA}}  [ Σ_{(w,f) : g^M_{wf}(≻)=1} g^θ_{wf}(≻) ]  /  [ Σ_{(w,f) : g^M_{wf}(≻)=1} 1 ].    (16)
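In code, (16) amounts to summing the learned marginals over the pairs matched by each DA variant; the set-of-pairs encoding for the DA matchings is ours:

```python
def similarity(r, mu_wda, mu_fda):
    """Equation (16): overlap of the learned marginals r (an (n, m) array)
    with the better of the two DA matchings, each a set of (w, f) pairs."""
    def overlap(mu):
        # sum of learned probability on DA's pairs, normalized by DA's size
        return sum(r[w][f] for (w, f) in mu) / max(len(mu), 1)
    return max(overlap(mu_wda), overlap(mu_fda))
```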
This calculates the agreement between the two mechanisms, normalized by the size of the DA
matching, and taking the best of w-DA or f -DA. Let sim(g θ ) denote the average similarity score
on test data. As we increase λ, i.e., penalize stability violations more, we see in Figure 3 that the
learned matchings get increasingly close to the DA matchings, as we might expect. We also quantify
the degree of randomness of the learned mechanisms, by computing the normalized entropy per agent,
taking the expectation over all preference profiles (the normalized entropy is 0 for a deterministic mechanism).
Figure 3 shows how the entropy changes with λ. As we increase λ and the mechanisms come closer to DA, the allocations of the learned mechanisms also become less stochastic.
6 Discussion
The methodology and results in this paper give a first but crucial step towards using machine learning to understand the structure of mechanisms that achieve nearly the same stability as DA while surpassing DA in terms of strategy-proofness. This is an interesting observation, given the
practical and theoretical importance of the DA mechanism. Our experimental results also suggest
that achieving IR together with SP (recall that RSD is not IR) is challenging in two-sided markets, and this is reflected in the lower welfare achieved by our learned mechanisms in the part of the
frontier that emphasizes SP. The experimental results also show that for larger weights assigned
to the importance of stability, the learned mechanisms recover the DA mechanism, whereas when
more emphasis is given to the importance of strategy-proofness, the learned mechanisms are more
randomized. There are other interesting questions waiting to be addressed. For instance, can we use
this kind of framework to understand other tradeoffs, such as tradeoffs between strategy-proofness
and efficiency?
As discussed in the introduction, a challenge in scaling to larger problems is the need to find
defeating misreports, as exhaustively enumerating all misreports for an agent becomes intractable as
the number of agents on the other side of the market increases (and thus the preference domain
increases). A simple remedy is to restrict the language available to agents in making preference
reports; e.g., it is commonplace to only allow for “top-k preferences" to be reported. Another remedy
is to work in domains where there exists some structure on the preference domain, so that not all
possible preference orders exist; e.g., single-peaked preferences are an especially stark example [Black,
1948]. It will also be interesting to study complementary approaches that relax the discrete set
of preference orderings to a continuous convex hull such as the Birkhoff polytope. Confining these
preference orderings to the Birkhoff polytope can be accomplished using differentiable operations, such as the Sinkhorn operator [Adams and Zemel, 2011]. Such a continuous relaxation allows misreports
to be identified through gradient ascent rather than sampling (and would be similar to working with
continuous valuations when learning revenue-optimal auctions [Duetting et al., 2019]). Despite this
limitation, our current approach scales much further than other existing methods for automated design, which are not well suited for this problem. For instance, methods that use linear programs or integer programs do not scale well because of the number of variables⁵ required to make explicit the input and output structure of the functional that must be optimized over.

⁵ (m + 1)!^n · (n + 1)!^m · nm output variables.
A second challenge is that we have not been able to find a suitable, publicly available dataset to
test our approach. As a fallback, we have endeavored to capture some real-world structures by varying
the correlation between agent preferences and the truncation probabilities of preferences. Using such
stylized, probabilistic models and simulations for validating approaches is a well-established and
prevalent practice, consistently utilized when investigating two-sided matching markets [Chen and
Sönmez, 2006, Echenique and Yariv, 2013, Das and Kamenica, 2005, Liu et al., 2020, Dai and Jordan,
2021]. For instance, Chen and Sönmez [Chen and Sönmez, 2006] design an environment for school
choice where they consider six different schools with six seats each and where the students’ preferences
are simulated to depend on proximity, quality, and a random factor. Echenique and Yariv [Echenique
and Yariv, 2013] use a simulation study with eight participants on each side of the market, with the
payoff matrix designed such that there are one, two, or three stable matches. Further, recent papers
on bandit models for stable matching model agent preferences through synthetic datasets [Das and
Kamenica, 2005, Liu et al., 2020, Dai and Jordan, 2021].
In closing, we see exciting work ahead in advancing the design of matching mechanisms that
strike the right balance between stability, strategyproofness, and other considerations that are critical
to real-world applications. As an example, it will be interesting to extend the learning framework
to encompass desiderata such as capacity limitations or fairness considerations. Also relevant is to
explore additional kinds of neural network architectures, such as attention mechanisms [Vaswani
et al., 2017] or architectures that incorporate permutation equivariance [Rahme et al., 2020, Duan
et al., 2022], with the advantage that they can significantly reduce the search space.
Acknowledgements
The source code for all experiments along with the instructions to run it is available from Github at
https://github.com/saisrivatsan/deep-matching/. This work is supported in part through an
AWS Machine Learning Research Award.
References
A. Abdulkadiroglu and T. Sönmez. Random serial dictatorship and the core from random endowments
in house allocation problems. Econometrica, 66(3):689–702, 1998.
A. Abdulkadiroglu, P. Pathak, A. E. Roth, and T. Sonmez. Changing the Boston School Choice
Mechanism. NBER Working Papers 11965, National Bureau of Economic Research, Inc, Jan. 2006.
URL https://ideas.repec.org/p/nbr/nberwo/11965.html.
R. P. Adams and R. S. Zemel. Ranking via sinkhorn propagation. arXiv preprint arXiv:1106.1925,
2011.
J. Alcalde and S. Barberà. Top dominance and the possibility of strategy-proof stable solutions to
matching problems. Economic Theory, 4(3):417–435, 1994.
M. A. Alomrani, R. Moravej, and E. B. Khalil. Deep policies for online bipartite matching: A
reinforcement learning approach. Transactions on Machine Learning Research, 2022. ISSN
2835-8856. URL https://openreview.net/forum?id=mbwm7NdkpO.
S. Alva and V. Manjunath. The impossibility of strategy-proof, Pareto efficient, and individually
rational rules for fractional matching. Games and Economic Behavior, 119:15–29, 2020.
H. Aziz and B. Klaus. Random matching under priorities: stability and no envy concepts. Social
Choice and Welfare, 53(2):213–259, 2019. ISSN 01761714, 1432217X. URL http://www.jstor.
org/stable/45212389.
S. Basu, K. A. Sankararaman, and A. Sankararaman. Beyond log^2(T) regret for decentralized bandits
in matching markets. In M. Meila and T. Zhang, editors, Proceedings of the 38th International
Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages
705–715. PMLR, 18–24 Jul 2021. URL https://proceedings.mlr.press/v139/basu21a.html.
Y. Bengio, A. Lodi, and A. Prouvost. Machine learning for combinatorial optimization: A method-
ological tour d’horizon. European Journal of Operational Research, 290(2):405–421, 2021. ISSN
0377-2217. doi: https://doi.org/10.1016/j.ejor.2020.07.063. URL https://www.sciencedirect.
com/science/article/pii/S0377221720306895.
E. Budish, Y.-K. Che, F. Kojima, and P. Milgrom. Designing random allocation mechanisms: Theory
and applications. American Economic Review, 103(2):585–623, April 2013. doi: 10.1257/aer.103.2.
585. URL https://www.aeaweb.org/articles?id=10.1257/aer.103.2.585.
S. H. Cen and D. Shah. Regret, stability and fairness in matching markets with bandit learners.
In G. Camps-Valls, F. J. R. Ruiz, and I. Valera, editors, Proceedings of The 25th International
Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning
Research, pages 8938–8968. PMLR, 28–30 Mar 2022. URL https://proceedings.mlr.press/
v151/cen22a.html.
Y. Chen and T. Sönmez. School choice: an experimental study. Journal of Economic Theory,
127(1):202–231, 2006. ISSN 0022-0531. doi: https://doi.org/10.1016/j.jet.2004.10.006. URL
https://www.sciencedirect.com/science/article/pii/S0022053104002418.
V. Conitzer and T. Sandholm. Complexity of mechanism design. In Proceedings of the 18th Conference
on Uncertainty in Artificial Intelligence, pages 103–110, 2002.
X. Dai and M. Jordan. Learning in multi-stage decentralized matching markets. Advances in Neural
Information Processing Systems, 34:12798–12809, 2021.
S. Das and E. Kamenica. Two-sided bandits and the dating market. In IJCAI, volume 5, page 19.
Citeseer, 2005.
Z. Duan, J. Tang, Y. Yin, Z. Feng, X. Yan, M. Zaheer, and X. Deng. A context-integrated transformer-
based neural network for auction design. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari,
G. Niu, and S. Sabato, editors, Proceedings of the 39th International Conference on Machine
Learning, volume 162 of Proceedings of Machine Learning Research, pages 5609–5626. PMLR,
17–23 Jul 2022. URL https://proceedings.mlr.press/v162/duan22a.html.
A. Erdil. Strategy-proof stochastic assignment. Journal of Economic Theory, 151:146 – 162, 2014.
ISSN 0022-0531.
A. Erdil and H. Ergin. What’s the matter with tie-breaking? Improving efficiency in school choice.
American Economic Review, 98(3):669–89, 2008.
Z. Feng, H. Narasimhan, and D. C. Parkes. Deep learning for revenue-optimal auctions with budgets.
In Proceedings of the 17th Conference on Autonomous Agents and Multi-Agent Systems, pages
354–362, 2018.
D. Gale and L. S. Shapley. College admissions and the stability of marriage. The American
Mathematical Monthly, 69(1):9–15, 1962.
N. Golowich, H. Narasimhan, and D. C. Parkes. Deep learning for multi-facility location mechanism
design. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages
261–267, 2018.
M. Guo and V. Conitzer. Computationally feasible automated mechanism design: General approach
and case studies. In Proceedings of the 24th AAAI Conference on Artificial Intelligence, 2010.
J. W. Hatfield and P. R. Milgrom. Matching with contracts. American Economic Review, 95(4):
913–935, September 2005. doi: 10.1257/0002828054825466. URL https://www.aeaweb.org/
articles?id=10.1257/0002828054825466.
N. Immorlica and M. Mahdian. Incentives in large random two-sided markets. ACM Transactions
on Economics and Computation, 3:#14, 2015.
Y. Kamada and F. Kojima. Stability and strategy-proofness for matching with constraints: A
necessary and sufficient condition. Theoretical Economics, 13(2):761–793, 2018.
F. Kojima and P. A. Pathak. Incentives and stability in large two-sided matching markets. American
Economic Review, 99:608–627, 2009.
S. Lee. Incentive compatibility of large centralized matching markets. The Review of Economic
Studies, 84(1):444–463, 2016.
L. T. Liu, F. Ruan, H. Mania, and M. I. Jordan. Bandit learning in decentralized matching markets.
J. Mach. Learn. Res., 22(1), jul 2022. ISSN 1532-4435.
T. Mennle and S. Seuken. The pareto frontier for random mechanisms. In Proceedings of the 2016
ACM Conference on Economics and Computation, EC ’16, page 769, New York, NY, USA, 2016.
Association for Computing Machinery.
T. Mennle and S. Seuken. Hybrid mechanisms: Trading off strategyproofness and efficiency of
random assignment mechanisms, 2017.
T. Mennle and S. Seuken. Partial strategyproofness: Relaxing strategyproofness for the random
assignment problem. Journal of Economic Theory, 191:105–144, 2021.
Y. Min, T. Wang, R. Xu, Z. Wang, M. Jordan, and Z. Yang. Learn to match with no regret:
Reinforcement learning in markov matching markets. In A. H. Oh, A. Agarwal, D. Belgrave,
and K. Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https:
//openreview.net/forum?id=R3JMyR4MvoU.
H. Narasimhan, S. Agarwal, and D. C. Parkes. Automated mechanism design without money
via machine learning. In Proceedings of the 25th International Joint Conference on Artificial
Intelligence, pages 433–439, 2016.
P. A. Pathak and T. Sönmez. Leveling the playing field: Sincere and sophisticated players in the
Boston mechanism. American Economic Review, 98(4):1636–1652, 2008.
J. Rahme, S. Jelassi, J. Bruna, and S. M. Weinberg. A permutation-equivariant neural network
architecture for auction design. CoRR, abs/2003.01497, 2020. URL https://arxiv.org/abs/
2003.01497.
A. E. Roth. The economics of matching: Stability and incentives. Mathematics of Operations
Research, 7(4):617–628, 1982.
A. E. Roth. A natural experiment in the organization of entry-level labor markets: Regional markets
for new physicians and surgeons in the united kingdom. The American Economic Review, 81(3):
415–440, 1991. ISSN 00028282. URL http://www.jstor.org/stable/2006511.
A. E. Roth and E. Peranson. The redesign of the matching market for american physicians: Some
engineering aspects of economic design. American Economic Review, 89(4):748–780, September
1999.
A. E. Roth and M. Sotomayor. Two-Sided Matching: A Study in Game-Theoretic Modeling and
Analysis, volume 18 of Econometric Society Monographs. Cambridge University Press, 1990.
A. E. Roth, U. G. Rothblum, and J. H. V. Vate. Stable matchings, optimal assignments, and linear
programming. Mathematics of Operations Research, 18(4):803–828, 1993.
A. Sankararaman, S. Basu, and K. Abinav Sankararaman. Dominate or delete: Decentralized
competing bandits in serial dictatorship. In A. Banerjee and K. Fukumizu, editors, Proceedings
of The 24th International Conference on Artificial Intelligence and Statistics, volume 130 of
Proceedings of Machine Learning Research, pages 1252–1260. PMLR, 13–15 Apr 2021. URL
https://proceedings.mlr.press/v130/sankararaman21a.html.
L. Shapley and H. Scarf. On cores and indivisibility. Journal of Mathematical Economics, 1
(1):23–37, 1974. ISSN 0304-4068. doi: https://doi.org/10.1016/0304-4068(74)90033-0. URL
https://www.sciencedirect.com/science/article/pii/0304406874900330.
W. Shen, P. Tang, and S. Zuo. Automated mechanism design via neural networks. In Proceedings of
the 18th International Conference on Autonomous Agents and Multiagent Systems, 2019.
T. Sönmez. Bidding for army career specialties: Improving the ROTC branching mechanism. Journal
of Political Economy, 121:186–219, 2013.
T. Sönmez and T. B. Switzer. Matching with (branch-of-choice) contracts at United States Military
Academy. Econometrica, 81:451–488, 2013.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin.
Attention is all you need. Advances in neural information processing systems, 30, 2017.
A RSD is ordinal SP but not Stable
We first show RSD satisfies FOSD and is thus ordinal SP. Consider agent i in some position k in the
order. The agent’s report has no effect on the choices of preceding agents, whether workers or firms
(including whether agent i is selected by an agent on the other side). Reporting its true preference
ensures, in the event that it remains unmatched by position k, that it is matched with its most
preferred agent of those remaining. For the one-sided version, the same argument holds for agents that are in the priority order. If an agent is not on the side included in the priority order, then that agent's report has no effect at all.
In the following example, we show that the RSD mechanism is not stable (the example also illustrates the failure of strategy-proofness for DA).
Example 16. Consider n = 3 workers and m = 3 firms with the following preference orders:
w1 : f2 , f3 , f1 , ⊥ f1 : w1 , w2 , w3 , ⊥
w2 : f2 , f1 , f3 , ⊥ f2 : w2 , w3 , w1 , ⊥
w3 : f1 , f3 , f2 , ⊥ f3 : w3 , w1 , w2 , ⊥
The matching found by worker-proposing DA is (w1 , f3 ), (w2 , f2 ), (w3 , f1 ). This is a stable match-
ing. If f1 truncates and misreports its preference as f1 : w1 , w2 , ⊥, w3 , the matching found is
(w1 , f1 ), (w2 , f2 ), (w3 , f3 ). Firm f1 is matched with a more preferred worker, and hence the mech-
anism is not strategy-proof. Now consider the matching under RSD. The marginal matching
probabilities r are given by:

    r = [ 11/24   1/4    7/24
           1/6    3/4    1/12
           3/8     0     5/8  ]
Worker w2 and firm f2 are each other's most preferred options, and they would always prefer to be matched with each other rather than being only fractionally matched. Here (w2, f2) is a blocking pair, and thus RSD is not stable.
B TTC is neither SP nor Stable

Example 17. Consider n = 4 workers and m = 4 firms with the following preference orders:

w1 : f1, ⊥        f1 : w2, w3, w4, ⊥
w2 : f2, ⊥        f2 : w1, ⊥
w3 : f1, ⊥        f3 : w3, ⊥
w4 : f3, ⊥        f4 : ⊥
If all agents report truthfully, w1 is matched with f1. This violates IR, as ⊥ ≻_{f1} w1, and thus the matching is not ex ante stable. If f1 misreports its preference as f1 : w4, w3, ⊥, then w3 is matched with f1 (see Figure 4). Since f1 is matched with a more preferred worker w3, with w3 ≻_{f1} ⊥ ≻_{f1} w1, TTC is not strategy-proof.
Figure 4: Round 1 of TTC. The solid lines represent workers and firms pointing to their top preferred agent
truthfully. The dashed line represents a misreport by f1.
C Proof of Theorem 13
Proof. Since stv(g^θ, ≻) ≥ 0, we have STV(g^θ) = E_≻[stv(g^θ, ≻)] = 0 if and only if stv(g^θ, ≻) = 0 except on zero-measure events. Moreover, stv(g^θ, ≻) = 0 implies stv_{wf}(g^θ, ≻) = 0 for all w ∈ W and all f ∈ F. This is equivalent to no justified envy. For firm f, this means that for all w′ ≠ w, q^≻_{wf} ≤ q^≻_{w′f} if g^θ_{w′f} > 0, and q^≻_{wf} ≤ 0 if g^θ_{⊥f} > 0. Then there is no justified envy for a firm f. Analogously, there is no justified envy for any worker w, and thus g^θ is ex ante stable. ∎
D Proof of Theorem 15
Proof. Since regret(g^θ, ≻) ≥ 0, we have RGT(g^θ) = E_≻[regret(g^θ, ≻)] = 0 if and only if regret(g^θ, ≻) = 0 except on zero-measure events. Moreover, regret(g^θ, ≻) = 0 implies regret_w(g^θ, ≻) = 0 for any worker w and regret_f(g^θ, ≻) = 0 for any firm f. Thus, the maximum utility increase from misreporting is at most zero, and hence g^θ is ordinal SP. Conversely, if g^θ is ordinal SP, then it also satisfies FOSD by Theorem 5, and it is straightforward to show that regret(g^θ, ≻) = 0. ∎