How Algorithms Shape the Distribution of Political Advertising: Case Studies of Facebook, Google, and TikTok

ABSTRACT
Online platforms play an increasingly important role in shaping democracy by influencing the distribution of political information to the electorate. In recent years, political campaigns have spent heavily on the platforms' algorithmic tools to target voters with online advertising. While the public interest in understanding how platforms perform the task of shaping the political discourse has never been higher, the efforts of the major platforms to make the disclosures necessary to understand their practices fall woefully short. In this study, we collect and analyze a dataset containing over 800,000 ads and 2.5 million videos about the 2020 U.S. presidential election from Facebook, Google, and TikTok. We conduct the first large-scale data analysis of public data to critically evaluate how these platforms amplified or moderated the distribution of political advertisements.1 We conclude with recommendations for how to improve the disclosures so that the public can hold the platforms and political advertisers accountable.

1 Our dataset and source code are available for public use at http://campaigndisclosures.princeton.edu/

CCS CONCEPTS
• Applied computing → Law; • Information systems → Online advertising.

KEYWORDS
interpretability, political speech, algorithmic auditing, accountability, political advertising, algorithmic targeting, regulation

1 INTRODUCTION
As online advertising becomes a crucial part of political campaigns [30, 32], the platforms' control over their communication infrastructure makes them key political actors [50] and gives them a power over the political discourse that goes beyond what the traditional definition of "platform" might denote [27]. Importantly, these platforms are not neutral carriers of political ads, but play a more active role in amplifying or moderating the reach of those political messages. But the platforms have not disclosed data that would allow for meaningful public oversight of their actions. In 2018, in an attempt to stave off regulation, some platforms began to voluntarily create libraries of political advertisements and moderation decisions [23].

In this study, we evaluate the voluntary disclosures [19, 28] made by online platforms to understand how the platforms influence the distribution of political ads. We conduct the first large-scale data analysis of political ads in the 2020 U.S. presidential election to investigate the practices of three platforms: Facebook, Google, and TikTok.

One clear indication of the importance of online platforms to political campaigns is how campaigns have shifted their spending to online advertising. In 2008, the first significant digital campaign spent roughly $20 million on online advertising, which amounted to 0.4% of the total money spent on campaigning [17]. In the 2020 election cycle, the campaigns spent more than $2.8 billion, or 20% of the campaign budget [6], on the major platforms. As shown in Figure 1, this spending generated billions of impressions for political ads placed in the two months prior to the 2020 U.S. elections.

Mindful of the risk of elevating transparency to become the supreme value in democratic politics [51], we do not focus on transparency for its own sake. Instead, we develop a framework that identifies the desirable properties of disclosures that serve broader democratic values, and we measure the effectiveness of the current platform disclosures against those properties. Two straightforward research questions lie at the heart of our inquiry:

RQ1: What do the ad libraries tell us about how the algorithmic tools were used to distribute political ads?
RQ2: Can we interpret how platforms applied their moderation policies to political ads?

1.1 Contributions
• We develop and apply a three-part evaluative framework for measuring the effectiveness of platform disclosures (section 3).
• We attempt to reverse-engineer platforms' ad targeting tools using the available data to assess how they influence the distribution of content (section 6.1).
• Our statistical analysis suggests that the platforms charge different campaigns different rates for targeting specific regions and demographics (section 6.1.3).
• As a whole, we demonstrate that the data provided by the platforms is noisy and incomplete, so it is difficult to draw definitive conclusions about how the platforms operate (section 6.1.4).
• There are pervasive inconsistencies in how the platforms implement their ad moderation policies. We detected numerous instances where ads were moderated only after having generated millions of impressions, or where some ads were flagged for moderation while others with the same content were not (section 6.2).
Figure 1: Overview of political content reach by platform in our dataset, for the two months up to election day. On the top, Figure A depicts Facebook in blue, YouTube in black, and Google in yellow. Facebook & Google do not disclose the exact number of impressions, but a range within which the value falls. On the bottom, Figure B depicts the number of views of videos containing the hashtags #Biden2020 and #Trump2020 on TikTok in our dataset. It also presents the number of views for videos produced by 96 political influencers (see Table 5).

2 RELATED WORK
Online platforms such as Facebook, Google, and TikTok play an increasingly prominent role in shaping political discourse [35]. One widely studied aspect is the impact of platform practices on user behavior [13, 49, 55]. Another is how political actors utilize platforms in their political campaigns [33]. But the role of platforms' algorithmic tools as intermediaries that shape the political discourse has not been studied as extensively, as there is limited visibility into their practices [16, 26]. We build on the works of Kreiss, McGregor & Barrett [34–36], exploring the role of online platforms in shaping political communication, and the works of Fowler et al. [21, 22, 24] and West [60].

3 EVALUATIVE CRITERIA
Government regulations for online political advertising have stalled in the United States [49]. As a result, we do not have legal standards with which to evaluate the current disclosures by the platforms. Nevertheless, we extract three potential criteria to measure the effectiveness of the disclosures:

• First, do the disclosures meet the platforms' self-described objective of making political advertisers accountable?
• Second, how do the platforms' disclosures compare against what the law requires of radio and television broadcasters?
• Third, do the platforms disclose all that they know about the ad targeting criteria, the audience for the ads, and how their algorithms distribute or moderate content?

3.1 Self-Imposed Standards
In 2018, facing potential regulation such as the proposed Honest Ads Act [12], several online platforms chose to create ad libraries of political campaign materials. For Facebook, Google, and YouTube, these libraries provide some basic information about who placed ads, their content, how they were distributed, and whether they were moderated (Table 1). TikTok recently created an ads library [57], but the company disavows carrying ads about political issues and does not disclose how it moderates political content.

Facebook. Facebook's Political Ad Policy restricts the ability to run electoral and issue ads to authorized advertisers only. Facebook states that the purpose of the ad library is to provide "advertising transparency by offering a comprehensive, searchable collection of all ads currently running from across Facebook apps and services, including Instagram." Notably, as the policy explains, the advertising transparency is directed at "making political ads more transparent and advertisers more accountable," and not at holding the platform accountable for how it distributes or moderates the political content.
Table 1: Platform-specific strategies in distributing and moderating political content, showing how each platform defines political content, the access to targeting tools, the presence of an ad library, and their moderation practices. YouTube had the same ad policy as Google, so we did not include a separate entry for it.

                       Facebook                   Google                    TikTok
Definition             Actor, Subject, Issue      Actor, Subject            Actor, Issue
Targeting              Full                       Restricted                None
Election Ad libraries  Ad cost, impressions,      Ad cost, impressions,     No political content
                       audience characteristics   targeting parameters
                       (gender, age, state)       (gender, age, location)
Moderation             Removal/Label              Removal/Label             Label/Algorithm

For ad moderation, Facebook applies its general Advertising Policies and Community Standards. The Political Ads Policy falls under the Restricted Content section and consists of two policies: 9.a Ads About Social Issues, Elections or Politics, and 9.b Disclaimers for Ads About Social Issues, Elections or Politics. Article 9.a outlines that advertisers are required to complete Facebook's authorization process, and failure to meet the reporting requirements may lead to restrictions such as the disabling of existing ads.

Google. Google also launched its political Ad Library in the summer of 2018 and requires advertisers to be verified to publish ads. Like Facebook, the library's purpose is vaguely described as providing "greater transparency in political advertising," without disclosing what the transparency is being compared against or who is the subject of the transparency goals. It is clear that the subject of transparency is the political campaign and not the platform. The Google ads archive, which includes ads placed on the Google network (the search engine, third-party websites that use Google ad tools, and other Google services) and on YouTube, shows the content of each instance of a political ad, the advertiser, its cost, and the related impressions. It also shows which user groups, in terms of age, gender, and location (down to the zip code), were targeted by the advertisers.

Google and YouTube's ad moderation policies are set forth in the platform's Advertising Policies. Google has a specific category on Political Content, which is listed under the Restricted Content and Features section of its policy. When an ad is removed by the platform, its content is replaced in the archive by a red banner stating that the ad violated the platform's terms & conditions.

TikTok. TikTok does not have political ads in its library because it does not allow such ads on the platform. It explains that "the nature of paid political ads is not something we believe fits the TikTok platform experience." Nevertheless, from the content sample we analyze in this study (see section B.3), we document a significant amount of political content shared on the platform around the elections. We observe that a lot of that content is generated by a group of influencers, some of which are directly linked to political organizations such as PACs. Political influencers are present on Facebook and Google as well. TikTok also recently published its updated community guidelines [56], but the guidelines do not mention how the platform moderates political content.

3.2 Broadcast Regulations
Federal law imposes disclosure requirements on political campaigns and broadcasters to ensure that the public can understand where campaigns spend money on reaching prospective voters, and whether the broadcasters carry the ads in a non-discriminatory manner. The rules for the broadcasters are set and administered by the Federal Communications Commission. In particular, the FCC's Political Programming staff oversees whether a broadcaster is favoring one candidate at the expense of the other by charging different rates or limiting the reach of candidate-sponsored ads [10]. Specifically, the FCC's staff resolves issues related to the prohibition on censorship of candidate-sponsored ads; the "Lowest Unit Charges" and "Comparable Rates" that broadcasters charge candidates for their advertisements; and the on-air sponsorship identification for political advertisements. The FCC's staff also oversees the files that broadcasters must maintain for the public to easily access and inspect.

In 2022, the FCC updated its regulations to require stations to maintain files that contain the following information: (1) whether the broadcaster accepted or rejected the request to purchase broadcast time; (2) the rate charged for the broadcast time; (3) the date and time on which the communication aired; (4) the class of time that was purchased; (5) the name of the candidate to which the communication refers and the office the candidate is seeking, the election to which the communication refers, or the issue to which the communication refers; (6) in the case of a request made by, or on behalf of, a candidate, the name of the entity making the request; and (7) in the case of any other request, the name of the person purchasing the time, the name, address, and phone number of a contact person for such person, and a list of the chief executive officers or members of the executive committee or of the board of directors of such person [9].

We extract an analogous evaluative criterion for online ads from these regulations for broadcasters: at minimum, the public should be able to evaluate how campaigns are spending money to target audiences, and whether the platforms are carrying the content in a non-discriminatory manner.
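Viewed as a data requirement, the FCC's items (1)-(7) describe a per-request disclosure record. The sketch below is our own paraphrase of those items as a data structure, not an official FCC schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PoliticalFileRecord:
    """Our paraphrase of the 2022 FCC political-file items (1)-(7)."""
    request_accepted: bool            # (1) request accepted or rejected
    rate_charged: float               # (2) rate charged for the broadcast time
    air_datetime: str                 # (3) date and time the communication aired
    class_of_time: str                # (4) class of time purchased
    candidate_election_or_issue: str  # (5) candidate and office, election, or issue
    requesting_entity: str            # (6) entity making a candidate request
    contact_and_officers: Optional[str] = None  # (7) contact and officers, for other requests
```

Our third evaluative criterion asks whether the platforms' ad libraries expose an analogously complete record for online political ads.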
3.3 Comprehensive Disclosures
Platforms have unique data about the political campaigns' targeting parameters, how algorithms distribute or moderate content, and who actually saw the ads. But, as shown in Figure 2, platforms only make a fraction of that information available for public scrutiny. Typically, an advertiser runs an ad on the platform by selecting from a variety of targeting parameters, including age, gender, and location, as well as some available specific contextual and audience properties. Google allows political advertisers to target based on demographic properties and specific contextual features (ad placements, topics, keywords against sites, apps, pages and videos) [28]. Facebook allows the use of demographic data for political ads, and also allows campaigns to use predefined lists of individuals or algorithmically generated "look-alike" audience lists [19]. After selecting targeting parameters, the advertiser chooses how it will pay for impressions, which the platform uses to calculate to whom the ad will be shown and at what cost. Given the advertiser's choices, competing ads, and user properties, the platform uses complex algorithms to distribute the ad. It then creates reports for the advertiser about the number of impressions, as well as the total ad placement cost of the campaign.

Figure 2: The visible and opaque aspects of online platform ad delivery mechanisms. Google discloses only the demographic segments targeted by the advertisers. Facebook reports only on the demographic segments that saw specific content. Hence, none of the platforms reveal the full targeting and distributional parameters of ads.

But the platforms' transparency libraries do not contain the information they provide to advertisers. As discussed below, the appropriate disclosures for platforms should include information about how their algorithms function. But even if we put information about their algorithms to one side, the platforms should disclose to the public, at minimum, all the information they make available to advertisers about costs, impressions, and targeting parameters. Accordingly, we assess the effectiveness of the platforms' ad libraries by comparing what is disclosed currently against what could be made available to a hypothetical advertiser on that platform.

For a platform's ad moderation practices, there is little precedent to draw on to develop standards for appropriate disclosures. We examine whether the public can understand if the policy has been applied consistently and if the platform has provided an adequate explanation for its decision to moderate an advertisement.

4 DATA
We create a large-scale dataset of political ads and content for Facebook, Google (including YouTube), and TikTok. The dataset contains more than 850,000 ads on Facebook, Google, and YouTube and 2.7 million political videos on TikTok. We focus on content that was created up to two months prior to the US elections. For ads, we collect who sponsored them, what specific targeting parameters & audience characteristics were used to distribute them, as well as their reach in terms of impressions and the corresponding cost. We also crawl the Facebook & Google ad libraries to locate moderated ads. For TikTok, we generate a list of political hashtags, which we use to crawl videos from the platform. We also use a list of political influencers & HypeHouses (40 Democratic, 56 Republican), and download all their videos. For each video, we also collect the corresponding engagement metrics in terms of views, likes, and shares, and its description. We also collect video creators' metadata. Furthermore, we locate which TikTok videos were assigned an election-related warning by the platform: the platform soft-moderated election-related content by placing a warning banner saying "Get info on the U.S. elections," which linked to a guide containing authoritative information about the election process. A detailed description of the collected dataset can be found in Table 2 and in Appendix B.

Table 2: Overview of the collected dataset. We provide information about how we obtained the data, for which periods, what attributes, and what post-processing we performed. A more detailed explanation can be found in Appendix B.

5 METHODS
First, we quantify the prevalence of political ads on the platforms. We quantify the number of impressions for political ads on Facebook, Google, and YouTube, and the cost to place this content. For TikTok, we document the number of views for videos produced by political influencers and of videos containing the political hashtags gathered in our dataset (Figure 1).

Second, we analyze whether the data provided by the online platforms adequately explain the platforms' decisions and algorithms, using the three analytical criteria described in section 3. Table 3 provides an overview of the methodologies we describe in detail next, together with the corresponding evaluative criteria and the sections in which we report our related results.

Table 3: Overview of the different methods we use for evaluating platforms, given the evaluative criteria we developed. We use a uniform heading scheme across methods & results to efficiently connect the two sections.

5.1 Distribution of political content
5.1.1 Assessing information in the ad libraries. For the platforms that have political ad libraries (Facebook, Google, & YouTube), we assess the platforms' role in shaping the distribution strategy. Since platforms do not provide all targeting information associated with an ad, we explore what the limited data can tell us about how the platforms' tools were used to target specific audiences. Specifically, we quantify the unique number of ads Biden's and Trump's campaigns placed in terms of content, location, and age and gender demographics. We also locate how the same ads were distributed across different platforms, and we compare the distribution metrics provided by Facebook and Google to assess what information ad libraries can provide.

5.1.2 Reverse engineering the platforms' targeting algorithms. Next, we attempt to reverse engineer the platforms' targeting algorithms. We do that by creating advertising accounts on Facebook and Google, and evaluating the cost and impression estimates for hypothetical ads that mimic the targeting criteria of original political ads that ran on the platforms. If the cost/impression estimates for the hypothetical ads deviate significantly from the reported ranges for the ads that did run, we assume that advertisers used additional targeting options.

Specifically, we create four different dummy ads to investigate the relationships in more detail. On Google, our dummy ad targets the whole of the United States and all available genders and ages, and we calculate the upper and lower impressions it will generate for a budget varying from $10 to $1,000,000. We do the same for an ad on YouTube targeting Pennsylvania and females between 25-34. For Facebook, we target one ad at the whole of the United States and all genders and ages. Lastly, our other dummy ad on Facebook targets California and females of all ages. Based on the reported upper and lower values, we interpolate cost and impressions and calculate an area of plausibility. If an ad in the ad libraries with the same targeting options falls within this area, it suggests that the advertisers actually used these targeting options. If an ad falls outside the area, it suggests that the ad might have used additional targeting options that the platforms did not disclose. In this way we can assess the quality of the information in the ad libraries. One notable limitation of this approach is that we ran these dummy ads at a different time period from the political ads we examined. As a result, the analysis is illustrative of the technique and should not be taken as definitive.
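For concreteness, the plausibility check can be sketched as follows. This is a minimal sketch with made-up bound values; the real bounds come from the cost and impression estimates that the platforms' advertising tools returned for our dummy ads:

```python
import numpy as np

# Hypothetical budget -> impression bounds, as estimated from dummy ads.
budgets = np.array([10, 100, 1_000, 10_000, 100_000, 1_000_000])
impressions_low = np.array([400, 4_000, 38_000, 350_000, 3_200_000, 30_000_000])
impressions_high = np.array([1_200, 12_000, 115_000, 1_100_000, 9_800_000, 95_000_000])

def classify(cost: float, impressions: float) -> str:
    """Label a library ad against the interpolated area of plausibility."""
    low = np.interp(cost, budgets, impressions_low)
    high = np.interp(cost, budgets, impressions_high)
    return "plausible targeting" if low <= impressions <= high else "unexplainable targeting"

print(classify(cost=5_000, impressions=200_000))    # inside the area
print(classify(cost=5_000, impressions=2_000_000))  # outside the area
```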
5.1.3 Connecting targeting & audience characteristics to ads' cost per impression. To further explore how algorithms distribute political content, we investigate the sensitivity of cost/impression ratios to different audience characteristics and targeting properties. We create models that analyze the ads generated by the Biden and Trump campaigns, and uncover how ad-specific properties link to the ads' distribution. For Facebook, we create a linear regression model that has as dependent variable the cost/impressions ratio, and as independent variables the ratio of individuals in each state that viewed an ad, the ratio of individuals that were either male or female and belonged to the age buckets 18-24, 25-34, 35-44, 45-54, 55-64, 65+, and whether the ad was placed by the Biden or Trump campaign. Since the Google Ad Archive aggregates cost and impressions into intervals, we create an ordinal logistic regression model that predicts the bucketed number of impressions an ad generated, using as independent variables the ad's cost bucket, whether the ad was text (an ad on Google search), image (an ad on third-party affiliates), or video (a YouTube ad), the targeted genders (male, female), the targeted age groups (18-24, 25-34, 35-44, 45-54, 55-64, 65+), the granularity of region targeting (USA, state level, county level, zip code level), and whether the ad was placed by the Biden or Trump campaign. Based on the located associations, we uncover factors that shape the algorithmic distribution of content.
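The two specifications can be reproduced along the following lines, assuming the library data has been flattened into data frames; the file and column names here are hypothetical, and the ordered-logit estimator is statsmodels' OrderedModel:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Facebook: OLS of the cost/impressions ratio on audience shares.
fb = pd.read_csv("facebook_ads.csv")  # hypothetical file
covariates = [c for c in fb.columns if c.startswith(("state_", "male_", "female_"))]
ols_fit = sm.OLS(fb["cost_per_impression"],
                 sm.add_constant(fb[covariates + ["is_trump"]])).fit()
print(ols_fit.summary())

# Google: ordered logit of bucketed impressions on cost bucket and ad format.
gg = pd.read_csv("google_ads.csv")  # hypothetical file
buckets = pd.CategoricalDtype(["<=10k", "10k-100k", "100k-1M", "1M-10M", ">10M"],
                              ordered=True)
y = gg["impressions_bucket"].astype(buckets)
X = gg[["cost_bucket_code", "is_image", "is_video", "zip_targeted", "is_trump"]]
ordered_fit = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
print(ordered_fit.summary())
```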
5.1.4 Cross-platform comparison of targeting parameters & audience characteristics. To understand what full disclosures can tell us about the algorithms that distribute political advertisements, we use as data the cross-platform tactics of advertisers. Following the principles of personalized advertising, we make the strong assumption that advertisers would have targeted specific demographics with the same content across platforms, in order to maximize their influence potential. Therefore, we identify 35 unique image ad designs created by Biden & Trump that correspond to 12,448 unique ad placements on Facebook and 3,055 ad placements on Google. Similarly, we identify 72 unique video ad designs that correspond to 13,840 ad placements on Facebook and 4,383 ad placements on Google. We then identify ad instances whose targeting options on Google match ad instances' distributional information on Facebook, and assess how algorithms deliver ads based on the specific targeting choices of the advertisers. We do so by focusing on two sets of ads, namely YouTube ads placed by the Biden campaign to all genders, and image ads placed by the Trump campaign to all available age groups. In this way, we uncover how consistent Facebook's algorithmic distribution is.

5.2 Moderation of political advertising and content
We analyze platform tactics in algorithmic ad & content moderation based on each platform's policies. We use two methods to evaluate TikTok's moderation of election-related content.

5.2.1 Connecting TikTok video properties to their moderation. We investigate TikTok's use of warning labels related to the U.S. elections using a logistic regression model that predicts whether a video was flagged. We use as independent variables the likes, shares, comments, and views a video generated, the average number of likes the video's author collected, as well as the presence of three election-related (#biden, #trump, #vote) and three non-election-related (#blm, #abortion, #gun) hashtags. Furthermore, we calculate the ratio of political videos flagged for each user in our dataset. Based on the results of both analyses, we can uncover features that constituted algorithmic content moderation.
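A sketch of the label model, assuming a flattened data frame of videos with hypothetical column names:

```python
import pandas as pd
import statsmodels.api as sm

videos = pd.read_csv("tiktok_videos.csv")  # hypothetical file

predictors = ["likes", "shares", "comments", "views", "author_avg_likes",
              "has_biden", "has_trump", "has_vote",   # election-related hashtags
              "has_blm", "has_abortion", "has_gun"]   # non-election-related hashtags
logit_fit = sm.Logit(videos["has_election_warning"],
                     sm.add_constant(videos[predictors])).fit(disp=False)
print(logit_fit.summary())

# Ratio of flagged political videos per creator.
flag_ratio_per_user = videos.groupby("author_id")["has_election_warning"].mean()
```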
5.2.2 Characterizing the magnitude and effectiveness of moderation for Facebook & Google. For Facebook, Google, and YouTube, our investigation of moderation practices examines the ad libraries to document how many ads were moderated and how many individuals saw ads that were flagged. In addition, we manually reviewed a set of 200 moderated ads for each platform, which allows us to qualitatively understand features of the moderation process. Furthermore, based on the set of moderated ads in our sample, we locate other ads that contained the same content but were not moderated. We quantify their distribution on the platforms, and assess the robustness and degree of explainability of the moderation process. These three steps allow us to uncover patterns about who was moderated, why, and how effective this moderation was.
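The consistency check for identically worded ads can be sketched as follows (hypothetical file and column names; we match ads on their content):

```python
import hashlib
import pandas as pd

ads = pd.read_csv("ad_library.csv")  # hypothetical file: one row per ad placement

# Group placements that share identical (normalized) content.
ads["content_hash"] = ads["content"].map(
    lambda t: hashlib.sha256(t.strip().lower().encode()).hexdigest())
status = ads.groupby("content_hash")["removed"].agg(
    any_removed="any", all_removed="all")

# Designs with mixed status: some instances removed, identical ones left up.
mixed = status.query("any_removed and not all_removed")
print(f"{len(mixed)} ad designs were inconsistently moderated")
```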
Table 4: Demographic distribution of content by candidate on Facebook & Google. For Google we show the targeting choices of advertisers. For Facebook we report who was shown an ad (audience characteristics).

                                  Google            Facebook
Distribution Strategy            Biden   Trump     Biden   Trump
Age                                3%      0%       34%     31%
Gender                             1%      0%        6%      7%
Zip Code/County                   17%     19%        -       -
State                             81%     69%       87%     96%
Age & Gender & Zip Code/County     0%      0%        -       -
Age & Gender & State               1%      0%        6%      7%

6 RESULTS

6.1 Political content distribution
6.1.1 Assessing information in the ad libraries: Platforms provide limited insights about how campaigns used their targeting tools. Table 4 presents the demographic distribution of Biden and Trump ads on Google and Facebook. This superficial view seems to suggest that advertisers rarely resorted to micro-targeting, and instead applied broad criteria to target general segments of society. For Google, which publishes targeting data, we see that both Biden and Trump appear to have infrequently used the platform's fine-grained demographic targeting (except for targeting by state). Facebook provides audience characteristics of the individuals who saw the ads, rather than targeting choices, but the pattern is similar.

Yet a closer look at the actual content distributed on the platforms contradicts this superficial view. For example, we used algorithmic tools [47] and manual coding to detect all ads placed by Biden and Trump in the Spanish language. On Google, the majority of the 1,724 Spanish-language ads we located had no demographic targeting and were sent to geographies that did not have large Hispanic populations. Even for zip-code-targeted ads, the percentage did not exceed on average 30% on Google and 12% on YouTube. In other words, the campaigns potentially used some undisclosed contextual targeting criteria to place the ads. On Facebook, we also located 626 ads in Spanish, but since the ad library provides distributional data only at the state level, we were not able to evaluate how they were targeted.
Figure 3: We plot the generated cost per impression of ads in the ad libraries that were (1) targeted to all genders & ages on Google, (2) targeted to females between 25-34 on YouTube, (3) seen by all genders & ages in the US on Facebook, and (4) seen only by females of all ages located in California on Facebook. For Facebook, lower & upper bounds are provided for the impressions. For Google, lower & upper bounds are provided for cost & impressions, given the extensive "bucketing" of the parameters performed by the ad libraries when reporting them; the buckets are denoted in the figures with boxes. Points represent the median value of the boxes. We compare the generated cost per impression of ads with the cost per impression of a set of dummy ads we placed on the platforms with the exact same targeting parameters & audience characteristics. Black lines represent the upper and lower boundaries of an ad's cost per impression as we extracted them from the dummy ads. We label an ad placement as "plausible targeting" when its cost per impression overlaps with the one we calculated, denoting that we can assume that the ad library provides all relevant targeting parameters/audience characteristics about the ad. Similarly, a placement labeled as "unexplainable targeting" represents an ad whose cost per impression is outside the upper and lower values that we calculated, meaning that the platforms potentially do not disclose full information about the distribution of the ad.

6.1.2 Reverse engineering the platforms' targeting algorithms. Similarly, our reverse engineering of the platforms' tools indicates that the campaigns used undisclosed targeting strategies. Figure 3 presents how the distribution of the actual ads in our sample falls within the retrieved cost-per-impression boundaries. We find that the reverse-engineering data do not always correspond to the targeting/distributional information provided in the ad libraries. For Google, this discrepancy is small when looking at ads distributed over the United States across all genders and ages, with only about 2% of the ads reported in the ad libraries falling outside our calculated boundaries. The discrepancy is significantly larger when comparing YouTube ads placed in Pennsylvania targeting females between 25 and 34, with the disagreement reaching 13%. However, the true discrepancy may be much larger because Google uses very large reporting buckets for costs and impressions, as Figure 3 reveals. By way of illustration, an ad is assigned the same values of ≤10,000 impressions and ≤$100 cost whether it is shown to 500 individuals at a cost of $5 or to 9,000 individuals for $90.
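A short worked example of how little these buckets pin down (our illustration, using the bucket bounds just mentioned):

```python
# An ad reported as "<= $100" and "<= 10,000 impressions" only bounds the truth:
cost_low, cost_high = 0.0, 100.0
impressions_low, impressions_high = 1, 10_000

cpi_min = cost_low / impressions_high   # 0.0: a free ad shown widely
cpi_max = cost_high / impressions_low   # 100.0: $100 for a single impression
print(f"cost per impression is only known to lie in [{cpi_min}, {cpi_max}]")
```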
We locate similar discrepancies when reverse-engineering the information for Facebook. For ads placed to females of all ages in California, the disagreement between the cost/impression data from our analysis and from the ad libraries is 14%. For ads placed in the United States generally, across all genders and ages, the disagreement exceeds 27%. These results illustrate that the available information the ad libraries provide is not sufficient to understand how political advertisers used the platform to distribute political messages. Even if the detected discrepancies are a result of specific undisclosed parameters that influence targeting (e.g., the auction system, the time an ad was placed, etc.), the disclosures do not meet the platforms' self-described objective of making political advertisers accountable. Although the ad libraries provide information about the content of ads, they do not allow the public to understand the exact segments of society advertisers wanted to target, the price they paid, or how algorithms transformed these intentions into a specific content distribution.

6.1.3 Connecting targeting & audience characteristics to ads' cost per impression: Disclosures and targeting algorithms fall short compared to existing broadcasting policies. Our regression analysis results (Tables 10 & 11, appendix) reveal specific shortcomings for algorithms and platform disclosures, both for Facebook and Google, since we discover that there was no parity in the cost of ads between advertisers. We also see that different demographic targeting and audience characteristics resulted in different ad placement costs.

For Facebook, there was no difference in the number of impressions per dollar an ad generated, regardless of the gender of the individuals who saw it. Nevertheless, age was a factor associated with different amounts of generated impressions. Specifically, placing ads to older populations (>55) and very young populations (18-24) was significantly more expensive than placing ads to individuals between 25 and 54. Furthermore, ad costs varied between different states. For example, the most expensive impressions were found in Massachusetts, Rhode Island, and Washington DC, while the cheapest impressions were generated in Idaho, Arizona, and Mississippi. Interestingly, the ads that Biden placed were overall more expensive per impression compared to those of Trump.

On Google and YouTube, as with Facebook, there was a difference when targeting different age demographics. Targeting individuals between the ages of 18-24 was by far the cheapest, while the most expensive targeting group was people aged between 25-34. The more location-specific the targeting, the more expensive the ad placement, with zip-code-targeted ads being the most expensive and US-general ads being the cheapest. In contrast to Facebook, the ads that Trump placed were more expensive than those placed by Biden.

Comparing these results to the broadcast regulations, we observe multiple instances of a lack of rate parity between campaigns and across different strategies. Of course, these rate differences can be attributed to multiple factors, such as additional contextual targeting criteria, which were not provided by Facebook or Google, as well as further information about how their algorithms distribute and price content. But it is worth noting that the potential for a broadcaster to favor one campaign over another led to the FCC rules on parity between campaigns and commercial advertisers, to ensure that the broadcaster was acting appropriately. Another issue is that the algorithmic promotion and demotion of content by the platforms runs into the concern that the intermediary might covertly limit the reach of candidate-sponsored ads. And our analysis shows that there are unexplained artifacts in the distribution that can be tied back to the algorithmic choices of the platforms.

6.1.4 Cross-platform comparison of targeting parameters & audience characteristics: Full disclosures reveal properties of algorithmic distribution. Building on our results from the prior sections, we evaluate what the cross-platform tactics of advertisers can reveal about how content was algorithmically distributed to the public. We do this by pairing ads that were shown on Google and Facebook, so that we obtain data about targeting criteria from Google and the actual distribution from Facebook.

Figure 4 shows how two specific sets of ads on Facebook were distributed across genders and age groups respectively, given that they were not targeted to specific gender or age subgroups on Google. For video ads placed by Biden, which were not gender targeted, we find that they were distributed unequally among males and females. On average, ads were shown 45% to males and 53% to females, while the standard deviation of the distribution for each gender was 12%. This discrepancy can potentially be attributed to the gender demographics of Facebook users in the US in 2020, which were 54.5% female and 43.5% male [45].

By analyzing image ads for Trump that did not have any limitations in age targeting, we uncover similar patterns. Focusing on three age groups, we find that on average an ad would be shown at a rate of 6% to individuals between 18-24, at a rate of 16% to individuals between 25-34, and at a rate of 21% to individuals between 45-54. In contrast to the gender distribution, these values do not correspond to the Facebook user demographics in 2020, which were 8% for individuals between 18-24, 13% for individuals between 25-34, and 7% for individuals between 45-54. This means that algorithmic targeting resulted in a disparate distribution of ads across age groups. These findings provide additional support for the public interest in further understanding how the distribution of the campaigns' ads was affected by the platforms' algorithmic choices.
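As a rough way to quantify this skew (our own construction, using the figures cited above), one can compare each age group's delivery share to its share of the platform's user base:

```python
delivery_share = {"18-24": 0.06, "25-34": 0.16, "45-54": 0.21}   # observed delivery
user_share = {"18-24": 0.08, "25-34": 0.13, "45-54": 0.07}       # Facebook users, 2020

for group in delivery_share:
    skew = delivery_share[group] / user_share[group]
    print(f"{group}: {delivery_share[group]:.0%} delivered vs "
          f"{user_share[group]:.0%} of users (x{skew:.1f})")
# 45-54 received roughly 3x its population share despite no age targeting.
```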
6.2 Moderation of political advertising and content
Focusing on moderation, we evaluate how platforms disclose their choices and practices when removing political ads (Facebook, Google, YouTube) and handling political content (TikTok).

6.2.1 Characterizing the magnitude and effectiveness of moderation for Facebook & Google: Unexplainable ad moderation practices. Our study documents how difficult it is to understand how the platforms apply their moderation policies to political ads.

On Google and Facebook we see that a large number of ads across a wide range of advertisers were removed (Table 8, appendix). Google removed 13.3% of the political ads from its network, YouTube removed 4.5%, while Facebook removed only 1.2%. Despite their removal, these ads generated a significant number of user impressions. Furthermore, these decisions affected a significant number of advertisers. Google removed at least one ad from 256 advertisers (18% of all), YouTube removed at least one ad from 307 advertisers (22% of all), and Facebook from 266 advertisers (31% of all).

Figure 5 illustrates how different instances of the same ad design were moderated. For each platform we find a significant number of ads for which some instances were removed and some were not. On Facebook, this applied to 51% of the moderated ads. For Google, the share of such ads was 75%, while for YouTube it was 65%. In total, we found 11,549 ad instances across platforms that were not moderated, although at least one ad identical to them was removed. In median, non-moderated ad instances on Google resulted in the generation of 1.1 billion impressions, compared to 700 million for the moderated ones. For YouTube, non-moderated ad instances generated 1.2 billion impressions, while moderated instances generated 900 million. For Facebook, these numbers were 440 million and 200 million respectively. These results suggest that inconsistent ad moderation had serious implications: the moderated ads resulted in a significant number of impressions, and their unmoderated counterparts resulted in an even higher diffusion of problematic content. Furthermore, the platform ad libraries do not provide any explanation of why and when an ad was removed; therefore, it is not possible to assess the reasons for these discrepancies. Especially for Facebook, we find that even the classification of ads as removed was inconsistent: when manually reviewing a random sample of 200 moderated ads, we found that 35 were no longer labeled as removed. These results raise questions about the efficacy, robustness, and explainability of the moderation practices.

6.2.2 Connecting TikTok video properties to their moderation. Focusing on content moderation on TikTok, we find 505,062 videos that contained at least one hashtag from our curated hashtag list.
Figure 5: Comparison of different instances of moderated ads across platforms. The light blue bars show how many instances of a single ad were moderated, and maroon bars show how many instances of the same ad were not. Results suggest an inconsistent moderation of content across platforms, with some instances of the same ad being removed and others not.

there is an equal baseline for disclosures across the platforms. Such a definition should be carefully chosen, given the complexity and non-triviality of the concept [52]. The same applies to the available targeting information. Platforms should disclose both the full and detailed targeting parameters and the distribution parameters (audience characteristics) of ads, since anything less than this results in an incomplete and inefficient evaluation of advertisers' campaigns and platforms' decisions. Finally, full disclosure of targeting criteria can facilitate understanding of the specific campaigning techniques used in attempts to influence voters.

We believe that the creation of a standard and repository should be accompanied by detailed regulations that protect the public and ensure fairness among political advertisers. Drawing from the broadcasting regulations, we observed an apparent difference in rates between different advertisers. Platforms should disclose how they algorithmically control the price and reach of content, whether they deliberately or unintentionally limit the reach of candidate-sponsored ads, and how they ensure parity, and they should provide information that can reveal whether advertisers or platforms target segments of society in a biased way. Similarly, since we discovered significant inconsistencies in ad moderation, we argue that platforms should be obligated to disclose when and why an ad was removed, and to make the removed content available for review in a neutral repository. This can make platforms accountable for their decisions and algorithms, and can ensure the fair moderation of content among advertisers.

Lastly, the broad reach of the influencers that we document on TikTok highlights the need for regulations that require disclosures about their sources of funding and other activities. Such influencer-driven marketing is also present on Google and Facebook properties, but is not disclosed in their ad libraries. Because the Federal Trade Commission's endorsement guidelines are designed for commercial transactions and not political campaigning, this is an area where the Federal Election Commission would need to develop comprehensive disclosure requirements.

8 CONCLUSION
In this study, we evaluated what the platform disclosures could tell the public about the platforms' role in the distribution and moderation of political advertising. By taking the political ad libraries and platforms' transparency mechanisms seriously, we undertook a large-scale data analysis of political ads on Facebook, Google, and TikTok. Our study demonstrated the existence of strong barriers to public understanding of advertisers' tactics. We also found evidence that the platforms' disclosures fall well short of what is required under the law for broadcasters. Finally, we showed why we need more accurate and comprehensive disclosures to understand and robustly evaluate targeting tools and algorithmic moderation.

9 ACKNOWLEDGMENTS
This study was supported by a Princeton Data Driven Social Science Initiative Grant. We thank Laura Edelson, Andy Guess, and Matt Salganik for constructive feedback on the final manuscript. We are also grateful for the early feedback from the research seminar run by Princeton's Center for the Study of Democratic Politics and later feedback from the MPSA'22 panel on political marketing. We would also like to thank Eli Lucherini for support in data analysis, Ashley Gorham for conceptual contributions in the early stages of the project, Juan Carlos Medina for a part of the data collection, and Milica Maricic for support in classifying political ads and creating the website of the project.

REFERENCES
[1] Muhammad Ali. 2021. Measuring and Mitigating Bias and Harm in Personalized Advertising. In Fifteenth ACM Conference on Recommender Systems. 869–872.
[2] Muhammad Ali, Piotr Sapiezynski, Miranda Bogen, Aleksandra Korolova, Alan Mislove, and Aaron Rieke. 2019. Discrimination through optimization: How Facebook's Ad delivery can lead to biased outcomes. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–30.
[3] Muhammad Ali, Piotr Sapiezynski, Aleksandra Korolova, Alan Mislove, and Aaron Rieke. 2021. Ad Delivery Algorithms: The Hidden Arbiters of Political Messaging. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining. 13–21.
[4] Athanasios Andreou, Giridhari Venkatadri, Oana Goga, Krishna Gummadi, Patrick Loiseau, and Alan Mislove. 2018. Investigating ad transparency mechanisms in social media: A case study of Facebook's explanations. In NDSS 2018 - Network and Distributed System Security Symposium. 1–15.
[5] Sebastian Benthall and Jake Goldenfein. 2021. Artificial Intelligence and the Purpose of Social Systems. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. 3–12.
[6] Statista Research Department. 2021. Digital political ad spend in the U.S. 2020. https://www.statista.com/statistics/309592/online-political-ad-spend-usa
[7] Jilin Chen, Eben Haber, Ruogu Kang, Gary Hsieh, and Jalal Mahmud. 2015. Making use of derived personality: The case of social media ad targeting. In Ninth International AAAI Conference on Web and Social Media.
[8] Federal Election Commission. 2022. Campaign finance data. https://www.fec.gov/
[9] Federal Communications Commission. 2022. FCC adopts updated political programming and record-keeping rules. https://www.fcc.gov/document/fcc-adopts-updated-political-programming-and-record-keeping-rules
[10] Federal Communications Commission. 2022. Political programming. https://www.fcc.gov/media/policy/political-programming
A ETHICAL CONSIDERATIONS
A.1 Data selection
Facebook provides, through special agreements for researchers, access to the FORT dataset [18], which purports to contain more detailed information about ad targeting on the platform as compared to the public library we analyzed. We decided not to use the FORT data for two reasons. First, we wanted to focus on data that was available to the public at large. Second, at the time we conducted our analysis, the platform did not provide us with appropriate assurances that we could use that data without any research & publication restrictions.

B DATA
B.1 Facebook
Facebook's Ads Library contains information about ads related to politics, credit, housing, and employment. The platform provides information about whether an ad was sponsored or not, who sponsored it, its content, and data about the audience. Specifically, it provides lower and upper bounds for the generated impressions and cost, as well as which user groups saw the ad in terms of age, gender2, and location. Using the Ads Library API service, we collected all political ads placed in the 60 days leading up to the election (September 1st to November 4th), which tracks the legal definition of "electioneering communications," by advertisers who spent at least one hundred thousand dollars. Our final dataset consisted of 749,556 ads created by 803 advertisers, and we collected it in November 2020. This represented approximately 65% of the political ads in the specified period. Facebook also enforced a ban on placing new political ads during the week before the election [31], and indeed we did not locate any new ads in our collected sample for that period.
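Collection through the Ad Library API follows the pattern below. This is a minimal sketch: the endpoint and field names reflect the v12 Graph API and should be checked against current documentation, and the search term is a placeholder:

```python
import requests

BASE = "https://graph.facebook.com/v12.0/ads_archive"
params = {
    "access_token": "<TOKEN>",  # requires identity-verified researcher access
    "ad_type": "POLITICAL_AND_ISSUE_ADS",
    "ad_reached_countries": '["US"]',
    "ad_delivery_date_min": "2020-09-01",
    "ad_delivery_date_max": "2020-11-04",
    "search_terms": "election",  # placeholder query
    "fields": "id,page_name,spend,impressions,demographic_distribution,"
              "delivery_by_region,ad_delivery_start_time",
    "limit": 500,
}

ads, url, query = [], BASE, params
while url:  # follow cursor-based pagination
    page = requests.get(url, params=query).json()
    ads.extend(page.get("data", []))
    url, query = page.get("paging", {}).get("next"), None
```

Note that `spend` and `impressions` come back as lower/upper bounds rather than exact values, which is the bucketing discussed in section 6.1.2.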
To detect ads Facebook moderated, we crawled the ads in our dataset to locate those on which Facebook had placed a flag specifying that they were removed. In total, we located the removal of 8,635 ads from 253 advertisers on the platform.

For the ads in our collection that were in the form of images or video, we transformed them to text using the Google Cloud Vision and Speech-to-Text APIs, to make them available for further statistical processing. Next, we queried the Federal Election Commission (FEC) database [8] to investigate how many advertisers were registered as political entities. We also matched advertisers with their corresponding records in the political tracking website FollowTheMoney [48], which classifies them as Political Action Committees (PACs), Authorized Campaign Committees, NGOs, state-related entities, corporation or labor entities, or other entities. Based on the information of the platform, we also categorized advertisers with respect to the content they created, i.e., whether they promoted a specific ideology or single issue, promoted civil rights, were general advertising agencies, created policy-related content, created candidate- or party-related content, sold merchandise, promoted the issues of government and state agencies, or asked individuals to perform civic service (e.g., working in election administration).
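For the image ads, the OCR step looks like the following sketch (using the google-cloud-vision client; the file name is hypothetical):

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()  # reads GOOGLE_APPLICATION_CREDENTIALS

def image_ad_to_text(path: str) -> str:
    """Extract the visible text of an image ad via Cloud Vision OCR."""
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    annotations = client.text_detection(image=image).text_annotations
    return annotations[0].description if annotations else ""

print(image_ad_to_text("ad_12345.png"))  # hypothetical file
```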
2 Gender is a spectrum. Nevertheless, both Facebook & Google use a binary classification of genders. We adopt this language for the specific analysis, but we disagree with this form
of classification.
B.3 TikTok
Formally, TikTok does not allow the placement of political ads. But we observed influencers engaged in political campaigning who formed so-called HypeHouses. HypeHouses are TikTok accounts managed by coalitions of political influencers, generating content supporting specific candidates.

We started with a list of known influencers and HypeHouses [40] and, by snowballing, we collected other popular accounts that interacted with them. This resulted in a final list of 40 Democratic and 56 Republican HypeHouses and political influencers. We then crawled the HypeHouse videos between September 1st and October 15th. We had wanted to collect data through November 4th, but our access to crawl the platform was restricted, as the platform changed its internal API structure (Appendix, Table 5). Next, we created a list of political hashtags (Appendix, Table 6) that included candidates' names, election-related issues such as mail-in ballots, and general political issues such as abortion or gun laws. We searched for videos containing these hashtags. Because our TikTok crawl returns only trending content and not all videos related to a hashtag, we identified the video creators of the returned content and collected all of their videos for the same period as above. Our final dataset contained 2,690,923 videos from more than 61,000 TikTok creators. For each creator, we obtained information such as their account description, number of followers, and general popularity. For each video, we collected information about how many times it was viewed, liked, and shared, as well as its description. For the HypeHouses, we reviewed the profiles to manually categorize them based on whether they reported links to the following entities: PACs, NGOs, politicians, media outlets, and whether they were selling merchandise or asking for donations. For the purpose of evaluating TikTok's moderation practices, we relied on TikTok returning information about whether a video was an ad or was assigned a flag, such as being related to the US elections.
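The snowball step can be sketched as follows; `get_neighbors` stands in for our crawler, since TikTok exposes no official research API for this:

```python
from collections import deque
from typing import Callable, Iterable, Set

def snowball_accounts(seeds: Iterable[str],
                      get_neighbors: Callable[[str], Iterable[str]],
                      max_accounts: int = 1000) -> Set[str]:
    """Breadth-first snowball over accounts interacting with the seeds.
    `get_neighbors` wraps the (hypothetical) platform crawler."""
    seen, queue = set(seeds), deque(seeds)
    while queue and len(seen) < max_accounts:
        for neighbor in get_neighbors(queue.popleft()):
            if neighbor not in seen and len(seen) < max_accounts:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Usage with a stubbed crawler:
accounts = snowball_accounts(["thedemHypeHouse"], get_neighbors=lambda a: [])
```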
C FIGURES
Figure 6: Left: Forest plot of the logistic regression model predicting whether a political TikTok video will be labeled with a warning flag. Right: Ratio of political videos flagged per user on TikTok.
Figure 7: Overview of visible advertisers in the libraries by platform in the dataset. Facebook is depicted in blue, Google is
depicted in burgundy, and TikTok is depicted in grey. The bottom-right bar plot depicts the number of influencers linked to
political entities.
Table 5: We collected generated content from the following 96 Republican & Democratic influencers.

nickvideos, thecjpearson, eliciawho, dylan.odin, redboyhickibilly, machooch, theconservativevalues, tophertownmusic, samditzhazy, bodittle, jsinnmusic, thebiasone, mommy_nikki, rheannonfae, thecadelewis, chabella40, belessstupid, the.ghost88, imnotnatalie, conservativeHypeHouse, o_rileyyyautoparts, kp.thepatriot, virtualconnectors, professorross, save.america, democrat_me, c.j._production, therepublicanHypeHouse, donaldtrumpteam, mr.shaw7, docd12, electro_high, donthecon_and_associates, wadeslade, republicanism, matt4186, therealbentaylor, albertojdejesus, thedemHypeHouse, thatliberalgirl, leftistjayce, futurestatesmanalexander, youngrepub, thescoop_us, jimjrpavv, thebadopinion, realjohndennis, somepoliticaldude, theleewithnoname, heathergtv, shashaankvideos, yourcity, c.jennings7152822, patriotfacts, dylanmaddentv, thehumanrightsgroup, maya2960, theleftistdude, kaivanshroff, frankynofingers, lamot11, kindall.k, gcnow, truthseeker5536, maxwellblaine, typical_democrat, j0emorris, izuhhhhhbel, thealanvargas, daddy_no_pc, daddy_no_pc2, megaamerican, zc_55, americanblondie, spiicyboi7, lord_timothais, deerodx, bidenssunglasses, thesavvytruth, therightlefty, christianwalk1r, matty.merica, theprogressivepolicy, jbiii, liberalcorner, yaboihatestiktok, bidencoalition, claytonkeirns, emmanuelharouno, the.rickytaylor, chadvideos, therepublicangirlls, imtriggered, mattconvard, bobs_politics, youngrepublicans45

Table 6: Political hashtag patterns used to crawl TikTok videos.

trump | biden | harris | fakenews | election | debate | maga | democrat | republican | gun | libert | lgbt | conservative | politic | president | left | right | vote | ballot | equality | kamala | bluewave | envelope | blm | blacklives | alllives | dems | reps | settlefor | kag | alm | floyd | breonna | abortion | vax | vaccine | factcheck | fakenews | aoc
Table 7: Top advertisers on Facebook and Google who are not registered at the Federal Election Commission (FEC).
Table 8: Removed ads in our dataset for each platform, by how many advertisers and how many impressions they generated
prior to their removal.
Table 9: Warnings placed on TikTok videos with at least one political hashtag.
Label Counts
Get info on the U.S. elections 243,440
Learn the facts about COVID-19 2,341
The action in this video could result in serious injury. 30
Table 10: Ordinal logistic regression results for predicting the generated number of impressions for each advertisement of Biden & Trump on Google.

Variable              Estimator     St. Error
$100-$1k              5.23***       (0.05)
$1k-$50k              9.71***       (0.07)
$50k-$100k            15.37***      (0.22)
>$100k                17.97***      (0.25)
Google Network        9.06***       (0.09)
YouTube               5.22***       (0.07)
Male                  −0.01         (0.22)
Female                0.08          (0.22)
Age 18-24             2.04***       (0.29)
Age 25-34             −1.07***      (0.22)
Age 45-54             −0.88***      (0.13)
Age 55+               −0.39**       (0.14)
Zip code              −0.50***      (0.03)
County                −0.13*        (0.06)
USA                   0.02          (0.06)
Region not targeted   −0.18         (0.13)
Trump over Biden      0.34***       (0.03)
≤10k | 10k-100k       9.41***       (0.09)
10k-100k | 100k-1M    14.30***      (0.12)
100k-1M | 1M-10M      18.14***      (0.13)
1M-10M | >10M         22.26***      (0.23)
AIC                   50018.68
BIC                   50208.89
Log Likelihood        −24988.34
Deviance              49976.68
Num. obs.             63422
*** p < 0.001; ** p < 0.01; * p < 0.05

Table 11: Linear regression results predicting the impression/cost ratio for ads placed by Biden & Trump on Facebook.

Variable                Estimator   St. Error     Variable        Estimator   St. Error
AL                      7.53*       (3.48)        AK              4.13        (6.31)
AZ                      1.22        (2.80)        AR              16.94***    (4.40)
CA                      −1.10       (2.87)        CO              8.56**      (2.89)
CT                      −3.28       (4.27)        DE              −0.98       (6.68)
FL                      3.43        (2.80)        GA              2.99        (2.80)
ID                      24.48***    (5.43)        IL              3.50        (3.19)
IN                      9.19**      (3.37)        IA              2.19        (2.81)
KS                      22.58***    (4.81)        KY              30.35***    (4.12)
LA                      6.57        (3.42)        ME              −0.47       (2.83)
MD                      1.19        (3.43)        MA              −13.80***   (3.44)
MI                      3.41        (2.80)        MN              0.93        (2.81)
MS                      22.91***    (4.54)        MO              19.84***    (3.80)
MT                      −1.43       (4.95)        NE              −2.11       (2.83)
NV                      −0.72       (2.81)        NH              −0.30       (2.89)
NJ                      0.49        (3.54)        NM              −0.48       (5.64)
NY                      −8.92**     (3.09)        NC              1.89        (2.80)
ND                      21.78***    (6.56)        OH              4.09        (2.81)
OK                      17.90***    (4.17)        OR              0.31        (3.40)
PA                      1.36        (2.80)        RI              −10.22      (6.91)
SC                      17.96***    (3.98)        SD              1.26        (3.63)
TN                      19.89***    (3.64)        TX              5.18        (2.85)
UT                      5.22        (3.46)        VT              −3.84       (3.84)
VA                      8.27**      (2.86)        WA              −3.66       (3.25)
WV                      11.07*      (5.19)        WI              0.65        (2.80)
WY                      11.90       (6.51)        DC              −26.14***   (5.52)
Male 18-24              2.06        (1.97)        Male 25-34      7.70***     (1.90)
Male 35-44              7.88***     (1.91)        Male 45-54      7.05***     (1.91)
Male 55-64              2.35        (1.91)        Male 65+        −5.67**     (1.92)
Female 18-24            1.43        (1.96)        Female 25-34    7.59***     (1.91)
Female 35-44            8.95***     (1.92)        Female 45-54    6.49***     (1.93)
Female 55-64            0.71        (1.91)        Female 65+      −5.37**     (1.90)
Ad delivery start time  −0.17***    (0.00)        Biden Campaign  27.53***    (3.35)
Trump Campaign          18.27***    (3.35)
*** p < 0.001; ** p < 0.01; * p < 0.05
Table 12: Logistic regression results for predicting whether a TikTok video contains an election-related warning.