
How Algorithms Shape the Distribution of Political Advertising: Case Studies of Facebook, Google, and TikTok


Orestis Papakyriakopoulos, Center for Information Technology Policy, Princeton University, orestis@princeton.edu
Christelle Tessono, Center for Information Technology Policy, Princeton University, ct8474@princeton.edu
Arvind Narayanan, Center for Information Technology Policy, Princeton University, arvindn@cs.princeton.edu
Mihir Kshirsagar, Center for Information Technology Policy, Princeton University, mihir@princeton.edu

arXiv:2206.04720v1 [cs.SI] 9 Jun 2022

ABSTRACT
Online platforms play an increasingly important role in shaping democracy by influencing the distribution of political information to the electorate. In recent years, political campaigns have spent heavily on the platforms' algorithmic tools to target voters with online advertising. While the public interest in understanding how platforms perform the task of shaping the political discourse has never been higher, the efforts of the major platforms to make the necessary disclosures to understand their practices fall woefully short. In this study, we collect and analyze a dataset containing over 800,000 ads and 2.5 million videos about the 2020 U.S. presidential election from Facebook, Google, and TikTok. We conduct the first large scale data analysis of public data to critically evaluate how these platforms amplified or moderated the distribution of political advertisements.[1] We conclude with recommendations for how to improve the disclosures so that the public can hold the platforms and political advertisers accountable.

[1] Our dataset and source code are available for public use at http://campaigndisclosures.princeton.edu/

CCS CONCEPTS
• Applied computing → Law; • Information systems → Online advertising.

KEYWORDS
interpretability, political speech, algorithmic auditing, accountability, political advertising, algorithmic targeting, regulation

1 INTRODUCTION
As online advertising becomes a crucial part of political campaigns [30, 32], the platforms' control over their communication infrastructure makes them key political actors [50] and gives them a power over the political discourse that goes beyond what the traditional definition of "platform" might denote [27]. Importantly, these platforms are not neutral carriers of political ads, but play a more active role in amplifying or moderating the reach of those political messages. But the platforms have not disclosed data that would allow for meaningful public oversight of their actions. In 2018, in an attempt to stave off regulation, some platforms began to voluntarily create libraries of political advertisements and moderation decisions [23].

In this study, we evaluate the voluntary disclosures [19, 28] made by online platforms to understand how the platforms influence the distribution of political ads. We conduct the first large scale data analysis of political ads in the 2020 U.S. presidential elections to investigate the practices of three platforms: Facebook, Google, and TikTok.

One clear indication of the importance of online platforms to political campaigns is how campaigns have shifted their spending to online advertising. In 2008, the first significant digital campaign spent roughly $20 million on online advertising, which amounted to 0.4% of the total money spent on campaigning [17]. In the 2020 election cycle, the campaigns spent more than $2.8 billion, or 20% of the campaign budget [6], on the major platforms. As shown in figure 1, this spending generated billions of impressions for political ads placed in the two months prior to the US 2020 elections.

Mindful of the risk of elevating transparency to become the supreme value in democratic politics [51], we do not focus on transparency for its own sake. Instead, we develop a framework that identifies the desirable properties for these disclosures that serve broader democratic values, and we measure the effectiveness of the current platform disclosures against those properties. Two straightforward research questions lie at the heart of our inquiry:

RQ1: What do the ad libraries tell us about how the algorithmic tools were used to distribute political ads?
RQ2: Can we interpret how platforms applied their moderation policies to political ads?

1.1 Contributions
• We develop and apply a three part evaluative framework for measuring the effectiveness of platform disclosures (section 3).
• We attempt to reverse-engineer platforms' ad targeting tools using the available data to assess how they influence the distribution of content (section 6.1).
• Our statistical analysis suggests that the platforms charge different campaigns different rates for targeting specific regions and demographics (section 6.1.3).
• As a whole, we demonstrate how the data provided by the platforms is noisy and incomplete, making it difficult to draw definitive conclusions about how they operate (section 6.1.4).
• There are pervasive inconsistencies in how the platforms implement their ad moderation policies. We detected numerous instances where ads were moderated after already having generated millions of impressions, or where some ads were flagged for moderation when others with the same content were not (section 6.2).

Figure 1: Overview of political content reach by platform in our dataset, for the two months up to election day. On the top, Figure A depicts Facebook in blue, YouTube in black, and Google in yellow. Facebook & Google do not disclose the exact number of impressions, but a range within which the value falls. On the bottom, Figure B depicts the number of views of videos containing the hashtags #Biden2020 and #Trump2020 on TikTok in our dataset. It also presents the number of views for videos produced by 96 political influencers (see Table 5).

2 RELATED WORK
Online platforms such as Facebook, Google, and TikTok play an increasingly prominent role in shaping political discourse [35]. One widely studied aspect is the impact of the platform practices on user behavior [13, 49, 55]. Another is how political actors utilize platforms in their political campaigns [33]. But the role of platforms' algorithmic tools as intermediaries that shape the political discourse has not been studied as extensively, as there is limited visibility into their practices [16, 26].

We build on the works of Kreiss, McGregor & Barrett [34–36], exploring the role of online platforms in shaping political communication, and the works of Fowler et al. [21, 22, 24] and West [60], who provide historical information about political advertising and regulatory changes. We also draw on work by researchers who describe the content of ad libraries (e.g. Edelson et al. [16] for the US, Dubois et al. [14] for Canada, and Medina et al. [43] for Germany). Like [4, 38, 39, 53], we assess the data quality of the libraries to uncover misleading, incomplete, or wrong information.

Notably, our work is different from and complementary to the attempts to understand the role of the platforms in democracy through direct agreements between researchers and platforms [29] that draw on the detailed user data available to the platform. Our study is deliberately limited to the data that the platforms make publicly accessible.

As we demonstrate in our analysis below, the platforms use complex and opaque algorithms to price and distribute ads (e.g. [42, 58]). We build on prior research studies that show how these algorithmic tools can result in discriminatory and biased outcomes [1–3], and can have differential effects on the user population [7, 37].

Finally, our recommendations for appropriate disclosures rely on design frameworks (e.g. [5, 20, 54]) that seek to give users the ability to evaluate and understand how algorithmic tools influence them [44, 46, 59].

3 EVALUATIVE CRITERIA
Government regulations for online political advertising have stalled in the United States [49]. As a result, we do not have legal standards with which to evaluate the current disclosures by the platforms. Nevertheless, we extract three potential criteria to measure the effectiveness of the disclosures:
• First, do the disclosures meet the platforms' self-described objective of making political advertisers accountable?
• Second, how do the platforms' disclosures compare against what the law requires for radio and television broadcasters?
• Third, do the platforms disclose all that they know about the ad targeting criteria, the audience for the ads, and how their algorithms distribute or moderate content?

3.1 Self-Imposed Standards
In 2018, facing potential regulation such as the proposed Honest Ads Act [12], several online platforms chose to create ad libraries of political campaign materials. For Facebook, Google, and YouTube, these libraries provide some basic information about who placed ads, their content, how they were distributed, and whether they were moderated (table 1). TikTok recently created an ads library [57], but the company disavows carrying ads about political issues and it does not disclose how it moderates political content.

Facebook. Facebook's Political Ad Policy restricts the ability to run electoral and issue ads to authorized advertisers only. Facebook states that the purpose of the ad library is to provide "advertising transparency by offering a comprehensive, searchable collection of all ads currently running from across Facebook apps and services, including Instagram." Notably, as the policy explains, the advertising transparency is directed at "making political ads more transparent and advertisers more accountable" and not at holding the platform accountable for how it distributes or moderates the political content.

Table 1: Platform specific strategies in distributing and moderating political content, showing how each platform defines political content, the access to targeting tools, the presence of an ad library, and their moderation practices. YouTube had the same ad policy as Google, so we did not include a separate entry for it.

                       Facebook                    Google                      TikTok
Definition             Actor, Subject, Issue       Actor, Subject              Actor, Issue
Targeting              Full                        Restricted                  None
Election Ad            ad cost, impressions,       ad cost, impressions,       No political
libraries              audience characteristics    targeting parameters        content
                       (gender, age, state)        (gender, age, location)
Moderation             Removal/Label               Removal/Label               Label/Algorithm

For ad moderation, Facebook applies its general Advertising Policies and Community Standards. The Political Ads Policy falls under the Restricted Content section and consists of two policies: 9.a Ads About Social Issues, Elections or Politics, and 9.b Disclaimers for Ads About Social Issues, Elections or Politics. Article 9.a outlines that advertisers are required to complete Facebook's authorization process, and failure to meet the reporting requirements may lead to restrictions such as the disabling of existing ads.

Google. Google also launched its political Ad Library in summer 2018 and requires advertisers to be verified to publish ads. Like Facebook, the library's purpose is vaguely described as providing "greater transparency in political advertising," without disclosing what the transparency is being compared against or who is the subject of the transparency goals. It is clear that the subject of transparency is the political campaign and not the platform. The Google ads archive, which includes ads placed on the Google network (search engine, third party websites that use Google ad tools, and other Google services) and on YouTube, shows the content of each instance of the political ad, the advertiser, its cost, and related impressions. It also shows which user groups in terms of age, gender, and location (up to the zip code) were targeted by the advertisers.

Google and YouTube's ad moderation policies are set forth in the platform's Advertising Policies. Google has a specific category on Political Content, which is listed under the Restricted Content and Features section of their policy. If an ad is removed by the platform, its content is replaced by a red banner in the archive, stating that the ad violated the platform's terms & conditions.

TikTok. TikTok does not have political ads in its library because it does not allow such ads on the platform. It explains that the "nature of paid political ads is not something we believe fits the TikTok platform experience." Nevertheless, in the content sample we analyze in this study (see section B.3) we document a significant amount of political content shared on the platform around the elections. We observe that a lot of that content is generated by a group of influencers, some of them directly linked to political organizations such as PACs. Political influencers are present on Facebook and Google as well. TikTok also recently published its updated community guidelines [56], but the guidelines do not mention how the platform moderates political content.

3.2 Broadcast Regulations
Federal law imposes disclosure requirements on political campaigns and broadcasters to ensure that the public can understand where campaigns spend money on reaching prospective voters and whether the broadcasters carry the ads in a non-discriminatory manner. The rules for the broadcasters are set and administered by the Federal Communications Commission. In particular, the FCC's Political Programming staff oversees whether a broadcaster is favoring one candidate at the expense of the other by charging different rates or limiting the reach of candidate-sponsored ads [10]. Specifically, the FCC's staff resolves issues related to the prohibition on censorship of candidate-sponsored ads; the "Lowest Unit Charges" and "Comparable Rates" that broadcasters charge candidates for their advertisements; and the on-air sponsorship identification for political advertisements. The FCC's staff also oversees the files that broadcasters must maintain for the public to easily access and inspect.

In 2022, the FCC updated its regulations to require stations to maintain files that contain the following information: (1) whether the broadcaster accepted or rejected the request to purchase broadcast time; (2) the rate charged for the broadcast time; (3) the date and time on which the communication is aired; (4) the class of time that is purchased; (5) the name of the candidate to which the communication refers and the office to which the candidate is seeking election, the election to which the communication refers, or the issue to which the communication refers; (6) in the case of a request made by, or on behalf of, a candidate, the name of the entity making the request; and (7) in the case of any other request, the name of the person purchasing the time, the name, address, and phone number of a contact person for such person, and a list of the chief executive officers or members of the executive committee or of the board of directors of such person [9].

We extract an analogous evaluative criterion for online ads from these regulations for broadcasters: at minimum, the public should be able to evaluate how campaigns are spending money to target audiences, and whether the platforms are carrying the content in a non-discriminatory manner.

3.3 Comprehensive Disclosures
Platforms have unique data about the political campaign's targeting parameters, how algorithms distribute or moderate content, and who actually saw the ads. But, as shown in figure 2, platforms only make a fraction of that information available for public scrutiny. Typically, an advertiser runs an ad on the platform by selecting from a variety of targeting parameters, including age, gender, and location, as well as some available specific contextual and audience properties. Google allows political advertisers to target based on demographic properties and specific contextual features (ad placements, topics, keywords against sites, apps, pages and videos) [28]. Facebook allows the use of demographic data for political ads, and also allows campaigns to use predefined lists of individuals or algorithmically generated "look-alike" audience lists [19].

After selecting targeting parameters, the advertiser chooses how it will pay for impressions, which the platform uses to calculate to whom the ad will be shown and at what cost. Given the advertiser's choices, competing ads, and user properties, the platform uses complex algorithms to distribute the ad. It then creates reports for the advertiser about the number of impressions, as well as the total ad placement cost of the campaign.

But the platforms' transparency libraries do not contain the information they provide to advertisers. As discussed below, the appropriate disclosures for platforms should include information about how their algorithms function. But even if we put information about their algorithms to one side, they should disclose to the public, at minimum, all the information they make available to advertisers about costs, impressions, and targeting parameters. Accordingly, we assess the effectiveness of the platforms' ad libraries by comparing what is disclosed currently against what could be made available to a hypothetical advertiser on that platform.

For a platform's ad moderation practices there is little precedent to draw on to develop standards for appropriate disclosures. We examine whether the public can understand if the policy has been applied consistently and if the platform has provided an adequate explanation for its decision to moderate an advertisement.

Figure 2: The visible and opaque aspects of online platform ad delivery mechanisms. Google discloses only the demographic segments targeted by the advertisers. Facebook reports only on the demographic segments that saw specific content. Hence, none of the platforms reveal the full targeting and distributional parameters of ads.
4 DATA
We create a large scale dataset of political ads and content for Facebook, Google (including YouTube), and TikTok. The dataset contains more than 850,000 ads on Facebook, Google, and YouTube and 2.7 million political videos on TikTok. We focus on content that was created up to two months prior to the US elections. For ads, we collect who sponsored them, what specific targeting parameters & audience characteristics were used to distribute them, as well as their reach in terms of impressions and the corresponding cost. We also crawl the Facebook & Google ad libraries to locate moderated ads. For TikTok, we generate a list of political hashtags, which we use to crawl videos from the platform. We also use a list of political influencers & Hypehouses (40 democrats, 56 republicans), and download all their videos. For each video, we also collect the corresponding engagement metrics in terms of views, likes, and shares, and its description. We also collect video creators' metadata. Furthermore, we locate which TikTok videos were assigned an election-related warning by the platform, since the platform soft-moderated election related content by placing a warning banner saying "Get info on the U.S. elections." The banner linked to a guide including authoritative information about the election process. A detailed description of the collected dataset can be found in table 2 and in appendix B.
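As a concrete illustration of the Facebook collection step, the sketch below queries the public Ad Library endpoint of the Graph API. This is a minimal, hedged example: the endpoint and field names follow the public ads_archive documentation, but the API version, search term, and page size are arbitrary assumptions, not the exact query we ran.

```python
# Sketch: pulling political ads from the Facebook Ad Library API.
# Endpoint/field names follow the public "ads_archive" Graph API docs;
# the version, term, and limits here are illustrative assumptions.
import requests

API_URL = "https://graph.facebook.com/v12.0/ads_archive"
FIELDS = ",".join([
    "id", "page_name", "ad_creative_bodies",          # sponsor and content
    "impressions", "spend",                            # reported as lower/upper bounds
    "demographic_distribution", "region_distribution", # who saw the ad
])

def fetch_political_ads(token, term="election", country="US"):
    """Yield U.S. political/issue ads matching a search term, following pagination."""
    params = {
        "access_token": token,
        "search_terms": term,
        "ad_type": "POLITICAL_AND_ISSUE_ADS",
        "ad_reached_countries": country,
        "ad_delivery_date_min": "2020-09-01",  # two months before the election
        "ad_delivery_date_max": "2020-11-04",
        "fields": FIELDS,
        "limit": 250,
    }
    url = API_URL
    while url:
        page = requests.get(url, params=params).json()
        yield from page.get("data", [])
        url = page.get("paging", {}).get("next")  # cursor-based pagination
        params = None  # the "next" URL already carries the query string
```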
5 METHODS
First, we quantify the prevalence of political ads on the platforms. We quantify the number of impressions for political ads on Facebook, Google, and YouTube, and the cost to place this content. For TikTok, we document the number of views for videos produced by political influencers and for videos containing the political hashtags gathered in our dataset (Figure 1).

Second, we analyze whether the data provided by the online platforms adequately explain the platforms' decisions and algorithms, using the three analytical criteria described in section 3. Table 3 provides an overview of the methodologies we describe in detail next, together with the corresponding evaluative criteria and the sections where we report our related results.
5.1 Distribution of political content
5.1.1 Assessing information in the ad libraries. For the platforms that have political ad libraries (Facebook, Google, & YouTube), we assess the platforms' role in shaping the distribution strategy. Since platforms do not provide all targeting information associated with an ad, we explore what the limited data can tell us about how the platforms' tools were used to target specific audiences. Specifically, we quantify the unique number of ads Biden's and Trump's campaigns placed in terms of content, location, and age and gender demographics, as sketched below. We also locate how the same ads were distributed across different platforms, and we compare the distribution metrics provided by Facebook and Google to assess what information ad libraries can provide.
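A minimal sketch of these tallies, assuming the ad placements have been loaded into a table; the column names ("campaign", "age", "gender", "region", "content_hash") are hypothetical stand-ins for the fields in our dataset.

```python
# Sketch of the section 5.1.1 tallies: unique creatives per campaign and the
# share of placements that narrowed each targeting/audience dimension.
import pandas as pd

ads = pd.read_csv("ads.csv")  # one row per ad placement (hypothetical file)

# Unique ad creatives per campaign.
unique_creatives = ads.groupby("campaign")["content_hash"].nunique()
print(unique_creatives)

# Share of each campaign's ads that actually used a given dimension (cf. Table 4).
for dim in ["age", "gender", "region"]:
    used = ads[dim].notna() & (ads[dim] != "all")  # dimension actually narrowed
    share = used.groupby(ads["campaign"]).mean()
    print(dim, (100 * share).round(1))
```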
5.1.2 Reverse engineering the platforms' targeting algorithms. Next, we attempt to reverse engineer the platforms' targeting algorithms. We do that by creating advertising accounts on Facebook and Google, and evaluating the cost and impression estimates for hypothetical ads that mimic the targeting criteria of original political ads that ran on the platforms. If the cost/impression estimates for the hypothetical ads deviate significantly from the reported ranges for the ads that did run, we assume that advertisers used additional targeting options.

Specifically, we create four different dummy ads to investigate the relationships in more detail. On Google, our dummy ad targets the whole of the United States and all available genders and ages, and we calculate the upper and lower impressions it will generate for a budget varying from $10 to $1,000,000. We do the same for an ad on YouTube targeting Pennsylvania and females between 25-34. For Facebook, we target one ad at the whole of the United States and all genders and ages. Lastly, our other dummy ad on Facebook targets California, and females of all ages. Based on the reported upper and lower values, we interpolate cost and impressions and calculate an area of plausibility. If an ad in the ad libraries with the same targeting options falls within this area, it suggests that the advertisers actually used these targeting options. If an ad falls outside the area, it suggests that the ad might have used additional targeting options that platforms did not disclose. In this way we can assess information quality in the ad libraries. One notable limitation of this approach is that we ran these dummy ads at a different time period from the political ads we examined. As a result, the analysis is illustrative of the technique and should not be taken as definitive.
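The plausibility check itself reduces to interval interpolation. Below is a minimal sketch under the assumption that the dummy-ad probes are stored as (budget, min impressions, max impressions) tuples; the numbers are placeholders, not our measured estimates.

```python
# Sketch of the section 5.1.2 plausibility check. The probe values below are
# hypothetical; in practice they come from the platforms' campaign planners.
import numpy as np

budgets  = np.array([10, 100, 1_000, 10_000, 100_000, 1_000_000])
imp_low  = np.array([2e3, 2e4, 2e5, 1.8e6, 1.5e7, 1.2e8])
imp_high = np.array([8e3, 8e4, 8e5, 7.5e6, 6.0e7, 5.0e8])

def plausible(cost, impressions):
    """Does a reported (cost, impressions) pair fall inside the band
    interpolated between our dummy-ad probes?"""
    lo = np.interp(cost, budgets, imp_low)   # linear interpolation between probes
    hi = np.interp(cost, budgets, imp_high)
    return lo <= impressions <= hi

# Inside the band -> "plausible targeting" in Figure 3;
# outside the band -> "unexplainable targeting".
print(plausible(cost=500, impressions=3e5))
```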

Table 2: Overview of the collected dataset. We provide information about how we obtained the data, for which periods, what attributes, and what post-processing we performed. A more detailed explanation can be found in appendix B.

Facebook
- Source: FB ad library
- Collection method: Ad library API
- Scope: Ads related to politics by advertisers who spent at least $100k
- Date range: Sep 1 - Nov 4 (60 days preceding the election); no ads in the week preceding the election (FB enforced ban)
- Attributes (for each ad): sponsored or not; who sponsored it; content; audience data; upper/lower bounds of generated impressions; upper/lower bounds of generated cost; age, gender, location stats of who saw the ad
- Dataset size: 749,556 ads; 803 advertisers (65% of political ads in the specified period)
- Moderated ads & content, method: crawled ads in dataset to look for FB's removal flag
- Moderated ads & content, size: 9,735 ads (content available through initial crawl); 451 additional ads (not in initial crawl; content unavailable)

Google
- Source: Google ad archive
- Collection method: Crawled the ad archive
- Scope: Political ads placed through Google ad services by advertisers who spent at least $100k (includes ads appearing in Google search, third party websites, and on YouTube)
- Date range: Sep 1 - Nov 4 (60 days preceding the election)
- Attributes (for each ad): content; upper/lower bounds of generated impressions; upper/lower bounds of generated cost; age, gender, location targeting criteria
- Dataset size: 117,607 ads; 490 advertisers (83% of political ads in the specified period)
- Moderated ads & content, method: crawled the archive again in January and compared to initial crawl to locate missing ads
- Moderated ads & content, size: 8,635 ads; 253 advertisers

TikTok
- Source: Website
- Collection method: Crawled the website
- Scope: (All) videos by creators who had trending content related to political hashtags; videos created by popular influencers and hypehouses engaged in political campaigning (40 dem; 56 rep)
- Date range: Sep 1 - Oct 15 (limited by technical restrictions; see Appendix B)
- Attributes (for each creator): account description; # followers; general popularity. (For each video): # views, likes, shares; description
- Dataset size: 2,690,923 videos; over 61,000 creators
- Moderated ads & content, method: looked for TikTok videos having an election-related warning
- Moderated ads & content, size: 243,440 TikToks having an election-related warning

Post-processing (Facebook & Google): extract text using Google Cloud Vision and speech-to-text APIs; match advertisers to registered political entities in the FEC database; match advertisers to records in FollowTheMoney to categorize organization type and type of content created. (TikTok): manually categorized HypeHouses based on organization type and content type.

Table 3: Overview of the different methods we use for evaluating platforms given the evaluative criteria we developed. We use a uniform heading scheme across methods & results to efficiently connect the two sections.

1. Quantification and qualitative evaluation of the information provided in the ad libraries by assessing Biden's and Trump's ad campaigns. Platform: Facebook & Google. Criterion: Do the disclosures meet the self-described objective (A)? Section: Assessing information in the ad libraries.
2. Reverse engineering the platforms' targeting algorithms by placing dummy ads on the advertising networks. Platform: Facebook & Google. Criterion: Do the disclosures meet the self-described objective (A)? Section: Reverse engineering the platforms' targeting algorithms.
3. Regression modeling connecting targeting & audience characteristics to ads' cost per impression. Platform: Facebook & Google. Criterion: How do disclosures compare to broadcast (B)? Section: Connecting targeting & audience characteristics to ads' cost per impression.
4. Identification of ad instances whose targeting options on Google match ad instances' distributional information on Facebook, assessing how algorithms deliver ads based on specific targeting choices of the advertisers. Platform: Facebook & Google. Criterion: Do the platforms disclose all that they know (C)? Section: Crossplatform comparison of targeting parameters & audience characteristics.
6. Locating and quantifying the prevalence and reach of specific ads that were partially moderated by the platforms. Platform: Facebook & Google. Criteria: Do the disclosures meet the self-described objective (A)? Do the platforms disclose all that they know (C)? Section: Characterizing the magnitude and effectiveness of moderation for Facebook & Google.
7. Regression modeling connecting TikTok video properties and their moderation. Platform: TikTok. Criterion: Do the platforms disclose all that they know (C)? Section: Connecting TikTok video properties to their moderation.


5.1.3 Connecting targeting & audience characteristics to ads' cost per impression. To further explore how algorithms distribute political content, we investigate the sensitivity of cost/impression ratios to different audience characteristics and targeting properties. We create models that analyze ads generated by the Biden and Trump campaigns, and uncover how ad-specific properties link to ads' distribution. For Facebook, we create a linear regression model that has as dependent variable the cost/impressions ratio, and as independent variables the ratio of individuals in each state that viewed an ad, the ratio of individuals that were either male or female and belonged to the age buckets 18-24, 25-34, 35-44, 45-54, 55-65, 65+, and whether the ad was placed by the Biden or Trump campaign. Since the Google Ad Archive aggregates cost and impressions into intervals, we create an ordinal logistic regression model that has as dependent variable the cost of an ad, and as independent variables the generated impressions, whether the ad was text (ad on Google search), image (ad on third party affiliates), or video (YouTube ad), the targeted genders (male, female), different age groups (18-24, 25-34, 35-44, 45-54, 55-65, 65+), the magnitude of region targeting (USA, state level, county level, zipcode level), and whether the ad was placed by the Biden or Trump campaign. Based on the located associations, we uncover factors that shape the algorithmic distribution of content.

5.1.4 Crossplatform comparison of targeting parameters & audience characteristics. To understand what full disclosures can tell us about the algorithms that distribute political advertisements, we use as data the cross-platform tactics of advertisers. Following the principles of personalized advertising, we make the strong assumption that advertisers would have targeted specific demographics with the same content across platforms, in order to maximize their influence potential. Therefore, we identify 35 unique image ad designs created by Biden & Trump that correspond to 12,448 unique ad placements on Facebook and 3,055 ad placements on Google. Similarly, we identify 72 unique video ad designs that correspond to 13,840 ad placements on Facebook and 4,383 ad placements on Google. We then identify ad instances whose targeting options on Google match ad instances' distributional information on Facebook, and assess how algorithms deliver ads based on the specific targeting choices of the advertisers. We do so by focusing on two sets of ads, namely YouTube ads placed by the Biden campaign to all genders and image ads placed by the Trump campaign to all available age groups. In this way, we uncover how consistent Facebook's algorithmic distribution is.

5.2 Moderation of political advertising and content
We analyze platform tactics in algorithmic ad & content moderation based on each platform's policies. We use two methods to evaluate TikTok's moderation of election related content.

5.2.1 Connecting TikTok video properties to their moderation. Our second technique investigates TikTok's use of warning labels related to the U.S. elections, using a logistic regression model to predict whether a video was flagged. We use as independent variables the likes, shares, comments, and views it generated, the average amount of likes the video's author collected, as well as the presence of three election-related (#biden, #trump, #vote) and three non-election related (#blm, #abortion, #gun) hashtags. Furthermore, we calculate the ratio of political videos flagged for each user in our dataset. Based on the results of both analyses, we can uncover features that constituted algorithmic content moderation.
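The three regression models above (sections 5.1.3 and 5.2.1) can be sketched compactly in statsmodels. This is a hedged illustration: the column names are hypothetical stand-ins for our dataset fields, and the full specifications are in tables 10 & 11 and Figure 6 of the appendix.

```python
# Minimal sketch of the three models; column names are illustrative assumptions.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Facebook (5.1.3): OLS of cost per impression on audience shares.
fb = pd.read_csv("facebook_ads.csv")
fb_model = smf.ols(
    "cost_per_impression ~ share_male + share_18_24 + share_25_34"
    " + share_55_65 + share_65_plus + C(main_state) + C(campaign)",
    data=fb,
).fit()

# Google (5.1.3): cost is reported only in ordered buckets -> ordinal logit.
goog = pd.read_csv("google_ads.csv")
exog = pd.get_dummies(
    goog[["impressions", "ad_format", "age_group", "gender", "region_level",
          "campaign"]],
    drop_first=True,
).astype(float)
goog_model = OrderedModel(goog["cost_bucket_code"], exog, distr="logit").fit(
    method="bfgs"
)

# TikTok (5.2.1): logit for whether a video received the election warning.
tt = pd.read_csv("tiktok_videos.csv")
tt_model = smf.logit(
    "flagged ~ likes + shares + comments + views + author_avg_likes"
    " + has_biden + has_trump + has_vote + has_blm + has_abortion + has_gun",
    data=tt,
).fit()
```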
5.2.2 Characterizing the magnitude and effectiveness of moderation for Facebook & Google. For Facebook, Google, and YouTube, our investigation of moderation practices examines the ad libraries to document how many ads were moderated and how many individuals saw ads that were flagged. In addition, we manually reviewed a set of 200 moderated ads for each platform, which allows us to qualitatively understand features of the moderation process. Furthermore, based on the set of moderated ads in our sample, we locate other ads that contained the same content but were not moderated. We quantify their distribution on the platforms, and assess the robustness and degree of explainability of the moderation process. These three steps allow us to uncover patterns about who was moderated, why, and how effective this moderation was.

6 RESULTS

6.1 Political content distribution
6.1.1 Assessing information in the ad libraries: Platforms provide limited insights about how campaigns used their targeting tools. Table 4 presents the demographic distribution of Biden and Trump ads on Google and Facebook. This superficial view seems to suggest that advertisers rarely resorted to micro-targeting, and instead applied broad criteria to target general segments of the society. For Google, which publishes targeting data, we see that both Biden and Trump appear to infrequently use the platform's fine-grained demographic targeting (except for targeting by state). Facebook provides audience characteristics of individuals who saw the ads, rather than targeting choices, but the pattern is similar.

Table 4: Demographic distribution of content by candidate on Facebook & Google. For Google we show the targeting choices of advertisers. For Facebook we report who was shown an ad (audience characteristics).

                                    Google            Facebook
Distribution Strategy               Biden   Trump     Biden   Trump
Age                                 3%      0%        34%     31%
Gender                              1%      0%        6%      7%
Zip Code/County                     17%     19%       -       -
State                               81%     69%       87%     96%
Age & Gender & Zip Code/County      0%      0%        -       -
Age & Gender & State                1%      0%        6%      7%

Yet, a closer look at the actual content distributed on the platforms contradicts this superficial view. For example, we used algorithmic tools [47] and manual coding to detect all ads placed by Biden and Trump in the Spanish language. On Google, the majority of the 1,724 ads we located that were in the Spanish language had no demographic targeting and were sent to geographies that did not have large Hispanic populations. Even for zip code targeted ads, the percentage did not exceed on average 30% on Google and 12% on YouTube. In other words, the campaigns potentially used some undisclosed contextual targeting criteria to place the ads.

On Facebook, we also located 626 ads in Spanish, but since the ad library provides distributional data only at the state level, we were not able to evaluate how they were targeted.

6.1.2 Reverse engineering the platforms' targeting algorithms. Similarly, the evidence from our reverse engineering of the platforms' tools indicates that the campaigns used undisclosed targeting strategies. Figure 3 presents how the distribution of the actual ads in our sample falls within the retrieved cost-per-impression boundaries. We find that the reverse engineering data do not always correspond to the targeting/distributional information provided in the ad libraries. For Google, this discrepancy is small when looking at ads distributed over the United States across all genders and ages, with only about 2% of the ads reported in the ad libraries falling outside our calculated boundaries. This discrepancy is significantly larger when comparing YouTube ads placed in Pennsylvania to females between 25 and 34, with the disagreement reaching 13%. However, the true discrepancy may be much larger because Google uses very large reporting buckets for costs and impressions, as Figure 3 reveals. By way of illustration, an ad is assigned the same value of ≤10,000 impressions and ≤$100 costs whether it is shown to 500 individuals at a cost of $5 or 9,000 individuals for $90.
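The bucket ambiguity can be made concrete with a short calculation; the bucket floors below are assumptions for illustration, since Google reports only the ceilings.

```python
# Any ad inside the "<= 10,000 impressions, <= $100" bucket is
# indistinguishable in the archive, so cost-per-mille is only a range.
def cpm_bounds(cost_lo, cost_hi, imp_lo, imp_hi):
    """Feasible cost-per-mille interval implied by the reported buckets."""
    return 1000 * cost_lo / imp_hi, 1000 * cost_hi / imp_lo

lo, hi = cpm_bounds(cost_lo=1, cost_hi=100, imp_lo=100, imp_hi=10_000)
print(f"true CPM could be anywhere from ${lo:.2f} to ${hi:.2f}")
# -> true CPM could be anywhere from $0.10 to $1000.00
```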
We locate similar discrepancies when reverse-engineering the information for Facebook. For ads placed to females of all ages in California, the disagreement between cost/impression data from our analysis and from the ad libraries is 14%. For ads placed in the United States generally, across all genders and ages, the disagreement exceeds 27%. These results illustrate that the available information ad libraries provide is not sufficient to understand how political advertisers used the platform to distribute political messages. Even if the detected discrepancies are a result of specific undisclosed parameters that influence targeting (e.g. auction system, time of placed ad, etc.), the disclosures do not meet the platforms' self-described objective of making political advertisers accountable. Although the ad libraries provide information about the content of ads, they do not allow the public to understand the exact segments of the society advertisers wanted to target, the price they paid, or how algorithms transformed these intentions into a specific content distribution.
6.1.3 Connecting targeting & audience characteristics to ads' cost per impression: Disclosures and targeting algorithms fall short compared to existing broadcasting policies. Our regression analysis results (tables 10 & 11, appendix) reveal specific shortcomings for algorithms and platform disclosures, both for Facebook and Google, since we discover that there was no parity in the cost of ads between advertisers. We also see that different demographic targeting and audience characteristics resulted in different ad placing costs.

For Facebook, there was no difference in the number of impressions per dollar an ad generated, regardless of the gender of the individuals who saw them. Nevertheless, age was a factor associated with different amounts of generated impressions. Specifically, placing ads to older populations (>55) and very young populations (18-24) was significantly more expensive than placing ads to individuals between 25 and 53. Furthermore, ad costs varied between different states. For example, the most expensive impressions were found in Massachusetts, Rhode Island, and Washington DC, while the cheapest impressions were generated in Idaho, Arizona, and Mississippi. Interestingly, ads that Biden placed were overall more expensive by impression compared to those of Trump.

On Google and YouTube, as with Facebook, there was a difference when targeting different age demographics. Targeting individuals between the ages of 18-24 was by far the cheapest, while the most expensive targeting groups were people aged between 25-34. The more location-specific the targeting, the more expensive was the ad placement, with zip code targeted ads being the most expensive and US-general ads being the cheapest. In contrast to Facebook, the ads that Trump placed were more expensive than those placed by Biden.

Comparing these results to the broadcast regulations, we observe multiple instances of a lack of rate parity between campaigns and across different strategies. Of course, these rate differences can be attributed to multiple factors, such as additional contextual targeting criteria, which were not provided by Facebook or Google, as well as further information about how their algorithms distribute and price content. But it is worth noting that the potential for a broadcaster to favor one campaign over another led to the FCC rules on parity between campaigns and commercial advertisers, to ensure that the broadcaster was acting appropriately. Another issue is that the algorithmic promotion and demotion of content by the platforms runs into the concern that the intermediary might covertly limit the reach of candidate-sponsored ads. But our analysis shows that there are unexplained artifacts in the distribution that can be tied back to the algorithmic choices of the platforms.

6.1.4 Cross-platform comparison of targeting parameters & audience characteristics: Full disclosures reveal properties of algorithmic distribution. Building on our results from the prior sections, we evaluate what the cross-platform tactics of advertisers can reveal about how content was algorithmically distributed to the public. We do this by pairing ads that were shown on Google and Facebook so that we obtain data about targeting criteria from Google and the actual distribution from Facebook.

Figure 4 shows how two specific sets of ads on Facebook were distributed across genders and age-groups respectively, given that they were not targeted to specific gender or age subgroups on Google. For video ads placed by Biden, which were not gender targeted, we find that they were distributed unequally among males and females. On average, ads were shown 45% to males and 53% to females, while the standard deviation of the distribution for each gender was 12%. This discrepancy can potentially be attributed to the gender demographics of Facebook users in the US in 2020, which were 54.5% female and 43.5% male [45].

By analyzing image ads for Trump that did not have any limitations in age targeting, we uncover similar patterns. Focusing on three age-groups, we find that on average an ad would be shown at a rate of 6% to individuals between 18-24, at a rate of 16% to individuals between 25-34, and at a rate of 21% to individuals between 45-54. In contrast to the gender distribution, these values do not correspond to the Facebook user demographics in 2020, which were 8% for individuals between 18-24, 13% for individuals between 25-34, and 7% for individuals between 45-54.
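The parity check behind these numbers is a simple ratio of delivered share to demographic baseline; a sketch, using the figures reported in the text above:

```python
# Delivered share per age group vs. Facebook's 2020 user demographics
# (Trump image ads with no age limits; cf. Figure 4).
delivered = {"18-24": 0.06, "25-34": 0.16, "45-54": 0.21}
baseline  = {"18-24": 0.08, "25-34": 0.13, "45-54": 0.07}

for group, share in delivered.items():
    print(f"{group}: delivered at {share / baseline[group]:.1f}x the user base")
# 45-54 is reached at 3.0x its population share: delivery, not targeting, skews.
```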

Figure 3: We plot the generated cost per impression of ads in the ad libraries that were (1) targeted to all genders & ages on Google, (2) targeted to females between 25-34 on YouTube, (3) seen by all genders & ages in the US on Facebook, and (4) seen only by females of all ages located in California on Facebook. For Facebook, lower & upper bounds are provided for the impressions. For Google, lower & upper bounds are provided for cost & impressions, given the extensive "bucketing" of the parameters performed by the ad libraries when reporting them, which is denoted in the figures with boxes. Points represent the median value of the boxes. We compare the generated cost per impression of ads with the cost per impression of a set of dummy ads we placed on the platforms with the exact same targeting parameters & audience characteristics. Black lines represent the upper and lower boundaries of an ad's cost per impression as we extracted them from the dummy ads. We label an ad placement as "plausible targeting" when the ad's cost per impression overlaps with the one we calculated, denoting that we can assume that the ad library provides all relevant targeting parameters/audience characteristics about an ad. Similarly, a placement labeled as "unexplainable targeting" represents an ad whose cost per impression is outside the upper and lower reach values that we calculated, meaning that the platforms potentially do not disclose full information about the distribution of the ad.

This means that algorithmic targeting resulted in a disparate distribution of ads across age groups. These findings provide additional support for the public interest in further understanding how the distribution of the campaigns' ads was affected by the platforms' algorithmic choices.

6.2 Moderation of political advertising and content
Focusing on moderation, we evaluate how platforms disclose their choices and practices when removing political ads (Facebook, Google, YouTube) and handling political content (TikTok).

6.2.1 Characterizing the magnitude and effectiveness of moderation for Facebook & Google: Unexplainable ad moderation practices. Our study documents how difficult it is to understand how the platforms apply their moderation policies to political ads.

On Google and Facebook we see that a large number of ads across a wide range of advertisers were removed (table 8, appendix). Google removed 13.3% of the political ads from its network, YouTube removed 4.5%, while Facebook removed only 1.2%. Despite their removal, these ads generated a significant amount of user impressions. Furthermore, these decisions affected a significant number of advertisers. Google removed at least one ad from 256 advertisers (18% of all), YouTube at least one ad from 307 advertisers (22% of all), and Facebook from 266 advertisers (31% of all).

Figure 5 illustrates how different instances of the same ad design were moderated. For each platform we find a significant number of ads that contained some instances where the ad was removed, and some that were not. On Facebook, this appeared in 51% of the moderated ads. For Google, the share of these ads was 75%, while for YouTube it was 65%. In total, we found 11,549 ad instances across platforms that were not moderated, although at least one ad identical to them was removed. In median terms, for Google, non-moderated ad instances resulted in the generation of 1.1 billion impressions, compared to 700 million for the moderated ones. For YouTube, non-moderated ad instances generated 1.2 billion impressions, while moderated instances generated 900 million. For Facebook, these numbers were 440 million and 200 million respectively. These results suggest that inconsistent ad moderation had serious implications, since the moderated ads resulted in a significant amount of impressions, and their unmoderated counterparts resulted in an even higher diffusion of problematic content. Furthermore, the platform ad libraries do not provide any explanation of why and when an ad was removed; therefore, it is not possible to assess the reasons for these discrepancies. Especially for Facebook, we find that even the classification of ads as removed was inconsistent. When manually reviewing a random sample of 200 moderated ads, we found that 35 were no longer labeled as removed. These results raise questions about the efficacy, robustness, and explainability of the moderation practices.
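The consistency check behind Figure 5 (method in section 5.2.2) can be sketched as a group-by over identical creatives; the column names are illustrative stand-ins for our dataset fields.

```python
# Group placements by identical creative content and flag groups where only
# some instances were moderated.
import pandas as pd

ads = pd.read_csv("ads_with_moderation.csv")  # content_hash, moderated (bool), impressions

flags = ads.groupby("content_hash")["moderated"].agg(["any", "all"])
partial = flags[flags["any"] & ~flags["all"]].index  # inconsistently moderated designs

# Reach of the unmoderated copies of content that was removed elsewhere.
unmoderated = ads[ads["content_hash"].isin(partial) & ~ads["moderated"]]
print(len(unmoderated), "unmoderated instances,",
      unmoderated["impressions"].sum(), "impressions")
```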

6.2.2 Connecting TikTok video properties to their moderation. Focusing on content moderation on TikTok, we find 505,062 videos that contained at least one hashtag from our curated hashtag list. From these, 243,440 videos (48.2%) were labeled with a U.S. election warning. We find that election related content was a driver for this moderation. Figure 6-left (Appendix) shows the logistic regression results for predicting whether a TikTok video containing political hashtags has been labeled or not. We find that election related hashtags (Biden/Trump/Vote) were strong predictors for content moderation, while non-election related hashtags were not as much (e.g. BLM, abortion, gun). Similarly, video views and likes seem not to have been associated with the probability of a video being labeled, while both video creator popularity and sharing/commenting were. The more popular the content creator, the less likely that a video of theirs was going to be labeled. Furthermore, the more organic interactions a video gathered in terms of shares and comments, the more likely it was to be labeled. Analyzing the distribution of moderated videos by author (figure 6, right), we find a discrepancy in the videos labeled by user, with some of a user's videos containing election-related hashtags being flagged, and some not. These results illustrate that although the content of videos was strongly associated with warning placement, there is a need for additional information about TikTok's practices. For example, it would be important to know the exact definition of "election-related" on the platform, or whether features such as video description or author information were taken into consideration for labeling content.

Figure 4: Distribution of ads on Facebook that were matched with their targeting parameters on Google. We show how Biden ads targeted to both genders were distributed on Facebook (top). We also show how Trump's image ads that were targeted to the whole age spectrum were distributed among three age groups. For gender, we find that ad distribution had statistical parity given the user demographics on the platform. For age, we find that ads were not distributed at rates that correlate with Facebook's age demographics in the US.

6.3 Consequences of different definitions of "political ads"
In evaluating what self-disclosures reveal about advertisers and algorithms, we found that each platform defined political ads differently, which made it difficult to make comparisons across platforms. As shown in Figure 7 (Appendix), each platform has a different definition of what is a political ad, which dictates whether and to what extent an advertiser will be allowed to use platforms' targeting tools and whether their campaigning information will be included in the ad libraries. Google bases its definition solely on the identity of the advertiser, while Facebook takes a more expansive view that also captures advertisers running issue-based ads. As a result, we see that the Facebook ad library has a higher proportion of advertisers who are not registered with the FEC as compared to the Google Ad Archive (45% vs 25%). In contrast, the narrower definition of political ads on Google does not allow us to see the targeting activities of a broad set of advertisers, including a significant number of NGOs (23% on Facebook vs 8% on Google). In Citizens United v. FEC, the United States Supreme Court upheld the disclosure requirements for issue-based ads on broadcast stations because "[e]ven if the ads only pertain to a commercial transaction, the public has an interest in knowing who is speaking about a candidate shortly before an election." That same public interest here should require disclosures about all ads involving "electioneering communications" [11]. Platforms should not be allowed to strategically limit what information becomes available to the public.

In the case of TikTok, we see that while direct political advertising is prohibited, there are other forms of political campaigning on the platform. We looked more closely at the 96 influencers we identified to uncover links to political entities, especially those who had registered with the FEC. In our sample, we found six influencers directly linked to political NGOs, two to PACs, and four to political merchandise, while 18 were asking for donations. Given the importance of this type of influencer advertising on social media platforms, future work should investigate the funding mechanisms for influencer-driven political advertising in more detail [41].

7 POLICY RECOMMENDATIONS
As our work demonstrates, there is an urgent need to standardize disclosures across platforms. In a white paper, Edelson et al. [15] outline a useful technical standard for universal digital transparency. They propose the creation of a public repository and a technical standard that will make advertisers and platforms accountable for disclosing how they distribute content to the public. Our study builds on that work by demonstrating how the current voluntary disclosures about political ads by the platforms fall short of providing meaningful insights, and why such a standard is necessary for political advertising.

Specifically, we recommend that the FEC (or an alternative federal agency) create a cross-platform database that provides information about advertisers and platforms. Such a database should be accompanied by a standard that provides a universal identifier for each advertiser, so their campaign activities can be tracked across different platforms. In our study, the mapping of entities was a time consuming and complicated process. A sketch of what such a standardized record might contain follows.
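One possible shape for such a record, mirroring the FCC political-file items and the gaps identified in our analysis; this is a sketch under our own assumptions, not a proposed standard.

```python
# Hypothetical schema for a standardized, cross-platform disclosure record.
from dataclasses import dataclass, field

@dataclass
class AdDisclosure:
    advertiser_id: str           # universal identifier, stable across platforms
    platform: str
    content: str                 # full creative, or a pointer to it
    targeting: dict = field(default_factory=dict)  # every parameter the advertiser selected
    delivery: dict = field(default_factory=dict)   # realized audience: exact impressions by segment
    cost: float = 0.0            # exact price paid, not a bucket
    moderated: bool = False
    moderation_reason: str = ""  # when and why the ad was removed, if it was
```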

Figure 5: Comparison of different instances of moderated ads across platforms. The light blue bars show how many instances of a single ad were moderated, and maroon bars show how many instances of the same ad were not. The results suggest inconsistent moderation of content across platforms, with some instances of the same ad being removed and others not.

Furthermore, the standard should clearly define what is considered "political," so that there is an equal baseline for disclosures across the platforms. Such a definition should be carefully chosen, given the complexity and non-triviality of the concept [52]. The same applies for the available targeting information. Platforms should disclose the full and detailed targeting and distribution parameters (audience characteristics) of ads, since anything less results in an incomplete and inefficient evaluation of advertisers' campaigns and platforms' decisions. Finally, full disclosure of targeting criteria can facilitate understanding of the specific campaigning techniques used to attempt to influence voters.

We believe that the creation of a standard and repository should be accompanied by detailed regulations that protect the public and ensure fairness among political advertisers. Drawing from the broadcasting regulations, we observed an apparent difference in rates between different advertisers. Platforms should disclose how they algorithmically control the price and reach of content, whether they deliberately or unintentionally limit the reach of candidate-sponsored ads, and how they ensure parity, and they should provide information that can reveal whether advertisers or platforms target segments of the society in a biased way. Similarly, since we discovered significant inconsistencies in ad moderation, we argue that platforms should be obligated to disclose when and why an ad was removed, and to make the removed content available for review in a neutral repository. This can make platforms accountable for their decisions and algorithms, and can ensure the fair moderation of content among advertisers.

Lastly, the broad reach of the influencers that we document on TikTok highlights the need for regulations that require disclosures about their sources of funding and other activities. Such influencer-driven marketing is also present on Google and Facebook properties, but is not disclosed in their ad libraries. Because the Federal Trade Commission's endorsement guidelines are designed for commercial transactions and not political campaigning, this is an area where the Federal Election Commission would need to develop comprehensive disclosure requirements.

8 CONCLUSION
In this study, we evaluated what the platform disclosures could tell the public about the platforms' role in the distribution and moderation of political advertising. By taking the political ad libraries and platforms' transparency mechanisms seriously, we undertook a large scale data analysis of political ads on Facebook, Google, and TikTok. Our study demonstrated the existence of strong barriers to public understanding of advertisers' tactics. We also found evidence that the platforms' disclosures fall well short of what is required under the law for broadcasters. Finally, we showed why we need more accurate and comprehensive disclosures to understand and robustly evaluate targeting tools and algorithmic moderation.

9 ACKNOWLEDGMENTS
This study was supported by a Princeton Data Driven Social Science Initiative Grant. We thank Laura Edelson, Andy Guess, and Matt Salganik for constructive feedback on the final manuscript. We are also grateful for the early feedback from the research seminar run by Princeton's Center for the Study of Democratic Politics and later feedback from the MPSA'22 panel on political marketing. We would also like to thank Eli Lucherini for support in data analysis, Ashley Gorham for conceptual contributions in the early stages of the project, Juan Carlos Medina for a part of the data collection, and Milica Maricic for support in classifying political ads and creating the website of the project.

REFERENCES
[1] Muhammad Ali. 2021. Measuring and Mitigating Bias and Harm in Personalized Advertising. In Fifteenth ACM Conference on Recommender Systems. 869–872.
[2] Muhammad Ali, Piotr Sapiezynski, Miranda Bogen, Aleksandra Korolova, Alan Mislove, and Aaron Rieke. 2019. Discrimination through optimization: How Facebook's Ad delivery can lead to biased outcomes. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–30.
[3] Muhammad Ali, Piotr Sapiezynski, Aleksandra Korolova, Alan Mislove, and Aaron Rieke. 2021. Ad Delivery Algorithms: The Hidden Arbiters of Political Messaging. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining. 13–21.
[4] Athanasios Andreou, Giridhari Venkatadri, Oana Goga, Krishna Gummadi, Patrick Loiseau, and Alan Mislove. 2018. Investigating ad transparency mechanisms in social media: A case study of Facebook's explanations. In NDSS 2018 - Network and Distributed System Security Symposium. 1–15.
[5] Sebastian Benthall and Jake Goldenfein. 2021. Artificial Intelligence and the Purpose of Social Systems. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. 3–12.
[6] Statista Research Department. 2021. Digital political ad spend in the U.S. 2020. https://www.statista.com/statistics/309592/online-political-ad-spend-usa
[7] Jilin Chen, Eben Haber, Ruogu Kang, Gary Hsieh, and Jalal Mahmud. 2015. Making use of derived personality: The case of social media ad targeting. In Ninth International AAAI Conference on Web and Social Media.
[8] Federal Election Commission. 2022. Campaign finance data. https://www.fec.gov/
[9] Federal Communications Commission. 2022. FCC adopts updated political programming and record-keeping rules. https://www.fcc.gov/document/fcc-adopts-updated-political-programming-and-record-keeping-rules
[10] Federal Communications Commission. 2022. Political programming. https://www.fcc.gov/media/policy/political-programming
[11] Federal Election Commission. 2022. Making electioneering communications. https://www.fec.gov/help-candidates-and-committees/other-filers/making-electioneering-communications/
[12] United States Congress. [n. d.]. Text - S.1989 - 115th Congress (2017-2018): Honest Ads Act. https://www.congress.gov/bill/115th-congress/senate-bill/1989/text
[13] Zoe Corbyn. 2012. Facebook experiment boosts US voter turnout. Nature News (2012).
[14] Philippe R Dubois, Camille Arteau-Leclerc, and Thierry Giasson. 2021. Micro-Targeting, Social Media, and Third Party Advertising: Why the Facebook Ad Library Cannot Prevent Threats to Canadian Democracy. (2021).
[15] Laura Edelson, Jason Chuang, Erika Franklin Fowler, Michael Franz, and Travis N Ridout. 2021. Universal Digital Ad Transparency. Available at SSRN 3898214 (2021).
[16] Laura Edelson, Shikhar Sakhuja, Ratan Dey, and Damon McCoy. 2019. An analysis of United States online political advertising transparency. arXiv preprint arXiv:1902.04385 (2019).
[17] David Erickson. 2016. US political ad spending by format, 2008-2016. https://trends.e-strategyblog.com/2016/06/09/us-political-ad-spending-by-format/27038/
[18] Facebook. [n. d.]. Facebook open research and transparency home. https://fort.fb.com/
[19] Facebook. 2021. About Ads About Social Issues, Elections or Politics. https://www.facebook.com/business/help/167836590566506?id=288762101909005
[20] Jessie Finocchiaro, Roland Maio, Faidra Monachou, Gourab K Patro, Manish Raghavan, Ana-Andreea Stoica, and Stratis Tsirtsis. 2021. Bridging machine learning and mechanism design towards algorithmic fairness. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 489–503.
[21] Erika Franklin Fowler, Michael M Franz, Gregory J Martin, Zachary Peskowitz, and Travis N Ridout. 2021. Political advertising online and offline. American Political Science Review 115, 1 (2021), 130–149.
[22] Erika Franklin Fowler, Michael M Franz, and Travis N Ridout. 2018. Political Advertising in the United States. (2018).
[23] Erika Franklin Fowler, Michael M Franz, and Travis N Ridout. 2020. Online political advertising in the United States. Social Media and Democracy: The State of the Field, Prospects for Reform (2020), 111–138.
[24] Geoffrey A. Fowler. 2021. How politicians target you: 3,000 data points on every voter, including your phone number. https://www.washingtonpost.com/technology/2020/10/27/political-campaign-data-targeting/
[25] Aline Shakti Franzke, Anja Bechmann, Michael Zimmer, Charles Ess, et al. 2020. Internet research: Ethical guidelines 3.0. Association of Internet Researchers 4, 1 (2020), 2056305118763366.
[26] Avijit Ghosh, Giridhari Venkatadri, and Alan Mislove. 2019. Analyzing political advertisers' use of Facebook's targeting features. In IEEE Workshop on Technology and Consumer Protection (ConPro'19).
[27] Tarleton Gillespie. 2010. The politics of 'platforms'. New Media & Society 12, 3 (2010), 347–364.
[28] Google. [n. d.]. Political advertising on Google. https://transparencyreport.google.com/political-ads/region/US
[29] Andy Guess, Kevin Aslett, Joshua Tucker, Richard Bonneau, and Jonathan Nagler. 2021. Cracking open the news feed: Exploring what US Facebook users see and share with large-scale platform data. Journal of Quantitative Description: Digital Media 1 (2021).
[30] Eitan D Hersh. 2015. Hacking the electorate: How campaigns perceive voters. Cambridge University Press.
[31] Steve Kovach. 2020. Facebook to ban new political ads in week before presidential election. https://www.cnbc.com/2020/09/03/facebook-to-ban-political-ads-in-week-before-presidential-election.html
[32] Daniel Kreiss. 2016. Prototype politics: Technology-intensive campaigning and the data of democracy. Oxford University Press.
[33] Daniel Kreiss. 2016. Seizing the moment: The presidential campaigns' use of Twitter during the 2012 electoral cycle. New Media & Society 18, 8 (2016), 1473–1490.
[34] Daniel Kreiss and Bridget Barrett. 2020. Democratic tradeoffs: Platforms and political advertising. Ohio St. Tech. LJ 16 (2020), 493.
[35] Daniel Kreiss and Shannon C McGregor. 2018. Technology firms shape political communication: The work of Microsoft, Facebook, Twitter, and Google with campaigns during the 2016 US presidential cycle. Political Communication 35, 2 (2018), 155–177.
[36] Daniel Kreiss and Shannon C McGregor. 2019. The "arbiters of what our voters see": Facebook and Google's struggle with policy, process, and enforcement around political advertising. Political Communication 36, 4 (2019), 499–522.
[37] Sanne Kruikemeier, Minem Sezgin, and Sophie C Boerman. 2016. Political microtargeting: relationship between personalized advertising on Facebook and voters' responses. Cyberpsychology, Behavior, and Social Networking 19, 6 (2016), 367–372.
[38] Victor Le Pochat, Laura Edelson, Tom Van Goethem, Wouter Joosen, Damon McCoy, and Tobias Lauinger. 2022. An Audit of Facebook's Political Ad Policy Enforcement. In Proceedings of the 31st USENIX Security Symposium. USENIX Association.
[39] Paddy Leerssen, Tom Dobber, Natali Helberger, and Claes de Vreese. 2021. News from the ad archive: how journalists use the Facebook Ad Library to hold online advertising accountable. Information, Communication & Society (2021), 1–20.
[40] Taylor Lorenz. 2020. The political pundits of TikTok. https://www.nytimes.com/2020/02/27/style/tiktok-politics-bernie-trump.html
[41] Arunesh Mathur, Angelina Wang, Carsten Schwemmer, Maia Hamin, Brandon M Stewart, and Arvind Narayanan. 2020. Manipulative tactics are the norm in political emails: Evidence from 100K emails from the 2020 US election cycle. Epub ahead of print 5 (2020).
[42] J Nathan Matias, Austin Hounsel, and Nick Feamster. 2021. Software-Supported Audits of Decision-Making Systems: Testing Google and Facebook's Political Advertising Policies. arXiv preprint arXiv:2103.00064 (2021).
[43] Juan Carlos Medina Serrano, Orestis Papakyriakopoulos, and Simon Hegelich. 2020. Exploring political ad libraries for online advertising transparency: lessons from Germany and the 2019 European elections. In International Conference on Social Media and Society. 111–121.
[44] Silvia Milano, Brent Mittelstadt, Sandra Wachter, and Christopher Russell. 2021. Epistemic fragmentation poses a threat to the governance of online targeting. Nature Machine Intelligence 3, 6 (2021), 466–472.
[45] NapoleonCat. [n. d.]. Facebook users in United States of America - December 2020. https://napoleoncat.com/stats/facebook-users-in-united_states_of_america/2020/12/
[46] Gina Neff. 2020. From bad users and failed uses to responsible technologies: A call to expand the AI ethics toolkit. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. 5–6.
[47] Jeroen Ooms. 2020. cld2: Google's Compact Language Detector 2. https://CRAN.R-project.org/package=cld2 R package version 1.2.1.
[48] OpenSecrets. 2022. https://www.followthemoney.org/
[49] Nathaniel Persily and Joshua A Tucker. 2020. Social Media and Democracy: The State of the Field, Prospects for Reform. Cambridge University Press.
[50] Jean-Christophe Plantin, Carl Lagoze, Paul N Edwards, and Christian Sandvig. 2018. Infrastructure studies meet platform studies in the age of Google and Facebook. New Media & Society 20, 1 (2018), 293–310.
[51] Michael Schudson. 2020. The shortcomings of transparency for democracy. American Behavioral Scientist 64, 11 (2020), 1670–1678.
[52] Jaime E Settle. 2018. Frenemies: How social media polarizes America. Cambridge University Press.
[53] Márcio Silva, Lucas Santos de Oliveira, Athanasios Andreou, Pedro Olmo Vaz de Melo, Oana Goga, and Fabrício Benevenuto. 2020. Facebook Ads Monitor: An independent auditing system for political ads on Facebook. In Proceedings of The Web Conference 2020. 224–234.
[54] Josh Simons and Dipayan Ghosh. 2020. Utilities for democracy: Why and how the algorithmic infrastructure of Facebook and Google must be regulated. Brookings website (2020).
[55] Daniel Susser and Vincent Grimaldi. 2021. Measuring Automated Influence: Between Empirical Evidence and Ethical Values. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. 242–253.
[56] TikTok. [n. d.]. Community Guidelines. https://www.tiktok.com/community-guidelines-2022?lang=en#40
[57] TikTok. 2021. Top ads: High-performing auction ads. https://ads.tiktok.com/business/creativecenter/inspiration/topads/pc/en
[58] Vincent Toubiana, Arvind Narayanan, Dan Boneh, Helen Nissenbaum, and Solon Barocas. 2010. Adnostic: Privacy preserving targeted advertising. In Proceedings of the Network and Distributed System Security Symposium.
[59] Briana Vecchione, Karen Levy, and Solon Barocas. 2021. Algorithmic Auditing and Social Justice: Lessons from the History of Audit Studies. In Equity and Access in Algorithms, Mechanisms, and Optimization. 1–9.
[60] Darrell M West. 2017. Air wars: Television advertising and social media in election campaigns, 1952-2016. CQ Press.

A ETHICAL CONSIDERATIONS
A.1 Data selection
Facebook provides, through special agreements for researchers, access to the FORT dataset [18], which purports to contain more detailed
information about ad targeting on the platform as compared to the public library we analyzed. We decided not to use the FORT data for
two reasons. First, we wanted to focus on data that was available to the public at large. Second, at the time we conducted our analysis, the
platform did not provide us with appropriate assurances that we could use that data without any research & publication restrictions.

A.2 Privacy concerns


The analysis of the Facebook and Google data included only public information about ads and advertisers, as collected from their APIs. For TikTok, we crawled the platform and collected only the public metadata of user-generated TikToks, and not the actual videos. For security purposes, only one researcher on the project has access to the information, and will delete it upon completion of the study, as proposed by ethical data collection & analysis guidelines [25].

B DATA
B.1 Facebook
Facebook's Ads Library contains information about ads related to politics, credit, housing, and employment. The platform provides information about whether an ad was sponsored, who sponsored it, its content, and data about the audience. Specifically, it provides lower and upper bounds on the number of generated impressions and on cost, as well as which user groups saw the ad in terms of age, gender², and location. Using the Ads Library API service, we collected all political ads placed in the 60 days leading up to the election (September 1st to November 4th), a window that tracks the legal definition of "electioneering communications," by advertisers who spent at least one hundred thousand dollars. Our final dataset, collected in November 2020, consisted of 749,556 ads created by 803 advertisers, representing approximately 65% of the political ads placed in the specified period. Facebook also enforced a ban on placing new political ads during the week before the election [31], and indeed we did not locate any new ads in our collected sample for that period.
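For replication purposes, the following is a minimal sketch of how such a collection can be paged through Facebook's Ads Library API. The API version, the exact field list, the empty-search convention, and the access token are assumptions to be checked against the current documentation rather than a record of our exact pipeline.

import requests

ACCESS_TOKEN = "..."  # placeholder; the API requires an identity-verified token

url = "https://graph.facebook.com/v12.0/ads_archive"
params = {
    "access_token": ACCESS_TOKEN,
    "ad_type": "POLITICAL_AND_ISSUE_ADS",
    "ad_reached_countries": "['US']",
    "search_terms": "''",                  # empty-quotes convention to match all ads
    "ad_delivery_date_min": "2020-09-01",  # the 60-day electioneering window
    "ad_delivery_date_max": "2020-11-04",
    "fields": "id,page_name,ad_creative_body,spend,impressions,"
              "demographic_distribution,region_distribution",
    "limit": 500,
}

ads = []
while True:
    payload = requests.get(url, params=params).json()
    ads.extend(payload.get("data", []))
    next_url = payload.get("paging", {}).get("next")
    if not next_url:
        break
    url, params = next_url, {}  # the "next" URL already encodes cursor and params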
To detect ads that Facebook moderated, we crawled the ads in our dataset to locate those on which Facebook had placed a flag specifying that they were removed. In total, we located 8,635 removed ads from 253 advertisers on the platform.
For the ads in our collection that were in the form of an image or video, we transformed them into text using the Google Cloud Vision and Speech-to-Text APIs, to make them available for further statistical processing. Next, we queried the Federal Election Commission (FEC) database [8] to investigate how many advertisers were registered as political entities. We also matched advertisers with their corresponding records in the political tracking website FollowTheMoney [48], which classifies them as Political Action Committees (PACs), Authorized Campaign Committees, NGOs, state-related entities, corporation or labor entities, or other entities. Based on the information on the platform, we also categorized advertisers with respect to the content they created: whether they promoted a specific ideology or single issue, promoted civil rights, were general advertising agencies, created policy-related content, created candidate- or party-related content, sold merchandise, promoted the issues of government and state agencies, or asked individuals to perform civic service (e.g., working in election administration).
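As an illustration of the image-to-text step, the sketch below runs OCR on a single ad creative with the Google Cloud Vision client library; the file path is illustrative, and credentials are assumed to be configured through the standard environment variable.

from google.cloud import vision

client = vision.ImageAnnotatorClient()

def image_ad_to_text(path: str) -> str:
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.text_detection(image=image)  # OCR over the ad creative
    annotations = response.text_annotations
    # The first annotation aggregates all text detected in the image
    return annotations[0].description if annotations else ""

print(image_ad_to_text("ads/example_ad.png"))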

B.2 Google & YouTube


Google's ad archive contains information about political ads placed through Google's ad services. This includes ads that appeared in the Google search engine, on third-party websites that use the services, and on YouTube. The archive provides information about the content of an ad, lower and upper bounds on the number of generated impressions and on cost, as well as which user groups in terms of age, gender, and location were targeted by the advertiser. As with Facebook, we crawled the ad archive and collected all political ads placed in the two months prior to the election (September 1st to November 4th) by advertisers who again spent at least one hundred thousand dollars. The final dataset contains 117,607 ads from 490 advertisers, which represents 83% of the political ads placed during this period.
Unlike Facebook, Google removes ads that violate its terms & conditions, leaving only their metadata visible in the archive. To locate the content of ads that were removed, we systematically crawled the archive between September and November 2020, and once more in January 2021. By comparing the crawls, we located 9,735 ads that were moderated by Google. We located an additional 451 moderated ads whose exact content we could not uncover, since they were removed before we collected their original content. We transformed all ads that were in the form of images, and all YouTube videos, into text using the Google Cloud Vision and Speech-to-Text APIs, to make them available for further statistical processing. For the advertisers present in the dataset, we followed the same classification process as for Facebook, searching for their presence in the FEC database and coding their type and advertising content.
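The crawl-comparison logic reduces to a set difference over ad IDs. A minimal sketch, assuming snapshot files in an illustrative JSON format:

import json

def load_snapshot(path):
    # Load one archive crawl, keyed by ad ID
    with open(path) as f:
        return {ad["ad_id"]: ad for ad in json.load(f)}

fall = load_snapshot("google_archive_2020-11.json")    # Sept.-Nov. 2020 crawls, merged
winter = load_snapshot("google_archive_2021-01.json")  # January 2021 crawl

# An ad counts as moderated if its content is gone in the later crawl;
# the earlier snapshot preserves the removed creative for review.
moderated = {
    ad_id: ad
    for ad_id, ad in fall.items()
    if winter.get(ad_id, {}).get("content") is None
}
print(f"Recovered {len(moderated)} moderated ads")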

2 Gender is a spectrum. Nevertheless, both Facebook & Google use a binary classification of genders. We adopt this language for the specific analysis, but we disagree with this form
of classification.

B.3 TikTok
Formally, TikTok does not allow the placement of political ads. But we observed influencers engaged in political campaigning who formed so-called HypeHouses. HypeHouses are TikTok accounts managed by coalitions of political influencers that generate content supporting specific candidates.
We started with a list of known influencers and HypeHouses [40] and, by snowballing, collected other popular accounts that interacted with them. This resulted in a final list of 40 Democratic and 56 Republican HypeHouses and political influencers (Appendix, Table 5). We then crawled the HypeHouse videos between September 1st and October 15th. We had wanted to collect data through November 4th, but our access to crawl the platform was restricted when the platform changed its internal API structure. Next, we created a list of political hashtags (Appendix, Table 6) that included candidates' names, election-related issues such as mail-in ballots, and general political issues such as abortion or gun laws, and we searched for videos containing these hashtags (see the matching sketch after Table 6). Because our TikTok crawl returns only trending content and not all videos related to a hashtag, we identified the creators of the returned content and collected all of their videos for the same period as above. Our final dataset contained 2,690,923 videos from more than 61,000 TikTok creators. For each creator, we obtained information such as their account description, number of followers, and general popularity. For each video, we collected information about how many times it was viewed, liked, and shared, as well as its description. For the HypeHouses, we reviewed the profiles to manually categorize them based on whether they reported links to the following entities: PACs, NGOs, politicians, or media outlets, and whether they were selling merchandise or asking for donations. For the purpose of evaluating TikTok's moderation practices, we relied on the platform returning information about whether a video was an ad or was assigned a flag, such as being related to the US elections.

C FIGURES

Figure 6: Left: Forest plot of logistic regression model predicting whether a political TikTok video will be labeled with a
warning flag. Right: Ratio of political videos flagged by user on TikTok.

Figure 7: Overview of visible advertisers in the libraries by platform in the dataset. Facebook is depicted in blue, Google is
depicted in burgundy, and TikTok is depicted in grey. The bottom-right bar plot depicts the number of influencers linked to
political entities.

Table 5: We collected generated content from the following 96 Republican & Democratic influencers.

Influencers
Republican Democratic
nickvideos,thecjpearson,eliciawho,dylan.odin,redboyhickibilly,
machooch,theconservativevalues,tophertownmusic,samditzhazy,
bodittle,jsinnmusic,thebiasone,mommy_nikki,rheannonfae,
thecadelewis,chabella40,belessstupid,the.ghost88,imnotnatalie,
conservativeHypeHouse,o_rileyyyautoparts,kp.thepatriot,
virtualconnectors,professorross,save.america,democrat_me,
c.j._production,therepublicanHypeHouse,donaldtrumpteam,
mr.shaw7,docd12,electro_high,donthecon_and_associates,wadeslade,
republicanism,matt4186,therealbentaylor,albertojdejesus,
thedemHypeHouse,thatliberalgirl,leftistjayce,futurestatesmanalexander,
youngrepub,thescoop_us,jimjrpavv,thebadopinion,realjohndennis,
somepoliticaldude,theleewithnoname,heathergtv,shashaankvideos,
yourcity,c.jennings7152822,patriotfacts,dylanmaddentv,
thehumanrightsgroup,maya2960,theleftistdude,kaivanshroff,
frankynofingers,lamot11,kindall.k,gcnow,truthseeker5536,
maxwellblaine,typical_democrat,j0emorris,izuhhhhhbel,thealanvargas,
daddy_no_pc,daddy_no_pc2,megaamerican,zc_55,americanblondie,
spiicyboi7,lord_timothais,deerodx,bidenssunglasses,
thesavvytruth,therightlefty,christianwalk1r,matty.merica,
theprogressivepolicy,jbiii,liberalcorner,yaboihatestiktok,bidencoalition,
claytonkeirns,emmanuelharouno,the.rickytaylor,chadvideos,
therepublicangirlls,imtriggered,mattconvard,bobs_politics,
youngrepublicans45,

Table 6: Regular expression used to match TikTok videos' hashtags.

Hashtags
trump|biden|harris|fakenews|election|debate|maga|democrat|
republican|gun|libert|lgbt|conservative|politic|president|left|right|
vote|ballot|equality|kamala|bluewave|envelope|blm|blacklives|
alllives|dems|reps|settlefor|kag|alm |floyd|breonna|abortion|vax|
vaccine|factcheck|fakenews|aoc
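Below is a minimal sketch of how the pattern above can be applied to a crawled video's hashtags; the video record format is an illustrative assumption, and the duplicated "fakenews" alternative is listed once.

import re

POLITICAL = re.compile(
    r"trump|biden|harris|fakenews|election|debate|maga|democrat|"
    r"republican|gun|libert|lgbt|conservative|politic|president|left|right|"
    r"vote|ballot|equality|kamala|bluewave|envelope|blm|blacklives|"
    r"alllives|dems|reps|settlefor|kag|alm |floyd|breonna|abortion|vax|"
    r"vaccine|factcheck|aoc",
    re.IGNORECASE,
)

def is_political(video: dict) -> bool:
    # video["hashtags"] is assumed to be a list of hashtag strings
    return any(POLITICAL.search(tag) for tag in video.get("hashtags", []))

print(is_political({"hashtags": ["maga2020", "fyp"]}))  # True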

Table 7: Top advertisers on Facebook and Google who are not registered at the Federal Election Commission (FEC).

Google | Spent ($) | Facebook | Spent ($)
YES ON 22 - SAVE APP-BASED JOBS & SERVICES | 7,122,400 | Yes on Prop 22 | 5,382,534
CONSERVATIVE BUZZ LLC | 5,849,400 | WhatsApp | 5,000,000
SAHAK NALBANDYAN | 3,254,100 | No On Proposition 23 | 4,302,798
NO ON 23 - STOP THE DANGEROUS & COSTLY DIALYSIS PROPOSITION, A COALITION OF DIALYSIS PROVIDERS, NURSES, DOCTORS AND PATIENTS | 3,221,900 | When We All Vote | 4,014,869
THERESA GREENFIELD FOR IOWA, INC. | 3,193,900 | U.S. Census Bureau | 3,658,788
NEWSMAX MEDIA INC | 2,716,300 | Democratic Governors Association (DGA) | 2,884,356
EPOCH USA INC. | 2,154,300 | Voto Latino | 2,670,929
ALLIANCE FOR A BETTER MINNESOTA ACTION FUND | 1,229,800 | Stop the Illinois Tax Hike Amendment | 2,378,970
COALITION TO STOP THE PROPOSED TAX HIKE AMENDMENT | 1,181,900 | The Collective PAC | 2,121,249
JUDICIAL WATCH INC | 1,140,700 | One North Carolina | 2,033,109

Table 8: Removed ads in our dataset for each platform, by how many advertisers placed them and how many impressions they generated prior to their removal.

Platform | % removed | Reach | Advertisers
Google | 13.3% | 150 mil. - 1 billion | 256 (18%)
YouTube | 4.5% | 120 mil. - 1 billion | 307 (22%)
Facebook | 1.2% | 200 million | 253 (30%)

Table 9: Warnings placed on TikTok videos with at least one political hashtag.

Label Counts
Get info on the U.S. elections 243,440
Learn the facts about COVID-19 2,341
The action in this video could result in serious injury. 30

Table 10: Ordinal linear regression results for predicting the generated number of impressions for each advertisement of Biden & Trump on Google.

Variable | Estimator | (St. Error)
$100-$1k | 5.23*** | (0.05)
$1k-$50k | 9.71*** | (0.07)
$50k-$100k | 15.37*** | (0.22)
>$100k | 17.97*** | (0.25)
Google Network | 9.06*** | (0.09)
YouTube | 5.22*** | (0.07)
Male | −0.01 | (0.22)
Female | 0.08 | (0.22)
Age 18-24 | 2.04*** | (0.29)
Age 25-34 | −1.07*** | (0.22)
Age 45-54 | −0.88*** | (0.13)
Age 55+ | −0.39** | (0.14)
Zip code | −0.50*** | (0.03)
County | −0.13* | (0.06)
USA | 0.02 | (0.06)
region not targeted | −0.18 | (0.13)
Trump over Biden | 0.34*** | (0.03)
≤10k|10k-100k | 9.41*** | (0.09)
10k-100k|100k-1M | 14.30*** | (0.12)
100k-1M|1M-10M | 18.14*** | (0.13)
1M-10M|>10M | 22.26*** | (0.23)
AIC 50018.68; BIC 50208.89; Log Likelihood −24988.34; Deviance 49976.68; Num. obs. 63422
*** p < 0.001; ** p < 0.01; * p < 0.05

Table 11: Linear regression results predicting the impression/cost ratio for ads placed by Biden & Trump on Facebook.

Variable Estimator (St. Error) | Variable Estimator (St. Error)
AL 7.53* (3.48) | AK 4.13 (6.31)
AZ 1.22 (2.80) | AR 16.94*** (4.40)
CA −1.10 (2.87) | CO 8.56** (2.89)
CT −3.28 (4.27) | DE −0.98 (6.68)
FL 3.43 (2.80) | GA 2.99 (2.80)
ID 24.48*** (5.43) | IL 3.50 (3.19)
IN 9.19** (3.37) | IA 2.19 (2.81)
KS 22.58*** (4.81) | KY 30.35*** (4.12)
LA 6.57 (3.42) | ME −0.47 (2.83)
MD 1.19 (3.43) | MA −13.80*** (3.44)
MI 3.41 (2.80) | MN 0.93 (2.81)
MS 22.91*** (4.54) | MO 19.84*** (3.80)
MT −1.43 (4.95) | NE −2.11 (2.83)
NV −0.72 (2.81) | NH −0.30 (2.89)
NJ 0.49 (3.54) | NM −0.48 (5.64)
NY −8.92** (3.09) | NC 1.89 (2.80)
ND 21.78*** (6.56) | OH 4.09 (2.81)
OK 17.90*** (4.17) | OR 0.31 (3.40)
PA 1.36 (2.80) | RI −10.22 (6.91)
SC 17.96*** (3.98) | SD 1.26 (3.63)
TN 19.89*** (3.64) | TX 5.18 (2.85)
UT 5.22 (3.46) | VT −3.84 (3.84)
VA 8.27** (2.86) | WA −3.66 (3.25)
WV 11.07* (5.19) | WI 0.65 (2.80)
WY 11.90 (6.51) | DC −26.14*** (5.52)
Male 18-24 2.06 (1.97) | Male 25-34 7.70*** (1.90)
Male 35-44 7.88*** (1.91) | Male 45-54 7.05*** (1.91)
Male 55-64 2.35 (1.91) | Male 65+ −5.67** (1.92)
Female 18-24 1.43 (1.96) | Female 25-34 7.59*** (1.91)
Female 35-44 8.95*** (1.92) | Female 45-54 6.49*** (1.93)
Female 55-64 0.71 (1.91) | Female 65+ −5.37** (1.90)
Ad delivery start time −0.17*** (0.00) | Biden Campaign 27.53*** (3.35)
Trump Campaign 18.27*** (3.35)
*** p < 0.001; ** p < 0.01; * p < 0.05
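The ordinal model in Table 10 can be fit in outline as follows; this is a minimal sketch with statsmodels' OrderedModel, not our original toolchain, and the input file and column names are illustrative assumptions.

import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("google_ads_model_frame.csv")  # hypothetical model frame

# Ordered outcome: the impression buckets reported in the ad archive
buckets = ["<=10k", "10k-100k", "100k-1M", "1M-10M", ">10M"]
y = df["impressions_bucket"].astype(
    pd.CategoricalDtype(categories=buckets, ordered=True)
)

predictors = [
    "spend_100_1k", "spend_1k_50k", "spend_50k_100k", "spend_over_100k",
    "google_network", "youtube", "male", "female",
    "age_18_24", "age_25_34", "age_45_54", "age_55_plus",
    "zip_code", "county", "usa", "region_not_targeted", "trump_over_biden",
]

# The fitted cutpoints play the role of the bucket thresholds in Table 10
model = OrderedModel(y, df[predictors], distr="logit").fit(method="bfgs")
print(model.summary())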

Table 12: Logistic regression results for predicting whether a TikTok video contains an election related warning.

coef std err z P> |z| [0.025 0.975]


video likes (by 100k) -0.1177 0.031 -3.738 0.000 -0.179 -0.056
video shares (by 100k) 3.5197 0.278 12.669 0.000 2.975 4.064
video comments (by 100k) 1.3187 0.237 5.566 0.000 0.854 1.783
playCount (by 100k) -0.0431 0.006 -7.270 0.000 -0.055 -0.032
author likes (by 100k) -0.6334 0.019 -33.465 0.000 -0.671 -0.596
#biden 5.6269 0.010 555.612 0.000 5.607 5.647
#trump 5.8192 0.018 319.928 0.000 5.784 5.855
#vote 5.5635 0.018 312.633 0.000 5.529 5.598
#blm 0.6550 0.028 23.246 0.000 0.600 0.710
#abortion 1.3162 0.087 15.117 0.000 1.146 1.487
#gun 0.6233 0.068 9.172 0.000 0.490 0.756
const -4.5996 0.008 -570.047 0.000 -4.615 -4.584
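Table 12 follows the layout of a statsmodels Logit summary. A minimal sketch that fits a model of this shape is below; the input file and column names are illustrative assumptions rather than our exact pipeline.

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("tiktok_political_videos.csv")  # hypothetical input file

# Engagement counts rescaled to units of 100k, matching Table 12
for col in ["video_likes", "video_shares", "video_comments",
            "playCount", "author_likes"]:
    df[col] = df[col] / 1e5

X = sm.add_constant(df[[
    "video_likes", "video_shares", "video_comments", "playCount",
    "author_likes", "has_biden", "has_trump", "has_vote",
    "has_blm", "has_abortion", "has_gun",  # binary hashtag indicators
]])

model = sm.Logit(df["election_warning"], X).fit()
print(model.summary())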
