Data Science & Ethics

Lecture 1

Prof. David Martens


david.martens@uantwerp.be
Data Science & Ethics
▪ Goal

▪ AI ethics in the news

▪ Course and Evaluation

▪ FAT Flow

Data Science & Ethics
▪ Understand the ethical aspects of data science

▪ Crucial for business, large and small

▪ Data science has impact
• Costs and benefits for businesses
• Decisions on humans
• More than making calls to predefined Python libraries
Why Care?
1. Expected from society
▪ Generation Z
• Born 1995 – 2010
• 90 million in the US
• Cares about social justice and ethics

https://www.statista.com/statistics/797321/us-population-by-generation/
https://www.mckinsey.com/industries/consumer-packaged-goods/our-insights/true-gen-generation-z-and-its-implications-for-companies
Why Care?
2. Huge potential risks
▪ Be aware of the risks and countermeasures
▪ Risks for humans
• Physical and mental well-being (self-driving)
• Privacy
• Discrimination

▪ Risks for businesses
• Reputational
• Financial
Why Care?
3. Potential benefits
▪ Understanding ethical concerns and applying techniques to address them can improve the data and the model, and can serve as a marketing instrument
• Remove bias in data: improve the accuracy and fairness of the model
• Explain predictions: improve trust in the model
• Ensure proper data gathering: better data quality
• Part of a company’s brand (cf. 1. Expected from society)
Why Care?
Summary:
▪ Life goal in itself (philosophical goal)
▪ Societal and business reasons:
1. Expected from society
2. Huge potential risks
3. Data science ethics can bring value

Data scientists and business students are not inherently unethical, but neither are they trained to think this through.
Why Care?
▪ Future
• Increased digitalization
• Increased automation
• Increased use of AI
➔ EU AI Act

▪ Sci-fi becomes science
• Minority Report
• Black Mirror
• Terminator
• A.I. Artificial Intelligence
Goal of the course

Provide guidance and insight on deciding what is right and wrong when conducting data science, by introducing:

▪ Concepts
▪ Techniques
▪ Cautionary tales
AI Ethics in the News (past days)
Course and Evaluation
▪ Weekly classes
▪ Discussions in class
▪ Presentation (4 points, 20% of your final grade)
• Each week one presentation
• On the topic of the previous class
➢ Present additional techniques, cases or regulations/frameworks
➢ 15 min. (10 min. slides + 5 min. Q&A)
➢ Place it in the context of the content seen before
• Groups of 4: register your group in the Google Doc (to avoid overlapping topics)
https://docs.google.com/document/d/1dXqCIwDGAh2vZpa5mbvs0IEWD-jV-MnyBNNqR_gonUk/edit?usp=sharing
• For the second assessment period, you can choose to keep your presentation grade from the first assessment period, or to write a paper on the topic.
▪ Closed-book exam (16 points, 80% of your final grade)
Course and Evaluation
▪ Presentation (4 points, 20% of your final grade)
• Evaluation:
➢ Cohesion and structure of the presentation
➢ Relevance of content and link with course
➢ Presentation and ability to answer questions
➢ Proper referencing
• Strict timing!
• Check the book to ensure you don’t pick a topic that is covered later in the course. It may be related to a future topic, but should not be exactly the same.
• When in doubt: mail us.
Resources
▪ Slides
▪ Book
• Data Science Ethics: Concepts, Techniques and Cautionary
Tales, Oxford University Press, 2022, 272 p.
• Around 35 Euros
• Available at Acco, bol.com, Amazon, Standaard Boekhandel
https://www.amazon.com/Data-Science-Ethics-Techniques-Cautionary/dp/0192847279/ref=sr_1_1
Data science ethics

Data science ethics

▪ Reduce risk
▪ Reduce crime
▪ Increase profitability
▪ Improve medical diagnosis
➔ Increased “good”

▪ Data leaks
▪ Filter bubble
▪ Discrimination
▪ Digital pawns
➔ Increased “bad”
Data science ethics

About right and wrong when doing data science
Data Science & Ethics
▪ Ethics: “moral principles that control or influence a person’s behavior”
▪ Moral: “connected with principles of right and wrong behavior”
▪ Law: what you can do
Ethics: what you should do

▪ Data Science Ethics: the domain of what is right and wrong when doing
data science
▪ Responsible AI: the development and application of AI that is aligned with
moral values in society

Definitions by Oxford Dictionary


Data Science & Ethics
▪ Ethics Theories: Utilitarianism vs. Deontological Ethics
• Utilitarianism:
➢ A form of consequentialism: an act is judged by the consequences it produces
➢ An action is moral if its consequences are good: the end justifies the means
➢ Can justify actions that are immoral in themselves
• Deontology:
➢ Do not perform immoral actions, regardless of their consequences


Data Science & Ethics
▪ Aristotle’s Nicomachean Ethics
• Moral behavior can be found at the mean between two extremes: excess and deficiency.
• ‘Golden mean’ of data science ethics
➢ Deficiency: Not using any data at all
➢ Excess: Using all available data for any application, without any regard for ethical concerns
Data Science Ethics Equilibrium
▪ Data Science Ethics Equilibrium: A state of data science
practices determined by the ethical concerns and utility of
data science.

E.g., churn prediction

E.g., CV sorting
Trolley Problem
▪ Well-known thought experiment in ethics:
utilitarian vs deontological ethics

https://en.wikipedia.org/wiki/Trolley_problem
Trolley problem (The Good Place)
Trolley Problem
▪ Variants
• What if the single person is your child or partner?
• What if you’re on a bridge, and can only stop the trolley by pushing a man standing next to you off the bridge?

https://en.wikipedia.org/wiki/Trolley_problem
Ethics of self-driving cars

• What if the person is a criminal?
• What if the person is a pregnant woman?
• What if you were the driver?
Ethics of self-driving cars
▪ MIT Moral Machine: Online experimental platform
• 40 million decisions in ten languages from millions of people
in 233 countries
• Global moral preferences

E. Awad, S. Dsouza, R. Kim, J. Schulz, J. Henrich, A. Shariff, J.-F. Bonnefon, I. Rahwan (2018). The Moral Machine experiment. Nature.
Ethics of self-driving cars
▪ MIT Moral Machine - Global moral preferences

▪ Include these morals?
• Regional differences: different moral codes?
• Would you want this as a driver?

E. Awad, S. Dsouza, R. Kim, J. Schulz, J. Henrich, A. Shariff, J.-F. Bonnefon, I. Rahwan (2018). The Moral Machine experiment. Nature.
Data, Algorithms and Models
▪ Definitions
• Data: facts or information, especially when examined and used
to find out things or to make decisions
• Algorithm: a set of rules that must be followed when solving a
particular problem
• Prediction or AI Model: the decision-making formula, which
has been learnt from data by a prediction/AI algorithm

Definitions by Oxford Dictionary


Types of Data
▪ Personal data: ‘personal data’ means any information relating to an identified or
identifiable natural person (‘data subject’); an identifiable natural person is one who can
be identified, directly or indirectly, in particular by reference to an identifier such as a
name, an identification number, location data, an online identifier or to one or more
factors specific to the physical, physiological, genetic, mental, economic, cultural or social
identity of that natural person; (GDPR, Article 4)

▪ Behavioral data: data providing evidence of actions taken by persons, such as location data, Facebook likes, online browsing data, payment data.

▪ Sensitive data: “Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation shall be prohibited.” (GDPR, Article 9)
Different Roles
Humans enter the data science process in different roles

▪ Data Subject: the person whose (personal) data is being used. (Regulator as proxy.)

▪ Data Scientist: the person who is performing the data science

▪ Manager: the person who manages and signs off on a data science project

▪ Model Subject: the person to whom the model is being applied
Different Roles
Example: explaining a prediction model

▪ Data Subject: how are you using my data in your models?

▪ Data Scientist: where is the model making mistakes?

▪ Manager: how does the model generally work?

▪ Model Subject: why am I being denied credit?
FAT
▪ Fair: “Treating people equally without favouritism or
discrimination.”

▪ Transparent: “Easy to perceive or detect”

▪ Accountable: “Required or expected to justify actions or decisions; responsible”

Definitions by Oxford Dictionary


FAT
▪ Fair: “Treating people equally without favouritism or
discrimination.”
1. Privacy: fair to the data subject’s privacy rights
➢ A state in which one is not observed or disturbed by other people.
➢ Human right
➢ No one shall be subjected to arbitrary interference with his privacy,
family, home or correspondence, nor to attacks upon his honour and
reputation.
- UN Universal Declaration of Human Rights of 1948
➢ Everyone has the right to respect for his private and family life, his
home and his correspondence.
- European Convention on Human Rights (signed 1950, in force since 1953)
2. Discrimination: not discriminating against sensitive groups
FAT
▪ Fair: “Treating people equally without favouritism or
discrimination.”
2. Discrimination: not discriminating against sensitive groups
➢ Sensitive groups? Often race, gender, sexual orientation.
➢ Cf. GDPR sensitive data
FAT
▪ Transparent: “Easy to perceive or detect”
1. Process
➢ Depending on the role
➢ Crucial for Fairness and Accountability
➢ Does not imply revealing company secrets

2. Explainable AI
➢ Explain the predictions (and the prediction model)
FAT
▪ Accountable: “Required or expected to justify actions or
decisions; responsible”
• From theory to practice: Obligation to
1. implement appropriate and effective measures to ensure that
principles are complied with,
2. demonstrate compliance of the measures upon request, and
3. recognize potential negative consequences.
FAT Flow: a Data Science Ethics Framework
▪ Three dimensions
• Stage in the data science process
• Evaluation criterion
• Role of the human
[Figure: FAT Flow, a Data Science Ethics Framework. Fair: privacy, discrimination. Transparent: process, explainable. Accountable: measures, compliance, consequences.]
FAT Flow
▪ A Framework for Data Science Ethics
FAT Flow: Concepts and Techniques

FAT Flow: Cautionary Tales
Subjectivity of ethics
▪ Who decides what is ethical?
• Companies
• You

Subjectivity of ethics
▪ Application
• Fair to use gender and race data?
• Credit scoring vs medical diagnosis
▪ Time
• Women: allowed to vote in the US in 1920, in Belgium in 1948, in Moldova in 1978
• Black people: slavery; granted the right to vote in the US in 1870 (15th Amendment), but widely denied it in practice until the Voting Rights Act of 1965
• Victims of our time:
➢ Those whom we do not consider it wrong to discriminate against, although they have the same rights as all humans: older people, people with low incomes, etc.
➢ Those we consider to have fewer rights than humans: animals, robots

▪ Location
• Respect for elders, disrespect for criminals, etc.
• Rights of individuals vs. rights of the state
Subjectivity of ethics

Discussion Case 1
▪ How important is it to be ethical in data science?
• Pro: absolutely, see “Why Care?”
• Con: no, importance is exaggerated.

• It is a spectrum; the right balance is to be discussed
Fair Data Gathering
▪ Concepts: Privacy, Sample Bias, Surveillance
▪ Techniques: Encryption, Hashing
▪ Cautionary Tales: Government backdoors
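Hashing is only named on the slide. As a minimal sketch, assuming the aim is to pseudonymise a direct identifier (an e-mail address) at collection time, the example below replaces it with a keyed hash (HMAC-SHA256 from Python's standard library). The same identifier always yields the same token, so records can still be linked, while the secret key is kept apart from the data. The key, field names and records are hypothetical. A plain, unkeyed hash of a guessable identifier such as an e-mail address can be reversed by hashing candidate values, which is why a secret key is used here.

```python
import hashlib
import hmac

# Hypothetical secret key: in practice it would be generated randomly and
# stored separately from the pseudonymised data set.
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable token with a keyed hash (HMAC-SHA256)."""
    normalised = identifier.strip().lower()  # same person -> same token
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical records gathered at collection time.
records = [
    {"email": "Alice@example.com", "visits": 3},
    {"email": "bob@example.com", "visits": 1},
    {"email": "alice@example.com ", "visits": 2},
]

# Replace the direct identifier before the data leaves the collection step.
pseudonymised = [
    {"user": pseudonymise(r["email"]), "visits": r["visits"]} for r in records
]

for row in pseudonymised:
    print(row["user"][:12], row["visits"])
```

Pseudonymised data of this kind is still personal data under the GDPR, since whoever holds the key can re-link it; encryption would be the choice when the original value must be recoverable.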

Transparent Data Gathering
▪ Concepts: Privacy
▪ Techniques: A/B Testing
▪ Cautionary Tales: OkCupid
Discussion Case 2
▪ The SID-IN is a fair (expo) where universities and colleges provide students and their parents with information on their programs. Since 2020, each visitor receives a badge with a QR code. Every exhibitor at the fair then asks to scan the badge, so as to know who came by and to send them additional information later on. The data gathered is:
• Information available to the person providing information at the fair: name, program preferences
• Information available to the university: name, school, email, address, program preferences

▪ Ethics:
• What are potential uses for data science, and what additional information would be useful (think out of the box and for big impact)?
• How could this be misused?
• It is a spectrum; the right balance is to be discussed
Fair Data Preparation
▪ Concepts: K-Anonymity, Proxies
▪ Techniques: Input Selection, Defining the target variable
▪ Cautionary Tales: Netflix re-identification

https://www.wired.com/2009/12/netflix-privacy-lawsuit/
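K-anonymity holds when every record shares its quasi-identifier values (attributes such as ZIP code, age band or gender that could be linked to outside data) with at least k-1 other records. A minimal sketch, assuming a small hypothetical table and a hand-picked set of quasi-identifiers: the k of a table is the size of its smallest group of records with identical quasi-identifier values.

```python
from collections import Counter

# Hypothetical released table; "diagnosis" is the sensitive attribute.
rows = [
    {"zip": "2000", "age": "20-29", "gender": "F", "diagnosis": "flu"},
    {"zip": "2000", "age": "20-29", "gender": "F", "diagnosis": "asthma"},
    {"zip": "2018", "age": "30-39", "gender": "M", "diagnosis": "flu"},
    {"zip": "2018", "age": "30-39", "gender": "M", "diagnosis": "diabetes"},
    {"zip": "2018", "age": "30-39", "gender": "M", "diagnosis": "flu"},
    {"zip": "2060", "age": "40-49", "gender": "F", "diagnosis": "asthma"},
]

# Assumed quasi-identifiers: attributes an attacker could link to other data.
QUASI_IDENTIFIERS = ("zip", "age", "gender")


def equivalence_classes(table, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in table)


def k_anonymity(table, quasi_identifiers):
    """The table is k-anonymous for k = size of its smallest equivalence class."""
    return min(equivalence_classes(table, quasi_identifiers).values())


def risky_groups(table, quasi_identifiers, k):
    """Quasi-identifier combinations shared by fewer than k rows."""
    classes = equivalence_classes(table, quasi_identifiers)
    return [combo for combo, n in classes.items() if n < k]


print(k_anonymity(rows, QUASI_IDENTIFIERS))  # 1: the 2060 record is unique
print(risky_groups(rows, QUASI_IDENTIFIERS, k=2))
```

Here the single record from ZIP code 2060 makes the table only 1-anonymous; raising k would mean generalising values (wider age bands, coarser ZIP codes) or suppressing that record. The Netflix case illustrates the catch: which attributes act as quasi-identifiers is itself a judgement, and movie ratings turned out to be one.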
Transparent Data Preparation
▪ Concepts: Proxies
▪ Techniques: Input selection
▪ Cautionary Tales: Redlining

http://powerreporting.com/color/
Fair Data Modeling
▪ Concepts: Privacy-preserving data mining (PPDM), biased models
▪ Techniques: Homomorphic encryption, zero-knowledge proofs, removing bias
▪ Cautionary Tales: Self-driving cars

E. Awad, S. Dsouza, R. Kim, J. Schulz, J. Henrich, A. Shariff, J.-F. Bonnefon, I. Rahwan (2018). The Moral Machine experiment. Nature.
Transparent Data Modeling
▪ Concepts: Black-box models
▪ Techniques: Global and instance-based explanations
▪ Cautionary Tales: Credit scoring
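For a linear scoring model, the two kinds of explanation named on the slide can be contrasted in a few lines. A minimal sketch, assuming a hypothetical credit-scoring model with hand-set weights rather than a trained one: the global explanation ranks the weights, which apply to every applicant; the instance-based explanation gives each feature's contribution to one applicant's score relative to an assumed baseline applicant, answering the model subject's question of why credit was denied.

```python
# Hypothetical linear credit-scoring model: score = BIAS + sum(w_i * x_i);
# credit is granted when the score is at least 0. Weights are made up.
WEIGHTS = {"income_k": 0.04, "debt_k": -0.08, "years_employed": 0.10, "missed_payments": -0.50}
BIAS = -1.0

# Assumed reference point for instance-based explanations (an "average" applicant).
BASELINE = {"income_k": 40.0, "debt_k": 10.0, "years_employed": 5.0, "missed_payments": 0.0}


def score(x):
    """Linear score of one applicant."""
    return BIAS + sum(WEIGHTS[f] * x[f] for f in WEIGHTS)


def global_explanation():
    """Global: features ranked by the magnitude of their weight (same for everyone)."""
    return sorted(WEIGHTS.items(), key=lambda kv: abs(kv[1]), reverse=True)


def instance_explanation(x):
    """Instance-based: each feature's contribution relative to the baseline applicant."""
    contributions = {f: WEIGHTS[f] * (x[f] - BASELINE[f]) for f in WEIGHTS}
    return sorted(contributions.items(), key=lambda kv: kv[1])  # most negative first


applicant = {"income_k": 35.0, "debt_k": 15.0, "years_employed": 2.0, "missed_payments": 2.0}
print("score:", round(score(applicant), 2), "granted:", score(applicant) >= 0)
print("global:", global_explanation())
print("why this applicant:", instance_explanation(applicant))
```

Because the model is linear, the contributions add up exactly to the difference between the applicant's score and the baseline's score. Ranking raw weights is only meaningful when features are on comparable scales, and black-box models need model-agnostic techniques (for example counterfactual explanations, LIME or SHAP) to produce the instance-based part.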
Fair Model Evaluation
▪ Concepts: Privacy, discrimination
▪ Techniques: K-anonymity, detecting bias
▪ Cautionary Tales: Predicting Recidivism

https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
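Detecting bias typically starts by comparing a model's outcomes across sensitive groups. A minimal sketch with hypothetical groups, predictions and outcomes: it computes per group the share of people flagged as high risk and the false positive rate (people flagged as high risk who did not reoffend), the kind of error-rate comparison used in the ProPublica analysis. Which metric to prioritise is itself an ethical choice, as different fairness metrics can in general not all be satisfied at once.

```python
from collections import defaultdict

# Hypothetical evaluation data: (group, predicted_high_risk, actually_reoffended)
records = [
    ("A", 1, 1), ("A", 1, 0), ("A", 1, 0), ("A", 0, 0), ("A", 0, 1), ("A", 1, 1),
    ("B", 0, 0), ("B", 1, 1), ("B", 0, 0), ("B", 0, 1), ("B", 1, 0), ("B", 0, 0),
]


def group_metrics(data):
    """Per group: share predicted positive and false positive rate."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "neg": 0, "false_pos": 0})
    for group, pred, actual in data:
        c = counts[group]
        c["n"] += 1
        c["pred_pos"] += pred
        if actual == 0:  # only true negatives can become false positives
            c["neg"] += 1
            c["false_pos"] += pred
    return {
        g: {
            "positive_rate": c["pred_pos"] / c["n"],
            "false_positive_rate": c["false_pos"] / c["neg"] if c["neg"] else float("nan"),
        }
        for g, c in counts.items()
    }


metrics = group_metrics(records)
for group, m in sorted(metrics.items()):
    print(group, m)

# Ratio of the lowest to the highest positive rate across groups (1.0 = parity).
rates = [m["positive_rate"] for m in metrics.values()]
print("positive-rate ratio:", min(rates) / max(rates))
```

In this toy data, group A is flagged twice as often as group B (ratio 0.5, well below the 0.8 threshold of the "four-fifths" rule of thumb) and has a false positive rate of 2/3 versus 0.25 for group B.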
Transparent Model Evaluation
▪ Concepts: KPIs, Reporting, Explaining
▪ Techniques: Cherry-picking, Backtesting, Explaining models
▪ Cautionary Tales: Apple Card

https://towardsdatascience.com/is-the-medias-reluctance-to-admit-ai-s-weaknesses-putting-us-at-risk-c355728e9028
Fair Model Deployment
▪ Concepts: Access to system
▪ Techniques: Overruling
▪ Cautionary Tales: Target

https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html
Transparent Model Deployment
▪ Concepts: Unintended consequences, misleading
▪ Techniques: Deepfakes
▪ Cautionary Tales: Uber’s God View

https://www.forbes.com/sites/kashmirhill/2014/10/03/god-view-uber-allegedly-stalked-users-for-party-goers-viewing-pleasure/
Beyond Data Science Ethics
▪ Singularity
▪ Skynet
▪ Robot Rights and Duties

Ethical AI Frameworks
▪ IEEE Global Initiative on Ethics of Autonomous and
Intelligent Systems (2018)
1. Human Rights: A/IS shall be created and operated to respect, promote, and protect internationally recognized human rights.
2. Well-being: A/IS creators shall adopt increased human well-being as a primary success criterion for development.
3. Data Agency: A/IS creators shall empower individuals with the ability to access and securely share their data, to maintain people’s capacity to have control over their identity.
4. Effectiveness: A/IS creators and operators shall provide evidence of the effectiveness and fitness for purpose of A/IS.
5. Transparency: The basis of a particular A/IS decision should always be discoverable.
6. Accountability: A/IS shall be created and operated to provide an unambiguous rationale for all decisions made.
7. Awareness of Misuse: A/IS creators shall guard against all potential misuses and risks of A/IS in operation.
8. Competence: A/IS creators shall specify and operators shall adhere to the knowledge and skill required for safe and effective operation.
Ethical AI Frameworks
▪ Ethics guidelines for trustworthy AI (2019)
▪ “The aim of the Guidelines is to promote Trustworthy AI. Trustworthy AI has three components, which should be met throughout the system's entire life cycle:
• (1) it should be lawful, complying with all applicable laws and regulations
• (2) it should be ethical, ensuring adherence to ethical principles and values and
• (3) it should be robust, both from a technical and social perspective since, even with good intentions, AI systems can cause unintentional harm.”

https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai
AI Act
▪ European regulation

Ethical AI Frameworks
▪ White House Executive Order on Maintaining American
Leadership in Artificial Intelligence, Feb. 2019
• Mentions privacy and civil liberties
• No mention of ethics, explainability or transparency
• https://www.whitehouse.gov/presidential-actions/executive-order-maintaining-american-leadership-artificial-intelligence/
▪ ISO
• ISO/IEC AWI TR 24368
• Information technology — Artificial intelligence — Overview of
ethical and societal concerns
• Status: “Under development”
• https://www.iso.org/standard/78507.html

Hagendorff 2019 - The Ethics of AI Ethics - An Evaluation of Guidelines
Discussion Case 3
▪ Should Data Science Ethics be mandatory training for all
data science and business students?
▪ For whom is it more important: business or data science students?
▪ Should Data Science Ethics be regulated?
Presentation Ideas
▪ Review the recently proposed AI Act, and summarize the
main critiques and open issues
▪ Compare the US, Chinese and EU views on AI ethics

▪ Remember to register your week via the Google Doc


What have we learned?

Next week