Py con2020 | PPTX
Understanding the importance of big data and big ideas
Understanding Data Science
behind the movie
MONEYBALL USING
PYTHON
TABLE OF CONTENTS
01 PROBLEM & SOLUTION
02 PROCESS
03 Which Player to Select
About the Movie
Billy Beane joins forces with Peter Brand, a Yale graduate, to challenge the old-school selection methods and reinvent his team using a sabermetric model that has never been used before.
“IF WE WIN, ON OUR BUDGET, WITH THIS TEAM... WE’LL HAVE CHANGED THE GAME.”
- MONEYBALL
UNDERSTANDING THE PROBLEM
Oakland Athletics: $39,722,689
New York Yankees: $114,457,768
The Oakland A's didn't have the money to buy top players, so they had to find another way to be competitive.
Nor had they won a World Series since 1989.
Billy Beane and Paul DePodesta then took an analytical, statistical, sabermetric approach to assemble their team.
Wolf of Sabermetrics Street
SOLUTIONS
Interpreting the data
Billy Beane's formula
Fetching data and installing all prerequisites
Conclusion
Salaries and Wins: Spoiler Alert
Importing all the
modules and
fetching data
01
Module
Data Downloaded from SeanLahman.com
Selecting the data we need
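A minimal sketch of this step, assuming pandas and a Teams-style CSV from the downloaded data. The sample rows, the team codes AAA and BBB, the column selection and the helper name load_teams are illustrative assumptions, not taken from the slides.

```python
import io

import pandas as pd

# Placeholder rows shaped like a Teams table; the numbers are made up
# for illustration and are NOT real season statistics.
SAMPLE_TEAMS_CSV = """yearID,teamID,lgID,G,W,L,R,AB,H,BB,HBP,SF
2001,AAA,AL,162,90,72,800,5500,1450,600,50,45
2001,BBB,AL,162,80,82,720,5450,1400,500,40,40
2002,AAA,AL,162,95,67,820,5520,1460,620,55,50
2002,BBB,AL,162,85,77,750,5480,1420,520,45,42
"""

# Columns assumed to be needed later (wins, runs, and the OBP inputs).
COLUMNS = ["yearID", "teamID", "W", "R", "AB", "H", "BB", "HBP", "SF"]


def load_teams(source):
    """Read a Teams-style CSV and keep only the columns of interest."""
    return pd.read_csv(source, usecols=COLUMNS)[COLUMNS]


teams = load_teams(io.StringIO(SAMPLE_TEAMS_CSV))
print(teams.head())
```

With the real download, the same helper would be called with the path of the CSV file instead of the in-memory sample.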
Data Explanation
Data Visualization
02
Setting the index
Importing the Salary
Grouping the Salary and Team
Set the DataFrame index using existing columns
This index will make our queries easier
Importing the Salary
Joining Columns of another DataFrame
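The three steps above could look roughly like this in pandas. The tables and values are made-up placeholders, and the column names (yearID, teamID, salary) are assumptions about the layout of the downloaded data.

```python
import pandas as pd

# Made-up placeholder values, not real statistics.
teams = pd.DataFrame({
    "yearID": [2001, 2001, 2002, 2002],
    "teamID": ["AAA", "BBB", "AAA", "BBB"],
    "W": [90, 80, 95, 85],
})
salaries = pd.DataFrame({
    "yearID": [2001, 2001, 2001, 2002, 2002, 2002],
    "teamID": ["AAA", "AAA", "BBB", "AAA", "BBB", "BBB"],
    "playerID": ["p1", "p2", "p3", "p1", "p3", "p4"],
    "salary": [1_000_000, 2_000_000, 5_000_000,
               1_500_000, 6_000_000, 4_000_000],
})

# Index the team table by (year, team) so rows can be looked up by key.
teams = teams.set_index(["yearID", "teamID"])

# Sum player salaries into one payroll figure per (year, team).
payroll = salaries.groupby(["yearID", "teamID"])["salary"].sum()

# join() aligns on the shared index and adds the payroll as a new column.
teams = teams.join(payroll)
print(teams)
```

Because both tables share the (yearID, teamID) index, the join needs no extra key arguments.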
Big Chunk of Code
The millions function converts the passed value into millions.
The .xs function takes a cross-section of the data for a given key.
i is the code of the team.
We search for two teams in each year: the Yankees and Oakland.
We use a scatter plot for the given data.
Set the x and y axes.
Font size
Figure size
Color scheme
Green for Oakland
Blue for New York
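A hedged sketch of what such a plotting routine might look like, assuming matplotlib and a table indexed by (yearID, teamID). The data values are made up, the team codes OAK and NYA and the helper name plot_year are assumptions, and the original code on the slide may differ.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Made-up placeholder values indexed by (yearID, teamID).
teams = pd.DataFrame(
    {
        "yearID": [2001, 2001, 2001, 2002, 2002, 2002],
        "teamID": ["OAK", "NYA", "XXX", "OAK", "NYA", "XXX"],
        "W": [95, 92, 70, 100, 98, 75],
        "salary": [35e6, 110e6, 60e6, 40e6, 120e6, 65e6],
    }
).set_index(["yearID", "teamID"])


def millions(value, pos=None):
    """Format a dollar amount in millions, e.g. 40000000 -> '$40.0M'."""
    return f"${value / 1e6:.1f}M"


def plot_year(data, year):
    """Scatter payroll against wins for one season, highlighting two teams."""
    season = data.xs(year)  # cross-section: all teams for that year
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(season["salary"], season["W"], color="grey")
    for team, color in (("OAK", "green"), ("NYA", "blue")):
        if team in season.index:
            row = season.loc[team]
            ax.scatter(row["salary"], row["W"], color=color, label=team)
    ax.xaxis.set_major_formatter(FuncFormatter(millions))
    ax.set_xlabel("Payroll", fontsize=12)
    ax.set_ylabel("Wins", fontsize=12)
    ax.set_title(f"Wins vs. payroll, {year}", fontsize=14)
    ax.legend()
    return fig


fig = plot_year(teams, 2002)
```

Calling fig.savefig with a file name would write the chart to disk; looping over several years gives one plot per season.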
Obtaining the Plot
Using the data, we see that Oakland showed drastic improvement, with salary remaining constant and the number of wins increasing.
In the next slide, we will focus on Billy Beane's secret formula, which made this possible.
Billy Beane's
Formula
03
Welcome to the cauldron
Most players in MLB focused on batting average (BA) as the statistic for improving runs scored.
Billy Beane took a different approach: he focused on on-base percentage (OBP) and slugging percentage (SLG).
On-base percentage is a measure of how often a batter reaches base.
The formula is OBP = (Hits + Walks + Hit By Pitch) / (At Bats + Walks + Hit By Pitch + Sacrifice Flies)
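As a small worked example of the OBP formula, here is a sketch in plain Python. The function name, the zero-denominator handling and the sample numbers are assumptions made for illustration.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = at_bats + walks + hit_by_pitch + sacrifice_flies
    if denominator == 0:
        return 0.0  # assumption: no plate appearances is reported as 0
    return (hits + walks + hit_by_pitch) / denominator


# Made-up line: 150 hits, 70 walks, 5 HBP, 500 at bats, 5 sacrifice flies.
obp = on_base_percentage(150, 70, 5, 500, 5)
print(round(obp, 3))  # 225 / 580, prints 0.388
```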
Understanding Terminology (Optional)
A walk occurs when a pitcher throws four pitches out of the strike zone, none of which are swung at by the hitter.
After refraining from swinging at four pitches out of the zone, the batter is awarded first base.
In baseball, a hit by pitch (HBP) is an event in which a batter, or his clothing or equipment, is struck directly by a pitch from the pitcher; the batter is called a hit batsman (HB).
A sacrifice fly occurs when a batter hits a fly ball out to the outfield or foul territory that allows a runner to score.
A player's turn at batting is called an at bat.
A hit occurs when a batter strikes the baseball into fair territory and reaches base without doing so via an error or fielder’s choice.
Statistics
statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models.
Linear regression is also called ordinary least squares (OLS) regression.
It is of the format ols(y, x): the response first, then the predictors.
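A minimal sketch of an OLS fit with the statsmodels formula interface, where the call takes a formula string and a DataFrame. The data here is synthetic and generated only for illustration, so the fitted numbers say nothing about real baseball; the column name runs and the coefficients used to generate it are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic team seasons: runs are generated from OBP and SLG plus noise.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "OBP": rng.uniform(0.30, 0.37, n),
    "SLG": rng.uniform(0.36, 0.48, n),
})
data["runs"] = (
    -800 + 2700 * data["OBP"] + 1600 * data["SLG"] + rng.normal(0, 20, n)
)

# Formula interface: response on the left of "~", predictors on the right.
model = smf.ols("runs ~ OBP + SLG", data=data).fit()

print(model.params)          # intercept and one coefficient per predictor
print(model.rsquared_adj)    # adjusted R-squared
print(model.conf_int(0.05))  # 95% confidence intervals for the coefficients
```

Swapping the right-hand side of the formula (for example adding or removing a batting average column) is how different candidate models would be compared.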
Conclusion
04
Conclusion
The first model has an adjusted R-squared of 0.918, with a 95% confidence interval for BA between -283 and 468.
This is counterintuitive, since we expect the BA value to be positive.
The second model has an adjusted R-squared of 0.919, and the last model an adjusted R-squared of 0.500.
Based on this analysis, we can confirm that the second model, using OBP and SLG, is the best model for predicting runs scored.
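For readers unfamiliar with the metric, adjusted R-squared penalizes a model for each extra predictor. A short sketch of the standard formula follows; the function name and the sample inputs are made up and are not the values from the models above.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_observations - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n_observations - 1) / dof


# Made-up example: the same raw R^2 with two versus three predictors.
two_predictors = adjusted_r_squared(0.92, 100, 2)
three_predictors = adjusted_r_squared(0.92, 100, 3)
print(two_predictors > three_predictors)  # the extra predictor lowers the score
```

This is why a model with fewer predictors can score slightly higher when the dropped predictor adds little explanatory power.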
Results
The Oakland A's finished 1st in the American League West and set an AL record of 20 consecutive wins.
The Boston Red Sox, who adopted a similar approach, won the World Series in 2004, after failing to do so for 86 years.
Main Motive of this Presentation
Challenging conventional wisdom as to what top talent looks like and where it comes from
• Specific industry, Company, or Competitor experience.
• Any Prior experience
• Specific University
• High G.P.A.
• Certification
• M.B.A.s
We need Assessments
The science is unbiased
The science is smarter than you, and your hiring manager
Fall in love with the technology, it will change your life!
You can find great talent hidden in organizations with horrible cultures, many of whom will be more successful than those coming from great cultures
Question and
Answer Time
CREDITS: This presentation template was created by Slidesgo,
including icons by Flaticon, and infographics & images by Freepik
THANKS
Do you have any questions?
Aniketsinha101@gmail.com
+91 8425 99 1426
www.cmmon.com
Please keep this slide for attribution
People who made this presentation possible
Billy Beane
For applying statistical analysis (known as sabermetrics) to baseball
Adil Moujbad
Introduction to Sabermetrics and Implementation of Moneyball
Paul DePodesta
Cold calculator, choosing players based only on their numbers
Michael Lewis
Moneyball
Brad Pitt and Jonah Hill
Bill James
First person to coin the term sabermetrics
