“With this new edition, Wes has updated his book to ensure it remains the go-to
resource for all things related to data analysis with Python and pandas. I cannot
recommend this book highly enough.”
—Paul Barry
Lecturer and author of O’Reilly’s
Head First Python
Python for Data Analysis
US $69.99 CAN $87.99
ISBN: 978-1-098-10403-0
Twitter: @oreillymedia
linkedin.com/company/oreilly-media
youtube.com/oreillymedia
Get the definitive handbook for manipulating, processing,
cleaning, and crunching datasets in Python. Updated for
Python 3.10 and pandas 1.4, the third edition of this hands-
on guide is packed with practical case studies that show you
how to solve a broad set of data analysis problems effectively.
You’ll learn the latest versions of pandas, NumPy, and Jupyter
in the process.
Written by Wes McKinney, the creator of the Python pandas
project, this book is a practical, modern introduction to
data science tools in Python. It’s ideal for analysts new to
Python and for Python programmers new to data science
and scientific computing. Data files and related material are
available on GitHub.
• Use the Jupyter notebook and the IPython shell for
exploratory computing
• Learn basic and advanced features in NumPy
• Get started with data analysis tools in the pandas library
• Use flexible tools to load, clean, transform, merge, and
reshape data
• Create informative visualizations with matplotlib
• Apply the pandas groupby facility to slice, dice, and
summarize datasets
• Analyze and manipulate regular and irregular time series
data
• Learn how to solve real-world data analysis problems with
thorough, detailed examples
Wes McKinney, cofounder and chief
technology officer of Voltron Data, is
an active member of the Python data
community and an advocate for Python
use in data analysis, finance, and
statistical computing applications. A
graduate of MIT, he’s also a member of
the project management committees
for the Apache Software Foundation’s
Apache Arrow and Apache Parquet
projects.
Wes McKinney
Python for Data Analysis
Data Wrangling with pandas,
NumPy, and Jupyter
THIRD EDITION
Beijing Boston Farnham Sebastopol Tokyo
Table of Contents
Preface
1. Preliminaries
1.1 What Is This Book About?
What Kinds of Data?
1.2 Why Python for Data Analysis?
Python as Glue
Solving the “Two-Language” Problem
Why Not Python?
1.3 Essential Python Libraries
NumPy
pandas
matplotlib
IPython and Jupyter
SciPy
scikit-learn
statsmodels
Other Packages
1.4 Installation and Setup
Miniconda on Windows
GNU/Linux
Miniconda on macOS
Installing Necessary Packages
Integrated Development Environments and Text Editors
1.5 Community and Conferences
1.6 Navigating This Book
Code Examples
Data for Examples
Import Conventions
2. Python Language Basics, IPython, and Jupyter Notebooks
2.1 The Python Interpreter
2.2 IPython Basics
Running the IPython Shell
Running the Jupyter Notebook
Tab Completion
Introspection
2.3 Python Language Basics
Language Semantics
Scalar Types
Control Flow
2.4 Conclusion
3. Built-In Data Structures, Functions, and Files
3.1 Data Structures and Sequences
Tuple
List
Dictionary
Set
Built-In Sequence Functions
List, Set, and Dictionary Comprehensions
3.2 Functions
Namespaces, Scope, and Local Functions
Returning Multiple Values
Functions Are Objects
Anonymous (Lambda) Functions
Generators
Errors and Exception Handling
3.3 Files and the Operating System
Bytes and Unicode with Files
3.4 Conclusion
4. NumPy Basics: Arrays and Vectorized Computation
4.1 The NumPy ndarray: A Multidimensional Array Object
Creating ndarrays
Data Types for ndarrays
Arithmetic with NumPy Arrays
Basic Indexing and Slicing
Boolean Indexing
Fancy Indexing
Transposing Arrays and Swapping Axes
4.2 Pseudorandom Number Generation
4.3 Universal Functions: Fast Element-Wise Array Functions
4.4 Array-Oriented Programming with Arrays
Expressing Conditional Logic as Array Operations
Mathematical and Statistical Methods
Methods for Boolean Arrays
Sorting
Unique and Other Set Logic
4.5 File Input and Output with Arrays
4.6 Linear Algebra
4.7 Example: Random Walks
Simulating Many Random Walks at Once
4.8 Conclusion
5. Getting Started with pandas
5.1 Introduction to pandas Data Structures
Series
DataFrame
Index Objects
5.2 Essential Functionality
Reindexing
Dropping Entries from an Axis
Indexing, Selection, and Filtering
Arithmetic and Data Alignment
Function Application and Mapping
Sorting and Ranking
Axis Indexes with Duplicate Labels
5.3 Summarizing and Computing Descriptive Statistics
Correlation and Covariance
Unique Values, Value Counts, and Membership
5.4 Conclusion
6. Data Loading, Storage, and File Formats
6.1 Reading and Writing Data in Text Format
Reading Text Files in Pieces
Writing Data to Text Format
Working with Other Delimited Formats
JSON Data
XML and HTML: Web Scraping
6.2 Binary Data Formats
Reading Microsoft Excel Files
Using HDF5 Format
6.3 Interacting with Web APIs
6.4 Interacting with Databases
6.5 Conclusion
7. Data Cleaning and Preparation
7.1 Handling Missing Data
Filtering Out Missing Data
Filling In Missing Data
7.2 Data Transformation
Removing Duplicates
Transforming Data Using a Function or Mapping
Replacing Values
Renaming Axis Indexes
Discretization and Binning
Detecting and Filtering Outliers
Permutation and Random Sampling
Computing Indicator/Dummy Variables
7.3 Extension Data Types
7.4 String Manipulation
Python Built-In String Object Methods
Regular Expressions
String Functions in pandas
7.5 Categorical Data
Background and Motivation
Categorical Extension Type in pandas
Computations with Categoricals
Categorical Methods
7.6 Conclusion
8. Data Wrangling: Join, Combine, and Reshape
8.1 Hierarchical Indexing
Reordering and Sorting Levels
Summary Statistics by Level
Indexing with a DataFrame’s columns
8.2 Combining and Merging Datasets
Database-Style DataFrame Joins
Merging on Index
Concatenating Along an Axis
Combining Data with Overlap
8.3 Reshaping and Pivoting
Reshaping with Hierarchical Indexing
Pivoting “Long” to “Wide” Format
Pivoting “Wide” to “Long” Format
8.4 Conclusion
9. Plotting and Visualization
9.1 A Brief matplotlib API Primer
Figures and Subplots
Colors, Markers, and Line Styles
Ticks, Labels, and Legends
Annotations and Drawing on a Subplot
Saving Plots to File
matplotlib Configuration
9.2 Plotting with pandas and seaborn
Line Plots
Bar Plots
Histograms and Density Plots
Scatter or Point Plots
Facet Grids and Categorical Data
9.3 Other Python Visualization Tools
9.4 Conclusion
10. Data Aggregation and Group Operations
10.1 How to Think About Group Operations
Iterating over Groups
Selecting a Column or Subset of Columns
Grouping with Dictionaries and Series
Grouping with Functions
Grouping by Index Levels
10.2 Data Aggregation
Column-Wise and Multiple Function Application
Returning Aggregated Data Without Row Indexes
10.3 Apply: General split-apply-combine
Suppressing the Group Keys
Quantile and Bucket Analysis
Example: Filling Missing Values with Group-Specific Values
Example: Random Sampling and Permutation
Example: Group Weighted Average and Correlation
Example: Group-Wise Linear Regression
10.4 Group Transforms and “Unwrapped” GroupBys
10.5 Pivot Tables and Cross-Tabulation
Cross-Tabulations: Crosstab
10.6 Conclusion
11. Time Series
11.1 Date and Time Data Types and Tools
Converting Between String and Datetime
11.2 Time Series Basics
Indexing, Selection, Subsetting
Time Series with Duplicate Indices
11.3 Date Ranges, Frequencies, and Shifting
Generating Date Ranges
Frequencies and Date Offsets
Shifting (Leading and Lagging) Data
11.4 Time Zone Handling
Time Zone Localization and Conversion
Operations with Time Zone-Aware Timestamp Objects
Operations Between Different Time Zones
11.5 Periods and Period Arithmetic
Period Frequency Conversion
Quarterly Period Frequencies
Converting Timestamps to Periods (and Back)
Creating a PeriodIndex from Arrays
11.6 Resampling and Frequency Conversion
Downsampling
Upsampling and Interpolation
Resampling with Periods
Grouped Time Resampling
11.7 Moving Window Functions
Exponentially Weighted Functions
Binary Moving Window Functions
User-Defined Moving Window Functions
11.8 Conclusion
12. Introduction to Modeling Libraries in Python
12.1 Interfacing Between pandas and Model Code
12.2 Creating Model Descriptions with Patsy
Data Transformations in Patsy Formulas
Categorical Data and Patsy
12.3 Introduction to statsmodels
Estimating Linear Models
Estimating Time Series Processes
12.4 Introduction to scikit-learn
12.5 Conclusion
13. Data Analysis Examples
13.1 Bitly Data from 1.USA.gov
Counting Time Zones in Pure Python
Counting Time Zones with pandas
13.2 MovieLens 1M Dataset
Measuring Rating Disagreement
13.3 US Baby Names 1880–2010
Analyzing Naming Trends
13.4 USDA Food Database
13.5 2012 Federal Election Commission Database
Donation Statistics by Occupation and Employer
Bucketing Donation Amounts
Donation Statistics by State
13.6 Conclusion
A. Advanced NumPy
A.1 ndarray Object Internals
NumPy Data Type Hierarchy
A.2 Advanced Array Manipulation
Reshaping Arrays
C Versus FORTRAN Order
Concatenating and Splitting Arrays
Repeating Elements: tile and repeat
Fancy Indexing Equivalents: take and put
A.3 Broadcasting
Broadcasting over Other Axes
Setting Array Values by Broadcasting
A.4 Advanced ufunc Usage
ufunc Instance Methods
Writing New ufuncs in Python
A.5 Structured and Record Arrays
Nested Data Types and Multidimensional Fields
Why Use Structured Arrays?
A.6 More About Sorting
Indirect Sorts: argsort and lexsort
Alternative Sort Algorithms
Partially Sorting Arrays
numpy.searchsorted: Finding Elements in a Sorted Array
A.7 Writing Fast NumPy Functions with Numba
Creating Custom numpy.ufunc Objects with Numba
A.8 Advanced Array Input and Output
Memory-Mapped Files
HDF5 and Other Array Storage Options
A.9 Performance Tips
The Importance of Contiguous Memory
B. More on the IPython System
B.1 Terminal Keyboard Shortcuts
B.2 About Magic Commands
The %run Command
Executing Code from the Clipboard
B.3 Using the Command History
Searching and Reusing the Command History
Input and Output Variables
B.4 Interacting with the Operating System
Shell Commands and Aliases
Directory Bookmark System
B.5 Software Development Tools
Interactive Debugger
Timing Code: %time and %timeit
Basic Profiling: %prun and %run -p
Profiling a Function Line by Line
B.6 Tips for Productive Code Development Using IPython
Reloading Module Dependencies
Code Design Tips
B.7 Advanced IPython Features
Profiles and Configuration
B.8 Conclusion
Index
Preface
The first edition of this book was published in 2012, during a time when open source
data analysis libraries for Python, especially pandas, were very new and developing
rapidly. When the time came to write the second edition in 2016 and 2017, I needed
to update the book not only for Python 3.6 (the first edition used Python 2.7) but also
for the many changes in pandas that had occurred over the previous five years. Now
in 2022, there are fewer Python language changes (we are now at Python 3.10, with
3.11 coming out at the end of 2022), but pandas has continued to evolve.
In this third edition, my goal is to bring the content up to date with current versions
of Python, NumPy, pandas, and other projects, while also remaining relatively
conservative about discussing newer Python projects that have appeared in the last few
years. Since this book has become an important resource for many university courses
and working professionals, I will try to avoid topics that are at risk of falling out of
date within a year or two. That way paper copies won’t be too difficult to follow in
2023 or 2024 or beyond.
A new feature of the third edition is the open access online version hosted on my
website at https://wesmckinney.com/book, to serve as a resource and convenience for
owners of the print and digital editions. I intend to keep the content reasonably up to
date there, so if you own the paper book and run into something that doesn’t work
properly, you should check there for the latest content changes.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program
elements such as variable or function names, databases, data types, environment
variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values
determined by context.
This element signifies a tip or suggestion.
This element signifies a general note.
This element indicates a warning or caution.
Using Code Examples
You can find data files and related material for each chapter in this book’s GitHub
repository at https://github.com/wesm/pydata-book, which is mirrored to Gitee (for
those who cannot access GitHub) at https://gitee.com/wesmckinn/pydata-book.
This book is here to help you get your job done. In general, if example code is
offered with this book, you may use it in your programs and documentation. You
do not need to contact us for permission unless you’re reproducing a significant
portion of the code. For example, writing a program that uses several chunks of code
from this book does not require permission. Selling or distributing examples from
O’Reilly books does require permission. Answering a question by citing this book
and quoting example code does not require permission. Incorporating a significant
amount of example code from this book into your product’s documentation does
require permission.
We appreciate, but do not require, attribution. An attribution usually includes the
title, author, publisher, and ISBN. For example: “Python for Data Analysis by Wes
McKinney (O’Reilly). Copyright 2022 Wes McKinney, 978-1-098-10403-0.”
If you feel your use of code examples falls outside fair use or the permission given
above, feel free to contact us at permissions@oreilly.com.
O’Reilly Online Learning
For more than 40 years, O’Reilly Media has provided technology
and business training, knowledge, and insight to help
companies succeed.
Our unique network of experts and innovators share their knowledge and expertise
through books, articles, and our online learning platform. O’Reilly’s online learning
platform gives you on-demand access to live training courses, in-depth learning
paths, interactive coding environments, and a vast collection of text and video from
O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional
information. You can access this page at https://oreil.ly/python-data-analysis-3e.
Email bookquestions@oreilly.com to comment or ask technical questions about this
book.
For news and information about our books and courses, visit http://oreilly.com.
Find us on LinkedIn: https://linkedin.com/company/oreilly-media.
Follow us on Twitter: http://twitter.com/oreillymedia.
Watch us on YouTube: http://youtube.com/oreillymedia.
Acknowledgments
This work is the product of many years of fruitful discussions and collaborations
with, and assistance from, many people around the world. I’d like to thank a few of
them.
In Memoriam: John D. Hunter (1968–2012)
Our dear friend and colleague John D. Hunter passed away after a battle with colon
cancer on August 28, 2012. This was only a short time after I’d completed the final
manuscript for this book’s first edition.
John’s impact and legacy in the Python scientific and data communities would be
hard to overstate. In addition to developing matplotlib in the early 2000s (a time
when Python was not nearly so popular), he helped shape the culture of a critical
generation of open source developers who’ve become pillars of the Python ecosystem
that we now often take for granted.
I was lucky enough to connect with John early in my open source career in January
2010, just after releasing pandas 0.1. His inspiration and mentorship helped me push
forward, even in the darkest of times, with my vision for pandas and Python as a
first-class data analysis language.
John was very close with Fernando Pérez and Brian Granger, pioneers of IPython,
Jupyter, and many other initiatives in the Python community. We had hoped to work
on a book together, the four of us, but I ended up being the one with the most free
time. I am sure he would be proud of what we’ve accomplished, as individuals and as
a community, over the last nine years.
Acknowledgments for the Third Edition (2022)
It has been more than a decade since I started writing the first edition of this book and
more than 15 years since I originally started my journey as a Python programmer.
A lot has changed since then! Python has evolved from a relatively niche language
for data analysis to the most popular and most widely used language powering
the plurality (if not the majority!) of data science, machine learning, and artificial
intelligence work.
I have not been an active contributor to the pandas open source project since 2013,
but its worldwide developer community has continued to thrive, serving as a model
of community-centric open source software development. Many “next-generation”
Python projects that deal with tabular data are modeling their user interfaces directly
after pandas, so the project has proved to have an enduring influence on the future
trajectory of the Python data science ecosystem.
I hope that this book continues to serve as a valuable resource for students and
individuals who want to learn about working with data in Python.
I’m especially thankful to O’Reilly for allowing me to publish an “open access” version
of this book on my website at https://wesmckinney.com/book, where I hope it will
reach even more people and help expand opportunity in the world of data analysis.
J.J. Allaire was a lifesaver in making this possible by helping me “port” the book from
Docbook XML to Quarto, a wonderful new scientific and technical publishing system
for print and web.
Special thanks to my technical reviewers Paul Barry, Jean-Christophe Leyder, Abdullah
Karasan, and William Jamir, whose thorough feedback has greatly improved the
readability, clarity, and understandability of the content.
Acknowledgments for the Second Edition (2017)
It has been five years almost to the day since I completed the manuscript for
this book’s first edition in July 2012. A lot has changed. The Python community
has grown immensely, and the ecosystem of open source software around it has
flourished.
This new edition of the book would not exist if not for the tireless efforts of the
pandas core developers, who have grown the project and its user community into
one of the cornerstones of the Python data science ecosystem. These include, but are
not limited to, Tom Augspurger, Joris van den Bossche, Chris Bartak, Phillip Cloud,
gfyoung, Andy Hayden, Masaaki Horikoshi, Stephan Hoyer, Adam Klein, Wouter
Overmeire, Jeff Reback, Chang She, Skipper Seabold, Jeff Tratner, and y-p.
On the actual writing of this second edition, I would like to thank the O’Reilly staff
who helped me patiently with the writing process. This includes Marie Beaugureau,
Ben Lorica, and Colleen Toporek. I again had outstanding technical reviewers with
Tom Augspurger, Paul Barry, Hugh Brown, Jonathan Coe, and Andreas Müller
contributing. Thank you.
This book’s first edition has been translated into many foreign languages, including
Chinese, French, German, Japanese, Korean, and Russian. Translating all this content
and making it available to a broader audience is a huge and often thankless effort.
Thank you for helping more people in the world learn how to program and use data
analysis tools.
I am also lucky to have had support for my continued open source development
efforts from Cloudera and Two Sigma Investments over the last few years. With open
source software projects more thinly resourced than ever relative to the size of user
bases, it is becoming increasingly important for businesses to provide support for
development of key open source projects. It’s the right thing to do.
Acknowledgments for the First Edition (2012)
It would have been difficult for me to write this book without the support of a large
number of people.
On the O’Reilly staff, I’m very grateful for my editors, Meghan Blanchette and Julie
Steele, who guided me through the process. Mike Loukides also worked with me in
the proposal stages and helped make the book a reality.
I received a wealth of technical review from a large cast of characters. In particular,
Martin Blais and Hugh Brown were incredibly helpful in improving the book’s
examples, clarity, and organization from cover to cover. James Long, Drew Conway,
Fernando Pérez, Brian Granger, Thomas Kluyver, Adam Klein, Josh Klein, Chang
She, and Stéfan van der Walt each reviewed one or more chapters, providing pointed
feedback from many different perspectives.
I got many great ideas for examples and datasets from friends and colleagues in the
data community, among them: Mike Dewar, Jeff Hammerbacher, James Johndrow,
Kristian Lum, Adam Klein, Hilary Mason, Chang She, and Ashley Williams.
I am of course indebted to the many leaders in the open source scientific Python
community who’ve built the foundation for my development work and gave
encouragement while I was writing this book: the IPython core team (Fernando Pérez,
Brian Granger, Min Ragan-Kelley, Thomas Kluyver, and others), John Hunter, Skipper
Seabold, Travis Oliphant, Peter Wang, Eric Jones, Robert Kern, Josef Perktold, Francesc
Alted, Chris Fonnesbeck, and too many others to mention. Several other people
provided a great deal of support, ideas, and encouragement along the way: Drew
Conway, Sean Taylor, Giuseppe Paleologo, Jared Lander, David Epstein, John Krowas,
Joshua Bloom, Den Pilsworth, John Myles-White, and many others I’ve forgotten.
I’d also like to thank a number of people from my formative years. First, my former
AQR colleagues who’ve cheered me on in my pandas work over the years: Alex Reyfman,
Michael Wong, Tim Sargen, Oktay Kurbanov, Matthew Tschantz, Roni Israelov,
Michael Katz, Ari Levine, Chris Uga, Prasad Ramanan, Ted Square, and Hoon Kim.
Lastly, my academic advisors Haynes Miller (MIT) and Mike West (Duke).
I received significant help from Phillip Cloud and Joris van den Bossche in 2014 to
update the book’s code examples and fix some other inaccuracies due to changes in
pandas.
On the personal side, Casey provided invaluable day-to-day support during the
writing process, tolerating my highs and lows as I hacked together the final draft on
top of an already overcommitted schedule. Lastly, my parents, Bill and Kim, taught
me to always follow my dreams and to never settle for less.
CHAPTER 1
Preliminaries
1.1 What Is This Book About?
This book is concerned with the nuts and bolts of manipulating, processing, cleaning,
and crunching data in Python. My goal is to offer a guide to the parts of the Python
programming language and its data-oriented library ecosystem and tools that will
equip you to become an effective data analyst. While “data analysis” is in the title
of the book, the focus is specifically on Python programming, libraries, and tools as
opposed to data analysis methodology. This is the Python programming you need for
data analysis.
Sometime after I originally published this book in 2012, people started using the
term data science as an umbrella description for everything from simple descriptive
statistics to more advanced statistical analysis and machine learning. The Python
open source ecosystem for doing data analysis (or data science) has also expanded
significantly since then. There are now many other books which focus specifically on
these more advanced methodologies. My hope is that this book serves as adequate
preparation to enable you to move on to a more domain-specific resource.
Some might characterize much of the content of the book as “data
manipulation” as opposed to “data analysis.” We also use the terms
wrangling or munging to refer to data manipulation.
What Kinds of Data?
When I say “data,” what am I referring to exactly? The primary focus is on structured
data, a deliberately vague term that encompasses many different common forms of
data, such as:
• Tabular or spreadsheet-like data in which each column may be a different type
(string, numeric, date, or otherwise). This includes most kinds of data commonly
stored in relational databases or tab- or comma-delimited text files.
• Multidimensional arrays (matrices).
• Multiple tables of data interrelated by key columns (what would be primary or
foreign keys for a SQL user).
• Evenly or unevenly spaced time series.
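To make the list above concrete, here is a minimal sketch of each kind of structured data using pandas and NumPy, both of which are covered in later chapters. The column names, values, and dates are made up purely for illustration. `pd.merge` is only one of several ways pandas can relate tables through a key column.

```python
import numpy as np
import pandas as pd

# Tabular data: each column may hold a different type
frame = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],                  # strings
    "score": [91.5, 78.0, 85.25],                   # floating-point numbers
    "start_date": pd.to_datetime(
        ["2021-01-04", "2021-03-15", "2022-07-01"]  # dates
    ),
})

# Multidimensional array (a 2 x 3 matrix) holding a single numeric type
matrix = np.arange(6).reshape(2, 3)

# A second table related to the first through the "name" key column
orders = pd.DataFrame({"name": ["Ann", "Ann", "Cid"], "amount": [10, 25, 40]})
merged = pd.merge(frame, orders, on="name")  # default inner join: Bob has no orders

# Unevenly spaced time series: values indexed by irregular timestamps
ts = pd.Series(
    [1.0, 2.5, 0.75],
    index=pd.to_datetime(["2022-01-01", "2022-01-03", "2022-01-10"]),
)

print(frame.dtypes)
print(matrix.shape)  # (2, 3)
print(merged)
print(ts)
```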
This is by no means a complete list. Even though it may not always be obvious, a
large percentage of datasets can be transformed into a structured form that is more
suitable for analysis and modeling. If not, it may be possible to extract features from
a dataset into a structured form. As an example, a collection of news articles could
be processed into a word frequency table, which could then be used to perform
sentiment analysis.
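Here is a rough sketch of the news-article example. The code turns two hypothetical headlines into a word frequency table with one row per article and one column per word. The `tokenize` helper and its regular expression are assumptions chosen for simplicity, and real text would need more careful tokenization. The sentiment analysis step itself is not shown.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical "articles" used only for illustration
articles = {
    "a1": "Markets rally as markets cheer strong jobs data",
    "a2": "Storm delays flights; airlines warn of more delays",
}

def tokenize(text):
    """Lowercase the text and return its runs of letters as word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# One row per article, one column per distinct word, cells hold counts;
# words missing from an article show up as NaN, which fillna turns into 0
counts = (
    pd.DataFrame.from_dict(
        {doc_id: dict(Counter(tokenize(text))) for doc_id, text in articles.items()},
        orient="index",
    )
    .fillna(0)
    .astype(int)
)

print(counts.loc["a1", "markets"])  # 2
print(counts.loc["a2", "markets"])  # 0
```

Once the text is in this shape, it is an ordinary table, and the rest of the structured-data toolkit applies to it.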
Most users of spreadsheet programs like Microsoft Excel, perhaps the most widely
used data analysis tool in the world, will not be strangers to these kinds of data.
1.2 Why Python for Data Analysis?
For many people, the Python programming language has strong appeal. Since its
first appearance in 1991, Python has become one of the most popular interpreted
programming languages, along with Perl, Ruby, and others. Python and Ruby have
become especially popular since 2005 or so for building websites using their
numerous web frameworks, like Rails (Ruby) and Django (Python). Such languages are
often called scripting languages, as they can be used to quickly write small programs,
or scripts to automate other tasks. I don’t like the term “scripting languages,” as it
carries a connotation that they cannot be used for building serious software. Among
interpreted languages, for various historical and cultural reasons, Python has
developed a large and active scientific computing and data analysis community. In the last
20 years, Python has gone from a bleeding-edge or “at your own risk” scientific
computing language to one of the most important languages for data science, machine
learning, and general software development in academia and industry.
For data analysis, interactive computing, and data visualization, Python will
inevitably draw comparisons with other open source and commercial programming
languages and tools in wide use, such as R, MATLAB, SAS, Stata, and others. In recent
years, Python’s improved open source libraries (such as pandas and scikit-learn) have
made it a popular choice for data analysis tasks. Combined with Python’s overall
strength for general-purpose software engineering, it is an excellent option as a
primary language for building data applications.
2 | Chapter 1: Preliminaries
Python as Glue
Part of Python’s success in scientific computing is the ease of integrating C, C++,
and FORTRAN code. Most modern computing environments share a similar set of
legacy FORTRAN and C libraries for doing linear algebra, optimization, integration,
fast Fourier transforms, and other such algorithms. The same story has held true for
many companies and national labs that have used Python to glue together decades’
worth of legacy software.
Many programs consist of small portions of code where most of the time is spent,
with large amounts of “glue code” that doesn’t run often. In many cases, the execution
time of the glue code is insignificant; effort is most fruitfully invested in optimizing
the computational bottlenecks, sometimes by moving the code to a lower-level
language like C.
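A practical way to find where the time actually goes, before deciding what to optimize, is the standard library's `cProfile` profiler. The pipeline below is a made-up example: `load_records` stands in for cheap glue code and `score_records` for a hypothetical hot spot; the report should show nearly all cumulative time under `score_records`.

```python
import cProfile
import io
import pstats

def load_records(n):
    # "Glue" step: cheap bookkeeping that runs once
    return list(range(n))

def score_records(records):
    # Hypothetical hot spot: a nested arithmetic loop that dominates runtime
    total = 0
    for value in records:
        for k in range(100):
            total += (value * k) % 7
    return total

def pipeline(n=10_000):
    return score_records(load_records(n))

profiler = cProfile.Profile()
profiler.enable()
result = pipeline()
profiler.disable()

# Summarize the five most expensive entries, sorted by cumulative time
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```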
Solving the “Two-Language” Problem
In many organizations, it is common to research, prototype, and test new ideas using
a more specialized computing language like SAS or R and then later port those
ideas to be part of a larger production system written in, say, Java, C#, or C++.
What people are increasingly finding is that Python is a suitable language not only
for doing research and prototyping but also for building the production systems.
Why maintain two development environments when one will suffice? I believe that
more and more companies will go down this path, as there are often significant
organizational benefits to having both researchers and software engineers using the
same set of programming tools.
Over the last decade, some new approaches to solving the “two-language” problem
have appeared, such as the Julia programming language. Getting the most out of
Python in many cases will require programming in a low-level language like C or
C++ and creating Python bindings to that code. That said, “just-in-time” (JIT)
compiler technology provided by libraries like Numba has made it possible to achieve
excellent performance in many computational algorithms without having to leave the
Python programming environment.
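Here is a minimal sketch of the JIT approach, assuming Numba is installed (it is a separate package, e.g. `conda install numba`). Decorating a plain Python loop with `numba.njit` asks Numba to compile it to machine code on the first call. If Numba is missing, the sketch falls back to a do-nothing decorator so the same code still runs, just at ordinary Python speed.

```python
import numpy as np

try:
    from numba import njit  # optional: compiles the function on its first call
except ImportError:
    def njit(func):
        # Fallback when Numba is not installed: run the plain Python function
        return func

@njit
def sum_of_squares(values):
    # An explicit element-by-element loop: slow in CPython, fast once compiled
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total

arr = np.arange(1_000_000, dtype=np.float64)
result = sum_of_squares(arr)
print(result)
```

The design point is that the source stays ordinary Python; only the decorator changes how it executes.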
Why Not Python?
While Python is an excellent environment for building many kinds of analytical
applications and general-purpose systems, there are a number of uses for which
Python may be less suitable.
As Python is an interpreted programming language, in general most Python code
will run substantially slower than code written in a compiled language like Java or
C++. As programmer time is often more valuable than CPU time, many are happy to
make this trade-off. However, in an application with very low latency or demanding
resource utilization requirements (e.g., a high-frequency trading system), the time
spent programming in a lower-level (but also lower-productivity) language like C++
to achieve the maximum possible performance might be time well spent.
Python can be a challenging language for building highly concurrent, multithreaded
applications, particularly applications with many CPU-bound threads. The reason for
this is that it has what is known as the global interpreter lock (GIL), a mechanism that
prevents the interpreter from executing more than one Python instruction at a time.
The technical reasons for why the GIL exists are beyond the scope of this book. While
it is true that in many big data processing applications, a cluster of computers may be
required to process a dataset in a reasonable amount of time, there are still situations
where a single-process, multithreaded system is desirable.
This is not to say that Python cannot execute truly multithreaded, parallel code.
Python C extensions that use native multithreading (in C or C++) can run code in
parallel without being impacted by the GIL, as long as they do not need to regularly
interact with Python objects.
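One way to observe the GIL's effect on CPU-bound threads is to time the same pure-Python work run serially and then spread over a thread pool. This is a sketch, not a benchmark: on a standard CPython build you would expect the threaded run to take roughly as long as the serial one rather than about a quarter as long, but exact timings depend on the machine, and the experimental free-threaded builds available in newer Python versions may behave differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_down(n):
    # Pure-Python, CPU-bound loop: under the GIL, only one thread at a time
    # can be executing this bytecode
    while n > 0:
        n -= 1
    return n

N = 1_000_000
WORKERS = 4

start = time.perf_counter()
serial = [count_down(N) for _ in range(WORKERS)]
serial_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    threaded = list(pool.map(count_down, [N] * WORKERS))
threaded_time = time.perf_counter() - start

print(f"serial:   {serial_time:.2f} s")
print(f"threaded: {threaded_time:.2f} s")
```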
1.3 Essential Python Libraries
For those who are less familiar with the Python data ecosystem and the libraries used
throughout the book, I will give a brief overview of some of them.
NumPy
NumPy, short for Numerical Python, has long been a cornerstone of numerical
computing in Python. It provides the data structures, algorithms, and library glue
needed for most scientific applications involving numerical data in Python. NumPy
contains, among other things:
• A fast and efficient multidimensional array object ndarray
• Functions for performing element-wise computations with arrays or mathematical
operations between arrays
• Tools for reading and writing array-based datasets to disk
• Linear algebra operations, Fourier transform, and random number generation
• A mature C API to enable Python extensions and native C or C++ code to access
NumPy’s data structures and computational facilities
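A short sketch of a few items from this list in action: creating an ndarray, element-wise arithmetic, a matrix product, and random number generation. The array values are arbitrary.

```python
import numpy as np

# A 2 x 3 ndarray of floating-point numbers
data = np.array([[1.5, -0.1, 3.0],
                 [0.0, -3.0, 6.5]])

print(data.shape, data.dtype)  # (2, 3) float64
print(data * 10)               # element-wise: every entry multiplied by 10
print(data + data)             # element-wise addition between two arrays
print(data @ data.T)           # matrix product, a linear algebra operation

# Random number generation through NumPy's Generator API
rng = np.random.default_rng(seed=12345)
print(rng.standard_normal(3))
```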
Beyond the fast array-processing capabilities that NumPy adds to Python, one of
its primary uses in data analysis is as a container for data to be passed between
algorithms and libraries. For numerical data, NumPy arrays are more efficient for
storing and manipulating data than the other built-in Python data structures. Also,
libraries written in a lower-level language, such as C or FORTRAN, can operate on
the data stored in a NumPy array without copying data into some other memory
representation. Thus, many numerical computing tools for Python either assume
NumPy arrays as a primary data structure or else target interoperability with NumPy.
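The no-copy sharing can be seen from Python through the buffer protocol, one of the interfaces NumPy arrays expose for this purpose. In this sketch, `memoryview` stands in for a consumer that reaches the array's raw memory, and a second array built on that buffer turns out to share storage with the first.

```python
import numpy as np

arr = np.arange(5, dtype=np.int64)

# memoryview exposes the array's memory through Python's buffer protocol,
# without copying the underlying bytes
buf = memoryview(arr)
print(buf.nbytes)  # 40: five 8-byte integers

# A second array built on top of that buffer shares memory with the first
alias = np.frombuffer(buf, dtype=np.int64)
print(np.shares_memory(arr, alias))  # True

arr[0] = 100
print(alias[0])  # 100: the change is visible through the alias
```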
pandas
pandas provides high-level data structures and functions designed to make working
with structured or tabular data intuitive and flexible. Since its emergence in 2010, it
has helped enable Python to be a powerful and productive data analysis environment.
The primary objects in pandas that will be used in this book are the DataFrame, a
tabular, column-oriented data structure with both row and column labels, and the
Series, a one-dimensional labeled array object.
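A minimal sketch of the two objects, using made-up fruit counts and prices: a Series pairs values with index labels, and a DataFrame is a table whose columns are each Series sharing one row index.

```python
import pandas as pd

# A Series: one-dimensional values with an index of labels
counts = pd.Series([3, 7, 2], index=["apples", "oranges", "pears"], name="count")

# A DataFrame: a table with row labels (the index) and column labels
frame = pd.DataFrame(
    {"count": [3, 7, 2], "price": [0.50, 0.25, 0.75]},
    index=["apples", "oranges", "pears"],
)

print(counts["oranges"])     # select a value by its label
print(type(frame["price"]))  # each column of a DataFrame is itself a Series
print(frame.loc["pears"])    # select a row by its label
```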
pandas blends the array-computing ideas of NumPy with the kinds of data
manipulation capabilities found in spreadsheets and relational databases (such as SQL). It
provides convenient indexing functionality to enable you to reshape, slice and dice,
perform aggregations, and select subsets of data. Since data manipulation,
preparation, and cleaning are such important skills in data analysis, pandas is one of the
primary focuses of this book.
As a bit of background, I started building pandas in early 2008 during my tenure at
AQR Capital Management, a quantitative investment management firm. At the time,
I had a distinct set of requirements that were not well addressed by any single tool at
my disposal:
• Data structures with labeled axes supporting automatic or explicit data
alignment, which prevents common errors resulting from misaligned data and from working
with differently indexed data coming from different sources
• Integrated time series functionality
• The same data structures handle both time series data and non-time series data
• Arithmetic operations and reductions that preserve metadata
• Flexible handling of missing data
• Merge and other relational operations found in popular databases (SQL-based,
for example)
I wanted to be able to do all of these things in one place, preferably in a language
well suited to general-purpose software development. Python was a good candidate
language for this, but at that time an integrated set of data structures and tools
providing this functionality did not exist. As a result of having been built initially
to solve finance and business analytics problems, pandas features especially deep
time series functionality and tools well suited for working with time-indexed data
generated by business processes.
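Several of these requirements show up in just a few lines of pandas. In the sketch below (arbitrary values), arithmetic aligns two Series on their labels, labels present in only one input become missing (NaN), reductions skip the missing values, and a daily time series is aggregated into two-day buckets.

```python
import pandas as pd

# Two Series whose labels only partly overlap
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Arithmetic aligns on labels; labels found in only one input become NaN
total = s1 + s2
print(total)

# Reductions skip missing values by default
print(total.sum())  # 35.0

# Time series: six days of data aggregated into two-day buckets
days = pd.date_range("2022-01-01", periods=6, freq="D")
ts = pd.Series([0, 1, 2, 3, 4, 5], index=days)
print(ts.resample("2D").sum())
```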
I spent a large part of 2011 and 2012 expanding pandas’s capabilities with some of
my former AQR colleagues, Adam Klein and Chang She. In 2013, I stopped being
as involved in day-to-day project development, and pandas has since become a fully
community-owned and community-maintained project with well over two thousand
unique contributors around the world.
For users of the R language for statistical computing, the DataFrame name will be
familiar, as the object was named after the similar R data.frame object. Unlike
Python, data frames are built into the R programming language and its standard
library. As a result, many features found in pandas are typically either part of the R
core implementation or provided by add-on packages.
The pandas name itself is derived from panel data, an econometrics term for
multidimensional structured datasets, and a play on the phrase Python data analysis.
matplotlib
matplotlib is the most popular Python library for producing plots and other two-
dimensional data visualizations. It was originally created by John D. Hunter and
is now maintained by a large team of developers. It is designed for creating plots
suitable for publication. While there are other visualization libraries available to
Python programmers, matplotlib is still widely used and integrates reasonably well
with the rest of the ecosystem. I think it is a safe choice as a default visualization tool.
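A minimal sketch of a publication-style figure. It selects matplotlib's non-interactive `Agg` backend so it can run without a display, and writes the figure to a temporary PNG file; the file name and `dpi` value are arbitrary choices for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("A minimal matplotlib figure")
ax.legend()

# Save at a print-friendly resolution to a temporary location
out_path = os.path.join(tempfile.mkdtemp(), "sine_cosine.png")
fig.savefig(out_path, dpi=300)
plt.close(fig)
print(out_path)
```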
IPython and Jupyter
The IPython project began in 2001 as Fernando Pérez’s side project to make a
better interactive Python interpreter. Over the subsequent 20 years it has become
one of the most important tools in the modern Python data stack. While it does
not provide any computational or data analytical tools by itself, IPython is designed
for both interactive computing and software development work. It encourages an
execute-explore workflow instead of the typical edit-compile-run workflow of many
other programming languages. It also provides integrated access to your operating
system’s shell and filesystem; this reduces the need to switch between a terminal
window and a Python session in many cases. Since much of data analysis coding
involves exploration, trial and error, and iteration, IPython can help you get the job
done faster.
In 2014, Fernando and the IPython team announced the Jupyter project, a broader
initiative to design language-agnostic interactive computing tools. The IPython web
notebook became the Jupyter notebook, with support now for over 40 programming
languages. The IPython system can now be used as a kernel (a programming language
mode) for using Python with Jupyter.
IPython itself has become a component of the much broader Jupyter open source
project, which provides a productive environment for interactive and exploratory
computing. Its oldest and simplest “mode” is as an enhanced Python shell designed
to accelerate the writing, testing, and debugging of Python code. You can also use the
IPython system through the Jupyter notebook.
The Jupyter notebook system also allows you to author content in Markdown and
HTML, providing you a means to create rich documents with code and text.
I personally use IPython and Jupyter regularly in my Python work, whether running,
debugging, or testing code.
In the accompanying book materials on GitHub, you will find Jupyter notebooks
containing all the code examples from each chapter. If you cannot access GitHub
where you are, you can try the mirror on Gitee.
SciPy
SciPy is a collection of packages addressing a number of foundational problems in
scientific computing. Here are some of the tools it contains in its various modules:
scipy.integrate
Numerical integration routines and differential equation solvers
scipy.linalg
Linear algebra routines and matrix decompositions extending beyond those
provided in numpy.linalg
scipy.optimize
Function optimizers (minimizers) and root finding algorithms
scipy.signal
Signal processing tools
scipy.sparse
Sparse matrices and sparse linear system solvers
scipy.special
Wrapper around SPECFUN, a FORTRAN library implementing many common
mathematical functions, such as the gamma function
scipy.stats
Standard continuous and discrete probability distributions (density functions,
samplers, continuous distribution functions), various statistical tests, and more
descriptive statistics
Together, NumPy and SciPy form a reasonably complete and mature computational
foundation for many traditional scientific computing applications.
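A small sampler of three of the SciPy modules listed above, assuming SciPy is installed: numerical integration, scalar minimization, and a probability distribution. The functions being integrated and minimized are arbitrary examples with known answers.

```python
from scipy import integrate, optimize, stats

# scipy.integrate: integrate x**2 over [0, 1]; the exact answer is 1/3
area, abs_err = integrate.quad(lambda x: x ** 2, 0, 1)

# scipy.optimize: minimize a simple one-dimensional function (minimum at x = 2)
res = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)

# scipy.stats: cumulative distribution function of the standard normal
p = stats.norm.cdf(1.96)

print(area, res.x, p)
```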
scikit-learn
Since the project’s inception in 2007, scikit-learn has become the premier general-
purpose machine learning toolkit for Python programmers. As of this writing, more
than two thousand different individuals have contributed code to the project. It
includes submodules for such models as:
• Classification: SVM, nearest neighbors, random forest, logistic regression, etc.
• Regression: Lasso, ridge regression, etc.
• Clustering: k-means, spectral clustering, etc.
• Dimensionality reduction: PCA, feature selection, matrix factorization, etc.
• Model selection: Grid search, cross-validation, metrics
• Preprocessing: Feature extraction, normalization
Along with pandas, statsmodels, and IPython, scikit-learn has been critical for
enabling Python to be a productive data science programming language. While I won’t
be able to include a comprehensive guide to scikit-learn in this book, I will give a
brief introduction to some of its models and how to use them with the other tools
presented in the book.
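To give a flavor of scikit-learn's fit/predict style, here is a minimal sketch, assuming scikit-learn is installed, that chains a preprocessing step with a logistic regression classifier. The one-feature dataset is made up so that small values belong to class 0 and large values to class 1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy one-feature dataset (made up): small values are class 0, large values class 1
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Preprocessing (standardize the feature) chained with a classifier
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

predictions = model.predict(np.array([[0.5], [4.5]]))
print(predictions)  # [0 1]
```

Every estimator, including the pipeline itself, exposes the same `fit` and `predict` methods, which is what makes steps easy to chain.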
statsmodels
statsmodels is a statistical analysis package that was seeded by work from Stanford
University statistics professor Jonathan Taylor, who implemented a number of
regression analysis models popular in the R programming language. Skipper Seabold and
Josef Perktold formally created the new statsmodels project in 2010 and since then
have grown the project to a critical mass of engaged users and contributors. Nathaniel
Smith developed the Patsy project, which provides a formula or model specification
framework for statsmodels inspired by R’s formula system.
Compared with scikit-learn, statsmodels contains algorithms for classical (primarily
frequentist) statistics and econometrics. This includes such submodules as:
• Regression models: linear regression, generalized linear models, robust linear
models, linear mixed effects models, etc.
• Analysis of variance (ANOVA)
• Time series analysis: AR, ARMA, ARIMA, VAR, and other models
• Nonparametric methods: Kernel density estimation, kernel regression
• Visualization of statistical model results
statsmodels is more focused on statistical inference, providing uncertainty estimates
and p-values for parameters. scikit-learn, by contrast, is more prediction focused.
As with scikit-learn, I will give a brief introduction to statsmodels and how to use it
with NumPy and pandas.
Other Packages
In 2022, there are many other Python libraries which might be discussed in a book
about data science. This includes some newer projects like TensorFlow or PyTorch,
which have become popular for machine learning or artificial intelligence work. Now
that there are other books out there that focus more specifically on those projects, I
would recommend using this book to build a foundation in general-purpose Python
data wrangling. Then, you should be well prepared to move on to a more advanced
resource that may assume a certain level of expertise.
1.4 Installation and Setup
Since everyone uses Python for different applications, there is no single solution for
setting up Python and obtaining the necessary add-on packages. Many readers will
not have a complete Python development environment suitable for following along
with this book, so here I will give detailed instructions to get set up on each operating
system. I will be using Miniconda, a minimal installation of the conda package
manager, along with conda-forge, a community-maintained software distribution
based on conda. This book uses Python 3.10 throughout, but if you’re reading in the
future, you are welcome to install a newer version of Python.
If for some reason these instructions become out-of-date by the time you are reading
this, you can check out my website for the book, which I will endeavor to keep up to
date with the latest installation instructions.
Miniconda on Windows
To get started on Windows, download the Miniconda installer for the latest Python
version available (currently 3.9) from https://conda.io. I recommend following the
installation instructions for Windows available on the conda website, which may have
changed between the time this book was published and when you are reading this.
Most people will want the 64-bit version, but if that doesn’t run on your Windows
machine, you can install the 32-bit version instead.
When prompted whether to install for just yourself or for all users on your system,
choose the option that’s most appropriate for you. Installing just for yourself will be
sufficient to follow along with the book. It will also ask you whether you want to
add Miniconda to the system PATH environment variable. If you select this (I usually
do), then this Miniconda installation may override other versions of Python you have
installed. If you do not, then you will need to use the Windows Start menu shortcut
that’s installed to be able to use this Miniconda. This Start menu entry may be called
“Anaconda3 (64-bit).”
I’ll assume that you haven’t added Miniconda to your system PATH. To verify that
things are configured correctly, open the “Anaconda Prompt (Miniconda3)” entry
under “Anaconda3 (64-bit)” in the Start menu. Then try launching the Python
interpreter by typing python. You should see a message like this:
(base) C:\Users\Wes>python
Python 3.9 [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
To exit the Python shell, type exit() and press Enter.
GNU/Linux
Linux details will vary a bit depending on your Linux distribution type, but here I
give details for such distributions as Debian, Ubuntu, CentOS, and Fedora. Setup is
similar to macOS with the exception of how Miniconda is installed. Most readers will
want to download the default 64-bit installer file, which is for x86 architecture (but
it’s possible in the future more users will have aarch64-based Linux machines). The
installer is a shell script that must be executed in the terminal. You will then have
a file named something similar to Miniconda3-latest-Linux-x86_64.sh. To install it,
execute this script with bash:
$ bash Miniconda3-latest-Linux-x86_64.sh
Some Linux distributions include all the required Python packages
(although outdated versions, in some cases) in their package
managers, and these can be installed using a tool like apt. The setup described
here uses Miniconda, as it’s both easily reproducible across
distributions and simpler to upgrade packages to their latest versions.
You will have a choice of where to put the Miniconda files. I recommend installing
the files in the default location in your home directory; for example,
/home/$USER/miniconda (with your username, naturally).
The installer will ask if you wish to modify your shell scripts to automatically activate
Miniconda. I recommend doing this (select “yes”) as a matter of convenience.
After completing the installation, start a new terminal process and verify that you are
picking up the new Miniconda installation:
(base) $ python
Python 3.9 | (main) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
To exit the Python shell, type exit() and press Enter or press Ctrl-D.
Miniconda on macOS
Download the macOS Miniconda installer, which should be named something
like Miniconda3-latest-MacOSX-arm64.sh for Apple Silicon-based macOS computers
released from 2020 onward, or Miniconda3-latest-MacOSX-x86_64.sh for Intel-based
Macs released before 2020. Open the Terminal application in macOS, and install by
executing the installer (most likely in your Downloads directory) with bash:
$ bash $HOME/Downloads/Miniconda3-latest-MacOSX-arm64.sh
When the installer runs, by default it automatically configures Miniconda in your
default shell environment by editing your default shell profile, which is probably located
at /Users/$USER/.zshrc. I recommend letting it do this; if you do not want to allow
the installer to modify your default shell environment, you will need to consult the
Miniconda documentation to be able to proceed.
To verify everything is working, try launching Python in the system shell (open the
Terminal application to get a command prompt):
$ python
Python 3.9 (main) [Clang 12.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
To exit the shell, press Ctrl-D or type exit() and press Enter.
Installing Necessary Packages
Now that we have set up Miniconda on your system, it’s time to install the main
packages we will be using in this book. The first step is to configure conda-forge as
your default package channel by running the following commands in a shell:
(base) $ conda config --add channels conda-forge
(base) $ conda config --set channel_priority strict
Now, we will create a new conda “environment” with the conda create command
using Python 3.10:
(base) $ conda create -y -n pydata-book python=3.10
After the installation completes, activate the environment with conda activate:
(base) $ conda activate pydata-book
(pydata-book) $
It is necessary to use conda activate to activate your environment
each time you open a new terminal. You can see information about
the active conda environment at any time from the terminal by
running conda info.
Now, we will install the essential packages used throughout the book (along with their
dependencies) with conda install:
(pydata-book) $ conda install -y pandas jupyter matplotlib
We will be using some other packages, too, but these can be installed later once they
are needed. There are two ways to install packages: with conda install and with
pip install. conda install should always be preferred when using Miniconda, but
some packages are not available through conda, so if conda install $package_name
fails, try pip install $package_name.
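Once the environment is active, a quick way to confirm what is installed is to query package metadata from Python itself. The helper `report_versions` below is hypothetical, written for this sketch on top of the standard library's `importlib.metadata`. It looks packages up by distribution name (as used with conda install), which can differ from the name you import (for example, scikit-learn is imported as sklearn).

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear in the conda install command above
ESSENTIALS = ["pandas", "jupyter", "matplotlib"]

def report_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in report_versions(ESSENTIALS).items():
    print(f"{name:12} {ver or 'NOT INSTALLED'}")
```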
If you want to install all of the packages used in the rest of the
book, you can do that now by running:
conda install lxml beautifulsoup4 html5lib openpyxl \
requests sqlalchemy seaborn scipy statsmodels \
patsy scikit-learn pyarrow pytables numba
On Windows, substitute a caret ^ for the line continuation \ used
on Linux and macOS.
You can update packages by using the conda update command:
conda update package_name
pip also supports upgrades using the --upgrade flag:
pip install --upgrade package_name
You will have several opportunities to try out these commands throughout the book.
While you can use both conda and pip to install packages, you
should avoid updating packages originally installed with conda
using pip (and vice versa), as doing so can lead to environment
problems. I recommend sticking to conda if you can and falling
back on pip only for packages that are unavailable with conda
install.
Integrated Development Environments and Text Editors
When asked about my standard development environment, I almost always say
“IPython plus a text editor.” I typically write a program and iteratively test and debug each
piece of it in IPython or Jupyter notebooks. It is also useful to be able to play around
with data interactively and visually verify that a particular set of data manipulations is
doing the right thing. Libraries like pandas and NumPy are designed to be productive
to use in the shell.
When building software, however, some users may prefer to use a more richly
featured integrated development environment (IDE) rather than an editor like
Emacs or Vim, which provides a more minimal environment out of the box. Here are
some that you can explore:
• PyDev (free), an IDE built on the Eclipse platform
• PyCharm from JetBrains (subscription-based for commercial users, free for open
source developers)
• Python Tools for Visual Studio (for Windows users)
• Spyder (free), an IDE currently shipped with Anaconda
• Komodo IDE (commercial)
Due to the popularity of Python, most text editors, like VS Code and Sublime Text 2,
have excellent Python support.
1.5 Community and Conferences
Outside of an internet search, the various scientific and data-related Python mailing
lists are generally helpful and responsive to questions. Some to take a look at include:
• pydata: A Google Group list for questions related to Python for data analysis and
pandas
• pystatsmodels: For statsmodels or pandas-related questions
• Mailing list for scikit-learn (scikit-learn@python.org) and machine learning in
Python, generally
• numpy-discussion: For NumPy-related questions
• scipy-user: For general SciPy or scientific Python questions
I deliberately did not post URLs for these in case they change. They can be easily
located via an internet search.
Each year many conferences are held all over the world for Python programmers.
If you would like to connect with other Python programmers who share your
interests, I encourage you to explore attending one, if possible. Many conferences have
financial support available for those who cannot afford admission or travel to the
conference. Here are some to consider:
• PyCon and EuroPython: The two main general Python conferences in North
America and Europe, respectively
• SciPy and EuroSciPy: Scientific-computing-oriented conferences in North
America and Europe, respectively
• PyData: A worldwide series of regional conferences targeted at data science and
data analysis use cases
• International and regional PyCon conferences (see https://pycon.org for a
complete listing)
1.6 Navigating This Book
If you have never programmed in Python before, you will want to spend some time
in Chapters 2 and 3, where I have placed a condensed tutorial on Python language
features and the IPython shell and Jupyter notebooks. These things are prerequisite
knowledge for the remainder of the book. If you have Python experience already, you
may instead choose to skim or skip these chapters.
Next, I give a short introduction to the key features of NumPy, leaving more
advanced NumPy use for Appendix A. Then, I introduce pandas and devote the
rest of the book to data analysis topics applying pandas, NumPy, and matplotlib
(for visualization). I have structured the material in an incremental fashion, though
there is occasionally some minor crossover between chapters, with a few cases where
concepts are used that haven’t been introduced yet.
While readers may have many different end goals for their work, the tasks required
generally fall into a number of different broad groups:
Interacting with the outside world
Reading and writing with a variety of file formats and data stores
Preparation
Cleaning, munging, combining, normalizing, reshaping, slicing and dicing, and
transforming data for analysis
Transformation
Applying mathematical and statistical operations to groups of datasets to derive
new datasets (e.g., aggregating a large table by group variables)
Modeling and computation
Connecting your data to statistical models, machine learning algorithms, or other
computational tools
Presentation
Creating interactive or static graphical visualizations or textual summaries
Code Examples
Most of the code examples in the book are shown with input and output as it would
appear executed in the IPython shell or in Jupyter notebooks:
In [5]: CODE EXAMPLE
Out[5]: OUTPUT
When you see a code example like this, the intent is for you to type the example code
in the In block in your coding environment and execute it by pressing the Enter key
(or Shift-Enter in Jupyter). You should see output similar to what is shown in the Out
block.
I changed the default console output settings in NumPy and pandas to improve
readability and brevity throughout the book. For example, you may see more digits
of precision printed in numeric data than the book shows. To exactly match the output shown in the book,
you can execute the following Python code before running the code examples:
import numpy as np
import pandas as pd
pd.options.display.max_columns = 20
pd.options.display.max_rows = 20
pd.options.display.max_colwidth = 80
np.set_printoptions(precision=4, suppress=True)
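To see what the NumPy settings change, or to apply them only temporarily, NumPy also offers `np.printoptions`, a context manager counterpart to `np.set_printoptions`. The array values below are arbitrary, chosen to mix a long fraction, a tiny number, and a larger one.

```python
import numpy as np

values = np.array([1 / 3, 2.0e-10, 1234.56789])

# Default formatting (typically scientific notation for a mix like this)
print(values)

# The book's settings, applied only inside this block:
# 4 digits after the decimal point, and no scientific notation for tiny values
with np.printoptions(precision=4, suppress=True):
    print(values)
    formatted = np.array2string(values)
```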
Data for Examples
Datasets for the examples in each chapter are hosted in a GitHub repository (or in a
mirror on Gitee if you cannot access GitHub). You can download this data either by
using the Git version control system on the command line or by downloading a zip
file of the repository from the website. If you run into problems, navigate to the book
website for up-to-date instructions about obtaining the book materials.
If you download a zip file containing the example datasets, you must then fully
extract the contents of the zip file to a directory and navigate to that directory from
the terminal before proceeding with running the book’s code examples:
$ pwd
/home/wesm/book-materials
$ ls
appa.ipynb ch05.ipynb ch09.ipynb ch13.ipynb README.md
ch02.ipynb ch06.ipynb ch10.ipynb COPYING requirements.txt
ch03.ipynb ch07.ipynb ch11.ipynb datasets
ch04.ipynb ch08.ipynb ch12.ipynb examples
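If examples fail to find their data files, a common cause is running them from the wrong directory. The hypothetical helper below, a sketch built on `pathlib`, checks whether a directory contains a few of the top-level entries from the listing above; it checks only a subset, and the names are taken from that listing.

```python
from pathlib import Path

# A subset of the top-level entries shown in the directory listing above
EXPECTED = ["datasets", "examples", "README.md"]

def missing_book_materials(root="."):
    """Return the expected entries that are absent from ``root``."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]

missing = missing_book_materials()
if missing:
    print("Not in the book materials directory? Missing:", missing)
else:
    print("Found datasets/, examples/, and README.md")
```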
I have made every effort to ensure that the GitHub repository contains everything
necessary to reproduce the examples, but I may have made some mistakes or
omissions. If so, please send me an email: book@wesmckinney.com. The best way to report
errors in the book is on the errata page on the O’Reilly website.
Import Conventions
The Python community has adopted a number of naming conventions for commonly
used modules:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
This means that when you see np.arange, this is a reference to the arange function
in NumPy. This is done because it’s considered bad practice in Python software
development to import everything (from numpy import *) from a large package like
NumPy.
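A concrete reason to avoid the star import: NumPy defines functions such as `sum` whose names collide with Python built-ins but whose arguments mean something different. A star import brings in names like these; the sketch imports just `sum` explicitly to show the same shadowing effect.

```python
import builtins

import numpy as np

values = range(5)

# Built-in sum: the second argument is a start value, so 0+1+2+3+4 + (-1) = 9
print(builtins.sum(values, -1))

# NumPy's sum: the second argument is an axis, so this sums along the last axis: 10
print(np.sum(values, -1))

# Rebinding the name, as a star import would, silently replaces the built-in
from numpy import sum

print(sum(values, -1))  # now 10, not 9
```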
CHAPTER 2
Python Language Basics, IPython,
and Jupyter Notebooks
When I wrote the first edition of this book in 2011 and 2012, there were fewer
resources available for learning about doing data analysis in Python. This was
partially a chicken-and-egg problem; many libraries that we now take for granted,
like pandas, scikit-learn, and statsmodels, were comparatively immature back then.
Now, in 2022, there is a growing literature on data science, data analysis, and
machine learning, supplementing the prior works on general-purpose scientific
computing geared toward computational scientists, physicists, and professionals in other
research fields. There are also excellent books about learning the Python
programming language itself and becoming an effective software engineer.
As this book is intended as an introductory text in working with data in Python, I
feel it is valuable to have a self-contained overview of some of the most important
features of Python’s built-in data structures and libraries from the perspective of data
manipulation. So, I will only present roughly enough information in this chapter and
Chapter 3 to enable you to follow along with the rest of the book.
Much of this book focuses on table-based analytics and data preparation tools for
working with datasets that are small enough to fit on your personal computer. To
use these tools, you may sometimes need to do some wrangling to arrange messy data into
a cleaner tabular (or structured) form. Fortunately, Python is an ideal language
for doing this. The greater your facility with the Python language and its built-in data
types, the easier it will be for you to prepare new datasets for analysis.
Some of the tools in this book are best explored from a live IPython or Jupyter
session. Once you learn how to start up IPython and Jupyter, I recommend that you
follow along with the examples so you can experiment and try different things. As
with any keyboard-driven console-like environment, developing familiarity with the
common commands is also part of the learning curve.
There are introductory Python concepts that this chapter does not
cover, like classes and object-oriented programming, which you
may find useful in your foray into data analysis in Python.
To deepen your Python language knowledge, I recommend that
you supplement this chapter with the official Python tutorial and
potentially one of the many excellent books on general-purpose
Python programming. Some recommendations to get you started
include:
• Python Cookbook, Third Edition, by David Beazley and Brian K. Jones (O’Reilly)
• Fluent Python by Luciano Ramalho (O’Reilly)
• Effective Python, Second Edition, by Brett Slatkin (Addison-Wesley)
2.1 The Python Interpreter
Python is an interpreted language. The Python interpreter runs a program by execut‐
ing one statement at a time. The standard interactive Python interpreter can be
invoked on the command line with the python command:
$ python
Python 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:38:57)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 5
>>> print(a)
5
The >>> you see is the prompt after which you’ll type code expressions. To exit the
Python interpreter, you can either type exit() or press Ctrl-D (works on Linux and
macOS only).
Running Python programs is as simple as calling python with a .py file as its first
argument. Suppose we had created hello_world.py with these contents:
print("Hello world")
You can run it by executing the following command (the hello_world.py file must be
in your current working terminal directory):
$ python hello_world.py
Hello world
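To script the same command, here is a minimal sketch using the standard library's subprocess module. It writes hello_world.py into a temporary directory (an assumption; any directory works) and runs it with sys.executable, the interpreter executing the sketch, rather than relying on a python command being on your PATH.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    # Create the one-line script from the text
    Path(workdir, "hello_world.py").write_text('print("Hello world")\n')

    # Equivalent to typing `python hello_world.py` in that directory
    completed = subprocess.run(
        [sys.executable, "hello_world.py"],
        cwd=workdir,
        capture_output=True,  # collect the script's stdout instead of showing it
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a nonzero exit code
    )

print(completed.stdout, end="")  # Hello world
```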
18 | Chapter 2: Python Language Basics, IPython, and Jupyter Notebooks
While some Python programmers execute all of their Python code in this way,
those doing data analysis or scientific computing make use of IPython, an enhanced
Python interpreter, or Jupyter notebooks, web-based code notebooks originally cre‐
ated within the IPython project. I give an introduction to using IPython and Jupyter
in this chapter and have included a deeper look at IPython functionality in Appen‐
dix A. When you use the %run command, IPython executes the code in the specified
file in the same process, enabling you to explore the results interactively when it’s
done:
$ ipython
Python 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:38:57)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: %run hello_world.py
Hello world
In [2]:
The default IPython prompt adopts the numbered In [2]: style, compared with the
standard >>> prompt.
2.2 IPython Basics
In this section, I’ll get you up and running with the IPython shell and Jupyter
notebook, and introduce you to some of the essential concepts.
Running the IPython Shell
You can launch the IPython shell on the command line just like launching the regular
Python interpreter except with the ipython command:
$ ipython
Python 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:38:57)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: a = 5
In [2]: a
Out[2]: 5
You can execute arbitrary Python statements by typing them and pressing Return (or
Enter). When you type just a variable into IPython, it renders a string representation
of the object:
In [5]: import numpy as np
In [6]: data = [np.random.standard_normal() for i in range(7)]
2.2 IPython Basics | 19
In [7]: data
Out[7]:
[-0.20470765948471295,
0.47894333805754824,
-0.5194387150567381,
-0.55573030434749,
1.9657805725027142,
1.3934058329729904,
0.09290787674371767]
The first two lines are Python code statements; the second statement creates a variable
named data that refers to a newly created Python list. The last line displays
the value of data in the console.
Many kinds of Python objects are formatted to be more readable, or pretty-printed,
which is distinct from normal printing with print. If you printed the above data
variable in the standard Python interpreter, it would be much less readable:
>>> import numpy as np
>>> data = [np.random.standard_normal() for i in range(7)]
>>> print(data)
[-0.5767699931966723, -0.1010317773535111, -1.7841005313329152,
-1.524392126408841, 0.22191374220117385, -1.9835710588082562,
-1.6081963964963528]
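Outside IPython, the standard library's pprint module provides a comparable, if simpler, pretty-printer. This sketch is not how IPython formats its output internally. It only contrasts the single-line form that print uses with a wrapped layout. The 40-character width and the sample numbers are arbitrary choices.

```python
import ast
import pprint

# Seven arbitrary floats standing in for the random data above
data = [i * 1.123456789 for i in range(7)]

flat = str(data)                         # the single-line form that print(data) shows
pretty = pprint.pformat(data, width=40)  # wrapped, since the list is wider than 40 columns

print(flat)
print(pretty)

# Only the layout differs: the pretty form still evaluates to the same list
print(ast.literal_eval(pretty) == data)  # True
```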
IPython also provides facilities to execute arbitrary blocks of code (via a somewhat
glorified copy-and-paste approach) and whole Python scripts. You can also use the
Jupyter notebook to work with larger blocks of code, as we will soon see.
Running the Jupyter Notebook
One of the major components of the Jupyter project is the notebook, a type of
interactive document for code, text (including Markdown), data visualizations, and
other output. The Jupyter notebook interacts with kernels, which are implementations
of the Jupyter interactive computing protocol specific to different programming
languages. The Python Jupyter kernel uses the IPython system for its underlying
behavior.
To start up Jupyter, run the command jupyter notebook in a terminal:
$ jupyter notebook
[I 15:20:52.739 NotebookApp] Serving notebooks from local directory:
/home/wesm/code/pydata-book
[I 15:20:52.739 NotebookApp] 0 active kernels
[I 15:20:52.739 NotebookApp] The Jupyter Notebook is running at:
http://localhost:8888/?token=0a77b52fefe52ab83e3c35dff8de121e4bb443a63f2d...
[I 15:20:52.740 NotebookApp] Use Control-C to stop this server and shut down
all kernels (twice to skip confirmation).
Created new window in existing browser session.
To access the notebook, open this file in a browser:
file:///home/wesm/.local/share/jupyter/runtime/nbserver-185259-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=0a77b52fefe52ab83e3c35dff8de121e4...
or http://127.0.0.1:8888/?token=0a77b52fefe52ab83e3c35dff8de121e4...
On many platforms, Jupyter will automatically open in your default web browser
(unless you start it with --no-browser). Otherwise, you can navigate to the HTTP
address printed when you started the notebook, here
http://localhost:8888/?token=0a77b52fefe52ab83e3c35dff8de121e4bb443a63f2d3055. See Figure 2-1 for
what this looks like in Google Chrome.
Many people use Jupyter as a local computing environment, but
it can also be deployed on servers and accessed remotely. I won’t
cover those details here, but I encourage you to explore this topic
on the internet if it’s relevant to your needs.
Figure 2-1. Jupyter notebook landing page
To create a new notebook, click the New button and select the “Python 3” option.
You should see something like Figure 2-2. If this is your first time, try clicking on
the empty code “cell” and entering a line of Python code. Then press Shift-Enter to
execute it.
Figure 2-2. Jupyter new notebook view
When you save the notebook (see “Save and Checkpoint” under the notebook File
menu), it creates a file with the extension .ipynb. This is a self-contained file format
that contains all of the content (including any evaluated code output) currently in the
notebook. These can be loaded and edited by other Jupyter users.
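An .ipynb file is JSON. The sketch below builds a tiny notebook by hand in the nbformat 4 layout and extracts the source of its code cells with the json module. Real notebooks carry more metadata, and for serious work the nbformat package is the safer tool. The helper name code_cell_sources is hypothetical.

```python
import json

# A minimal notebook in the nbformat 4 JSON layout (real files carry more metadata)
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["a = 5\n", "a"],
         "outputs": [{"output_type": "execute_result", "execution_count": 1,
                      "data": {"text/plain": ["5"]}, "metadata": {}}]},
    ],
})

def code_cell_sources(text):
    """Return the source of each code cell in an .ipynb JSON string (hypothetical helper)."""
    nb = json.loads(text)
    # "source" is stored as a list of lines; join them back into one string
    return ["".join(cell["source"]) for cell in nb["cells"]
            if cell["cell_type"] == "code"]

print(code_cell_sources(notebook_text))  # ['a = 5\na']
```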
To rename an open notebook, click on the notebook title at the top of the page and
type the new title, pressing Enter when you are finished.
To load an existing notebook, put the file in the same directory where you started the
notebook process (or in a subfolder within it), then click the name from the landing
page. You can try it out with the notebooks from my wesm/pydata-book repository on
GitHub. See Figure 2-3.
When you want to close a notebook, click the File menu and select “Close and Halt.”
If you simply close the browser tab, the Python process associated with the notebook
will keep running in the background.
While the Jupyter notebook may feel like a distinct experience from the IPython
shell, nearly all of the commands and tools in this chapter can be used in either
environment.
Figure 2-3. Jupyter example view for an existing notebook
Tab Completion
On the surface, the IPython shell looks like a cosmetically different version of the
standard terminal Python interpreter (invoked with python). One of the major
improvements over the standard Python shell is tab completion, found in many IDEs
or other interactive computing analysis environments. While entering expressions in
the shell, pressing the Tab key will search the namespace for any variables (objects,
functions, etc.) matching the characters you have typed so far and show the results in
a convenient drop-down menu:
In [1]: an_apple = 27
In [2]: an_example = 42
In [3]: an<Tab>
an_apple an_example any
In this example, note that IPython displayed both of the variables I defined, as
well as the built-in function any. You can also complete methods and attributes
on any object after typing a period:
In [3]: b = [1, 2, 3]
In [4]: b.<Tab>
append() count() insert() reverse()
clear() extend() pop() sort()
copy() index() remove()
The same is true for modules:
In [1]: import datetime
In [2]: datetime.<Tab>
date MAXYEAR timedelta
datetime MINYEAR timezone
datetime_CAPI time tzinfo
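IPython's completion machinery is more elaborate, but the basic idea of matching typed characters against names in a namespace can be sketched with the standard library's rlcompleter module, which backs tab completion in the plain python shell on platforms with readline. The namespace dictionary and the all_completions helper are illustrative assumptions.

```python
import rlcompleter

# Names defined in the "session", mirroring the IPython examples above
namespace = {"an_apple": 27, "an_example": 42, "b": [1, 2, 3]}
completer = rlcompleter.Completer(namespace)

def all_completions(text):
    """Collect every completion the stdlib completer offers for text (hypothetical helper)."""
    matches = []
    state = 0
    # complete() returns one match per state index, then None when exhausted
    while (match := completer.complete(text, state)) is not None:
        matches.append(match)
        state += 1
    return matches

print(all_completions("an"))    # includes 'an_apple', 'an_example', and a match for the built-in any
print(all_completions("b.ap"))  # ['b.append(']
```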
Note that IPython by default hides methods and attributes starting
with underscores, such as magic methods and internal “private”
methods and attributes, in order to avoid cluttering the display
(and confusing novice users!). These, too, can be tab-completed,
but you must first type an underscore to see them. If you prefer
to always see such methods in tab completion, you can change this
setting in the IPython configuration. See the IPython documenta‐
tion to find out how to do this.
Tab completion works in many contexts outside of searching the interactive name‐
space and completing object or module attributes. When typing anything that looks
like a file path (even in a Python string), pressing the Tab key will complete anything
on your computer’s filesystem matching what you’ve typed.
Combined with the %run command (see “The %run Command” on page 512), this
functionality can save you many keystrokes.
Another area where tab completion saves time is in the completion of function
keyword arguments (including the = sign!). See Figure 2-4.
Figure 2-4. Autocomplete function keywords in a Jupyter notebook
We’ll have a closer look at functions in a little bit.
Introspection
Using a question mark (?) before or after a variable will display some general information
about the object:
In [1]: b = [1, 2, 3]
In [2]: b?
Type: list
String form: [1, 2, 3]
Length: 3
Docstring:
Built-in mutable sequence.
If no argument is given, the constructor creates a new empty list.
The argument must be an iterable if specified.
In [3]: print?
Docstring:
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file: a file-like object (stream); defaults to the current sys.stdout.
sep: string inserted between values, default a space.
end: string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Type: builtin_function_or_method
This is referred to as object introspection. If the object is a function or instance
method, the docstring, if defined, will also be shown. Suppose we’d written the
following function (which you can reproduce in IPython or Jupyter):
def add_numbers(a, b):
    """
    Add two numbers together
    Returns
    -------
    the_sum : type of arguments
    """
    return a + b
Then using ? shows us the docstring:
In [6]: add_numbers?
Signature: add_numbers(a, b)
Docstring:
Add two numbers together
Returns
-------
the_sum : type of arguments
File: <ipython-input-9-6a548a216e27>
Type: function
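Much of what ? displays can also be retrieved programmatically. As a rough sketch, the standard library's inspect module recovers the signature and the cleaned-up docstring of add_numbers. IPython reports additional details, such as the File and Type fields shown above.

```python
import inspect

def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b

print(inspect.signature(add_numbers))               # (a, b)
# getdoc strips the leading blank line and the common indentation
print(inspect.getdoc(add_numbers).splitlines()[0])  # Add two numbers together
print(type(add_numbers).__name__)                   # function
```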
The ? has a final usage: searching the IPython namespace in a manner similar
to the standard Unix or Windows command line. A number of characters combined
with the wildcard (*) will show all names matching the wildcard expression. For
example, we could get a list of all functions in the top-level NumPy namespace
containing load:
In [9]: import numpy as np
In [10]: np.*load*?
np.__loader__
np.load
np.loads
np.loadtxt
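Plain Python offers the same kind of shell-style wildcard matching through the standard library's fnmatch module. This sketch uses the math module instead of NumPy, so it runs without third-party packages. It is an analogue of np.*load*?, not IPython's implementation of the search.

```python
import fnmatch
import math

# dir() lists a module's names in sorted order; filter keeps those matching the pattern
matches = fnmatch.filter(dir(math), "*log*")
print(matches)  # ['log', 'log10', 'log1p', 'log2']
```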
2.3 Python Language Basics
In this section, I will give you an overview of essential Python programming concepts
and language mechanics. In the next chapter, I will go into more detail about Python
data structures, functions, and other built-in tools.
Language Semantics
The Python language design is distinguished by its emphasis on readability, simplic‐
ity, and explicitness. Some people go so far as to liken it to “executable pseudocode.”
Indentation, not braces
Python uses whitespace (tabs or spaces) to structure code instead of using braces as in
many other languages like R, C++, Java, and Perl. Consider a for loop from a sorting
algorithm:
for x in array:
    if x < pivot:
        less.append(x)
    else:
        greater.append(x)
A colon denotes the start of an indented code block after which all of the code must
be indented by the same amount until the end of the block.
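To run the loop above on its own, you need an input list, a pivot, and two output lists. The values below are hypothetical, and the loop is only the partition step of a sorting algorithm, not a complete sort.

```python
# Hypothetical inputs for the partition step
array = [7, 2, 9, 4, 5, 1]
pivot = 5
less = []
greater = []

for x in array:
    if x < pivot:
        less.append(x)     # strictly smaller than the pivot
    else:
        greater.append(x)  # equal to or larger than the pivot

print(less)     # [2, 4, 1]
print(greater)  # [7, 9, 5]
```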
Love it or hate it, significant whitespace is a fact of life for Python programmers.
While it may seem foreign at first, you will hopefully grow accustomed to it in time.
I strongly recommend using four spaces as your default indentation
and replacing tabs with four spaces. Many text editors have a
setting that will replace tab stops with spaces automatically (do
this!). IPython and Jupyter notebooks will automatically insert four
spaces on new lines following a colon and replace tabs by four
spaces.
As you can see by now, Python statements also do not need to be terminated by
semicolons. Semicolons can be used, however, to separate multiple statements on a
single line:
a = 5; b = 6; c = 7
Putting multiple statements on one line is generally discouraged in Python as it can
make code less readable.
Everything is an object
An important characteristic of the Python language is the consistency of its object
model. Every number, string, data structure, function, class, module, and so on exists
in the Python interpreter in its own “box,” which is referred to as a Python object.
Each object has an associated type (e.g., integer, string, or function) and internal data.
In practice this makes the language very flexible, as even functions can be treated like
any other object.
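Because functions are objects, they can be stored in a dictionary, passed to another function, and inspected like any other value. The names square, negate, and apply_all in this sketch are hypothetical.

```python
def square(x):
    return x * x

def negate(x):
    return -x

# Functions can be stored in data structures like any other value...
operations = {"square": square, "negate": negate}

# ...passed as arguments to other functions...
def apply_all(funcs, value):
    return {name: func(value) for name, func in funcs.items()}

print(apply_all(operations, 3))  # {'square': 9, 'negate': -3}

# ...and inspected, since each has a type and attributes
print(type(square).__name__, square.__name__)  # function square
```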
Comments
Any text preceded by the hash mark (pound sign) # is ignored by the Python
interpreter. This is often used to add comments to code. At times you may also want
to exclude certain blocks of code without deleting them. One solution is to comment
out the code:
results = []
for line in file_handle:
    # keep the empty lines for now
    # if len(line) == 0:
    #     continue
    results.append(line.replace("foo", "bar"))
Comments can also occur after a line of executed code. While some programmers
prefer comments to be placed in the line preceding a particular line of code, this can
be useful at times:
print("Reached this line") # Simple status report
2.3 Python Language Basics | 27
Function and object method calls
You call functions using parentheses and passing zero or more arguments, optionally
assigning the returned value to a variable:
result = f(x, y, z)
g()
Almost every object in Python has attached functions, known as methods, that have
access to the object’s internal contents. You can call them using the following syntax:
obj.some_method(x, y, z)
Functions can take both positional and keyword arguments:
result = f(a, b, c, d=5, e="foo")
We will look at this in more detail later.
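Here is a self-contained version of those calls. The definitions of f and g are hypothetical stand-ins, chosen only so the calls run. Because d and e have defaults, they can be omitted or passed by keyword.

```python
# Hypothetical function with three positional parameters and two keyword parameters
def f(a, b, c, d=5, e="foo"):
    return (a, b, c, d, e)

def g():
    return "called g"

result = f(1, 2, 3)         # d and e fall back to their defaults
print(result)               # (1, 2, 3, 5, 'foo')
print(f(1, 2, 3, e="bar"))  # (1, 2, 3, 5, 'bar')
print(g())                  # called g

# A method is a function attached to an object, called with obj.method(...)
words = ["b", "a"]
words.sort()
print(words)                # ['a', 'b']
```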
Variables and argument passing
When assigning a variable (or name) in Python, you are creating a reference to the
object shown on the righthand side of the equals sign. In practical terms, consider a
list of integers:
In [8]: a = [1, 2, 3]
Suppose we assign a to a new variable b:
In [9]: b = a
In [10]: b
Out[10]: [1, 2, 3]
In some languages, the assignment of b will cause the data [1, 2, 3] to be copied. In
Python, a and b actually now refer to the same object, the original list [1, 2, 3] (see
Figure 2-5 for a mock-up). You can prove this to yourself by appending an element to
a and then examining b:
In [11]: a.append(4)
In [12]: b
Out[12]: [1, 2, 3, 4]
Figure 2-5. Two references for the same object
Understanding the semantics of references in Python, and when, how, and why data
is copied, is especially critical when you are working with larger datasets in Python.
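The is operator checks whether two names refer to the same object, which makes the difference between binding and copying easy to test. This sketch contrasts b = a with two ways of making a shallow copy of the list.

```python
import copy

a = [1, 2, 3]
b = a             # b is a second name for the same list object
c = list(a)       # c is a new list with the same elements (a shallow copy)
d = copy.copy(a)  # copy.copy also makes a shallow copy

print(b is a, c is a, d is a)  # True False False

a.append(4)
print(b)  # [1, 2, 3, 4] (same object, so it sees the change)
print(c)  # [1, 2, 3] (independent copy)
```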
Assignment is also referred to as binding, as we are binding a
name to an object. Variable names that have been assigned may
occasionally be referred to as bound variables.
When you pass objects as arguments to a function, new local variables are created
referencing the original objects without any copying. If you bind a new object to a
variable inside a function, that will not overwrite a variable of the same name in the
“scope” outside of the function (the “parent scope”). However, because no copy is made, it is
possible to alter the internals of a mutable argument. Suppose we had the following function:
In [13]: def append_element(some_list, element):
   ....:     some_list.append(element)
Then we have:
In [14]: data = [1, 2, 3]
In [15]: append_element(data, 4)
In [16]: data
Out[16]: [1, 2, 3, 4]
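Rebinding a parameter and mutating the object it refers to can be compared side by side. append_element is the function from the text; rebind_list is a hypothetical counterpart added for contrast.

```python
def append_element(some_list, element):
    # Mutates the caller's list in place
    some_list.append(element)

def rebind_list(some_list):
    # Binds the local name to a new list; the caller's variable is unaffected
    some_list = [0]
    return some_list

data = [1, 2, 3]
append_element(data, 4)
print(data)       # [1, 2, 3, 4]

new = rebind_list(data)
print(data, new)  # [1, 2, 3, 4] [0]
```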
Dynamic references, strong types
Variables in Python have no inherent type associated with them; a variable can refer
to a different type of object simply by doing an assignment. There is no problem with
the following:
In [17]: a = 5
In [18]: type(a)
Out[18]: int
In [19]: a = "foo"
In [20]: type(a)
Out[20]: str
Variables are names for objects within a particular namespace; the type information is
stored in the object itself. Some observers might hastily conclude that Python is not a
“typed language.” This is not true; consider this example:
In [21]: "5" + 5
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-21-7fe5aa79f268> in <module>
----> 1 "5" + 5
TypeError: can only concatenate str (not "int") to str
In some languages, the string '5' might get implicitly converted (or cast) to an
integer, thus yielding 10. In other languages the integer 5 might be cast to a string,
yielding the concatenated string '55'. In Python, such implicit casts are not allowed.
In this regard we say that Python is a strongly typed language, which means that every
object has a specific type (or class), and implicit conversions will occur only in certain
permitted circumstances, such as:
In [22]: a = 4.5
In [23]: b = 2
# String formatting, to be visited later
In [24]: print(f"a is {type(a)}, b is {type(b)}")
a is <class 'float'>, b is <class 'int'>
In [25]: a / b
Out[25]: 2.25
Here, even though b is an integer, it is implicitly converted to a float for the division
operation.
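When you do want a string and a number to combine, convert one of them explicitly. Here is a short sketch of both directions, plus the int-to-float promotion described above:

```python
# Explicit conversions state which result you intend
as_int = int("5") + 5
as_str = "5" + str(5)
print(as_int)  # 10
print(as_str)  # 55

# Promoting an int to a float in arithmetic is one of the permitted implicit conversions
mixed = 2 + 0.5
print(mixed, type(mixed).__name__)  # 2.5 float
```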
Knowing the type of an object is important, and it’s useful to be able to write
functions that can handle many different kinds of input. You can check that an object
is an instance of a particular type using the isinstance function:
In [26]: a = 5
In [27]: isinstance(a, int)
Out[27]: True
isinstance can accept a tuple of types if you want to check that an object’s type is
among those present in the tuple:
In [28]: a = 5; b = 4.5
In [29]: isinstance(a, (int, float))
Out[29]: True
In [30]: isinstance(b, (int, float))
Out[30]: True
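As a sketch of a function that accepts several kinds of input, the hypothetical scale function below uses isinstance with tuples of types to decide how to handle its argument, and raises TypeError for anything else.

```python
def scale(value, factor=2):
    """Multiply a number, or each number in a list or tuple, by factor (hypothetical helper)."""
    if isinstance(value, (int, float)):
        return value * factor
    if isinstance(value, (list, tuple)):
        return [v * factor for v in value]
    raise TypeError(f"unsupported type: {type(value).__name__}")

print(scale(5))       # 10
print(scale(4.5))     # 9.0
print(scale((1, 2)))  # [2, 4]
```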
Attributes and methods
Objects in Python typically have both attributes (other Python objects stored
“inside” the object) and methods (functions associated with an object that can
have access to the object’s internal data). Both of them are accessed via the syntax
obj.attribute_name:
In [1]: a = "foo"
In [2]: a.<Press Tab>
capitalize() index() isspace() removesuffix() startswith()
casefold() isprintable() istitle() replace() strip()
center() isalnum() isupper() rfind() swapcase()
count() isalpha() join() rindex() title()
encode() isascii() ljust() rjust() translate()
endswith() isdecimal() lower() rpartition()
expandtabs() isdigit() lstrip() rsplit()
find() isidentifier() maketrans() rstrip()
format() islower() partition() split()
format_map() isnumeric() removeprefix() splitlines()
Attributes and methods can also be accessed by name via the getattr function:
In [32]: getattr(a, "split")
Out[32]: <function str.split(sep=None, maxsplit=-1)>
While we will not extensively use the functions getattr and related functions
hasattr and setattr in this book, they can be used very effectively to write generic,
reusable code.
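Here is a small sketch of getattr, hasattr, and setattr working together when attribute names are known only at run time. The Config class and its attribute names are hypothetical. getattr's optional third argument supplies a default when the attribute is missing.

```python
class Config:
    """Hypothetical settings object used to illustrate attribute access by name."""
    verbose = False

settings = Config()

# Set attributes whose names arrive as strings, noting which ones are new
for name, value in [("verbose", True), ("retries", 3)]:
    if not hasattr(settings, name):
        print(f"adding new attribute {name!r}")
    setattr(settings, name, value)

print(getattr(settings, "verbose"), getattr(settings, "retries"))  # True 3
print(getattr(settings, "timeout", 30))  # 30 (the default, since the attribute is missing)

# getattr also returns bound methods, which can then be called
upper = getattr("foo", "upper")
print(upper())  # FOO
```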
Duck typing
Often you may not care about the type of an object but rather only whether it has
certain methods or behavior. This is sometimes called duck typing, after the saying “If
it walks like a duck and quacks like a duck, then it’s a duck.” For example, you can
verify that an object is iterable if it implements the iterator protocol. For many objects,
this means it has an __iter__ “magic method,” though an alternative and better way
to check is to try using the iter function:
In [33]: def isiterable(obj):
   ....:     try:
   ....:         iter(obj)
   ....:         return True
   ....:     except TypeError: # not iterable
   ....:         return False
This function would return True for strings as well as most Python collection types:
In [34]: isiterable("a string")
Out[34]: True
In [35]: isiterable([1, 2, 3])
Out[35]: True
In [36]: isiterable(5)
Out[36]: False
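A common use of a check like isiterable is to normalize input. The hypothetical to_list helper below returns a list unchanged and converts other iterables (tuples, ranges, generators, strings) with list. As an assumption about the desired behavior, it wraps any non-iterable value in a one-element list.

```python
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:  # not iterable
        return False

def to_list(x):
    """Return x as a list (hypothetical helper)."""
    if isinstance(x, list):
        return x        # already a list: return it unchanged
    if isiterable(x):
        return list(x)  # any other iterable: materialize its items
    return [x]          # a scalar: wrap it (an assumed policy)

print(to_list((1, 2)))   # [1, 2]
print(to_list(range(3))) # [0, 1, 2]
print(to_list(5))        # [5]
```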
Imports
In Python, a module is simply a file with the .py extension containing Python code.
Suppose we had the following module:
# some_module.py
PI = 3.14159
def f(x):
    return x + 2
def g(a, b):
    return a + b
If we wanted to access the variables and functions defined in some_module.py, from
another file in the same directory we could do:
import some_module
result = some_module.f(5)
pi = some_module.PI
Or alternately:
from some_module import g, PI
result = g(5, PI)
By using the as keyword, you can give imports different variable names:
import some_module as sm
from some_module import PI as pi, g as gf
r1 = sm.f(pi)
r2 = gf(6, pi)
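To try these imports without creating files by hand, the sketch below writes some_module.py into a temporary directory and adds that directory to sys.path, the list of locations Python searches for modules. The temporary directory is an assumption of the sketch. Normally the module would simply sit next to the importing file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_SOURCE = '''\
PI = 3.14159

def f(x):
    return x + 2

def g(a, b):
    return a + b
'''

# Write some_module.py into a fresh directory and make that directory importable
tmpdir = tempfile.mkdtemp()
Path(tmpdir, "some_module.py").write_text(MODULE_SOURCE)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()  # let the import system notice the newly created file

import some_module
from some_module import PI as pi, g as gf

r1 = some_module.f(pi)
r2 = gf(6, pi)
print(r1, r2)
```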
Binary operators and comparisons
Most of the binary math operations and comparisons use familiar mathematical
syntax used in other programming languages:
In [37]: 5 - 7
Out[37]: -2
In [38]: 12 + 21.5
Out[38]: 33.5
In [39]: 5 <= 2
Out[39]: False
See Table 2-1 for all of the available binary operators.
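Table 2-1 is not reproduced here, but a few of the operators it covers are worth checking directly. / always performs true division, while // floors the result, including for negative operands.

```python
true_div = 7 / 2     # always returns a float
floor_div = 7 // 2   # floor division
neg_floor = -7 // 2  # floors toward negative infinity, not toward zero
remainder = 7 % 2
power = 2 ** 10

print(true_div, floor_div, neg_floor, remainder, power)  # 3.5 3 -4 1 1024
print(5 == 5.0)  # True: comparisons work across int and float
```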
It looked for a victim, or victims, for its fear. Once upon a time,
witches were burned to ease the terrors of ignorance, and plague-
spreaders were executed in times of pestilence to assure everybody
that now the plague would cease since somebody had been killed for
spreading it.
Organizations came into being with the official and impassioned
purpose of seeing that space research ceased immediately. Even
more violent organizations demanded the punishment of everybody
who had ever considered space travel a desirable thing. Congress
cut some hundreds of millions from a guided-missile-space-
exploration appropriation as a starter. A poor devil of a crackpot in
Santa Monica, California, revealed what he said was a spaceship he'd
built in his back yard to answer the signals from M-387. He intended
to charge a quarter admission to inspect it, using the money to
complete the drive apparatus. The thing was built of plywood and
could not conceivably lift off the ground, but a mob wrecked his
house, burned the puerile "spaceship" and would have lynched its
builder if they'd thought to look in a cellar vegetable closet. Other
crackpots who were more sensitive to public feelings announced the
picking up of messages addressed to the distant Something. The
messages, said this second class of crackpot, were reports from
spies who had been landed on Earth from flying saucers during the
past few decades. They did not explain how they were able to
translate them. A rush of flying-saucer sightings followed inevitably
—alleged to be landing-parties from M-387—and in Peoria, Illinois, a
picnicking party sighted an unidentified flying object shaped like a
soup spoon, the handle obviously being its tail. Experienced
newspapermen anticipated reports of the sighting of unidentified
flying objects shaped like knives and forks as soon as somebody
happened to think of it.
Sandy called a conference on the subject of security. She did not
look well, nowadays. She worried. Other people thought about the
messages from space, but Sandy had to think of something more
concrete. Six months earlier, the construction going on within a
plaster of Paris mould would have been laughed at, tolerantly, and
some hopeful people might have been respectful about it. But now it
was something utterly intolerable to public opinion. Newspapers
who'd lost circulation by talking sanely about space travel now got it
back by denouncing the people who'd answered the first broadcast.
And naturally, with the whole idea of outer space agitatedly
disapproved, everybody connected with it was suspected of
subversion.
"A reporter called up today," said Sandy. "He said he'd like to do a
feature story on Burke Development's new research triumph—the
new guided missile that flew thirty miles and froze everything
around where it landed. I said it fell out of an aeroplane and the last
completed project was for Interiors, Inc. Then he said that he'd been
talking to one of Mr. Holmes' men and the man said something
terrific was under way."
Burke looked uneasy. Holmes said uncomfortably, "There's no law
against what we're building, but somebody may introduce a bill in
Congress any day."
"That would be reasonable under other circumstances. There's a
time for things to be discovered. They shouldn't be accomplished too
soon. But the time for the ship out there is right now!" Burke said.
Pam raised her eyebrows. "Yes?"
"Those signals have to be checked up on," explained Burke. "It's
necessary now. But it could have been bad if our particular
enterprise had started, say, two years ago. Just think what would
have happened if atomic fission had been worked out in peacetime
ten years before World War Two! Scientific discoveries were
published then as a matter of course. Everybody'd have known how
to make atom bombs. Hitler would have had them, and so would
Mussolini. How many of us would be alive?"
Sandy interrupted, "The reporter wants to do a feature story on
what Burke Development is making. I said you were working on a
bomb shelter for quantity production. He asked if the rocket you
shot off through the construction-shed wall was part of it. I said
there'd been no rocket fired. He didn't believe me."
"Who would?" asked Holmes.
"Hmmmmm," said Burke. "Tell him to come look at what we're
doing. The ship can pass for a bomb shelter. The wall-garden units
make sense. I'm going to dig a big hole in the morning to test the
drive-shaft in. It'll look like I intend to bury everything. A bomb
shelter should be buried."
"You mean you'll let him inside?" demanded Sandy.
"Sure!" said Burke. "All inventors are expected to be idiots. A lot of
them are. He'll think I'm making an impossibly expensive bomb
shelter, much too costly for a private family to buy. It will be typical
of the inventive mind as reporters think of it. Anyhow, everybody's
always willing to believe other people fools. That'll do the trick!"
Pam said blandly, "Sandy and I live in a boardinghouse, Joe. You
don't ask about such things, but an awfully nice man moved in a
couple of days ago—right after that shaft got away and went flying
thirty miles all by itself. The nice man has been trying to get
acquainted."
Holmes growled, and looked both startled and angry when he
realized it.
Pam added cheerfully, "Most evenings I've been busy, but I think I'll
let him take me to the movies. Just so I can make us all out to be
idiots," she added.
"I'll make the hole big enough to be convincing," said Burke. "Sandy,
you make inquiries for a rigger to lift and move the bomb shelter
into its hole when it's ready. If we seem about to bury it, nobody
should suspect us of ambitions they won't like."
"Why the hole, really?" asked Sandy.
"To put the shaft in," said Burke. "I've got to get it under control or
it won't be anything more than a bomb shelter."
Keller, the instrument man, had listened with cheerful interest and
without speaking a word. Now he made an indefinite noise and
looked inquiringly at Burke. Burke said, explanatorily, "The shaft
seems to be either on or off—either a magnet that doesn't quite
magnetize, or something that's hell on wheels. It flew thirty miles
without enough power supplied to it to make it quiver. That power
came from somewhere. I think there's a clue in the fact that it froze
everything around where it landed, in spite of traveling fast enough
to heat up from air-friction alone. I've got some ideas about it."
Keller nodded. Then he said urgently, "Broadcast?"
Burke frowned, and turned to Sandy. "That's part of the broadcast
from space that changes—is it still changing?"
"Still changing," said Sandy.
"I didn't think to ask you to keep a check on that. Thanks for
thinking of it, Sandy. Maybe someday I can make up to you for what
you've been going through."
"I doubt it very much," said Sandy grimly. "I'll call the reporter
back."
She waited for them to leave. When they'd gone, she moved
purposefully toward the telephone.
Pam said, "Did you hear that growl when I said I'd go to the movies
with somebody else? I'm having fun, Sandy!"
"I'm not," said Sandy.
"You're too efficient," the younger sister said candidly. "You're
indispensable. Burke couldn't begin to be able to put this thing
through without you. And that's the trouble. You should be
irresistible instead of essential."
"Not with Joe," said Sandy bitterly.
She picked up the telephone to call the newspaper. Pam looked very,
very reflective.
There was a large deep pit close by the plaster mould when the
reporter came next afternoon. A local rigger had come a little earlier
and was still there, estimating the cost for lifting up the contents of
the mould and lowering it precisely in place to be buried as a bomb
shelter under test should be. It was a fortunate coincidence,
because the reporter brought two other men who he said were
civilian defense officials. They had come to comment on the quality
of the bomb shelter under development. It was not too convincing a
statement.
When they left, Burke was not happy. They knew too much about
the materials and equipment he'd ordered. One man had let slip the
fact that he knew about the very expensive computer Burke had
bought. It could have no conceivable use in a bomb shelter. Both
men painstakingly left it to Burke to mention the thirty-mile flight of
a bronze object which arrived coated with frost of such utter frigidity
that it appeared to be liquid-air snow instead of water-ice. Burke did
not mention it. He was excessively uneasy when the reporter's car
took them away.
He went into the office. Pam was in the midst of a fit of the giggles.
"One of them," she explained, "is the nice man who moved into the
boardinghouse. He wants to take me to the movies. Did you notice
that they came when it ought to be my lunchtime? He asked when I
went to lunch ..."
Holmes came in. He scowled.
"One of my men says that one of those characters has been buying
him drinks and asking questions about what we're doing."
Burke scowled too.
"We can let your men go home in three days more."
"I'm going to start loading up," Holmes announced abruptly. "You
don't know how to stow stuff. You're not a yachtsman."
"I haven't got the shaft under control yet," said Burke.
"You'll get it,"grunted Holmes.
He went out. Pam giggled again.
"He doesn't want me to go to the movies with the nice man from
Security," she told Burke. "But I think I'd better. I'll let him ply me
with popcorn and innocently let slip that Sandy and I know you've
been warned that bomb shelters won't find a mass market unless
they sell for less than the price of an extra bathroom. But if you
want to go broke we don't care."
"Give me three days more," said Burke harassedly.
"Well try," said Sandy suddenly. "Pam can fix up a double date with
one of her friend's friends and we'll both work on them."
Burke frowned absorbedly and went out. Sandy looked indignant. He
hadn't protested.
Burke got Holmes' four workmen out of the ship and had them help
him roll the bronze shaft to the pit and let it down onto a cradle of
timbers. Now if it moved it would have to penetrate solid earth.
The most trivial of computations showed that when the bronze shaft
had flown thirty miles, it hadn't done it on the energy of a condenser
shorted through its coils. The energy had come from somewhere
else. Burke had an idea where it was.
Presently he verified it. The cores and windings he'd adapted from a
transparent hand-weapon seen in an often-repeated dream—those
cores and windings did not make electromagnets. They made
something for which there was not yet a name. When current flows
through a standard electromagnet, the poles of its atoms are more
or less aligned. They tend to point in a single direction. But in this
arrangement of wires and iron no magnetism resulted, yet, the
random motion of the atoms in their framework of crystal structure
was coördinated. In any object above absolute zero all the atoms
and their constituent electrons and nuclei move constantly in all
directions. In such a core as Burke had formed and repeated along
the shaft's length, they all tried to move in one direction at the same
time. Simultaneously, a terrific surge of current appeared in the coils.
A high-speed poleward velocity developed in all the substance of the
shaft. It was the heat-energy contained in the metal, all turned
instantly into kinetic energy. And when its heat-energy was
transformed to something else, the shaft got cold.
Once this fact was understood, control was easy. A single variable
inductance in series with the windings handled everything. In a
certain sense, the gadget was a magnet with negative—minus—self-
inductance. When a plus inductance in series made the self-
inductance zero, neither plus nor minus, the immensely powerful
device became docile. A small current produced a mild thrust,
affecting only part of the random heat-motion of atoms and
molecules. A stronger current produced a greater one. The
resemblance to an electromagnet remained. But the total inductance
must stay close to zero or utterly violent and explosive forward
thrust would develop, and it was calculable only in thousands of
gravities.
Burke had worked for three weeks to make the thing, but he
developed a control system for it in something under four hours.
That same night they got the bronze shaft into the ship. It fitted
perfectly into the place left for it. Burke knew now exactly what he
was doing. He set up his controls. He was able to produce so minute
a thrust that the lath-and-plaster mould merely creaked and swayed.
But he knew that he could make the whole mass surge unstoppably
from its place.
Holmes sent his workmen home. Sandy and Pam went to the movies
with two very nice men who pumped them deftly of all sorts of
erroneous information about Burke and Holmes and Keller and what
they were about. The nice men did not believe that information, but
they did believe that Sandy and Pam believed it. For themselves, the
combination of an object made by Burke which flew thirty miles plus
the presence of Holmes, who built plastic yachts, and the arrival of
Keller to adjust instruments of which they had a complete list—these
things could not be overlooked. But they did feel sorry for two nice
and not over-bright girls who might be involved in very serious
trouble.
Holmes and Burke installed directional controls, wiring, recording
instruments, etc. Stores and water and oxygen, for emergency use
only, went into the lath-and-plaster construction. Holmes took a
hammer and chisel and painstakingly cracked the mould so that the
top half could be lifted off, leaving the bottom half exposed to the
open air and sky.
Then the broadcast from space cut off. It had been coming
continuously for something like five weeks; one sharp, monotonous
note every two seconds, with a longer, fluting broadcast every
seventy-nine minutes. Now a third, new message began. It was yet
another grouping of the musical tones, with a much longer interval
of specific crackling sounds.
Keller had adjusted every instrument and zestfully retested them
over and over. Burke asked him to see if the third space message
compared in any way with the second. Keller put them through a
hook-up of instruments, beaming to himself, and the answer began
to appear.
Newspapers burst into new headlines. "Ultimatum from Space" they
thundered. "Threats from Alien Space Travelers." And as they
presented the situation it seemed believable that the third message
from the void was a threat.
The first had been a call, requiring an answer. When the answer
went out from Earth, a second message replaced the call. It
contained not only flute tones which might be considered to
represent words, but cracklings which might be the equivalent of
numbers. The continuous beepings between repetitions of the
second message were plainly a directional signal to be followed to
the message source.
In this context, the newspapers furiously asserted that the third
message was a threat. The first had been merely a summons, the
second had been a command to repair to the signaling entities, and
the third was a stern reiteration of the command, reinforced by
threats.
The human race does not take kindly to threats, especially when it
feels helpless. In the United States, there was such explosive
resentment as to require spread-eagle oratory by all public figures.
The President declared that every space missile in store had been
fitted with atomic-fusion warheads and that any alien spacecraft
which appeared in American skies would be shot down immediately.
Congress reported out of committee a bill for rocket weapons which
was stalled for six days because every senator and representative
wanted to make a speech in its favor. It was the largest
appropriation bill ever passed by Congress, which less than five
weeks before had cut two hundred millions out of a guided-missile-
space-exploration budget.
And in Europe there was frenzy.
For Burke and Holmes and Sandy and Pam and the smiling,
inarticulate Keller, the matter was deadly serious. Fury such as the
public felt constituted a witch-hunt in itself. Suspicious private
persons overwhelmed the FBI and the Space Agency with
information about characters they were sure were giving military
secrets to the space travelers on M-387. There were reports of aliens
skulking about American cities wearing luxuriant whiskers and dark
glasses to conceal their non-human features. Artists, hermits, and
mere amateur beard-growers found it wise to shave, and spirit
mediums, fortunetellers and, in the South, herb doctors reaped
harvests by the sale of ominous predictions and infallible advice on
how to escape annihilation from space.
And Burke Development, Inc., was building something that neither
Civilian Defense nor the FBI believed was a bomb shelter.
The three days Burke had needed passed. A fourth. He and Holmes
practically abandoned sleep to get everything finished inside the
plaster mould. Keller happily completed his graphs and took them to
Burke. They showed that the cracklings, which presumably meant
numbers, had been expanded. What they said was now told on a
new scale. If the numbers had meant months or years, they now
meant days and hours. If they had meant millions of miles, they now
meant thousands or hundreds.
Burke was struggling with these implications when there was a
tapping at the air-lock, through which all entry and egress from the
ship took place. Holmes opened the inner door. Sandy and Pam
crawled through the lock which lay on its side instead of upright.
Sandy looked at Burke.
Pam said amiably, "We figured the job was about finished and we
wanted to see it. How do you fasten this door?"
Holmes showed her. The vessel that had been built inside the mould
did not seem as large as the outside structure promised. It looked
queer, too, because everything lay on its side. There were two
compartments with a ladder between, but the ladder lay on the floor.
The wall-gardens looked healthy under the fluorescent lamps which
kept the grass and vegetation flourishing. There were instrument
dials everywhere.
Sandy went to Burke's side.
"We're all but done," said Burke tiredly, "and Keller's just about
proved what the signals are."
"Can we go with you?" asked Sandy.
"Of course not," said Burke. "The first message was a distress call. It
had to be. Only in a distress call would somebody go into details so
any listener would know it was important. It called for help and said
who needed it, and why, and where."
Pam turned to Holmes. "Can that air-lock be opened from outside?"
It couldn't. Not when it was fastened, as now.
"Somebody answered that call from Earth," said Burke heavily, "and
the second message told more about what was wrong. The clickings,
we think, are numbers that told how long help could be waited for,
or something on that order. And then there was a beacon signal
meant to lead whoever was coming to help to that place."
Keller smiled pleasantly at Pam. He made an electrical connection
and zestfully checked the result.
"Now there's a third message," said Burke. "Time's running out for
whoever needs whatever help is called for. The clickings that seem
to be numbers have changed. The—what you might call the scale of
reportage—is new. They're telling us just how long they can wait or
just how bad their situation is. They're saying that time is running
out and they're saying, 'Hurry!'"
There was a thumping sound. Only Sandy and Pam looked
unsurprised. Burke stared.
Sandy said firmly, "That's the police, Joe. We've been going to the
movies with people who want to talk about you. Yesterday one of
them confided to us that you were dangerous, and since he told us
to get away from the office, we did. There might be shooting. He
tipped us a little while ago."
Burke swore. There were other thumpings. Louder ones. They were
on the air-lock door.
"If you try to put us out," said Sandy calmly, "you'll have to open
that door and they'll try to fight their way in—and then where'll you
be?"
Keller turned from the checking of the last instrument. He looked at
the others with excited eyes. He waited.
"I don't know what they can arrest you for," said Sandy, "and maybe
they don't either, unless it's unauthorized artillery practice. But you
can't put us out! And you know darn well that unless you do
something they'll chop their way in!"
Burke said, "Dammit, they're not going to stop me from finding out if
this thing works!"
He squirmed in a chair which had its base firmly fastened to a wall
and began to punch buttons.
"Hold fast!" he said angrily. "At least we'll see...."
There were loud snapping sounds. There were creakings. The room
stirred. It turned in a completely unbelievable fashion. Violent
crashings sounded outside. Abruptly, a small television screen before
Burke acquired an image. It was of the outside world reeling wildly.
Holmes seized a hand-hold and grabbed Pam. He kept her from
falling as a side wall became the floor, and what had been the floor
became a side wall, with the ceiling another. It seemed that all the
cosmos changed, though only walls and floors changed places.
Suddenly everything seemed normal but new. The surface underfoot
was covered with a rubber mat. The hydroponic wall-garden sections
were now vertical. Burke sat upright, and something over his head
rotated a half-turn and was still. But it became coated with frost.
More crashes. More small television screens acquired images. They
showed the office of Burke Development, Inc., against a tilted
landscape. The landscape leveled. Another showed the construction
shed. One showed cloud formations, very bright and distinct. And
two others showed a small, armed, formidable body of men
instinctively backing away from the outside television lens.
"So far," said Burke, "it works. Now—"
There was a sensation as of a rapidly rising elevator. Such a
sensation usually lasts for part of a second. This kept on. One of the
six television screens suddenly showed a view of Burke Development
from straight overhead. The buildings and men and the four-acre
enclosure dwindled rapidly. They were very tiny indeed and nearly all
of the town was in the camera's field of vision when a vague
whiteness, a cloud, moved in between.
"The devil!" said Burke. "Now they'll alert fighter planes and rocket
installations and decide that we're either traitors or aliens in disguise
and better be shot down. I think we simply have to go on!"
Keller made gestures, his eyes bright. Burke looked worried.
"It shouldn't take more than ten minutes to get a Nike aloft and
after us. We must have been picked up by radar already.... We'll
head north. We have to, anyhow."
But he was wrong about the ten minutes. It was fifteen before a
rocket came into view, pouring out enormous masses of drive-
fumes. It flung itself toward the ship.
Chapter 5
From a sufficient height and a sufficient distance, the rocket's
repeated attacks must have appeared like the strikings and twistings
of a gigantic snake. It left behind it a writhing trail of fumes which
was convincingly serpentine. It climbed and struck, and climbed and
struck, like a monstrous python flinging itself furiously at some
invisible prey. Six, seven, eight times it plunged frenziedly at the
minute egg-shaped ship which scuttled for the heavens. Each time it
missed and writhed about to dart again.
Then its fuel gave out and for all intents and purposes it ceased to
exist. The thick, opaque trail it left behind began to dissipate. The
path of vapor scattered. It spread to rags and tatters of
unsubstantiality through which the rocket plummeted downward in
the long fall which is a spent rocket's ending.
Burke cautiously cut down the drive and awkwardly turned the ship
on its side, heading it toward the north. The state of things inside
the ship was one of intolerable tenseness.
"I'm a new driver," said Burke, "and that was a tough bit of driving
to do." He glanced at the exterior-pressure meter. "There's no air
outside to register. We must be fifty or sixty miles high and maybe
still rising. But we're not leaking air."
Actually the plastic ship was eighty miles up. The sunlit world
beneath it showed white patches of cloud in patterns a
meteorologist would have found interesting. Burke could see the
valley of the St. Lawrence River between the white areas. But the
Earth's surface was curiously foreshortened. What was beneath
seemed utterly flat, and at the edge of the world all appeared
distorted and unreal.
Holmes, still pale, asked, "How'd we get away from that rocket?"
"We accelerated," saidBurke. "It was a defensive rocket. It was
designed to knock down jet bomb carriers or ballistic missiles which
travel at a constant speed. Target-seeking missiles can lock onto the
radar echo from a coasting ship, or one going at its highest speed
because their computers predict where their target, traveling at
constant speed, can be intercepted. We were never there. We were
accelerating. Missile-guidance systems can't measure acceleration
and allow for it. They shouldn't have to."
Four of the six television screens showed dark sky with twinkling
lights in it. On one there was the dim outline of the sun, reversed to
blackness because its light was too great to be registered in a
normal fashion. The other screen showed Earth.
There was a buzzing, and Keller looked at Burke.
"Rocket?" asked Burke. Keller shook his head. "Radar?" Keller
nodded.
"The DEW line, most likely," said Burke in a worried tone. "I don't
know whether they've got rockets that can reach us. But I know
fighter planes can't get this high. Maybe they can throw a spread of
air-to-air rockets, though.... I don't know their range."
Sandy said unsteadily, "They shouldn't do this to us! We're not
criminals! At least they should ask us who we are and what we're
doing!"
"They probably did," said Burke, "and we didn't answer. See if you
can pick up some voices, Keller."
Keller twirled dials and set indicators. Voices burst into speech.
"Reporting UFO sighted extreme altitude coördinates—First rocket
exhausted fuel in multiple attacks and fell, sir." Another voice, very
brisk, "Thirty-second squadron, scramble! Keep top altitude and get
under it. If it descends within range, blast it!" Another voice said
crisply, "Coördinates three-seven Jacob, one-nine Alfred...."
Keller turned the voices down to mutters because they were useless.
Burke said, "Hell! We ought to land somewhere and check over the
ship. Keller, can you give me a microphone and a wavelength
somebody will be likely to pick up?"
Keller shrugged and picked up masses of wire. He began to work on
an as yet unfinished wiring job. Evidently, the ship was not near
enough to completion to be capable of a call to ground. It had taken
off with many things not finished. Burke, at the controls, found it
possible to think of a number of items that should have been
examined exhaustively before the ship left the mould in which it had
been made. He worried.
Pam said in a strange voice, "I thought I might rate as a heroine for
stowing away on this voyage, but I didn't think we'd have to dodge
rockets and fighter planes to get away!"
There was no comment.
"I'm a beginner at navigation," said Burke a little later, more worried
than before. "I know we have to go out over the north magnetic
pole, but how the hell do I find that?"
Keller beamed. He dropped his wiring job and went to the imposing
bank of electronic instruments. He set one, and then another, and
then a third. The action, of course, was similar to that of an airline
pilot when he tunes in broadcasting stations in different cities. From
each, a directional reading can be taken. Where the lines of direction
cross, there the transport plane must be. But Keller turned to
shortwave transmitters whose transmissions could be picked up in
space. Presently, eighty miles high, he wrote a latitude and longitude
neatly on a slip of paper, wrote "North magnetic pole 93°W, 71°N,
nearly," and after that a course.
"Hm," said Burke. "Thanks."
Then there was a relative silence inside the ship. Only a faint mutter
of voices came from assorted speakers that Keller had first turned on
and then turned down, and a small humming sound from a gyro.
When they listened, they could also hear a high sweet musical tone.
Burke shifted this control here, and that control there, and lifted his
hands. The ship moved on steadily. He checked this and that and the
other thing. He was pleased. But there were innumerable things to
be checked. Holmes went down the ladder to the other
compartment below. There were details to be looked into there, too.
One of the screens portrayed Earth from a height of seventy miles
instead of eighty, now. Others pictured the heavens, with very many
stars shining unwinkingly out of blackness. Keller got at his wires
again and resumed the work of installing a ship-to-ground
transmitter and its connection to an exterior-reflecting antenna.
Sandy watched Burke as he moved about, testing one thing after
another. From time to time he glanced at the screens which had to
serve in the place of windows. Once he went back to the control-
board and changed an adjustment.
"We dropped down ten miles," he explained to Sandy. "And I suspect
we're being trailed by jets down below."
Holmes meticulously inspected all storage places. He'd packed them
when the ship lay on her side.
Burke read an instrument and said with satisfaction, "We're running
on sunshine!"
He meant that in empty space certain aluminum plates on the
outside of the hull were picking up heat from the naked sun. The
use of the drive-shaft lowered its temperature. Metallic connection
with the outside plates conducted heat inward from those plates.
The drive-shaft was cold to the touch, but it could drop four hundred
degrees Fahrenheit before it ceased to operate as a drive. It was
gratifying that it had cooled so little up to this moment.
Later Keller tapped Burke on the shoulder and jerked his thumb
upward.
"We go up now?" asked Burke.
Keller nodded. Burke carefully swung the ship to aim vertically. The
views of solid Earth slid from previous screens to new ones. The
stars and the dark object which was the sun also moved across their
screens to vanish and reappear on others. Then Burke touched the
drive-control. Once more they had the sensation of being in a rising
elevator. And at just that moment spots appeared on the barren, icy,
totally flattened terrain below.
They were rocket-trails from target-seeking missiles which had
reached the area of the north magnetic pole by herculean effort and
were aimed at the radar-detected little ship by the heavy planes that
carried them.
From the surface of the Earth, it would have seemed that monstrous
columns of foaming white appeared and rose with incredible
swiftness toward the heavens. They reached on, up and up and up,
seeming to draw closer together as they became smaller in the
distance, until all eight of them seemed to merge into a single point
of infinite whiteness in the sunshine above the world's blanket of air.
But nothing happened. Nothing. The ship did not accelerate as fast
as the rockets, but it had started first and it kept up longer. It went
scuttling away to emptiness and the bottoms of the towers of rocket-
smoke drifted away and away over the barren landscape all covered
with ice and snow.
When Earth looked like a huge round ball that did not even seem
very near, with a night side that was like a curious black chasm
among the stars, the atmosphere of tension inside the ship
diminished. Keller completed his wiring of a ship-to-ground
transmitter. He stood up, brushed off his hands and beamed.
The little ship continued on. Its temperature remained constant. The
air in it smelled of growing green stuff. It was moist. It was warm.
Keller turned a knob and a tiny, beeping noise could be heard. Dials
pointed, precisely.
"We couldn't go on our true course earlier," Burke told Sandy,
"because we had to get out beyond the Van Allen bands of cosmic
particles in orbit around the world. Pretty deadly stuff, that radiation!
In theory, though, all we have to do now is swing onto our proper
course and follow those beepings home. We ought to be in harmless
emptiness here. Do you want to call Washington?"
She stared.
"We need help to navigate—or astrogate," said Burke. "Call them,
Sandy. I'll get on the wire when a general answers."
Sandy went jerkily to the transmitter just connected. She began to
speak steadily, "Calling Earth! Calling Earth! The spaceship you just
shot all those rockets at is calling! Calling Earth!"
It grew monotonous, but eventually a suspicious voice demanded
further identification.
It was a peculiar conversation. The five in the small spaceship were
considered traitors on Earth because they had exercised the
traditional right of American citizens to go about their own business
unhindered. It happened that their private purposes ran counter to
the emotional state of the public. Hence voices berated Sandy and
furiously demanded that the ship return immediately. Sandy insisted
on higher authority and presently an official voice identified itself as
general so-and-so and sternly commanded that the ship
acknowledge and obey orders to return to Earth. Burke took the
transmitter.
"My name's Burke," he said mildly. "If you can arrange some sort of
code, I'll tell you how to find the plans, and I'll give you the
instructions you'll need to build more ships like this. They can follow
us out. I think they should. I believe that this is more important than
anything else you can think of at the moment."
Silence. Then more sternness. But ultimately the official voice said,
"I'll get a code expert on this."
Burke handed the microphone to Sandy.
"Take over. We've got to arrange a cipher so nobody who listens in
can learn about official business. We may use a social security
number for a key, or the name of your maiden aunt's first
sweetheart, or something we know and Washington can find out but
that nobody else can. Hm. Your last year's car-license number might
be a starter. They can seal up the records on that!"
Sandy took over the job. What was transmitted to Earth, of course,
could be picked up anywhere over an entire hemisphere. Somebody
would assuredly pass on what they overheard to, say, nations the
United States would rather have behind it than ahead of it in space-
travel equipment. Burke's suggestion of a cipher and instructions
changed his entire status with authority. They'd rather have had him
come back, but this was second best, and they took it.
From Burke's standpoint it was the only thing to do. He had no
official standing to lend weight to his claim that lunatic magnet-cores
with insanely complicated windings would amount to space-drive
units. If he returned, in the nature of things there would be a long
delay before mere facts could overcome theoreticians' convictions.
But now he was forty-five thousand miles out from Earth.
He had changed course to home on the beeping signals from M-387,
was accelerating at one full gravity and had been doing so for forty-
five minutes. And the small ship already had a velocity of twenty
miles per second and was still going up. All the rockets that men had
made, plus the Russian manned-probe drifting outward now, had
become as much outdated for space travel as flint arrowheads are
for war.
Burke returned to the microphone when Sandy left it to get a pencil
and paper.
"By the way," he said briskly. "We can keep on accelerating
indefinitely at one gravity. We've got radars. We got them from—"
He named the supplier. "Now we want advice on how fast we can
risk traveling before we'll be going too fast to dodge meteors or
whatnot that the radar may detect. Get that figured out for us, will
you?"
He gave back the instrument to Sandy and returned to his inspection
of every item of functioning equipment in the ship. He found one or
two trivial things to be bettered. The small craft went on in a
singularly matter-of-fact fashion. If it had been a bomb shelter
buried in the pit beside the mould in which it was built, there would
have been very little difference in the feel of things. The constant
acceleration substituted perfectly for gravity. The six television
screens, to be sure, pictured incredible things outside, but television
screens often picture incredible things. The wall-gardens looked
green and flourishing. The pumps were noiseless. There were no
moving parts in the drive. The gyro held everything steady. There
was no vibration.
Nobody could remain upset in such an unexciting environment.
Presently Pam explored the living quarters below. Holmes took his
place in the control-chair, but found no need to touch anything.
Some time later Sandy reported, "Joe, they say we must be lying,
but if we can keep on accelerating, we'd better not hit over four
hundred miles a second. They say we can then swing end for end
and decelerate down to two hundred, and then swing once more
and build up to four again. But they insist that we ought to return to
Earth."
"They don't mention shooting rockets at us, do they?" asked Burke.
"I thought they wouldn't. Just say thanks and go on working out a
code."
Sandy set to work with pencil and paper. Federal agents would be
moving, now, to impound all official records that were in any way
connected with any of the five on the ship. The key to the code
would be contained in such records. It would be an agglomeration of
such items as Burke's grandmother's maiden name, Holmes' social-
security number, the name of a street Burke had lived on some years
before, the exact amount of his federal income taxes the previous
year, the title of a book third from the end on the second shelf of a
bookcase in Keller's apartment, and such unconsidered items as
most people can remember with a little effort, but which can only be
found out by people who know where to look. These people would
keep anybody else from looking in the same places. Such a code
would be clumsy to work with, but it would be unbreakable.
It took hours to establish it without the mention of a single word
included in the lengthy key. The ship reached four hundred miles a
second, turned about, and began to cut down its speed again.
Pam spoke from beside an electric stove, "Dinner's ready! Come and
get it!"
They dined; Sandy weary, Burke absorbed and inevitably worried,
Holmes placid and amiable, and Keller beaming and interested in all
that went on, which was practically nothing.
They did not see the stars direct, because television cameras were
preferable to portholes. Earth had become very small, and as it
swung ever more nearly into a direct line between the ship and the
sun, night filled more of its disk until only a hairline of sunshine
showed at one edge. The microwave receivers ceased to mutter. The
working astronomers on Earth who'd sent a message to M-387 were
suddenly relieved of their disgrace and set to work again to equip
the West Virginia radar telescope for continuous communication with
Burke's ship. Other technicians began to prepare multiple receptors
to pick up the ship's signals from hitherto unprecedented distances
for human two-way communication.
And on Earth an official statement went out from high authority. It
announced that a hurriedly completed American ship was on the way
to M-387 to investigate the signals from space. It announced that
measures long in preparation were now in use, and that an invincible
fleet of spacecraft would be completed in months, whereas they had
not been hoped for for another generation. An unexpected
breakthrough had made it possible to advance the science of space
travel by many decades, and a fleet to explore all the planets as well
as M-387 was already under construction. It was almost true that
they were. The blueprints of Burke's ship had been flown to
Washington from the plant, and an enormous number of replicas of
the egg-shaped vessel were ordered to be begun immediately, even
before the theory of the drive was understood.
There was one minor hitch. A legal-minded official protested that
Congressional appropriations had been for rocket-driven spaceships
only, and the money appropriated could not be used for other than
rockets. An executive order settled the matter. Then theorists began
to object to the principle of the drive. It contradicted well-
established scientific beliefs. It could not work.
It did, but there was violent opposition to the fact.
Publicly, of course, the shock of such an about-face by the national
government was extreme. But newspapers flashed new headlines.
"U.S. SHIP SPEEDING TO QUERY ALIENS!" Lesser heads announced,
"Critical Velocity Exceeded! Russian Probe Already Passed!" The last
was not quite true. The Russian manned probe had started out ten
days before. Burke hadn't overtaken it yet.
Broadcasters issued special bulletins, and two networks canceled top
evening programs to schedule interviews with prominent scientists
who'd had nothing whatever to do with what Burke had managed to
achieve.
In Europe, obviously, the political effect was stupendous. Russia was
reduced to impassioned claims that the ship had been built from
Russian plans, using Russian discoveries, which had been stolen by
imperialistic secret agents. And the heads of the Russian spy system
were disgraced for not having, in fact, stolen the plans and
discoveries from the Americans. All other operatives received threats
of what would happen to them if they didn't repair that omission.
These threats so scared half a dozen operatives that they defected
and told all they knew, thereby wrecking the Russian spy system for
the time being.
Essentially, however, the recovery of confidence in America was as
extravagant as the previous unhappy desire to hear no more about
space. Burke, Holmes, Keller, Sandy and Pam became national
heroes and heroines within eighteen hours after guided missiles had
failed to shoot them down. The only criticism came from a highly
conservative clergyman who hoped that other young girls would not
imitate Sandy's and Pam's disregard of convention and maintained
that a married woman should have gone along to chaperon them.
The atmosphere in the ship, however, was that of respectability
carried to the point where things were dull. The lower compartment
of the ship, being smaller, was inevitably appropriated by Sandy and
Pam. They retired when the ship was twenty hours out from Earth.
Each of them had prepared for stowing away by wearing extra
garments in layers.
"Funny," said Pam, yawning as they made ready to turn in, "I
thought it was going to be exciting. But it's just like a rather full day
at the office."
"Which," said Sandy, "I'm quite used to."
"I do think you ought to have barged in when they designed the
ship, Sandy. There's not one mirror in it!"
In the upper compartment Keller took his place in the control-chair
and took a trick of duty. It consisted solely of looking at the
instruments and listening to the beeping noises which came from
remoteness every two seconds, and the still completely cryptic
broadcasts which came every seventy-nine minutes. It wasn't
exciting. There was nothing to be excited about. But somebody had
to be on watch.
On the second day out, Washington was ready to use the new code.
The West Virginia radar bowl was powered to handle
communications again. Sandy painstakingly took down the gibberish
that came in and decoded it. From then on she worked at the coding
and transmission of messages and the reception and decoding of
others. Presently Pam relieved her at the job. Pam tended to be
bored because Holmes was as much absorbed in the business of
keeping anything from happening as was Burke.
The messages were almost entirely requests for, and answers to
requests for, details about the ship plans. The United States had not
yet completed a duplicate drive-shaft. Machinists labored to
reproduce the cores, which would then have to be wound in the
complicated fashion the plans described. But it was an unhappy
experience for the scientific minds assigned to duplicate Burke's
ship. No woman ever followed a recipe without making some
change. Very few physicists can duplicate another's apparatus
without itching to change it. There were six copies of the drive under
construction at the same time, at the beginning. Four were made by
skeptics, who adhered to the original plans with strict accuracy. They
were sure they'd prove Burke wrong. Two were "improved" in the
making. The four, when finished, worked beautifully. The two
doctored versions did not. But still there was fretful discussion of the
theory of the drive. It seemed flatly to contradict Newton's law that
every action has a reaction of equal moment and opposite sign—a
law at least as firmly founded as the law of the conservation of
energy. But that had lately been revised into the law of the
conservation of energy and matter, which now was gospel. Burke's
theory required the Newtonian law to be restated to read "every
action of a given force has a reaction of the same force, of the same
moment," and so on. When the reaction of one force is converted
into another force, the results can be interesting. In fact, one can
have a space-drive. But there was bitter resistance to the idea. It
was demanded that Burke justify his views in a more reasonable way
than by mere demonstration that they worked.
After a time, Burke gave up trying to explain things. And when one
and then another duplicate drive worked, the argument ceased. But
eminent physicists still had a resentful feeling that Burke was
cheating on them somehow.
Then for days nothing happened. One of the three men in the ship
always stayed in the control-chair where he could check the ship's
course against the homing signals from the asteroid. He might have
to correct it by the fraction of a hair, or swing ship and put on more
drive if the radar should show celestial debris in the spaceship's
path. Every so many hours the ship had to be swung about so that
instead of accelerating she decelerated, or instead of decelerating
gained fresh speed. But that was all.
On the fifth day there was the flash of a meteor on the radar. On the
seventh day an object which could have been the second or third
unmanned Russian probe showed briefly at the very edge of the
radar screens. In essence, however, the journey was pure tedium.
Burke wearied of making sure that his work was good, though he
congratulated himself that nothing did happen to break the
monotony. Holmes admitted that he was disappointed. He'd wanted
to make the journey because he'd sailed in everything but a
spaceship. But there was no fun in it. Keller alone seemed
comfortably absorbed. He prepared daily lists of instrument-readings
to be sent back to Earth. They would be of enormous importance to
science-minded people. They were not of interest to Sandy.
Even when she talked to Burke, it was necessarily impersonal. There
could be no privacy which was not ostentatious. The two girls used
the lower compartment, the three men the upper and larger one.
For Sandy to talk privately with Burke, she'd have had to go to the
small bottom section of the ship. Holmes and Pam faced the same
situation. It was uncomfortable. So they developed a perfectly
pleasant habit of talking exclusively of things everybody could talk
about. It did not bother Keller, who would hardly average a dozen
words in twenty-four hours, but Sandy muttered to herself when she
and Pam retired for what was a ship-night's rest.
When they went past the orbit of Mars, agitated instructions came
out from Earth. The asteroid belts began beyond Mars. Elaborate
directions came. The ship was tracked by radar telescopes all around
the world, direction-finding on its transmission. Croydon kept track.
American radar bowls picked up the ship's voice. South American
and Hawaiian and Japanese and Siberian radar telescopes
determined the ship's position every time a set of code symbols
reached Earth from the ship. Of course, there were also the
beepings and the seventy-nine-minute-spaced identical broadcasts
from farther out from the sun.
Somebody got a brilliant idea and authority to try it. An interview for
broadcast on Earth was sought with somebody on the ship. It was