Python
for Data Analysis
Data Wrangling with pandas, NumPy & Jupyter
Wes McKinney
Third Edition
“With this new edition,
Wes has updated his
book to ensure it remains
the go-to resource for
all things related to data
analysis with Python
and pandas. I cannot
recommend this book
highly enough.”
—Paul Barry
Lecturer and author of O’Reilly’s
Head First Python
Python for Data Analysis
US $69.99 CAN $87.99
ISBN: 978-1-098-10403-0
Twitter: @oreillymedia
linkedin.com/company/oreilly-media
youtube.com/oreillymedia
Get the definitive handbook for manipulating, processing,
cleaning, and crunching datasets in Python. Updated for
Python 3.10 and pandas 1.4, the third edition of this hands-on
guide is packed with practical case studies that show you
how to solve a broad set of data analysis problems effectively.
You’ll learn the latest versions of pandas, NumPy, and Jupyter
in the process.
Written by Wes McKinney, the creator of the Python pandas
project, this book is a practical, modern introduction to
data science tools in Python. It’s ideal for analysts new to
Python and for Python programmers new to data science
and scientific computing. Data files and related material are
available on GitHub.
• Use the Jupyter notebook and the IPython shell for
exploratory computing
• Learn basic and advanced features in NumPy
• Get started with data analysis tools in the pandas library
• Use flexible tools to load, clean, transform, merge, and
reshape data
• Create informative visualizations with matplotlib
• Apply the pandas groupby facility to slice, dice, and
summarize datasets
• Analyze and manipulate regular and irregular time series
data
• Learn how to solve real-world data analysis problems with
thorough, detailed examples
Wes McKinney, cofounder and chief
technology officer of Voltron Data, is
an active member of the Python data
community and an advocate for Python
use in data analysis, finance, and
statistical computing applications. A
graduate of MIT, he’s also a member of
the project management committees
for the Apache Software Foundation’s
Apache Arrow and Apache Parquet
projects.
Wes McKinney
Python for Data Analysis
Data Wrangling with pandas,
NumPy, and Jupyter
THIRD EDITION
Beijing Boston Farnham Sebastopol Tokyo
978-1-098-10403-0
[LSI]
Python for Data Analysis
by Wes McKinney
Copyright © 2022 Wesley McKinney. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional
sales department: 800-998-9938 or corporate@oreilly.com.
Acquisitions Editor: Jessica Haberman
Development Editor: Angela Rufino
Production Editor: Christopher Faucher
Copyeditor: Sonia Saruba
Proofreader: Piper Editorial Consulting, LLC
Indexer: Sue Klefstad
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea
October 2012: First Edition
October 2017: Second Edition
August 2022: Third Edition
Revision History for the Third Edition
2022-08-12: First Release
See https://www.oreilly.com/catalog/errata.csp?isbn=0636920519829 for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Python for Data Analysis, the cover
image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the author disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use
of or reliance on this work. Use of the information and instructions contained in this work is at your
own risk. If any code samples or other technology this work contains or describes is subject to open
source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.
Table of Contents
Preface
1. Preliminaries
    1.1 What Is This Book About?
        What Kinds of Data?
    1.2 Why Python for Data Analysis?
        Python as Glue
        Solving the “Two-Language” Problem
        Why Not Python?
    1.3 Essential Python Libraries
        NumPy
        pandas
        matplotlib
        IPython and Jupyter
        SciPy
        scikit-learn
        statsmodels
        Other Packages
    1.4 Installation and Setup
        Miniconda on Windows
        GNU/Linux
        Miniconda on macOS
        Installing Necessary Packages
        Integrated Development Environments and Text Editors
    1.5 Community and Conferences
    1.6 Navigating This Book
        Code Examples
        Data for Examples
        Import Conventions
2. Python Language Basics, IPython, and Jupyter Notebooks
    2.1 The Python Interpreter
    2.2 IPython Basics
        Running the IPython Shell
        Running the Jupyter Notebook
        Tab Completion
        Introspection
    2.3 Python Language Basics
        Language Semantics
        Scalar Types
        Control Flow
    2.4 Conclusion
3. Built-In Data Structures, Functions, and Files
    3.1 Data Structures and Sequences
        Tuple
        List
        Dictionary
        Set
        Built-In Sequence Functions
        List, Set, and Dictionary Comprehensions
    3.2 Functions
        Namespaces, Scope, and Local Functions
        Returning Multiple Values
        Functions Are Objects
        Anonymous (Lambda) Functions
        Generators
        Errors and Exception Handling
    3.3 Files and the Operating System
        Bytes and Unicode with Files
    3.4 Conclusion
4. NumPy Basics: Arrays and Vectorized Computation
    4.1 The NumPy ndarray: A Multidimensional Array Object
        Creating ndarrays
        Data Types for ndarrays
        Arithmetic with NumPy Arrays
        Basic Indexing and Slicing
        Boolean Indexing
        Fancy Indexing
        Transposing Arrays and Swapping Axes
    4.2 Pseudorandom Number Generation
    4.3 Universal Functions: Fast Element-Wise Array Functions
    4.4 Array-Oriented Programming with Arrays
        Expressing Conditional Logic as Array Operations
        Mathematical and Statistical Methods
        Methods for Boolean Arrays
        Sorting
        Unique and Other Set Logic
    4.5 File Input and Output with Arrays
    4.6 Linear Algebra
    4.7 Example: Random Walks
        Simulating Many Random Walks at Once
    4.8 Conclusion
5. Getting Started with pandas
    5.1 Introduction to pandas Data Structures
        Series
        DataFrame
        Index Objects
    5.2 Essential Functionality
        Reindexing
        Dropping Entries from an Axis
        Indexing, Selection, and Filtering
        Arithmetic and Data Alignment
        Function Application and Mapping
        Sorting and Ranking
        Axis Indexes with Duplicate Labels
    5.3 Summarizing and Computing Descriptive Statistics
        Correlation and Covariance
        Unique Values, Value Counts, and Membership
    5.4 Conclusion
6. Data Loading, Storage, and File Formats
    6.1 Reading and Writing Data in Text Format
        Reading Text Files in Pieces
        Writing Data to Text Format
        Working with Other Delimited Formats
        JSON Data
        XML and HTML: Web Scraping
    6.2 Binary Data Formats
        Reading Microsoft Excel Files
        Using HDF5 Format
    6.3 Interacting with Web APIs
    6.4 Interacting with Databases
    6.5 Conclusion
7. Data Cleaning and Preparation
    7.1 Handling Missing Data
        Filtering Out Missing Data
        Filling In Missing Data
    7.2 Data Transformation
        Removing Duplicates
        Transforming Data Using a Function or Mapping
        Replacing Values
        Renaming Axis Indexes
        Discretization and Binning
        Detecting and Filtering Outliers
        Permutation and Random Sampling
        Computing Indicator/Dummy Variables
    7.3 Extension Data Types
    7.4 String Manipulation
        Python Built-In String Object Methods
        Regular Expressions
        String Functions in pandas
    7.5 Categorical Data
        Background and Motivation
        Categorical Extension Type in pandas
        Computations with Categoricals
        Categorical Methods
    7.6 Conclusion
8. Data Wrangling: Join, Combine, and Reshape
    8.1 Hierarchical Indexing
        Reordering and Sorting Levels
        Summary Statistics by Level
        Indexing with a DataFrame’s columns
    8.2 Combining and Merging Datasets
        Database-Style DataFrame Joins
        Merging on Index
        Concatenating Along an Axis
        Combining Data with Overlap
    8.3 Reshaping and Pivoting
        Reshaping with Hierarchical Indexing
        Pivoting “Long” to “Wide” Format
        Pivoting “Wide” to “Long” Format
    8.4 Conclusion
9. Plotting and Visualization
    9.1 A Brief matplotlib API Primer
        Figures and Subplots
        Colors, Markers, and Line Styles
        Ticks, Labels, and Legends
        Annotations and Drawing on a Subplot
        Saving Plots to File
        matplotlib Configuration
    9.2 Plotting with pandas and seaborn
        Line Plots
        Bar Plots
        Histograms and Density Plots
        Scatter or Point Plots
        Facet Grids and Categorical Data
    9.3 Other Python Visualization Tools
    9.4 Conclusion
10. Data Aggregation and Group Operations
    10.1 How to Think About Group Operations
        Iterating over Groups
        Selecting a Column or Subset of Columns
        Grouping with Dictionaries and Series
        Grouping with Functions
        Grouping by Index Levels
    10.2 Data Aggregation
        Column-Wise and Multiple Function Application
        Returning Aggregated Data Without Row Indexes
    10.3 Apply: General split-apply-combine
        Suppressing the Group Keys
        Quantile and Bucket Analysis
        Example: Filling Missing Values with Group-Specific Values
        Example: Random Sampling and Permutation
        Example: Group Weighted Average and Correlation
        Example: Group-Wise Linear Regression
    10.4 Group Transforms and “Unwrapped” GroupBys
    10.5 Pivot Tables and Cross-Tabulation
        Cross-Tabulations: Crosstab
    10.6 Conclusion
11. Time Series
    11.1 Date and Time Data Types and Tools
        Converting Between String and Datetime
    11.2 Time Series Basics
        Indexing, Selection, Subsetting
        Time Series with Duplicate Indices
    11.3 Date Ranges, Frequencies, and Shifting
        Generating Date Ranges
        Frequencies and Date Offsets
        Shifting (Leading and Lagging) Data
    11.4 Time Zone Handling
        Time Zone Localization and Conversion
        Operations with Time Zone-Aware Timestamp Objects
        Operations Between Different Time Zones
    11.5 Periods and Period Arithmetic
        Period Frequency Conversion
        Quarterly Period Frequencies
        Converting Timestamps to Periods (and Back)
        Creating a PeriodIndex from Arrays
    11.6 Resampling and Frequency Conversion
        Downsampling
        Upsampling and Interpolation
        Resampling with Periods
        Grouped Time Resampling
    11.7 Moving Window Functions
        Exponentially Weighted Functions
        Binary Moving Window Functions
        User-Defined Moving Window Functions
    11.8 Conclusion
12. Introduction to Modeling Libraries in Python
    12.1 Interfacing Between pandas and Model Code
    12.2 Creating Model Descriptions with Patsy
        Data Transformations in Patsy Formulas
        Categorical Data and Patsy
    12.3 Introduction to statsmodels
        Estimating Linear Models
        Estimating Time Series Processes
    12.4 Introduction to scikit-learn
    12.5 Conclusion
13. Data Analysis Examples
    13.1 Bitly Data from 1.USA.gov
        Counting Time Zones in Pure Python
        Counting Time Zones with pandas
    13.2 MovieLens 1M Dataset
        Measuring Rating Disagreement
    13.3 US Baby Names 1880–2010
        Analyzing Naming Trends
    13.4 USDA Food Database
    13.5 2012 Federal Election Commission Database
        Donation Statistics by Occupation and Employer
        Bucketing Donation Amounts
        Donation Statistics by State
    13.6 Conclusion
A. Advanced NumPy
    A.1 ndarray Object Internals
        NumPy Data Type Hierarchy
    A.2 Advanced Array Manipulation
        Reshaping Arrays
        C Versus FORTRAN Order
        Concatenating and Splitting Arrays
        Repeating Elements: tile and repeat
        Fancy Indexing Equivalents: take and put
    A.3 Broadcasting
        Broadcasting over Other Axes
        Setting Array Values by Broadcasting
    A.4 Advanced ufunc Usage
        ufunc Instance Methods
        Writing New ufuncs in Python
    A.5 Structured and Record Arrays
        Nested Data Types and Multidimensional Fields
        Why Use Structured Arrays?
    A.6 More About Sorting
        Indirect Sorts: argsort and lexsort
        Alternative Sort Algorithms
        Partially Sorting Arrays
        numpy.searchsorted: Finding Elements in a Sorted Array
    A.7 Writing Fast NumPy Functions with Numba
        Creating Custom numpy.ufunc Objects with Numba
    A.8 Advanced Array Input and Output
        Memory-Mapped Files
        HDF5 and Other Array Storage Options
    A.9 Performance Tips
        The Importance of Contiguous Memory
B. More on the IPython System
    B.1 Terminal Keyboard Shortcuts
    B.2 About Magic Commands
        The %run Command
        Executing Code from the Clipboard
    B.3 Using the Command History
        Searching and Reusing the Command History
        Input and Output Variables
    B.4 Interacting with the Operating System
        Shell Commands and Aliases
        Directory Bookmark System
    B.5 Software Development Tools
        Interactive Debugger
        Timing Code: %time and %timeit
        Basic Profiling: %prun and %run -p
        Profiling a Function Line by Line
    B.6 Tips for Productive Code Development Using IPython
        Reloading Module Dependencies
        Code Design Tips
    B.7 Advanced IPython Features
        Profiles and Configuration
    B.8 Conclusion
Index
Preface
The first edition of this book was published in 2012, during a time when open source
data analysis libraries for Python, especially pandas, were very new and developing
rapidly. When the time came to write the second edition in 2016 and 2017, I needed
to update the book not only for Python 3.6 (the first edition used Python 2.7) but also
for the many changes in pandas that had occurred over the previous five years. Now
in 2022, there are fewer Python language changes (we are now at Python 3.10, with
3.11 coming out at the end of 2022), but pandas has continued to evolve.
In this third edition, my goal is to bring the content up to date with current versions
of Python, NumPy, pandas, and other projects, while also remaining relatively
conservative about discussing newer Python projects that have appeared in the last few
years. Since this book has become an important resource for many university courses
and working professionals, I will try to avoid topics that are at risk of falling out of
date within a year or two. That way paper copies won’t be too difficult to follow in
2023 or 2024 or beyond.
A new feature of the third edition is the open access online version hosted on my
website at https://wesmckinney.com/book, to serve as a resource and convenience for
owners of the print and digital editions. I intend to keep the content reasonably up to
date there, so if you own the paper book and run into something that doesn’t work
properly, you should check there for the latest content changes.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program
elements such as variable or function names, databases, data types, environment
variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values
determined by context.
This element signifies a tip or suggestion.
This element signifies a general note.
This element indicates a warning or caution.
Using Code Examples
You can find data files and related material for each chapter in this book’s GitHub
repository at https://github.com/wesm/pydata-book, which is mirrored to Gitee (for
those who cannot access GitHub) at https://gitee.com/wesmckinn/pydata-book.
This book is here to help you get your job done. In general, if example code is
offered with this book, you may use it in your programs and documentation. You
do not need to contact us for permission unless you’re reproducing a significant
portion of the code. For example, writing a program that uses several chunks of code
from this book does not require permission. Selling or distributing examples from
O’Reilly books does require permission. Answering a question by citing this book
and quoting example code does not require permission. Incorporating a significant
amount of example code from this book into your product’s documentation does
require permission.
We appreciate, but do not require, attribution. An attribution usually includes the
title, author, publisher, and ISBN. For example: “Python for Data Analysis by Wes
McKinney (O’Reilly). Copyright 2022 Wes McKinney, 978-1-098-10403-0.”
If you feel your use of code examples falls outside fair use or the permission given
above, feel free to contact us at permissions@oreilly.com.
O’Reilly Online Learning
For more than 40 years, O’Reilly Media has provided technology
and business training, knowledge, and insight to help
companies succeed.
Our unique network of experts and innovators share their knowledge and expertise
through books, articles, and our online learning platform. O’Reilly’s online learning
platform gives you on-demand access to live training courses, in-depth learning
paths, interactive coding environments, and a vast collection of text and video from
O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional
information. You can access this page at https://oreil.ly/python-data-analysis-3e.
Email bookquestions@oreilly.com to comment or ask technical questions about this
book.
For news and information about our books and courses, visit http://oreilly.com.
Find us on LinkedIn: https://linkedin.com/company/oreilly-media.
Follow us on Twitter: http://twitter.com/oreillymedia.
Watch us on YouTube: http://youtube.com/oreillymedia.
Acknowledgments
This work is the product of many years of fruitful discussions and collaborations
with, and assistance from, many people around the world. I’d like to thank a few of
them.
In Memoriam: John D. Hunter (1968–2012)
Our dear friend and colleague John D. Hunter passed away after a battle with colon
cancer on August 28, 2012. This was only a short time after I’d completed the final
manuscript for this book’s first edition.
John’s impact and legacy in the Python scientific and data communities would be
hard to overstate. In addition to developing matplotlib in the early 2000s (a time
when Python was not nearly so popular), he helped shape the culture of a critical
generation of open source developers who’ve become pillars of the Python ecosystem
that we now often take for granted.
I was lucky enough to connect with John early in my open source career in January
2010, just after releasing pandas 0.1. His inspiration and mentorship helped me push
forward, even in the darkest of times, with my vision for pandas and Python as a
first-class data analysis language.
John was very close with Fernando Pérez and Brian Granger, pioneers of IPython,
Jupyter, and many other initiatives in the Python community. We had hoped to work
on a book together, the four of us, but I ended up being the one with the most free
time. I am sure he would be proud of what we’ve accomplished, as individuals and as
a community, over the last nine years.
Acknowledgments for the Third Edition (2022)
It has been more than a decade since I started writing the first edition of this book and
more than 15 years since I originally started my journey as a Python programmer.
A lot has changed since then! Python has evolved from a relatively niche language
for data analysis to the most popular and most widely used language powering
the plurality (if not the majority!) of data science, machine learning, and artificial
intelligence work.
I have not been an active contributor to the pandas open source project since 2013,
but its worldwide developer community has continued to thrive, serving as a model
of community-centric open source software development. Many “next-generation”
Python projects that deal with tabular data are modeling their user interfaces directly
after pandas, so the project has proved to have an enduring influence on the future
trajectory of the Python data science ecosystem.
I hope that this book continues to serve as a valuable resource for students and
individuals who want to learn about working with data in Python.
I’m especially thankful to O’Reilly for allowing me to publish an “open access” version
of this book on my website at https://wesmckinney.com/book, where I hope it will
reach even more people and help expand opportunity in the world of data analysis.
J.J. Allaire was a lifesaver in making this possible by helping me “port” the book from
Docbook XML to Quarto, a wonderful new scientific and technical publishing system
for print and web.
Special thanks to my technical reviewers Paul Barry, Jean-Christophe Leyder,
Abdullah Karasan, and William Jamir, whose thorough feedback has greatly improved the
readability, clarity, and understandability of the content.
Acknowledgments for the Second Edition (2017)
It has been five years almost to the day since I completed the manuscript for
this book’s first edition in July 2012. A lot has changed. The Python community
has grown immensely, and the ecosystem of open source software around it has
flourished.
This new edition of the book would not exist if not for the tireless efforts of the
pandas core developers, who have grown the project and its user community into
one of the cornerstones of the Python data science ecosystem. These include, but are
not limited to, Tom Augspurger, Joris van den Bossche, Chris Bartak, Phillip Cloud,
gfyoung, Andy Hayden, Masaaki Horikoshi, Stephan Hoyer, Adam Klein, Wouter
Overmeire, Jeff Reback, Chang She, Skipper Seabold, Jeff Tratner, and y-p.
On the actual writing of this second edition, I would like to thank the O’Reilly staff
who helped me patiently with the writing process. This includes Marie Beaugureau,
Ben Lorica, and Colleen Toporek. I again had outstanding technical reviewers with
Tom Augspurger, Paul Barry, Hugh Brown, Jonathan Coe, and Andreas Müller
contributing. Thank you.
This book’s first edition has been translated into many foreign languages, including
Chinese, French, German, Japanese, Korean, and Russian. Translating all this content
and making it available to a broader audience is a huge and often thankless effort.
Thank you for helping more people in the world learn how to program and use data
analysis tools.
I am also lucky to have had support for my continued open source development
efforts from Cloudera and Two Sigma Investments over the last few years. With open
source software projects more thinly resourced than ever relative to the size of user
bases, it is becoming increasingly important for businesses to provide support for
development of key open source projects. It’s the right thing to do.
Acknowledgments for the First Edition (2012)
It would have been difficult for me to write this book without the support of a large
number of people.
On the O’Reilly staff, I’m very grateful for my editors, Meghan Blanchette and Julie
Steele, who guided me through the process. Mike Loukides also worked with me in
the proposal stages and helped make the book a reality.
I received a wealth of technical review from a large cast of characters. In particular,
Martin Blais and Hugh Brown were incredibly helpful in improving the book’s
examples, clarity, and organization from cover to cover. James Long, Drew Conway,
Fernando Pérez, Brian Granger, Thomas Kluyver, Adam Klein, Josh Klein, Chang
She, and Stéfan van der Walt each reviewed one or more chapters, providing pointed
feedback from many different perspectives.
I got many great ideas for examples and datasets from friends and colleagues in the
data community, among them: Mike Dewar, Jeff Hammerbacher, James Johndrow,
Kristian Lum, Adam Klein, Hilary Mason, Chang She, and Ashley Williams.
I am of course indebted to the many leaders in the open source scientific Python
community who’ve built the foundation for my development work and gave
encouragement while I was writing this book: the IPython core team (Fernando Pérez,
Brian Granger, Min Ragan-Kelly, Thomas Kluyver, and others), John Hunter, Skipper
Seabold, Travis Oliphant, Peter Wang, Eric Jones, Robert Kern, Josef Perktold,
Francesc Alted, Chris Fonnesbeck, and too many others to mention. Several other people
provided a great deal of support, ideas, and encouragement along the way: Drew
Conway, Sean Taylor, Giuseppe Paleologo, Jared Lander, David Epstein, John Krowas,
Joshua Bloom, Den Pilsworth, John Myles-White, and many others I’ve forgotten.
I’d also like to thank a number of people from my formative years. First, my former
AQR colleagues who’ve cheered me on in my pandas work over the years: Alex Reyfman,
Michael Wong, Tim Sargen, Oktay Kurbanov, Matthew Tschantz, Roni Israelov,
Michael Katz, Ari Levine, Chris Uga, Prasad Ramanan, Ted Square, and Hoon Kim.
Lastly, my academic advisors Haynes Miller (MIT) and Mike West (Duke).
I received significant help from Phillip Cloud and Joris van den Bossche in 2014 to
update the book’s code examples and fix some other inaccuracies due to changes in
pandas.
On the personal side, Casey provided invaluable day-to-day support during the
writing process, tolerating my highs and lows as I hacked together the final draft on
top of an already overcommitted schedule. Lastly, my parents, Bill and Kim, taught
me to always follow my dreams and to never settle for less.
CHAPTER 1
Preliminaries
1.1 What Is This Book About?
This book is concerned with the nuts and bolts of manipulating, processing, cleaning,
and crunching data in Python. My goal is to offer a guide to the parts of the Python
programming language and its data-oriented library ecosystem and tools that will
equip you to become an effective data analyst. While “data analysis” is in the title
of the book, the focus is specifically on Python programming, libraries, and tools as
opposed to data analysis methodology. This is the Python programming you need for
data analysis.
Sometime after I originally published this book in 2012, people started using the
term data science as an umbrella description for everything from simple descriptive
statistics to more advanced statistical analysis and machine learning. The Python
open source ecosystem for doing data analysis (or data science) has also expanded
significantly since then. There are now many other books which focus specifically on
these more advanced methodologies. My hope is that this book serves as adequate
preparation to enable you to move on to a more domain-specific resource.
Some might characterize much of the content of the book as “data
manipulation” as opposed to “data analysis.” We also use the terms
wrangling or munging to refer to data manipulation.
What Kinds of Data?
When I say “data,” what am I referring to exactly? The primary focus is on structured
data, a deliberately vague term that encompasses many different common forms of
data, such as:
• Tabular or spreadsheet-like data in which each column may be a different type
(string, numeric, date, or otherwise). This includes most kinds of data commonly
stored in relational databases or tab- or comma-delimited text files.
• Multidimensional arrays (matrices).
• Multiple tables of data interrelated by key columns (what would be primary or
foreign keys for a SQL user).
• Evenly or unevenly spaced time series.
This is by no means a complete list. Even though it may not always be obvious, a
large percentage of datasets can be transformed into a structured form that is more
suitable for analysis and modeling. If not, it may be possible to extract features from
a dataset into a structured form. As an example, a collection of news articles could
be processed into a word frequency table, which could then be used to perform
sentiment analysis.
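As a rough illustration of the news-article example, the sketch below turns a few made-up headlines into a word frequency table using only the standard library. The sample texts and the word_frequencies helper are hypothetical and are not taken from the book's materials.

```python
import re
from collections import Counter

# Hypothetical sample "articles"; real text would come from files or an API.
articles = [
    "Markets rally as tech stocks surge",
    "Tech stocks slide after weak earnings",
    "Earnings season lifts markets",
]


def word_frequencies(texts):
    """Count lowercase word occurrences across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


freq = word_frequencies(articles)

# Each (word, count) pair is one row of a structured, tabular dataset.
for word, count in freq.most_common(3):
    print(word, count)
```

The resulting counts are already in a structured form (one row per word), which is the kind of input that later modeling steps typically expect.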
Most users of spreadsheet programs like Microsoft Excel, perhaps the most widely
used data analysis tool in the world, will not be strangers to these kinds of data.
1.2 Why Python for Data Analysis?
For many people, the Python programming language has strong appeal. Since its
first appearance in 1991, Python has become one of the most popular interpreted
programming languages, along with Perl, Ruby, and others. Python and Ruby have
become especially popular since 2005 or so for building websites using their numerous
web frameworks, like Rails (Ruby) and Django (Python). Such languages are
often called scripting languages, as they can be used to quickly write small programs,
or scripts, to automate other tasks. I don’t like the term “scripting languages,” as it
carries a connotation that they cannot be used for building serious software. Among
interpreted languages, for various historical and cultural reasons, Python has developed
a large and active scientific computing and data analysis community. In the last
20 years, Python has gone from a bleeding-edge or “at your own risk” scientific computing
language to one of the most important languages for data science, machine
learning, and general software development in academia and industry.
For data analysis and interactive computing and data visualization, Python will inevitably
draw comparisons with other open source and commercial programming languages
and tools in wide use, such as R, MATLAB, SAS, Stata, and others. In recent
years, Python’s improved open source libraries (such as pandas and scikit-learn) have
made it a popular choice for data analysis tasks. Combined with Python’s overall
strength for general-purpose software engineering, it is an excellent option as a
primary language for building data applications.
Python as Glue
Part of Python’s success in scientific computing is the ease of integrating C, C++,
and FORTRAN code. Most modern computing environments share a similar set of
legacy FORTRAN and C libraries for doing linear algebra, optimization, integration,
fast Fourier transforms, and other such algorithms. The same story has held true for
many companies and national labs that have used Python to glue together decades’
worth of legacy software.
Many programs consist of small portions of code where most of the time is spent,
with large amounts of “glue code” that doesn’t run often. In many cases, the execution
time of the glue code is insignificant; effort is most fruitfully invested in optimizing
the computational bottlenecks, sometimes by moving the code to a lower-level language
like C.
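One rough way to find where a program spends its time, before deciding what to optimize, is the standard library's cProfile module. In the sketch below, hot_loop and glue are hypothetical stand-ins for a computational bottleneck and the glue code around it.

```python
import cProfile
import io
import pstats


def hot_loop(n):
    """Stand-in for the small portion of code where most time is spent."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def glue(n):
    """Stand-in for glue code that merely calls the hot loop."""
    return {"result": hot_loop(n)}


profiler = cProfile.Profile()
profiler.enable()
out = glue(200_000)
profiler.disable()

# Write the five most expensive entries (by cumulative time) to a string.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

In a report like this, nearly all of the time attributed to glue should come from its call to hot_loop, which suggests that the loop, not the glue code, is the part worth optimizing.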
Solving the “Two-Language” Problem
In many organizations, it is common to research, prototype, and test new ideas using
a more specialized computing language like SAS or R and then later port those
ideas to be part of a larger production system written in, say, Java, C#, or C++.
What people are increasingly finding is that Python is a suitable language not only
for doing research and prototyping but also for building the production systems.
Why maintain two development environments when one will suffice? I believe that
more and more companies will go down this path, as there are often significant
organizational benefits to having both researchers and software engineers using the
same set of programming tools.
Over the last decade some new approaches to solving the “two-language” problem
have appeared, such as the Julia programming language. Getting the most out of
Python in many cases will require programming in a low-level language like C or
C++ and creating Python bindings to that code. That said, “just-in-time” (JIT) compiler
technology provided by libraries like Numba has provided a way to achieve
excellent performance in many computational algorithms without having to leave the
Python programming environment.
Why Not Python?
While Python is an excellent environment for building many kinds of analytical
applications and general-purpose systems, there are a number of uses for which
Python may be less suitable.
As Python is an interpreted programming language, in general most Python code
will run substantially slower than code written in a compiled language like Java or
C++. As programmer time is often more valuable than CPU time, many are happy to
make this trade-off. However, in an application with very low latency or demanding
resource utilization requirements (e.g., a high-frequency trading system), the time
spent programming in a lower-level (but also lower-productivity) language like C++
to achieve the maximum possible performance might be time well spent.
Python can be a challenging language for building highly concurrent, multithreaded
applications, particularly applications with many CPU-bound threads. The reason for
this is that it has what is known as the global interpreter lock (GIL), a mechanism that
prevents the interpreter from executing more than one Python instruction at a time.
The technical reasons for why the GIL exists are beyond the scope of this book. While
it is true that in many big data processing applications, a cluster of computers may be
required to process a dataset in a reasonable amount of time, there are still situations
where a single-process, multithreaded system is desirable.
This is not to say that Python cannot execute truly multithreaded, parallel code.
Python C extensions that use native multithreading (in C or C++) can run code in
parallel without being impacted by the GIL, as long as they do not need to regularly
interact with Python objects.
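To make the GIL discussion more concrete, here is a minimal sketch that runs a CPU-bound, pure-Python function both sequentially and in a thread pool. The count_primes function is a made-up workload. On an interpreter with the GIL, the threaded version produces the same results but should not be expected to run much faster, because only one thread executes Python bytecode at a time.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


limits = [2_000, 3_000, 4_000, 5_000]

# Sequential baseline.
sequential = [count_primes(limit) for limit in limits]

# Same work spread across threads. The results are identical; with the GIL
# the threads take turns rather than running Python code in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(count_primes, limits))

print(threaded == sequential)
```

Threads remain useful for I/O-bound work, where they spend most of their time waiting rather than executing Python instructions.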
1.3 Essential Python Libraries
For those who are less familiar with the Python data ecosystem and the libraries used
throughout the book, I will give a brief overview of some of them.
NumPy
NumPy, short for Numerical Python, has long been a cornerstone of numerical
computing in Python. It provides the data structures, algorithms, and library glue
needed for most scientific applications involving numerical data in Python. NumPy
contains, among other things:
• A fast and efficient multidimensional array object ndarray
• Functions for performing element-wise computations with arrays or mathematical
operations between arrays
• Tools for reading and writing array-based datasets to disk
• Linear algebra operations, Fourier transform, and random number generation
• A mature C API to enable Python extensions and native C or C++ code to access
NumPy’s data structures and computational facilities
Beyond the fast array-processing capabilities that NumPy adds to Python, one of
its primary uses in data analysis is as a container for data to be passed between
algorithms and libraries. For numerical data, NumPy arrays are more efficient for
storing and manipulating data than the other built-in Python data structures. Also,
libraries written in a lower-level language, such as C or FORTRAN, can operate on
the data stored in a NumPy array without copying data into some other memory
representation. Thus, many numerical computing tools for Python either assume
NumPy arrays as a primary data structure or else target interoperability with NumPy.
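A small sketch of two of these points, using made-up values: element-wise computation without an explicit loop, and access to array data without copying. The example uses a NumPy slice (a view) to show shared memory; it only illustrates the no-copy idea and does not call into any C or FORTRAN library.

```python
import numpy as np

values = [1.5, 2.0, 3.5, 4.0]

# Element-wise arithmetic on a list requires an explicit Python loop.
doubled_list = [v * 2 for v in values]

# The same computation on an ndarray is a single vectorized expression.
arr = np.array(values)
doubled_arr = arr * 2

# Basic slicing returns a view on the same memory rather than a copy,
# so a change made through the view is visible in the original array.
view = arr[1:3]
view[0] = 10.0
print(arr)
print(np.shares_memory(arr, view))
```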
pandas
pandas provides high-level data structures and functions designed to make working
with structured or tabular data intuitive and flexible. Since its emergence in 2010, it
has helped enable Python to be a powerful and productive data analysis environment.
The primary objects in pandas that will be used in this book are the DataFrame, a
tabular, column-oriented data structure with both row and column labels, and the
Series, a one-dimensional labeled array object.
pandas blends the array-computing ideas of NumPy with the kinds of data manipulation
capabilities found in spreadsheets and relational databases (such as SQL). It
provides convenient indexing functionality to enable you to reshape, slice and dice,
perform aggregations, and select subsets of data. Since data manipulation, preparation,
and cleaning are such important skills in data analysis, pandas is one of the
primary focuses of this book.
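The following sketch, built on made-up sales figures, shows the two primary objects and a few of the operations mentioned above: selecting a column (which gives a Series), selecting a subset of rows, and performing an aggregation.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame(
    {
        "city": ["Austin", "Boston", "Austin", "Boston"],
        "year": [2021, 2021, 2022, 2022],
        "sales": [10.0, 12.5, 11.0, 14.0],
    }
)

# Selecting one column of a DataFrame yields a Series.
sales = df["sales"]

# Boolean indexing selects a subset of rows.
austin = df[df["city"] == "Austin"]

# groupby plus an aggregation, similar in spirit to SQL's GROUP BY.
total_by_city = df.groupby("city")["sales"].sum()
print(total_by_city)
```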
As a bit of background, I started building pandas in early 2008 during my tenure at
AQR Capital Management, a quantitative investment management firm. At the time,
I had a distinct set of requirements that were not well addressed by any single tool at
my disposal:
• Data structures with labeled axes supporting automatic or explicit data alignment;
this prevents common errors resulting from misaligned data and working
with differently indexed data coming from different sources
• Integrated time series functionality
• The same data structures handle both time series data and non-time series data
• Arithmetic operations and reductions that preserve metadata
• Flexible handling of missing data
• Merge and other relational operations found in popular databases (SQL-based,
for example)
I wanted to be able to do all of these things in one place, preferably in a language
well suited to general-purpose software development. Python was a good candidate
language for this, but at that time an integrated set of data structures and tools
providing this functionality did not exist. As a result of having been built initially
to solve finance and business analytics problems, pandas features especially deep
time series functionality and tools well suited for working with time-indexed data
generated by business processes.
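Two of the requirements listed above, automatic data alignment and flexible handling of missing data, can be sketched with a pair of small Series. The labels and values here are made up for illustration.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Arithmetic aligns on index labels, not on position.
total = s1 + s2

# Labels present in only one operand become missing values (NaN).
print(total)

# Missing data can then be handled explicitly, here by treating gaps as 0.
filled = s1.add(s2, fill_value=0)
print(filled)
```

Because alignment happens by label, differently ordered or partially overlapping data from separate sources combine without manual bookkeeping.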
I spent a large part of 2011 and 2012 expanding pandas’s capabilities with some of
my former AQR colleagues, Adam Klein and Chang She. In 2013, I stopped being
as involved in day-to-day project development, and pandas has since become a fully
community-owned and community-maintained project with well over two thousand
unique contributors around the world.
For users of the R language for statistical computing, the DataFrame name will be
familiar, as the object was named after the similar R data.frame object. Unlike
Python, data frames are built into the R programming language and its standard
library. As a result, many features found in pandas are typically either part of the R
core implementation or provided by add-on packages.
The pandas name itself is derived from panel data, an econometrics term for multidimensional
structured datasets, and a play on the phrase Python data analysis.
matplotlib
matplotlib is the most popular Python library for producing plots and other two-
dimensional data visualizations. It was originally created by John D. Hunter and
is now maintained by a large team of developers. It is designed for creating plots
suitable for publication. While there are other visualization libraries available to
Python programmers, matplotlib is still widely used and integrates reasonably well
with the rest of the ecosystem. I think it is a safe choice as a default visualization tool.
IPython and Jupyter
The IPython project began in 2001 as Fernando Pérez’s side project to make a
better interactive Python interpreter. Over the subsequent 20 years it has become
one of the most important tools in the modern Python data stack. While it does
not provide any computational or data analytical tools by itself, IPython is designed
for both interactive computing and software development work. It encourages an
execute-explore workflow instead of the typical edit-compile-run workflow of many
other programming languages. It also provides integrated access to your operating
system’s shell and filesystem; this reduces the need to switch between a terminal
window and a Python session in many cases. Since much of data analysis coding
involves exploration, trial and error, and iteration, IPython can help you get the job
done faster.
In 2014, Fernando and the IPython team announced the Jupyter project, a broader
initiative to design language-agnostic interactive computing tools. The IPython web
notebook became the Jupyter notebook, with support now for over 40 programming
languages. The IPython system can now be used as a kernel (a programming language
mode) for using Python with Jupyter.
IPython itself has become a component of the much broader Jupyter open source
project, which provides a productive environment for interactive and exploratory
computing. Its oldest and simplest “mode” is as an enhanced Python shell designed
to accelerate the writing, testing, and debugging of Python code. You can also use the
IPython system through the Jupyter notebook.
The Jupyter notebook system also allows you to author content in Markdown and
HTML, providing you a means to create rich documents with code and text.
I personally use IPython and Jupyter regularly in my Python work, whether running,
debugging, or testing code.
In the accompanying book materials on GitHub, you will find Jupyter notebooks
containing all the code examples from each chapter. If you cannot access GitHub
where you are, you can try the mirror on Gitee.
SciPy
SciPy is a collection of packages addressing a number of foundational problems in
scientific computing. Here are some of the tools it contains in its various modules:
scipy.integrate
Numerical integration routines and differential equation solvers
scipy.linalg
Linear algebra routines and matrix decompositions extending beyond those provided
in numpy.linalg
scipy.optimize
Function optimizers (minimizers) and root finding algorithms
scipy.signal
Signal processing tools
scipy.sparse
Sparse matrices and sparse linear system solvers
scipy.special
Wrapper around SPECFUN, a FORTRAN library implementing many common
mathematical functions, such as the gamma function
scipy.stats
Standard continuous and discrete probability distributions (density functions,
samplers, cumulative distribution functions), various statistical tests, and more
descriptive statistics
Together, NumPy and SciPy form a reasonably complete and mature computational
foundation for many traditional scientific computing applications.
scikit-learn
Since the project’s inception in 2007, scikit-learn has become the premier general-
purpose machine learning toolkit for Python programmers. As of this writing, more
than two thousand different individuals have contributed code to the project. It
includes submodules for such models as:
• Classification: SVM, nearest neighbors, random forest, logistic regression, etc.
• Regression: Lasso, ridge regression, etc.
• Clustering: k-means, spectral clustering, etc.
• Dimensionality reduction: PCA, feature selection, matrix factorization, etc.
• Model selection: Grid search, cross-validation, metrics
• Preprocessing: Feature extraction, normalization
•
Along with pandas, statsmodels, and IPython, scikit-learn has been critical for enabling
Python to be a productive data science programming language. While I won’t
be able to include a comprehensive guide to scikit-learn in this book, I will give a
brief introduction to some of its models and how to use them with the other tools
presented in the book.
statsmodels
statsmodels is a statistical analysis package that was seeded by work from Stanford
University statistics professor Jonathan Taylor, who implemented a number of regression
analysis models popular in the R programming language. Skipper Seabold and
Josef Perktold formally created the new statsmodels project in 2010 and since then
have grown the project to a critical mass of engaged users and contributors. Nathaniel
Smith developed the Patsy project, which provides a formula or model specification
framework for statsmodels inspired by R’s formula system.
Compared with scikit-learn, statsmodels contains algorithms for classical (primarily
frequentist) statistics and econometrics. This includes such submodules as:
• Regression models: linear regression, generalized linear models, robust linear
models, linear mixed effects models, etc.
• Analysis of variance (ANOVA)
• Time series analysis: AR, ARMA, ARIMA, VAR, and other models
• Nonparametric methods: Kernel density estimation, kernel regression
• Visualization of statistical model results
statsmodels is more focused on statistical inference, providing uncertainty estimates
and p-values for parameters. scikit-learn, by contrast, is more prediction focused.
As with scikit-learn, I will give a brief introduction to statsmodels and how to use it
with NumPy and pandas.
Other Packages
In 2022, there are many other Python libraries which might be discussed in a book
about data science. This includes some newer projects like TensorFlow or PyTorch,
which have become popular for machine learning or artificial intelligence work. Now
that there are other books out there that focus more specifically on those projects, I
would recommend using this book to build a foundation in general-purpose Python
data wrangling. Then, you should be well prepared to move on to a more advanced
resource that may assume a certain level of expertise.
1.4 Installation and Setup
Since everyone uses Python for different applications, there is no single solution for
setting up Python and obtaining the necessary add-on packages. Many readers will
not have a complete Python development environment suitable for following along
with this book, so here I will give detailed instructions to get set up on each operating
system. I will be using Miniconda, a minimal installation of the conda package
manager, along with conda-forge, a community-maintained software distribution
based on conda. This book uses Python 3.10 throughout, but if you’re reading in the
future, you are welcome to install a newer version of Python.
If for some reason these instructions become out-of-date by the time you are reading
this, you can check out my website for the book which I will endeavor to keep up to
date with the latest installation instructions.
Miniconda on Windows
To get started on Windows, download the Miniconda installer for the latest Python
version available (currently 3.9) from https://conda.io. I recommend following the
installation instructions for Windows available on the conda website, which may have
changed between the time this book was published and when you are reading this.
Most people will want the 64-bit version, but if that doesn’t run on your Windows
machine, you can install the 32-bit version instead.
When prompted whether to install for just yourself or for all users on your system,
choose the option that’s most appropriate for you. Installing just for yourself will be
sufficient to follow along with the book. It will also ask you whether you want to
add Miniconda to the system PATH environment variable. If you select this (I usually
do), then this Miniconda installation may override other versions of Python you have
installed. If you do not, then you will need to use the Windows Start menu shortcut
that’s installed to be able to use this Miniconda. This Start menu entry may be called
“Anaconda3 (64-bit).”
I’ll assume that you haven’t added Miniconda to your system PATH. To verify that
things are configured correctly, open the “Anaconda Prompt (Miniconda3)” entry
under “Anaconda3 (64-bit)” in the Start menu. Then try launching the Python interpreter
by typing python. You should see a message like this:
(base) C:\Users\Wes>python
Python 3.9 [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
To exit the Python shell, type exit() and press Enter.
GNU/Linux
Linux details will vary a bit depending on your Linux distribution type, but here I
give details for such distributions as Debian, Ubuntu, CentOS, and Fedora. Setup is
similar to macOS with the exception of how Miniconda is installed. Most readers will
want to download the default 64-bit installer file, which is for x86 architecture (but
it’s possible in the future more users will have aarch64-based Linux machines). The
installer is a shell script that must be executed in the terminal. After downloading it, you will have
a file named something similar to Miniconda3-latest-Linux-x86_64.sh. To install it,
execute this script with bash:
$ bash Miniconda3-latest-Linux-x86_64.sh
Some Linux distributions have all the required Python packages
(although outdated versions, in some cases) in their package managers,
and these can be installed using a tool like apt. The setup described
here uses Miniconda, as it’s both easily reproducible across distributions
and simpler to upgrade packages to their latest versions.
You will have a choice of where to put the Miniconda files. I recommend installing
the files in the default location in your home directory; for example,
/home/$USER/miniconda (with your username, naturally).
The installer will ask if you wish to modify your shell scripts to automatically activate
Miniconda. I recommend doing this (select “yes”) as a matter of convenience.
After completing the installation, start a new terminal process and verify that you are
picking up the new Miniconda installation:
(base) $ python
Python 3.9 | (main) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
To exit the Python shell, type exit() and press Enter or press Ctrl-D.
Miniconda on macOS
Download the macOS Miniconda installer, which should be named something
like Miniconda3-latest-MacOSX-arm64.sh for Apple Silicon-based macOS computers
released from 2020 onward, or Miniconda3-latest-MacOSX-x86_64.sh for Intel-based
Macs released before 2020. Open the Terminal application in macOS, and install by
executing the installer (most likely in your Downloads directory) with bash:
$ bash $HOME/Downloads/Miniconda3-latest-MacOSX-arm64.sh
When the installer runs, by default it automatically configures Miniconda in your
default shell profile, which is probably located at /Users/$USER/.zshrc. I recommend
letting it do this; if you do not want to allow
the installer to modify your default shell environment, you will need to consult the
Miniconda documentation to be able to proceed.
To verify everything is working, try launching Python in the system shell (open the
Terminal application to get a command prompt):
$ python
Python 3.9 (main) [Clang 12.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
To exit the shell, press Ctrl-D or type exit() and press Enter.
Installing Necessary Packages
Now that we have set up Miniconda on your system, it’s time to install the main
packages we will be using in this book. The first step is to configure conda-forge as
your default package channel by running the following commands in a shell:
(base) $ conda config --add channels conda-forge
(base) $ conda config --set channel_priority strict
Now, we will create a new conda “environment” with the conda create command
using Python 3.10:
(base) $ conda create -y -n pydata-book python=3.10
After the installation completes, activate the environment with conda activate:
(base) $ conda activate pydata-book
(pydata-book) $
It is necessary to use conda activate to activate your environment
each time you open a new terminal. You can see information about
the active conda environment at any time from the terminal by
running conda info.
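The active environment can also be checked from inside Python. The sketch below relies on the CONDA_DEFAULT_ENV environment variable, which conda sets on activation; outside conda it is absent, so a fallback string (chosen here purely for illustration) is used instead.

```python
import os
import sys

# Path of the interpreter that is actually running this code.
print("executable:", sys.executable)

# Root directory of the current Python installation or environment.
print("prefix:", sys.prefix)

# conda sets CONDA_DEFAULT_ENV when an environment is activated; it is
# absent outside conda, hence the fallback value.
env_name = os.environ.get("CONDA_DEFAULT_ENV", "<no conda environment active>")
print("conda environment:", env_name)

# The book targets Python 3.10; newer versions should also work.
version_ok = sys.version_info >= (3, 10)
print("Python >= 3.10:", version_ok)
```

If the printed environment name is base rather than pydata-book, the environment has probably not been activated in the current terminal.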
Now, we will install the essential packages used throughout the book (along with their
dependencies) with conda install:
(pydata-book) $ conda install -y pandas jupyter matplotlib
We will be using some other packages, too, but these can be installed later once they
are needed. There are two ways to install packages: with conda install and with
pip install. conda install should always be preferred when using Miniconda, but
some packages are not available through conda, so if conda install $package_name
fails, try pip install $package_name.
If you want to install all of the packages used in the rest of the
book, you can do that now by running:
conda install lxml beautifulsoup4 html5lib openpyxl \
requests sqlalchemy seaborn scipy statsmodels \
patsy scikit-learn pyarrow pytables numba
On Windows, substitute a caret ^ for the line continuation \ used
on Linux and macOS.
You can update packages by using the conda update command:
conda update package_name
pip also supports upgrades using the --upgrade flag:
pip install --upgrade package_name
You will have several opportunities to try out these commands throughout the book.
While you can use both conda and pip to install packages, you
should avoid updating packages originally installed with conda
using pip (and vice versa), as doing so can lead to environment
problems. I recommend sticking to conda if you can and falling
back on pip only for packages that are unavailable with conda
install.
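To see which packages are present in the active environment, regardless of whether conda or pip installed them, one option is the standard library's importlib.metadata module. This is a minimal sketch; the installed_version helper is hypothetical, and it assumes each package ships standard distribution metadata, which is usually but not always the case.

```python
from importlib import metadata

# Distribution names as used by pip and conda; a subset of the book's packages.
packages = ["pandas", "matplotlib", "jupyter"]


def installed_version(name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


status = {name: installed_version(name) for name in packages}
for name, version in status.items():
    print(f"{name}: {version or 'not installed'}")
```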
Integrated Development Environments and Text Editors
When asked about my standard development environment, I almost always say “IPython
plus a text editor.” I typically write a program and iteratively test and debug each
piece of it in IPython or Jupyter notebooks. It is also useful to be able to play around
with data interactively and visually verify that a particular set of data manipulations is
doing the right thing. Libraries like pandas and NumPy are designed to be productive
to use in the shell.
When building software, however, some users may prefer to use a more richly
featured integrated development environment (IDE) rather than an editor like
Emacs or Vim, which provide a more minimal environment out of the box. Here are
some that you can explore:
• PyDev (free), an IDE built on the Eclipse platform
• PyCharm from JetBrains (subscription-based for commercial users, free for open
source developers)
• Python Tools for Visual Studio (for Windows users)
• Spyder (free), an IDE currently shipped with Anaconda
• Komodo IDE (commercial)
Due to the popularity of Python, most text editors, like VS Code and Sublime Text 2,
have excellent Python support.
1.5 Community and Conferences
Outside of an internet search, the various scientific and data-related Python mailing
lists are generally helpful and responsive to questions. Some to take a look at include:
• pydata: A Google Group list for questions related to Python for data analysis and
pandas
• pystatsmodels: For statsmodels or pandas-related questions
• Mailing list for scikit-learn (scikit-learn@python.org) and machine learning in
Python, generally
• numpy-discussion: For NumPy-related questions
• scipy-user: For general SciPy or scientific Python questions
I deliberately did not post URLs for these in case they change. They can be easily
located via an internet search.
Each year many conferences are held all over the world for Python programmers.
If you would like to connect with other Python programmers who share your interests,
I encourage you to explore attending one, if possible. Many conferences have
financial support available for those who cannot afford admission or travel to the
conference. Here are some to consider:
• PyCon and EuroPython: The two main general Python conferences in North
America and Europe, respectively
• SciPy and EuroSciPy: Scientific-computing-oriented conferences in North America
and Europe, respectively
• PyData: A worldwide series of regional conferences targeted at data science and
data analysis use cases
• International and regional PyCon conferences (see https://pycon.org for a complete
listing)
1.6 Navigating This Book
If you have never programmed in Python before, you will want to spend some time
in Chapters 2 and 3, where I have placed a condensed tutorial on Python language
features and the IPython shell and Jupyter notebooks. These things are prerequisite
knowledge for the remainder of the book. If you have Python experience already, you
may instead choose to skim or skip these chapters.
Next, I give a short introduction to the key features of NumPy, leaving more
advanced NumPy use for Appendix A. Then, I introduce pandas and devote the
rest of the book to data analysis topics applying pandas, NumPy, and matplotlib
(for visualization). I have structured the material in an incremental fashion, though
there is occasionally some minor crossover between chapters, with a few cases where
concepts are used that haven’t been introduced yet.
While readers may have many different end goals for their work, the tasks required
generally fall into a number of different broad groups:
Interacting with the outside world
Reading and writing with a variety of file formats and data stores
Preparation
Cleaning, munging, combining, normalizing, reshaping, slicing and dicing, and
transforming data for analysis
Transformation
Applying mathematical and statistical operations to groups of datasets to derive
new datasets (e.g., aggregating a large table by group variables)
Modeling and computation
Connecting your data to statistical models, machine learning algorithms, or other
computational tools
Presentation
Creating interactive or static graphical visualizations or textual summaries
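The groups above can be seen together in a very small end-to-end sketch. The CSV text, column names, and values are made up; a real analysis would read from files or databases and would usually involve a modeling step as well.

```python
import io

import pandas as pd

# Interacting with the outside world: parse CSV text (a stand-in for a file).
raw = io.StringIO(
    "store,region,revenue\n"
    "A,north,100\n"
    "B,north,\n"
    "C,south,80\n"
    "D,south,140\n"
)
df = pd.read_csv(raw)

# Preparation: drop rows with missing revenue.
clean = df.dropna(subset=["revenue"])

# Transformation: aggregate the table by a group variable.
summary = clean.groupby("region")["revenue"].mean()

# Presentation: a plain-text summary.
print(summary.to_string())
```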
Code Examples
Most of the code examples in the book are shown with input and output as they would
appear executed in the IPython shell or in Jupyter notebooks:
In [5]: CODE EXAMPLE
Out[5]: OUTPUT
When you see a code example like this, the intent is for you to type the example code
in the In block in your coding environment and execute it by pressing the Enter key
(or Shift-Enter in Jupyter). You should see output similar to what is shown in the Out
block.
I changed the default console output settings in NumPy and pandas to improve
readability and brevity throughout the book. For example, you may see more digits
of precision printed in numeric data. To exactly match the output shown in the book,
you can execute the following Python code before running the code examples:
import numpy as np
import pandas as pd
pd.options.display.max_columns = 20
pd.options.display.max_rows = 20
pd.options.display.max_colwidth = 80
np.set_printoptions(precision=4, suppress=True)
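To see what the NumPy setting changes, the following sketch formats the same made-up array with the default options and then with precision=4 and suppress=True. It uses the np.printoptions context manager so that the global settings are left untouched afterward.

```python
import numpy as np

data = np.array([0.000012345678, 3.14159265, 1234.56789])

# Text produced with NumPy's current (default) print settings.
default_text = np.array2string(data)

# Temporarily apply the settings used above; np.printoptions restores the
# previous state when the block exits.
with np.printoptions(precision=4, suppress=True):
    book_text = np.array2string(data)

print(default_text)
print(book_text)
```

With suppress=True, values are printed in fixed-point notation, so very small numbers appear as zero at the chosen precision instead of in scientific notation.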
Data for Examples
Datasets for the examples in each chapter are hosted in a GitHub repository (or in a
mirror on Gitee if you cannot access GitHub). You can download this data either by
using the Git version control system on the command line or by downloading a zip
file of the repository from the website. If you run into problems, navigate to the book
website for up-to-date instructions about obtaining the book materials.
If you download a zip file containing the example datasets, you must then fully
extract the contents of the zip file to a directory and navigate to that directory from
the terminal before proceeding with running the book’s code examples:
$ pwd
/home/wesm/book-materials
$ ls
appa.ipynb ch05.ipynb ch09.ipynb ch13.ipynb README.md
ch02.ipynb ch06.ipynb ch10.ipynb COPYING requirements.txt
ch03.ipynb ch07.ipynb ch11.ipynb datasets
ch04.ipynb ch08.ipynb ch12.ipynb examples
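If you prefer to extract the zip file from Python rather than with a desktop tool, the standard library's zipfile module can do it. The sketch below builds a tiny stand-in archive so that it runs on its own; the archive name book-materials.zip and its contents are hypothetical, and in practice you would point archive at the file you downloaded.

```python
import tempfile
import zipfile
from pathlib import Path

work_dir = Path(tempfile.mkdtemp())

# Build a tiny stand-in archive so the sketch is self-contained.
archive = work_dir / "book-materials.zip"  # hypothetical filename
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("book-materials/README.md", "placeholder\n")
    zf.writestr("book-materials/datasets/example.csv", "a,b\n1,2\n")

# Fully extract the archive into the working directory.
with zipfile.ZipFile(archive) as zf:
    zf.extractall(work_dir)

# List the extracted files relative to the materials directory.
materials = work_dir / "book-materials"
extracted = sorted(
    p.relative_to(materials).as_posix()
    for p in materials.rglob("*")
    if p.is_file()
)
print(extracted)
```

After extraction, change into the materials directory (with cd in the terminal) before running the examples, so that relative paths such as datasets/ resolve correctly.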
I have made every effort to ensure that the GitHub repository contains everything
necessary to reproduce the examples, but I may have made some mistakes or omissions.
If so, please send me an email: book@wesmckinney.com. The best way to report
errors in the book is on the errata page on the O’Reilly website.
Import Conventions
The Python community has adopted a number of naming conventions for commonly
used modules:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import statsmodels as sm
This means that when you see np.arange, this is a reference to the arange function
in NumPy. This is done because it’s considered bad practice in Python software
development to import everything (from numpy import *) from a large package like
NumPy.
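To make the risk of wildcard imports concrete, here is a minimal sketch using two standard-library modules, math and cmath (chosen only for illustration; they are not otherwise part of this section). Both define sqrt, so the second wildcard import silently replaces the first. The exec calls with an explicit dictionary are an assumption of this sketch, used only so the shared namespace can be inspected.

```python
import cmath
import math

# Simulate two wildcard imports into one shared namespace.
namespace = {}
exec("from math import *", namespace)
sqrt_after_math = namespace["sqrt"]

exec("from cmath import *", namespace)
sqrt_after_cmath = namespace["sqrt"]

# The name "sqrt" now silently refers to the complex-number version.
print(sqrt_after_math is math.sqrt)    # True
print(sqrt_after_cmath is cmath.sqrt)  # True

# With explicit module prefixes there is no ambiguity.
print(math.sqrt(4))   # 2.0
print(cmath.sqrt(4))  # (2+0j)
```

With prefixed names such as np.arange, the origin of every function stays visible at the call site.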
16 | Chapter 1: Preliminaries
CHAPTER 2
Python Language Basics, IPython,
and Jupyter Notebooks
When I wrote the first edition of this book in 2011 and 2012, there were fewer
resources available for learning about doing data analysis in Python. This was
partially a chicken-and-egg problem; many libraries that we now take for granted,
like pandas, scikit-learn, and statsmodels, were comparatively immature back then.
Now in 2022, there is a growing literature on data science, data analysis, and
machine learning, supplementing the prior works on general-purpose scientific computing
geared toward computational scientists, physicists, and professionals in other
research fields. There are also excellent books about learning the Python programming
language itself and becoming an effective software engineer.
As this book is intended as an introductory text in working with data in Python, I
feel it is valuable to have a self-contained overview of some of the most important
features of Python’s built-in data structures and libraries from the perspective of data
manipulation. So, I will only present roughly enough information in this chapter and
Chapter 3 to enable you to follow along with the rest of the book.
Much of this book focuses on table-based analytics and data preparation tools for
working with datasets that are small enough to fit on your personal computer. To
use these tools you must sometimes do some wrangling to arrange messy data into
a more nicely tabular (or structured) form. Fortunately, Python is an ideal language
for doing this. The greater your facility with the Python language and its built-in data
types, the easier it will be for you to prepare new datasets for analysis.
Some of the tools in this book are best explored from a live IPython or Jupyter
session. Once you learn how to start up IPython and Jupyter, I recommend that you
follow along with the examples so you can experiment and try different things. As
with any keyboard-driven console-like environment, developing familiarity with the
common commands is also part of the learning curve.
There are introductory Python concepts that this chapter does not
cover, like classes and object-oriented programming, which you
may find useful in your foray into data analysis in Python.
To deepen your Python language knowledge, I recommend that
you supplement this chapter with the official Python tutorial and
potentially one of the many excellent books on general-purpose
Python programming. Some recommendations to get you started
include:
• Python Cookbook, Third Edition, by David Beazley and Brian K. Jones (O’Reilly)
• Fluent Python by Luciano Ramalho (O’Reilly)
• Effective Python, Second Edition, by Brett Slatkin (Addison-Wesley)
2.1 The Python Interpreter
Python is an interpreted language. The Python interpreter runs a program by executing
one statement at a time. The standard interactive Python interpreter can be
invoked on the command line with the python command:
$ python
Python 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:38:57)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 5
>>> print(a)
5
The >>> you see is the prompt after which you’ll type code expressions. To exit the
Python interpreter, you can either type exit() or press Ctrl-D (works on Linux and
macOS only).
Running Python programs is as simple as calling python with a .py file as its first
argument. Suppose we had created hello_world.py with these contents:
print("Hello world")
You can run it by executing the following command (the hello_world.py file must be
in your current working terminal directory):
$ python hello_world.py
Hello world
18 | Chapter 2: Python Language Basics, IPython, and Jupyter Notebooks
While some Python programmers execute all of their Python code in this way,
those doing data analysis or scientific computing make use of IPython, an enhanced
Python interpreter, or Jupyter notebooks, web-based code notebooks originally created
within the IPython project. I give an introduction to using IPython and Jupyter
in this chapter and have included a deeper look at IPython functionality in Appendix A.
When you use the %run command, IPython executes the code in the specified
file in the same process, enabling you to explore the results interactively when it’s
done:
$ ipython
Python 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:38:57)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: %run hello_world.py
Hello world
In [2]:
The default IPython prompt adopts the numbered In [2]: style, compared with the
standard >>> prompt.
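Outside IPython, the standard library's runpy module offers a rough analogue of running a script in the current process and inspecting what it defined afterward. The sketch below is an illustration under that assumption, not a description of how %run is implemented; the temporary file stands in for hello_world.py, and the message variable is added only so there is something to inspect.

```python
import runpy
import tempfile
from pathlib import Path

# Write a small script to a temporary directory (a stand-in for hello_world.py).
with tempfile.TemporaryDirectory() as tmpdir:
    script = Path(tmpdir) / "hello_world.py"
    script.write_text('message = "Hello world"\nprint(message)\n')

    # run_path executes the file in the current process and returns
    # the globals the script defined, so they can be inspected afterward.
    script_globals = runpy.run_path(str(script))

print(script_globals["message"])
```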
2.2 IPython Basics
In this section, I’ll get you up and running with the IPython shell and Jupyter
notebook, and introduce you to some of the essential concepts.
Running the IPython Shell
You can launch the IPython shell on the command line just like launching the regular
Python interpreter except with the ipython command:
$ ipython
Python 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:38:57)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: a = 5
In [2]: a
Out[2]: 5
You can execute arbitrary Python statements by typing them and pressing Return (or
Enter). When you type just a variable into IPython, it renders a string representation
of the object:
In [5]: import numpy as np
In [6]: data = [np.random.standard_normal() for i in range(7)]
2.2 IPython Basics | 19
In [7]: data
Out[7]:
[-0.20470765948471295,
0.47894333805754824,
-0.5194387150567381,
-0.55573030434749,
1.9657805725027142,
1.3934058329729904,
0.09290787674371767]
The first two lines are Python code statements; the second statement creates a variable
named data that refers to a newly created Python list. The last line prints
the value of data in the console.
Many kinds of Python objects are formatted to be more readable, or pretty-printed,
which is distinct from normal printing with print. If you printed the above data
variable in the standard Python interpreter, it would be much less readable:
>>> import numpy as np
>>> data = [np.random.standard_normal() for i in range(7)]
>>> print(data)
[-0.5767699931966723, -0.1010317773535111, -1.7841005313329152,
-1.524392126408841, 0.22191374220117385, -1.9835710588082562,
-1.6081963964963528]
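The difference between plain printing and pretty-printing can also be seen without IPython. The following sketch uses the standard library's pprint module purely as an illustration; IPython has its own display machinery, and the width value here is an arbitrary choice.

```python
import pprint

# A few of the values from the IPython session above.
data = [-0.20470765948471295, 0.47894333805754824,
        -0.5194387150567381, -0.55573030434749]

# print() emits the list on a single line.
print(data)

# pprint wraps long containers, one element per line here because the
# full repr is wider than the requested width.
formatted = pprint.pformat(data, width=40)
print(formatted)
```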
IPython also provides facilities to execute arbitrary blocks of code (via a somewhat
glorified copy-and-paste approach) and whole Python scripts. You can also use the
Jupyter notebook to work with larger blocks of code, as we will soon see.
Running the Jupyter Notebook
One of the major components of the Jupyter project is the notebook, a type of
interactive document for code, text (including Markdown), data visualizations, and
other output. The Jupyter notebook interacts with kernels, which are implementations
of the Jupyter interactive computing protocol specific to different programming
languages. The Python Jupyter kernel uses the IPython system for its underlying
behavior.
To start up Jupyter, run the command jupyter notebook in a terminal:
$ jupyter notebook
[I 15:20:52.739 NotebookApp] Serving notebooks from local directory:
/home/wesm/code/pydata-book
[I 15:20:52.739 NotebookApp] 0 active kernels
[I 15:20:52.739 NotebookApp] The Jupyter Notebook is running at:
http://localhost:8888/?token=0a77b52fefe52ab83e3c35dff8de121e4bb443a63f2d...
[I 15:20:52.740 NotebookApp] Use Control-C to stop this server and shut down
all kernels (twice to skip confirmation).
Created new window in existing browser session.
To access the notebook, open this file in a browser:
file:///home/wesm/.local/share/jupyter/runtime/nbserver-185259-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=0a77b52fefe52ab83e3c35dff8de121e4...
or http://127.0.0.1:8888/?token=0a77b52fefe52ab83e3c35dff8de121e4...
On many platforms, Jupyter will automatically open in your default web browser
(unless you start it with --no-browser). Otherwise, you can navigate to the HTTP
address printed when you started the notebook, here
http://localhost:8888/?token=0a77b52fefe52ab83e3c35dff8de121e4bb443a63f2d3055. See Figure 2-1 for
what this looks like in Google Chrome.
Many people use Jupyter as a local computing environment, but
it can also be deployed on servers and accessed remotely. I won’t
cover those details here, but I encourage you to explore this topic
on the internet if it’s relevant to your needs.
Figure 2-1. Jupyter notebook landing page
To create a new notebook, click the New button and select the “Python 3” option.
You should see something like Figure 2-2. If this is your first time, try clicking on
the empty code “cell” and entering a line of Python code. Then press Shift-Enter to
execute it.
Figure 2-2. Jupyter new notebook view
When you save the notebook (see “Save and Checkpoint” under the notebook File
menu), it creates a file with the extension .ipynb. This is a self-contained file format
that contains all of the content (including any evaluated code output) currently in the
notebook. These can be loaded and edited by other Jupyter users.
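An .ipynb file is JSON text, so its contents can be examined with ordinary tools. The sketch below builds a deliberately minimal, hand-written notebook document with one code cell and reads the code and stored output back out; notebooks saved by Jupyter carry more metadata than shown, and the cell contents are made up for illustration.

```python
import json

# A hand-written, minimal notebook document (nbformat 4) with one code cell.
# Real notebooks saved by Jupyter carry more metadata than this.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["a = 5\n", "a"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": ["5"]},
                }
            ],
        }
    ],
}

# Round-trip through JSON text, as if the .ipynb file were written and re-read.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)

for cell in loaded["cells"]:
    if cell["cell_type"] == "code":
        code = "".join(cell["source"])
        results = ["".join(out["data"]["text/plain"]) for out in cell["outputs"]]
        print(code, "->", results)
```

Because the evaluated output is stored alongside the code, a notebook can be shared and read without rerunning it.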
To rename an open notebook, click on the notebook title at the top of the page and
type the new title, pressing Enter when you are finished.
To load an existing notebook, put the file in the same directory where you started the
notebook process (or in a subfolder within it), then click the name from the landing
page. You can try it out with the notebooks from my wesm/pydata-book repository on
GitHub. See Figure 2-3.
When you want to close a notebook, click the File menu and select “Close and Halt.”
If you simply close the browser tab, the Python process associated with the notebook
will keep running in the background.
While the Jupyter notebook may feel like a distinct experience from the IPython
shell, nearly all of the commands and tools in this chapter can be used in either
environment.
Figure 2-3. Jupyter example view for an existing notebook
Tab Completion
On the surface, the IPython shell looks like a cosmetically different version of the
standard terminal Python interpreter (invoked with python). One of the major
improvements over the standard Python shell is tab completion, found in many IDEs
or other interactive computing analysis environments. While entering expressions in
the shell, pressing the Tab key will search the namespace for any variables (objects,
functions, etc.) matching the characters you have typed so far and show the results in
a convenient drop-down menu:
In [1]: an_apple = 27
In [2]: an_example = 42
In [3]: an<Tab>
an_apple an_example any
In this example, note that IPython displayed both variables I defined, as
well as the built-in function any. You can also complete methods and attributes
on any object after typing a period:
In [3]: b = [1, 2, 3]
In [4]: b.<Tab>
append() count() insert() reverse()
clear() extend() pop() sort()
copy() index() remove()
The same is true for modules:
In [1]: import datetime
In [2]: datetime.<Tab>
date MAXYEAR timedelta
datetime MINYEAR timezone
datetime_CAPI time tzinfo
Note that IPython by default hides methods and attributes starting
with underscores, such as magic methods and internal “private”
methods and attributes, in order to avoid cluttering the display
(and confusing novice users!). These, too, can be tab-completed,
but you must first type an underscore to see them. If you prefer
to always see such methods in tab completion, you can change this
setting in the IPython configuration. See the IPython documentation
to find out how to do this.
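The idea behind attribute completion, including the hiding of underscore-prefixed names, can be approximated with the built-in dir function. The complete function below is a hypothetical helper written for this illustration; it is not how IPython implements completion.

```python
def complete(obj, prefix=""):
    """Return attribute names of obj that start with prefix.

    Names beginning with an underscore are hidden unless the prefix
    itself starts with an underscore.
    """
    show_private = prefix.startswith("_")
    return [
        name
        for name in dir(obj)
        if name.startswith(prefix) and (show_private or not name.startswith("_"))
    ]

b = [1, 2, 3]
print(complete(b))          # public list methods such as append and sort
print(complete(b, "co"))    # ['copy', 'count']
print(complete(b, "__le"))  # ['__le__', '__len__']
```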
Tab completion works in many contexts outside of searching the interactive namespace
and completing object or module attributes. When typing anything that looks
like a file path (even in a Python string), pressing the Tab key will complete anything
on your computer’s filesystem matching what you’ve typed.
Combined with the %run command (see “The %run Command” on page 512), this
functionality can save you many keystrokes.
Another area where tab completion saves time is in the completion of function
keyword arguments (including the = sign!). See Figure 2-4.
Figure 2-4. Autocomplete function keywords in a Jupyter notebook
We’ll have a closer look at functions in a little bit.
Introspection
Using a question mark (?) before or after a variable will display some general information
about the object:
In [1]: b = [1, 2, 3]
In [2]: b?
Type: list
String form: [1, 2, 3]
Length: 3
Docstring:
Built-in mutable sequence.
If no argument is given, the constructor creates a new empty list.
The argument must be an iterable if specified.
In [3]: print?
Docstring:
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file: a file-like object (stream); defaults to the current sys.stdout.
sep: string inserted between values, default a space.
end: string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Type: builtin_function_or_method
This is referred to as object introspection. If the object is a function or instance
method, the docstring, if defined, will also be shown. Suppose we’d written the
following function (which you can reproduce in IPython or Jupyter):
def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b
Then using ? shows us the docstring:
In [6]: add_numbers?
Signature: add_numbers(a, b)
Docstring:
Add two numbers together
Returns
-------
the_sum : type of arguments
File: <ipython-input-9-6a548a216e27>
Type: function
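Much of the same information is available in plain Python through the standard library's inspect module. This is a minimal sketch of that alternative, reusing the add_numbers function from above; it does not reproduce IPython's full ? output.

```python
import inspect

def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b

signature = inspect.signature(add_numbers)
docstring = inspect.getdoc(add_numbers)  # docstring with indentation cleaned up

print(f"Signature: add_numbers{signature}")
print(docstring)
```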
? has a final usage, which is for searching the IPython namespace in a manner similar
to the standard Unix or Windows command line. A number of characters combined
with the wildcard (*) will show all names matching the wildcard expression. For
example, we could get a list of all functions in the top-level NumPy namespace
containing load:
In [9]: import numpy as np
In [10]: np.*load*?
np.__loader__
np.load
np.loads
np.loadtxt
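A comparable wildcard search can be sketched in plain Python by matching the names returned by dir against a shell-style pattern. The example uses the standard library's json module instead of NumPy so that it runs without third-party packages; that substitution is an assumption of this sketch.

```python
import fnmatch
import json

# Match attribute names of the json module against a shell-style wildcard.
matches = fnmatch.filter(dir(json), "*load*")
print(matches)
```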
2.3 Python Language Basics
In this section, I will give you an overview of essential Python programming concepts
and language mechanics. In the next chapter, I will go into more detail about Python
data structures, functions, and other built-in tools.
Language Semantics
The Python language design is distinguished by its emphasis on readability, simplicity,
and explicitness. Some people go so far as to liken it to “executable pseudocode.”
Indentation, not braces
Python uses whitespace (tabs or spaces) to structure code instead of using braces as in
many other languages like R, C++, Java, and Perl. Consider a for loop from a sorting
algorithm:
for x in array:
    if x < pivot:
        less.append(x)
    else:
        greater.append(x)
A colon denotes the start of an indented code block after which all of the code must
be indented by the same amount until the end of the block.
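For a version of this loop that can be run as is, the sketch below supplies sample values for array and pivot (both made up for illustration) and creates the two result lists first.

```python
array = [7, 2, 9, 4, 5, 1]
pivot = 5

less = []
greater = []
for x in array:
    # Everything indented under the for statement belongs to the loop body.
    if x < pivot:
        less.append(x)
    else:
        greater.append(x)

print(less)     # [2, 4, 1]
print(greater)  # [7, 9, 5]
```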
Love it or hate it, significant whitespace is a fact of life for Python programmers.
While it may seem foreign at first, you will hopefully grow accustomed to it in time.
I strongly recommend using four spaces as your default indentation
and replacing tabs with four spaces. Many text editors have a
setting that will replace tab stops with spaces automatically (do
this!). IPython and Jupyter notebooks will automatically insert four
spaces on new lines following a colon and replace tabs by four
spaces.
As you can see by now, Python statements also do not need to be terminated by
semicolons. Semicolons can be used, however, to separate multiple statements on a
single line:
a = 5; b = 6; c = 7
Putting multiple statements on one line is generally discouraged in Python as it can
make code less readable.
Everything is an object
An important characteristic of the Python language is the consistency of its object
model. Every number, string, data structure, function, class, module, and so on exists
in the Python interpreter in its own “box,” which is referred to as a Python object.
Each object has an associated type (e.g., integer, string, or function) and internal data.
In practice this makes the language very flexible, as even functions can be treated like
any other object.
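As a small illustration of functions being treated like any other object, the sketch below binds a function to a second name, stores functions in a list, and inspects their types. The double function is a hypothetical example.

```python
def double(x):
    return 2 * x

# A function is an object like any other: it has a type and attributes,
# can be bound to another name, and can be stored in a data structure.
print(type(double))      # <class 'function'>
print(double.__name__)   # double

operations = [double, abs, str.upper]
alias = double

print(alias(21))         # 42
print([type(op).__name__ for op in operations])
```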
Comments
Any text preceded by the hash mark (pound sign) # is ignored by the Python
interpreter. This is often used to add comments to code. At times you may also want
to exclude certain blocks of code without deleting them. One solution is to comment
out the code:
results = []
for line in file_handle:
    # keep the empty lines for now
    # if len(line) == 0:
    #   continue
    results.append(line.replace("foo", "bar"))
Comments can also occur after a line of executed code. While some programmers
prefer comments to be placed in the line preceding a particular line of code, this can
be useful at times:
print("Reached this line") # Simple status report
2.3 Python Language Basics | 27
Function and object method calls
You call functions using parentheses and passing zero or more arguments, optionally
assigning the returned value to a variable:
result = f(x, y, z)
g()
Almost every object in Python has attached functions, known as methods, that have
access to the object’s internal contents. You can call them using the following syntax:
obj.some_method(x, y, z)
Functions can take both positional and keyword arguments:
result = f(a, b, c, d=5, e="foo")
We will look at this in more detail later.
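The calling conventions above can be tried with a concrete function. The definition of f below is hypothetical, with default values chosen only so that the effect of omitting keyword arguments is visible.

```python
def f(a, b, c, d=0, e="default"):
    """Hypothetical function used only to show the calling conventions."""
    return (a, b, c, d, e)

result = f(1, 2, 3, d=5, e="foo")
print(result)                  # (1, 2, 3, 5, 'foo')

# Keyword arguments may be omitted (defaults apply) or given in any order.
print(f(1, 2, 3))              # (1, 2, 3, 0, 'default')
print(f(1, 2, 3, e="x", d=9))  # (1, 2, 3, 9, 'x')
```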
Variables and argument passing
When assigning a variable (or name) in Python, you are creating a reference to the
object shown on the righthand side of the equals sign. In practical terms, consider a
list of integers:
In [8]: a = [1, 2, 3]
Suppose we assign a to a new variable b:
In [9]: b = a
In [10]: b
Out[10]: [1, 2, 3]
In some languages, the assignment of b will cause the data [1, 2, 3] to be copied. In
Python, a and b actually now refer to the same object, the original list [1, 2, 3] (see
Figure 2-5 for a mock-up). You can prove this to yourself by appending an element to
a and then examining b:
In [11]: a.append(4)
In [12]: b
Out[12]: [1, 2, 3, 4]
Figure 2-5. Two references for the same object
Understanding the semantics of references in Python, and when, how, and why data
is copied, is especially critical when you are working with larger datasets in Python.
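The distinction between a second reference and an actual copy can be checked with the is operator. In this sketch, list(a) is used as one way to make a shallow copy; the variable c is introduced only for the comparison.

```python
a = [1, 2, 3]
b = a          # second reference to the same list
c = list(a)    # a new list with the same contents (a shallow copy)

a.append(4)

print(a is b)  # True
print(b)       # [1, 2, 3, 4]
print(a is c)  # False
print(c)       # [1, 2, 3]
```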
Assignment is also referred to as binding, as we are binding a
name to an object. Variable names that have been assigned may
occasionally be referred to as bound variables.
When you pass objects as arguments to a function, new local variables are created
referencing the original objects without any copying. If you bind a new object to a
variable inside a function, that will not overwrite a variable of the same name in the
“scope” outside of the function (the “parent scope”). Because the local variables refer
to the original objects, however, it is possible to alter the internals of a mutable
argument. Suppose we had the following function:
In [13]: def append_element(some_list, element):
   ....:     some_list.append(element)
Then we have:
In [14]: data = [1, 2, 3]
In [15]: append_element(data, 4)
In [16]: data
Out[16]: [1, 2, 3, 4]
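To contrast the two behaviors side by side, the sketch below pairs a function that rebinds its local name with one that mutates its argument. Both function names are hypothetical and exist only for this comparison.

```python
def rebind(some_list):
    # Binds the local name to a brand-new list; the caller's list is untouched.
    some_list = [0]
    return some_list

def mutate(some_list):
    # Alters the object that both the caller and the local name refer to.
    some_list.append(0)

data = [1, 2, 3]

rebind(data)
after_rebind = list(data)
print(after_rebind)  # [1, 2, 3]

mutate(data)
print(data)          # [1, 2, 3, 0]
```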
Dynamic references, strong types
Variables in Python have no inherent type associated with them; a variable can refer
to a different type of object simply by doing an assignment. There is no problem with
the following:
In [17]: a = 5
In [18]: type(a)
Out[18]: int
In [19]: a = "foo"
In [20]: type(a)
Out[20]: str
Variables are names for objects within a particular namespace; the type information is
stored in the object itself. Some observers might hastily conclude that Python is not a
“typed language.” This is not true; consider this example:
In [21]: "5" + 5
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-21-7fe5aa79f268> in <module>
----> 1 "5" + 5
TypeError: can only concatenate str (not "int") to str
In some languages, the string '5' might get implicitly converted (or cast) to an
integer, thus yielding 10. In other languages the integer 5 might be cast to a string,
yielding the concatenated string '55'. In Python, such implicit casts are not allowed.
In this regard we say that Python is a strongly typed language, which means that every
object has a specific type (or class), and implicit conversions will occur only in certain
permitted circumstances, such as:
In [22]: a = 4.5
In [23]: b = 2
# String formatting, to be visited later
In [24]: print(f"a is {type(a)}, b is {type(b)}")
a is <class 'float'>, b is <class 'int'>
In [25]: a / b
Out[25]: 2.25
Here, even though b is an integer, it is implicitly converted to a float for the division
operation.
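When a conversion between a string and a number is actually wanted, it has to be requested explicitly. This short sketch catches the TypeError from the earlier example and then shows the two explicit alternatives.

```python
try:
    "5" + 5
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Explicit conversions state which result is intended.
print(int("5") + 5)   # 10
print("5" + str(5))   # 55
```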
Knowing the type of an object is important, and it’s useful to be able to write
functions that can handle many different kinds of input. You can check that an object
is an instance of a particular type using the isinstance function:
In [26]: a = 5
In [27]: isinstance(a, int)
Out[27]: True
isinstance can accept a tuple of types if you want to check that an object’s type is
among those present in the tuple:
In [28]: a = 5; b = 4.5
In [29]: isinstance(a, (int, float))
Out[29]: True
In [30]: isinstance(b, (int, float))
Out[30]: True
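As a sketch of how isinstance supports functions that accept several kinds of input, the hypothetical describe helper below branches on the type of its argument. One detail worth knowing is that bool is a subclass of int, so a Boolean also passes an isinstance check against int.

```python
def describe(value):
    """Hypothetical helper that branches on the type of its input."""
    if isinstance(value, bool):
        # bool is a subclass of int, so it has to be checked first.
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "text"
    return "other"

print(describe(5))            # number
print(describe(4.5))          # number
print(describe("foo"))        # text
print(describe(True))         # boolean
print(isinstance(True, int))  # True
```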
Attributes and methods
Objects in Python typically have both attributes (other Python objects stored
“inside” the object) and methods (functions associated with an object that can
have access to the object’s internal data). Both of them are accessed via the syntax
obj.attribute_name:
In [1]: a = "foo"
In [2]: a.<Press Tab>
capitalize() index() isspace() removesuffix() startswith()
casefold() isprintable() istitle() replace() strip()
center() isalnum() isupper() rfind() swapcase()
count() isalpha() join() rindex() title()
encode() isascii() ljust() rjust() translate()
endswith() isdecimal() lower() rpartition()
expandtabs() isdigit() lstrip() rsplit()
find() isidentifier() maketrans() rstrip()
format() islower() partition() split()
format_map() isnumeric() removeprefix() splitlines()
Attributes and methods can also be accessed by name via the getattr function:
In [32]: getattr(a, "split")
Out[32]: <function str.split(sep=None, maxsplit=-1)>
While we will not extensively use getattr and the related functions
hasattr and setattr in this book, they can be used very effectively to write generic,
reusable code.
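One way these three functions support generic code is by working with attribute names held in variables. The Record class and apply_updates function below are hypothetical names introduced for this sketch.

```python
class Record:
    """Hypothetical container used only for this illustration."""

def apply_updates(obj, updates):
    # Set each named attribute, whatever the object's class happens to be.
    for name, value in updates.items():
        setattr(obj, name, value)
    return obj

record = apply_updates(Record(), {"label": "foo", "size": 3})

print(hasattr(record, "label"))          # True
print(getattr(record, "size"))           # 3
print(getattr(record, "missing", None))  # None

# Look up a method by name and call it.
method = getattr("foo", "upper")
print(method())                          # FOO
```

The optional third argument to getattr supplies a default, which avoids an AttributeError when the name is absent.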
Duck typing
Often you may not care about the type of an object but rather only whether it has
certain methods or behavior. This is sometimes called duck typing, after the saying “If
it walks like a duck and quacks like a duck, then it’s a duck.” For example, you can
verify that an object is iterable if it implements the iterator protocol. For many objects,
this means it has an __iter__ “magic method,” though an alternative and better way
to check is to try using the iter function:
In [33]: def isiterable(obj):
   ....:     try:
   ....:         iter(obj)
   ....:         return True
   ....:     except TypeError: # not iterable
   ....:         return False
This function would return True for strings as well as most Python collection types:
In [34]: isiterable("a string")
Out[34]: True
In [35]: isiterable([1, 2, 3])
Out[35]: True
In [36]: isiterable(5)
Out[36]: False
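A check like isiterable is handy in functions that should accept several kinds of sequence-like input. The as_list helper below is a hypothetical example of that pattern: it converts any non-list iterable to a list and leaves everything else alone.

```python
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:  # not iterable
        return False

def as_list(x):
    """Hypothetical helper: coerce any non-list iterable to a list."""
    if not isinstance(x, list) and isiterable(x):
        x = list(x)
    return x

print(as_list((1, 2, 3)))  # [1, 2, 3]
print(as_list("abc"))      # ['a', 'b', 'c']
print(as_list(range(3)))   # [0, 1, 2]
print(as_list(5))          # 5
```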
Imports
In Python, a module is simply a file with the .py extension containing Python code.
Suppose we had the following module:
# some_module.py
PI = 3.14159

def f(x):
    return x + 2

def g(a, b):
    return a + b
If we wanted to access the variables and functions defined in some_module.py, from
another file in the same directory we could do:
import some_module
result = some_module.f(5)
pi = some_module.PI
Or alternatively:
from some_module import g, PI
result = g(5, PI)
By using the as keyword, you can give imports different variable names:
import some_module as sm
from some_module import PI as pi, g as gf
r1 = sm.f(pi)
r2 = gf(6, pi)
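To try the import examples without creating files by hand, the following sketch writes some_module.py to a temporary directory and imports it from there. The temporary directory and the sys.path manipulation are conveniences of this sketch; in normal use the module would simply sit next to the importing file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

module_source = '''\
PI = 3.14159

def f(x):
    return x + 2

def g(a, b):
    return a + b
'''

with tempfile.TemporaryDirectory() as tmpdir:
    # Write some_module.py, then make its directory importable.
    Path(tmpdir, "some_module.py").write_text(module_source)
    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()
    try:
        some_module = importlib.import_module("some_module")
    finally:
        sys.path.remove(tmpdir)

result = some_module.f(5)
total = some_module.g(5, some_module.PI)
print(result)  # 7
print(total)
```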
Binary operators and comparisons
Most of the binary math operations and comparisons use familiar mathematical
syntax used in other programming languages:
In [37]: 5 - 7
Out[37]: -2
In [38]: 12 + 21.5
Out[38]: 33.5
In [39]: 5 <= 2
Out[39]: False
See Table 2-1 for all of the available binary operators.
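A few further binary operators can be tried directly. The selection below is a small sample picked for illustration and is not a reproduction of Table 2-1.

```python
print(7 / 2)    # 3.5   true division
print(7 // 2)   # 3     floor division
print(7 % 2)    # 1     remainder
print(2 ** 10)  # 1024  exponentiation
print(5 != 2)   # True  inequality

# "is" tests identity (same object), while == tests equality of value.
x = [1, 2]
y = [1, 2]
print(x == y)   # True
print(x is y)   # False
```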
"One of them," she explained, "is the nice man who moved into the
boardinghouse. He wants to take me to the movies. Did you notice
that they came when it ought to be my lunchtime? He asked when I
went to lunch ..."
Holmes came in. He scowled.
"One of my men says that one of those characters has been buying
him drinks and asking questions about what we're doing."
Burke scowled too.
"We can let your men go home in three days more."
"I'm going to start loading up," Holmes announced abruptly. "You
don't know how to stow stuff. You're not a yachtsman."
"I haven't got the shaft under control yet," said Burke.
"You'll get it," grunted Holmes.
He went out. Pam giggled again.
"He doesn't want me to go to the movies with the nice man from
Security," she told Burke. "But I think I'd better. I'll let him ply me
with popcorn and innocently let slip that Sandy and I know you've
been warned that bomb shelters won't find a mass market unless
they sell for less than the price of an extra bathroom. But if you
want to go broke we don't care."
"Give me three days more," said Burke harassedly.
"We'll try," said Sandy suddenly. "Pam can fix up a double date with
one of her friend's friends and we'll both work on them."
Burke frowned absorbedly and went out. Sandy looked indignant. He
hadn't protested.
Burke got Holmes' four workmen out of the ship and had them help
him roll the bronze shaft to the pit and let it down onto a cradle of
timbers. Now if it moved it would have to penetrate solid earth.
The most trivial of computations showed that when the bronze shaft
had flown thirty miles, it hadn't done it on the energy of a condenser
shorted through its coils. The energy had come from somewhere
else. Burke had an idea where it was.
Presently he verified it. The cores and windings he'd adapted from a
transparent hand-weapon seen in an often-repeated dream—those
cores and windings did not make electromagnets. They made
something for which there was not yet a name. When current flows
through a standard electromagnet, the poles of its atoms are more
or less aligned. They tend to point in a single direction. But in this
arrangement of wires and iron no magnetism resulted; yet the
random motion of the atoms in their framework of crystal structure
was coördinated. In any object above absolute zero all the atoms
and their constituent electrons and nuclei move constantly in all
directions. In such a core as Burke had formed and repeated along
the shaft's length, they all tried to move in one direction at the same
time. Simultaneously, a terrific surge of current appeared in the coils.
A high-speed poleward velocity developed in all the substance of the
shaft. It was the heat-energy contained in the metal, all turned
instantly into kinetic energy. And when its heat-energy was
transformed to something else, the shaft got cold.
Once this fact was understood, control was easy. A single variable
inductance in series with the windings handled everything. In a
certain sense, the gadget was a magnet with negative—minus—self-
inductance. When a plus inductance in series made the self-
inductance zero, neither plus nor minus, the immensely powerful
device became docile. A small current produced a mild thrust,
affecting only part of the random heat-motion of atoms and
molecules. A stronger current produced a greater one. The
resemblance to an electromagnet remained. But the total inductance
must stay close to zero or utterly violent and explosive forward
thrust would develop, and it was calculable only in thousands of
gravities.
Burke had worked for three weeks to make the thing, but he
developed a control system for it in something under four hours.
That same night they got the bronze shaft into the ship. It fitted
perfectly into the place left for it. Burke knew now exactly what he
was doing. He set up his controls. He was able to produce so minute
a thrust that the lath-and-plaster mould merely creaked and swayed.
But he knew that he could make the whole mass surge unstoppably
from its place.
Holmes sent his workmen home. Sandy and Pam went to the movies
with two very nice men who pumped them deftly of all sorts of
erroneous information about Burke and Holmes and Keller and what
they were about. The nice men did not believe that information, but
they did believe that Sandy and Pam believed it. For themselves, the
combination of an object made by Burke which flew thirty miles plus
the presence of Holmes, who built plastic yachts, and the arrival of
Keller to adjust instruments of which they had a complete list—these
things could not be overlooked. But they did feel sorry for two nice
and not over-bright girls who might be involved in very serious
trouble.
Holmes and Burke installed directional controls, wiring, recording
instruments, etc. Stores and water and oxygen, for emergency use
only, went into the lath-and-plaster construction. Holmes took a
hammer and chisel and painstakingly cracked the mould so that the
top half could be lifted off, leaving the bottom half exposed to the
open air and sky.
Then the broadcast from space cut off. It had been coming
continuously for something like five weeks; one sharp, monotonous
note every two seconds, with a longer, fluting broadcast every
seventy-nine minutes. Now a third, new message began. It was yet
another grouping of the musical tones, with a much longer interval
of specific crackling sounds.
Keller had adjusted every instrument and zestfully retested them
over and over. Burke asked him to see if the third space message
compared in any way with the second. Keller put them through a
hook-up of instruments, beaming to himself, and the answer began
to appear.
Newspapers burst into new headlines. "Ultimatum from Space" they
thundered. "Threats from Alien Space Travelers." And as they
presented the situation it seemed believable that the third message
from the void was a threat.
The first had been a call, requiring an answer. When the answer
went out from Earth, a second message replaced the call. It
contained not only flute tones which might be considered to
represent words, but cracklings which might be the equivalent of
numbers. The continuous beepings between repetitions of the
second message were plainly a directional signal to be followed to
the message source.
In this context, the newspapers furiously asserted that the third
message was a threat. The first had been merely a summons, the
second had been a command to repair to the signaling entities, and
the third was a stern reiteration of the command, reinforced by
threats.
The human race does not take kindly to threats, especially when it
feels helpless. In the United States, there was such explosive
resentment as to require spread-eagle oratory by all public figures.
The President declared that every space missile in store had been
fitted with atomic-fusion warheads and that any alien spacecraft
which appeared in American skies would be shot down immediately.
Congress reported out of committee a bill for rocket weapons which
was stalled for six days because every senator and representative
wanted to make a speech in its favor. It was the largest
appropriation bill ever passed by Congress, which less than five
weeks before had cut two hundred millions out of a guided-missile-
space-exploration budget.
And in Europe there was frenzy.
For Burke and Holmes and Sandy and Pam and the smiling,
inarticulate Keller, the matter was deadly serious. Fury such as the
public felt constituted a witch-hunt in itself. Suspicious private
persons overwhelmed the FBI and the Space Agency with
information about characters they were sure were giving military
secrets to the space travelers on M-387. There were reports of aliens
skulking about American cities wearing luxuriant whiskers and dark
glasses to conceal their non-human features. Artists, hermits, and
mere amateur beard-growers found it wise to shave, and spirit
mediums, fortunetellers and, in the South, herb doctors reaped
harvests by the sale of ominous predictions and infallible advice on
how to escape annihilation from space.
And Burke Development, Inc., was building something that neither
Civilian Defense nor the FBI believed was a bomb shelter.
The three days Burke had needed passed. A fourth. He and Holmes
practically abandoned sleep to get everything finished inside the
plaster mould. Keller happily completed his graphs and took them to
Burke. They showed that the cracklings, which presumably meant
numbers, had been expanded. What they said was now told on a
new scale. If the numbers had meant months or years, they now
meant days and hours. If they had meant millions of miles, they now
meant thousands or hundreds.
Burke was struggling with these implications when there was a
tapping at the air-lock, through which all entry and egress from the
ship took place. Holmes opened the inner door. Sandy and Pam
crawled through the lock which lay on its side instead of upright.
Sandy looked at Burke.
Pam said amiably, "We figured the job was about finished and we
wanted to see it. How do you fasten this door?"
Holmes showed her. The vessel that had been built inside the mould
did not seem as large as the outside structure promised. It looked
queer, too, because everything lay on its side. There were two
compartments with a ladder between, but the ladder lay on the floor.
The wall-gardens looked healthy under the fluorescent lamps which
kept the grass and vegetation flourishing. There were instrument
dials everywhere.
Sandy went to Burke's side.
"We're all but done," said Burke tiredly, "and Keller's just about
proved what the signals are."
"Can we go with you?" asked Sandy.
"Of course not," said Burke. "The first message was a distress call. It
had to be. Only in a distress call would somebody go into details so
any listener would know it was important. It called for help and said
who needed it, and why, and where."
Pam turned to Holmes. "Can that air-lock be opened from outside?"
It couldn't. Not when it was fastened, as now.
"Somebody answered that call from Earth," said Burke heavily, "and
the second message told more about what was wrong. The clickings,
we think, are numbers that told how long help could be waited for,
or something on that order. And then there was a beacon signal
meant to lead whoever was coming to help to that place."
Keller smiled pleasantly at Pam. He made an electrical connection
and zestfully checked the result.
"Now there's a third message," said Burke. "Time's running out for
whoever needs whatever help is called for. The clickings that seem
to be numbers have changed. The—what you might call the scale of
reportage—is new. They're telling us just how long they can wait or
just how bad their situation is. They're saying that time is running
out and they're saying, 'Hurry!'"
There was a thumping sound. Only Sandy and Pam looked
unsurprised. Burke stared.
Sandy said firmly, "That's the police, Joe. We've been going to the
movies with people who want to talk about you. Yesterday one of
them confided to us that you were dangerous, and since he told us
to get away from the office, we did. There might be shooting. He
tipped us a little while ago."
Burke swore. There were other thumpings. Louder ones. They were
on the air-lock door.
"If you try to put us out," said Sandy calmly, "you'll have to open
that door and they'll try to fight their way in—and then where'll you
be?"
Keller turned from the checking of the last instrument. He looked at
the others with excited eyes. He waited.
"I don't know what they can arrest you for," said Sandy, "and maybe
they don't either, unless it's unauthorized artillery practice. But you
can't put us out! And you know darn well that unless you do
something they'll chop their way in!"
Burke said, "Dammit, they're not going to stop me from finding out if
this thing works!"
He squirmed in a chair which had its base firmly fastened to a wall
and began to punch buttons.
"Hold fast!" he said angrily. "At least we'll see...."
There were loud snapping sounds. There were creakings. The room
stirred. It turned in a completely unbelievable fashion. Violent
crashings sounded outside. Abruptly, a small television screen before
Burke acquired an image. It was of the outside world reeling wildly.
Holmes seized a hand-hold and grabbed Pam. He kept her from
falling as a side wall became the floor, and what had been the floor
became a side wall, with the ceiling another. It seemed that all the
cosmos changed, though only walls and floors changed places.
Suddenly everything seemed normal but new. The surface underfoot
was covered with a rubber mat. The hydroponic wall-garden sections
were now vertical. Burke sat upright, and something over his head
rotated a half-turn and was still. But it became coated with frost.
More crashes. More small television screens acquired images. They
showed the office of Burke Development, Inc., against a tilted
landscape. The landscape leveled. Another showed the construction
shed. One showed cloud formations, very bright and distinct. And
two others showed a small, armed, formidable body of men
instinctively backing away from the outside television lens.
"So far," said Burke, "it works. Now—"
There was a sensation as of a rapidly rising elevator. Such a
sensation usually lasts for part of a second. This kept on. One of the
six television screens suddenly showed a view of Burke Development
from straight overhead. The buildings and men and the four-acre
enclosure dwindled rapidly. They were very tiny indeed and nearly all
of the town was in the camera's field of vision when a vague
whiteness, a cloud, moved in between.
"The devil!" said Burke. "Now they'll alert fighter planes and rocket
installations and decide that we're either traitors or aliens in disguise
and better be shot down. I think we simply have to go on!"
Keller made gestures, his eyes bright. Burke looked worried.
"It shouldn't take more than ten minutes to get a Nike aloft and
after us. We must have been picked up by radar already.... We'll
head north. We have to, anyhow."
But he was wrong about the ten minutes. It was fifteen before a
rocket came into view, pouring out enormous masses of drive-
fumes. It flung itself toward the ship.
Chapter 5
From a sufficient height and a sufficient distance, the rocket's
repeated attacks must have appeared like the strikings and twistings
of a gigantic snake. It left behind it a writhing trail of fumes which
was convincingly serpentine. It climbed and struck, and climbed and
struck, like a monstrous python flinging itself furiously at some
invisible prey. Six, seven, eight times it plunged frenziedly at the
minute egg-shaped ship which scuttled for the heavens. Each time it
missed and writhed about to dart again.
Then its fuel gave out and for all intents and purposes it ceased to
exist. The thick, opaque trail it left behind began to dissipate. The
path of vapor scattered. It spread to rags and tatters of
unsubstantiality through which the rocket plummeted downward in
the long fall which is a spent rocket's ending.
Burke cautiously cut down the drive and awkwardly turned the ship
on its side, heading it toward the north. The state of things inside
the ship was one of intolerable tenseness.
"I'm a new driver," said Burke, "and that was a tough bit of driving
to do." He glanced at the exterior-pressure meter. "There's no air
outside to register. We must be fifty or sixty miles high and maybe
still rising. But we're not leaking air."
Actually the plastic ship was eighty miles up. The sunlit world
beneath it showed white patches of cloud in patterns a
meteorologist would have found interesting. Burke could see the
valley of the St. Lawrence River between the white areas. But the
Earth's surface was curiously foreshortened. What was beneath
seemed utterly flat, and at the edge of the world all appeared
distorted and unreal.
Holmes, still pale, asked, "How'd we get away from that rocket?"
"We accelerated," said Burke. "It was a defensive rocket. It was
designed to knock down jet bomb carriers or ballistic missiles which
travel at a constant speed. Target-seeking missiles can lock onto the
radar echo from a coasting ship, or one going at its highest speed,
because their computers predict where their target, traveling at
constant speed, can be intercepted. We were never there. We were
accelerating. Missile-guidance systems can't measure acceleration
and allow for it. They shouldn't have to."
Four of the six television screens showed dark sky with twinkling
lights in it. On one there was the dim outline of the sun, reversed to
blackness because its light was too great to be registered in a
normal fashion. The other screen showed Earth.
There was a buzzing, and Keller looked at Burke.
"Rocket?" asked Burke. Keller shook his head. "Radar?" Keller
nodded.
"The DEW line, most likely," said Burke in a worried tone. "I don't
know whether they've got rockets that can reach us. But I know
fighter planes can't get this high. Maybe they can throw a spread of
air-to-air rockets, though.... I don't know their range."
Sandy said unsteadily, "They shouldn't do this to us! We're not
criminals! At least they should ask us who we are and what we're
doing!"
"They probably did," said Burke, "and we didn't answer. See if you
can pick up some voices, Keller."
Keller twirled dials and set indicators. Voices burst into speech.
"Reporting UFO sighted extreme altitude coördinates—First rocket
exhausted fuel in multiple attacks and fell, sir." Another voice, very
brisk, "Thirty-second squadron, scramble! Keep top altitude and get
under it. If it descends within range, blast it!" Another voice said
crisply, "Coördinates three-seven Jacob, one-nine Alfred...."
Keller turned the voices down to mutters because they were useless.
Burke said, "Hell! We ought to land somewhere and check over the
ship. Keller, can you give me a microphone and a wavelength
somebody will be likely to pick up?"
Keller shrugged and picked up masses of wire. He began to work on
an as yet unfinished wiring job. Evidently, the ship was not near
enough to completion to be capable of a call to ground. It had taken
off with many things not finished. Burke, at the controls, found it
possible to think of a number of items that should have been
examined exhaustively before the ship left the mould in which it had
been made. He worried.
Pam said in a strange voice, "I thought I might rate as a heroine for
stowing away on this voyage, but I didn't think we'd have to dodge
rockets and fighter planes to get away!"
There was no comment.
"I'm a beginner at navigation," said Burke a little later, more worried
than before. "I know we have to go out over the north magnetic
pole, but how the hell do I find that?"
Keller beamed. He dropped his wiring job and went to the imposing
bank of electronic instruments. He set one, and then another, and
then a third. The action, of course, was similar to that of an airline
pilot when he tunes in broadcasting stations in different cities. From
each, a directional reading can be taken. Where the lines of direction
cross, there the transport plane must be. But Keller turned to
shortwave transmitters whose transmissions could be picked up in
space. Presently, eighty miles high, he wrote a latitude and longitude
neatly on a slip of paper, wrote "North magnetic pole 93°W, 71°N,
nearly," and after that a course.
"Hm," said Burke. "Thanks."
Then there was a relative silence inside the ship. Only a faint mutter
of voices came from assorted speakers that Keller had first turned on
and then turned down, and a small humming sound from a gyro.
When they listened, they could also hear a high sweet musical tone.
Burke shifted this control here, and that control there, and lifted his
hands. The ship moved on steadily. He checked this and that and the
other thing. He was pleased. But there were innumerable things to
be checked. Holmes went down the ladder to the other
compartment below. There were details to be looked into there, too.
One of the screens portrayed Earth from a height of seventy miles
instead of eighty, now. Others pictured the heavens, with very many
stars shining unwinkingly out of blackness. Keller got at his wires
again and resumed the work of installing a ship-to-ground
transmitter and its connection to an exterior-reflecting antenna.
Sandy watched Burke as he moved about, testing one thing after
another. From time to time he glanced at the screens which had to
serve in the place of windows. Once he went back to the control-
board and changed an adjustment.
"We dropped down ten miles," he explained to Sandy. "And I suspect
we're being trailed by jets down below."
Holmes meticulously inspected all storage places. He'd packed them
when the ship lay on her side.
Burke read an instrument and said with satisfaction, "We're running
on sunshine!"
He meant that in empty space certain aluminum plates on the
outside of the hull were picking up heat from the naked sun. The
use of the drive-shaft lowered its temperature. Metallic connection
with the outside plates conducted heat inward from those plates.
The drive-shaft was cold to the touch, but it could drop four hundred
degrees Fahrenheit before it ceased to operate as a drive. It was
gratifying that it had cooled so little up to this moment.
Later Keller tapped Burke on the shoulder and jerked his thumb
upward.
"We go up now?" asked Burke.
Keller nodded. Burke carefully swung the ship to aim vertically. The
views of solid Earth slid from previous screens to new ones. The
stars and the dark object which was the sun also moved across their
screens to vanish and reappear on others. Then Burke touched the
drive-control. Once more they had the sensation of being in a rising
elevator. And at just that moment spots appeared on the barren, icy,
totally flattened terrain below.
They were rocket-trails from target-seeking missiles which had
reached the area of the north magnetic pole by herculean effort and
were aimed at the radar-detected little ship by the heavy planes that
carried them.
From the surface of the Earth, it would have seemed that monstrous
columns of foaming white appeared and rose with incredible
swiftness toward the heavens. They reached on, up and up and up,
seeming to draw closer together as they became smaller in the
distance, until all eight of them seemed to merge into a single point
of infinite whiteness in the sunshine above the world's blanket of air.
But nothing happened. Nothing. The ship did not accelerate as fast
as the rockets, but it had started first and it kept up longer. It went
scuttling away to emptiness and the bottoms of the towers of rocket-
smoke drifted away and away over the barren landscape all covered
with ice and snow.
When Earth looked like a huge round ball that did not even seem
very near, with a night side that was like a curious black chasm
among the stars, the atmosphere of tension inside the ship
diminished. Keller completed his wiring of a ship-to-ground
transmitter. He stood up, brushed off his hands and beamed.
The little ship continued on. Its temperature remained constant. The
air in it smelled of growing green stuff. It was moist. It was warm.
Keller turned a knob and a tiny, beeping noise could be heard. Dials
pointed, precisely.
"We couldn't go on our true course earlier," Burke told Sandy,
"because we had to get out beyond the Van Allen bands of cosmic
particles in orbit around the world. Pretty deadly stuff, that radiation!
In theory, though, all we have to do now is swing onto our proper
course and follow those beepings home. We ought to be in harmless
emptiness here. Do you want to call Washington?"
She stared.
"We need help to navigate—or astrogate," said Burke. "Call them,
Sandy. I'll get on the wire when a general answers."
Sandy went jerkily to the transmitter just connected. She began to
speak steadily, "Calling Earth! Calling Earth! The spaceship you just
shot all those rockets at is calling! Calling Earth!"
It grew monotonous, but eventually a suspicious voice demanded
further identification.
It was a peculiar conversation. The five in the small spaceship were
considered traitors on Earth because they had exercised the
traditional right of American citizens to go about their own business
unhindered. It happened that their private purposes ran counter to
the emotional state of the public. Hence voices berated Sandy and
furiously demanded that the ship return immediately. Sandy insisted
on higher authority and presently an official voice identified itself as
general so-and-so and sternly commanded that the ship
acknowledge and obey orders to return to Earth. Burke took the
transmitter.
"My name's Burke," he said mildly. "If you can arrange some sort of
code, I'll tell you how to find the plans, and I'll give you the
instructions you'll need to build more ships like this. They can follow
us out. I think they should. I believe that this is more important than
anything else you can think of at the moment."
Silence. Then more sternness. But ultimately the official voice said,
"I'll get a code expert on this."
Burke handed the microphone to Sandy.
"Take over. We've got to arrange a cipher so nobody who listens in
can learn about official business. We may use a social security
number for a key, or the name of your maiden aunt's first
sweetheart, or something we know and Washington can find out but
that nobody else can. Hm. Your last year's car-license number might
be a starter. They can seal up the records on that!"
Sandy took over the job. What was transmitted to Earth, of course,
could be picked up anywhere over an entire hemisphere. Somebody
would assuredly pass on what they overheard to, say, nations the
United States would rather have behind it than ahead of it in space-
travel equipment. Burke's suggestion of a cipher and instructions
changed his entire status with authority. They'd rather have had him
come back, but this was second best, and they took it.
From Burke's standpoint it was the only thing to do. He had no
official standing to lend weight to his claim that lunatic magnet-cores
with insanely complicated windings would amount to space-drive
units. If he returned, in the nature of things there would be a long
delay before mere facts could overcome theoreticians' convictions.
But now he was forty-five thousand miles out from Earth.
He had changed course to home on the beeping signals from M-387,
was accelerating at one full gravity and had been doing so for forty-
five minutes. And the small ship already had a velocity of twenty
miles per second and was still going up. All the rockets that men had
made, plus the Russian manned-probe drifting outward now, had
become as much outdated for space travel as flint arrowheads are
for war.
Burke returned to the microphone when Sandy left it to get a pencil
and paper.
"By the way," he said briskly. "We can keep on accelerating
indefinitely at one gravity. We've got radars. We got them from—"
He named the supplier. "Now we want advice on how fast we can
risk traveling before we'll be going too fast to dodge meteors or
whatnot that the radar may detect. Get that figured out for us, will
you?"
He gave back the instrument to Sandy and returned to his inspection
of every item of functioning equipment in the ship. He found one or
two trivial things to be bettered. The small craft went on in a
singularly matter-of-fact fashion. If it had been a bomb shelter
buried in the pit beside the mould in which it was built, there would
have been very little difference in the feel of things. The constant
acceleration substituted perfectly for gravity. The six television
screens, to be sure, pictured incredible things outside, but television
screens often picture incredible things. The wall-gardens looked
green and flourishing. The pumps were noiseless. There were no
moving parts in the drive. The gyro held everything steady. There
was no vibration.
Nobody could remain upset in such an unexciting environment.
Presently Pam explored the living quarters below. Holmes took his
place in the control-chair, but found no need to touch anything.
Some time later Sandy reported, "Joe, they say we must be lying,
but if we can keep on accelerating, we'd better not hit over four
hundred miles a second. They say we can then swing end for end
and decelerate down to two hundred, and then swing once more
and build up to four again. But they insist that we ought to return to
Earth."
"They don't mention shooting rockets at us, do they?" asked Burke.
"I thought they wouldn't. Just say thanks and go on working out a
code."
Sandy set to work with pencil and paper. Federal agents would be
moving, now, to impound all official records that were in any way
connected with any of the five on the ship. The key to the code
would be contained in such records. It would be an agglomeration of
such items as Burke's grandmother's maiden name, Holmes' social-
security number, the name of a street Burke had lived on some years
before, the exact amount of his federal income taxes the previous
year, the title of a book third from the end on the second shelf of a
bookcase in Keller's apartment, and such unconsidered items as
most people can remember with a little effort, but which can only be
found out by people who know where to look. These people would
keep anybody else from looking in the same places. Such a code
would be clumsy to work with, but it would be unbreakable.
It took hours to establish it without the mention of a single word
included in the lengthy key. The ship reached four hundred miles a
second, turned about, and began to cut down its speed again.
Pam spoke from beside an electric stove, "Dinner's ready! Come and
get it!"
They dined; Sandy weary, Burke absorbed and inevitably worried,
Holmes placid and amiable, and Keller beaming and interested in all
that went on, which was practically nothing.
They did not see the stars direct, because television cameras were
preferable to portholes. Earth had become very small, and as it
swung ever more nearly into a direct line between the ship and the
sun, night filled more of its disk until only a hairline of sunshine
showed at one edge. The microwave receivers ceased to mutter. The
working astronomers on Earth who'd sent a message to M-387 were
suddenly relieved of their disgrace and set to work again to equip
the West Virginia radar telescope for continuous communication with
Burke's ship. Other technicians began to prepare multiple receptors
to pick up the ship's signals from hitherto unprecedented distances
for human two-way communication.
And on Earth an official statement went out from high authority. It
announced that a hurriedly completed American ship was on the way
to M-387 to investigate the signals from space. It announced that
measures long in preparation were now in use, and that an invincible
fleet of spacecraft would be completed in months, whereas they had
not been hoped for for another generation. An unexpected
breakthrough had made it possible to advance the science of space
travel by many decades, and a fleet to explore all the planets as well
as M-387 was already under construction. It was almost true that
they were. The blueprints of Burke's ship had been flown to
Washington from the plant, and an enormous number of replicas of
the egg-shaped vessel were ordered to be begun immediately, even
before the theory of the drive was understood.
There was one minor hitch. A legal-minded official protested that
Congressional appropriations had been for rocket-driven spaceships
only, and the money appropriated could not be used for other than
rockets. An executive order settled the matter. Then theorists began
to object to the principle of the drive. It contradicted well-
established scientific beliefs. It could not work.
It did, but there was violent opposition to the fact.
Publicly, of course, the shock of such an about-face by the national
government was extreme. But newspapers flashed new headlines.
"U.S. SHIP SPEEDING TO QUERY ALIENS!" Lesser heads announced,
"Critical Velocity Exceeded! Russian Probe Already Passed!" The last
was not quite true. The Russian manned probe had started out ten
days before. Burke hadn't overtaken it yet.
Broadcasters issued special bulletins, and two networks canceled top
evening programs to schedule interviews with prominent scientists
who'd had nothing whatever to do with what Burke had managed to
achieve.
In Europe, obviously, the political effect was stupendous. Russia was
reduced to impassioned claims that the ship had been built from
Russian plans, using Russian discoveries, which had been stolen by
imperialistic secret agents. And the heads of the Russian spy system
were disgraced for not having, in fact, stolen the plans and
discoveries from the Americans. All other operatives received threats
of what would happen to them if they didn't repair that omission.
These threats so scared half a dozen operatives that they defected
and told all they knew, thereby wrecking the Russian spy system for
the time being.
Essentially, however, the recovery of confidence in America was as
extravagant as the previous unhappy desire to hear no more about
space. Burke, Holmes, Keller, Sandy and Pam became national
heroes and heroines within eighteen hours after guided missiles had
failed to shoot them down. The only criticism came from a highly
conservative clergyman who hoped that other young girls would not
imitate Sandy's and Pam's disregard of convention and maintained
that a married woman should have gone along to chaperon them.
The atmosphere in the ship, however, was that of respectability
carried to the point where things were dull. The lower compartment
of the ship, being smaller, was inevitably appropriated by Sandy and
Pam. They retired when the ship was twenty hours out from Earth.
Each of them had prepared for stowing away by wearing extra
garments in layers.
"Funny," said Pam, yawning as they made ready to turn in, "I
thought it was going to be exciting. But it's just like a rather full day
at the office."
"Which," said Sandy, "I'm quite used to."
"I do think you ought to have barged in when they designed the
ship, Sandy. There's not one mirror in it!"
In the upper compartment Keller took his place in the control-chair
and took a trick of duty. It consisted solely of looking at the
instruments and listening to the beeping noises which came from
remoteness every two seconds, and the still completely cryptic
broadcasts which came every seventy-nine minutes. It wasn't
exciting. There was nothing to be excited about. But somebody had
to be on watch.
On the second day out, Washington was ready to use the new code.
The West Virginia radar bowl was powered to handle
communications again. Sandy painstakingly took down the gibberish
that came in and decoded it. From then on she worked at the coding
and transmission of messages and the reception and decoding of
others. Presently Pam relieved her at the job. Pam tended to be
bored because Holmes was as much absorbed in the business of
keeping anything from happening as was Burke.
The messages were almost entirely requests for, and answers to
requests for, details about the ship plans. The United States had not
yet completed a duplicate drive-shaft. Machinists labored to
reproduce the cores, which would then have to be wound in the
complicated fashion the plans described. But it was an unhappy
experience for the scientific minds assigned to duplicate Burke's
ship. No woman ever followed a recipe without making some
change. Very few physicists can duplicate another's apparatus
without itching to change it. There were six copies of the drive under
construction at the same time, at the beginning. Four were made by
skeptics, who adhered to the original plans with strict accuracy. They
were sure they'd prove Burke wrong. Two were "improved" in the
making. The four, when finished, worked beautifully. The two
doctored versions did not. But still there was fretful discussion of the
theory of the drive. It seemed flatly to contradict Newton's law that
every action has a reaction of equal moment and opposite sign—a
law at least as firmly founded as the law of the conservation of
energy. But that had lately been revised into the law of the
conservation of energy and matter, which now was gospel. Burke's
theory required the Newtonian law to be restated to read "every
action of a given force has a reaction of the same force, of the same
moment," and so on. When the reaction of one force is converted
into another force, the results can be interesting. In fact, one can
have a space-drive. But there was bitter resistance to the idea. It
was demanded that Burke justify his views in a more reasonable way
than by mere demonstration that they worked.
After a time, Burke gave up trying to explain things. And when one
and then another duplicate drive worked, the argument ceased. But
eminent physicists still had a resentful feeling that Burke was
cheating on them somehow.
Then for days nothing happened. One of the three men in the ship
always stayed in the control-chair where he could check the ship's
course against the homing signals from the asteroid. He might have
to correct it by the fraction of a hair, or swing ship and put on more
drive if the radar should show celestial debris in the spaceship's
path. Every so many hours the ship had to be swung about so that
instead of accelerating she decelerated, or instead of decelerating
gained fresh speed. But that was all.
On the fifth day there was the flash of a meteor on the radar. On the
seventh day an object which could have been the second or third
unmanned Russian probe showed briefly at the very edge of the
radar screens. In essence, however, the journey was pure tedium.
Burke wearied of making sure that his work was good, though he
congratulated himself that nothing did happen to break the
monotony. Holmes admitted that he was disappointed. He'd wanted
to make the journey because he'd sailed in everything but a
spaceship. But there was no fun in it. Keller alone seemed
comfortably absorbed. He prepared daily lists of instrument-readings
to be sent back to Earth. They would be of enormous importance to
science-minded people. They were not of interest to Sandy.
Even when she talked to Burke, it was necessarily impersonal. There
could be no privacy which was not ostentatious. The two girls used
the lower compartment, the three men the upper and larger one.
For Sandy to talk privately with Burke, she'd have had to go to the
small bottom section of the ship. Holmes and Pam faced the same
situation. It was uncomfortable. So they developed a perfectly
pleasant habit of talking exclusively of things everybody could talk
about. It did not bother Keller, who would hardly average a dozen
words in twenty-four hours, but Sandy muttered to herself when she
and Pam retired for what was a ship-night's rest.
When they went past the orbit of Mars, agitated instructions came
out from Earth. The asteroid belts began beyond Mars. Elaborate
directions came. The ship was tracked by radar telescopes all around
the world, direction-finding on its transmission. Croydon kept track.
American radar bowls picked up the ship's voice. South American
and Hawaiian and Japanese and Siberian radar telescopes
determined the ship's position every time a set of code symbols
reached Earth from the ship. Of course, there were also the
beepings and the seventy-nine-minute-spaced identical broadcasts
from farther out from the sun.
Somebody got a brilliant idea and authority to try it. An interview for
broadcast on Earth was sought with somebody on the ship. It was

Python For Data Analysis 3rd Wes Mckinney

  • 5.
    Python for Data Analysis: Data Wrangling with pandas, NumPy & Jupyter
    Wes McKinney
    Third Edition
  • 6.
    DATA

    “With this new edition, Wes has updated his book to ensure it remains the go-to resource for all things related to data analysis with Python and pandas. I cannot recommend this book highly enough.”
    Paul Barry, lecturer and author of O’Reilly’s Head First Python

    Python for Data Analysis

    Get the definitive handbook for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.10 and pandas 1.4, the third edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, and Jupyter in the process.

    Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.

    • Use the Jupyter notebook and the IPython shell for exploratory computing
    • Learn basic and advanced features in NumPy
    • Get started with data analysis tools in the pandas library
    • Use flexible tools to load, clean, transform, merge, and reshape data
    • Create informative visualizations with matplotlib
    • Apply the pandas groupby facility to slice, dice, and summarize datasets
    • Analyze and manipulate regular and irregular time series data
    • Learn how to solve real-world data analysis problems with thorough, detailed examples

    Wes McKinney, cofounder and chief technology officer of Voltron Data, is an active member of the Python data community and an advocate for Python use in data analysis, finance, and statistical computing applications. A graduate of MIT, he’s also a member of the project management committees for the Apache Software Foundation’s Apache Arrow and Apache Parquet projects.

    US $69.99  CAN $87.99
    ISBN: 978-1-098-10403-0
    Twitter: @oreillymedia
    linkedin.com/company/oreilly-media
    youtube.com/oreillymedia
  • 7.
    Wes McKinney
    Python for Data Analysis
    Data Wrangling with pandas, NumPy, and Jupyter
    THIRD EDITION
    Boston, Farnham, Sebastopol, Tokyo, Beijing
  • 8.
    Python for Data Analysis
    by Wes McKinney

    Copyright © 2022 Wesley McKinney. All rights reserved.
    Printed in the United States of America.
    Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

    O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

    Acquisitions Editor: Jessica Haberman
    Development Editor: Angela Rufino
    Production Editor: Christopher Faucher
    Copyeditor: Sonia Saruba
    Proofreader: Piper Editorial Consulting, LLC
    Indexer: Sue Klefstad
    Interior Designer: David Futato
    Cover Designer: Karen Montgomery
    Illustrator: Kate Dullea

    October 2012: First Edition
    October 2017: Second Edition
    August 2022: Third Edition

    Revision History for the Third Edition
    2022-08-12: First Release

    See https://www.oreilly.com/catalog/errata.csp?isbn=0636920519829 for release details.

    The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Python for Data Analysis, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

    While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

    978-1-098-10403-0
    [LSI]
  • 9.
    Table of Contents

    Preface

    1. Preliminaries
      1.1 What Is This Book About?
        What Kinds of Data?
      1.2 Why Python for Data Analysis?
        Python as Glue
        Solving the “Two-Language” Problem
        Why Not Python?
      1.3 Essential Python Libraries
        NumPy
        pandas
        matplotlib
        IPython and Jupyter
        SciPy
        scikit-learn
        statsmodels
        Other Packages
      1.4 Installation and Setup
        Miniconda on Windows
        GNU/Linux
        Miniconda on macOS
        Installing Necessary Packages
        Integrated Development Environments and Text Editors
      1.5 Community and Conferences
      1.6 Navigating This Book
        Code Examples
        Data for Examples
        Import Conventions

    2. Python Language Basics, IPython, and Jupyter Notebooks
      2.1 The Python Interpreter
      2.2 IPython Basics
        Running the IPython Shell
        Running the Jupyter Notebook
        Tab Completion
        Introspection
      2.3 Python Language Basics
        Language Semantics
        Scalar Types
        Control Flow
      2.4 Conclusion

    3. Built-In Data Structures, Functions, and Files
      3.1 Data Structures and Sequences
        Tuple
        List
        Dictionary
        Set
        Built-In Sequence Functions
        List, Set, and Dictionary Comprehensions
      3.2 Functions
        Namespaces, Scope, and Local Functions
        Returning Multiple Values
        Functions Are Objects
        Anonymous (Lambda) Functions
        Generators
        Errors and Exception Handling
      3.3 Files and the Operating System
        Bytes and Unicode with Files
      3.4 Conclusion

    4. NumPy Basics: Arrays and Vectorized Computation
      4.1 The NumPy ndarray: A Multidimensional Array Object
        Creating ndarrays
        Data Types for ndarrays
        Arithmetic with NumPy Arrays
        Basic Indexing and Slicing
        Boolean Indexing
        Fancy Indexing
        Transposing Arrays and Swapping Axes
      4.2 Pseudorandom Number Generation
      4.3 Universal Functions: Fast Element-Wise Array Functions
      4.4 Array-Oriented Programming with Arrays
        Expressing Conditional Logic as Array Operations
        Mathematical and Statistical Methods
        Methods for Boolean Arrays
        Sorting
        Unique and Other Set Logic
      4.5 File Input and Output with Arrays
      4.6 Linear Algebra
      4.7 Example: Random Walks
        Simulating Many Random Walks at Once
      4.8 Conclusion

    5. Getting Started with pandas
      5.1 Introduction to pandas Data Structures
        Series
        DataFrame
        Index Objects
      5.2 Essential Functionality
        Reindexing
        Dropping Entries from an Axis
        Indexing, Selection, and Filtering
        Arithmetic and Data Alignment
        Function Application and Mapping
        Sorting and Ranking
        Axis Indexes with Duplicate Labels
      5.3 Summarizing and Computing Descriptive Statistics
        Correlation and Covariance
        Unique Values, Value Counts, and Membership
      5.4 Conclusion

    6. Data Loading, Storage, and File Formats
      6.1 Reading and Writing Data in Text Format
        Reading Text Files in Pieces
        Writing Data to Text Format
        Working with Other Delimited Formats
        JSON Data
        XML and HTML: Web Scraping
      6.2 Binary Data Formats
        Reading Microsoft Excel Files
        Using HDF5 Format
      6.3 Interacting with Web APIs
      6.4 Interacting with Databases
      6.5 Conclusion

    7. Data Cleaning and Preparation
      7.1 Handling Missing Data
        Filtering Out Missing Data
        Filling In Missing Data
      7.2 Data Transformation
        Removing Duplicates
        Transforming Data Using a Function or Mapping
        Replacing Values
        Renaming Axis Indexes
        Discretization and Binning
        Detecting and Filtering Outliers
        Permutation and Random Sampling
        Computing Indicator/Dummy Variables
      7.3 Extension Data Types
      7.4 String Manipulation
        Python Built-In String Object Methods
        Regular Expressions
        String Functions in pandas
      7.5 Categorical Data
        Background and Motivation
        Categorical Extension Type in pandas
        Computations with Categoricals
        Categorical Methods
      7.6 Conclusion

    8. Data Wrangling: Join, Combine, and Reshape
      8.1 Hierarchical Indexing
        Reordering and Sorting Levels
        Summary Statistics by Level
        Indexing with a DataFrame’s columns
      8.2 Combining and Merging Datasets
        Database-Style DataFrame Joins
        Merging on Index
        Concatenating Along an Axis
        Combining Data with Overlap
      8.3 Reshaping and Pivoting
        Reshaping with Hierarchical Indexing
        Pivoting “Long” to “Wide” Format
        Pivoting “Wide” to “Long” Format
      8.4 Conclusion

    9. Plotting and Visualization
      9.1 A Brief matplotlib API Primer
        Figures and Subplots
        Colors, Markers, and Line Styles
        Ticks, Labels, and Legends
        Annotations and Drawing on a Subplot
        Saving Plots to File
        matplotlib Configuration
      9.2 Plotting with pandas and seaborn
        Line Plots
        Bar Plots
        Histograms and Density Plots
        Scatter or Point Plots
        Facet Grids and Categorical Data
      9.3 Other Python Visualization Tools
      9.4 Conclusion

    10. Data Aggregation and Group Operations
      10.1 How to Think About Group Operations
        Iterating over Groups
        Selecting a Column or Subset of Columns
        Grouping with Dictionaries and Series
        Grouping with Functions
        Grouping by Index Levels
      10.2 Data Aggregation
        Column-Wise and Multiple Function Application
        Returning Aggregated Data Without Row Indexes
      10.3 Apply: General split-apply-combine
        Suppressing the Group Keys
        Quantile and Bucket Analysis
        Example: Filling Missing Values with Group-Specific Values
        Example: Random Sampling and Permutation
        Example: Group Weighted Average and Correlation
        Example: Group-Wise Linear Regression
      10.4 Group Transforms and “Unwrapped” GroupBys
      10.5 Pivot Tables and Cross-Tabulation
        Cross-Tabulations: Crosstab
      10.6 Conclusion

    11. Time Series
      11.1 Date and Time Data Types and Tools
        Converting Between String and Datetime
      11.2 Time Series Basics
        Indexing, Selection, Subsetting
        Time Series with Duplicate Indices
      11.3 Date Ranges, Frequencies, and Shifting
        Generating Date Ranges
        Frequencies and Date Offsets
        Shifting (Leading and Lagging) Data
      11.4 Time Zone Handling
        Time Zone Localization and Conversion
        Operations with Time Zone-Aware Timestamp Objects
        Operations Between Different Time Zones
      11.5 Periods and Period Arithmetic
        Period Frequency Conversion
        Quarterly Period Frequencies
        Converting Timestamps to Periods (and Back)
        Creating a PeriodIndex from Arrays
      11.6 Resampling and Frequency Conversion
        Downsampling
        Upsampling and Interpolation
        Resampling with Periods
        Grouped Time Resampling
      11.7 Moving Window Functions
        Exponentially Weighted Functions
        Binary Moving Window Functions
        User-Defined Moving Window Functions
      11.8 Conclusion

    12. Introduction to Modeling Libraries in Python
      12.1 Interfacing Between pandas and Model Code
      12.2 Creating Model Descriptions with Patsy
        Data Transformations in Patsy Formulas
        Categorical Data and Patsy
      12.3 Introduction to statsmodels
        Estimating Linear Models
        Estimating Time Series Processes
      12.4 Introduction to scikit-learn
      12.5 Conclusion

    13. Data Analysis Examples
      13.1 Bitly Data from 1.USA.gov
        Counting Time Zones in Pure Python
        Counting Time Zones with pandas
      13.2 MovieLens 1M Dataset
        Measuring Rating Disagreement
      13.3 US Baby Names 1880–2010
        Analyzing Naming Trends
      13.4 USDA Food Database
      13.5 2012 Federal Election Commission Database
        Donation Statistics by Occupation and Employer
        Bucketing Donation Amounts
        Donation Statistics by State
      13.6 Conclusion

    A. Advanced NumPy
      A.1 ndarray Object Internals
        NumPy Data Type Hierarchy
      A.2 Advanced Array Manipulation
        Reshaping Arrays
        C Versus FORTRAN Order
        Concatenating and Splitting Arrays
        Repeating Elements: tile and repeat
        Fancy Indexing Equivalents: take and put
      A.3 Broadcasting
        Broadcasting over Other Axes
        Setting Array Values by Broadcasting
      A.4 Advanced ufunc Usage
        ufunc Instance Methods
        Writing New ufuncs in Python
      A.5 Structured and Record Arrays
        Nested Data Types and Multidimensional Fields
        Why Use Structured Arrays?
      A.6 More About Sorting
        Indirect Sorts: argsort and lexsort
        Alternative Sort Algorithms
        Partially Sorting Arrays
        numpy.searchsorted: Finding Elements in a Sorted Array
      A.7 Writing Fast NumPy Functions with Numba
        Creating Custom numpy.ufunc Objects with Numba
      A.8 Advanced Array Input and Output
        Memory-Mapped Files
        HDF5 and Other Array Storage Options
      A.9 Performance Tips
        The Importance of Contiguous Memory

    B. More on the IPython System
      B.1 Terminal Keyboard Shortcuts
      B.2 About Magic Commands
        The %run Command
        Executing Code from the Clipboard
      B.3 Using the Command History
        Searching and Reusing the Command History
        Input and Output Variables
      B.4 Interacting with the Operating System
        Shell Commands and Aliases
        Directory Bookmark System
      B.5 Software Development Tools
        Interactive Debugger
        Timing Code: %time and %timeit
        Basic Profiling: %prun and %run -p
        Profiling a Function Line by Line
      B.6 Tips for Productive Code Development Using IPython
        Reloading Module Dependencies
        Code Design Tips
      B.7 Advanced IPython Features
        Profiles and Configuration
      B.8 Conclusion

    Index
  • 17.
    Preface

    The first edition of this book was published in 2012, during a time when open source data analysis libraries for Python, especially pandas, were very new and developing rapidly. When the time came to write the second edition in 2016 and 2017, I needed to update the book not only for Python 3.6 (the first edition used Python 2.7) but also for the many changes in pandas that had occurred over the previous five years. Now in 2022, there are fewer Python language changes (we are now at Python 3.10, with 3.11 coming out at the end of 2022), but pandas has continued to evolve.

    In this third edition, my goal is to bring the content up to date with current versions of Python, NumPy, pandas, and other projects, while also remaining relatively conservative about discussing newer Python projects that have appeared in the last few years. Since this book has become an important resource for many university courses and working professionals, I will try to avoid topics that are at risk of falling out of date within a year or two. That way paper copies won’t be too difficult to follow in 2023 or 2024 or beyond.

    A new feature of the third edition is the open access online version hosted on my website at https://wesmckinney.com/book, to serve as a resource and convenience for owners of the print and digital editions. I intend to keep the content reasonably up to date there, so if you own the paper book and run into something that doesn’t work properly, you should check there for the latest content changes.

    Conventions Used in This Book

    The following typographical conventions are used in this book:

    Italic
        Indicates new terms, URLs, email addresses, filenames, and file extensions.
  • 18.
    Constant width
        Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

    Constant width bold
        Shows commands or other text that should be typed literally by the user.

    Constant width italic
        Shows text that should be replaced with user-supplied values or by values determined by context.

    This element signifies a tip or suggestion.
    This element signifies a general note.
    This element indicates a warning or caution.

    Using Code Examples

    You can find data files and related material for each chapter in this book’s GitHub repository at https://github.com/wesm/pydata-book, which is mirrored to Gitee (for those who cannot access GitHub) at https://gitee.com/wesmckinn/pydata-book.

    This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.
  • 19.
    We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Python for Data Analysis by Wes McKinney (O’Reilly). Copyright 2022 Wes McKinney, 978-1-098-10403-0.”

    If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

    O’Reilly Online Learning

    For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

    Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com.

    How to Contact Us

    Please address comments and questions concerning this book to the publisher:

        O’Reilly Media, Inc.
        1005 Gravenstein Highway North
        Sebastopol, CA 95472
        800-998-9938 (in the United States or Canada)
        707-829-0515 (international or local)
        707-829-0104 (fax)

    We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/python-data-analysis-3e.

    Email bookquestions@oreilly.com to comment or ask technical questions about this book.

    For news and information about our books and courses, visit http://oreilly.com.

    Find us on LinkedIn: https://linkedin.com/company/oreilly-media.
    Follow us on Twitter: http://twitter.com/oreillymedia.
    Watch us on YouTube: http://youtube.com/oreillymedia.
Acknowledgments

This work is the product of many years of fruitful discussions and collaborations with, and assistance from, many people around the world. I’d like to thank a few of them.

In Memoriam: John D. Hunter (1968–2012)

Our dear friend and colleague John D. Hunter passed away after a battle with colon cancer on August 28, 2012. This was only a short time after I’d completed the final manuscript for this book’s first edition.

John’s impact and legacy in the Python scientific and data communities would be hard to overstate. In addition to developing matplotlib in the early 2000s (a time when Python was not nearly so popular), he helped shape the culture of a critical generation of open source developers who’ve become pillars of the Python ecosystem that we now often take for granted.

I was lucky enough to connect with John early in my open source career in January 2010, just after releasing pandas 0.1. His inspiration and mentorship helped me push forward, even in the darkest of times, with my vision for pandas and Python as a first-class data analysis language.

John was very close with Fernando Pérez and Brian Granger, pioneers of IPython, Jupyter, and many other initiatives in the Python community. We had hoped to work on a book together, the four of us, but I ended up being the one with the most free time. I am sure he would be proud of what we’ve accomplished, as individuals and as a community, over the last nine years.

Acknowledgments for the Third Edition (2022)

It has been more than a decade since I started writing the first edition of this book and more than 15 years since I originally started my journey as a Python programmer. A lot has changed since then! Python has evolved from a relatively niche language for data analysis to the most popular and most widely used language powering the plurality (if not the majority!) of data science, machine learning, and artificial intelligence work.

I have not been an active contributor to the pandas open source project since 2013, but its worldwide developer community has continued to thrive, serving as a model of community-centric open source software development. Many “next-generation” Python projects that deal with tabular data are modeling their user interfaces directly after pandas, so the project has proved to have an enduring influence on the future trajectory of the Python data science ecosystem.
I hope that this book continues to serve as a valuable resource for students and individuals who want to learn about working with data in Python.

I’m especially thankful to O’Reilly for allowing me to publish an “open access” version of this book on my website at https://wesmckinney.com/book, where I hope it will reach even more people and help expand opportunity in the world of data analysis. J.J. Allaire was a lifesaver in making this possible by helping me “port” the book from Docbook XML to Quarto, a wonderful new scientific and technical publishing system for print and web.

Special thanks to my technical reviewers Paul Barry, Jean-Christophe Leyder, Abdullah Karasan, and William Jamir, whose thorough feedback has greatly improved the readability, clarity, and understandability of the content.

Acknowledgments for the Second Edition (2017)

It has been five years almost to the day since I completed the manuscript for this book’s first edition in July 2012. A lot has changed. The Python community has grown immensely, and the ecosystem of open source software around it has flourished.

This new edition of the book would not exist if not for the tireless efforts of the pandas core developers, who have grown the project and its user community into one of the cornerstones of the Python data science ecosystem. These include, but are not limited to, Tom Augspurger, Joris van den Bossche, Chris Bartak, Phillip Cloud, gfyoung, Andy Hayden, Masaaki Horikoshi, Stephan Hoyer, Adam Klein, Wouter Overmeire, Jeff Reback, Chang She, Skipper Seabold, Jeff Tratner, and y-p.

On the actual writing of this second edition, I would like to thank the O’Reilly staff who helped me patiently with the writing process. This includes Marie Beaugureau, Ben Lorica, and Colleen Toporek. I again had outstanding technical reviewers with Tom Augspurger, Paul Barry, Hugh Brown, Jonathan Coe, and Andreas Müller contributing. Thank you.

This book’s first edition has been translated into many foreign languages, including Chinese, French, German, Japanese, Korean, and Russian. Translating all this content and making it available to a broader audience is a huge and often thankless effort. Thank you for helping more people in the world learn how to program and use data analysis tools.

I am also lucky to have had support for my continued open source development efforts from Cloudera and Two Sigma Investments over the last few years. With open source software projects more thinly resourced than ever relative to the size of user bases, it is becoming increasingly important for businesses to provide support for development of key open source projects. It’s the right thing to do.
Acknowledgments for the First Edition (2012)

It would have been difficult for me to write this book without the support of a large number of people.

On the O’Reilly staff, I’m very grateful for my editors, Meghan Blanchette and Julie Steele, who guided me through the process. Mike Loukides also worked with me in the proposal stages and helped make the book a reality.

I received a wealth of technical review from a large cast of characters. In particular, Martin Blais and Hugh Brown were incredibly helpful in improving the book’s examples, clarity, and organization from cover to cover. James Long, Drew Conway, Fernando Pérez, Brian Granger, Thomas Kluyver, Adam Klein, Josh Klein, Chang She, and Stéfan van der Walt each reviewed one or more chapters, providing pointed feedback from many different perspectives.

I got many great ideas for examples and datasets from friends and colleagues in the data community, among them: Mike Dewar, Jeff Hammerbacher, James Johndrow, Kristian Lum, Adam Klein, Hilary Mason, Chang She, and Ashley Williams.

I am of course indebted to the many leaders in the open source scientific Python community who’ve built the foundation for my development work and gave encouragement while I was writing this book: the IPython core team (Fernando Pérez, Brian Granger, Min Ragan-Kelly, Thomas Kluyver, and others), John Hunter, Skipper Seabold, Travis Oliphant, Peter Wang, Eric Jones, Robert Kern, Josef Perktold, Francesc Alted, Chris Fonnesbeck, and too many others to mention. Several other people provided a great deal of support, ideas, and encouragement along the way: Drew Conway, Sean Taylor, Giuseppe Paleologo, Jared Lander, David Epstein, John Krowas, Joshua Bloom, Den Pilsworth, John Myles-White, and many others I’ve forgotten.

I’d also like to thank a number of people from my formative years. First, my former AQR colleagues who’ve cheered me on in my pandas work over the years: Alex Reyfman, Michael Wong, Tim Sargen, Oktay Kurbanov, Matthew Tschantz, Roni Israelov, Michael Katz, Ari Levine, Chris Uga, Prasad Ramanan, Ted Square, and Hoon Kim. Lastly, my academic advisors Haynes Miller (MIT) and Mike West (Duke).

I received significant help from Phillip Cloud and Joris van den Bossche in 2014 to update the book’s code examples and fix some other inaccuracies due to changes in pandas.

On the personal side, Casey provided invaluable day-to-day support during the writing process, tolerating my highs and lows as I hacked together the final draft on top of an already overcommitted schedule. Lastly, my parents, Bill and Kim, taught me to always follow my dreams and to never settle for less.
CHAPTER 1
Preliminaries

1.1 What Is This Book About?

This book is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. My goal is to offer a guide to the parts of the Python programming language and its data-oriented library ecosystem and tools that will equip you to become an effective data analyst. While “data analysis” is in the title of the book, the focus is specifically on Python programming, libraries, and tools as opposed to data analysis methodology. This is the Python programming you need for data analysis.

Sometime after I originally published this book in 2012, people started using the term data science as an umbrella description for everything from simple descriptive statistics to more advanced statistical analysis and machine learning. The Python open source ecosystem for doing data analysis (or data science) has also expanded significantly since then. There are now many other books which focus specifically on these more advanced methodologies. My hope is that this book serves as adequate preparation to enable you to move on to a more domain-specific resource.

Some might characterize much of the content of the book as “data manipulation” as opposed to “data analysis.” We also use the terms wrangling or munging to refer to data manipulation.

What Kinds of Data?

When I say “data,” what am I referring to exactly? The primary focus is on structured data, a deliberately vague term that encompasses many different common forms of data, such as:
• Tabular or spreadsheet-like data in which each column may be a different type (string, numeric, date, or otherwise). This includes most kinds of data commonly stored in relational databases or tab- or comma-delimited text files.
• Multidimensional arrays (matrices).
• Multiple tables of data interrelated by key columns (what would be primary or foreign keys for a SQL user).
• Evenly or unevenly spaced time series.

This is by no means a complete list. Even though it may not always be obvious, a large percentage of datasets can be transformed into a structured form that is more suitable for analysis and modeling. If not, it may be possible to extract features from a dataset into a structured form. As an example, a collection of news articles could be processed into a word frequency table, which could then be used to perform sentiment analysis.

Most users of spreadsheet programs like Microsoft Excel, perhaps the most widely used data analysis tool in the world, will not be strangers to these kinds of data.

1.2 Why Python for Data Analysis?

For many people, the Python programming language has strong appeal. Since its first appearance in 1991, Python has become one of the most popular interpreted programming languages, along with Perl, Ruby, and others. Python and Ruby have become especially popular since 2005 or so for building websites using their numerous web frameworks, like Rails (Ruby) and Django (Python). Such languages are often called scripting languages, as they can be used to quickly write small programs, or scripts to automate other tasks. I don’t like the term “scripting languages,” as it carries a connotation that they cannot be used for building serious software. Among interpreted languages, for various historical and cultural reasons, Python has developed a large and active scientific computing and data analysis community. In the last 20 years, Python has gone from a bleeding-edge or “at your own risk” scientific computing language to one of the most important languages for data science, machine learning, and general software development in academia and industry.

For data analysis and interactive computing and data visualization, Python will inevitably draw comparisons with other open source and commercial programming languages and tools in wide use, such as R, MATLAB, SAS, Stata, and others. In recent years, Python’s improved open source libraries (such as pandas and scikit-learn) have made it a popular choice for data analysis tasks. Combined with Python’s overall strength for general-purpose software engineering, it is an excellent option as a primary language for building data applications.
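The idea mentioned under “What Kinds of Data?”, turning a collection of news articles into a word frequency table, can be sketched with only the standard library. This is a minimal illustration rather than anything from the book's own examples: the two article strings and the helper name word_frequencies are made up, and the regular expression is a crude stand-in for real tokenization.

```python
import re
from collections import Counter

# Two made-up "articles"; real input would come from files or a database.
articles = [
    "Markets rallied today as tech stocks surged.",
    "Tech stocks fell sharply; markets closed lower.",
]


def word_frequencies(text):
    """Count lowercase word occurrences in one document (hypothetical helper)."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# One Counter per article, then a shared, sorted vocabulary as the columns.
counts_per_article = [word_frequencies(article) for article in articles]
vocabulary = sorted(set().union(*counts_per_article))

# Each row is one article; each column is the count of one vocabulary word.
# Counter returns 0 for words an article does not contain.
rows = [[counts[word] for word in vocabulary] for counts in counts_per_article]

print(vocabulary)
for row in rows:
    print(row)
```

The result is a rectangular table (articles by words), which is exactly the kind of structured form that array and table libraries are designed to work with.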
Python as Glue

Part of Python’s success in scientific computing is the ease of integrating C, C++, and FORTRAN code. Most modern computing environments share a similar set of legacy FORTRAN and C libraries for doing linear algebra, optimization, integration, fast Fourier transforms, and other such algorithms. The same story has held true for many companies and national labs that have used Python to glue together decades’ worth of legacy software.

Many programs consist of small portions of code where most of the time is spent, with large amounts of “glue code” that doesn’t run often. In many cases, the execution time of the glue code is insignificant; effort is most fruitfully invested in optimizing the computational bottlenecks, sometimes by moving the code to a lower-level language like C.

Solving the “Two-Language” Problem

In many organizations, it is common to research, prototype, and test new ideas using a more specialized computing language like SAS or R and then later port those ideas to be part of a larger production system written in, say, Java, C#, or C++. What people are increasingly finding is that Python is a suitable language not only for doing research and prototyping but also for building the production systems. Why maintain two development environments when one will suffice? I believe that more and more companies will go down this path, as there are often significant organizational benefits to having both researchers and software engineers using the same set of programming tools.

Over the last decade some new approaches to solving the “two-language” problem have appeared, such as the Julia programming language. Getting the most out of Python in many cases will require programming in a low-level language like C or C++ and creating Python bindings to that code. That said, “just-in-time” (JIT) compiler technology provided by libraries like Numba has provided a way to achieve excellent performance in many computational algorithms without having to leave the Python programming environment.

Why Not Python?

While Python is an excellent environment for building many kinds of analytical applications and general-purpose systems, there are a number of uses for which Python may be less suitable.

As Python is an interpreted programming language, in general most Python code will run substantially slower than code written in a compiled language like Java or C++. As programmer time is often more valuable than CPU time, many are happy to make this trade-off. However, in an application with very low latency or demanding
resource utilization requirements (e.g., a high-frequency trading system), the time spent programming in a lower-level (but also lower-productivity) language like C++ to achieve the maximum possible performance might be time well spent.

Python can be a challenging language for building highly concurrent, multithreaded applications, particularly applications with many CPU-bound threads. The reason for this is that it has what is known as the global interpreter lock (GIL), a mechanism that prevents the interpreter from executing more than one Python instruction at a time. The technical reasons for why the GIL exists are beyond the scope of this book. While it is true that in many big data processing applications, a cluster of computers may be required to process a dataset in a reasonable amount of time, there are still situations where a single-process, multithreaded system is desirable.

This is not to say that Python cannot execute truly multithreaded, parallel code. Python C extensions that use native multithreading (in C or C++) can run code in parallel without being impacted by the GIL, as long as they do not need to regularly interact with Python objects.

1.3 Essential Python Libraries

For those who are less familiar with the Python data ecosystem and the libraries used throughout the book, I will give a brief overview of some of them.

NumPy

NumPy, short for Numerical Python, has long been a cornerstone of numerical computing in Python. It provides the data structures, algorithms, and library glue needed for most scientific applications involving numerical data in Python.
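As a first taste of the array-oriented style that NumPy enables, the sketch below builds a small ndarray from made-up values and applies operations to whole arrays at once, with no explicit Python loops. It assumes NumPy is installed (it is a dependency of pandas, which is installed in the setup section later in this chapter); the variable names are illustrative only.

```python
import numpy as np

# A small 2 x 3 array of floats; the values are arbitrary illustration data.
data = np.array([[1.5, -0.1, 3.0], [0.0, -3.0, 6.5]])

# Element-wise arithmetic: every element is multiplied by 10.
scaled = data * 10

# Element-wise operation between two arrays of the same shape.
doubled = data + data

# A reduction along axis 0 (down the rows) gives one mean per column.
column_means = data.mean(axis=0)

print(data.shape, data.dtype)
print(scaled)
print(column_means)
```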
NumPy contains, among other things:

• A fast and efficient multidimensional array object ndarray
• Functions for performing element-wise computations with arrays or mathematical operations between arrays
• Tools for reading and writing array-based datasets to disk
• Linear algebra operations, Fourier transform, and random number generation
• A mature C API to enable Python extensions and native C or C++ code to access NumPy’s data structures and computational facilities

Beyond the fast array-processing capabilities that NumPy adds to Python, one of its primary uses in data analysis is as a container for data to be passed between algorithms and libraries. For numerical data, NumPy arrays are more efficient for storing and manipulating data than the other built-in Python data structures. Also, libraries written in a lower-level language, such as C or FORTRAN, can operate on
the data stored in a NumPy array without copying data into some other memory representation. Thus, many numerical computing tools for Python either assume NumPy arrays as a primary data structure or else target interoperability with NumPy.

pandas

pandas provides high-level data structures and functions designed to make working with structured or tabular data intuitive and flexible. Since its emergence in 2010, it has helped enable Python to be a powerful and productive data analysis environment. The primary objects in pandas that will be used in this book are the DataFrame, a tabular, column-oriented data structure with both row and column labels, and the Series, a one-dimensional labeled array object.

pandas blends the array-computing ideas of NumPy with the kinds of data manipulation capabilities found in spreadsheets and relational databases (such as SQL). It provides convenient indexing functionality to enable you to reshape, slice and dice, perform aggregations, and select subsets of data. Since data manipulation, preparation, and cleaning are such important skills in data analysis, pandas is one of the primary focuses of this book.

As a bit of background, I started building pandas in early 2008 during my tenure at AQR Capital Management, a quantitative investment management firm. At the time, I had a distinct set of requirements that were not well addressed by any single tool at my disposal:

• Data structures with labeled axes supporting automatic or explicit data alignment; this prevents common errors resulting from misaligned data and working with differently indexed data coming from different sources
• Integrated time series functionality
• The same data structures handle both time series data and non-time series data
• Arithmetic operations and reductions that preserve metadata
• Flexible handling of missing data
• Merge and other relational operations found in popular databases (SQL-based, for example)

I wanted to be able to do all of these things in one place, preferably in a language well suited to general-purpose software development. Python was a good candidate language for this, but at that time an integrated set of data structures and tools providing this functionality did not exist. As a result of having been built initially to solve finance and business analytics problems, pandas features especially deep time series functionality and tools well suited for working with time-indexed data generated by business processes.
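Several of the requirements listed above (label-based alignment, handling of missing data, and SQL-style merges) can be seen in a few lines. The following is a minimal sketch with made-up data, assuming pandas is installed; the Series and DataFrame names are illustrative and not taken from the book.

```python
import pandas as pd

# Two Series with partially overlapping labels (made-up illustration data).
first = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])
second = pd.Series([1.0, 2.0, 3.0], index=["b", "c", "d"])

# Arithmetic aligns on labels, not positions. Labels found in only one
# operand ("a" and "d") produce NaN, pandas's marker for missing data.
total = first + second

# Missing values can be handled explicitly, here by treating them as 0.
total_filled = first.add(second, fill_value=0)

# A SQL-style inner join between two small tables on a key column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [4, 5, 6]})
joined = pd.merge(left, right, on="key", how="inner")

print(total)
print(total_filled)
print(joined)
```

Because alignment happens on labels, adding the two Series never silently pairs up values that belong to different keys, which is the class of error the first bullet describes.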
I spent a large part of 2011 and 2012 expanding pandas’s capabilities with some of my former AQR colleagues, Adam Klein and Chang She. In 2013, I stopped being as involved in day-to-day project development, and pandas has since become a fully community-owned and community-maintained project with well over two thousand unique contributors around the world.

For users of the R language for statistical computing, the DataFrame name will be familiar, as the object was named after the similar R data.frame object. Unlike Python, data frames are built into the R programming language and its standard library. As a result, many features found in pandas are typically either part of the R core implementation or provided by add-on packages.

The pandas name itself is derived from panel data, an econometrics term for multidimensional structured datasets, and a play on the phrase Python data analysis.

matplotlib

matplotlib is the most popular Python library for producing plots and other two-dimensional data visualizations. It was originally created by John D. Hunter and is now maintained by a large team of developers. It is designed for creating plots suitable for publication. While there are other visualization libraries available to Python programmers, matplotlib is still widely used and integrates reasonably well with the rest of the ecosystem. I think it is a safe choice as a default visualization tool.

IPython and Jupyter

The IPython project began in 2001 as Fernando Pérez’s side project to make a better interactive Python interpreter. Over the subsequent 20 years it has become one of the most important tools in the modern Python data stack. While it does not provide any computational or data analytical tools by itself, IPython is designed for both interactive computing and software development work. It encourages an execute-explore workflow instead of the typical edit-compile-run workflow of many other programming languages. It also provides integrated access to your operating system’s shell and filesystem; this reduces the need to switch between a terminal window and a Python session in many cases. Since much of data analysis coding involves exploration, trial and error, and iteration, IPython can help you get the job done faster.

In 2014, Fernando and the IPython team announced the Jupyter project, a broader initiative to design language-agnostic interactive computing tools. The IPython web notebook became the Jupyter notebook, with support now for over 40 programming languages. The IPython system can now be used as a kernel (a programming language mode) for using Python with Jupyter.
IPython itself has become a component of the much broader Jupyter open source project, which provides a productive environment for interactive and exploratory computing. Its oldest and simplest “mode” is as an enhanced Python shell designed to accelerate the writing, testing, and debugging of Python code. You can also use the IPython system through the Jupyter notebook.

The Jupyter notebook system also allows you to author content in Markdown and HTML, providing you a means to create rich documents with code and text.

I personally use IPython and Jupyter regularly in my Python work, whether running, debugging, or testing code.

In the accompanying book materials on GitHub, you will find Jupyter notebooks containing all the code examples from each chapter. If you cannot access GitHub where you are, you can try the mirror on Gitee.

SciPy

SciPy is a collection of packages addressing a number of foundational problems in scientific computing. Here are some of the tools it contains in its various modules:

scipy.integrate
    Numerical integration routines and differential equation solvers

scipy.linalg
    Linear algebra routines and matrix decompositions extending beyond those provided in numpy.linalg

scipy.optimize
    Function optimizers (minimizers) and root finding algorithms

scipy.signal
    Signal processing tools

scipy.sparse
    Sparse matrices and sparse linear system solvers

scipy.special
    Wrapper around SPECFUN, a FORTRAN library implementing many common mathematical functions, such as the gamma function

scipy.stats
    Standard continuous and discrete probability distributions (density functions, samplers, continuous distribution functions), various statistical tests, and more descriptive statistics
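To give a feel for two of the modules listed above, here is a small sketch that uses scipy.integrate for numerical integration and scipy.optimize for root finding. It assumes SciPy is installed; the integrand and the equation being solved were chosen for illustration because their exact answers (the square roots of pi and of 2) are known.

```python
import math

from scipy import integrate, optimize

# scipy.integrate: integrate exp(-x^2) over the whole real line.
# quad returns the estimate and an estimate of the absolute error.
integral, abs_error = integrate.quad(
    lambda x: math.exp(-x * x), -math.inf, math.inf
)

# scipy.optimize: find the root of x^2 - 2 inside the bracket [0, 2].
# brentq needs the function to change sign over the bracket.
root = optimize.brentq(lambda x: x * x - 2, 0, 2)

print(f"integral = {integral:.6f}  (sqrt(pi) = {math.sqrt(math.pi):.6f})")
print(f"root     = {root:.6f}  (sqrt(2)  = {math.sqrt(2):.6f})")
```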
Together, NumPy and SciPy form a reasonably complete and mature computational foundation for many traditional scientific computing applications.

scikit-learn

Since the project’s inception in 2007, scikit-learn has become the premier general-purpose machine learning toolkit for Python programmers. As of this writing, more than two thousand different individuals have contributed code to the project. It includes submodules for such models as:

• Classification: SVM, nearest neighbors, random forest, logistic regression, etc.
• Regression: Lasso, ridge regression, etc.
• Clustering: k-means, spectral clustering, etc.
• Dimensionality reduction: PCA, feature selection, matrix factorization, etc.
• Model selection: Grid search, cross-validation, metrics
• Preprocessing: Feature extraction, normalization

Along with pandas, statsmodels, and IPython, scikit-learn has been critical for enabling Python to be a productive data science programming language. While I won’t be able to include a comprehensive guide to scikit-learn in this book, I will give a brief introduction to some of its models and how to use them with the other tools presented in the book.

statsmodels

statsmodels is a statistical analysis package that was seeded by work from Stanford University statistics professor Jonathan Taylor, who implemented a number of regression analysis models popular in the R programming language. Skipper Seabold and Josef Perktold formally created the new statsmodels project in 2010 and since then have grown the project to a critical mass of engaged users and contributors. Nathaniel Smith developed the Patsy project, which provides a formula or model specification framework for statsmodels inspired by R’s formula system.

Compared with scikit-learn, statsmodels contains algorithms for classical (primarily frequentist) statistics and econometrics. This includes such submodules as:

• Regression models: linear regression, generalized linear models, robust linear models, linear mixed effects models, etc.
• Analysis of variance (ANOVA)
• Time series analysis: AR, ARMA, ARIMA, VAR, and other models
• Nonparametric methods: Kernel density estimation, kernel regression
• Visualization of statistical model results

statsmodels is more focused on statistical inference, providing uncertainty estimates and p-values for parameters. scikit-learn, by contrast, is more prediction focused.

As with scikit-learn, I will give a brief introduction to statsmodels and how to use it with NumPy and pandas.

Other Packages

In 2022, there are many other Python libraries which might be discussed in a book about data science. This includes some newer projects like TensorFlow or PyTorch, which have become popular for machine learning or artificial intelligence work. Now that there are other books out there that focus more specifically on those projects, I would recommend using this book to build a foundation in general-purpose Python data wrangling. Then, you should be well prepared to move on to a more advanced resource that may assume a certain level of expertise.

1.4 Installation and Setup

Since everyone uses Python for different applications, there is no single solution for setting up Python and obtaining the necessary add-on packages. Many readers will not have a complete Python development environment suitable for following along with this book, so here I will give detailed instructions to get set up on each operating system. I will be using Miniconda, a minimal installation of the conda package manager, along with conda-forge, a community-maintained software distribution based on conda. This book uses Python 3.10 throughout, but if you’re reading in the future, you are welcome to install a newer version of Python.

If for some reason these instructions become out-of-date by the time you are reading this, you can check out my website for the book which I will endeavor to keep up to date with the latest installation instructions.

Miniconda on Windows

To get started on Windows, download the Miniconda installer for the latest Python version available (currently 3.9) from https://conda.io. I recommend following the installation instructions for Windows available on the conda website, which may have changed between the time this book was published and when you are reading this. Most people will want the 64-bit version, but if that doesn’t run on your Windows machine, you can install the 32-bit version instead.

When prompted whether to install for just yourself or for all users on your system, choose the option that’s most appropriate for you. Installing just for yourself will be sufficient to follow along with the book. It will also ask you whether you want to
add Miniconda to the system PATH environment variable. If you select this (I usually do), then this Miniconda installation may override other versions of Python you have installed. If you do not, then you will need to use the Windows Start menu shortcut that’s installed to be able to use this Miniconda. This Start menu entry may be called “Anaconda3 (64-bit).”

I’ll assume that you haven’t added Miniconda to your system PATH. To verify that things are configured correctly, open the “Anaconda Prompt (Miniconda3)” entry under “Anaconda3 (64-bit)” in the Start menu. Then try launching the Python interpreter by typing python. You should see a message like this:

    (base) C:\Users\Wes>python
    Python 3.9 [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

To exit the Python shell, type exit() and press Enter.

GNU/Linux

Linux details will vary a bit depending on your Linux distribution type, but here I give details for such distributions as Debian, Ubuntu, CentOS, and Fedora. Setup is similar to macOS with the exception of how Miniconda is installed. Most readers will want to download the default 64-bit installer file, which is for x86 architecture (but it’s possible in the future more users will have aarch64-based Linux machines). The installer is a shell script that must be executed in the terminal. You will then have a file named something similar to Miniconda3-latest-Linux-x86_64.sh. To install it, execute this script with bash:

    $ bash Miniconda3-latest-Linux-x86_64.sh

Some Linux distributions have all the required Python packages (although outdated versions, in some cases) in their package managers, and these can be installed using a tool like apt. The setup described here uses Miniconda, as it’s both easily reproducible across distributions and simpler to upgrade packages to their latest versions.

You will have a choice of where to put the Miniconda files. I recommend installing the files in the default location in your home directory; for example, /home/$USER/miniconda (with your username, naturally).

The installer will ask if you wish to modify your shell scripts to automatically activate Miniconda. I recommend doing this (select “yes”) as a matter of convenience.

After completing the installation, start a new terminal process and verify that you are picking up the new Miniconda installation:
    (base) $ python Python3.9 | (main) [GCC 10.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> To exit the Python shell, type exit() and press Enter or press Ctrl-D. Miniconda on macOS Download the macOS Miniconda installer, which should be named something like Miniconda3-latest-MacOSX-arm64.sh for Apple Silicon-based macOS computers released from 2020 onward, or Miniconda3-latest-MacOSX-x86_64.sh for Intel-based Macs released before 2020. Open the Terminal application in macOS, and install by executing the installer (most likely in your Downloads directory) with bash: $ bash $HOME/Downloads/Miniconda3-latest-MacOSX-arm64.sh When the installer runs, by default it automatically configures Miniconda in your default shell environment in your default shell profile. This is probably located at /Users/$USER/.zshrc. I recommend letting it do this; if you do not want to allow the installer to modify your default shell environment, you will need to consult the Miniconda documentation to be able to proceed. To verify everything is working, try launching Python in the system shell (open the Terminal application to get a command prompt): $ python Python 3.9 (main) [Clang 12.0.1 ] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> To exit the shell, press Ctrl-D or type exit() and press Enter. Installing Necessary Packages Now that we have set up Miniconda on your system, it’s time to install the main packages we will be using in this book. 
The first step is to configure conda-forge as your default package channel by running the following commands in a shell:

    (base) $ conda config --add channels conda-forge
    (base) $ conda config --set channel_priority strict

Now, we will create a new conda “environment” with the conda create command using Python 3.10:

    (base) $ conda create -y -n pydata-book python=3.10

After the installation completes, activate the environment with conda activate:

    (base) $ conda activate pydata-book
    (pydata-book) $
It is necessary to use conda activate to activate your environment each time you open a new terminal. You can see information about the active conda environment at any time from the terminal by running conda info.

Now, we will install the essential packages used throughout the book (along with their dependencies) with conda install:

    (pydata-book) $ conda install -y pandas jupyter matplotlib

We will be using some other packages, too, but these can be installed later once they are needed. There are two ways to install packages: with conda install and with pip install. conda install should always be preferred when using Miniconda, but some packages are not available through conda, so if conda install $package_name fails, try pip install $package_name.

If you want to install all of the packages used in the rest of the book, you can do that now by running:

    conda install lxml beautifulsoup4 html5lib openpyxl \
                  requests sqlalchemy seaborn scipy statsmodels \
                  patsy scikit-learn pyarrow pytables numba

On Windows, substitute a caret ^ for the line continuation \ used on Linux and macOS.

You can update packages by using the conda update command:

    conda update package_name

pip also supports upgrades using the --upgrade flag:

    pip install --upgrade package_name

You will have several opportunities to try out these commands throughout the book.

While you can use both conda and pip to install packages, you should avoid updating packages originally installed with conda using pip (and vice versa), as doing so can lead to environment problems. I recommend sticking to conda if you can and falling back on pip only for packages that are unavailable with conda install.

Integrated Development Environments and Text Editors

When asked about my standard development environment, I almost always say “IPython plus a text editor.” I typically write a program and iteratively test and debug each piece of it in IPython or Jupyter notebooks.
It is also useful to be able to play around
with data interactively and visually verify that a particular set of data manipulations is doing the right thing. Libraries like pandas and NumPy are designed to be productive to use in the shell.

When building software, however, some users may prefer a more richly featured integrated development environment (IDE) rather than an editor like Emacs or Vim, which provide a more minimal environment out of the box. Here are some that you can explore:

• PyDev (free), an IDE built on the Eclipse platform
• PyCharm from JetBrains (subscription-based for commercial users, free for open source developers)
• Python Tools for Visual Studio (for Windows users)
• Spyder (free), an IDE currently shipped with Anaconda
• Komodo IDE (commercial)

Due to the popularity of Python, most text editors, like VS Code and Sublime Text 2, have excellent Python support.

1.5 Community and Conferences

Outside of an internet search, the various scientific and data-related Python mailing lists are generally helpful and responsive to questions. Some to take a look at include:

• pydata: A Google Group list for questions related to Python for data analysis and pandas
• pystatsmodels: For statsmodels or pandas-related questions
• Mailing list for scikit-learn (scikit-learn@python.org) and machine learning in Python, generally
• numpy-discussion: For NumPy-related questions
• scipy-user: For general SciPy or scientific Python questions

I deliberately did not post URLs for these in case they change. They can be easily located via an internet search.

Each year many conferences are held all over the world for Python programmers. If you would like to connect with other Python programmers who share your interests, I encourage you to explore attending one, if possible. Many conferences have financial support available for those who cannot afford admission or travel to the conference. Here are some to consider:
• PyCon and EuroPython: The two main general Python conferences in North America and Europe, respectively
• SciPy and EuroSciPy: Scientific-computing-oriented conferences in North America and Europe, respectively
• PyData: A worldwide series of regional conferences targeted at data science and data analysis use cases
• International and regional PyCon conferences (see https://pycon.org for a complete listing)

1.6 Navigating This Book

If you have never programmed in Python before, you will want to spend some time in Chapters 2 and 3, where I have placed a condensed tutorial on Python language features and the IPython shell and Jupyter notebooks. These things are prerequisite knowledge for the remainder of the book. If you have Python experience already, you may instead choose to skim or skip these chapters.

Next, I give a short introduction to the key features of NumPy, leaving more advanced NumPy use for Appendix A. Then, I introduce pandas and devote the rest of the book to data analysis topics applying pandas, NumPy, and matplotlib (for visualization). I have structured the material in an incremental fashion, though there is occasionally some minor crossover between chapters, with a few cases where concepts are used that haven’t been introduced yet.
While readers may have many different end goals for their work, the tasks required generally fall into a number of different broad groups:

Interacting with the outside world
    Reading and writing with a variety of file formats and data stores

Preparation
    Cleaning, munging, combining, normalizing, reshaping, slicing and dicing, and transforming data for analysis

Transformation
    Applying mathematical and statistical operations to groups of datasets to derive new datasets (e.g., aggregating a large table by group variables)

Modeling and computation
    Connecting your data to statistical models, machine learning algorithms, or other computational tools

Presentation
    Creating interactive or static graphical visualizations or textual summaries
Code Examples

Most of the code examples in the book are shown with input and output as it would appear executed in the IPython shell or in Jupyter notebooks:

    In [5]: CODE EXAMPLE
    Out[5]: OUTPUT

When you see a code example like this, the intent is for you to type the example code in the In block in your coding environment and execute it by pressing the Enter key (or Shift-Enter in Jupyter). You should see output similar to what is shown in the Out block.

I changed the default console output settings in NumPy and pandas to improve readability and brevity throughout the book. For example, you may see more digits of precision printed in numeric data. To exactly match the output shown in the book, you can execute the following Python code before running the code examples:

    import numpy as np
    import pandas as pd
    pd.options.display.max_columns = 20
    pd.options.display.max_rows = 20
    pd.options.display.max_colwidth = 80
    np.set_printoptions(precision=4, suppress=True)

Data for Examples

Datasets for the examples in each chapter are hosted in a GitHub repository (or in a mirror on Gitee if you cannot access GitHub). You can download this data either by using the Git version control system on the command line or by downloading a zip file of the repository from the website. If you run into problems, navigate to the book website for up-to-date instructions about obtaining the book materials.

If you download a zip file containing the example datasets, you must then fully extract the contents of the zip file to a directory and navigate to that directory from the terminal before proceeding with running the book’s code examples:

    $ pwd
    /home/wesm/book-materials

    $ ls
    appa.ipynb  ch05.ipynb  ch09.ipynb  ch13.ipynb  README.md
    ch02.ipynb  ch06.ipynb  ch10.ipynb  COPYING     requirements.txt
    ch03.ipynb  ch07.ipynb  ch11.ipynb  datasets
    ch04.ipynb  ch08.ipynb  ch12.ipynb  examples
I have made every effort to ensure that the GitHub repository contains everything necessary to reproduce the examples, but I may have made some mistakes or omissions. If so, please send me an email: book@wesmckinney.com. The best way to report errors in the book is on the errata page on the O’Reilly website.

Import Conventions

The Python community has adopted a number of naming conventions for commonly used modules:

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns
    import statsmodels.api as sm

This means that when you see np.arange, this is a reference to the arange function in NumPy. This is done because it’s considered bad practice in Python software development to import everything (from numpy import *) from a large package like NumPy.
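To see why wildcard imports are discouraged, here is a minimal sketch that uses only the standard library, so it runs without NumPy installed. It executes from os import * inside a throwaway namespace (the scratch dictionary is a name made up for this illustration) and shows that the built-in open is silently replaced by os.open. Any large package that exports common names carries a similar risk.

```python
import builtins
import os

# Run the wildcard import in an isolated namespace so that this
# script's own names are left untouched.
scratch = {}
exec("from os import *", scratch)

# Inside that namespace, the name "open" now refers to os.open,
# a low-level function with a different signature than the built-in.
shadowed = scratch["open"] is os.open
print(shadowed)

# With a namespaced import, both functions stay reachable and
# it is always clear which one is being called.
print(os.open is builtins.open)
```

With the aliased form (import numpy as np), every name keeps its origin visible at the call site, which is the point of the conventions listed above.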
CHAPTER 2

Python Language Basics, IPython, and Jupyter Notebooks

When I wrote the first edition of this book in 2011 and 2012, there were fewer resources available for learning about doing data analysis in Python. This was partially a chicken-and-egg problem; many libraries that we now take for granted, like pandas, scikit-learn, and statsmodels, were comparatively immature back then. Now in 2022, there is a growing literature on data science, data analysis, and machine learning, supplementing the prior works on general-purpose scientific computing geared toward computational scientists, physicists, and professionals in other research fields. There are also excellent books about learning the Python programming language itself and becoming an effective software engineer.

As this book is intended as an introductory text in working with data in Python, I feel it is valuable to have a self-contained overview of some of the most important features of Python’s built-in data structures and libraries from the perspective of data manipulation. So, I will only present roughly enough information in this chapter and Chapter 3 to enable you to follow along with the rest of the book.

Much of this book focuses on table-based analytics and data preparation tools for working with datasets that are small enough to fit on your personal computer. To use these tools you must sometimes do some wrangling to arrange messy data into a more nicely tabular (or structured) form. Fortunately, Python is an ideal language for doing this. The greater your facility with the Python language and its built-in data types, the easier it will be for you to prepare new datasets for analysis.

Some of the tools in this book are best explored from a live IPython or Jupyter session. Once you learn how to start up IPython and Jupyter, I recommend that you follow along with the examples so you can experiment and try different things. As
with any keyboard-driven console-like environment, developing familiarity with the common commands is also part of the learning curve.

There are introductory Python concepts that this chapter does not cover, like classes and object-oriented programming, which you may find useful in your foray into data analysis in Python.

To deepen your Python language knowledge, I recommend that you supplement this chapter with the official Python tutorial and potentially one of the many excellent books on general-purpose Python programming. Some recommendations to get you started include:

• Python Cookbook, Third Edition, by David Beazley and Brian K. Jones (O’Reilly)
• Fluent Python by Luciano Ramalho (O’Reilly)
• Effective Python, Second Edition, by Brett Slatkin (Addison-Wesley)

2.1 The Python Interpreter

Python is an interpreted language. The Python interpreter runs a program by executing one statement at a time. The standard interactive Python interpreter can be invoked on the command line with the python command:

    $ python
    Python 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:38:57)
    [GCC 10.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> a = 5
    >>> print(a)
    5

The >>> you see is the prompt after which you’ll type code expressions. To exit the Python interpreter, you can either type exit() or press Ctrl-D (works on Linux and macOS only).

Running Python programs is as simple as calling python with a .py file as its first argument. Suppose we had created hello_world.py with these contents:

    print("Hello world")

You can run it by executing the following command (the hello_world.py file must be in your current working terminal directory):

    $ python hello_world.py
    Hello world
While some Python programmers execute all of their Python code in this way, those doing data analysis or scientific computing make use of IPython, an enhanced Python interpreter, or Jupyter notebooks, web-based code notebooks originally created within the IPython project. I give an introduction to using IPython and Jupyter in this chapter and have included a deeper look at IPython functionality in Appendix B. When you use the %run command, IPython executes the code in the specified file in the same process, enabling you to explore the results interactively when it’s done:

    $ ipython
    Python 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:38:57)
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.

    In [1]: %run hello_world.py
    Hello world

    In [2]:

The default IPython prompt adopts the numbered In [2]: style, compared with the standard >>> prompt.

2.2 IPython Basics

In this section, I’ll get you up and running with the IPython shell and Jupyter notebook, and introduce you to some of the essential concepts.

Running the IPython Shell

You can launch the IPython shell on the command line just like launching the regular Python interpreter except with the ipython command:

    $ ipython
    Python 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:38:57)
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.

    In [1]: a = 5

    In [2]: a
    Out[2]: 5

You can execute arbitrary Python statements by typing them and pressing Return (or Enter). When you type just a variable into IPython, it renders a string representation of the object:

    In [5]: import numpy as np

    In [6]: data = [np.random.standard_normal() for i in range(7)]
    In [7]: data
    Out[7]:
    [-0.20470765948471295,
     0.47894333805754824,
     -0.5194387150567381,
     -0.55573030434749,
     1.9657805725027142,
     1.3934058329729904,
     0.09290787674371767]

The first two lines are Python code statements; the second statement creates a variable named data that refers to a newly created Python list. The last line prints the value of data in the console.

Many kinds of Python objects are formatted to be more readable, or pretty-printed, which is distinct from normal printing with print. If you printed the above data variable in the standard Python interpreter, it would be much less readable:

    >>> import numpy as np
    >>> data = [np.random.standard_normal() for i in range(7)]
    >>> print(data)
    >>> data
    [-0.5767699931966723, -0.1010317773535111, -1.7841005313329152,
    -1.524392126408841, 0.22191374220117385, -1.9835710588082562,
    -1.6081963964963528]

IPython also provides facilities to execute arbitrary blocks of code (via a somewhat glorified copy-and-paste approach) and whole Python scripts. You can also use the Jupyter notebook to work with larger blocks of code, as we will soon see.

Running the Jupyter Notebook

One of the major components of the Jupyter project is the notebook, a type of interactive document for code, text (including Markdown), data visualizations, and other output. The Jupyter notebook interacts with kernels, which are implementations of the Jupyter interactive computing protocol specific to different programming languages. The Python Jupyter kernel uses the IPython system for its underlying behavior.

To start up Jupyter, run the command jupyter notebook in a terminal:

    $ jupyter notebook
    [I 15:20:52.739 NotebookApp] Serving notebooks from local directory:
    /home/wesm/code/pydata-book
    [I 15:20:52.739 NotebookApp] 0 active kernels
    [I 15:20:52.739 NotebookApp] The Jupyter Notebook is running at:
    http://localhost:8888/?token=0a77b52fefe52ab83e3c35dff8de121e4bb443a63f2d...
    [I 15:20:52.740 NotebookApp] Use Control-C to stop this server and shut down
    all kernels (twice to skip confirmation).
    Created new window in existing browser session.
    To access the notebook, open this file in a browser:
        file:///home/wesm/.local/share/jupyter/runtime/nbserver-185259-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=0a77b52fefe52ab83e3c35dff8de121e4...
     or http://127.0.0.1:8888/?token=0a77b52fefe52ab83e3c35dff8de121e4...

On many platforms, Jupyter will automatically open in your default web browser (unless you start it with --no-browser). Otherwise, you can navigate to the HTTP address printed when you started the notebook, here http://localhost:8888/?token=0a77b52fefe52ab83e3c35dff8de121e4bb443a63f2d3055. See Figure 2-1 for what this looks like in Google Chrome.

Many people use Jupyter as a local computing environment, but it can also be deployed on servers and accessed remotely. I won’t cover those details here, but I encourage you to explore this topic on the internet if it’s relevant to your needs.

Figure 2-1. Jupyter notebook landing page
To create a new notebook, click the New button and select the “Python 3” option. You should see something like Figure 2-2. If this is your first time, try clicking on the empty code “cell” and entering a line of Python code. Then press Shift-Enter to execute it.

Figure 2-2. Jupyter new notebook view

When you save the notebook (see “Save and Checkpoint” under the notebook File menu), it creates a file with the extension .ipynb. This is a self-contained file format that contains all of the content (including any evaluated code output) currently in the notebook. These can be loaded and edited by other Jupyter users.

To rename an open notebook, click on the notebook title at the top of the page and type the new title, pressing Enter when you are finished.

To load an existing notebook, put the file in the same directory where you started the notebook process (or in a subfolder within it), then click the name from the landing page. You can try it out with the notebooks from my wesm/pydata-book repository on GitHub. See Figure 2-3.

When you want to close a notebook, click the File menu and select “Close and Halt.” If you simply close the browser tab, the Python process associated with the notebook will keep running in the background.

While the Jupyter notebook may feel like a distinct experience from the IPython shell, nearly all of the commands and tools in this chapter can be used in either environment.
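Because an .ipynb file is stored as JSON text, its contents can be inspected with the standard library alone. The sketch below builds a tiny notebook-shaped dictionary by hand, serializes it, and reads the code and its saved output back. The cell contents are made up for illustration, and only a handful of the fields used by the notebook format are shown, so treat this as an approximation rather than a full description of the format.

```python
import json

# A hand-written, minimal notebook structure (illustrative only).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["a = 5\n", "a"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": ["5"]},
                }
            ],
        }
    ],
}

# Saving a notebook amounts to writing JSON text like this to disk.
text = json.dumps(notebook, indent=1)

# Loading it back recovers both the code and its evaluated output.
loaded = json.loads(text)
first_cell = loaded["cells"][0]
code = "".join(first_cell["source"])
result = "".join(first_cell["outputs"][0]["data"]["text/plain"])
print(code)
print(result)
```

This is why a saved notebook can be shared as a single file: the evaluated output travels with the code that produced it.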
Figure 2-3. Jupyter example view for an existing notebook

Tab Completion

On the surface, the IPython shell looks like a cosmetically different version of the standard terminal Python interpreter (invoked with python). One of the major improvements over the standard Python shell is tab completion, found in many IDEs or other interactive computing analysis environments. While entering expressions in the shell, pressing the Tab key will search the namespace for any variables (objects, functions, etc.) matching the characters you have typed so far and show the results in a convenient drop-down menu:

    In [1]: an_apple = 27

    In [2]: an_example = 42

    In [3]: an<Tab>
    an_apple   an_example  any

In this example, note that IPython displayed both of the variables I defined, as well as the built-in function any. You can also complete methods and attributes on any object after typing a period:
    In [3]: b = [1, 2, 3]

    In [4]: b.<Tab>
    append()  count()   insert()  reverse()
    clear()   extend()  pop()     sort()
    copy()    index()   remove()

The same is true for modules:

    In [1]: import datetime

    In [2]: datetime.<Tab>
    date          MAXYEAR       timedelta
    datetime      MINYEAR       timezone
    datetime_CAPI time          tzinfo

Note that IPython by default hides methods and attributes starting with underscores, such as magic methods and internal “private” methods and attributes, in order to avoid cluttering the display (and confusing novice users!). These, too, can be tab-completed, but you must first type an underscore to see them. If you prefer to always see such methods in tab completion, you can change this setting in the IPython configuration. See the IPython documentation to find out how to do this.

Tab completion works in many contexts outside of searching the interactive namespace and completing object or module attributes. When typing anything that looks like a file path (even in a Python string), pressing the Tab key will complete anything on your computer’s filesystem matching what you’ve typed.

Combined with the %run command (see “The %run Command” on page 512), this functionality can save you many keystrokes.

Another area where tab completion saves time is in the completion of function keyword arguments (including the = sign!). See Figure 2-4.

Figure 2-4. Autocomplete function keywords in a Jupyter notebook

We’ll have a closer look at functions in a little bit.
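The names offered by attribute completion can be approximated in plain Python with the built-in dir function. The following is only a rough sketch of the idea, not how IPython implements completion, and the helper name public_names is made up for this example.

```python
def public_names(obj, prefix=""):
    """List attribute names of obj that start with prefix.

    Names with a leading underscore are hidden unless the prefix
    itself starts with an underscore, mirroring the default
    behavior described above.
    """
    show_private = prefix.startswith("_")
    return [
        name
        for name in dir(obj)
        if name.startswith(prefix)
        and (show_private or not name.startswith("_"))
    ]


b = [1, 2, 3]
print(public_names(b))          # the public list methods
print(public_names(b, "co"))    # ['copy', 'count']
print(public_names(b, "__a"))   # underscore names appear once requested
```

Typing an underscore before pressing Tab corresponds to passing a prefix such as "__a" here: the hidden names only show up when you ask for them.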
Introspection

Using a question mark (?) before or after a variable will display some general information about the object:

    In [1]: b = [1, 2, 3]

    In [2]: b?
    Type:        list
    String form: [1, 2, 3]
    Length:      3
    Docstring:
    Built-in mutable sequence.

    If no argument is given, the constructor creates a new empty list.
    The argument must be an iterable if specified.

    In [3]: print?
    Docstring:
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.
    Type:      builtin_function_or_method

This is referred to as object introspection. If the object is a function or instance method, the docstring, if defined, will also be shown. Suppose we’d written the following function (which you can reproduce in IPython or Jupyter):

    def add_numbers(a, b):
        """
        Add two numbers together

        Returns
        -------
        the_sum : type of arguments
        """
        return a + b

Then using ? shows us the docstring:

    In [6]: add_numbers?
    Signature: add_numbers(a, b)
    Docstring:
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    File:      <ipython-input-9-6a548a216e27>
    Type:      function

? has a final usage, which is for searching the IPython namespace in a manner similar to the standard Unix or Windows command line. A number of characters combined with the wildcard (*) will show all names matching the wildcard expression. For example, we could get a list of all functions in the top-level NumPy namespace containing load:

    In [9]: import numpy as np

    In [10]: np.*load*?
    np.__loader__
    np.load
    np.loads
    np.loadtxt

2.3 Python Language Basics

In this section, I will give you an overview of essential Python programming concepts and language mechanics. In the next chapter, I will go into more detail about Python data structures, functions, and other built-in tools.

Language Semantics

The Python language design is distinguished by its emphasis on readability, simplicity, and explicitness. Some people go so far as to liken it to “executable pseudocode.”

Indentation, not braces

Python uses whitespace (tabs or spaces) to structure code instead of using braces as in many other languages like R, C++, Java, and Perl. Consider a for loop from a sorting algorithm:

    for x in array:
        if x < pivot:
            less.append(x)
        else:
            greater.append(x)

A colon denotes the start of an indented code block after which all of the code must be indented by the same amount until the end of the block.

Love it or hate it, significant whitespace is a fact of life for Python programmers. While it may seem foreign at first, you will hopefully grow accustomed to it in time.
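The loop above refers to names (array, pivot, less, greater) that are not defined in the snippet. A minimal runnable version, with sample values chosen purely for illustration, might look like this:

```python
array = [7, 2, 9, 4, 5, 1]
pivot = 5
less = []
greater = []

for x in array:
    if x < pivot:
        # Two levels of indentation: inside the for loop and the if block
        less.append(x)
    else:
        # Values equal to the pivot also end up here
        greater.append(x)

# Back at the left margin, so this runs once, after the loop finishes
print(less)     # [2, 4, 1]
print(greater)  # [7, 9, 5]
```

Changing the indentation of either print call would move it inside the loop and change how often it runs, which is exactly the role braces play in other languages.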
I strongly recommend using four spaces as your default indentation and replacing tabs with four spaces. Many text editors have a setting that will replace tab stops with spaces automatically (do this!). IPython and Jupyter notebooks will automatically insert four spaces on new lines following a colon and replace tabs by four spaces.

As you can see by now, Python statements also do not need to be terminated by semicolons. Semicolons can be used, however, to separate multiple statements on a single line:

    a = 5; b = 6; c = 7

Putting multiple statements on one line is generally discouraged in Python as it can make code less readable.

Everything is an object

An important characteristic of the Python language is the consistency of its object model. Every number, string, data structure, function, class, module, and so on exists in the Python interpreter in its own “box,” which is referred to as a Python object. Each object has an associated type (e.g., integer, string, or function) and internal data. In practice this makes the language very flexible, as even functions can be treated like any other object.

Comments

Any text preceded by the hash mark (pound sign) # is ignored by the Python interpreter. This is often used to add comments to code. At times you may also want to exclude certain blocks of code without deleting them. One solution is to comment out the code:

    results = []
    for line in file_handle:
        # keep the empty lines for now
        # if len(line) == 0:
        #   continue
        results.append(line.replace("foo", "bar"))

Comments can also occur after a line of executed code. While some programmers prefer comments to be placed in the line preceding a particular line of code, this can be useful at times:

    print("Reached this line")  # Simple status report
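As a small illustration of the “everything is an object” idea described above, the sketch below stores functions in a list and passes them around like any other value. The function and variable names are made up for this example.

```python
def strip_whitespace(value):
    return value.strip()


def to_title(value):
    return value.title()


# Functions are objects: they have a type and attributes ...
print(type(strip_whitespace))
print(to_title.__name__)

# ... and can be stored in data structures and called later.
# str.lower is a method looked up on the str class itself.
cleaning_steps = [strip_whitespace, str.lower, to_title]

text = "  hELLO wORLD  "
for step in cleaning_steps:
    text = step(text)

print(text)  # Hello World
```

Because the steps are ordinary list elements, reordering or extending the cleaning pipeline only requires editing the list, not the loop.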
Function and object method calls

You call functions using parentheses and passing zero or more arguments, optionally assigning the returned value to a variable:

    result = f(x, y, z)
    g()

Almost every object in Python has attached functions, known as methods, that have access to the object’s internal contents. You can call them using the following syntax:

    obj.some_method(x, y, z)

Functions can take both positional and keyword arguments:

    result = f(a, b, c, d=5, e="foo")

We will look at this in more detail later.

Variables and argument passing

When assigning a variable (or name) in Python, you are creating a reference to the object shown on the righthand side of the equals sign. In practical terms, consider a list of integers:

    In [8]: a = [1, 2, 3]

Suppose we assign a to a new variable b:

    In [9]: b = a

    In [10]: b
    Out[10]: [1, 2, 3]

In some languages, the assignment of b will cause the data [1, 2, 3] to be copied. In Python, a and b actually now refer to the same object, the original list [1, 2, 3] (see Figure 2-5 for a mock-up). You can prove this to yourself by appending an element to a and then examining b:

    In [11]: a.append(4)

    In [12]: b
    Out[12]: [1, 2, 3, 4]

Figure 2-5. Two references for the same object

Understanding the semantics of references in Python, and when, how, and why data is copied, is especially critical when you are working with larger datasets in Python.
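To contrast a second reference with an actual copy, the short sketch below uses the is operator to test whether two names refer to the same object and the list copy method to create an independent list. The variable c is an addition for this illustration and does not appear in the example above.

```python
a = [1, 2, 3]
b = a          # a second reference to the same list
c = a.copy()   # a new list holding the same elements (a shallow copy)

a.append(4)

print(b is a)  # True: both names refer to one object
print(c is a)  # False: c is a separate object
print(b)       # [1, 2, 3, 4]
print(c)       # [1, 2, 3]
```

Copying costs time and memory proportional to the size of the data, which is why it matters to know when it happens with larger datasets.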
Assignment is also referred to as binding, as we are binding a name to an object. Variable names that have been assigned may occasionally be referred to as bound variables.

When you pass objects as arguments to a function, new local variables are created referencing the original objects without any copying. If you bind a new object to a variable inside a function, that will not overwrite a variable of the same name in the “scope” outside of the function (the “parent scope”). It is therefore possible to alter the internals of a mutable argument. Suppose we had the following function:

    In [13]: def append_element(some_list, element):
       ....:     some_list.append(element)

Then we have:

    In [14]: data = [1, 2, 3]

    In [15]: append_element(data, 4)

    In [16]: data
    Out[16]: [1, 2, 3, 4]

Dynamic references, strong types

Variables in Python have no inherent type associated with them; a variable can refer to a different type of object simply by doing an assignment. There is no problem with the following:

    In [17]: a = 5

    In [18]: type(a)
    Out[18]: int

    In [19]: a = "foo"

    In [20]: type(a)
    Out[20]: str

Variables are names for objects within a particular namespace; the type information is stored in the object itself. Some observers might hastily conclude that Python is not a “typed language.” This is not true; consider this example:

    In [21]: "5" + 5
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-21-7fe5aa79f268> in <module>
    ----> 1 "5" + 5

    TypeError: can only concatenate str (not "int") to str
In some languages, the string '5' might get implicitly converted (or cast) to an integer, thus yielding 10. In other languages the integer 5 might be cast to a string, yielding the concatenated string '55'. In Python, such implicit casts are not allowed. In this regard we say that Python is a strongly typed language, which means that every object has a specific type (or class), and implicit conversions will occur only in certain permitted circumstances, such as:

    In [22]: a = 4.5

    In [23]: b = 2

    # String formatting, to be visited later
    In [24]: print(f"a is {type(a)}, b is {type(b)}")
    a is <class 'float'>, b is <class 'int'>

    In [25]: a / b
    Out[25]: 2.25

Here, even though b is an integer, it is implicitly converted to a float for the division operation.

Knowing the type of an object is important, and it’s useful to be able to write functions that can handle many different kinds of input. You can check that an object is an instance of a particular type using the isinstance function:

    In [26]: a = 5

    In [27]: isinstance(a, int)
    Out[27]: True

isinstance can accept a tuple of types if you want to check that an object’s type is among those present in the tuple:

    In [28]: a = 5; b = 4.5

    In [29]: isinstance(a, (int, float))
    Out[29]: True

    In [30]: isinstance(b, (int, float))
    Out[30]: True

Attributes and methods

Objects in Python typically have both attributes (other Python objects stored “inside” the object) and methods (functions associated with an object that can have access to the object’s internal data). Both of them are accessed via the syntax obj.attribute_name:

    In [1]: a = "foo"

    In [2]: a.<Press Tab>
    capitalize() index()        isspace()      removesuffix()  startswith()
    casefold()   isprintable()  istitle()      replace()       strip()
    center()     isalnum()      isupper()      rfind()         swapcase()
    count()      isalpha()      join()         rindex()        title()
    encode()     isascii()      ljust()        rjust()         translate()
    endswith()   isdecimal()    lower()        rpartition()
    expandtabs() isdigit()      lstrip()       rsplit()
    find()       isidentifier() maketrans()    rstrip()
    format()     islower()      partition()    split()
    format_map() isnumeric()    removeprefix() splitlines()

Attributes and methods can also be accessed by name via the getattr function:

    In [32]: getattr(a, "split")
    Out[32]: <function str.split(sep=None, maxsplit=-1)>

While we will not make extensive use of getattr and the related functions hasattr and setattr in this book, they can be used very effectively to write generic, reusable code.

Duck typing

Often you may not care about the type of an object but rather only whether it has certain methods or behavior. This is sometimes called duck typing, after the saying “If it walks like a duck and quacks like a duck, then it’s a duck.” For example, you can verify that an object is iterable if it implements the iterator protocol. For many objects, this means it has an __iter__ “magic method,” though an alternative and better way to check is to try using the iter function:

    In [33]: def isiterable(obj):
       ....:     try:
       ....:         iter(obj)
       ....:         return True
       ....:     except TypeError:  # not iterable
       ....:         return False

This function would return True for strings as well as most Python collection types:

    In [34]: isiterable("a string")
    Out[34]: True

    In [35]: isiterable([1, 2, 3])
    Out[35]: True

    In [36]: isiterable(5)
    Out[36]: False
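Returning to the earlier remark that getattr and hasattr help with generic code, here is one possible sketch in the same duck-typing spirit. The helper call_if_present is hypothetical: it calls a method by name when the object provides it and returns a default otherwise, without checking the object's type.

```python
def call_if_present(obj, method_name, default=None):
    """Call obj.<method_name>() if the attribute exists, else return default."""
    if hasattr(obj, method_name):
        method = getattr(obj, method_name)
        return method()
    return default


print(call_if_present("foo", "upper"))      # str has an upper method
print(call_if_present([3, 1, 2], "upper"))  # list does not, so default
print(call_if_present({"a": 1}, "keys"))    # works for any object with keys()
```

The helper never asks what type obj is; it only asks whether the named behavior is available, which is what makes it reusable across unrelated classes.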
  • 54.
    Imports

    In Python, a module is simply a file with the .py extension containing Python code. Suppose we had the following module:

        # some_module.py
        PI = 3.14159

        def f(x):
            return x + 2

        def g(a, b):
            return a + b

    If we wanted to access the variables and functions defined in some_module.py from another file in the same directory, we could do:

        import some_module
        result = some_module.f(5)
        pi = some_module.PI

    Or alternately:

        from some_module import g, PI
        result = g(5, PI)

    By using the as keyword, you can give imports different variable names:

        import some_module as sm
        from some_module import PI as pi, g as gf

        r1 = sm.f(pi)
        r2 = gf(6, pi)

    Binary operators and comparisons

    Most of the binary math operations and comparisons use familiar mathematical syntax used in other programming languages:

        In [37]: 5 - 7
        Out[37]: -2

        In [38]: 12 + 21.5
        Out[38]: 33.5

        In [39]: 5 <= 2
        Out[39]: False

    See Table 2-1 for all of the available binary operators.
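    To try the import forms shown above without creating files by hand, the sketch below writes some_module.py into a temporary directory and puts that directory on sys.path. The temporary-directory setup and the load_demo_module name are assumptions made for this illustration only; in ordinary use the module file would simply sit next to the script that imports it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of the example module from the text.
MODULE_SOURCE = '''\
PI = 3.14159

def f(x):
    return x + 2

def g(a, b):
    return a + b
'''


def load_demo_module():
    """Hypothetical helper: create some_module.py and import it."""
    tmp_dir = tempfile.mkdtemp()
    Path(tmp_dir, "some_module.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp_dir)    # make the directory importable
    importlib.invalidate_caches()  # ensure the new file is noticed
    return importlib.import_module("some_module")


sm = load_demo_module()

# The module is now importable by name, so the "from" form works too.
from some_module import PI as pi, g as gf

r1 = sm.f(pi)   # same as some_module.f(some_module.PI)
r2 = gf(6, pi)  # same as some_module.g(6, some_module.PI)

print(r1, r2)
print(gf is sm.g)  # an alias refers to the same function object
```

    Note that as only binds a different local name; sm, pi, and gf refer to the very same objects that live in some_module.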
  • 55.
    Random documents with unrelated content Scribd suggests to you:
  • 56.
    It looked for a victim, or victims, for its fear. Once upon a time, witches were burned to ease the terrors of ignorance, and plague-spreaders were executed in times of pestilence to assure everybody that now the plague would cease since somebody had been killed for spreading it. Organizations came into being with the official and impassioned purpose of seeing that space research ceased immediately. Even more violent organizations demanded the punishment of everybody who had ever considered space travel a desirable thing. Congress cut some hundreds of millions from a guided-missile-space-exploration appropriation as a starter. A poor devil of a crackpot in Santa Monica, California, revealed what he said was a spaceship he'd built in his back yard to answer the signals from M-387. He intended to charge a quarter admission to inspect it, using the money to complete the drive apparatus. The thing was built of plywood and could not conceivably lift off the ground, but a mob wrecked his house, burned the puerile "spaceship" and would have lynched its builder if they'd thought to look in a cellar vegetable closet. Other crackpots who were more sensitive to public feelings announced the picking up of messages addressed to the distant Something. The messages, said this second class of crackpot, were reports from spies who had been landed on Earth from flying saucers during the past few decades. They did not explain how they were able to translate them. A rush of flying-saucer sightings followed inevitably—alleged to be landing-parties from M-387—and in Peoria, Illinois, a picnicking party sighted an unidentified flying object shaped like a soup spoon, the handle obviously being its tail. Experienced newspapermen anticipated reports of the sighting of unidentified flying objects shaped like knives and forks as soon as somebody happened to think of it. Sandy called a conference on the subject of security. She did not look well, nowadays. She worried.
Other people thought about the messages from space, but Sandy had to think of something more concrete. Six months earlier, the construction going on within a plaster of Paris mould would have been laughed at, tolerantly, and
  • 57.
    some hopeful people might have been respectful about it. But now it was something utterly intolerable to public opinion. Newspapers who'd lost circulation by talking sanely about space travel now got it back by denouncing the people who'd answered the first broadcast. And naturally, with the whole idea of outer space agitatedly disapproved, everybody connected with it was suspected of subversion. "A reporter called up today," said Sandy. "He said he'd like to do a feature story on Burke Development's new research triumph—the new guided missile that flew thirty miles and froze everything around where it landed. I said it fell out of an aeroplane and the last completed project was for Interiors, Inc. Then he said that he'd been talking to one of Mr. Holmes' men and the man said something terrific was under way." Burke looked uneasy. Holmes said uncomfortably, "There's no law against what we're building, but somebody may introduce a bill in Congress any day." "That would be reasonable under other circumstances. There's a time for things to be discovered. They shouldn't be accomplished too soon. But the time for the ship out there is right now!" Burke said. Pam raised her eyebrows. "Yes?" "Those signals have to be checked up on," explained Burke. "It's necessary now. But it could have been bad if our particular enterprise had started, say, two years ago. Just think what would have happened if atomic fission had been worked out in peacetime ten years before World War Two! Scientific discoveries were published then as a matter of course. Everybody'd have known how to make atom bombs. Hitler would have had them, and so would Mussolini. How many of us would be alive?" Sandy interrupted, "The reporter wants to do a feature story on what Burke Development is making. I said you were working on a bomb shelter for quantity production. He asked if the rocket you
  • 58.
    shot off through the construction-shed wall was part of it. I said there'd been no rocket fired. He didn't believe me." "Who would?" asked Holmes. "Hmmmmm," said Burke. "Tell him to come look at what we're doing. The ship can pass for a bomb shelter. The wall-garden units make sense. I'm going to dig a big hole in the morning to test the drive-shaft in. It'll look like I intend to bury everything. A bomb shelter should be buried." "You mean you'll let him inside?" demanded Sandy. "Sure!" said Burke. "All inventors are expected to be idiots. A lot of them are. He'll think I'm making an impossibly expensive bomb shelter, much too costly for a private family to buy. It will be typical of the inventive mind as reporters think of it. Anyhow, everybody's always willing to believe other people fools. That'll do the trick!" Pam said blandly, "Sandy and I live in a boardinghouse, Joe. You don't ask about such things, but an awfully nice man moved in a couple of days ago—right after that shaft got away and went flying thirty miles all by itself. The nice man has been trying to get acquainted." Holmes growled, and looked both startled and angry when he realized it. Pam added cheerfully, "Most evenings I've been busy, but I think I'll let him take me to the movies. Just so I can make us all out to be idiots," she added. "I'll make the hole big enough to be convincing," said Burke. "Sandy, you make inquiries for a rigger to lift and move the bomb shelter into its hole when it's ready. If we seem about to bury it, nobody should suspect us of ambitions they won't like." "Why the hole, really?" asked Sandy. "To put the shaft in," said Burke. "I've got to get it under control or it won't be anything more than a bomb shelter."
  • 59.
    Keller, the instrument man, had listened with cheerful interest and without speaking a word. Now he made an indefinite noise and looked inquiringly at Burke. Burke said, explanatorily, "The shaft seems to be either on or off—either a magnet that doesn't quite magnetize, or something that's hell on wheels. It flew thirty miles without enough power supplied to it to make it quiver. That power came from somewhere. I think there's a clue in the fact that it froze everything around where it landed, in spite of traveling fast enough to heat up from air-friction alone. I've got some ideas about it." Keller nodded. Then he said urgently, "Broadcast?" Burke frowned, and turned to Sandy. "That's part of the broadcast from space that changes—is it still changing?" "Still changing," said Sandy. "I didn't think to ask you to keep a check on that. Thanks for thinking of it, Sandy. Maybe someday I can make up to you for what you've been going through." "I doubt it very much," said Sandy grimly. "I'll call the reporter back." She waited for them to leave. When they'd gone, she moved purposefully toward the telephone. Pam said, "Did you hear that growl when I said I'd go to the movies with somebody else? I'm having fun, Sandy!" "I'm not," said Sandy. "You're too efficient," the younger sister said candidly. "You're indispensable. Burke couldn't begin to be able to put this thing through without you. And that's the trouble. You should be irresistible instead of essential." "Not with Joe," said Sandy bitterly. She picked up the telephone to call the newspaper. Pam looked very, very reflective.
  • 60.
    There was a large deep pit close by the plaster mould when the reporter came next afternoon. A local rigger had come a little earlier and was still there, estimating the cost for lifting up the contents of the mould and lowering it precisely in place to be buried as a bomb shelter under test should be. It was a fortunate coincidence, because the reporter brought two other men who he said were civilian defense officials. They had come to comment on the quality of the bomb shelter under development. It was not too convincing a statement. When they left, Burke was not happy. They knew too much about the materials and equipment he'd ordered. One man had let slip the fact that he knew about the very expensive computer Burke had bought. It could have no conceivable use in a bomb shelter. Both men painstakingly left it to Burke to mention the thirty-mile flight of a bronze object which arrived coated with frost of such utter frigidity that it appeared to be liquid-air snow instead of water-ice. Burke did not mention it. He was excessively uneasy when the reporter's car took them away. He went into the office. Pam was in the midst of a fit of the giggles. "One of them," she explained, "is the nice man who moved into the boardinghouse. He wants to take me to the movies. Did you notice that they came when it ought to be my lunchtime? He asked when I went to lunch ..." Holmes came in. He scowled. "One of my men says that one of those characters has been buying him drinks and asking questions about what we're doing." Burke scowled too. "We can let your men go home in three days more." "I'm going to start loading up," Holmes announced abruptly. "You don't know how to stow stuff. You're not a yachtsman." "I haven't got the shaft under control yet," said Burke.
  • 61.
    "You'll get it,"grunted Holmes. He went out. Pam giggled again. "He doesn't want me to go to the movies with the nice man from Security," she told Burke. "But I think I'd better. I'll let him ply me with popcorn and innocently let slip that Sandy and I know you've been warned that bomb shelters won't find a mass market unless they sell for less than the price of an extra bathroom. But if you want to go broke we don't care." "Give me three days more," said Burke harassedly. "Well try," said Sandy suddenly. "Pam can fix up a double date with one of her friend's friends and we'll both work on them." Burke frowned absorbedly and went out. Sandy looked indignant. He hadn't protested. Burke got Holmes' four workmen out of the ship and had them help him roll the bronze shaft to the pit and let it down onto a cradle of timbers. Now if it moved it would have to penetrate solid earth. The most trivial of computations showed that when the bronze shaft had flown thirty miles, it hadn't done it on the energy of a condenser shorted through its coils. The energy had come from somewhere else. Burke had an idea where it was. Presently he verified it. The cores and windings he'd adapted from a transparent hand-weapon seen in an often-repeated dream—those cores and windings did not make electromagnets. They made something for which there was not yet a name. When current flows through a standard electromagnet, the poles of its atoms are more or less aligned. They tend to point in a single direction. But in this arrangement of wires and iron no magnetism resulted, yet, the random motion of the atoms in their framework of crystal structure was coördinated. In any object above absolute zero all the atoms and their constituent electrons and nuclei move constantly in all directions. In such a core as Burke had formed and repeated along the shaft's length, they all tried to move in one direction at the same
  • 62.
    time. Simultaneously, a terrific surge of current appeared in the coils. A high-speed poleward velocity developed in all the substance of the shaft. It was the heat-energy contained in the metal, all turned instantly into kinetic energy. And when its heat-energy was transformed to something else, the shaft got cold. Once this fact was understood, control was easy. A single variable inductance in series with the windings handled everything. In a certain sense, the gadget was a magnet with negative—minus—self-inductance. When a plus inductance in series made the self-inductance zero, neither plus nor minus, the immensely powerful device became docile. A small current produced a mild thrust, affecting only part of the random heat-motion of atoms and molecules. A stronger current produced a greater one. The resemblance to an electromagnet remained. But the total inductance must stay close to zero or utterly violent and explosive forward thrust would develop, and it was calculable only in thousands of gravities. Burke had worked for three weeks to make the thing, but he developed a control system for it in something under four hours. That same night they got the bronze shaft into the ship. It fitted perfectly into the place left for it. Burke knew now exactly what he was doing. He set up his controls. He was able to produce so minute a thrust that the lath-and-plaster mould merely creaked and swayed. But he knew that he could make the whole mass surge unstoppably from its place. Holmes sent his workmen home. Sandy and Pam went to the movies with two very nice men who pumped them deftly of all sorts of erroneous information about Burke and Holmes and Keller and what they were about. The nice men did not believe that information, but they did believe that Sandy and Pam believed it.
For themselves, the combination of an object made by Burke which flew thirty miles plus the presence of Holmes, who built plastic yachts, and the arrival of Keller to adjust instruments of which they had a complete list—these things could not be overlooked. But they did feel sorry for two nice
  • 63.
    and not over-bright girls who might be involved in very serious trouble. Holmes and Burke installed directional controls, wiring, recording instruments, etc. Stores and water and oxygen, for emergency use only, went into the lath-and-plaster construction. Holmes took a hammer and chisel and painstakingly cracked the mould so that the top half could be lifted off, leaving the bottom half exposed to the open air and sky. Then the broadcast from space cut off. It had been coming continuously for something like five weeks; one sharp, monotonous note every two seconds, with a longer, fluting broadcast every seventy-nine minutes. Now a third, new message began. It was yet another grouping of the musical tones, with a much longer interval of specific crackling sounds. Keller had adjusted every instrument and zestfully retested them over and over. Burke asked him to see if the third space message compared in any way with the second. Keller put them through a hook-up of instruments, beaming to himself, and the answer began to appear. Newspapers burst into new headlines. "Ultimatum from Space" they thundered. "Threats from Alien Space Travelers." And as they presented the situation it seemed believable that the third message from the void was a threat. The first had been a call, requiring an answer. When the answer went out from Earth, a second message replaced the call. It contained not only flute tones which might be considered to represent words, but cracklings which might be the equivalent of numbers. The continuous beepings between repetitions of the second message were plainly a directional signal to be followed to the message source. In this context, the newspapers furiously asserted that the third message was a threat. The first had been merely a summons, the second had been a command to repair to the signaling entities, and
  • 64.
    the third was a stern reiteration of the command, reinforced by threats. The human race does not take kindly to threats, especially when it feels helpless. In the United States, there was such explosive resentment as to require spread-eagle oratory by all public figures. The President declared that every space missile in store had been fitted with atomic-fusion warheads and that any alien spacecraft which appeared in American skies would be shot down immediately. Congress reported out of committee a bill for rocket weapons which was stalled for six days because every senator and representative wanted to make a speech in its favor. It was the largest appropriation bill ever passed by Congress, which less than five weeks before had cut two hundred millions out of a guided-missile-space-exploration budget. And in Europe there was frenzy. For Burke and Holmes and Sandy and Pam and the smiling, inarticulate Keller, the matter was deadly serious. Fury such as the public felt constituted a witch-hunt in itself. Suspicious private persons overwhelmed the FBI and the Space Agency with information about characters they were sure were giving military secrets to the space travelers on M-387. There were reports of aliens skulking about American cities wearing luxuriant whiskers and dark glasses to conceal their non-human features. Artists, hermits, and mere amateur beard-growers found it wise to shave, and spirit mediums, fortunetellers and, in the South, herb doctors reaped harvests by the sale of ominous predictions and infallible advice on how to escape annihilation from space. And Burke Development, Inc., was building something that neither Civilian Defense nor the FBI believed was a bomb shelter. The three days Burke had needed passed. A fourth. He and Holmes practically abandoned sleep to get everything finished inside the plaster mould. Keller happily completed his graphs and took them to Burke. They showed that the cracklings, which presumably meant
  • 65.
    numbers, had been expanded. What they said was now told on a new scale. If the numbers had meant months or years, they now meant days and hours. If they had meant millions of miles, they now meant thousands or hundreds. Burke was struggling with these implications when there was a tapping at the air-lock, through which all entry and egress from the ship took place. Holmes opened the inner door. Sandy and Pam crawled through the lock which lay on its side instead of upright. Sandy looked at Burke. Pam said amiably, "We figured the job was about finished and we wanted to see it. How do you fasten this door?" Holmes showed her. The vessel that had been built inside the mould did not seem as large as the outside structure promised. It looked queer, too, because everything lay on its side. There were two compartments with a ladder between, but the ladder lay on the floor. The wall-gardens looked healthy under the fluorescent lamps which kept the grass and vegetation flourishing. There were instrument dials everywhere. Sandy went to Burke's side. "We're all but done," said Burke tiredly, "and Keller's just about proved what the signals are." "Can we go with you?" asked Sandy. "Of course not," said Burke. "The first message was a distress call. It had to be. Only in a distress call would somebody go into details so any listener would know it was important. It called for help and said who needed it, and why, and where." Pam turned to Holmes. "Can that air-lock be opened from outside?" It couldn't. Not when it was fastened, as now. "Somebody answered that call from Earth," said Burke heavily, "and the second message told more about what was wrong. The clickings, we think, are numbers that told how long help could be waited for,
  • 66.
    or something on that order. And then there was a beacon signal meant to lead whoever was coming to help to that place." Keller smiled pleasantly at Pam. He made an electrical connection and zestfully checked the result. "Now there's a third message," said Burke. "Time's running out for whoever needs whatever help is called for. The clickings that seem to be numbers have changed. The—what you might call the scale of reportage—is new. They're telling us just how long they can wait or just how bad their situation is. They're saying that time is running out and they're saying, 'Hurry!'" There was a thumping sound. Only Sandy and Pam looked unsurprised. Burke stared. Sandy said firmly, "That's the police, Joe. We've been going to the movies with people who want to talk about you. Yesterday one of them confided to us that you were dangerous, and since he told us to get away from the office, we did. There might be shooting. He tipped us a little while ago." Burke swore. There were other thumpings. Louder ones. They were on the air-lock door. "If you try to put us out," said Sandy calmly, "you'll have to open that door and they'll try to fight their way in—and then where'll you be?" Keller turned from the checking of the last instrument. He looked at the others with excited eyes. He waited. "I don't know what they can arrest you for," said Sandy, "and maybe they don't either, unless it's unauthorized artillery practice. But you can't put us out! And you know darn well that unless you do something they'll chop their way in!" Burke said, "Dammit, they're not going to stop me from finding out if this thing works!"
  • 67.
    He squirmed in a chair which had its base firmly fastened to a wall and began to punch buttons. "Hold fast!" he said angrily. "At least we'll see...." There were loud snapping sounds. There were creakings. The room stirred. It turned in a completely unbelievable fashion. Violent crashings sounded outside. Abruptly, a small television screen before Burke acquired an image. It was of the outside world reeling wildly. Holmes seized a hand-hold and grabbed Pam. He kept her from falling as a side wall became the floor, and what had been the floor became a side wall, with the ceiling another. It seemed that all the cosmos changed, though only walls and floors changed places. Suddenly everything seemed normal but new. The surface underfoot was covered with a rubber mat. The hydroponic wall-garden sections were now vertical. Burke sat upright, and something over his head rotated a half-turn and was still. But it became coated with frost. More crashes. More small television screens acquired images. They showed the office of Burke Development, Inc., against a tilted landscape. The landscape leveled. Another showed the construction shed. One showed cloud formations, very bright and distinct. And two others showed a small, armed, formidable body of men instinctively backing away from the outside television lens. "So far," said Burke, "it works. Now—" There was a sensation as of a rapidly rising elevator. Such a sensation usually lasts for part of a second. This kept on. One of the six television screens suddenly showed a view of Burke Development from straight overhead. The buildings and men and the four-acre enclosure dwindled rapidly. They were very tiny indeed and nearly all of the town was in the camera's field of vision when a vague whiteness, a cloud, moved in between. "The devil!" said Burke. "Now they'll alert fighter planes and rocket installations and decide that we're either traitors or aliens in disguise and better be shot down. I think we simply have to go on!"
  • 68.
    Keller made gestures, his eyes bright. Burke looked worried. "It shouldn't take more than ten minutes to get a Nike aloft and after us. We must have been picked up by radar already.... We'll head north. We have to, anyhow." But he was wrong about the ten minutes. It was fifteen before a rocket came into view, pouring out enormous masses of drive-fumes. It flung itself toward the ship.
  • 69.
    Chapter 5

    From a sufficient height and a sufficient distance, the rocket's repeated attacks must have appeared like the strikings and twistings of a gigantic snake. It left behind it a writhing trail of fumes which was convincingly serpentine. It climbed and struck, and climbed and struck, like a monstrous python flinging itself furiously at some invisible prey. Six, seven, eight times it plunged frenziedly at the minute egg-shaped ship which scuttled for the heavens. Each time it missed and writhed about to dart again. Then its fuel gave out and for all intents and purposes it ceased to exist. The thick, opaque trail it left behind began to dissipate. The path of vapor scattered. It spread to rags and tatters of unsubstantiality through which the rocket plummeted downward in the long fall which is a spent rocket's ending. Burke cautiously cut down the drive and awkwardly turned the ship on its side, heading it toward the north. The state of things inside the ship was one of intolerable tenseness. "I'm a new driver," said Burke, "and that was a tough bit of driving to do." He glanced at the exterior-pressure meter. "There's no air outside to register. We must be fifty or sixty miles high and maybe still rising. But we're not leaking air." Actually the plastic ship was eighty miles up. The sunlit world beneath it showed white patches of cloud in patterns a meteorologist would have found interesting. Burke could see the valley of the St. Lawrence River between the white areas. But the Earth's surface was curiously foreshortened. What was beneath seemed utterly flat, and at the edge of the world all appeared distorted and unreal. Holmes, still pale, asked, "How'd we get away from that rocket?"
  • 70.
    "We accelerated," saidBurke. "It was a defensive rocket. It was designed to knock down jet bomb carriers or ballistic missiles which travel at a constant speed. Target-seeking missiles can lock onto the radar echo from a coasting ship, or one going at its highest speed because their computers predict where their target, traveling at constant speed, can be intercepted. We were never there. We were accelerating. Missile-guidance systems can't measure acceleration and allow for it. They shouldn't have to." Four of the six television screens showed dark sky with twinkling lights in it. On one there was the dim outline of the sun, reversed to blackness because its light was too great to be registered in a normal fashion. The other screen showed Earth. There was a buzzing, and Keller looked at Burke. "Rocket?" asked Burke. Keller shook his head. "Radar?" Keller nodded. "The DEW line, most likely," said Burke in a worried tone. "I don't know whether they've got rockets that can reach us. But I know fighter planes can't get this high. Maybe they can throw a spread of air-to-air rockets, though.... I don't know their range." Sandy said unsteadily, "They shouldn't do this to us! We're not criminals! At least they should ask us who we are and what we're doing!" "They probably did," said Burke, "and we didn't answer. See if you can pick up some voices, Keller." Keller twirled dials and set indicators. Voices burst into speech. "Reporting UFO sighted extreme altitude coördinates—First rocket exhausted fuel in multiple attacks and fell, sir." Another voice, very brisk, "Thirty-second squadron, scramble! Keep top altitude and get under it. If it descends within range, blast it!" Another voice said crisply, "Coördinates three-seven Jacob, one-nine Alfred...." Keller turned the voices down to mutters because they were useless.
  • 71.
    Burke said, "Hell! We ought to land somewhere and check over the ship. Keller, can you give me a microphone and a wavelength somebody will be likely to pick up?" Keller shrugged and picked up masses of wire. He began to work on an as yet unfinished wiring job. Evidently, the ship was not near enough to completion to be capable of a call to ground. It had taken off with many things not finished. Burke, at the controls, found it possible to think of a number of items that should have been examined exhaustively before the ship left the mould in which it had been made. He worried. Pam said in a strange voice, "I thought I might rate as a heroine for stowing away on this voyage, but I didn't think we'd have to dodge rockets and fighter planes to get away!" There was no comment. "I'm a beginner at navigation," said Burke a little later, more worried than before. "I know we have to go out over the north magnetic pole, but how the hell do I find that?" Keller beamed. He dropped his wiring job and went to the imposing bank of electronic instruments. He set one, and then another, and then a third. The action, of course, was similar to that of an airline pilot when he tunes in broadcasting stations in different cities. From each, a directional reading can be taken. Where the lines of direction cross, there the transport plane must be. But Keller turned to shortwave transmitters whose transmissions could be picked up in space. Presently, eighty miles high, he wrote a latitude and longitude neatly on a slip of paper, wrote "North magnetic pole 93°W, 71°N, nearly," and after that a course. "Hm," said Burke. "Thanks." Then there was a relative silence inside the ship. Only a faint mutter of voices came from assorted speakers that Keller had first turned on and then turned down, and a small humming sound from a gyro. When they listened, they could also hear a high sweet musical tone. Burke shifted this control here, and that control there, and lifted his
  • 72.
    hands. The ship moved on steadily. He checked this and that and the other thing. He was pleased. But there were innumerable things to be checked. Holmes went down the ladder to the other compartment below. There were details to be looked into there, too. One of the screens portrayed Earth from a height of seventy miles instead of eighty, now. Others pictured the heavens, with very many stars shining unwinkingly out of blackness. Keller got at his wires again and resumed the work of installing a ship-to-ground transmitter and its connection to an exterior-reflecting antenna. Sandy watched Burke as he moved about, testing one thing after another. From time to time he glanced at the screens which had to serve in the place of windows. Once he went back to the control-board and changed an adjustment. "We dropped down ten miles," he explained to Sandy. "And I suspect we're being trailed by jets down below." Holmes meticulously inspected all storage places. He'd packed them when the ship lay on her side. Burke read an instrument and said with satisfaction, "We're running on sunshine!" He meant that in empty space certain aluminum plates on the outside of the hull were picking up heat from the naked sun. The use of the drive-shaft lowered its temperature. Metallic connection with the outside plates conducted heat inward from those plates. The drive-shaft was cold to the touch, but it could drop four hundred degrees Fahrenheit before it ceased to operate as a drive. It was gratifying that it had cooled so little up to this moment. Later Keller tapped Burke on the shoulder and jerked his thumb upward. "We go up now?" asked Burke. Keller nodded. Burke carefully swung the ship to aim vertically. The views of solid Earth slid from previous screens to new ones. The stars and the dark object which was the sun also moved across their
  • 73.
    screens to vanish and reappear on others. Then Burke touched the drive-control. Once more they had the sensation of being in a rising elevator. And at just that moment spots appeared on the barren, icy, totally flattened terrain below. They were rocket-trails from target-seeking missiles which had reached the area of the north magnetic pole by herculean effort and were aimed at the radar-detected little ship by the heavy planes that carried them. From the surface of the Earth, it would have seemed that monstrous columns of foaming white appeared and rose with incredible swiftness toward the heavens. They reached on, up and up and up, seeming to draw closer together as they became smaller in the distance, until all eight of them seemed to merge into a single point of infinite whiteness in the sunshine above the world's blanket of air. But nothing happened. Nothing. The ship did not accelerate as fast as the rockets, but it had started first and it kept up longer. It went scuttling away to emptiness and the bottoms of the towers of rocket-smoke drifted away and away over the barren landscape all covered with ice and snow. When Earth looked like a huge round ball that did not even seem very near, with a night side that was like a curious black chasm among the stars, the atmosphere of tension inside the ship diminished. Keller completed his wiring of a ship-to-ground transmitter. He stood up, brushed off his hands and beamed. The little ship continued on. Its temperature remained constant. The air in it smelled of growing green stuff. It was moist. It was warm. Keller turned a knob and a tiny, beeping noise could be heard. Dials pointed, precisely. "We couldn't go on our true course earlier," Burke told Sandy, "because we had to get out beyond the Van Allen bands of cosmic particles in orbit around the world. Pretty deadly stuff, that radiation! In theory, though, all we have to do now is swing onto our proper
    course and follow those beepings home. We ought to be in harmless emptiness here. Do you want to call Washington?" She stared. "We need help to navigate—or astrogate," said Burke. "Call them, Sandy. I'll get on the wire when a general answers." Sandy went jerkily to the transmitter just connected. She began to speak steadily, "Calling Earth! Calling Earth! The spaceship you just shot all those rockets at is calling! Calling Earth!" It grew monotonous, but eventually a suspicious voice demanded further identification. It was a peculiar conversation. The five in the small spaceship were considered traitors on Earth because they had exercised the traditional right of American citizens to go about their own business unhindered. It happened that their private purposes ran counter to the emotional state of the public. Hence voices berated Sandy and furiously demanded that the ship return immediately. Sandy insisted on higher authority and presently an official voice identified itself as general so-and-so and sternly commanded that the ship acknowledge and obey orders to return to Earth. Burke took the transmitter. "My name's Burke," he said mildly. "If you can arrange some sort of code, I'll tell you how to find the plans, and I'll give you the instructions you'll need to build more ships like this. They can follow us out. I think they should. I believe that this is more important than anything else you can think of at the moment." Silence. Then more sternness. But ultimately the official voice said, "I'll get a code expert on this." Burke handed the microphone to Sandy. "Take over. We've got to arrange a cipher so nobody who listens in can learn about official business. We may use a social security number for a key, or the name of your maiden aunt's first sweetheart, or something we know and Washington can find out but
    that nobody else can. Hm. Your last year's car-license number might be a starter. They can seal up the records on that!" Sandy took over the job. What was transmitted to Earth, of course, could be picked up anywhere over an entire hemisphere. Somebody would assuredly pass on what they overheard to, say, nations the United States would rather have behind it than ahead of it in space-travel equipment. Burke's suggestion of a cipher and instructions changed his entire status with authority. They'd rather have had him come back, but this was second best, and they took it. From Burke's standpoint it was the only thing to do. He had no official standing to lend weight to his claim that lunatic magnet-cores with insanely complicated windings would amount to space-drive units. If he returned, in the nature of things there would be a long delay before mere facts could overcome theoreticians' convictions. But now he was forty-five thousand miles out from Earth. He had changed course to home on the beeping signals from M-387, was accelerating at one full gravity and had been doing so for forty-five minutes. And the small ship already had a velocity of twenty miles per second and was still going up. All the rockets that men had made, plus the Russian manned-probe drifting outward now, had become as much outdated for space travel as flint arrowheads are for war. Burke returned to the microphone when Sandy left it to get a pencil and paper. "By the way," he said briskly. "We can keep on accelerating indefinitely at one gravity. We've got radars. We got them from—" He named the supplier. "Now we want advice on how fast we can risk traveling before we'll be going too fast to dodge meteors or whatnot that the radar may detect. Get that figured out for us, will you?" He gave back the instrument to Sandy and returned to his inspection of every item of functioning equipment in the ship. He found one or two trivial things to be bettered. The small craft went on in a
    singularly matter-of-fact fashion. If it had been a bomb shelter buried in the pit beside the mould in which it was built, there would have been very little difference in the feel of things. The constant acceleration substituted perfectly for gravity. The six television screens, to be sure, pictured incredible things outside, but television screens often picture incredible things. The wall-gardens looked green and flourishing. The pumps were noiseless. There were no moving parts in the drive. The gyro held everything steady. There was no vibration. Nobody could remain upset in such an unexciting environment. Presently Pam explored the living quarters below. Holmes took his place in the control-chair, but found no need to touch anything. Some time later Sandy reported, "Joe, they say we must be lying, but if we can keep on accelerating, we'd better not hit over four hundred miles a second. They say we can then swing end for end and decelerate down to two hundred, and then swing once more and build up to four again. But they insist that we ought to return to Earth." "They don't mention shooting rockets at us, do they?" asked Burke. "I thought they wouldn't. Just say thanks and go on working out a code." Sandy set to work with pencil and paper. Federal agents would be moving, now, to impound all official records that were in any way connected with any of the five on the ship. The key to the code would be contained in such records. It would be an agglomeration of such items as Burke's grandmother's maiden name, Holmes' social-security number, the name of a street Burke had lived on some years before, the exact amount of his federal income taxes the previous year, the title of a book third from the end on the second shelf of a bookcase in Keller's apartment, and such unconsidered items as most people can remember with a little effort, but which can only be found out by people who know where to look. 
These people would keep anybody else from looking in the same places. Such a code would be clumsy to work with, but it would be unbreakable.
    It took hours to establish it without the mention of a single word included in the lengthy key. The ship reached four hundred miles a second, turned about, and began to cut down its speed again. Pam spoke from beside an electric stove, "Dinner's ready! Come and get it!" They dined; Sandy weary, Burke absorbed and inevitably worried, Holmes placid and amiable, and Keller beaming and interested in all that went on, which was practically nothing. They did not see the stars direct, because television cameras were preferable to portholes. Earth had become very small, and as it swung ever more nearly into a direct line between the ship and the sun, night filled more of its disk until only a hairline of sunshine showed at one edge. The microwave receivers ceased to mutter. The working astronomers on Earth who'd sent a message to M-387 were suddenly relieved of their disgrace and set to work again to equip the West Virginia radar telescope for continuous communication with Burke's ship. Other technicians began to prepare multiple receptors to pick up the ship's signals from hitherto unprecedented distances for human two-way communication. And on Earth an official statement went out from high authority. It announced that a hurriedly completed American ship was on the way to M-387 to investigate the signals from space. It announced that measures long in preparation were now in use, and that an invincible fleet of spacecraft would be completed in months, whereas they had not been hoped for for another generation. An unexpected breakthrough had made it possible to advance the science of space travel by many decades, and a fleet to explore all the planets as well as M-387 was already under construction. It was almost true that they were. The blueprints of Burke's ship had been flown to Washington from the plant, and an enormous number of replicas of the egg-shaped vessel were ordered to be begun immediately, even before the theory of the drive was understood.
    There was one minor hitch. A legal-minded official protested that Congressional appropriations had been for rocket-driven spaceships only, and the money appropriated could not be used for other than rockets. An executive order settled the matter. Then theorists began to object to the principle of the drive. It contradicted well-established scientific beliefs. It could not work. It did, but there was violent opposition to the fact. Publicly, of course, the shock of such an about-face by the national government was extreme. But newspapers flashed new headlines. "U.S. SHIP SPEEDING TO QUERY ALIENS!" Lesser heads announced, "Critical Velocity Exceeded! Russian Probe Already Passed!" The last was not quite true. The Russian manned probe had started out ten days before. Burke hadn't overtaken it yet. Broadcasters issued special bulletins, and two networks canceled top evening programs to schedule interviews with prominent scientists who'd had nothing whatever to do with what Burke had managed to achieve. In Europe, obviously, the political effect was stupendous. Russia was reduced to impassioned claims that the ship had been built from Russian plans, using Russian discoveries, which had been stolen by imperialistic secret agents. And the heads of the Russian spy system were disgraced for not having, in fact, stolen the plans and discoveries from the Americans. All other operatives received threats of what would happen to them if they didn't repair that omission. These threats so scared half a dozen operatives that they defected and told all they knew, thereby wrecking the Russian spy system for the time being. Essentially, however, the recovery of confidence in America was as extravagant as the previous unhappy desire to hear no more about space. Burke, Holmes, Keller, Sandy and Pam became national heroes and heroines within eighteen hours after guided missiles had failed to shoot them down. 
The only criticism came from a highly conservative clergyman who hoped that other young girls would not
    imitate Sandy's and Pam's disregard of convention and maintained that a married woman should have gone along to chaperon them. The atmosphere in the ship, however, was that of respectability carried to the point where things were dull. The lower compartment of the ship, being smaller, was inevitably appropriated by Sandy and Pam. They retired when the ship was twenty hours out from Earth. Each of them had prepared for stowing away by wearing extra garments in layers. "Funny," said Pam, yawning as they made ready to turn in, "I thought it was going to be exciting. But it's just like a rather full day at the office." "Which," said Sandy, "I'm quite used to." "I do think you ought to have barged in when they designed the ship, Sandy. There's not one mirror in it!" In the upper compartment Keller took his place in the control-chair and took a trick of duty. It consisted solely of looking at the instruments and listening to the beeping noises which came from remoteness every two seconds, and the still completely cryptic broadcasts which came every seventy-nine minutes. It wasn't exciting. There was nothing to be excited about. But somebody had to be on watch. On the second day out, Washington was ready to use the new code. The West Virginia radar bowl was powered to handle communications again. Sandy painstakingly took down the gibberish that came in and decoded it. From then on she worked at the coding and transmission of messages and the reception and decoding of others. Presently Pam relieved her at the job. Pam tended to be bored because Holmes was as much absorbed in the business of keeping anything from happening as was Burke. The messages were almost entirely requests for, and answers to requests for, details about the ship plans. The United States had not yet completed a duplicate drive-shaft. Machinists labored to reproduce the cores, which would then have to be wound in the
    complicated fashion the plans described. But it was an unhappy experience for the scientific minds assigned to duplicate Burke's ship. No woman ever followed a recipe without making some change. Very few physicists can duplicate another's apparatus without itching to change it. There were six copies of the drive under construction at the same time, at the beginning. Four were made by skeptics, who adhered to the original plans with strict accuracy. They were sure they'd prove Burke wrong. Two were "improved" in the making. The four, when finished, worked beautifully. The two doctored versions did not. But still there was fretful discussion of the theory of the drive. It seemed flatly to contradict Newton's law that every action has a reaction of equal moment and opposite sign—a law at least as firmly founded as the law of the conservation of energy. But that had lately been revised into the law of the conservation of energy and matter, which now was gospel. Burke's theory required the Newtonian law to be restated to read "every action of a given force has a reaction of the same force, of the same moment," and so on. When the reaction of one force is converted into another force, the results can be interesting. In fact, one can have a space-drive. But there was bitter resistance to the idea. It was demanded that Burke justify his views in a more reasonable way than by mere demonstration that they worked. After a time, Burke gave up trying to explain things. And when one and then another duplicate drive worked, the argument ceased. But eminent physicists still had a resentful feeling that Burke was cheating on them somehow. Then for days nothing happened. One of the three men in the ship always stayed in the control-chair where he could check the ship's course against the homing signals from the asteroid. He might have to correct it by the fraction of a hair, or swing ship and put on more drive if the radar should show celestial debris in the spaceship's path. 
Every so many hours the ship had to be swung about so that instead of accelerating she decelerated, or instead of decelerating gained fresh speed. But that was all.
    On the fifth day there was the flash of a meteor on the radar. On the seventh day an object which could have been the second or third unmanned Russian probe showed briefly at the very edge of the radar screens. In essence, however, the journey was pure tedium. Burke wearied of making sure that his work was good, though he congratulated himself that nothing did happen to break the monotony. Holmes admitted that he was disappointed. He'd wanted to make the journey because he'd sailed in everything but a spaceship. But there was no fun in it. Keller alone seemed comfortably absorbed. He prepared daily lists of instrument-readings to be sent back to Earth. They would be of enormous importance to science-minded people. They were not of interest to Sandy. Even when she talked to Burke, it was necessarily impersonal. There could be no privacy which was not ostentatious. The two girls used the lower compartment, the three men the upper and larger one. For Sandy to talk privately with Burke, she'd have had to go to the small bottom section of the ship. Holmes and Pam faced the same situation. It was uncomfortable. So they developed a perfectly pleasant habit of talking exclusively of things everybody could talk about. It did not bother Keller, who would hardly average a dozen words in twenty-four hours, but Sandy muttered to herself when she and Pam retired for what was a ship-night's rest. When they went past the orbit of Mars, agitated instructions came out from Earth. The asteroid belts began beyond Mars. Elaborate directions came. The ship was tracked by radar telescopes all around the world, direction-finding on its transmission. Croydon kept track. American radar bowls picked up the ship's voice. South American and Hawaiian and Japanese and Siberian radar telescopes determined the ship's position every time a set of code symbols reached Earth from the ship. 
Of course, there were also the beepings and the seventy-nine-minute-spaced identical broadcasts from farther out from the sun. Somebody got a brilliant idea and authority to try it. An interview for broadcast on Earth was sought with somebody on the ship. It was