Programming Languages For LHC
Jim Pivarski
May 8, 2019
But that just ended a few minutes ago.
Because, you know, it’s different water.
So why do we say it’s the same river?
Most of computer science is about abstracting details, too.
double bessel_j0(double x) {     // ← one value goes in
    double out;
    if (fabs(x) < 8.0) {
        double y = x*x;
        double ans1 = 57568490574.0 + y*(-13362590354.0 + y*(651619640.7
            + y*(-11214424.18 + y*(77392.33017 + y*(-184.9052456)))));
        double ans2 = 57568490411.0 + y*(1029532985.0 + y*(9494680.718
            + y*(59272.64853 + y*(267.8532712 + y*1.0))));
        out = ans1 / ans2;
    }
    else {
        double z = 8.0 / fabs(x);
        double y = z*z;
        double xx = fabs(x) - 0.785398164;
        double ans1 = 1.0 + y*(-0.1098628627e-2 + y*(0.2734510407e-4
            + y*(-0.2073370639e-5 + y*0.2093887211e-6)));
        double ans2 = -0.1562499995e-1 + y*(0.1430488765e-3
            + y*(-0.6911147651e-5 + y*(0.7621095161e-6
            - y*0.934935152e-7)));
        out = sqrt(0.636619772/fabs(x))*(cos(xx)*ans1 - z*sin(xx)*ans2);
    }
    return out;                  // ← one value comes out
}
The abstraction is cumulative:
every function/class/module has an interior and an interface;
minimizing (#external parameters) / (#internal parameters)
reduces the mental burden on programmers and users.
Science has layers of abstraction
(Figure, cartoon diagram, not to scale: #internal parameters versus #external parameters
for computer programming, machine learning, thermodynamics, and abstraction in science:
atom → proton → quark.)
Software interfaces can be exact, despite radical internal differences.
▶ Super Mario Bros. entirely rewritten in JavaScript by Josh Goldberg.
▶ Shares none of the original code, but behaves identically.
As a young programmer, I wasn’t satisfied
with high-level languages because I wanted
to get down to the “real” computer.
The objectively real part of a computer is a set
of physical states that we interpret as computations.
Programming languages are how we describe our interpretations.
XIX + IV = XXIII
19 + 4 = 23
Programming languages differ in their degree of abstraction,
but all programming languages are for humans, not computers.
Originally, programming languages didn’t push the abacus beads.
John McCarthy, creator of Lisp: “This EVAL was written and published in the paper and
Steve Russell said, ‘Look, why don’t I program this EVAL?’ and I said to him, ‘Ho, ho,
you’re confusing theory with practice—this EVAL is intended for reading, not for
computing!’ But he went ahead and did it.”
APL (ancestor of MATLAB, R, and Numpy) was also a notation for describing programs
years before it was executable. The book was named A Programming Language.
Programmers had to manually translate
these notations into instruction codes.
The Software Crisis
Now that our programming languages do push abacus beads, software engineering
has become an odd discipline: saying something is the same as making it.
We favor high-level languages because they have fewer concepts,
hopefully just the ones that are essential for a problem.
Except Python. Python is slow, right?
https://benchmarksgame-team.pages.debian.net/benchmarksgame
But it really isn’t the language; it’s the implementation.
(Code example lost in extraction: a fractal computed with Numpy, beginning
with `import numpy` and ending with `return fractal`.)
Here’s the catch
Pure Python is slower than Numba or C because it has more hurdles in the way:
dynamic typing, pointer-chasing, garbage collection, hashtables, string equality…
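The point can be demonstrated with nothing but the standard library (a minimal sketch, not from the talk): the same reduction written as a Python-level loop and as the built-in sum, whose loop runs inside CPython's C implementation.

```python
import timeit

data = list(range(1_000_000))

def python_loop(values):
    # Each iteration pays for bytecode dispatch, dynamic type checks,
    # and boxed integer arithmetic.
    total = 0
    for v in values:
        total += v
    return total

# Same language, same answer; only the implementation of the loop differs.
assert python_loop(data) == sum(data)

t_loop = timeit.timeit(lambda: python_loop(data), number=3)
t_sum = timeit.timeit(lambda: sum(data), number=3)
print(f"Python-level loop: {t_loop:.3f} s   built-in sum: {t_sum:.3f} s")
```

On a typical CPython, the built-in is several times faster, even though both are "Python."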
Greg Owen’s talk on Spark 2.0
So although it’s the implementation, not the language, that’s slow…
Domain-specific languages:
specialized languages for narrowly defined problems.
Domain-specific languages that you’re probably already using
Any guesses?
Domain-specific languages that you’re probably already using
Regular expressions
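For example (a stdlib sketch with invented text, not from the talk): a regular expression is a complete little language, embedded in a host-language string and compiled at run time.

```python
import re

# A program in the regex language: "digits, optional decimal part, then GeV".
pattern = re.compile(r"(\d+(?:\.\d+)?)\s*GeV")

text = "Selected muons with pT > 25 GeV and mass near 91.19 GeV."
print(pattern.findall(text))
```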
Domain-specific languages that you’re probably already using
TTree::Draw (TTreeFormula), including looping and reducing constructs:
    ttree->Draw("lep1_p4.X() + lep1_p4.Y()");
(Resulting histogram of lep1_p4.X() + lep1_p4.Y() lost in extraction.)
Domain-specific languages that you’re probably already using
Makefiles
Domain-specific languages that you’re probably already using
Format strings
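For example (a stdlib sketch, not from the talk): a format string is a mini-language for layout, interpreted by the host language.

```python
x = 3.14159265

# "%8.3f" and "{:8.3f}" are programs in a tiny layout language:
# field width 8, with 3 digits after the decimal point.
old_style = "%8.3f" % x
new_style = "{:8.3f}".format(x)

print(repr(old_style))
print(repr(new_style))
```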
External versus internal (embedded) domain-specific languages
External: SQL has a distinct syntax from Python; must be quoted in PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("""
    SELECT CONCAT(first, " ", last) AS fullname, AVG(age)
    FROM my_table WHERE age BETWEEN 18 AND 24
    GROUP BY fullname
""")
Objection: a collection of libraries and operator overloads isn’t a language!
(One might as well argue about the distinction between languages and dialects.)
Perhaps the most widespread domain-specific language in data analysis:
SQL
But we rarely use it in particle physics. Why?
Structure of a collider physics query: C++
“Momentum of the track with |η| < 2.4 that has the most hits.”
Track *best = NULL;
for (int i = 0; i < event->tracks.size(); i++) {
    // loop over event->tracks (middle of this snippet was lost in
    // extraction; names like hit_count are reconstructed from the query)
    Track *track = event->tracks[i];
    if (fabs(track->eta) < 2.4 &&
            (best == NULL || track->hit_count > best->hit_count))
        best = track;
}
if (best != NULL)
    return best->pt;
else
    return 0.0;
Structure of a collider physics query: SQL
“Momentum of the track with |η| < 2.4 that has the most hits.”
WITH hit_stats AS (
    SELECT hit.track_id, COUNT(*) AS hit_count FROM hit
    GROUP BY hit.track_id),
track_sorted AS (
    SELECT track.*,
        ROW_NUMBER() OVER (
            PARTITION BY track.event_id
            ORDER BY hit_stats.hit_count DESC)
        track_ordinal
    FROM track INNER JOIN hit_stats
        ON hit_stats.track_id = track.id
    WHERE ABS(track.eta) < 2.4)
SELECT * FROM event INNER JOIN track_sorted
    ON track_sorted.event_id = event.id
WHERE track_sorted.track_ordinal = 1
The problem is that collisions produce a variable number of particles per event:
the tables are “jagged.”
SQL makes particle physics problems harder, not easier, which defeats the point.
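A minimal sketch of what "jagged" means (hypothetical data, not from the talk): each event owns a variable-length list of particles, which maps naturally onto lists of lists but not onto a flat SQL table.

```python
# Hypothetical events: each holds a variable number of muon pT values.
events = [
    [31.2, 18.4, 9.7],   # event with 3 muons
    [],                  # event with none
    [42.0],              # event with 1
    [27.5, 11.1],
]

# A per-event reduction ("leading muon pT, or None") is a one-liner here...
leading = [max(ev) if ev else None for ev in events]

# ...but a flat SQL table must normalize the same structure into
# (event_id, pt) rows and recover it with GROUP BY / window functions.
rows = [(i, pt) for i, ev in enumerate(events) for pt in ev]

print(leading)
print(rows[:3])
```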
It seems like there’s an opportunity here
In fact, about that SQL…
Why hasn’t this been done before?
I think the answer is cultural, so I’ll take a historical perspective…
Starting in 1880.
The U.S. Census’s problem
The U.S. does a census every 10 years. The 1880 census took 8 years to process.
→ Big data problem!
Held a competition for a new method; winner was 10× faster than the rest:
Census records on punch cards, which filtered electrical contacts
Wired to a machine that opens a door for each matching pattern
It was an SQL machine: 3 basic clauses of most SQL queries
SELECT: pre-programmed (wired up) counters
WHERE: pins pass through punch card and template
Herman Hollerith (inventor) incorporated the Tabulating Machine Company, which
after a series of mergers became International Business Machines (IBM) in 1924.
Google’s problem
In the early 2000’s, Google was struggling to keep up with the growing web
(index 5 months out of date, routine hardware failures, scale sensitive to bit flips).
At that time, each programmer had to divide tasks, distribute, and combine
results manually and account for failures manually.
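The bookkeeping Google later automated can be sketched in miniature (stdlib only, not Google's actual API): the programmer supplies only a per-shard map function and a pairwise reduce; dividing, distributing, and combining become the framework's job.

```python
from collections import Counter
from functools import reduce

# Hypothetical text shards, standing in for pieces of the web index.
shards = [
    "the web grows the web wins",
    "index the web",
    "grows and grows",
]

def map_shard(text):
    # Runs independently on each shard (in reality, on each machine).
    return Counter(text.split())

def reduce_counts(a, b):
    # Merges partial results from two shards.
    return a + b

total = reduce(reduce_counts, map(map_shard, shards))
print(total["the"], total["web"], total["grows"])
```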
That’s how statisticians encountered computing.
Physicists got into computers when they became general-purpose
Eckert-Mauchly Computer Corporation → Remington Rand
Physicists drove programming language development in the 1940’s and 1950’s but
stuck with FORTRAN until the 21st century.
In fact, FORTRAN (pre-Fortran 90) wasn’t even a good fit to data analysis problems.
It didn’t handle jagged data well, much like SQL.
This gap was filled with a library: ZEBRA provided a graph of structures and dynamic
memory management, even though these were features of Ada, C, Pascal, and PL/I.
Ironically, a very similar talk was given almost 20 years ago today
Are we halfway through the second major language shift?
(Figure: counts per year, 2008–2019, vertical axis 0–800, by language:
C/C++, Python, Jupyter Notebook, TeX, Java, R, VHDL, FORTRAN, Julia, Go.)
The shift from Fortran to C++ was a decision made by collaboration leaders.
What we see here are individuals choosing a language for their own work.
Informal summary of the workshop at tomorrow’s
LPC Physics Forum at 1:30pm.