What Academia Can Learn 
from Open Source 
Arfon Smith 
arfon@github.com 
@arfon 
Creative Commons Attribution 3.0 Unported License 
What is a GitHub?
A story from my life 
(10 years ago)
Astronomer
tl;dr - technical, but brimming 
with inefficiencies
http://www.flickr.com/photos/blachswan
http://www.flickr.com/photos/esoastronomy/
http://www.flickr.com/photos/jamiegilbert/
http://amandabauer.blogspot.com/
Diffraction grating 
Telescope 
Detector
> cat bad_pix_mask.txt 
130 130 1 2048 
189 189 258 258 
480 562 378 378 
493 521 390 397 
851 851 247 274 
319 319 304 580 
493 511 610 636 
188 188 228 228
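A mask file like this is easy to read programmatically. The following is a minimal sketch, assuming each row is `x1 x2 y1 y2` (1-indexed, inclusive column and row ranges, in the style of IRAF region lists). That column meaning is an assumption, as are the names `parse_regions` and `bad_pixels`; none of them come from the slides.

```python
# Contents of bad_pix_mask.txt as shown on the slide.
MASK_TEXT = """\
130 130 1 2048
189 189 258 258
480 562 378 378
493 521 390 397
851 851 247 274
319 319 304 580
493 511 610 636
188 188 228 228
"""


def parse_regions(text):
    """Parse rows of four integers into (x1, x2, y1, y2) tuples.

    Assumption: the columns are x1 x2 y1 y2, 1-indexed and inclusive.
    Rows that do not contain exactly four fields are skipped.
    """
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        x1, x2, y1, y2 = (int(field) for field in fields)
        regions.append((x1, x2, y1, y2))
    return regions


def bad_pixels(regions):
    """Expand rectangular regions into a set of (x, y) pixel coordinates."""
    flagged = set()
    for x1, x2, y1, y2 in regions:
        for x in range(x1, x2 + 1):
            for y in range(y1, y2 + 1):
                flagged.add((x, y))
    return flagged


regions = parse_regions(MASK_TEXT)
flagged = bad_pixels(regions)
print(f"{len(regions)} regions, {len(flagged)} flagged pixels")
```

Under that reading, the first row flags an entire detector column (x = 130, rows 1 to 2048), and the single-pixel rows flag isolated hot or dead pixels.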
Wasteful… but the norm 
2 days work 
3 observing runs/week 
52 weeks in year 
15 year detector lifetime 
2*3*52*15 = 4680 days (13 years)
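The estimate above can be checked in a few lines. This is a sketch only: the constant names are made up for illustration, and converting days to years by dividing by 365 is an assumption about how the "13 years" figure was rounded.

```python
# Figures taken from the slide.
DAYS_OF_WORK_PER_RUN = 2
OBSERVING_RUNS_PER_WEEK = 3
WEEKS_PER_YEAR = 52
DETECTOR_LIFETIME_YEARS = 15

# Total person-days spent redoing the same work over the detector's lifetime.
total_days = (
    DAYS_OF_WORK_PER_RUN
    * OBSERVING_RUNS_PER_WEEK
    * WEEKS_PER_YEAR
    * DETECTOR_LIFETIME_YEARS
)

# Assumption: a plain 365-day year is used for the conversion.
total_years = total_days / 365

print(f"{total_days} days, roughly {total_years:.1f} years")
```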
A second story from my life 
(2 months ago)
Software composed of many 
components
Your software is the thing 
that is different
Open Source: Ubiquitous 
culture of reuse
Why isn’t academia like this?
http://dx.doi.org/ 
10.1051/0004-6361
Careers are based on 
paper counts
Careers are based on 
paper citations
Three major problems
1. ‘Novel’ results preferred
2. Reduced collaboration
3. The format sucks
Explain what you did
So that others can repeat
Everybody learns
It’s the way that we explain 
that matters most
State of the art technology
State of the art technology… 
for the late 17th century* 
* Michael Nielsen
Data, methods, prose
http://www.nature.com/news/2011/111005/full/478026a.html
BIG SCIENCE
Numbers, data Science! 
Complex stuff
Reproducibility 
Data intensive
Verification may take years 
(if at all)
What do open source 
collaborations do well?
Open Source vs 
Open Collaborations 
Open source collaborations
Open Source: the right to 
modify, not the right to 
contribute. 
Open Collaborations: a highly 
collaborative development 
process, receptive to 
contributions of code, 
documentation, discussion, etc. 
from anyone who shows 
competent interest.
THIS
Ubiquitous culture of reuse
Expose their collaborative 
process
How do 4000 people 
work together?
The pull request
Code first, permission later 
discuss improve
Every time this happens the 
community learns
Merged pull requests
“open source is… 
reproducible by necessity” 
Fernando Perez 
http://blog.fperez.org/2013/11/an-ambitious-experiment-in-data-science.html
Better at collaborating 
because they have to be
Open = Public? 
(doesn’t have to mean this)
‘Open Source’ way of 
working
Open (within your team, 
department or institution)
Electronic & Available
Asynchronous, exposed process
Lock-free
Low friction collaboration
Academia can learn from 
open source
Academia must learn from 
open source
What’s happening in academia 
today?
Collaboration around code
Collaborative authoring
Collaborative teaching
Where might more significant 
change happen?
Where do communities form?
Around a shared challenge?
Around shared data?
10^n ? 
Level 1 (continual) 
Level 2 (periodic)
Informatics and Statistics 
Active Galactic Nuclei 
Solar System 
Dark Energy (DESC) 
Stars, Milky Way 
Strong lensing 
Transients/variable stars 
Galaxies 
Large-scale structure 
Supernovae 
Weak lensing
Software composed of many 
components
Your software should be the 
thing that is different
science too! 
Scientific data is becoming 
more open
http://www.nature.com/news/2011/111005/full/478026a.html
How do we make this 
behaviour the norm?
Credit
“Academic environments of 
today do not reward tool 
builders” 
Ed Lazowska, OSTP event 
http://lazowska.cs.washington.edu/MS/MS.OSTP.pdf
“publishing a paper about 
code is basically just 
advertising” 
David Donoho 
http://www.stanford.edu/~vcs/Video.html
How to derive meaningful 
metrics from open 
contributions?
Trust
Discoverability
Barriers are cultural, not 
technical
Why should we care?
Because we paid for it?
Because open=good?
Because we care about the 
creation of knowledge?
Open source has solved much 
of what academia needs
Our challenge is to adapt and 
evolve the academy in this 
new collaborative age
Thanks 
arfon@github.com 
@arfon 