Enhance Your Productivity and Software Quality with Techniques from Silicon Valley
Benjamin S. Skrainka University College London Institute for Fiscal Studies
b.skrainka@ucl.ac.uk
February 23, 2011
The Big Picture
Whether you like it or not you are a software engineer:
- Much wisdom we can learn from Silicon Valley
- Much technology we can exploit
- About increasing your productivity
- About reproducible results (scientific method, getting sued)
- Much of the cost of software is maintenance!
Good Code
Good code is:
- Easy to maintain
- Easy to extend
- Easy to understand ... even after a six month break!
- Straight-forward and direct ... no side-effects or surprises!
- Reads like English (or some other human language)
Some Questions
Before writing a line of code, ask yourself:
- What will this code be used for?
- How often will it be used?
- How might it evolve?
- How can I isolate myself from possible changes, such as using a different solver?
- What part of this code is generic and what part problem-specific? I.e.,
  - What can I reuse?
  - What should I abstract into a library?
Roadmap
- Tactical Programming
- Designing Better Software
- Debugging and Optimization
- Software Development Tools
Goals of Tactical Programming
Tactics are about structuring your code so that it is:
- Easier to read
- Easier to check for bugs
- Easier to understand
- Easier to extend
I.e., minimize the costs of working with your code: increased productivity for free!!!
Use A Coding Convention
A good coding convention makes your code read like a good story and makes your intent clear:
- Naming of functions, variables, and filenames
- Grouping and layout of code such as braces
- Modification history
- Comments
- Respect the local coding convention when working on code
- Choose a convention and stick to it!
Structure Your Code
Group logical chunks of code together:
- Separate larger blocks with comments:
  - Create horizontal lines of -, =, etc. to indicate higher-level groupings
  - Just like books are organized into chapters, sections, subsections, etc.
- Use white space:
  - Use vertical space (blank lines) to set off lower-level chunks of code
  - Put space around operators =, +, -, *, / and inside of {}, (), and []
  - Choose a sensible indentation scheme, such as two spaces
  - Beware of tabs ...
- Anything longer than 1-2 screen-fulls of code should be a separate function
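The layout advice above can be sketched in C as follows. This is only an illustration: the module, the function names (SumVector, CalcShares), and the comment-rule style are assumptions made for the example, not part of the original slides.

```c
#include <assert.h>
#include <stddef.h>

/*======================================================================
 * Share utilities (hypothetical module, for illustration only)
 *====================================================================*/

/*----------------------------------------------------------------------
 * SumVector: add up the first nElem entries of vData.
 *--------------------------------------------------------------------*/
double SumVector( const double * vData, size_t nElem )
{
  double dwSum = 0.0 ;   /* running total */
  size_t ix ;            /* index into vData */

  assert( vData != NULL ) ;

  for( ix = 0 ; ix < nElem ; ix++ )
  {
    dwSum += vData[ ix ] ;
  }

  return dwSum ;
}

/*----------------------------------------------------------------------
 * CalcShares: convert raw quantities into shares which sum to one.
 * Returns 0 on success, -1 if the total is not positive.
 *--------------------------------------------------------------------*/
int CalcShares( const double * vQuantity, double * vShare, size_t nGoods )
{
  double dwTotal ;   /* sum of all quantities */
  size_t ix ;        /* index of the current good */

  assert( vQuantity != NULL && vShare != NULL ) ;

  dwTotal = SumVector( vQuantity, nGoods ) ;

  /* Guard against dividing by zero or a negative total */
  if( dwTotal <= 0.0 )
  {
    return -1 ;
  }

  for( ix = 0 ; ix < nGoods ; ix++ )
  {
    vShare[ ix ] = vQuantity[ ix ] / dwTotal ;
  }

  return 0 ;
}
```

The heavy rule marks the module, the light rules mark functions, and blank lines separate declarations, checks, and the main loop within each function.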
Choose Good Names
Choose names which describe the role of a function or variable:
- Separate multiple words with CamelCase or _
- Function names should start or end with a verb: CalcMarketShares()
- Encode type information into variable names: float, int, matrix, vector, etc.
- One variable definition per line + a comment
- Start indexes with ix: ixStart, ixStop
- One p for each level of pointer indirection

Bad Names: p, x, y, n, i, j, k, l, jfunc1
Good Names: dwPriceFood, dwExcessDemand, dwIncome, nGoods, vProb, IntegrateMarketShares(), IsValid(), ix, jx, kx, pHHData
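As a rough illustration of the difference, the two functions below compute the same thing. The prefixes (dw, v, n, ix) follow the good-name examples above; the function CalcExpenditure and its arguments are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hard to follow: what are p, x, and n? */
double f( const double * p, const double * x, size_t n )
{
  double y = 0.0 ;
  size_t i ;
  for( i = 0 ; i < n ; i++ ) y += p[ i ] * x[ i ] ;
  return y ;
}

/* Same computation; the names carry both role and type */
double CalcExpenditure( const double * vPrice,
                        const double * vQuantity,
                        size_t nGoods )
{
  double dwExpenditure = 0.0 ;   /* total spending over all goods */
  size_t ix ;                    /* index of the current good */

  assert( vPrice != NULL && vQuantity != NULL ) ;

  for( ix = 0 ; ix < nGoods ; ix++ )
  {
    dwExpenditure += vPrice[ ix ] * vQuantity[ ix ] ;
  }

  return dwExpenditure ;
}
```

A reader of a call site such as CalcExpenditure( vPrice, vQuantity, nGoods ) can guess what it does without opening the function.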
Braces
There are two main styles for braces:

1TBS/K&R/etc.:

    if( IsBadState() ) {
      fixProblem() ;
    }

Allman/GNU/etc.:

    if( IsBadState() )
    {
      fixProblem() ;
    }
Write Comments
Comments are important:
- History of changes
- Why you did something, not what you did
- Explain anything tricky: you won't remember why you did something next month...
- Use comments and white space to convey logical structure of code on small, medium, and large scales
- Start any file with a short one line comment explaining purpose of module
- Document function interfaces and any quirks
One Place Only
Strive to minimize duplication:
- Are you writing code with cut and paste? Abstract it into a function ...
- Use constants whenever possible:
  - Define all numbers and constants in only one place
  - Define indexes (with good names) for different columns or rows in a matrix
  - Make arguments const when only used for input
  - No hard-coded numbers!!!
- Automate what you can:
  - macros
  - templates
- When you have to make changes, it is easier if you only have to modify it in one place!
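A minimal sketch of these ideas in C, assuming a small household data matrix stored row by row in a flat array. The names N_HOUSEHOLDS, HHColumn, and CalcColumnMean are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Problem-wide constants are defined here and nowhere else */
#define N_HOUSEHOLDS 3

/* Named column indexes for the household data matrix */
enum HHColumn
{
  ixPrice = 0,
  ixIncome,
  N_HH_COLUMNS   /* number of columns; keep this entry last */
} ;

/* Average one column of a row-major matrix.
   mData is const because it is used for input only. */
double CalcColumnMean( const double * mData, size_t nRows,
                       enum HHColumn ixCol )
{
  double dwSum = 0.0 ;   /* running total of the column */
  size_t ix ;            /* row index */

  assert( mData != NULL && nRows > 0 ) ;

  for( ix = 0 ; ix < nRows ; ix++ )
  {
    dwSum += mData[ ix * N_HH_COLUMNS + ixCol ] ;
  }

  return dwSum / (double) nRows ;
}
```

Adding a column means editing the enum only; every use of ixPrice and ixIncome stays correct, and no literal 0 or 1 needs to be hunted down.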
Order of Operations
Don't abuse order of operations:
- Only use order of operations for +, -, /, *
- For everything else, use parentheses!
- Avoid clever tricks and side-effects
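One classic C trap of this kind: == and != bind more tightly than &, so a flag test written without parentheses does not mean what it appears to. The sketch below spells out both readings; the flag values and function names are hypothetical.

```c
#include <assert.h>

#define FLAG_VERBOSE   0x1u
#define FLAG_CONVERGED 0x2u

/* Intended test: parentheses make the grouping explicit */
int IsConverged( unsigned int fStatus )
{
  return ( ( fStatus & FLAG_CONVERGED ) != 0 ) ;
}

/* What the compiler does with the unparenthesized expression
     fStatus & FLAG_CONVERGED != 0
   The comparison is evaluated first, giving 1, so this really
   tests the lowest bit instead of the converged bit. */
int IsConvergedAsParsed( unsigned int fStatus )
{
  return (int) ( fStatus & ( FLAG_CONVERGED != 0 ) ) ;
}

/* Small self-check showing that the two readings disagree */
void CheckPrecedenceExample( void )
{
  assert( IsConverged( FLAG_CONVERGED ) ) ;
  assert( !IsConvergedAsParsed( FLAG_CONVERGED ) ) ;
}
```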
MATLAB Tricks
Here are a couple tricks to improve your MATLAB code:
Use cells by commenting the start of a section with %%:
- Group a logically-related block of code
- Rerun the cell with CTRL + RETURN

Handle errors with keyboard
Store column indexes in a structure: Index.Price, Index.Income, ...
Wrap related variables into a structure:

    ChoiceData.X = mCovariates ;
    ChoiceData.Y = vChoices ;
    ChoiceData.nObs = length( vChoices ) ;
How to Design Software
Much of good software design is based on:
- Planning ahead for maintenance (one of the biggest costs of most projects) and future extensions
- Writing testable code
- Choosing good abstractions
- Designing good interfaces
What to Worry About
Questions to ponder:
- Where will my code run?
- What technologies does it depend on?
- How is it likely to change?
- How will it be used?
- How often will it be used?

Write a design document!!! You don't have time not to plan...
Trade-offs
You need to evaluate many trade-offs:
- Speed vs. robustness
- Speed vs. memory usage
- Speed vs. maintainability (e.g. fast code may require unreadable optimizations)
- Development time vs. code quality (performance, maintainability, reusability)
- Quality vs. frequency of use
Interfaces
An interface is a contract:
- Clear and easy to remember
- Use the same interface for similar objects/operations
- Promotes loose coupling and reuse
- Minimizes maintenance headaches by isolating implementation from interface
- Publish the interface in a header file:
  - Separate from the implementation file
  - Protect with include guards if using C preprocessor
  - May need second header file for private information
- Only a few arguments: put any more in a struct
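A sketch of what such an interface might look like in C. Everything here is hypothetical (the file names integrate.h and integrate.c, the IntegrateOptions struct, the midpoint rule itself); header and implementation are shown in one listing only for brevity and would normally live in separate files.

```c
/* ---- integrate.h: the published interface (hypothetical) ---- */
#ifndef INTEGRATE_H
#define INTEGRATE_H

#include <stddef.h>

/* Options bundled into one struct instead of a long argument list */
typedef struct
{
  double dwLower ;      /* lower limit of integration */
  double dwUpper ;      /* upper limit of integration */
  size_t nIntervals ;   /* number of subintervals to use */
} IntegrateOptions ;

/* Approximate the integral of pfnIntegrand over [dwLower, dwUpper].
   Returns 0 on success, -1 if nIntervals is zero. */
int IntegrateMidpoint( double (*pfnIntegrand)( double ),
                       const IntegrateOptions * pOptions,
                       double * pdwResult ) ;

#endif /* INTEGRATE_H */

/* ---- integrate.c: the implementation, free to change ---- */
#include <assert.h>

int IntegrateMidpoint( double (*pfnIntegrand)( double ),
                       const IntegrateOptions * pOptions,
                       double * pdwResult )
{
  double dwWidth ;        /* width of one subinterval */
  double dwSum = 0.0 ;    /* sum of integrand values at midpoints */
  size_t ix ;             /* index of the current subinterval */

  assert( pfnIntegrand != NULL && pOptions != NULL && pdwResult != NULL ) ;

  if( pOptions->nIntervals == 0 )
  {
    return -1 ;
  }

  dwWidth = ( pOptions->dwUpper - pOptions->dwLower )
            / (double) pOptions->nIntervals ;

  for( ix = 0 ; ix < pOptions->nIntervals ; ix++ )
  {
    double dwMid = pOptions->dwLower + ( (double) ix + 0.5 ) * dwWidth ;
    dwSum += pfnIntegrand( dwMid ) ;
  }

  *pdwResult = dwSum * dwWidth ;
  return 0 ;
}

/* Example integrand for trying out the interface */
double TwiceX( double dwX )
{
  return 2.0 * dwX ;
}
```

Callers depend only on the declarations between the include guards, so the midpoint rule could later be swapped for a better quadrature scheme without touching their code.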
Practice Information Hiding
Hiding information and implementation makes your code more robust:
- Put only the minimum amount of information in the public name space
- Make everything else private or static
- Prevents unintentional access
- Now changing implementation details won't break other code
- Encapsulate state information in a struct, not a global if possible
- Avoid global variables!!! They often lead to race conditions...
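The sketch below shows one way to apply these points in C: state lives in a struct owned by the caller rather than in a global, and the helper that is nobody else's business is marked static. The RunningMean type and its functions are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* State lives in a struct owned by the caller, not in a global */
typedef struct
{
  double dwSum ;   /* sum of all values added so far */
  size_t nObs ;    /* number of values added so far */
} RunningMean ;

/* Private helper: static keeps it out of the public name space */
static int IsEmpty( const RunningMean * pState )
{
  return ( pState->nObs == 0 ) ;
}

/* Reset the accumulator */
void RunningMeanInit( RunningMean * pState )
{
  assert( pState != NULL ) ;
  pState->dwSum = 0.0 ;
  pState->nObs = 0 ;
}

/* Add one observation */
void RunningMeanAdd( RunningMean * pState, double dwValue )
{
  assert( pState != NULL ) ;
  pState->dwSum += dwValue ;
  pState->nObs++ ;
}

/* Current mean, or 0.0 if nothing has been added yet */
double RunningMeanGet( const RunningMean * pState )
{
  assert( pState != NULL ) ;
  if( IsEmpty( pState ) )
  {
    return 0.0 ;
  }
  return pState->dwSum / (double) pState->nObs ;
}
```

Because each caller passes in its own RunningMean, two threads or two parts of a program can keep separate accumulators without interfering with each other.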
Reusable Code
Write reusable code:
- Collect general tools and components into a common library
- Reuse for faster development of other projects
- Decrease bugs through use of production code

Corollary: reuse (high quality) existing software libraries and components; don't reinvent the wheel
Defensive Programming I
Write code to facilitate debugging:
- Modularize functionality
- E.g., access shared resources or special facilities only through one library: splineLib, splineCreate, splineEval, splineDelete, ...
- If a bug occurs then it is:
  1. In the library
  2. In the use of the library
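To make the splineLib idea concrete, here is a minimal sketch using the function names from the slide. The signatures and the struct layout are assumptions, and piecewise-linear interpolation stands in for a real spline to keep the example short; knots are assumed to be strictly increasing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* All interpolation state is kept inside the library's own type */
typedef struct Spline
{
  size_t nKnots ;   /* number of knots (at least 2) */
  double * vX ;     /* knot locations, strictly increasing */
  double * vY ;     /* function values at the knots */
} Spline ;

/* Build an interpolant from copies of the caller's data.
   Returns NULL on bad input or if memory runs out. */
Spline * splineCreate( const double * vX, const double * vY, size_t nKnots )
{
  Spline * pSpline ;

  if( vX == NULL || vY == NULL || nKnots < 2 )
  {
    return NULL ;
  }

  pSpline = malloc( sizeof( *pSpline ) ) ;
  if( pSpline == NULL )
  {
    return NULL ;
  }

  pSpline->nKnots = nKnots ;
  pSpline->vX = malloc( nKnots * sizeof( double ) ) ;
  pSpline->vY = malloc( nKnots * sizeof( double ) ) ;
  if( pSpline->vX == NULL || pSpline->vY == NULL )
  {
    free( pSpline->vX ) ;
    free( pSpline->vY ) ;
    free( pSpline ) ;
    return NULL ;
  }

  memcpy( pSpline->vX, vX, nKnots * sizeof( double ) ) ;
  memcpy( pSpline->vY, vY, nKnots * sizeof( double ) ) ;
  return pSpline ;
}

/* Evaluate at dwX; points outside the knots extend the end segments */
double splineEval( const Spline * pSpline, double dwX )
{
  size_t ix = 0 ;     /* index of the left knot of the segment */
  double dwWeight ;   /* relative position of dwX within the segment */

  assert( pSpline != NULL ) ;

  while( ix + 2 < pSpline->nKnots && dwX > pSpline->vX[ ix + 1 ] )
  {
    ix++ ;
  }

  dwWeight = ( dwX - pSpline->vX[ ix ] )
             / ( pSpline->vX[ ix + 1 ] - pSpline->vX[ ix ] ) ;
  return pSpline->vY[ ix ]
         + dwWeight * ( pSpline->vY[ ix + 1 ] - pSpline->vY[ ix ] ) ;
}

/* Release everything splineCreate allocated; NULL is allowed */
void splineDelete( Spline * pSpline )
{
  if( pSpline == NULL )
  {
    return ;
  }
  free( pSpline->vX ) ;
  free( pSpline->vY ) ;
  free( pSpline ) ;
}
```

If all interpolation in a project goes through these three calls, a wrong interpolated value can only come from the library itself or from how a caller used it, which narrows the search considerably.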
Defensive Programming II
Isolate your code from things which might change:
- Third party software: MPI, solvers, libraries
- Platform-specific technologies: OS-specific APIs
- Buggy code by co-workers (software condom)
I.e., write a thin layer between your code and volatile resources
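A thin layer can be as small as one wrapper function. In the sketch below, a simple bisection routine stands in for a third-party solver; the rest of the program would call only FindRoot. All names (SolveByBisection, FindRoot, ExcessDemand) and the tolerance are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a third-party solver with its own calling convention.
   Finds a root of pfn in [dwLo, dwHi] by bisection. */
static double SolveByBisection( double (*pfn)( double ),
                                double dwLo, double dwHi, double dwTol )
{
  double dwFLo = pfn( dwLo ) ;   /* function value at the lower end */

  while( dwHi - dwLo > dwTol )
  {
    double dwMid = 0.5 * ( dwLo + dwHi ) ;
    double dwFMid = pfn( dwMid ) ;

    /* Keep the half-interval whose ends have different signs */
    if( ( dwFLo <= 0.0 ) == ( dwFMid <= 0.0 ) )
    {
      dwLo = dwMid ;
      dwFLo = dwFMid ;
    }
    else
    {
      dwHi = dwMid ;
    }
  }

  return 0.5 * ( dwLo + dwHi ) ;
}

/* The thin layer: the only function which knows which solver is used
   and how it is tuned. The rest of the code calls FindRoot only. */
double FindRoot( double (*pfn)( double ), double dwLo, double dwHi )
{
  const double dwTol = 1.0e-10 ;   /* solver tolerance, set in one place */

  assert( pfn != NULL && dwLo < dwHi ) ;
  /* The end points must bracket a sign change */
  assert( ( pfn( dwLo ) <= 0.0 ) != ( pfn( dwHi ) <= 0.0 ) ) ;

  return SolveByBisection( pfn, dwLo, dwHi, dwTol ) ;
}

/* Example function for trying out the wrapper: root at price 2 */
double ExcessDemand( double dwPrice )
{
  return 4.0 - 2.0 * dwPrice ;
}
```

Switching to a different solver then means rewriting the body of FindRoot and nothing else.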
Test Driven Development
TDD uses unit tests and a tight write-test-debug cycle to catch bugs early:
- Unit tests are short pieces of code which exercise all (or the key) paths through a function:
  - The sooner you find a bug, the cheaper/easier it is to fix
  - Immediately program to an interface to verify design decisions
  - Catch bugs caused by other changes to system
- Many popular unit test frameworks are available: junit, cunit, boost::test, etc.
- Interpreted languages provide a similar productivity boost by letting you test code interactively as you develop it.
- TDD is a philosophy for software development
- Refactor code which is unwieldy
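A unit test does not need a framework to be useful. The sketch below uses plain assert() from the C standard library as a stand-in for a framework such as cunit; the function under test, IsValidShares, is hypothetical, and the test visits each of its return paths once.

```c
#include <assert.h>
#include <stddef.h>

/* Function under test: returns 1 if all shares are non-negative
   and sum to at most one, 0 otherwise. */
int IsValidShares( const double * vShare, size_t nGoods )
{
  double dwSum = 0.0 ;   /* running total of the shares */
  size_t ix ;            /* index of the current good */

  if( vShare == NULL || nGoods == 0 )
  {
    return 0 ;
  }

  for( ix = 0 ; ix < nGoods ; ix++ )
  {
    if( vShare[ ix ] < 0.0 )
    {
      return 0 ;
    }
    dwSum += vShare[ ix ] ;
  }

  return ( dwSum <= 1.0 ) ;
}

/* Unit test: one check per path through IsValidShares */
void TestIsValidShares( void )
{
  const double vGood[]     = { 0.25, 0.5 } ;
  const double vNegative[] = { -0.25, 0.5 } ;
  const double vTooBig[]   = { 0.75, 0.5 } ;

  assert( IsValidShares( vGood, 2 ) ) ;        /* valid input */
  assert( !IsValidShares( vNegative, 2 ) ) ;   /* negative share */
  assert( !IsValidShares( vTooBig, 2 ) ) ;     /* shares sum above one */
  assert( !IsValidShares( NULL, 2 ) ) ;        /* missing data */
  assert( !IsValidShares( vGood, 0 ) ) ;       /* empty vector */
}
```

Rerunning TestIsValidShares() after every change gives quick feedback when an edit elsewhere breaks the function.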
Debugging
Unfortunately, you will make mistakes:
- Learn to use the debugger
- Don't sprinkle your code with printf, WRITE, etc.:
  - Obscures code readability
  - I/O slows code considerably
- Add diagnostic logging to large applications:
  - Message logging to files
  - Print messages to screen in debug version only
Debugging
Use the C preprocessor to facilitate debugging (even in FORTRAN):

    #ifdef USE_DIAG
    #define DIAG_PRINT PRINT *,
    #else
    #define DIAG_PRINT !
    #endif
Must use correct compiler flags: -fpp -allow nofpp_comments
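The same trick carries over to C. The sketch below is one possible version, assuming a C99 compiler (for variadic macros) and that diagnostics are switched on by defining USE_DIAG at compile time, e.g. with -DUSE_DIAG; the function CountPositive is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* With USE_DIAG defined, DIAG_PRINT writes to stderr;
   otherwise it expands to a statement which does nothing. */
#ifdef USE_DIAG
#define DIAG_PRINT( ... ) fprintf( stderr, __VA_ARGS__ )
#else
#define DIAG_PRINT( ... ) ( (void) 0 )
#endif

/* Count the strictly positive entries of vData */
size_t CountPositive( const double * vData, size_t nObs )
{
  size_t nPositive = 0 ;   /* number of positive entries found */
  size_t ix ;              /* index into vData */

  assert( vData != NULL ) ;

  for( ix = 0 ; ix < nObs ; ix++ )
  {
    if( vData[ ix ] > 0.0 )
    {
      nPositive++ ;
    }
  }

  DIAG_PRINT( "CountPositive: %lu of %lu entries are positive\n",
              (unsigned long) nPositive, (unsigned long) nObs ) ;

  return nPositive ;
}
```

The diagnostic line stays in the source but costs nothing in the production build, since the macro expands to an empty statement there.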
Optimization
Your intuition about what needs optimization is often wrong:
- First, get your code to work correctly
- Then optimize:
  - Measure code with a profiler
  - Optimize what needs optimizing
- MATLAB has a built-in profiler
- For gcc, use gprof
Vectorization
Write loops which support vectorization (unrolling):
Use:
- Straight-line code
- Vector (array) data only
- Local variables
- Assignment statements only
- Pre-defined (constant) exit condition

Avoid:
- Function calls
- Non-mathematical operations (which are difficult to vectorize)
- Mixing vectorizable types
- Memory access patterns which prevent vectorization, i.e. where one statement accesses future and/or previous array elements
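The two loops below illustrate the contrast. In the first, every iteration is independent; in the second, each iteration needs the result of the previous one. Whether a particular compiler actually vectorizes the first loop depends on the compiler and its flags, so treat this as a sketch of the pattern rather than a guarantee. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Vectorization-friendly: straight-line body, array data only,
   fixed trip count, and no iteration depends on another.
   vOut is assumed not to overlap vA or vB. */
void AddScaled( double * vOut, const double * vA, const double * vB,
                double dwScale, size_t nElem )
{
  size_t ix ;

  assert( vOut != NULL && vA != NULL && vB != NULL ) ;

  for( ix = 0 ; ix < nElem ; ix++ )
  {
    vOut[ ix ] = vA[ ix ] + dwScale * vB[ ix ] ;
  }
}

/* Hard to vectorize: each element is computed from the previous
   element, which was itself just updated (loop-carried dependence). */
void CumulativeSum( double * vData, size_t nElem )
{
  size_t ix ;

  assert( vData != NULL ) ;

  for( ix = 1 ; ix < nElem ; ix++ )
  {
    vData[ ix ] += vData[ ix - 1 ] ;
  }
}
```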
Make
Make manages building software:
- Checks dependencies
- Builds only what is necessary
- Allows abstraction of build process:
  - Tools
  - Options
  - Platform specific details
- Promotes portability
Editor and OS
Invest in your tools:
"Choose your editor with more care than you would your spouse because you will spend more time with your editor, even after the spouse is gone." (Harry J. Paarsch)

- Learn to use a good programming editor: Vi, Emacs, jEdit, Notepad++, Eclipse, etc.
- Will increase your productivity
- Same applies to your OS: get some Unix in your life!
- etags, cscope, ctree, etc. make it easy to explore code
- Eclipse, MS Visual Studio have powerful tools as well
Version Control
Version Control is a safety net for programmers:
- Manages every version of your code
- Supports distributed software development
- Supports multiple developers
- Keeps everything synchronized
- Automatically merges different changes to the same code
- Common examples: SVN, hg, git, ClearCase, Perforce, ...
- Much better than DropBox...
Unix and Windows clients are available
Create a Repository
The first step is to create a repository to store all versions of your source code (C, FORTRAN, MATLAB, R, LaTeX, etc.):
- Should be accessible from all computers which will access your code:
  - A machine you can access via SSH which is running SVN
  - A commercial repository hosting service (sometimes free) such as www.ProjectLocker.com or github.com
Example (WARNING: always use fsfs):

    ssh joe@svn.econ.somewhere.edu
    mkdir SVN
    mkdir SVN/ThesisCode
    svnadmin create /home/joe/SVN/ThesisCode --fs-type fsfs
    ls -F SVN/ThesisCode
    README.txt  conf/  dav/  db/  format  hooks/  locks/
Getting Started
SVN provides two commands:
- svnadmin: to create and administer repositories
- svn: to perform version control operations (checkout, commit, diff, etc.)

There are several ways to get help:
- Execute a command with help, e.g.:

      svn --help
      svn commit --help

- Use the man command: man svn
- Google
- Red Bean SVN book for details

Can configure in ~/.subversion/config
Import Your Code
If you have existing code, you need to import it:
    svn import -m "Descriptive Message About Your Work" \
      /local/path/to/ThesisCode \
      svn+ssh://joe@svn.econ.somewhere.edu/home/joe/SVN/Thesis

- Your code is now under version control
- Run this on the machine which hosts your code
Checkout Your Code
To work on your code on a computer, you must first check it out:

    cd
    mkdir sbox
    cd sbox
    svn checkout \
      svn+ssh://joe@svn.econ.somewhere.edu/home/joe/SVN/Thesis

Note: svn+ssh is just one example of the type of URLs supported by SVN to refer to a location.
Get to Work
In the course of your work you will use the following commands:
    svn commit
    svn update
    svn add [ file | directory ]
    svn mkdir dir
    svn rm [-fr] FileOrDir
    svn diff -r PREV BasicDriver.c
    svn log BasicDriver.c
More advanced operations include branches and tags....