Reproducible Open Notebook Science
This past weekend I gave an invited talk at the Harley Wood Winter School in Collaroy, New South Wales, Australia. It was an excellent conference at a beautiful location, and it was definitely a treat to be asked to speak about scientific computing and the future of reproducible open science.
Here’s the abstract from my talk:
Full-stack science workflow += the IPython notebook
My talk will range from setting up a .bashrc, to how you spend your time on a day-to-day basis, to the ultimate goal of clearly communicating reproducible scientific results. I'll have many examples of common pitfalls to avoid and a few tactics that can get you a series of small wins in the battle that we call research. Finally, I will demonstrate the IPython notebook, which I think will become a game changer for sharing reproducible science. I will make my slides and my code publicly available after the talk.
I was a bit nervous because the talk included interactive coding and demonstrations, which are always dangerous, but I think it ended up going rather smoothly. As promised, I am making my slides and assorted examples available. The talk also gave me the opportunity to describe where I hope to see science heading: reproducible code, shared and open data, and notebooks. The ability to reproduce the exact plots in a paper is now within easy reach, and we should strive to make this the standard going forward.
It was also the last talk I will give as an astrophysicist, because I'm starting the Insight Data Science Fellowship program in September! I'm very excited about that, and I also wanted to share some thoughts with the audience about so-called ‘Plan B’ career trajectories out of academia.
First my (interactive) slides
My first set of slides: I tried to export them into a reveal.js slideshow but failed, so it's one long (downloadable) IPython notebook.
My second set of slides, which includes the Bayesian Blocks example from the AstroML (Machine Learning and Data Mining for Astronomy) library.
Example of the future of science
A possible flow of events
- Read an interesting arXiv paper.
- Discuss it with your advisor.
- Download the IPython notebook code (from GitHub) and data.
- Play with it in your own IPython notebook.
- Paste your final notebook (it's just a text file) into a GitHub gist.
- Paste the gist link into the IPython notebook viewer (remember to remove the .git at the end of the URL).
- Now you have an nbviewer URL.
You can now email that link to anyone in the world who has a browser. No Python, no IPython: nothing needs to be installed. The barrier to sharing the analysis is about as close to zero as we can get.
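The one fiddly step in the flow above is trimming the .git from the gist URL before pasting it into the notebook viewer. Here is a tiny bash sketch of that step; the function name gist_link and the gist ID are made up for illustration, and this simply strips the suffix rather than talking to GitHub or nbviewer in any way.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the talk): strip a trailing ".git" from a
# gist clone URL so it can be pasted straight into the notebook viewer.
gist_link() {
    local url="$1"
    # ${var%suffix} removes a matching suffix; it is a no-op if there is none.
    echo "${url%.git}"
}

# Example with a made-up gist ID:
gist_link "https://gist.github.com/1234567.git"
# prints https://gist.github.com/1234567
```

Because the suffix removal is a no-op when .git is absent, it is safe to run on a URL you have already cleaned up.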
Workflow
I repeatedly made the case for thinking about your workflow: the more often you do an action, the more you should think about optimizing it.
- A couple of useful .bashrc commands to make life easier: bashrc
- This includes the save function, which lets you bookmark a directory under a name (for example, example) and later simply type cd example to return to the saved directory. Bookmarks are stored for future sessions as well.
- Sublime Text: a text editor worth getting to know (available for OS X/Linux/Windows).
- How to get LaTeX installed: an excellent blog post.
- And this blog post as well.
- Finally, but most importantly, this series of screencasts on how to use Sublime Text effectively. It's worth watching all the way through once, using Sublime Text for about a month, and then rewatching.
- Divvy: call up a window with a keyboard shortcut and resize it to custom sizes (available for OS X/Windows).
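The save function itself lives in the linked bashrc and isn't reproduced here. To give a rough idea of how such a directory bookmark can work, here is a minimal sketch. It assumes bash's cdable_vars option and a bookmark file at ~/.dirs holding NAME="path" lines; both are assumptions for this sketch, not necessarily what the linked file does.

```shell
#!/usr/bin/env bash
# Minimal sketch of a directory-bookmark helper, meant to be pasted into ~/.bashrc.
# Assumptions: bookmarks live in $DIRS_FILE (default ~/.dirs) as NAME="path"
# lines, and NAME is a valid shell variable name. Paths containing double
# quotes or $ are not handled.
DIRS_FILE="${DIRS_FILE:-$HOME/.dirs}"
touch "$DIRS_FILE"

# save NAME: bookmark the current directory under NAME.
save() {
    local name="$1"
    # Remove any older bookmark with the same name, then append the new one.
    grep -v "^${name}=" "$DIRS_FILE" > "$DIRS_FILE.tmp"
    mv "$DIRS_FILE.tmp" "$DIRS_FILE"
    printf '%s="%s"\n' "$name" "$PWD" >> "$DIRS_FILE"
    # Re-read the file so $NAME is defined in the current shell.
    source "$DIRS_FILE"
}

# Load saved bookmarks in every new shell.
source "$DIRS_FILE"

# With cdable_vars on, "cd example" falls back to the directory stored in
# $example when ./example does not exist.
shopt -s cdable_vars
```

With this in place, running save example in some directory lets you type cd example from anywhere to jump back, including in new terminal sessions, since ~/.bashrc re-reads the bookmark file on startup.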
Python
I currently recommend getting Python, IPython, and the IPython notebook through the Anaconda installation.
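If you follow the Anaconda route, one quick way to confirm the install worked is to check that the relevant commands are on your PATH. This is a small sketch: the function name check_tools is hypothetical, and conda, python, and ipython are the commands a typical Anaconda install provides, though your setup may differ.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given commands are on your PATH.
# Returns 0 if all of them are found, 1 otherwise.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# After installing Anaconda, these should all be found (command names assumed):
check_tools conda python ipython || echo "Some tools are missing; check your PATH."
```

If something shows up as missing right after installing, opening a new terminal (so your updated .bashrc is read) is usually the first thing to try.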
A list of a few Python tutorials:
IPython notebook links
- Notebook Viewer
- The IPython notebook is moving to Project Jupyter in the near future. Don't worry: this is the same old IPython notebook, but it's being rebranded because it now supports R, Julia, and other languages.
Bash
Besides the few lines I recommend adding to your .bashrc above, here are a couple of handy snippets that can be used in bash:
# iterate over the numbers from 1 to 12
for index in {1..12}
do
    echo "example.$index.name"
done

# iterate over all command line arguments
# Save this in a file called demo.bash, then run the command
#   bash demo.bash hi this works
# to see what happens.
# Quoting "$@" keeps each argument intact, even if it contains spaces.
for name in "$@"
do
    echo "$name"
done
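To see why the argument loop quotes "$@", here is a small sketch comparing the quoted and unquoted forms. The helper names count_args and show_difference are hypothetical, made up just for this illustration.

```shell
#!/usr/bin/env bash
# count_args prints how many arguments it received.
count_args() {
    echo "$#"
}

show_difference() {
    # "$@" passes each original argument through as a single word.
    echo "quoted:   $(count_args "$@")"
    # Unquoted $@ is word-split, so "two words" becomes two separate arguments.
    echo "unquoted: $(count_args $@)"
}

show_difference "one" "two words"
# prints:
# quoted:   2
# unquoted: 3
```

The same thing shows up with demo.bash: running bash demo.bash "hi there" this works prints hi there on a single line with the quoted loop, but splits it over two lines if the quotes are dropped.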
Recommended actions
- Get a GitHub account (it's free).
- Work through the excellent short git tutorial to start getting a feel for git.
- GitHub Gists is where you'll paste the text of the IPython notebook you'd like to share. Remember that gists are updatable, and the nbviewer link will show the most recent version (after a delay of a few minutes).
- Get a Stack Overflow account (it's free).
- Learn about Data-Driven Documents (D3).
- Created by: Mike Bostock (with tons of amazing examples).
- My example: My Life Trajectory (so far), based on this example.
Let me know if you saw the talk and what you thought of it, or if I forgot to link to something that I mentioned!
Hi Jonathan,
Just wanted to thank you again for talking at the HWWS. I found the talk very useful (all the extra bash tricks will come in handy!) and I circulated your website with the talk outline within the Sydney Institute for Astronomy (SIfA). Good luck in Silicon Valley!
Thanks Joe, I'm glad you found it useful! I had a great time and it was nice to have the opportunity to organize my thoughts (somewhat) on a few of these topics.