
I am now an O’Reilly Author!

My O’Reilly screencast course is now available: Jupyter Notebook for Data Science Teams! I’ve been working on this project for many months, and I’m so happy that it is finally complete.

Jupyter notebook data science usage tips

You might have seen me give an interactive talk on tips and tricks for using the Jupyter notebook (at OSCON 2015, at UC Berkeley’s Master of Information and Data Science program, at Insight Data Science, or elsewhere). Those talks focused on exploratory data science and general Jupyter notebook usage patterns. This screencast course covers some of the same ground: how to use the Jupyter notebook efficiently, plus tips for a number of useful extensions. However, there’s also a lot about working in Jupyter notebooks collaboratively with other data scientists, including version control. There’s also some focus on sharing notebooks with project managers or others who might not have Python installed.

Please let me know if you end up getting Jupyter Notebook for Data Science Teams or recommending it to your companies for internal training! I’d love to hear what you think of it!


2015 Year in Review

Another post to reflect on all that’s happened in the past year! I’m starting to write this post while on a flight to Maui for my honeymoon, with my lovely wife sleeping next to me. What a difference a year makes!

Some of my major life events:

Books read that I’d recommend — with Amazon affiliate links


  • The Martian by Andy Weir
    • A well-paced and excellent story + lots of science. The audiobook edition is great.
  • The City & The City by China Miéville
    • Recommended to me by Eric White. A very different but imaginative detective story.


  • Diplomacy by Henry Kissinger
    • A serious read. Such an interesting take on the history of diplomacy in European history right up through the early 90s. The twists and turns of Kissinger’s read on Americans from before WWI through today are very thought provoking.
  • Islam and the Future of Tolerance by Sam Harris and Maajid Nawaz
    • An important discussion that needs to be had more frequently and more publicly.
  • Superforecasting by Philip Tetlock and Dan Gardner
    • Fascinating read on how everyday people can become much better at predicting future events (think questions like: will Assad give up power by April 3, 2016?). I wish it were a bit more how-to in the end.
  • The $12 Million Stuffed Shark by Don Thompson
    • A good read, together with the next book, about the high-end art world. It’s almost too incredible to believe.
  • Seven Days in the Art World by Sarah Thornton
    • Same as above. If you’re interested in how the art world works, in broad strokes, read these two books together.

Places that I visited

  • Dallas, TX, USA — (family/friends)
  • Mountain View, CA, USA — where I live and work
  • Portland, OR, USA — first consulting trip
  • Amsterdam, NLD — (week in Amsterdam to watch Julija get her PhD cum laude and meet her family!)
  • Austin, TX, USA — SciPy 2015 conference (poster)
  • Portland, OR, USA — OSCON 2015 conference (talk)
  • San Francisco, CA, USA — Got married!
  • Yosemite, CA, USA — Upper Yosemite falls is quite the hike.
  • Maui, HI, USA & Kauai, HI, USA — Honeymoon!

Side projects

Looking forward to what the next year will have in store! The initial plans call for my first trip to Lithuania!


I just wanted to post the guts of a script that Colin Higgins (fellow Data Scientist at SVDS) wrote.

# step1
wget https://repo.continuum.io/miniconda/Miniconda-latest-MacOSX-x86_64.sh

# step2
chmod +x Miniconda-latest-MacOSX-x86_64.sh

# step3 -- run the installer; you have to press the spacebar to page through the license and type "yes"
./Miniconda-latest-MacOSX-x86_64.sh

# step4
source ~/.bashrc

# step5
conda update conda -y

# step6
conda create -y -n anaconda_r -c r r-irkernel r-recommended r-essentials anaconda

Now, switch into the anaconda_r environment (which will prepend to your PATH in that one terminal ONLY) with:

source activate anaconda_r

and install extra packages like so:

conda install -c r rpy2 -y

This made it so that the R kernel, the Python kernel, and the rpy2 package were all working in the same environment (my previous blog post was a temporary stop-gap that couldn’t get there).
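
One way to confirm that the activation took effect is to ask the shell where it finds each tool. This is a minimal sketch and not part of Colin’s script: report_tools is a hypothetical helper name, and the tool names passed to it are only examples.

```shell
# Print where each named tool resolves on the current PATH.
# report_tools is a hypothetical helper, not part of the original script.
report_tools() {
    for tool in "$@"; do
        if path="$(command -v "$tool")"; then
            printf '%s -> %s\n' "$tool" "$path"
        else
            printf '%s -> not found\n' "$tool"
        fi
    done
}

# Example call; swap in whichever tools you care about.
report_tools python R jupyter
```

If the environment is active, the paths printed for python and R should point inside the anaconda_r environment in that terminal, while a fresh terminal should show whatever was on the PATH before.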


Here is an excellent talk by Michael Manapat at the PyData Seattle 2015 conference. I hope that this style of talk, one that really digs deep with specific examples, becomes more common!

Michael Manapat: Counterfactual evaluation of machine learning models

The slides can be found here, and the paper that it’s partially based on is here.


Jupyter Notebook Best Practices for Data Science

I gave a talk on Friday (July 24) at the 2015 OSCON in Portland, OR. My topic was on the IPython (Jupyter) Notebook for Data Science, and it highlighted a number of challenges that come from needing to organize a data science workflow — especially in the context of working on a team of data scientists.

The video of my talk (not available just yet) is below:

I had a great time and I hope people find it useful. The GitHub repository for my talk is available here.


2014 in Review

Thanks to everyone who helped make this past year great – I’ve been incredibly fortunate to have people who have helped support me in all of my adventures. Below are a few highlights from 2014!

Some of my major life events:

  • Finished my 3-year postdoc under Michael Murphy at Swinburne University of Technology.
  • Published a paper with Michael that was the culmination of years of work. The pdf is here if curious.
  • Moved from Australia to the San Francisco Bay Area.
  • Brought Julija home to meet the parents over Thanksgiving.
  • Completed the Insight Data Science program.
  • Started at SVDS as a Data Scientist!

Places that I visited (and spent at least two nights this year)

  • Dallas, TX, USA (family/friends)
  • Washington, D.C., USA (AAS)
  • Phoenix, AZ, USA (visit Stephanie/Kelsey)
  • San Diego, CA, USA (talk at UCSD)
  • Melbourne, AUS (postdoc life, Marc visited!)
  • Hobart, AUS (Dave!)
  • Paris, FRA (week in Paris)
  • Amsterdam, NLD (week in Amsterdam – Julija!)
  • Zurich, CHE (Kern!)
  • Glasgow, GBR (IMAX Glasgow)
  • Cambridge, GBR (talk at Cambridge)
  • Berlin, DEU (photos)
  • Potsdam, DEU (visit at Potsdam University)
  • Sydney, AUS (Harley Wood Winter School)
  • Palo Alto, CA, USA (Insight Data Science)
  • Dallas, TX, USA (family/friends)
  • Mountain View, CA, USA (started work at SVDS)
  • Dallas, TX, USA (family/friends)

Fun final list:

Seasons experienced this year (in order)

  • Winter
  • Summer
  • Autumn
  • Spring
  • Summer
  • Winter
  • Summer
  • Autumn
  • Winter

One chapter closes; a new chapter opens

As of this week, I am officially no longer an astrophysicist. I start my next career as a data scientist in about a month. It’s been a fantastic experience for me both personally and professionally. Michael Murphy was an incredibly patient and encouraging boss from whom I learned more than I hoped.

Coming into this job I hoped to learn a ton, and I have, but many opportunities and experiences were completely unexpected, from observing at observatories like ESO’s VLT in Chile and Keck in Hawaii to being in an IMAX film. Finally, the amazing amount of travel that my position granted me was life altering.

This is not my big farewell post, as I still have a week and a bit in Australia, but I had trouble sleeping last night so I decided to make a fun D3 map of all of the flights that I’ve taken during my time as a postdoc, starting with the flights from DFW (Dallas-Fort Worth) to LAX (Los Angeles) to MEL (Melbourne) in August 2011!

The code that I used to make this is available at this link.

Ok future, let’s see where we go from here.


2014 Harley Wood Winter School Invited Talk

Reproducible Open Notebook Science

This past weekend I gave an invited talk at the Harley Wood Winter School in Collaroy, New South Wales, AUS. It was an excellent conference at a beautiful location, and definitely a treat to be asked to speak about scientific computing and the future of reproducible open science.

Here’s the abstract from my talk:

Full-stack science workflow += the IPython notebook

My talk will range from setting up a bashrc, to how you spend your time on a day to day basis, to the ultimate goal of clearly communicating reproducible scientific results. I’ll have many examples of common pitfalls to avoid and a few tactics that can get you a series of small wins in the battle that we call research. Finally, I will demonstrate the IPython notebook which I think will become a game changer for sharing reproducible science. I will make my slides and my code publicly available after the talk.

I was a bit nervous because the talk was going to include me doing interactive coding and demonstrations (which is always dangerous), but I think that it ended up going rather smoothly. As promised, I am making my slides and random examples available. It also gave me the opportunity to talk about where I hope to see science heading in the future: reproducible code, shared and open data, and notebooks. The ability to reproduce the exact plots in a paper is now easily within reach, and we should strive to have this be the standard going forward.

It was also the last talk that I will give as an astrophysicist because I’m starting the Insight Data Science Fellowship program in September! I’m very excited about that, and I also wanted to talk with the audience about so-called ‘Plan B’ career trajectories out of academia.

First my (interactive) slides

My first set of slides — I tried to export into a reveal.js slideshow, but I failed, so it’s one long (downloadable) IPython notebook.

Second set of slides which includes the Bayesian Blocks example from the AstroML: Machine Learning and Data Mining for Astronomy library.

Example of the future of science

A possible flow of events

You can now email that link to anyone in the world who has a browser. No python, no IPython, nothing needs to be installed. The barrier to sharing the analysis here is about as close to zero as we can get.


I repeatedly tried to make the case to think about your workflow — the more often you do an action, the more you should think about optimizing it.

  • A couple of useful .bashrc commands to make life easier: bashrc
    • This includes the save function, which lets you simply cd example to return to the saved directory (stored for future use as well).
  • Sublime Text — A text editor worth getting to know (Available OS X/Linux/Windows)
    • How to get LaTeX installed — excellent blog post.
    • And this blog post as well.
    • Finally, but most importantly, this series of screencasts of how to effectively use Sublime Text. Worth watching all the way through once, using Sublime Text for ~ 1 month, then rewatching.
  • Divvy: a keyboard shortcut calls up a window that lets you resize windows to custom sizes (available for OS X/Windows).
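
For readers who cannot open the linked bashrc, here is a minimal sketch of how a save-style bookmark helper could work. It is an assumption about the general approach, not the contents of the linked file: the DIRS_FILE variable, its default location, and the function body are all hypothetical. It relies on bash’s cdable_vars option, which lets cd fall back to a variable’s value when no directory of that name exists.

```shell
# Hypothetical sketch of a directory-bookmark helper for ~/.bashrc (bash only).
# DIRS_FILE and its default location are assumptions, not the original file.
DIRS_FILE="${DIRS_FILE:-$HOME/.dirs}"
touch "$DIRS_FILE"

save() {
    # Usage: save NAME  (remember the current directory under NAME)
    local name="$1"
    if [[ ! "$name" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
        echo "usage: save NAME (letters, digits, underscores only)" >&2
        return 1
    fi
    # Drop any older bookmark with the same name, then append the new one.
    grep -v "^${name}=" "$DIRS_FILE" > "$DIRS_FILE.tmp" || true
    mv "$DIRS_FILE.tmp" "$DIRS_FILE"
    printf '%s=%q\n' "$name" "$PWD" >> "$DIRS_FILE"
    source "$DIRS_FILE"   # make the new bookmark usable right away
}

source "$DIRS_FILE"       # load bookmarks saved in earlier sessions
shopt -s cdable_vars      # let `cd NAME` fall back to the value of $NAME
```

With this in place, running save example inside a project directory and later typing cd example from anywhere should return you there, unless the current directory happens to contain a folder that is also called example. Because bookmarks become ordinary shell variables, it is wise to avoid names such as PATH or HOME.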


I currently recommend getting Python, IPython, and the IPython notebook through the Anaconda installation method.

A list of a few python tutorials:

IPython notebook links

  • Notebook Viewer
  • The IPython notebook is moving to Project Jupyter in the near future. Don’t worry, this is the same old IPython notebook, but it’s rebranding because it now supports R, Julia and other languages.


Besides the few lines I recommend adding to your .bashrc above, these are a couple of handy snippets that can be used in bash:

# iterate over numbers from 1 to 12
for index in {1..12}; do
    echo "example.$index.name"
done

# iterate over all command line arguments
# Save this in a file called demo.bash then run the command
# bash demo.bash hi this works
# to see what happens.
for name in "$@"; do
    echo "$name"
done

Let me know if you saw the talk and what you thought of it! Or if I forgot to put a link to something that I mentioned.


ipython notebook tips and tricks for science research

David Lagattuta and I gave a seminar at CAS about using python, ipython and the ipython notebook. At the end of it we made 3 of our ipython notebooks available to all.

These notebooks have embedded in them examples of images, text, LaTeX, and even embedded YouTube screencasts that explain aspects of the notebook within the notebook itself.

  1. ipython notebook: Future of Science (http://nbviewer.ipython.org/5742826)
  2. Clean Code; Clear Code (http://nbviewer.ipython.org/5742829)
  3. The Future of Science — EXTRAS (http://nbviewer.ipython.org/5742830)

Here’s a screencast that we embedded which shows some neat ipython notebook features:

As always, I’m looking to improve my coding (and presentation) skills, so constructive criticism is requested.



Hidden Universe IMAX 3D

I’ve been involved (http://hiddenuniversemovie.com/the-film/the-astronomers/) with the production of a 3D IMAX film called Hidden Universe.

The locations it’ll be showing (so far) are here: http://hiddenuniversemovie.com/theatre-locations/

Some photos that I took of the filming in Chile (back in November 2012).

2012-11 IMAX

