Here is an excellent talk by Michael Manapat at the PyData Seattle 2015 conference. I hope that this style of talk — really digging deep with specific examples — becomes more common!

Michael Manapat: Counterfactual evaluation of machine learning models

The slides can be found here, and the paper that it’s partially based on is here.


Jupyter Notebook Best Practices for Data Science

I gave a talk on Friday (July 24) at the 2015 OSCON in Portland, OR. My talk covered the IPython (Jupyter) Notebook for Data Science, and it highlighted a number of challenges that come from needing to organize a data science workflow — especially in the context of working on a team of data scientists.

The video of my talk (not available just yet) will be embedded below:

I had a great time and I hope people find it useful. The GitHub repository for my talk is also available.


2014 in Review

Berlin Thanks to everyone who helped make this past year great – I’ve been incredibly fortunate to have people who have helped support me in all of my adventures. Below are a few highlights from 2014!

Some of my major life events:

  • Finished my three-year postdoc under Michael Murphy at Swinburne University of Technology.
  • Published a paper with Michael that was the culmination of years of work. The PDF is here if you’re curious.
  • Moved from Australia to the San Francisco Bay Area.
  • Brought Julija home to meet the parents over Thanksgiving.
  • Completed the Insight Data Science program.
  • Started at SVDS as a Data Scientist!

Places that I visited this year (and spent at least two nights)

  • Dallas, TX, USA (family/friends)
  • Washington, D.C., USA (AAS)
  • Phoenix, AZ, USA (visit Stephanie/Kelsey)
  • San Diego, CA, USA (talk at UCSD)
  • Melbourne, AUS (postdoc life, Marc visited!)
  • Hobart, AUS (Dave!)
  • Paris, FRA (week in Paris)
  • Amsterdam, NLD (week in Amsterdam – Julija!)
  • Zurich, CHE (Kern!)
  • Glasgow, GBR (IMAX Glasgow)
  • Cambridge, GBR (talk at Cambridge)
  • Berlin, DEU (photos)
  • Potsdam, DEU (visit at Potsdam University)
  • Sydney, AUS (Harley Wood Winter School)
  • Palo Alto, CA, USA (Insight Data Science)
  • Dallas, TX, USA (family/friends)
  • Mountain View, CA, USA (started work at SVDS)
  • Dallas, TX, USA (family/friends)

Fun final list:

Seasons experienced this year (in order)

  • Winter
  • Summer
  • Autumn
  • Spring
  • Summer
  • Winter
  • Summer
  • Autumn
  • Winter


One chapter closes; a new chapter opens

As of this week, I am officially no longer an astrophysicist. I start my next career as a data scientist in about a month. My postdoc was a fantastic experience for me, both personally and professionally. Michael Murphy was an incredibly patient and encouraging boss from whom I learned more than I could have hoped.

Coming into this job I hoped to learn a ton, and I have, but many opportunities and experiences were completely unexpected: from observing at observatories like ESO’s VLT in Chile and Keck in Hawaii to appearing in an IMAX film. Finally, the amazing amount of travel that my position granted me was life-altering.

This is not my big farewell post, as I still have a week and a bit in Australia, but I had trouble sleeping last night so I decided to make a fun D3 map of all of the flights that I’ve taken during my time as a postdoc — starting with the flights from DFW (Dallas-Fort Worth) to LAX (Los Angeles) to MEL (Melbourne) in August 2011!

The code that I used to make this is available at this link.

Ok future, let’s see where we go from here.


2014 Harley Wood Winter School Invited Talk

Reproducible Open Notebook Science

This past weekend I gave an invited talk at the Harley Wood Winter School in Collaroy, New South Wales, AUS. It was an excellent conference at a beautiful location, and definitely a treat to be asked to speak about scientific computing and the future of reproducible open science.

Here’s the abstract from my talk:

Full-stack science workflow += the IPython notebook

My talk will range from setting up a bashrc, to how you spend your time on a day-to-day basis, to the ultimate goal of clearly communicating reproducible scientific results. I’ll have many examples of common pitfalls to avoid and a few tactics that can get you a series of small wins in the battle that we call research. Finally, I will demonstrate the IPython notebook, which I think will become a game changer for sharing reproducible science. I will make my slides and my code publicly available after the talk.

I was a bit nervous because the talk was going to include me doing interactive coding and demonstrations — which is always dangerous — but I think that it ended up going rather smoothly. As promised, I am making my slides and assorted examples available. The talk also gave me the opportunity to discuss where I hope to see science heading in the future: reproducible code, and shared, open data and notebooks. The ability to reproduce the exact plots in a paper is now well within reach, and we should strive to make this the standard going forward.

It was also the last talk that I will give as an astrophysicist, because I’m starting the Insight Data Science Fellowship program in September! I’m very excited about that, and I also wanted to talk with the audience about so-called ‘Plan B’ career trajectories out of academia.

First my (interactive) slides

My first set of slides — I tried to export into a reveal.js slideshow, but I failed, so it’s one long (downloadable) IPython notebook.

Second set of slides which includes the Bayesian Blocks example from the AstroML: Machine Learning and Data Mining for Astronomy library.

Example of the future of science

A possible flow of events

You can now email that link to anyone in the world who has a browser. No Python, no IPython, nothing needs to be installed. The barrier to sharing the analysis here is about as close to zero as we can get.


I repeatedly tried to make the case to think about your workflow — the more often you do an action, the more you should think about optimizing it.

  • A couple of useful .bashrc commands to make life easier: bashrc
    • This includes the save function, which lets you simply cd example to return to the saved directory (stored for future use as well).
  • Sublime Text — A text editor worth getting to know (Available OS X/Linux/Windows)
    • How to get LaTeX installed — excellent blog post.
    • And this blog post as well.
    • Finally, but most importantly, this series of screencasts of how to effectively use Sublime Text. Worth watching all the way through once, using Sublime Text for ~ 1 month, then rewatching.
  • Divvy — a keyboard shortcut calls up a panel for resizing windows to custom sizes (Available for OS X/Windows).
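To make the save idea concrete, here is a minimal sketch of a directory-bookmarking pair in the spirit of the save function above. The real version lives in the linked bashrc; the bookmark file ~/.dir_bookmarks and the separate goto command are my own assumptions (the original lets you return with a plain cd example).

```shell
# Sketch only: the real `save` is in the linked bashrc.
save () {
    # Remember the current directory under the given name
    echo "$1 $PWD" >> ~/.dir_bookmarks
}
goto () {
    # Jump to the most recently saved directory with that name
    cd "$(grep "^$1 " ~/.dir_bookmarks | tail -n 1 | cut -d' ' -f2-)"
}
```

With this in your .bashrc, running save example in a directory lets you later run goto example from anywhere to jump back.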


I currently recommend getting Python, IPython, and the IPython notebook through the Anaconda installation method.

A list of a few Python tutorials:

IPython notebook links

  • Notebook Viewer
  • The IPython notebook is moving to Project Jupyter in the near future. Don’t worry: this is the same old IPython notebook, but it’s being rebranded because it now supports R, Julia, and other languages.


Besides the few lines I recommend adding to your .bashrc above, these are a couple of handy snippets that can be used in bash:

# iterate over numbers from 1 to 12
for index in {1..12}; do
    echo example.$index
done

# iterate over all command line arguments
# Save this in a file called demo.bash then run the command
# bash demo.bash hi this works
# to see what happens.
for name in "$@"; do
    echo "$name"
done

Let me know if you saw the talk and what you thought of it! Or if I forgot to put a link to something that I mentioned.


IPython notebook tips and tricks for science research

David Lagattuta and I gave a seminar at CAS about using Python, IPython, and the IPython notebook. At the end of it we made three of our IPython notebooks available to all.

These notebooks have embedded in them examples of images, text, LaTeX, and even embedded YouTube screencasts that explain aspects of the notebook within the notebook itself.

  1. ipython notebook: Future of Science
  2. Clean Code; Clear Code
  3. The Future of Science — EXTRAS

Here’s a screencast that we embedded which shows some neat ipython notebook features:

As always, I’m looking to improve my coding (and presentation) skills, so constructive criticism is requested.



Hidden Universe IMAX 3D

I’ve been involved with the production of a 3D IMAX film called Hidden Universe.

The locations it’ll be showing (so far) are here:

Some photos that I took of the filming in Chile (back in November 2012).

2012-11 IMAX




Eating differently

I’ve been inspired to write up a bit about what I have been eating. The long and the short of it is that I’ve basically been following the “Paleo”/“Primal” diet (since July 2012) for health reasons (my goal was not weight loss).

It breaks down into two lists: the “allowed/encouraged” and “not allowed”. Here’s how I play it:


Allowed/encouraged:

  • fish
  • meat
  • fowl
  • eggs
  • bacon
  • avocados
  • greens (love me some spinach!)
  • fruits
  • vegetables (bell peppers, zucchini, onions, etc.)
  • nuts (almonds, walnuts, pine nuts)
  • oil (olive oil, avocado oils, coconut oil)
  • vinegars (balsamic, etc.)
  • tea


Not allowed:

  • no grains (flour, wheat, cereals, etc.)
  • no potatoes
  • no legumes (peanuts, soy)
  • no beans
  • no corn
  • no refined sugar
  • semi-restricted dairy (butter and cheese are fine for me)

A small rant about the name. Paleo is short for paleolithic, as in the time period; it’s also referred to as the “caveman” diet, among other names. I dislike the name because I find it misleading and unhelpful. The reason I do or don’t eat the things above is that I’ve found it healthy to eat (or avoid) those foods. That’s it. Not because “we evolved to eat this way” or any other such claim. I do suspect there’s something to the idea that grains (not just gluten) can cause reactions in some people, like an inflammation response.

I don’t want to go into the details, but I just want to say that I’ve found that following these rules has been beneficial to me. As a scientist, I insist that my anecdote isn’t conclusive evidence, but it might lead to some interesting future experiments.

Also note that I decided to follow this diet after watching the following TED talk and reading a number of resources (which didn’t all agree). I decided that at the worst the diet was healthy (and likely healthier than I was eating at the time). At best, it could improve my health.

Free resources


TEDxIowaCity Dr. Terry Wahls

Your mileage may vary/buyer beware/consult doctor before changing stuff.



Data Reduction of VLT-UVES

The end goal is to run UVES_headsort to correctly set up the scripts that reduce the science exposures. Let’s look at reducing the VLT-UVES science exposures for a single object: vesta.

Start with a bunch of raw UVES data files in a folder (let’s call it raw) that are named things like:

In this mess of files, there exist all manner of calibration and science exposures for the spectrograph in different settings. It’s a mess, so the first order of business is to organize this into something that can be used. It isn’t pretty, but here’s how I do this first step:

# I'm going to make this into a proper script one day, but here it is as it currently stands
# Replace OBJECTNAME with the object name in the exposures
# List all of the raw FITS files (full paths are required)
ls /raw/UVES.2010*.fits > raw.list.all.1
# Dump the relevant header keywords with dfits/fitsort, grep away frame
# types we don't want, then keep the science exposures (with sensible
# exposure times) and their calibrations, writing out just the filenames
dfits `cat raw.list.all.1` | fitsort DPR.TYPE DPR.CATG EXPTIME OBJECT INS.MODE DPR.TECH | \
grep -v "SLIT" | grep -v "CDALIGN" | grep -v "SimCal" | grep -v "DARK" | grep -v "DFLAT" | grep -v "TEST" | grep -v "FIBRE" | \
awk 'NR > 1 {if ($3!="SCIENCE" || ($3=="SCIENCE" && $4>0.1)) print $0}' | \
awk '{if ($4>=300.0 || $3!="SCIENCE" || ($4<300.0 && $3=="SCIENCE" && ($5!="OBJECTNAME")) || (($3=="CALIB" || $3=="TEST") && ($6!="ECHELLE,ABSORPTION-CELL" || $7!="ECHELLE,ABSORPTION-CELL"))) print $0}' | \
awk '{print $1}' > raw.list.1
# Keep a record of everything that didn't make the cut
grep -f raw.list.1 -v raw.list.all.1 > raw.list.exclude.1

What should happen at this point is that a file, raw.list.1, is made which has all of the science exposures and their associated calibration files that need to be reduced. An important point here: you have to use the full path names.

(I’ve never used the script at this stage, so you can try it and let me know how it goes, or just use the following command.)

$ UVES_headsort raw.list.1 -info -list -c 45 45 -d

If the object name is correctly filled out in the science exposures (a genuine “if”), you’ll have a file named vesta.list which looks like:

$ head vesta.list 

I’ve written a really simple python script that helps set up the reduction scripts — it’s not well-commented but it’s also not too complicated.

# 2011-11-16
import shlex
import subprocess
import argparse

parser = argparse.ArgumentParser(description="filename")
parser.add_argument('infile', action='store', type=str, help='grabs commandline filename')
args = parser.parse_args()
stem = args.infile
if args.infile.split('.')[-1] == 'list':
  stem = args.infile.split('.')[0]
basecommand = 'UVES_headsort ' + stem + '.list -info ' + stem + '.info -c '
hours = 0
warning = 10
maxhours = 200
# Widen the time window one hour at a time until the warnings go away
while warning > 5 and hours < maxhours:
  hours += 1
  linein = basecommand + str(hours) + ' ' + str(hours) + ' -d'
  proc = subprocess.Popen(shlex.split(linein), stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
  warning = len(proc.communicate()[0].split('\n'))
  print "Warnings: ", warning, "Hours: ", hours
if hours >= maxhours:
  print "Hit maxhours without clearing the warnings -- check your calibration files."
print "Correct hours: ", hours
print "Final command: ", basecommand + str(hours) + ' ' + str(hours)
print "Making command file."
commandfile = open(stem + '.command', 'w')
print >>commandfile, basecommand + str(hours) + ' ' + str(hours)
commandfile.close()

The basic usage, assuming you saved the script above as uves_setup.py (the name is up to you):

$ python uves_setup.py vesta.list

It will run UVES_headsort in debug mode with an increasing range of hours around the exposures until it includes the files it needs; it will crap out if you don’t have the right (or enough) calibration files. When it runs properly, this script produces two files at the end: vesta.info and vesta.command. Run

$ source vesta.command

and you should have a directory vesta/ which, if you cd into it, should contain reduction scripts named things like:

$ cd vesta; ls reduce*

If you’re feeling lucky, you can run:

$ source reduce_master.cpl

and it will go through all of the reduction scripts one at a time and do all of the magic.
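Conceptually, running the master script amounts to sourcing each per-setting reduction script in turn. Here is a sketch of that loop (this is my stand-in to illustrate the flow, not the actual contents of the generated reduce_master.cpl):

```shell
# Sketch of what running all the reduction scripts in sequence looks like
run_all_reductions () {
    for script in reduce_*.cpl; do
        echo "Running $script"
        # source "$script"   # the real reduction step; stubbed out in this sketch
    done
}
```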


There’s still plenty of time to be a genius!

Stern–Gerlach experiment

Image via Wikipedia

An interesting article showed up in many places recently with some neat information in it. It also didn’t really explore one of the things that struck me as most interesting.

The article is MSN’s “The stroke of genius strikes later in life.” It talks about the idea that scientists do their best work before the age of 30.

It turns out that this definitely USED to be generally true, but it has increasingly not been the case. For physics at least, 48 is now the average age at which the most important work is done.


One thing that struck me was this sentence:
“In fact, in 1923, the proportion of physicists who did their breakthrough work by age 30 peaked at 31 percent. Those who did their best work by age 40 peaked in 1934 at 78 percent.”

Let’s keep those numbers in mind for a minute. Consider for a second that it was a certain generation of physicists who brought about the quantum mechanics revolution, and not a particular age. Let’s consider the generation born in 1893–94.

If this generation was in the proper place due to an accident of birth, they might have the lion’s share of substantial breakthroughs over a twenty year span or so. This group of physicists would have turned 30 in 1923-24. This same group of physicists would have turned 40 in 1933-34.

So, it’s the same generation in both cases, the only thing that differed was when their “best work” was considered to have occurred. In fact, since the percentages peaked in both decades, it appears that that generation is the one that dominated physics for twenty years.

