One chapter closes; a new chapter opens

by Jonathan Whitmore on 2014-08-01

As of this week, I am officially no longer an astrophysicist. I start my next career as a data scientist in about a month. My postdoc has been a fantastic experience for me, both personally and professionally. Michael Murphy was an incredibly patient and encouraging boss from whom I learned more than I had hoped.

Coming into this job I hoped to learn a ton, and I have, but many opportunities and experiences were completely unexpected, from observing at observatories like ESO’s VLT in Chile and Keck in Hawaii to being in an IMAX film. Finally, the amazing amount of travel that my position granted me was life-altering.

This is not my big farewell post, as I still have a week and a bit in Australia, but I had trouble sleeping last night so I decided to make a fun D3 map of all of the flights that I’ve taken during my time as a postdoc, starting with the flights from DFW (Dallas-Fort Worth) to LAX (Los Angeles) to MEL (Melbourne) in August 2011!

The code that I used to make this is available at this link.

Ok future, let’s see where we go from here.


2014 Harley Wood Winter School Invited Talk

by Jonathan Whitmore on 2014-07-26

Reproducible Open Notebook Science

This past weekend I gave an invited talk at the Harley Wood Winter School in Collaroy, New South Wales, AUS. It was an excellent conference at a beautiful location, and definitely a treat to be asked to speak about scientific computing and the future of reproducible open science.

Here’s the abstract from my talk:

Full-stack science workflow += the IPython notebook

My talk will range from setting up a bashrc, to how you spend your time on a day to day basis, to the ultimate goal of clearly communicating reproducible scientific results. I’ll have many examples of common pitfalls to avoid and a few tactics that can get you a series of small wins in the battle that we call research. Finally, I will demonstrate the IPython notebook which I think will become a game changer for sharing reproducible science. I will make my slides and my code publicly available after the talk.

I was a bit nervous because the talk was going to include me doing interactive coding and demonstrations (which is always dangerous), but I think that it ended up going rather smoothly. As promised, I am making my slides and random examples available. It also gave me the opportunity to talk about where I hope to see science heading in the future: reproducible code, shared and open data, and notebooks. The ability to reproduce the exact plots in a paper is now upon us, and we should strive to have this be the standard going forward.

It was also the last talk that I will give as an astrophysicist because I’m starting the Insight Data Science Fellowship program in September! I’m very excited about that, and I also wanted to share with the audience a bit about so-called ‘Plan B’ career trajectories out of academia.

First my (interactive) slides

My first set of slides — I tried to export into a reveal.js slideshow, but I failed, so it’s one long (downloadable) IPython notebook.

Second set of slides which includes the Bayesian Blocks example from the AstroML: Machine Learning and Data Mining for Astronomy library.

Example of the future of science

A possible flow of events

Read an interesting arXiv paper.
Discuss it with your advisor.
Download the IPython notebook code (from GitHub) + data.
Play with it in your own IPython notebook.
Paste your final notebook (text file) into a github gist
Paste link into the IPython notebook viewer (remember to remove the .git at the end of the URL).
Now you have an nbviewer URL.

You can now email that link to anyone in the world who has a browser. No python, no IPython, nothing needs to be installed. The barrier to sharing the analysis here is about as close to zero as we can get.
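The last two steps are mechanical enough to script. Here is a minimal Python sketch, assuming the nbviewer link has the same form as the nbviewer links elsewhere on this blog (http://nbviewer.ipython.org/ followed by the gist ID). The helper name gist_to_nbviewer is made up for illustration, and the URL scheme may well have changed since this was written.

```python
from urllib.parse import urlparse

# Assumed base URL, matching the nbviewer links elsewhere on this blog.
NBVIEWER_BASE = "http://nbviewer.ipython.org"


def gist_to_nbviewer(gist_url):
    """Build an nbviewer link from a gist URL (hypothetical helper)."""
    path = urlparse(gist_url).path.rstrip("/")
    # Remove the .git at the end of the clone URL, as noted in the steps above.
    if path.endswith(".git"):
        path = path[:-len(".git")]
    # The gist ID is the last path component, with or without a user name.
    gist_id = path.split("/")[-1]
    if not gist_id:
        raise ValueError("no gist ID found in " + repr(gist_url))
    return NBVIEWER_BASE + "/" + gist_id


print(gist_to_nbviewer("https://gist.github.com/5742826.git"))
```

The same function accepts a gist URL that includes a user name, since only the last path component is used.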

Workflow

I repeatedly tried to make the case for thinking about your workflow: the more often you do an action, the more you should think about optimizing it.

A couple of useful .bashrc commands to make life easier: bashrc
This includes the save function which allows you to simply cd example and return to the saved directory (stored for future use as well).
Sublime Text — A text editor worth getting to know (Available OS X/Linux/Windows)
- How to get LaTeX installed — excellent blog post.
- And this blog post as well.
- Finally, but most importantly, this series of screencasts of how to effectively use Sublime Text. Worth watching all the way through once, using Sublime Text for ~ 1 month, then rewatching.
Divvy: a keyboard shortcut calls up a window that lets you resize windows to custom sizes (Available for OS X/Windows).

Python

I currently recommend getting Python, IPython, and the IPython notebook through the Anaconda installation method.

A list of a few python tutorials:

IPython notebook links

Notebook Viewer
The IPython notebook is moving to Project Jupyter in the near future. Don’t worry, this is the same old IPython notebook thing, but it’s being rebranded because it now supports R, Julia and other languages.

Bash

Besides the few lines I recommend adding to your .bashrc above, these are a couple of handy snippets that can be used in bash:

# iterate over numbers from 1 to 12
for index in {1..12}
do
    echo example.$index.name
done


# iterate over all command line arguments
# Save this in a file called demo.bash then run the command
# bash demo.bash hi this works
# to see what happens.
for name in "$@"
do
    echo "$name"
done

Recommended actions

Get a github account (it’s free)
Do the excellent short git tutorial to start to get a feel for git.
Github Gists is where you are going to copy and paste the text from your IPython notebook that you’d like to share. Remember gists are updatable, and the nbviewer link will show the most recent version (after a few minute delay).
Get a stackoverflow account (it’s free).
Learn about Data-Driven Documents (D3)
Created by: Mike Bostock (with tons of amazing examples).
My example: My Life Trajectory (so far) based off of this example

Let me know if you saw the talk and what you thought of it! Or if I forgot to put a link to something that I mentioned.


ipython notebook tips and tricks for science research

by Jonathan Whitmore on 2013-10-08

David Lagattuta and I gave a seminar at CAS about using python, ipython and the ipython notebook. At the end of it we made 3 of our ipython notebooks available to all.

These notebooks have embedded in them examples of images, text, LaTeX, and even embedded YouTube screencasts that explain aspects of the notebook within the notebook itself.

ipython notebook: Future of Science: http://nbviewer.ipython.org/5742826
Clean Code; Clear Code: http://nbviewer.ipython.org/5742829
The Future of Science — EXTRAS: http://nbviewer.ipython.org/5742830

Here’s a screencast that we embedded which shows some neat ipython notebook features:

As always, I’m looking to improve my coding (and presentation) skills, so constructive criticism is requested.

Thanks!


Hidden Universe IMAX 3D

by Jonathan Whitmore on 2013-06-16

I’ve been involved (http://hiddenuniversemovie.com/the-film/the-astronomers/) with the production of a 3D IMAX film called Hidden Universe.

The locations where it’ll be showing (so far) are here: http://hiddenuniversemovie.com/theatre-locations/

Here are some photos that I took of the filming in Chile (back in November 2012).

2012-11 IMAX


Eating differently

by Jonathan Whitmore on 2012-12-12

I’ve been inspired to write up a bit about what I have been eating. The long and the short of it is that I’ve basically been following the “Paleo”/“Primal” diet since July 2012 for health reasons (my goal was not weight loss).

It breaks down into two lists: the “allowed/encouraged” and “not allowed”. Here’s how I play it:

ALLOWED

fish
meat
fowl
eggs
bacon
avocados
greens (love me some spinach!)
fruits
vegetables (bell peppers, zucchini, onions, etc.)
nuts (almonds, walnuts, pine nuts)
oil (olive oil, avocado oils, coconut oil)
vinegars (balsamic, etc.)
tea

NOT ALLOWED

no grains (flour, wheat, cereals, etc.)
no potatoes
no legumes (peanuts, soy)
no beans
no corn
no refined sugar
semi-restricted dairy (butter and cheese are fine for me)

A small rant about the name. Paleo is short for paleolithic, as in the time period. It is also referred to as the “caveman” diet and probably a few other names. I dislike this name because I find it misleading and unhelpful. The reason I do or do not eat the above things is that I’ve found it to be healthy to eat (or avoid) the above lists of food. That’s it. It is not because “we evolved to eat this way” or other such claims. I do suspect that there’s something to the idea that grains (not just gluten) can cause reactions in some people like an inflammation response.

I don’t want to go into the details, but I just want to say that I’ve found that following these rules has been beneficial to me. As a scientist, I insist that my anecdote isn’t conclusive evidence, but it might lead to some interesting future experiments.

Also note that I decided to follow this diet after watching the following TED talk and reading a number of resources (which didn’t all agree). I decided that at the worst the diet was healthy (and likely healthier than I was eating at the time). At best, it could improve my health.

Free resources

Books

TEDxIowaCity Dr. Terry Wahls

Your mileage may vary/buyer beware/consult doctor before changing stuff.


Data Reduction of VLT-UVES

by Jonathan Whitmore on 2012-03-30

The end goal will be to run UVES_headsort to correctly set up the reduction scripts that reduce the science exposures. Let’s look at reducing the VLT-UVES science exposures for a single object: vesta.

Start with a bunch of raw UVES data files in a folder (let’s call it raw) that are named things like:
UVES.2011-11-01T23:57:47.608.fits.

In this mess of files, there exist all manner of calibration and science exposures for the spectrograph in different settings. It’s a mess, so the first order of business is to organize this into something that can be used. It isn’t pretty, but here’s how I do this first step:

# I'm going to make this into a proper script one day, but here it is as it currently stands
# Replace OBJECTNAME with the object name in the exposures
# Adjust the UVES.2010* glob to match the dates of your own raw files
ls /raw/UVES.2010*.fits > raw.list.all.1
dfits `cat raw.list.all.1` | fitsort DPR.TYPE DPR.CATG EXPTIME OBJECT INS.MODE DPR.TECH | \
grep -v "SLIT" | grep -v "CDALIGN" | grep -v "SimCal" | grep -v "DARK" | grep -v "DFLAT" | grep -v "TEST" | grep -v "FIBRE" | \
awk 'NR > 1 {if ($3!="SCIENCE" || ($3=="SCIENCE" && $4>0.1)) print $0}' | \
awk '{if ($4>=300.0 || $3!="SCIENCE" || ($4<300.0 && $3=="SCIENCE" && ($5!="OBJECTNAME")) || (($3=="CALIB" || $3=="TEST") && ($6!="ECHELLE,ABSORPTION-CELL" || $7!="ECHELLE,ABSORPTION-CELL"))) print $0}' | \
awk '{print $1}' > raw.list.1
grep -f raw.list.1 -v raw.list.all.1 > raw.list.exclude.1

What should happen at this point is that a file named raw.list.1 is made, which lists all of the science exposures and the associated files that need to be reduced. An important point here: you have to use the full path names.
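The grep and awk filters above are dense, so here is a rough Python restatement of what they keep and drop. This is only a sketch of my reading of the one-liner, not a replacement for it: the function names are made up, the sample rows are invented for illustration, and it assumes that every row splits on whitespace into the same columns that awk sees ($1 file, $3 DPR.CATG, $4 EXPTIME, $5 OBJECT) with a numeric EXPTIME.

```python
# Words removed by the chain of `grep -v` calls.
EXCLUDED_WORDS = ("SLIT", "CDALIGN", "SimCal", "DARK", "DFLAT", "TEST", "FIBRE")


def keep_row(line, object_name):
    """Return True if one fitsort output row survives the filters."""
    # grep -v: drop any row that mentions an unwanted exposure type.
    if any(word in line for word in EXCLUDED_WORDS):
        return False
    fields = line.split()
    # Same columns as awk: $3 DPR.CATG, $4 EXPTIME, $5 OBJECT.
    category, exptime, target = fields[2], float(fields[3]), fields[4]
    if category != "SCIENCE":
        return True  # non-science rows pass both awk filters
    if exptime <= 0.1:
        return False  # first awk: drop near-zero science exposures
    # Second awk: drop science exposures of the named object shorter than 300.
    return not (exptime < 300.0 and target == object_name)


def filter_rows(lines, object_name):
    """Skip the header row (awk's NR > 1) and return the kept file names."""
    return [line.split()[0] for line in lines[1:] if keep_row(line, object_name)]


# Invented rows in the column order requested from fitsort above.
rows = [
    "FILE DPR.TYPE DPR.CATG EXPTIME OBJECT INS.MODE DPR.TECH",
    "/raw/a.fits OBJECT SCIENCE 600.0 vesta MODE ECHELLE",
    "/raw/b.fits OBJECT SCIENCE 0.05 vesta MODE ECHELLE",
    "/raw/c.fits LAMP,FLAT CALIB 12.0 LAMP MODE ECHELLE",
]
print(filter_rows(rows, "vesta"))
```

Written this way, it is easier to see that the last clause of the second awk command only concerns non-science rows, which are kept anyway.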

I’ve never used the helpreduce.py script (described below) at this stage, so you can try it and let me know how it goes, or just use the following command.

$ UVES_headsort raw.list.1 -info raw.info -list -c 45 45 -d

If the object name is correctly filled out in the science exposures (a genuine “if”), you’ll have a file named vesta.list which looks like:

$ head vesta.list 
/raw/UVES.2011-11-01T23:57:47.608.fits
/raw/UVES.2011-11-02T00:01:56.739.fits
/raw/UVES.2011-11-02T11:13:03.809.fits
/raw/UVES.2011-11-02T11:11:10.079.fits
/raw/UVES.2011-11-02T11:14:42.258.fits

I’ve written a really simple python script that helps set up the reduction scripts — it’s not well-commented but it’s also not too complicated.

#!/usr/bin/env python
# 2011-11-16 helpreduce.py
# Widen the -c window one hour at a time until UVES_headsort (in debug mode)
# prints only a few lines, then write the final command to <stem>.command.

import argparse
import os
import shlex
import subprocess
import sys

parser = argparse.ArgumentParser(description="filename")
parser.add_argument('infile', action='store', type=str, help='grabs commandline filename')

args = parser.parse_args()
stem = args.infile

# Strip a trailing .list so the stem can be reused for the other file names
if args.infile.endswith('.list'):
  stem = os.path.splitext(args.infile)[0]

basecommand = 'UVES_headsort ' + stem + '.list -info ' + stem + '.info -c '

hours = 0
warning = 10  # number of output lines from the latest debug run
maxhours = 200
while warning > 5 and hours < maxhours:
  hours += 1
  linein = basecommand + str(hours) + ' ' + str(hours) + ' -d'
  # universal_newlines=True returns the output as text rather than bytes
  proc = subprocess.Popen(shlex.split(linein), stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
  warning = len(proc.communicate()[0].split('\n'))
  print("Warnings: ", warning, "Hours: ", hours)

# Give up if even the widest window still produced too much output
if warning > 5:
  sys.exit("Still too many warnings after " + str(maxhours) + " hours")

print("Correct hours: ", hours)
print("Final command: ", basecommand + str(hours) + ' ' + str(hours))
print("Making command file.")
with open(stem + '.command', 'w') as commandfile:
  commandfile.write(basecommand + str(hours) + ' ' + str(hours) + '\n')

The basic usage:

$ helpreduce.py vesta.list

It will run UVES_headsort in debug mode with an increasing range of hours around the exposures until it includes the files it needs — it will crap out if you don’t have the right (or enough) calibration files. When it runs properly, this script at the end produces two files: vesta.info and vesta.command. Run

$ source vesta.command

and you should have a directory vesta/ which, if you cd into it, should contain reduction scripts named things like:

$ cd vesta; ls reduce*
 
reduce_564_03.cpl
reduce_564_03.sof
reduce_master.cpl

If you’re feeling lucky, you can run:

$ source reduce_master.cpl

and it will go through all of the reduction scripts one at a time and do all of the magic.


There’s still plenty of time to be a genius!

by Jonathan Whitmore on 2011-11-11

Image via Wikipedia

There was an interesting article that showed up in many places that had some neat information in it. It also didn’t really explore one of the things that struck me as interesting.

The article(s) is MSN: The stroke of genius strikes later in life today. It talks about the idea that scientists do their best work before the age of 30.

It turns out that it definitely USED to be generally true, but it has increasingly not been the case. It turns out that for physics at least, 48 is the average age at which physicists do their most genius work.

Cool.

One thing that struck me was this sentence:
“In fact, in 1923, the proportion of physicists who did their breakthrough work by age 30 peaked at 31 percent. Those who did their best work by age 40 peaked in 1934 at 78 percent.”

Let’s keep those numbers in mind for a minute. Consider for a second that it was a certain generation of physicists who brought about the Quantum Mechanics revolution, and not a particular age. Let’s consider the generation born in 1893-4.

If this generation was in the proper place due to an accident of birth, they might have the lion’s share of substantial breakthroughs over a twenty year span or so. This group of physicists would have turned 30 in 1923-24. This same group of physicists would have turned 40 in 1933-34.

So, it’s the same generation in both cases, the only thing that differed was when their “best work” was considered to have occurred. In fact, since the percentages peaked in both decades, it appears that that generation is the one that dominated physics for twenty years.
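The arithmetic behind this argument is simple enough to check in a couple of lines of Python. The birth years are the ones assumed above, and the peak years (1923 and 1934) are the ones quoted from the article; the function name is just for illustration.

```python
def year_at_age(birth_year, age):
    """Calendar year in which someone born in birth_year reaches a given age."""
    return birth_year + age


# Peak years quoted from the article: by age 30 in 1923, by age 40 in 1934.
for birth_year in (1893, 1894):
    print(birth_year, year_at_age(birth_year, 30), year_at_age(birth_year, 40))
```

Both peak years fall inside the ranges that this single generation would produce.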


Suggestions for Moving to Melbourne

by Jonathan Whitmore on 2011-09-27

Image via Wikipedia

Now that I’ve been here just over a month, I decided to write down a few of the suggestions that I had about relocating to Melbourne while it is still relatively fresh in my head.

Finances

Step 0. Take out Aussie $$ via ATMs from your US $$ accounts.

Step 1. Get an Aussie bank account. It’s dead simple, just do it. Walk into a bank and do it.

Step 2. Get a mobile (cell) phone. For some reason, it seemed to me that everyone in Australia will conduct interactions via phone calls and not via email almost on pain of death. Perhaps it was just the specific things that I was trying to complete, but it seemed that no one wanted to email. This led to a number of phone-tag back-and-forths, including one situation where a real estate agent and I called and missed each other 7 times over a span of about a week.

Housing

If you are looking to rent: I recommend the www.domain.com.au and www.realestate.com.au websites. The process for finding an apartment is roughly as follows. Before you are allowed to apply for an apartment you have to physically inspect it. The real estate agent posts a time on one of the websites I listed above at which everyone interested in the apartment needs to arrive. They will advertise a place and say: on 17/09 12:30-1PM (remember they write the date day/month/year). You need to show up at 12:30, maybe even earlier, because they don’t usually stay the whole time. Also, the walkthroughs were fairly quick in my experience. When you show up, they will write down your name and mobile phone number.

If you are looking for house shares try: melbourne.gumtree.com.au and swinburne.studystays.com.au.

Also, the addresses are listed like this: “343/57-59 Main Street” means: Apartment Number 343, Street Address 57 through 59. Don’t ask me, it just means you can show up at 57, or 59 and you’re at the same place either way.
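If you ever need to pull these listings apart programmatically, the format can be sketched in Python like this. The pattern only covers the single unit/number-range layout described above, and the function name and the returned dictionary keys are invented for illustration.

```python
import re

# unit (optional) / first street number - last street number (optional) street
ADDRESS_PATTERN = re.compile(
    r"^(?:(?P<unit>\w+)/)?(?P<first>\d+)(?:-(?P<last>\d+))?\s+(?P<street>.+)$"
)


def parse_address(address):
    """Split an address like '343/57-59 Main Street' into its parts."""
    match = ADDRESS_PATTERN.match(address.strip())
    if match is None:
        raise ValueError("unrecognised address: " + repr(address))
    first = int(match.group("first"))
    # A single street number is treated as a range of one.
    last = int(match.group("last") or first)
    return {
        "unit": match.group("unit"),
        "street_numbers": (first, last),
        "street": match.group("street"),
    }


print(parse_address("343/57-59 Main Street"))
```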

The rent is listed as $/week — but you normally pay per month. (And it can be 4 or 4 1/3 times the weekly amount listed, so clarify w/ renter).
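The 4 versus 4 1/3 difference comes from whether a "month" means four weeks or one twelfth of a 52-week year (52 / 12 = 4 1/3). A small Python sketch shows the gap; the weekly amount is made up for illustration and the function name is mine.

```python
WEEKS_PER_YEAR = 52
MONTHS_PER_YEAR = 12


def monthly_rent(weekly_rent, four_week_month=False):
    """Convert an advertised weekly rent into a monthly payment."""
    if four_week_month:
        return 4 * weekly_rent
    # Calendar month: 52 weeks spread over 12 months, i.e. 4 1/3 weeks each.
    return weekly_rent * WEEKS_PER_YEAR / MONTHS_PER_YEAR


weekly = 400  # made-up weekly rent in dollars
print(monthly_rent(weekly, four_week_month=True))
print(round(monthly_rent(weekly), 2))
```

Over a year the two conventions differ by four weeks of rent, which is why it is worth clarifying.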

Applying

I applied to a number of places. Here is what I did and some suggestions.

If I noticed that a certain real estate company was representing a number of apartments that I was interested in looking at, I filled out their application with the information that would be the same on each application: the references, my name, my contact info, etc. I then made photocopies of this “master copy”. This allowed me to quickly fill out their application for several places with the same real estate agency.

Rent application credentials.

1 page cv/resume; if you have a phd make sure you write: Dr. Jonathan Whitmore and in general during this application process, put Dr. wherever you can (or PhD student if that’s what you are, don’t lie). You are trying to be as impressive as can be.
Clear photocopies of as many forms of ID as you can find: passports, drivers licenses, staff ID cards, etc.
Include your letter of offer for employment
Include work visa (mine was the email printout from the Aussie government saying the terms).

Make a number of copies of your credentials so that you can easily fill out the application and attach your credential packet. I recommend filling out the applications for each of the apartments that you are going to look at before you go. This allows you to hand over the entire filled out application packet if you are interested in applying for the place.

Some final thoughts: weekends are busy and usually have many people looking at once. You have a better shot with the weekday inspections (fewer people seeing/applying means your odds are slightly better). Be friendly with the real estate agents. Finally, try to email/call and schedule a private showing during the week, as these worked out the best for me.

Good luck!


Left the US for Oz

by Jonathan Whitmore on 2011-08-06

Image by WilLiao via Flickr

Do you ever get the feeling that your presence holds everything together, and that if you leave it’ll all collapse? This feeling almost never reflects reality, and if you get it frequently and think it’s true, it’s probably a sign of a narcissistic personality disorder.

That being said, I left the US on Monday, and in the subsequent 5 days the entire country seems to have done nothing but get itself into trouble. I’m not saying that if I stayed in the US that the debt limit crisis would have played out differently, or that the stock markets wouldn’t have fallen almost 5% in a day, or that the S&P wouldn’t have downgraded the US government credit rating for the first time in history… but the timing is interesting, isn’t it?

I’m currently in Hawthorn, which is a suburb very close to Melbourne, Australia, to begin my postdoc position at Swinburne University of Technology. Leaving the 100+ degree weather of Texas for the mid 50’s and rainy (today at least) Melbourne winter has been quite a jolt, but I think I’m really going to like living here.

I’ll provide more updates in the near future as I find housing and get settled!


The Future of Education

by Jonathan Whitmore on 2011-03-11

I think that the future of education is an amazingly interesting idea to kick around. To begin with, I fully recommend watching the video of Salman Khan at TED where he talks about his Khan Academy (I’ve embedded the talk below). The Khan Academy is a nonprofit that he started after several tutorial videos that he made for his cousins on YouTube started garnering intense interest and a following. He currently has over 2,000 videos that he has uploaded that teach a whole range of topics: from mathematics to biology to history.

He has also hired people to help him develop software that helps to test students on what they are supposed to be learning. For example, to get through a module on simple addition, you have to answer ten questions right in a row, and you can ask for hints or jump back to see a video on the concept you are having difficulty with. Once you get that done, you can go on from there to another subject.

The data is already impressive: within a few years, about one million students are participating in the Khan Academy, and traditional teachers are using the lectures to supplement the classroom. In most cases, the videos do the lectures, and the teacher looks at the dashboard for the class and figures out who is having trouble with what concept, and where. Further, she can see which students have mastered the subject and can have them help teach the others.

I’ve written about my thoughts on the future of education before, and I didn’t have a clear concept of exactly what the teachers would be doing if there was an adaptive computer model that instructed students based on their history and current understanding. Khan argues that this humanizes the classroom experience; I am surprised but I think he is correct.

I think Khan is missing one piece (or at least, I haven’t heard that it has the following piece): I think he needs to open up the possibility for videos from people besides himself. Now, this doesn’t mean that he has to accept any, and hell, I think he’s earned the title of Benevolent Dictator for Life on what is displayed by the Khan Academy. But there are a number of subjects that can be pressed into and teachers who are willing to contribute in useful ways that simply cannot be implemented by Khan himself.

Let’s just stick with physics, since I know physics. Let’s say that I thought I could explain the difference between electrical potential and electrical potential energy in a different/better way than Khan, so I create my own YouTube video and submit it to the site. Does my 10 minute video module teach it better than Khan’s? I don’t know, perhaps yes, perhaps no. But the nice thing about having a million students is that random testing can be easily implemented to display one video or the other to different students and to see what results come from it. Perhaps there’s not one video explanation that’s better than the other; it may just be that people who are more auditory learners learn better from one video, while the other students learn better from the other. There doesn’t have to be only one “right” video.
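To make the random testing idea concrete, here is a minimal Python sketch of how such a comparison could work. Everything in it is hypothetical: I have no idea how the Khan Academy software is built, the function and video names are invented, and "passed" stands in for whatever follow-up quiz result the site records.

```python
import random
from collections import defaultdict


def assign_videos(student_ids, videos=("video_a", "video_b"), seed=0):
    """Randomly assign each student one of the competing videos."""
    rng = random.Random(seed)  # seeded so the assignment is repeatable
    return {student: rng.choice(videos) for student in student_ids}


def pass_rates(assignment, passed):
    """Fraction of students who passed the follow-up quiz, per video."""
    shown = defaultdict(int)
    succeeded = defaultdict(int)
    for student, video in assignment.items():
        shown[video] += 1
        if passed[student]:
            succeeded[video] += 1
    return {video: succeeded[video] / shown[video] for video in shown}
```

With a million students the two groups would be large, so even small differences in pass rate would show up. The same bookkeeping could be split by learner type to test the idea that different videos suit different students.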

If people were allowed to submit videos, they’d clearly have to pass a minimum quality level, but after that level is reached, it would be interesting to be able to test and eventually implement the best video instruction over time.

In fact, I’m sure they have statistics on the videos that would make it easy to determine which video segments taught their target concept best, and which video segments students had the most trouble with. The Khan Academy could have a “help us with your version of these 5 videos” list that gets put up for a week at a time, and as the new videos get cycled through, the bottom of the feeder gets filled with the “most need for improvement” videos. I think replacing from the bottom would be the simplest and most effective way for this kind of thing to be implemented.
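Picking the videos to put up for replacement is then a one-liner. A sketch in Python, assuming some per-video score already exists (here a pass rate between 0 and 1; the function name and the score itself are my inventions):

```python
import heapq


def videos_to_replace(pass_rate_by_video, count=5):
    """Return the `count` videos with the lowest pass rates, worst first."""
    return heapq.nsmallest(count, pass_rate_by_video, key=pass_rate_by_video.get)
```

Each week the list would be recomputed, so a video drops off it as soon as a better version lifts its score.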

Finally, it would be awesome to have courses like this extend through the graduate level in all subjects that can be taught in this way, and I simply don’t think there’s time for Khan to be able to learn and regurgitate all that there is to learn in physics/math/history/biology/everything else. I understand that his goal might be to fully develop K-12, but there’s no reason to limit the benefits of the video database infrastructure to those levels and subjects.
