Documented and repeatable "mini-experiments"
In this post I outline the workflow I use to conduct my "mini-experiments": code and notes that might otherwise be "throw-away". You can see an example of what this looks like here.
When learning, using and creating algorithms and computational tools, an important part of the process is playing around. This involves getting to know the tools, specific commands, trying out ideas, etc. Typically, for me, this might involve creating a bunch of Python files with half-meaningful names in a folder with some input data and result files. Unfortunately, this does not leave my findings in a very usable format. If, months later, I decide to revisit a similar idea or pass my findings to someone else, I have to trawl back through these files or redo the experiments.
What is needed is a way to organise these ideas together, embedding the test code and results within a document. An easy way to do this is to simply copy-paste any code and results into a text or Word document with some comments. Presented in this post is an alternative workflow, which I find far more satisfactory. In this workflow, thanks to Org Mode, the code is embedded within the document, along with the results.
At our workplace we have a DNS server, which means it is very easy to connect remotely to a computer without remembering its IP address. We also use a VPN. This means that if I use my computer to host my notes over HTTP, I can access them from anywhere in the world by simply connecting over the VPN and typing my computer's name into a browser.
Org Mode
Org Mode is an Emacs major mode. Emacs operates in "modes", which are context-dependent (i.e., they change depending on what type of file is open). Org Mode works with .org files, which are basically just documents.
Org Mode is designed for keeping notes, TODO lists and plans [1]. Hence the name: it is designed to be an environment for organisation. As such, it has a strong focus on being able to create and manipulate lists quickly, and to track dates using a calendar.
Whilst Org Mode is very effective as an organisational tool, this is not the primary reason I use it. To me, Org Mode is a simple markup language. Some would argue that Org Mode's markup syntax is not as "nice" as markdown's, and I would agree with them. However, the two syntaxes are indisputably similar in concept. What Org Mode has over competing markup languages like markdown and reStructuredText is the host of features that come with it. The most important to me are:
- the ability to export (accurately and consistently) to a wide range of formats, including PDF (for papers), HTML (read on), and markdown (for this blog); and
- the ability to embed and run code within the document using babel. In fact, you can even embed code in one language, and pass the evaluated results to another embedded code snippet in an entirely different language.
Org Mode for computational experimentation
Babel is specifically designed with literate programming and reproducible research in mind [2]. It has also been recommended in a number of other workflows for reproducible research [3][4]. This article outlines my workflow for my own personal notes, but uses ideas from work presented in these papers.
Nix
Nix is a package manager, much like apt-get/dpkg or Homebrew. The main features of Nix I find useful are its ability to:
- support multiple versions of an installed package; and
- provide a shell with specific packages installed and nothing extra, using nix-shell --pure.
Nix is able to achieve this functionality through its unique implementation: Nix is a purely functional package manager [5]. This means it is able to consistently depend on specific versions of software, and ensure reproducible environments. In actuality, Nix builds often download source from the internet, which may, of course, become unavailable. So rather than guaranteeing reproducible builds, it actually guarantees that if an environment builds, it is identical, and otherwise will not build. Nonetheless, it is a useful tool as a package manager providing multiple environments.
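To make the nix-shell --pure idea a little more concrete, here is a minimal Python sketch of how a script might assemble such a command. The helper name pure_shell_argv and the file name default.nix are assumptions made for illustration; the sketch only builds the command and does not require Nix to be installed.

```python
import shlex


def pure_shell_argv(command, nix_file="default.nix"):
    """Build the argument list that runs `command` inside a pure nix-shell.

    `nix_file` is an assumed file name for the Nix expression that
    describes the environment.
    """
    return ["nix-shell", "--pure", nix_file, "--command", command]


# Printable form of the command, e.g. for pasting into a terminal.
example = shlex.join(pure_shell_argv("python3 --version"))
print(example)
```

Passing the list to subprocess.run would start the shell, run the command with only the declared packages available, and exit.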
Setting up an experimental environment
I keep all my experiments in a folder in my home directory, ~/experiments. This directory looks like:

    $ tree ~/experiments
    |-- 2016-03-13-random-exploration
    |   `-- index.org
    `-- template.org
The first important file here is template.org. Using this file, I can quickly start a new experiment using:

    mkdir 2016-03-13-random-exploration
    cp template.org 2016-03-13-random-exploration/index.org
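The same two steps could be wrapped in a small helper that fills in today's date automatically. This is only a sketch: the function name new_experiment and its parameters are hypothetical, and the demonstration runs in a temporary directory rather than in ~/experiments.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def new_experiment(name, root, today=None):
    """Create <root>/<YYYY-MM-DD>-<name>/index.org as a copy of <root>/template.org."""
    today = today or date.today()
    folder = Path(root) / "{}-{}".format(today.isoformat(), name)
    folder.mkdir()  # raises FileExistsError rather than overwriting an experiment
    index = folder / "index.org"
    shutil.copy(str(Path(root) / "template.org"), str(index))
    return index


# Demonstration in a throwaway directory, so ~/experiments is not touched.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "template.org").write_text("#+TITLE:\n")
    created = new_experiment("random-exploration", tmp, today=date(2016, 3, 13))
    created_relative = created.relative_to(tmp).as_posix()
    created_text = created.read_text()
    print(created_relative)
```

In everyday use the root argument would be the experiments folder, and today would be left at its default.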
Let's have a look at the contents of template.org.
    #+TITLE:
    #+AUTHOR: Ashley Gillman
    #+EMAIL: ashley.gillman@csiro.au
    #+OPTIONS: ^:{}
    #+HTML_LINK_HOME: /
    #+HTML_LINK_UP: ..
    #+HTML_HEAD: <link rel="stylesheet" type="text/css" href="/style.css">
    #+HTML_HEAD: <link rel="stylesheet" type="text/css" href="https://cdn.rawgit.com/dreampulse/computer-modern-web-font/master/fonts.css">

    * Setup :noexport:
    #+BEGIN_SRC nix :tangle default.nix
      let
        pkgs = import /home/ash/repo/nixpkgs {};
      in
      { stdenv ? pkgs.stdenv, pythonPackages ? pkgs.python34Packages }:

      stdenv.mkDerivation {
        name = "python-nix";
        buildInputs = [ pythonPackages.python
                        pythonPackages.scipy
                        pythonPackages.numpy
                        pythonPackages.matplotlib ];
      }
    #+END_SRC

    * Directory listing
    #+BEGIN_SRC python :results output raw replace :exports results
      from pathlib import Path
      link_format = '- [[file:{0}][={0}=]]'.format
      print(*(link_format(p.name + ('/' if p.is_dir() else ''))
              for p in sorted(Path('.').iterdir())
              if not p.name.startswith(('.', '#'))),
            sep='\n')
    #+END_SRC

    * Aim

    * Methodology

    * Local Variables :noexport:
    Local Variables:
    org-export-babel-evaluate : nil
    org-confirm-babel-evaluate : nil
    org-html-link-org-files-as-html : nil
    org-html-postamble-format : '( \
      ("en" " <p class=\"author\" >Author: %a (%e)</p>\n \
      <p class=\"date\" >Date: %T</p>\n \
      <p class=\"creator\" >%c</p>\n \
      <p ><a href=\"/\">Home</a></p>"))
    org-babel-python-command : "\
      /home/ash/.nix-profile/bin/nix-shell \
      --pure \
      --command python3"
    eval: (require 'ox-bibtex)
    End:
The first block of code is some standard templating, setting myself as the author, and my email address. Options keywords can be found here. The next section, under the Setup heading, is interesting. The :noexport: tag means that this section will not appear in the exported document. However, it does contain a source block with a Nix expression, which is tangled to default.nix. By default, I have this set up to provide a basic Python 3 environment. Doing so ensures that we know exactly which libraries our experiments are using, and means that even years later we should be able to repeat our experiments.
The Directory Listing section simply contains a Python script that provides a link to each file and folder in the directory. This is just for convenience when later exploring the results. The Aim, Methodology and Results headings are empty, just providing placeholders for later. Finally, the Local Variables section sets up Emacs
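file-local variables. Here I set org-export-babel-evaluate to nil, which tells Org Mode not to re-evaluate code blocks on export, so the results already stored in the document are used (set it to t to have code blocks evaluated during export, which may be slow if the code takes a long time to run). I also disable confirmation messages before evaluation (be careful if you didn't write the code), keep links to .org files pointing at the .org files rather than rewriting them to .html, and set the HTML footer. Lastly, and importantly, I change the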
Python command to run via nix-shell --pure, which uses the
environment defined in the Setup section.
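To see what the Directory listing block produces, here is a standalone sketch of the same logic written as a function, so it can be tried outside Org Mode. The function name org_links and the example file names are assumptions for illustration; the link format and the filtering of names starting with "." or "#" are taken from the template.

```python
import tempfile
from pathlib import Path


def org_links(directory):
    """Return one Org list item per visible entry, linking to it by name."""
    lines = []
    for p in sorted(Path(directory).iterdir()):
        # Skip hidden entries and Emacs auto-save files such as #index.org#.
        if p.name.startswith((".", "#")):
            continue
        name = p.name + ("/" if p.is_dir() else "")
        lines.append("- [[file:{0}][={0}=]]".format(name))
    return lines


# Demonstration with a few made-up entries in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "index.org").touch()
    Path(tmp, "#index.org#").touch()
    Path(tmp, ".git").mkdir()
    Path(tmp, "results").mkdir()
    listing = org_links(tmp)
    print("\n".join(listing))
```

Because the block in the template uses :results output raw, the printed lines are inserted into the document as an ordinary Org list of file links.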
I have hosted an example with some toy experiments at http://ashgillman.github.io/experiments/. The source code for one such experiment can be seen here, and its rendered output, here. Great!
The index file at http://ashgillman.github.io/experiments/ is generated using gen_index.py. Let's have a look at its source:

    #!/usr/bin/env python3

    from pathlib import Path
    from datetime import datetime

    html_format = """<body>
    <h1>Private Repository of Ashley Gillman</h1>
    {}
    <p><i>Generated {}</i></p>
    </body>
    """.format
    site = '.'
    doc_links = ['*.pdf']
    link_format = '<p><a href="./{0}">{0}</a></p>'.format
    hard_links = '<p><a href="/" onclick="javascript:event.target.port=8888;event.target.protocol=\'https:\'">iPython Notebook</a></p>'

    subdir_links = '\n'.join(sorted([link_format(d.name)
                                     for d in Path(site).iterdir()
                                     if d.is_dir()]))
    file_links = '\n'.join(sorted([link_format(f.name)
                                   for pattern in doc_links
                                   for f in Path(site).glob(pattern)]))

    html = html_format(
        '\n'.join([hard_links, subdir_links, file_links]),
        datetime.now().strftime('%d %b, %Y'))

    with open(str(Path(site, 'index.html')), 'w+') as f:
        f.write(html)
This is just a very simple script to make a very simple index. You mightn't even want to use it, opting instead for something like Apache's default indexing.
Citations
Using ox-bibtex.el, it is also possible to include citations when exporting to HTML just as you would when exporting to PDF, using TeX markup. ox-bibtex is already loaded for us through template.org, under the Local Variables. The bibliography is included by simply using:

    #+BIBLIOGRAPHY: bibfilename stylename

and citations are inserted using \cite{}. See the source code for this blog for examples.
Hosting with Docker
Docker is a virtualisation tool, allowing you to run a service as if it were running on a virtual machine, without the overhead of an actual virtual machine. But also, importantly, Docker has access to the Docker Hub, which allows you to very quickly fire up containers to run common services. I have found the simplest way to launch the server is using Docker. Once Docker has been installed, the Apache HTTP daemon can be launched (and configured to relaunch on restart) using one command:
    docker run --name private-server \
      -v /home/ash/experiments:/usr/local/apache2/htdocs -p 80:80 \
      --restart=always -d httpd
This starts up a container named private-server, running an Apache HTTP server serving from the experiments folder, and serving on port 80, the default HTTP port. The container will also try to restart itself if it errors, or if you restart your computer, etc.
Adding an IPython Jupyter Notebook
I sometimes find it more convenient to work from an IPython Notebook than from within Org Mode; I find it a bit easier to debug and tune Matplotlib plots, for example. You can also very easily host one of these using Docker.

    docker run --name ipython-server -d -p 8888:8888 \
      -v /home/ash:/home/ash -v /home/ash/notebooks:/notebooks \
      --restart=always ipython/scipyserver
This install includes the SciPy stack, which includes SciPy, NumPy, etc. I actually use a slightly different version, with a few extra packages installed.
    docker run --name ipython-server -d -p 8888:8888 \
      -v /home/ash:/home/ash -v /home/ash/notebooks:/notebooks \
      --restart=always gil2a4/mipython
You may also have noted that gen_index.py includes a hard-coded link to port 8888. This makes it a little easier to access the notebook server. The Jupyter notebook will only be accessible through HTTPS, and you will have to click through a warning that the certificate is invalid. Otherwise, it works perfectly.
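The hard-coded link works by rewriting its own target when clicked: it keeps the host name of the page being viewed, switches the protocol to HTTPS and sets the port to 8888. As a rough illustration of where the link ends up, the same transformation can be sketched in Python. The function name notebook_url and the host name my-computer are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def notebook_url(page_url, port=8888):
    """Mirror the link's onclick handler: same host, HTTPS, notebook port, root path."""
    host = urlsplit(page_url).hostname
    return urlunsplit(("https", "{}:{}".format(host, port), "/", "", ""))


# Whatever page of the notes is open, the link leads to the notebook root.
target = notebook_url("http://my-computer/2016-03-13-random-exploration/")
print(target)
```

Doing this in the browser means the index never needs to know the computer's name, so the same generated page works under whatever host name is used to reach it.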
Conclusion
Included here is a rough outline of how I have my environment set up to be able to document and record my experiments, and provide some formality in their structure. Although still not perfect, I find this approach to have a nice balance between structure and flexibility, providing scaffolding to test things quickly.
If you require more information, you may be able to find it by checking through some of the org source code I have available. Useful links include:
- The example version of this approach: https://github.com/ashgillman/experiments
- This blog's source: https://github.com/ashgillman/ashgillman.github.io/tree/master/_posts
- My ~/.emacs.d folder: https://github.com/ashgillman/dotfiles/tree/master/emacs.d
References
[1] C. Dominik, The Org Manual. Network Theory Ltd., 8.3.4 ed., 2016.
[2] E. Schulte and D. Davison, “Active documents with org-mode,” Computing in Science & Engineering, vol. 13, no. 3, pp. 66-73, 2011.
[3] M. Delescluse, R. Franconville, S. Joucla, T. Lieury, and C. Pouzat, “Making neurophysiological data analysis reproducible: Why and how?,” Journal of Physiology-Paris, vol. 106, no. 3, pp. 159-170, 2012.
[4] L. Stanisic, A. Legrand, and V. Danjean, “An effective git and org-mode based workflow for reproducible research,” ACM SIGOPS Operating Systems Review, vol. 49, no. 1, pp. 61-70, 2015.
[5] E. Dolstra and A. Hemel, “Purely functional system configuration management,” in HotOS, 2007.