Use `Make` for Reproducible Research
Basics of make for Reproducible Research
A research project can be seen as a tree of dependencies
- the report depends on the figures and tables
- the figures and tables depend on the data and the analysis scripts used to process this data
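
As a preview of how Make (introduced next) can encode such a tree, here is a minimal sketch. The file names (`report.pdf`, `report.tex`, `figures/fig1.png`, `data/raw.csv`, `scripts/plot.py`) and the commands (`pdflatex`, `python`) are hypothetical placeholders, not taken from any particular project.

```makefile
# Each rule reads "target: prerequisites", followed by a tab-indented recipe.
# All file names and commands below are hypothetical.

# The report depends on its source text and on the figure.
report.pdf: report.tex figures/fig1.png
	pdflatex report.tex

# The figure depends on the data and on the script that processes it.
figures/fig1.png: data/raw.csv scripts/plot.py
	mkdir -p figures
	python scripts/plot.py data/raw.csv figures/fig1.png
```

With a file like this, editing only `report.tex` should cause `make` to rerun just the report step, while editing `data/raw.csv` or `scripts/plot.py` should cause both the figure and the report to be rebuilt.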
Make is a tool for creating output files from their dependencies through pre-specified rules
- Make is a build automation tool
- Makefile: a configuration file that contains the rules for what to build
- Make builds targets using recipes
- targets can optionally have prerequisites
- prerequisites can be files on your computer or other targets: prerequisites == the files/targets that a target depends on
- Make figures out what to build based on the DAG (directed acyclic graph) of the targets and their prerequisites (i.e. their dependencies)
- the targets are updated only when needed, based on the modification time of their dependencies
- Phony targets (e.g. `all`, `clean`): targets that don't actually create an output file
  - they are always run if they come up in a dependency
  - but, unless they are declared phony, they will no longer be run if a directory/file called `all` or `clean` is ever created
  - To declare targets as phony, add a line at the top of the Makefile like: `.PHONY: all clean test`
- Automatic Variables and Pattern Rules
  - `$<`: first prerequisite
  - `$@`: target
  - `%`: wildcard for pattern rules
- Variables and Functions: both use `$(...)` syntax. For example,

      ALL_CSV = $(wildcard data/*.csv)
      INPUT_CSV = $(wildcard data/input_file_*.csv)
      DATA = $(filter-out $(INPUT_CSV),$(ALL_CSV))

## Caveats
- Use spaces as the delimiter between items in a list (e.g. multiple targets or prerequisites). Don't use `,`.
- Indent recipe lines with tabs. Makefiles do not accept indentation with spaces there.
- Make executes each line of a recipe independently, in a separate subshell, so a `cd` or a shell variable set on one line does not carry over to the next
- Make executes the first target when no explicit target is given
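
The tab, subshell, and default-target caveats above can be seen together in a small sketch. It assumes GNU Make and a Unix-like shell, and `demo` is a hypothetical target name chosen only for illustration.

```makefile
# A minimal sketch, assuming GNU Make on a Unix-like shell.
# `demo` is a hypothetical target name.

# Declared phony because `demo` creates no file of that name.
.PHONY: demo

# First rule in the file, so plain `make` runs it.
demo:
# Each recipe line starts with a tab and runs in its own shell,
# so the effect of this `cd` is lost when the line ends ...
	cd ..
# ... and this prints the directory `make` was started in.
	pwd
# Chaining with `&&` keeps both commands in one shell,
# so this prints the parent directory instead.
	cd .. && pwd
```

Replacing any of the leading tabs with spaces should make Make stop with a "missing separator" error, which is usually the first symptom of the indentation caveat.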
Recommendation
- Put `all` as the first target in the Makefile
  - Name the main target `all`: it's a convention many people follow
  - `all` == reference to the main target of the Makefile
  - In other words, this is the target that generates the main desired output(s)
- Put multiple outputs as the prerequisites of the main target (`all`)
  - This allows the user to call `make` on the command line and get the desired output(s)
  - All the other rules are there to help build that output (in the simplest case)
- Design your project's directory structure and Makefile hand in hand
- Use all capitals for variable names; define variables at the top of the Makefile
- Start small and start early!
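
One way these recommendations might fit together is sketched below. It is a starting skeleton, not a definitive layout: the `data/` and `figures/` directories, the `scripts/plot.py` script, and the `python` command are all hypothetical and should be adapted to the project's actual structure.

```makefile
# Variables in capitals at the top (all paths and the script are hypothetical).
DATA    = $(wildcard data/*.csv)
# Map each data/NAME.csv to figures/NAME.png.
FIGURES = $(patsubst data/%.csv,figures/%.png,$(DATA))

.PHONY: all clean

# First target: plain `make` builds every figure.
all: $(FIGURES)

# Pattern rule: build figures/NAME.png from data/NAME.csv.
# $< is the first prerequisite (the CSV), $@ is the target (the PNG).
figures/%.png: data/%.csv scripts/plot.py
	mkdir -p figures
	python scripts/plot.py $< $@

# Remove generated files so the next `make` starts from scratch.
clean:
	rm -rf figures
```

With this skeleton, `make` builds everything listed under `all`, `make clean` removes the generated figures, and naming a single target on the command line builds only that file and whatever it depends on.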