Small Simplicity

Understanding Intelligence from Computational Perspective

Feb 22, 2020

Use `Make` for Reproducible Research

Basics of make for Reproducible Research

  • A research project can be seen as a tree of dependencies

    • the report depends on the figures and tables
    • the figures and tables depend on the data and the analysis scripts used to process this data
  • Make is a tool for creating output files from their dependencies through pre-specified rules

  • Make is a build automation tool
    • Makefile: a configuration file that contains the rules for what to build
    • Make builds targets using recipes
    • targets can optionally have prerequisites
    • prerequisites can be files on your computer or other targets: prerequisites == the files/targets that a target depends on
    • Make figures out what to build based on the DAG (directed acyclic graph) of the targets and their prerequisites (i.e. its dependencies)
      • the targets are updated only when needed, based on the modification time of their dependencies
    • Phony targets (e.g. all, clean): targets that don't actually create an output file
      • they are always run if they come up in a dependency
      • but, unless they are declared phony, they will no longer be run once a file or directory called all or clean exists
      • To declare targets as phony, add a line at the top of the Makefile like:
        .PHONY: all clean test
        
    • Automatic Variables and Pattern Rules
      • $<: first prerequisite
      • $@: target
      • %: wildcard for pattern rules
    • Variables and Functions: both use $(...) syntax. For example,
      ALL_CSV = $(wildcard data/*.csv)
      INPUT_CSV = $(wildcard data/input_file_*.csv)
      DATA = $(filter-out $(INPUT_CSV),$(ALL_CSV))
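
The pieces above can be combined into one small Makefile. This is only a sketch under assumptions: the file names (report.tex, scripts/plot.py, the figures/ directory) and the command-line interface of plot.py are hypothetical, and `patsubst` is an extra built-in function not covered in the notes above. Recipe lines must start with a tab.

```makefile
# Variables from the notes above
ALL_CSV = $(wildcard data/*.csv)
INPUT_CSV = $(wildcard data/input_file_*.csv)
DATA = $(filter-out $(INPUT_CSV),$(ALL_CSV))

# Map each data/NAME.csv to figures/NAME.png (hypothetical layout)
FIGURES = $(patsubst data/%.csv,figures/%.png,$(DATA))

.PHONY: all clean

all: report.pdf

# The report depends on its source and on every figure
report.pdf: report.tex $(FIGURES)
	pdflatex report.tex

# Pattern rule: % matches the same stem in target and prerequisite.
# $< is the first prerequisite (the CSV), $@ is the target (the PNG).
# plot.py taking "input output" arguments is an assumption.
figures/%.png: data/%.csv scripts/plot.py
	mkdir -p figures
	python scripts/plot.py $< $@

clean:
	rm -rf figures report.pdf
```

With this layout, touching one CSV file and running make should rebuild only the matching figure and then the report, since the other figures are still newer than their prerequisites.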
      
Caveats

  • Use spaces as the delimiter in lists of targets and prerequisites. Don't use commas.
  • Indent recipe lines with tabs. Make does not accept recipes indented with spaces
  • Make executes each line of a recipe independently, in a separate subshell
  • Make executes the first target when no explicit target is given
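
The subshell caveat is the one that tends to surprise people, so here is a minimal sketch of it. The target names wrong and right are made up for illustration, and a data/ directory is assumed to exist next to the Makefile.

```makefile
.PHONY: wrong right

# Each recipe line runs in its own shell, so the cd on the first
# line has no effect on the pwd on the second line.
wrong:
	cd data
	pwd

# Chaining the commands with && keeps them in a single shell,
# so pwd runs inside data/.
right:
	cd data && pwd
```

Running make with no arguments here executes wrong, because it is the first regular target in the file (special targets such as .PHONY do not count as the default).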

Recommendation

  • Put all as the first target in the Makefile
  • Name the main target all: it's a convention many people follow
    • all == reference to the main target of the Makefile
    • In other words, this is the target that generates the main desired output(s)
    • Put multiple outputs as the prerequisites of the main target (all)
    • This allows the user to call make on the command line, and get the desired output(s)
      • All the other rules are there to help build that output (in the simplest case)
  • Design your project's directory structure and Makefile hand in hand
  • Use all capitals for variable names; define variables at the top of the Makefile
  • Start small and start early!
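
As a starting point, a skeleton along these lines follows the recommendations above. It is a sketch only: the directory names, the two outputs, and the scripts summarize.py and plot.py (including their "input output" arguments) are all hypothetical placeholders.

```makefile
# Variables at the top, in capitals (all paths are hypothetical)
DATA_DIR = data
OUT_DIR = output
OUTPUTS = $(OUT_DIR)/summary.csv $(OUT_DIR)/figure.png

.PHONY: all clean

# First target in the file, so a bare `make` builds every main output
all: $(OUTPUTS)

# The summary table is computed from the raw data
$(OUT_DIR)/summary.csv: $(DATA_DIR)/raw.csv scripts/summarize.py
	mkdir -p $(OUT_DIR)
	python scripts/summarize.py $< $@

# The figure is drawn from the summary table
$(OUT_DIR)/figure.png: $(OUT_DIR)/summary.csv scripts/plot.py
	mkdir -p $(OUT_DIR)
	python scripts/plot.py $< $@

clean:
	rm -rf $(OUT_DIR)
```

Keeping inputs in one directory and generated files in another makes the clean target trivial and mirrors the dependency tree in the directory layout.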

Source
