Miguel Dias Costa, 15/11/2011 14:21


Reproducibility

The "Annals of Improbable Research", which organizes the Ig Nobel Prizes, grew out of the "Journal of Irreproducible Results". That title was meant as a joke, but how reproducible are computational results published in "real" journals?

Towards reproducible results

Open Source

  • Reproducibility is not only about the developer reproducing the results; everyone else should be able to reproduce them too.

Version Control System

  • Subversion
  • Git
  • Bazaar
  • Mercurial, etc.

The output files must be coupled to a revision. Of course, this implies always committing before a production run.

Simplest approach

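#include <cstdlib>  // std::system

int main(int argc, char* argv[]) {
    // Record the working copy's revision alongside the output files.
    std::system("svn info > svn.info");
    ...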
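Going one step further, the revision can be written into the output file itself, and a run can be refused when the working copy has uncommitted changes. The following is only a sketch under several assumptions: it relies on the Subversion `svnversion` command being installed, on POSIX `popen`, and on helper names (`capture_command`, `is_clean_revision`, `revision_header`) and a `# revision:` header format that are invented here for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <stdio.h>   // popen, pclose (POSIX), fgets
#include <string>

// Run a shell command and return its standard output with trailing
// whitespace removed. Returns an empty string if the command cannot start.
std::string capture_command(const std::string& command) {
    std::string output;
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr) return output;
    char buffer[256];
    while (fgets(buffer, sizeof buffer, pipe) != nullptr) output += buffer;
    pclose(pipe);
    while (!output.empty() &&
           std::isspace(static_cast<unsigned char>(output.back())))
        output.pop_back();
    return output;
}

// True only for a single, unmodified revision such as "4123".
// svnversion marks local modifications and mixed revisions with extra
// characters (for example "4123M" or "4123:4168"); those are rejected.
bool is_clean_revision(const std::string& revision) {
    if (revision.empty()) return false;
    for (unsigned char c : revision)
        if (!std::isdigit(c)) return false;
    return true;
}

// Build a comment line to put at the top of every output file.
// The "# revision:" format is an arbitrary choice for this sketch.
std::string revision_header(const std::string& revision) {
    assert(is_clean_revision(revision));  // checked in debug builds only
    return "# revision: " + revision + "\n";
}
```

A production run could call `capture_command("svnversion")` at startup, stop with an error message if `is_clean_revision` returns false, and otherwise write `revision_header(...)` as the first line of each output file.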

Improvements

  • Always branch on production run?

Parameters and Configuration Files

  • Output files must also be coupled with runtime parameters.
  • Use parsers for parameters and command-line arguments (e.g. getopt, Boost.Program_options)
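As a rough illustration of both points, the sketch below parses two options with POSIX `getopt` and turns the resulting values into header lines that can be written to the output file. The `RunParameters` struct, the `-n` and `-o` options, their defaults, and the header format are all hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <unistd.h>  // getopt, optarg, optind (POSIX)

// Hypothetical runtime parameters with their default values.
struct RunParameters {
    long steps = 1000;
    std::string output = "results.dat";
};

// Parse "-n <steps>" and "-o <output file>" from the command line.
RunParameters parse_arguments(int argc, char* argv[]) {
    assert(argv != nullptr);
    RunParameters params;
    optind = 1;  // restart scanning so the function can be called repeatedly
    int option;
    while ((option = getopt(argc, argv, "n:o:")) != -1) {
        switch (option) {
            case 'n': params.steps = std::strtol(optarg, nullptr, 10); break;
            case 'o': params.output = optarg; break;
            default:  break;  // unknown option: getopt already printed a message
        }
    }
    return params;
}

// Describe the parameters as comment lines for the top of an output file,
// so that every result carries the settings that produced it.
std::string describe(const RunParameters& params) {
    return "# steps: " + std::to_string(params.steps) + "\n"
         + "# output: " + params.output + "\n";
}
```

Called as `sim -n 500 -o run.dat` (a made-up program name), `describe` would yield the two lines `# steps: 500` and `# output: run.dat`. Writing the defaults as well as the options actually given matters, because defaults can change between revisions.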

Project Management

Project management software that integrates with repositories allows us to store documentation, bugs, wikis, files, etc. in a way that is coupled to the repository. Everything can be directly and easily linked to a revision.

  • Trac (widely used, Python)
  • Redmine (better for multiple projects, Ruby on Rails)

Build tools

  • Autotools
    • Autoconf, Automake
  • SCons?

Documentation

  • Doxygen
  • Comments in a standard form
    • Javadoc syntax
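For instance, a function documented with a Javadoc-style comment block that Doxygen can process might look like the sketch below. The `mean` function is a made-up example; `@brief`, `@param`, `@return` and `@pre` are standard Doxygen commands.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

/**
 * @brief Compute the arithmetic mean of a set of samples.
 *
 * @param samples  Values to average; must not be empty.
 * @return         The sum of all samples divided by their count.
 * @pre            samples.size() > 0
 */
double mean(const std::vector<double>& samples) {
    assert(!samples.empty());  // enforce the documented precondition in debug builds
    const double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
    return sum / static_cast<double>(samples.size());
}
```

Keeping the comment directly above the declaration lets the generated documentation and the code be committed together, so the documentation is tied to the same revision as the results.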

IDE (It Does hElp)

  • Eclipse
  • KDevelop
  • ...

But make sure you know how to do everything without it.

Language

  • We need performance, which might suggest Fortran first, then C, then C++. But we also need to concentrate on the problem we are trying to solve and delegate the implementation details to good libraries, and more of these are available for C and C++. Object-oriented programming in C++ lets us write more complex and general code than C does, by encapsulating functionality within classes. C++ also provides higher-level algorithms and data structures that increase productivity, and the best implementations (STL and Boost) have little or no overhead. So, in a nutshell, C++ it is.
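As a small illustration of what "higher-level algorithms" means in practice, the sketch below computes a median with `std::nth_element` and `std::max_element` from the standard library instead of a hand-written sort. The `median` function itself is a hypothetical example, not code from this project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty set of values. The vector is taken by value
// because std::nth_element reorders its input.
double median(std::vector<double> values) {
    assert(!values.empty());
    const std::size_t mid = values.size() / 2;
    // Partial sort: places the element that belongs at position mid there,
    // with no larger element before it.
    std::nth_element(values.begin(), values.begin() + mid, values.end());
    const double upper = values[mid];
    if (values.size() % 2 == 1) return upper;
    // Even count: average the two middle elements. The lower one is the
    // largest element in the first half.
    const double lower = *std::max_element(values.begin(), values.begin() + mid);
    return 0.5 * (lower + upper);
}
```

The library calls state the intent directly and avoid a full sort, which is the kind of productivity gain at little or no run-time cost referred to above.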

Libraries

  • Use standard, widely used libraries
  • STL
  • Boost
  • Blitz++?
  • ...

Automation

  • Automation is not only about saving time: automated tasks are inherently reproducible

Unit Testing

  • This is probably not realistic in the short term, and we can reduce the need for it by using good (and tested) libraries, but nevertheless:
    • CppUnit
    • CppUnitLite
    • Boost.Test
    • CUTE
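To give an idea of the effort involved, here is a framework-free sketch that uses only `assert` from the standard library. The frameworks listed above add test discovery, fixtures and reporting on top of the same idea. The `trapezoid` function and the test names are hypothetical examples.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Composite trapezoid rule for the integral of f over [a, b] with n intervals.
double trapezoid(const std::function<double(double)>& f,
                 double a, double b, int n) {
    assert(n > 0);
    const double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; ++i) sum += f(a + i * h);
    return sum * h;
}

// The trapezoid rule is exact for linear integrands: the integral of 2x
// over [0, 1] is 1.
void test_linear_integrand_is_exact() {
    const double result =
        trapezoid([](double x) { return 2.0 * x; }, 0.0, 1.0, 4);
    assert(std::fabs(result - 1.0) < 1e-12);
}

// For a smooth non-linear integrand the error should shrink as n grows.
void test_error_shrinks_with_more_intervals() {
    auto square = [](double x) { return x * x; };
    const double exact = 1.0 / 3.0;
    const double coarse = std::fabs(trapezoid(square, 0.0, 1.0, 4) - exact);
    const double fine = std::fabs(trapezoid(square, 0.0, 1.0, 64) - exact);
    assert(fine < coarse);
}

// Run every test; a failing assert aborts the program.
void run_all_tests() {
    test_linear_integrand_is_exact();
    test_error_shrinks_with_more_intervals();
}
```

Calling `run_all_tests()` from a small test executable that is built and run automatically before each production run would tie the tests to the same revision as the results.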

Other reproducibility concerns