Reproducibility » History » Version 1

Miguel Dias Costa, 15/11/2011 14:21

h1. Reproducibility

The "Annals of Improbable Research", the organizers of the Ig Nobel Prizes, grew out of the "Journal of Irreproducible Results". That title was meant as a joke, but how reproducible are the computational results published in "real" journals?

h2. Towards reproducible results

h2. Open Source

* Reproducibility is not only about the developer being able to reproduce the results; everyone else should be able to as well, which requires access to the source code.

h2. Version Control System

* Subversion
* Git
* Bazaar
* Mercurial, etc.

Every output file must be traceable to the exact revision of the code that produced it. Of course, this implies always committing before a production run.

h3. Simplest approach

<pre>
#include <cstdlib>  // std::system

int main(int argc, char * argv[]) {
    // Record the working copy's revision info next to the output files.
    std::system("svn info > svn.info");
    ...
</pre>

h3. Improvements

* We could also check the working copy status, commit automatically before a production run, etc.
** For that, it would be better to use an API instead of system(), e.g. http://rapidsvn.tigris.org/svncpp.html
* Always branch on a production run?
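As an intermediate step between @system()@ and a full Subversion API, the status check could be sketched as below. This is only a sketch under assumptions: it relies on the POSIX @popen@ call and on the @svn@ command line client being on the path, and the helper names @capture_output@ and @working_copy_is_clean@ are hypothetical, not part of any library.

```cpp
#include <cstdio>     // popen, pclose (POSIX), std::fgets
#include <stdexcept>  // std::runtime_error
#include <string>

// Run a shell command and return everything it wrote to standard output.
// Throws if the command cannot be started or exits with a non-zero status.
std::string capture_output(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr) {
        throw std::runtime_error("could not start: " + command);
    }
    std::string output;
    char buffer[256];
    while (std::fgets(buffer, sizeof buffer, pipe) != nullptr) {
        output += buffer;
    }
    if (pclose(pipe) != 0) {
        throw std::runtime_error("command failed: " + command);
    }
    return output;
}

// Hypothetical helper: true when "svn status -q" lists no local
// modifications (-q leaves unversioned files out of the listing).
bool working_copy_is_clean() {
    return capture_output("svn status -q").empty();
}
```

A production run could then refuse to start (or commit first) whenever @working_copy_is_clean()@ returns false, so that the revision written to @svn.info@ really describes the code that ran.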

h2. Parameters and Configuration Files

* Output files must also be coupled with the runtime parameters that produced them.
* Use parsers for parameters and command line arguments (e.g. getopt, Boost.Program_options)

h2. Project Management

Project management software that integrates with the repository allows us to store documentation, bug reports, wiki pages, files, etc. in a way that is coupled to the repository. Everything can be linked directly and easily to a revision.

* Trac (widely used, Python)
* Redmine (better for multiple projects, Ruby on Rails)

h2. Build tools

* Autotools
** Autoconf, Automake
* SCons?

h2. Documentation

* Doxygen
* Comments in a standard form
** Javadoc syntax
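For illustration, a Javadoc-style comment that Doxygen can pick up might look like the sketch below. The function @mean@ is a hypothetical example, not code from this project.

```cpp
#include <cstddef>  // std::size_t
#include <vector>

/**
 * Compute the arithmetic mean of a set of samples.
 *
 * @param samples  Values to average; may be empty.
 * @return The mean of the samples, or 0.0 if there are none.
 */
double mean(const std::vector<double>& samples) {
    if (samples.empty()) {
        return 0.0;
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        sum += samples[i];
    }
    return sum / samples.size();
}
```

Keeping the comment directly above the declaration means the generated documentation is versioned together with the code it describes.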

h2. IDE (It Does hElp)

* Eclipse
* KDevelop
* ...

But make sure you know how to do everything without it.

h2. Language

* We need performance, which might suggest Fortran first, then C, then C++. But we also need to concentrate on the problem we are trying to solve, delegating implementation details to good libraries, and more of those are available for C and C++. Object-oriented programming lets us write more complex and general code than plain C by encapsulating functionality within classes. C++ also provides higher-level algorithms and data structures that increase productivity, and the best implementations (STL and Boost) have little or no overhead. So, in a nutshell, C++ it is.

h2. Libraries

* Use standard, widely used libraries
* STL
* Boost
* Blitz++?
* ...

h2. Automation

* Not only to save time: automated tasks are inherently reproducible.

h2. Unit Testing

* This is probably not realistic in the short term, and we can reduce the need for it by using good (and tested) libraries, but nevertheless:
** CppUnit
** CppUnitLite
** Boost.Test
** CUTE
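To show what a unit test checks without committing to any of the frameworks above, here is a framework-free sketch. The function under test, @kinetic_energy@, and the helpers @close_enough@ and @run_tests@ are hypothetical; a real framework would add conveniences such as test registration and reporting on top of the same idea.

```cpp
#include <cmath>     // std::fabs
#include <iostream>

// Hypothetical function under test.
double kinetic_energy(double mass, double velocity) {
    return 0.5 * mass * velocity * velocity;
}

// Compare floating-point values with a tolerance rather than with ==.
bool close_enough(double a, double b, double tolerance = 1e-12) {
    return std::fabs(a - b) <= tolerance;
}

// Run every check and return the number of failures (0 means all passed).
int run_tests() {
    int failures = 0;
    if (!close_enough(kinetic_energy(2.0, 3.0), 9.0)) {
        std::cerr << "FAILED: known value for mass 2, velocity 3\n";
        ++failures;
    }
    if (!close_enough(kinetic_energy(0.0, 5.0), 0.0)) {
        std::cerr << "FAILED: zero mass must give zero energy\n";
        ++failures;
    }
    if (!close_enough(kinetic_energy(1.0, -4.0), kinetic_energy(1.0, 4.0))) {
        std::cerr << "FAILED: energy must not depend on the sign of velocity\n";
        ++failures;
    }
    return failures;
}
```

Returning the failure count from @run_tests()@ lets a small test program use it as its exit status, so the build or an automated job can stop when a check fails.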

h1. Other reproducibility concerns

* http://www.johndcook.com/blog/2008/05/26/reproducible-scientific-computing/
* http://www.reproducibility.org/wiki/Reproducibility
* http://www.reproducibility.org/wiki/Reproducible_computational_experiments_using_SCons
* http://wwwcdf.pd.infn.it/~loreti/science.html