Go and check ts-6b6-27c: Transparent workflow tools for scientists
Thursday, October 15, 2015
Government should be transparent. Science should be open. Government information belongs in the public domain, and scientific data should be publicly available to permit replication and scrutiny. Few would disagree with those calls for openness, and indeed there has been a flurry of activity within the sciences to upgrade research practices to achieve greater openness and transparency. Organizations such as the Open Science Foundation provide researchers with new tools for research, ranging from pre-registration of studies to making available data sets and other information. And openness seems to be rewarding not just for the scientific community overall but also for the individual researcher: Sharing data has been found to be associated with an increased citation rate.
Recent attention has also focused on making available the analysis programming code underlying published research findings. This may appear quite straightforward at first glance, but in fact it is not without its own complications.
A first complication is conceptual, because arguably little benefit accrues if another researcher simply clicks “go” or “run” and produces the same numbers from the same data and source code as a previous researcher. Arguably, more would be gained by an independent re-analysis, whereby the second analyst seeks to reproduce the reported results from the raw data and without being guided (and potentially misled) by the first analyst’s code. However, should a second independent analysis yield incompatible results, then access to the original source code can help resolve those discrepancies.
But this is where the second complication arises: the technicalities of sharing source code are also non-trivial. There are obstacles that arise from the use of different platforms (Windows vs. Mac vs. Linux), or from issues such as missing external libraries or tacit assumptions about local directory structures. Ideally, therefore, the source code and data should be prepared in a manner that avoids such pernickety and annoying glitches.
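To make the directory-structure problem concrete, here is a minimal sketch in Python (not MATLAB, and not taken from TimeStudio). The function name `resolve_data_file` and the file name `pupil.csv` are hypothetical, chosen only for illustration. It contrasts a hard-coded path that works on one machine with a path built relative to the project folder, which works the same way on Windows, Mac, and Linux.

```python
from pathlib import Path


def resolve_data_file(project_root: Path, *parts: str) -> Path:
    """Build a path to a data file relative to the project folder.

    pathlib inserts the correct separator for the current platform,
    so no backslashes or forward slashes are typed by hand.
    """
    return project_root.joinpath(*parts)


# Fragile: this only works on the original author's Windows machine.
# data_file = "C:\\Users\\alice\\study1\\data\\pupil.csv"

# More portable: start from wherever the project was unpacked.
# (The current working directory stands in for the project folder here.)
project_root = Path.cwd()
data_file = resolve_data_file(project_root, "data", "pupil.csv")
```

Shared code that locates its data this way makes no assumption about where a second analyst chose to save the files.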
A recent article in the Psychonomic Society’s journal Behavior Research Methods has addressed this problem. Researchers Nyström, Falck-Ytter, and Gredebäck presented an “open source scientific workflow system for the behavioral and brain sciences” called TimeStudio. TimeStudio is “a fully transparent system dedicated to the analysis, reproduction, and sharing of quantitative data” that is written in MATLAB and is freely available.
The key idea behind TimeStudio is a graphical user interface (GUI) that permits the pipelining of so-called “plug-ins” that perform individual steps of the pre-processing and analysis of the data. A set of standard plug-ins is supplied with TimeStudio, and additional plug-ins can be custom-designed for non-standard data formats or unique analysis steps. Ultimately, the sequence of plug-ins, together with the data, constitutes a “workflow” that can be shared with other researchers. Because the data and all necessary source code are embedded in the workflow, and because MATLAB is a platform-independent environment, TimeStudio provides an avenue around the technical issues that otherwise hamper source-code sharing.
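The pipelining idea can be illustrated with a minimal sketch. It is written in Python rather than MATLAB and is not TimeStudio's code. The names `Workflow`, `add_plugin`, `run`, and the two toy plug-ins are hypothetical, chosen only to show how an ordered list of processing steps and the data they act on can travel together as one object.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A plug-in is modelled here as any function that takes a list of
# samples and returns a new list of samples (an assumption of this sketch).
Plugin = Callable[[List[float]], List[float]]


@dataclass
class Workflow:
    """Hypothetical container bundling raw data with its processing steps."""
    raw_data: List[float]
    plugins: List[Plugin] = field(default_factory=list)

    def add_plugin(self, plugin: Plugin) -> "Workflow":
        # Steps are stored in the order in which they are added.
        self.plugins.append(plugin)
        return self

    def run(self) -> List[float]:
        # Apply each plug-in in turn, starting from a copy of the raw data
        # so that the original measurements are never overwritten.
        data = list(self.raw_data)
        for plugin in self.plugins:
            data = plugin(data)
        return data


def drop_missing(samples: List[float]) -> List[float]:
    """Toy pre-processing step: discard samples recorded as NaN."""
    return [s for s in samples if s == s]  # NaN is the only value != itself


def baseline_correct(samples: List[float]) -> List[float]:
    """Toy analysis step: subtract the first sample from every sample."""
    if not samples:
        return []
    return [s - samples[0] for s in samples]


# Toy values for illustration only, not data from the article.
workflow = Workflow(raw_data=[3.0, float("nan"), 3.5, 4.0])
workflow.add_plugin(drop_missing).add_plugin(baseline_correct)
result = workflow.run()
```

Because the raw data and the ordered steps live in the same object, anyone who receives the object can re-run the analysis from scratch or inspect any single step.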
The final product of a TimeStudio workflow is a data structure that combines the raw data together with all analysis code, and that can be run by anyone who has access to MATLAB. The figure below clarifies this data structure.
So how do you get there? How do you create a publishable workflow in TimeStudio?
After loading MATLAB, you type eval(urlread ('http://timestudioproject.com/install.php')); at the command interface. I typed that command at 7.37am this morning. By 7.39am, installation was complete and the following opening screen appeared in response to two further commands, just as advertised:
A grand total of 120 seconds to download, install, and run a software package isn’t too shabby, is it?
From here on, it’s a matter of adding plug-ins to the workflow, either by choosing from a menu of pre-packaged options or by developing your own. Developing a new plug-in requires some knowledge of MATLAB, but the process is facilitated by the provision of a template (shown below) into which you can insert your custom-written code:
The details of this process go beyond the scope of a blog post, but TimeStudio comes with a webpage of manuals. I scanned the manual for the creation of plug-ins and it provides ample information to get started, although I did not have time to create my own code for this post.
I did, however, have time to run the sample workflow that Nyström and colleagues refer to in the article. I downloaded it via the File --> Open uiw menu option, entering the unique access code provided in the article. A short time later, after executing all plug-ins shown for the analysis for all subjects, the following result window appeared:
If you compare the above figure to Figure 4 in the published article, you will notice that they are identical. The figure shows data from an experiment in which participants had to multiply two numbers. The task was either easy (both multiplicands in the range 1-9), medium (range 6-14) or difficult (range 11-19). Approximately 6 to 8 seconds after stimulus onset, pupil dilation increases with task difficulty, as shown above. (The shading in the figure represents the confidence intervals associated with the time series.)
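For readers curious how a shaded band of this kind can be derived, here is a minimal Python sketch. It is not the authors' method: neither the post nor this sketch claims to know how TimeStudio computes its intervals. The sketch assumes a normal-approximation 95% interval around the across-participant mean at each time point, and the function names and numbers are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def mean_with_ci(values: Sequence[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) for one time point.

    Assumption: interval = mean +/- z * standard error, with z = 1.96
    for an approximate 95% interval.
    """
    m = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


def timeseries_ci(series: List[List[float]]) -> List[Tuple[float, float, float]]:
    """Compute the band for every time point.

    `series` holds one list per participant, all of equal length;
    zip(*series) regroups the values by time point.
    """
    return [mean_with_ci(samples) for samples in zip(*series)]


# Toy values: three participants, three time points (not data from the study).
toy = [
    [0.0, 1.0, 2.0],
    [0.0, 2.0, 4.0],
    [0.0, 3.0, 6.0],
]
band = timeseries_ci(toy)
```

Plotting the lower and upper values as a filled region around the mean line gives the kind of shading seen in the figure.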
One of the strong features of TimeStudio is its integration with a database in the “cloud” that facilitates collaboration as well as sharing of the final workflow. A workflow can be uploaded into this integrated database with five mouse clicks and is assigned a sharable unique ID number. During development, this ID can be shared among collaborators who can contribute to the design of a workflow, until the final version is “locked” (similar to preregistration) to create a stable and lasting record. The ID number of the final workflow can be used in published articles and anyone can then download it and execute it from within TimeStudio—as I did while writing this post. Given sufficient time, I could now check all the plug-ins for errors or glitches, but unfortunately someone else will have to do this. All they need is ts-6b6-27c for the workflow ID … and off you go.