2.5 Infrastructure Plan
Database
The pipeline that takes files from their original data through analysis and meta-analysis is conceptually simple. We initially expected to use simple file-system-based storage, but the project has grown in scope and now needs a database at the center of its architecture. Supporting external classifiers alongside our own workflow requires three different intermediate states as classifications are processed. Comparing classifiers introduces a combinatorial explosion, and comparing metrics adds another dimension to the scoring matrix. An automated repository is therefore essential for running large-scale experiments.
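To give a sense of the scale involved, the following minimal sketch counts the scores an experiment would produce, assuming classifiers are compared pairwise and every pair is scored under every metric on every data file. The function name and the example sizes are illustrative assumptions, not figures from the project.

```python
from math import comb


def count_scores(n_classifiers: int, n_metrics: int, n_files: int) -> int:
    """Count scores when every classifier pair is scored by every metric on every file.

    Assumption: comparisons are pairwise (n choose 2); the real scoring
    matrix may be organized differently.
    """
    n_pairs = comb(n_classifiers, 2)
    return n_pairs * n_metrics * n_files


if __name__ == "__main__":
    # Hypothetical sizes: 3 metrics and 100 data files.
    for n in (5, 10, 20):
        print(n, "classifiers ->", count_scores(n, 3, 100), "scores")
```

Under these assumptions, doubling the number of classifiers from 5 to 10 raises the count from 3,000 to 13,500 scores, which is why manual bookkeeping on a file system stops being practical.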
The repository provides the following capabilities:
A: Storage of data files, workspaces, reports, and other artifacts related to analysis.
B: Database association of files with specific experiments, experts, time series, etc.
C: Generation of reports and statistics on the various experiments.
D: An administrative interface for managing experiments, data files, experts, etc.
E: Client neutrality. The repository can be used as a web application, as a plugin in FlowJo, or remotely from R.
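Items B and C can be illustrated with a small relational sketch. The table and column names below are hypothetical and are not taken from the project schema, and Python's built-in sqlite3 module stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: files belong to experiments, and each analysis
# links one data file to the expert who produced the workspace.
SCHEMA = """
CREATE TABLE experiment (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE expert     (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE data_file (
    id            INTEGER PRIMARY KEY,
    path          TEXT NOT NULL,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id)
);
CREATE TABLE analysis (
    id             INTEGER PRIMARY KEY,
    data_file_id   INTEGER NOT NULL REFERENCES data_file(id),
    expert_id      INTEGER NOT NULL REFERENCES expert(id),
    workspace_path TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database populated with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO experiment (id, name) VALUES (1, 'pilot')")
    conn.executemany(
        "INSERT INTO expert (id, name) VALUES (?, ?)",
        [(1, "expert_a"), (2, "expert_b")],
    )
    conn.executemany(
        "INSERT INTO data_file (id, path, experiment_id) VALUES (?, ?, ?)",
        [(1, "pilot/tube_01.fcs", 1), (2, "pilot/tube_02.fcs", 1)],
    )
    conn.executemany(
        "INSERT INTO analysis (data_file_id, expert_id, workspace_path) VALUES (?, ?, ?)",
        [(1, 1, "ws/a_01.wsp"), (1, 2, "ws/b_01.wsp"), (2, 1, "ws/a_02.wsp")],
    )
    conn.commit()
    return conn


def analyses_per_expert(conn: sqlite3.Connection) -> list:
    """Report how many analyses each expert has contributed (item C)."""
    return conn.execute(
        """
        SELECT e.name, COUNT(a.id)
        FROM expert AS e
        LEFT JOIN analysis AS a ON a.expert_id = e.id
        GROUP BY e.id
        ORDER BY e.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(analyses_per_expert(build_demo_db()))
```

Keeping the associations in tables rather than in directory names means a new report is a new query, not a new directory layout.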
Design
The repository follows the common LAMP solution stack: Linux, Apache, MySQL, and PHP in a standard combination. We add Tomcat as a wrapper around FlowJo, our existing cytometry analysis software, which lets us run analyses on a server in a scripted environment.
A: The MySQL relational database was selected for the secure storage of project data. It is a well-supported, industry-standard database solution.
B: The Apache web server, in concert with the Tomcat application server, was selected for application hosting. Both are common technologies that are well suited to reliable, high-performance, data-centric applications.
C: The FlowJo Engine has been implemented as a TCP server application [69]. Multiple engine instances run across several servers, providing strong scalability and reliability.
D: Open scripting languages such as Ruby, PHP, or Perl are used to manage access to the engine, analysis results, data collation, etc. Tools are written in Java using the Eclipse IDE and are wrapped in Tomcat.
E: The system is designed around Linux servers, but all of these tools also support most Unix flavors (Solaris, BSD, OS X) as well as Microsoft platforms.
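The arrangement in item C, where a script spreads requests over several engine instances, might look roughly like the sketch below. The FlowJo Engine's actual wire protocol is not described here, so the one-line request and reply format, the stub server, and all function names are assumptions made purely for illustration.

```python
import itertools
import socket
import socketserver
import threading


class StubEngineHandler(socketserver.StreamRequestHandler):
    """Stand-in for an engine instance: acknowledges one request line."""

    def handle(self):
        line = self.rfile.readline().decode("utf-8").strip()
        self.wfile.write(f"OK {line}\n".encode("utf-8"))


def start_stub_engine() -> socketserver.ThreadingTCPServer:
    """Start a stub engine on a free local port in a background thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), StubEngineHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_request(address, request: str) -> str:
    """Send one request line to an engine and return its one-line reply."""
    with socket.create_connection(address, timeout=5) as sock:
        sock.sendall((request + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()


def dispatch(addresses, requests):
    """Distribute requests over the engine instances in round-robin order."""
    pool = itertools.cycle(addresses)
    return [send_request(next(pool), request) for request in requests]


def demo():
    """Run three hypothetical requests against two stub engines."""
    servers = [start_stub_engine() for _ in range(2)]
    try:
        addresses = [server.server_address for server in servers]
        jobs = [f"ANALYZE tube_{i:02d}.fcs" for i in (1, 2, 3)]
        return dispatch(addresses, jobs)
    finally:
        for server in servers:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(demo())
```

Round-robin is the simplest possible policy; a production dispatcher would presumably also need to track engine load and retry failed instances, which this sketch leaves out.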
Implementation
http://flowdx.com/flowdx.sql
Blogs & Wiki
These contemporary media are very useful for distributed projects, for soliciting feedback over the Internet, and for bridging the many stages of knowledge preparation. Now that the Rational tools have become web-based, there is very little difference between a high-end development tool and a free blogging account.
FlowDx Blog
FlowDx.com
Daily Dongle
Remote Analysis Support
The primary architectural breakthrough for this product came from opening it up to remote servers and to remote analysts. Using some very simple XML extensions, we can create analysis workspaces, send them out to labs, and receive a copy of each lab's analysis in return. This benefits collaborators on this phase of the project, and it also has significant implications for training, validation, testing, and future collaborations.
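The round trip described above can be sketched with Python's standard XML library. The element and attribute names (Workspace, RemoteAnalysis, Sample, Gate) are hypothetical placeholders chosen for this example and do not reflect the actual FlowJo workspace format or the project's XML extensions.

```python
import xml.etree.ElementTree as ET


def create_workspace(experiment: str, lab: str, files: list) -> str:
    """Build a workspace document to send out to a lab."""
    root = ET.Element("Workspace")
    # Hypothetical extension element carrying the routing information.
    ET.SubElement(
        root,
        "RemoteAnalysis",
        {"experiment": experiment, "lab": lab, "status": "sent"},
    )
    samples = ET.SubElement(root, "Samples")
    for path in files:
        ET.SubElement(samples, "Sample", {"file": path})
    return ET.tostring(root, encoding="unicode")


def record_analysis(workspace_xml: str, gates: dict) -> str:
    """Simulate the lab's side: attach gates and mark the workspace analyzed."""
    root = ET.fromstring(workspace_xml)
    root.find("RemoteAnalysis").set("status", "analyzed")
    for sample in root.iter("Sample"):
        for name in gates.get(sample.get("file"), []):
            ET.SubElement(sample, "Gate", {"name": name})
    return ET.tostring(root, encoding="unicode")


def summarize(workspace_xml: str):
    """Read back the lab, the status, and the gates recorded per sample."""
    root = ET.fromstring(workspace_xml)
    meta = root.find("RemoteAnalysis")
    gates = {
        sample.get("file"): [gate.get("name") for gate in sample.findall("Gate")]
        for sample in root.iter("Sample")
    }
    return meta.get("lab"), meta.get("status"), gates


if __name__ == "__main__":
    sent = create_workspace("pilot", "lab_a", ["tube_01.fcs", "tube_02.fcs"])
    returned = record_analysis(sent, {"tube_01.fcs": ["lymphocytes"]})
    print(summarize(returned))
```

Because the routing information travels inside the workspace itself, the returned copy can be filed against the right experiment and lab without any side channel.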
Document Repository
2.1.6.1 Software Development Plan
2.5.1 Repository Spec
2.5.2 DB Utilities
2.5.3 Repository of files and populations
2.5.4 Documentation
2.5.5 SOPs and Training Docs
2.5.6 Database
2.5.7 Utilities to access and analyze selected populations
2.5.8 Utility to create consensus gates
2.5.9 Utilities to apply classification algorithms
2.5.10 Utility to evaluate automated output compared to manual gating