
Discussion

This project is more about process than product. We come in with existing software that services a large portion of research flow cytometry, but seek to reinvent the software for a new market with more stringent procedural requirements. We are familiar with the domain and have existing software, but the translation from lab to hospital requires us to repeat the process with a higher level of integrated validation.

The consensus framework for best practices in software engineering is known as the Rational Unified Process (RUP). IBM offers a commercial line of products to facilitate large-scale projects, but the value of the system is in the fundamental principles and disciplines it espouses. Small projects can benefit from its concepts without the extensive server-based collaboration tools IBM sells to the enterprise. Indeed, the first iteration of team design is to identify the subset of RUP concepts and tools that are applicable to the project. Our interpretation of the RUP principles in this context is explained in our Project Management Document.

For a project our size, the value is that RUP supplies the vocabulary, schedule and milestones. Accounting and task management software are overkill for a project that can fit its team in one room every week. The biggest lesson learned from this process is that project management software is not only an excessive burden to a small team project, but also creates odious roles for those managing the reporting. The decision to discontinue Omniplan is covered in the risk mitigation document.

Rather than recursively dissect projects into subprojects, we are focusing on the work product, and building collaborative documents. Rational tools like ClearCase and ClearView enforce extreme rigor to produce communication documents that can now easily be built in web authoring and blogging software. We have found that email threads and wikis are completely sufficient for our requirements of defining software specifications. The FlowDx project plan is a complex network of interlinking specifications that relies on cyclic graphs, instead of the top-down tree structure used in traditional project software. This better models the iterative nature of the project lifecycle, and handles validation with more parallelism and redundancy.

Almost every informatics project has a similar hub-and-spoke architecture, centered around a database-driven document repository. Ours is done in MySQL, using Tomcat modules to employ tools written in Java. Various pieces of Ruby push the data through the pipeline. PHP is responsible for the several web interfaces. It is a pretty vanilla architecture, but easy to deploy, and highly extensible as we refine the content management system.

Prior to building the repository, we surveyed several other options in the repository design process. Biotrue was an option during our consideration, but the company has since become defunct. LabKey has an existing repository that knows how to store flow cytometry data, but it contains a myriad of proteomics tools and other modules which increase complexity and remove the flexibility for us to tweak it as we go. It is overkill for our needs. The ICTEB protein classification database had attributes we like for constructing experimental datasets to pass to algorithms in high volume. It remains the exemplar for some of the report generation interface. Cytobank seemed a potential choice, but the political aspects of working with that particular lab ultimately led us to the decision to roll our own repository. We have come to the conclusion that the sort of informatics experiments we intend to perform are sufficiently different from other uses of flow data, specifically in the creation of intermediate work product (population masks and consensus files) from raw files, that they warranted a fresh start. Content management systems have become so ubiquitous that we did not have any problem constructing the first repository, and subsequent iterations are well-defined. Our raw storage requirements are still very low, and maintaining good performance is not expected to be a problem. The visitor entrance to the repository is found at project home.

We use FlowJo both as a server-side processing tool to work some of the calculations and as a way to support user interfaces, by generating special FlowJo workspaces. These workspaces contain both instructions for the user to complete tasks, and a special XML return address, so that when the user closes or saves the workspace, it is returned to the database instead of written to the file system. Other tools include utilities to export populations, to compare an export file to the original FCS files to build a popmask, to combine several popmasks by averaging their mask values, and to display the consensus rating as a parameter in FlowJo and gate on it. A description of the various steps is covered in the Workflow Documents, and the static tools are discussed in the Tools page.
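The popmask and consensus steps described above can be illustrated with a minimal Python sketch. It is not the project's actual tooling: it assumes that a popmask is one 0/1 flag per event in the original FCS file, that exported events can be matched to original events by exact value, and that the consensus rating is the plain mean of the flags across popmasks. The function names `build_popmask` and `consensus` are hypothetical.

```python
from collections import Counter


def build_popmask(original_events, exported_events):
    """Return one 0/1 flag per original event.

    Assumption: an event is a tuple of parameter values, and an original
    event belongs to the population if an identical tuple is present in
    the export. A Counter is used so that duplicate events are each
    matched at most once.
    """
    remaining = Counter(exported_events)
    mask = []
    for event in original_events:
        if remaining[event] > 0:
            remaining[event] -= 1
            mask.append(1)
        else:
            mask.append(0)
    return mask


def consensus(popmasks):
    """Average several popmasks event by event.

    Assumption: every popmask was built against the same original file,
    so all masks have the same length and event order.
    """
    if not popmasks:
        raise ValueError("at least one popmask is required")
    length = len(popmasks[0])
    if any(len(mask) != length for mask in popmasks):
        raise ValueError("all popmasks must cover the same events")
    return [sum(flags) / len(popmasks) for flags in zip(*popmasks)]


if __name__ == "__main__":
    # Hypothetical two-parameter events standing in for FCS data.
    original = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
    analyst_a = build_popmask(original, [(1.0, 2.0), (3.0, 4.0)])
    analyst_b = build_popmask(original, [(1.0, 2.0)])
    print(consensus([analyst_a, analyst_b]))
```

Under these assumptions, each consensus value is the fraction of popmasks that include the event, which is what would make it usable as an extra per-event parameter to gate on.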

Much of this preliminary period was spent learning the Rational Unified Process, planning the administration of the project, and performing some very laborious manual analyses. The need for a smart repository/analysis tool was discovered during the first grant phase. The design and implementation of this unique tool has postponed the large-scale research and analysis.

4.1 Project Summary
4.2 Use Cases
4.2.1 Synthetic
4.2.2 GvHD
4.2.3 SIV
4.3 Algorithms
4.4 Metrics
4.5 Infrastructure