2.1.6.2.3 Construction Phase Iteration plan


Iteration Plan for all Phases

The Construction iteration plan has two distinct parts, separated by a milestone at Functional Repository. At that point, the workflow can be sufficiently scripted to run experiments within our scope repeatedly and without human triggering. This is followed by a milestone showing Experimental Results and Initial Operational Capability. The Construction Phase ends when Product Development is completed and the 2.1.5.3 Initial Operational Capability Milestone (IOC) has been met.

Repository

A MySQL database is running and interfaces with Tomcat executing FlowJo, but a couple of iterations are still needed to achieve the required functionality. Decisions need to be made on deletion, archival, or maintenance of temporary files created in the workflow. Prototype implementations have relied on batch scripts. Additional software is needed to interface with external data sources and services.

Utilities

Utilities are implemented but need refinement, more testing, documentation, and a way to pipe them together. The design of the project calls for several specific tools:

  • utilities to access and analyze selected populations
  • utility to create consensus gates, merging classifications with confidence weighting
  • utilities to apply classification algorithms
  • utility to evaluate automated output compared to manual gating
  • interface to see results
  • interface to administer
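
As an illustration of the consensus utility listed above, the following is a minimal sketch of a confidence-weighted vote over per-event classifications. The function name, the gater names, and the weights are hypothetical; the plan does not specify how confidence weights are assigned, so they are simply passed in here.

```python
from collections import defaultdict


def weighted_consensus(labels_by_gater, weights):
    """Return one consensus label per event by confidence-weighted vote.

    labels_by_gater: dict of gater name -> list of per-event labels.
    weights: dict of gater name -> non-negative weight (assumed input).
    Ties go to the label that sorts first, so the result is deterministic.
    """
    n_events = len(next(iter(labels_by_gater.values())))
    if any(len(v) != n_events for v in labels_by_gater.values()):
        raise ValueError("all gaters must label the same events")
    consensus = []
    for i in range(n_events):
        votes = defaultdict(float)
        for gater, labels in labels_by_gater.items():
            votes[labels[i]] += weights[gater]
        # Highest total weight wins; the label itself breaks ties.
        best = min(votes, key=lambda lab: (-votes[lab], lab))
        consensus.append(best)
    return consensus


# Hypothetical example: three gaters, four events.
example = {
    "expert_a": ["CD4", "CD4", "other", "CD8"],
    "expert_b": ["CD4", "other", "other", "CD8"],
    "expert_c": ["CD8", "other", "CD4", "CD8"],
}
example_weights = {"expert_a": 1.0, "expert_b": 1.0, "expert_c": 0.5}
print(weighted_consensus(example, example_weights))
# ['CD4', 'other', 'other', 'CD8']
```

A per-event vote is only one possible reading of "consensus gates"; a geometric merge of gate boundaries would need a different approach.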

Workflow

Preliminary experience quickly showed that comparing typically sized flow cytometry experiments in spreadsheets was not feasible and that a central database was required. Functionality within FlowJo enables us to prototype the analysis, but the procedure does not support the R scripts or Matlab analyses that we recognize as useful.

Iterative development of the Workflow Document is ongoing through the first part of Construction. The document details the processing of the FCS files, classification of events through manual or algorithmic means, creation of popmask and tally files, comparison to consensus via the metrics, and reporting of results. Additionally, the Workflow Document provides the links to the QA and Documentation teams.
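
The classification-to-tally step of this workflow can be sketched as follows. The actual popmask and tally file formats are not described in this plan, so this sketch assumes a popmask is a 0/1 membership mask over events for one population and a tally is the count of member events per population. Both function names are hypothetical.

```python
def build_popmasks(event_labels):
    """Map each population name to a 0/1 membership mask over all events.

    Assumption: one label per event; the real popmask format may differ.
    """
    populations = sorted(set(event_labels))
    return {
        pop: [1 if label == pop else 0 for label in event_labels]
        for pop in populations
    }


def tally(popmasks):
    """Count the member events in each population mask."""
    return {pop: sum(mask) for pop, mask in popmasks.items()}


# Hypothetical per-event classifications for a five-event sample.
labels = ["lymph", "lymph", "debris", "mono", "lymph"]
masks = build_popmasks(labels)
print(masks["lymph"])  # [1, 1, 0, 0, 1]
print(tally(masks))    # {'debris': 1, 'lymph': 3, 'mono': 1}
```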

Milestone: Functional Repository
Automated algorithms require no user interaction beyond selecting the set of FCS data files, the AssayID, and the algorithm used to process the files. The Workflow Document is finalized.

Once the mechanisms are satisfactorily operational, we can continue the evaluation of the algorithms and comparison metrics using the use case data in the second part of the Construction Phase.

Evaluating Human Analysis

Evaluate whether there is low enough variability within the group of experts comprising the consensus to have a valid consensus as agreed upon by the PI, collaborators and the project team.
Work Product: Collaborator, PI, and project team approval of the consensus gates defined by the group of experts manually gating the use case data.

Completing the Gate-a-thon. Have experts manually gate the whole set of use case data for each use case. This will provide the gold standard against which to compare the automated classifications. This step will only be done after the contributing scientists have evaluated the initial set of manually gated results, consensus gates, and automated classifiers and have approved the level of variability and the SOP for gating. It will use the FlowDx repository and database and the utilities required to automatically save FlowJo workspaces, using special FlowDx XML, to the FlowDx repository and create records in the FlowDx database. Potentially, technicians at the contributing labs could do some of the gating to contribute to the consensus.
Work Product: Consensus population files for all samples for all use cases loaded into the repository from >5 experts, one or more of whom should be from the contributing collaborator's lab.

Tuning the Metrics & Supervised Classifiers

Evaluate Algorithmic Analysis using all 5 comparison metrics and biological relevance. Examples of biologically relevant meta-analysis: Linear Spline analysis of the GvHD data set or the time course analysis for the SIV data set. The biological evaluation report will vary with the use case. Iteratively analyze how the combination of metrics may change the sensitivity / accuracy of the measurement of algorithm success.
Work Product: Report of the algorithms' results using all 5 comparison metrics and showing all the values for all samples and all target populations. Report showing the biological evaluation.

Scaling Up Algorithmic Analysis

Take the automated methods and create a way to process the use case data without manual intervention. For ANN and SVM, this will be done by writing R scripts that run the classifiers and submit the results directly to the FlowDx database. For Magnetic Gating and Probability Bin Cluster Analysis, this will be accomplished by setting up GatingML definitions and applying them to the use case data through the FlowJo command line, then submitting the results directly to the FlowDx database.

Iteratively improve the algorithms to find the target populations with more reproducible results. Implement flags for difficult samples that have strange distributions such as debris, wrong compensation, and other Assay QC failures.

Work Product: Filling up the matrix of results showing: use cases vs. classification methods x comparison metrics. Results show which algorithms perform better than others for each use case and each comparison metric.
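
One way to track how full the matrix is can be sketched as below. This is only an illustration, not the FlowDx database schema: the dictionary layout and function names are hypothetical, the metric list contains only the four metrics named elsewhere in this plan, and the recorded value is a placeholder rather than a result.

```python
from itertools import product

# Names taken from this plan; only four metrics are named explicitly.
USE_CASES = ["Synthetic", "GvHD", "SIV"]
METHODS = ["ANN", "SVM", "Magnetic Gating", "PBCA"]
METRICS = ["V measure", "Mallows", "PCA", "Misclassification"]


def empty_matrix(use_cases, methods, metrics):
    """Create one empty cell per (use case, method, metric) combination."""
    return {key: None for key in product(use_cases, methods, metrics)}


def record(matrix, use_case, method, metric, value):
    """Store a metric value, rejecting cells that are not in the matrix."""
    key = (use_case, method, metric)
    if key not in matrix:
        raise KeyError(f"unknown cell: {key}")
    matrix[key] = value


def missing_cells(matrix):
    """List the cells that still have no result."""
    return sorted(key for key, value in matrix.items() if value is None)


matrix = empty_matrix(USE_CASES, METHODS, METRICS)
record(matrix, "GvHD", "SVM", "Misclassification", 0.12)  # placeholder value
print(len(matrix), len(missing_cells(matrix)))  # 48 47
```

Listing the missing cells gives a direct measure of how far an iteration is from completing the matrix.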

Milestone: Experimental Results and Initial Operational Capability

Simplified Goals for Construction Phase


Use Cases
Completing the Gate-a-thon


Algorithms

Scaling up algorithmic analysis
Evaluating Algorithmic Analysis
Improving Algorithmic Analysis


Metrics

Evaluating Human Analysis (Consensus)
Filling up the Matrix of use cases x classification methods for each comparison metric.
Automate the application of metrics.
Run experiments to determine suitability of metric by use case.
Work with collaborators to resolve disagreements and to set rules for handling ambiguous events.
Use metrics to build training sets and to run supervised classification with multiple net architectures.
Simple application to demonstrate measuring differences in clusters and to build training sets
Refined Code for DB tools to apply measurements
Use Case Reports each show the optimal hybridization of metrics.



Several iterations will occur in the Construction Phase of the project.    There will be one overall iteration plan that refers to sub-projects and the individual sub-projects can refer to the iteration plan.  When work products are completed, they should get referenced in the Subproject Methods, results and discussion project plan documents.

These are the sub-projects:

2.1 Project management Discipline
2.2. Use Cases
2.2.1 Synthetic
2.2.2 GvHD
2.2.3 SIV
2.3 Algorithms
2.4 Metrics
2.5 Infrastructure (Requirements, Analysis & Design Disciplines)
2.6 Workflow
2.7 QA (Test Discipline)
2.8 Engineering Disciplines
Business Model (includes the Collaborative Process & Commercialization)
Implementation
Deployment


Iteration #1 will be to build a version of the FlowDx database and repository on a Linux machine, load all the FCS data files into the repository, load all the Gate-a-thon workspaces, create the popmask files, and create the consensus gates.  Load the existing magnetic gating results and PBCA analysis from FlowJo 8.8.6 into the FlowDx database to show the ability to export gating results from other analysis software.  Load results from ANN and SVM into the FlowDx database.  Simultaneously, the Use Case data workspaces from the collaborators will need to have their populations exported from the existing .jo files and loaded into the database.  This will require a utility to load .popmask files from FlowJo Mac workspaces (.jo files) and external

Iteration #1 broken into sub-projects:
2.1 Project management Discipline
Cement new collaborations with Richard Scheuermann (FLOCK), Richard Konz (clinical investigator connection), Ohio State U (Wayne...algorithm through Nick), CytoBank (Nikesh & Gary Nolan).  Continue to track tasks and time spent on the project.  Keep documentation and reports of project progress.  Communicate progress to collaborators, PI and team members through Project Plan documents, Flowdx.com, and the flowdx blog as appropriate.
2.2. Use Cases
Utility to batch export populations from contributors' Mac FlowJo .jo workspaces and load them into the DB and repository
2.2.1 Synthetic
Create data to model different distributions of data as described in the Synthetic Data Discussion
2.2.2 GvHD
Refine the gating SOP with Clay Smith and/or Maura.  Update the remaining Macintosh FlowJo .jo workspaces to trim the excess fcs files and gates away.  This will be required before loading into the database.  Create "seed" workspaces for the entire data set, one workspace for each patient's samples.
2.2.3 SIV
Refine the gating SOP with Michelle Lifton.  All FCS files and previously analyzed workspaces will need to have the primate IDs de-identified. Create "seed" workspaces for all 6 time-points for the next iteration of analysis. 
2.3 Algorithms
Test FJML tethered magnetic gates in Mac version of FlowJo.  Request tethered gates as a feature in Java version of FlowJo. Look at the suggested improvements to PBCA to see if they are worthwhile.  Explain workflow to new algorithm collaborators and build interface to allow their results to be added to the repository/DB.  For Magnetic Gating and Probability Bin Cluster Analysis, set up GatingML definitions and apply them to use case data through FlowJo command line to analyze use case data and submit the results directly to the FlowDx database.
2.4 Metrics
Create utilities for V measure, Mallows, PCA, and Misclassification to interact with the repository/DB
2.5 Infrastructure (Model, Requirements, Analysis & Design Disciplines)
Keep updating the model, requirements, and design of the repository/DB and utilities as required to perform the tasks described in this iteration.  Keep documentation up to date: specs, user manuals, and user instructions.
2.6 Workflow
Update the Workflow document to describe the repository/DB, utilities, and processing of the files. 
2.7 QA (Test Discipline)
QA test reports and documentation of the testing procedures of the repository, database, and utilities.
2.8 Engineering Disciplines
Business Model (includes the Collaborative Process & Commercialization) - hopefully no changes in this iteration.
Implement components interacting with the FlowDx repository/DB: Utilities of all the comparison metrics, interface to external algorithms, importer from Macintosh .jo FlowJo workspaces, user interface for creating meta-analysis and reports.
Deployment - no tasks for this iteration

Goal for Iteration #1: to have all the existing data loaded into the database and to have all the utilities required to load these analyses and create popmask and tally files, as well as the utilities to apply classification algorithms and comparison metrics. Work Products: Working Version of the DB, repository & utilities with Specification, Code, Schema, Location, Contents and Status, URL of interface to see results, and SOPs for use and training documents.  Evaluate the success of this iteration before proceeding to Iteration #2.
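
To make the metric utilities under 2.4 Metrics more concrete, here is a minimal sketch of two of them: the misclassification rate and the V measure (the harmonic mean of homogeneity and completeness, as defined by Rosenberg and Hirschberg). It assumes each classification is available as a list with one population label per event; the function names are hypothetical, and how the real utilities read labels from the repository/DB is not shown.

```python
import math
from collections import Counter


def misclassification_rate(reference, predicted):
    """Fraction of events whose predicted label differs from the reference."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for r, p in zip(reference, predicted) if r != p)
    return wrong / len(reference)


def _entropy(counts, total):
    """Shannon entropy (natural log) of a partition given its group sizes."""
    return -sum((n / total) * math.log(n / total) for n in counts if n)


def _conditional_entropy(joint, given_counts, total):
    """H(X | Y), where joint maps (x, y) -> count and given_counts maps y -> count."""
    return -sum(
        (n / total) * math.log(n / given_counts[y]) for (_, y), n in joint.items()
    )


def v_measure(reference, predicted):
    """Harmonic mean of homogeneity and completeness (beta = 1)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    total = len(reference)
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    h_ref = _entropy(ref_counts.values(), total)
    h_pred = _entropy(pred_counts.values(), total)
    ref_given_pred = _conditional_entropy(
        Counter(zip(reference, predicted)), pred_counts, total
    )
    pred_given_ref = _conditional_entropy(
        Counter(zip(predicted, reference)), ref_counts, total
    )
    homogeneity = 1.0 if h_ref == 0 else 1.0 - ref_given_pred / h_ref
    completeness = 1.0 if h_pred == 0 else 1.0 - pred_given_ref / h_pred
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


consensus = ["a", "a", "b", "b"]
print(misclassification_rate(consensus, ["a", "b", "b", "b"]))  # 0.25
print(v_measure(consensus, ["x", "x", "y", "y"]))               # 1.0
```

The misclassification rate requires both classifications to use the same population names, while the V measure only compares how events are grouped, so it also applies to algorithms that emit unnamed clusters.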

Iteration #2  Evaluate the consensus gates to see if eliminating one or two gaters will minimize variance. Evaluate all the data in the database against the consensus gates using all the comparison metrics on the existing results. Determine how the comparison metrics will be reported: choices are to report as average +/- stdev or as ranges for sample, target population, timepoint and other variables.  The focus of this iteration is to do the meta-analysis of the existing data using the repository and utilities.  During this iteration, there will likely be bumps and refinement of the workflow, DB and utilities.  As of Dec. 2009 it is difficult to predict the details.
2.1 Project management Discipline - Report the results to collaborators and incorporate feedback to the future iterations.  Keep documentation and reports of project progress.   Communication of the progress to collaborators, PI and team members through Project Plan documents, Flowdx.com, and flowdx blog as appropriate.
2.2. Use Cases
2.2.1 Synthetic - data description and files ready for outside groups to investigate.
2.2.2 GvHD - data description and files ready for outside groups to investigate.
2.2.3 SIV - data description and files ready for outside groups to investigate.
2.3 Algorithms- run GvHD through ANNs? or maybe wait til next iteration
2.4 Metrics - run all comparison metrics on all meta-analysis
2.5 Infrastructure (Requirements, Analysis & Design Disciplines) -
2.6 Workflow - process all FCS files and gate-a-thon and ANN/SVM data through the workflow.
2.7 QA (Test Discipline) - keep testing the system.
2.8 Engineering Disciplines
Business Model (includes the Collaborative Process & Commercialization) - update the collaborative process.  Joern Schmitz didn't really want to see all the mathematical information about the comparison metrics during conversation in Nov. 2009.
Deployment - plan to have collaborators have access to the repository/db interface for meta-analysis.
  
Goal for Iteration #2:
 To fill in the matrix of use case data x algorithms x comparison metrics for the existing analysis collected during the elaboration phase of the project.  Each of the algorithms, use cases, metrics and infrastructure (repository, db, project plan & management) will have new documents which will show results of this iteration and plans for iterations #3 and #4.
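
The consensus evaluation in Iteration #2 (reporting average +/- stdev and checking whether eliminating a gater reduces variance) could be sketched as follows. The gater names and values are hypothetical illustrations, not project data, and the sketch assumes one numeric value per gater, such as the frequency of one target population in one sample.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation) for reporting as avg +/- stdev."""
    return mean(values), stdev(values)


def leave_one_out_spread(value_by_gater):
    """For each gater, the stdev of the remaining gaters once that gater is dropped.

    Needs at least three gaters so that two values remain for stdev.
    """
    spread = {}
    for dropped in value_by_gater:
        rest = [v for g, v in value_by_gater.items() if g != dropped]
        spread[dropped] = stdev(rest)
    return spread


# Hypothetical population frequencies (percent) from four gaters.
freqs = {"gater_1": 20.0, "gater_2": 22.0, "gater_3": 21.0, "gater_4": 35.0}
avg, sd = summarize(list(freqs.values()))
print(f"{avg:.1f} +/- {sd:.1f}")

spread = leave_one_out_spread(freqs)
# The gater whose removal leaves the tightest agreement among the rest.
candidate = min(spread, key=spread.get)
print(candidate, spread[candidate])
```

Whether a gater should actually be removed remains a decision for the PI, collaborators and project team; the calculation only shows which removal would reduce the spread the most.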

Iteration #3  Now that all the data is loaded into the repository and all the utilities are functional, it is time to do more manual analysis of the data with FlowJo workspaces, using special FlowDx XML to automatically save workspaces to the FlowDx repository and create records in the FlowDx database. If we can have the contributing labs' technicians do the gating, then we will know the acceptable limits of the consensus.  Run each algorithm/automated classification method against each use case.  Do the meta-analysis comparing each method against the consensus using all the comparison metrics.  The end of this iteration requires a longer stage for report writing, evaluation of results to determine the direction of future iterations, and discussion with collaborators.  The result is a written report that goes out to the collaborators for their approval.
2.1 Project management Discipline - Keep documentation and reports of project progress.  Communicate progress to collaborators, PI and team members through Project Plan documents, Flowdx.com, and the flowdx blog as appropriate.  Make sure documentation is acceptable for the FDA Premarket Approval process.
2.2. Use Cases
2.2.1 Synthetic
2.2.2 GvHD
2.2.3 SIV
2.3 Algorithms - Incorporate results from outside algorithms
2.4 Metrics - Use all defined in last iteration.  Evaluate new metrics.  Look at which metrics perform better to evaluate specificity vs. sensitivity.
2.5 Infrastructure (Requirements, Analysis & Design Disciplines) - keep refining if required.
2.6 Workflow
2.7 QA (Test Discipline)
2.8 Engineering Disciplines
Business Model (includes the Collaborative Process & Commercialization)
Implementation
Deployment
Goal for Iteration #3: To fill in the matrix of use case data x algorithms x comparison metrics for the existing analysis collected during the elaboration phase of the project.  Each of the algorithms, use cases, metrics and infrastructure (repository, db, project plan & management) will have new documents which will show results of this iteration and plans for iterations #4 and #5. Look at how each algorithm is performing on each use case.  Evaluate the algorithms and their success at finding the target populations. Identify which ones have low values from the comparison metrics and determine why.


Iteration #4  Refine the algorithms on each use case.  Iteratively improve the algorithms to find the target populations with more reproducible results. Implement flags for difficult samples that have strange distributions such as debris, wrong compensation, and other Assay QC failures.  The goal at the end of this iteration is to have clear guidelines for the next iteration.
Initial Operational Capability Milestone

The last iteration concludes with the following:
1. Software and supporting documentation are acceptable to deploy.
2. Stakeholders (and the business) are ready for the system to be deployed.
3. Risks and problems are continuing to be managed effectively.
4. Current expenditures are acceptable and reasonable estimates have been made for future costs and schedules
5. Detailed iteration plans for the first few Transition iterations, as well as a high-level project plan, are in place
6. Collaborative Process Documentation for Clients.
7. Hopefully some type of publication with one or more collaborators.


Figure 1 Example of Iteration of the FlowDx Project.
Each iteration will look essentially the same but will have a better understanding of the requirements and a more complete set of results for all use cases and algorithms.