3.6.1 Preliminary Studies - Progress Report
For more detailed information, see Results Index and Discussion.
Phase I Grant began August 1, 2008 and ended Sept. 30, 2009.
| Key Personnel | Title | Dates of Service | Hours |
|---|---|---|---|
| Adam Treister | PI | 08/01/2008 - 09/30/2009 | 292.5 |
| Jay Almarode | Chief Engineer | " | 81 |
| Francis Bull | Software Engineer | " | 74.5 |
| Tom Franks | Grant Manager | " | 715.33 |
| Aaron Hart | Application Scientist | " | 92.6 |
| John Quinn PhD | Application Scientist | " | 87.25 |
| Maciej Simm | Application Scientist | " | 88.5 |
| Jill Schoenfeld | Project Manager | 02/02/2009 - 09/30/2009 | 826.8 |
| Jennifer Hanson | Quality Assurance | 06/15/2009 - 09/30/2009 | 14.75 |
| Ryan Minihan | Information Technology | 07/01/2009 - 09/30/2009 | 37.75 |
Figure 1: Distribution of Phase I grant funding by Discipline during the Elaboration phase.
Curves scaled independently: Project Management consumed 80% of project expenses.
Effort on this project has increased since the grant work started. We currently have 4.0 FTEs working on FlowDx.
Tree Star has existing software that serves a large portion of research flow cytometry, but now seeks to reinvent the software for a new market with more rigorous procedural requirements. The translation from lab to hospital demands stricter quality assurance and a higher level of integrated validation than we have used before. FlowDx is more about the process than the product.
The consensus framework for best practices in software engineering is known as the Rational Unified Process (RUP), which provides the fundamental principles and disciplines of iterative development. The biggest risk of RUP is being overwhelmed by the administration. The FlowDx project team has selected the subset of RUP requirements that apply to our project and has created 50 documents that compose the project plan, all maintained online and editable by the team. This grant application is built from excerpts of our comprehensive Project Plan, available at FlowDx.com. We recognize that the reviewer cannot see the depth of the information without hypertext, and that this document is a partial glimpse of a larger knowledge base.
Summary of Phase I Specific Aims
- Identify the procedural requirements and algorithms for the high-throughput, automated analysis of clinical specimens using our Use Case data (GvHD, SIV, and Synthetic Data)
- Design software tools for gating based on published theoretical methods of automated analysis (Magnetic Gating, PBCA, ANN, SVM)
- Test the gating algorithms on the Use Case Data and evaluate using Match Ratio
Synthetic data allows us to model the workflow of our analysis using known results. It has two key advantages over real data when characterizing algorithms: 1) the correct classification of each event is known; and 2) the data can model specific confounding factors that particular algorithms should be able to handle with a prescribed degree of accuracy. Matching particular data sets to algorithms that are well suited to the characteristics of that data should improve the quality of the resulting classifications. Aaron Hart has developed a flow cytometry data simulator, flowsim, which generates synthetic FCS files containing populations of events generated by mathematical models. Arbitrary numbers of populations, dimensions, and distributions can be combined to generate a very diverse collection of synthetic data files.
A single-color antibody titration was modeled to create a set of 25 synthetic FCS files, consisting of five sets of five replicates. Each replicate contained two populations of 5,000 events each. Seven different supervised classifiers, all SVMs and ANNs, were used to process this data following the procedures described in the Classifier Report. The classifiers were trained using the first of the five replicates per group. The number of misclassified events, averaged over four replicates and seven classifiers, is listed in Figure 3. All classifiers were able to correctly identify over 99% of the events up to the point where the data began to lack two distinct populations (population centroids separated by 2.5 standard deviations), at which point they were able to correctly identify greater than 85% of the events. When the populations were completely overlapped and inseparable, the correct classification rate went to 50%, the equivalent of random selection, as expected.

Figure 2. Misclassification rate of the synthetic data set

Figure 3. Misclassification rate of the synthetic data set
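The error rates quoted above can be sanity-checked with a small simulation. The sketch below is not flowsim and does not use the SVM or ANN classifiers from the Classifier Report. It assumes two one-dimensional Gaussian populations with unit standard deviation and a hypothetical midpoint-threshold classifier, and all function names are illustrative. Under those assumptions the theoretical misclassification rate at a centroid separation of d standard deviations is Φ(-d/2): about 10.6% at d = 2.5, which is consistent with the greater-than-85% figure reported above, and 50% at d = 0.

```python
import math
import random


def simulate_two_populations(separation_sd, events_per_population=5000, seed=0):
    """Return (values, labels) for two unit-variance Gaussian populations.

    separation_sd is the distance between the two centroids, in standard
    deviations. This is an illustrative stand-in, not the flowsim model.
    """
    rng = random.Random(seed)
    values, labels = [], []
    for label, centre in ((0, 0.0), (1, float(separation_sd))):
        for _ in range(events_per_population):
            values.append(rng.gauss(centre, 1.0))
            labels.append(label)
    return values, labels


def midpoint_misclassification_rate(values, labels, separation_sd):
    """Classify each event by which side of the midpoint it falls on."""
    threshold = separation_sd / 2.0
    errors = sum(
        1 for v, y in zip(values, labels) if (1 if v > threshold else 0) != y
    )
    return errors / len(values)


def theoretical_rate(separation_sd):
    """Phi(-d/2) for two equal-sized, unit-variance Gaussian populations."""
    return 0.5 * math.erfc(separation_sd / (2.0 * math.sqrt(2.0)))


for d in (5.0, 2.5, 0.0):
    values, labels = simulate_two_populations(d)
    observed = midpoint_misclassification_rate(values, labels, d)
    print(f"separation={d} SD  observed={observed:.3f}  "
          f"expected={theoretical_rate(d):.3f}")
```

Because the true label of every simulated event is known, the observed rate can be compared directly with the expected rate, which is the first advantage of synthetic data described above.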
Acute graft versus host disease (GvHD) is one of the most significant clinical problems in allogeneic blood and bone marrow transplantation. [45] The potential now exists for combining the advantages of flow cytometry with the power of modern bioinformatics and statistical techniques to determine if there are patterns of cells in the peripheral blood that correlate with a variety of physiologic or disease states, including GvHD. Because GvHD is mediated by donor T-cells and other immune effector cells, lymphocyte populations detectable by flow cytometry can predict development of GvHD.
The data set consists of 31 runs (patients), each with 8-18 time-points representing PBMCs drawn over a time-course and frozen for sample prep with acquisition done on the same day. Each time-point has a panel of 10 tubes prepared with different reagents shown in the following table.

For the first iteration of the analysis of this dataset, Tree Star chose to evaluate a biologically relevant subset of data, panel 2 FCS files from the first patient, and we chose CD3+/CD4+/CD8b+ as the only target population because it was shown to be predictive of acquired GvHD [45]. This decision is in line with the clear goal of having FlowDx provide an automated solution for clearly defined clinical flow assays with very specific target populations of interest. Use of algorithms for exploratory flow cytometric research is not one of the aims of this project. Our first goal was to have Tree Star application scientists perform the gating on this set of files and look at the inter-gater variability.

Figure 4. Overlay of CD3+/CD4+/CD8b+ T-cells gated by several analysts. There is a large difference between the number of cells selected by the contributing lab and by the external analysts.
There is large variability (134 ± 171) in the number of cells selected by the contributing lab and the external analysts. This is most likely due to a lack of clear guidelines for positioning gates within the data analysis and to very imprecise gating procedure instructions (SOP). This will be overcome by confirming the gating strategy with Maura Gasparetto, who did the initial gating. Once Tree Star streamlines the workflow and resolves the consensus gating issues, we will proceed to classify this data using all the algorithms (ANN, SVM, Magnetic, and Probability Bin Clustering). Functional Linear Discriminant Analysis (FLDA) was used to show the biological relevance of the CD3+/CD4+/CD8b+ population as described by Ryan Brinkman, et al. [45]. We will work with Ryan's group to apply this analysis.
Simian Immunodeficiency Virus (SIV) is commonly studied as a mechanism to learn more about HIV and AIDS. This study is looking at whether immunization with a peptide will increase cytokine response in vitro. IFN, IL-2, & TNF were tracked in CD4+ and CD8+ populations, for six populations of interest.
See SIV Use Case Experiment Description for more details.

Figure 5. CD4+ T-cell IL-2 population gated by 7 different trained scientists from one flow sample shows the gating variability between individual people.

Figure 6. Comparison of the variability of manual analysis, based on level of experience and training. Shown are the % of CD4+ T-cells with IL-2 responses by individual analyst. Experts are Tree Star application scientists. Interns are high-school students trained to find these populations, before and after an improved SOP for the gating.
Aim 2. Describe and evaluate the five published methods of automated cluster analysis:
Subtractive Clustering was removed for administrative reasons; namely, patent restrictions and communication problems prevented us from implementing it.
Magnetic Gates move to accommodate the maximum number of events by relocating the gate coordinates to the location of the highest density of events. Preliminary results showed some clear limitations of this supervised classifier, especially the tendency to favor large clusters over small ones, thus choosing statistical significance over biological significance. This is being resolved with referential gating, which can set the position of the gate based on statistical information derived from other gates (control populations).
Probability Bin Clustering Analysis (PBCA) is a method for population identification using Chi-squared analysis, simplified by an innovative binning strategy that creates multi-dimensional spaces by progressive histogram splitting to derive a decision tree [2]. Preliminary results confirmed clear limitations of this unsupervised classifier, especially its tendency to favor large clusters over small ones. We do not have the ability to constrain the algorithm to specific sub-population calculations in any automated way.
Maciej Simm evaluated a subset of the SIV Use Case Data to find CD4+ T-cells and CD8+ T-cells with induced cytokine responses using FlowJo's probability clustering tool. It was impossible to find the precise subsets, even with extensive manual guidance. Cytokine clusters are outliers, whereas FlowJo's binning tool defines clusters based on event density. In samples where cytokine clusters are absent (negative control), PBCA does not recognize the context of the tube, so it fails to base its analysis on the control and analyzes each tube individually. Using Fluorescence Minus One (FMO) controls in the analysis would provide better modeling here. PBCA had difficulty analyzing the data in one pass because this use case has too many parameters. Many biologically irrelevant clusters were formed, and time was required to determine which clusters needed to be merged. This process (and its errors) varied among tubes, so there is no straightforward way to automate it.
Artificial Neural Networks (ANNs) are traditional pattern recognition frameworks that use weighted directed graphs to map input patterns to an output classification [21]. For use in flow cytometry, ANNs can be taught to recognize cell populations in training data and can then be applied to identify these populations in test data. There are many types of ANNs with countless variations. John Quinn has tested a set of ANNs that represent the major categories using two platforms for implementation, WEKA and MATLAB.
Waikato Environment for Knowledge Analysis (WEKA) is an open suite of machine learning tools that were employed initially to test both the platform and the success of ANNs on flow data. The result of this evaluation was that a set of ANNs classified the SIV data with greater than 85% success using ten-fold cross-validation as a metric for evaluation. The WEKA environment was found to be limiting and was not used further, but the success rate despite the limitations encouraged further study of ANNs using MATLAB. The same MATLAB procedures were used for both ANNs and SVMs, and the outcomes will be discussed together.
Support Vector Machines (SVMs) are a class of regularized multivariate classification models that are widely used for predictive modeling of multidimensional data. Non-linear boundary problems are addressed using support vector machines by including a basis function that maps the input data into a transformed space that allows a linear discriminant to separate the classes.
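The role of the basis function can be illustrated with a toy example. The sketch below is not an SVM and is not the MATLAB implementation used in this work: the quadratic mapping, the hand-picked weights, and the sample values are all assumptions chosen for illustration. It shows one-dimensional data that no single threshold can separate becoming linearly separable after mapping x to (x, x²).

```python
def quadratic_basis(x):
    """Map a 1-D input to the 2-D feature vector (x, x**2)."""
    return (x, x * x)


def linear_discriminant(features, weights=(0.0, -1.0), bias=1.0):
    """Return 1 if w . phi(x) + b > 0, else 0.

    The weights are hand-picked for illustration; an SVM would learn
    them from training data.
    """
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0


# The positive class sits between two groups of negatives, so no single
# threshold on x separates the classes in the original space.
samples = [(-2.0, 0), (-1.5, 0), (-0.5, 1), (0.0, 1), (0.4, 1), (1.6, 0), (2.2, 0)]

predictions = [linear_discriminant(quadratic_basis(x)) for x, _ in samples]
print(predictions)
```

In the transformed space the decision rule is a straight line (x² = 1), even though in the original space it corresponds to an interval.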
Using MATLAB, several experiments were performed for calibration and proof of concept. In this implementation we considered five ANN variants: a traditional feed forward network, a feed forward radial basis function network, a competitive network (LVQ), a probabilistic network, and a cascade forward network. We used two SVMs, one using a polynomial basis function, and the other a radial basis function. Complete descriptions of all classifiers are available in the classifier report. Initially all seven classifiers were used to process the synthetic data to demonstrate that classifying distinct populations was a trivial task to classifiers and to associate error rates with expected difficulty of classification.
Calibration experiments were performed using the SIV data and lymphocyte identification as a metric. In these experiments appropriate sizes for training vectors, design of training data, and degrees for the polynomial basis function were determined. One thousand event training vectors were selected for stability of result. Training data was created by sampling from multiple samples assigned to be controls, based on improved results compared to using a single training sample. All data is available in the svm report and classifier report.
Positive training events were determined using the average score among experts as a metric. Events with scores of 1.000 were events considered positive by all experts and were used as exemplars. Events with average scores of less than 0.5 were considered negative. Using only events that were universally classified as negative was rejected, as it was experimentally observed that the training data set became over-simplified and the resulting decision boundaries were unintuitive.
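The selection rule above can be sketched as follows. This is a minimal illustration, not the MATLAB procedure itself: the function names and the 0/1 call encoding are assumptions, and the sketch assumes that events scoring from 0.5 up to (but not including) 1.0 are simply left out of the training set, which the text does not state explicitly.

```python
def average_expert_scores(expert_calls):
    """Fraction of experts who called each event positive.

    expert_calls is a list with one entry per expert; each entry is a
    list of 0/1 calls, one per event (hypothetical encoding).
    """
    n_experts = len(expert_calls)
    n_events = len(expert_calls[0])
    return [
        sum(calls[i] for calls in expert_calls) / n_experts
        for i in range(n_events)
    ]


def select_training_events(expert_calls):
    """Return (positive_indices, negative_indices).

    Positive exemplars: included by every expert (average score 1.0).
    Negatives: average score below 0.5.
    Integer counts are compared to avoid floating-point equality tests.
    """
    n_experts = len(expert_calls)
    n_events = len(expert_calls[0])
    positives, negatives = [], []
    for i in range(n_events):
        votes = sum(calls[i] for calls in expert_calls)
        if votes == n_experts:
            positives.append(i)
        elif 2 * votes < n_experts:
            negatives.append(i)
    return positives, negatives


calls = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
]
print(average_expert_scores(calls))
print(select_training_events(calls))
```

Note that the negative set includes events that a minority of experts called positive, which matches the decision not to restrict negatives to universally rejected events.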
The SVM and ANN classifiers were then used to identify the six populations at the bottom of the gating hierarchy, the populations expressing either CD4 or CD8 along with one of three cytokines. Match ratio was used to evaluate the results. Most classifiers were able to achieve an average match ratio exceeding 0.9.

Figure 7: Variance among multiple classification algorithms applied to SIV data.
We also looked at alternative methods of evaluating the quality of various classification results and at the match ratio metric itself. The metrics are especially useful for identifying which events are most difficult to classify. We used the match ratio to identify those events. Then, for each user and each automated classifier, we identified the events classified as positive and negative within the difficult-to-classify subset and created an MFI-based profile of the events of each class. We then created a profile of the universally agreed-upon positives and compared profiles to see whether, for example, the set of positive events that a user or automated classifier defined within the difficult-to-classify subset matched the profile of the universally agreed-upon positives. This analysis showed that the automated classifiers match the patterns most closely. Figure 8 displays the improvement in minimizing the centroid distance between the universally classified events and the disputed events for the classifiers versus the experts, averaged over several classifiers and twelve samples.
| % Improvement of classifiers compared to expert | |
|---|---|
| Robust Mean MFI | 38 ± 11 % |
| Robust Mean Stdev | 9 ± 8 % |
Figure 8: Averages for three pattern recognition tools and 12 data files for comparison of MFI, both by median and standard deviation, for pattern recognition compared to experts.
Both ANN and SVM can be evaluated through MATLAB or R scripts, whereas PBCA and magnetic gates will be calculated by FlowJo.
Aim 3. Test the gating algorithms on the Use Case Data and evaluate using the Match Ratio.
The Match Ratio compares individual results to a consensus of a group: the closer the match ratio is to 1.0, the closer a gated population is to the consensus. To calculate the match ratio, cells in the classified samples are given a weighted score based on the frequency with which they are included by the analysts in the consensus group. A single analyst’s score is compared with the cumulative score of the consensus group, and the match ratio is calculated from the sum of the positive and negative events in agreement with the consensus.
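The description above leaves the exact weighting open, so the following is only one plausible reading, not the FlowDx implementation. It assumes that each event's consensus score is the fraction of analysts who included it, and that an analyst earns that score for each event they included and one minus that score for each event they excluded. The function names and the 0/1 encoding are hypothetical.

```python
def consensus_scores(analyst_calls):
    """Fraction of analysts who included each event (assumed weighting)."""
    n_analysts = len(analyst_calls)
    n_events = len(analyst_calls[0])
    return [
        sum(calls[i] for calls in analyst_calls) / n_analysts
        for i in range(n_events)
    ]


def match_ratio(calls, scores):
    """Weighted agreement between one analyst and the consensus.

    An included event contributes its consensus score; an excluded event
    contributes one minus its consensus score. The result is 1.0 only
    when the analyst agrees with a unanimous consensus on every event.
    """
    agreement = sum(s if c else 1.0 - s for c, s in zip(calls, scores))
    return agreement / len(calls)


analysts = {
    "A": [1, 1, 0, 0],
    "B": [1, 1, 0, 0],
    "C": [1, 0, 1, 0],
}
scores = consensus_scores(list(analysts.values()))
ratios = {name: match_ratio(calls, scores) for name, calls in analysts.items()}
print(ratios)
```

In this toy example analysts A and B, who agree with each other, score higher than analyst C, who disagrees with the majority on two of the four events.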
This aim has been expanded to describe, evaluate, and implement the five metrics found in this iteration of research (Match Ratio, Mallows Distance [49], Receiver Operating Characteristics (ROC) [15], Misclassification Rate [50], and V-Measure [48]). Having a modular structure will enable adding newly described comparison metrics, as well as the combination of metrics.
Figure 9: One GvHD target population gated by eight experts, compared against the consensus, showing a reasonable level of agreement between these three methods.
Figure 9 shows that the results of each metric agree qualitatively; however, individual metrics may correlate better with different quantitative results, such as robustness, specificity, sensitivity, and accuracy.
Publications and Presentations
This is an open-source project. All information is available to the public as it evolves, through FlowDx.com. We anticipate that publications will result through the use of this methodology, but that they will be comparisons of specific algorithms. The goal of this project is to become a validation mechanism for future innovative classifiers.
The Cytometry Development Workshop at Asilomar, CA, October 2009 was the most recent presentation of these concepts. Presentations were given on Gating-ML, ACS, Synthetic data development, and the repository architecture. This annual meeting is the venue for standards development in cytometry and this project is the first implementation of collaboration through these standards.
Match Ratio was presented as an evaluation metric to the Flow Informatics and Computational Cytometry Society (FICCS) in Seattle, March 2009. [10] This meeting was devoted to sharing approaches to automated clustering in flow cytometry, but we were the only ones talking about how to measure the fit.
Flow Cytometry: Critical Assessment of Population Identification (FlowCAP). [9] Tree Star, as part of FICCS, is contributing to this open project to compare approaches to automated clustering. This project will provide new data sets for extending FlowDx.
Benefits from the Phase I grant process
- Tree Star gained additional staff, leading to broadened capabilities in IT, project management, research, and software development. Very importantly, Tree Star added a dedicated Quality Control/Quality Assurance person for developing validation and verification tests and procedures.
- Increased collaboration with academia has generated fruitful partnerships with bioinformatics [7] and statistical experts [8,9], which led to new ideas in the areas of classification comparison and evaluation metrics.
- A summer intern program was created in response to the American Recovery and Reinvestment Act of 2009. This concentrated our attention on process and preparation of analysis SOPs. Students created a high school science curriculum for teaching cytometry.
- The use of more rigorous project management documentation is teaching us new techniques of planning and design that can be used in all projects.
No completed commercial products have yet resulted from SBIR funding, but we are early in the development cycle. We aim to have Initial Operational Capability of FlowDx at the end of Phase II funding, but this corresponds to the start of beta testing, not release.