3.2.1 Synthetic Data Results

See also: 2.2.1 Synthetic Data Description and 4.2.1 Synthetic Data Discussion.

Synthetic data sets created with the Python script Flowsim were analyzed in three ways: by manual gating, with FlowJo's Cluster Analysis Platform on the Macintosh, and with artificial neural networks (ANNs).

The first data set consisted of a single FCS file and a prototype analysis comparing manual and automated gating, as shown in Figure 1.


Figure 1: Three overlays demonstrating the type of results that can be obtained with Flowsim data. Each plot shows the total contents of a simulated file in black and a population of interest in gray. From left to right, the population is identified by: A) the correct classification, B) a cluster analysis, and C) a manual gating strategy. Although cluster analysis and manual gating give results that are very close in magnitude, manual gating matches the correct classification more closely. This is because this particular clustering algorithm struggles to identify populations that are not well resolved in at least one dimension, a common occurrence in flow cytometry because of improper compensation settings.
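Because Flowsim data come with a known correct classification (panel A), any gate or cluster assignment can be scored event by event. The minimal, standard-library-only sketch below illustrates this. The rectangular gate, the event values, and the names `rectangular_gate` and `score_against_truth` are hypothetical. They do not reflect Flowsim's or FlowJo's actual interfaces, and this is not necessarily how the Match Ratio in Figure 2 is computed.

```python
# Hypothetical sketch: score a population identification (e.g. a manual
# gate or a cluster assignment) against the known simulated ground truth.
# Event values, gate bounds, and labels below are illustrative only.
from typing import Dict, List, Tuple

Event = Tuple[float, float]  # (parameter 1, parameter 2) for one event


def rectangular_gate(events: List[Event], x_range: Tuple[float, float],
                     y_range: Tuple[float, float]) -> List[bool]:
    """Return True for each event that falls inside a rectangular gate."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return [x_lo <= x <= x_hi and y_lo <= y <= y_hi for x, y in events]


def score_against_truth(predicted: List[bool], truth: List[bool]) -> Dict[str, int]:
    """Count agreement between a predicted population and the true one."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for p, t in zip(predicted, truth):
        if p and t:
            counts["tp"] += 1   # correctly included in the population
        elif p and not t:
            counts["fp"] += 1   # wrongly included
        elif t:
            counts["fn"] += 1   # missed member of the population
        else:
            counts["tn"] += 1   # correctly excluded
    return counts


events = [(1.0, 1.0), (1.2, 0.9), (3.0, 3.1), (2.9, 3.3), (2.1, 2.0)]
truth = [False, False, True, True, True]   # known from the simulation
gated = rectangular_gate(events, (2.5, 4.0), (2.5, 4.0))
print(score_against_truth(gated, truth))
# {'tp': 2, 'fp': 0, 'fn': 1, 'tn': 2}
```

In this toy case the gate misses the last true event, which lies outside the gate in both dimensions. An imperfect population identification produces exactly this kind of false negative.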

The 25-file synthetic data set modeling a single-color antibody titration was evaluated by manual gating, as shown in Figure 2.


Figure 2. Five levels of simulated fluorescence intensity were modeled, with five replicates at each level, for a total of 25 simulated files. The figure shows the manual gating and the range of Match Ratio values obtained across the set.

Artificial Neural Networks (ANN) Results
For the synthetic data sets, the first of the five replicates in each group was used to train the classifiers, which were then applied to the remainder of the set. The synthetic data were processed by seven classifiers in total, as described in the Automated Classifier Report. The average number of misclassified events per set is displayed in Figure 3 and listed in Table 1. All classifiers correctly identified more than 99% of events until the data no longer contained two distinct populations (population centroids separated by 2.5 standard deviations). At that point they still correctly identified more than 85% of events. When the populations overlapped completely and were inseparable, the correct classification rate fell to 50%, equivalent to random assignment, as expected.
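The training protocol (first replicate for training, remaining replicates for testing) can be illustrated with a minimal sketch. It assumes one-dimensional data, two equal-sized, unit-variance Gaussian populations, and a nearest-centroid classifier. `simulate_replicate`, `run_group`, and the classifier are hypothetical stand-ins, not Flowsim's API or any of the seven classifiers evaluated here.

```python
# Hypothetical sketch of the "train on replicate 1, test on the rest"
# protocol, using 1-D two-population data and a nearest-centroid classifier.
# The simulator and classifier are illustrative stand-ins only.
import random
from statistics import mean
from typing import List, Tuple

Labeled = List[Tuple[float, int]]  # (fluorescence value, true population 0/1)


def simulate_replicate(separation_sd: float, n_per_pop: int,
                       rng: random.Random) -> Labeled:
    """Two unit-variance Gaussian populations whose centroids are
    `separation_sd` standard deviations apart."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_pop)]
    data += [(rng.gauss(separation_sd, 1.0), 1) for _ in range(n_per_pop)]
    return data


def train_nearest_centroid(train: Labeled) -> Tuple[float, float]:
    """Estimate one centroid per population from the training replicate."""
    c0 = mean(x for x, y in train if y == 0)
    c1 = mean(x for x, y in train if y == 1)
    return c0, c1


def misclassification_rate(model: Tuple[float, float], test: Labeled) -> float:
    """Fraction of events assigned to the wrong population."""
    c0, c1 = model
    # Predict population 1 when an event is closer to c1 than to c0.
    wrong = sum((abs(x - c1) < abs(x - c0)) != bool(y) for x, y in test)
    return wrong / len(test)


def run_group(separation_sd: float, n_replicates: int = 5,
              n_per_pop: int = 2000, seed: int = 0) -> float:
    """Train on the first replicate; return the mean misclassification
    rate over the remaining replicates."""
    rng = random.Random(seed)
    reps = [simulate_replicate(separation_sd, n_per_pop, rng)
            for _ in range(n_replicates)]
    model = train_nearest_centroid(reps[0])
    return mean(misclassification_rate(model, r) for r in reps[1:])


for d in (6.0, 2.5, 0.0):
    print(d, round(run_group(d), 3))
```

For separations of 6, 2.5, and 0 standard deviations, this toy model gives misclassification rates close to 0, about 0.1, and about 0.5. These are properties of the sketch, not the results reported in Figure 3 or Table 1.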


Figure 3. Misclassification rate of the synthetic data set


Table 1: Misclassification rate of the synthetic data set
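For reference, the misclassification rates above can be compared with the lowest rate any classifier could achieve on idealized data. Assume two equal-sized populations with equal variance. In several dimensions, assume equal covariance and take d as the Mahalanobis distance between the centroids. Under these assumptions, the best possible correct-classification rate is Φ(d/2), where Φ is the standard normal CDF. This also assumes that the "2.5 standard deviations" above refers to this kind of separation. Flowsim populations may not satisfy these conditions, and the function names are illustrative.

```python
# Sketch: best achievable accuracy for two equal-sized Gaussian populations
# with equal variance (or equal covariance, using the Mahalanobis distance)
# whose centroids are d standard deviations apart.
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def bayes_accuracy(d: float) -> float:
    """Optimal correct-classification rate: the decision threshold sits
    midway between the centroids, d/2 standard deviations from each."""
    return normal_cdf(d / 2.0)


for d in (0.0, 2.5, 4.65):
    print(f"separation {d} SD: best possible accuracy {bayes_accuracy(d):.3f}")
```

Under these assumptions, the ceiling is about 89% at a 2.5 SD separation and 50% at complete overlap. These values are consistent with the greater than 85% and 50% rates reported above. Exceeding 99% requires a separation of roughly 4.65 SD.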