2.6 FlowDx Database Workflow

The database is a central repository of:

  1. Sample data resulting from flow cytometry experiments (FCS files)
  2. "Seed" workspaces: templates for the analysis that already have the FCS files loaded.
  3. Manual analysis of the data
  4. Automated analysis of the data
  5. GatingML for each workspace?
  6. Exported target populations as complete FCS files
  7. Exported target populations as .bitmask files (popmask files)
  8. Tally files representing consensus gates for the target populations.
  9. SOP documents for each Use Case (Assay)

The database is “write once”: after information is recorded, it cannot be deleted or changed.
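One way to enforce the write-once rule inside the database itself is with triggers that abort any UPDATE or DELETE. The sketch below uses SQLite through Python's standard `sqlite3` module; SQLite, the `wspfiles` column names (taken from the step 1 output list), and the trigger names are assumptions, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: columns follow the wspfiles entry described in step 1.
SCHEMA = """
CREATE TABLE IF NOT EXISTS wspfiles (
    id        INTEGER PRIMARY KEY,
    filename  TEXT    NOT NULL,
    uri       TEXT    NOT NULL,
    agent_id  INTEGER NOT NULL,
    assay_id  INTEGER NOT NULL,
    submitted TEXT    NOT NULL  -- ISO 8601 date
);

-- Write once: abort any attempt to modify or remove a recorded row.
CREATE TRIGGER IF NOT EXISTS wspfiles_no_update
BEFORE UPDATE ON wspfiles
BEGIN
    SELECT RAISE(ABORT, 'wspfiles is write-once');
END;

CREATE TRIGGER IF NOT EXISTS wspfiles_no_delete
BEFORE DELETE ON wspfiles
BEGIN
    SELECT RAISE(ABORT, 'wspfiles is write-once');
END;
"""


def open_write_once_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database whose wspfiles rows can only be inserted."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = open_write_once_db()
    conn.execute(
        "INSERT INTO wspfiles (filename, uri, agent_id, assay_id, submitted) "
        "VALUES (?, ?, ?, ?, ?)",
        ("siv_seed.wsp", "file:///repo/siv_seed.wsp", 7, 2, "2007-03-29"),
    )
    try:
        conn.execute("DELETE FROM wspfiles")
    except sqlite3.DatabaseError as exc:  # the trigger's message is reported here
        print("delete rejected:", exc)
```

Triggers only guard SQL statements; the files in the repository would need equivalent protection at the file-system level (for example, read-only permissions).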

Step sequence for the FlowDx Research Database

1) Generate Experiment Workspace: An Experimenter can request a FlowJo workspace from the database by asking for the list of "seed" workspaces associated with a use case. This list can then be e-mailed to specified users through a web form that shows the pertinent information for the experimenter to review before sending. These users will then be authorized to save their results back to the database and its repository through the Save Utility. The plan for this phase is for Jill to provide the seed workspace and for Ryan M. to update the FCS sample URIs to point to the repository and append the extra FlowDx XML to the end of the file. Eventually we'll create the "seed" workspaces on the fly.
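The URI update can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal sketch that assumes sample references are stored in attributes named `uri` and are URLs (for example `file:///...`); the actual FlowJo 7.6 workspace layout, the function name, and the repository base URL in the usage example are all assumptions.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def repoint_sample_uris(wsp_xml: str, repo_base: str) -> str:
    """Point every .fcs reference in a workspace at the FlowDx repository.

    Assumes sample references live in attributes named "uri" and are URLs;
    the real FlowJo workspace layout may differ.
    """
    root = ET.fromstring(wsp_xml)
    base = repo_base.rstrip("/")
    for elem in root.iter():
        uri = elem.get("uri")
        if uri and uri.lower().endswith(".fcs"):
            # Keep only the file name; the repository supplies the new location.
            filename = posixpath.basename(urlparse(uri).path)
            elem.set("uri", f"{base}/{filename}")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    seed = '<Workspace><Sample uri="file:///Users/jill/data/monkey1_t0.fcs"/></Workspace>'
    print(repoint_sample_uris(seed, "http://repository.example.org/fcs/"))
```

`ET.tostring` omits the XML declaration; if FlowJo requires one, the tree can be written with `ElementTree.write(..., xml_declaration=True)` instead.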

Software process: This workspace will have additional <FlowDx> metadata added so that it can communicate back from remote locations to our server and invoke a special “save” operation. This “save” parses the existing XML and adds XML elements for:

  1. AssayID
  2. AgentID
  3. TargetPopulations
  4. Timepoint
  5. PatientID
  6. TargetSampleGroups
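The “save” step can be sketched as adding or updating a `<FlowDx>` child of the workspace root. Only the six field names come from the list above; the placement of the block, the child tags `Population` and `Group`, and the choice to reuse an existing `<FlowDx>` block from the seed workspace are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def _set_text(parent: ET.Element, tag: str, text: str) -> None:
    """Set the text of parent's <tag> child, creating the child if needed."""
    child = parent.find(tag)
    if child is None:
        child = ET.SubElement(parent, tag)
    child.text = text


def _set_list(parent: ET.Element, tag: str, item_tag: str, items: list[str]) -> None:
    """Replace parent's <tag> child with one <item_tag> entry per item."""
    old = parent.find(tag)
    if old is not None:
        parent.remove(old)
    container = ET.SubElement(parent, tag)
    for item in items:
        ET.SubElement(container, item_tag).text = item


def add_flowdx_metadata(wsp_xml: str, *, assay_id: str, agent_id: str,
                        target_populations: list[str], timepoint: str,
                        patient_id: str, target_sample_groups: list[str]) -> str:
    """Add (or refresh) the hypothetical <FlowDx> block at the end of the workspace root."""
    root = ET.fromstring(wsp_xml)
    flowdx = root.find("FlowDx")  # reuse a block already present in the seed workspace
    if flowdx is None:
        flowdx = ET.SubElement(root, "FlowDx")
    _set_text(flowdx, "AssayID", assay_id)
    _set_text(flowdx, "AgentID", agent_id)
    _set_list(flowdx, "TargetPopulations", "Population", target_populations)
    _set_text(flowdx, "Timepoint", timepoint)
    _set_text(flowdx, "PatientID", patient_id)
    _set_list(flowdx, "TargetSampleGroups", "Group", target_sample_groups)
    return ET.tostring(root, encoding="unicode")
```

Reusing an existing block means repeated saves of the same file do not accumulate duplicate `<FlowDx>` elements; each saved copy still becomes a new, separate record in the write-once database.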

This step is implemented within FlowJo: the menu choice is "Submit FlowDx Assay Workspace" on the Debug menu of FlowJo 7.6beta. This location will probably change as the software gets closer to release.

Input:
a) Pointers to specific use cases

Output:
a) A FlowJo workspace file on our server (with database update)
b) Entry into the wspfiles table with filename, URI, AgentID, AssayID, and date submitted

Tests:
a) Request a workspace with known parameters from the database. Confirm the parameters are correct.
b) Request a workspace from the database. Make valid changes. Add e-mails and agent IDs to the experiment workspace before sending. Add extra FCS samples to the workspace before sending. Confirm the database updates correctly.
c) Request a workspace from the database. Upon saving, attempt to record invalid info back to the database. The save should fail.
d) Attempt to delete information from database. Confirm that delete fails.
e) Request a workspace from the database to send to a specified user in an e-mail. Specified user will make changes and save back to the database. Confirm sending of e-mail, confirm updates to database, and confirm the updates are the same as what the user sent.

2) Submit Experiment Workspace: A Manual Gater invokes "Submit FlowDx Assay Workspace" from the FlowJo Debug menu. The interface requests incident information, i.e., Agent ID, Date, AssayID, gating method (manual or algorithm), Agent Role (expert or novice), and notes/comments. This information is saved as metadata in the .WSP file and recorded in the database tables and the repository.

Software process: Handled by Java code in the EX version of flowjo.jar, since it is initiated by a workspace “save” operation.

Input:
a) Files for the experiment (.WSP, .FCS)
b) Supporting information: Agent ID, Date, AssayID, gating method (manual or algorithm), Agent Role (expert or novice)

Output:
a) Workspace file on the server file system, with the corresponding update to the database wspfiles table
b) GatingML file (with database update)
c) Additional database updates for experiment name, populations of interest, date, and notes/comments. (Let's assume at this point that the Assay is already defined. If so, where is that information kept?)

Tests:
a) Attempt to save incorrect data with a WSP file, e.g., letters in the date field or in the agent ID field. The save should fail.
b) Save a correctly formatted WSP file to the database. Confirm correct formatting. Confirm all fields are updated: experimenter name, date, notes/comments.
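Test (a) implies field-level validation before anything is written. A minimal Python sketch follows; it assumes Agent IDs are purely numeric, dates are ISO 8601 (`YYYY-MM-DD`), and the allowed gating methods and agent roles are exactly those listed in step 2. `Submission` and `validate` are hypothetical names.

```python
import datetime
import re
from dataclasses import dataclass

# Allowed values as listed in step 2.
GATING_METHODS = {"manual", "algorithm"}
AGENT_ROLES = {"expert", "novice"}


@dataclass(frozen=True)
class Submission:
    """Incident information collected when a workspace is submitted (hypothetical type)."""
    agent_id: str
    submitted_on: str   # assumed ISO 8601, e.g. "2007-03-29"
    assay_id: str
    gating_method: str
    agent_role: str
    notes: str = ""


def validate(sub: Submission) -> list[str]:
    """Return a list of problems; an empty list means the submission may be saved."""
    errors = []
    if not re.fullmatch(r"[0-9]+", sub.agent_id):
        errors.append(f"Agent ID must be numeric: {sub.agent_id!r}")
    try:
        datetime.date.fromisoformat(sub.submitted_on)
    except ValueError:
        errors.append(f"Date must be YYYY-MM-DD: {sub.submitted_on!r}")
    if not sub.assay_id.strip():
        errors.append("AssayID is required")
    if sub.gating_method not in GATING_METHODS:
        errors.append(f"gating method must be one of {sorted(GATING_METHODS)}")
    if sub.agent_role not in AGENT_ROLES:
        errors.append(f"agent role must be one of {sorted(AGENT_ROLES)}")
    return errors
```

Returning every problem at once, rather than stopping at the first, lets the submission form show the gater all of the fields to fix. The sample queries also mention an "intern" role; if that becomes a real role, `AGENT_ROLES` would need to include it.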

3) Generate TargetPopulation Files (.fcs and .popmask): An Administrator can invoke a script that queries the database for workspaces without a popmask. The script then processes each such workspace, creates a popmask and an .fcs file for each population of each sample file, and saves the results back to the database and its repository through the Save Utility. Eventually this process will be triggered automatically when a workspace is submitted to the server; for now it is run manually.

Software process: This step is implemented by FlowDxExecuter.java and can be invoked with the command-line argument ‘–flowDxExport’. It could also be implemented in a general-purpose scripting language that executes a database query and then invokes command-line FlowJo.
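A scripted driver of that kind (query, then command-line FlowJo) could look like the Python sketch below. The table and column names (`wspfiles`, `popmasks.wsp_id`), the FlowJo executable path, and the exact spelling of the export flag are placeholders to be checked against the real schema and FlowDxExecuter.java.

```python
import sqlite3
import subprocess

# Workspaces with no popmask yet. Table and column names are hypothetical.
PENDING_WSP_QUERY = """
SELECT w.uri
FROM wspfiles AS w
WHERE NOT EXISTS (SELECT 1 FROM popmasks AS p WHERE p.wsp_id = w.id)
ORDER BY w.id
"""


def build_export_commands(conn: sqlite3.Connection,
                          flowjo_cmd: list[str],
                          export_flag: str) -> list[list[str]]:
    """Return one command-line FlowJo invocation per workspace lacking a popmask."""
    return [[*flowjo_cmd, export_flag, uri]
            for (uri,) in conn.execute(PENDING_WSP_QUERY)]


def run_exports(commands: list[list[str]]) -> None:
    """Run each export in turn; check=True stops on the first failing export."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the command list separately from running it lets the Administrator review the pending workspaces first (a dry run), and the same function can later be called from a database trigger or servlet.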

Input:
a) Command-line request by Administrator, or database trigger (or servlet?)
b) .WSP

Output:
a) Many .FCS files written to the repository and recorded in the file table (with the starting .fcs reference, targetpopname, generating .wspname, agentID, starting .fcs fileID, and $TOT; each event keeps its original event ID)
b) .popmask files on the server (with database update wi).

Tests:
a) Save a WSP. Confirm that a popmask is automatically generated.
b) Save a WSP with gates. Query the database for WSPs without popmasks. Confirm that the corresponding popmasks are generated.
c) Query the database for WSPs without corresponding popmasks. The result should be empty.

4) Generate Tally: The Administrator invokes a database query to detect popmasks. The database then applies an algorithm to create the corresponding tally (the consensus probability for each event for a specific gate) and saves the results back to the database and its repository. Open question: how do we generate tally files that include .popmasks from other sources such as R, ANN, or SVM? These files won't refer to a .wsp file; could we create a dummy .wsp file for them?

Software Process: Popmask files are ASCII, so this step can be implemented in almost any language; which one is still to be decided.
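Because popmasks are plain ASCII, the tally itself is a short computation. The Python sketch below assumes a hypothetical popmask layout of one `0`/`1` flag per event, one per line (the real format is not specified here), and expresses each event's tally as the fraction of popmasks that include it. Multiplying by the number of popmasks gives the raw sum described under Definitions.

```python
from fractions import Fraction


def read_popmask(text: str) -> list[int]:
    """Parse a hypothetical ASCII popmask: one 0/1 flag per event, one per line."""
    flags = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        if line not in ("0", "1"):
            raise ValueError(f"line {line_no}: expected 0 or 1, got {line!r}")
        flags.append(int(line))
    return flags


def tally(masks: list[list[int]]) -> list[Fraction]:
    """Per-event consensus probability: the fraction of masks that include the event.

    All masks must come from the same starting .fcs file and target population,
    so they must cover the same number of events.
    """
    if not masks:
        raise ValueError("need at least one popmask")
    n_events = len(masks[0])
    if any(len(m) != n_events for m in masks):
        raise ValueError("popmasks cover different numbers of events")
    n = len(masks)
    # zip(*masks) walks the events column by column across all gaters.
    return [Fraction(sum(col), n) for col in zip(*masks)]
```

Exact fractions are used so that later checks such as "probability of inclusion of 0.5" do not depend on floating-point rounding.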

Input:
a) Algorithm request by Administrator
b) Popmasks - each file referencing a wsp file and a target population

Output: database updates with tally results.
a) .tally files for each target population added to repository - references the .wsps included and the target population
b) new records in a DB table that will link to .wsp files and targetpopulationID and assayID

Tests:
a) Save a popmask without a tally. Query the database for popmasks without a tally. A tally should be generated automatically and the database updated.
b) Query the database for popmasks without a tally. The result should be empty.

5) Generate Match Ratio: An Administrator can invoke a script that queries the database for tallies without a match ratio, computes the match ratio, and saves the results back to the database and its repository. Jill was envisioning this step as a user querying the DB to calculate the match ratio on the fly (if that is the case, wouldn't this step be eliminated and folded into step 6, reports?). Match ratios can be calculated for any combination of consensus gates (tally files), including or excluding any manual or algorithm gate on a target population for a starting FCS file. The number of combinations is very large, so there is no single match ratio result for a population or FCS file. However, we could store the match ratio of each algorithm against the consensus of expert gaters, as defined by the Grant.

Software Process: Implemented in MatchRatioCalculator.java for in-memory .fcs populations. Should we modify this to process other input streams (.popmask files)? Aaron Hart is working on this in Python to take input files for the consensus and for the 

Input:
a) Algorithm request by Administrator, defined by AssayID, FCS file(s), target population(s), a list of gaters and/or algorithms for the consensus, and a list of gaters and/or algorithms for which to calculate the MR.
b) The request will generate the query that retrieves the appropriate tally files for the consensus.

Output: Database updates with match ratio results.
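The formula used by MatchRatioCalculator.java is not given in this document. Purely as an illustration, the sketch below swaps in a weighted Jaccard overlap between a candidate popmask (0/1 per event) and a consensus built from a chosen subset of gaters: the sum of per-event minima divided by the sum of per-event maxima, so 1 means exact agreement and 0 means no overlap. The function names and agent names are hypothetical.

```python
from fractions import Fraction


def consensus(masks_by_agent: dict[str, list[int]], agents: list[str]) -> list[Fraction]:
    """Tally over a chosen subset of gaters: per-event fraction that include the event."""
    if not agents:
        raise ValueError("choose at least one gater or algorithm for the consensus")
    chosen = [masks_by_agent[a] for a in agents]
    if len({len(m) for m in chosen}) != 1:
        raise ValueError("masks cover different numbers of events")
    return [Fraction(sum(col), len(chosen)) for col in zip(*chosen)]


def match_ratio(candidate: list[int], consensus_probs: list[Fraction]) -> Fraction:
    """Weighted Jaccard overlap (an assumed stand-in for the real match ratio)."""
    if len(candidate) != len(consensus_probs):
        raise ValueError("candidate and consensus cover different numbers of events")
    overlap = sum(min(c, p) for c, p in zip(candidate, consensus_probs))
    union = sum(max(c, p) for c, p in zip(candidate, consensus_probs))
    if union == 0:
        return Fraction(1)  # both empty: treat as perfect agreement
    return Fraction(overlap) / union


if __name__ == "__main__":
    masks = {"expert_a": [1, 1, 0, 0], "expert_b": [1, 0, 0, 0], "ann": [1, 1, 1, 0]}
    # Score the algorithm against the experts only, excluding its own mask.
    experts = consensus(masks, ["expert_a", "expert_b"])
    print(match_ratio(masks["ann"], experts))  # prints 1/2
```

Passing the agent list explicitly mirrors the request described above, where any gater or algorithm can be included in or excluded from the consensus.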

Tests:
a) Query the database for tallies without a match ratio. Confirm that match ratios were generated.
b) Query the database for tallies without a match ratio. The result should be empty.


6) Generate Reports: An Experimenter can invoke the Export/Report Utility to create reports or extract other information for research outside of the database. (To do: specify the range of information available.)

Input:
a) Query requests by Experimenter
b) Files served from the database

Output:
a) Reports
b) WSP
c) Other output files (to do: specify the range of information available)

Tests:
a) Query the database for all WSP files. Results should display all known WSP files.
b) Query the database for all files with a match ratio. The result should display all known match ratio files in the requested format.
c) Query the database for a list of FCS files
d) Query for all WSP files that correspond to specific use cases. Verify we have enough supporting data for our conclusions.
e) Query for Tallies and separate into use cases
f) Query for match ratios separated into use cases
More queries are listed below
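Test (d), listing WSP files by use case, amounts to a grouped join. A minimal sketch, assuming hypothetical `assays` and `wspfiles` tables in SQLite where each workspace row carries an `assay_id`:

```python
import sqlite3
from collections import defaultdict


def wsp_files_by_use_case(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each use case (assay) name to its workspace file names, sorted."""
    rows = conn.execute("""
        SELECT a.name, w.filename
        FROM wspfiles AS w
        JOIN assays AS a ON a.id = w.assay_id
        ORDER BY a.name, w.filename
    """)
    grouped: dict[str, list[str]] = defaultdict(list)
    for use_case, filename in rows:
        grouped[use_case].append(filename)
    return dict(grouped)
```

The same join-then-group pattern would apply to tests (e) and (f) by swapping in the tally or match ratio table.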

 

Roles: (are these roles in relation to the DB repository or to the project?)

Expert Gater: Creates manual gates on templated "seed" workspaces.
Experimenter: Turns use cases into templates ("seed" workspaces).
Scores workspaces against a specified control.
Builds controls from consensus of multiple gatings.
Administrator: Oversees the status of the database and the creation of "seed" workspaces. Should this be included as an additional step above? Somehow the database needs to be populated.

Use Cases (Assays):

GvHD: SOP, Workspace, subcase
SIV: SOP, Workspace, subcase
Synthetic: SOP, Workspace

Subcases:

GvHD: time points, subjects, outcomes, runs
SIV: monkeys, time points, treatments

Definitions:

WSP – Workspace file generated by FlowJo. It references the FCS files and contains the gates created by experts and experimenters.
FCS – file generated from a flow cytometer
Popmask – Population mask. A list-mode file covering the events of a sample, clustered by FlowJo or by other clustering clients/processes, that labels each event as in or out of the target population.
Tally (Consensus Gate) – Sum of the probabilities of inclusion of events in a population, based on a group of .popmask files all selected from one starting .fcs file and one target population.
Match ratio - Comparison of events in a popmask to experts’ manually-created “consensus gates.” Used to objectively quantify “goodness” of gates compared to expert consensus.


Sample queries (from Jill):

1) For SIV, show me a list of the available .wsps from the db and the target pops, and allow me to select which to use for the consensus and which .wsp to compare against the consensus.

2) Build a consensus from the expert gaters for all fcs files for all target populations for V. Let me look at these files and see how many events have a probability of inclusion of 0.5 (== ambiguous inclusion in the gates). Does one fcs file stand out as being difficult to gate?

3) Compare all CD4 IL-2 populations (including those from algorithms) against the consensus of interns.

4) Compare all analyses of GvHD against Mario's gating (build a tally with just Mario's pops).

5) Show me the consensus files (= tally files) for all manual gaters for SIV for all target populations and all fcs files.

6) Create a .wsp file with the CD4 IL-2 populations (exported .fcs files) for SIV monkey 1 for timepoint March 29, 2007, and group them by agent role (expert, intern or algorithm). This will enable an overlay plot.

7) Show me the match ratio results of query 6 using the experts as the consensus.

8) Show me the match ratio results of query 6 using the interns as the consensus.

9) Calculate the match ratio of the GvHD CD4, CD8b population using experts as the consensus for patient H.D.

10) Calculate the match ratio of ANNs on synthetic data using John, Aaron and Maciej as the consensus.

11) Show me all .wsp files and populations from the Asilomar gating.

12) Show me all the populations from all workspaces for fcs fileID
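Query 2 reduces to counting, per FCS file, the events whose consensus probability equals 0.5, then normalizing by file size so that large files are not flagged just for having more events. The sketch below assumes tallies are already available as a mapping from a (hypothetical) FCS file name to per-event probabilities, as floats or exact fractions. With an odd number of gaters no event can score exactly 0.5, so a band around 0.5 may be more useful in practice.

```python
import math
from collections.abc import Sequence


def ambiguous_counts(tallies: dict[str, Sequence[float]], level: float = 0.5) -> dict[str, int]:
    """Count events per FCS file whose consensus probability equals `level`."""
    return {fcs: sum(1 for p in probs if math.isclose(p, level))
            for fcs, probs in tallies.items()}


def most_ambiguous_file(tallies: dict[str, Sequence[float]], level: float = 0.5) -> str:
    """FCS file with the largest share of ambiguous events (raises ValueError if empty)."""
    counts = ambiguous_counts(tallies, level)
    # Normalize by event count; an empty tally counts as no ambiguity.
    return max(tallies, key=lambda fcs: counts[fcs] / len(tallies[fcs]) if tallies[fcs] else 0.0)
```

Ranking by share rather than raw count answers "does one fcs file stand out as being difficult to gate?" without favoring the largest acquisitions.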