Standard Deviation Histogram Flow Cytometry

Curr Protoc Cytom. Author manuscript; available in PMC 2015 Jul 1.

Published in final edited form as:

Flow cytometry panel design: The basics. Multiplex flow cytometry experiments need the right combination of fluorophores. “Multicolor Flow Cytometry Panel Design” by Dr. Holden Maecker of Stanford University. Additional publications on this topic are available 2–4. When designing a multicolor flow cytometry panel, there are.

doi: 10.1002/0471142956.cy1312s54

NIHMSID: NIHMS242206

The publisher's final edited version of this article is available at Curr Protoc Cytom

See other articles in PMC that cite the published article.

Abstract

This protocol describes microsphere-based protease assays for use in flow cytometry and high-throughput screening. This platform measures a loss of fluorescence from the surface of a microsphere due to the cleavage of an attached fluorescent protease substrate by a suitable protease enzyme. The assay format can be adapted to any site or protein specific protease of interest and results can be measured in both real time and as end point fluorescence assays on a flow cytometer. End point assays are easily adapted to microplate format for flow cytometry high-throughput analysis and inhibitor screening.

INTRODUCTION

Proteases, proteins that break down other proteins through the hydrolysis of peptide bonds, are a large and diverse group (). The protease family of proteins is considered to be one of the major targets for small molecule drug discovery (). Initially, proteases were primarily thought of as protein degrading enzymes; however, it is now clear that proteolytic mechanisms are highly regulated components of cellular signaling pathways. The improper regulation of specific human proteases involved in cellular signaling can lead to human diseases including inflammatory disease, thrombosis, osteoporosis, cardiovascular and neurological disorders, and increased growth and metastasis in specific cancers (). Proteases of clinical significance include human proteases which are improperly regulated (), bacterial pathogenic proteases that mediate the effects of toxins (), and viral proteases that process viral precursor proteins controlling viral life cycles ().

Many protease assays are based on fluorescence resonance energy transfer (FRET), where a short peptide containing a protease cleavage site is placed between two fluorophores which are FRET pairs. The small peptides typically used in FRET assays fail to address protease/substrate recognition elements distal from the protease cleavage site, which in many cases account for protease/substrate specificity (; ; ; ). These distal elements may also be areas of interest for pharmacological inhibition of specific proteases (; ). Using microsphere-based flow cytometry assays it is possible to use full-length protease substrates in a protease assay (), which may allow detection of inhibitors of protease/substrate interactions distal from the protease active site or substrate cleavage site. Flow cytometry also enables the use of multiplex microsphere sets to assay several proteases in the same assay volume. Adaptation of a flow cytometry assay to high-throughput screening applications makes a cost effective and robust platform for drug screening against target proteases (). This unit describes the basic protocol of microsphere-based flow cytometry protease assays from substrate preparation, attachment to microspheres, data collection, and adaptation to high-throughput screening.

STRATEGIC PLANNING

Substrate preparation

Recombinant fluorescent proteins that serve as protease substrates in these studies have a biotinylated lysine residue at one end and a green fluorescent protein (GFP) at the other end. Only in this configuration will a loss of fluorescence be detected from the surface of the microsphere upon proteolytic cleavage (Figure 13.12.1). Sub-cloning, expression and purification from E. coli is our preferred method of obtaining protease substrates, using protein attachment tags at one end and a fluorescent protein expressed on the other end of protease substrates. In the case of proteins not capable of being expressed in E. coli, other approaches such as mammalian or insect cell expression followed by protein purification is a valid method as well. In this case, it should be noted that the substrate will not be biotinylated in vivo during expression if biotin-avidin attachment chemistry is being used. Only E. coli will biotinylate in vitro using either the Promega PinPoint^™ system or other biotinylation sequences and bacterial strains from Avidity LLC and Life Technologies, Inc. A short amino acid sequence can be used to biotinylate proteins expressed from other systems in vitro after protein purification using the BirA biotin ligase enzyme available from Avidity LLC.

A. Protease substrates are designed to have purification and attachment chemistry (in vivo biotinylation tag) at one end, and a fluorescent protein (GFP) at the other end with a protease cleavage site or full length protein substrate sub-cloned in-frame between them. B. Purified biotinylated protease substrates are bound to streptavidin coated microspheres and the protease of interest is added. For simplicity, only one protein is shown bound to a microsphere (not to scale), whereas in the experimental conditions each microsphere will have between 100,000 and 1 million fluorescent substrate molecules bound as determined by GFP and FITC standard microsphere sets. Cleavage of the protease substrate occurs and is measured as a loss of fluorescence from the surface of the microspheres via flow cytometry.

Choice of attachment chemistry is also an important consideration, as high affinity binding pairs will use less substrate to label the microsphere efficiently and will stay bound on the same microsphere for longer periods of time. Most work on these assays to date uses biotin-avidin attachment by virtue of a biotinylation tag expressed on the N-terminus of the protease substrate (; ). Much of this work has been done by modifying the Promega PinPoint^™ protein expression plasmid, by sub-cloning GFP C-terminal to the multiple cloning site in this plasmid. Bacterial strains such as BL21 (DE3) pLys S efficiently biotinylate this amino acid sequence specifically near the N-terminus when expressed and specific biotinylation can be increased using the AVB 101 bacterial strain available from Avidity LLC. Purification of biotinylated protease substrates from bacterial cell extracts can be done on SoftLink^™ streptavidin resin available from Promega. A brief description of a typical protein purification is described in Support Protocol 1. Furthermore, biotinylated substrates are easily attached to avidin or streptavidin coated microspheres (). While it is relatively straightforward to covalently couple streptavidin to carboxyl functionalized microspheres (Bangs Labortories, TechNote 205), it is also possible to purchase streptavidin coated microspheres at high concentrations from multiple commercial sources.

Other potential methods of substrate to microsphere attachment include using polyhistidine tags for both protein purification and attachment to Ni²⁺ coated microspheres (), or GST fusion protein attachment to microspheres containing reduced glutathione (). Due to the nature of this assay, which measures loss of fluorescence, avidin/streptavidin microspheres with biotinylated substrates are recommended over other attachment chemistry due to their tight binding with little or no dissociation from microspheres over periods of days, as will be highlighted in the protocols here.

One important consideration when using biotin-avidin attachment is to remove all free biotin from the protein preparation. Softlink streptavidin is eluted using 5 mM biotin, which will effectively block streptavidin or avidin microspheres from binding biotinylated protein. Protein purifications must be filtered or dialyzed to reduce the amount of biotin as much as possible (at least to femtomolar levels) to prevent free biotin binding to avidin or streptavidin from blocking biotinylated protein binding. Typical dialysis conditions to effectively remove biotin are 1 to 12,000 volumes, or 1 ml protein sample dialyzed against 12 L of biotin free buffer. Serial dialysis against 3 L of buffer at each stage improves the result.

Choice of Microspheres

Most commercially available flow cytometers are capable of easily detecting microsphere sizes between 1 μm and 30 μm diameter in forward and side scatter bivariate plots. It is possible to conjugate streptavidin to microspheres using established protocols, or to purchase microspheres with attachment chemistry already conjugated at high levels from multiple sources (e.g., Spherotech, Bangs Labs, Luminex). Microsphere choice may depend on costs of the microspheres or ease of preparation. If multiple protease assays will be performed in the same reaction volume via multiplex microsphere sets, an appropriate multiplex microsphere set must be chosen for conjugation with attachment molecules or purchased with attachment molecules present. Multiplex microsphere sets consist of microspheres bearing different intensities of fluorescence in a specific fluorescence channel, which can easily be distinguished in a histogram plot of fluorescence intensity. If multiplex microsphere sets are being used they should fluoresce in a different channel than that of the protease substrate. Streptavidin coated multiplex microspheres purchased from Spherotech Inc. (SVFA-2558-6K) perform well and are used in the assays described in this unit. Other sources or attachment chemistries are available and should be appropriate for the application and protein purification strategy. All protocols here describe biotin/streptavidin chemistry.

BASIC PROTOCOL: MICROSPHERE-BASED PROTEASE ASSAYS BY FLOW CYTOMETRY

This protocol describes the binding of fluorescent protease substrates to microspheres, washing of the microspheres to remove all protease substrate still in solution, as well as real-time and end-point fluorescence measurements by flow cytometry for specific protease cleavage.

Materials

Protease buffer (50 mM HEPES, 100 mM NaCl, 1 mg/ml Bovine Serum Albumin and 0.025% Tween-20 pH 7.4).
Purified and dialyzed biotinylated fluorescent protease substrate (see Support protocol 1)
Purified or purchased protease of interest
Microspheres functionalized with streptavidin or avidin at a stock concentration between 10⁶ and 10⁹/ml (Spherotech Inc.)
Microcentrifuge tubes (1.5 ml)
A mixing device such as a rotator or Nutator^®
A microcentrifuge capable of spinning microcentrifuge tubes between 13,000 and 16,000 × g
A flow cytometer capable of measuring EGFP fluorescence (for example, with excitation at 488 nm and measuring 505–550nm fluorescence)
NOTE: If multiplex microsphere sets are used the cytometer must be able to detect the fluorescence channel of microsphere fluorescence as well as substrate fluorescence.

Bind fluorescent protease substrates to microspheres

Place protease buffer, approximately 10⁵ to 10⁶ microspheres and the appropriate concentration of protease substrate for near microsphere saturation (as determined by protein titration in Support Protocol 2) into a microcentrifuge tube, in 500 μL total volume. If multiplex microsphere sets are being used set up in separate microcentrifuge tubes for each set.
Once specific protease substrate binding to microspheres has been demonstrated as in Support Protocol 2, a concentration of substrate which results in a high degree of specific binding to microspheres must be chosen for use in protease assays. Typically, this amount will be near the saturation point of the microsphere with a low level of non-specific binding observed in the sample with excess biotin at the same concentration. In the sample protein to microsphere titrations shown inFigure 13.12.2, a concentration ~ 100 nM biotinylated protein would be used for ppGFP and 300 nM for SNAP-25 GFP to achieve the desired level of fluorescence. Increasing the amount of microspheres used in binding substrates by up to two orders of magnitude does not appear to have much effect on mean/median fluorescence values observed on the microspheres, most likely due to the low number of binding sites on microspheres compared to the protease substrate in solution.
Biotinylated GFP protease substrate titration onto streptavidin coated microspheres (Spherotech Inc.). A ppGFP, a protease substrate for the protease factor Xa, bound in increasing amounts to streptavidin coated microspheres blocked with biotin and unblocked streptavidin coated microspheres. Specific binding (green) is determined by subtracting the mean green fluorescence of the blocked samples (red) from the unblocked samples (black). This protein preparation had very low non-specific binding. B Protein titration for biotinylated SNAP-25 GFP, a substrate for the Botulinum neurotoxin type A light chain protease, onto streptavidin coated microspheres. This protein has more non-specific binding than the ppGFP protein in A (above) but has acceptable levels of specific binding. Optimal substrate binding conditions for these proteins in protease assays would be ~ 100 nM for ppGFP (A) and 300 nM for SNAP-25 (B)
Incubate the mixture in a mixing device such as a rotator or neutator for 1 hour covered from light.
3
Centrifuge microspheres at 13,000 to 16,000 × g for 1–2 minutes, a small microsphere pellet should be visible at the bottom of the microcentrifuge tube when this is done.
Carefully remove all buffer from the tube while leaving the microsphere pellet intact.
5
Add 500 μl buffer to the microsphere pellet and vortex to wash the pellet.
Repeat steps 3–5 twice for a total of three wash steps.
Microspheres must be bound with substrate and washed to remove all non-bound substrate prior to addition of protease. Concentrations of 0.025% Tween-20 in the buffer help to stabilize microspheres during the wash steps due to increased microsphere pelleting during centrifugation steps required for removal of soluble substrate after binding. For the experiments described here, a protease buffer optimized for the B. anthracis lethal factor and C. botulinum light chain proteases, consisting of 50 mM HEPES, 100 mM NaCl, 1 mg/ml BSA, and 0.025% Tween-20 pH 7.4, was used and is referred to as protease buffer. Protease buffer will need to be optimized for individual proteases.
7
If multiplex microsphere sets with multiple substrates are being used, combine all of the microspheres into one tube after the wash steps. This will require suspending each microsphere set in a smaller amount of buffer (100 μl) and combining them, then adding protease buffer to reach a 500 μl volume.
Multiple protease substrates or proteases can be analyzed at once through the use of multiplex microsphere sets performed in the same protocol if the buffers are compatible. Biotinylated protein bound to streptavidin multiplex microspheres is advised for multiplex assays as no dissociation has been observed from streptavidin microspheres over several hours.

Analyze binding to microspheres

Remove 25 μl of the microspheres; add them to 475 μl buffer and analyze with a flow cytometer. Gate microspheres based upon a bivariate plot of forward scatter vs. size scatter; take care to gate only single microspheres and not microsphere aggregates (Figure 13.12.3A).
Sample gating of microspheres for analysis of fluorescence and protease cleavage over time. A. Bivariate plot of forward and size scatter with single microsphere population gated. B. Histogram of green fluorescence (FL1, 530/15 nm) of size-gated microsphere population from A. C. Histogram plot of four multiplex microspheres fluorescent in the FL2 channel (585/21/ nm). Gates are drawn on this histogram plot for each microsphere type. The small secondary peak populations for each microsphere are common in multiplex microsphere sets and can be gated out in the FL2 histogram. D. Fluorescence vs. time for one size, and gated microsphere as gated in A and C. No protease was added in this sample. All data analysis for these sample gates was done by exporting files from an Accuri C6 flow cytometer as FCS files and plotted graphically in FlowJo flow cytometry data analysis software.

Perform protease assay

Aliquot microspheres into separate sample tubes for protease assays; the number of tubes depends on the number of assays desired and the number of microspheres bound. The total volume of the sample will depend on how long the desired assay will take, and how many microspheres per microliter are present and will be brought up to appropriate volumes using protease buffer.
Typical sample sizes are 500 μL total volume, but can vary from 100 μl to several ml. For example if ten 500 μl samples will be set up from a starting volume of 500 μL microspheres, each sample will contain 50 μl microspheres and 450 μl protease buffer.
10
Display a histogram of fluorescence in the proper fluorescence channel (e.g. FL1 for GFP or FITC, FL2 for PE etc.) for your substrate and gate on the microsphere size gate (Figure 13.12.3B). If multiplex microspheres are used make a histogram of the fluorescence channel of the microspheres and gate each microsphere population on the histogram (Figure 13.12.3C). If there is overlap between substrate fluorescence and microsphere fluorescence, compensation will need to be used to determine and gate the microsphere populations. If only one protease substrate is being used this step is not required.
The multiplex microspheres used infigure 13.12.3Ceach have a small amount of their population, which appears as a small secondary peak of higher fluorescence intensity and can lead to a small amount of overlap into additional microsphere populations. This phenomenon is common with some multiplex microspheres but can be gated out on the histogram plot. Small secondary shoulders which overlap with another microsphere population will lead to approximately 2–5 % overlap of one microsphere population into another.
Make a bivariate plot of substrate fluorescence vs. time (Figure 13.12.3D). Set gating of this plot to display and record only size gated microspheres in the gate from step 9 (Figure 13.12.3A). If multiplex microspheres and several substrates are being used, set up plots based upon gates drawn on histogram plots in step 10 (Figure 13.12.3C).
The length of time recorded will depend on the volume of the sample and the flow rate of the cytometer. Most commercial cytometers will acquire samples of 500 μl for 20 to 30 minutes on a slow flow rate.
12
Run a sample of microspheres with no protease added for 60 seconds to get an average median starting value, remove the tube from the flow cytometer while it is still running and add protease to the sample, vortex or mix quickly, then put the tube back on the flow cytometer.
Protease concentration must be determined empirically for maximum or desired amount of cleavage. It is suggested that for the first experiments with a protease a concentration range of protease be tested on multiple samples of substrate-bound microspheres. Substrate cleavage will be protease concentration dependent as shown in the example using Botulinum Neurotoxin type A Light Chain with the substrate biotinylated SNAP-25 GFP (Figure 13.12.4).
Normalized fluorescence of cleavage of biotinylated SNAP-25 GFP from the surface of streptavidin coated microspheres by the Botulinum Neurotoxin type A Light Chain protease (BoNT/A LC). Each trace is from a separate 500 μl sample of microspheres in protease buffer with the indicated amount of BoNT/A LC added after 60 seconds of acquisition of microspheres on an Accuri C6 flow ctometer. Data were analyzed on HyperView IDL software written by Dr. Bruce Edwards calculating the median FL1 fluorescence of microspheres every 0.3 seconds. Data were normalized by averaging the first 60 second median FL1 values and dividing that average and subsequent median FL1 values for that sample by that value.
Measure median/mean fluorescence values over time to measure loss of fluorescence from the microsphere due to proteolytic cleavage.
Real-time protease cleavage measurements will demonstrate proteolytic cleavage of fluorescent substrates over a relatively short period of time (30 minutes). End-point assays can also be performed at chosen time points instead of continuous real-time measurements. Start at step 10, and prepare samples with and without protease; incubate at the desired temperature on a mixing device and analyze using the mean or median fluorescence values after the desired incubation time, measured on a histogram plot of substrate fluorescence (Figure 13.12.3B). Protease concentrations can be adjusted to give optimal or desired substrate cleavage rates either in real-time or as end-point assays. If multiple substrates with multiple proteases are used, all of the proteases can be added together in a mixture with fluorescence values of each substrate monitored in real-time by gating on microsphere histogram plots. Separate substrate analysis can be done in flow cytometry analysis software such as FlowJo, FCS Express, or in our case we used a custom program known as IDL Hyperview.

SUPPORT PROTOCOL 1: GROWTH AND PURIFICATION OF BIOTINYLATED PROTEINS FROM BACTERIA

This protocol describes the steps to transform and growth bacteria, as well as to purify the tagged protein.

Materials

Plasmid containing a biotinylation sequence on one end (Pinpoint from Promega, or AviTag vectors from Avidity LLC., Gateway plasmids from Life Technologies) and a fluorescent protein on the other end (e.g. GFP, YFP, etc.), cloned in-frame with protease substrate sequence in between.
Calcium or electro-competent expression bacteria which will biotinylate protein sequences in vivo.
NOTE: biotinylated protein purifications have been done using BL21 (DE3) pLys S bacteria available from Promega. Additional specific biotinylation can be achieved by using the AVB 101 strain available from Avidity LLC.
Terrific broth (Fisher Scientific)
Appropriate resistance antibiotic for plasmid selection (Carbenicillin or Ampicillin for Pinpoint plasmids and chloramphenicol for the strains used here)
LB agar plates with 50 μg/ml carbenicillin/ampicillin and 34 μg/ml chloramphenicol
Isopropyl β-D-1-thiogalactopyranoside (IPTG; Sigma-Aldrich)
Centrifuge capable of forces up to 25,000 × g
Centrifuge tubes and rotors to sediment 200–500 ml culture and 30–50 ml bacterial lysate
Column chromatography system or peristaltic pump with glass column
Phosphate buffered saline pH 7.2 (PBS)
Spectrophotometer capable of measuring absorbance values at a wavelength of 600 nm
Softlink streptavidin resin (Promega)
Biotin (Sigma-Aldrich)
Ultra centrifugal filter units, Mw cutoff (Millipore)
12 L protease buffer for dialysis of purified substrates (For B. anthracis lethal factor and C. botulinum light chain proteases: 50 mM HEPES, 100 mM NaCl pH 7.4, see recipe).
Dialysis membrane with appropriate Mw cutoff values
Dialysis clamps

Transform bacteria

Clone and sequence plasmid containing the biotinylation tag, protease substrate and protein fluorophore by preferred method.
2
Transform plasmid into E.coli expression strain by preferred method (). Plate on LB agar plates with appropriate antibiotic resistance (50 μg/ml ampicillin or carbenicillin for pinpoint vectors, 34 μg/ml chloramphenicol for pLysS or AVB 101 strains).
Grow colonies on plates overnight at 37°C.
4
Pick colonies from plates and inoculate 3 ml Terrific broth (TB) cultures with appropriate antibiotics. Grow overnight at 37° C.
Pour 3 ml cultures into 200 to 500 ml of TB with appropriate antibiotics and grow at 37° C for 2–4 hours checking light absorbance at 600 nm (A600) every 30 minutes.
6
When bacterial cells reach an optical density (A600) between 0.6 and 0.8, induce by adding IPTG to a final concentration of 100 μM.
Grow at 30° C for 12–15 hours (overnight)
8
Spin down bacteria at 5,000 × g for 10 minutes in a centrifuge with appropriate centrifuge tubes and rotor. Remove media from cell pellet. GFP expressing cells should have a visible green tint at this point if high expression levels have been reached.
Suspend bacterial cell pellet in 30 to 50 ml PBS.

Concentrate proteins

Lyse cells using freeze/thaw, chemical methods, sonication or preferred method.
11
Spin down lysate at 25,000 to 35,000 × g for 30 minutes. This should clarify the lysate and remove all membrane and lipid components. GFP expressing bacterial lysate should be visibly green. Keep clarified lysate for loading onto column.
Pack a glass column or appropriate column for a protein purification system with 5 ml Softlink streptavidin resin and equilibrate with PBS. Load lysate flowing at a rate of 1 ml per minute and collect flow through.
13
Wash resin with 100 ml PBS to flow through all unbound protein.
Load 5 ml of 5 mM biotin in PBS onto column with Softlink streptavidin resin. Stop flow and incubate for at least 1 hour.
15
Elute column with 5 mM biotin in PBS. Collect elution until no UV absorption at 280 nm is detected, or collect ~ 30 ml if no detection device is being used.
Concentrate elution by spinning at 5,000 × g for 10 minutes in a molecular weight (Mw) cutoff filter of appropriate size (e.g. a molecular weight smaller than your protein.) Spin the maximum elution volume in the filter unit, discard flow through, add more elution and re-spin until the entire eluted sample is concentrated into a 1–5 ml volume. GFP proteins should be visibly green.

Purify proteins

Add concentrated protein into a dialysis filter pre-wet in the same buffer as for the protease assay and securely fasten clamps. Dialyze at least four hours. It is recommended to use multiple rounds of dialysis to remove free biotin. This requires 12 L of buffer to effectively remove biotin from 1–5 ml in 5 mM biotin used in the protein elution.
This step is critical in ensuring biotinylated protein binding to avidin or streptavidin microspheres.
18
Determine protein concentration using an absorbance measurement and the extinction coefficient of the protein at 280 nm. Other preferred methods such as Bradford assays can be used as well.
Perform a titration of biotinylated fluorescent protein to streptavidin or avidin microspheres (see Support protocol 2).

SUPPORT PROTOCOL 2: SPECIFIC MICROSPHERE BINDING MEASUREMENTS FOR OPTIMAL DISPLAY OF PROTEASE SUBSTRATES

This protocol describes a basic protein titration onto the surface of a microsphere via biotinylated protein binding to streptavidin microspheres. Samples are set up with biotinylated protease substrate alone with streptavidin coated microspheres or a pre-block biotin treatment to microspheres prior to substrate addition. The biotin block samples will measure non-specific binding to microspheres which can be subtracted from the total amount of bound substrate to measure specific binding.

Materials

Streptavidin or avidin coated microspheres, either prepared in lab or commercially purchased (Spherotech Inc.)
Phosphate buffered saline pH 7.2 (PBS)
Purified protease substrate biotinylated on one terminus with a fluorophore for detection on the other terminus (Support protocol 1)
Microcentrifuge tubes (1.5 ml)
Mixing device such as a rotator or neutator
A flow cytometer capable of excitation and detection at appropriate wavelength for substrate fluorophore (488 nm excitation for GFP, detection in 505–550 nm range)

Determine the appropriate concentration range for the protein titration and number of samples desired. This should cover approximately two orders of magnitude spanning from 1 nM to 500 nM of protein (Figure 13.12.2). It is recommended to set up between six and eight 500 μl samples in this concentration range. Approximately 10⁴ to 10⁵ microspheres per ml will be needed in each sample. Calculate the volume of buffer (PBS), protein and microspheres needed in a 500 μl volume of the desired protein concentration for each sample. Samples will be set up in duplicate, one sample pre-blocked with biotin and one unblocked sample.
Add the appropriate volume of PBS calculated above to each sample tube.
Add the appropriate volume of microspheres to each tube to yield between 10⁴ and 10⁵ microspheres/ ml.
Add an excess of biotin to the blocked set of samples containing the buffer and microspheres, 1 μM to 5 μM biotin is sufficient to block specific binding. Incubate both blocked and unblocked microspheres at room temperature on a mixing device for 30 minutes.
Add biotinylated, fluorescent protease substrate at the correct concentration to all samples, both blocked and unblocked.
Incubate at room temperature for 1 hour on a mixing device covered from light.
Run samples on a flow cytometer collecting 10,000 size gated events per sample. Microspheres should be gated on size based on a bivariate plot of forward and size scatter determined empirically by the microsphere types used (Figure 13.12.3A). Set up a green fluorescence histogram plot to measure mean or median green fluorescence for each sample (Figure 13.12.3B). If using a cytometer with adjustable fluorescence PMT settings, the PMT settings should be determined by testing first the highest concentration of unblocked fluorescent protease substrate bound microsphere. Adjust the PMT setting in the fluorescence channel so the mean fluorescence readings from size gated samples are not off-scale on a histogram plot of fluorescence (log scale), but are very high (90% of the maximum channel number is a recommended value).
Plot unblocked and blocked fluorescence mean/median values vs. concentration of fluorescent protease substrate (Figure 13.12.2). Subtract median/mean blocked values from median/mean unblocked values and plot these values as specific binding.
The use of this protocol will yield a binding curve of the protease substrate to the microspheres and define specific vs. non-specific binding for each concentration of protease substrate in median or mean fluorescence units (Figure 13.12.2). Ideally, non-specific binding determined by the fluorescence level of the blocked samples, will be relatively low compared to specific binding. High non-specific binding could be due to hydrophobic protein/peptide interactions with microspheres. If the fluorophore is measured in a fluorescence channel with standard microsphere sets available (e.g., FITC, or phycoerythrin), a standard curve can be made to estimate the number of molecules on the surface of the microsphere in MESF values. Two typical biotinylated fluorescent protein titrations to streptavidin microspheres are shown inFigure 13.12.2, one with little non-specific binding (Figure 13.12.2A) and one with relatively high non-specific binding (Figure 13.12.2B).

SUPPORT PROTOCOL 3: ADAPTATION OF MICROSPHERE-BASED PROTEASE ASSAYS TO HIGH-THROUGHPUT SCREENING BY FLOW CYTOMETRY

The microsphere-based protease assays described above can be adapted to high-throughput screening applications by performing assays in 96 or 384 well microplates and using equipment to run these plates on commercial flow cytometry systems. The system used in these studies is the HyperCyt high-throughput flow cytometry system available from Intellicyt (Albuquerque, NM). Other plate-based sample acquisition devices such as the Beckton Dickinson (BD) LSR II HTS System or other flow cytometry HTS plate samplers would be equally appropriate for these applications. It should be noted that an HTS flow cytometry system is required here, as a typical fluorescence plate reader will be incapable of determining loss of fluorescence from the surface of microspheres and instead evaluates total liquid fluorescence in microplate wells.

Materials

Streptavidin or avidin coated microspheres at a stock concentration of 10⁶ to 10⁹ per ml
Purified or purchased protease of interest
Purified fluorescent protease substrate with biotin at one end and a fluorophore at the other end with the protease substrate or cleavage site in between (Support protocol 1)
Protease buffer
Microcentrifuge tubes (1.5 ml)
Centrifuge capable of spinning 1.5 ml tubes between 13,000 and 16,000 × g
96 or 384 well microplates
Test compounds (chemical libraries etc.)
A flow cytometer with appropriate excitation lasers and emissions filters for excitation and detection of fluorophore incorporated in protease substrate.
Plate acquisition device for flow cytometry (e.g., HyperCyt, BD LSR II HTS, or other)

Bind substrate to microspheres

Bind biotinylated substrate to streptavidin or avidin microsphere for 1 hour as described in the Basic Protocol. For multiplex microspheres and several substrates, bind each microsphere/substrate population separately.
When setting up the assay, the appropriate number of beads must be used. Typical well volumes are 10–25 μl per well for 96 well plates and 6–15 μl volumes per well for 384 well plates. It is desirable to input approximately 1,000 to 5,000 microspheres per microplate well if using the Hypercyt flow cytometry system, which aspirates approximately 2 μl per well in 1 second. HTS cytometry plate systems which use the entire well volume during analysis may lower total required microsphere numbers than systems such as Hypercyt, which analyzes only 2 μl of the total volume. Calculations of initial microsphere numbers should take into account ~30% to 50% of microsphere loss during sample preparation.
2
Spin down substrate bound microspheres at 13,000 to 16,000 × g for 1–2 minutes, remove buffer and re-suspend in 500 μl protease buffer. Repeat twice.
If multiple microspheres/substrates are being used combine into one sample.

Perform protease assay

Remove a small volume (10–25 μl) and suspend in 500 μl of protease buffer. Acquire on a flow cytometer and draw size-based and fluorescence gates as described in the Basic Protocol.
5
Add appropriate amount of protease buffer to microspheres to fill desired amount of 96 or 384 well microplates with desired volume of microspheres.
Using multi-channel pipettes or robotic systems, add microspheres suspended in buffer to 96 well or 384 well plates. The volume of microspheres added depends upon the desired final volume of the assay and the type of microplate used (e.g. 96 or 384 well plates).
7
Add test compounds (e.g. concentrated chemical library compounds) to microplate wells containing microspheres and buffer in the 96 or 384 well plates. Do not add test compounds to positive and negative control wells.
Add protease to test wells and negative control wells.
The optimal protease concentration for these assays should be determined empirically with Basic Protocol using end point assays. The volume of protease depends upon the final concentration desired. If multiple proteases are being assayed against multiple substrates, they can be added into the plate together. Do not add protease to positive control wells as these wells mimic complete inhibition. Positive and negative control wells without protease or test compound, respectively, must also be set up and measured to calculate a Z′ Factor for the assay.
9
Cover plate and incubate on a rotating device for 1 to 2 hours. This should be determined empirically depending on end-point assays described in the Basic Protocol.
Microplate high-throughput flow cytometry protease assays are perfomed as end-point assays with a typical 1–2 hour incubation of protease in the presence of substrate bearing microspheres and test compounds.

Collect and analyze data

Acquire microplate on high-throughput flow cytometry device and collect data on flow cytometer.
11
Analyze data using data analysis software and calculate median fluorescence values for fluorescent substrates on microspheres in each well. If multiple microspheres/substrates are being analyzed calculate median fluorescence values for each microsphere/substrate.
Calculate an average median fluorescence for positive and negative control wells along with standard deviation in median fluorescence between control wells.
13
Calculate a Z′ factor for the assay using the formula below, where Z′ is a statistic for assessing assay quality (). If multiple substrates/microspheres are being used calculate Z′ separately for each microsphere/substrate type.
$Z^{'} = 1 - \frac{3 (StdevPositivecontrol + StdevNegativecontrol)}{AvPositivecontrol - AvNegativecontrol}$
Stdev is the standard deviation of the median fluorescence of the positive control well and negative control well, and Av is the averaged median fluorescence values for positive and negative control wells. A Z′ of 0.5 or better is considered indicative of a high quality assay suitable for HTS ().
14
Convert compound well values into percent inhibition using the following formula:
$% inhibition = 100 * \frac{Value - AvNegativecontrol}{AvPositivecontrol - AvNegativecontrol}$
Value is the median fluorescence value for the test well with compound, and AvPositive control and AvNegative control are the averaged median fluorescence values for positive control and negative control wells.
15
Determine compounds with percent inhibition values above 30%. Perform dose-response assays using increasing amount of compound in microsphere based assays over two orders of magnitude, higher and lower than that screened to get an IC₅₀ value of the compound toward the protease of interest.

COMMENTARY

Background information

Proteases are currently one of the major families of proteins under investigation as potential drug targets. There is currently no overlying methodology to investigate protease activity in vitro with applications to high-throughput screening for protease inhibitors. Most protease assays use fluorescence resonance energy transfer (FRET) methods placing protease cleavage sites between two fluorophores capable of FRET. This methodology does not take into account protease/substrate recognition elements distal to the protease cleavage site, which accounts for much of the specificity of some proteases. Using flow cytometry and multiplex microsphere sets it is possible to use full-length protease substrates in assays with multiple proteases or multiple substrates at once in high-throughput screening. These assays can be adapted to any site specific or protein specific protease of interest for assaying protease activity over time and for use in flow cytometry based high-throughput screening for protease inhibitors.

Critical steps and Troubleshooting

Binding of fluorescent protease substrates to microspheres

Choice of attachment chemistry for substrates to microspheres may affect levels of fluorescent protease substrates on microspheres. In the case of avidin/streptavidin microspheres with biotinylated proteins, great care must be taken to remove all free biotin from protease substrates. Free biotin will bind to avidin/streptavidin and prevent biotinylated substrate from binding. Purification of biotinylated protein from Softlink streptavidin is done using 5 mM biotin. Effective removal of biotin can be accomplished by dialyzing 1–5 ml of purified concentrated substrate in 5 mM biotin against 3 L of buffer, four times (12 L total) for four hours each time. This will remove enough free biotin from the protein preparation to prevent biotin from binding to avidin/streptavidin on the microsphere and blocking biotinylated protein binding.

GST fusion tags and 6X Histidine tags bound to reduced glutathione or Ni²⁺ microspheres, respectively, may require higher concentration of protein than biotinylated protein binding to avidin/streptavidin microspheres. If low fluorescence values are seen during initial protein titrations then concentrations may be increased to achieve efficient binding. It should also be noted that Ni or glutathione functionalized microspheres may allow substrate dissociation over time or prolonged storage due to the relative K_D of Ni to 6X Histidine and GST to reduced glutathione compared to biotinylated protein/avidin. Biotinylated protein or peptide bound to avidin/streptavidin microspheres are stably bound for several days of storage at 4°C.

Lack of proteolytic cleavage of substrates off of microspheres

In cases where protease substrates are bound to microspheres but the proteolytic cleavage is not detected, higher protease concentrations may be used. With insufficient wash steps, soluble substrate remains in solution and may be cleaved instead of substrate bound to microspheres, resulting in a substrate sink. Protease activity can be measured by cleavage of substrates in solution, followed by Western blot with antibodies to the protease substrate, or using streptavidin HRP or AP blots to biotinylated proteins, anti-His antibodies to His tagged proteins, etc. Size differences by Western blot analysis will indicate protease cleavage. In the case of protease purified in the laboratory, activity will need to be verified by either microsphere-based assays or secondary assays such as FRET based assays. Protease purification methods will vary for different proteases and it is suggested to use established protocols when available. Dialysis of protease into protease buffer after purification is also necessary to remove high salt, imidazole or other compounds used during purification, which may lead to a lack of protelytic activity.

Current literature should be taken into consideration when short cleavage site substrates are being used instead of full-length protease substrates. In some cases, protease cleavage sites alone are not sufficient for cleavage by their protease as they lack required distal binding elements necessary for efficient protease/substrate interactions. In the case of the B. anthracis lethal factor protease used in microsphere-based protease studies optimized cleavage sites have been developed that cut efficiently enough for use in microsphere assays (; ; ). In the case of C. botulinum light chain protease assays, we have used full length protease substrates, which provide optimal cleavage when compared to cleavage sites alone or substrates with distal binding elements deleted (). Nonetheless, for novel target substrates, it should not be assumed that short amino acid sequences that mimic natural cleavage sites of proteases alone will be sufficient for cleavage of substrates in vitro. Substrate optimization methods may be needed to determine optimal peptide sequences for use in in vitro assays (). Use of full length protease substrates is likely to provide better results and has also been shown to give faster proteolytic rates than cleavage sites alone in these assays.

Lack of reproducible protease assays in high-throughput screening

Adaptation of microsphere-based protease assays to high throughput screening platforms will require optimization. Typical variables in these assays leading to a lack of protease cleavage and acceptable Z′ factors (above 0.5) include protease concentration or incubation times. Proteases in solution will lose their activity over time. If the protease under investigation loses activity quickly, a small change of fluorescence will be seen regardless of incubation time. This can be tested by evaluating microsphere-based protease assays in non high-throughput assays measuring cleavage over time either from the surface of microspheres as described in Basic Protocol, or by fluorimetric measurements such as FRET assays. If the protease activity is lost quickly, increasing the amount of protease in the sample wells will permit shorter incubation times. When a protease is stable over extended periods of time (2+ hours), increased incubation times can be used to augment cleavage from the surface of microspheres.

Anticipated results

Once optimal microsphere and protease concentrations are established, it is expected that a loss of fluorescence will yield a pseudo-first order exponential decay. Fluorescence will likely not reach background (auto-fluorescence of the microspheres) as some of the substrate may not be accessible to the protease or because of non-specific binding of substrate to the microsphere, where cleavage events will not lead to loss of fluorescence. Fluorescence can be normalized by dividing all mean or median fluorescenece time point measurements by the mean or median fluorescence measurement prior to protease addition. This is particularly useful when analyzing multiple substrates or proteases at once using multiplex microsphere sets to give relative rates of cleavage. This will give a range of normalized fluorescence from 0 to 1 and approximate percentage of starting substrate present on the microsphere at certain time points, this can be determined by multiplying the normalized fluorescence value by 100. Subtracting the percentage of substrate present from 100 will give a rough estimate of percentage of substrate cleaved.

Using no protease and no compound control samples, test compounds or known inhibitors can be shown to inhibit proteases of interest. Percent inhibition of a tested concentration of an inhibitor can be calculated using the percentage inhibition formula in step 14 of Support Protocol 3. Positive controls will be no protease and/or a well with a high concentration of a known inhibitor. Negative controls will be no test compound with protease. The same concentration of protease will need to be used in the negative control and all inhibitor containing samples in order to determine percentage inhibition using this formula.

It should be noted that traditional steady state kinetics models to calculate Km and Vmax from these microsphere based assays cannot be done due to the low concentration of substrate used in these assays, which results in [S] being ≪ [E] and makes typical steady state assumptions invalid (Copeland, 2000). [S] can be determined simply by using FITC and GFP standard microspheres to generate a standard curve to estimate the number of fluorescent molecules on the surface of the microsphere, multiplying by the estimated number of microspheres, and dividing by the volume (typically resulting [S] in the picomolar range).

Time considerations

Microsphere-based protease assays usually include a one hour binding step, followed by 10 to 15 minutes for wash steps, as well as the assay time. Protease assays should be performed continuously until the entire sample is aspirated into the flow cytometer. Samples of 500 μl can typically be acquired for 30 minutes depending on the flow cytometer and flow rate. For measurements at defined time points, for example every 15 minutes or every hour, the assay can be performed until the protease loses activity or the maximum amount of cleavage from the microsphere occurs. For HTS, one hour binding times are followed by compound assay plate setup (this can be quick using robotics, very slow by hand), optimized incubation times, readout and data analysis. Times (2–12 hours) will also depend on the number of microplates sampled.

Acknowledgments

This work was supported by the National Flow Cytometry Resource NIH grant RR001315, University of New Mexico Center for Molecular Discovery MH077425 and MH084690, and Joint Sciences Technologies Laboratories grant (JSTL) 26Q4.

LITERATURE CITED

Barth H, Aktories K, Popoff MR, Stiles BG. Binary bacterial toxins: Biochemistry, biology, and applications of common Clostridium and Bacillus proteins. Microbiology and Molecular Biology Reviews. 2004;68:373.[PMC free article] [PubMed] [Google Scholar]
Breidenbach MA, Brunger AT. Substrate recognition strategy for botulinum neurotoxin serotype A. Nature. 2004;432:925–929. [PubMed] [Google Scholar]
Chopra AP, Boone SA, Liang XD, Duesbery NS. Anthrax lethal factor proteolysis and inactivation of MAPK kinase. Journal of Biological Chemistry. 2003;278:9402–9406. [PubMed] [Google Scholar]
Copeland RA. Enzymes : a practical introduction to structure, mechanism, and data analysis. New York: Wiley; 2000. p. xvi.p. 397. [Google Scholar]
Cudic M, Fields GB. Extracellular Proteases as Targets for Drug Development. Current Protein & Peptide Science. 2009;10:297–307.[PMC free article] [PubMed] [Google Scholar]
Cummings RT, Salowe SP, Cunningham BR, Wiltsie J, Park YW, Sonatore LM, Wisniewski D, Douglas CM, Hermes JD, Scolnick EM. A peptide-based fluorescence resonance energy transfer assay for Bacillus anthracis lethal factor protease. Proceedings of the National Academy of Sciences of the United States of America. 2002;99:6603–6606.[PMC free article] [PubMed] [Google Scholar]
Diamond SL. Methods for mapping protease specificity. Current Opinion in Chemical Biology. 2007;11:46–51. [PubMed] [Google Scholar]
Eubanks LM, Hixon MS, Jin W, Hong SW, Clancy CM, Tepp WH, Baldwin MR, Malizio CJ, Goodnough MC, Tothers Barbieri J. An in vitro and in vivo disconnect uncovered through high-throughput identification of botulinum neurotoxin A antagonists (vol 104, pg 2602, 2007) Proceedings of the National Academy of Sciences of the United States of America. 2007;104:6490–6490.[PMC free article] [PubMed] [Google Scholar]
Lauer SA, Nolan JP. Development and characterization of Ni-NTA-bearing microspheres. Cytometry. 2002;48:136–145. [PubMed] [Google Scholar]
Rossetto O, Schiavo G, Montecucco C, Poulain B, Deloye F, Lozzi L, Shone CC. Snare Motif and Neurotoxins. Nature. 1994;372:415–416. [PubMed] [Google Scholar]
Saunders MJ, Graves SW, Sklar LA, Oprea TI, Edwards BS. High-Throughput Multiplex Flow Cytometry Screening for Botulinum Neurotoxin Type A Light Chain Protease Inhibitors. Assay and Drug Development Technologies. 2010;8:37–46.[PMC free article] [PubMed] [Google Scholar]
Seidman CE, Struhl K, Sheen J, Jessen T. Current Protocols in Molecular Biology. Unit 1.8 1997. Introduction of Plasmid DNA into Cells. [PubMed] [Google Scholar]
Saunders MJ, Kim H, Woods TA, Nolan JP, Sklar LA, Edwards BS, Graves SW. Microsphere-based protease assays and screening application for lethal ZXF factor and factor Xa. Cytometry Part A. 2006;69A:342–352. [PubMed] [Google Scholar]
Silhar P, Capkova K, Salzameda NT, Barbieri JT, Hixon MS, Janda KD. Botulinum Neurotoxin A Protease: Discovery of Natural Product Exosite Inhibitors. Journal of the American Chemical Society. 2010;132:2868.[PMC free article] [PubMed] [Google Scholar]
Steuber H, Hilgenfeld R. Recent Advances in Targeting Viral Proteases for the Discovery of Novel Antivirals. Current Topics in Medicinal Chemistry. 2010;10:323–345. [PubMed] [Google Scholar]
Tessema M, Simons PC, Cimino DF, Sanchez L, Waller A, Posner RG, Wandinger-Ness A, Prossnitz ER, Sklar LA. Glutathione-S-transferase-green fluorescent protein fusion protein reveals slow dissociation from high site densiy beads and measures free GSH. Cytometry Part A. 2006;69A:326–334. [PubMed] [Google Scholar]
Turk B. Targeting proteases: successes, failures and future prospects. Nature Reviews Drug Discovery. 2006;5:785–799. [PubMed] [Google Scholar]
Turk BE, Wong TY, Schwarzenbacher R, Jarrell ET, Leppla SH, Collier RJ, Liddington RC, Cantley LC. The structural basis for substrate and inhibitor selectivity of the anthrax lethal factor. Nature Structural & Molecular Biology. 2004;11:60–66. [PubMed] [Google Scholar]
Vitale G, Bernardi L, Napolitani G, Mock M, Montecucco C. Susceptibility of mitogen-activated protein kinase kinase family members to proteolysis by anthrax lethal factor. Biochemical Journal. 2000;352:739–745.[PMC free article] [PubMed] [Google Scholar]
Zhang, Chung, Oldenburg A Simple Statistical Parameter for Use in Evaluation and Validation of High Throughput Screening Assays. Journal of Biomolecular Screening: The Official Journal of the Society for Biomolecular Screening. 1999;4:67–73. [PubMed] [Google Scholar]

Published online 2013 Jul 5. doi: 10.1080/17513758.2013.812753

PMID: 23826744

This article has been cited by other articles in PMC.

Abstract

A recently developed class of models incorporating the cyton model of population generation structure into a conservation-based model of intracellular label dynamics is reviewed. Statistical aspects of the data collection process are quantified and incorporated into a parameter estimation scheme. This scheme is then applied to experimental data for PHA-stimulated CD4+ T and CD8+ T cells collected from two healthy donors. This novel mathematical and statistical framework is shown to form the basis for accurate, meaningful analysis of cellular behaviour for a population of cells labelled with the dye carboxyfluorescein succinimidyl ester and stimulated to divide.

Keywords: immunology, flow cytometry, cyton models, mathematical and statistical models, label dynamics, parameter estimation, cellular models

1. Introduction

Dating at least as far back as the work of Bell and Anderson [] in 1967, mathematical models have been proposed which attempt to describe the biophysical processes involved in cell division. These models range in scope and scale from phenomenological descriptions of population growth to mechanistic models of subcellular mechanics. One particularly important class of mathematical models is that in which the behaviours of individual cells are linked in a meaningful way to population-level characteristics. This class of models has applications in quantitative descriptions of the immune system, where the behaviour of individual cells can vary widely across the population but in which the immune response (understood to be the net result of actions of all relevant cells in the system) is much more predictable []. In fact, a quantitative description of the ‘cellular calculus’ [] by which cells send, receive, and respond to intra- and extracellular stimuli is in many respects an open problem in immunology.

In the past, the major limitation of mathematical models linking cellular and population-level behaviour has been the difficulty in obtaining data with which to validate the models. Generally, it is possible to observe the behaviour of a small number of single cells quite carefully in isolation, or to observe a large number of cells in aggregate. More recently, the intracellular dye carboxyfluorescein succinimidyl ester (CFSE) [] for use in flow-cytometric proliferation assays in vitro has emerged as a powerful experimental technique for the study of dividing cells. Because the dye emits bright, approximately uniform, and long-lasting fluorescent labelling of a population of cells and is approximately evenly partitioned during cell division, the dye provides a useful surrogate for the number of divisions a cell has undergone. Individual cells can be assessed by a flow cytometer, which can simultaneously measure additional properties of cells such as size, internal complexity, cell surface marker expression, levels of cytokine secretion, etc.

When a population of cells is measured, the individual CFSE fluorescence intensity measurements can be placed into a histogram as in Figures 1 and and2.2. Each ‘peak’ in the histogram represents a cohort of cells having completed the same number of divisions. When such measurements are made sequentially in time, one obtains information on the dynamic response of the population of cells to a stimulus. As such, CFSE-based flow cytometric analysis is a promising tool for the study of cell division and division-linked changes. The ultimate goal for the quantitative analysis of CFSE data (in particular, as it relates to studies of the immune system) is to incorporate fundamental mechanistic modelling of the cellular calculus into a description of population-level behaviour, and thus to obtain a more comprehensive understanding of the immune system, with obvious implications for the study of disease detection, progression, treatment/control, etc. To that end, mathematical modelling provides a quantitative framework with which to analyse and interpret such data.

Histogram data for CD4T cells from Donor 1 (left) and Donor 2 (right). An initially unimodal distribution of fluorescence intensity becomes multimodal as cells divide asynchronously. By day 5, subsequent generations of cells are no longer detectable as fluorescence resulting from CFSE has been diluted to the level of background autofluorescence.

Histogram data for CD8T cells from Donor 1 (left) and Donor 2 (right). An initially unimodal distribution of fluorescence intensity becomes multimodal as cells divide asynchronously. By day 4, subsequent generations of cells are no longer detectable as fluorescence resulting from CFSE has been diluted to the level of background autofluorescence.

A large number of mathematical models (see, e.g. the recent reviews [4,]) have been proposed with the aim of linking the generation structure (cells per number of divisions undergone) to quantitative descriptions of cellular behaviour (e.g. times to division and death). Most recently [3], a class of mathematical models has been proposed which incorporates the ‘cyton’ model [,] of cell division dynamics into a mathematical description of flow cytometry histogram data based upon conservation principles. Here, we revisit this new class of models and provide a more complete discussion of some mathematical properties of the solutions which make them amenable to the fast computational approaches as described in []. It is also shown how the new model can be compared with older label-structured models such as those proposed in [,42,47]. Next the data collection process is considered in more detail and a theoretical statistical model is derived. The mathematical and statistical models are then incorporated into a rigorous parameter estimation scheme based upon a weighted least-squares framework and members of the proposed class of mathematical models are compared in terms of their ability to fit the available data.

2. CFSE data

CFSE-based flow cytometry experiments are performed by stimulating CFSE-labelled cells to divide by exposure to either a mitogenic compound or a specific antigen. Cells are then placed into separate wells, one for each measurement to be made. Several protocols exist and can be tailored to the needs of a particular experiment; see, e.g. [,]. For the experiment described here, peripheral blood mononuclear cells (PBMCs) from two healthy donors were stained with CFSE and stimulated with the mitogen phytohaemagglutinin (PHA). Measurements were carried out approximately every 24 h for 5 days, beginning 1 day after stimulation.

When a well is selected for measurement, cells are additionally labelled for phenotypic identification by antibodies (anti-CD3T, anti-CD4T, and anti-CD8T cells) tagged with fluorescent markers. These cells are then analysed by flow cytometry, which records the relative brightness of cells in various colours (corresponding to distinct fluorescent markers). Cells of interest can then be identified in the flow cytometry output. For this experiment, we consider CD4 T cells (CD3+, CD4+, and CD8−) and CD8 T cells (CD3+, CD4−, and CD8+). Once these cells are identified, the fluorescence intensity (in the colour channel consistent with CFSE) is analysed for each cell. Because dead cells will disintegrate shortly after death and can be excluded by gating, mainly viable cells are measured by the flow cytometer (the fraction of cells which are dead but not yet disintegrated is assumed to be small).

The population generation structure of the cells at a given measurement time can be visualized by organizing the fluorescence intensity measurements of individual cells into a histogram, as shown in Figures 1 and and2.2. Because of physical limitations, only a fraction of the cells contained in a selected well are actually analysed by flow cytometry. In order to estimate the total number of cells in the measurement sample, a known number of fluorescent beads are placed in each sample; these beads can be identified and counted in the flow cytometry output and the ratio of beads counted to total beads introduced provides an estimate of the fraction of the sample acquired by the flow cytometer. The histogram profiles obtained from the measured cells can then be normalized by the reciprocal of this quantity.

It is assumed that each well contains an identical population of cells at all times, that the fraction of measured cells is representative of the population of cells in that well, and that the fraction of the total well actually measured is accurately estimated by bead counting (though it is possible to consider errors in bead counts; see Section 4). Under these assumptions, the total mass of CFSE should be conserved in the histogram data. For this reason, the mathematical models proposed to describe CFSE-based flow cytometry data (Section 3) are derived from standard conservation principles (see, e.g. [6]).

A complete discussion of the mathematically relevant aspects of the data collection procedure can be found in [4]. The goal of the mathematical modelling process is to link a mathematical description of cellular division and death processes at the population level to the observed fluorescence intensity profiles as measured by a flow cytometer (Figures 1 and and2).2). Because each peak in the flow cytometry data represents a cohort of cells having completed the same number of divisions, it is hypothesized that flow cytometry data collected sequentially in time for cells from a single donor will contain sufficient information to analyse the dynamic response of those cells to stimulus. This dynamic response can only be accurately understood in the context of a mathematical model of the biological system, as well as a statistical model linking the mathematical model to the data. Such a model must be able to account for the slow natural loss of CFSE fluorescence intensity over time, the dilution of fluorescence intensity by division, and the asynchronous nature of cellular division and death processes.

3. Mathematical modelling of CFSE data

We begin by summarizing a partial differential equation model structured by (continuous) fluorescence intensity and (discrete) division number which has been proposed to describe histogram data from CFSE-based proliferation assays [,42,47]. We then summarize a new class of models incorporating cyton dynamics into a label-structured framework and consider several different versions of the cyton model at greater length. Finally, the role of cellular autofluorescence is briefly considered.

3.1. Previous label-structured model

Let n_i(t, x) be the structured density (cells per unit of fluorescence intensity) of a cohort of cells having completed i ≥ 0 divisions at time t and with x units of fluorescence intensity resulting from induced CFSE (that is, ignoring the contributions of cellular autofluorescence). It is assumed that this fluorescence is directly proportional to the mass of CFSE within a cell, and thus can be treated as a mass-like quantity. These cells are assumed to divide with time-dependent exponential rate α_i(t) and die with time-dependent exponential rate β_i(t). Then the entire population of cells can be described by the system of partial differential equations

The recruitment terms describe the symmetric division of CFSE upon mitosis and are given by R_i(t, x) = 4α_{i − 1}(t)n_{i − 1} (t, 2x) for i ≥ 1; the form of these recruitment terms arises naturally from the derivation of the above system of equations from conservation principles [47]. The advection term describes the rate of loss of fluorescence intensity (resulting from the turnover of CFSE), which is assumed to depend linearly on the fluorescence intensity x with time-dependent rate function v(t). This follows the convention of [], and includes exponential loss (v(t) = c) and Gompertz decay (v(t) = ce^−kt) as special cases. The loss of CFSE has been observed to be very rapid during the first 24 h after initial labelling and much slower thereafter [,47]). Thus, when data are collected in the first 24 h, it is more accurate [] to describe the rate of loss of fluorescence intensity with a time-varying rate (e.g. Gompertz decay). Such rates are consistent with the sequence of chemical reactions known to occur during the labelling process []. If data are not collected in the first day after labelling with CFSE (as in the data collected for this report) then, as we shall see below, exponential decay is sufficient. For the remainder of this report, we assume v(t) = c.

The initial conditions for the model (1) are

for x ≥ 0. Note that a no-flux condition at x = 0 is naturally satisfied by the form of the advection term provided n_i(t, 0) is finite (so that the flux term is well defined) for all i and for all t ≥ 0. The solution of Equation (1) can be computed by the method of characteristics [47]. Alternatively, the following characterization of the solution is given in [42].

Proposition 3.1

The solution to Equation (1) can be factored as

The functions N_i(t) indicate the number of cells having completed i divisions at time t and satisfy the weakly coupled system of ordinary differential equations

with initial conditions N₀(t₀) = N₀, N_i(t₀) = 0 for all i ≥ 1. The functions n_i(t, x), describe the distribution of CFSE within a generation of cells. Each satisfies the equation

for x ≥ 0 with initial condition

Again, the no flux condition at x = 0 is trivially satisfied by the form of the advection term. Note that, by definition,

We remark that the form of the equations presented above is slightly different from that considered in [] as here we initially neglect considerations of autofluorescence. The formulation above follows the work of Hasenauer et al. [] and allows for a more intuitive formulation of the model, as well as the fast numerical techniques discussed below and in the appendix. The system above is derived in terms of the fluorescence intensity resulting only from induced CFSE; the experimentally measured fluorescence intensity is the sum of this quantity and the cellular autofluorescence which results from the light absorption and emission properties of intracellular molecules. Let ñ_i(t, ) be a structured density for cells having completed i divisions at time t with measured fluorescence intensity . While the measured fluorescence intensity is given by the sum of the induced fluorescence x and the cellular autofluorescence, this latter quantity may vary from cell to cell in the population. As such, given the solutions n_i(t, x) for i ≥ 0 to Equation (1), one computes the densities ñ_i(t, ) using the convolution integral [,42]

where p(t, ζ) is (for fixed time t) a probability density function describing the distribution of autofluorescence in the population (see [,47] and Section 3.4). Fast approximation techniques for the convolution (5) have been demonstrated in [] and the references therein. These techniques are summarized in the appendix.

3.2. New class of label-structured models

While the model (1) for computing population label structure has been shown to accurately fit experimental data [], it lacks a certain intuitive appeal in that the ‘time-dependent exponential rates’ of division and death, α_i(t) and β_i(t), are not explained in biologically relevant terms (e.g. times to division and death). An alternative to the mathematical model (3) is the cyton model for division dynamics [,] which relates the number of cells in a population directly to probability distributions describing times at which cells divide or die. The cyton model is motivated by the assumption of independent regulation by the cellular machinery of times to division and death. Using the definition of N_i(t) above, the cyton model is described by the set of equations

where n_i^div(t) and n_i^die(t) indicate the rates (cells per hour) at which cells (having already undergone i divisions) divide and die, respectively, at time t. For undivided cells, let φ₀(t) and ψ₀(t) be probability density functions describing the distribution from which the times to division and death, respectively, are drawn. Let F₀, called the initial progressor fraction, be the fraction of undivided cells which would hypothetically divide in the absence of any cell death. (It is assumed that in each generation, non-progressing cells may die according to the probability density function ψ_i(t), but may not divide.) Then, under the assumptions of the cyton model, it follows that

Similarly, one can define probability density functions φ_i(t) and ψ_i(t) for times to division and death (measured in hours since completion of the (i − 1)th division), respectively, for cells having undergone i divisions, as well as the progressor fractions F_i of cells which would complete the ith division in the absence of cell death. Then the cell division and death rates are computed as

Given the success of the cyton model in describing cell dynamics, as well as the experimental evidence supporting it [4,], the cyton model has been incorporated into a label-structured framework [3] similar to Equation (1). The mathematical ideas rely heavily upon the separability of the model solution (Proposition 3.1) originally demonstrated by Hasenauer and co-workers [,42]. Let n_i(t, x) be a structured density as before. Consider the system of partial differential equations

with initial conditions specified as for Equation (1). The terms n_i(t, x) are described as in Equation (4). This system of equations incorporates the cyton model (6)–(8) for cell population dynamics into a label-structured framework for use with histogram data. We remark that Equation (9) corrects a typographical error in [3], where the factors n_i(t, x) were missing from the right side of the equations.

Proposition 3.2

The solution of Equation (9) is

where the quantities N_i(t) satisfy Equation (6) and n_i(t, x) satisfy Equation (4).

Proof

The proof follows immediately by the direct substitution of the stated solution into Equation (9). Working with the left side of Equation (9) for the ith equation,

which is exactly the right side of Equation (9). For the purposes of this proof, it is assumed that n₋₁^div = 0 so that the equations for n₀(t, x) are well defined. It is easy to check that the initial and boundary conditions for Equation (9) are satisfied by the above solution.

Given the densities n_i(t, x) computed according to Equation (9), one can compute the densities n_i(t, ) via the convolution (5) as before. Much like the original model (1), the new model (9) can be fit directly to histogram data from CFSE-based experiments and is highly accurate [3]. Significantly, the new model describes the dynamics of a dividing population of cells in intuitive terms (i.e. probability distributions of times to divide and die). This is the primary advantage of the new class of models over the previous modelling framework. Similarly, while the cyton model has been widely used to analyse cell count data obtained from CFSE data (e.g. through a deconvolution process; see [4]), the new class of models can be fit directly to CFSE histogram data. As a result, the class of models is less dependent upon peak separation or a high frequency of cells which respond to stimulus. Moreover, the fit of the model to data can be assessed in a statistically rigorous manner (see Section 4).

Although the motivation for this model formulation is clear (combining cyton and label dynamics in a division-dependent compartmental model) the form of the new model is complex, describing the population densities n_i(t, x) in terms of yet another set of density functions, n_i(t, x). A simple reformulation shows that the new class of models is consistent with the mass-conservation principles of the old label-structured model. Moreover, this reformulation shows how the two model forms can be related and directly compared.

Recall the definitions of n_i(t, x), N_i(t), and n_i(t, x) given in Proposition 3.2. Note that

and

Thus the quantities n_i(t, x) can be considered as probability density functions for the distribution of CFSE in cells having divided i times at time t. When considering cell death, the mathematical terms n_i^die(t)n_i(t, x) in Equation (9) reflect the tacit assumption that the rate at which cells die (n^die(t)) is independent of the label distribution n_i(t, x) of those cells. Moreover,

with

which is exactly the same form as Equation (1). Similar statements hold true for the rates of dividing cells and the terms α_i(t) in Equation (1) if F_i of Equations (7) and (8) equal 1 for all i. For 0 < F_i < 1, then this is a more complex issue and indeed is the subject of some of our current efforts. This fact can be used as the basis for a quantitative comparison of the two model formulations, as well as to give physical / biological meaning to the time-dependent exponential rates α_i(t) and β_i(t) used previously to describe cell division and death.

3.3. The Cyton class of models

It follows from the form of Equations (6)–(9) that the generation structure of the population (cells per division number) is completely determined by the functions φ_i(t) and ψ_i(t) and the progressor fractions F_i. To motivate the form of the functions φ_i(t) and ψ_i(t), define the random variable T_i^div to be the time required for a progressing cell to complete the ith division, with the clock starting from the completion of the (i − 1)th division. (That is, the random variables T_i^div are defined in the temporal reference frames of the individual cells.) Similarly, define the random variables T_i^die to be the time required for a newly divided cell to die. The cyton model is built from the premise that these two random variables are independent. Upon the completion of the ith division (orupon activation, for i = 0), every cell realizes a new value for T_i^div and T_i^die; whichever realization is smaller determines the eventual fate of the cell. The functions φ_i(t) and ψ_i(t) are the probability density functions for T_i^div and T_i^die, respectively (which are assumed to be common for all cells having completed i divisions).

Experimental evidence suggests that the functions φ_i(t) and ψ_i(t) canbe heuristically described by lognormal probability density functions [,]. Thus for all t > 0,

where the parameters μ_i^div and σ_i^div represent the means and standard deviations of the natural logarithms of the random variables T_i^div (and similarly for T_i^die). Since it is more intuitive to discuss the means and standard deviations of the random variables T_i^div and T_i^die directly (as opposed to the means and standard deviations of their logarithms) these quantities are easily defined in terms of the parameters μ_i^div, μ_i^die, σ_i^div, and σ_i^die:

For the basic cyton model, it is standard (following the work [,]) to assume that the random variables T_i^div are identically distributed for all i ≥ 1 and that the random variables T_i^die are identically distributed for all i ≥ 1. These distributions may be different from the corresponding random variables for undivided cells (i = 0). Thus

It is also assumed that F_i = 1 for all i ≥ 1 in the basic cyton model.

Of course, any number of generalizations of the basic cyton model is possible. For instance, following [], the fractions F_i can be defined in terms of a division destiny. Among the cells which are activated to divide (F₀N₀ of them), let p_i be the probability that a cell (or its progeny) ceases to be activated after completing i divisions and define the cumulative probabilities

(Note that we must have c_i → 1 as i → ∞.) It follows that the progressor fractions (for i ≥ 1) are

Rather than estimate the progressor fractions F_i (or the probabilities p_i) independently, we follow the approach suggested in [] and assume that the probabilities p_i can be described as a discrete normal density function defined on the nonnegative integers. Thus the values of the probabilities p_i (and the progressor fractions F_i) are uniquely determined by the mean D_μ and the standard deviation D_σ of a discrete normal distribution. This assumption has been shown to be consistent with experimental data [] and has the beneficial effect of reducing the total number of parameters of the mathematical model.

We can now define the division destinies in terms of the progressor fractions (12). The division destiny d_i is defined to be the fraction of cells out of those cells in the original population which would have proceeded through exactly i divisions in the absence of any cell death. These quantities are computed as

It should be noted that this definition does not make any assumptions regarding the exact lineage of cells (which cannot be determined from flow cytometry data) so that the progeny of a single cell are not assumed to all undergo the same number of divisions. Rather, one can consider a fractional number of precursors. For example, consider a single cell which divides once, after which one of the two daughter cells divides again. Then there are three cells in the total population, and the division destinies are d₀ = 0, d₁ = ⅓, and d₂ = ⅔. Though counterintuitive for a single cell, division destinies provide an indication of the number of divisions undergone averaged over the population of precursors.

Following the work presented in [3], one may also generalize the death rate mechanism for undivided cells to incorporate a separate set of behaviours forunactivated cells. In particular, it can be assumed that a fraction p_d of such cells will remain dormant and neither divide nor die during the experiment. The remaining fraction (1 − p_d) will die with some exponential rate β which is independent of the death-rate distribution of activated cells. It follows that the probability density function describing cell death for undivided cells is

It should be noted that this generalization changes the interpretation of the random variable T₀^die and its relationship to the parameters μ₀^die and σ^die in the sense that the parameters μ₀^die and σ₀^die describe the statistical properties of progressing cells only.

While the more complex death rate function (14) was found to accurately describe a CFSE data set in [3], the data sets collected for this manuscript differ in that the first measurement was taken approximately 24 h after stimulation by PHA (as opposed to immediately following stimulation). As a result, the initial condition for the mathematical model represents only those cells which have not died in the first 24 h after stimulation. It seems reasonable to hypothesize that such cells are unlikely to die at subsequent measurement times. Thus an additional possibility is ψ₀(t) = 0 for all t. (This ψ₀(t) is not a proper probability density function, but is sufficient to describe the intended behaviour. Equivalently, one could assume the function ψ₀(t) has a large mean and small variance so that the support of the density function is effectively limited to a region beyond the final measurement time.)

Finally, we consider one possible generalization of the density function for time-to-first-division for progressing cells. If there are multiple subpopulations (e.g. naive vs. memory cells) contained within the population under study, the density φ₀(t) may be multimodal. For simplicity, we consider a bimodal density function which is a weighted sum of two lognormal distributions,

where f ∊ [0,1] is a weighting parameter.

The possible model parameterizations considered in this report are summarized in Table 1.

Table 1.

List of the possible cyton model parameterizations considered in this report. These models are compared (in terms of their ability to describe experimental data sets) in Tables 3 and and44.

Model	Description	Parameters (cyton only)
Model 1	Basic cyton model; Equations (11), F_i = 1 for all i ≥ 1	9
Model 2	Basic cyton model plus division destiny according to (12)	11
Model 3	Basic cyton model, but with Equation (14) for undivided cell death	11
Model 4	Basic cyton model, but with no undivided cell death (ψ₀(t) = 0)	7
Model 5	Combine models 2 and 3	13
Model 6	Combine models 2 and 4	9
Model 7	Model 1, but with Equation (15) for undivided cell division	12
Model 8	Model 2, but with Equation (15) for undivided cell division	14
Model 9	Model 3, but with Equation (15) for undivided cell division	14
Model 10	Model 4, but with Equation (15) for undivided cell division	10
Model 11	Model 5, but with Equation (15) for undivided cell division	16
Model 12	Model 6, but with Equation (15) for undivided cell division	12

3.4. Distribution of cellular autofluorescence

To this point we have considered a class of models based on the cyton modelling framework which describe the dynamic population generation structure for dividing cells. This class of models has been incorporated into a label-structured partial differential equation model derived by considering the CFSE in a mass-conservation framework. As discussed previously, once the structured densities n_i(t, x) (in terms of the fluorescence x resulting from CFSE) have been constructed, these quantities must be related to the measured fluorescence intensity (which includes the contribution of cellular autofluorescence) via the convolution (5). Autofluorescence is the result of the absorption and emission properties of molecules which are naturally found within all cells and is present even in the absence of an added fluorescent label. Mean autofluorescence is known to increase as cells are activated to divide [], probably as a result of the production of additional intracellular components associated with increased metabolic activity within the cell. Thus the notation of Equation (5) explicitly includes the time-dependence of the autofluorescence density function, p(t, ζ). Because these intracellular molecules are partitioned among daughter cells during cell division, the distribution of autofluorescence can be intuitively considered as a growth and fragmentation process, which is known to produce skew-right density functions such as the lognormal density function []. In fact, it has been shown [] that the distribution of autofluorescence in the population can be well-approximated using a lognormal density function, and thus can be characterized by its mean and its variance. This observation has been used as the basis for approximation techniques to the convolution (5) [].

To test the assumption of lognormality, a portion of PBMCs from each donor were set aside and stimulated with PHA but never labelled with CFSE. Thus the measured fluorescence distribution of these cells (represented in histogram form) can be used to approximate the density function p(t,ζ) representing the actual population distribution of autofluorescence. This autofluorescence data are depicted for the two donors and cell types in Figures 3 and and4.4. To assess the lognormal approximation, we used two parameter estimation schemes to construct lognormal density functions from the autofluorescence data. The first method is the method of moments, in which the exact mean and variance of the measured cells was computed and a lognormal curve was constructed with the same mean and variance. (In the figures, the resulting lognormal density function has been scaled by the number of cells in the data to facilitate comparison.) The second method is to use least squares to estimate the scale, mean, and variance of a lognormal density function.

Measured autofluorescence distributions in comparison to fitted lognormal curves for Donor 1 (left) and Donor 2 (right) for CD4T cells. The least-squares fit to data is visually better than the fit obtained by the method of moments (MM), though the difference is small and becomes less noticeable at later measurement times. This pattern is more noticeable for CD4T cells (here) than for CD8T cells (Figure 4).

Measured autofluorescence distributions in comparison to fitted lognormal curves for Donor 1 (left) and Donor 2 (right) for CD8T cells. The least-squares fit to data is visually better than the fit obtained by the method of moments (MM), though the difference is small.

Though the autofluorescence data are not perfectly lognormal, the assumption is fairly accurate for both cell types and both donors. For CD4 T cells, the lognormal approximation becomes more accurate as time progresses. This is consistent with the existence of an initial transient distribution of autofluorescence which corresponds to the quiescent state of the cells at the start of the experiment; as cells are activated to divide, the lognormal (or at least skew-right) distribution emerges possibly as a result of growth and division processes []. For CD8 T cells, the validity of the approximation does not change much in time.

Significantly, the statistical moments of the autofluorescence distribution are observed to change in time (Figure 5). This is particularly true for the mean. The significant increase in mean autofluorescence between 24 and 48 h is known to be associated with cellular activation. Though the increase is large, it is of little consequence for mathematical modelling because the contribution of autofluorescence to the fluorescence intensity measurements is very small (less than 1%) for undivided cells. The decrease in autofluorescence as measured at approximately t = 96h is more problematic as some cells will have completed multiple divisions by that time; the cause of the decrease is unknown. It is possible that a change in instrument settings between the two measurement days could account for this effect. Yet this seems unlikely as comparable changes are not observed in the data collected for cells labelled with CFSE (Figures 1 and and2).2). Another possibility is nutrient depletion; by the third day of the experiment, the cells begin to run out of the nutrients originally placed in culture and these must be replaced. Nutrient depletion could cause activated cells to die or return to a quiescent state.

Top: Mean autofluorescence as estimated by the method of moments (solid lines) and least squares (dashed lines). Bottom: Standard deviation as estimated by the method of moments (solid lines) and least squares (dashed lines). Data shown for CD4T cells (left) and CD8T cells (right). Mean autofluorescence is observed to increase significantly in the first two days of the experiment, an effect which is known to be the result of cellular activation. The cause of the decrease in mean autofluorescence at approximately 96 h is unknown; nutrient depletion is one hypothesized cause.

Based upon these observations, the population distribution of autofluorescence can be accurately described as a lognormal density function at each measurement time. Thus

where the parameters μ_xa(t) and σ_xa(t) are related to the mean and standard deviation of the autofluorescence distribution (at time t) by the formulas

In the context of parameter estimation for the mathematical model (9), we consider two frameworks. The first is to use the method of moments (as above) with the autofluorescence data to compute E[x_a(t)] and STD(x_a(t)); these values will then be considered fixed while the remaining parameters of the mathematical model are determined by using least squares to fit the data in Figures 1 and and22 (see Section 4). The second method is to ignore the time-dependence of the distribution and assume E[x_a(t)] = E[x_a], STD(x_a(t)) = STD(x_a). The values of E[x_a] and STD(x_a) are estimated in a least-squares framework as described in Section 4, and these estimated values can be compared to the values returned by the method of moments. One could also attempt to parameterize and estimate a function p(t, ζ) which is time-dependent. This is a more complex estimation problem and is unlikely to be identifiable; therefore we do not consider it here.

4. Statistical modelling of CFSE data

Models similar to those discussed in Section 3 have previously been shown to accurately describe population generation structure as measured by flow cytometry [3]. However, the statistical properties of the measurement process have not been nearly as carefully considered. A major limitation of the current modelling framework lies not in the mathematical model itself but rather in the statistical model which links the mathematical model to the data. An accurate statistical model is of vital importance for the consistent estimation of model parameters, as well as the unbiased estimation of confidence intervals around those parameters [5–7,10,19,44]. Additionally, an accurate statistical model is necessary for the rigorous comparison of different model parameterizations and generalizations [2,9,16] and the optimal design of experiments [].

To this point, we have discussed a class of mathematical models which combine the cyton modelling framework of [,] with the label and division structured population models of [,42,47] to describe CFSE data. Given several members of this class of models (see, e.g. Table 1) there is a need to compare the mathematical models on a quantitative basis in order to identify which model provides the ‘best’ (in an appropriate sense) description of an underlying experimental data set. Several techniques based upon information theory [16] or asymptotic properties of least-squares estimators [2,9,24] have been developed for this purpose. In all cases, the techniques are premised upon an accurate statistical model which links the mathematical model to the collected data.

Mathematical descriptions of CFSE data have generally described numbers of cells per generation as estimated from histogram data, rather than the histogram data itself. As such, little consideration has been given to the statistical model which generates the histogram data. In their likelihood estimation framework, Hyrien and Zand [31] propose that the marginal probability density of each datum is normally distributed, and that the variance of this normal distribution is constant for all data points. Least-squares estimators, though not restricted to any parametric class of probability density functions, have also generally assumed a constant variance error model [3,–,47]. However, it has been shown that such an assumption does not accurately describe the variance as observed in actual data sets [,47]. Another common error model for least-squares estimation, in which the variance of each data point is assumed to be directly proportional to the square of the model value at that point (a constant coefficient of variance model) has also been hypothesized, but was again observed to be inaccurate [,47]. Here, we revisit the discussion of [47, Chapter 4] to consider the probabilistic aspects of the actual experimental process itself and derive a hypothetical statistical model from a theoretical basis. This statistical model is then incorporated into a weighted least-squares estimation scheme and several computational algorithms are proposed.

4.1. Theoretical statistical model

Define the structured densities ñ_i(t, ) (in terms of measured fluorescence intensity) for cells having completed i divisions as in Section 3. Then the structured density for the entire population of cells is

Because CFSE histogram data are most commonly represented using abase 10 logarithmic scale, we define the change of variables z = log₁₀() to arrive at

which gives the structured density (measured in cells per base 10 log unit intensity) for the entire population of cells at time t. Let N_k^j be a random variable representing the number of cells measured at time t_j with log-fluorescence intensity in the region [z_k, z_{k + 1}). The goal of the statistical model is to link the structured density (t, z) to the data random variables N_k^j in a manner that is consistent with the statistical properties of the random variables N_k^j. Moreover, in the experimental process, the quantities N_k^j are not directly measured. Rather, only a fraction of the contents of each measurement well are acquired by the measurement apparatus. In order to account for this discrepancy, a known number of beads (which can be identified and counted in the flow cytometry output) are contained within each sample to be measured. By comparing the number of beads acquired to the number of beads known to be in the tube, one is able to estimate the fraction of the sample acquired.

Let be the vector of parameters of the mathematical model (that is, contains the parameters necessary to describe the cyton dynamics as well as the parameters describing the label loss function v(t) and the autofluorescence distribution p(t, ζ)) so that we may rewrite (t, z) = (t, z; ). In order to derive an error model for the histogram data we first make the common assumption that the model is correctly specified so that the structured population density (t, z; ₀) (where ₀ is a hypothetical ‘true’ parameter) perfectly describes the population of cells. Let N_i(t) be defined as in Equation (6) and let N(t) = ΣN_i(t). That is, N(t) is the total number of cells in the population at time t. Define

It follows that p_j(z) is a probability density function. Let S_j be the number of cells of interest (e.g. CD4T cells) sampled at measurement time t_j. Then one can consider the sample of S_j cells (of interest) to be taken without replacement from the total population of N(t_j) cells; the fluorescence intensity of the sampled cells is subject to the sampling density p_j(z). It should be carefully noted that there are numerous steps required to separate the cells of interest from the actual culture of cells passing through the cytometer; see, e.g. 4], Section 2]. References to the total number of cells N(t), and the number of sampled cells S_j are understood to refer only to the specific cells of interest in the experiment. For the moment, we make the additional assumption that these two numbers are exact and are not subject to any errors (systematic, experimental, or otherwise) caused by gating, etc.

Let B be the total number of beads (in each sample tube) which are used to quantify the fraction of the population of cells which is measured at time t_j and let b_j be the ‘true’ number of beads passing through the cytometer. By this, we mean the exact number of beads which would pass through the cytometer if the measured culture were perfectly homogeneous, etc. It follows that

Now consider the kth histogram bin [z_k,z_{k + 1}). The number of cells in the whole population which are contained in this bin is

Let M_k^j be a random variable representing the number of cells (out of the sampled population) counted into the kth bin. Because the measurement process represents a sampling without replacement, it follows that M_k^j is described by a hypergeometric distribution,

That is, S_j cells are sampled without replacement from a population containing a total of N(t_j) cells, of which I[](t_j, z_k; ₀) are of interest. We make the following assumptions regarding the measurement process:

Then it can be shown [23] that , where

The final approximation is valid provided I[](t_j, z_k; ₀) ≪ N(t_j), which is a perfectly reasonable assumption (Table 2).

Table 2.

Notation	Description
ñ(t, z)	Log-transformed label-structured density
N(t)	Total number of cells in the population at time t
p_j(z)	Probability density function from which cells are sampled
S_j	Number of cells sampled at time t_j
B	Total number of beads originally placed into each well
b_j	‘True’ number of beads counted by the cytometer at time t_j
I[](t_j, z_k; ₀)	‘True’ number of cells from the total population belonging in the kth histogram bin at time t_j
M_k^j	Random variable representing the number of cells counted into the kth histogram bin at time t_j
b_j	Actual number of beads counted by the flow cytometer (a realization of b_j)
N_k^j	Random variable resulting when M_k^j is scaled by the ratio B/_j
n^j_k	The actual data, a realization of N_j^k
λ_j	Random variable representing the bead count error ratio b_j/_j

It can easily be shown that the first assumption regarding the measurement process is accurate provided b_j/B ≪ 1, which again is reasonable (the ratio is typically less than 0.1). This assumption is necessary to ensure that the sampling (without replacement) process is conducted in a such a way that the ratio of cells of interest to total cells is approximately constant during the measurement. The assumption is unusual in that it places a restriction on the total amount of data which can be collected. The second assumption regarding the measurement process bounds the probability that a cell belongs to a particular bin away from zero and one (although this assumption is not strictly necessary in some cases [33]). In practice, this assumption is only violated when I[ñ(t^j, z_k; ₀)] ≈ 0.

Finally, when the measurements are actually taken, a certain number of beads _j are actually counted. We would certainly hope that _j ≈ b_j; however, we can think of _j as a realization of some random variable (which may or may not be an unbiased estimator of b_j, depending upon any systematic error that might occur in obtaining bead counts from flow cytometry data). To obtain the histogram data which is actually used to calibrate the mathematical model, one scales the sampled cell counts M^j_k by the inverse of the fraction of the total population actually sampled (as estimated by the number of counted beads _j). Thus the data may be represented by the random variable

It follows that

where we have defined λ_j = b_j/_j. The quantities λ_j in effect represent a ‘scaling error’ and are sufficient to explain the apparent violation of conservation principles often noticed in flow cytometry data (see, e.g. ],20,]; the problem is discussed at greater length in [47, Chapters 1,4]).

From Equation (19), one sees that the variance of the data is not constant (as is the case for data described by constant variance models), nor does it scale with the square of the magnitude of the model (as is the case for data described by constant coefficient of variance models), see [6, Chapter 3]. Rather, the variance grows linearly with the magnitude of the model solution. This relationship has been shown to accurately describe CFSE data [9,47], and we can use this relationship to establish a parameter estimation framework.

4.2. Parameter estimation

The goal of the parameter estimation problem is to find an estimate for the parameter ₀ which is assumed to generate the data (neglecting model misspecification). In the above statistical model, we must also estimate the nuisance parameters = {λ_j}. Given a collection of random variables N^j_k distributed according to Equation (19), one may define the estimators

where the weights are chosen in a manner that accounts for the assumed reliability of each measurement (see below). Experimental data are considered as a set of realizations n^j_k of the random variables N_k^j; these are used to obtain the estimates

We see that the cost functional J, and thus the estimators, depends upon the data random variables N^j_k and thus on the statistical model (19). In order for the estimators _WLS and _WLS to be asymptotically optimal, the weights w^j_k must be chosen to match the variance of the random variables n^j_k43],44],

The cutoff value I∗ > 0 is determined by the experimenter so that the resulting residuals appear random. In the work that follows, I∗ = 200. The values of B and _j are known from the experiment. Notice that the computation of the weights (22) depends upon the value of the ‘true’ parameter ₀ as well as on the nuisance parameters . As a result, one must use an iterative estimation procedure [6,19].

Traditionally, it has been assumed that λ_j = 1 for all j3],11–13,] – that is, that there is no scaling error. While this assumption is obviously violated by some data sets [,47] it is not clear that the incorporation or omission of the nuisance parameters will have a significant effect on the estimation of parameters. In practice, the nuisance parameter vector must be estimated in conjunction with the model parameter vector in a two-stage process. Unfortunately, two-stage estimation may cause some parameters of the mathematical model to become unidentifiable (or, at the very least, the variance of the estimators for certain parameters may increase dramatically). For this report, it will be assumed that λ_j = 1 for all j and the nuisance parameters will not be estimated. As will be seen in Section 5, the available data are well-described by the mathematical model even under this simplified assumption. We consider the following estimation algorithm:

(1)
Set ℓ = 0. Obtain an initial estimate (the ordinary least-squares estimate) by solving Equation (21) with λ = (1, …, 1) and w_k^j = 1 for all j and k,
(2)
Compute the weights w^j_k for each j and k according to Equation (22) with ₀ replaced by ⁽⁰⁾ and with = (1,…, 1);
At iteration ℓ, compute ^ℓ according to Equation (21) with the current weights, and with = (1,…,1);
(4)
Update the weights again according to Equation (22), now with ₀ replaced by ^(ℓ) (and = (1,…, 1) still); increment ℓ;
Repeat steps 3 and 4 until convergence is obtained.

5. Results

Twelve models of cyton dynamics are considered in Table 1 and two methods of estimating the statistical moments of the autofluorescence distribution(s) as proposed in Section 3.4 are used. In order to determine which of these model formulations best describes the available data, we need mathematical tools for rigorous model comparison. These tools are explored in Section 5.1 and a best-fit model is selected. The fit of this model to the data, as well as the statistical model of the data, are then discussed. Finally, the dynamic responsiveness of the cells from the experimental data is analysed.

5.1. Model comparison

From Section 3.3, it is clear that some of the models of cyton dynamics are refinements of other models (in the sense that the more complex model includes all possible solutions of the simpler model). For instance, the basic cyton model (Model 1) can be considered as a refinement of a cyton model for which it is assumed ψ₀(t) = 0 (Model 6). This is because, for appropriate choices of the parameters E[T₀^die] and STD[T^die₀], Model 1 is exactly equivalent to Model 6. In such a case, it is clear that the more complex model must result in a minimized cost functional at least as low as that of the simpler model; yet the more complex model has more parameters, and one must consider the possibility of overparameterization against this decrease in the least-squares cost. To this end, results from the asymptotic theory of least-squares estimators can be extended [9] to weighted least-squares estimators such as Equation (20).

Assume (without loss of generality) that the nuisance parameters are known. Let _WLS be defined, as above, as the minimizer over the admissible parameter space Q of the least-squares cost function . Now define _WLS to be the minimizer of over the restricted parameter set Q_H, where Q_H = {q ∊ Q|H_q = h} for some linear function H of rank r and a vector h of size r × 1. (For nonlinear restrictions on the parameter space, a locally equivalent condition can be derived from the first order linearization of the nonlinear constraint; see [9].) In such situations, the model comparison problem can be recast as one of hypothesis testing. Consider the null and alternative hypotheses,

Then under fairly general conditions (see [9]) and assuming the null hypothesis is true, the test statistic

(where n is the total number of data points) is asymptotically a chi-square random variable with r degrees of freedom, U_n ∼ χ²(r). Thus given the data {n^j_k} as realizations of the random variables n^j_k, one obtains a realization u_n of U_n which can be used to assess the likelihood that the decrease in cost associated with the unrestricted parameter space is the result of chance (see [6, Section 3.5]). The complete conditions under which U_n is asymptotically distributed as a chi-square random variable, as well as a proof of the result, can be found in [9] and the references therein.

For comparison among models which are not refinements, one can use information theoretic criteria such as Akaike's Information Criterion (AIC). From Equation (19), it follows that the scaled residuals

are independent and normally distributed with constant variance (for all k and j). Then the AIC, which is the expected value of the relative Kullback–Leibler distance for a given model [16], is

where p is the dimension of the space Q for the particular model of interest. Because K_n provides information regarding the relative Kullback–Leibler distance, AIC values have meaning only in comparison to one another. The model which results in the lowest AIC value is the most likely model for the particular data set being investigated. We note that a direct comparison of the costs (in Tables 3 and and4)4) may not be reasonable due to the fact that the AIC values also must take into account the numberp of parameters estimated in a given model (see Equation (24)). A derivation of the AIC as well as numerous examples can be found in [16].

Table 3.

Minimized weighted least-squares costs for various cyton model parameterizations (see Table 1). Autofluorescence estimated as a time-invariant lognormal density function by least-squares fit to data.

CD4T Cells		CD8T Cells
Donor 1	Donor 2	Donor 1	Donor 2
Model 1	6.0547 × 10⁴	11.812 × 10⁴	7.9383 × 10⁴	20.348 × 10⁴
Model 2	5.4212 × 10⁴	5.1257 × 10⁴	2.8669 × 10⁴	4.5650 × 10⁴
Model 3	6.0344 × 10⁴	11.812 × 10⁴	8.2439 × 10⁴	20.348 × 10⁴
Model 4	6.1677 × 10⁴	11.863 × 10⁴	7.9383 × 10⁴	20.348 × 10⁴
Model 5	5.4212 × 10⁴	5.1257 × 10⁴	2.7963 × 10⁴	4.5651 × 10⁴
Model 6	5.4212 × 10⁴	5.1257 × 10⁴	2.8670 × 10⁴	4.5650 × 10⁴
Model 7	4.3175 × 10⁴	9.4856 × 10⁴	6.1827 × 10⁴	18.761 × 10⁴
Model 8	3.4376 × 10⁴	4.2341 × 10⁴	2.8038 × 10⁴	3.9553 × 10⁴
Model 9	4.4006 × 10⁴	9.5017 × 10⁴	6.0628 × 10⁴	19.563 × 10⁴
Model 10	4.3196 × 10⁴	9.4931 × 10⁴	6.2538 × 10⁴	18.761 × 10⁴
Model 11	3.4375 × 10⁴	4.2346 × 10⁴	2.8004 × 10⁴	3.9551 × 10⁴
Model 12	3.4376 × 10⁴	4.8931 × 10⁴	2.8038 × 10⁴	3.9554 × 10⁴

Table 4.

Minimized weighted least-squares costs for various cyton model parameterizations (see Table 1). Autofluorescence estimated using the method of moments at each measurement time.

CD4T Cells		CD8T Cells
Donor 1	Donor 2	Donor 1	Donor 2
Model 1	7.1255 × 10⁴	14.302 × 10⁴	8.5425 × 10⁴	23.068 × 10⁴
Model 2	5.8092 × 10⁴	5.4022 × 10⁴	3.9701 × 10⁴	5.8327 × 10⁴
Model 3	7.1097 × 10⁴	14.313 × 10⁴	8.4900 × 10⁴	23.503 × 10⁴
Model 4	7.2613 × 10⁴	14.344 × 10⁴	8.5434 × 10⁴	23.068 × 10⁴
Model 5	5.8092 × 10⁴	5.4020 × 10⁴	3.9709 × 10⁴	5.8309 × 10⁴
Model 6	5.8092 × 10⁴	5.4041 × 10⁴	3.9701 × 10⁴	5.8327 × 10⁴
Model 7	4.3392 × 10⁴	11.098 × 10⁴	6.7689 × 10⁴	20.965 × 10⁴
Model 8	3.3741 × 10⁴	4.4724 × 10⁴	3.7443 × 10⁴	5.2944 × 10⁴
Model 9	4.3418 × 10⁴	11.098 × 10⁴	6.8680 × 10⁴	20.952 × 10⁴
Model 10	4.3392 × 10⁴	11.098 × 10⁴	6.7689 × 10⁴	20.965 × 10⁴
Model 11	3.3741 × 10⁴	4.4730 × 10⁴	3.7436 × 10⁴	5.2944 × 10⁴
Model 12	3.3741 × 10⁴	4.4724 × 10⁴	3.7443 × 10⁴	5.2944 × 10⁴

With these tools, we are now ready to compare the models suggested in Sections 3.3 and 3.4. The minimized costs J((_WLS, _j)) for each donor, cell type, and model are summarized in Tables 3 and and4.4. Table 3 contains the results when the autofluorescence distribution is assumed to be time-invariant and its statistical moments are estimated within the least-squares framework summarized in Section 4; Table 4 contains the results when the autofluorescence distribution is estimated at each measurement time using the method of moments.

Of the 12 models of cyton dynamics considered, Model 12 is generally selected as the best model. The only exception is for Donor 2 CD4 T cells when estimating a time-invariant autofluorescence distribution using least squares. In this case, Model 8 is narrowly selected by the AIC (K = 9525.77 compared to K = 9525.98), although AIC differences less than 2 are generally not considered significant [16]. On the other hand, the model comparison test statistic (Model 8 is a refinement of Model 12) is u_n = 5.7992, so that one would reject the null hypothesis (Model 12) only at confidences less than 87.82%, which is lower than typical thresholds for hypothesis testing. Thus the results for this particular data set are ambiguous. It should be acknowledged that Model 8 and Model 12 are quite similar (Model 8 is the generalization of Model 12 allowing for undivided cell death), so that the distinction between the two is quite small. It seems safe to consider Model 12 to be the most parsimonious model of the data for both donors and for both cell types.

A comparison of the two methods of treating autofluorescence is not straightforward. On one hand, the minimized costs given in Tables 3 and and44 are directly comparable in the sense that the costs do not include any measure of cost associated with the autofluorescence data, whether or not that data are used. On the other hand, the estimation of a time-invariant autofluorescence distribution does not make any use of the autofluorescence data. From this perspective the two methods of describing autofluorescence are associated with distinct collections of data, even if the histograms of interest (e.g. those shown in Figures 1 and and2)2) are the same. It is also not clear whether the failure of the minimized cost functionals to reflect the costs associated with the autofluorescence data is a strength or a weakness of the current approach. When such data are available (as it is here) it seems that it would make for a useful comparison. However, the collection of such data requires additional experimental setup, and these data itself are only interesting to the extent that it helps to describe the dynamics of cellular division and death as observed in the histogram profiles of labelled cells.

Comparing the two approaches strictly in terms of accuracy in describing the histogram profiles of labelled cells (that is, comparing the minimized costs in Tables 3 and and4),4), the results depend upon cell type. For CD8 T cells, there is a clear advantage in describing the autofluorescence distribution as time-invariant and estimating the moments of the distribution in a least-squares framework. For CD4 T cells, the results are less clear. For Donor 1, there is a very small improvement (among the more accurate models) in using the method of moments to estimate the autofluorescence distribution. For Donor 2, the method of moments works best for Model 12, but not for Model 8 (which is the AIC-selected model when autofluorescence is estimated by least squares). Again, the difference is very small. The analysis presented in the remainder of this document is based on results obtained with Model 12 with a time-invariant autofluorescence distribution estimated in a least-squares framework.

5.2. Analysis of the mathematical and statistical models

The fit of the mathematical model to data for both donors is summarized in Figures 6 (CD4 T cells) and 7 (CD8 T cells). The figures show the best-fit model solution of the mathematical model in comparison to the data; the shaded region indicates the expected level of ‘noise’ in the data as a result of the measurement process. That is, if the calibrated mathematical model were perfectly specified (E[N^j_k] = I[](t_j, z_k; )), the data would oscillate randomly around the model solution. Since the data are assumed to be normally distributed (see Section 4), the 4 standard deviation region highlighted should contain 99.9% of the data points.

Calibrated model with CD4T cell data for a particular set of triplicates from Donor 1 (left) and Donor 2 (right). Shaded regions indicated a 4 standard deviation confidence region computed according to the theoretical statistical model (19), assuming the model is correctly specified.

Overall, the fit to data is good. For Donor 1 CD4 T cells, the estimated mean and standard deviation of the autofluorescence distribution are E[x_a] = < 372.42, STD[x_a] = 220.08. For Donor 2 CD4 T cells the estimates are E[x_a] = 543.76, STD[x_a] = 251.90. For CD8T cells the estimates are E[x_a] = 658.35, STD[x_a] = 319.79 and E[x_a] = 542.76, STD[x_a] = 271.25, for Donors 1 and 2, respectively. These numbers are within reason when compared to measured autofluorescence data (Figure 5).

The primary shortcoming of the statistical model is the assumption of correct specification, i.e. E[N^j_k] = I[](t_j, z_k; ). There are clearly instances in Figures 6 and and77 where the data are not centred around the mathematical model, indicating that the mathematical model is systematically in error. As a result, the shaded regions do not contain 99.9% of the data points. From one perspective, this shortcoming of the statistical model is entirely mathematical – if an improved mathematical model were available, then it is possible that the assumption of correct specification would be accurate. Alternatively, it is possible to consider a more general statistical model which directly considers the effects of misspecification, for instance by assuming an autoregressive error structure [24]. This may provide a more accurate measure for model comparison in the absence of a more accurate mathematical model.

Calibrated model with CD8 cell data for a particular set of triplicates from Donor 1 (left) and Donor 2 (right). Shaded regions indicated a 4 standard deviation confidence region computed according to the theoretical statistical model (19), assuming the model is correctly specified.

It is also assumed in the statistical model that Var[N_k^j] ∝ I[](t_j, z_k; ). Traditionally, the accuracy of this assumption is examined by residual plots [6, Chapter 3]. For instance, the modified residuals (23) should be randomly distributed when plotted against the magnitude of the model solution. However, this analysis is premised upon the assumption of correct specification, and thus cannot be performed here. For other data sets, it has been shown that the statistical model presented in this document does accurately describe the variance in CFSE-based flow cytometry data [9,47].

In spite of these shortcomings, it should be emphasized that the fit of the model is quite good for both donors and cell types. As such, we proceed to analyse the dynamic responsiveness of the measured cells in the current modelling framework.

5.3. Analysis of dynamic responsiveness

From the calibrated mathematical model, one can compute the probability density functions φ_i(t) and ψ_i(t) from which the times to divide and die are assumed to be drawn in the cyton model of cell division. One can also summarize the division destiny of each population of cells. These are summarized graphically in Figure 8 for CD4 T cells and Figure 9 for CD8 T cells.

Comparison of estimated CD4 T cell division dynamics between Donor 1 (left) and Donor 2 (right). Top: Probability density function φ₀(t) for time to first division for initially undivided CD4 T cells, scaled by the initial progressor fraction F₀ of activated cells. Percentages indicate the fraction of undivided cells which will have entered their first division by the next measurement time (vertical dashed lines). On average, cells for Donor 1 complete their first division more rapidly than those for Donor 2; cells from Donor 1 are also more likely to have divided in response to stimulus before the end of the experiment. CD4T cells from both donors are estimated to respond more slowly and in greater frequency than CD8 T cells (compare Figure 9). However the total CD 8 T cell response for Donor 1 is greater than the CD4T cell response while the total CD4 and CD8 responses are comparable for Donor 2. Middle: Probability density functions for time to subsequent division or time to die (inverted) for CD4 T cells having completed at least one division. Cells from Donor 2 divide slightly more rapidly and more synchronously than those from Donor 1. Bottom: Division destiny, indicating the average number of divisions undergone by cells initially in the population (at t = t₀). The fraction of cells with division destiny equal to zero estimates the relative abundance of cells which will not become activated to divide.

Comparison of estimated CD8 T cell division dynamics between Donor 1 (left) and Donor 2 (right). Top: Probability density function φ₀(t) for time to first division for initially undivided CD8 T cells, scaled by the initial progressor fraction F₀ of activated cells. Percentages indicate the fraction of undivided cells which will have entered their first division by the next measurement time (vertical dashed lines). On average, cells for Donor 1 complete their first division more rapidly than those for Donor 2 and cells from Donor 1 are more likely to have divided in response to stimulus before the end of the experiment. Middle: Probability density functions for time to subsequent division or time to die (inverted) for CD8 T cells having completed at least one division. Cells from Donor 2 divide slightly more rapidly and more synchronously than those from Donor 1. Bottom: Division destiny, indicating the average number of divisions undergone by cells initially in the population (at t = t₀). The fraction of cells with division destiny equal to zero estimates the relative abundance of cells which will not become activated to divide.

Notice that the cytons ₀(t) have been truncated to the left at t = t₀ = 23.5. This is because no information is available before the first measurement time; the assumption that the density functions ₀ can be described by a weighted sum of lognormal densities does not require that the support of ₀ be contained in the region t ≥ t₀, so this condition must be additionally imposed (see the appendix). This gives the impression in Figures 8 and and99 that some fraction of cells will begin to divide immediately following the first measurement time. Though this may be true, it is beyond the capabilities of the current modelling framework to determine. This is because the estimated parameters are only unique up to the numbers of cells predicted to divide and die between measurement times. In other words, if {t_j} is a collection of measurement times, the model provides meaningful estimates of the quantities

the fraction of cells from the initial population estimated to have completed the first division between measurement times t_j and t_{j + 1}. These quantities (converted to percentages) are superimposed on the graphs of the curves F₀φ₀(t) in Figures 8 and and9.9. Thus, although the current modelling framework does not provide information regarding when cells will begin to divide it does provide meaningful information on the distribution of times to first division. Unsurprisingly, this information is constrained by the frequency at which the population is measured.

We see that CD4 T cells from Donor 1 reach their first division more rapidly and in greater quantity than CD4T cells from Donor 2. By the end of the experiment, 75.34% of the initial population of CD4 T cells from Donor 1 had divided in response to stimulus, compared to 69.50% for Donor 2. CD8 T cells from Donor 1 likewise respond more rapidly and in greater quantity than those from Donor 2. By the end of the experiment, 88.44% of the initial population of cells from Donor 1 had divided, while only 63.94% of those from Donor 2 had divided. Comparing CD4 T cells and CD8 T cells within the same donor, we observe that CD8 T cells complete their first division more quickly than CD4 T cells.

For cells having already completed at least one division, CD4 T cells from Donor 1 are estimated to divide more slowly and with greater variation across the population (E[T_i^div] = 21.2, STD[T_i^div] = 19.5) compared with those from Donor 2 (E[T_i^div] = 11.3, STD[T_i^div] = 5.7). Similar behaviour is observed for CD8T cells (E[T_i^div] = 11.2, STD[T_i^div] = 4.6 for Donor 1; E[T^div] = 94, STD[T_i^div] = 3.0 for Donor 2).

The analysis of cell death is complicated by several factors. While the cyton density functions φ_i and ψ_i are assumed to be identical for i ≥ 1 (that is, for all cells having divided at least once), the fraction F_i of progressing cells varies from one generation to the next according to Equation (12). The fraction of cells which are non-progressors dies according to the density function ψ_i, but will not divide. Thus the division destiny for the population of cells must be taken into account when interpreting cell death. Simultaneously, even among cells which would be progressors, cell death is a possibility if that cell's time-to-die (sampled according to the density ψ_i is less than time-to-divide (sampled according to the density φ_i).

Based upon the densities φ_i and ψ_i (i ≥ 1) in Figures 8 and and99 and ignoring division destiny, it would appear that cells from Donor 1 (both CD4T and CD8T) are more likely to die relative to cells taken from Donor 2. Yet when the numbers of dying cells are computed (Figure 10), we find that this hypothesis holds for CD4 T cells (cells from Donor 1 are slightly more prone to die) but completely fails for CD8T cells. When division destiny is taken into account, we see that CD4T cells from Donor 1 are estimated to have a much narrower division destiny than CD4 T cells from Donor 2. As a result, fewer cells progress to high division number (i ≥ 6), and these non-progressors begin to die. The result is that more cells are observed with high division number for Donor 2, and the expansion of the population of cells over the course of the experiment is greater for Donor 2. For CD8T cells, the estimated division destiny for Donor 2 is narrow but has a high mean. Coupled with the faster rate of division (middle-right panel, Figure 9), we see that most CD8T cells from Donor 2 proceed through large number of divisions, after which the population has reached its division destiny and the size of the population reaches a maximum before cells begin to die.

Cell numbers and population generation structure for CD4T cells (top) and CD8T cells (bottom) for Donor 1 (left) and Donor 2 (right). Shaded areas represent model generated numbers while dots represent experimental data values. Total numbers of dead cells (with generation structure) are shown inverted. Note that the numbers of dead cells are cumulative and does not reflect any decrease in numbers of dead cells which would be associated with disintegration. While it is difficult to determine the numbers of dying cells from cyton graphics alone (Figures 8 and and9),9), one can clearly compare the relative frequency of cell death between donors and cell types using these reconstructions of the population behaviour. For both donors and cell types, cell death is estimated to be negligible until 3–4 days after stimulation (not considering any cells which die before the first measurement is taken). After this time, most cells appear to have reached their division destinies (see Figures 8 and and9,9, bottom) after which time they begin to die.

Thus, in effect, division destiny imposes a limit on the degree to which a population of cells can expand in response to stimulus. For CD4T cells from Donor 1, the population appears to have reached its maximum expansion just as the experiment ends, expanding by a total factor of 6.01 (ratio of maximum population size to initial size). For CD4T cells from Donor 2, the expansion factor at the end of the experiment is 9.75 and still rising. For CD8T cells, Donor 1 expands by a factor of 20.88 (and rising) by the end of the experiment, while for Donor 2 the population expanded by a factor of 23.55 before it began to contract.

Therefore we see that the cytons φ_i and ψ_i provide meaningful information regarding the times at which cells will divide or die, but this information must be carefully interpreted with respect to division destiny. This can be accomplished by reconstructing the population generation structures for viable and dead cells (as in Figure 10). Then, one can make deductions concerning the viability of the populations of cells by analysing the numbers of cells as described above.

6. Concluding remarks

In this document, we have described and analysed a recent class of mathematical models which combines the cyton model of population generation structure with a mass-conservation model of label dynamics. Unlike previous label-structured models, the new class of models describes the processes of cellular division and death in intuitive terms which are relatable to important biological features. Significantly, because the new models can be fit directly to CFSE histogram data, it is possible to consider the statistical properties of such data. From these properties and under mild assumptions, a statistical model of the data has been derived and incorporated into a least-squares parameter estimation framework. Using this framework, various models selected from the new class of models were fit to experimental data and compared. The best-fitting model has been observed to accurately describe the behaviour of both CD4 T cells and CD8 T cells acquired from two healthy donors and stimulated to divide with PHA.

Of those models tested, the selected model (Model 12) features a bimodal distribution of times to first division and ignores cell death for undivided cells. From the distribution of times to first division, it is possible to compute the fraction of cells completing the first division between each set of measurements. Distributions for cell division and death for cells having already completed at least one division are assumed to be described by lognormal density functions. Though the best-fit mathematical model is observed to be accurate, there is some room for improvement. To this end, the modelling framework presented here is readily generalizable; any distributions of times to divide and/or die which admit density functions can be tested. Moreover, because the mathematical solution is separable (see Proposition 3.2) the cyton model of population generation structure can be replaced by any other model of cellular dynamics with a sufficiently similar form (e.g. branching process models [,]). It is possible that the model may be improved further by amore detailed consideration of the effects of cellular autofluorescence and the changes of the autofluorescence distribution over time.

The primary shortcoming of the statistical model is the assumption that the model is correctly specified; otherwise, the statistical model has been previously shown to correctly account for the variance in histogram data [9,47]. Thus, for a sufficiently accurate mathematical model, the statistical model of CFSE data presented here can be incorporated into a model comparison framework so that alternative mathematical descriptions of cell division can be tested in a manner that is statistically rigorous. However, the lack of inclusion of model misspecification in the statistical model suggests that to use this statistical model for computation of confidence intervals via either asymptotic theory or bootstrapping may not be appropriate. Moreover, the statistical model given by Equation (19) is

where ℰ_kj ∼ (0, 1), was derived by considering a single histogram bin at a single time. The statistical model results from repeating this derivation for each histogram bin at each measurement time. However, in this derivation, the dependence of the cell counts (and thus the probabilities) on additional factors has been completely ignored. The model values I[](t_j, z_k) will depend strongly on the set of bins [z_k, z_{k + 1}) used for the histogram data. It can be readily seen that the level of noise in the data (relative to the magnitude of the data) increases as the number of bins increases. Conversely, as fewer bins are used, the data are effectively ‘smoothed out’ or averaged and some smaller features of the population data may be lost. Thus there seems to be some optimal number of bins to use to represent the histogram data. On one hand, this possibility could be assessed by trial and error on the number of histogram bins. Alternatively, the statistical model might be analysed and/or generalized to explicitly incorporate the dependence of the statistical model on the number of histogram bins, and more importantly, the dependence of parameter confidence intervals on the number of histogram bins. Finally, the statistical model makes the simplifying assumption that the numbers of cells counted into each distinct histogram bin represents an independent (from the other bins) process. But this is not true, for if S_j cells are measured at time t_j, then we must have the identity ∑_kM_k^j. Thus the random variables representing the numbers of cells N^j_k = B/_jM^j_k in the total population counted into distinct bins are not independent. So, for various reasons we cannot expect to be able to compute standard errors or confidence bounds for the estimated parameters in an unbiased manner [6,7].

The work presented here demonstrates how the current modelling framework can be used as a basis for comparison between multiple donors and/or cell types. There are two primary limitations for such comparisons. First, while a comparison among donors and/or cell types may focus on the differences in estimated moments and/or cyton density functions (such as those shown in Figures 8 and and9),9), the information contained within these estimated distributions is limited to the chosen modelling framework. Thus, for instance, one cannot determine the time at which cells first begin to divide only from this information. Of course, more complex models can be incorporated into the current modelling framework if knowledge of this information should prove necessary. Second, there has not been (to our knowledge) a comprehensive study of the biological and experimental variability inherent in the measurement process. In other words, it is not known how the behaviour of cells from a single donor may vary from day to day (if multiple blood samples are acquired) or from sample to sample (even if acquired at the same time).

Because the Malthusian cell proliferation and death rates of [,] are not necessarily compatible with a requirement of minimum cell cycle times, we have here incorporated and used the cyton models of Section 3.3. As noted above (see Equation (10)) this new formulation is compatible with time-dependent Malthusian death rates β_i(t) (and in some cases with time-dependent Malthusian proliferation rates α_i(t)). However, several other generalizations of the proliferation and death rate terms are immediately available.

One might consider, for example, the addition of a second structure variable (say, volume or physiological age []), which could be used to enforce a minimum cell cycle time by requiring that cells progress from some size V to 2V before dividing, at which point two cells of size V are produced. However, in the absence of additional observations, it is unclear what parameters (e.g. average rate of growth, or the structure variable V) could be estimated from CFSE histogram data. Video microscopy measurements by Hawkins et al. [] indicate that average cell size may be division dependent, and this may complicate the inclusion of volume structure. Biologically, it is expected that apoptosis occurs only at particular checkpoints in the cell cycle (particularly if external ‘kill signals’ are absent), so that a generalization to volume structure (or any other surrogate for cell cycle position or physiological age []) may permit a more accurate description of cell death. Still, it is unclear what information might be available when considering only CFSE histogram data. It is possible that the forward scatter (FSC) of laser light might be used as some sort of observable surrogate for cell size, but additional work will be necessary to investigate this hypothesis. However, a more promising approach may involve use of recently developed fluorescence microscopy data (such as the fluorescent ubiquitination-based cell cycle indicator in []) to estimate probability density functions representing durations of cell cycle phases.

Ideally, cell cycle parameters (as represented in the cyton model) can be related back to more physically/experimentally meaningful parameters such as the type and strength of stimulation, which may, in turn, require the translation of certain molecular pathways within individual cells into mathematical equations/expressions. Recent work has indicated that the mechanisms responsible for cell proliferation and death may be mutually dependent upon a common molecular pathway [,]. As more data become available, we hope to examine how the estimated parameters change under various experimental conditions, with an eye toward additional constitutive relationships linking molecular and/or subcellular functions to population dynamics []. In this context, it seems necessary to consider the extent to which these functions and/or pathways are inherited. Evidence suggests that closely related cells exhibit strong correlation in times to divide and some correlation in times to die, and that this correlation tends to decrease with the number of divisions undergone []. Cells with a common precursor may also share a common division destiny [], which can be altered by stimulation conditions []. While computed cell numbers are relatively unaffected provided correlation is limited to cells having undergone the same number of divisions [,], correlation between subsequent division of cells can alter the dynamics predicted by a mathematical model []. For large populations, this effect seems negligible, but may play an important role in vivo where only a small number of responding cells can trigger an immune response []. As noted above, branching process models have been formulated to account for various levels of correlation, and these models may be incorporated into the compartmental model framework as described above.

In spite of the limitations discussed above, the proposed mathematical and statistical framework represents a positive step toward a more comprehensive model of cellular division as measured by flow cytometry. The flexibility of the class of mathematical models combined with the probabilistic treatment of the data collection process allows for a rigorous comparison between competing descriptions of cellular behaviour. As such, this framework can help to test biological hypotheses and serve as a bridge between cellular-level events and population-level observations. This framework can also be used to study the optimal design of experiments; thus it may be possible to identify measurement times which minimize the size of the blood sample required while maximizing the information one can obtain. In the future, it will be possible to analyse cellular behaviour for donors in a variety of clinical states and thus to develop a more complete understanding of infectious disease, immunosuppressive drug actions, etc.

Acknowledgements

This research was supported in part by Grant Number NIAID R01AI071915-10 from the National Institute of Allergy and Infectious Diseases, in part by the Air Force Office of Scientific Research under grant number AFOSR FA9550-12-1-0188, in part by the National Science Foundation under Research Training Grant (RTG) DMS-0636590, and in part by grant SAF2010-21336 from the Spanish Ministry of Science and Innovation. The authors are grateful to several referees for a number of helpful comments.

Appendix 1. Comments on numerical methods and code

The computational algorithm for the model (9) is an extension (accounting for the incorporation of cyton dynamics) of the algorithm originally proposed by Allgöwer et al. []. Because the solution is factorable (Proposition 3.2), it is possible to compute the population generation structure (N_i(t), for i ≥ 0 and t ≥ 0) independently, and then to use this information to compute the population label structure (ñ_i(t), ), for i > 0, t ≥ 0, and x ≥ 0). Naively, one could compute the functions N_i(t) numerically and the functions n_i(t, x) either numerically or exactly (depending on the form of the function v(t)). One then obtains the densities n_i(t,x) by Proposition 3.2 and the densities ñ_i(t,x) by the convolution integral (5). However, this naive approach is computationally intensive as a result of the convolution. As shown in [], there is a more efficient method. We first discuss the overall computational scheme for the construction of the population label structure, followed by a detailed algorithm for the computation of the population generation structure.

A.1. Computation of population label structure

Assume one has already computed the functions N_i(t). The solutions n_i(t, x) of Equation (4) can be obtained using the method of characteristics,

Now, assume (for the moment) that the initial label density in the population is lognormally distributed with parameters μ₀ and σ₀ so that

forx > 0. Inserting this definition into Equation (A1),

where

In other words, if the initial label density is lognormally distributed, then the distribution of CFSE (that is, the distribution of fluorescence intensity resulting from CFSE) will be lognormally distributed at all times, with parameters μ_i(t) and σ₀. Now, assume more generally that the initial condition can be written as a convex combination of lognormal density functions

This assumption is not overly restrictive and the initial condition (see the measurements collected on Day 1 in Figures 1 and and2)2) can be well approximated by such a series. Then by the principle of superposition, Proposition 3.2, and Equations (17) and (A2),

where μ_i^k(t) is given as above. To account for the contributions of cellular autofluorescence, we have from Equation (5)

By assumption, p(t, ζ) is itself a lognormal density function and the integral above (for each pair of values (i, k)) is the convolution of two lognormal density functions, which can be accurately approximated by a lognormal density function having a mean and variance which is the sum of the means and variances of the two density functions in the convolution (see [] and the references therein). In other words

where

Thus, given the population generation structure N_i(t), the values {f_k}, {μ^k}, and {(σ^k)²} which represent the initial condition Φ(x), and the parameters E[x_a] and STD[x_a] describing the distribution of autofluorescence, one can very quickly construct the solutions ñ(t, ) using Equation (A3). Significantly, this approximation to the solution does not involve any discretization in the structure variable (x or ) so that solutions can be evaluated cheaply even on a very fine mesh in the structure variable. We find, in agreement with [], that this method of approximation increases computational speed by several orders of magnitude over other methods of solution [].

Once the label-structured densities ñ_i(t,) have been obtained, we must compute the cell counts I[](t_j,z_k; ) according to Equation (18). The values ñ(t, ) can be computed very cheaply and efficiently by Equation (A3) so that a large number of evaluations of ñ(t, ) can be used in the approximation of the integral operator I[](t_j, z_k; ) with no adverse effect on computational time. For the results presented in this manuscript, the values I[](t_j,z_k; ) have been approximated using two point Gauss–Legendre quadrature on each interval [z_k, z_{k + 1}].

A.2. Computation of population generation structure

We now discuss a computational scheme for a general cyton model, given by Equations (6)–(8). Assume φ_i(t) and ψ_i(t) are known functions of time for all i ≥ 0. Given initial and final measurement times t₀ and t_f, as well as a time step size h, define the number of time steps

and the time grid points

For eachj, the values ₀(t_j) and ψ₀(t_j) can be precomputed for each i ≥ 0 and stored in vectors of size (N + 1). Similarly, the values can be precomputed and stored in vectors. The integration can be efficiently carried out using two-point Gauss–Legendre quadrature on each subinterval of size h. We have generally found h = 1 to be a sufficiently small time step. The results in this document were all obtained with h = 0.25. Because precomputation is cheap, computation of the terms using a higher order rule (e.g. Gauss-Legendre quadrature) allows for a larger value of h than would otherwise be acceptable.

When the functions φ₀(t) and ψ₀(t) are parametric density functions (as is the case in this document), it is possible that a portion of the support of the functions lies in the half line t < t₀. In order for the cyton model to function properly, it must be true that for all i ≥ 0. Since one does not have any information about the behaviour of the population of cells prior to t = t₀, the simplest method of resolving this problem is to truncate (on the left) the functions φ₀(t) and ψ₀(t) at t = t₀. Thus, we can define

If one of the denominators above is zero, we define to be identically zero. We will simply refer to φ₀(t) and ψ₀(t) without tildes, although it should be understood that the functions have been appropriately scaled.

Given the precomputed vectors above, one can compute the quantities n₀^div(t_j) and n₀^die (t_j) according to (7) for each j. These equations can be computed for all values of j simultaneously using a single element-wise vector multiplication. From the quantities n₀^div (t_j) and n₀^die (t_j), one can obtain N₀(t_j) using the trapezoidal rule.

Next, we can define an additional vector of time values, s_j = t_j for all j = 1, …,(N + 1). Then the values of ϕ_i(t_j − s_k), ψ_i(t_j − s_k), can be precomputed for i ≥ 1 and stored in an array of size (N + 1) × (N + 1). To do so efficiently, one can simply compute (and similarly for ψ_i) where _j = t_j − t₀ for each j and store these quantities in a vector; the (j, k) entry of each array is the (j − k + 1) entry of the corresponding vector, or is zero if j − k + 1 < 1. Note that in this document, φ_i(t) and ψ_i(t) are identical for all i ≥ 1, so that only 4 arrays are required to store these precomputed values.

Given these precomputed vectors, one can compute (recursively on i) the values n_i^div(t_j) and n^die(t_j) according to Equation (8) for each j. Again, these values can be computed for all values ofj simultaneously by carefully vectorizing the resulting operations. The terms n_{i − 1}^div in Equation (8) are vectors of size (N + 1), which can be replaced by an (N + 1) × (N + 1) array where each row is the vector of values n_{i − 1}^div(t_j). Then Equations (8) can be computed using element-wise matrix multiplication, followed by quadrature (using the trapezoidal rule) over values of s_k. From the quantities n_i^div (t_j) and n_i^die(t_j), one can obtain N_i(t_j) using the trapezoidal rule.

A.3. Inverse problem/parameter estimation

Given the forward solution constructed as described in the previous subsections, this information must be incorporated into a computational scheme for the optimization problem (21). This optimization is carried out using the BFGS algorithm as implemented in the Matlab routine fmincon. Computations were carried out on a Dell Optiplex 990, running an Intel Core i7-2600 (4 × 3.4 GHz) with 2 × 4BGRAM (1333 MHz). The inverse problem took an average of 6.41 min.

References

[1] Aubin J.E. Autoflouresecence of viable cultured mammalian cells. J. Histochem. Cytochem. 1979;27:36–43. [PubMed] [Google Scholar]

[2] Banks H.T., Fitzpatrick B.G. Springer Lecture Notes in Biomath. Vol. 81. Berlin: Springer-Verlag; 1989. Inverse Problems for Distributed Systems: Statistical Tests and ANOVA, Proceedings of the International Symposium on Mathematical Approaches to Environmental and Ecological Problems; pp. 262–273. LCDS/CSS Report 88-16, Brown University, July 1988. [Google Scholar]

[3] Banks H.T., Thompson W.C. A division-dependent compartmental model with cyton and intracellular label dynamics. Int. J. Pure Appl. Math. 2012;77:119–147. CRSC-TR12-12, North Carolina State University, May 2012. [Google Scholar]

[4] Banks H.T., Thompson W.C. Mathematical models of dividing cell populations: Application to CFSE data. J. Math. Model. Nat. Phenom. 2012;7:24–52. CRSC-TR12-10, North Carolina State University, April 2012. [Google Scholar]

[5] Banks H.T., Thompson W.C. Least squares estimation of probability measures in the Prohorov metric framework Center for Research in Scientific Computation Tech Rep, CRSC-TR12-21. North Carolina State University, Raleigh, NC, November, 2012.

[6] Banks H.T., Tran H.T. Mathematical and Experimental Modeling of Physical and Biological Processes. Boca Raton, FL: CRC Press; 2009. [Google Scholar]

[7] Banks H.T., Davidian M., Samuels J., Sutton K.L. Chowell G., Hyman M., Bettencourt L. M. A., Castillo-Chavez C. Chapter 11 in Mathematical and Statistical Estimation Approaches in Epidemiology. Berlin: Springer; 2009. An inverse problem statistical methodology summary; pp. 249–302. CRSC-TR08-01, North Carolina State University, January 2008. [Google Scholar]

[8] Banks H.T., Holm K., Kappel F. Comparison of optimal design methods in inverse problems. Inv. Prob. 2011;27:075002.[PMC free article] [PubMed] [Google Scholar]

[9] Banks H.T., Kenz Z.R., Thompson W.C. An extension of RSS-based model comparison tests for weighted least squares. Int. J. Pure Appl. Math. 2012;79:155–183. CRSC-TR12-18, North Carolina State University, August 2012. [Google Scholar]

[10] Banks H.T., Kenz Z.R., Thompson W.C. A review of selected techniques in inverse problem nonparametric probability distribution estimation. J. Inv. Ill-posed Problems. 2012;20:429–460. Special issue on the occasion of the 80th anniversary of the birthday of M.M. Lavrentiev. [Google Scholar]

[11] Banks H.T., Sutton K.L., Thompson W.C., Bocharov G., Doumic M., Schenkel T., Argilaguet J., Giest S., Peligero C., Meyerhans A. A new model for the estimation of cell proliferation dynamics using CFSE data. J. Immunol. Methods. 2011;373:143–160. doi:10.1016/j.jim.2011.08.014. CRSC-TR11-05, North Carolina State University, Revised July 2011. [PMC free article] [PubMed] [Google Scholar]

[12] Banks H.T., Sutton K.L., Thompson W.C., Bocharov G., Roose D., Schenkel T., Meyerhans A. Estimation of cell proliferation dynamics using CFSE data. Bull. Math. Biol. 2011;70:116–150. CRSC-TR09-17, North Carolina State University, August 2009. [PMC free article] [PubMed] [Google Scholar]

[13] Banks H.T., Thompson W.C., Peligero C., Giest S., Argilaguet J., Meyerhans A. A division-dependent compartmental model for computing cell numbers in CFSE-based lymphocyte proliferation assays. Math. Biosci. Eng. 2012;9:699–736. CRSC-TR12-03, North Carolina State University, January 2012. [PubMed] [Google Scholar]

[14] Bell G., Anderson E. Cell growth and division I. A mathematical model with applications to cell volume distributions in mammalian suspension cultures. Biophys. J. 1967;7:329–351.[PMC free article] [PubMed] [Google Scholar]

[15] Billy F., Clairambault J., Delaunay F., Feillet C., Robert N. Age-structured cell population model to study the influence of growth factors on cell cycle dynamics. Math. Biosci. Eng. 2013;10:1–17. [PubMed] [Google Scholar]

[16] Burnham K.P., Anderson D.R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed. New York: Springer; 2002. [Google Scholar]

[17] Burroughs N.J., van der Merwe P.A. Stochasticity and spatial heterogeneity in T-cell activation. Immunol. Rev. 2007;216:69–80. [PubMed] [Google Scholar]

[18] Choi A., Huffman T., Nardini J., Poag L., Thompson W.C., Banks H.T. Quantifying CFSE label decay in flow cytometry data. Appl. Math. Lett. 2013;26:571–577. doi:10.1016/j.aml.2012.12.010. CRSC-TR12-20, North Carolina State University, December 2012. [PMC free article] [PubMed] [Google Scholar]

[19] Davidian M., Giltinan D.M. Nonlinear Models for Repeated Measurement Data. London: Chapman & Hall; 2000. [Google Scholar]

[20] DeBoer R.J., Ganusov V.V., Milutinovic D., Hodgkin P.D., Perelson A.S. Estimating lymphocyte division and death rates from CFSE data. Bull. Math. Biol. 2006;68:1011–1031. [PubMed] [Google Scholar]

[21] Dowling M.R., Milutinovic D., Hodgkin P.D. Modelling cell lifespan and proliferation: Is likelihood to die or to divide independent of age? J. R. Soc. Interface. 2005;2:517–526.[PMC free article] [PubMed] [Google Scholar]

[22] Duffy K., Subramanian V. On the impact of correlation between collaterally consanguineous cells on lymphocyte population dynamics. J. Math. Biol. 2009;59:255–285. [PubMed] [Google Scholar]

[23] Feller W. An Introduction to Probability Theory and its Applications. I. New York: Wiley; 1971. [Google Scholar]

[24] Gallant A.R. Nonlinear Statistical Models. New York: Wiley; 1987. [Google Scholar]

[25] Gett A.V., Hodgkin P.D. A cellular calculus for signal integration by T cells. Nat. Immunol. 2000;1:239–244. [PubMed] [Google Scholar]

[26] Goh S., Kwon H.W., Choi M.Y., Fortin J.Y. Emergence of skew distributions in controlled growth processes. Phys. Rev. E. 2010;82:061, 115. [PubMed] [Google Scholar]

[27] Hasenauer J., Schittler D., Allgöwer F. Analysis and simulation of division- and label-structured population models: a new tool to analyze proliferation assays. Bull. Math. Biol. 2012;74:2692–2732. [PubMed] [Google Scholar]

[28] Hawkins E.D., Hommel M., Turner M.L., Battye F., Markham J., Hodgkin P.D. Measuring lymphocyte proliferation, survival and differentiation using CFSE time-series data. Nat. Protoc. 2007;2:2057–2067. [PubMed] [Google Scholar]

[29] Hawkins E.D., Turner M.L., Dowling M.R., van Gend C., Hodgkin P.D. A model of immune regulation as a consequence of randomized lymphocyte division and death times. Proc. Natl. Acad. Sci. 2007;104:5032–5037.[PMC free article] [PubMed] [Google Scholar]

[30] Hawkins E.D., Markham J.F., McGuinness L.P., Hodgkin P.D. A single-cell pedigree analysis of alternative stochastic lymphocyte fates. Proc. Natl. Acad. Sci. 2009;106:13457–13462.[PMC free article] [PubMed] [Google Scholar]

[31] Hyrien O., Zand M.S. A mixture model with dependent observations for the analysis of CFSE-labeling experiments. J. Amer. Statist. Assoc. 2008;103:222–239.[Google Scholar]

[32] Hyrien O., Chen R., Zand M.S. An age-dependent branching process model for the analysis of CFSE-labeling experiments. Biol. Direct. 2010;5 Published Online. doi:10.1186/1745-6150-5-41. [PMC free article] [PubMed] [Google Scholar]

[33] Lahiri S.N., Chatterjee A., Maiti T. Normal approximation to the hypergeometric distribution in nonstandard cases and a sub-Gaussian Berry-Esseen theorem. J. Statist. Plann. Inference. 2007;137:3570–3590.[Google Scholar]

[34] Luzyanina T., Roose D., Bocharov G. Distributed parameter identification for a label-structured cell population dynamics model using CFSE histogram time-series data. J. Math. Biol. 2009;59:581–603. [PubMed] [Google Scholar]

[35] Luzyanina T., Roose D., Schenkel T., Sester M., Ehl S., Meyerhans A., Bocharov G. Numerical modelling of label-structured cell population growth using CFSE distribution data. Theor. Biol. Medical Model. 2007;4 published online. doi:10.1186/1742-4682-4-26. [PMC free article] [PubMed] [Google Scholar]

[36] Lyons A.B., Parish C.R. Determination of lymphocyte division by flow cytometry. J. Immunol. Methods. 1994;171:131–137. [PubMed] [Google Scholar]

[37] Lyons A.B., Hasbold J., Hodgkin P.D. Flow cytometric analysis of cell division history using diluation of carboxyfluorescein diacetate succinimidyl ester. a stably integrated fluorescent probe, Methods Cell Biol. 2001;63:375–398. [PubMed] [Google Scholar]

[38] Miao H., Jin X., Perelson A., Wu H. Evaluation of multitype mathemathematical models for CFSE-labeling experimental data. Bull. Math. Biol. 2012;74:300–326. doi:10.1007/s11538-011-9668-y. [PMC free article] [PubMed] [Google Scholar]

[39] Nordon R.E., Ko K.-H., Odell R., Schroeder T. Multi-type branching models to describe cell differentiation programs. J. Theor. Biol. 2011;277:7–18. [PubMed] [Google Scholar]

[40] Quah B.J.C., Parish C.R. New and improved methods for measuring lymphocyte proliferation in vitro and in vivo using CFSE-like fluorescent dyes. J. Immunol. Methods. 2012;379:1–14. [PubMed] [Google Scholar]

[41] Quah B., Warren H., Parish C. Monitoring lymphocyte proliferation in vitro and in vivo with the intracellular fluorescent dye carboxyfluorescein diacetate succinimidyl ester. Nat. Protoc. 2007;2:2049–2056. [PubMed] [Google Scholar]

[42] Schittler D., Hasenauer J., Allgöwer F. A generalized modelfor cell proliferation: Integrating division numbers and label dynamics. pp. 165–168. Proceedings of the Eighth International Workshop on Computational Systems Biology (WCSB 2011), Zurich, Switzerland, June 2011.

[43] Seber G.A.F., Lee A.J. Linear Regression Analysis. Hoboken, NJ: Wiley; 2003. [Google Scholar]

[44] Seber G.A., Wild C.J. Nonlinear Regression. Hoboken, NJ: Wiley; 2003. [Google Scholar]

[45] Subramanian V.G., Duffy K.R., Turner M.L., Hodgkin PD. Determining the expected variability of immune responses using the cyton model. J. Math. Biol. 2008;56:861–892. [PubMed] [Google Scholar]

[46] Terrano D.T., Upreti M., Chambers T.C. Cyclin-dependent kinase 1-mediated Bcl-x_L/Bcl-2 phosphorylation acts as a functional link coupling mitotic arrest and apoptosis. Mol. Cell. Biol. 2010;30:640–656.[PMC free article] [PubMed] [Google Scholar]

[47] Thompson W.C. Partial differential equation modeling of flow Cytometry data from CFSE-based proliferation assays. Ph.D. Dissertation, Dept. of Mathematics, North Carolina State University, Raleigh, December, 2011.

[48] Turner M.L., Hawkins E.D., Hodgkin PD. Quantitative regulation of B cell division destiny by signal strength. J. Immunol. 2008;181:374–382. [PubMed] [Google Scholar]

[49] Wallace P.K., Tario J.D., Jr., Fisher J.L., Wallace S.S., Ernstoff M.S., Muirhead K.A. Tracking antigen-driven responses by flow cytometry: monitoring proliferation by dye dilution. Cytom. A. 2008;73:1019–1034. [PubMed] [Google Scholar]

[50] Wellard C., Markham J., Hawkins E.D., Hodgkin PD. The effect of correlations on the population dynamics of lymphocytes. J. Theor. Biol. 2010;264:443–449. [PubMed] [Google Scholar]

[51] Witkowski J.M. Advanced application of CFSE for cellular tracking. Curr. Protoc. Cytom. 2008;44:9.25.1–9.25.8. [PubMed] [Google Scholar]