Thu, 14 April 2005
Wed, 15 September 2004
StoJ, an implementation of Stochastic Join
I used OCaml as the implementation language — the parsing tools are quite smooth and good to work with — and gnuplot to graph the output of each simulation run. This is my first OCaml project, and I'm finding it quite pleasant despite the syntax.
Here's an example model in StoJ. I duplicated the rates and the overall design of the model from the Clock.zip code archive at the BioSPI website. This model is discussed in Aviv Regev's paper "Representation and simulation of molecular pathways in the stochastic π-calculus." The implementation of the model below gives the same output graphs as Regev's model gives, which is a nice confirmation of the model and of the StoJ system itself.
// Circadian clock model // Based on Regev et al new DNA_a, DNA_r, !MRNA_a, !MRNA_r, !protein_a, !protein_r, DNA_a_promoted, DNA_r_promoted, !complex_ar . ( // Start state. DNA_a<> | DNA_r<> // Basal rate transcription. | rec L. DNA_a() ->[4] (DNA_a<> | MRNA_a<> | L) | rec L. DNA_r() ->[0.001] (DNA_r<> | MRNA_r<> | L) // MRNA degradation. | rec L. MRNA_a() ->[1.0] (L) | rec L. MRNA_r() ->[0.02] (L) // Translation. | rec L. MRNA_a() ->[1.0] (MRNA_a<> | protein_a<> | L) | rec L. MRNA_r() ->[0.1] (MRNA_r<> | protein_r<> | L) // Protein degradation. | rec L. protein_a() ->[0.1] (L) | rec L. protein_r() ->[0.01] (L) // A/R Complex formation. | rec L. protein_a() & protein_r() ->[100.0] (complex_ar<> | L) // A/R Complex degradation. | rec L. complex_ar() ->[0.1] (protein_r<> | L) | rec L. complex_ar() ->[0.01] (protein_a<> | L) // Activator binding/unbinding to A promoter, and enhanced transcription | rec L. protein_a() & DNA_a() ->[10] (DNA_a_promoted<> | L) | rec L. DNA_a_promoted() ->[10] (protein_a<> | DNA_a<> | L) | rec L. DNA_a_promoted() ->[40] (DNA_a_promoted<> | MRNA_a<> | L) // Activator binding/unbinding to R promoter, and enhanced transcription | rec L. protein_a() & DNA_r() ->[10] (DNA_r_promoted<> | L) | rec L. DNA_r_promoted() ->[100] (protein_a<> | DNA_r<> | L) | rec L. DNA_r_promoted() ->[2] (DNA_r_promoted<> | MRNA_r<> | L) )
Here are a couple of graphs produced by the simulator from the program above — first a graph of reagent concentrations varying with time, and then a graph of Repressor (R) molecule concentration against Activator (A) molecule concentration, showing the bistable nature of the whole system. If you take a look at the graphs in Regev's paper, you'll see that the spikes in reagent concentration occur in the same places with the same frequency in our model as in Regev's.
I also implemented a simple Java applet/application that uses a (human-assisted) mass-spring layout to draw a graph of the (statically apparent) dataflow in the StoJ program it takes as input. It still needs a bit of refining — the graphs/programs tend to be quite densely connected in places, and not only is mass-spring layout poor at densely-connected graphs, especially with cycles, but it doesn't identify any potential clustering in the graph either. I get the feeling clustering (subgraphs) will be important for visualising dataflow in larger models.
Here's a simple chemical equilibrium model in StoJ:
// A simple H + Cl <--> HCl model, // similar to that in Andrew Phillips' slides // with numbers taken from his slides. // See http://www.doc.ic.ac.uk/~anp/spim/Slides.pdf // page 16. new !h, !cl, !hcl . ( rec ASSOC. h() & cl() ->[100] (hcl<> | ASSOC) | rec DISSOC. hcl() ->[10] (h<> | cl<> | DISSOC) | h<> * 100 | cl<> * 100 )
And here's the display generated by the Java program visualisation application:
As you can imagine, for larger models the display gets cluttered quite quickly. A note on implementation though - I found Java2D to be surprisingly pleasant to work with. The built-in support for double-buffering of JComponents along with the various classes of Shape helped me quickly develop the application.
Tue, 14 September 2004
Stochastic Join?
Here's a comparison of the two styles:
| Biological Systems Element | Standard Model | "Alternative" Model |
|---|---|---|
| Molecule | Process | Message |
| Molecule species | - | Channel |
| Interaction Capability | Channel | Process |
| Interaction | Communication | Movement (and computation) |
| Modification | State and/or channel change | State and/or channel change |
| Reaction Rate | attached to channel | attached to prefix |
Join and Rates
We've been thinking about using a stochastic π-variant with a join operator. Having join available allows the expression of state machines to be much more direct than in a joinless stochastic π. We started modelling the lambda-switch to fit the model from this BioConcur paper, and saw that it seems more natural to attach the reaction rate to each individual join rather than to each channel as BioSPI does.
Bisimulation
The standard model has comm meaning interaction; we have comm meaning movement in some sense: atoms (mass) move from one locus (molecular species) to another, or a molecule undergoes conformational change, which is a kind of motion from one state or class or species to another. This means that bisimulation in the alternative model has to do with movement rather than interaction — which seems more fine-grained, but perhaps less natural.
Uncertainty in Biological Simulation
We also thought about a few properties biological simulation systems ought to have. One important one is a notion of uncertainty. Few of the current systems are honest about their limits — only one of the papers at BioConcur displayed error bars on any of its graphs! The systems in use all generate probable trajectories through state-space, and this is I assume implicitly to be understood by the reader of the various program outputs, but it would be useful to have the uncertainty in the results reflected in the various displays that these systems produce.
Besides the uncertainties that come from stochastic modelling and Monte-Carlo-type approaches in general, all the inputs to the systems (eg. rates) are purely empirical data, with associated uncertainties, and these uncertainties are not propagated through the system. Not only do the outputs not show the variation in results due to random sampling, but the outputs do not explore the limits of the accuracy of the inputs, either.
Reagent concentration vs. Gillespie vs. Modelling reagent state
The alternative model deals well with high reagent concentrations, since a high concentration is modelled as a large number of messages outstanding on a channel. The Gillespie scheduling algorithms are O(n)-time in the number of enabled reactions, not in the concentrations of the reagents. The standard model suffers here because each molecule is modelled as a process — that is, as an enabled reaction. In a naïve implementation, this makes the scheduling complexity O(n)-time (or at best, O(log(n)) time if the Gibson-Bruck algorithm is used) in the number of molecules in the system.
However, modelling molecules as processes seems to provide one advantage that doesn't easily carry over into the alternative model: molecule-processes can involve internal state that any interacting molecule need not be concerned with. For example, enzyme-private channels with different rates associated with them might be used to model the different reactivities of phosphorylated and unphosphorylated states of the enzyme.
Perhaps a hybrid of the two models will be workable; after all, even in the alternative model, you have the freedom of the restriction operator available for implementing complex behaviours. The alternative model allows some common patterns to be expressed more simply than the standard joinless model though, it seems.
We're still thinking about ways of keeping the molecules-as-messages approach while allowing reactions to specialise on molecule-message state or to ignore it, as appropriate.
Compartments
BioSPI 3.0 took the standard stochastic-π implementation and added BioAmbients to it (see the 2003 paper by Regev, Panina, Silverman, Cardelli and Shapiro). Any stochastic-join will probably benefit from being able to model compartments similarly.
The simple, ambientless approach suffers some of the same problems as SBML is facing with regard to recognising the similarity of species across compartments. Some way of factoring out reactions/processes/etc common across all compartments should be useful.