4.2. Using the wc_lang package to define whole-cell models
This tutorial teaches you how to use the wc_lang package to access and create whole-cell models.
wc_lang provides a foundation for defining, writing, reading and manipulating biochemical models composed of species,
reactions, compartments and other parts of a biochemical system.
It can be used to define models of entire cells, or models of smaller biochemical systems.
wc_lang contains methods to read and write models from two types of files –
Excel spreadsheet workbooks and sets of delimited files. It also includes methods that
analyze or transform models – e.g., methods that validate, compare, and normalize them.
wc_lang depends heavily on the obj_tables package which defines a generic language for declaring
interrelated Python objects, converting them to and from data records,
transferring the records to and from files, and validating their values.
obj_tables is essentially an object-relational mapping (ORM) system that stores data in files
instead of databases.
However, users of wc_lang do not need to use obj_tables directly.
4.2.1. Semantics of a wc_lang biochemical Model
A wc_lang biochemical model represents a biochemical system as Species (we indicate
classes in wc_lang by capitalized names in fixed-width text) that get transformed by reactions.
A SpeciesType describes a biochemical molecule, including its name (following Python
convention, attributes
of classes are lowercase names), structure, molecular_weight,
charge and other properties.
The concentration of a SpeciesType in a compartment is stored by a Species instance
that references instances of SpeciesType, Compartment, and Concentration, which provide
the Species’ location and concentration.
A compartment may represent an organelle or a conceptual region of a model.
Adjacency relationships among compartments are implied by reactions that transfer
species among them, but physical relationships between compartments or their 3D positions
are not represented.
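As a rough illustration of how adjacency can be inferred from reactions, the sketch below uses plain Python tuples rather than wc_lang objects. The tuple layout, the function name implied_adjacencies, and the extracellular compartment 'e' are assumptions made for this example only.

```python
from itertools import combinations

# Hypothetical stand-in for a reaction: a list of
# (species type id, compartment id, stoichiometric coefficient) tuples.
# This is not the wc_lang data model.
atp_export = [('atp', 'c', -1), ('atp', 'e', 1)]
atp_hydrolysis = [('atp', 'c', -1), ('h2o', 'c', -1),
                  ('adp', 'c', 1), ('pi', 'c', 1), ('h', 'c', 1)]


def implied_adjacencies(reactions):
    """Return the set of compartment pairs linked by at least one reaction."""
    pairs = set()
    for participants in reactions:
        compartments = sorted({comp for _, comp, _ in participants})
        pairs.update(combinations(compartments, 2))
    return pairs


adjacent = implied_adjacencies([atp_export, atp_hydrolysis])
print(adjacent)  # {('c', 'e')}
```

Only the transfer reaction spans two compartments, so it alone contributes an adjacency; the hydrolysis reaction stays within 'c' and implies nothing.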
The data in
a wc_lang model is organized in a highly-interconnected graph of related Python objects, each of
which is an obj_tables.core.Model instance.
For example, a Species instance contains reaction_participants,
which references each Reaction in which the Species participates.
The graph contains many convenience relationships like this, which make it easy to
follow the relationships between obj_tables.core.Model instances anywhere in a wc_lang model.
A wc_lang model also supports some metadata.
Named Parameter entities store arbitrary values, such as input parameters.
Published data sources used by a model should be recorded in Reference entities,
or in DatabaseReference objects that identify a biological or chemical database.
wc_lang models are typically used to describe the initial state of a model – a wc_lang
description lacks any notion of time.
More generally, a comprehensive wc_lang model should provide a complete description of a model,
including its data sources and comments about model components.
4.2.2. wc_lang Classes Used to Define biochemical Models
This subsection enumerates the obj_tables.core.Model classes that store data in wc_lang models.
When using an existing model the attributes of these classes are frequently accessed, although their definitions are not typically imported. However, they must be imported when they are being instantiated programmatically.
Many of these classes implement the methods deserialize() and serialize().
deserialize() parses an object’s string representation – as would be stored in a text file or spreadsheet
representation of a biochemical model – into one or more obj_tables.core.Model instances.
serialize() performs the reverse, converting a wc_lang class instance into a string representation.
Thus, the deserialize() methods are used when reading models from files and serialize()
is used when writing a model to disk.
deserialize() returns an error when a string representation cannot be parsed into a
Python object.
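The round trip between a string representation and an object can be pictured with a small, self-contained sketch. It does not use wc_lang; the function names, the regular expression, and the (value, error) return convention are assumptions chosen for illustration, and the real deserialize() and serialize() signatures may differ.

```python
import re

# Illustrative pattern for a species written as 'species_type[compartment]'.
SPECIES_PATTERN = re.compile(r'^([A-Za-z_]\w*)\[([A-Za-z_]\w*)\]$')


def serialize_species(species_type_id, compartment_id):
    """Convert a (species type, compartment) pair into its string form."""
    return '{}[{}]'.format(species_type_id, compartment_id)


def deserialize_species(text):
    """Parse a species string; return (value, error), one of which is None."""
    match = SPECIES_PATTERN.match(text.strip())
    if match is None:
        return None, "'{}' is not of the form 'species_type[compartment]'".format(text)
    return (match.group(1), match.group(2)), None


value, error = deserialize_species('atp[c]')
print(value, error)               # ('atp', 'c') None
print(serialize_species(*value))  # atp[c]
bad_value, bad_error = deserialize_species('atp(c)')
```

Returning an error value instead of raising mirrors the description above, in which a string that cannot be parsed produces an error rather than a Python object.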
4.2.2.1. Static Enumerations
Static attributes of these classes are used as attributes of wc_lang model components.
TaxonRank- The names of biological taxonomic ranks: domain, kingdom, phylum, etc.
SubmodelAlgorithm- The names of algorithms that can integrate submodels: dfba, ode, and ssa.
SpeciesTypeType- Types of species types: metabolite, protein, dna, rna, and pseudo_species.
RateLawDirection- The direction of a reaction rate law: backward or forward.
ReferenceType- Reference types, such as article, book, online, proceedings, etc.
4.2.2.2. wc_lang Model Components
These classes are instantiated as components of a wc_lang model.
When a model is stored on disk all the instances of each class are
usually stored in a separate table, either an Excel workbook’s worksheet or delimiter-separated file.
In the former case, the model is stored in one workbook, while in the latter it is stored in a set of files.
Taxon- The taxonomic rank of a model.
Submodel- A part of a whole-cell model which is to be simulated with a particular algorithm from the enumeration SubmodelAlgorithm. Each Submodel is associated with a Compartment that contains the Species it models, and all the reactions that transform them. A Submodel may also have parameters.
Compartment- A named physical container in the biochemical system being modeled.
It could represent an organelle, a cell’s cytoplasm, or another physical or conceptual structure.
It includes an initial_volume in liters, and references to the initial concentrations of the Species it contains. A compartment can have a semi-permeable membrane, which is modeled by reactions that transform reactant species in the compartment to product species in another compartment. These are called membrane-transfer reactions. A membrane-transfer reaction that moves species from compartment x to compartment y implies that x and y are adjacent.
SpeciesType- The biochemical type of a species. It contains the type’s name, structure (which is represented in InChI for metabolites and as sequences for DNA, RNA, and proteins), empirical_formula, molecular_weight, and charge. A species type’s type is drawn from the attributes of SpeciesTypeType.
Species- A particular SpeciesType contained in a particular Compartment at a particular concentration.
Concentration- The molar concentration (M) of a species.
Reaction- A biochemical reaction. Each Reaction belongs to one submodel. It consists of a list of the species that participate in the reaction, stored as a list of references to ReactionParticipant instances in participants. A reaction that’s simulated by a dynamic algorithm, such as an ODE system or SSA, must have a forward rate law. A Boolean indicates whether the reaction is thermodynamically reversible. If reversible is True, then the reaction must also have a backward rate law. Rate laws are stored in the rate_laws list, and their directions are drawn from the attributes of RateLawDirection.
ReactionParticipant- A ReactionParticipant combines a Species and its stoichiometric reaction coefficient. Coefficients are negative for reactants and positive for products.
RateLaw- A rate law contains a textual equation which stores the mathematical expression of the rate law. It contains the direction of the rate law, encoded with a RateLawDirection attribute. k_cat and k_m attributes for a Michaelis–Menten kinetics model are provided, but their use isn’t required.
RateLawEquation- A rate law equation’s expression contains a textual, mathematical expression of the rate law. A rate law can be used by more than one Reaction. The expression will be transcoded into a valid Python expression, stored in the transcoded attribute, and evaluated as a Python expression by a simulator. This evaluation must produce a number.
The expression is constructed from species names, compartment names, stoichiometric reaction coefficients, k_cat and k_m, and Python functions and mathematical operators. SpeciesType and Compartment names must be valid Python identifiers, and the entire expression must be a valid Python expression. A species composed of a SpeciesType named species_x located in a Compartment named c is written species_x[c]. When a rate law equation is evaluated during the simulation of a model, the expression species_x[c] is interpreted as the current concentration of species_x in compartment c.
Parameter- A Parameter holds an arbitrary floating point value. It is named, associated with a set of submodels, and should include a modifier indicating the value’s units.
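The way a rate law expression such as those described under RateLawEquation might be transcoded and evaluated can be sketched in a few lines. This is not the wc_lang transcoder: the transcode and evaluate functions, the concentrations dictionary, and the numeric values are all assumptions for illustration.

```python
import re

# Matches tokens of the form species_type[compartment], e.g. atp[c].
SPECIES_REF = re.compile(r'\b([A-Za-z_]\w*)\[([A-Za-z_]\w*)\]')


def transcode(expression):
    """Rewrite each 'species[compartment]' token as a dictionary lookup."""
    return SPECIES_REF.sub(r"concentrations['\1[\2]']", expression)


def evaluate(expression, concentrations, k_cat, k_m):
    """Evaluate a transcoded rate law with builtins removed from the namespace."""
    namespace = {'concentrations': concentrations, 'k_cat': k_cat, 'k_m': k_m}
    return eval(transcode(expression), {'__builtins__': {}}, namespace)


# A Michaelis-Menten style rate law written in the species_x[c] notation.
equation = 'k_cat * atp[c] / (k_m + atp[c])'
print(transcode(equation))
rate = evaluate(equation, {'atp[c]': 2.0}, k_cat=10.0, k_m=3.0)
print(rate)  # 4.0
```

Removing the builtins narrows what an expression can reach, but eval is not a security boundary, so a sketch like this should only be applied to trusted model files.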
4.2.2.3. wc_lang Model Data Sources
These classes record the sources of a model’s data.
Reference- A Reference holds a reference to a publication that contains data used in the model.
DatabaseReference- A DatabaseReference describes a biological or chemical database that provided data for the model.
4.2.3. Using wc_lang
The following tutorial shows several ways to use wc_lang, including
reading a model from disk, defining a model programmatically and writing it to disk,
and using these models:
Install the required software for the tutorial:
- Python
- Pip
Install the tutorial and the whole-cell packages that it uses:
git clone https://github.com/KarrLab/intro_to_wc_modeling.git
pip install --upgrade \
    ipython \
    git+https://github.com/KarrLab/wc_lang.git#egg=wc_lang \
    git+https://github.com/KarrLab/wc_utils.git#egg=wc_utils
Change to the directory for this tutorial:
cd intro_to_wc_modeling/intro_to_wc_modeling/wc_modeling/wc_lang_tutorial
Open an interactive Python interpreter:
ipython
Import the os and wc_lang.io modules:
import os
import wc_lang.io
Read and write models in Excel and delimited files
wc_lang can read and write models from specially formatted Excel workbooks in which each worksheet represents one of the model component classes above, each row represents a class instance, each column represents an instance attribute, each cell represents the value of an attribute of an instance, and string identifiers are used to indicate relationships among objects. wc_lang can also read and write models from specially formatted sets of delimiter-separated files.
In addition to defining a model, files that define models should contain all of the annotation needed to understand the biological semantic meaning of the model. Ideally, this should include:
- NCBI Taxonomy ID for the taxon
- Gene Ontology (GO) annotations for each submodel
- The structure of each species: InChI for small molecules; sequences for polymers
- Where possible, ChEBI ids for each small molecule
- Where possible, ids for each gene, transcript, and protein
- Where possible, EC numbers or KEGG ids for each reaction
- Cell Component Ontology (CCO) annotations for each compartment
- Systems Biology Ontology (SBO) annotations for each parameter
- The citations which support each model decision
- PubMed id, DOI, ISBN, or URL for each citation
This example illustrates how to read a model from an Excel file:
model = wc_lang.io.Reader().run(model_filename)[wc_lang.Model][0]
(You may ignore a UserWarning generated by these commands.)
If a model file is invalid (for example, it defines two species types with the same id, or a concentration that refers to a species type that is not defined), this operation will raise an exception which contains a list of all of the errors in the model definition.
To name a model stored in a set of delimiter-separated files, wc_lang uses a filename glob pattern that matches the files in the set. The supported delimiters are commas in .csv files and tabs in .tsv files. These files use the same format as the Excel workbook format, except that each worksheet is stored as a separate file. Excel workbooks are easier to read and edit interactively, but changes to delimiter-separated files can be tracked in code version control systems such as Git.
This example illustrates how to write a model to a set of .tsv files:
# 'examples_dir' is a directory
model_filename_pattern = os.path.join(examples_dir, 'example_model-*.tsv')
wc_lang.io.Writer().run(model_filename_pattern, model, data_repo_metadata=False)
The glob pattern in model_filename_pattern matches these files in examples_dir, each of which contains a component of the model:
example_model-Biomass components.tsv
example_model-Biomass reactions.tsv
example_model-Compartments.tsv
example_model-Concentrations.tsv
example_model-database references.tsv
example_model-Model.tsv
example_model-Parameters.tsv
example_model-Rate laws.tsv
example_model-Reactions.tsv
example_model-References.tsv
example_model-Species types.tsv
example_model-Submodels.tsv
example_model-Taxon.tsv
Continuing the previous example, this command reads this set of .tsv files into a model:
model_from_tsv = wc_lang.io.Reader().run(model_filename_pattern)[wc_lang.Model][0]
.csv files can be used similarly.
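To see how a glob pattern selects the files that make up one model, the following sketch applies Python's standard fnmatch module to a list of file names. It does not touch the file system or wc_lang, and the two non-matching names are hypothetical, added only to show what the pattern excludes.

```python
import fnmatch

pattern = 'example_model-*.tsv'
filenames = [
    'example_model-Compartments.tsv',
    'example_model-Species types.tsv',
    'example_model-Reactions.tsv',
    'example_model-Reactions.csv',   # hypothetical: wrong extension
    'other_model-Reactions.tsv',     # hypothetical: wrong prefix
]

# Keep only the names that match the pattern, in their original order.
matched = fnmatch.filter(filenames, pattern)

# The text between 'example_model-' and '.tsv' names the table.
tables = [name[len('example_model-'):-len('.tsv')] for name in matched]
print(tables)  # ['Compartments', 'Species types', 'Reactions']
```

The part of each file name matched by the * corresponds to what would be a worksheet name in the Excel format.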
Access properties of the model
A wc_lang model (an instance of wc_lang.core.Model) has multiple attributes:
model.id                # the model's unique identifier
model.name              # its human readable name
model.version           # its version number
model.taxon             # the taxon of the organism being modeled
model.submodels         # a list of the model's submodels
model.compartments      #  "   "  the model's compartments
model.species_types     #  "   "  its species types
model.parameters        #  "   "  its parameters
model.references        #  "   "  publication sources for the model instance
model.identifiers       #  "   "  identifiers in external namespaces for the model instance
These provide access to the parts of a wc_lang model that are directly referenced by a model instance. wc_lang also provides some convenience methods that get all of the elements of a specific type which are part of a model. Each of these methods returns a list of the instances of the requested type.
model.get_compartments()
model.get_species_types()
model.get_submodels()
model.get_species()
model.get_distribution_init_concentrations()
model.get_reactions()
model.get_dfba_obj_reactions()
model.get_rate_laws()
model.get_parameters()
model.get_references()
For example, get_reactions() returns a list of all of the reactions in a model’s submodels. As illustrated below, this can be used to obtain the id of each reaction and the name of its submodel:
reaction_identification = []
for reaction in model.get_reactions():
    reaction_identification.append('submodel name: {}, reaction id: {}'.format(
        reaction.submodel.name, reaction.id))
Programmatically build a new model and edit its model properties
You can also use the classes and methods in wc_lang.core to programmatically build and edit models. While modelers typically will not create models programmatically, creating model components in this way gives you a feeling for how models are built.
The following illustrates how to program a trivial model with one compartment, five species types and one reaction:
# create a model with one submodel and one compartment
prog_model = wc_lang.Model(id='programmatic_model', name='Programmatic model')
submodel = wc_lang.Submodel(id='submodel_1', model=prog_model)
cytosol = wc_lang.Compartment(id='c', name='Cytosol')

# create 5 species types
atp = wc_lang.SpeciesType(id='atp', name='ATP', model=prog_model)
adp = wc_lang.SpeciesType(id='adp', name='ADP', model=prog_model)
pi = wc_lang.SpeciesType(id='pi', name='Pi', model=prog_model)
h2o = wc_lang.SpeciesType(id='h2o', name='H2O', model=prog_model)
h = wc_lang.SpeciesType(id='h', name='H+', model=prog_model)

# create an 'ATP hydrolysis' reaction that uses these species types
atp_hydrolysis = wc_lang.Reaction(id='atp_hydrolysis', name='ATP hydrolysis')

# add two reactants, which have negative stoichiometric coefficients
atp_hydrolysis.participants.create(
    species=wc_lang.Species(id='atp[c]', species_type=atp, compartment=cytosol),
    coefficient=-1)
atp_hydrolysis.participants.create(
    species=wc_lang.Species(id='h2o[c]', species_type=h2o, compartment=cytosol),
    coefficient=-1)

# add three products, with positive stoichiometric coefficients
atp_hydrolysis.participants.create(
    species=wc_lang.Species(id='adp[c]', species_type=adp, compartment=cytosol),
    coefficient=1)
atp_hydrolysis.participants.create(
    species=wc_lang.Species(id='pi[c]', species_type=pi, compartment=cytosol),
    coefficient=1)
atp_hydrolysis.participants.create(
    species=wc_lang.Species(id='h[c]', species_type=h, compartment=cytosol),
    coefficient=1)
In this example wc_lang.SpeciesType(id='atp', name='ATP', model=prog_model) instantiates a SpeciesType instance with two string attributes and a model attribute that references an existing model. In addition, this expression adds the new SpeciesType to the model’s species types. This shows how the underlying functionality of obj_tables automatically creates bi-directional references that make it easy to build and navigate wc_lang models, and it makes this assertion hold:
assert(atp in prog_model.get_species_types())
The example above illustrates another way to create and connect model components. Consider the expression:
atp_hydrolysis.participants.create(
    species=wc_lang.Species(id='atp[c]', species_type=atp, compartment=cytosol),
    coefficient=-1)
participants is a Reaction instance attribute that stores a list of ReactionParticipant objects. In this expression create takes keyword arguments for the parameters used to instantiate a ReactionParticipant, instantiates a ReactionParticipant, and appends it to the list in atp_hydrolysis.participants. These assertions hold after the 5 participants are added to the ATP hydrolysis reaction:
# 5 participants were added to the reaction
assert(len(atp_hydrolysis.participants) == 5)
first_reaction_participant = atp_hydrolysis.participants[0]
assert(first_reaction_participant.reactions[0] is atp_hydrolysis)
In general, the create method can be used to add model components to lists of related wc_lang.BaseModel objects. create takes keyword arguments and uses them to initialize the attributes of the component created. Thus, if obj has an attribute attr that stores a list of references to components of type X, this expression will create an instance of X and append it to the list:
obj.attr.create(**kwargs)
This simplifies model construction by avoiding creation of unnecessary identifiers for these components.
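The behavior of create, together with the bi-directional references it maintains, can be mimicked in plain Python. The classes below (Participant, RelatedList, and this stripped-down Reaction) are hypothetical stand-ins written for this sketch; they are not how obj_tables implements related attributes.

```python
class Participant:
    """Hypothetical stand-in for a component such as ReactionParticipant."""

    def __init__(self, species=None, coefficient=0):
        self.species = species
        self.coefficient = coefficient
        self.reactions = []


class RelatedList(list):
    """A list that can construct and append its own elements."""

    def __init__(self, owner, element_class, reverse_attr):
        super().__init__()
        self.owner = owner
        self.element_class = element_class
        self.reverse_attr = reverse_attr

    def create(self, **kwargs):
        # Instantiate the element from keyword arguments and append it.
        element = self.element_class(**kwargs)
        self.append(element)
        # Maintain the reverse reference from the element back to the owner.
        getattr(element, self.reverse_attr).append(self.owner)
        return element


class Reaction:
    """Hypothetical stand-in that owns a list of participants."""

    def __init__(self, id):
        self.id = id
        self.participants = RelatedList(self, Participant, 'reactions')


reaction = Reaction('atp_hydrolysis')
created = reaction.participants.create(species='atp[c]', coefficient=-1)
print(created.reactions[0] is reaction)  # True
```

Because the list itself knows which class to instantiate and which reverse attribute to update, the caller never needs a temporary variable for the new component.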
Similar code can be used to create any part of a model. All wc_lang objects that are subclassed from wc_lang.BaseModel (an alias for obj_tables.core.Model) can be instantiated in the normal fashion, as shown for Model, Submodel, Compartment, SpeciesType and Reaction above. Each subclass of wc_lang.BaseModel contains a Meta attribute that is a class which stores meta information about the subclass. The attributes that can be initialized when a wc_lang.BaseModel class is instantiated can be obtained from Meta.attributes, which is a dictionary that maps from attribute name to attribute instance:
wc_lang.Model.Meta.attributes.keys()
wc_lang.Submodel.Meta.attributes.keys()
wc_lang.SpeciesType.Meta.attributes.keys()
wc_lang.Compartment.Meta.attributes.keys()
For example, Reaction has the following attributes in wc_lang.core.Reaction.Meta.attributes.keys():
['comments', 'id', 'max_flux', 'min_flux', 'name', 'participants',
 'references', 'reversible', 'submodel']
These attributes can also be set programmatically:
atp_hydrolysis.comments = 'example comments'
atp_hydrolysis.reversible = False
Viewing Models and their attributes
All wc_lang.BaseModel instances can be viewed with pprint(), which outputs an indented representation that shows the attributes of a model, and indents and outputs connected models. To constrain the size of its output, pprint() outputs the graph of interconnected models to a depth of max_depth, which defaults to 3. Model nodes at depth max_depth+1 are represented by <class name>: ..., while deeper models are not traversed. Models re-encountered by pprint() are elided by <attribute name>: --. For example, after creating the reaction atp_hydrolysis above, this expression
atp_hydrolysis.participants[0].pprint(max_depth=1)
creates this output:
ReactionParticipant:
    species:
        Species:
            species_type:
                SpeciesType: ...
            compartment:
                Compartment: ...
            concentration: None
            rate_law_equations:
            reaction_participants:
    coefficient: -1
    reactions:
        Reaction:
            id: atp_hydrolysis
            name: ATP hydrolysis
            submodel: None
            participants:
                ReactionParticipant: ...
                ReactionParticipant: ...
                ReactionParticipant: ...
                ReactionParticipant: ...
            reversible: False
            min_flux: nan
            max_flux: nan
            comments: example comments
            references:
            database_references:
            objective_functions:
            rate_laws:
This shows that the first ReactionParticipant in atp_hydrolysis has the attributes species, coefficient, and reactions, that the coefficient is -1, and that reactions is a list with one element which is the atp_hydrolysis reaction itself.
Validating a programmatically generated Model
The wc_lang.core.Model.validate method determines whether a model is valid. If the model is invalid, validate returns a list of all of the model’s errors. It performs the following checks:
Check that only one model and taxon are defined
Check that each submodel, compartment, species type, reaction, and reference is defined only once
Check that the species type and compartment referenced in each concentration and reaction exist
Check that values of the correct types are provided for each attribute
- wc_lang.core.Compartment.initial_volume: float
- wc_lang.core.Concentration.value: float
- wc_lang.core.Parameter.value: float
- wc_lang.core.RateLaw.k_cat: float
- wc_lang.core.RateLaw.k_m: float
- wc_lang.core.Reaction.reversible: bool
- wc_lang.core.ReactionParticipant.coefficient: float
- wc_lang.core.Reference.year: integer
- wc_lang.core.SpeciesType.charge: integer
- wc_lang.core.SpeciesType.molecular_weight: float
Check that valid values are provided for each enumerated attribute
- wc_lang.core.RateLaw.direction
- wc_lang.core.Reference.type
- wc_lang.core.SpeciesType.type
- wc_lang.core.Submodel.algorithm
- wc_lang.core.Taxon.rank
This example illustrates how to validate prog_model:
prog_model.validate()
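The idea of collecting every error in one pass, rather than stopping at the first, can be sketched without wc_lang. The function validate_components, the dictionary layout, and the sample values below are assumptions for illustration; in particular, this sketch returns an empty list for valid input, which may not match what validate returns for a valid model.

```python
from collections import Counter


def validate_components(components):
    """Return a list of error strings; an empty list means no errors were found.

    `components` is a list of dicts with hypothetical 'id' and 'charge' keys.
    """
    errors = []
    # Uniqueness check: each id should be defined only once.
    id_counts = Counter(component['id'] for component in components)
    for component_id, count in sorted(id_counts.items()):
        if count > 1:
            errors.append("id '{}' is defined {} times".format(component_id, count))
    # Type check: charge should be an integer.
    for component in components:
        if not isinstance(component['charge'], int):
            errors.append("charge of '{}' must be an integer".format(component['id']))
    return errors


# Illustrative input containing one duplicate id and one badly typed value.
species_types = [
    {'id': 'atp', 'charge': -4},
    {'id': 'h', 'charge': 1},
    {'id': 'atp', 'charge': 'minus four'},
]
errors = validate_components(species_types)
for message in errors:
    print(message)
```

Reporting all problems together lets a modeler fix a model file in one editing session instead of re-running validation after each correction.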
Compare and difference Models
wc_lang provides methods that determine if two models are semantically equal and report any semantic differences between two models. The is_equal method determines if two models are semantically equal (the two models recursively have the same attribute values, ignoring the order of the attributes which has no semantic meaning). The following code compares the semantic equality of model and model_from_tsv. Since model_from_tsv was generated by writing model to tsv files, is_equal should return True:
assert(model.is_equal(model_from_tsv) == True)
The difference method produces a textual description of the differences between two models. The following code excerpt prints the differences between model and model_from_tsv. Since they are equal, the differences should be the empty string:
assert(model.difference(model_from_tsv) == '')
Normalize model into a reproducible order to facilitate reproducible numerical simulations
The attribute order has no semantic meaning in wc_lang. However, numerical simulation results derived from models described in wc_lang can be sensitive to the attribute order. To facilitate reproducible simulation results, wc_lang provides a normalize method to sort models into a reproducible order.
The following code excerpt will normalize model into a reproducible order:
model.normalize()
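Why a canonical order helps can be seen with nested dictionaries and lists standing in for a model. The normalize function below is a hypothetical sketch that returns a sorted copy; it is not the wc_lang method, which is called on the model itself as shown above.

```python
def normalize(model):
    """Return a copy of a nested dict/list structure in a canonical order."""
    if isinstance(model, dict):
        return {key: normalize(model[key]) for key in sorted(model)}
    if isinstance(model, list):
        return sorted((normalize(item) for item in model), key=repr)
    return model


# Two descriptions of the same content, listed in different orders.
model_a = {'id': 'm', 'species_types': [{'id': 'atp'}, {'id': 'adp'}]}
model_b = {'species_types': [{'id': 'adp'}, {'id': 'atp'}], 'id': 'm'}

print(model_a == model_b)                        # False: list order differs
print(normalize(model_a) == normalize(model_b))  # True
```

After normalization, any code that iterates over the species types visits them in the same order for both descriptions, which is what makes downstream numerical results repeatable.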
Please see http://code.karrlab.org for documentation of the entire wc_lang API.