GRN2SBML: Automated encoding and annotation of inferred gene regulatory networks complying with SBML

Tutorial

Sebastian Vlaic1, Bianca Hoffmann1, Peter Kupfer1, Michael Weber1, Andreas Dräger2
1 Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, D-07745 Jena, Germany
2 Center for Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Sand 1, 72076 Tübingen, Germany

1. Introduction

GRN2SBML [1] is a tool that aims to ease the encoding of gene regulatory networks (GRNs) in the Systems Biology Markup Language (SBML) [2] and thereby to increase their accessibility for researchers and users. Two major problems arise in the task of encoding. The first is producing valid SBML code that adequately represents the features of GRNs and preserves the original mathematical framework as closely as possible. The second is adequate annotation of the model, its components and the role of each component. GRN2SBML is designed to tackle these problems in a standardized way. Key features of GRN2SBML are:

  • Solely based on Java.
  • Uses JSBML [3] to produce SBML code.
  • Uses the BioMart and MIRIAM SOAP-based APIs to access annotation information.
  • R interface to use GRN2SBML directly out of R.
  • Modularized structure to add parsers for new, currently unsupported inference algorithms.

Reference:

Vlaic S, Hoffmann B, Kupfer P, Weber M, Dräger A (2013) GRN2SBML: automated encoding and annotation of inferred gene regulatory networks complying with SBML. Bioinformatics 29(17), 2216-2217.

2. Tutorial

This tutorial supplements the publication cited above [1] and is part of the additional material. Both the additional material and the data needed for this tutorial can be downloaded using the link below. Please note that GRN2SBML was developed using the Oracle Java SE Runtime Environment. If you are running another JRE, the connection to BioMart, MIRIAM and SBMLsimulator will probably not work! If you encounter problems, please switch to the Oracle JRE before reporting bugs.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, Version 3. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

I agree to the license and proceed to the download: grn2sbml bundle

The following sections give an example for each of the currently supported algorithms (April 2013): ExTILAR [4], NetGenerator [5] and TILAR [6]. The ExTILAR section introduces general features of GRN2SBML and explains some details of its GUI and the R package. The NetGenerator section demonstrates features such as multiple stimuli in a network, and the TILAR section shows how GRN2SBML can be used from the command line without the GUI. Simulations of the network are performed using SBMLsimulator [http://www.cogsys.cs.uni-tuebingen.de/software/SBMLsimulator/], and visualization is demonstrated using Cytoscape [7]. In order to use Cytoscape, you have to install the CySBML plugin [8] via the plugin manager.

3. ExTILAR

For the first example, we use the transcription factor network (TFN) that was inferred by Vlaic et al. in the original ExTILAR publication [4]. This TFN models the adaptation of murine hepatocytes to the change of culture medium after a period of starvation. The gene expression profiles of 22 differentially expressed transcription factors (TFs), together with the mean expression profiles of 6 clusters derived by clustering the remaining differentially expressed genes, were used to identify relations between these TFs and thereby to derive new hypotheses about the biological behavior of the cells in response to this stimulus. The data provided with the corresponding publication is now used as input for GRN2SBML to encode the derived network in SBML.

First, the R package and the data from the publication are loaded. For this example to work, you have to set the working directory using the setwd command.

setwd(....)
set.seed(1)
library('GRN2SBML')
load('examples/ExTILAR/VlaicEtAl2012.RData')

ExTILAR provides the possibility to use more than one perturbation in a single model. Therefore, the perturbations, encoded in MathML, are stored in a matrix together with their corresponding names: the first column holds the name and the second column the expression.

ifu = matrix(NA, ncol=2, nrow=1)
ifu[1,1] = 'Input1-1'
ifu[1,2] = 'piecewise(exp(-(1/6)*time), time>0, 0)'
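To get a feeling for the perturbation defined above, the expression can be evaluated in plain R. The following sketch is not part of GRN2SBML: the helper perturbation is a hypothetical name that simply mirrors the piecewise expression (exp(-time/6) for time > 0, otherwise 0). The second part shows how a matrix with two rows could look for a model with two perturbations; the name 'Input2-1' and the step function in the second row are made-up placeholders, and whether ExTILAR expects exactly this naming is an assumption.

```r
# Hypothetical helper mirroring 'piecewise(exp(-(1/6)*time), time>0, 0)':
# the stimulus decays exponentially for time > 0 and is 0 otherwise.
perturbation <- function(time) {
  ifelse(time > 0, exp(-(1/6) * time), 0)
}

# Evaluate the stimulus at a few time points (in hours).
time.points <- c(0, 1, 6, 12)
stimulus <- perturbation(time.points)
print(round(stimulus, 3))

# Sketch of a matrix holding two perturbations, one per row
# (column 1: name, column 2: expression). The second row is a made-up example.
ifu2 <- matrix(NA_character_, ncol = 2, nrow = 2)
ifu2[1, ] <- c("Input1-1", "piecewise(exp(-(1/6)*time), time>0, 0)")
ifu2[2, ] <- c("Input2-1", "piecewise(1, time>0, 0)")
print(ifu2)
```

Evaluating the expression before encoding the model is a cheap way to check that the stimulus has the intended shape over the measured time range.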

Subsequently, the scaled expression data that was used  for the inference will be extracted and stored. This way, the  measurements can be imported in tools such as SBMLsimulator for further  analysis. Simply drag and drop the single .csv files into the  SBMLsimulator GUI.

# Time points are taken from the column names of the first original data set.
time.points = colnames(result.fs$ci$originalData[[1]])
# Write one measurement file per entry of result.fs$ci$mat; [,-30] drops column 30.
for (i in 1:3) {
  measurement = cbind("time"=as.numeric(time.points), t(result.fs$ci$mat[[i]][,time.points]))[,-30]
  write.table(file=paste0("examples/ExTILAR/Measurement-", i, ".csv"), x=measurement,
              row.names=FALSE, col.names=TRUE, sep=" ", quote=FALSE)
}

The perturbation matrix (ifu), together with the ExTILAR-inferred network, is passed to the function that creates the ExTILAR-specific input for GRN2SBML. Subsequently, GRN2SBML is started and the network can be annotated (figures 1 to 5). Please note that you have to replace the '< at >' placeholder in the email address with '@'.

o = grn2sbml.create_input.ExTILAR(extilar.result=result.fs$res$res_pp, extilar.input=ifu,
  modelName='Adaption of Murine Hepatocytes to Cultivation Medium',
  authorGivenName=c('Sebastian'), authorFamilyName=c('Vlaic'),
  authorEmail=c('Sebastian.Vlaic< at >hki-jena.de'), authorOrganization=c('Hans-Knoell-Institut Jena'),
  pubMedID='23190768', pubMedIDQualifier='BQM_IS',
  gui=TRUE, modelTimeUnit='Hours', createLayout=TRUE)
grn2sbml.start(o)

Figure 1: The GUI of GRN2SBML is divided into five panels that process the different aspects of GRNs. The Model tab collects information about the network and the underlying model. Besides loading the network itself, the corresponding algorithm and time scale can be specified. If the algorithm-specific parser implements multiple ways of encoding (e.g., by supporting different SBML levels), the model type can be selected by the user. If the SBML layout checkbox is selected, layout information is included in the SBML model based on the information given in the SBML layout extension panel.

Figure 2: The Annotation - Network panel collects information about the network itself. The user can include multiple associated BioModels and PubMed IDs and define their relation to the network. Multiple creators can be added to the model, but at least one author has to be named to successfully encode the network in SBML. A description of the qualifiers can be found on the BioModels webpage (qualifiers).

Figure 3: The Annotation - Species panel collects information about the species of the network. Here, organism-specific annotation can be added by querying the corresponding marts of the BioMart Central Portal or by loading the information from user-specified text files. The obtained annotation is then displayed in the last column of the annotation table. After specifying the resource, either via unique MIRIAM Registry URIs or by manually supplying the annotation source, the information is added to the network by pressing the Add annotation button. For this example, select the mart Ensembl Genes 69 (WTSI, UK) (gene_ensembl_config), the dataset Mus musculus genes (NCBIM37), the source identifier MGI symbol and the target identifier NCBI Gene.

Figure 4: The Annotation - Relations panel collects information about the relations of the network. Just like in the Annotation-Species-panel,  the user can load annotation such as PubMed-IDs to reference knowledge  about the relations in the network. This information is loaded from text  files or manually by editing the last column of the table. In the  latter case, the annotation-resource can be specified manually or by  selecting the corresponding URI from the MIRIAM-Registry.

Figure 5: The SBML layout - extension panel collects the information necessary to create additional layouts of the model using the SBML layout extension. In this tab, the default dimensions of the available layout components can be specified, as well as the position of each species in the network. If the X- and Y-coordinates of a position are set to -1, the corresponding species is placed automatically in the layout in a grid-like fashion. However, no advanced automatic layout procedure is currently implemented, since graphical editing is not the focus of GRN2SBML.
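The idea of a grid-like automatic placement can be illustrated with a few lines of R. This is only a sketch of the general principle and not the procedure implemented in GRN2SBML: the helper grid_layout, its default dimensions and the row-wise filling order are assumptions made for illustration.

```r
# Hypothetical helper: species whose X- and Y-coordinates are both -1 are
# placed row by row on a roughly square grid; all other positions are kept.
grid_layout <- function(x, y, width = 100, height = 50, gap = 20) {
  auto <- which(x == -1 & y == -1)           # species without a fixed position
  n.col <- ceiling(sqrt(length(auto)))       # number of grid columns
  idx <- seq_along(auto) - 1                 # zero-based grid index
  x[auto] <- (idx %% n.col) * (width + gap)
  y[auto] <- (idx %/% n.col) * (height + gap)
  data.frame(x = x, y = y)
}

# Five species, the third one has a manually specified position.
layout <- grid_layout(x = c(-1, -1, 300, -1, -1), y = c(-1, -1, 40, -1, -1))
print(layout)
```

Manually positioned species keep their coordinates, so a few important nodes can be pinned while the rest are arranged automatically.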

Once the network is sufficiently annotated, it can be exported to SBML and stored in an XML file. Additionally, the latest version of SBMLsimulator can be started from the internet via Java Web Start, automatically importing the encoded model. The network structure can be visualized using Cytoscape with the CySBML plugin, which is also based on the JSBML library. Figure 6 shows the network visualized in Cytoscape. Because the model is annotated with SBO terms, visualization styles can be applied that automatically shape and color the network components according to their properties.

Figure 6: Visualization of the ExTILAR-inferred transcription factor network using Cytoscape. Because the species and interactions in the network are annotated with the corresponding SBO terms, automatic layouts can be applied that give users an overview of the network.

When individual nodes in the network are selected, the species and relation annotation added by the user becomes accessible and accurately describes the selected components (Figure 7). This helps users to understand and interpret the network in the intended way.

Figure 7: Annotation such as the corresponding gene identifiers, associated GO terms or PubMed IDs that reference reactions increases the semantic content of the model and aids the viewer in understanding and interpreting the network model.

Tools such as SBMLsimulator can be used to simulate the  dynamics of the network, measure the deviation error of the simulated  data to the experimental data or perform post-processing such as  parameter fitting. Figure 8 outlines the simulated expression profiles  of the genes Egr1, Irf1 and Tsc22d1.

Figure 8: Simulated expression profiles of Egr1, Irf1  and Tsc22d1 obtained using SBMLsimulator. Also visualized are the  boxplots of the corresponding measured data that was used for the  network inference.

4. NetGenerator

The second example illustrates the encoding of the GRN that was published by Weber et al. in the NetGenerator V2.0 publication [5]. The network describes the change of expression of genes that are important for the differentiation of chondrocytes. That paper proposed a new approach that uses a multi-stimuli multi-experiment dataset for the inference of the network. In the two underlying experiments, human mesenchymal stem cells were treated with either TGF-beta1 or TGF-beta1+BMP2. After treatment, the cellular response was measured with microarrays at 11 different time points. Both datasets were used simultaneously to infer one GRN that is able to simulate the experiment-specific change of expression for the selected network genes. Please note that you have to replace the '< at >' placeholder in the email addresses with '@'.

load('examples/NetGenerator/WeberEtAl2013.RData')
o = grn2sbml.create_input.NetGenerator(netgen.result=geneNetOpt, modelName='Chondrogenesis of MSC',
  authorGivenName=c('Michael','Sebastian'), authorFamilyName=c('Weber','Henkel'),
  authorEmail=c('Michael.Weber< at >hki-jena.de','Sebastian.Henkel< at >biocontrol-jena.com'),
  authorOrganization=c('Hans-Knoell-Institut Jena','Biocontrol Jena'),
  pubMedID='23280066', pubMedIDQualifier='BQM_IS',
  gui=TRUE, modelTimeUnit='Hours', createLayout=TRUE)
grn2sbml.start(o)

The first command loads the objects that NetGenerator returns as its result; they contain all relevant information, including the inferred network. The geneNetOpt object, along with annotation data for the network model, is then used to create the input object for GRN2SBML. To give an example of how to automatically add multiple authors to a model, we included the second author of the paper. Subsequently, the experimental data that was used for the inference is stored as a csv file, which can be imported and displayed by SBMLsimulator. The last command starts GRN2SBML and passes all information contained in the input object as parameters to the Java application. The network encoded by GRN2SBML contains both stimuli as inputs. Figure 9 shows the GRN visualized using Cytoscape.

Figure 9: Visualization of the SBML-encoded GRN using Cytoscape.

Using SBMLsimulator, the weights of the edges that connect the stimuli with the target genes can be individually set to zero, which makes it possible to reproduce the experimentally measured data. This is outlined in figure 10. While figure 10 A contains the simulated expression profiles using the TGF-beta stimulus only, figure 10 B shows the simulated expression profiles of the same two genes under the influence of TGF-beta+BMP2 stimulation.

Figure 10: A) Simulated expression profiles of  COL10A1 and SATB2 when only TGF-beta is used as a stimulus. B) Simulated  expression profiles of COL10A1 and SATB2 under the stimulus of  TGF-beta+BMP2. All simulations were performed with SBMLsimulator.

5. TILAR

Along with the TILAR algorithm, Hecker et al. published a GRN modeling the transcriptional regulation in response to the antirheumatic drug etanercept [6]. This GRN is used to show how GRN2SBML is operated from the command line without the GUI. To this end, additional annotation information for the species, in the form of GO terms and NCBI gene IDs, is stored in text files that are passed to GRN2SBML as command line parameters. The available parameters and their descriptions are listed in table 1. The complete program call is shown below. Please note that you have to replace the '< at >' placeholder in the email address with '@'.

java -jar Java/GRN2SBML.jar \
  --interactionsFile examples/TILAR/Interaktionsliste.csv \
  --authorGivenName "Michael" --authorFamilyName "Hecker" \
  --authorEmail "Michael.Hecker< at >hki-jena.de" \
  --authorOrganization 'Hans-Knöll-Institut*Jena' \
  --modelName "Transcriptional Regulation in Response to Etanercept" \
  --parser TILAR \
  --destinationFile examples/TILAR/Model.xml \
  --pubMedID 19703281 --pubMedIDQualifier "BQM_IS" \
  --speciesAnnotations examples/TILAR/GO_Terme.txt examples/TILAR/NCBI_Gene_IDs.txt \
  -f

Parameter | Type | Description
--authorEmail | String[<argument1> <argument2>...] | List of author email addresses. Substitute white spaces with '*'.
--authorFamilyName | String[<argument1> <argument2>...] | List of author family names. Substitute white spaces with '*'.
--authorGivenName | String[<argument1> <argument2>...] | List of author given names. Substitute white spaces with '*'.
--authorOrganization | String[<argument1> <argument2>...] | List of author organizations. Substitute white spaces with '*'.
--bioModelsID | String | ID of the BioModels reference model.
--bioModelsIDQualifier | String {BQM_IS, BQM_IS_DERIVED_FROM, BQM_IS_DESCRIBED_BY, BQM_UNKNOWN} | Relation between this model and the reference BioModels model.
--createLayout | Boolean | Create additional SBML layouts.
--destinationFile | String <path> | Path to the output file.
--interactionsFile | String <path> | File that contains the interactions of the network.
--perturbationFile | String <path> | File that contains the perturbation interactions of the network.
--force | Boolean | Force overwriting of the destination file.
--relationAnnotations | String[<path1> <path2> ...] | List of annotation files for the relations.
--speciesAnnotations | String[<path1> <path2> ...] | List of annotation files for the species.
--parser | String | Specifies the parser for the network.
--listAvailableParsers | Void | Returns a list of available parsers.
--modelName | String | The title of the network.
--modelTimeUnit | String | The time unit of the network. Depends on the selected parser.
--modelType | String | The way to encode the network. Depends on the selected parser.
--bioMart | Boolean | Set true to not connect with BioMart.
--miriam | Boolean | Set true to not connect with MIRIAM.
--gui | Boolean | Starts the graphical user interface.
--help | Void | Displays this help.

Table 1: Available command line parameters (Parameter), the type of value they expect (Type) and a short description of each parameter's usage (Description).
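When several networks have to be encoded, it can be convenient to assemble the command line call from within R. The following sketch is an illustration and not part of the GRN2SBML R package: the helper names encode_whitespace and build_grn2sbml_args are hypothetical, only parameters listed in table 1 are used, and the jar path is taken from the example call. The command is only printed here; actually running it requires Java and the GRN2SBML bundle.

```r
# Hypothetical helper: white spaces in author fields are substituted with '*',
# as described in table 1.
encode_whitespace <- function(x) {
  gsub(" ", "*", x, fixed = TRUE)
}

# Hypothetical helper: assemble the argument vector for the java call.
# All values are shell-quoted so that spaces and '*' are passed on literally.
build_grn2sbml_args <- function(interactions, destination, parser, model.name,
                                given, family, email, organization) {
  c("-jar", "Java/GRN2SBML.jar",
    "--interactionsFile", shQuote(interactions),
    "--authorGivenName", shQuote(encode_whitespace(given)),
    "--authorFamilyName", shQuote(encode_whitespace(family)),
    "--authorEmail", shQuote(email),   # replace '< at >' with '@' beforehand
    "--authorOrganization", shQuote(encode_whitespace(organization)),
    "--modelName", shQuote(model.name),
    "--parser", shQuote(parser),
    "--destinationFile", shQuote(destination))
}

args <- build_grn2sbml_args(
  interactions = "examples/TILAR/Interaktionsliste.csv",
  destination  = "examples/TILAR/Model.xml",
  parser       = "TILAR",
  model.name   = "Transcriptional Regulation in Response to Etanercept",
  given        = "Michael",
  family       = "Hecker",
  email        = "Michael.Hecker< at >hki-jena.de",
  organization = "Hans-Knoell-Institut Jena")

# Print the assembled command for inspection.
cat("java", args, "\n")
# system2("java", args)  # uncomment to run GRN2SBML
```

Vectors with several authors can be passed for given, family, email and organization; each element then becomes one argument after the corresponding parameter.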

Figure 11: Visualization of the SBML-encoded TILAR-inferred GRN using Cytoscape.

6. References

[1] Vlaic et al. GRN2SBML: automated encoding and annotation of inferred gene regulatory networks complying with SBML. Bioinformatics, 29(17):2216–2217, 2013.

[2] Hucka et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics, 19:524–531, 2003.

[3] Dräger et al. JSBML: a flexible Java library for working with SBML. Bioinformatics, 27:2167–2168, 2011.

[4] Vlaic et al. The extended TILAR approach: a novel tool for dynamic modeling of the transcription factor network regulating the adaption to in vitro cultivation of murine hepatocytes. BMC Syst Biol, 6:147, 2012.

[5] Weber et al. Inference of dynamical gene-regulatory networks based on time-resolved multi-stimuli multi-experiment data applying NetGenerator V2.0. BMC Syst Biol, 7(1):1, Jan 2013.

[6] Hecker et al. Integrative modeling of transcriptional regulation in response to antirheumatic therapy. BMC Bioinformatics, 10:262, 2009.

[7] Shannon et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res, 13(11):2498–2504, Nov 2003.

[8] König et al. CySBML: a Cytoscape plugin for SBML. Bioinformatics, 28(18):2402–2403, Sep 2012.