TFInfer is open-source, standalone software for estimating the relative activities of transcription factor proteins from gene-expression data. It is based on a statistical method that has been shown to provide biologically meaningful results, including results not previously documented in the literature [1,2,3]. TFInfer combines gene-expression data with architectural information about the regulatory network to estimate the activities of transcription factor proteins in a computationally efficient way. It can handle time-series gene-expression data and gene-expression data from several independent conditions, with or without replicates . TFInfer is implemented on the .NET Framework, so users need either Microsoft .NET on Microsoft Windows or Mono on other platforms. The implementation uses two open-source C# libraries: dnAnalytics for numerical computation and ZedGraph for plotting. TFInfer therefore runs on most operating systems that support either Microsoft .NET or Mono.
Two types of binaries are available from the TFInfer homepage: a standard Windows installer, TFInfer.msi, and a .zip archive for Linux-based machines. The sections below describe each in detail.
TFInfer for MS Windows can be downloaded from the TFInfer website. The Microsoft .NET Framework is required to run TFInfer on MS Windows, and most current Windows systems already have it installed. If it is missing, the TFInfer installer will prompt you to install it. In that case, go to the Microsoft .NET Framework download page, then download and install the latest version.
After installing the Microsoft .NET Framework, run TFInfer.msi and follow the on-screen instructions to complete the installation. The installer also creates Start menu entries and a folder named TFInfer, containing the sample data and connectivity files, under the user's Documents folder. This folder is created in the user profile, so other users cannot access it. To change the location of the TFInfer data folder, see section 5.1.
Mono is an open-source implementation of Microsoft's .NET Framework. It is based on the ECMA standards for C# and the Common Language Runtime and is available for most operating systems. Mono is required to run TFInfer on Linux-based machines. To get it, go to the Mono download page, then select and install the Mono version for your platform. Once you select your platform, the download page also explains how to obtain and install a suitable version from your distribution's standard repositories.
After installing Mono, download TFInfer.zip from the Linux section of the TFInfer website. Unzip it into your home directory (recommended) or any other location where you have write access. Then open the TFInfer folder, go to the bin directory and double-click tfinfer to run the main TFInfer executable. A readme file with installation instructions is included at the root of TFInfer.zip.
After installing TFInfer in the chosen location, the installer creates a directory named TFInfer in the My Documents directory on Windows XP, or under the Documents directory on Windows Vista. On Linux-based machines, this directory is TFInfer/lib/TFInfer. It contains sample data and connectivity files for yeast and E. coli. Your own data and connectivity files can be selected from any location in the file system. On MS Windows, the installer places the software in a folder managed by the user profile settings, so different users on the same machine can install the software in their respective profile folders. On Linux-based operating systems, the data directory is a sub-directory of the TFInfer directory, as described above.
TFInfer reads comma-separated values (CSV) files. These can be edited in spreadsheet applications such as MS Excel and OpenOffice Calc. The structure of the data and connectivity files is described below.
Data files contain logged gene-expression data arranged in rows and columns. Each row holds the expression levels of one gene at different time points or under different experimental conditions. The first column contains the gene identifiers, and the remaining columns give the expression level at each time point, optionally preceded by a header row. Data from different experimental conditions follow the same structure, with one column per condition. Figure 1 shows a sample data file containing artificial data opened in MS Excel.
Figure 1: Structure of gene-expression data files for TFInfer
There is no upper limit on the number of replicate files, but at least two replicates must be selected when working with replicates. Genes in data files need not be sorted, because TFInfer handles unsorted data files.
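The sketch below reads a data file with the layout described above and computes the summary TFInfer displays after a file is selected: the number of genes and the number of time points or conditions. It is a minimal Python illustration, not TFInfer's own parser (TFInfer is written in C#). The function name `summarize_data_file` and its `has_header` flag are hypothetical; `has_header` stands in for the Header row checkbox.

```python
import csv
import io


def summarize_data_file(text, has_header=True):
    """Summarize a gene-expression CSV laid out as described above.

    Assumed layout: column 1 holds the gene identifier and every other
    column holds one logged expression value per time point or condition.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # drop blank lines
    if has_header:
        rows = rows[1:]  # mirrors the "Header row" checkbox: ignore the first row
    if not rows:
        raise ValueError("no data rows found")
    # Keyed by gene, so row order does not matter (genes need not be sorted).
    expression = {row[0].strip(): [float(v) for v in row[1:]] for row in rows}
    widths = {len(values) for values in expression.values()}
    if len(widths) != 1:
        raise ValueError("rows have different numbers of samples")
    return {"genes": len(expression), "samples": widths.pop(), "expression": expression}


# Toy input: two genes, three time points (values are made up).
sample = "Gene,t1,t2,t3\nYAL002W,1.2,0.3,-0.2\nYAL001C,0.1,-0.4,0.9\n"
summary = summarize_data_file(sample)
print(summary["genes"], summary["samples"])  # prints: 2 3
```

The sketch stores values by gene identifier, so it does not depend on row order. This matches the point that genes need not be sorted.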
Note that if a data file of relative data contains only one column (a single time point or condition), you must add a column of zeros before that data column for the software to work properly.
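Below is a minimal sketch of the padding step, assuming the file layout above. It inserts a column of zeros between the gene column and the single data column, and it leaves files with more samples untouched. The function `pad_single_sample` and the default `zero` header label are hypothetical; use whatever header suits your file.

```python
import csv
import io


def pad_single_sample(text, has_header=True, header_label="zero"):
    """Insert a column of zeros before the only data column of a CSV.

    Files with more than one data column are returned unchanged.
    `header_label` is a hypothetical name for the new column's header.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    data_rows = rows[1:] if has_header else rows
    if not data_rows or any(len(row) != 2 for row in data_rows):
        return text  # not a single-sample file: leave it as-is
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for i, row in enumerate(rows):
        filler = header_label if (has_header and i == 0) else "0"
        writer.writerow([row[0], filler, *row[1:]])  # gene, zero column, original sample
    return out.getvalue()


print(pad_single_sample("Gene,t1\nYAL001C,0.5\nYAL002W,-1.1\n"), end="")
# prints:
# Gene,zero,t1
# YAL001C,0,0.5
# YAL002W,0,-1.1
```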
Connectivity files are also CSV files. Figure 2 shows part of a connectivity file opened in MS Excel.
Figure 2: Structure of connectivity file for TFInfer
Two connectivity files are packaged with TFInfer: one for yeast and one for E. coli. The yeast connectivity is derived from [5, 6], and  is used for the E. coli connectivity. The yeast connectivity file identifies genes by ORF identifiers, and the E. coli file identifies them by b-numbers. In a valid connectivity file, the first column specifies the genes (ORF identifiers or b-numbers), and the next two columns may contain any information about the genes. The header row must contain the names of the transcription factor proteins. Binary entries specify the connectivity between each transcription factor and each gene: an entry of 1 means that there may be a connection between the gene and the transcription factor protein. The inferred connectivity weights (matrix B in the model) then determine the strength of each connection after observing the data. Connectivity information for different organisms is available in the literature. For organisms other than yeast and E. coli, users must supply their own connectivity file.
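The following Python sketch parses a connectivity file with the layout just described and counts the genes and transcription factors it contains. The function name and the toy network are hypothetical. The sketch also assumes that TF names start in the fourth column of the header row, directly after the gene column and the two free-form columns.

```python
import csv
import io


def parse_connectivity(text):
    """Parse a connectivity CSV laid out as described above.

    Assumptions: the header row names the TFs from column 4 onward,
    column 1 is the gene identifier, columns 2-3 are free-form
    annotation, and each remaining cell is a binary 0/1 entry.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    tf_names = [name.strip() for name in rows[0][3:]]
    links = {}
    for row in rows[1:]:
        flags = row[3:]
        # Keep the TFs whose entry is 1: a possible link whose strength
        # is left for the model to infer.
        links[row[0].strip()] = {tf for tf, flag in zip(tf_names, flags) if flag.strip() == "1"}
    return tf_names, links


# Hypothetical toy network: three genes, two TFs.
conn = (
    "Gene,Info1,Info2,TF_A,TF_B\n"
    "YAL001C,x,y,1,0\n"
    "YAL002W,x,y,0,1\n"
    "YAL003W,x,y,1,1\n"
)
tfs, links = parse_connectivity(conn)
print(len(links), "genes,", len(tfs), "TFs")  # prints: 3 genes, 2 TFs
```

These two counts are the kind of information TFInfer reports in its connectivity summary. A 1 marks only a possible connection; the model infers the actual strength as matrix B.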
The main interface of TFInfer is shown in Figure 3.
Figure 3: Main interface of TFInfer
The components of the main interface are described below:
Header row: When checked, TFInfer ignores the first row while parsing the data file(s). This option must be set before selecting the data file.
Time-series data: When checked, TFInfer selects the algorithm for time-series gene-expression data. When unchecked, it selects the algorithm for gene-expression data from multiple experimental conditions.
Replicates: When checked, the user can specify more than one gene-expression data file containing replicates, and TFInfer selects the appropriate algorithm accordingly.
The states of the two checkboxes (Time-series data and Replicates) determine which of the following four algorithms is used:
TS + REP: Estimation of TF activities using replicates and time-series gene-expression data.
TS + NOREP: Estimation of TF activities using a single file containing time-series gene-expression data.
NTS + REP: Estimation of TF activities using replicates and gene-expression data from multiple experimental conditions.
NTS + NOREP: Estimation of TF activities using a single file containing gene-expression data from multiple experimental conditions.
Table 1: Options for selecting algorithm in TFInfer
TS: Time-series gene-expression data
NTS: Non-time-series gene-expression data
REP: Replicates of gene-expression data in multiple files
NOREP: Single file containing gene-expression data
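The mapping in Table 1 can be expressed as a small decision function. This is a hypothetical sketch, not TFInfer code. The names `Algorithm` and `select_algorithm` are illustrative, and the file-count checks encode rules stated elsewhere in this manual: at least two files with replicates, and a single file when Replicates is unchecked.

```python
from enum import Enum


class Algorithm(Enum):
    # One member per row of Table 1 (hypothetical names, not TFInfer's own).
    TS_REP = "replicates, time-series data"
    TS_NOREP = "single file, time-series data"
    NTS_REP = "replicates, multiple conditions"
    NTS_NOREP = "single file, multiple conditions"


def select_algorithm(time_series, replicates, n_files):
    """Map the two checkbox states (and the number of files) to a Table 1 option."""
    if replicates and n_files < 2:
        raise ValueError("at least two replicate files must be selected")
    if not replicates and n_files != 1:
        raise ValueError("exactly one data file is expected without replicates")
    if time_series:
        return Algorithm.TS_REP if replicates else Algorithm.TS_NOREP
    return Algorithm.NTS_REP if replicates else Algorithm.NTS_NOREP


print(select_algorithm(time_series=True, replicates=True, n_files=3))  # prints: Algorithm.TS_REP
```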
Select (Experimental Data): Opens a dialog box for selecting the gene-expression data file (CSV format), starting in the TFInfer home directory. Files can be selected from anywhere in the file system. To select replicates, use the same button repeatedly, once per file.
Select (Connectivity Data): Opens a dialog for selecting the connectivity file (CSV format). The default location is the TFInfer home directory.
Reset: Resets the internal and external state of the software. It is recommended to reset before starting a new run.
Load: Once the data and connectivity files are selected, loads the data from the files and checks it for inconsistencies. If the data loads successfully, a message prompts the user to start the main loop of the algorithm.
Start: After the data is loaded, starts the main loop of the algorithm. The maximum number of iterations for the main loop can be changed from the menu, as described under the menu options.
Stop: Interrupts execution during the model-building stage.
View: When the main loop finishes, TFInfer displays a message. The View button then shows the output of the algorithm.
Exit: Closes TFInfer.
The steps required for a run are listed below:
Select the data file(s) using the open-file dialog box. After a data file is selected, TFInfer summarizes its contents: the number of genes and the number of time points (or conditions, for data from multiple experimental conditions). If the user confirms these details, the file is selected. For replicates, supply any number of data files (at least two) in the same way. The three checkboxes labelled Header row, Time-series data and Replicates must be set appropriately before selecting the data file.
· Header row:
If the data file contains a header row, this checkbox must be checked before selecting the data file.
· Time-series data:
A data file may contain gene-expression data from a time-series experiment or from several independent conditions. Check or uncheck this checkbox according to the type of data before building the model.
· Replicates:
If replicates are available, checking this option lets the user select multiple files containing gene-expression data.
Note that these checkboxes must be set before selecting any data file(s).
After selecting the data file(s), select a valid connectivity file; its structure is described above. The connectivity files for yeast and E. coli are in the Data folder of the TFInfer home folder.
TFInfer then summarizes the connectivity file, showing the number of genes and the number of transcription factor proteins it contains. If the user confirms this, TFInfer selects that connectivity file.
After the connectivity information is confirmed, TFInfer lists the transcription factors in the connectivity file, as shown in Figure 4. The user can select as many transcription factors as required before closing the TF selection dialog box. Only the selected transcription factors are used in the model-building stage. Selecting many transcription factors increases resource usage, which may matter depending on the hardware of the target machine.
Figure 4: TF Selection Interface
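Restricting the model to the chosen transcription factors amounts to dropping the unselected TF columns from the connectivity information. The sketch below assumes the connectivity has already been read into a mapping from each gene to the set of TFs marked 1 for it. `restrict_to_selected` and its `drop_unlinked` option are hypothetical. This manual does not document how TFInfer itself handles genes left without any selected TF.

```python
def restrict_to_selected(links, selected_tfs, drop_unlinked=True):
    """Keep only the connections to the TFs chosen in the selection dialog.

    `links` maps each gene to the set of TFs with a 1 in the connectivity
    file. Dropping genes left with no selected TF (`drop_unlinked`) is a
    design choice of this sketch, not documented TFInfer behaviour.
    """
    selected = set(selected_tfs)
    restricted = {}
    for gene, tfs in links.items():
        kept = tfs & selected  # intersect with the user's selection
        if kept or not drop_unlinked:
            restricted[gene] = kept
    return restricted


links = {"g1": {"TF_A", "TF_B"}, "g2": {"TF_B"}, "g3": {"TF_C"}}
print(sorted(restrict_to_selected(links, ["TF_A", "TF_B"])))  # prints: ['g1', 'g2']
```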
After the data and connectivity files are selected, the Load button loads them and checks for inconsistencies. For example, if the data file contains a different set of genes from the connectivity file, the user is informed at this stage. Once the data has loaded successfully, proceed to the next step.
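The gene-set check performed by Load can be pictured as a set comparison between the identifiers in the two files. The sketch below is an assumption about what such a check might report, not a description of TFInfer's internals; `check_gene_overlap` is a hypothetical name.

```python
def check_gene_overlap(data_genes, connectivity_genes):
    """Compare gene identifiers from the data and connectivity files."""
    data, conn = set(data_genes), set(connectivity_genes)
    shared = data & conn
    if not shared:
        # For example, yeast ORF identifiers paired with E. coli b-numbers.
        raise ValueError("no genes in common between data and connectivity files")
    return {
        "shared": sorted(shared),
        "only_in_data": sorted(data - conn),
        "only_in_connectivity": sorted(conn - data),
    }


report = check_gene_overlap(["YAL001C", "YAL002W"], ["YAL002W", "YAL003W"])
print(report["only_in_data"])  # prints: ['YAL001C']
```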
Clicking the Start button starts building the model. Other components of the user interface are disabled during the run. A progress bar at the bottom of the main window shows the overall progress, measured against the maximum number of iterations allowed for building the model. This maximum can be set from the main menu.
When the TFInfer run is complete, click the View button to see the results. This opens a window containing the list of transcription factors involved and a plot for the transcription factor selected in the list, as shown in Figure 5. Selecting a different transcription factor displays its plot. Each plot shows the relative concentration of the transcription factor at different time points.
Figure 5: Results Interface
The following options are available:
1) Flip: flips the sign of the signal in the plot.
2) Save plot: saves the plot for the selected transcription factor protein in various formats.
3) Save data: saves the following to a CSV file:
· Relative concentrations of all TFs and the corresponding error bars.
· The connectivity activity matrix, containing the regulatory strengths with which the transcription factor proteins influence their target genes. More information can be found in the original publication .
Note that the regulatory strengths and the TF activities (TFAs) enter the likelihood as a product, which gives rise to a sign ambiguity: de-repression of a TF looks the same as activation. This ambiguity can be resolved with additional information, for example that the TF is an activator of a specific gene, or that the TF is active or inactive in a specific condition. Further discussion can be found in the original model .
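The following illustrates where the ambiguity comes from. As a simplified illustration (consult the original publication for the exact model), assume the model expresses the expression $x_{nt}$ of gene $n$ at time point or condition $t$ as a sum of products. Each product combines a binary connectivity entry $X_{nm}$, a regulatory strength $b_{nm}$ and a TF activity $c_{mt}$, and the sum is corrupted by noise $\epsilon_{nt}$:

$$
x_{nt} = \sum_{m} X_{nm}\, b_{nm}\, c_{mt} + \epsilon_{nt}
$$

Negating both the strengths of one TF $m$ (for every gene $n$) and its activity profile (for every $t$) leaves each term, and hence the likelihood, unchanged:

$$
X_{nm}\,(-b_{nm})\,(-c_{mt}) = X_{nm}\, b_{nm}\, c_{mt} \quad \text{for all } n, t
$$

The data alone therefore cannot prefer one sign over the other. This is consistent with the results window offering a Flip button for each transcription factor.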
After viewing and saving the results, reset the state of the software by clicking the Reset button before starting again.
The options available in the menu are described below.
It is recommended to keep the data and connectivity files in a subdirectory of the main TFInfer directory (in My Documents). To change the location of the TFInfer directory, use the option in the main menu to specify the new location. The new location must have the directory structure shown below.
|_____ Data (contains connectivity files and sample data)
|_____ Temps (Temporary data folder)
Figure 6: Directory structure of TFInfer
The default location of this directory is “My Documents” on Windows XP and “Documents” on Windows Vista.
On Linux-based platforms, the home folder is TFInfer/lib/TFInfer, inside the directory where TFInfer.zip was extracted.
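Before pointing TFInfer at a new location, the required layout can be checked or created with a short script. This is a hypothetical helper; `prepare_tfinfer_home` is not part of TFInfer. It handles only the two subfolders shown in Figure 6 and does not copy the sample files. If you want the sample data at the new location, copy it into Data yourself.

```python
import tempfile
from pathlib import Path

# Subfolders required by Figure 6.
REQUIRED_SUBDIRS = ("Data", "Temps")


def prepare_tfinfer_home(root, create=False):
    """Return the required subfolders missing under `root`.

    With create=True the missing folders are created (empty) and an
    empty list is returned.
    """
    root = Path(root)
    missing = [name for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]
    if create:
        for name in missing:
            (root / name).mkdir(parents=True, exist_ok=True)
        missing = []
    return missing


with tempfile.TemporaryDirectory() as new_home:
    print(prepare_tfinfer_home(new_home))               # prints: ['Data', 'Temps']
    print(prepare_tfinfer_home(new_home, create=True))  # prints: []
```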
To set the maximum number of iterations, select Set Steps in the menu. The default value is 2000.
Four data files and three connectivity files are packaged with the software. The data files artificial0_yeast_rep1.csv, artificial_yeast_rep2.csv and artificial_yeast_rep3.csv contain artificial data. The connectivity files yeastConn.csv and ecoliConn.csv contain architectural information about the regulatory networks of yeast and E. coli, respectively. The package also contains the data and connectivity files (Davidge_et_al_2009_Data.csv and Davidge_et_al_2009_Conn.csv) used in the . Using any of the artificial data files with the E. coli connectivity file will not produce any results, because these data files contain ORF identifiers of yeast genes.
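A data file and a connectivity file that use different identifier schemes share no genes, so it can help to check the identifier type before loading. The sketch below is a heuristic, not a TFInfer feature. It uses regular expressions to match yeast systematic ORF names (such as YAL001C) and E. coli b-numbers (such as b0001), and the 0.8 matching threshold is an arbitrary choice. It does not recognise yeast identifiers outside the nuclear ORF pattern, such as mitochondrial genes.

```python
import re

# Yeast systematic ORF names, e.g. YAL001C or YDR374W-A; E. coli b-numbers, e.g. b0001.
YEAST_ORF = re.compile(r"Y[A-P][LR]\d{3}[WC](-[A-Z])?")
ECOLI_BNUMBER = re.compile(r"b\d{4}")


def guess_identifier_type(genes, threshold=0.8):
    """Guess which naming scheme a list of gene identifiers follows.

    `threshold` (fraction of identifiers that must match) is an arbitrary choice.
    """
    genes = [g.strip() for g in genes if g.strip()]
    if not genes:
        return "unknown"
    for label, pattern in (("yeast ORF", YEAST_ORF), ("E. coli b-number", ECOLI_BNUMBER)):
        matched = sum(1 for g in genes if pattern.fullmatch(g))
        if matched / len(genes) >= threshold:
            return label
    return "unknown"


data_ids = ["YAL001C", "YAL002W", "YDR374W-A"]
conn_ids = ["b0001", "b0002", "b1234"]
print(guess_identifier_type(data_ids), "vs", guess_identifier_type(conn_ids))
# prints: yeast ORF vs E. coli b-number
```

If the two guesses differ, as in the usage line above, loading that pair of files cannot produce results.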
[1] Sanguinetti, G.,
[2] Partridge, J.D., Sanguinetti, G., Dibden, D.P., Roberts, R.E.,
[3] Davidge, K.S., Sanguinetti, G., Yee, C.H., Cox, A.G., McLeod, C.W., Monk, C.E., Mann, B.E., Motterlini, R.,
[4] Asif, H.M.S. and Sanguinetti, G.: Probabilistic Inference of Transcription Factor Concentrations and Gene-specific Regulatory Activities for Time-independent Data. In: PRIB 2009, 2009.
[5] Harbison, C.T., Gordon, D.B., Lee, T.I., Rinaldi, N.J., MacIsaac, K.D., Danford, T.W., Hannett, N.M., Tagne, J.B., Reynolds, D.B., Yoo, J., Jennings, E.G., Zeitlinger, J., Pokholok, D.K., Kellis, M., Rolfe, P.A., Takusagawa, K.T., Lander, E.S., Gifford, D.K., Fraenkel, E. and Young, R.A.: Transcriptional regulatory code of a eukaryotic genome. Nature, 2004. 431: 99–104.
[6] Lee, T.I.,