AUREA: an open-source software system for accurate and user-friendly identification of relative expression molecular signatures

Relative Expression Analysis, which is based on the relative ordering of expression values for a small number of genes, has been shown in several studies to accurately distinguish disease phenotypes, cancer subclasses, disease outcomes and diverse human pathologies assayed through blood-borne leukocytes. The TSP family of Relative Expression Analysis algorithms achieves classification accuracy similar to that of other supervised learning classifiers such as Support Vector Machines, but it also provides an important advantage over many classical methods of discrimination: interpretability.

The ability to leverage insight provided by the returned gene pairs (or triplets or networks) in tasks beyond classification, such as exploring their use as possible drug targets or as clues to the underlying biological nature of a phenotype, makes these algorithms powerful tools for scientific exploration.

AUREA presents four algorithms from this family of computational methods. The Top Scoring Pair (TSP) algorithm attempts to find the pair of genes with the maximum likelihood of being ordered consistently within each class but differently between classes. The k-TSP algorithm attempts to find the set of k or fewer disjoint TSPs with the greatest accuracy based on internal cross-validation. The Top Scoring Triplet (TST) algorithm attempts to find the triplet of genes with the maximum likelihood of being ordered consistently within each class but differently between given classes. The Differential Rank Conservation (DiRaC) algorithm attempts to find the gene network with the maximum likelihood of showing consistent relative expression (i.e. ranking of network genes) within a class while displaying different relative expression between given classes.
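To make the TSP idea concrete, here is a minimal sketch in plain Python. It does not use the AUREA libraries and is not AUREA's implementation; it only illustrates the commonly used pair score, the absolute difference between the two classes in the fraction of samples where one gene is expressed below the other. The function names and the toy data are assumptions made for illustration, and tie-breaking between equally scored pairs is left out.

```python
from itertools import combinations

def tsp_scores(class1, class2):
    """Score every gene pair by how differently it is ordered in two classes.

    class1 and class2 are lists of samples; each sample is a list of
    expression values with the genes in the same order.
    """
    n_genes = len(class1[0])
    scores = {}
    for i, j in combinations(range(n_genes), 2):
        # Fraction of samples in each class where gene i is below gene j.
        p1 = sum(1 for s in class1 if s[i] < s[j]) / float(len(class1))
        p2 = sum(1 for s in class2 if s[i] < s[j]) / float(len(class2))
        scores[(i, j)] = abs(p1 - p2)
    return scores

def top_scoring_pair(class1, class2):
    """Return the index pair with the highest score (no tie-breaking)."""
    scores = tsp_scores(class1, class2)
    return max(scores, key=scores.get)

# Toy data (hypothetical): genes 0 and 1 flip order between the classes,
# gene 2 is always the highest and so carries no ordering information.
normal = [[1.0, 5.0, 10.0], [2.0, 6.0, 10.0], [0.5, 4.0, 10.0]]
tumor = [[5.0, 1.0, 10.0], [6.0, 2.0, 10.0], [4.0, 0.5, 10.0]]
best_pair = top_scoring_pair(normal, tumor)
```

On this toy data the pair of genes 0 and 1 is perfectly consistent within each class and reversed between them, which is exactly the kind of interpretable rule the TSP family returns.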

Currently, several excellent versions of these algorithms are available for download: DiRaC is available for Matlab, k-TSP is available as a Perl script, and TSP and TST are available both as R scripts and as Matlab scripts for GPU architectures. Despite their excellence (C code from one of these and ideas from all are incorporated into AUREA), these implementations have serious shortcomings: they target disparate platforms, they are inaccessible to individuals untrained in the programming languages in which they were developed, and each provides only a subset of the learning algorithms. Because AUREA is a cross-platform, open-source combination of all of these algorithms that can be used without any programming experience, we believe it opens these methods to be fully utilized and explored by the entire biological community.

A primary challenge in using these algorithms effectively is selecting the model whose learned genomic features best reflect the classification problem at hand. Each algorithm has its own set of parameters, over which the accuracy of the model's selected features can fluctuate within and between datasets. For example, TSP, k-TSP and TST filter the input data to learn only on a specified number of the most differentially expressed genes, both to reduce computational complexity and to constrain the space of genes examined to candidates whose expression diverges between phenotypes. In total, these four algorithms have 12 parameters, many of which have thousands of possible values. These values have a dramatic effect on the time and space requirements of the model; e.g., choosing to run TST on all genes in a microarray (~10k) requires several terabytes of memory and 6.5 months on a single CPU.
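The growth in cost with filter size can be seen from simple counting. The sketch below, in plain Python and independent of the AUREA libraries, counts the gene pairs and gene triplets available for a given number of genes. The 8 bytes per stored score used in the memory estimate is an assumption for illustration only, not a statement about how AUREA stores its data.

```python
def count_pairs(n_genes):
    """Number of unordered gene pairs: n choose 2."""
    return n_genes * (n_genes - 1) // 2

def count_triplets(n_genes):
    """Number of unordered gene triplets: n choose 3."""
    return n_genes * (n_genes - 1) * (n_genes - 2) // 6

def triplet_table_bytes(n_genes, bytes_per_score=8):
    """Rough size of one score per triplet (bytes_per_score is assumed)."""
    return count_triplets(n_genes) * bytes_per_score

for n_genes in (100, 1000, 10000):
    print(n_genes, count_pairs(n_genes), count_triplets(n_genes))
```

With 10,000 genes there are roughly 1.7 × 10^11 triplets, which is why filtering down to the most differentially expressed genes matters so much for TST.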

AUREA allows a user to adaptively search the model space to find a model with learned features that provide the highest apparent accuracy over a training set, while weighing the computational complexity of the model. By simply specifying a target accuracy and a maximum running time, the user avoids a time-consuming manual search of the model space. Importantly, AUREA is designed with an easy-to-use Graphical User Interface, lowering the technical barriers to exploiting the rich research opportunities of these models.

Data and Software File(s):
Example CSV data file
lin-workspace.zip
win-workspace.zip
mac-workspace.zip

Installation

Download stand-alone AUREA

This is the easiest option to get the GUI up and running for Windows and Mac.

Windows

Download the appropriate .msi file.

Click and follow instructions.

Mac OS X

Download the appropriate .zip file.

Extract the zip file.
Drag AUREA.app to your Applications folder using Finder.
Go into Applications and click AUREA.

Install the AUREA libraries and run from your system’s python distribution

This option allows you to write your own scripts using the AUREA package. It is also the only option for Linux users.

The AUREA application has two primary components: the workspace and the libraries. The AUREA libraries contain all you need to create your own Python scripts that take advantage of the AUREA system. The AUREA workspace contains the initialization files, settings and data files that are required to run the GUI. It requires that the AUREA libraries be installed first.

Install AUREA libraries from pre-built binary distributions

Windows Installation

  1. Install required software
    • Python 2.6.x or 2.7.x
      • Available for free download from python.org; we recommend the 2.7.x release.
      • Note: at this time we do not support 3.x versions of Python
    • Tcl/Tk
      • Note: most full python distributions will install this by default
      • If for some reason your distribution did not or you chose not to install it, it is available for free download from ActiveState
  2. Install AUREA libraries
    • Via Windows installer (recommended)
      1. Download the appropriate AUREA Windows installer.
        • AUREA-#.#.#[AUREA version].win[architecture win32 for 32bit and -amd64 for 64bit]-py2.#[your python version].exe (e.g., AUREA-1.6.3.win-amd64-py2.7.exe for a 64-bit Windows machine running Python 2.7)
      2. Double-click the installer once it has downloaded.
      3. Accept the default configuration. (If the installer complains that it cannot find python in the registry, double check the architecture and python version number.)
    • Via easy_install
      1. Install and configure easy_install: http://pypi.python.org/pypi/setuptools#windows
        • From downloaded file
          1. Download the appropriate AUREA egg.
            • AUREA-#.#.#[AUREA Version]-py2.#[your python version]-win[your architecture 32 for 32bit and -amd64 for 64bit].egg (e.g., AUREA-1.6.3-py2.7-win-amd64.egg for a 64-bit Windows machine running Python 2.7)
          2. Open a command line (powershell or cmd).
          3. Change directory to the directory containing the egg. (Note: if Windows changed .egg to .zip, you will have to rename the file back to .egg.)
          4. Enter: easy_install AUREA*.egg
        • From URL
          1. Copy the URL for the appropriate AUREA egg.
            • AUREA-#.#.#[AUREA Version]-py2.#[your python version]-win[your architecture 32 for 32bit and -amd64 for 64bit].egg
          2. Open a command line (powershell or cmd).
          3. Enter: easy_install [URL]
  3. Install the workspace
    1. Download the Windows workspace [win-workspace.zip]
    2. Extract the workspace folder to the location you would like to work from.
  4. Open the workspace folder and double click AUREAGUI.pyw

Linux Installation

  1. Install required software
  2. Install AUREA libraries
    1. Select the appropriate .egg file for your python version and architecture (see Downloads).
  3. Install the workspace
    1. Download the Linux workspace [lin-workspace.zip]
    2. Extract the workspace folder to the location you would like to work from.
  4. Open the workspace folder and double click AUREAGUI.py
  5. If it asks, click “Run”.
  6. Of course, you can also run it from the terminal by entering the workspace folder and typing ./AUREAGUI.py

OS X Installation

  1. Install required software
  2. Install AUREA libraries
    1. Select the appropriate .egg file for your python version (Mac eggs ship as universal binaries, so you do not have to worry about architecture). See Downloads.
  3. Install the workspace
    1. Download the Mac workspace [mac-workspace.zip]
    2. Extract the workspace folder to the location you would like to work from.
  4. See http://stackoverflow.com/questions/1854718/how-to-auto-run-a-script for instructions on how to make AUREAGUI.py clickable.
  5. Of course, you can also run it from the terminal by entering the workspace folder and typing ./AUREAGUI.py
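If the GUI does not start after a library install, it can help to check the prerequisites listed above from the interpreter you intend to use. The following is a small sketch of such a check, not part of AUREA. It assumes that the installed package is importable under the name AUREA (inferred from the package name, not confirmed here), and the function name check_prerequisites is made up for this example.

```python
import sys

def check_prerequisites():
    """Report whether this interpreter matches the listed requirements."""
    report = {}
    report["python_version"] = "%d.%d" % sys.version_info[:2]
    # The install notes list Python 2.6.x and 2.7.x as supported.
    report["python_supported"] = sys.version_info[:2] in ((2, 6), (2, 7))
    try:
        import Tkinter  # Tk bindings under Python 2
        report["tk_available"] = True
    except ImportError:
        try:
            import tkinter  # Python 3 name, checked only for completeness
            report["tk_available"] = True
        except ImportError:
            report["tk_available"] = False
    try:
        import AUREA  # assumed import name of the installed libraries
        report["aurea_importable"] = True
    except ImportError:
        report["aurea_importable"] = False
    return report

if __name__ == "__main__":
    for key, value in sorted(check_prerequisites().items()):
        print("%s: %s" % (key, value))
```

Run it with the same python executable that you use to launch AUREAGUI.py, since a machine can have several Python installations.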

 

Build AUREA from source

So, you want to build from source? Abandon hope all ye who enter here.

Get the source

The full source code is available at https://github.com/JohnCEarls/AUREA. If you have git installed you can navigate via the command line to where you want to perform your build and enter:

  • git clone git://github.com/JohnCEarls/AUREA.git

This will put the latest version of the source code into a folder named AUREA.

If you do not have git installed, you can click the Download/ZIP button (at https://github.com/JohnCEarls/AUREA) and download a zipped version of the AUREA build libraries. Simply unzip that folder wherever you want to build the source.

Build on Mac OSX

Prereqs:

  • Python >= 2.6
    • 2.6.x is default on OS X Snow Leopard and up.
    • Installing XCode on Leopard and earlier should upgrade Python to 2.7.x
  • XCode
    • This will install the GNU toolchain and SWIG packages
    • Supposedly XCode 3 is available for download for free somewhere. (If you find out where, please email john.c.earls-at-gmail and I will add a link here.)
    • XCode should come with any OS X installation disks.
    • XCode 4 is available at the Mac App Store for $5.99
  • Python Easy_Install tools
    • This may come with the default python installation.
  • Tkinter
    • Should be already installed

Build instructions

  • Open terminal
  • Navigate to the base AUREA directory
  • Options from here:
    • Install from source
      • enter: python setup.py install
    • Build a binary egg distribution without installing libraries
      • enter: ARCHFLAGS="-arch i386 -arch x86_64" python setup.py bdist_egg
        • Note: the ARCHFLAGS argument is necessary with XCode 4 (see "broken pipe during build")
        • Note: depending on your python installation you may need to export CXX=g++-4.2 and export CC=gcc-4.2. You will receive an error about the arch flags if that is necessary.
    • Build a dumb binary distribution without installing libraries
      • enter: ARCHFLAGS="-arch i386 -arch x86_64" python setup.py bdist
  • Install the workspace.

Build on Windows

Prereqs:

  • Visual Studio 2008
  • Python 2.6 or 2.7
    • You must add Python to the Path and define two environment variables (Windows XP / Windows 7). User variables are fine for PYTHON_INCLUDE and PYTHON_LIB.
      • The python directory must be added to the Path system environment variable (C:\Python26 for example)
      • PYTHON_INCLUDE – which points at the include folder in the Python2X folder (C:\Python26\include for example)
      • PYTHON_LIB – which points at the file python2X.lib (C:\Python26\libs\python26.lib for example)
  • Swig (Swigwin at SF)
    • You must add swig to the Path (see above for how to change environment variables). Note: you cannot just add the executable; the compiler needs all of the files in the swig folder.

Build instructions

Note: this is the easy part; getting the prereqs right is the painful part.

  • Get the source either using "git clone" or by downloading the zip file from github.
  • Open Powershell or Command Prompt.
  • Navigate to the unpackaged AUREA directory. (cd "C:\Documents and Settings\user\AUREA" for example)
  • Now you can either build the files, install the library or make distributable packages
    • Install from source
      • enter python setup.py install
    • Build a binary egg distribution without installing libraries
      • enter python setup.py bdist_egg
    • Build a windows installer distribution without installing libraries
      • enter python setup.py bdist_wininst
  • Install the workspace.

Build on Linux

Prereqs:

  • GCC
  • Swig (v.1.3.X)
    • Note: swig under linux has some problems with vectors in the 2.X versions. It is a pain to get it to work, so I recommend the 1.3 branch. It is still actively supported.
  • Python 2.6 or 2.7
    • Note: you will also need the dev libraries (python-dev). They should be available from any package manager.
    • Note: You will also need the python-tk libraries.

Build instructions

  • Get the source either using "git clone" or by downloading the zip file from github.
  • Open a terminal
  • Navigate to the unpackaged AUREA directory
  • Now you can either build the files, install the library or make distributable packages
    • Install from source
      • enter python setup.py install
    • Build a binary egg distribution without installing libraries
      • enter python setup.py bdist_egg
  • Install the workspace.

 

 

Download AUREA pre-built library and workspace

 

Windows Built Distributions

Windows workspace

Mac OS X Built Distributions

OS X workspace

Linux Built Distributions

Linux workspace

GUI User Documentation

Overview

The AUREA GUI is designed to allow researchers to quickly and easily use Relative Expression based learning algorithms in their own work. It provides an intuitive interface to the AUREA libraries allowing biologists and other non-technical researchers to perform expression analysis on a wide variety of platforms and data.

The primary navigation between screens is on the left hand side of the screen. If a button is grayed out, that means that the requirements for that page have not been fulfilled (e.g. you can’t define classes if you have no data imported).

Data Summary Screen

The Data Summary screen (accessible through the Home button on all screens) provides an overview of the current state of your AUREA session. As you add data files, partition your classes, train your learners, etc., you can return to the Data Summary screen to review your selected options and the results of your actions.

The Data File field tells you which files are currently imported into AUREA and how many genes and probes are available from these data sources. If you are using multiple data files, AUREA will attempt to merge them, which may result in loss of probe information if there is no mapping between the probeset of one data input and that of another.

The Gene Network File field tells you whether you have imported gene network information and, if so, how many networks can be created from your data, and the size of those networks. You are required to import a gene network file if you want to use DiRaC or Adaptive Learning.

The Classes field allows you to review your training set information by displaying the size and labels provided during the class definition.

Under the Best Classifiers field you can review the apparent accuracy of any trained learners. It will display the message “Not trained” when there is currently no trained classifier of that type. When a classifier has been trained, the field displays a tuple (true class 1, false class 1, true class 2, false class 2, Matthews correlation coefficient) showing how effectively the learner describes the learning set. Clicking the associated More info… pops up a window that gives a more detailed explanation of the result of training the classification algorithm.

Under the CV Performance field you will find the Matthews correlation coefficient of the k-fold cross validation from the Evaluate Performance screen, if it has been run.
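For readers unfamiliar with the Matthews correlation coefficient (MCC), the sketch below computes it from the four counts shown in the Best Classifiers tuple. This is the standard textbook formula in plain Python, not code taken from AUREA. Here class 1 is treated as the positive class; the value is unchanged if the two "false" counts are swapped, so the exact reading of "false class 1" versus "false class 2" does not affect the result. Returning 0.0 when the denominator is zero is a common convention and an assumption here.

```python
import math

def matthews_cc(true1, false1, true2, false2):
    """Matthews correlation coefficient from a 2x2 confusion table.

    true1/true2: correctly classified samples of class 1 and class 2.
    false1/false2: the two kinds of misclassified samples.
    """
    numerator = true1 * true2 - false1 * false2
    denominator = math.sqrt(
        (true1 + false1) * (true1 + false2)
        * (true2 + false1) * (true2 + false2)
    )
    if denominator == 0:
        return 0.0  # assumed convention for degenerate tables
    return numerator / denominator

perfect = matthews_cc(10, 0, 10, 0)
chance = matthews_cc(5, 5, 5, 5)
inverted = matthews_cc(0, 10, 0, 10)
```

The MCC ranges from -1 (every sample misclassified) through 0 (no better than chance) to 1 (perfect classification), which makes it more informative than raw accuracy when the two classes differ in size.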

Import Data Screen

The Import Data screen (accessible through the Import Data button on all screens) is where you tell AUREA where to pull the data from which it will develop its models. After you have selected your data sources, you need to click the Import Files button on the bottom right of the screen to make those files available to AUREA.

The Data File field allows you to choose the gene expression data sources that you wish to learn on. AUREA currently supports csv and SOFT file formats. See the faq for more information on these data formats. If you already have a supported data set you would like to import, you can click Browse … and navigate to where that dataset is located on your local machine. If you wish to include multiple files, you can click the Add another file button and another Data File input box will be displayed. When multiple files are being added, you will see a Remove button to the right of the Browse … button that allows you to deselect a file from your dataset.

The Download … button allows you to enter the number of a GDS SOFT file from the Gene Expression Omnibus and have it downloaded to your data folder. See the faq for more information about how AUREA uses the Gene Expression Omnibus.

The Gene Network File field allows you to define the gene networks for use by DiRaC (and by extension, Adaptive). You can import data without filling in this field, but you will not be able to run DiRaC or Adaptive without it. You will find the default gene network file (c2.biocarta.v2.5.symbols.gmt) in the data folder underneath your workspace. See the faq for more information on available gene network files.

The Gene Synonym File field provides gene synonyms that improve the mapping to Gene Networks. This field is not required, but we recommend that when using DiRaC you also use this field. We provide a default synonym file (Homo_sapiens.gene_info.gz) in the data folder underneath your workspace. See the faq for more information on available gene synonym files (non-human for example).

The Data Settings button pops up a window with options that control how your data import is processed. The Data Folder entry describes where AUREA should look for the data you want to process and where it should put any generated or downloaded files (by default this is the data folder in your workspace). You may enter an absolute or relative path to the data files. The Bad Data Value setting is the numeric value that AUREA substitutes when it encounters a non-numeric value (e.g. ‘nil’) while parsing data files. The Gene Collision Rule setting is the method used to combine or filter different probes within a sample that map to the same gene. This is used by AUREA to map genes to gene networks. The options are ‘MAX’, ‘AVE’ and ‘MIN’, which take the maximum value, the average value or the minimum value, respectively. The Gene Column setting is the column title in data files that corresponds to the gene name column. The Probe Column setting is the column title in data files that corresponds to the probe identifier column. Both the Gene Column and the Probe Column are initially set to the values expected in a GEO SOFT file. If you have a data set where only genes or only probes are available, set both the Gene Column and the Probe Column to the single valid identifying column.
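As an illustration of the Gene Collision Rule, the following sketch collapses several probes that map to the same gene into one value per sample. It is a plain Python sketch of the idea described above, not AUREA's code; the function name collapse_probes, the row layout and the probe identifiers other than 1007_s_at are hypothetical.

```python
def collapse_probes(rows, rule="MAX"):
    """Combine probes that map to the same gene, sample by sample.

    rows is a list of (gene, probe, values) tuples, where values holds
    one expression value per sample. rule is 'MAX', 'AVE' or 'MIN'.
    """
    combine = {
        "MAX": max,
        "MIN": min,
        "AVE": lambda column: sum(column) / float(len(column)),
    }[rule]
    by_gene = {}
    for gene, probe, values in rows:
        by_gene.setdefault(gene, []).append(values)
    collapsed = {}
    for gene, probe_rows in by_gene.items():
        # zip(*probe_rows) groups the values of one sample across probes.
        collapsed[gene] = [combine(list(column)) for column in zip(*probe_rows)]
    return collapsed

# Hypothetical rows: two probes for DDR1, one for GNAS, two samples each.
example_rows = [
    ("DDR1", "1007_s_at", [1.0, 4.0]),
    ("DDR1", "probe_b", [3.0, 2.0]),
    ("GNAS", "probe_c", [5.0, 6.0]),
]
by_max = collapse_probes(example_rows, "MAX")
by_ave = collapse_probes(example_rows, "AVE")
by_min = collapse_probes(example_rows, "MIN")
```

Note that the rule is applied within each sample, so with ‘MAX’ the value kept for a gene may come from different probes in different samples.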

Class Definition Screen

The Class Definition screen (accessible through the Class Definition button on all screens, provided data has been imported) is where you partition your imported samples into learning sets. After you have labeled and partitioned your samples, you need to click the Define Classes button at the bottom of the screen to save your choices.

The Class 1 and Class 2 fields allow you to assign appropriate labels to your training sets. You are required to enter labels before you can save your partitioning.

To partition your data set, select a sample and click the arrow (> or <) that puts it in the box under the class label you want the sample to belong to. If you are using SOFT-formatted data sets, you will find *ss entries in the Select Training Set box. SOFT files often contain subsets, and if AUREA recognizes on import that subsets are available, it will create a label for each subset so that you can move the whole subset at once as if it were a single sample. For all samples, if AUREA finds descriptive information associated with the sample, it will display it in the box underneath the partitioning boxes. Note that this is only available with SOFT files.

When you have chosen your training set, and provided appropriate labels, click the Define Classes button to save your selections.

Train Classifiers Screen

The Train Classifiers screen (accessible through the Train Classifiers button on all screens, provided the class definition has been set) is where you select the models you wish to train. Only models for which the required data is available will have active buttons.

You may choose to individually run one of the learning algorithms or to let the Adaptive Trainer learn the best model for the data set. When you select any of the individual algorithms for training, after the model is trained based on the settings for that model, AUREA will display the features that will be used by the model for classification. Various related information is available which can be saved as a text file by clicking save in the window displaying the results. In general, this information will include the associated probability matrices, the learner settings, the rules used to determine classification, etc.

To run the Adaptive Trainer, enter a Maximum Time in seconds for the Adaptive learner and a Target accuracy (MCC linearly mapped to [0.0, 1.0]) in cross validation. The Adaptive Trainer will display its progress in the status area. When the results are displayed, you can save the history of the Adaptive Learner as a text file for future use.
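The target accuracy is expressed on a 0 to 1 scale, while the MCC itself runs from -1 to 1. A minimal sketch of the conversion follows, assuming the mapping is the straightforward linear one that sends -1 to 0.0 and 1 to 1.0; the exact formula is not spelled out here, and the function names are made up for this example.

```python
def mcc_to_unit_interval(mcc):
    """Map an MCC in [-1, 1] onto [0, 1] (assumed linear mapping)."""
    return (mcc + 1.0) / 2.0

def unit_interval_to_mcc(target):
    """Inverse mapping: recover the MCC that a target value stands for."""
    return 2.0 * target - 1.0

chance_level = mcc_to_unit_interval(0.0)
implied_mcc = unit_interval_to_mcc(0.9)
```

Under this assumption a target of 0.5 corresponds to chance-level performance (MCC of 0), so useful targets lie well above 0.5.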

Test Classifiers Screen

The Test Classifiers screen (accessible through the Test Classifiers button on all screens, provided at least one learner has been trained) allows you to select samples that were not used for training and run the trained models against them. You can partition a subset of these samples (just as in the class definition) and select a learning algorithm to run against them. A summary of the results will be displayed in a popup window. From this popup window you can save the results to a text file (the text file gives a sample-by-sample description of the results).

Evaluate Performance

The Evaluate Performance screen (accessible through the Evaluate Performance button on all screens, provided classes are defined) allows you to run k-fold cross-validation against any learning algorithm that has the data necessary to run. Learning algorithms that do not have enough data available will have inactive buttons. A popup window presents the results of the k-fold cross-validation and reports them as a Matthews correlation coefficient. You can review these results on the Data Summary page under CV Performance.
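For readers new to k-fold cross-validation, the sketch below shows the general idea: the samples are split into k folds, and each fold in turn is held out for testing while the model is trained on the rest. This is a generic plain Python illustration; how AUREA actually assigns samples to folds (for example randomly or balanced by class) is not described here, and the function name is an assumption.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k roughly equal folds."""
    # Sample i goes to fold i % k; a real run would usually shuffle first.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        ]
        yield train, test

splits = list(k_fold_indices(10, 5))
```

Every sample is used for testing exactly once, so the reported coefficient reflects performance on data the model did not see during that round of training.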

Learner Settings

The Learner Settings screen (accessible at all times on all screens through the Learner Settings button) contains buttons that pop up windows allowing you to change the settings of the various learning algorithms. You can save your settings as an xml file for loading in another session (through the File dialog). By default AUREA will read the initial configuration from the config.xml file stored in the data directory in your workspace.

The DiRaC settings control the operation of the stand-alone DiRaC learning algorithm. The Row Key setting tells AUREA whether to use the probe mapping or gene mapping with the learning algorithm. This value should be set to either ‘gene’ or ‘probe’. For DiRaC this value should be set to ‘gene’ in order to facilitate the mapping of genes to gene networks. The Number of Top Networks setting tells DiRaC how many networks should be used in classification. The Minimum Network Size setting tells DiRaC how large a network must be in order to be considered a candidate for use as a classifier.

The TSP settings control the operation of the stand-alone TSP algorithm. The Row Key setting tells AUREA whether to use the probe mapping or gene mapping with the learning algorithm. This value should be set to either ‘gene’ or ‘probe’. In general, with TSP you will want to set this value to ‘probe’. Sometimes, however, after merging data sets you may find that you do not have a good probe mapping but do have a good gene mapping; in that case you may choose to run the algorithm using gene names instead of probe names. The Filters setting lets you define how many genes/probes you wish to train on (see Wilcoxon for the details).

The k-TSP settings control the operation of the stand-alone k-TSP algorithm. The Row Key setting for k-TSP operates the same as the Row Key setting for TSP. The Maximum K Value setting lets you set the number of possible pairs you wish to examine. The Remove for Cross Validation setting lets you set how many samples should be removed when k-TSP performs cross validation to determine the optimal k value. The Number of Cross Validation Runs setting lets you choose how many times k-TSP performs its internal cross validation when finding the optimal k value. The Filters setting performs the same function as it does in TSP. Note that k-TSP requires a filter size of at least 2k: you cannot have k disjoint gene pairs without at least 2k genes from which to choose.

The TST settings control the operation of the stand-alone TST algorithm. The Row Key setting performs the same function as it does in TSP. The Filters setting performs the same function as it does in TSP, except with three filters instead of two.
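The 2k requirement follows directly from the pairs being disjoint. The sketch below illustrates it by picking the highest scoring pairs that share no gene. It is a simplified plain Python illustration of disjoint pair selection, not AUREA's k-TSP implementation; the function name, gene labels and scores are hypothetical.

```python
def pick_disjoint_pairs(scored_pairs, k):
    """Return up to k of the best scoring pairs that share no gene.

    scored_pairs maps (gene_a, gene_b) tuples to a score.
    """
    chosen = []
    used_genes = set()
    ranked = sorted(scored_pairs, key=scored_pairs.get, reverse=True)
    for pair in ranked:
        if pair[0] in used_genes or pair[1] in used_genes:
            continue  # this pair reuses a gene from a better pair
        chosen.append(pair)
        used_genes.update(pair)
        if len(chosen) == k:
            break
    return chosen

# Hypothetical scores over four genes A, B, C and D.
example_scores = {
    ("A", "B"): 0.9,
    ("A", "C"): 0.8,
    ("C", "D"): 0.7,
    ("B", "D"): 0.6,
}
two_pairs = pick_disjoint_pairs(example_scores, 2)
three_requested = pick_disjoint_pairs(example_scores, 3)
```

With only four genes available, asking for three pairs still yields two, because a third disjoint pair would need a fifth and sixth gene.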

The Adaptive settings allow you to set the parameters for the Adaptive model building algorithm. Each learning algorithm has the same set of settings as found in the stand-alone learning algorithm, but these are defined as ranges instead of scalar values. The ranged settings follow the format ‘from, to, by’. For example, if you want the adaptive algorithm to explore the model space of TSP with the first gene filter ranging from 10 to 100 by 10 (i.e. 10, 20, 30, …, 80, 90), you would put 10 in the first column of TSP-Filter 1 Range, 100 in the second column and 10 in the third column.

In addition to the model settings, you should also choose a Wilcoxon Row Key, which determines whether to use the probe mapping or gene mapping when generating filter sizes and estimating running times. Your choice here involves the same considerations as for TSP (i.e. generally it should be ‘probe’, but a poor probe mapping after a data merge may mean you want to set it to ‘gene’). The Initial Weight setting allows you to increase or decrease the likelihood of a model being run. By default these are set to 1.0. For example, if you wanted DiRaC to be run less often you could set the initial weight of DiRaC to a number less than 1; conversely, if you wanted to increase the frequency with which DiRaC is run, you could increase this factor to be greater than 1. The Minimum Weight setting is used internally to normalize the weight vector in order to prevent underflows (you will rarely need to change this setting).
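Taking the worked example at face value, the ‘from, to, by’ format behaves like a half-open range in which the ‘to’ value itself is not included. The short sketch below reproduces the example in plain Python and shows how quickly the number of candidate settings grows when two filters are ranged together. It illustrates the format only and is not AUREA's code; the function name and the second filter range are assumptions.

```python
from itertools import product

def expand_range(start, stop, step):
    """Expand a 'from, to, by' setting, excluding the 'to' value."""
    return list(range(start, stop, step))

filter1_sizes = expand_range(10, 100, 10)
filter2_sizes = expand_range(10, 100, 10)  # hypothetical second range

# Every combination of the two filter sizes is a candidate TSP setting.
candidate_settings = list(product(filter1_sizes, filter2_sizes))
```

Two modest ranges of nine values each already give 81 combinations, which is why the Adaptive Trainer weighs running time as well as accuracy when deciding what to try.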

 

Simple GUI tutorial

This is a simple walk through for using the GUI.

To learn how to start AUREA on your system, see the install page and find the section for your operating system. The last few instructions in each section tell you how to start the program.

You will start on the Data Summary page. This page gives an overview of your session. The first thing you will want to do is import data.

Import Data

First click the Import Data button on the left hand side. You should now be on the Import Data page. To find a file to import, you can click browse. AUREA accepts .soft datasets from GEO and csv files. An example csv file is available in the faq. For this tutorial, we will download a data set from GEO.

Click Download… and enter 2546 as the soft file number in the popup window. (You can browse datasets available from GEO at http://www.ncbi.nlm.nih.gov/sites/GDSbrowser). This will download GDS2546.soft.gz into the data directory under the workspace.

Next we will choose a gene network file. This is required so you can use DiRaC and the adaptive learner. A network file (biocarta signalling networks) is provided in the data directory. See the faq for a link to different networks.

Click browse and select c2.biocarta.v2.5.symbols.gmt.

Finally we select the Gene Synonym File. This maps the gene names in your data file to known synonyms, which is helpful for DiRaC. We provide a synonym file in the data directory called Homo_sapiens.gene_info.gz.

Click browse and select Homo_sapiens.gene_info.gz.

The data is now ready for import, so you can click Import Files. This may take a minute or more, depending on the size of your data files.

After the data is imported, the next step is partitioning the data into phenotypes. You do that by going to the class definition page.

Class Definition

Click Class Definition on the left.

This dataset covers prostate cancer. GDS files group samples into subsets, so it is not necessary to select and move samples individually. We will make class 1 Normal and class 2 Tumor, so enter “Normal” into the class 1 text area and “Tumor” into the class 2 text area.

To move the normal tissues under the Normal class, select *ss: normal prostate tissue from the list and click the left arrow on the left. To move the cancer tissues under the Tumor class, select *ss: primary prostate tumor from the list and click the right arrow on the right.

Now that the samples are partitioned into phenotypes, we click Define Classes. We are now ready to train classifiers.

Train Classifiers

Click Train Classifiers. Select a learning algorithm and click the button. You should see progress reports at the bottom of the window. After the algorithm has finished training, a popup will appear with the selected features. Click save. More information about the results of the training is available in the saved file.

 

Frequently Asked Questions

  1. Installation
    1. Why do I have to add python to my system path?
  2. Graphical Interface
    1. What are SOFT files?
    2. How can I find compatible data files at the Gene Expression Omnibus?
    3. How do I format a csv I want to import?
    4. Where can I find different gene network files?
  3. Library

Installation

  1. Why do I have to add python to my system path?
    • Your system path tells your system where to look for executable programs. AUREA needs the python interpreter to run. In order for the system to know where the python interpreter is, you have to add its location to the system path.

Graphical Interface

  1. What are SOFT files?
    • SOFT (Simple Omnibus Format in Text) is a format used by NCBI for storing transcriptomic data in GEO. The format is described at http://www.ncbi.nlm.nih.gov/geo/info/soft2.html. AUREA allows the importing of SOFT files from GEO of the GDS type. GDS (GEO Dataset) files are curated microarray data files that contain expression values and meta-data from the experiments that generated the data.
  2. How can I find compatible data files at the Gene Expression Omnibus?
    • You can visit the NCBI GEO Dataset browser at http://www.ncbi.nlm.nih.gov/sites/GDSbrowser/ and search for datasets relevant to your research. Once you have identified datasets you are interested in, you can either download the GDS####.soft.gz files associated with them or click Download … on the AUREA Import Data screen and enter the number associated with the dataset.
  3. How do I format a csv I want to import?
    • Here is an example csv file: GDS2771.csv (57 MB)
      The basic rules for a csv for import are that it should be in row major order (i.e. first line should be the first row.)
      The first row should contain the header and should be

      IDENTIFIER,ID_REF,Sample 1, … Sample n

      The header needs to have the names IDENTIFIER and ID_REF. The IDENTIFIER column is for the gene names (e.g. MMP14, HSPA6, GNAS). These gene names are used by the DiRaC algorithm and need not be unique. The ID_REF column is for the probe names. These must be unique. If the gene names are unavailable for some reason, simply copy the probe names into the IDENTIFIER column and do not run DiRaC or the adaptive algorithm.
      The sample names will be displayed when you are selecting classes, and it may be beneficial to add some information to them to remind you of the phenotypes they represent.

      The data portions of the csv should be:

      Gene Name, Probe ID, S1 expression, S2 Expression, …, SN Expression

      For example,

      DDR1,1007_s_at,1.069700e+001,1.042400e+001,…,1.037400e+001

      The rules for dealing with multiple probes mapping to a single gene can be found in the GUI User Guide under “Gene Collision Rules.”

  4. Where can I find different gene network files?
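Returning to the csv layout described in question 3 above, the following sketch checks a file against those rules before you import it. It is a plain Python illustration using the standard csv module, not AUREA's own parser. The function name read_expression_csv, the default bad_value of 0.0 and the probe name probe_x are assumptions made for this example.

```python
import csv

def read_expression_csv(lines, bad_value=0.0):
    """Parse csv lines laid out as IDENTIFIER, ID_REF, sample columns.

    Returns (sample_names, rows), where each row is a
    (gene, probe, values) tuple. Non-numeric cells become bad_value.
    """
    reader = csv.reader(lines)
    header = [cell.strip() for cell in next(reader)]
    if header[:2] != ["IDENTIFIER", "ID_REF"]:
        raise ValueError("header must start with IDENTIFIER,ID_REF")
    sample_names = header[2:]
    rows = []
    seen_probes = set()
    for record in reader:
        if not record:
            continue  # skip blank lines
        gene, probe = record[0].strip(), record[1].strip()
        if probe in seen_probes:
            raise ValueError("probe names must be unique: %s" % probe)
        seen_probes.add(probe)
        values = []
        for cell in record[2:]:
            try:
                values.append(float(cell))
            except ValueError:
                values.append(bad_value)  # e.g. a 'nil' entry
        rows.append((gene, probe, values))
    return sample_names, rows

example_lines = [
    "IDENTIFIER,ID_REF,Normal 1,Tumor 1",
    "DDR1,1007_s_at,1.069700e+001,1.042400e+001",
    "GNAS,probe_x,nil,2.5",
]
sample_names, rows = read_expression_csv(example_lines)
```

To check a file on disk, pass an open file object instead of the list of lines, since the csv module accepts any iterable of text lines.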

 

Related Work