We are still in the process of standardizing the input and output formats of our scripts. Ultimately we want to use formats such as GFF3 or BED to facilitate data exchange. It is also possible that the functions of these simple scripts will soon be covered by other well-written software packages. Overall, these scripts were developed by a self-taught bioinformatician and are generally hard to use unless you are from the Lam lab. Also, as I am not a native English speaker, you will see plenty of funny variable and option names. Of course, please feel free to download them if you really want to find out how we do things.
For any questions, contact me at luo@eden.rutgers.edu.
--by Chongyuan Luo
This script generates a data matrix with all Arabidopsis genes aligned at their transcription start site (TSS). Patterns located from -1 kb to +5 kb relative to the TSS are covered. The path to the genome annotation has to be specified in the script. The annotation files can be found at ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR8_genome_release/NCBI_chr*.tbl. Generating this data matrix is the first step of ANCORP. The matrix can then be sorted or clustered by various criteria.
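A minimal Python sketch of building a TSS-aligned matrix might look like the following. It assumes a hypothetical in-memory layout (one per-base signal list per chromosome, and gene records of name, chromosome, strand and 0-based TSS) and a hypothetical 100 bp bin size; the actual script parses the TAIR .tbl annotation, and whether or how it bins the signal is not described here. Minus-strand windows are read backwards so that every row runs from -1 kb to +5 kb in transcript orientation.

```python
from math import nan
from typing import Dict, List, Sequence, Tuple

UPSTREAM = 1000    # -1 kb
DOWNSTREAM = 5000  # +5 kb

# Hypothetical gene record: (name, chromosome, strand "+"/"-", 0-based TSS position)
Gene = Tuple[str, str, str, int]


def tss_matrix(genes: Sequence[Gene],
               signal: Dict[str, Sequence[float]],
               bin_size: int = 100) -> Dict[str, List[float]]:
    """Return one row per gene: mean signal per bin from -1 kb to +5 kb of the TSS.

    bin_size should divide UPSTREAM + DOWNSTREAM (6000 bp) evenly.
    """
    matrix: Dict[str, List[float]] = {}
    for name, chrom, strand, tss in genes:
        track = signal[chrom]
        row = []
        for bin_start in range(-UPSTREAM, DOWNSTREAM, bin_size):
            offsets = range(bin_start, bin_start + bin_size)
            # Minus-strand genes are read backwards so offsets stay transcript-relative.
            positions = [tss + o if strand == "+" else tss - o for o in offsets]
            values = [track[p] for p in positions if 0 <= p < len(track)]
            # A bin lying entirely off the chromosome becomes NaN.
            row.append(sum(values) / len(values) if values else nan)
        matrix[name] = row
    return matrix


if __name__ == "__main__":
    toy_signal = {"Chr1": [float(i) for i in range(20000)]}  # value == position
    toy_genes = [("plus_gene", "Chr1", "+", 2000), ("minus_gene", "Chr1", "-", 10000)]
    m = tss_matrix(toy_genes, toy_signal)
    print(len(m["plus_gene"]), m["plus_gene"][0], m["minus_gene"][0])  # prints: 60 1049.5 10950.5
```

Each row has the same length regardless of strand, which is what lets the rows be stacked, sorted and clustered as a single matrix.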
This script generates two data matrices for all Arabidopsis genes, one for sense and one for antisense transcripts. It is basically a modified Constant_TSS_mapping.pl for transcript analysis.
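One way to picture the sense/antisense split is the sketch below. It rests on assumptions the description does not state: strand-specific signal is available as two per-base tracks (plus and minus strand) for a single chromosome, and genes are hypothetical (name, strand, TSS) records. For each gene, the track on the gene's own strand feeds the sense matrix and the opposite track feeds the antisense matrix; both are read in transcript orientation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def window(track: Sequence[float], tss: int, strand: str,
           upstream: int = 1000, downstream: int = 5000) -> List[float]:
    """Per-base values from -upstream to +downstream of the TSS, 5' to 3' along the gene."""
    if strand == "+":
        positions = range(tss - upstream, tss + downstream)
    else:
        positions = range(tss + upstream, tss - downstream, -1)
    # Positions off either chromosome end are reported as NaN.
    return [track[p] if 0 <= p < len(track) else math.nan for p in positions]


def sense_antisense(genes: Sequence[Tuple[str, str, int]],
                    plus_track: Sequence[float], minus_track: Sequence[float],
                    upstream: int = 1000, downstream: int = 5000):
    """Return (sense, antisense) matrices as dicts of gene name -> row."""
    sense: Dict[str, List[float]] = {}
    antisense: Dict[str, List[float]] = {}
    for name, strand, tss in genes:
        # Signal on the gene's own strand is sense; signal on the other strand is antisense.
        same, other = (plus_track, minus_track) if strand == "+" else (minus_track, plus_track)
        sense[name] = window(same, tss, strand, upstream, downstream)
        antisense[name] = window(other, tss, strand, upstream, downstream)
    return sense, antisense
```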
Because Constant_TSS_mapping.pl always plots the -1 kb to +5 kb region, part of the signal in the matrix originates from outside the gene whenever the gene is shorter than 5 kb. trim.pl was written to mask those regions beyond the transcription termination site (TTS); their values are converted to "NA".
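The masking step could be sketched as follows, assuming each row is binned from -1 kb with a fixed, hypothetical 100 bp bin size and that the gene length (distance from TSS to TTS) is known. Bins that start at or beyond the TTS are replaced with the string "NA"; keeping a bin that straddles the TTS is an arbitrary choice made here, not necessarily what trim.pl does.

```python
from typing import List, Sequence, Union

Cell = Union[float, str]


def trim_row(row: Sequence[float], gene_length: int,
             upstream: int = 1000, bin_size: int = 100) -> List[Cell]:
    """Mask bins lying downstream of the TTS with "NA".

    gene_length is |TTS - TSS| in bp; bin i covers offsets
    [-upstream + i * bin_size, -upstream + (i + 1) * bin_size) relative to the TSS.
    """
    trimmed: List[Cell] = []
    for i, value in enumerate(row):
        bin_start = -upstream + i * bin_size
        # Keep bins that start before the TTS (including one that straddles it).
        trimmed.append("NA" if bin_start >= gene_length else value)
    return trimmed
```

For a 1.5 kb gene and a 60-bin row, the first 25 bins (-1 kb up to the TTS) keep their values and the remaining 35 become "NA".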
This script converts the output of Cluster 3.0 (CDT files) to a plain matrix format.
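Cluster 3.0 CDT files are tab-delimited: a header row (typically GID, UNIQID or YORF, NAME, GWEIGHT, then one column per sample), optional AID and EWEIGHT rows, then one row per gene. A minimal conversion might look like this, assuming "plain format" means a gene ID column followed by the values; the exact output layout of the original script is not described, and the sample names in the test data are made up.

```python
from typing import Iterable, List

META_ROWS = {"AID", "EWEIGHT"}  # array-tree and array-weight rows in a CDT file


def cdt_to_plain(lines: Iterable[str]) -> List[str]:
    """Strip Cluster 3.0 bookkeeping from CDT lines, keeping gene IDs and values."""
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    header = rows[0]
    id_col = 1 if header[0] == "GID" else 0  # GID is present only when genes were clustered
    # Data columns follow GWEIGHT when present; otherwise assume ID, NAME, then data.
    data_start = header.index("GWEIGHT") + 1 if "GWEIGHT" in header else id_col + 2
    plain = ["\t".join([header[id_col]] + header[data_start:])]
    for row in rows[1:]:
        if row[0] in META_ROWS:
            continue
        plain.append("\t".join([row[id_col]] + row[data_start:]))
    return plain
```

Note that the rows keep the order Cluster 3.0 wrote them in, so a clustered gene order survives the conversion.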
This script takes the output of MACS as input and identifies genes that overlap with domains marked by histone modifications. We use the resulting gene list to generate chromatin state codes for each annotated gene.
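The gene/domain overlap test reduces to interval intersection. The sketch below assumes MACS peaks have already been parsed into (chromosome, start, end) tuples and genes into hypothetical (name, chromosome, start, end) records, all in 0-based half-open coordinates; which gene region the original script compares against (gene body, promoter, or something else) is not stated. Sorting the peaks and bounding the search by the longest peak avoids comparing every gene against every peak.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def genes_overlapping_peaks(genes: Iterable[Tuple[str, str, int, int]],
                            peaks: Iterable[Tuple[str, int, int]]) -> Set[str]:
    """Return names of genes overlapping at least one peak (half-open intervals)."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_len = max(e - s for s, e in intervals)
        index[chrom] = (intervals, starts, max_len)

    hits: Set[str] = set()
    for name, chrom, g_start, g_end in genes:
        if chrom not in index:
            continue
        intervals, starts, max_len = index[chrom]
        # Only peaks starting in [g_start - max_len, g_end) can possibly overlap the gene.
        lo = bisect_left(starts, g_start - max_len)
        hi = bisect_left(starts, g_end)
        if any(intervals[i][1] > g_start for i in range(lo, hi)):
            hits.add(name)
    return hits
```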
This script generates chromatin state codes for all Arabidopsis genes by combining multiple outputs from Overlap_mapping.pl. Unfortunately, all options have to be specified within the script at the moment, so I guess no one except me would know how to use it, yeah!
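The exact encoding of the chromatin state codes is not described. As one hypothetical scheme, each gene could get a binary string with one digit per histone mark, set to 1 when the gene appears in that mark's overlap list; the mark names and gene IDs in the usage example are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def chromatin_state_codes(all_genes: Iterable[str],
                          overlap_lists: List[Tuple[str, Set[str]]]) -> Dict[str, str]:
    """Return gene -> code, one '0'/'1' digit per (mark, genes) entry, in list order."""
    return {
        gene: "".join("1" if gene in marked else "0" for _, marked in overlap_lists)
        for gene in all_genes
    }


if __name__ == "__main__":
    codes = chromatin_state_codes(
        ["geneA", "geneB", "geneC"],
        [("mark_1", {"geneA", "geneB"}), ("mark_2", {"geneB"})],
    )
    print(codes)                    # per-gene codes, e.g. geneB carries both marks
    print(Counter(codes.values()))  # number of genes in each state
```

Fixing the order of the overlap lists is what makes the codes comparable across genes, so in this sketch that order acts as the "options" that would otherwise be hard-coded.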
This script performs the actual ANCORP procedure. It takes an anchor matrix and uses its gene order to sort the 'correlative' matrix.
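The reordering at the heart of this step can be sketched as follows. It assumes both matrices are keyed by gene ID and uses a hypothetical sort criterion (mean anchor signal, highest first) only to produce an anchor order; the anchor matrix may in practice be sorted or clustered by any criterion. Skipping genes that are missing from the correlative matrix is also an assumption.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Matrix = Dict[str, Sequence[float]]


def anchor_order(anchor: Matrix) -> List[str]:
    """Hypothetical criterion: order genes by mean anchor signal, highest first."""
    return sorted(anchor, key=lambda gene: mean(anchor[gene]), reverse=True)


def ancorp_sort(order: Sequence[str], correlative: Matrix) -> List[Tuple[str, Sequence[float]]]:
    """Reorder the correlative matrix rows to follow the anchor's gene order."""
    return [(gene, correlative[gene]) for gene in order if gene in correlative]
```

Because both matrices end up in the same row order, patterns in the correlative matrix can be read directly against the ordering defined by the anchor.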