FlyAtlas: the Drosophila adult gene expression atlas
The raw Affymetrix array data files (if you don't know what these are, your bioinformatician will) have been deposited in GEO, under accession number GSE7763.
For anything between simple use and full bioinformatic analysis, I'd suggest downloading the tab-delimited file of the data and importing it into Excel. You can then sort on multiple keys to find, for example, the most enriched genes in a particular tissue, or genes that are unique to nervous tissue. The file is big (10 MB), but you can download it from this site (right-click, then "Save target as..."); use wget if your web browser struggles. Check back often, as we plan to expand it.
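If you would rather script the multi-key sort than do it in Excel, it can be sketched in Python roughly as follows. This is only an illustration: the function name `top_rows` is made up for this example, and the column headings you pass in must be taken from the first line of the file you downloaded.

```python
import csv


def top_rows(path, sort_columns, n=20):
    """Return the n rows with the largest values in sort_columns.

    sort_columns is a list of column headings, most important key first.
    Cells that are empty, missing or non-numeric sort last.
    """
    def as_number(cell):
        try:
            return float(cell)
        except (TypeError, ValueError):
            return float("-inf")

    # Read the whole tab-separated file; the first line supplies the headings.
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))

    # Sort descending on all keys at once, as a multi-key sort in Excel would.
    rows.sort(
        key=lambda row: tuple(as_number(row.get(col)) for col in sort_columns),
        reverse=True,
    )
    return rows[:n]
```

Calling `top_rows("data.txt", ["some heading", "another heading"])` with a hypothetical file name and two headings from your file would return the top 20 rows as dictionaries keyed on column heading.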
The dataset is a tab-separated file, easily readable by Excel or Perl scripts. The first line gives the column headings, so it should be easy to understand.
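As a quick check before any further processing, the column headings can be listed with a few lines of Python. This is a minimal sketch; the function name `read_headings` is made up for this example, and the path is whatever you saved the download as.

```python
import csv


def read_headings(path):
    """Return the column headings from the first line of a tab-separated file."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle, delimiter="\t"))
```

Printing the result with its positions, for instance `for i, name in enumerate(read_headings(path)): print(i, name)`, shows which field number corresponds to which heading.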
The data are referenced by probeset name (e.g. 123456_at) in the first field of each line. How do you get from there to gene names?
First you need to download the latest Affymetrix annotation file, either from the Affymetrix website, or from here:
http://flyatlas.org/Drosophila_2.na32.annot.csv (use right-click, then "Save target as...")
The annotation file has the following column headings:
"Probe Set ID","GeneChip Array","Species Scientific Name","Annotation Date","Sequence Type","Sequence Source","Transcript ID(Array Design)","Target Description","Representative Public ID","Archival UniGene Cluster","UniGene ID","Genome Version","Alignments","Gene Title","Gene Symbol","Chromosomal Location","Unigene Cluster Type","Ensembl","Entrez Gene","SwissProt","EC","OMIM","RefSeq Protein ID","RefSeq Transcript ID","FlyBase","AGI","WormBase","MGI Name","RGD Name","SGD accession number","Gene Ontology Biological Process","Gene Ontology Cellular Component","Gene Ontology Molecular Function","Pathway","InterPro","Trans Membrane","QTL","Annotation Description","Annotation Transcript Cluster","Transcript Assignments","Annotation Notes"
Then write a short Perl script that reads the annotation file and stores the fields you need in a hash (associative array), keyed on probeset ID:
$annot{$probesetID} = "$flybase $gene_symbol $geneontology"; (or whichever fields you find useful)
Then read the array data line by line, extract the probeset ID from the first field of each line, and print the line followed by its annotation:
chomp; print "$_\t$annot{$idfield}\n";
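The same two steps can also be sketched in Python. Treat this as an illustration under stated assumptions rather than the script behind this site: the function names `load_annotation` and `annotate` are made up, the default fields are just three of the column headings listed above, and the code assumes that any comment lines in the annotation file start with "#".

```python
import csv


def load_annotation(path, fields=("FlyBase", "Gene Symbol",
                                  "Gene Ontology Biological Process")):
    """Map each probeset ID to a space-joined string of the chosen fields."""
    with open(path, newline="") as handle:
        # Assumption: comment lines, if present, start with "#"; skip them.
        data_lines = (line for line in handle if not line.startswith("#"))
        reader = csv.DictReader(data_lines)
        return {
            row["Probe Set ID"]: " ".join(row[name] for name in fields)
            for row in reader
        }


def annotate(data_path, annot, out):
    """Write each data line to out, followed by a tab and its annotation."""
    with open(data_path, newline="") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            # The probeset ID is the first tab-separated field of the line.
            probeset_id = line.split("\t", 1)[0]
            # Lines with no matching annotation (including the heading line)
            # get an empty annotation field.
            out.write(f"{line}\t{annot.get(probeset_id, '')}\n")
```

With the two downloads saved locally, `annotate(data_path, load_annotation(annotation_path), sys.stdout)` (after `import sys`) would print the annotated table, which can be redirected to a file and opened in Excel.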
If Perl isn't one of your skills (you should try it!), then this link does the work for you:
http://flyatlas.org/annotator.cgi (When the data have finished downloading, select all and copy, then Paste Special -> plain text into an Excel workbook.)
It appends the entire Affymetrix probeset annotation to the end of each line of array data. It'll be a tough workout for Excel, but the data you need WILL be in there!
Please cite us and link back to us if you present these data elsewhere.