FlyAtlas: the Drosophila adult gene expression atlas

University of Glasgow
Biotechnology & Biological Sciences Research Council

The CEL files

The raw Affymetrix array data files (if you don't know what these are, your bioinformatician will) have been deposited in GEO, the NCBI Gene Expression Omnibus, under accession number GSE7763.

The dataset as a tab-delimited file

For anything between simple use and full bioinformatic analysis, I'd suggest downloading the tab-delimited file of the data and importing it into Excel. You can then sort on multiple keys to find, for example, the most enriched genes in a particular tissue, or genes that are unique to nervous tissue. The file is big (10 MB), but you can download it from this site (right-click, then "Save target as..."); use wget if your web browser struggles with it. Check back often, as we plan to expand it.

The dataset is a tab-separated file, easily readable by Excel or Perl scripts. The first line gives the column headings, so it should be easy to understand.
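If you would rather script the sort than do it in Excel, the following is a minimal sketch in Python (used here only as an alternative to the Perl mentioned above). It reads a tab-separated file whose first line holds the column headings and returns the rows with the highest values in one column. The column name "BrainEnrichment" and the sample rows are invented for illustration; check the real headings in the first line of the downloaded file.

```python
import csv
import io


def top_rows(handle, column, n=10):
    """Return the n rows with the largest numeric value in `column`.

    `handle` is any open text file (or file-like object) containing
    tab-separated data with the column headings on the first line.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"no column named {column!r} in the header line")
    scored = []
    for row in reader:
        try:
            scored.append((float(row[column]), row))
        except (TypeError, ValueError):
            # Skip rows where the value is missing or not a number.
            continue
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:n]]


# Invented sample data; the real file has many more columns and rows.
DEMO_DATA = (
    "Probeset\tBrainEnrichment\n"
    "1_at\t0.5\n"
    "2_at\t12.0\n"
    "3_at\t3.2\n"
    "4_at\tNA\n"
)

if __name__ == "__main__":
    for row in top_rows(io.StringIO(DEMO_DATA), "BrainEnrichment", n=2):
        print(row["Probeset"], row["BrainEnrichment"])
```

To use it on the real file, open the file you downloaded with open(path, newline="") and pass the file object in place of the io.StringIO demo, along with whichever heading you want to rank by.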

Making sense of the dataset

The data are referenced by probeset name (e.g. 123456_at) in the first field of each line. How do you get from there to gene names?

First you need to download the latest Affymetrix annotation file, either from the Affymetrix website, or from here:

http://flyatlas.org/Drosophila_2.na32.annot.csv (use right-click, then "Save target as...")

The annotation file is a comma-separated file with quoted fields and the following column headings:
"Probe Set ID","GeneChip Array","Species Scientific Name","Annotation Date","Sequence Type","Sequence Source","Transcript ID(Array Design)","Target Description","Representative Public ID","Archival UniGene Cluster","UniGene ID","Genome Version","Alignments","Gene Title","Gene Symbol","Chromosomal Location","Unigene Cluster Type","Ensembl","Entrez Gene","SwissProt","EC","OMIM","RefSeq Protein ID","RefSeq Transcript ID","FlyBase","AGI","WormBase","MGI Name","RGD Name","SGD accession number","Gene Ontology Biological Process","Gene Ontology Cellular Component","Gene Ontology Molecular Function","Pathway","InterPro","Trans Membrane","QTL","Annotation Description","Annotation Transcript Cluster","Transcript Assignments","Annotation Notes"


Then write a short Perl script that reads the annotation file and stores the fields you need in a hash (associative array) keyed by probeset ID:
$annot{$probesetID}="$flybase $gene_symbol $geneontology"; (or whichever fields you'll find useful)


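Then read the array data, extract the probeset ID from the first field of each line, and print the line with its annotation appended (chomp first, so that the annotation lands on the same line):
chomp; print "$_\t$annot{$idfield}\n";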
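The two steps above can also be sketched end to end. This is a minimal sketch in Python rather than Perl, not the script used on this site. It takes the column names "Probe Set ID", "FlyBase", "Gene Symbol" and "Gene Ontology Biological Process" from the annotation heading line shown above. It also assumes that any comment lines before that heading line start with "#". The probeset, gene symbol, FlyBase ID and data values in the demo are invented.

```python
import csv
import io

DEFAULT_FIELDS = ("FlyBase", "Gene Symbol", "Gene Ontology Biological Process")


def load_annotation(handle, fields=DEFAULT_FIELDS):
    """Build a dict mapping probeset ID -> tab-joined annotation fields."""
    # Assumption: comment lines, if any, start with '#'; the heading
    # line itself starts with a double quote, so it is kept.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows)
    return {
        row["Probe Set ID"]: "\t".join(row[name] for name in fields)
        for row in reader
    }


def annotate(data_handle, annot):
    """Yield each data line with the matching annotation appended."""
    for line in data_handle:
        line = line.rstrip("\r\n")
        probeset = line.split("\t", 1)[0]  # first tab-separated field
        # Lines with no matching probeset (e.g. the heading line)
        # get an empty annotation.
        yield line + "\t" + annot.get(probeset, "")


# Invented demo content standing in for the two downloaded files.
ANNOT_DEMO = (
    "#%hypothetical comment line\n"
    '"Probe Set ID","Gene Symbol","FlyBase","Gene Ontology Biological Process"\n'
    '"123456_at","hypA","FBgn0000000","---"\n'
)
DATA_DEMO = "probeset\ttissueA\n123456_at\t10.5\n"

if __name__ == "__main__":
    annot = load_annotation(io.StringIO(ANNOT_DEMO))
    for out_line in annotate(io.StringIO(DATA_DEMO), annot):
        print(out_line)
```

For the real files, replace the io.StringIO objects with open(path, newline="") on the annotation CSV and the tab-delimited dataset, and change DEFAULT_FIELDS to whichever annotation columns you find useful.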

If Perl isn't one of your skills (you should try it!), then this link does the work for you:

http://flyatlas.org/annotator.cgi (When the data have finished downloading, select all and copy, then use Paste Special -> plain text to paste into an Excel workbook.)
It appends the entire Affymetrix probeset annotation to the end of each line of array data. It'll be a tough workout for Excel, but the data you need WILL be in there!

Please cite us and link back to us if you present these data elsewhere.