The Intracellular Pathogen Cooperative Group @ St George's

Bioinformatics and Related Web Links

 Last updated : 22 March 2010

General Bioinformatics Resources

Local Resource pages

Local Bioinformatics and Medical Teaching Resources (some of the teaching resources are accessible only from within St George's; external lecturers or course co-ordinators interested in this type of material should contact Dr Laing, see the Core Groups page of the IPC)

Remote Resources/References

  • The International Society for Magnetic Resonance in Medicine is a nonprofit professional association devoted to furthering the development and application of magnetic resonance techniques in medicine and biology. The Society holds annual scientific meetings and sponsors other major educational and scientific workshops.

Primary Sequence Databases Tools

Nucleic Acid Sequence Databases

Protein Sequence Databases

  • The National Center for Biotechnology Information (NCBI): maintains public databases and develops and distributes software tools for analyzing genome data; based in Bethesda, MD, US
    • Cross-database searching using Entrez: GenBank (nucleotides and proteins), PubMed (MEDLINE), 3D structures, genomes, and PopSet databases. This is a similar type of search engine to SRS at the EBI
    • Searching GenBank
    • Genome Resources
    • dbEST is a division of GenBank that contains sequence data and other information on "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms
  • European Bioinformatics Institute (EBI) Home Page: maintains and provides access to public databases such as EMBL and to information services; an EMBL outstation based at Hinxton, Cambridgeshire, UK
    • The European Nucleotide Archive (ENA) provides a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation
    • Some of the EMBOSS tools with a web interface
  • DNA Data Bank of Japan (DDBJ) Japanese National database resource
  • "estinformatics": resources/databases of ESTs, GSSs, etc. that have been cleaned of vector and E. coli sequence
  • miRBase is the new home of microRNA data on the web, providing data previously accessible from the miRNA Registry. Old miRNA Registry addresses should redirect you to this page.
    • The miRBase Sequence Database is a searchable database of published miRNA sequences and annotation. The data were previously provided by the miRNA Registry.
    • The miRBase Registry continues to provide gene hunters with unique names for novel miRNA genes prior to publication of results.
    • The miRBase Targets database is a new resource of predicted miRNA targets in animals.
  • miRWalk2.0 is an improved version of the earlier miRWalk database. It purports to be the only freely accessible, comprehensive archive, supplying the largest available collection of predicted and experimentally verified miRNA-target interactions, with various novel features missing from the previous version, to assist the miRNA research community. It covers human, mouse, and rat miRNAs and has some attractive features, including mitochondrial targets.
  • The Protein Information Resource (PIR): PIR produces the Protein Sequence Database (PSD) of functionally annotated protein sequences
  • Welcome to ExPASy
  • PDB WWW Home Page: repository for the processing and distribution of 3-D biological macromolecular structure data.
  • CATH is a manually curated classification of protein domain structures. Each protein has been chopped into structural domains and assigned into homologous superfamilies.
  • Protein Research Foundation: collects information related to amino acids, peptides and proteins: Peptide/Protein Sequence Database (PRF/SEQDB), Synthetic Compounds Database (PRF/SYNDB), Literature Database (PRF/LITDB)
  • The Protein Mutant Database (PMD) provides information on the functional and/or structural effects of amino acid mutations in proteins. It is based on literature reports.
  • BioBase Danish Centre for Human Genome Research's 2-D PAGE Databases



Genome Sequence Databases and Genome Annotation

  • The Institute for Genomic Research (formerly TIGR): an institute with aims similar to those of the Sanger Centre, namely the structural, functional and comparative analysis of genomes and gene products from a wide variety of organisms
  • The Sanger Centre Web Server: one of the leading genomics centres in the world, dedicated to analysing and understanding genomes. Provides access to software tools for interrogating the genomes of selected organisms.
  • Genome sequencing projects at the Sanger
  • Laboratory of Genomics of Microbial Pathogens, Institut Pasteur
    • Ensembl eukaryotic genome browser: Ensembl Genome Server
    • The Ensembl Genomes project produces genome databases for important species from across the taxonomic range, using the Ensembl software system. Five new sites will be launched within the first half of 2009. Ensembl Bacteria, Ensembl Protists and Ensembl Metazoa are already available; Ensembl Plants and Ensembl Fungi will be launched later in spring 2009.
  • NCBI: the whole genomes of over 1000 viruses and over 100 microbes can be found in Entrez Genome. The genomes represent both completely sequenced organisms and those for which sequencing is in progress.
  • The 1000 Genomes Project is the first project to sequence the genomes of a large number of people and aims to provide a comprehensive resource on human genetic variation. The data from this project are freely available through the 1000 Genomes website and from each of the two institutions that work together as the project DCC: the NCBI and EBI. Since the project's launch, the data set has grown enormously: by 2012, at 200 terabytes (the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs), the 1000 Genomes Project records were a prime example of big data that has become so massive that few researchers have the computing power to use them. This data set, the world's largest on human genetic variation, is now publicly available on the Amazon Web Services (AWS) cloud.
  • The Encyclopedia of DNA Elements (ENCODE) project, started in September 2003, aims to map the functional elements of the human genome; see the UCSC ENCODE browser
  • FANTOM is an international consortium established originally in 2000 to assign functional annotations to full-length cDNAs. It has since expanded into mapping transcripts, transcription factors, promoters and enhancers in a wide range of mammalian cell types.
  • WikiGenes is the "first wiki system to combine the collaborative and largely altruistic possibilities of wikis with explicit scientific authorship". See Nature Genetics 40, 1047-1051 (2008)
  • GenomeNet is a Japanese network of database and computational services for genome research and related research areas in molecular and cellular biology
  • GeneCards Homepage, GeneCards™ is a database of human genes, their products and their involvement in diseases. It offers information about the functions of all human genes that have an approved symbol.
  • The HUGO Gene Nomenclature Committee (HGNC) has assigned unique gene symbols and names to over 33,000 human loci, of which around 19,000 are protein coding. The HGNC website is a curated online repository of HGNC-approved gene nomenclature and associated resources including links to genomic, proteomic and phenotypic information, as well as dedicated gene family pages
  • The Joint Genome Institute (JGI), established in 1997, is a consortium of scientists, engineers and support staff from the U.S. Department of Energy's Lawrence Berkeley, Lawrence Livermore, and Los Alamos National Laboratories. We aim to develop and exploit new sequencing and other high-throughput, genome-scale and computational technologies as a means for discovering and characterizing the basic principles and relationships underlying the organization, function, and evolution of living systems. DOE expanded its genomic research to include the Microbial Genome Initiative in 1994.

SNP and Mutation Databases, Curated Mutations Linked to Cancer and Drug-Gene Interactions

  • NHGRI Catalog of Published Genome-Wide Association Studies: publications listed include those attempting to assay at least 100,000 single nucleotide polymorphisms (SNPs) in the initial stage. Publications are organized from most to least recent date of publication, indexing from online publication if available. Studies focusing only on candidate genes are excluded from this catalog. Studies are identified through weekly PubMed literature searches, daily NIH-distributed compilations of news and media reports, and occasional comparisons with an existing database of GWAS literature (HuGE Navigator).
  • NCBI plays a major role in facilitating the identification and cataloging of SNPs through its creation and maintenance of the public SNP database (dbSNP)
  • The Human Gene Mutation Database at the Institute of Medical Genetics in Cardiff

Organism specific databases

  • TubercuList Web Server: constructed around a database dedicated to the analysis of the genomes of the tubercle bacilli, Institut Pasteur, France
  • MycoDB a new on-line database dedicated to the comparative genomics of mycobacteria, corynebacteria, Tropheryma and related organisms.
  • The Sanger Centre : M. tuberculosis Blast server: Sanger Center UK
  • The Sanger Centre : Campylobacter jejuni : Genome sequencing project of the foodborne pathogen Campylobacter jejuni
  • E. coli Database Portal: entry page to E. coli resources
  • The HIV databases contain data on HIV genetic sequences, immunological epitopes, drug resistance-associated mutations, and vaccine trials, and also give access to a large number of tools
  • Stanford HIV RT and Protease database
  • HIV Protease database: archive of experimentally determined structures
  • Influenza Research database NIAID launched this influenza virus database in 2004 to serve as a comprehensive, freely available global public database and analysis resource for the study of influenza viruses. This integrated database contains diverse data sets submitted directly from researchers, data imported from public databases, as well as information extracted from the scientific literature.
  • The Sanger Centre: P. falciparum sequencing project
  • PlasmoDB official database of the malaria parasite genome project. This resource provides access to finished sequence for Plasmodium falciparum (strain 3D7)
  • WHO/TDR MALARIA DATABASE. "An INFORMATION RESOURCE for scientists working in malaria research", it contains a wide variety of information ranging from sequences to conference news.
  • GiardiaDB
  • The Pig Genome Database (PGD): the completion of the pig draft genome sequence marks a milestone in 20 years of pig genome studies. The PGD serves not only to bring together pig gene expression, quantitative trait loci (QTL), candidate gene, and whole genome association study (WGAS) results, but also to facilitate information integration and mining within the pig and across species.
  • Candida Gene Order Browser. CGOB is an online tool for visualising the syntenic context of genes from multiple Candida genomes

Gene Ontology (GO / Categorical or Functional Groupings)

  • A comprehensive listing of GO tools organised by category is maintained in association with NEUROLEX at this site. The listing has limitations: it does not describe each package, and some packages appear under multiple headings. It replaces an older page: GO Tools
  • Phenotype PAGE is a disease focused gene set analysis web tool to analyze microarray gene expression data with predefined groups of disease related genes (registration and login is required but access is free). This tool focuses on gene set analysis using groups of genes that have been genetically determined to be relevant to human disease or genetically assigned phenotypes found in animal models of disease
  • The Department of Computer Science, Wayne State University hosts a collection of GO tools that are free to academics, including Onto-Express (OE), a tool that automatically translates lists of differentially regulated genes into functional profiles.
  • The Gene Ontology web pages provide access to ontological information using AmiGO 2, a Wiki tool. AmiGO provides an interface to tools and resources for GO analysis.
  • Panther is a comprehensive, curated database of protein families, trees, subfamilies and functions that is intended to provide inference of gene and protein function. See the 2013 publication describing Panther in NAR. Tools include Gene List Analysis for functional classification, enrichment and over representation
  • WEGO (Web Gene Ontology Annotation Plot) is a useful tool for plotting GO annotation results.
  • GeneCodis3 is a web-based tool for the ontological analysis of large lists of genes. It can be used to determine biological annotations or combinations of annotations that are significantly associated to a list of genes under study with respect to a reference list. It is able to produce not only annotated lists but also some nice graphical displays of the output.
  • This site provides a series of programs allowing the functional investigation of groups of genes, based on the Gene Ontology™ resource. GO-Scan (Gene Ontology for Significant Collection of Annotations) is a tool that selects and presents relevant Gene Ontology (GO) annotations for a gene "hit" list from an Affymetrix microarray experiment
  • GENETOOLS is a collection of web-based tools on top of a database that brings together information from a broad range of resources, and provides this in a manner particularly useful for genome-wide analyses. Today, the two main tools connected to this database are the NMC Annotation Database V2.0 and eGOn V2.0

Prokaryotic Genome Databases

  • MBGD is a database for comparative analysis of completely sequenced microbial genomes, the number of which is now growing rapidly. The aim of MBGD is to facilitate comparative genomics from various points of view such as orthologue identification, paralogue clustering, motif analysis and gene order comparison.
  • TIGR Microbial Database: a listing of published microbial genomes and chromosomes and those in progress
  • The Integrated Microbial Genomes (IMG) system provides a framework for comparative analysis of the genomes sequenced by the Joint Genome Institute. IMG provides some very useful web-based comparative genomics tools.
  • The Ensembl Genomes project produces genome databases for important species from across the taxonomic range, using the Ensembl software system. Five new sites will be launched within the first half of 2009. Ensembl Bacteria, Ensembl Protists and Ensembl Metazoa are already available; Ensembl Plants and Ensembl Fungi will be launched later in spring 2009.

Microbiome or Infectious Disease Databases and Resources

  • NIH Roadmap has initiated the Human Microbiome Project (HMP) with the mission of generating resources enabling comprehensive characterization of the human microbiota and analysis of its role in human health and disease.
  • The primary mission of the BioHealthBase system is to assist scientific researchers in their development of vaccines, therapeutics, and diagnostics. The National Institute of Allergy and Infectious Disease (NIAID) Division of Microbiology and Infectious Diseases (DMID) recognizes the challenge posed by bioterrorism, the emergence of disease due to drug-resistant variants of etiologic organisms. DMID has envisioned a consortium of Bioinformatics Resource Centers (BRCs) for Biodefense and Emerging/Re-emerging Infectious Diseases that will provide information technology (IT) support for experimental studies of pathogenic organisms that could be used for biowarfare and bioterrorist activities, many of which also pose an ongoing threat to public health.

GeneSet Databases & Tools For Pathway or Network Analysis (see also miscellaneous bioinformatic tools - pathway modelling)

  • The NCBI BioSystems Database contains records from several source databases: KEGG, BioCyc, Reactome, the National Cancer Institute's Pathway Interaction Database, WikiPathways, and Gene Ontology (GO). The BioSystems database includes several types of records such as pathways, structural complexes, and functional sets. It may be used to interrogate lists or mine information relating to individual proteins. It should be stressed that the output is an ordered listing based on either the percentage or the frequency of proteins represented, NOT on significance.
    • List the components that are involved in a biological pathway
    • Find the pathways in which a given component is involved
    • Retrieve 3D structures for proteins involved in a biosystem
    • Find related biosystems that are linked to each other because they share an identical protein sequence or have another relationship
    • Input a list of GI identifiers and retrieve ranked list of biosystems or alternatively retrieve a ranked list of biosystems in which the differentially regulated genes from GEO are involved using FLink
  • Open source pathway creation and analysis software for arrays and multi-omic data. It can be run as a Java Web Start program or installed locally. As of July 2013, version 3.1.0 has been released
  • Graphite Web G. Sales, E. Calura, P. Martini, and C. Romualdi. "Graphite Web: web tool for gene set analysis exploiting pathway topology" Nucl. Acids Res. (1 July 2013) 41 (W1): W89-W97 doi:10.1093/nar/gkt386
  • The BioCyc Knowledge Library is a collection of Pathway/Genome Databases. Each database describes the genome and metabolic pathways of a single organism, with the exception of MetaCyc. BioCyc is proceeding toward its plan of transitioning to a subscription model in 2017.
    • EcoCyc is a bioinformatics database that describes the genome and the biochemical machinery of E. coli.
    • The MetaCyc metabolic pathway database contains pathways from over 150 different organisms; see the 2009 NAR publication
    • Other Pathway/Genome Databases Curated Pathway/Genome Databases for many other organisms have been created by various other groups and are available from the list
  • Kyoto Encyclopedia of Genes and Genomes, KEGG integrates the information about genes and proteins generated by genome sequencing, functional genomics and proteomics with metabolic pathways
  • The Reactome knowledgebase relies on collaborations with research biologists to construct expert consensus views of key biological processes
    • Skypainter is a tool to determine which events (reactions and/or pathways) are statistically overrepresented in a set of genes as specified by submitted list of identifiers. In other words, given a list of genes, Skypainter can identify common events for these genes
  • Database for Annotation, Visualization and Integrated Discovery (DAVID) is a web-based tool that provides integrated solutions for the annotation and analysis of genome-scale datasets derived from high-throughput technologies such as microarray and proteomic platforms. Analysis results and graphical displays remain dynamically linked to primary data and external data repositories, thereby furnishing in-depth as well as broad-based data coverage. The functionality provided by DAVID accelerates the analysis of genome-scale datasets by facilitating the transition from data collection to biological meaning.
  • Pathway Commons is a collection of publicly available pathways from multiple organisms. It provides researchers with convenient access to a comprehensive collection of pathways from multiple sources represented in a common language. Access is via a web portal for browsing, query and download. Pathways can be searched by key word and visualised using Cytoscape
  • INOH (Integrating Network Objects with Hierarchies) is a pathway database of model organisms including human, mouse, rat and others. In INOH, the term pathway refers to higher order functional knowledge such as relationships among multiple bio-molecules that constitute signal transduction pathways or biological events in general. As most of this knowledge resides in scientific articles, the database focuses on curating and encoding textual knowledge into a machine-processable form. The system contains a number of unique features to encode this type of knowledge. Biological terms such as protein names typically represent abstract, conceptual molecules that are used for unspecified organisms. Biologists interpret the name as a specific instance of a protein using background knowledge. These abstract names are collected from the literature and are organized into an ontology to annotate abstract objects in pathways. In addition, each term has links to databases such as SWISS-PROT and Gene Ontology (GO). A web-based ontology viewer and client program are available.
  • COMPADRE (COMponent Pathways Analysis and Differential expression REmoval) analyzes pathway activity indexes. It is implemented as a web-based tool running an R package in the background and is primarily used for detection of differentially expressed pathways. The COMPADRE R package is also available for use directly in R.
  • Pathway Interaction Database is a highly structured, curated collection of information about known biomolecular interactions and key cellular processes assembled into signaling pathways. It is a collaborative project between the US National Cancer Institute (NCI) and Nature Publishing Group (NPG), and is an open access online resource. There are several search options including a batch search
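The note above that BioSystems orders its output by percentage or frequency rather than by significance, and the description of Skypainter as finding statistically over-represented events, both turn on the same calculation. The following is a minimal sketch of a one-sided hypergeometric over-representation test, assuming a fixed background population of genes; the function names, pathway names and gene identifiers are hypothetical, and this is not claimed to be the method any of the listed tools actually uses.

```python
from math import comb


def hypergeom_pvalue(population, pathway_size, sample_size, overlap):
    """P(X >= overlap) when sample_size genes are drawn without replacement
    from a population containing pathway_size pathway members."""
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def rank_pathways(pathways, hits, population):
    """Return (name, fraction of pathway hit, p-value) sorted by p-value.

    pathways: dict mapping a pathway name to its member gene identifiers.
    hits: the submitted gene list. population: assumed background size.
    """
    hit_set = set(hits)
    rows = []
    for name, members in pathways.items():
        member_set = set(members)
        overlap = len(hit_set & member_set)
        p = hypergeom_pvalue(population, len(member_set), len(hit_set), overlap)
        rows.append((name, overlap / len(member_set), p))
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    # Hypothetical example: both pathways are 50% covered by the hit list,
    # yet their p-values differ by several orders of magnitude.
    hits = ["g1", "g2", "g3", "g4", "g5"]
    pathways = {
        "tiny": ["g1", "x1"],
        "mid": ["g1", "g2", "g3", "g4", "g5", "y1", "y2", "y3", "y4", "y5"],
    }
    for name, fraction, p in rank_pathways(pathways, hits, population=100):
        print(f"{name}: {fraction:.0%} of pathway hit, p = {p:.3g}")
```

A ranking by percentage alone cannot separate the two hypothetical pathways, whereas the p-value can. In practice a multiple-testing correction would also be applied across all pathways tested.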

Protein Resources

Databases of Biological Markers & Genetic Disorders

  • For SNP databases and GWAS studies see above
  • National Cancer Institute
  • OMIM -- Online: Mendelian Inheritance in Man: database catalog of human genes and genetic disorders
  • The IGMS is a comprehensive information system that combines the knowledge from genomic sequence, genetic map and genetic disorders databases.
  • PathDB: database on pathologically relevant mutated forms of transcription factors and transcription factor binding sites, held at BIOBASE
  • dbSTS is an NCBI resource that contains sequence and mapping data on short genomic landmark sequences or Sequence Tagged Sites
  • Cytokine Online Pathfinder Encyclopaedia (COPE) is an encyclopaedic dictionary of information relating to cytokines compiled by Horst Ibelgaufts
  • Cytokines Web Provides information about cytokines and their receptors.
  • Pathogen Host Interaction Database (PHI-base): this database contains curated molecular and biological information on genes with a published effect on the outcome of host-pathogen interactions. Information is also given on the target sites of some anti-infective chemistries.
  • The NCRI's Oncology Information Exchange, or ONIX, enables scientists and clinicians to find cancer-related data and information from genomics studies and clinical trials. ONIX allows users to search cancer-related databases and includes information about what research is being carried out by which researchers.

Other Functional and Motif based Secondary Databases

  • Small RNA database: small RNAs are broadly defined as the RNAs not directly involved in protein synthesis. Small RNAs are usually in the 75-400 nucleotide range, although some are as long as a thousand base pairs. They are synthesized by either RNA Polymerase I, II or III.
  • Ribosomal Database Project Online Analyses, The Ribosomal Database Project (RDP) provides ribosome related data services to the scientific community, including online data analysis, rRNA derived phylogenetic trees, and aligned and annotated rRNA sequences
  • SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya).
  • TransTerm - Translational Signal Database, a database of sequence contexts about the stop and start codons of many species found in GenBank. TransTerm also contains codon usage data for these same species and summary statistics for the sequences analysed.

  • Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. The Blocks database can be searched in a variety of ways
  • SMART allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. More than 500 domain families found in signalling, extracellular and chromatin-associated proteins are detectable. These domains are extensively annotated
  • InterPro is an integrated documentation resource for protein families, domains and sites. It combines a number of databases that use different methodologies and a varying degree of biological information on well-characterised proteins to derive protein signatures.
  • TRANSFAC is a database on eukaryotic cis-acting regulatory DNA elements and trans-acting factors at BIOBASE
  • PEDANT (Gene Prediction, Extraction, Description and Analysis Tool): genome analysis and annotation tool used by many of the automatically annotated databases. The site also lists computational analyses of complete genomic sequences as well as of experimental and unfinished genomic sequences using PEDANT
  • EXProt (database for EXPerimentally verified Protein functions) is a new non-redundant database containing protein sequences for which the function has been experimentally verified
  • The BioModels Database is a new effort to develop a data resource that will allow biologists to store, search and retrieve published mathematical models of biological interest. The models in the BioModels Database are annotated and linked to relevant data resources, such as publications, databases of compounds and pathways, and controlled vocabularies

PCR -Primer Design -siRNA

Resources for qPCR

  • RefFinder is an easy-to-use web-based tool to evaluate reference genes. It brings together the three most commonly used algorithms, implemented in geNorm, Normfinder and BestKeeper, together with the comparative delta delta Ct method, to compare and rank candidate reference genes. Based on the rankings from each program, it assigns a weighting to each gene and calculates the geometric mean of the weights for the overall ranking.
  • Real-Time PCR technical resource page : useful background on real-time systems maintained by M.Tevfik Dorak
  • Real-Time PCR technical tutorial produced by Margaret Hunt at University of South Carolina
  • GeneQuantification web pages, Technische Universität München: describe and summarise all technical aspects involved in quantitative gene expression analysis using real-time qPCR & qRT-PCR; has a useful listing of interesting papers
  • Eppendorf has a web-based calculator for simplifying your volume calculations. Nothing that can't be done in Excel, but it saves reinventing the wheel.
  • Online resources from Life Technologies, including application notes, web and video tutorials
  • Online resources from Integrated DNA Technologies: educational webinar page
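The overall ranking described for RefFinder above (a weight per gene from each method, combined by geometric mean) can be sketched as follows. This is an illustration only: it assumes the weight is simply the gene's rank position in each method's ordering, which the description above does not confirm, and the function names, method labels and gene names are hypothetical.

```python
from math import prod


def geometric_mean(values):
    """Geometric mean of a non-empty sequence of positive numbers."""
    return prod(values) ** (1.0 / len(values))


def overall_ranking(rankings):
    """Combine per-method rankings into one list, most stable gene first.

    rankings: dict mapping a method name to a list of gene names ordered from
    most to least stable. Every list must contain the same genes.
    """
    genes = set(next(iter(rankings.values())))
    scores = {}
    for gene in genes:
        # Assumed weighting: rank position (1 = most stable) in each method.
        weights = [order.index(gene) + 1 for order in rankings.values()]
        scores[gene] = geometric_mean(weights)
    # Lower score means more stable; the gene name breaks ties.
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    example = {
        "method_a": ["GAPDH", "ACTB", "B2M"],
        "method_b": ["ACTB", "GAPDH", "B2M"],
        "method_c": ["GAPDH", "B2M", "ACTB"],
    }
    for gene, score in overall_ranking(example):
        print(f"{gene}: {score:.2f}")
```

The geometric mean is less affected than the arithmetic mean by a single method that places a gene very low, which is one reason it suits combining rank lists.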

Collections of Real-Time PCR primer and probe sets

  • You can plot and compare spectra and check the spectral compatibility for many fluorophores offered by Molecular Probes. Similarly, another really good viewer is provided by Chroma Technology Corp
  • Lux Primer design and ordering service at Invitrogen for real-time/qPCR and standard PCR applications
  • Tm calculator for designing LDR probes, as a downloadable Excel template using the Nearest Neighbour method (NNM) (10-40 bp, Breslauer, K.J. (1986)) and the standard calculation of Meinkoth, J. and Wahl, G. (1984)
  • Diagnostic SNPCheck is a bioinformatic web application for discovery/checking for Allelic variation in primer sets launched by NGRL Manchester in 2005 for batch checking of oligonucleotide primers for SNPs. It uses the latest build of the human genome from NCBI, and BLAST to identify the position in the sequence where the primers bind. The contents of the current release of dbSNP is used to identify if there are any known SNPs at the primer binding sites.
  • WWW programs for primer design :
  • Primer Banks for Real-time PCR
    • A public database holding real-time PCR primers and probes for popular chemistries at the University of Gent: RTPrimerDB for gene expression and methPrimerDB for methylation studies
    • Real Time PCR Primer Sets: mainly SYBR Green probes listed
    • PrimerBank is a public resource for PCR primers. These primers are designed for gene expression detection or quantification (real-time PCR). PrimerBank contains about 180,000 primers covering most known human and mouse genes.
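For readers who want to see what the "standard calculation" behind the Tm calculator listed above looks like, here is a minimal sketch of the Meinkoth and Wahl (1984) formula, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 0.61(% formamide) - 500/L. The function names and default salt concentration are assumptions made for illustration, and the sketch is not taken from the Excel template itself. The nearest-neighbour method is not shown because it needs a table of thermodynamic parameters that is not reproduced here.

```python
from math import log10


def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def tm_meinkoth_wahl(seq, na_molar=0.05, formamide_percent=0.0):
    """Melting temperature in degrees C by the Meinkoth and Wahl formula.

    na_molar: monovalent cation concentration in mol/L (assumed default).
    formamide_percent: percentage of formamide in the buffer.
    """
    return (
        81.5
        + 16.6 * log10(na_molar)
        + 0.41 * gc_percent(seq)
        - 0.61 * formamide_percent
        - 500.0 / len(seq)
    )


if __name__ == "__main__":
    probe = "AGCTGACCTGAAGTTCATCTGC"  # hypothetical 22-mer
    print(f"GC content: {gc_percent(probe):.1f}%")
    print(f"Estimated Tm: {tm_meinkoth_wahl(probe):.1f} C")
```

The formula was derived for hybridisation of longer duplexes, so for short oligonucleotides a nearest-neighbour estimate is generally preferred and the two can differ noticeably.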

Next Generation Sequencing

  • SEQanswers is a next-gen sequencing MediaWiki. SEQanswers was founded to be an information resource and user-driven community focused on all aspects of next-generation genomics. Google map of known sequencing facilities in the UK.
  • Illumina's Sequencing Coverage Calculator: This calculator helps with determining the Illumina reagents and sequencing runs that are needed to arrive at the desired coverage for an experiment.
  • DUGSIM is a gateway to the Duke University Genome Sequencing & Analysis Core Resource. They have a useful on-line cost estimator for NGS
  • The Human OligoGenome Resource: a database of oligonucleotide capture probes for resequencing target regions across the human genome. Newburger DE, Natsoulis G, Grimes S, Bell JM, Davis RW, Batzoglou S, Ji HP. Nucleic Acids Res. 2011. "This online database contains over 21 million capture oligonucleotide sequences and enables selection of customized resequencing assays of target regions across the human genome."
  • EBI Metagenomics service is a beta version of an automated pipeline for the analysis and archiving of metagenomic data (archived in the SRA) that aims to provide insights into the functional and metabolic potential of a sample.
  • MLST Profiling from Genomic assemblies or reads
    • DTU MLST Server The web server at the Center for Genomic Epidemiology at the Danish Technical University, accepts both raw read files and assemblies.
    • BIGSdb: web server software that can be installed on your local PC or server, so not for the faint-hearted. It offers the ability to call MLST profiles from assembled genome data, as well as to run bespoke typing schemes.
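The kind of arithmetic a sequencing coverage calculator such as the one listed above performs can be illustrated with the standard relationship coverage = (number of reads x read length) / genome size. This is a minimal sketch under that assumption only; it is not Illumina's calculator, it ignores duplicates, unmapped reads and uneven coverage, and the function names are hypothetical.

```python
from math import ceil


def coverage(num_reads, read_length, genome_size):
    """Expected mean depth of coverage for single-end reads."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size, paired=False):
    """Number of reads (or read pairs, if paired=True) needed for a target depth."""
    bases_per_fragment = read_length * (2 if paired else 1)
    return ceil(target_coverage * genome_size / bases_per_fragment)


if __name__ == "__main__":
    genome = 5_000_000  # hypothetical 5 Mb bacterial genome
    print(coverage(1_000_000, 150, genome))             # depth from 1 M reads of 150 bp
    print(reads_needed(30, 150, genome))                # single-end reads for 30x
    print(reads_needed(30, 150, genome, paired=True))   # read pairs for 30x
```

Real runs usually need a margin above the figure this gives, since some reads fail quality filters or do not map.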

NGS toolbox

  • WGSA Whole Genome Sequence Analysis. A web application for the processing, clustering and exploration of microbial genome assemblies.
  • Microreact allows you to upload, visualise and explore dendrograms (trees) linked to metadata containing geographic locations.

siRNA/RNAi technology

  • Dharmacon Research, Inc. was founded in 1995 to develop and commercialize a new technology for RNA oligonucleotide synthesis. One of the company's main current interests is small inhibitory RNAs. Dharmacon web interface for siRNA design
  • Invitrogen's RNAi Block-iT RNAi Designer page. According to Invitrogen, "The designer uses a rational design scheme based on statistical analysis of multiple validated siRNA training sets and a proprietary algorithm to select unique target sequences that have a markedly improved probability of success in silencing the target gene"
  • MWG offers licensed custom siRNA and pre-designed siRNA synthesis with siMAX™ technology; they also provide a free web-based design tool.
  • See these companies: Akceli, Inc.; Alnylam Pharmaceuticals; Ambion, Inc.; BD Biosciences CLONTECH; Dharmacon, Inc.; Imgenex Corporation; Mirus Corporation; Promega Corporation; QIAGEN N.V.; Sequitur, Inc.; Sirna Therapeutics, Inc.
  • See the nucleic acid databases for links to miRBase and Argonaut

Chemical/Drug databases  

  • SMILES Home Page: SMILES is a simple yet comprehensive chemical nomenclature. This document contains links to all sorts of information about SMILES, including tutorials and software for generating structures, e.g. DEPICT and CORINA
  • ChemFinder: search by chemical name, CAS number, molecular formula or weight.
  • ChemDex: the Sheffield directory of chemistry on the WWW since 1993
  • PubChem is an open chemistry database at the National Institutes of Health (NIH); in 2018 the database had collated data for over 96 million compounds.

Miscellaneous Databases and Information resources

  • Links to Databases: from this page you can access an ever-increasing number of biological and related links. The number of entries in the database currently exceeds 2500.
  • MetaBase, the database of biological databases implemented using MediaWiki
  • genetools is a site providing a useful listing of bioinformatic tools; see the site map for a listing of databases, software utilities etc.
  • is an internet gateway portal designed to bring useful and interesting microbiology informational resources
  • Institut für Molekulare Biotechnologie Jena: entry page with a range of useful links

Glossaries of terms

  • 2can EBI Bioinformatics Educational Resources
    • an excellent searchable glossary of terms
    • tutorials
    • EBI's own listing of resources on the internet
  • NHGRI Talking Glossary: The Talking Glossary of Genetic Terms was introduced to help people without scientific backgrounds understand the terms and concepts used in genetic research
  • American Type Culture Collection Home Page: ATCC is a global nonprofit bioresource center that provides biological products and technical services
  • Cell Line Data Base: Servizio Biotecnologie, Istituto Nazionale per la Ricerca sul Cancro, Italy. CLDB, the first database set up within the Interlab Project, contains detailed information on 4,850 human and animal cell lines that are available in many Italian laboratories and in some of the most important European cell banks and cell culture collections.
  • CGSC: E. coli Genetic Stock Center. The CGSC Database of E. coli genetic information includes genotypes and reference information for the strains in the CGSC collection, gene names, properties, and linkage map, gene product information, and information on specific mutations.
  • Invitrogen in 2005 made available a series of tools to not-for-profit organisations, including Vector NTI, Vector Designer, OligoPerfect Designer and Lux™ Designer, along with other online tools such as clickable pathway maps linked to gene and protein information. iPath™ contains 225 interactive maps of biological signaling and metabolic pathways.

So I want to learn bioinformatics! Advice and guides for beginners.

  • Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data.

BioInformatic & Software Resources

Genome Browsers

  • Genome Maps: I. Medina, F. Salavert, R. Sanchez, A. de Maria, R. Alonso, P. Escobar, M. Bleda, and J. Dopazo. "Genome Maps, a new generation genome browser" Nucl. Acids Res. (2013) 41 (W1): W41-W46 doi:10.1093/nar/gkt530
  • NCBI Genome Workbench: display sequence data in different ways, includes ways to graphically view sequences, alignments, phylogenetic trees, and tabular views of data. It can also align private data to public databases, display your data in the context of public data, and retrieve BLAST results

Miscellaneous BioInformatics Tools

  • MSight, created by the Proteome Informatics Group, was specifically developed for the representation of mass spectra along with data from the separation step. The software allows graphical exploration inside huge datasets.
  • biosequence conversion tool is one of the miscellaneous tools available in the EBI toolbox; it converts sequences between different file formats
  • EZ-Retrieve: a web server for batch retrieval of coordinate-specified human DNA sequences and underscoring putative transcription factor-binding sites.
  • RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence currently will be masked by the program
  • Transcription binding site search tool and Viewer. An easy-to-use interface that can be installed and run from your desktop.
  • The Integrative Genomics Viewer (IGV) from the Broad Institute is a fast, flexible viewer for genomic data. IGV visually integrates datasets from various platforms and sources
  • Scriptome project tool box aims to provide experimental biologists with tools for exploring and manipulating biological data. The site provides Perl scripts which can be copied and pasted into an X Windows session to help bench biologists to "eyeball", filter, format, and analyze the many large files they get from those and other programs.
  • In silico experiments with complete bacterial genomes: includes in-silico PCR, Amplified Fragment Length Polymorphism PCR, restriction digestion and virtual Pulsed Field Gel Electrophoresis in 1.2% agarose on the products as well as other restriction enzyme based tools on 148 genomes
  • A new database of bioinformatics tools and databases was launched (2010) containing links to 1900 web based bioinformatics and software tools that are freely available to researchers
  • Tools, Databases: at IMB
  • Protein and DNA Motif searching
    • Gibbs Sampler The Gibbs Motif Sampler will allow you to identify motifs, conserved regions, in DNA or protein sequences. see Thompson W, Rouchka EC, and Lawrence CE. (2003) Gibbs Recursive Sampler: finding transcription factor binding sites. Nucleic Acids Res. 31(13):3580-3585.
  • Transcriptional Regulatory Element Database (TRED) has been built in response to increasing needs of an integrated repository for both cis- and trans- regulatory elements in mammals
  • Pairwise or Multiple sequence alignment
    • MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, inferring ancestral sequences, and testing evolutionary hypotheses.
    • ThermonucleotideBLAST is a downloadable source code for searching a target database of nucleic acid sequences using an assay specific query. ThermonucleotideBLAST queries are based on biochemical assays (i.e. a pair of oligonucleotide sequences representing PCR primers or Padlock probes, a triplet of oligos representing PCR primers and a TaqMan probe or a single oligo representing a hybridization probe). Unlike existing programs (i.e. BLAST) which use heuristic measures of sequence similarity for identifying matches between a query and target sequence, ThermonucleotideBLAST uses physically relevant measures of sequence similarity -- free energy and melting temperature.
    • ClustalW at Baylor College of Medicine US, ClustalW is also available from the EBI sequence ToolBox tab and elsewhere
    • webPRANK is an easy-to-use web interface to the PRANK alignment algorithm. It supports all the main features of the command-line program PRANK and includes a powerful alignment browser with features similar to those found in the graphical interface PRANKSTER. webPRANK can also upload and display pre-run and saved PRANK alignments.
    • BCM Search Launcher: Multiple Sequence Alignments.
    • MultAlin : Multiple sequence alignment with hierarchical clustering at INRA France
    • DIALIGN is a novel program for multiple alignment developed by Burkhard Morgenstern et al. It constructs pairwise and multiple alignments by comparing whole segments of the sequences. This approach is very efficient where sequences are not globally related but share only local similarities, as is the case with genomic DNA and with many protein families.
    • MGA allows a direct comparison of the genomic DNA sequences of sufficiently similar organisms by the use of anchored segments. An alignment of 74% of the complete genomes of three strains of E. coli (lengths: 5,528,445; 5,498,450; 4,639,221 bp) is produced in 30 minutes. Must be run locally.
    • Vmatch is a software tool for efficiently solving large-scale sequence matching tasks. It uses string comparisons and masking to make multiple alignments to reveal unique sequences across many genomes and is very fast. Unfortunately Vmatch must also be run locally.
  • Biological Pathway modelling
    • BioMiner-modeling, analyzing, and visualizing biochemical pathways and networks. Bioinformatics 2002
    • PathFinder: reconstruction and dynamic visualization of metabolic pathways. Bioinformatics 2002
    • WEGO (Web Gene Ontology Annotation Plot) is a useful tool for plotting GO annotation results for gene lists. See Ye J, Fang L, et al. Nucleic Acids Res., 2006, 34, 293-297 or as a PDF
    • BioCyc has a pathway viewer in which data can be overlaid on genome pathways of interest: use it through the Web or by installing it locally. To use it through the Web, go to the Omics Viewer page, which includes documentation on file formats, etc. BioCyc planned to transition to a subscription model in early 2017, after which free access would no longer be available.
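
The RepeatMasker entry in the list above notes that annotated repeats are masked by replacing them with Ns. A minimal sketch of that masking step alone is given here, assuming the repeat coordinates are already known. The function names and the 0-based, end-exclusive interval convention are assumptions made for this example and are not RepeatMasker's own interface.

```python
from typing import Iterable, Tuple


def mask_intervals(seq: str, intervals: Iterable[Tuple[int, int]]) -> str:
    """Replace every base inside the given intervals with 'N'.

    Intervals are (start, end) pairs, 0-based and end-exclusive (an assumption
    of this sketch); coordinates outside the sequence are clipped.
    """
    bases = list(seq)
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


def masked_fraction(masked: str) -> float:
    """Fraction of positions that are 'N' (includes any Ns already present)."""
    return masked.count("N") / len(masked) if masked else 0.0


masked = mask_intervals("ACGTACGTAC", [(2, 5), (8, 10)])
print(masked, masked_fraction(masked))
```

The masked fraction computed this way corresponds to the kind of summary figure quoted in the entry, where about half of a human genomic sequence ends up masked.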

Statistical Tools

  • VENNY is an easy-to-use interactive tool for comparing lists with Venn diagrams. See Oliveros, J.C. (2007) VENNY.
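
Comparing two lists with a Venn diagram amounts to working out which items fall in each region. The sketch below does this with plain set operations; the function name and the gene identifiers are hypothetical, and this is an illustration of the idea, not VENNY's own code.

```python
from typing import Dict, Iterable, Set


def compare_lists(a: Iterable[str], b: Iterable[str]) -> Dict[str, Set[str]]:
    """Split the items of two lists into the three regions of a two-set Venn diagram."""
    set_a, set_b = set(a), set(b)
    return {
        "only_a": set_a - set_b,  # items unique to the first list
        "only_b": set_b - set_a,  # items unique to the second list
        "both": set_a & set_b,    # items shared by both lists
    }


# Hypothetical gene lists used purely for illustration.
regions = compare_lists(["geneA", "geneB", "geneC"], ["geneB", "geneC", "geneD"])
for name, members in regions.items():
    print(name, sorted(members))
```

Duplicates within a list are collapsed because the comparison is made on sets, which is usually what is wanted for gene lists.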

Programming Languages



  • HTML 4.0 Specification
  • At W3Schools you will find all the Web-building tutorials you need,
    from basic HTML and XHTML to advanced XML, XSL, Multimedia and WAP.
  • Batfiles: The DOS batch file programming handbook & tutorial ©


Links pages

on-line journals/periodicals 

Journal Collections & Bibliographic databases (PubMed/BIDS)


  • The ISI Web of Knowledge Service for UK Education provides a single route to all the Thomson Scientific products subscribed to by your institution.
  • Science Direct: SGUL has a subscription; however, ScienceDirect also lists 60-plus free/complimentary journals
  • Scirus is the most comprehensive science-specific search engine available on the Internet, linking to more than 167 million indexed scientific pages and documents. With Scirus you can search through a variety of sources, such as Medline, ScienceDirect, BioMedCentral, preprint servers, patents and web sites relevant for your research.
  • Google Scholar is a search engine able to carry out "deep searches" and is extremely good at specifically finding literature, including peer-reviewed papers, theses, books, preprints, abstracts and technical reports from all broad areas of research.
  • SciGlobe Literature Search Engine run by SABiosciences specifically for the biomedical literature that uses the power of Conceptual Relationship Searching, a cutting-edge linguistic technology, to quickly find and deliver the most relevant search results.
  • PubMed
  • PubMed Central is a searchable digital archive of life sciences journal literature at the U.S. National Institutes of Health (NIH)
  • An excellent  extensive listing of free on-line medical journals
  • St George's Library pages & SGUL Journals. From May 15th 2006 there will be a new E-Journal only A-Z web list for SGUL users available here:
  • Wadsworth-Tuberculosis
  • Journal Impact Factors for 2002-2004; this link includes 2005 as an Excel file
  • On-line Science Educational Video and Webinar pages

    On-line medical/educational references (Please remember some care should be taken to cross validate all internet sources.)

    This is a custom search engine that will simultaneously search eMedicine, Department of Health, eMC medicines compendium, MedlinePlus and Radio 4 science directory and will open the results in a new window

    • Search engines
      • PubMed: this site includes PubMed Books, which is searched by selecting Books in the drop-down next to the search text field
      • Scirus is the most comprehensive science-specific search engine available on the Internet, linking to more than 167 million indexed scientific pages and documents. With Scirus you can search through a variety of sources, such as Medline, ScienceDirect, BioMedCentral, preprint servers, patents and web sites relevant for your research.
      • Google Scholar is a new beta search engine able to carry out "deep searches" and is extremely good at specifically finding literature, including peer-reviewed papers, theses, books, preprints, abstracts and technical reports from all broad areas of research.

    • National Library of Medicine: the Library collects materials in all areas of biomedicine and health care, as well as works on biomedical aspects of technology, the humanities, and the physical, life, and social sciences. It includes searchable information resources for the layman, such as MedlinePlus, and for health professionals.
    • MedlinePlus is one of the services provided by the US National Library of Medicine and the NIH. It provides searchable links to medical resources in the US; the dictionary is particularly useful for clarifying terminology and provides cross-indexing to the equivalent UK spelling and terminology.
    • National Electronic Library of Health Programme is working with NHS Libraries to develop a digital library for NHS staff, patients and the public; it provides a portal for health issues and information. An Athens password is required for some resources.
    • National Institute for Health and Clinical Excellence.
    • The General Medical Council
    • Welcome to the Department of Health Providing health and social care policy, guidance and publications.
      • Almost all current and many old DH publications, including statistical reports, surveys, press releases, circulars and legislation, are available in electronic form in this section. To find what you're looking for, type search terms into the publications library (by following the link to the right), or browse by category with the links below.
    • eMedicine has good searchable general resources for learning
    • Medcyclopaedia™: The Encyclopaedia of Medical Imaging's eight book volumes: Physics, Techniques and Procedures, Normal Anatomy, Musculoskeletal and Soft Tissue Imaging, Gastrointestinal and Urogenital Imaging, Chest and Cardiovascular Imaging, Neuroradiology and Head and Neck Imaging, and Paediatric Imaging. Access is free (copy text and images for non-commercial use provided that you refer to the source)
    • Is a radiological resource consisting of a huge collection of images including x-ray, MRI and CT along with a wealth of other material.
    • Wikipedia: a good general-information, Web-based, free-content encyclopedia. Please remember some care should be taken to cross validate internet sources.
    • GeneTests web pages include an entry on CF and many other genetic diseases
    • OMIM (Online Mendelian Inheritance in Man): database catalog of human genes and genetic disorders
    • ICD-10 WHO disease classification (ICD Version 2006 Searchable Online version)
    • was first launched in 1997 by PiP (Patient information Publications) as a partnership between two GPs in Tyne and Wear. Although it is now a joint venture between PiP and EMIS, the ethos has not changed, and this site aims to bring high-quality medical information to patients and doctors.
    • CKS NHS Clinical Knowledge Summaries (formerly PRODIGY) are a reliable source of evidence-based information and practical 'know how' about the common conditions managed in primary care. They are aimed at healthcare professionals working in primary and first-contact care.
    • The Cochrane Library is a collection of databases that contain high-quality, independent evidence to inform healthcare decision-making. Cochrane reviews represent the highest level of evidence on which to base clinical treatment decisions. In addition to Cochrane reviews, The Cochrane Library provides other sources of reliable information, from other systematic review abstracts, technology assessments, economic evaluations and individual clinical trials – all the current evidence in one single environment.
    • Map of Medicine ceased to be available London-wide but can be accessed directly: Direct Access to Map of Medicine. It contains evidence-based patient care pathways (decision trees). Click on the nodes; many have notes on definitions, incidence/prevalence and aetiology, and contain crosslinks to other databases
    • Best Practice produced by the BMJ, a single source combining the latest research evidence, guidelines and expert opinion – presented in a step-by-step approach, covering prevention, diagnosis, treatment and prognosis.
    • PathCAL is a set of student tutorials on clinical biology, pathology etc., aimed at those learning the basics of disease; it is a particularly good resource for BCS revision. PathCAL can be accessed both through IP address and institutional log-on.
    • Claire Aland, Senior Lecturer in Anatomy, has put together a page in Moodle for anatomy resources called "Human Anatomy Study Resource Centre"

    Library text book and journal resources for Medicine

    Medical Educational Video and Audio

    • The Clinical Skills Online (CSO) is a St George's project aimed at providing online videos demonstrating core clinical skills common to a wide range of medical and health-based courses
    • The BBC podcast home page for health and wellbeing. Past radio programs such as Radio 4 Case Notes are only available as a podcast (MP3 file) for a limited time but are available by streaming to an internet-capable device or computer.
    • MedPod101: free case-based medical podcasts. The "Flowcharts" page has an interesting selection of diagrams and flow charts covering Neurology, Endocrinology, Cardiology etc.
    • PodMedics produce a series of podcasts on various aspects; subscribe for free via iTunes or register through their web site. Unfortunately the web content is no longer free but has a nominal cost. However, this comes with an extra utility provided on the site, which allows simultaneous note taking and storage, as well as an upload and download facility.
    • University of Aberdeen MediCall Resources: a series of podcasts on various aspects; subscribe for free from iTunes or download via their web site. Other MedCal resources can be found at MedCal
    • Award-winning video on Anaemia produced by a University of Aberdeen student
    • PulseToday is a weekly medical publication circulated to GPs and other healthcare workers. The web site requires free registration but has some interesting articles and resources. Pulse launched a series of educational clinical skills videos in 2010, giving a practical demonstration of some procedures within primary care. The CPD portal has been extended with a range of new or updated modules (only a limited range of these are free).

    Medical Exam Resources

    • "Meducation is a community of over 20,000 medical professionals and students" sharing information, resources and ideas. This web site has a range of both free and subscription-only exam resources including EMQs, SBAs and MCQs. One of the attractive aspects of this is that a cut-price subscription is available for short periods, for example a month's duration.
    • OnExamination is part of the BMJ group, providing educational and learning resources for the medical profession; this is a subscription resource, so not free. OnExamination provides examination resources at all levels of the medical curriculum including both pre-clinical and clinical years.
    • PassMedicine: this is the third of the most commonly used exam resources. It is subdivided into different components, "Applied Knowledge Test" and "Medical student finals", both based on SBAs, and "Foundation Program SJT", but is not targeted by year.