The GEM library

From Algorithm Development Wiki
Revision as of 04:47, 8 November 2012 by Paolo Ribeca

(Also home to: The GEM mapper, The GEM splice mapper, and others)



28/10/2012 Second official binary pre-release
28/10/2012 The GEM mapper paper is out
19/01/2012 The GEM mappability paper is out
19/04/2010 Added officially supported gem-2-sam
10/02/2010 Added officially supported gem-mappability
23/06/2009 First official binary pre-release

Next-generation sequencing platforms (Illumina/Solexa, ABI/SOLiD, etc.) call for powerful, highly optimized tools to index and analyze huge genomes. The GEM library strives to be a true "next-generation" tool for handling any kind of sequence data, offering state-of-the-art algorithms and data structures specifically tailored to this demanding task. At the moment, efficient indexing and searching algorithms based on the Burrows-Wheeler transform (BWT) have been implemented. The library core is written in C for maximum speed, with concise interfaces to higher-level programming languages like OCaml and Python. Many high-performance standalone programs (mapper, splice mapper, etc.) are provided along with the library; in general, new algorithms and tools can easily be implemented on top of it.
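To give a flavour of what "BWT-based indexing and searching" means, here is a minimal Python sketch of the general technique: the BWT is built from a naive suffix sort, and exact matches are counted with FM-index backward search. This is not GEM's implementation or API; the function names `bwt`, `build_fm_index` and `count_occurrences` are hypothetical, and real indexers use linear-time construction and compressed, sampled rank tables rather than the quadratic sort and full tables used here.

```python
# Illustrative sketch only: BWT construction plus FM-index backward search.
# NOT GEM's code or API; all names here are hypothetical.

def bwt(text: str) -> str:
    """Burrows-Wheeler transform of text + '$' via a naive suffix sort."""
    s = text + "$"  # '$' sentinel sorts before 'A', 'C', 'G', 'T' in ASCII
    # Sorting suffix start positions lexicographically yields the suffix array.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # Each BWT character is the one preceding its suffix (s[-1] is '$' for i == 0).
    return "".join(s[i - 1] for i in sa)


def build_fm_index(b: str):
    """Precompute C[c] (count of chars < c) and occ[c][i] (count of c in b[:i])."""
    alphabet = sorted(set(b))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += b.count(c)
    occ = {c: [0] * (len(b) + 1) for c in alphabet}
    for i, ch in enumerate(b):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ


def count_occurrences(pattern: str, b: str, C, occ) -> int:
    """Backward search: number of exact occurrences of pattern in the text."""
    lo, hi = 0, len(b)  # half-open range of sorted suffixes matching so far
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    b = bwt("ACGTACGTAC")
    C, occ = build_fm_index(b)
    print(count_occurrences("ACG", b, C, occ))  # prints 2
```

Each search step costs a constant number of table lookups, so counting matches takes time proportional to the pattern length, independent of genome size. Reporting match positions would additionally need (sampled) suffix-array entries, which this sketch omits.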

The GEM project started at the CRG in June 2008. Since fall 2008, early versions of the GEM tools have been in everyday use to support many different scientific projects involving mapping of DNA/RNA data, reconstruction of RNA splice-form abundances, SNP calling, microRNA analysis, ChIP-seq experiments, metagenomics studies, and other tasks related to next-generation sequencing. Since April 2010, the GEM project has been developed at the CNAG by the Algorithm Development unit.



The GEM system is composed of several parts:

All these components are modular, so you can install them either all together as a bundle or a few at a time.

Temporary note (updated 27/10/2012)

Please note that not all the components are available at the moment. In particular:

  • the subpackages which cannot be distributed as binaries will be uploaded once the ongoing code review is complete and the GEM source code is ready for distribution
  • the port to Mac is currently broken. We are working to resurrect it as soon as possible
  • the split mapper and the mappability tools are still being migrated to the new mapping engine. The mappability tools will come back online shortly.

Other GEM-friendly tools

If you are happy with GEM, you might also like some related projects:

  • MIRO, a pipeline to analyze microRNAs using next-generation sequencing data
  • The Flux Capacitor, a set of tools to predict the abundance of splice-forms from next-generation sequencing data.

These projects are fully integrated with the gem-mapper, which they can use as the engine of their mapping stage.

Getting started


At the moment, only a pre-compiled binary pre-release is available.

Please read the installation instructions carefully (making sure that your architecture is supported and that you select the most optimized bundle available for it), and then proceed to the download page.


The information needed to use the GEM tools can be obtained from both the technical documents (user's guide, man pages, etc.) and the scientific ones (pre-prints, published articles, etc.).

Terms of use

Most GEM programs are distributed under a double-licensing scheme: they are free for non-commercial use, but a license is required for commercial applications. In practice, you are very welcome to freely use or redistribute GEM for any purpose, except for the following limitation: you have to ask us for a commercial license if you want to

  • build commercial software (data analysis framework, pipeline, etc.) on top of GEM (using GEM either in binary or source form)
  • execute GEM programs or code from inside any commercial software (data analysis framework, pipeline, etc.).

All other cases (using GEM either as standalone software or embedded into non-commercial applications, and redistributing GEM for free either as standalone software or embedded into non-commercial applications) do not require a special license from us. In particular:

  • GEM is free for academic non-commercial use
  • you can always use the results you obtained with GEM for any purpose, even a commercial one.

For more details about the free use of pre-compiled binaries, you can read the GEM non-commercial binary license. In case of doubt, contact us.

At the moment we are not yet ready to distribute the sources of GEM (a major code cleanup is ongoing), but once we are, we will do so under a similar double-licensing scheme (a GPL-like license for non-commercial use, and a personalized licensing scheme otherwise).


Several people have contributed code to GEM over the years. They are, with their respective funding institutions:


We very much value your input about our tools. You can report a bug or suggest a new feature. Alternatively, you can visit our discussion group and create or join a discussion thread.

If you are interested in being kept informed about the development of GEM, please subscribe to our announcement group.

Finally, in case of any doubt or problem please feel free to directly contact us.
