The GEM library

(Also home to: The GEM mapper, The GEM RNA mapper, The GEM mappability, and others)


05/04/2013 Third binary pre-release (download it, change log)
28/10/2012 Second binary pre-release (download it)
28/10/2012 The GEM mapper paper is out
19/01/2012 The GEM mappability paper is out
19/04/2010 Added officially supported gem-2-sam
10/02/2010 Added gem-mappability
23/06/2009 First binary pre-release


Next-generation sequencing platforms (Illumina/Solexa, ABI/SOLiD, etc.) call for powerful, highly optimized tools to index and analyze huge genomes. The GEM library strives to be a true "next-generation" tool for handling any kind of sequence data, offering state-of-the-art algorithms and data structures specifically tailored to this demanding task. At the moment, efficient indexing and searching algorithms based on the Burrows-Wheeler transform (BWT) have been implemented. The library core is written in C for maximum speed, with concise interfaces to higher-level programming languages like OCaml and Python. Many high-performance standalone programs (mapper, splice mapper, etc.) are provided along with the library; in general, new algorithms and tools can be easily implemented on top of it.
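
For readers unfamiliar with BWT-based indexing, the general idea can be illustrated with a small textbook sketch in Python. This is not GEM code and does not reflect GEM's internal data structures or interfaces; the function names build_index and find are hypothetical, chosen only for this illustration, and the naive construction used here would not scale to real genomes.

```python
def build_index(text, sentinel="$"):
    """Build a toy BWT-based index for `text` (illustrative only, not GEM's)."""
    text += sentinel  # the sentinel must sort before every other character
    # Suffix array by plain sorting: far too slow for genomes, fine for a demo.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT = character preceding each sorted suffix (wraps around for suffix 0).
    bwt = "".join(text[i - 1] for i in sa)
    # first[c]: number of characters in the text strictly smaller than c.
    first, total = {}, 0
    for ch in sorted(set(bwt)):
        first[ch] = total
        total += bwt.count(ch)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] for ch in first}
    for ch in bwt:
        for c, column in occ.items():
            column.append(column[-1] + (c == ch))
    return {"sa": sa, "bwt": bwt, "first": first, "occ": occ}


def find(index, pattern):
    """Return the sorted start positions of `pattern`, via backward search."""
    lo, hi = 0, len(index["bwt"])  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in index["first"]:
            return []
        # Narrow the interval to suffixes that start with ch + (matched part).
        lo = index["first"][ch] + index["occ"][ch][lo]
        hi = index["first"][ch] + index["occ"][ch][hi]
        if lo >= hi:
            return []
    return sorted(index["sa"][lo:hi])


index = build_index("banana")
print(index["bwt"])        # annb$aa
print(find(index, "ana"))  # [1, 3]
```

The search consumes the pattern from right to left and performs one table lookup per character, so its cost depends on the pattern length rather than on the size of the indexed text. That property is what makes this family of indexes attractive for mapping short reads against large references.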

The GEM project started at the CRG in June 2008. Since fall 2008, early versions of the GEM tools have been in everyday use in many different scientific projects involving mapping of DNA/RNA data, reconstruction of RNA splice-form abundances, SNP calling, microRNA analysis, ChIP-seq experiments, metagenomics studies, and other tasks related to next-generation sequencing. Since April 2010, the GEM project has been developed at the CNAG by the Algorithm Development unit.



Temporary note (updated 05/04/2013)

Please note that not all the components are available at the moment. In particular:

  • the subpackages that cannot be distributed as binaries will be uploaded once the ongoing code review is complete and the GEM source code is ready for distribution
  • the port to Mac is currently broken. We are working to resurrect it as soon as possible.

The GEM system is composed of several parts:

All these components are modular, so you can install them either all together as a bundle or a few at a time.

As a bonus, the GEM distributions also contain a few external tools originating from friend projects:

  • gemtools from the GEMTools library (see the GEMTools library website), a powerful set of high-level pipelines that greatly simplify the use of the GEM programs. With gemtools one can index references and/or map several kinds of data from a simple command-line interface, without having to type complicated commands. In particular, gemtools contains a fast and accurate pipeline for mapping RNA-sequencing data.

Getting started

Quick start



At the moment, only a pre-compiled binary pre-release is available.

Please read the installation instructions carefully (making sure that your architecture is supported and that you select the most optimized bundle available for it), and then proceed to the download page.


The information needed to use the GEM tools can be obtained from both the technical documents (user's guide, man pages, etc.) and the scientific ones (pre-prints, published articles, etc.).

Terms of use

Most GEM programs are distributed under a dual-licensing scheme: they are free for non-commercial use, but a license is required for commercial applications. In practice, you are very welcome to freely use or redistribute GEM for any purpose, with one limitation: you have to ask us for a commercial license if you want to

  • build commercial software (data analysis framework, pipeline, etc.) on top of GEM (using GEM either in binary or source form)
  • execute GEM programs or code from inside any commercial software (data analysis framework, pipeline, etc.).

All other cases (using GEM either as standalone software or embedded in non-commercial applications, and redistributing GEM for free in either form) do not require a special license from us. In particular:

  • GEM is free for academic non-commercial use
  • you can always use the results you obtained with GEM for any purpose, even a commercial one.

For more details about the free use of pre-compiled binaries, you can read the GEM non-commercial binary license. In case of doubt, contact us.

At the moment we are not yet ready to distribute the sources of GEM (a major code cleanup is ongoing), but once we are, we will do so under a similar dual-licensing scheme (a GPL-like license for non-commercial use, and a personalized licensing scheme otherwise).


Several people have contributed code to GEM over the years. They are, with their respective funding institutions:


We very much value your input about our tools. You can report a bug or suggest a new feature, or visit our discussion group to create or join a discussion thread.

If you are interested in being kept informed about the development of GEM, please subscribe to our announcement group.

Finally, in case of any doubt or problem, please feel free to contact us directly.
