The GEM library
|Date|News|
|---|---|
|28/10/2012|Second official binary pre-release (download it)|
|28/10/2012|The GEM mapper paper is out|
|19/01/2012|The GEM mappability paper is out|
|19/04/2010|Added officially supported gem-2-sam|
|10/02/2010|Added officially supported gem-mappability|
|23/06/2009|First official binary pre-release|
Next-generation sequencing platforms (Illumina/Solexa, ABI/SOLiD, etc.) call for powerful, highly optimized tools to index and analyze huge genomes. The GEM library strives to be a true "next-generation" tool for handling any kind of sequence data, offering state-of-the-art algorithms and data structures specifically tailored to this demanding task. At the moment, efficient indexing and searching algorithms based on the Burrows-Wheeler transform (BWT) have been implemented. The library core is written in C for maximum speed, with concise interfaces to higher-level programming languages such as OCaml and Python. Many high-performance standalone programs (mapper, splice mapper, etc.) are provided along with the library; more generally, new algorithms and tools can easily be implemented on top of it.
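To give a flavour of what BWT-based searching involves, here is a minimal Python sketch of the textbook FM-index backward search. It counts exact occurrences of a pattern using only the BWT string, a table `C` of character counts and a rank table `occ`. This illustrates the general technique only, not GEM's actual implementation or API: the function names are hypothetical, the BWT is built naively by sorting rotations, and the text is assumed to end with a unique `$` sentinel that sorts before all other characters.

```python
def bwt(text):
    """Burrows-Wheeler transform by naively sorting all rotations.

    Assumes text ends with a unique '$' sentinel that sorts before every
    other character (true for ASCII letters). Real indexers build the BWT
    from a suffix array instead of materializing rotations.
    """
    assert text.endswith("$") and text.count("$") == 1
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def fm_tables(bwt_str):
    """Build the C table and the (dense) occurrence table for backward search."""
    alphabet = sorted(set(bwt_str))
    C, total = {}, 0  # C[c] = number of text characters strictly smaller than c
    for c in alphabet:
        C[c] = total
        total += bwt_str.count(c)
    # occ[c][i] = number of occurrences of c in bwt_str[:i]
    occ = {c: [0] * (len(bwt_str) + 1) for c in alphabet}
    for i, ch in enumerate(bwt_str):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ


def count_occurrences(pattern, bwt_str):
    """Count exact occurrences of pattern by FM-index backward search."""
    C, occ = fm_tables(bwt_str)
    # Half-open range of sorted rotations prefixed by the pattern suffix read so far
    lo, hi = 0, len(bwt_str)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo, hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count_occurrences("ana", bwt("banana$"))` returns 2, since the two occurrences overlap. Practical FM-index implementations typically replace the dense `occ` table, which stores one integer per position per character, with sampled or compressed rank structures.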
The GEM project started at the CRG in June 2008. Since fall 2008, early versions of the GEM tools have been in everyday use, supporting many different scientific projects that involve mapping DNA/RNA data, reconstructing RNA splice-form abundances, SNP calling, microRNA analysis, ChIP-seq experiments, metagenomics studies, and other tasks related to next-generation sequencing. Since April 2010, the GEM project has been developed at the CNAG by the Algorithm Development unit.
The GEM system is composed of several parts:
- The core libraries (written in C and Objective Caml)
- The header files, providing bindings to the library through C and Objective Caml (an experimental Python interface also exists)
- Standalone executables, the main ones so far being:
  - gem-retriever: given a GEM index, retrieves its content starting from a specified position
  - gem-mappability-retriever: given a GEM mappability file, retrieves its content starting from a specified position
  - gem-map-2-map: a program to pipeline GEM mappers and post-process files in GEM alignment format
  - GEMTools: a C library to process files in GEM alignment format (under development)
- Converters to foreign formats:
- Additional information (pre-generated indices, pre-generated mappability tracks, etc.).
All these components are modular, so you can install them either all together as a bundle or a few at a time.
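A tool like gem-retriever, which returns the indexed text from a given position, can in principle work from the BWT alone. The Python sketch below shows one standard FM-index way to do this, assuming an index that stores the BWT plus a sampled map from text positions to BWT rows: start at the nearest sample at or after the end of the requested region and walk the LF mapping backwards. It illustrates the general technique and does not describe GEM's index format; `bwt`, `lf_mapping`, `sample_rows` and `extract` are hypothetical names, and the sampling step is an arbitrary parameter.

```python
def bwt(text):
    """Naive BWT by sorting rotations; text must end with a unique '$'."""
    return "".join(r[-1] for r in sorted(text[i:] + text[:i] for i in range(len(text))))


def lf_mapping(bwt_str):
    """LF[i] = row whose rotation starts one text position before row i's."""
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    first, total = {}, 0  # first[c] = first row whose rotation starts with c
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    seen, lf = {}, []
    for ch in bwt_str:
        lf.append(first[ch] + seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1
    return lf


def sample_rows(bwt_str, lf, step):
    """Map every step-th text position (plus the '$' position) to its BWT row.

    Assumption: a real index would take these samples from the suffix array
    at build time; here they are recovered with one full LF walk.
    """
    n = len(bwt_str)
    row, pos = 0, n - 1  # row 0 is the rotation starting at '$' (position n-1)
    samples = {pos: row}
    while pos > 0:
        row, pos = lf[row], pos - 1
        if pos % step == 0:
            samples[pos] = row
    return samples


def extract(bwt_str, lf, samples, start, length):
    """Return text[start:start+length] without reconstructing the whole text."""
    end = start + length
    if start < 0 or length < 0 or end > len(bwt_str) - 1:
        raise ValueError("region must lie within the text (excluding '$')")
    pos = min(p for p in samples if p >= end)  # nearest sample at or after end
    row, out = samples[pos], []
    while pos > start:
        if pos - 1 < end:
            out.append(bwt_str[row])  # last char of this rotation is text[pos-1]
        row, pos = lf[row], pos - 1
    return "".join(reversed(out))
```

With `b = bwt("banana$")`, `lf = lf_mapping(b)` and `samples = sample_rows(b, lf, 2)`, the call `extract(b, lf, samples, 1, 3)` returns `"ana"`. A smaller sampling step makes extraction faster at the cost of more memory.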
Temporary note (updated 27/10/2012)
Please note that not all the components are available at the moment. In particular:
Other GEM-friendly tools
If you are happy with GEM, you might also like some related projects:
- MIRO, a pipeline to analyze microRNAs using next-generation sequencing data
- The Flux Capacitor, a set of tools to predict the abundance of splice-forms from next-generation sequencing data.
Both projects fully integrate the gem-mapper, using it as the engine of their mapping stage.
At the moment, only a pre-compiled binary pre-release is available.
Please read the installation instructions carefully (making sure that your architecture is supported and that you select the most optimized bundle available for it), and then proceed to the download page.
Most GEM programs are distributed under a dual-licensing scheme: they are free for non-commercial use, but a license is required for commercial applications. In practice, you are very welcome to use or redistribute GEM freely for any purpose, with one exception: you must ask us for a commercial license if you want to
- build commercial software (a data analysis framework, pipeline, etc.) on top of GEM (using GEM in either binary or source form)
- execute GEM programs or code from within any commercial software (data analysis framework, pipeline, etc.).
All other cases (using GEM as standalone software or embedded in non-commercial applications, and redistributing GEM for free, either as standalone software or embedded in non-commercial applications) do not require a special license from us. In particular:
- GEM is free for academic non-commercial use
- you can always use the results you obtained with GEM for any purpose, even a commercial one.
At the moment we are not yet ready to distribute the GEM sources (a major code cleanup is ongoing), but once we are, we will do so under a similar dual-licensing scheme (a GPL-like license for non-commercial use, and a personalized licensing scheme otherwise).
Several people have contributed code to GEM over the years. They are, with their respective funding institutions:
- Paolo Ribeca (CRG, 2008–2010 & CNAG, 2010–2012)
- Santiago Marco Sola (CNAG, 2010–2012)
- Leonor Frias Moya (CNAG, 2010–2012)
If you are interested in being kept informed about the development of GEM, please subscribe to our announcement group.
Finally, if you have any questions or problems, please feel free to contact us directly.