The GEM library
| Date | News |
|---|---|
| 05/04/2013 | Third binary pre-release (download it, change log) |
| 28/10/2012 | Second binary pre-release (download it) |
| 28/10/2012 | The GEM mapper paper is out |
| 19/01/2012 | The GEM mappability paper is out |
| 19/04/2010 | Added officially supported gem-2-sam |
| 10/02/2010 | Added gem-mappability |
| 23/06/2009 | First binary pre-release |
Next-generation sequencing platforms (Illumina/Solexa, ABI/SOLiD, etc.) call for powerful, highly optimized tools to index and analyze huge genomes. The GEM library strives to be a true "next-generation" tool for handling any kind of sequence data, offering state-of-the-art algorithms and data structures specifically tailored to this demanding task. At the moment, efficient indexing and searching algorithms based on the Burrows-Wheeler transform (BWT) have been implemented. The library core is written in C for maximum speed, with concise interfaces to higher-level programming languages such as OCaml and Python. Many high-performance standalone programs (mapper, splice mapper, etc.) are provided along with the library; in general, new algorithms and tools can easily be implemented on top of it.
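To give a feel for how BWT-based searching works, here is a minimal Python sketch. It is not GEM code and not its API: the names `bwt`, `FMIndex` and `count` are hypothetical. It builds the BWT naively by sorting all rotations of the text (with a `$` sentinel that sorts before the letters) and counts exact occurrences of a pattern by FM-index backward search. Production indexers use far more space-efficient construction and rank structures and also support approximate matching.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, for illustration)."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


class FMIndex:
    """Minimal FM-index supporting exact-match counting by backward search."""

    def __init__(self, text: str) -> None:
        self.bwt = bwt(text)
        # Character frequencies in the BWT (same multiset as text + sentinel).
        counts = {}
        for ch in self.bwt:
            counts[ch] = counts.get(ch, 0) + 1
        # C[c] = number of characters strictly smaller than c.
        self.C = {}
        total = 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]
        # occ[c][i] = number of occurrences of c in bwt[:i].
        self.occ = {ch: [0] * (len(self.bwt) + 1) for ch in counts}
        for i, ch in enumerate(self.bwt):
            for c in self.occ:
                self.occ[c][i + 1] = self.occ[c][i] + (1 if c == ch else 0)

    def count(self, pattern: str) -> int:
        """Return the number of occurrences of pattern in the indexed text."""
        lo, hi = 0, len(self.bwt)  # current sorted-suffix interval [lo, hi)
        for ch in reversed(pattern):  # extend the match one character leftwards
            if ch not in self.C:
                return 0
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo
```

For example, `bwt("banana")` is `"annb$aa"`, and `FMIndex("banana").count("ana")` returns 2. Each search step only needs table lookups, so the cost of a query depends on the pattern length rather than on the size of the text.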
The GEM project started at the CRG in June 2008. Since fall 2008, early versions of the GEM tools have been in everyday use, supporting many different scientific projects involving mapping of DNA/RNA data, reconstruction of RNA splice-form abundances, SNP calling, microRNA analysis, ChIP-seq experiments, metagenomics studies, and other tasks related to next-generation sequencing. Since April 2010, the GEM project has been developed at the CNAG by the Algorithm Development unit.
Overview
Temporary note (updated 05/04/2013): please note that not all the components are available at the moment.
The GEM system is composed of several parts:
- The core libraries (written in C and Objective Caml)
- The header files, providing bindings to the library through C and Objective Caml (an experimental Python interface also exists)
- Standalone executables, the main ones so far being:
  - gem-indexer, to create a GEM index from a FASTA file
  - gem-mapper, the GEM paired-end mapper
  - gem-rna-mapper
  - gem-mappability, a program to compute the mappability/alignability of a reference
  - Retrievers/tools:
    - gem-retriever: given a GEM index, retrieves its content starting from a specified position
    - gem-mappability-retriever: given a GEM mappability file, retrieves its content starting from a specified position
    - gem-2-gem, a program to pipeline GEM mappers and post-process files in GEM alignment format
  - Converters to foreign formats:
- Additional resources (pre-generated indices, pre-generated mappability tracks, etc.).
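The idea behind the mappability computed by gem-mappability can be illustrated with a deliberately simplified Python sketch: for each reference position, count how many times the k-mer starting there occurs anywhere in the reference, and report 1/frequency (so 1.0 means the k-mer is unique). This is a toy under stated assumptions, not the tool's algorithm: it counts exact forward-strand matches only and uses a hash table of k-mers instead of a BWT index, whereas a realistic computation would also need to handle mismatches and the reverse strand. The function name `mappability` is hypothetical.

```python
from collections import Counter
from typing import List


def mappability(reference: str, k: int) -> List[float]:
    """Exact-match, forward-strand k-mer mappability (hypothetical helper).

    Returns 1 / (number of occurrences of the k-mer starting at each position).
    Positions where no full k-mer fits are omitted.
    """
    if k <= 0 or k > len(reference):
        return []
    # Enumerate every (overlapping) k-mer of the reference.
    kmers = [reference[i:i + k] for i in range(len(reference) - k + 1)]
    # Count how often each distinct k-mer occurs.
    freq = Counter(kmers)
    return [1.0 / freq[kmer] for kmer in kmers]
```

For example, `mappability("ACGTACGTTT", 4)` gives 0.5 at positions 0 and 4 (the repeated k-mer `ACGT`) and 1.0 at every other position.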
All these components are modular, so you can install them either all together as a bundle or a few at a time.
As a bonus, the GEM distributions also contain a few external tools originating from friend projects:
- gemtools from the GEMTools library (see the GEMTools library website), a powerful set of high-level pipelines that greatly simplifies the use of the GEM programs. Using gemtools, one can index references and/or map several kinds of data from a simple command-line interface, without having to type complicated commands. In particular, gemtools contains a fast and accurate pipeline for mapping RNA-sequencing data.
Getting started
Installing
At the moment, only a pre-compiled binary pre-release is available.
Please read the installation instructions carefully (making sure that your architecture is supported and that you select the most optimized bundle available for it), and then proceed to the download page.
Documentation
The information needed to use the GEM tools can be obtained from both the technical documents (user's guide, man pages, etc.) and the scientific ones (pre-prints, published articles, etc.).
Terms of use
Most GEM programs are distributed under a dual-licensing scheme: they are free for non-commercial use, but a license is required for commercial applications. In practice, you are very welcome to use or redistribute GEM freely for any purpose, with the following limitation: you must ask us for a commercial license if you want to
- build commercial software (data analysis framework, pipeline, etc.) on top of GEM (using GEM in either binary or source form)
- execute GEM programs or code from within any commercial software (data analysis framework, pipeline, etc.).
All other cases (using GEM either as a standalone software or embedded into non-commercial applications, and redistributing GEM for free either as a standalone software or embedded into non-commercial applications) do not require a special license from us. In particular:
- GEM is free for academic non-commercial use
- you can always use the results you obtained with GEM for any purpose, even a commercial one.
For more details about the free use of pre-compiled binaries, you can read the GEM non-commercial binary license. In case of doubt, contact us.
At the moment we are not yet ready to distribute the GEM sources (a major code cleanup is ongoing), but once we are, we will do so under a similar dual-licensing scheme (a GPL-like license for non-commercial use, and a personalized licensing scheme otherwise).
Authors
Several people have contributed code to GEM over the years. They are, with their respective funding institutions:
- Paolo Ribeca (CRG, 2008–2010 & CNAG, 2010–2013)
- Santiago Marco Sola (CNAG, 2010–2013)
- Leonor Frias Moya (CNAG, 2010–2013)
- Thasso Griebel (CNAG, 2012–2013)
Contact
We very much value your input about our tools. You can report a bug or suggest a new feature. Alternatively, you can visit our discussion group and create or join a discussion thread.
If you are interested in being kept informed about the development of GEM, please subscribe to our announcement group.
Finally, if you have any doubts or problems, please feel free to contact us directly.
