MUMmer: Difference between revisions

Content deleted Content added

Inline

Latest revision as of 23:13, 21 January 2024

MUMmer is a bioinformatics software system for sequence alignment. It is based on the suffix tree data structure. It has been used for comparing different genomes assemblies to one another, which allows scientists to determine how a genome has changed. The acronym "MUMmer" comes from "Maximal Unique Matches", or MUMs.

The original algorithms in the MUMMER software package were designed by Art Delcher, Simon Kasif and Steven Salzberg. Mummer was the first whole genome comparison system developed in Bioinformatics. It was originally applied to the comparison of two related strains of bacteria.

The MUMmer software is open source. The system is maintained primarily by Steven Salzberg and Arthur Delcher at Center for Computational Biology at Johns Hopkins University.

MUMmer is a highly cited bioinformatics system in the scientific literature. According to Google Scholar, as of early 2013 the original MUMmer paper (Delcher et al., 1999)^[1] has been cited 691 times; the MUMmer 2 paper (Delcher et al., 2002)^[2] has been cited 455 times; and the MUMmer 3.0 article (Kurtz et al., 2004)^[3] has been cited 903 times.

Übersicht

Mummer is a fast algorithm used for the rapid alignment of entire genomes. The MUMmer algorithm is relatively new and has 4 versions.

Versions of MUMmers

MUMmer1

MUMmer1 or just MUMmer consists of three parts, the first part consists of the creation of suffix trees (to get MUMs), the second part in the longest increasing subsequence or longest common subsequences (to order MUMs), lastly any alignment to close gaps.

Interruptions between MUMs-alignment, are known as gaps. Otherther alignment algorithms fill these gaps. The gaps fall in the following four classes:^[4]

An SNPinterruption – when comparing two sequences, one character will differ.
An insertion – when comparing two sequences, there is a subsequence in only appears in one of the sequences. It would be an empty gap in the other sequence at the moment of comparison of the two sequences.
A highly polymorphic region – when comparing two sequences, there can be found a subsequence in which every single character differs.
A repeat – it’s the repetition of a sequence. Since MUMs can only take unique sequences, that gap can be one repetition of one of the MUMs.

MUMmer 2

This algorithm was redesigned to require less memory and increase speed and accuracy. It also allows for bigger genomes alignment.

The improvement was the amount stored in the suffix trees by employing the one created by Kurtz.

MUMmer 3

According to Stefan Kurtz and his teammates, “the most significant technical improvement in MUMmer 3.0, is a complete rewrite of the suffix-tree code, based on the compact suffix- tree representation of” ^[5] the tree described in the article “Reducing the space requirement of suffix trees”.^[6]

MUMmer 4

According to Guillaume and his team, there are some extra improvements in the implementation and also innovation with Query parallelism. “MUMmer4 now includes options to save and load the suffix array for a given reference."^[7] This allows the suffix tree can be built once and constructed again after running it from the saved suffix tree.

Software - Open Source

MUMmer has open-source software and can be accessed online.

Related Sequence Alignments

There are other types of sequence alignments:

Edit distance
BLAST
Bowtie
BWA
Blat
Mauve
LASTZ
BLAST

References

^ Delcher, A. L.; Kasif, S.; Fleischmann, R. D.; Peterson, J.; White, O.; Salzberg, S. L. (1999). "Alignment of whole genomes". Nucleic Acids Research. 27 (11): 2369–2376. doi:10.1093/nar/27.11.2369. PMC 148804. PMID 10325427.
^ Delcher, A. L.; Phillippy, A.; Carlton, J.; Salzberg, S. L. (2002). "Fast algorithms for large-scale genome alignment and comparison". Nucleic Acids Research. 30 (11): 2478–2483. doi:10.1093/nar/30.11.2478. PMC 117189. PMID 12034836.
^ Delcher, A.; Harmon, D.; Kasif, S.; White, O.; Salzberg, S. (1999). "Improved microbial gene identification with GLIMMER". Nucleic Acids Research. 27 (23): 4636–4641. doi:10.1093/nar/27.23.4636. PMC 148753. PMID 10556321.
^ Delcher, A.; Kasif, S.; Fleischmann, R.; Peterson, J.; White, O.; Salzberg, S. (1999). "Alignment of Whole Genomes". Nucleic Acids Research. 27 (11): 2369–2376. doi:10.1093/nar/27.23.4636. PMC 148804. PMID 10325427.
^ Kurtz, S.; Phillippy, A.; Delcher, A.; Smoot, M.; Shumway, M.; Antonescu, C.; Salzberg, S. (2004). "Versatile and open software for comparing large genomes" (PDF). Genome Biology. 5 (2): R12. doi:10.1186/gb-2004-5-2-r12. PMC 395750. PMID 14759262. Archived (PDF) from the original on 2019-07-11. Retrieved 2021-05-06.
^ Kurtz, S. (1999). "Reducing the Space Requirement of Suffix Trees". Software: Practice and Experience. 29 (13): 1149–1171. doi:10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O. Archived from the original on 2021-05-06. Retrieved 2021-05-06.
^ Marçais, Guillaume.; Pillippy, A.; Delcher, A.; Coston, R.; Salzberg, S.; Zimin, A. (2018). "MUMmer4: A fast and versatile genome alignment system". PLOS Computational Biology. 14 (1): e1005944. Bibcode:2018PLSCB..14E5944M. doi:10.1371/journal.pcbi.1005944. PMC 5802927. PMID 29373581.

External links

[Delcher-1] Delcher, A. L.; Kasif, S.; Fleischmann, R. D.; Peterson, J.; White, O.; Salzberg, S. L. (1999). "Alignment of whole genomes". Nucleic Acids Research. 27 (11): 2369–2376. doi:10.1093/nar/27.11.2369. PMC 148804. PMID 10325427.

[Delcher2002-2] Delcher, A. L.; Phillippy, A.; Carlton, J.; Salzberg, S. L. (2002). "Fast algorithms for large-scale genome alignment and comparison". Nucleic Acids Research. 30 (11): 2478–2483. doi:10.1093/nar/30.11.2478. PMC 117189. PMID 12034836.

[Kurtz-3] Delcher, A.; Harmon, D.; Kasif, S.; White, O.; Salzberg, S. (1999). "Improved microbial gene identification with GLIMMER". Nucleic Acids Research. 27 (23): 4636–4641. doi:10.1093/nar/27.23.4636. PMC 148753. PMID 10556321.

[Arthur-4] Delcher, A.; Kasif, S.; Fleischmann, R.; Peterson, J.; White, O.; Salzberg, S. (1999). "Alignment of Whole Genomes". Nucleic Acids Research. 27 (11): 2369–2376. doi:10.1093/nar/27.23.4636. PMC 148804. PMID 10325427.

[Stefan-5] Kurtz, S.; Phillippy, A.; Delcher, A.; Smoot, M.; Shumway, M.; Antonescu, C.; Salzberg, S. (2004). "Versatile and open software for comparing large genomes" (PDF). Genome Biology. 5 (2): R12. doi:10.1186/gb-2004-5-2-r12. PMC 395750. PMID 14759262. Archived (PDF) from the original on 2019-07-11. Retrieved 2021-05-06.

[6] Kurtz, S. (1999). "Reducing the Space Requirement of Suffix Trees". Software: Practice and Experience. 29 (13): 1149–1171. doi:10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O. Archived from the original on 2021-05-06. Retrieved 2021-05-06.

[7] Marçais, Guillaume.; Pillippy, A.; Delcher, A.; Coston, R.; Salzberg, S.; Zimin, A. (2018). "MUMmer4: A fast and versatile genome alignment system". PLOS Computational Biology. 14 (1): e1005944. Bibcode:2018PLSCB..14E5944M. doi:10.1371/journal.pcbi.1005944. PMC 5802927. PMID 29373581.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

@@ Line 1: / Line 1: @@
-'''MUMmer''' is a [[bioinformatics]] software system for [[sequence alignment]]. It is based on the [[suffix tree]] data structure and is one of the fastest and most efficient systems available for this task, enabling it to be applied to very long sequences.  It has been widely used for comparing different genomes to one another.  The acronym "MUMmer" comes from "Maximal Unique Matches", or MUMs.
+'''MUMmer''' is a [[bioinformatics]] software system for [[sequence alignment]]. It is based on the [[suffix tree]] data structure. It has been  used for comparing different genomes assemblies to one another, which allows scientists to determine how a genome has changed. The acronym "MUMmer" comes from "Maximal Unique Matches", or MUMs.
-The MUMmer software is open source and can be found at the MUMmer home page.  The home page also has links to technical papers describing the system.  The system is maintained primarily by the bioinformatics group at [[Center for Bioinformatics and Computational Biology]] at the [[University of Maryland, College Park]].
+The original algorithms in the MUMMER software package were designed by Art Delcher, Simon Kasif and Steven Salzberg. Mummer was the first whole genome comparison system developed in Bioinformatics.  It was originally applied to the comparison of two related strains of bacteria.
-== External links ==
+The MUMmer software is [[Open-source software|open source]]. The system is maintained primarily by [[Steven Salzberg]] and Arthur Delcher at [[Center for Computational Biology]] at [[Johns Hopkins University]].
+MUMmer is a highly cited bioinformatics system in the scientific literature. According to Google Scholar, as of early 2013 the original MUMmer paper (Delcher et al., 1999)<ref name=Delcher>{{Cite journal
+| doi = 10.1093/nar/27.11.2369
+| last1 = Delcher | first1 = A. L.
+| last2 = Kasif | first2 = S.
+| last3 = Fleischmann | first3 = R. D.
+| last4 = Peterson | first4 = J.
+| last5 = White | first5 = O.
+| last6 = Salzberg | first6 = S. L.
+| title = Alignment of whole genomes
+| journal = Nucleic Acids Research
+| volume = 27
+| issue = 11
+| pages = 2369–2376
+| year = 1999
+| pmid = 10325427
+| pmc = 148804
+}}</ref> has been cited 691 times; the MUMmer 2 paper (Delcher et al., 2002)<ref name=Delcher2002>{{Cite journal
+| doi = 10.1093/nar/30.11.2478
+| last1 = Delcher | first1 = A. L.
+| last2 = Phillippy | first2 = A.
+| last3 = Carlton | first3 = J.
+| last4 = Salzberg | first4 = S. L.
+| title = Fast algorithms for large-scale genome alignment and comparison
+| journal = Nucleic Acids Research
+| volume = 30
+| issue = 11
+| pages = 2478–2483
+| year = 2002
+| pmid = 12034836
+| pmc = 117189
+}}</ref> has been cited 455 times; and the MUMmer 3.0 article (Kurtz et al., 2004)<ref name=Kurtz>{{Cite journal
+| last1 = Delcher | first1 = A.
+| last2 = Harmon | first2 = D.
+| last3 = Kasif | first3 = S.
+| last4 = White | first4 = O.
+| last5 = Salzberg | first5 = S.
+| title = Improved microbial gene identification with GLIMMER
+| journal = Nucleic Acids Research
+| volume = 27
+| issue = 23
+| pages = 4636–4641
+| year = 1999
+| pmid = 10556321
+| pmc = 148753 | doi=10.1093/nar/27.23.4636
+}}</ref> has been cited 903 times.
+==Overview==
+Mummer is a fast algorithm used for the rapid alignment of entire genomes. The MUMmer algorithm is relatively new and has 4 versions.
+==Versions of MUMmers==
+===MUMmer1===
+MUMmer1 or just MUMmer consists of three parts, the first part consists of the creation of suffix trees (to get MUMs), the second part in the longest increasing subsequence or longest common subsequences (to order MUMs), lastly any alignment to close gaps.
+Interruptions between MUMs-alignment, are known as gaps. Otherther alignment algorithms fill these gaps. The gaps fall in the following four classes:<ref name="Arthur">{{Cite journal |last1=Delcher |first1=A. |last2=Kasif |first2=S. |last3=Fleischmann |first3=R. |last4=Peterson |first4=J. |last5=White |first5=O. |last6=Salzberg |first6=S. |year=1999 |title=Alignment of Whole Genomes |journal=Nucleic Acids Research |volume=27 |issue=11 |pages=2369–2376 |doi=10.1093/nar/27.23.4636 |pmc=148804 |pmid=10325427 |doi-access=free}}</ref>
+*An [[Single-nucleotide polymorphism|SNP]]<nowiki/>interruption – when comparing two sequences, one character will differ.
+*An insertion – when comparing two sequences, there is a subsequence in only appears in one of the sequences. It would be an empty gap in the other sequence at the moment of comparison of the two sequences.
+*A highly polymorphic region – when comparing two sequences, there can be found a subsequence in which every single character differs.
+*A repeat – it’s the repetition of a sequence. Since MUMs can only take unique sequences, that gap can be one repetition of one of the MUMs.
+===MUMmer 2===
+This algorithm was redesigned to require less memory and increase speed and accuracy. It also allows for bigger genomes alignment.
+The improvement was the amount stored in the suffix trees by employing the one created by Kurtz.
+===MUMmer 3===
+According to Stefan Kurtz and his teammates, “the most significant technical improvement in MUMmer 3.0, is a complete rewrite of the suffix-tree code, based on the compact suffix- tree representation of” <ref name=Stefan>{{Cite journal
+| last1 = Kurtz
+| first1 = S.
+| last2 = Phillippy
+| first2 = A.
+| last3 = Delcher
+| first3 = A.
+| last4 = Smoot
+| first4 = M.
+| last5 = Shumway
+| first5 = M.
+| last6 = Antonescu
+| first6 = C.
+| last7 = Salzberg
+| first7 = S.
+| title = Versatile and open software for comparing large genomes
+| journal = Genome Biology
+| url = http://gensoft.pasteur.fr/docs/MUMmer/3.22/MUMmer3.pdf
+| year = 2004
+| volume = 5
+| issue = 2
+| pages = R12
+| doi = 10.1186/gb-2004-5-2-r12
+| pmid = 14759262
+| pmc = 395750
+| doi-access = free
+| access-date = 2021-05-06
+| archive-date = 2019-07-11
+| archive-url = https://web.archive.org/web/20190711082811/http://gensoft.pasteur.fr/docs/MUMmer/3.22/MUMmer3.pdf
+| url-status = live
+}}</ref>
+the tree described in the article “Reducing the space requirement of suffix trees”.<ref >{{Cite journal
+| last1 = Kurtz
+| first1 = S.
+| title = Reducing the Space Requirement of Suffix Trees
+| journal = Software: Practice and Experience
+| url = https://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-024X(199911)29:13%3C1149::AID-SPE274%3E3.0.CO;2-O
+| year = 1999
+| volume = 29
+| issue = 13
+| pages = 1149–1171
+| doi = 10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O
+| access-date = 2021-05-06
+| archive-date = 2021-05-06
+| archive-url = https://web.archive.org/web/20210506065646/https://onlinelibrary.wiley.com/doi/10.1002/%28SICI%291097-024X%28199911%2929%3A13%3C1149%3A%3AAID-SPE274%3E3.0.CO%3B2-O
+| url-status = live
+}}</ref>
+===MUMmer 4===
+According to Guillaume and his team, there are some extra improvements in the implementation and also innovation with Query parallelism.  “MUMmer4 now includes options to save and load the suffix array for a given reference."<ref >{{Cite journal
+| last1 = Marçais| first1 = Guillaume.
+| last2 = Pillippy | first2 = A.
+| last3 = Delcher| first3 = A.
+| last4 = Coston | first4 = R.
+| last5 = Salzberg| first5 = S.
+| last6 = Zimin | first6 = A.
+| title = MUMmer4: A fast and versatile genome alignment system
+| journal = PLOS Computational Biology
+| year = 2018
+| volume = 14
+| issue = 1
+| pages = e1005944
+| doi=10.1371/journal.pcbi.1005944
+| pmid = 29373581
+| pmc = 5802927
+| bibcode = 2018PLSCB..14E5944M
+| doi-access = free
+}}</ref> This allows the suffix tree can be built once and constructed again after running it from the saved suffix tree.
+==Software - Open Source==
+MUMmer has [[open-source software]] and can be accessed online.
+==Related Sequence Alignments==
+There are other types of sequence alignments:
+*Edit distance
+*BLAST
+*Bowtie
+*BWA
+*Blat
+*Mauve
+*LASTZ
+*BLAST
+==References==
+{{reflist}}
+== External links ==
 * [http://mummer.sourceforge.net MUMmer home page]
+* [https://academic.oup.com/nar/article/30/11/2478/1024948?login=true MUMmer2]
+* [https://www.amazon.com/Algorithms-Bioinformatics-Introduction-Mathematical-Computational/dp/1420070339 Book]
+* [https://www.youtube.com/watch?v=gsL6TFiqAx0 MUMmer]
+* [http://mummer.sourceforge.net/manual/ Software]
+* [http://mummer.sourceforge.net/ MUMmer3]
+* [https://academic.oup.com/nar/article/27/11/2369/1258501?login=true MUMmer1]
+* [https://mummer4.github.io/ MUMmer4]
 [[Category:Bioinformatics software]]