Heapsort: Difference between revisions

Heapsort
	A run of the heapsort algorithm sorting an array of randomly permuted values. In the first stage of the algorithm array elements are reordered to satisfy the heap property. Before the actual sorting takes place, the heap tree structure is shown briefly for illustration.A run of the heapsort algorithm sorting an array of randomly permuted values. In the first stage of the algorithm the array elements are reordered to satisfy the heap property. Before the actual sorting takes place, the heap tree structure is shown briefly for illustration.
Class	Sorting algorithm
Data structure	Array
Worst-case performance
Average performance
Worst-case space complexity	total, auxiliary
Optimal	Never

Browse history interactively

← Previous edit Next edit →

Content deleted Content added

VisualWikitext

Inline

Revision as of 19:55, 17 March 2009

Heapsort (method) is a comparison-based sorting algorithm, and is part of the selection sort family. Although somewhat slower in practice on most machines than a good implementation of quicksort, it has the advantage of a worst-case Θ(n log n) runtime. Heapsort is an in-place algorithm, but is not a stable sort.

Overview

The heap sort works as its name suggests - it begins by building a heap out of the data set, and then removing the largest item and placing it at the end of the sorted array. After removing the largest item, it reconstructs the heap and removes the largest remaining item and places it in the next open position from the end of the sorted array. This is repeated until there are no items left in the heap and the sorted array is full. Elementary implementations require two arrays - one to hold the heap and the other to hold the sorted elements. ^[1]

Heapsort inserts the input list elements into a heap data structure. The largest value (in a max-heap) or the smallest value (in a min-heap) are extracted until none remain, the values having been extracted in sorted order. The heap's invariant is preserved after each extraction, so the only cost is that of extraction.

During extraction, the only space required is that needed to store the heap. In order to achieve constant space overhead, the heap is stored in the part of the input array that has not yet been sorted. (The structure of this heap is described at Binary heap: Heap implementation.)

Heapsort uses two heap operations: insertion and root deletion. Each extraction places an element in the last empty location of the array. The remaining prefix of the array stores the unsorted elements.

Variations

The most important variation to the simple variant is an improvement by R. W. Floyd which gives in practice about 25% speed improvement by using only one comparison in each siftup run which then needs to be followed by a siftdown for the original child; moreover it is more elegant to formulate. Heapsort's natural way of indexing works on indices from 1 up to the number of items. Therefore the start address of the data should be shifted such that this logic can be implemented avoiding unnecessary +/- 1 offsets in the coded algorithm.

Ternary heapsort ^[2] uses a ternary heap instead of a binary heap; that is, each element in the heap has three children. It is more complicated to program, but does a constant number of times fewer swap and comparison operations. This is because each step in the shift operation of a ternary heap requires three comparisons and one swap, whereas in a binary heap two comparisons and one swap are required. The ternary heap does two steps in less time than the binary heap requires for three steps, which multiplies the index by a factor of 9 instead of the factor 8 of three binary steps. Ternary heapsort is about 12% faster than the simple variant of binary heapsort.^{[citation needed]}

The smoothsort algorithm [1] [2] is a variation of heapsort developed by Edsger Dijkstra in 1981. Like heapsort, smoothsort's upper bound is O(n log n). The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state. Due to its complexity, smoothsort is rarely used.

Comparison with other sorts

Heapsort primarily competes with quicksort, another very efficient general purpose nearly-in-place comparison-based sort algorithm.

Quicksort is typically somewhat faster, due to better cache behavior and other factors, but the worst-case running time for quicksort is O(n²), which is unacceptable for large data sets and can be deliberately triggered given enough knowledge of the implementation, creating a security risk. See quicksort for a detailed discussion of this problem, and possible solutions.

Thus, because of the O(n log n) upper bound on heapsort's running time and constant upper bound on its auxiliary storage, embedded systems with real-time constraints or systems concerned with security often use heapsort.

Heapsort also competes with merge sort, which has the same time bounds, but requires Ω(n) auxiliary space, whereas heapsort requires only a constant amount. Heapsort also typically runs more quickly in practice on machines with small or slow data caches. On the other hand, merge sort has several advantages over heapsort:

Like quicksort, merge sort on arrays has considerably better data cache performance, often outperforming heapsort on a modern desktop PC, because it accesses the elements in order.
Merge sort is a stable sort.
Merge sort parallelizes better; the most trivial way of parallelizing merge sort achieves close to linear speedup, while there is no obvious way to parallelize heapsort at all.
Merge sort can be easily adapted to operate on linked lists and very large lists stored on slow-to-access media such as disk storage or network attached storage. Heapsort relies strongly on random access, and its poor locality of reference makes it very slow on media with long access times.

An interesting alternative to Heapsort is Introsort which combines quicksort and heapsort to retain advantages of both: worst case speed of heapsort and average speed of quicksort.

Pseudocode

The following is the "simple" way to implement the algorithm, in pseudocode, where swap is used to swap two elements of the array. Notice that the arrays are zero based in this example. Template:Heapsort CODE BLOCK 1 The heapify function can be thought of as building a heap from the bottom up, successively shifting downward to establish the heap property. An alternative version (shown below) that builds the heap top-down and shifts upward is conceptually simpler to grasp. This "siftUp" version can be visualized as starting with an empty heap and successively inserting elements. However, it is asymptotically slower: the "siftDown" version is O(n), and the "siftUp" version is O(n log n) in the worst case. The heapsort algorithm is O(n log n) overall using either version of heapify. Template:Heapsort CODE BLOCK 2

References

J. W. J. Williams. Algorithm 232 - Heapsort, 1964, Communications of the ACM 7(6): 347–348.
Robert W. Floyd. Algorithm 245 - Treesort 3, 1964, Communications of the ACM 7(12): 701.
Svante Carlsson, Average-case results on heapsort, 1987, BIT 27(1): 2-17.
Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0. Pages 144–155 of section 5.2.3: Sorting by Selection.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Chapters 6 and 7 Respectively: Heapsort and Priority Queues
A PDF of Dijkstra's original paper on Smoothsort
Heaps and Heapsort Tutorial by David Carlson, St. Vincent College
Heaps of Knowledge

Notes

^ http://linux.wku.edu/~lamonml/algor/sort/heap.html
^ "Data Structures Using Pascal", Tenenbaum & Augenstein, 1991, page 405, gives Ternary Heapsort as an exercise for the student. "Write a sorting routine similar to the heapsort except that it uses a ternary heap."

External links

Animated Sorting Algorithms: Heap Sort – graphical demonstration and discussion of heap sort
Analyze Heapsort in an online Javascript IDE
Heapsort
Heapsort animated
NIST's Dictionary of Algorithms and Data Structures: Heapsort
Sorting revisited
Java-based demonstration of Heapsort
A colored graphical Java applet which allows experimentation with initial state and shows statistics

[1] ttp://linux.wku.edu/~lamonml/algor/sort/heap.html

[2] "Data Structures Using Pascal", Tenenbaum & Augenstein, 1991, page 405, gives Ternary Heapsort as an exercise for the student. "Write a sorting routine similar to the heapsort except that it uses a ternary heap."

[1]

[2]

@@ Line 4: / Line 4: @@
 |data=[[Array]]
 |time=<math>O(nLogn)</math>
+|average-time=<math>\Theta(n\log n)</math>
+|best-time=
 |space=<math>O(n)</math> total, <math>O(1)</math> auxiliary
 |optimal=Never

v t e Sorting algorithms
Theory	Computational complexity theory Big O notation Total order Lists Inplacement Stability Comparison sort Adaptive sort Sorting network Integer sorting X + Y sorting Transdichotomous model Quantum sort
Exchange sorts	Bubble sort Cocktail shaker sort Odd–even sort Comb sort Gnome sort Proportion extend sort Quicksort
Selection sorts	Selection sort Heapsort Smoothsort Cartesian tree sort Tournament sort Cycle sort Weak-heap sort
Insertion sorts	Insertion sort Shellsort Splaysort Tree sort Library sort Patience sorting
Merge sorts	Merge sort Cascade merge sort Oscillating merge sort Polyphase merge sort
Distribution sorts	American flag sort Bead sort Bucket sort Burstsort Counting sort Interpolation sort Pigeonhole sort Proxmap sort Radix sort Flashsort
Concurrent sorts	Bitonic sorter Batcher odd–even mergesort Pairwise sorting network Samplesort
Hybrid sorts	Block merge sort Kirkpatrick–Reisch sort Timsort Introsort Spreadsort Merge-insertion sort
Other	Topological sorting Pre-topological order Pancake sorting Spaghetti sort
Impractical sorts	Stooge sort Slowsort Bogosort