User:ParAlgMergeSort/sandbox/Merge algorithm

Parallel merge

A parallel version of the binary merge algorithm can serve as a building block of a parallel merge sort. The following pseudocode demonstrates this algorithm in a parallel divide-and-conquer style (adapted from Cormen et al.^[1]^: 800). It operates on two sorted arrays $A$ and $B$ and writes the sorted output to array $C$. The notation A[i...j] denotes the part of $A$ from index $i$ up to, but not including, index $j$.

algorithm merge(A[i...j], B[k...ℓ], C[p...q]) is
    inputs A, B, C : array
           i, j, k, ℓ, p, q : indices

    let m_A = j - i,
        m_B = ℓ - k

    if m_A < m_B then
        swap A and B  // ensure that A is the larger array: i, j still belong to A; k, ℓ to B
        swap m_A and m_B

    if m_A ≤ 0 then
        return  // base case, nothing to merge

    let r = ⌊(i + j)/2⌋
    let s = binary-search(A[r], B[k...ℓ])
    let t = p + (r - i) + (s - k)
    C[t] = A[r]

    in parallel do
        merge(A[i...r], B[k...s], C[p...t])
        merge(A[r+1...j], B[s...ℓ], C[t+1...q])

The algorithm operates by splitting either $A$ or $B$, whichever is larger, into (nearly) equal halves. It then splits the other array into a part with values smaller than the midpoint of the first, and a part with larger or equal values. (The binary search subroutine returns the index in $B$ where $A[r]$ would be if it were in $B$; this is always a number between $k$ and $ℓ$.) Finally, each pair of halves is merged recursively, and since the recursive calls are independent of each other, they can be done in parallel. A hybrid approach, in which a serial algorithm is used for the recursion base case, has been shown to perform well in practice.^[2]^
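The recursion described above can be sketched in Python as follows. This is a minimal, sequential sketch rather than a reference implementation: the names `serial_merge` and `CUTOFF` and the threshold value 4 are assumptions introduced here for illustration, and the two recursive calls are simply run one after the other where a parallel runtime would run them concurrently. Index ranges are half-open, as in the pseudocode.

```python
from bisect import bisect_left

CUTOFF = 4  # hypothetical threshold: at or below this size, merge serially


def serial_merge(A, i, j, B, k, l, C, p):
    """Two-pointer merge of sorted A[i:j] and B[k:l] into C, starting at index p."""
    while i < j and k < l:
        if B[k] < A[i]:
            C[p] = B[k]
            k += 1
        else:
            C[p] = A[i]
            i += 1
        p += 1
    # Copy whichever tail remains (at most one of the two is non-empty).
    C[p:p + (j - i)] = A[i:j]
    p += j - i
    C[p:p + (l - k)] = B[k:l]


def merge(A, i, j, B, k, l, C, p):
    """Divide-and-conquer merge of sorted A[i:j] and B[k:l] into C from index p."""
    m_a, m_b = j - i, l - k
    if m_a < m_b:
        # Ensure A is the larger range; the indices travel with their array.
        A, i, j, B, k, l = B, k, l, A, i, j
        m_a, m_b = m_b, m_a
    if m_a <= 0:
        return  # both ranges are empty
    if m_a + m_b <= CUTOFF:
        serial_merge(A, i, j, B, k, l, C, p)  # hybrid base case
        return
    r = (i + j) // 2                # midpoint of the larger range
    s = bisect_left(B, A[r], k, l)  # position A[r] would take in B[k:l]
    t = p + (r - i) + (s - k)       # final position of A[r] in C
    C[t] = A[r]
    # These two calls touch disjoint parts of A, B and C, so they are
    # independent; here they run sequentially.
    merge(A, i, r, B, k, s, C, p)
    merge(A, r + 1, j, B, s, l, C, t + 1)


odds = [1, 3, 5, 7, 9, 11]
evens = [2, 4, 6, 8, 10]
merged = [None] * (len(odds) + len(evens))
merge(odds, 0, len(odds), evens, 0, len(evens), merged, 0)
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

In this sketch the caller allocates the output list, since each recursive call writes into a disjoint slice of the same $C$.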

The work performed by the algorithm for two arrays holding a total of $n$ elements, i.e., the running time of a serial version of it, is $O (n)$. This is optimal since $n$ elements need to be copied into $C$. To calculate the span of the algorithm, it is necessary to derive a recurrence relation. Since the two recursive calls of merge run in parallel, only the costlier of the two calls needs to be considered. In the worst case, one recursive call receives at most ${\textstyle {\frac {3}{4}}n}$ elements: it gets at most half of the larger array, which holds at least ${\textstyle {\frac {1}{2}}n}$ elements, plus possibly all of the smaller array. Adding the $\Theta \left(\log(n)\right)$ cost of the binary search, we obtain this recurrence as an upper bound:

$T_{\infty }^{\text{merge}}(n)=T_{\infty }^{\text{merge}}\left({\frac {3}{4}}n\right)+\Theta \left(\log(n)\right)$

The solution is $T_{\infty }^{\text{merge}}(n)=\Theta \left(\log(n)^{2}\right)$, meaning that it takes that much time on an ideal machine with an unbounded number of processors.^[1]^: 801–802
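As a brief sketch of why the recurrence has this solution, the recurrence can be unrolled level by level. The symbol $L=\log _{4/3}n$ for the recursion depth is notation introduced here, not taken from the cited text. At depth $i$ the subproblem holds at most $\left({\tfrac {3}{4}}\right)^{i}n$ elements, and $\log \left(\left({\tfrac {3}{4}}\right)^{i}n\right)=(L-i)\log {\tfrac {4}{3}}$, so

$T_{\infty }^{\text{merge}}(n)=\sum _{i=0}^{L}\Theta \left(\log \left(\left({\tfrac {3}{4}}\right)^{i}n\right)\right)=\Theta \left(\log {\tfrac {4}{3}}\sum _{i=0}^{L}(L-i)\right)=\Theta \left({\tfrac {L(L+1)}{2}}\right)=\Theta \left(\log(n)^{2}\right)$

The last step uses $L=\Theta \left(\log(n)\right)$.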

Note: The routine is not stable: if equal items are separated by splitting $A$ and $B$, they will become interleaved in $C$. Swapping $A$ and $B$ also destroys the order if equal items are spread across both input arrays. As a result, when used for sorting, this algorithm produces a sort that is not stable.
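The loss of stability can be observed on a very small input. The following sketch transcribes the pseudocode into Python with an explicit sort key; the `key` parameter, the `lower_bound` helper, and the letter tags marking which input array each item came from are assumptions made for this illustration only.

```python
def lower_bound(B, k, l, x_key, key):
    """Smallest index s in [k, l] such that every item of B[k:s] has key < x_key."""
    lo, hi = k, l
    while lo < hi:
        mid = (lo + hi) // 2
        if key(B[mid]) < x_key:
            lo = mid + 1
        else:
            hi = mid
    return lo


def merge(A, i, j, B, k, l, C, p, key):
    """Direct transcription of the pseudocode, comparing items by key(item)."""
    m_a, m_b = j - i, l - k
    if m_a < m_b:
        # Swap so that A is the larger range; this is one source of instability.
        A, i, j, B, k, l = B, k, l, A, i, j
        m_a, m_b = m_b, m_a
    if m_a <= 0:
        return
    r = (i + j) // 2
    s = lower_bound(B, k, l, key(A[r]), key)
    t = p + (r - i) + (s - k)
    C[t] = A[r]
    merge(A, i, r, B, k, s, C, p, key)
    merge(A, r + 1, j, B, s, l, C, t + 1, key)


first = [(1, "a")]              # items tagged "a" come from the first input
second = [(1, "b"), (2, "b")]   # items tagged "b" come from the second input
out = [None] * 3
merge(first, 0, 1, second, 0, 2, out, 0, key=lambda item: item[0])
print(out)  # [(1, 'b'), (1, 'a'), (2, 'b')]
```

A stable merge would place `(1, 'a')` before `(1, 'b')`. Here the second input is the larger one, so it takes the role of $A$ after the swap, and its item with key 1 is written ahead of the equal item from the first input.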

[1] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4.
[2] Duvanenko, Victor J. (2011). "Parallel Merge". Dr. Dobb's Journal.
