Sorting Algorithms

This chapter deals exclusively with classic sorting algorithms. The theory will be presented and applied to integer arrays (the algorithms are generalizable to any type and ordering criterion).

Problem Statement

Consider an integer array of size N whose elements we want to sort in ascending order. A value may appear multiple times in the array. Thus:

11	-2	1515	42	2048	28	11	-78

becomes, once sorted:

-78	-2	11	11	28	42	1515	2048

We assume the existence of a swap function, echange (see previous chapter), that swaps the cells at indices i and j of an array t.
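The swap function itself is defined in the previous chapter and is not reproduced here. As a reminder, a minimal sketch compatible with the prototype `void echange (int t[], int, int)` used in this chapter could look like the following; the parameter names `i` and `j` and the local `tmp` are assumptions, and the original definition may differ.

```c
// Sketch of the swap helper assumed by this chapter. The real definition
// lives in the previous chapter and may differ (names are assumptions).
void echange (int t[], int i, int j)
{
    int tmp = t[i];   // save the value stored at index i
    t[i] = t[j];      // overwrite it with the value at index j
    t[j] = tmp;       // put the saved value at index j
}
```

Swapping a cell with itself (i equal to j) leaves the array unchanged, which matters for the sorts below since they may call the function with two identical indices.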

Selection Sort

The principle of selection sort is as follows:

find the smallest element of the array and swap it with the first cell (the smallest element is now at the head)
find the smallest element of the sub-array [1, N-1] and swap it with the cell at index 1
repeat this process on each following sub-array until the array is fully sorted

Algorithm

The procedure in C can be written as:

0111_tri_selection.c

// Reminder: we assume that the echange (swap) function exists
void echange (int t[], int, int);

void tri_selection (int tab[], int N)
{
    for (int ideb = 0; ideb < N - 1; ideb++)
    {
        int imin = ideb;
        for (int i = ideb + 1; i < N; i++)
        {
            if (tab[i] < tab[imin]) imin = i;
        }
        echange (tab, ideb, imin);
    }
}

Complexity and Invariant

Measuring the cost of the algorithm in comparisons, we observe that when ideb equals 0 there are N-1 comparisons, when ideb equals 1 there are N-2 comparisons, and so on until ideb equals N-2, where there is 1 comparison. The total cost is (N-1) + (N-2) + ... + 1 = N × (N-1) / 2 comparisons, whatever the initial order of the array, so the complexity is O(N²).
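The count can be checked experimentally. The sketch below is a hypothetical instrumented copy of the selection sort (the name `tri_selection_compte` and the counter `comparaisons` are not part of the original code) that sorts the array and returns the number of element comparisons it performed. The swap is written inline so that the block is self-contained.

```c
// Hypothetical instrumented variant of tri_selection: sorts tab and
// returns the number of element comparisons performed.
long tri_selection_compte (int tab[], int N)
{
    long comparaisons = 0;

    for (int ideb = 0; ideb < N - 1; ideb++)
    {
        int imin = ideb;
        for (int i = ideb + 1; i < N; i++)
        {
            comparaisons++;                    // one test tab[i] < tab[imin]
            if (tab[i] < tab[imin]) imin = i;
        }
        // inline swap of tab[ideb] and tab[imin]
        int tmp = tab[ideb];
        tab[ideb] = tab[imin];
        tab[imin] = tmp;
    }
    return comparaisons;
}
```

On the 8-element example array of the problem statement, the function returns 8 × 7 / 2 = 28, and it returns the same value for any other array of 8 elements.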

We can also determine the invariant of this sort (i.e., the property that holds at the end of every loop iteration; this is useful for proving the algorithm correct). For selection sort, the invariant is: at the end of iteration ideb, the sub-array tab[0..ideb] contains the ideb+1 smallest elements of the array in ascending order of their values.
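One way to make the invariant concrete is to test it at run time. The following sketch assumes two hypothetical functions that are not in the original code: `invariant_selection`, which checks the property for a given ideb, and `tri_selection_verifie`, a copy of the sort that asserts the property at the end of each iteration.

```c
#include <assert.h>

// Hypothetical helper: returns 1 if tab[0..ideb] is in ascending order and
// no element of tab[ideb+1..N-1] is smaller than tab[ideb], 0 otherwise.
// Together, the two conditions state that tab[0..ideb] holds the ideb+1
// smallest elements in ascending order.
int invariant_selection (const int tab[], int N, int ideb)
{
    for (int k = 1; k <= ideb; k++)
        if (tab[k - 1] > tab[k]) return 0;
    for (int k = ideb + 1; k < N; k++)
        if (tab[k] < tab[ideb]) return 0;
    return 1;
}

// Selection sort with the invariant asserted at the end of each iteration.
void tri_selection_verifie (int tab[], int N)
{
    for (int ideb = 0; ideb < N - 1; ideb++)
    {
        int imin = ideb;
        for (int i = ideb + 1; i < N; i++)
            if (tab[i] < tab[imin]) imin = i;

        int tmp = tab[ideb];          // inline swap of tab[ideb] and tab[imin]
        tab[ideb] = tab[imin];
        tab[imin] = tmp;

        assert (invariant_selection (tab, N, ideb));
    }
}
```

The check costs O(N) per iteration, so it is meant for experimentation and debugging only, not for the final sort.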

Insertion Sort

The principle of insertion sort is the one used to sort playing cards:

sort the first 2 cards
look at the third card and insert it in its place (shifting larger cards to the right)
repeat with each following card until everything is sorted

Algorithm

The code is as follows:

0112_tri_insertion.c

void tri_insertion (int tab[], int N)
{
    for (int i = 1; i < N; i++)
    {
        int current_element = tab[i];
        int insertion_index = i;

        while ((insertion_index >= 1) && (tab[insertion_index - 1] > current_element))
        {
            tab[insertion_index] = tab[insertion_index - 1];
            insertion_index--;
        }
        tab[insertion_index] = current_element;
    }
}

Complexity and Invariant

The complexity of this algorithm is less straightforward than that of the previous one, because the number of comparisons made in the while loop depends on the initial state of the array. If the array is already sorted, there is only one comparison per iteration of the for loop, giving N-1 in total. Conversely, if the array is sorted in descending order, the while loop makes 1, then 2, and so on up to N-1 comparisons, for a total of N × (N-1) / 2. The complexity is therefore O(N) in the best case and O(N²) in the worst case; it is also O(N²) on average.
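The difference between the best and worst cases can be observed by counting comparisons. The sketch below is a hypothetical instrumented copy of the insertion sort (the name `tri_insertion_compte` and the counter `comparaisons` are assumptions); the while condition is split in two so that each comparison between array elements is counted exactly once.

```c
// Hypothetical instrumented variant of tri_insertion: sorts tab and returns
// the number of comparisons made between array elements.
long tri_insertion_compte (int tab[], int N)
{
    long comparaisons = 0;

    for (int i = 1; i < N; i++)
    {
        int current_element = tab[i];
        int insertion_index = i;

        while (insertion_index >= 1)
        {
            comparaisons++;   // one comparison with the element on the left
            if (tab[insertion_index - 1] <= current_element) break;
            tab[insertion_index] = tab[insertion_index - 1];
            insertion_index--;
        }
        tab[insertion_index] = current_element;
    }
    return comparaisons;
}
```

For 8 elements, the function returns 7 on an array that is already sorted and 1 + 2 + ... + 7 = 28 on an array sorted in descending order.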

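The invariant of this sort is: at the end of loop iteration i, the sub-array tab[0..i] contains the elements initially stored in tab[0..i], in ascending order (the element tab[i] has been inserted into the already sorted sub-array tab[0..i-1]).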

Merge Sort

Merge sort uses the divide-and-conquer principle. It operates as follows:

an array of size 1 is necessarily sorted
the initial array is split in two
each sub-array is sorted (recursive algorithm)
the two sub-arrays are merged to obtain the initial array completely sorted.

Algorithm

The algorithm is somewhat more complex than the previous two and relies on auxiliary procedures. The main procedure is as follows:

0113_tri_fusion_a.c

void tri_fusion (int tab[], int N)
{
    if (N > 1) fusion_recursive (tab, 0, N - 1);
}

The core of the sort, which splits the array into two sub-arrays, is then:

0113_tri_fusion_b.c

void fusion_recursive (int tab[], int premier, int dernier)
{
    if (premier < dernier)
    {
        int milieu = (premier + dernier) / 2;

        fusion_recursive (tab, premier, milieu);
        fusion_recursive (tab, milieu + 1, dernier);

        fusion (tab, premier, milieu, dernier);
    }
}

Finally, the merging of the two sorted sub-arrays:

0113_tri_fusion_c.c

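// Merges the sorted sub-arrays tab[premier1..dernier1] and
// tab[dernier1+1..dernier2] into a single sorted run.
void fusion (int tab[], int premier1, int dernier1, int dernier2)
{
    int premier2 = dernier1 + 1;
    int c2       = premier2;
    int c1       = premier1;
    int taille   = dernier2 - premier1 + 1;
    int tmp[taille];   // temporary buffer (C99 variable-length array)

    for (int i = 0; i < taille; i++)
    {
        // c2 > dernier2 is tested first so that tab[c2] is never read out
        // of bounds; <= keeps equal elements in their original order.
        if ((c1 <= dernier1) && ((c2 > dernier2) || (tab[c1] <= tab[c2])))
        {
            tmp[i] = tab[c1];
            c1++;
        }
        else
        {
            tmp[i] = tab[c2];
            c2++;
        }
    }

    // copy the merged run back into the original array
    for (int i = 0; i < taille; i++)
    {
        tab[premier1 + i] = tmp[i];
    }
}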
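The merge above keeps its temporary buffer in a variable-length array on the stack, which can become a problem for very large arrays. As a possible alternative, the sketch below allocates a single buffer on the heap and reuses it for every merge. All names (`tri_fusion_tas`, `tri_recursif`, `fusion_tampon`) are hypothetical, and this is one design among others, not the chapter's reference implementation.

```c
#include <stdlib.h>
#include <string.h>

// Merges the sorted runs tab[premier..milieu] and tab[milieu+1..dernier],
// using tmp (at least dernier - premier + 1 cells) as scratch space.
static void fusion_tampon (int tab[], int tmp[], int premier, int milieu, int dernier)
{
    int c1 = premier, c2 = milieu + 1, n = dernier - premier + 1;

    for (int i = 0; i < n; i++)
    {
        // take from the left run while it is non-empty and its head is not
        // larger than the head of the right run (<= keeps the sort stable)
        if (c1 <= milieu && (c2 > dernier || tab[c1] <= tab[c2]))
            tmp[i] = tab[c1++];
        else
            tmp[i] = tab[c2++];
    }
    memcpy (&tab[premier], tmp, (size_t) n * sizeof tab[0]);
}

static void tri_recursif (int tab[], int tmp[], int premier, int dernier)
{
    if (premier >= dernier) return;   // 0 or 1 element: already sorted

    int milieu = premier + (dernier - premier) / 2;
    tri_recursif (tab, tmp, premier, milieu);
    tri_recursif (tab, tmp, milieu + 1, dernier);
    fusion_tampon (tab, tmp, premier, milieu, dernier);
}

// Returns 0 on success, -1 if the scratch buffer could not be allocated.
int tri_fusion_tas (int tab[], int N)
{
    if (N <= 1) return 0;

    int *tmp = malloc ((size_t) N * sizeof *tmp);
    if (tmp == NULL) return -1;

    tri_recursif (tab, tmp, 0, N - 1);
    free (tmp);
    return 0;
}
```

Unlike the stack-based version, this variant can report an allocation failure to the caller, at the price of a return value that must be checked.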

Complexity

The complexity of merge sort is calculated as follows: comparisons are made only in the merge function (where there are at most 3 tests per loop iteration). To sort the complete array:

we merge two sorted half-arrays, which costs approximately 3 × N comparisons.
to obtain the two sorted half-arrays, we must merge two sorted quarter-arrays for each of them, which again costs approximately 3 × N comparisons in total (two merges of N/2 elements).
to obtain a sorted quarter-array, we must merge two…

In the end there are approximately 3 × N comparisons per level of recursion, with i levels in total, where i is such that 2^i = N, so i = log_2(N). Thus the asymptotic complexity of merge sort is O(N × log_2(N)).
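The same reasoning can be written as a recurrence. The derivation below is a sketch under two assumptions: N is a power of two, and merging N elements costs about cN comparisons, where c is a constant (approximately 3 in the count used in this chapter). The symbol C(N), the number of comparisons needed to sort N elements, is introduced here for illustration.

```latex
\begin{aligned}
C(1) &= 0 \\
C(N) &= 2\,C(N/2) + cN \\
     &= 4\,C(N/4) + 2cN \\
     &= 2^{k}\,C(N/2^{k}) + k\,cN \qquad \text{after } k \text{ unfoldings} \\
     &= N\,C(1) + cN\log_2 N = cN\log_2 N \qquad \text{for } k = \log_2 N
\end{aligned}
```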

Summary

This chapter presented three classic sorting algorithms; others (bubble sort and quicksort) will be covered in tutorials and practicals.

:::warning Theorem on Complexity
A comparison-based sort of integer arrays cannot run in o(N × log_2(N)), either on average or in the worst case. A comparison-based sort is therefore optimal if its complexity is O(N × log_2(N)), and hence Θ(N × log_2(N)).
:::
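For the worst case, the bound follows from the classic decision-tree argument, sketched here. A comparison-based sort can be seen as a binary tree in which each internal node is a comparison and each leaf is one possible ordering of the input. The tree must have at least N! leaves, so its height h, which is the worst-case number of comparisons, satisfies the inequality below (h is a symbol introduced for this sketch).

```latex
h \;\ge\; \log_2(N!) \;\ge\; \log_2\!\left(\left(\tfrac{N}{2}\right)^{N/2}\right)
  \;=\; \tfrac{N}{2}\,\log_2\!\left(\tfrac{N}{2}\right)
```

The middle step uses the fact that the N/2 largest factors of N! are each at least N/2. The right-hand side grows like N × log_2(N), which gives the lower bound.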

Below is a summary of the complexities of the main sorting algorithms:

| Algorithm | Best | Average | Worst |
| --- | --- | --- | --- |
| Bubble sort | `O(n)` | `O(n^2)` | `O(n^2)` |
| Selection sort | `O(n^2)` | `O(n^2)` | `O(n^2)` |
| Insertion sort | `O(n)` | `O(n^2)` | `O(n^2)` |
| Merge sort | `O(n x log_2(n))` | `O(n x log_2(n))` | `O(n x log_2(n))` |
| Quicksort | `O(n x log_2(n))` | `O(n x log_2(n))` | `O(n^2)` |

Other properties characterize sorting algorithms:

stability: a sorting algorithm is stable if elements with equal values are in the same relative order at the end of the algorithm as at the start
in place: a sorting algorithm is in place if it sorts without creating auxiliary arrays. For example, insertion sort and selection sort are in place, while merge sort as presented here is not (memory complexity of O(n)).
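Stability only becomes visible when the sorted elements carry more than their key. The sketch below assumes a hypothetical record type `fiche` (an integer key plus a one-character tag) and adapts the insertion and selection sorts of this chapter to sort such records by key; none of these names come from the original code.

```c
// Hypothetical record type: the sort key plus a tag identifying the record.
typedef struct { int cle; char etiquette; } fiche;

// Insertion sort on the key: the strict comparison > never moves a record
// past another record with an equal key, so the sort is stable.
void tri_insertion_fiches (fiche tab[], int N)
{
    for (int i = 1; i < N; i++)
    {
        fiche courant = tab[i];
        int j = i;
        while (j >= 1 && tab[j - 1].cle > courant.cle)
        {
            tab[j] = tab[j - 1];
            j--;
        }
        tab[j] = courant;
    }
}

// Selection sort on the key: the long-distance swap can carry a record
// over another record with an equal key, so the sort is not stable.
void tri_selection_fiches (fiche tab[], int N)
{
    for (int ideb = 0; ideb < N - 1; ideb++)
    {
        int imin = ideb;
        for (int i = ideb + 1; i < N; i++)
            if (tab[i].cle < tab[imin].cle) imin = i;

        fiche tmp = tab[ideb];
        tab[ideb] = tab[imin];
        tab[imin] = tmp;
    }
}
```

Starting from the records (2, a), (2, b), (1, c), the insertion sort produces (1, c), (2, a), (2, b), which keeps a before b, whereas the selection sort produces (1, c), (2, b), (2, a), which reverses them.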