Sorting Algorithms
This chapter deals exclusively with classic sorting algorithms. The theory is presented and applied to integer arrays, but the algorithms generalize to any element type and ordering criterion.
Problem Statement
Consider an integer array of size N whose elements we want to sort in ascending order. A value may appear several times in the array. For example:
| 11 | -2 | 1515 | 42 | 2048 | 28 | 11 | -78 |
|---|---|---|---|---|---|---|---|
becomes, once sorted:
| -78 | -2 | 11 | 11 | 28 | 42 | 1515 | 2048 |
|---|---|---|---|---|---|---|---|
We assume the existence of a swap function, echange (see previous chapter), that swaps the cells at indices i and j of an array t.
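The definition of echange belongs to the previous chapter and is not reproduced here. As a minimal sketch, assuming the prototype void echange (int t[], int i, int j) declared in the selection sort code, it could be written as follows (this is not necessarily the previous chapter's version):

```c
#include <assert.h>

/* Sketch of echange: swaps the cells at indices i and j of t.
   The caller must pass valid indices; only non-negativity is checked here. */
void echange (int t[], int i, int j)
{
    assert(i >= 0 && j >= 0);
    int tmp = t[i];
    t[i] = t[j];
    t[j] = tmp;
}
```

Swapping a cell with itself (i equal to j) leaves the array unchanged, which selection sort relies on when the minimum is already in place.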
Selection Sort
The principle of selection sort is as follows:
- find the smallest element of the array and swap its cell with the first one (i.e., the smallest element is at the head)
- find the smallest element of the sub-array [1, N-1] and swap it with the cell at index 1
- … repeat this process until the array is fully sorted.
Algorithm
The procedure in C can be written as:
```c
// Reminder: we assume the function echange exists (see previous chapter)
void echange (int t[], int i, int j);

void tri_selection (int tab[], int N)
{
    for (int ideb = 0; ideb < N - 1; ideb++)
    {
        // Find the index of the smallest element of tab[ideb..N-1]
        int imin = ideb;
        for (int i = ideb + 1; i < N; i++)
        {
            if (tab[i] < tab[imin]) imin = i;
        }
        // Move that element to position ideb
        echange (tab, ideb, imin);
    }
}
```
Complexity and Invariant
Counting the comparisons made by the algorithm, we observe that when ideb equals 0 there are N-1 comparisons, when ideb equals 1 there are N-2 comparisons, …, and when ideb equals N-2 there is 1 comparison. In total the cost is (N-1) + (N-2) + … + 1 = N × (N-1) / 2 comparisons, whatever the initial order of the array, so selection sort is O(N²).
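This count can be checked with an instrumented copy of tri_selection. In the sketch below, the names count_selection_comparisons and check_selection_count are hypothetical, the swap is written inline instead of calling echange, and each evaluation of tab[i] < tab[imin] increments a counter:

```c
#include <assert.h>

/* Instrumented copy of tri_selection: same loops, but it returns the
   number of element comparisons tab[i] < tab[imin]. Swap is inlined. */
long count_selection_comparisons (int tab[], int N)
{
    long comparisons = 0;
    for (int ideb = 0; ideb < N - 1; ideb++)
    {
        int imin = ideb;
        for (int i = ideb + 1; i < N; i++)
        {
            comparisons++;
            if (tab[i] < tab[imin]) imin = i;
        }
        int tmp = tab[ideb];
        tab[ideb] = tab[imin];
        tab[imin] = tmp;
    }
    return comparisons;
}

/* Checks the closed form N(N-1)/2 on an ascending and on a descending
   array of size N (N >= 1): the count is the same for both. */
void check_selection_count (int N)
{
    int asc[N], desc[N];   /* C99 variable-length arrays */
    for (int k = 0; k < N; k++) { asc[k] = k; desc[k] = N - k; }
    long expected = (long) N * (N - 1) / 2;
    assert(count_selection_comparisons(asc, N) == expected);
    assert(count_selection_comparisons(desc, N) == expected);
}
```

On the example array of the problem statement (N = 8), the counter ends at 8 × 7 / 2 = 28, exactly as for an already sorted array of the same size.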
We can also determine the invariant of this sort, i.e. a property that holds at the end of every iteration of the loop; invariants are useful for proving that an algorithm is correct. For selection sort, the invariant is: at the end of iteration ideb, the sub-array t[0..ideb] contains the ideb+1 smallest elements of the array, in ascending order.
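The invariant can also be checked at run time. The following sketch (hypothetical names tri_selection_checked and check_selection_invariant, swap inlined) runs the same loops as tri_selection and asserts, after each iteration, that t[0..ideb] is sorted and that no later cell holds a smaller value than t[ideb]; together, these two conditions mean that t[0..ideb] holds the ideb+1 smallest values:

```c
#include <assert.h>

/* Asserts the invariant at the end of iteration ideb. */
static void check_selection_invariant (const int tab[], int N, int ideb)
{
    for (int k = 1; k <= ideb; k++)
        assert(tab[k - 1] <= tab[k]);      /* tab[0..ideb] is sorted */
    for (int k = ideb + 1; k < N; k++)
        assert(tab[ideb] <= tab[k]);       /* nothing smaller remains after it */
}

/* Same logic as tri_selection, checking the invariant after each iteration. */
void tri_selection_checked (int tab[], int N)
{
    for (int ideb = 0; ideb < N - 1; ideb++)
    {
        int imin = ideb;
        for (int i = ideb + 1; i < N; i++)
            if (tab[i] < tab[imin]) imin = i;
        int tmp = tab[ideb];
        tab[ideb] = tab[imin];
        tab[imin] = tmp;
        check_selection_invariant(tab, N, ideb);
    }
}
```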
Insertion Sort
The principle of insertion sort is the one used to sort playing cards:
- sort the first 2 cards
- look at the third card and insert it in its place (shifting larger cards to the right).
- and repeat until everything is sorted.
Algorithm
The code is as follows:
```c
void tri_insertion (int tab[], int N)
{
    for (int i = 1; i < N; i++)
    {
        // tab[0..i-1] is already sorted: insert tab[i] into it
        int current_element = tab[i];
        int insertion_index = i;
        // Shift the elements larger than current_element one cell to the right
        while ((insertion_index >= 1) && (tab[insertion_index - 1] > current_element))
        {
            tab[insertion_index] = tab[insertion_index - 1];
            insertion_index--;
        }
        tab[insertion_index] = current_element;
    }
}
```
Complexity and Invariant
The complexity of this algorithm is not as straightforward as that of the previous one: the number of comparisons made in the while loop depends on the initial state of the array. If the array is already sorted, there is only one comparison per iteration of the for loop, giving N-1 in total. Conversely, if the array is sorted in descending order, the while loop makes 1, then 2, …, then N-1 comparisons, giving N × (N-1) / 2 in total. The complexity is therefore O(N) in the best case, and O(N²) in the worst case as well as on average.
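Both counts can be checked with an instrumented copy of tri_insertion. In this sketch the names count_insertion_comparisons and check_insertion_bounds are hypothetical; the while condition is split so that each test of tab[insertion_index - 1] > current_element is counted once, while the shifting logic stays the same:

```c
#include <assert.h>

/* Instrumented copy of tri_insertion: returns the number of element
   comparisons between tab[insertion_index - 1] and current_element. */
long count_insertion_comparisons (int tab[], int N)
{
    long comparisons = 0;
    for (int i = 1; i < N; i++)
    {
        int current_element = tab[i];
        int insertion_index = i;
        while (insertion_index >= 1)
        {
            comparisons++;                       /* one element comparison */
            if (tab[insertion_index - 1] <= current_element) break;
            tab[insertion_index] = tab[insertion_index - 1];
            insertion_index--;
        }
        tab[insertion_index] = current_element;
    }
    return comparisons;
}

/* Checks both cases from the text on arrays of size N (N >= 1):
   N-1 comparisons when sorted, N(N-1)/2 when sorted in descending order. */
void check_insertion_bounds (int N)
{
    int asc[N], desc[N];                         /* C99 variable-length arrays */
    for (int k = 0; k < N; k++) { asc[k] = k; desc[k] = N - k; }
    assert(count_insertion_comparisons(asc, N) == N - 1);
    assert(count_insertion_comparisons(desc, N) == (long) N * (N - 1) / 2);
}
```

For N = 5, a sorted array costs 4 comparisons and the array 5, 4, 3, 2, 1 costs 1 + 2 + 3 + 4 = 10.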
The invariant of this sort is: at the start of iteration i, the sub-array t[0..i-1] is sorted; the iteration inserts t[i] into it, so that t[0..i] is sorted at the end of the iteration.
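As with selection sort, this invariant can be asserted at run time. The sketch below (hypothetical names tri_insertion_checked and is_sorted_prefix) keeps the loop of tri_insertion unchanged and checks that t[0..i-1] is sorted before each iteration and that t[0..i] is sorted after it:

```c
#include <assert.h>

/* Returns 1 if t[0..n-1] is in ascending order, 0 otherwise. */
static int is_sorted_prefix (const int t[], int n)
{
    for (int k = 1; k < n; k++)
        if (t[k - 1] > t[k]) return 0;
    return 1;
}

/* tri_insertion with its invariant asserted around each iteration. */
void tri_insertion_checked (int tab[], int N)
{
    for (int i = 1; i < N; i++)
    {
        assert(is_sorted_prefix(tab, i));        /* before: t[0..i-1] sorted */
        int current_element = tab[i];
        int insertion_index = i;
        while ((insertion_index >= 1) && (tab[insertion_index - 1] > current_element))
        {
            tab[insertion_index] = tab[insertion_index - 1];
            insertion_index--;
        }
        tab[insertion_index] = current_element;
        assert(is_sorted_prefix(tab, i + 1));    /* after: t[0..i] sorted */
    }
}
```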
Merge Sort
Merge sort relies on the divide-and-conquer principle. It operates as follows:
- an array of size 1 is necessarily sorted
- the initial array is split in two
- each sub-array is sorted (recursive algorithm)
- the two sorted sub-arrays are merged, which yields the whole initial array sorted.
Algorithm
The algorithm is somewhat more complex than the previous two and relies on auxiliary procedures. The main procedure is as follows:
```c
void fusion_recursive (int tab[], int premier, int dernier); // defined below

void tri_fusion (int tab[], int N)
{
    if (N > 1) fusion_recursive (tab, 0, N - 1);
}
```
Then comes the core of the sort, which splits the array into two sub-arrays, sorts each one recursively and merges them:
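```c
void fusion (int tab[], int premier1, int dernier1, int dernier2); // defined below

// Sorts tab[premier..dernier] (bounds included)
void fusion_recursive (int tab[], int premier, int dernier)
{
    if (premier < dernier)
    {
        // Same middle as (premier + dernier) / 2, without risking int overflow
        int milieu = premier + (dernier - premier) / 2;
        fusion_recursive (tab, premier, milieu);
        fusion_recursive (tab, milieu + 1, dernier);
        fusion (tab, premier, milieu, dernier);
    }
}
```
Finally, the procedure that merges two adjacent sorted sub-arrays:
```c
// Merges the sorted sub-arrays tab[premier1..dernier1] and
// tab[dernier1+1..dernier2] into a sorted tab[premier1..dernier2]
void fusion (int tab[], int premier1, int dernier1, int dernier2)
{
    int premier2 = dernier1 + 1;
    int c1 = premier1;               // cursor in the first sub-array
    int c2 = premier2;               // cursor in the second sub-array
    int taille = dernier2 - premier1 + 1;
    int tmp[taille];                 // auxiliary array (C99 variable-length array)
    for (int i = 0; i < taille; i++)
    {
        // Take from the first sub-array if it is not exhausted and either the
        // second one is exhausted or its current element is not smaller.
        // Testing c2 > dernier2 first avoids reading tab[c2] out of bounds;
        // using <= keeps equal values in their original order (stable merge).
        if ((c1 <= dernier1) && ((c2 > dernier2) || (tab[c1] <= tab[c2])))
        {
            tmp[i] = tab[c1];
            c1++;
        }
        else
        {
            tmp[i] = tab[c2];
            c2++;
        }
    }
    // Copy the merged result back into tab
    for (int i = 0; i < taille; i++)
    {
        tab[premier1 + i] = tmp[i];
    }
}
```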
Complexity
The complexity of merge sort is calculated as follows: comparisons are only made in the merge function, with at most 3 comparisons per loop iteration. To sort the complete array:
- we merge two sorted half-arrays, which costs approximately 3 × N comparisons
- to obtain each sorted half-array, we merge two sorted quarter-arrays, which again costs approximately 3 × N comparisons in total (as there are two half-arrays)
- to obtain each sorted quarter-array, we merge two sorted eighth-arrays, and so on.

Each level of recursion therefore costs about 3 × N comparisons, and there are i levels, where i is such that 2^i = N, i.e. i = log₂(N). Thus the asymptotic complexity of merge sort is O(N × log₂(N)).
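The same count can be written as a recurrence. Assuming N is a power of 2, writing C(N) (a notation introduced here) for the number of comparisons needed to sort N elements, and counting 3 × N comparisons for each merge of N elements as above, unrolling the recurrence i times gives:

$$
\begin{aligned}
C(N) &= 2\,C(N/2) + 3N, \qquad C(1) = 0 \\
     &= 4\,C(N/4) + 3N + 3N \\
     &= 2^i\,C(N/2^i) + 3iN \\
     &= 3N\log_2(N) \qquad \text{for } i = \log_2(N), \text{ i.e. } 2^i = N.
\end{aligned}
$$

Each unrolling step adds one recursion level costing 3N, which is the level-by-level argument above written as an equation.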
Summary
The previous sections presented three classic sorting algorithms; others (bubble sort and quicksort) will be covered in tutorials and practical sessions.
:::warning Theorem on Complexity
A comparison-based sort of an integer array cannot run in o(N × log₂(N)) comparisons, neither on average nor in the worst case: its cost is Ω(N × log₂(N)). A comparison sort is therefore optimal if its complexity is O(N × log₂(N)).
:::
Below is a summary of the complexities of the main sorting algorithms:
| Algorithm | Best | Average | Worst |
|---|---|---|---|
| Bubble sort | O(n) | O(n^2) | O(n^2) |
| Selection sort | O(n^2) | O(n^2) | O(n^2) |
| Insertion sort | O(n) | O(n^2) | O(n^2) |
| Merge sort | O(n × log₂(n)) | O(n × log₂(n)) | O(n × log₂(n)) |
| Quicksort | O(n × log₂(n)) | O(n × log₂(n)) | O(n^2) |
Other properties characterize sorting algorithms:
- stability: a sorting algorithm is stable if two elements with equal sort keys keep their relative order from the input array
- in place: a sorting algorithm is in place if it sorts without creating auxiliary arrays (it only needs a constant amount of extra memory). For example, insertion sort and selection sort are in place, while merge sort is not (memory complexity of O(n)).
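With plain integers, stability cannot be observed, since equal values are indistinguishable. The sketch below therefore sorts hypothetical records (the type item, with a key and a label recording the original order), reusing the logic of tri_selection and tri_insertion but comparing keys only; all names are illustrative. On the input (2, a), (2, b), (1, c), selection sort swaps (1, c) with (2, a) and so moves a behind b, whereas insertion sort keeps a before b:

```c
#include <assert.h>

/* Hypothetical record: the sort key plus a label showing the input order. */
typedef struct { int key; char label; } item;

/* Same logic as tri_selection, comparing keys only (swap inlined). */
void selection_sort_items (item t[], int N)
{
    for (int ideb = 0; ideb < N - 1; ideb++)
    {
        int imin = ideb;
        for (int i = ideb + 1; i < N; i++)
            if (t[i].key < t[imin].key) imin = i;
        item tmp = t[ideb];
        t[ideb] = t[imin];
        t[imin] = tmp;
    }
}

/* Same logic as tri_insertion, comparing keys only. */
void insertion_sort_items (item t[], int N)
{
    for (int i = 1; i < N; i++)
    {
        item current = t[i];
        int j = i;
        while (j >= 1 && t[j - 1].key > current.key)   /* strict >: equal keys never move past each other */
        {
            t[j] = t[j - 1];
            j--;
        }
        t[j] = current;
    }
}

/* Sorts (2, a), (2, b), (1, c) with both procedures and checks the labels. */
void demo_stability (void)
{
    item s[] = { {2, 'a'}, {2, 'b'}, {1, 'c'} };
    item n[] = { {2, 'a'}, {2, 'b'}, {1, 'c'} };
    selection_sort_items(s, 3);
    insertion_sort_items(n, 3);
    assert(s[1].label == 'b' && s[2].label == 'a');    /* order of a and b changed */
    assert(n[1].label == 'a' && n[2].label == 'b');    /* order of a and b kept */
}
```

This single input is enough to show that selection sort, written this way, is not stable; the strict comparison in the while loop is what keeps insertion sort stable.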