Module 9: Array algorithms

  • how can we search in arrays?
  • how can we sort arrays?
  • what is algorithmic complexity?
  • what makes a good algorithm?

Searching

Arrays can store large amounts of homogeneous data: for example, telephone directories, corporate sales figures or meteorological readings for all of Canada. A common task is therefore to search an array for a particular piece of data, or for data with a particular characteristic.

Search example

Problem: Find the position (index) of the largest value in an array.

Analysis

Clearly a loop is involved since we’ll have to search the entire array. Let’s start by concentrating on the body of the loop.

For each element in the array, we need to compare its value (let’s call it data[i]) with the largest element found so far. Let’s assume that we’ve stored the position (index) of that largest-so-far element in a variable called largest. Then, every time we go through the loop, we need to check whether data[i] > data[largest]. If it is, i is the position of the new largest value, so we set largest to i. This means that our loop body will look like:

if data[i] > data[largest]:
    largest = i

Now we need to think about how our loop should start and finish. When we start the loop, what should the value of largest be? We haven’t examined any values yet, but we can start out by setting largest to 0, making element 0 the largest-so-far, assuming that the array’s length isn’t zero. We need to address that assumption by either:

  1. defining a precondition that length > 0 or
  2. defining what the function will return if length == 0.

So, if we set largest to 0 at the beginning and then loop through all of the values in the array with the body from above, we get:

largest = 0

for each value of i in the range 0 (inclusive) through length (exclusive):
    if data[i] > data[largest]:
        largest = i

Question: if there are $N$ elements in the array, how many comparisons does our search algorithm require? How many assignments to largest will be required in the worst case? The best case? What are those cases?

Question: What happens if the largest value occurs two or more times in the array? Which index will largest end up holding? Is this sensible?

Translation to C++

Since this algorithm only needs to return one value (the index), we can write a function with an int return type. A sensible name might be something like findLargest, though other names are possible too. For parameters, we need an array of values to search through and the length of that array. Remember, “passing an array” actually means “passing the address in memory where the array begins”; a separate parameter is required to say how long the array is.

Prototype: int findLargest(double data[], int length)

Declaration:

int findLargest(double data[], int length);

Contract:

/**
 * Find the position (index) of the largest value in an array.
 *
 * @param[in] data    values to search through
 * @param[in] length  the number of elements in `data`
 *
 * @pre     length > 0
 *
 * @returns the index of the largest value in the array
 */
int findLargest(double data[], int length);

Definition:

int findLargest(double data[], int length)
{
  int largest = 0;

  for (int i = 1; i < length; i++)
  {
    if (data[i] > data[largest])
    {
      largest = i;
    }
  }

  return largest;
}
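
A minimal usage sketch, assuming the definition above. The sample readings, the demonstration function demoFindLargest and the wrapper findLargestOrNone (which returns -1 for an empty array, an assumed convention for the second option discussed in the analysis) are hypothetical additions for illustration. The array deliberately contains its largest value twice, so running it is one way to check your answer to the earlier question about repeated values.

```cpp
#include <iostream>

// Definition from above, repeated so this example compiles on its own.
int findLargest(double data[], int length)
{
  int largest = 0;

  for (int i = 1; i < length; i++)
  {
    if (data[i] > data[largest])
    {
      largest = i;
    }
  }

  return largest;
}

// Hypothetical wrapper: instead of requiring length > 0, report -1
// (never a valid index) when there is nothing to search.
int findLargestOrNone(double data[], int length)
{
  if (length <= 0)
  {
    return -1;
  }

  return findLargest(data, length);
}

// Hypothetical demonstration; call it from main() to see the output.
void demoFindLargest()
{
  double readings[] = { 3.5, 9.25, -1.0, 9.25, 0.0 };  // 9.25 appears twice
  int length = 5;

  int position = findLargest(readings, length);
  std::cout << "largest value " << readings[position]
            << " found at index " << position << std::endl;

  std::cout << "empty array gives "
            << findLargestOrNone(readings, 0) << std::endl;
}
```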

Search complexity

If we have $N$ items in an unsorted array, searching through the array to find the largest value (or the smallest, or any particular value, etc.) will require us to examine all $N$ values and compare them against some criterion (larger than what we’ve seen before, equal to the value we’re looking for, etc.). This means that, if the array were twice as large, it would take twice as long to search for the value we’re interested in. An algorithm that requires approximately $N$ operations on an $N$-element array is called a linear algorithm: the number of operations is linear with respect to the number of things we’re operating on. Searching is just such a linear algorithm, but as we’ll see now, many other algorithms are not linear: adding a bit more data to the computation can cause the algorithm to take wildly more time to compute an answer!
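
Searching for one particular value follows the same linear pattern. The sketch below is an illustration rather than code from the course: the name findValue, the -1 "not found" result and the comparisons parameter are all assumptions, the last added only so that the number of operations can be observed.

```cpp
// Hypothetical linear search: return the index of the first element
// equal to target, or -1 (an assumed convention) if there is none.
// On return, comparisons holds how many elements were examined.
int findValue(const int data[], int length, int target, int& comparisons)
{
  comparisons = 0;

  for (int i = 0; i < length; i++)
  {
    comparisons++;
    if (data[i] == target)
    {
      return i;
    }
  }

  return -1;
}
```

In the worst case (the target is absent, or sits in the last position) every one of the length elements is examined, which is the linear behaviour described above.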

Question: how many operations are required to find the largest value in a sorted array?

Sorting

In this module we look at the problem of sorting: in particular, sorting an array of numbers so that the values are in ascending order. We’ll look at a few standard sorting algorithms. At bottom, however, they all work by examining a single pair of numbers and switching them if they are out of order. The trick is in choosing which pairs to examine, and in what order.

Bubble Sort

A bubble sort is fairly easy to understand, but it’s pretty slow. It works by comparing adjacent values in an array and swapping them if they’re in the wrong order. Every swap makes the array a little bit more sorted, but to sort the entire array we need to go through it more than once!

for i in range [0, length-1):
    for j in range [0, length-i-1):
        if data[j] > data[j + 1]:
            swap data[j] with data[j + 1]
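
One possible C++ rendering of this pseudocode is sketched below. It is offered for comparison only (if you plan to attempt the sorting exercise, you may prefer to try it first); the function name bubbleSort is an assumption, and swap is the pass-by-reference helper discussed in this section.

```cpp
// Exchange two values in place (pass-by-reference).
void swap(double& x, double& y)
{
  double temp = x;
  x = y;
  y = temp;
}

// Sketch of a bubble sort (the name bubbleSort is an assumption):
// sorts data[0] through data[length - 1] into ascending order.
void bubbleSort(double data[], int length)
{
  // Once pass i is complete, the last i + 1 elements are in their
  // final positions, so later passes can stop earlier.
  for (int i = 0; i < length - 1; i++)
  {
    // Compare each adjacent pair in the still-unsorted part.
    for (int j = 0; j < length - i - 1; j++)
    {
      if (data[j] > data[j + 1])
      {
        swap(data[j], data[j + 1]);
      }
    }
  }
}
```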

This algorithm has a couple of interesting properties worth noting:

  1. it passes through the array multiple times, requiring a loop within a loop:

    • the outer loop controls our passes through the array: i is the number of passes we’ve completed thus far
    • the inner loop controls the details of each pass through the array: j is the number of adjacent pairs that we’ve compared on this pass
  2. it relies on the ability to swap two array elements in place; in C++ we would implement this using pass-by-reference:

    void swap(double& x, double& y)
    {
        double temp = x;
        x = y;
        y = temp;
    }
    

    (see the lecture capture for how we derived this in class)

It can be shown that the number of comparisons that a bubble sort needs to perform is:

$$ \frac{n(n-1)}{2} = \frac{n^2-n}{2} $$
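
A brief sketch of where this count comes from, assuming the nested-loop pseudocode above with $n$ standing for length: on pass $i$ the inner loop makes $n - i - 1$ comparisons, and $i$ runs from $0$ to $n - 2$, so the total is

$$ \sum_{i=0}^{n-2} (n - i - 1) = (n-1) + (n-2) + \cdots + 1 = \frac{n(n-1)}{2} $$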

The larger the value of $n$ we deal with, the less significant the linear term and constant divisor become; we call the bubble sort a quadratic algorithm or, alternatively, an order $n^2$ algorithm. We are more interested in characterizing how a change in input size will affect the algorithm’s complexity than the precise number of operations required for any given $n$. We can say that a linear (order $n$) algorithm like searching scales linearly with its input data: if we have to process ten times as much data, it will take approximately ten times longer to run. With a quadratic algorithm like bubble sort, however, ten times as much data requires 100 times as much processing time or power to work through.
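
The scaling can be observed directly by counting loop iterations. The helper below is a hypothetical illustration (the name countBubbleComparisons is an assumption); it runs the same two loops as the bubble sort pseudocode but only counts the comparisons instead of sorting anything.

```cpp
// Count the comparisons a bubble sort would make on an array of the
// given length. (Hypothetical helper, for illustration only.)
long countBubbleComparisons(int length)
{
  long comparisons = 0;

  for (int i = 0; i < length - 1; i++)
  {
    for (int j = 0; j < length - i - 1; j++)
    {
      comparisons++;  // one comparison of data[j] with data[j + 1]
    }
  }

  return comparisons;
}
```

Under these assumptions, a length of 100 gives 4,950 comparisons and a length of 1,000 gives 499,500: roughly 100 times as many for 10 times the data.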

There are other sorting algorithms, like the merge sort that we saw briefly in lecture or the quicksort algorithm that is often a bit quicker but also more complex to understand. You won’t be expected to remember the details of those sorting algorithms, but you should know that their time complexity is lower: $n \log(n)$ instead of $n^2$. Here’s how time complexity works itself out with a few of these algorithms:

| Algorithm | Time complexity | 2 $\times$ data | 10 $\times$ data | 100 $\times$ data |
|---|---|---|---|---|
| Find largest/smallest/median of sorted array | O(1) (constant) | 1 $\times$ time | 1 $\times$ | 1 $\times$ |
| General search in sorted array | O($\log_2 n$) (logarithmic) | 1 $\times$ time | 3.3 $\times$ | 6.6 $\times$ |
| General search in unsorted array | O($n$) (linear) | 2 $\times$ time | 10 $\times$ | 100 $\times$ |
| Merge sort, quicksort | O($n \log_2 n$) | 2 $\times$ time | 33 $\times$ | 660 $\times$ |
| Bubble sort | O($n^2$) (quadratic) | 4 $\times$ time | 100 $\times$ | 10,000 $\times$ |
| Matrix multiplication | O($n^3$) (cubic) | 8 $\times$ time | 1,000 $\times$ | 1,000,000 $\times$ |
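
The logarithmic row refers to searching an array that is already sorted. One well-known way to do this is a binary search, which the text above does not cover; the sketch below is an illustration only, and the name binarySearch and the -1 "not found" result are assumptions.

```cpp
// Binary search sketch: data must already be sorted in ascending order.
// Returns the index of an element equal to target, or -1 if absent.
int binarySearch(const int data[], int length, int target)
{
  int low = 0;            // first index still under consideration
  int high = length - 1;  // last index still under consideration

  while (low <= high)
  {
    int middle = low + (high - low) / 2;  // avoids overflow of low + high

    if (data[middle] == target)
    {
      return middle;
    }
    else if (data[middle] < target)
    {
      low = middle + 1;   // target can only be in the upper part
    }
    else
    {
      high = middle - 1;  // target can only be in the lower part
    }
  }

  return -1;
}
```

Each trip through the loop discards about half of the remaining elements, which is why the number of comparisons grows with $\log_2 n$ rather than with $n$.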

Exercises

Searching

  1. Write a function to count how many times the values in an array go above a given value.
  2. Write a function to find the position of the largest number in an array of floating-point numbers.
  3. Write a function to count how many times a given integer occurs in an array of integers.
    • Extra challenge: also “return” the index of the first and last occurrences of this number in the array using pass-by-reference.
  4. Write a function to count how many times the largest number occurs in an array of integers. Your function should report both the number itself and the count of how many times it occurred.

Sorting

  1. Implement the algorithm described above for a bubble sort.

License: CC BY-NC-SA

(c) 2009–2016 Michael Bruce-Lockhart, Theo Norvell, Dennis Peters and Jonathan Anderson. Licensed under a Creative Commons Attribution–Noncommercial–Share-Alike 2.5 Canada License. Permissions beyond the scope of this license may be available at theteachingmachine.org.