Contents

A Quick Look at Random Algorithms in Java

   Jul 31, 2023     11 min read

Before delving into the reservoir sampling algorithm, let’s first understand some basic Java concepts related to random number generation. Having this knowledge will be really helpful before diving into the algorithm. Java offers two common methods to generate random objects: using a pseudo-random generator and utilizing the Math.random() method.

Java provides the java.util.Random library and the java.lang.Math class for generating random numbers. The former method involves creating a Random instance to generate random numbers, while the latter utilizes the Math.random() method, which generates a double value from 0 (inclusive) to 1 (exclusive).

Understanding the Random Library in Java

The Math.random() method is a static method provided by the Math library. It returns a double value ranging from 0 (inclusive) to 1 (exclusive) without requiring any input parameters. On the other hand, to use the Random generator, you need to create a Random instance before calling its methods. Consequently, you’ll often see developers creating a Random object for random number generation.

Looking into the source code, you’ll find that the underlying logic of Math.random() also involves creating a Random generator and calling the random.nextDouble() method to produce the random number. Therefore, in this blog, we will focus on generating random values using the Random objects.

Inside the Random Object Library

The Random object library provides two constructor methods. The first is the default constructor, while the second one allows you to provide an input seed. A seed is a value used to generate a random sequence. It’s essential to note that if two Random generators have the same seed, all the random values generated by them will be identical.

The default constructor of the Random class employs the system’s nanosecond value to generate the seed. Consequently, the random seed is generated based on the JVM’s running time.

Here is the example:

public static void main(String[] args) {
    Random random1 = new Random(10);
    System.out.println("Here is using one random object:");
    for (int i = 0; i < 10; i++) {
        System.out.print(random1.nextInt(10) + " ");
    }

    System.out.println();
    Random random2 = new Random(10);
    System.out.println("Here is using another random object:");
    for (int i = 0; i < 10; i++) {
        System.out.print(random2.nextInt(10) + " ");
    }
}
/*
Result:
    Here is using one random object:
    3 0 3 0 6 6 7 8 1 4
    Here is using another random object:
    3 0 3 0 6 6 7 8 1 4
*/

In many interview and coding challenges, it will most likely ask to generate random numbers within a specified range [a, b]. Here’s a simple approach to create both a random integer and a random double within this range using the random generator method in Java:

public static void main(String[] args) {
    int a = 1; // Replace 'a' with the desired lower bound of the range
    int b = 10; // Replace 'b' with the desired upper bound of the range

    // Generate a new pseudo-random generator
    Random random = new Random();

    // Generate a random integer in the range [a, b]
    int randomInt = random.nextInt(b - a + 1) + a; // Produces a random number from [a, b]

    // Generate a random double in the range [a, b)
    double randomDouble = Math.random() * (b - a) + a; // Produces a random number from [a, b)

    System.out.println("Random Integer: " + randomInt);
    System.out.println("Random Double: " + randomDouble);
}

Intro to randomness and probability

Consider the seemingly straightforward task of generating a random element from an array. It appears rather simple, doesn’t it? By generating a random index within the range of 0 to array.length - 1, we can subsequently retrieve the element corresponding to that index.

However, a challenge emerges when the goal is to generate an array with elements arranged in a random order, ensuring that each element is equally likely to appear at any position within the array, also known as perfect shuffling.

One approach to solving this involves a rather brute-force(DFS Backtracking) method, explores all possible solutions by generating all permutations of the array. Once these permutations are generated, a single solution can be randomly selected from among them. Essentially, this question boils down to finding a single path within all permutations.

TC: O(n!) SC: O(1)

A more efficient approach then comes into play after grasping the concepts above. If our aim is to find only one randomly selected path within the recursion tree, we can streamline the process by focusing on two key aspects. First, it’s important to randomly select an index for swapping in each iteration/recursion. Second, to ensure that previously swapped elements are not swapped again.

To model this approach, we traverse the array in reverse order. During each iteration, we randomly select an element from the range of 0 to the current index i and swap it with the last element in the array. The index i then acts as a boundary, defining the scope within which elements are considered for swapping.

TC: O(n) SC: O(1)

Code:

public void shuffle(int[] array) {
    Random random = new Random();
    for (int i = array.length; i >= 1; i--) {
      int index = random.nextInt(i);
      swap(array, i - 1, index);
    }
}
private void swap(int[] array, int i, int j) {
    int temp = array[i];
    array[i] = array[j];
    array[j] = temp;
}

Intro to Reservoir Sampling

Although the above approach can assist us in solving problems related to randomness, it still comes with several limitations. First and foremost, it can only handle a fixed-size sample. Additionally, it’s incapable of running in an offline manner; we must accumulate the entire sample size before uncovering randomness within the elements. In reality, this proves to be inefficient and memory-intensive. Consequently, the reservoir sampling algorithm was developed to help with this problem.

The underlying problem is as follows: Picture an unending stream of data elements. How can we effectively select a single element from this ongoing stream in a way that, at any moment during the stream’s processing, we’re able to retrieve a random element from the ‘n’ elements encountered so far? To address this, we’re tasked with implementing two methods within a sampling class: read(int value) – which reads a number from the stream – and sample() – which returns a sample seen so far.

The algorithm to solve the problem is as follows: With each incoming element, our objective is to keep track of the current sample’s size. Whenever a new element arrives, we determine if it’s the one to be selected – with a probability of 1/n. This ensures that whether we’re considering the new sample size or the old one, the selection probability for any element remains consistent. The mathematics behind this process are as follows.

datasizepropability to select new elementprobability for previous selected element to stay
[1]11/1N/A
[1, 2]21/21/2
[1, 2, 3]31/31/2 multuply 2/3(not select) = 1/3
[1, 2, 3, 4]41/41/2 multuply 2/3 multuply 3/4 = 1/4

Code:

public class Sampling {
  Random random;
  Integer count;
  Integer reservoir;
  public Sampling() {
    random = new Random();
    count = 0;
    reservoir = null;
  }

  public void read(int value) {
    count++;
    if (random.nextInt(count) == 0) {
      reservoir = value;
    }
  }

  public Integer sample() {
    return reservoir;
  }
}

Now, let’s take this problem to the next level: what if we need to sample ‘k’ elements from the stream?

Method 1 Brute Force

One approach involves storing all the elements in an ArrayList and then selecting k random samples from it.

Method 2 Sorting

Another strategy entails assigning a random tag to each element before storage. Once stored, we sort the array based on these random tags and select the first ‘k’ elements. However, these two methods are still neither offline nor ideal solutions.

Method 3 Heap

We can transform the above approach into a top ‘k’ problem, maintaining a min-heap of size ‘k’ to store the k largest elements. This way, we can retrieve ‘k’ random elements at any given moment.

Method 4 Reservoir Sampling

Going beyond maintaining a single reservoir element, we can keep a List of reservoir elements. As we read new elements, we add them to the list if the sample size is less than ‘k’. However, when the sample size reaches or exceed ‘k’, we just need to select a number from the size and determine if we put in to the reservoir containers.

TC: O(n) SC: O(1)

Code:

public class Solution {
  private final int k;
  List<Integer> reservoir;
  Integer count;
  Random random;

  public Solution(int k) {
    // Complete the constructor if necessary.
    this.k = k;
    reservoir = new ArrayList<>();
    count = 0;
    random = new Random();
  }

  public void read(int value) {
    int size = reservoir.size();
    count++;
    if (size < k) {
      reservoir.add(value);
      return;
    }
    // get a random number from 0 to current size of the sample that we have seen.
    int probToAdd = random.nextInt(count + 1);
    // then if the probability < k, that means it is selected.
    if (probToAdd < k) {
      // randomly pick an index from the reseervoir, to swap. * 1/k * k/totalElement;
      int index = random.nextInt(size);
      reservoir.set(index, value);
    }
  }

  public List<Integer> sample() {
    return reservoir;
  }
}

More Practice Questions

Problem NumberProblemSolution
1Linked List Random NodeSolution
2Random Pick IndexSolution
3Random Pick with WeightSolution
4Random Point in Non-overlapping RectanglesSolution
3Random Flip MatrixSolution

Referrence to study:

  1. reservoir sampling proof
  2. Picking Fairly From a List of Unknown Size With Reservoir Sampling