Markov Chain

You might want to generate pseudo-random data based on a small sample. Markov chains let you do just that.

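Read an input and count how often each token follows the tokens before it; in this example, how often a third token follows two others. These counts give the probability of each possible next token. Then generate random numbers and use them to pick from the weighted list of options.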
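The weighted pick can be sketched on its own. This is a minimal sketch under stated assumptions: the name pickWeighted and its optional random parameter are hypothetical and exist only so the function can be called with a fixed number instead of Math.random.

```javascript
// Hypothetical helper: pick a key from { token: count } with a probability
// proportional to its count. `random` defaults to Math.random and is only a
// parameter so that a fixed value can be supplied when experimenting.
const pickWeighted = (counts, random = Math.random) => {
  const entries = Object.entries(counts)
  const total = entries.reduce((acc, [, count]) => acc + count, 0)
  // Scale the random number from [0, 1) to [0, total)
  const threshold = random() * total
  let seen = 0
  for (const [token, count] of entries) {
    seen += count
    if (threshold < seen) {
      return token
    }
  }
  // Only reached when `counts` is empty
  return undefined
}

// "b" is three times as likely to be logged as "a"
console.log(pickWeighted({ a: 1, b: 3 }))
```

Comparing against running counts rather than running fractions keeps the arithmetic on whole numbers.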

Given the input string input = "aaabbbbbaaabbbbbbbbaa", you will notice short clusters of a followed by longer clusters of b.

Result: (generated at random by the script below, so it differs on every run)

Note that I am using a sliding window of size 2 so that the output closely resembles the clustering behavior of the input. So given aa as an input, what is a likely next token?
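One way to answer that question is to count which tokens follow each two-character window. A minimal sketch, assuming a hypothetical helper countFollowers (not part of the code on this page) that takes the window size as a parameter:

```javascript
// Hypothetical helper: for every window of `size` characters in `text`,
// count how often each character comes directly after it.
const countFollowers = (text, size = 2) => {
  const counts = {}
  for (let i = 0; i + size < text.length; i++) {
    const key = text.slice(i, i + size)
    const next = text[i + size]
    counts[key] = counts[key] || {}
    counts[key][next] = (counts[key][next] || 0) + 1
  }
  return counts
}

const sample = "aaabbbbbaaabbbbbbbbaa"
// Logs an object that maps each token seen after "aa" to its count
console.log(countFollowers(sample)["aa"])
```

In the sample, aa is followed by both a and b, so after two a's the generator can either extend the cluster or end it.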

const input = "aaabbbbbaaabbbbbbbbaa"

// Map each pair of adjacent tokens to counts of the token that follows it
const createLookup = (input) => {
  const lookup = {}
  let j = 0
  for (let i = 1; i < input.length - 1; i++) {
    const k = input[j] + input[i]
    const result = input[i + 1]
    if (!lookup[k]) {
      lookup[k] = { [result]: 1 }
    } else {
      // (x || 0) treats a token not yet seen after k as a count of 0
      lookup[k][result] = 1 + (lookup[k][result] || 0)
    }
    j = i
  }
  return lookup
}

// Pick the next token at random, weighted by how often it followed k
const nextToken = (k) => {
  const ratio = lookup[k]
  const rnd = Math.random()
  const total = Object.values(ratio).reduce((acc, el) => acc + el)
  let seen = 0
  for (const [token, count] of Object.entries(ratio)) {
    seen += count / total
    if (rnd < seen) {
      return token
    }
  }
}

const lookup = createLookup(input)

// You need something to start with - I am using ab as a seed here
let output = "ab"
for (let i = 0; i < 50; i++) {
  output += nextToken(output.slice(-2))
}

document.getElementById("result").innerHTML = output