You might want to generate pseudo-random data based on a small sample. Markov chains allow you to do just that.
Read an input and calculate the probability of, in this example, a third token following two others. Then generate random numbers and pick the next token from the weighted list of options.
Given the input string input = "aaabbbbbaaabbbbbbbbaa", you will notice short clusters of a followed by longer clusters of b.
Result:
Note that I am using a sliding window of size 2 so that the output closely resembles the clustering behavior of the input. So given aa as an input, what is a likely next token?
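To answer that question for the sample string, here is a minimal sketch that counts which token follows each two-character window. The names sample and countSuccessors are made up for this illustration and are not part of the script itself; the snippet uses console.log so it can also be run outside a browser.

```javascript
// Hypothetical helper for illustration: count which token follows
// each two-character window in the sample string.
const sample = "aaabbbbbaaabbbbbbbbaa"

const countSuccessors = (text) => {
  const counts = {}
  for (let i = 0; i + 2 < text.length; i++) {
    const pair = text.slice(i, i + 2) // the sliding window of size 2
    const next = text[i + 2]          // the token that follows it
    counts[pair] = counts[pair] || {}
    counts[pair][next] = (counts[pair][next] || 0) + 1
  }
  return counts
}

const counts = countSuccessors(sample)
console.log(JSON.stringify(counts))
// {"aa":{"a":2,"b":2},"ab":{"b":2},"bb":{"b":9,"a":2},"ba":{"a":2}}

// Turn the counts for "aa" into probabilities
const afterAA = counts["aa"]
const totalAA = afterAA.a + afterAA.b
console.log(`a: ${afterAA.a / totalAA}, b: ${afterAA.b / totalAA}`)
// a: 0.5, b: 0.5
```

So after aa, both a and b are equally likely in this sample, while after bb another b is far more likely (9 out of 11), which is what produces the longer clusters of b.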
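// Build a table that maps each pair of tokens to the counts of the tokens following it
const createLookup = (input) => {
  const lookup = {}
  let j = 0
  for (let i = 1; i < input.length - 1; i++) {
    const k = input[j] + input[i]
    const result = input[i + 1]
    if (!lookup[k]) {
      lookup[k] = {[result]: 1}
    } else {
      // Parenthesize the default so a token seen for the first time counts as 1, not 0
      lookup[k][result] = (lookup[k][result] || 0) + 1
    }
    j = i
  }
  return lookup
}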
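// Pick the next token for the pair k at random, weighted by how often each token followed k
const nextToken = (k) => {
  const ratio = lookup[k]
  const total = Object.values(ratio).reduce((acc, el) => acc + el)
  // Scale the random number to the total count instead of dividing every count
  const rnd = Math.random() * total
  let seen = 0
  for (const [token, count] of Object.entries(ratio)) {
    seen += count
    if (rnd < seen) {
      return token
    }
  }
}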
const input = "aaabbbbbaaabbbbbbbbaa"
const lookup = createLookup(input)
// You need something to start with - I am using ab as a seed here
let output = "ab"
for (let i = 0; i < 50; i++) {
  output += nextToken(output.slice(-2))
}
// The output is plain text, so textContent is enough
document.getElementById("result").textContent = output
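The weighted pick is easier to check when the random number is passed in instead of being drawn inside the function. The following is only a sketch under that assumption: pickWeighted is a hypothetical name, and the counts { b: 9, a: 2 } are what the sample input yields for the tokens following bb.

```javascript
// Hypothetical variant of the weighted pick: the random value is a parameter,
// so the result is reproducible. `weights` maps token -> count.
const pickWeighted = (weights, rnd = Math.random()) => {
  const total = Object.values(weights).reduce((acc, n) => acc + n, 0)
  let remaining = rnd * total // scale the draw to the total count
  for (const [token, n] of Object.entries(weights)) {
    remaining -= n
    if (remaining < 0) return token
  }
  // Reached only in edge cases such as empty weights or rnd >= 1
  return undefined
}

// Counts observed after "bb" in the sample input: b nine times, a twice
const afterBB = { b: 9, a: 2 }
console.log(pickWeighted(afterBB, 0.0)) // b
console.log(pickWeighted(afterBB, 0.8)) // b (0.8 * 11 = 8.8, still within the 9 b's)
console.log(pickWeighted(afterBB, 0.9)) // a (0.9 * 11 = 9.9, past the 9 b's)
```

Calling pickWeighted(afterBB) without a second argument falls back to Math.random(), so it behaves like a normal random pick.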