
Say I have an array like so:

const alphabet = ['a', 'b', 'c', 'd'];

This represents 4 political candidates and a rank choice vote, where candidate a is first choice, b is second choice, etc.

I want to shuffle this into a bunch of random orders, but in this case I want a to appear first with probability 60%, b second with probability 20%, c third with probability 10%, and all the other orderings with probability 10%. Is there some Lodash or Ramda functionality that can accomplish this?

This is for testing a rank choice voting algorithm. Shuffling the array uniformly at random yields candidates that all have pretty much identical vote counts, which doesn't reflect reality in most cases (although I will test for that too).

I have this pretty horrible routine which will generate one random array:

const getValues = function () {
  const results = [];
  const remaining = new Set(alphabet);
  const probabilities = [0.6, 0.2, 0.1, 0.1];

  for (let i = 0; i < alphabet.length; i++) {
    const r = Math.random();
    const letter = alphabet[i];

    if (r < probabilities[i] && remaining.has(letter)) {
      results.push(letter);
      remaining.delete(letter);
    }
    else {
      const rand = Math.floor(Math.random() * remaining.size);
      const x = Array.from(remaining)[rand];
      remaining.delete(x);
      results.push(x);
    }
  }

  return results;
};

This "works" but doesn't quite order things according to the specified probabilities, because of conditional probability. Does anyone know of a good way to make the orderings appear with certain probabilities, as I described above?
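To make the conditional-probability problem concrete, here is a minimal sketch that works out the exact distribution of the first slot produced by the getValues routine above. It is plain arithmetic, not a simulation, and the name firstSlot is only illustrative. In the first iteration the routine takes 'a' directly with probability 0.6, and otherwise falls through to a uniform pick among all four letters, which can still be 'a'.

```javascript
// Exact first-slot distribution of the getValues routine (no simulation).
const p = [0.6, 0.2, 0.1, 0.1];
const n = p.length;

// Chance that the first iteration falls through to the uniform pick,
// split evenly across all n letters still in the set.
const fallThroughShare = (1 - p[0]) / n;

const firstSlot = {
  a: p[0] + fallThroughShare, // direct hit, or uniform pick landing on 'a'
  b: fallThroughShare,
  c: fallThroughShare,
  d: fallThroughShare,
};

console.log(firstSlot);
```

So 'a' ends up first about 70% of the time rather than the intended 60%, and b, c and d share the rest equally instead of in a 2:1:1 ratio.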

Here is some sample output that I am looking for:

[ [ 'd', 'b', 'a', 'c' ],
  [ 'a', 'b', 'c', 'd' ],
  [ 'a', 'd', 'b', 'c' ],
  [ 'd', 'b', 'a', 'c' ],
  [ 'b', 'c', 'a', 'd' ],
  [ 'a', 'b', 'c', 'd' ],
  [ 'd', 'b', 'c', 'a' ],
  [ 'c', 'd', 'a', 'b' ],
  [ 'd', 'b', 'a', 'c' ],
  [ 'a', 'b', 'c', 'd' ] ]

If you generated enough data, it would fit the desired order/distribution.

1 Answer


To get the desired output, you can draw one item at random according to the probabilities, then normalize the probabilities of the remaining items and draw another, repeating until all items are taken.

The effect can be checked by counting how often each item ends up at each final index.

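const
    // Draw an index at random, where prob[i] is the chance of picking index i.
    // The closure keeps a running r and subtracts each p until r < p.
    getIndex = prob => prob.findIndex((r => p => r < p || (r -= p, false))(Math.random())),
    // Rescale the values so that they sum to 1.
    normalized = array => {
        const sum = array.reduce((a, b) => a + b, 0);
        return array.map(v => v / sum);
    };

const items = ['a', 'b', 'c', 'd'],
    probabilities = [0.6, 0.2, 0.1, 0.1],
    // counts[item][position]: how often each item lands at each index
    counts = { a: { 0: 0, 1: 0, 2: 0, 3: 0 }, b: { 0: 0, 1: 0, 2: 0, 3: 0 }, c: { 0: 0, 1: 0, 2: 0, 3: 0 }, d: { 0: 0, 1: 0, 2: 0, 3: 0 } },
    result = [];

let l = 100, // number of orderings to generate
    index,
    subP,
    subI,
    temp;

while (l--) {
    temp = [];
    subP = probabilities.slice();
    subI = items.slice();
    while (subP.length) {
        // Draw one of the remaining items, then remove it and its probability.
        index = getIndex(normalized(subP));
        // Guard against floating-point rounding leaving findIndex without a match.
        if (index < 0) index = subP.length - 1;
        temp.push(subI[index]);
        subI.splice(index, 1);
        subP.splice(index, 1);
    }
    result.push(temp);
}

console.log(result.map(a => a.join()));

result.forEach(a => a.forEach((v, i) => counts[v][i]++));
console.log(counts);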
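The same draw-and-renormalize idea can be packaged as a reusable function. The following is a minimal sketch, not part of the original answer: the names weightedShuffle and tallyPositions and the injectable rng parameter are assumptions added for illustration. Instead of normalizing on every step it scales the random draw by the remaining total, which is equivalent.

```javascript
// Weighted shuffle: repeatedly draw one remaining item with probability
// proportional to its weight, then remove it (sampling without replacement).
// `rng` is a hypothetical injectable source of numbers in [0, 1).
function weightedShuffle(items, weights, rng = Math.random) {
  const pool = items.slice();   // copies, so the inputs are not mutated
  const w = weights.slice();
  const order = [];
  while (pool.length) {
    const total = w.reduce((a, b) => a + b, 0);
    let r = rng() * total;      // scale the draw instead of normalizing
    let index = 0;
    // Walk the cumulative weights; stopping at the last slot means
    // rounding errors can never push the index out of range.
    while (index < pool.length - 1 && r >= w[index]) {
      r -= w[index];
      index++;
    }
    order.push(pool[index]);
    pool.splice(index, 1);
    w.splice(index, 1);
  }
  return order;
}

// Count how often each item lands at each position over many orderings.
function tallyPositions(items, weights, trials, rng = Math.random) {
  const counts = {};
  for (const item of items) counts[item] = new Array(items.length).fill(0);
  for (let t = 0; t < trials; t++) {
    weightedShuffle(items, weights, rng).forEach((item, pos) => counts[item][pos]++);
  }
  return counts;
}

console.log(weightedShuffle(['a', 'b', 'c', 'd'], [0.6, 0.2, 0.1, 0.1]));
console.log(tallyPositions(['a', 'b', 'c', 'd'], [0.6, 0.2, 0.1, 0.1], 10000));
```

With this scheme only the first-position frequencies match the weights directly (a first about 60% of the time). The frequencies at later positions follow from the sequential draws, so b, for example, lands second more often than 20% of the time.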

