arXiv: 1809.06520 · v3 · pith:6AHC2TLE · submitted 2018-09-18 · cs.MS · stat.CO

Random problems with R

classification: cs.MS, stat.CO
keywords: random, cryptorandom, integers, function, multiplying, python, result, sampling
abstract

R (version 3.5.1 patched) has an issue with its random sampling functionality. R generates random integers between $1$ and $m$ by multiplying random floats by $m$, taking the floor, and adding $1$ to the result. Because the random floats take only finitely many values, which in general cannot be divided evenly among the $m$ outcomes, this well-known quantization effect makes the distribution on $\{ 1, \ldots, m\}$ non-uniform. The deviation from uniformity depends on $m$ and can be substantial. Because the sample function in R relies on generating random integers, random sampling in R is biased. There is an easy fix: construct random integers directly from random bits, rather than multiplying a random float by $m$. That is the strategy taken in Python's numpy.random.randint() function, among others. Example source code in Python is available at https://github.com/statlab/cryptorandom/blob/master/cryptorandom/cryptorandom.py (see functions getrandbits() and randbelow_from_randbits()).
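The quantization effect and the bit-based fix can be compared side by side in a small Python sketch. It is neither R's code nor the cryptorandom code linked above. The function names are hypothetical, and the sketch assumes a toy generator whose floats have only `bits` bits of resolution, so that every outcome of the multiply-and-floor method can be counted exactly. The bit-based sampler shows one common way to build integers directly from random bits: draw just enough bits with the standard library's `random.getrandbits` and reject values that are too large.

```python
import random
from collections import Counter


def multiply_floor_counts(m, bits):
    """Tally the multiply-and-floor method over every float of a toy generator.

    Assumption (hypothetical, for illustration): the generator returns
    u = j / 2**bits for j = 0, ..., 2**bits - 1, each equally likely.
    Returns a Counter mapping each outcome in {1, ..., m} to how many of
    those 2**bits floats produce it via floor(m * u) + 1.
    """
    counts = Counter()
    for j in range(2 ** bits):
        # floor(m * j / 2**bits), computed with exact integer arithmetic
        counts[(m * j) // 2 ** bits + 1] += 1
    return counts


def randbelow_from_randbits(n, getrandbits=random.getrandbits):
    """Return an integer uniform on {0, ..., n - 1}, built only from random bits.

    Draws k = n.bit_length() bits at a time and rejects values >= n.
    Since 2**(k - 1) <= n < 2**k, each draw is accepted with probability
    at least 1/2, so the expected number of draws is at most 2.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length()
    while True:
        r = getrandbits(k)  # uniform on {0, ..., 2**k - 1}
        if r < n:
            return r


def sample_1_to_m(m, getrandbits=random.getrandbits):
    """Uniform integer on {1, ..., m}: shift the bit-based draw by 1."""
    return randbelow_from_randbits(m, getrandbits) + 1


if __name__ == "__main__":
    # m = 3 with 8-bit floats: outcome 1 gets 86 floats, outcomes 2 and 3 get 85.
    print(multiply_floor_counts(3, 8))
    # m = 200 with 8-bit floats: some outcomes are hit twice as often as others.
    skew = multiply_floor_counts(200, 8)
    print(max(skew.values()), min(skew.values()))  # 2 1
    # Ten draws from {1, ..., 6} with no float arithmetic involved.
    print([sample_1_to_m(6) for _ in range(10)])
```

With $2^{\text{bits}}$ equally likely floats split into $m$ runs of consecutive values (for $m \le 2^{\text{bits}}$), each outcome receives either $\lfloor 2^{\text{bits}}/m \rfloor$ or $\lceil 2^{\text{bits}}/m \rceil$ of them, so the bias grows as $m$ approaches the generator's resolution. Rejection sampling avoids this because every accepted value in $\{0, \ldots, n-1\}$ comes from exactly one bit pattern, at the cost of occasionally discarding a draw.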
