浅夏110 发表于 2020-5-15 10:55

泊松随机数的生成算法:数学推导和程序实现

最近在做一个机器学习的项目,其中用到了泊松随机数。查维基百科 Poisson distribution 发现了一个算法,可以生成泊松随机数:
algorithm poisson random number (Knuth):    init:         Let L ← e^{−λ}, k ← 0 and p ← 1.    do:         k ← k + 1.         Generate uniform random number u in and let p ← p × u.    while p > L.    return k − 1.
词条里面没有解释为什么这个算法可以生成泊松随机数,我在此给出证明。
第一节:算法描述上面的这个算法可以描述为:第一步:给定一个参数 https://www.zhihu.com/equation?tex=%5Clambda+%3E+0 0" style="max-width: 100%; vertical-align: middle; margin-right: 3px; margin-left: 3px; display: inline-block;"> , 生成一系列的随机数,这些随机数服从 https://www.zhihu.com/equation?tex=%5Ctext%7BUniform%7D%280%2C+1%29 分布,也就是这些随机数在开区间 (0, 1) 之间均匀分布。第二步:求这些随机数的乘积,当乘积小于或者等于 https://www.zhihu.com/equation?tex=e%5E%7B-%5Clambda%7D 时,程序停止。记下此时参与求乘积的随机数的个数。第三步:程序终止时参与乘积的随机数的个数减一服从参数为 https://www.zhihu.com/equation?tex=%5Clambda 的泊松分布。
第二节:算法的数学表达为了证明这个算法确实可以生成泊松随机数,我们记


https://www.zhihu.com/equation?tex=P+%3D+%5Cprod_%7Bi+%3D+1%7D%5E%7Bn%7DX_%7Bi%7D%2C+X_i+%5Csim+%5Ctext%7BUniform%7D%280%2C+1%29这就等价于https://www.zhihu.com/equation?tex=%5Clog+P+%3D+%5Csum_%7Bi+%3D+1%7D%5E%7Bn%7D+%5Clog+X_%7Bi%7D已知随机变量 https://www.zhihu.com/equation?tex=X 的概率密度为 https://www.zhihu.com/equation?tex=f_X%28x%29 ,令 https://www.zhihu.com/equation?tex=Y+%3D+%5Clog+X .https://www.zhihu.com/equation?tex=p%28Y+%5Cle+y%29+%3D+%5Cint_%7B-%5Cinfty%7D%5E%7By%7D+f_%7BY%7D%28y%5E%7B%5Cprime%7D%29dy%5E%7B%5Cprime%7D+%3D+p%28%5Clog+X+%5Cle+y%29+%3D+p%28X+%5Cle+e%5E%7By%7D%29+%3D+%5Cint_%7B-%5Cinfty%7D%5E%7Be%5E%7By%7D%7D+f_%7BX%7D%28x%29dx所以变量 https://www.zhihu.com/equation?tex=Y 的概率密度为https://www.zhihu.com/equation?tex=f_%7BY%7D%28y%29+%3D+%5Cfrac%7Bd%7D%7Bdy%7D%5Cint_%7B-%5Cinfty%7D%5E%7Be%5E%7By%7D%7D+f_%7BX%7D%28x%29dx+%3D+f_%7BX%7D%28e%5E%7By%7D%29+e%5E%7By%7D已知https://www.zhihu.com/equation?tex=f_%7BX%7D%28x%29+%3D%5Cbegin%7Bcases%7D+1+%26+0+%5Cleq+x%5Cleq+1+%5C%5C+0+%26+%5Ctext%7Botherwise%7D+%5Cend%7Bcases%7D所以https://www.zhihu.com/equation?tex=f_%7BY%7D%28y%29+%3D+%5Cbegin%7Bcases%7D+e%5E%7By%7D+%26+-%5Cinfty+%3C+y+%5Cleq+0+%5C%5C+0+%26+%5Ctext%7Botherwise%7D+%5Cend%7Bcases%7D这是一个指数分布。因为随机变量 https://www.zhihu.com/equation?tex=%5Clog+P+%3D+%5Csum_%7Bi+%3D+1%7D%5E%7Bn%7D+%5Clog+X_%7Bi%7D+%3A%3D+%5Csum_%7Bi+%3D+1%7D%5E%7Bn%7D+Y_%7Bi%7D ,所以我们要计算一系列服从指数分布的随机变量的和。已知,对于独立随机变量 https://www.zhihu.com/equation?tex=X%2C+Y ,它们的和 https://www.zhihu.com/equation?tex=Z+%3D+X+%2B+Y 的概率密度为https://www.zhihu.com/equation?tex=f_%7BZ%7D%28z%29+%3D+%5Cint_%7B-%5Cinfty%7D%5E%7B%5Cinfty%7D+f_%7BX%7D%28%5Cxi%29+f_%7BY%7D%28z+-+%5Cxi%29+d%5Cxi这是两个概率密度函数的卷积。做傅里叶变换,得到 https://www.zhihu.com/equation?tex=Z 的概率分布的特征函数是 https://www.zhihu.com/equation?tex=X%2C+Y 两个随机变量的概率密度分布的特征函数的乘积。为了计算 https://www.zhihu.com/equation?tex=%5Clog+P 的概率分布,我们先要计算指数分布的特征函数。根据特征函数的定义,我们有https://www.zhihu.com/equation?tex=%5Chat%7Bf%7D_%7BY%7D%28%5Ceta%29+%3D+%5Cint_%7B-%5Cinfty%7D%5E%7B%5Cinfty%7D+f_%7BY%7D%28y%29+e%5E%7Bi%5Ceta+y%7D+dy+%3D+%5Cfrac%7B1%7D%7Bi%5Ceta+%2B+1%7D所以变量 https://www.zhihu.com/equation?tex=%5Clog+P 的概率密度的特征函数为 https://www.zhihu.com/equation?tex=%5Cfrac%7B1%7D%7B%28i%5Ceta+%2B+1%29%5En%7D .第三节:根据概率密度的特征函数计算所对应的概率密度现在我们已经知道了概率密度的特征函数,接下来我们要根据这个特征函数计算所对应的概率密度。做傅里叶逆变换可以得到所对应的概率密度分布:https://www.zhihu.com/equation?tex=I%28y%29+%3D+%5Cfrac%7B1%7D%7B2%5Cpi%7D+%5Cint_%7B-%5Cinfty%7D%5E%7B%5Cinfty%7D+%5Cfrac%7B1%7D%7B%28i%5Ceta+%2B+1%29%5En%7D+e%5E%7B-i+%5Ceta+y+%7D+d%5Ceta因为变量 https://www.zhihu.com/equation?tex=%5Clog+P 是一系列负数的求和,所以上面的积分中, https://www.zhihu.com/equation?tex=y+%3C+0 .选择如下图所示的一个积分围道:
https://pic4.zhimg.com/80/v2-d8460c20dcbc5f4f5884b90bed2d89e3_720w.jpg计算在这个围道上的积分:https://www.zhihu.com/equation?tex=%5Cfrac%7B1%7D%7B2%5Cpi%7D+%5Coint_%7B%5Cgamma_R%7D+%5Cfrac%7B1%7D%7B%28iz+%2B+1%29%5En%7De%5E%7B-izy%7D+dz这个积分可以分为两部分,第一部分是沿着横轴求积分,第二部分是沿着外面的大圆求积分。可以证明沿着大圆的积分为零,因为https://www.zhihu.com/equation?tex=%5CBigg%5Cvert%5Cfrac%7B1%7D%7B2%5Cpi%7D+%5Cint_%7Bz+%3D+R+e%5E%7Bi%5Ctheta%7D%2C+%5Csin%5Ctheta+%3E+0%7D+%5Cfrac%7B1%7D%7B%28iz+%2B+1%29%5En%7De%5E%7B-iyR%28%5Ccos%5Ctheta+%2B+i%5Csin%5Ctheta%29%7D+dz%5CBigg%5Cvert+%5Cleq+%5Cfrac%7B1%7D%7B2%5Cpi%7D%5Cint_%7Bz+%3D+R+e%5E%7Bi%5Ctheta%7D%7D+%5Cfrac%7B1%7D%7B%28R%2B1%29%5En%7D+e%5E%7ByR%5Csin%5Ctheta%7D+Rd%5Ctheta%5Crightarrow+0 0} \frac{1}{(iz + 1)^n}e^{-iyR(\cos\theta + i\sin\theta)} dz\Bigg\vert \leq \frac{1}{2\pi}\int_{z = R e^{i\theta}} \frac{1}{(R+1)^n} e^{yR\sin\theta} Rd\theta\rightarrow 0" style="max-width: 100%; vertical-align: middle; margin-right: 3px; margin-left: 3px; display: inline-block;">当大圆半径为无穷大的时候该积分趋近于零,因为当 https://www.zhihu.com/equation?tex=y+%3C+0%2C+%5Csin%5Ctheta+%3E+0 0" style="max-width: 100%; vertical-align: middle; margin-right: 3px; margin-left: 3px; display: inline-block;"> 时,https://www.zhihu.com/equation?tex=e%5E%7ByR%5Csin%5Ctheta%7D 以指数速度衰减到零。所以我们就有https://www.zhihu.com/equation?tex=%5Cfrac%7B1%7D%7B2%5Cpi%7D+%5Coint_%7B%5Cgamma_R%7D+%5Cfrac%7B1%7D%7B%28iz+%2B+1%29%5En%7De%5E%7B-izy%7D+dz+%3D+%5Cfrac%7B1%7D%7B2%5Cpi%7D+%5Cint_%7B-%5Cinfty%7D%5E%7B%5Cinfty%7D+%5Cfrac%7B1%7D%7B%28i%5Ceta+%2B+1%29%5En%7D+e%5E%7B-i%5Ceta+y%7D+d%5Ceta根据柯西积分定理,左边的积分为https://www.zhihu.com/equation?tex=%5Cfrac%7B1%7D%7B2%5Cpi%7D+%5Coint_%7B%5Cgamma_R%7D+%5Cfrac%7B1%7D%7B%28iz+%2B+1%29%5En%7De%5E%7B-izy%7D+dz+%3D+%5Cfrac%7B1%7D%7B2%5Cpi%7D+%5Coint_%7Bz+%3D+i+%2B+%5Cdelta+e%5E%7Bi%5Ctheta%7D%7D+%5Cfrac%7B1%7D%7B%28i%5Cdelta+e%5E%7Bi%5Ctheta%7D%29%5En%7De%5E%7B-iy%28i+%2B+%5Cdelta+e%5E%7Bi%5Ctheta%7D%29%7D%5Cdelta+e%5E%7Bi%5Ctheta%7Di+d%5Ctheta+%3D+%5Cfrac%7B1%7D%7B2%5Cpi%7D+e%5E%7By%7D%5Coint%5Cfrac%7Be%5E%7B-iy%5Cdelta+e%5E%7Bi%5Ctheta%7D%7D%7D%7B%28i%5Cdelta+e%5E%7Bi%5Ctheta%7D%29%5E%7Bn-1%7D%7Dd%5Ctheta上面式子最右边的积分为https://www.zhihu.com/equation?tex=%5Coint%5Cfrac%7Be%5E%7B-iy%5Cdelta+e%5E%7Bi%5Ctheta%7D%7D%7D%7B%28i%5Cdelta+e%5E%7Bi%5Ctheta%7D%29%5E%7Bn-1%7D%7Dd%5Ctheta+%3D+%5Coint+%5Csum_%7Bm+%3D+0%7D%5E%7B%5Cinfty%7D+%5Cfrac%7B%28-y%29%5Em+%28i%5Cdelta+e%5E%7Bi%5Ctheta%7D%29%5E%7Bm-n%2B1%7D%7D%7Bm%21%7Dd%5Ctheta+%3D+%5Csum_%7Bm+%3D+0%7D%5E%7B%5Cinfty%7D%5Cfrac%7B%28-y%29%5Em%7D%7Bm%21%7D+2%5Cpi+%5Cdelta_%7Bm%2C+n-1%7D+%3D+2%5Cpi+%5Cfrac%7B%28-y%29%5E%7Bn-1%7D%7D%7B%28n-1%29%21%7D所以围道积分为https://www.zhihu.com/equation?tex=%5Cfrac%7B1%7D%7B2%5Cpi%7D+%5Coint_%7B%5Cgamma_R%7D+%5Cfrac%7B1%7D%7B%28iz+%2B+1%29%5En%7De%5E%7B-izy%7D+dz+%3D+e%5E%7By%7D%5Cfrac%7B%28-y%29%5E%7Bn-1%7D%7D%7B%28n-1%29%21%7D%2C+%5Ctext%7B+%7Dy+%3C+0最终我们得到随机变量 https://www.zhihu.com/equation?tex=%5Clog+P 所服从的概率密度函数为https://www.zhihu.com/equation?tex=f_%7B%5Clog+P%7D%28y%29+%3D+%5Cbegin%7Bcases%7D+%5Cfrac%7B%28-y%29%5E%7Bn-1%7D%7D%7B%28n-1%29%21%7De%5E%7By%7D+%26+-%5Cinfty+%3C+y+%5Cleq+0+%5C%5C+0+%26+%5Ctext%7Botherwise%7D+%5Cend%7Bcases%7D这个分布的名字叫做 https://www.zhihu.com/equation?tex=%5CGamma 分布。显然,根据上式,当 https://www.zhihu.com/equation?tex=n+%3D+1 的时候,上面的分布退化为指数分布。
第四节:计算随机变量 https://www.zhihu.com/equation?tex=P 的概率密度函数已经知道了 https://www.zhihu.com/equation?tex=%5Clog+P 服从 https://www.zhihu.com/equation?tex=%5CGamma 分布,那么计算 https://www.zhihu.com/equation?tex=P 的分布也很简单了。已知https://www.zhihu.com/equation?tex=p%28%5Clog+P+%5Cle+y%29+%3D+%5Cint_%7B-%5Cinfty%7D%5E%7By%7D+f_%7B%5Clog+P%7D%28y%5E%7B%5Cprime%7D%29+dy%5E%7B%5Cprime%7D+%3D+p%28P+%5Cle+e%5E%7By%7D%29+%3D+%5Cint_%7B-%5Cinfty%7D%5E%7Be%5E%7By%7D%7D+f_%7BP%7D%28p%29dp所以https://www.zhihu.com/equation?tex=f_%7BP%7D%28p%29+%3D+p%5E%7B-1%7Df_%7B%5Clog+P%7D%28%5Clog+p%29+%3D+%5Cbegin%7Bcases%7D+%5Cfrac%7B%28-%5Clog+p%29%5E%7Bn-1%7D%7D%7B%28n-1%29%21%7D+%26+0+%3C+p+%5Cle+1+%5C%5C+0+%26+%5Ctext%7Botherwise%7D+%5Cend%7Bcases%7D
第五节:计算 https://www.zhihu.com/equation?tex=p+%3C+e%5E%7B-%5Clambda%7D%2C+%5Clambda+%3E+0 0" style="max-width: 100%; vertical-align: middle; margin-right: 3px; margin-left: 3px; display: inline-block;"> 的概率我们已经知道了变量 https://www.zhihu.com/equation?tex=P 的分布函数,那么就可以计算 https://www.zhihu.com/equation?tex=p+%3C+e%5E%7B-%5Clambda%7D 的概率为https://www.zhihu.com/equation?tex=p%28P+%3C+e%5E%7B-%5Clambda%7D%29+%3D+%5Cint_%7B0%7D%5E%7Be%5E%7B-%5Clambda%7D%7D+%5Cfrac%7B%28-%5Clog+p%29%5E%7Bn-1%7D%7D%7B%28n-1%29%21%7D+dp+%3D+%5Cfrac%7B1%7D%7B%28n-1%29%21%7D+%5Cint_%7B%5Clambda%7D%5E%7B%5Cinfty%7D+e%5E%7B-t%7D+t%5E%7Bn-1%7Ddt因为这个概率依赖于 https://www.zhihu.com/equation?tex=n ,所以可以将这个概率重新写作https://www.zhihu.com/equation?tex=p_%7Bn%7D%28P+%3C+e%5E%7B-%5Clambda%7D+%29+%3D+%5Cfrac%7B1%7D%7B%28n-1%29%21%7D+%5Cint_%7B%5Clambda%7D%5E%7B%5Cinfty%7D+e%5E%7B-t%7D+t%5E%7Bn-1%7Ddt利用分部积分,可以得到概率的递归关系为https://www.zhihu.com/equation?tex=p_%7Bn%7D%28P+%3C+e%5E%7B-%5Clambda%7D%29+%3D+%5Cfrac%7B%5Clambda%5E%7Bn-1%7D%7D%7B%28n-1%29%21%7D+e%5E%7B-%5Clambda%7D+%2B+p_%7Bn-1%7D%28+P+%3C+e%5E%7B-%5Clambda%7D%29%2C+n+%3E+1 1" style="max-width: 100%; vertical-align: middle; margin-right: 3px; margin-left: 3px; display: inline-block;">因为 https://www.zhihu.com/equation?tex=p_%7B1%7D%28P+%3C+e%5E%7B-%5Clambda%7D%29+%3D+e%5E%7B-%5Clambda%7D ,所以我们有https://www.zhihu.com/equation?tex=p_%7Bn%7D%28P+%3C+e%5E%7B-%5Clambda%7D%29+%3D+%5Csum_%7Bk+%3D+0%7D%5E%7Bn-1%7D%5Cfrac%7B%5Clambda%5Ek%7D%7Bk%21%7De%5E%7B-%5Clambda%7D
第六节:根据对概率的两种等价解释得到泊松分布现在我们已经算出来了当我们用 https://www.zhihu.com/equation?tex=n 个 均匀分布的随机数连乘时,所得到的乘积小于 https://www.zhihu.com/equation?tex=e%5E%7B-%5Clambda%7D%2C+%5Clambda+%3E+0 0" style="max-width: 100%; vertical-align: middle; margin-right: 3px; margin-left: 3px; display: inline-block;"> 的概率。这里,我们相当于是固定了 https://www.zhihu.com/equation?tex=n ,扫描不同的参数 https://www.zhihu.com/equation?tex=%5Clambda ,得到了概率。我们可以换一个角度。这个概率也可以看作是我们固定了参数 https://www.zhihu.com/equation?tex=%5Clambda ,计算需要多少个 之间均匀分布的随机数连乘才能让最后的乘积小于 https://www.zhihu.com/equation?tex=e%5E%7B-%5Clambda%7D . 也就是,https://www.zhihu.com/equation?tex=p%28P+%3C+e%5E%7B-%5Clambda%7D%29+%3D+p%28N+%5Cle+n%29+%3D+%5Csum_%7Bk+%3D+1%7D%5E%7Bn%7Dp%28N+%3D+k%29根据第五节的结果,我们知道https://www.zhihu.com/equation?tex=p_%7Bn%7D%28P+%3C+e%5E%7B-%5Clambda%7D%29+%3D+%5Csum_%7Bk+%3D+0%7D%5E%7Bn-1%7D%5Cfrac%7B%5Clambda%5Ek%7D%7Bk%21%7De%5E%7B-%5Clambda%7D所以,假设 https://www.zhihu.com/equation?tex=n 个 之间均匀分布的随机数连乘后刚好小于 https://www.zhihu.com/equation?tex=e%5E%7B-%5Clambda%7D , 那么 https://www.zhihu.com/equation?tex=n 服从这样的概率分布:https://www.zhihu.com/equation?tex=p%28N+%3D+n%29+%3D+%5Cfrac%7B%5Clambda%5E%7Bn-1%7D%7D%7B%28n-1%29%21%7D+e%5E%7B-%5Clambda%7D这就是泊松分布。
第七节:程序实现我已经写了一个程序来实现这个算法,并且得到了测试结果。程序GitHub地址为PrimerLi/Poisson
第八节:实验结果

图中显示了 https://www.zhihu.com/equation?tex=%5Clambda+%3D+1%2C+%5Clambda+%3D+4%2C+%5Clambda+%3D+10 所对应的泊松分布的概率曲线。横轴为 https://www.zhihu.com/equation?tex=k ,纵轴为 https://www.zhihu.com/equation?tex=p%28k%29+%3D+%5Cfrac%7B%5Clambda%5Ek%7D%7Bk%21%7D+e%5E%7B-%5Clambda%7D . 可以看出,对于不同的参数 https://www.zhihu.com/equation?tex=%5Clambda ,理论计算出来的结果和用Monte Carlo模拟出来的结果相差不大。

德古拉 发表于 2020-5-15 12:00

非常好, 请问是什么语言的程序?
页: [1]
查看完整版本: 泊松随机数的生成算法:数学推导和程序实现