#1
Posted to rec.audio.opinion
I have a quick question...

I need a numerical approximation to Student's t distribution that is accurate to double precision. Unless I am completely mistaken, the TDIST() function in Excel behaves very strangely: it cannot calculate a value for x < 0, and unlike NORMSDIST() it returns a tail probability rather than the cdf. I realize that I could construct a better function using the BETADIST() function, but I'm not sure whether this will alleviate or compound whatever inaccuracies are already present in Excel. Currently, the one I am leaning towards is Numerical Recipes.

Thanks in advance,
Schiz
#2
Schizoid Man wrote:
> I need a numerical approximation to Student's t distribution that is accurate to double precision. [...]

You can use cdf_tdist from http://members.aol.com/iandjmsmith/Examples.xls

However, for EXCEL 2003, cdf_tdist(x,df) = TDIST(-x,df,1) for x <= 0. The problem is that EXCEL only provides one tail for the distributions they offer, so they appear to guess which tail users would most like evaluated and go with that. You are lucky that the t-distribution is symmetric.

On the whole, TDIST appears to be reasonably accurate. It does not work if the degrees of freedom are greater than 1e10, and when the degrees of freedom are greater than 1e6 the relative error in the values returned by TDIST can grow to about 1e-5. It incorrectly returns 0 for TDIST(7e153,1,1); TDIST(2e154,1,1) returns #NUM!, which is much better. However, these are undoubtedly extreme cases for the average user, whatever that might mean.

Ian Smith
#5
Ian Smith wrote:

> You can use cdf_tdist from http://members.aol.com/iandjmsmith/Examples.xls

Thanks for the link, Ian.

> It does not work if the degrees of freedom are greater than 1e10. When the degrees of freedom are greater than 1e6 the relative error in the values returned by TDIST can grow to about 1e-5.

I think if the degrees of freedom are that high, then Student's t is probably the wrong distribution to use in any case. It should begin to converge with the normal for df > 100, so that would be a better choice. NORMDIST(), though very good, is still not up to spec for the application I have in mind (with enough digits, the value for z=0 is returned as 0.499999999781721).
#6
Schizoid Man wrote:

> Bill Riel wrote:
>
> > I don't know about Excel - I do recall that in the past some of the statistical functions were questionable. However, I've used the code from Numerical Recipes and it's worked well - it should meet your precision requirements.
>
> Thanks, Bill. That's exactly what I did - a bit convoluted given that you need a Log Gamma and an Incomplete Beta integral in order to get the t, but probably worth it given the precision I am striving for.

Watch out with NR algorithms. They have a nasty habit of omitting to mention where they don't work. I know you are only using it for the t-distribution, but betacf is slow and inaccurate both for large a and b and for small a and b.

The implementation of gammln only guarantees an absolute error of less than 2e-10. That means that when you exponentiate it, the relative error in the answer will be 2e-10. This is the best you can claim for your overall answer for the cdf of the t distribution. I believe the GAMMALN function in EXCEL2003 uses the same implementation or something very similar.

While we are on similarities of code, I think EXCEL2003 uses equation 6.4.9 in NR, or something similar, to calculate the t-distribution figures in terms of the beta distribution.

Note, however, that as df gets large, df/(df+t^2) tends to 1, and the BETADIST calculation in EXCEL therefore suffers from cancellation errors when evaluating 1-(df/(df+t^2)). For large df and negative t, it is better to calculate cdf_tdist(t,df) as comp_cdf_beta(t^2/(t^2+df),1/2,df/2) instead of cdf_beta(df/(t^2+df),df/2,1/2). Of course EXCEL does not offer a right-tail evaluation for the beta distribution and so cannot take advantage of this obvious way of making the calculations more accurate. Mind you, the NR code does not offer a method of evaluating the right tail of the beta distribution either, so the NR algorithm does not appear to have any advantages over the EXCEL2003 version.

With the example you gave for NORMDIST, you must be using an earlier version of EXCEL. In that case, I would strongly recommend you use the VBA code in http://members.aol.com/iandjmsmith/Examples.txt. This has code which evaluates the beta distribution for large and small values of the parameters. Additionally, it has separate code for the t distribution which is more accurate and does not suffer from problems such as t^2 overflowing in the calculation of df/(t^2+df), or underflow if it is calculated as df/t/(t+df/t). In other words, it works.

Ian Smith
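[For readers without the book, the approach Ian is critiquing can be sketched in a few dozen lines. The Python below is a transcription of the general NR-style method, not NR's verbatim code: it evaluates the t cdf through the regularized incomplete beta via the well-known relation A&S 26.7.1, and follows Ian's advice by passing the complement t^2/(t^2+df) in separately so it is never formed as 1 - df/(df+t^2). The continued fraction is still slow and inaccurate for large or tiny a and b, exactly as Ian says.]

```python
import math

def betacf(a, b, x, eps=3e-16, fpmin=1e-300, max_iter=300):
    # Continued fraction for the incomplete beta (modified Lentz's method),
    # in the style of NR's betacf. Designed for x < (a+1)/(a+b+2); as noted
    # above, it degrades for very large or very small a and b.
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < fpmin:
        d = fpmin
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))   # even step
        d = 1.0 + aa * d
        if abs(d) < fpmin: d = fpmin
        c = 1.0 + aa / c
        if abs(c) < fpmin: c = fpmin
        d = 1.0 / d
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        if abs(d) < fpmin: d = fpmin
        c = 1.0 + aa / c
        if abs(c) < fpmin: c = fpmin
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("betacf failed to converge")

def betai(a, b, x, xc):
    # Regularized incomplete beta I_x(a,b). The complement xc = 1-x is
    # passed in separately so the caller can compute it without the
    # cancellation Ian describes for 1 - df/(df+t^2).
    if x <= 0.0:
        return 0.0
    if xc <= 0.0:
        return 1.0
    lx = math.log(x) if x <= 0.5 else math.log1p(-xc)
    lxc = math.log(xc) if xc <= 0.5 else math.log1p(-x)
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * lx + b * lxc)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * betacf(a, b, x) / a
    return 1.0 - bt * betacf(b, a, xc) / b

def tcdf(t, df):
    # P(T <= t) via A&S 26.7.1: the two-tail probability beyond |t|
    # is I_{df/(df+t^2)}(df/2, 1/2).
    t2 = t * t
    two_tail = betai(0.5 * df, 0.5, df / (df + t2), t2 / (df + t2))
    return 0.5 * two_tail if t <= 0 else 1.0 - 0.5 * two_tail
```

[For one degree of freedom, tcdf(1.0, 1) reproduces the Cauchy value 0.75, and tcdf(1.8125, 10) lands close to the textbook 0.95 quantile. The pathologies discussed in the thread, such as huge df or t^2 overflowing, are deliberately left in.]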
#7
Ian Smith wrote:

> Watch out with NR algorithms. They have a nasty habit of omitting to mention where they don't work. I know you are only using it for the t-distribution, but betacf is slow and inaccurate both for large a and b and for small a and b.

I agree. The algorithms are very badly explained and the code is extremely non-intuitive. However, the t-distribution algorithm looks fairly straightforward.

> While we are on similarities of code, I think EXCEL2003 uses equation 6.4.9 in NR, or something similar, to calculate the t-distribution figures in terms of the beta distribution.

I heard that a lot of the statistics functions were re-tooled in Excel 2003. I am still on Excel 2002, where the TDIST() function is a joke, just like NORMINV(). Conversely, NORMDIST() and TINV() are not bad. I believe NORMDIST is based on Abramowitz and Stegun, which is what I was originally planning to use. However, I came across an algorithm by J. F. Hart that is accurate to at least double precision.

> Mind you, the NR code does not offer a method of evaluating the right tail of the beta distribution either, so the NR algorithm does not appear to have any advantages over the EXCEL2003 version.

That's not a problem: once I get one tail calculated correctly, the other tail follows from symmetry.

> With the example you gave for NORMDIST, you must be using an earlier version of EXCEL. In that case, I would strongly recommend you use the VBA code in http://members.aol.com/iandjmsmith/Examples.txt [...]

Thanks, Ian. I'll have a look. In case I need to make up a bibitem reference for it, what should I use?
#8
Schizoid Man wrote:

> I agree. The algorithms are very badly explained and the code is extremely non-intuitive. However, the t-distribution algorithm looks fairly straightforward. [...]

The NR code for the t-distribution is straightforward. The relationship it is based on is well known (see A&S 26.7.1). However, the code gives the unwary a warm feeling which is not justified. With the algorithm as given in NR:

for t=-2.3e-8, df=10 I get errors of 2e-8
for t=-8.4e-8, df=100 I get errors of 7e-8
for t=-2.3e-7, df=1000 I get errors of 2e-7
for t=-2.3e-4, df=1e9 I get errors of 2e-4

It is never going to be high-precision code without the changes I suggested and the changes you suggested. Your suggestion of using a Normal approximation instead of the t distribution for large degrees of freedom is a good one; A&S 26.7.8 offers a very good Normal approximation.

My point is that the code as given in NR is not really very good. It supposedly caters for people without a strong background in numerical analysis, and so should not need to be examined closely and corrected for its shortcomings. There is enough bad code about already, and in my opinion books like NR lead to an increase in bad code. I am not against books with algorithms for non-experts in a subject; I just feel that the standards of such code should be high - much higher than that exhibited by the NR code.

I do not know much about the algorithm by J. F. Hart. I believe it is similar to Algorithm 715 by W. J. Cody (used in DCDFLIB, Gnumeric, ...), which is based on rational Chebyshev approximations. These are quicker and more accurate than my cnormal function, but they are fixed-precision algorithms. In other words, if I decide I need to work in quadruple precision, I would need to work out a completely different set of rational Chebyshev approximations, whereas with cnormal I only have to change the constants which control accuracy. It is also easier to check that I have implemented the code in cnormal correctly, as it is based on a couple of well-known formulae (see A&S 25.2.14 for the series; the continued fraction is adapted from the modified version of Legendre's continued fraction). So I prefer arbitrary-precision algorithms where speed is not absolutely crucial.

However, I have included AS241, also based on rational Chebyshev approximations, for inverting the normal distribution function. It is quite easy to write arbitrary-precision code to invert it, but the cdf of the normal distribution is inverted frequently and I felt the speed improvement more than justified its inclusion.

As regards information about my code, here is a link (http://members.aol.com/iandjmsmith/Accuracy.htm) to some notes on the theory on which my code is based and on the accuracy you can expect. There isn't any information there specifically about the t distribution. The t-distribution was only included originally because it was needed for the asymptotic expansion used for the beta distribution. The code is complicated because in cdf_tdist, if you comment out the df = Fix(df) statement, it also works for non-integral degrees of freedom.

For integral degrees of freedom, however, it is not difficult to do an error analysis. The accuracy is proportional to the log of the answer (assuming the answer is not 0 or 1): it is effectively the absolute error in calculating df * 0.5 * log0(-(x/(x+df/x))), where log0(x) = ln(1+x). |9*ln(answer)*2^-53| is, I believe, a conservative error bound. For answers which are not 0 or 1, a global bound of 1e-12 will be attained. Even with modifications to the NR algorithm, 2e-10 is the best that can be claimed, due to the use of the gammln function, until the degrees of freedom are large enough to justify swapping to a Normal approximation, at which point you might be able to claim a bound of somewhere around |2*ln(answer)*2^-53|.

I'm not very good at writing up all this sort of stuff - hence no long lists of published papers - although I hope my notes referenced above are of some use to anyone interested. I have had considerable help from Jerry W. Lewis with testing and improvements to the code. In particular, Jerry pointed out the problem that the value 0 was returned by cdf_tdist(x,1) for x < -1e154 or so. Fortunately, Jerry is one of the few people fussier than I am about getting things right!

Ian Smith
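[Ian's log0(x) = ln(1+x) is what most modern maths libraries expose as log1p. A small Python illustration, with made-up df and t rather than values from his code, of why it matters when evaluating the (df/(df+t^2))^(df/2) factor for large df and small t:]

```python
import math

df, t = 1.0e8, 1.0e-3
x = t * t / (df + t * t)              # about 1e-14

# log of (df/(df+t^2))^(df/2), i.e. df/2 * ln(1 - x):
naive = 0.5 * df * math.log(1.0 - x)  # forming 1 - x rounds away digits of x
stable = 0.5 * df * math.log1p(-x)    # Ian's log0 form keeps full precision

print(naive, stable)                  # both near -5e-7; only 'stable' has all its digits right
```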