Imputing new worlds with multiversal appeal

Monthly Archives: July 2014

So today we’re covering standardized biases and how we would like our standardized biases to be less than 50%, just as we would like our coverage rates to be greater than 90%. You’d be surprised how long it can take to get those two babies to work for you. But you’d also be surprised how magically fast they come together after your advisor tells you he’s going on sabbatical to Turkey in three months and wants you to defend before then.

So what is this potent root of many a sleepless night during my grad school days? Well, a standardized bias (SB) tells you how big the difference is between the true parameter of interest and the average of the estimates obtained from your simulations, relative to the standard error of those estimates. So take our previous example, where the true probability of Tina being in a justified realm is 12%. Now say the mean probability of her being in such a realm across the 100 generated samples of size 50 is 12.34%, and the standard error of those 100 probabilities is 3.96%. Then the SB is |12% - 12.34%| / 3.96% = 0.34 / 3.96 ≈ 0.0859, or 8.59%. And 8.59% is less than 50%, so, umm, yay.

And that also concludes today’s lesson. But stay tuned, as we will cover the root mean square error eventually. Might take a little break though, as I’m attending the Joint Statistical Meetings in Boston. Hey, I hear Boston is becoming the celebrity central of New England, so maybe I should take my trilogy with me, and if I see anyone from my fantasy cast there, then … yes, I’ll stop. Till next time, I leave you with a beautiful view of the Boston harbor. But seriously, if I were only to see Matt Damon at Legal
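To make the standardized-bias arithmetic concrete, here is a minimal Python sketch. It assumes the "standard error" is the standard deviation of the estimates across the simulated samples (a common convention in simulation studies, but an assumption here), and the function names are made up for illustration.

```python
import statistics

def standardized_bias(true_value, mean_estimate, standard_error):
    """Absolute standardized bias as a fraction: |true - mean estimate| / SE."""
    return abs(true_value - mean_estimate) / standard_error

def standardized_bias_from_replicates(true_value, estimates):
    """Same idea, starting from the raw per-sample estimates (hypothetical helper).

    Assumes the SE is the standard deviation of the estimates across replicates.
    """
    mean_estimate = statistics.mean(estimates)
    se = statistics.stdev(estimates)
    return abs(true_value - mean_estimate) / se

# Numbers from the Tina example, in percent
sb = standardized_bias(12.0, 12.34, 3.96)
print(f"SB = {sb:.4f} ({sb:.2%})")  # SB = 0.0859 (8.59%)
```

Anything under 0.5 (50%) would pass the rule of thumb mentioned above.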



So a coverage rate is the probability that a confidence interval, like a 95% confidence interval, contains the true value of the parameter being estimated. So it’s basically the number of times you are right (or at least close enough) in guessing the value over the total number of times you tried. So take our character Tina again, and say she could be in either a justifiable or an unjustifiable dimension. And what is a justifiable dimension, you ask? Well, you’ll just have to read … you know where I’m going with this, don’t you? Now, say there are 6 justifiable dimensions out of, let’s make it, 50 total dimensions. So the probability that Tina would be in a justifiable dimension is 6/50, or 12%. And the 95% confidence interval for that probability is (2.99%, 21.01%). I’ll spare you the details, but yeah, I did some fancy math stuff to get that. Okay, so actually, I used a website to get that, but I’m pretty sure the site used the fancy math stuff. Now, say I generate 100 samples of size 50 and get probabilities of Tina being in a justified dimension between 6% and 26%, each with its own confidence interval. Now, doing some fancy math stuff (that I did all by myself, thank you very much, with the help of R), I noticed that the true probability fell outside the confidence intervals of the generated probabilities only 4.92% of the time, meaning the true probability of Tina being in a justified dimension is covered by the 95% confidence intervals of the generated data 95.08% of the time. Ain’t that cool? And coverage rates can be computed for different parameters obtained from different random number generation methods, as well as from different imputation methods. In fact, I used to compute them a lot as part of my dissertation. By the way, a good coverage rate is usually 90% or greater.
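Here is a minimal Python sketch of that kind of coverage simulation. It assumes the interval is the normal-approximation (Wald) interval, which reproduces the (2.99%, 21.01%) quoted above for 6/50, and that each sample is 50 independent draws with a true probability of 12%. The function names and seed are illustrative; this is not the R code behind the numbers above, so it won't reproduce the 95.08%.

```python
import math
import random

def wald_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage_rate(p_true, n, n_samples, seed=2014):
    """Share of simulated Wald intervals that contain p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Each of the n dimensions is "justifiable" with probability p_true
        successes = sum(rng.random() < p_true for _ in range(n))
        low, high = wald_ci(successes, n)
        if low <= p_true <= high:
            hits += 1
    return hits / n_samples

low, high = wald_ci(6, 50)
print(f"95% CI for 6/50: ({low:.2%}, {high:.2%})")  # (2.99%, 21.01%)
print(f"Coverage over 100 samples of size 50: {coverage_rate(0.12, 50, 100):.2%}")
```

Swapping in a different interval (say, a Wilson interval) or a different number of samples is a one-line change, which is handy when you're chasing that 90% mark.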
Funny thing about simulations, though: you run a simulation for two or three nights and think you’re done, only to find that your coverage rate is 89.99%. Just so you get the gist of my graduate life. But eventually I did start getting those babies at 90% or up, after which I did this little dance.

And then came the next challenge: trying to get standardized biases below 50%  – which I have decided to cover next time.  Until then, turn that frown upside-down, Charlie Brown!  Okay, not the most original Peanuts reference … but, um, okay, bye!


Now that we’ve covered combinations with factorials and a bit about sampling without replacement, I thought we’d move on to sampling with replacement and bootstrapping. Yeah, cuz maybe if we’ve been to Kokomo once, say, we want to go there again, or we want to go to Key Largo three times, assuming we can go to a place three times, or Montego twice or … you get the picture. Anyway, we could do this many, many times, then come up with a distribution of values and summarize the general characteristics of those values, say with a mean. Like, for example, in case you didn’t know, the American Physical Society (APS) March Meeting in 2018 will be held in Los Angeles. Eh? APS physics meeting … LA … would be perfect for an Order of The Dimensions movie premiere! Oh, just humor me and pretend it’s possible. So say the APS and the execs at Universal fly me in for the meeting/premiere and want to put me at a hotel by the convention center where the meeting will be held. Now, I checked the hotels around that area and saw that they range in ratings from 2.5 to 4.0. I would hope that they (read: they better) put me in a hotel closer to a 4.0 rating, but for argument’s sake, I generated a set of 10 values between 2.5 and 4.0 using a uniform distribution, so every rating between those two values had an equal chance of coming up. Now, I sampled 100 values with replacement from those 10 numbers and got a distribution that looked like this:


where the mean of all 100 sampled values was 3.22. Which I guess is okay for a hotel rating, although in reality, you future planners out there: think Ritz-Carlton. Thanks. But anyway, that is one application of the bootstrap. But how can we assess whether the parameters obtained from this particular simulation, or from any random number generation or imputation technique, are any good? Well, we can look at stuff like the standardized bias, the root mean squared error, and the coverage rate. But we can get into that next time. Until then … APS … Universal … March Meeting 2018, think about it 😉
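Here's a rough Python sketch of that resampling. The ten ratings are randomly generated stand-ins (not the values behind the 3.22 above), and the seed is arbitrary. The second half shows the more common bootstrap setup, where each resample has the same size as the original data and we look at the spread of the resampled means.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary seed so the run is repeatable

# Ten hypothetical hotel ratings drawn uniformly between 2.5 and 4.0
ratings = [round(rng.uniform(2.5, 4.0), 1) for _ in range(10)]

# The resampling described above: 100 draws with replacement from the 10 ratings
resample = rng.choices(ratings, k=100)
print("mean of 100 resampled ratings:", round(statistics.mean(resample), 2))

# A more typical bootstrap: many resamples of the original size, one mean each
boot_means = [statistics.mean(rng.choices(ratings, k=len(ratings)))
              for _ in range(1000)]
print("bootstrap mean:", round(statistics.mean(boot_means), 2))
print("bootstrap SE of the mean:", round(statistics.stdev(boot_means), 3))
```

`random.choices` draws with replacement, which is exactly what lets the same rating (or Kokomo) show up more than once.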

So now that we’ve covered factorials, I thought I’d go a little into what we can do with them. And I even thought of an example to do it without pimping out my books. I’ll try. Now say we get to choose from Bermuda, Bahamas, Key Largo, Montego, and whatever the fifth place in that Kokomo song is, over two days, but we can go to only one place per day and we cannot go to the same place twice. So we choose 2 out of 5, or “5 choose 2.” Got it? So, let me denote each place we can visit as BE, BA, KL, MO, and whatever the fifth place is. Oh wait … it’s Kokomo! Thank you, Metro Lyrics! Okay, that one we can denote as KO then. The total number of combinations of places we can visit is (BE,BA) + (BE,KL) + (BE,MO) + (BE,KO) + (BA,KL) + (BA,MO) + (BA,KO) + (KL,MO) + (KL,KO) + (MO,KO) = 10.

Now, look at this:

5 choose 2 = 5! / (2! x 3!) = 120 / (2 x 6) = 10.

Eh?  Eh?  What did I tell ya?  We basically could do the same calculation using factorials.  Which might be useful if we want to increase our number of places to 10 or 20 or 50 or even more.  Otherwise, we could end up summing up millions of terms, but let’s be real — who wants that?  Or like if Tina is going through a grad school catalogue and wants to choose two from the thirty-something areas of study like she might in this fourth book I might write.  What?  I’m not talking about my trilogy!  I’m talking about a fourth … okay, I’ll stop.  But yeah, that’s basically how we can do combinations with factorials.  But join me next time as I talk about sampling without replacement as opposed to sampling with replacement such as bootstrapping, which we will cover even later. And speaking of bootstrapping …
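For those who like to check with code, here's a small Python sketch that counts the Kokomo pairs both ways: by brute-force listing with `itertools.combinations`, and with the factorial formula n! / (k! (n - k)!). The `n_choose_k` helper is just for illustration, since Python 3.8+ already ships `math.comb`. The 30 in the last line is a stand-in for Tina's "thirty-something" areas of study.

```python
from itertools import combinations
from math import comb, factorial

places = ["BE", "BA", "KL", "MO", "KO"]

# Brute force: list every unordered pair of distinct places
pairs = list(combinations(places, 2))
print(len(pairs), pairs)

def n_choose_k(n, k):
    """Number of combinations via factorials: n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

print(n_choose_k(5, 2))                 # 10, same as the brute-force count
print(n_choose_k(30, 2), comb(30, 2))   # 435 435, two areas out of thirty
```

The listing grows quickly with more places, while the formula stays a single line.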


There you go.  Happy Friday!  Unless you’re not reading this on a Friday — then it’s Happy Day other than Friday!



Hi! So today we’re talking about factorials in permutations! How exciting! Yes! Recently I had a colleague tell me about the first time she saw exclamation marks on a math quiz! She thought the proctors were just really excited about the numbers! Much to the amusement of her mathematician husband! And … okay, it’s getting annoying so I’ll stop now. But anyway, an exclamation mark after a number actually indicates a factorial, or the product of that number and all the positive whole numbers less than it. So, for example,

5! = 5 x 4 x 3 x 2 x 1 = 120.
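A minimal Python sketch of that definition, multiplying the whole numbers from 1 up to n (with 0! = 1 by convention). The standard library's `math.factorial` gives the same answer.

```python
import math

def factorial(n):
    """n! = n x (n - 1) x ... x 1, with 0! = 1 by convention."""
    if n < 0:
        raise ValueError("factorial is undefined for negative integers")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

print(factorial(5))       # 120
print(math.factorial(5))  # 120, the standard-library version
```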

Or in the case of Tina, let’s say there are different dimensions where she can live in one of twelve cities, have one of four jobs, and be married or not. Then she technically could find herself involved in as many as 12! x 4! x 2! = 22,992,076,800 situations, each taking place in a different dimension. Whew! Now, I realize that’s a lot, so I cut down drastically on that number and only covered several of them in my books. But some people still thought I covered too many and let it get too confusing. And one reader didn’t like that Tina didn’t quite get a happy ending. Now granted, maybe her life isn’t perfect in all the remaining dimensions at the end, but she does have her happy moments too. Like in the St. Louis realm, she goes to this cool 80s party, goes to LA and meets some hunky Hollywood actor types, and then goes to the zoo. And who doesn’t like the zoo? Especially a happy zoo?
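And a quick Python check of Tina's count. One thing worth keeping in mind: the factorials count every possible ordering of the twelve cities, four jobs, and two marital statuses, whereas picking just one city, one job, and one marital status would give only 12 x 4 x 2 = 96 combinations.

```python
from math import factorial

# Orderings of 12 cities, 4 jobs, and 2 marital statuses, multiplied together
situations = factorial(12) * factorial(4) * factorial(2)
print(f"{situations:,}")  # 22,992,076,800

# Picking just one city, one job, and one marital status instead
print(12 * 4 * 2)         # 96
```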


Well, but anyway, join me next time when I will continue on to combinations with factorials and other stuff we can do with them. Hint: prepare to get choosey.