Imputing new worlds with multiversal appeal

Monthly Archives: April 2014

Okay, so this isn’t much of a battle either, as many statisticians nowadays adopt principles of both. Like when I started grad school, I was a frequentist with Bayesian tendencies, but by working under my advisor, who was a Bayesian with frequentist tendencies, I became a Bayesian with frequentist tendencies. So what is the difference between a frequentist and a Bayesian, and is it like being Team Aniston vs. Team Jolie? Or Team Edward vs. Team Jacob? Or Team Randy vs. Team Anton? What? You’re not familiar with the last one? Well, you will be in a few years. Or at least just make my day and pretend you will be. That’d be terrific, thanks.

Anyway, a frequentist bases his or her inferences on the data at hand and assumes that the parameters related to the data, i.e., values such as the mean, median, mode, and so on, remain fixed, although the data could change from sample to sample. But a Bayesian might treat the data as staying the same and the parameters as the things that might change, depending on how much more information we can gain from our data. So let’s say I go to bed and happen to have a Multiverser (a black box device that can transport me into any dimension, as featured in Order of The Dimensions, Revised Orders, and Final Orders) in my room, and I want to determine the probability of me sleepwalking into the black box and waking up in a dimension in Bermuda. I know … if only, right?

A frequentist will tell you that if I haven’t sleepwalked into the black box yet, the probability is very small, if not zero, that I ever will. But if I do walk into that box in the future, he or she would argue that a higher probability of me doing so was always there. It was just underestimated because I’ve never done it before. But a Bayesian would say that if I was destined to sleepwalk into the black box and wake up in Bermuda, I will sleepwalk into the black box and wake up in Bermuda. And if that ever does happen, we can update the probability of that happening once it happens.

So who’s right? Well, maybe they both are right and both are wrong. Maybe both data and parameters change or, on the other hand, maybe they both stay the same. Take the discovery of gravitational waves as an example. Have the data pertaining to the waves changed, or have modifications in our parameter estimates pertaining to cosmic data allowed us to discover them? Personally, I’ve come to the realization that although both data and parameters can be updated, it might be easier for us to observe changes in the parameters, thus leading me to become a Bayesian with frequentist tendencies.

So that’s the current stance of Bayesian vs. frequentist thinking in a nutshell. Not really like Team Randy vs. Team Anton, but sort of. Although who would want to be Team Anton since he’s the villain? Well, I did have one reviewer who gave me a positive review (YAY!), who was definitely Team Anton. At one point, she also compared Anton to Pinky and The Brain. Which led me to this HAWT image of these deep, dark, sexy hunks:


Well, anyway, join me next time when I discuss the Bayesian inference involved in Monte Carlo Markov chains and how they could also be used to create different dimensions. Just still hoping one chain takes me to a realm in Bermuda. Hey, sorry to sound like a broken record about that, but have you heard about the winter in Chicago this year? Seriously, if you lived through it, don’t tell me you wouldn’t be obsessed with walking into a box and walking out on a tropical island. Really now.
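
P.S. For anyone who wants to see the sleepwalking example in numbers, here is a minimal sketch in Python. The 365 nights, the flat Beta(1, 1) prior, and the function names are all assumptions made up for illustration. The frequentist estimate is simply the observed proportion, while the Bayesian estimate starts from a prior and gets updated as the nights go by.

```python
def frequentist_estimate(walks, nights):
    """Observed proportion of nights with a sleepwalk into the box."""
    return walks / nights


def bayesian_update(alpha, beta, walks, nights):
    """Update a Beta(alpha, beta) prior with binomial data.

    Returns the parameters of the posterior, which is again a Beta.
    """
    return alpha + walks, beta + (nights - walks)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Assumed data: 365 nights, zero sleepwalks into the black box so far.
nights, walks = 365, 0
print(frequentist_estimate(walks, nights))  # exactly zero

# Assumed flat prior Beta(1, 1): every probability is equally plausible.
alpha, beta = bayesian_update(1, 1, walks, nights)
print(beta_mean(alpha, beta))  # small, but not zero

# One more night, and this time it happens: update again.
alpha, beta = bayesian_update(alpha, beta, walks=1, nights=1)
print(beta_mean(alpha, beta))  # roughly double the previous value
```

The point of the sketch is only the contrast: with no sleepwalks observed, the proportion is stuck at zero, while the posterior mean stays a little above zero and moves as soon as new data arrive.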

Okay, so they are not really in battle with one another since they are two different things. I just wanted to make the title all dramatic like. And how are they different, you ask? Well, you know how I said that multiple imputation isn’t really making up data? Random number generation is making up data. The two procedures do have their similarities though. Like they both involve drawing values from a certain distribution. In multiple imputation, this distribution usually comes from data you’ve already observed. In random number generation, you can basically pre-determine any distribution you want (that’s the making up part). But, just like with multiple imputation, you could redraw values a bunch of times until you get it to converge to some value you want it to converge to. Get it? Got it? Good. Now, let’s move on. Because you are making up data, random number generation is technically more flexible than multiple imputation in that you could draw values from any distribution you specify. So technically, I could generate some values that might lead me to a dimension where I’m with a sexy former Soviet spy in an exotic location. Although in most applications, scientists still base their generated values on previously observed data, as they want to do computations that would aid in real life applications. There are also different ways to generate data, just like there are different ways to impute data, and we could say that all these ways fall under the field of computational statistics. But all those ways might be too much to cover in this one lil post. So I’ll just cover one facet at a time here.
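
To make the contrast concrete, here is a small Python sketch using only the standard library. Everything in it is an assumption for illustration: the seed, the made-up Normal(100, 15) distribution, the toy `observed` list, and the choice of a plain normal model for the imputation step (real imputation methods are usually more careful than this).

```python
import random
import statistics

rng = random.Random(2014)  # fixed seed so the sketch is repeatable

# Random number generation: the distribution is whatever we say it is.
made_up = [rng.gauss(100, 15) for _ in range(5)]

# Imputation: the distribution comes from values already observed.
observed = [4.1, 5.0, None, 4.6, None, 5.3]  # None marks a missing entry
seen = [x for x in observed if x is not None]
mu, sigma = statistics.mean(seen), statistics.stdev(seen)

# Fill each missing entry with a draw from Normal(mu, sigma);
# observed entries are left exactly as they were.
completed = [x if x is not None else rng.gauss(mu, sigma) for x in observed]

print(made_up)
print(completed)
```

Both halves draw from a distribution, but only the second one is tied to data that were actually seen.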
I will just say again here that it is possible to generate (and impute, actually) different types of data, like continuous data that can take any value on a certain interval, categorical data, which don’t really involve number stuff, like color or, say, movie studios (still debating whether I should sell my movie rights to Universal, Warner Brothers, or Paramount … hmmm), and binary data that take only two values, like 0 or 1. By the way, speaking of binary data …



Now, I don’t know about you, but that one gets me every time.
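
Jokes aside, the three types of data mentioned above can each be generated in a line or two. This is just a sketch with Python’s standard `random` module; the seed, the interval from 0 to 1, and the 30% chance of a 1 are arbitrary assumptions.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Continuous: any value on an interval, here 0 to 1.
continuous = [rng.uniform(0.0, 1.0) for _ in range(5)]

# Categorical: labels rather than numbers.
studios = ["Universal", "Warner Brothers", "Paramount"]
categorical = [rng.choice(studios) for _ in range(5)]

# Binary: only 0 or 1, here with an assumed 30% chance of a 1.
binary = [1 if rng.random() < 0.3 else 0 for _ in range(5)]

print(continuous)
print(categorical)
print(binary)
```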


And hope that’s enough M’s to hold ya down for now! But anyway, a while back, I attended this work dinner function thingy where one of the faculty statisticians from the university I work at was receiving a lifetime achievement award. And the title of the presentation involved big data, which is quite a big topic and one of the reasons that statisticians are in high demand these days. Translation: I picked a good safety day job until my trilogy thingy takes off! So, anyway, the dinner was at the East Bank Club here in Chicago (and their honey-glazed salmon was to die for – yum!) and after dinner, the presentation started. Now, after talking about the ins and outs of big data, the presenter said those two words that made my eyes widen and look up despite having this scrumptious fruit tart thingy in front of me: multiple imputation. And the way he presented multiple imputation was a show in itself. He began talking about how there could be worlds where he could have a different occupation or live in a different city or even be a different gender. He presented a bunch of slides with these different possibilities, one I remember in particular as quite a superb job of photoshopping Marilyn Monroe’s head on Arnold Schwarzenegger’s body during his Mr. Universe heyday. And I thought to myself: “Wait a minute. Did this guy read my book or something? Do I have a customer I don’t even know about? Holy crap!” That was when I realized that all this time the stuff that I was writing about in my fiction was actually a lot like the stuff I do in my work-related research.

Now, he also talked about some scenarios that would be very unlikely, as I have in my first post, like in the case of being kidnapped by a deep, dark, sexy Russian spy and becoming his love slave in Malta. And how do we know that such realms are very unlikely, if not impossible? Well, I’ve sort of written about this in my books too, making some reviewers even say, “Huh?” What I said was that some worlds are unlikely to exist because the algorithm did not converge.

And what the heck is an algorithm, and how the heck do we know if it converged? Well, I’m getting to that, okay? An algorithm basically computes what you want computed using a bunch of steps, and you see if it converges by comparing the final value to some pre-specified value, or a value gotten from the observed data, and looking to see if their difference is smaller than some small constant, like 0.01 or 0.001 or something. Or it could be bigger (I dunno), depending on how precise you want your estimated value to be. And the values you want to compare could also be anything you want, like means, medians, or, in a lot of my work, correlations between variables.

Now, these algorithms are awesome in that they can be used not only in imputation but also in stuff like random number generation. And is random number generation the same as multiple imputation? Well, unless I want my advisor to go straight to the dean’s office and ask if there is any way I could have my conferred doctoral degree revoked (wait … he can’t do that, can he?), I better answer no. But we’ll cover more on the difference between imputation and random number generation next time. Until then, here is my sorry excuse of photoshopping Marilyn Monroe’s head on Arnold Schwarzenegger’s body. I tried googling “Marilyn Monroe bodybuilder” to see if I could come up with the presenter’s slide but only got these images, so my sorry job will have to do. Oh, come on! It’s not that bad! Is it?
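
P.S. The convergence check described in this post can be sketched in a few lines of Python. This is a toy under loud assumptions, not the algorithm from the books or from any real imputation software: the draws come from an assumed Normal(5, 2), the value being compared is a running mean, and the function name, seed, and iteration cap are all made up for illustration.

```python
import random


def run_until_converged(target, tol=0.01, max_iter=100_000, seed=1):
    """Draw values one at a time and track their running mean.

    Stops as soon as |running mean - target| < tol (converged) or
    when max_iter draws have been used up (did not converge).
    Returns (converged, number of draws used, last estimate).
    """
    rng = random.Random(seed)
    total = 0.0
    estimate = float("nan")
    for n in range(1, max_iter + 1):
        total += rng.gauss(5.0, 2.0)  # assumed: draws come from Normal(5, 2)
        estimate = total / n
        if abs(estimate - target) < tol:
            return True, n, estimate
    return False, max_iter, estimate


print(run_until_converged(target=5.0))   # a plausible world
print(run_until_converged(target=50.0))  # a very implausible one
```

A target of 50 is so far from anything these draws can average out to that the difference never gets under the tolerance, which is the toy version of a world that does not exist because the algorithm did not converge. Loosening `tol` makes convergence easier to reach and tightening it makes it harder.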






You were able to escape the chains of my old digs and come here.  Awesome!  Now, go read my first post on here before I write another one!




Okay, so this is my first post of my new blog, the blog where I’m gonna try to pimp out my books to you good folks and stay upbeat about it. That’s the goal anyway. But first, a little something about me. My name is Irene Helenowski and I am the author of Order of The Dimensions. I am also just finishing up the sequels Revised Orders and Final Orders. I began writing the book after completing my dissertation on a semi-parametric approach for the multiple imputation of missing data. And what is multiple imputation, you ask? What? You’re not asleep yet? Holy crap! Wow, I’m off to a good start, aren’t I? Okay then, moving on. Multiple imputation is the method by which you can fill in missing entries in your data with values that you draw from a plausible distribution. Now, I know what you’re thinking. You’re thinking, “So you basically make up data.” And I say, “But I’m using a plausible distribution.” And you say, “But you’re still making up data.” And I say, “But the distribution is plausible.” And you say, “…?” And I say, “Okay, let me give you a better example.”

Now, say Drs. Michio Kaku, Lisa Randall, and Brian Greene are right and the universe is filled with different dimensions, and within each one, we are living a different life. Now, say in one dimension, I am able to sell at least a few books this summer. That dimension would be a reasonably plausible scenario and hopefully might actually happen. Now say that by the end of the summer, one of the buyers might be George Lucas, who, after reading my masterpieces, says, “Oh my God! I totally must fly in this Irene Helenowski woman to Hollywood now and offer her a deal for a major movie franchise!” Living in such a dimension is much less plausible than living in the first one. Maybe not impossible, but more than likely not gonna happen any time soon. Lastly, there might be a dimension where I am kidnapped by a deep, dark, sexy Russian former spy and forced to be his love slave in Malta. Why yes, there is a part in my second book, Revised Orders, where the heroine is kidnapped by a deep, dark, sexy Russian former spy and taken to a dimension where she is his love slave in Malta. Intrigued yet? Hope so. Awake still? I hope so too. Anywhoo, as enticing as that last scenario may be (and believe me, it’s very enticing at this moment), it is the least plausible situation out of the three I listed and most likely is impossible indeed. So that’s sorta like multiple imputation in a nutshell and leads to a tie-in to my next post, where I’ll talk about how something I started to do because I wanted to try something totally different from my dissertation ended up being totally related to my dissertation. So until next time, sayonara, sweeties!
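
P.S. For the curious, the "fill in missing entries with draws from a plausible distribution" idea can be sketched in Python. This is a deliberately simplified toy, not the semi-parametric approach from the dissertation: it assumes a plain normal model fitted to the observed values, and the `sales` numbers, function names, seed, and choice of 20 imputations are all hypothetical. A fuller treatment would also account for the uncertainty in the fitted mean and standard deviation, which this sketch skips.

```python
import random
import statistics


def impute_once(data, rng):
    """Fill each missing entry (None) with a draw from a normal
    distribution fitted to the observed entries."""
    seen = [x for x in data if x is not None]
    mu, sigma = statistics.mean(seen), statistics.stdev(seen)
    return [x if x is not None else rng.gauss(mu, sigma) for x in data]


def multiple_imputation_mean(data, m=20, seed=7):
    """Build m completed datasets and average their means."""
    rng = random.Random(seed)
    means = [statistics.mean(impute_once(data, rng)) for _ in range(m)]
    return statistics.mean(means), means


# Hypothetical book sales per week; None marks a week with no record.
sales = [3, 5, None, 4, 6, None, 2]
pooled, per_dataset = multiple_imputation_mean(sales)

print(pooled)                               # one pooled estimate
print(min(per_dataset), max(per_dataset))   # spread across the "worlds"
```

Each completed dataset is one plausible world, and the "multiple" part is that no single world gets the final say: the estimate is pooled across all of them, and the spread between them shows how much the missing entries matter.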