Posterior inference of the ratings distribution

#1 by gambsgambs
2021-09-18 at 18:33
Right now it seems like VNDB is calculating the Bayesian average for VNs using this formula (link), which corresponds to Bayesian inference of the mean of a set of observed data when that data is binary.
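
For reference, a Bayesian average of this kind has the closed form below, where m is a prior mean and C a prior weight (the exact constants VNDB uses aren't stated in the thread):

$$ R = \frac{C\,m + \sum_{i=1}^{n} v_i}{C + n} $$

with $v_1, \dots, v_n$ the observed votes.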

Ratings, however, are not binary -- they exist on a continuum (in our case 1.0 to 10.0), and typically they're normally distributed.

If one defines a normal-inverse-gamma prior (whose influence becomes irrelevant once enough data is sampled), as is done in the 6th equation here (link), then one can find a posterior distribution conditioned on n samples of data contained in the vector x.

This would allow you not only to estimate the mean of the distribution in a more mathematically sound way, but also to estimate the entire distribution from limited data. Note that with an appropriate choice of prior hyperparameters, the mean of the posterior would be the same as the current Bayesian average.
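
Concretely, with a normal-inverse-gamma prior $\mathrm{NIG}(\mu_0, \nu, \alpha, \beta)$ and observed votes $x_1, \dots, x_n$ with sample mean $\bar{x}$, the standard conjugate updates (the ones the linked table gives) are:

$$ \mu_n = \frac{\nu\mu_0 + n\bar{x}}{\nu + n}, \qquad \nu_n = \nu + n, \qquad \alpha_n = \alpha + \frac{n}{2}, \qquad \beta_n = \beta + \frac{1}{2}\sum_{i=1}^{n}(x_i - \bar{x})^2 + \frac{n\nu(\bar{x} - \mu_0)^2}{2(\nu + n)} $$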

Sampling from the posterior distribution is easy given the posterior parameters, and is described here (link). One could use rejection sampling to ensure that the ratings stay within [1.0, 10.0]. Alternatively, one could do this same procedure with a beta distribution instead of a normal, which might be mathematically simpler and would also ensure that the ratings stay within [0.0, 1.0] (which can then be shifted and scaled as you wish).
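
A minimal sketch of that re-sampling idea (assuming NumPy; mu and sigma here stand in for posterior draws and are not VNDB's actual parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_truncated(mu, sigma, n, lo=1.0, hi=10.0):
    # Keep drawing from N(mu, sigma^2) and discard anything outside
    # [lo, hi] until we have n in-bounds samples.
    out = np.empty(0)
    while out.size < n:
        draws = rng.normal(mu, sigma, size=n)
        out = np.concatenate([out, draws[(draws >= lo) & (draws <= hi)]])
    return out[:n]
```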
#2 by eacil
2021-09-18 at 20:28
posterior distribution
posterior parameters
lewd
#3 by beliar
2021-09-18 at 20:34
I'm glad Eacil said it first. At least I wasn't the only one who thought that was the key word in the post.
#4 by eacil
2021-09-18 at 21:19
Yeah, that's the only word I understood.
Remember when I talked with Yorhel about recomputing everybody's votes and he said he didn't understand a thing about statistics or something? I am sure Gambs' suggestion will be well considered.
#5 by gambsgambs
2021-09-18 at 21:23
I would be happy to help if no one has a good grasp of statistics. It really boils down to doing some algebra in a loop, and then you'll have an idea of what the distribution would look like after infinitely many people have rated a VN (with self-correcting uncertainty from the number of votes baked in, just as with the current Bayesian average, but now for the entire distribution).
#6 by historyeraser
2021-09-20 at 14:48
Let's say, hypothetically, a game only received 10/10 votes. How many votes would be required for it to show up in the top 50?
#7 by gambsgambs
2021-09-20 at 17:31
#6 What I'm suggesting wouldn't change the rankings of anything at all

I'm not sure exactly what hyperparameters VNDB is choosing right now for its prior, but as I wrote in my post, I'm suggesting doing something similar, applied to the whole distribution instead of just the average. You could even choose hyperparameters such that the mean of this posterior distribution is the same as the current Bayesian average (you can see that the first equation in the posterior hyperparameters column, rows 6 and 7, here (link) is the same as the "Bayesian average" equation here (link)).
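
To spell that out: the posterior mean of $\mu$ under the conjugate updates is

$$ \mathbb{E}[\mu \mid x] = \mu_n = \frac{\nu\mu_0 + n\bar{x}}{\nu + n}, $$

which, taking the prior weight $\nu = C$ and prior mean $\mu_0 = m$, is exactly the Bayesian average $(Cm + n\bar{x})/(C + n)$.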
#8 by mutsuki
2021-09-20 at 18:50
my knowledge of statistics isn't good enough to really understand what's going on, but i feel like we should normalise votes by working out each vote's deviation from the mean of the voter's own voting distribution, and then use those normalised measurements (in standard deviations from the voter's mean vn score) to work out the rating of the vn in question.

i'm not sure, but i don't think we include this information about voters' individual distributions in the calculation anywhere. it feels weird not to, because someone's idea of a "perfectly average" vn could be a 5/10 and another person's could be a 7/10 or whatever, so trying to compare them directly doesn't make much sense to me. the scores should be transformed in such a way that a "perfectly average" game (one that everyone would give their mean score to) comes out the same for all users when calculating the vn's overall score.
this would also be great because you could put in a step that filters out all those people who only give 10/10 or 1/10, or have other degenerate score distributions that aren't conducive to working out the average score of a vn

again, it could just be me sucking at stats.
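
For what it's worth, the normalisation described here is a per-user z-score. A minimal sketch, assuming NumPy and a hypothetical votes_by_user mapping (not VNDB's actual data layout):

```python
import numpy as np

def normalise_votes(votes_by_user, min_votes=5):
    # Re-express each user's votes in standard deviations from that
    # user's own mean. Users with too few votes or a near-constant
    # distribution (e.g. all 10/10) are skipped, which doubles as the
    # filtering step suggested above.
    z = {}
    for user, votes in votes_by_user.items():
        v = np.asarray(votes, dtype=float)
        if v.size < min_votes or v.std() < 1e-9:
            continue
        z[user] = (v - v.mean()) / v.std()
    return z
```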
#9 by dchsflii
2021-09-20 at 19:41
It's not obvious to me that vndb votes are normally distributed. It seems plausible there's some amount of left skew. The average vote is about 6.35, right? One might imagine that people give out lots of 6s and 7s without giving 10s much more often than 1s or 2s -- which is what a normal would suggest should happen. Some discussion and examples of such phenomena here: link

It might still be reasonable to use the normal as an approximation, and I haven't done any exploratory data analysis (EDA) on vndb data, so maybe votes here do in fact follow a normal quite nicely. However, it's worth looking into, and if there is a skew, worth thinking about the ways it could lead to a misleading posterior.
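
A quick way to run that check, assuming SciPy and a hypothetical flat file of votes (negative skewness would indicate the left skew described above):

```python
import numpy as np
from scipy import stats

votes = np.loadtxt("votes.txt")        # hypothetical file, one vote per line
print("mean:", votes.mean())
print("skewness:", stats.skew(votes))  # negative => left skew
print("normality test p-value:", stats.normaltest(votes).pvalue)
```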
#10 by gambsgambs
2021-09-20 at 21:29
#9 In this case the posterior would be normal-gamma (link) (or normal-inverse-gamma) and the likelihood function would be normal. The likelihood function is more or less a weighting on the data, which determines how quickly the parameters of the posterior change as data is observed.

An empirical posterior could be created through simulation: make up 10,000 or so simulated users; each draws a precision from a gamma distribution (equivalently, a variance from an inverse-gamma) with learned parameters, then draws data from a normal using that draw and the other learned parameters -- where "learned" here means "solved through algebra", because we know the form of the posterior.

I don't think the shape of the distribution would be an issue at all.
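
A sketch of that simulation under a normal-gamma posterior, assuming NumPy; mu_n, nu_n, alpha_n and beta_n are the conjugate-update results from earlier in the thread:

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_posterior(mu_n, nu_n, alpha_n, beta_n, n_users=10_000):
    # Each simulated user draws a precision, then a mean, then a vote.
    tau = rng.gamma(alpha_n, 1.0 / beta_n, size=n_users)  # NumPy takes scale = 1/rate
    mu = rng.normal(mu_n, 1.0 / np.sqrt(nu_n * tau))
    return rng.normal(mu, 1.0 / np.sqrt(tau))
```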
#11 by dchsflii
2021-09-20 at 22:29
#10 Reading your original post again, I'm actually not sure I completely understand the problem you have posed here. You link to conjugate priors, so I assume you are talking about parametric inference. I think the problem posed is as follows, but please correct me if I'm wrong:

There is some VN, and we want to estimate the distribution F of votes on that VN. Our data are the observed votes on that VN. We assume some parametric form for F, in this case normal. Since F is assumed to be a normal distribution, we only need to estimate its mean and variance to have full information about F. We put some prior on that mean and variance (probably drawing on all the votes on vndb to choose hyperparameters). Since this is a conjugate prior model, the posterior of the mean and variance has a nice form, and we can easily obtain samples from the posterior predictive distribution.

The issue here is in prescribing a parametric form for F. It's not hard to find examples of VNs with large numbers of votes where the distribution looks nothing close to normal, e.g. Ever17. You could maybe argue that it could be a censored normal where values >10 get censored to 10. I find that a rather strange assumption mechanistically, though, and if you do want to assume it, then the estimation method needs to take it into account when computing the likelihood. Otherwise, no matter how you change the prior, you're still applying a normal likelihood to data that isn't normally distributed. You could specify some other choice for F, but no obvious ones occur to me. Maybe the beta, as you mentioned before, could be reasonable, since it solves the censoring issue and can take a wider range of shapes. You could also use something like empirical likelihood, which doesn't assume the form of F, but that makes the math more complicated.
#12 by gambsgambs
2021-09-21 at 00:07
#11 Yes, the task is to infer the full distribution of ratings of some VN by defining a prior and using the observed votes. VNDB is already doing this to an extent by ranking VNs according to the Bayesian average; I'm just suggesting doing it for the full distribution, because it wouldn't be much harder.

the posterior of the mean and variance has a nice form, and we can easily obtain samples from the posterior predictive distribution.

I wasn't actually suggesting using the posterior predictive (which might have some uses, though); I was just suggesting constructing the posterior.
It's not hard to find examples of VNs with large numbers of votes where the distribution looks nothing close to normal, e.g. Ever17

That looks very normal to me, but the mean is near the cutoff

You could maybe argue that it could be a censored normal where values >10 get censored to 10

I think you're thinking of a truncated normal (link). But you can just use any other distribution and construct the empirical posterior through rejection sampling (link) instead, which is much easier mathematics-wise. Practically speaking, this just means re-sampling any draw that doesn't fall in [1.0, 10.0].

you're still applying a normal likelihood to data that isn't normally distributed

The likelihood function is normal, not the posterior. The likelihood is just a weighting that determines how the posterior hyperparameters change when data is observed, as I mentioned before. The posterior won't be normal, but it would be like sampling from several different normals with different variances and the same mean -- and this mean would line up with the current Bayesian average, so none of the current rankings would change. You just get strictly more information, particularly for VNs with lower vote counts.
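
For reference, under the normal-gamma posterior the marginal for the mean is a Student-t,

$$ \mu \mid x \sim t_{2\alpha_n}\!\left(\mu_n, \frac{\beta_n}{\alpha_n \nu_n}\right), $$

which is precisely a scale mixture of normals with a common mean: a normal whose variance is itself drawn from the (inverse-)gamma posterior.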
#13 by dchsflii
2021-09-21 at 02:14
I was just suggesting constructing the posterior

If you do things as I described, the posterior is a distribution over the mean and variance parameters of the distribution F. Certainly this is interesting and useful. In particular, the posterior distribution of the mean lets you characterize the uncertainty in your belief about the mean of F. It's not a posterior distribution of unobserved/additional votes, though, even if it does contain the information needed to construct one. Your idea of simulating 10,000 users to characterize the distribution of votes given the information is exactly what the posterior predictive distribution is for.

That looks very normal to me, but the mean is near the cutoff

This can end up being quite different in practice depending on what you do.

I think you're thinking of a truncated normal (link). But you can just use any other distribution and construct the empirical posterior through rejection sampling (link) instead, which is much easier mathematics-wise. Practically speaking, this just means re-sampling any draw that doesn't fall in [1.0, 10.0].

I think censoring makes more sense here than truncation, but either way, if you don't explicitly account for the censoring/truncation in your likelihood and simply reject posterior samples that don't fall in [1,10], you can very obviously introduce bias. Take Ever17, which has a lot of high scores, so the mean of the posterior distribution of the mean parameter of F under the normal-gamma prior is more or less the sample mean of the observed votes, 8.50. Following what you have proposed, since the mean can't be >10, you reject a posterior sample whenever you see that. You also reject if it's <1, but that will be much rarer in comparison. Doing this drives down the posterior mean of the distribution of the mean parameter compared to accepting all samples. For the posterior of the mean parameter of a VN with a large number of votes, the uncertainty may be so small and rejections so rare that this barely matters.

Where you really run into problems is with the posterior predictive samples, that is, simulating/drawing additional votes conditional on the observed data. If I sampled from this and didn't reject anything, the expected value of my sample would be the posterior mean of the mean parameter (just use iterated conditional expectations), so about 8.50. However, there is enough variation among individual votes that a decent fraction of my samples will be >10, and if I reject them, I push that mean down, perhaps substantially. (Yes, you might reject a few <1 too, but that will be orders of magnitude rarer.) It doesn't make sense to claim that unobserved/simulated votes should on average be substantially lower than observed votes. It's an artifact of the misspecified model.
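
A tiny simulation of the effect, with made-up stand-in numbers rather than anything fit to vndb data:

```python
import numpy as np

rng = np.random.default_rng(0)
pred = rng.normal(8.5, 1.8, size=500_000)   # stand-in predictive draws
kept = pred[(pred >= 1.0) & (pred <= 10.0)]
print(pred.mean(), kept.mean())             # the kept mean is noticeably lower
```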

As an aside, that's not really how rejection sampling works. If you choose some arbitrary distribution G, draw from it, and reject whenever the sample is not in [1,10], the results are highly dependent on the choice of G. Rejection sampling should also involve the ratio of the pdf g of G and the pdf h of your target distribution H, where h has support on [1,10] and there's a constant M such that Mg covers h. If G and H are not the same distribution, you should sometimes be rejecting samples in [1,10] too. What you suggest only works as rejection sampling if G is the untruncated version of the truncated target distribution you mean to sample from. It can't just be any distribution.
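
For comparison, a generic rejection sampler looks like this (a sketch; h_pdf, g_sample, g_pdf, and the envelope constant M are all supplied by the caller):

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(h_pdf, g_sample, g_pdf, M, n):
    # Target density h, proposal G with density g, and M chosen so that
    # M * g(x) >= h(x) everywhere. Accept x ~ G with prob h(x) / (M * g(x)).
    out = []
    while len(out) < n:
        x = g_sample()
        if rng.uniform() < h_pdf(x) / (M * g_pdf(x)):
            out.append(x)
    return np.array(out)
```

When H is G truncated to [1,10], h(x) = g(x)/Z on [1,10] with Z = G(10) - G(1), and taking M = 1/Z makes the acceptance probability exactly 1 inside the interval and 0 outside -- which recovers the simple reject-if-out-of-bounds scheme.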

The likelihood function is normal, not the posterior. The likelihood is just a weighting that determines how the posterior hyperparameters change when data is observed, as I mentioned before.

The likelihood is derived from a statistical model for the data given the parameters. If you claim F is a normal distribution, the data should look normally distributed. If not, you're probably using a misspecified model.

The posterior won't be normal, but it would be like sampling from several different normals with different variances and the same mean -- and this mean would line up with the current Bayesian average, so none of the current rankings would change.

I don't think this is true if you reject posterior samples as you describe (though the ordering might be preserved), as I explained above. The vndb Bayesian average doesn't need to specify F, and so doesn't have this issue.
#14 by gambsgambs
2021-09-21 at 03:04
In particular, the posterior distribution of the mean lets you characterize the uncertainty in your belief about the mean of F. It's not a posterior distribution of unobserved/additional votes, though, even if it does contain the information needed to construct one. Your idea of simulating 10,000 users to characterize the distribution of votes given the information is exactly what the posterior predictive distribution is for.

I was just suggesting that it might be easier for VNDB to approximate the posterior by drawing samples from it and plotting the histogram, especially since in this case sampling is pretty easy -- you don't need to invoke the posterior predictive distribution at all for that. That said, if we use conjugate-prior knowledge to get an analytic posterior, they could also just plot its pdf.

But if G and H are not the same distribution, you should sometimes be rejecting samples in [1,10] too.

From what I understand, if you're sampling from G as a proxy for H, and H is just a truncated version of G, then the rejection sampling algorithm rejects iff a sample lands out of bounds. The answer to this stackoverflow question has a plot showing that it gives what you would expect: link
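
A quick numerical check of that special case, assuming NumPy and SciPy, with made-up parameters:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, lo, hi = 8.5, 1.5, 1.0, 10.0

# re-sample-until-in-bounds
draws = rng.normal(mu, sigma, size=200_000)
simple = draws[(draws >= lo) & (draws <= hi)]

# SciPy's truncated normal, for comparison
a, b = (lo - mu) / sigma, (hi - mu) / sigma
exact = stats.truncnorm.rvs(a, b, loc=mu, scale=sigma,
                            size=simple.size, random_state=rng)

print(simple.mean(), exact.mean())  # the two empirical means agree closely
```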
#15 by dchsflii
2021-09-21 at 03:56
I was just suggesting that it might be easier for VNDB to approximate the posterior by drawing samples from it and plotting the histogram, especially since in this case sampling is pretty easy -- you don't need to invoke the posterior predictive distribution at all for that. That said, if we use conjugate-prior knowledge to get an analytic posterior, they could also just plot its pdf.

For characterizing the distribution of votes, first drawing the parameters of F from the posterior and then drawing from F with those parameters is equivalent to drawing from the posterior predictive distribution, so I agree: do whatever is easier in practice. In the model you suggested the posterior predictive has a closed form, though, so it might be easiest to use that, like you say.
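
For reference, that closed form is again a Student-t,

$$ \tilde{x} \mid x \sim t_{2\alpha_n}\!\left(\mu_n, \frac{\beta_n(\nu_n + 1)}{\alpha_n \nu_n}\right), $$

using the posterior hyperparameters from the conjugate updates earlier in the thread.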

That said, you have to do the Bayesian estimation first, and I think using a normal likelihood with a normal-gamma conjugate prior and truncating the posterior/posterior predictive distribution at the end has the problems I outlined in my previous comment. If you do that, you're no longer drawing from the posterior but from some approximation derived from it. It might work well in many cases, but I suspect there will be cases where it doesn't, as in my example. I would be more willing to believe a model that explicitly used a censored or truncated normal as the likelihood; that could be done, but it would take more work. I just don't believe the problem is a trivial extension of estimating the Bayesian means.

I think it's also good to be precise about what any quantities you display are supposed to mean. The average user doubtless knows what a mean is, but is likely not familiar with posterior distributions. The posterior of the mean of F might not be intuitive for many if it isn't explained well. Drawing votes from the posterior predictive might be more intuitive, along the lines of "here's our best guess at what things would look like with more votes", but it still needs a good explanation.

From what I understand, if you're sampling from G as a proxy for H, and H is just a truncated version of G, then the rejection sampling algorithm rejects iff a sample lands out of bounds. The answer to this stackoverflow question has a plot showing that it gives what you would expect: link

Yes, I agree that's correct. Rejecting exactly when the sample is out of bounds works as long as g and h have the same shape on the support of h, not only when G and H are the same distribution. Your previous comment seemed at one point to suggest that any choice of G would work, but I agree your proposed algorithm samples from the truncated version of the posterior.
