Anyone know why this query isn't working?

Posted in

#1 by echo13243
2020-02-11 at 03:41
This query is trying to find the true average ratings for visual novels with fantasy tags. However, I'm finding that the Bayesian rating, true rating, and vote count don't match up with visual novels on the database. Why would this be? Am I having a stroke? I might need sleep...

Sorry in advance if this sort of thing is a repeat question. Also I "learned" SQL not even a week ago so sorry for my lack of knowledge as well.

-- What is happening here I can't even tell.
SELECT id AS ID, title AS Title, c_rating AS Rating, c_votecount AS Votecount, AVG(ratings.vote)
FROM (
SELECT vid, vote
FROM ulist_vns
JOIN users ON users.id=ulist_vns.uid WHERE vote IS NOT NULL
) AS ratings
JOIN vn v ON v.id=ratings.vid
WHERE
v.id IN (
SELECT v.id
FROM vn v
intersect
SELECT v.id
FROM tags_vn tv
JOIN vn v ON v.id = tv.vid
WHERE
tv.tag = '2' -- Fantasy
)
GROUP BY ID, Title

Edit: added code blocksLast modified on 2020-02-12 at 00:30
#2 by yorhel
2020-02-11 at 06:39
However, I'm finding that the Bayesian rating, true rating, and vote count don't match up with visual novels on the database.
Without looking at your query: This is inevitable, as hidden votes are not available in that query interface, yet they do count towards the VNDB stats.

(Also, use [code] blocks to format queries on the forums. Much more readable :)
#3 by echo13243
2020-02-11 at 20:54
In that case, I have two questions. Once again, they're probably not uncommon:

What are hidden votes and how do they work? Is it just a vote that someone casts that isn't associated with the user?

Also I understand my average calculation not incorporating hidden votes because it accesses user votes, but why doesn't c_rating just get the number? Hidden votes would still be hidden if it doesn't examine each individual vote, right?

Thanks for the response and advice.Last modified on 2020-02-11 at 20:55
#4 by eacil
2020-02-11 at 21:52
I can answer the first one: hidden votes are when a user set his votes as private (now, the labels it is tagged with). Check this list of votes to see what I mean.
#5 by yorhel
2020-02-12 at 07:04
As for the second one: The c_rating column is not included in the database dump (cached info is excluded) and is calculated based on the public votes for the query database.
#6 by echo13243
2020-02-12 at 23:38
So even if the query does not take into account hidden votes, why is it that I'm returning an average of 66 (calculated, not Bayesian) off of a VN like Dangel where two votes of 5 should return an average of 5? Is it still incorporating the Bayesian calculation?

Additionally, I'm now noticing that trying to get all VNs with a certain tag (and all included child tags) does not match the number on the page. Fantasy lists 5539 but I'm getting 5587 rows :/

SELECT id AS ID, title AS Title, c_rating AS Rating, c_votecount AS Votecount
FROM vn v
WHERE v.id IN (
SELECT v.id
FROM vn v
intersect
SELECT v.id
FROM tags_vn tv
JOIN vn v ON v.id = tv.vid
WHERE
tv.tag = '2' OR tv.tag IN (SELECT tag FROM get_child_tags('2')) -- Fantasy
)
GROUP BY ID, Title
#7 by yorhel
2020-02-13 at 07:40
Ah, right, I forgot that ignored votes are included in the dumps nowadays. Compare
select * from ulist_vns where vote is not null and vid = 6373;
with
select * from ulist_vns where vote is not null and vid = 6373
and uid not in(select id from users where ign_votes);

As for the tag filter: Your query seems to include downvoted tags.
#8 by echo13243
2020-02-13 at 14:18
Wow. The more I about this the more I discover how much went into this database. The work that went into this is amazing... but the more I learn the more questions I have...

So now I know about the tag voting system. I read up on how it works. However, I can only figure out how to access the votes for each tag on each visual novel. Is there a simpler way to just exclude downvoted tags? I tried using ignore in tags_vn but it cuts down on the count only a little bit. If there isn't any I could put in the work to calculate the values (assuming that the final tag scores are just averages).

That being said, with your help I got the average calculation correct so that now it displays the correct value. Thank you... and sorry to bother you with all these questions.Last modified on 2020-02-13 at 14:18
#9 by rampaa
2020-02-13 at 16:40
Is there a simpler way to just exclude downvoted tags?
Just add the following to your query:
GROUP BY v.id
HAVING avg(tv.vote) > 0

If you want to get the exact same games that VNDB lists under Fantasy tag, comment out
AND tv.vid NOT IN (SELECT vid FROM tags_vn WHERE ignore)
and do a ">=" comparison instead of ">". It will give you the exact same number of games VNDB does but including games that have the average fantasy score of 0 doesn't make much sense IMO so I am not sure why VNDB does that currently.
#10 by yorhel
2020-02-13 at 16:43
AND tv.vid NOT IN (SELECT vid FROM tags_vn WHERE ignore)
That doesn't seem right. The 'ignore' flag only applies to the single tags_vn row, not to any other votes. In practice it probably works because the ignore flag is typically used to enforce a downvote, but it may also be used to enforce a certain (positive) score. Just adding a "AND NOT ignore" filter to the tag vote fetching query should do the trick.Last modified on 2020-02-13 at 16:45
#11 by echo13243
2020-02-14 at 00:30
Ah ok. That's just about everything I was looking for. The last thing is that, rampaa, wouldn't
HAVING avg(tv.vote) > 0
take into account all tag votes on a visual novel and not differentiate which tag the vote is on, considering the rest of the query?

Edit: that is, of course, if I placed it in the wrong place. Here's my updated query:
SELECT id AS ID, title AS Title, c_rating AS Rating, c_votecount AS Votecount, AVG(ratings.vote), length
FROM (
SELECT vid, vote
FROM ulist_vns
JOIN users ON users.id=ulist_vns.uid WHERE vote IS NOT NULL AND ulist_vns.uid NOT IN (SELECT id FROM users WHERE ign_votes)
) AS ratings
JOIN vn v ON v.id=ratings.vid
WHERE
v.id IN (
SELECT v.id
FROM vn v
INTERSECT
SELECT v.id
FROM tags_vn tv
JOIN vn v ON v.id = tv.vid
WHERE
(tv.tag = '2' OR tv.tag IN (SELECT tag FROM get_child_tags('2'))) -- Fantasy
AND tv.uid NOT IN (SELECT id FROM users WHERE ign_votes)
GROUP BY v.id
HAVING avg(tv.vote) > 0
)
GROUP BY v.id
ORDER BY v.id
Last modified on 2020-02-14 at 00:31
#12 by rampaa
2020-02-14 at 01:55
take into account all tag votes on a visual novel and not differentiate which tag the vote is on, considering the rest of the query?
It does not take the average of all tag votes for the VN. But, unfortunately, it does take the average of all tags that have been selected in the WHERE section as a whole. Most of the time this this does not pose problems but it might lead to faulty results.

To give a concrete example, if one were to search for games that have Group Sex in them with that query, it wouldn't return Home Maid. Because it has the tag score of -3 for Group Sex, +1 for Group Sex of One Female and Several Males and +1 for Group Sex of One Male and Several Females. All of them has one vote so the average tag score would be -0.33... which is less than 0.

Unfortunately though, I do not know if there's a simple solution for this. Maybe @Yorhel can help.
#13 by echo13243
2020-02-14 at 03:31
Yeah that's about what I expected. I'm trying to think of ways to approach that but my mindset still isn't used to SQL so it's hard for me to imagine solutions. Best I can think of is stacking GROUP BYs but, as mentioned, I have no idea how it works.

I also just realized I don't know what the ignore tag even does to tag votes, nor am I competent enough to attach that anywhere in my code that makes a discernible difference.

Goodness. I've never spent so much time on a forum before this. I really do hope I'm not exhausting anyone.Last modified on 2020-02-14 at 03:32
#14 by yorhel
2020-02-14 at 08:37
Yeah, if you want the exact algorithm that VNDB uses, you'll have to use nested GROUP BY subqueries. The algorithm is implemented in SQL, so it's not too hard to adapt that for the query database.

To make that even easier, I've just added the `tags_vn_inherit` table to extras.sql, so you can query it directly:
select * from tags_vn_inherit where tag = 179 and vid = 383;
Gives the expected rating of 1 for Group Sex on Home Maid.
#15 by echo13243
2020-02-18 at 00:05
Threw in a bit of elbow grease and I finally got my query to do what I was looking for. Thanks for everyone in this thread's help and thank you yorhel especially for helping me throughout the whole thing.

Reply

You must be logged in to reply to this thread.