Anyone know why this query isn't working?
|#1 by echo13243|
2020-02-11 at 03:41
|This query is trying to find the true average ratings for visual novels with fantasy tags. However, I'm finding that the Bayesian rating, true rating, and vote count don't match up with visual novels on the database. Why would this be? Am I having a stroke? I might need sleep...|
Sorry in advance if this sort of thing is a repeat question. Also I "learned" SQL not even a week ago so sorry for my lack of knowledge as well.
-- What is happening here I can't even tell.
SELECT id AS ID, title AS Title, c_rating AS Rating, c_votecount AS Votecount, AVG(ratings.vote)
Edit: added code blocksLast modified on 2020-02-12 at 00:30
|#2 by yorhel|
2020-02-11 at 06:39
However, I'm finding that the Bayesian rating, true rating, and vote count don't match up with visual novels on the database.Without looking at your query: This is inevitable, as hidden votes are not available in that query interface, yet they do count towards the VNDB stats.
(Also, use [code] blocks to format queries on the forums. Much more readable :)
|#3 by echo13243|
2020-02-11 at 20:54
|In that case, I have two questions. Once again, they're probably not uncommon:|
What are hidden votes and how do they work? Is it just a vote that someone casts that isn't associated with the user?
Also I understand my average calculation not incorporating hidden votes because it accesses user votes, but why doesn't c_rating just get the number? Hidden votes would still be hidden if it doesn't examine each individual vote, right?
Thanks for the response and advice.Last modified on 2020-02-11 at 20:55
|#4 by eacil|
2020-02-11 at 21:52
|I can answer the first one: hidden votes are when a user set his votes as private (now, the labels it is tagged with). Check this list of votes to see what I mean.|
|#5 by yorhel|
2020-02-12 at 07:04
|As for the second one: The c_rating column is not included in the database dump (cached info is excluded) and is calculated based on the public votes for the query database.|
|#6 by echo13243|
2020-02-12 at 23:38
|So even if the query does not take into account hidden votes, why is it that I'm returning an average of 66 (calculated, not Bayesian) off of a VN like Dangel where two votes of 5 should return an average of 5? Is it still incorporating the Bayesian calculation?|
Additionally, I'm now noticing that trying to get all VNs with a certain tag (and all included child tags) does not match the number on the page. Fantasy lists 5539 but I'm getting 5587 rows :/
SELECT id AS ID, title AS Title, c_rating AS Rating, c_votecount AS Votecount
|#7 by yorhel|
2020-02-13 at 07:40
|Ah, right, I forgot that ignored votes are included in the dumps nowadays. Compare|
select * from ulist_vns where vote is not null and vid = 6373;with
select * from ulist_vns where vote is not null and vid = 6373
As for the tag filter: Your query seems to include downvoted tags.
|#8 by echo13243|
2020-02-13 at 14:18
|Wow. The more I about this the more I discover how much went into this database. The work that went into this is amazing... but the more I learn the more questions I have...|
So now I know about the tag voting system. I read up on how it works. However, I can only figure out how to access the votes for each tag on each visual novel. Is there a simpler way to just exclude downvoted tags? I tried using ignore in tags_vn but it cuts down on the count only a little bit. If there isn't any I could put in the work to calculate the values (assuming that the final tag scores are just averages).
That being said, with your help I got the average calculation correct so that now it displays the correct value. Thank you... and sorry to bother you with all these questions.Last modified on 2020-02-13 at 14:18
|#9 by rampaa|
2020-02-13 at 16:40
Is there a simpler way to just exclude downvoted tags?Just add the following to your query:
GROUP BY v.id
If you want to get the exact same games that VNDB lists under Fantasy tag, comment out
AND tv.vid NOT IN (SELECT vid FROM tags_vn WHERE ignore)and do a ">=" comparison instead of ">". It will give you the exact same number of games VNDB does but including games that have the average fantasy score of 0 doesn't make much sense IMO so I am not sure why VNDB does that currently.
|#10 by yorhel|
2020-02-13 at 16:43
AND tv.vid NOT IN (SELECT vid FROM tags_vn WHERE ignore)That doesn't seem right. The 'ignore' flag only applies to the single tags_vn row, not to any other votes. In practice it probably works because the ignore flag is typically used to enforce a downvote, but it may also be used to enforce a certain (positive) score. Just adding a "AND NOT ignore" filter to the tag vote fetching query should do the trick.Last modified on 2020-02-13 at 16:45
|#11 by echo13243|
2020-02-14 at 00:30
|Ah ok. That's just about everything I was looking for. The last thing is that, rampaa, wouldn't |
HAVING avg(tv.vote) > 0take into account all tag votes on a visual novel and not differentiate which tag the vote is on, considering the rest of the query?
Edit: that is, of course, if I placed it in the wrong place. Here's my updated query:
SELECT id AS ID, title AS Title, c_rating AS Rating, c_votecount AS Votecount, AVG(ratings.vote), lengthLast modified on 2020-02-14 at 00:31
|#12 by rampaa|
2020-02-14 at 01:55
take into account all tag votes on a visual novel and not differentiate which tag the vote is on, considering the rest of the query?It does not take the average of all tag votes for the VN. But, unfortunately, it does take the average of all tags that have been selected in the WHERE section as a whole. Most of the time this this does not pose problems but it might lead to faulty results.
To give a concrete example, if one were to search for games that have Group Sex in them with that query, it wouldn't return Home Maid. Because it has the tag score of -3 for Group Sex, +1 for Group Sex of One Female and Several Males and +1 for Group Sex of One Male and Several Females. All of them has one vote so the average tag score would be -0.33... which is less than 0.
Unfortunately though, I do not know if there's a simple solution for this. Maybe @Yorhel can help.
|#13 by echo13243|
2020-02-14 at 03:31
|Yeah that's about what I expected. I'm trying to think of ways to approach that but my mindset still isn't used to SQL so it's hard for me to imagine solutions. Best I can think of is stacking GROUP BYs but, as mentioned, I have no idea how it works.|
I also just realized I don't know what the ignore tag even does to tag votes, nor am I competent enough to attach that anywhere in my code that makes a discernible difference.
Goodness. I've never spent so much time on a forum before this. I really do hope I'm not exhausting anyone.Last modified on 2020-02-14 at 03:32
|#14 by yorhel|
2020-02-14 at 08:37
|Yeah, if you want the exact algorithm that VNDB uses, you'll have to use nested GROUP BY subqueries. The algorithm is implemented in SQL, so it's not too hard to adapt that for the query database.|
To make that even easier, I've just added the `tags_vn_inherit` table to extras.sql, so you can query it directly:
select * from tags_vn_inherit where tag = 179 and vid = 383;Gives the expected rating of 1 for Group Sex on Home Maid.
|#15 by echo13243|
2020-02-18 at 00:05
|Threw in a bit of elbow grease and I finally got my query to do what I was looking for. Thanks for everyone in this thread's help and thank you yorhel especially for helping me throughout the whole thing.|
You must be logged in to reply to this thread.