Querying Databases

Posted in

#1 by yorhel
2019-08-31 at 14:03
Since a few days now we've had a shiny new feature on VNDB: You can run custom SQL queries directly against the database. Or, to be precise, against the public database dump. After a short community consultation and a bit of experimentation I've decided to go with SQLPad, a simple and friendly tool that allows you to run, save and share queries. It also has (limited) graphing capabilities.

My hope is that this facility will be useful for moderation purposes, to easily find data inconsistencies and missing or incorrect entries. I'd imagine it can also be useful for other purposes, such as more advanced querying than what the website has to offer or for interesting data analysis and stats.

You can find all the information about this new querying interface on this page.

Happy querying!

(Alternative title: "SQL Injection as a Service!")
#2 by shining17
2019-08-31 at 15:36
Thanks for your continuous hardwork for this DB.Last modified on 2019-08-31 at 15:36
#3 by rampaa
2019-08-31 at 16:30
Is it possible to save our queries privately?

What is the sane way of getting VNs with a specific original language? I think the following works as intended but it's way too slow so I can't use it without getting a timeout:

WITH first_release (vid, released)
AS (
SELECT v.id, MIN(r.released)
FROM vn v, releases r, releases_vn rv
WHERE r.id = rv.id
AND v.id = rv.vid
AND r.released > 0
AND r.released < 99999999
GROUP BY v.id
ORDER BY MIN(r.released)
)
SELECT v.id
FROM first_release fr, releases r, vn v, releases_vn rv, releases_lang rl
WHERE r.released = fr.released
AND v.id = fr.vid
AND r.id = rv.id
AND rl.lang = 'ja'
AND rv.vid = fr.vid
GROUP BY v.id

Is there a stored procedure to find all child tags of a tag? I do the following but doing it for all the tags I want to include/exclude seems way too repetitive.

WITH RECURSIVE descendant_tags (gid)
AS (
SELECT tp.tag
FROM tags_parents tp
WHERE tp.parent = '2'
GROUP BY tp.tag

UNION ALL

SELECT tp.tag
FROM tags_parents tp, descendant_tags dt
WHERE tp.parent = dt.gid
GROUP BY tp.tag
)
SELECT *
FROM descendant_tags
Last modified on 2019-08-31 at 16:49
#4 by yorhel
2019-08-31 at 17:13
Is it possible to save our queries privately?
No. I suppose you could keep a notepad somewhere :)

As for original language, this seems to be pretty fast (and uses the current VNDB definition of "original" language, if I didn't screw this up somewhere):
SELECT title
FROM vn v
WHERE v.id IN(
SELECT rv.vid
FROM releases_lang rl
JOIN releases_vn rv ON rv.id = rl.id
JOIN releases r ON r.id = rl.id
WHERE lang = 'ja' -- The original language you want
AND NOT EXISTS(
SELECT 1 FROM releases_vn irv JOIN releases ir ON ir.id = irv.id
WHERE irv.vid = rv.vid AND ir.released < r.released))

EDIT: Forgot to add, but we'll have a proper "original language" field when this makes it in,

Is there a stored procedure to find all child tags of a tag?
There might be if you write one. :)
Alternatively I can also import the 'tags_vn_inherit' and 'traits_chars' cache tables from the main DB which have the tag<->VN and trait<->char links with the parents included, though the inheritance algorithm used may not necessarily be the one you want (case in point).Last modified on 2019-08-31 at 17:15
#5 by rampaa
2019-08-31 at 18:15
... this seems to be pretty fast
Yup it is, thanks.

There might be if you write one.

CREATE OR REPLACE FUNCTION get_descendant_tags (tag_id INTEGER)
RETURNS TABLE (gid INTEGER) AS $$
WITH RECURSIVE descendant_tags(gid) AS (
SELECT tp.tag
FROM tags_parents tp
WHERE tp.parent = tag_id
GROUP BY tp.tag

UNION ALL

SELECT tp.tag
FROM tags_parents tp, descendant_tags dt
WHERE tp.parent = dt.gid
GROUP BY tp.tag
)

SELECT gid
FROM descendant_tags $$
LANGUAGE sql;

Is this acceptable?

("SQL Injection as a Service" does not let me inject things. I shall have my refund!)

Also, I am quite surprised to hear that things mentioned in link are intentional.Last modified on 2019-08-31 at 18:19
#6 by yorhel
2019-08-31 at 18:33
t12750 is intentional.
It kind of is and kind of isn't. It's yet another case of the system being used in a way I hadn't envisioned, and I don't have a proper solution yet.
#7 by hinoe
2019-08-31 at 22:17
"SQL Injection as a Service" is a far better title and you should be ashamed of yourself for not having gone with it. :P

Also -- SELECT do_you_remember_my_script FROM things_not_in_dump WHERE fun_was_had = TRUE (I know no SQL whatsoever so this is likely malformed even if you disregard the rest) :D
#8 by yorhel
2019-09-01 at 08:20
Is this acceptable?
Yup, modified it a little bit to have more consistent naming with the rest of the DB.

Extras added to the DB are documented at /extras.sql.txt.
#9 by hybtranslation
2019-09-02 at 08:10
Wow, this looks really cool. Just a question, the closest to a database schema I could find is
link

Could you maybe post the create table statements, or something similar? One could probably download a database dump and extract it from there, but I think the whole point of this service is that you don't have to do get an SQL Server running in the first place.

P.S. "SQL Injection as a Service!" really made my day. Was actually my first thought when I read as far as SQLPad. ^^Last modified on 2019-09-02 at 08:11
#10 by yorhel
2019-09-02 at 08:18
SQLPad has a built-in schema browser (the "Toggle schema" button to the left of the "VNDB" connection), but it's a little buggy at the moment. The full schema can be found in the VNDB repo - only columns marked as [pub] are available for querying.
#11 by hybtranslation
2019-09-02 at 09:05
Thanks, awesome!
#12 by hybtranslation
2019-09-02 at 10:24
Alright, I played around with it a bit. Pretty awesome.

I personally like to split up my queries into little parcels and combine them at the end. That way, you can easily reuse the bits elsewhere. Kinda like this:
SELECT first_release.year, AVG(vote_avg) AS avg_rating, AVG(vote_cnt) AS avg_votes, COUNT(v.id) AS novels FROM 
vn v INNER JOIN
(
SELECT rv.vid, MIN(r.released) as released, MIN(r.released) / 10000 as year
FROM releases_vn rv
INNER JOIN releases r on r.id = rv.id
GROUP BY rv.vid
) first_release ON v.id = first_release.vid
INNER JOIN
(
SELECT vo.vid, avg(vo.vote) AS vote_avg, count(vo.vote) AS vote_cnt
FROM votes vo
GROUP BY vo.vid
HAVING count(vo.vote) > 150
) votes
ON votes.vid = v.id
GROUP BY first_release.year
ORDER BY first_release.year

This calculates the average rating of VNs grouped by year of first release, only considering VNs if they have at least 150 votes. Anyway, the queries first_release and votes can easily be reused in any other kind of query, you only have to join on the vid. Hmm, maybe it's just another way to do WITH...
Including the table vn is actually redundant in the above query, but it probably makes it easier to mess around with it and also to check the underlying data.

Anyway, it seems to be implemented well. My queries don't even take a second.

P.S. I noticed that vn.c_popularity, vn.c_rating etc are not public. Maybe consider making those available, calculating the rating like I do is probably making Mr. Bayes angry. ^^Last modified on 2019-09-02 at 10:31
#13 by rampaa
2019-09-03 at 22:56
CREATE OR REPLACE FUNCTION get_child_traits(trait_id INTEGER) RETURNS TABLE(trait INTEGER) AS $$
WITH RECURSIVE child_traits(trait) AS (
SELECT tp.trait FROM traits_parents tp WHERE tp.parent = trait_id
UNION ALL
SELECT tp.trait FROM traits_parents tp, child_traits ct WHERE tp.parent = ct.trait GROUP BY tp.trait
) SELECT trait FROM child_traits
$$ LANGUAGE sql;
Can this be added as well?Last modified on 2019-09-03 at 22:58
#14 by yorhel
2019-09-04 at 06:45
Can this be added as well?
Done.

I noticed that vn.c_popularity, vn.c_rating etc are not public.
I've added c_rating (the bayesian thing) and c_votecount. The popularity rankings are much more expensive to recalculate. :(
#15 by gimnacio
2019-09-10 at 07:36
So I tried to get the mode out of all the novels in my vnlist since i had too many and wanted to reduce to those which did not have high score but popularity or rating sucked, average could have worked somehow but I found mode to be more accurate for i want at least
anyway so I made this query

SELECT v.title, v.id, votos.score, votos.cont, votos.ROW_NUMBER
FROM vn v
inner join
(
SELECT vnl.vid
from vnlists vnl
where uid = 123003 -- this is my user id so if you want to try it in your case change that i guess?
) mylista
ON mylista.vid = v.id
left JOIN
(
select round(vo.vote/10,0) as score, vo.vid, count(vo.vote) as cont,
ROW_NUMBER () OVER(
PARTITION BY vo.vid ORDER BY COUNT(round(vo.vote/10,0)) DESC
)
FROM votes vo
GROUP BY vo.vid, score
) votos
ON votos.vid = v.id

where votos.ROW_NUMBER = 1
order by votos.score asc

But the thing is that a lot of votes I do not get to see because of private so I am not getting the real numbers
can you not create a hidden user id 0 or something and dump all hidden votes on him so we can use that data for better queries and nones privacy is violated in the end (right?)
#16 by yorhel
2019-09-10 at 07:41
No, private votes aren't and will likely never be included in the dumps or that query DB. Not even in anonymized form.
#17 by gimnacio
2019-09-10 at 16:14
ok, thanks for answering so fast and not being vague on the subject.
#18 by uvix
2019-09-11 at 03:40
Is there a lookup table for the enums you're saving? (Specifically vn.length.)

Also, is there any query to return the current user's userid (e.g. so I could join to my data without hardcoding that value somewhere)?Last modified on 2019-09-11 at 03:42
#19 by yorhel
2019-09-11 at 05:30
Is there a lookup table for the enums you're saving? (Specifically vn.length.)
Kind of, but those ugly smallint-enums are subject to change at some point - either to Postgres ENUM types or separate tables.

Also, is there any query to return the current user's userid
No. I suppose it could be hacked into SQLPad as some sort of placeholder value, but I'm not very interested in doing that work. :)
#20 by gimnacio
2019-09-11 at 06:20
another thing I am not sure if it s just me not looking right, but I cannot find how to connect the company which produces a vn to it,
what I want is to make a list of the companies to know how many vns I have of each of them but there seems to be no column connecting that in the sql pad, only id title original alias length length img_nsfw image desc l_wp l_encubed l_renai l_wikidata c_rating c_votecount
#21 by yorhel
2019-09-11 at 06:32
producers <-> releases_producers <-> releases_vn <-> vn
#22 by gimnacio
2019-09-11 at 16:15
ok thanks again for the quick reply.

Reply

You must be logged in to reply to this thread.