Linking Databases, part 2

#101 by yorhel
2021-03-29 at 08:48
Imported, thanks.
#102 by 707
2021-04-04 at 15:39
Sorry if this has been discussed, but are there any plans to add Nintendo eShop and PlayStation links?
#103 by rampaa
2021-04-08 at 09:13
Will echo what t12755.43 asked 'cause I think it would be useful:
Can we get EGS links for staff and producers as well (and maybe characters too, although they're not very comprehensive on EGS)?
Last modified on 2021-04-08 at 09:13
#104 by foiegras
2021-04-12 at 05:28
I'm back with more erogamescape links. I changed the title comparison to ignore most symbols and improved a few other parts of the script. Now, if the titles match less closely, it also checks the producer. This let me lower the lowest acceptable difference ratio between titles quite a bit: previously it was 90%, now it is 50%.
The combination of release date, platform and producer actually does most of the filtering; the script then just picks the best matching title. The differences are mainly caused by use of the vn's full name or by EGS mixing Latin and original titles.
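
For readers following along, here is a minimal sketch of the matching described above. It is not the actual script: the function names, the use of Python's difflib, and the exact way the producer check is applied are all assumptions. The candidates are assumed to be already filtered by release date and platform.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase the title and keep only letters and digits (drops symbols and spaces)."""
    folded = unicodedata.normalize("NFKC", title).lower()
    return "".join(ch for ch in folded if ch.isalnum())


def title_ratio(a, b):
    """Similarity between two normalized titles, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def pick_best(vn_title, vn_producer, candidates, strict=0.9, loose=0.5):
    """Pick the best matching title among (egs_title, egs_producer) pairs.

    Assumed rule: at or above `strict` the title alone is enough; between
    `loose` and `strict` the producer names must also match.
    """
    best, best_ratio = None, 0.0
    for egs_title, egs_producer in candidates:
        ratio = title_ratio(vn_title, egs_title)
        if ratio >= strict:
            accepted = True
        elif ratio >= loose:
            accepted = normalize_title(vn_producer) == normalize_title(egs_producer)
        else:
            accepted = False
        if accepted and ratio > best_ratio:
            best, best_ratio = egs_title, ratio
    return best, best_ratio
```

With these assumed thresholds, a short title against its longer "full name" variant falls below 90% but is still accepted when the producer agrees.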

The data is here. I've added the difference ratio in a new column as I judged it could be useful. I will probably continue working with this script so if you have a preferred output format feel free to ask.
#105 by yorhel
2021-04-13 at 12:20
Imported.

I will probably continue working with this script so if you have a preferred output format feel free to ask.
There's even more!? But TSV is great, just make sure to escape/scrub tabs out of titles; had to edit two lines in the last dump that were throwing errors on import.
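
A small sketch of what scrubbing tabs out of titles could look like before each TSV row is written. The helper names and the example columns are hypothetical, and replacing tabs and line breaks with a single space is just one possible choice.

```python
import re

# Characters that would break a tab-separated row: tabs and line breaks.
_ROW_BREAKERS = re.compile(r"[\t\r\n]+")


def scrub(field):
    """Replace tabs and line breaks inside a field with a single space."""
    return _ROW_BREAKERS.sub(" ", str(field)).strip()


def to_tsv_line(fields):
    """Join already-scrubbed fields with tabs into one TSV row (no trailing newline)."""
    return "\t".join(scrub(f) for f in fields)
```

For example, `to_tsv_line(["v123", "Some\tTitle", 0.75])` yields a row with exactly three columns, since the tab inside the title is replaced before joining.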
#106 by foiegras
2021-04-14 at 05:40
Great, thanks :)

EGS has 25000 entries and vndb had 15000 of them before this import. Although vndb most probably doesn't have all of them, there is still room for improvement. I believe there are quite a few things to try:
- vn aliases could solve some title differences
- the release date can be off by a few days
- the producer's parent brand isn't checked
Maybe I can get some 2000-ish more links? That's quite a wild guess.

Sorry for those two lines, I surely didn't think there would be tabs in titles. I will replace them in future outputs.
#107 by kumiko1
2021-04-15 at 10:06
Now that EGS-vndb relations are pretty accurate, any chance we could consider stealing all the EGS data we can? Things like dlsite and other store links should be accurate and easy to import.
#108 by foiegras
2021-05-03 at 06:20
It's time for more EGS links again.

I added a few improvements:
- the producer verification removes a few strings like "Co., Ltd." and also checks aliases
- the release date comparison accepts an error of 60 days if the titles are at least 90% similar, 15 days otherwise
- the title verification now checks vn aliases. Since release dates may not be exact anymore, I added a digit verification. This avoids mistaking "sequel 2" for "sequel 3", though Roman numerals and a few symbols can still get in the way
- the platform check has minor improvements, mainly support for old computers
I also manually excluded some vns the script had trouble with.
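
As an illustration only, the date tolerance and digit verification from the list above could be sketched as follows. The function names are made up, and comparing the sequences of digit groups is an assumption about how the digit check works.

```python
import re
import unicodedata
from datetime import date


def dates_compatible(vndb_date, egs_date, title_ratio):
    """Allow 60 days of error when titles are at least 90% similar, 15 days otherwise."""
    limit = 60 if title_ratio >= 0.9 else 15
    return abs((vndb_date - egs_date).days) <= limit


def digits_match(title_a, title_b):
    """True when both titles contain the same numbers in the same order.

    NFKC folds full-width digits to ASCII first. Roman numerals are not
    digits, so "II" vs "III" is not caught by this check.
    """
    def numbers(title):
        folded = unicodedata.normalize("NFKC", title)
        return [int(n) for n in re.findall(r"\d+", folded)]

    return numbers(title_a) == numbers(title_b)
```

Here `digits_match("Sequel 2", "Sequel 3")` is false, while two titles that differ only in Roman numerals still pass, which is the limitation mentioned in the list.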

The data is here. I added a release date difference column in the output.
#109 by yorhel
2021-05-05 at 08:10
Imported.

Now that EGS-vndb relations are pretty accurate, any chance we could consider stealing all the EGS data we can?
Not without their permission, no. Data licensing issues can be tricky. :(
#110 by kumiko1
2021-05-05 at 11:30
EGS doesn't seem to have an official data license, but the owner of the site has repeatedly said that anyone is free to use all publicly viewable data for whatever they want (link).
#111 by eacil
2021-05-14 at 20:43
When a steam link is dead, removing it will also remove the steamdb link, even when that one still provides info. Example: r59842.5 (link is still alive). I think steamdb, once automatically added, should be manually removed.
Last modified on 2021-05-14 at 20:43
#112 by mrkew
2021-05-21 at 11:05
I added dlsite, getchu and gyutto product links to a release, yet only the dlsite price is shown on the main page and in the links. Then I saw link, which has 5 different stores linked to it, yet still only the DLsite price is shown. What's up with that?
#113 by yorhel
2021-05-21 at 11:15
Only affiliate links have the price shown.

EDIT: forgot to reply to this one.
I think steamdb, once automatically added, should be manually removed.
My idea was to write a crawler to check for the availability on steam, and automatically hide the Steam link but keep the steamdb one when it's not available on steam anymore. Obviously I've not gotten to that yet, but I think we should at least keep the steam ids in the database even if they've been removed from steam.
Last modified on 2021-05-21 at 11:18
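
A rough sketch of the link-hiding idea above, under the assumption that the availability check is supplied separately (the crawler itself is not shown, and `is_available` is a hypothetical callable). The stored Steam id is kept either way; only the list of displayed links changes.

```python
from typing import Callable, List

STEAM_URL = "https://store.steampowered.com/app/{}/"
STEAMDB_URL = "https://steamdb.info/app/{}/"


def visible_links(app_id: int, is_available: Callable[[int], bool]) -> List[str]:
    """Return the links to display for a stored Steam app id.

    The Steam store link is hidden when the app is no longer available,
    while the steamdb link is always kept.
    """
    links = []
    if is_available(app_id):
        links.append(STEAM_URL.format(app_id))
    links.append(STEAMDB_URL.format(app_id))
    return links
```

Passing the check in as a function keeps the display logic independent of how availability is actually determined.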
#114 by mrkew
2021-05-21 at 11:40
Is that an intentional design so that people are more likely to click the affiliate one instead of the other stores, even if they have the game much cheaper? Or is it a technical limitation?
#115 by yorhel
2021-05-21 at 11:49
A combination. Note that we've had affiliate links in that position for much longer than we've had shop links for releases. It's not my intention to turn VNDB into a proper price comparison site - that's simply too much work, there's too many shops and the prices are much too volatile. I'm not interested in maintaining crawlers for all sites we support, either. Doubly so when crawling that information without the shop's blessing can be a nightmare - I've obtained API keys and IP whitelists for some of the current options; not interested in having to negotiate that with every possible shop.
#116 by mrkew
2021-05-21 at 12:00
Understandable, though I wish you'd have a few more major JP sites.

