Linking Databases, part 2

#101 by yorhel
2021-03-29 at 08:48
Imported, thanks.
#102 by 707
2021-04-04 at 15:39
Sorry if this has been discussed, but is there any plan to add Nintendo eShop and PlayStation links?
#103 by rampaa
2021-04-08 at 09:13
Will echo what t12755.43 asked 'cause I think it would be useful:
Can we get EGS links for staff and producers as well (and maybe characters too, although they're not very comprehensive on EGS)?
Last modified on 2021-04-08 at 09:13
#104 by foiegras
2021-04-12 at 05:28
I'm back with more erogamescape links. I changed the title comparison to ignore most symbols and improved a few other parts of the script. Now, if the titles match less closely, it also checks the producer. This let me lower the lowest acceptable difference ratio between titles quite a bit: previously it was 90%, now it is 50%.
The combination of release date, platform and producer actually does most of the filtering; the script then just picks the best-matching title. The title differences mainly come from use of the vn's full name or from EGS mixing Latin and original titles.

The data is here. I've added the difference ratio in a new column as I judged it could be useful. I will probably continue working with this script so if you have a preferred output format feel free to ask.
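
To make the matching rule above concrete, here is a minimal sketch in Python. It is not the actual script: the function names, the use of `difflib`, and the exact way the producer check is combined with the ratio are all assumptions. Only the "ignore most symbols" idea and the 90% / 50% thresholds come from the post.

```python
import difflib
import unicodedata


def normalize_title(title: str) -> str:
    """Fold width/case variants and keep only letters and digits."""
    folded = unicodedata.normalize("NFKC", title).lower()
    return "".join(ch for ch in folded if ch.isalnum())


def title_ratio(a: str, b: str) -> float:
    """Similarity between two titles, from 0.0 to 1.0, ignoring symbols."""
    matcher = difflib.SequenceMatcher(None, normalize_title(a), normalize_title(b))
    return matcher.ratio()


def is_match(vn_title, egs_title, vn_producer, egs_producer,
             strict=0.9, loose=0.5):
    """Hypothetical acceptance rule: a close title match is enough on its
    own; a weaker one is accepted only if the producers agree."""
    ratio = title_ratio(vn_title, egs_title)
    if ratio >= strict:
        return True
    return ratio >= loose and vn_producer == egs_producer
```

In this sketch the candidates are assumed to have already been filtered by release date and platform, so the title ratio only has to pick the best of a handful of entries.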
#105 by yorhel
2021-04-13 at 12:20
Imported.

I will probably continue working with this script so if you have a preferred output format feel free to ask.
There's even more!? But TSV is great, just make sure to escape/scrub tabs out of titles; had to edit two lines in the last dump that were throwing errors on import.
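
For reference, scrubbing tabs before writing a TSV line could look roughly like this in Python. The helper names are made up for illustration, and replacing tabs and line breaks with a single space is just one possible choice; escaping them would work as well.

```python
import re

# Characters that would break a tab-separated line if left inside a field.
_FIELD_BREAKERS = re.compile(r"[\t\r\n]+")


def scrub_field(value) -> str:
    """Turn any run of tabs or line breaks inside a field into one space."""
    return _FIELD_BREAKERS.sub(" ", str(value)).strip()


def to_tsv_line(fields) -> str:
    """Join already-scrubbed fields with tabs (no trailing newline)."""
    return "\t".join(scrub_field(f) for f in fields)
```

A row such as `["v17", "Some\tTitle", 0.93]` then always produces exactly three columns, whatever the title contains.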
#106 by foiegras
2021-04-14 at 05:40
Great, thanks :)

EGS has 25000 entries and vndb had 15000 of them before this import. Although vndb most probably doesn't have all of them, there is still room for improvement. I believe there are quite a few things to try:
- vn aliases could solve some title differences
- the release date can be off by a few days
- the producer's parent brand isn't checked
Maybe I can get some 2000-ish more links? That's quite a wild guess.

Sorry for those two lines, I surely didn't think there would be tabs in titles. I will replace them in future outputs.
#107 by kumiko1
2021-04-15 at 10:06
Now that EGS-vndb relations are pretty accurate, any chance we could consider stealing all the EGS data we can? Things like dlsite and other store links should be accurate and easy to import.
#108 by foiegras
2021-05-03 at 06:20
It's time for more EGS links again.

I added a few improvements:
- the producer verification removes a few strings like "Co., Ltd." and also checks aliases
- the release date comparison accepts an error of 60 days if the titles are at least 90% similar, 15 days otherwise
- the title verification now checks vn aliases. Since release dates may not be exact anymore, I added a digit verification. This avoids mistaking "sequel 2" for "sequel 3", though Roman numerals and a few symbols can still get in the way
- the platform check has minor improvements, mainly support for old computers
I also manually excluded some vns the script had trouble with.

The data is here. I added a release date difference column in the output.
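
The checks listed above can be sketched roughly as follows in Python. This is only an illustration, not the script itself: all function names are hypothetical, the suffix list contains just the one example given in the post, and comparing the digit groups as ordered lists is an assumption about how the digit verification works.

```python
import re
import unicodedata
from datetime import date


def allowed_date_gap(title_similarity: float) -> int:
    """60 days of slack for titles at least 90% similar, 15 days otherwise."""
    return 60 if title_similarity >= 0.9 else 15


def dates_compatible(a: date, b: date, title_similarity: float) -> bool:
    """True if the two release dates are within the allowed gap."""
    return abs((a - b).days) <= allowed_date_gap(title_similarity)


def digits_of(title: str):
    """All digit groups in a title, after folding full-width digits."""
    return re.findall(r"\d+", unicodedata.normalize("NFKC", title))


def same_digits(a: str, b: str) -> bool:
    """Reject pairs like "sequel 2" / "sequel 3" (Roman numerals not handled)."""
    return digits_of(a) == digits_of(b)


# Assumed list: only "Co., Ltd." is named in the post.
CORPORATE_SUFFIXES = ("Co., Ltd.",)


def clean_producer(name: str) -> str:
    """Strip known corporate suffixes before comparing producer names."""
    for suffix in CORPORATE_SUFFIXES:
        name = name.replace(suffix, "")
    return name.strip().lower()
```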
#109 by yorhel
2021-05-05 at 08:10
Imported.

Now that EGS-vndb relations are pretty accurate, any chance we could consider stealing all the EGS data we can?
Not without their permission, no. Data licensing issues can be tricky. :(
#110 by kumiko1
2021-05-05 at 11:30
EGS doesn't seem to have an official data license, but the owner of the site has repeatedly said that anyone is free to use all publicly viewable data for whatever they want (link).
