Opening Up

#1 by yorhel
2019-07-22 at 08:21
There have been many requests for VNDB data over the years and that's why we have a few limited database dumps and an API. Despite being limited in scope, those exports have already been used for many awesome projects, including VNStat, VNDB Android and various data-oriented research projects. Still, there is value in providing more. After mulling it over for a while, I've decided to provide dumps of the complete database. Or almost complete, at least. Forum posts, change histories and non-public user information are excluded.

Info and download.

Most of the data (but see licensing for details) is available under the Open Database License, which allows for free usage and sharing of the data, provided that any projects using this data will also share *their* data and keep it open.

Hopefully these dumps also provide an answer to the sustainability of VNDB: If something happens to me or if I end up doing something spectacularly stupid to the site, this will give the community the opportunity to fork the site and start anew - with most of the data still intact. Of course, I'll try to make sure that will never be necessary.

As a reminder: *PLEASE* do not crawl this site or scrape the HTML in other ways. If these dumps still don't provide what you need, let me know about your use case and I'm sure we can work something out.


Finally, I also had the idea (and in fact, an implementation as well) to provide weekly updated tarballs of all images referenced from the database dumps. Unfortunately, I'm convinced that my poor server isn't going to be happy with serving a 20G file to some hundreds of people, so we'll need to work out a better distribution channel. If you're an experienced sysadmin with a fair bit of bandwidth to spare, please get in touch. If that doesn't work out, we'll probably have to play around with BitTorrent. But scripting torrents is kind of annoying...

Update 2019-08-04: Images are available now, see #9 and d14#6.
Last modified on 2019-08-04 at 07:44
#2 by kurothing
2019-07-22 at 11:01
I would love to see a more... common file type used for the database dump...

As it stands, as a primarily Windows user, there isn't really a nice way to open ".zst" files.

Sure, there is the command line app released by Facebook (no thanks) and the 7zip fork that requires ending explorer.exe / restarting Windows to completely uninstall... But wouldn't a gzip tar like the other files be fine?

That's my only request in regards to the file. The actual format is a bit weird (a tab separated, new line delimited file... seems a bit out there, maybe for a few hundred kb extra a series of json arrays would have been easier?), but I'm honestly happy to see that in the unfortunate event that something goes horribly wrong, the time and effort put into vndb by the community would not be wasted.

And reading that back, it sounds really mean! I'd be perfectly happy for vndb, and you yourself, Yorhel, to continue to operate as-is! Especially with all the free time you have dedicated to this project, I cannot be more grateful.
#3 by yorhel
2019-07-22 at 11:11
7zip fork
Wait, main 7zip doesn't support .zst yet? That'd be pretty damning, and I understand your frustration then. That said, gzip turned out to be a bottleneck in creating the dump; zstd does it twice as fast *and* saves an additional 8MB. Here's hoping it'll get more mainstream soon.

A tab separated, new line delimited file
It's the PostgreSQL COPY format. It provides easy exporting and importing (more so than JSON) and isn't terribly uncommon (in fact, I pretty much copied the format of the MusicBrainz dumps).
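For anyone who wants to read the files without PostgreSQL, here is a minimal sketch of decoding one line of the COPY text format in Python. It assumes the default settings (tab delimiter, `\N` for NULL) and only handles the common single-character backslash escapes, not the octal or hex forms. The function names and the sample row are made up for illustration and are not taken from the dump.

```python
# Common single-character escapes in PostgreSQL's COPY text format.
# Octal (\123) and hex (\x41) escapes are deliberately not handled here.
_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r",
            "t": "\t", "v": "\v", "\\": "\\"}


def decode_field(raw):
    """Decode one field; a bare \\N marker stands for SQL NULL."""
    if raw == r"\N":
        return None
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Unknown escapes fall back to the escaped character itself.
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_copy_line(line):
    """Split one row on tabs and decode each field.

    Splitting on raw tabs is safe because a tab inside a value is
    written as the two characters backslash + 't' in this format.
    """
    return [decode_field(f) for f in line.rstrip("\n").split("\t")]


# Hypothetical row: an id, a title, a NULL column and a multi-line note.
row = parse_copy_line("1\tSome Title\t\\N\tline one\\nline two\n")
print(row)
```

Loading the files with COPY in PostgreSQL itself is still the easier route when a database is available; this is only meant for quick one-off processing.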
#4 by hi117
2019-07-22 at 11:17
I have a little bit of free time that I could spend on something like this. As far as experience goes you can check my other posts on Reddit (u/hi117). I currently run most of the public facing web infrastructure at Malwarebytes.

I have a few ideas that can make this pretty easy; let me know if you're interested.
#5 by yorhel
2019-07-22 at 12:17
@hi117: Mail me at contact@vndb.org, I'm interested in hearing those ideas!
#6 by roadi
2019-07-22 at 16:36
Thank you, this is most welcome.

Now, if you'll excuse me, I'll go do some perversely convoluted queries just for the sake of it. :P
#7 by rampaa
2019-07-22 at 19:37
Alternatively, a script is provided to load the data into a PostgreSQL database for easy querying. See import.sql for options and usage information.
I couldn't find "import.sql" in "vndb-db-latest.tar.zst".
#8 by yorhel
2019-07-22 at 19:44
I couldn't find "import.sql" in "vndb-db-latest.tar.zst".
Crap, my bad. That file got lost when I changed some tar args. Here's the script: link
It'll be in the next dump, too.
#9 by yorhel
2019-08-04 at 06:09
The images are up: d14#6

Decided to go with an rsync server. If the number of downloads of the image database is anywhere near that of the main database dumps, then the server ought to be able to handle the load just fine. It's currently limited to 500 KiB/s and 10 concurrent connections, so downloading an initial copy may take a full day and the server may not always have free connections during the first few weeks. I expect to relax these limits as the initial surge of downloads dies out after a while.
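As a rough sanity check on that estimate, the arithmetic can be sketched like this. It assumes the image collection is about the "20G" mentioned in the first post, reads that as 20 GiB, and assumes the 500 KiB/s cap is sustained for the whole transfer. All three are assumptions, and the helper name is made up.

```python
def transfer_hours(size_bytes, rate_bytes_per_s):
    """Best-case transfer time in hours at a constant rate."""
    return size_bytes / rate_bytes_per_s / 3600


size = 20 * 1024**3   # assumed total size: 20 GiB
rate = 500 * 1024     # rate limit from the post: 500 KiB/s

hours = transfer_hours(size, rate)
print(f"{hours:.1f} hours at the full rate limit")
```

That works out to a bit under 12 hours in the best case, so anything slower than the cap or a wait for a free connection quickly pushes an initial copy towards a full day.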
