Don't Crawl VNDB!

#26 by klutch
2012-03-30 at 06:49
You could always deny fetching by applications other than browsers.
#27 by leion8000
2012-04-06 at 00:09
"We can rebuild it, faster, stronger...portabul" -Ben-Heck show opening
#28 by warfoki
2012-04-06 at 01:10
Awesome. So your point is?
#29 by leion8000
2012-04-06 at 04:11
Build a new(ish) server by scrapping parts from older servers and make it a serving beast.
\m/ (>_<) \m/
- the portability <---(joke)
Add a donation tab or something to pay for more bandwidth. I'd donate ten bucks to this site. Maybe have a small ad in the corner of the page. (Just thinking out loud.)

(Please note I'm a n0ob to this site, so I don't exactly know its inner workings.)
Last modified on 2012-04-06 at 04:14
#30 by warfoki
2012-04-06 at 05:57
As of now Yorhel (the admin, in case that wasn't clear) pays for everything out of his own pocket. The site runs on his own server, not on a rented one. The monthly cost is around 50 bucks and the server itself isn't a top-notch one. At least this was the situation a few months back, when I saw Yorhel's post about it in one of the threads, hell knows in which one exactly.
#31 by yorhel
2012-04-06 at 06:18
maybe have a small ad in the corner of the page.
The problem with ads is that they don't get you anything until you move them to the most annoying place imaginable.

That said, the affiliate links currently cover the server costs. More or less. Also, I manage 6 other dedicated servers that I am allowed to use should VNDB require that. But as I said: overkill-dimensioning everything to keep bot owners happy is just plain stupid and a chore.
#32 by leion8000
2012-04-06 at 07:01
Good points.
I got a little carried away... again lol.
Great site by the way (^_^)
#33 by pendelhaven
2012-04-16 at 19:23
This comes very late, and I know VNDB is more or less crawl-free, but I think the reason was the quotes at the bottom. As someone said, pressing F5 has never been this fun.
#34 by bikz
2012-05-18 at 20:19
I read all the comments, and at comment number 16 I remembered that AniDB.net has a protection system like this. I had bookmarked a lot of pages from AniDB, and once, when I was doing some cleaning (to remove titles I wasn't interested in watching anymore), I tried to open 30-40 pages at once. But surprise, surprise... an error message appeared saying something about consuming too much bandwidth and that I was banned for 24h from viewing anything except the main page.
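
The kind of protection described above can be sketched as a per-client sliding-window counter with a temporary ban. This is only an illustration and not AniDB's or VNDB's actual mechanism: the class name `BurstLimiter`, the limit of 30 requests per 10 seconds, and the 24-hour ban are all assumptions chosen to mirror the anecdote.

```python
import time
from collections import defaultdict, deque


class BurstLimiter:
    """Ban a client that makes too many requests within a short window.

    All thresholds are hypothetical defaults, not values used by any real site.
    """

    def __init__(self, max_requests=30, window=10.0,
                 ban_seconds=24 * 3600, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.ban_seconds = ban_seconds
        self.clock = clock                  # injectable so the logic can be tested
        self.hits = defaultdict(deque)      # client -> timestamps of recent requests
        self.banned_until = {}              # client -> time at which the ban expires

    def allow(self, client):
        """Return True if the request may proceed, False if the client is banned."""
        now = self.clock()
        if self.banned_until.get(client, 0.0) > now:
            return False
        recent = self.hits[client]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        recent.append(now)
        if len(recent) > self.max_requests:
            # Too many requests in the window: start a ban and reset the counter.
            self.banned_until[client] = now + self.ban_seconds
            recent.clear()
            return False
        return True
```

A server would call `allow(client_ip)` once per incoming request and serve an error page whenever it returns `False`. Opening 30-40 bookmarks at once would trip a limiter like this, which is why normal browsing can occasionally get caught as well.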
#35 by nitroxgen
2012-06-12 at 06:48
Well, I accidentally found something. Perhaps it's related?

link

It says that they have a cached version of VNDB's database, reloaded every five days or so...
#36 by yorhel
2012-06-12 at 07:01
Lol. Tinfoil nicely uses the API for that. I'd encourage such use of the database.
#37 by micah686
2012-08-06 at 02:26
Well, I had thought about downloading a section of the website, but after seeing the message, and understanding that the data changes often, I decided against it.
#38 by bobwps3
2012-08-27 at 17:16
I agree with yorhel 100%. Writing code for socket-level protocols isn't very hard. I remember being surprised at how easy it was the first time I used them, when I wrote an IMAP/NNTP/SMTP client.

HTTP is used way beyond what it should be used for. I blame stuff like Google Groups for this mentality, where you basically have to use stuff like libwww or mechanize just to make an app frontend, which can be broken at any time by small UI changes by Google. (That said, in their case, maybe the Google API covers this.)
#39 by ninjamask
2013-03-22 at 03:52
BTW... is there a list of all the clients that use the API? (like link)
#40 by kelpie
2013-03-22 at 05:14
yorhel listed some in t3599.11, though there are probably others (e.g. I use one I wrote myself, but it's not publicly available).
#41 by irx
2013-11-22 at 07:02
Sorry for a little necroposting, but I've been wondering as of late - I know the VNDB source code is open, but does anyone except yorhel have the actual database backups?

In light of what has become of MobyGames, it's somewhat troubling that the existence of VNDB, which many of us have put a lot of time into, depends on a single person. Because, you know, an asteroid could fall on the city where yorhel lives and VNDB would be lost.

Maybe something like making a yearly full backup torrent could ensure the survival of VNDB just in case something goes wrong?

I also noticed the legal things are not defined anywhere, maybe attaching some kind of open license would be a good idea?
#42 by space-ranger
2013-11-22 at 16:54
I like the idea of a torrent backup. However, just making a torrent of the full database would expose info like the logins and emails of all users. To do this correctly, somebody would have to write a VNDB copy script that deletes all secret info before making the torrent. That somebody could be anybody with access to the code, which means everybody.

The quote database is also a secret because... well that is how Yorhel wants it.

But yeah we should have the option to save all the hard work included in the database in case the unexpected happens. Such a torrent will also provide a more realistic local test environment for code submitters.
#43 by yirba
2013-11-23 at 00:57
Well, the database is made up of a bunch of different tables. Just make available the tables that are relevant (VNs, releases, characters, producers, etc.). The tables for users, quotes, etc. wouldn't be included.
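
As a rough illustration of the "only export the relevant tables" idea, here is a minimal sketch of an allowlist-based copy. It uses SQLite purely as a self-contained stand-in (it says nothing about the database VNDB actually runs on), and the table names in `PUBLIC_TABLES` are hypothetical placeholders, not VNDB's real schema.

```python
import sqlite3

# Hypothetical table names; anything not listed here (users, quotes, ...) is skipped.
PUBLIC_TABLES = ("vn", "releases", "chars", "producers")


def export_public(src, dst, tables=PUBLIC_TABLES):
    """Copy the schema and rows of allowlisted tables from src into dst."""
    for name in tables:
        row = src.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
            (name,),
        ).fetchone()
        if row is None:
            continue  # table does not exist in the source database
        dst.execute(row[0])  # recreate the table from its original CREATE statement
        rows = src.execute(f'SELECT * FROM "{name}"').fetchall()
        if rows:
            marks = ", ".join("?" * len(rows[0]))
            dst.executemany(f'INSERT INTO "{name}" VALUES ({marks})', rows)
    dst.commit()


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE vn (id INTEGER PRIMARY KEY, title TEXT)")
    src.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, mail TEXT)")
    src.execute("INSERT INTO vn VALUES (1, 'Example')")
    src.execute("INSERT INTO users VALUES (1, 'someone@example.com')")
    dst = sqlite3.connect(":memory:")
    export_public(src, dst)
    print(dst.execute("SELECT name FROM sqlite_master").fetchall())
```

An allowlist is used rather than a blocklist so that a newly added sensitive table is left out by default. The sketch copies whole tables only, so any column in an exported table that refers to a user would still need separate handling.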
#44 by yorhel
2013-11-23 at 07:22
I also noticed the legal things are not defined anywhere, maybe attaching some kind of open license would be a good idea?
As much as I like the idea, it's too much of a hassle. I don't even know who has the copyright on most of the data. I suspect that a good part of the metadata (most release info) isn't copyrightable. The parts that are copyrightable are probably copyrighted by the contributor? In which case I'd have to ask everyone to allow it to be distributed under a certain license and delete stuff that can't be confirmed? For now the license a contributor gives to us is implicit: You're giving the data to VNDB knowing that we make it searchable and presentable, so I can only assume that means that that's what we're allowed to do. Then there's VN descriptions copied from many other sites without tracking their license. And then there's images, which we include on an as-long-as-the-publisher-doesn't-complain basis.
Copyright is a mess if you want to do it "right". So we're doing it wrong and hope that nobody complains.

Regarding a database backup: I don't feel too strongly against it, but it's important to define its goals and limits. Creating a dump that can be imported directly into a VNDB copy is going to be hard if we also want to exclude user-sensitive data. Not exporting the user-related tables would break a lot of foreign key references. The actual data tables also contain the full history of all entries and the changelog info that comes with it - including userid references and links to the discussion board.

An easier approach is to create a dump of all information that is accessible through the API without logging in. This excludes the history of each entry and makes it impossible to do a direct database import, but it does include most important data (except votes, if you care about that. Some character data hasn't been implemented yet, either).
Note that someone has already managed to write a "crawler" of sorts for the API before. The API limits aren't very strict, so it's actually possible to download the entire database in a few hours already.
Last modified on 2013-11-23 at 07:24
#45 by irx
2013-11-27 at 09:05
An official torrent is better than a crawler imo, for several reasons:
1) it doesn't require any additional skill and effort on the part of users, thus more copies will be made, making it easier for theoretical successors to find one when needed.
2) less load and traffic on vndb - 10+ people using crawlers is not very efficient.

The goal is to ensure the survival of VNDB in case anything unpredictable happens to the site. What data to include is a tricky thing... ideally everything but the passwords and private lists should be included, since passwords could be recreated via password restore functions, but it's probably safer to exclude all user data and histories.
Last modified on 2013-11-27 at 09:16
#46 by silence
2016-12-04 at 05:37
Not sure if this is a suitable topic, but... Why has VNDB been so slow over the past few months? Maybe I missed some terrible news?
#47 by yorhel
2016-12-04 at 06:19
It's not been any slower for me. I see that you do have some VN and release filters active (using the "save as default" button in filter selection); that feature has the potential to slow down all pages while you're logged in.
#48 by dk382
2016-12-04 at 08:25
I've noticed slower image loading over the last few months. I thought it might have been due to my new ISP because my connection has been a little spotty, but even when the rest of the internet works great VNDB loads images slowly for me. It's variable, but it ranges from 1 second to 4 or 5 for a single image at its worst. It used to load images near instantly before.
Last modified on 2016-12-04 at 08:25
#49 by silence
2016-12-04 at 09:13
#47
These filters have been active for years. I just reset them for a particular search without saving as default. VNDB always used to load in milliseconds. Now it takes 3-4 seconds. The problem appeared in the last half year or so.
#50 by yorhel
2016-12-04 at 10:01
I enabled some performance logging on the server to see if it's on the response-generating end, but things are looking fine so far. There are still many places where slowness can occur, however. :(

If the slow loading is easily reproducible, could I get a screenshot of the network monitor? Especially the 'timing' tab of a slow resource is interesting. Chrome has a similar feature, but I'm not too familiar with that one.

EDIT: To illustrate, a screenshot like this would be great. :)
Last modified on 2016-12-04 at 10:07