Don't Crawl VNDB!

#1 by Yorhel
2012-02-10 at 18:29
I seriously doubt this message will reach those for whom it is intended, but it can't hurt to try.

[ASCII-art banner reading: DON'T CRAWL VNDB]

While it may be normal for some very popular websites, 20 hits per second is *definitely* not normal for VNDB. My cheap server wasn't designed for that, and I have no intention of over-provisioning everything just because some people think they have something to gain by crawling VNDB.

Please, if you need something, just contact me.
#2 by kalikai2188
2012-02-10 at 22:36
Yorhel, I don't really get the "Don't Crawl VNDB" slogan. Would you mind explaining it in detail here, please? O.O''
Last modified on 2012-02-10 at 22:41
#3 by lne
2012-02-10 at 23:04
I'm guessing it has to do with this.
#4 by abyssaleros
2012-02-10 at 23:11
lne, that is probably half of the truth - I think someone has used a tool like WinHTTrack on VNDB to get himself an offline copy.
DON'T DO IT - It is idiotic, as the database changes from day to day, so an offline copy is useless and just kills the server's bandwidth.

Should have read the Wikipedia entry, as the tool is mentioned at the bottom.
Me <<<< Idiot too.
Last modified on 2012-02-10 at 23:15
#5 by yirba
2012-02-11 at 01:20
Well, you could just block people who are doing that sort of thing.

Of course, if you want to access the database, there's always the API (although I'm not too sure how to actually connect to it… would be nice to be able to access via HTTP).
#6 by Yorhel
2012-02-11 at 08:10
I'm sure the message will be understood by those for whom it is intended. I should also mention that it's not only the pages themselves that are being crawled, but to a large extent also images (screenshots, VN images, etc). I really don't have the kind of bandwidth to allow *everyone* to download all of those.

@yirba: Ah, thanks for mentioning the API, totally forgot about that. >_>

What saddens me, however, is that most people think that HTTP is "simpler". You're definitely no exception in that, but it's just stupid. HTTP is an overly complex protocol and should never have been used for anything besides web surfing. Plain TCP connections (as used by the API), on the other hand, are much simpler... but I guess HTTP is the first thing people teach themselves to use, and then they forget all about the existence of alternative means of communication. :-(
#7 by sunclaudius
2012-02-11 at 08:29
Hmm, like someone just grabbed the VNDB database? 0.0a
#8 by pendelhaven
2012-02-11 at 16:53
Yeah, I'm also experiencing very slow loading when looking at screenshots. I thought it was the server needing more charcoal or steam or some other locomotive shit, but it wasn't like that.
#9 by max1337
2012-02-11 at 17:18
Bring a torrent with complete database (aside user accounts and votes) and pictures. Problem solved.
#10 by abyssaleros
2012-02-11 at 18:11
Where is the need for a torrent???
I do not see one. The moment you post a torrent it is outdated, and you miss the fun of reading all those funny posts like "Is someone translating this game?" or "My opinion is far superior to yours!!" and "How could I learn Japanese with nukiges?"
Last modified on 2012-02-11 at 18:12
#11 by unravel
2012-02-11 at 18:19
Bring a torrent with complete database (aside user accounts and votes) and pictures. Problem solved.

But, as abyssaleros has said before, it's rather unreasonable. Unless someone's going to create an imposter database, and at this rate, I wouldn't lift a finger in the admins' place. Don't screw VNDB, please, whoever is the person breaking the server.
Last modified on 2012-02-11 at 18:21
#12 by surferdude
2012-02-11 at 19:22
In Soviet Russia, VNDB screws you.
#13 by soketsu
2012-02-12 at 04:47
Bring a torrent with complete database (aside user accounts and votes) and pictures. Problem solved.
That definitely solves my problem of being offline most of the time. I can only hope Yorhel would upload at least the database up to 2011.

...but please, if ever, not as a torrent.
Last modified on 2012-02-25 at 01:28
#14 by rusanon
2012-02-12 at 05:25
@yorhel While HTTP is indeed complex, its primary selling point is vast amount of ready and simple to use libraries. You don't have to implement anything, esp. for HTTP-based protocols like JSONRPC - you just pick whatever JSONRPC library you prefer and it just works.
With custom TCP protocols you have to craft everything yourself based on nothing better than sockets.
#15 by Yorhel
2012-02-12 at 08:18
While HTTP is indeed complex, its primary selling point is vast amount of ready and simple to use libraries.
I realize that, but there are four reasons why I chose not to use it anyway:

1. Stacking protocols on top of protocols for the sole reason that there happen to be easy-to-use libraries for it is bad design. This is a somewhat equivalent rant to link

2. Sockets aren't hard to use, people just don't know how to use them. Seriously, any self-respecting high-level language has a very easy to use socket interface, and the VNDB API is designed so that it's pretty easy to make use of with just a socket API.

3. JSONRPC (or rather, this applies to pretty much any *RPC and RESTful services) is not designed as a data query language. I could provide a function to look up VN info by ID, and then a separate function to look up VN info by search string, and then one for language filters, and ... etc. It's inflexible and a pain to use; it works fine for simple things, but certainly not for VNDB.

4. Those "easy-to-use" libraries tend to be huge. I wouldn't want to ship an application with 5MB of libraries when 10KB would have been more than enough.

Either way, I realize that this rant isn't going to change things. So instead I'll just offer my help to people who wish to use the API but are too scared of sockets.
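
For anyone put off by raw sockets, here is a minimal Python sketch of the kind of exchange point 2 describes: one command plus a JSON payload per message, sent over a plain TCP socket. The framing (a single 0x04 byte ending each message), the command names `get` and `results`, and the payload fields are assumptions made for illustration, not the documented VNDB API. The demo uses a local socket pair so it runs without contacting any server.

```python
import json
import socket

# Assumption for this sketch: each message is "<command> <json>" followed by
# one end-of-transmission byte. The real API's framing may differ.
END = b"\x04"


def send_message(sock, command, payload):
    """Send one command with a JSON payload, terminated by END."""
    body = command + " " + json.dumps(payload)
    sock.sendall(body.encode("utf-8") + END)


def recv_message(sock):
    """Read until END arrives, then split the command from its JSON payload.

    Simplification: assumes strict request/response, so at most one message
    is in flight in each direction.
    """
    buf = bytearray()
    while not buf.endswith(END):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf += chunk
    command, _, raw = buf[:-1].decode("utf-8").partition(" ")
    return command, (json.loads(raw) if raw else None)


def demo():
    """Round-trip one hypothetical request over a local socket pair."""
    client, server = socket.socketpair()
    with client, server:
        send_message(client, "get", {"type": "vn", "id": 17})
        command, payload = recv_message(server)
        # The stand-in "server" simply echoes the request payload back.
        send_message(server, "results", {"echo": payload})
        return command, recv_message(client)
```

Against a real server, the only change would be to obtain the socket from `socket.create_connection((host, port))`, with the host and port taken from the API documentation.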
#16 by lne
2012-02-13 at 14:28
Well it would help if you could somehow block requests that happen too soon one after the other. To tell the truth I don't even know if that's possible or even if you can distinguish crawler traffic from normal traffic at all, but if it was possible...
#17 by yirba
2012-02-14 at 20:09
@lne: It's probably possible. After all, 20 hits per second is far from normal. It's just a matter of choosing a suitable threshold.
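
As a rough illustration of the threshold idea discussed above, a per-client sliding-window counter is one way to tell a 20-hits-per-second crawler from a normal reader. This is only a sketch: the class name, the limit of 5 requests per second, and the client addresses are hypothetical, and nothing here reflects how VNDB actually handles traffic.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_hits` requests per `window` seconds for each client."""

    def __init__(self, max_hits, window):
        self.max_hits = max_hits
        self.window = window
        self.hits = defaultdict(deque)  # client address -> recent timestamps

    def allow(self, client, now):
        """Record a request at time `now` (seconds); return True to serve it."""
        recent = self.hits[client]
        # Forget requests that have slid out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False  # over the threshold: reject, and do not record
        recent.append(now)
        return True


# Hypothetical threshold: 5 requests per second per client.
limiter = SlidingWindowLimiter(max_hits=5, window=1.0)

# A crawler sending 20 requests within one second is cut off after 5.
crawler = [limiter.allow("10.0.0.1", i / 20) for i in range(20)]

# A reader loading a page every two seconds is never blocked.
reader = [limiter.allow("10.0.0.2", i * 2.0) for i in range(5)]
```

Passing the timestamp in explicitly keeps the sketch easy to test; a live server would pass something like `time.monotonic()` instead.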
#18 by horseband
2012-02-19 at 18:09
@13

A torrent is the best form of distribution for something like that. Finding someplace free to host a relatively big file like that and give free bandwidth is not easy. Sure, you can break it into 15 different downloads and use something like MediaFire, but what's the point when you could just consolidate it into a single torrent? On top of that, he wouldn't have to waste ridiculous amounts of bandwidth.

I never understood what some people have against torrents (As a means of downloading files, I'm not talking about the illegal content). Why exactly do you dislike them?
#19 by ganchan
2012-02-19 at 19:53
@18

In my case, there are two reasons:
1- I just don't feel very comfortable with them and with the client thing. (I know it's totally illogical, maybe because I don't know it very well.)
2- To download it, I have to depend on someone seeding it at that very moment (something I also disliked about eMule). And if everyone deletes the data after using it (like I usually do), after just one month you won't find a seed. True, Hotfile links can also expire, but then I only depend on the uploader instead of on many people downloading.
And I know I could be wrong, as my knowledge of torrents is low, but from what I know I can say I dislike them and that I prefer 15 different downloads, in a DD way.
#20 by shini
2012-02-19 at 21:24
The only thing I have against torrents is that I'm on a university internet line and torrenting results in having to pay a fine if caught.
#21 by kelpie
2012-02-23 at 12:34
Incidentally, I've been tinkering, on and off, with a design for a VNDB client library. If someone had some plan they'd need to use the API for, I could probably dust it off and put together a special-purpose program, if needed. It's been a while since I've touched it, but I assume the API hasn't changed since then.
#22 by soketsu
2012-02-25 at 01:34
@19
Let's add one more:
3- Torrents are inefficient on the part of the distributor/sharer. You have to seed 24/7 to cater to downloads all the time. With file hosting, on the other hand, you only have to spend a couple of minutes or hours uploading.
#23 by chrnno
2012-03-01 at 23:15
Err... Isn't the main argument against a VNDB torrent the fact it is constantly being changed so a new torrent would have to be created very often.
#24 by LordCapsLock
2012-03-10 at 17:54
Err... Isn't the main argument against a VNDB torrent the fact it is constantly being changed so a new torrent would have to be created very often.
That's the point, yeah. Well, on the other hand, I doubt that those people will stop with this shit just because of this message/thread here. Let's hope for the best.
#25 by selfex1led
2012-03-25 at 22:03
I'm not sure if this could work as a solution because it might be a bit extreme BUT perhaps you can limit the people who come to visit this website by requiring them to register first before viewing? I dunno.

But come to think of it, this above idea would defeat the spirit of this page which is meant to be a wiki.

Never mind.
Last modified on 2012-03-25 at 22:06