Quote:
Originally Posted by handsonhosting
How quickly do you want to index those pages? That's going to be your determining factor on this. The bigger the machine (or cluster of machines), the faster it will get done. I think you'll be dragging along if you opt for a VPS, but it depends on how quickly you need the indexing done and how often you're spidering back.
I've not worked with Nutch myself, but based on some of the reading I've done, it seems to be a memory-intensive program, so be sure to get enough memory. A number of places recommend NOT using a VPS for this because of the constraints.
Speed doesn't really matter; I only need to reindex these pages once every 3 months. The most important thing for me is disk space: I need at least 50 GB, since I index about 1 million pages.
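As a rough sanity check on that figure, here is a minimal sketch of the per-page arithmetic. It assumes decimal units (1 GB = 1,000,000 KB) and that the 50 GB covers everything stored per page; the function name is made up for illustration and is not part of Nutch.

```python
def per_page_budget_kb(disk_gb: float, pages: int) -> float:
    """Average disk space available per page, in KB.

    Assumes decimal units: 1 GB = 1,000,000 KB.
    """
    return disk_gb * 1_000_000 / pages


# 50 GB spread over about 1 million pages
print(per_page_budget_kb(50, 1_000_000))  # 50.0
```

So 50 GB for about 1 million pages works out to roughly 50 KB per page on average. Whether that is enough depends on how much the crawler keeps for each page, which I have not measured.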