Recently my crawling attempts came to a very swift end. I may have filled up DreamHost's MySQL server ‘madal’, which caused multiple people's websites to go down until the issue was resolved. Unfortunately for me, resolving it meant they moved my MySQL data to a temporary location until I contacted customer support.
Turns out that their ‘unlimited’ policy only allows for 3GB of MySQL usage, rather than the 109GB of indexed websites I had sitting on there. This meant that I had to find another solution for my problem.
Solution A: Dump archives of the indexes, dump raw content, dump backups and try to work with virtually no storage.
Solution B: Go and buy a mini form factor computer, pull the DVD drive out and replace it with a second hard disk. Turn the computer into a home server and start the process again with a better code design behind it.
I went with solution B because it allowed me to have a new toy and work with my own system. The server has an i3 running at just over 3.0 GHz, 3TB of total hard disk space and 8GB of RAM. It is hooked up via gigabit, which lets me work things out better and write some of the processing scripts in C++. This way I can constantly crunch information.
I don’t mean to hate on DreamHost. They really are an excellent company and they are pretty quick when it comes to answering requests for help. The so-called unlimited policy is on the misleading side of things, but for what I pay annually I can’t complain.
stored urls: 1.8m
checked urls: 56m
url count: 27m