Hey folks, I have a question that can't really be solved by the "try it and see how it works" method.
At work we have 9 web servers. All are running IIS, with anywhere from 128 to 400 IPs allocated to each machine. Websites run as independent services in IIS. Each IP dumps its weblogs into a folder on the server it's running on (i.e. E:\logs\www\ip_addr\w3svc\*.log). The E:\logs\www\ folder is shared on the private network so our statistics server can read them. We recently migrated to Urchin (fantastic product IMHO, and it actually works, unlike Deepmetrix's offerings). We currently have the stats server set to process the logs daily. This process takes around 2-3 hours to complete (logs for ~1400 websites). I am confident that part of the slowness is due to accessing the logs over Windows File Sharing (WFS). Urchin can read logs over FTP, but unfortunately it doesn't take wildcard characters, and all of our logs are timestamped, so each day the log files have a different name.
At the moment, I am considering the following solutions:
A - Leave it how it is... who cares if it takes 3 hours to process the logs? It starts at 2am and will be done before everyone has had their first cup of coffee.
PROS - no work for me
CONS - What if we have 3000 or 6000 websites? The solution might work now, but what about in the future?
B - Create a share on the stats server and configure the web servers to write their log files to it over WFS.
PROS - Urchin should be able to import the logs much faster (no WFS overhead). Centralized location for log files, which can simplify other things, e.g. log file backups.
CONS - Have to write a script to change the log location for every website (rough sketch below). Will generate a lot of traffic over the private network.
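Something like this is what I'm picturing for the B script -- a rough, untested sketch using the adsutil.vbs that ships with IIS 6. siteids.txt (one metabase site ID per line) and the \\stats\logs share are placeholders, and I'd want to verify IIS is actually happy logging to a UNC path before rolling it out:

    @echo off
    rem Rough sketch: repoint each site's log directory at a share on the
    rem stats server using the stock adsutil.vbs. siteids.txt and
    rem \\stats\logs are placeholders, not anything we have set up yet.
    cd /d C:\Inetpub\AdminScripts
    for /f %%I in (siteids.txt) do (
      rem LogFileDirectory is the per-site metabase property that controls
      rem where IIS writes that site's logs
      cscript //nologo adsutil.vbs SET W3SVC/%%I/LogFileDirectory "\\stats\logs\%COMPUTERNAME%"
    )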
C - Write a script that will copy all the log files from the web servers to the stats server once a day.
PROS - simple, can be done using a basic batch script (sketch below). Only happens once a day, so it shouldn't affect private network performance very much.
CONS - Have to coordinate the log rotators and the copy script to minimize potential data (log file) loss.
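For C, the batch script could be about as simple as this -- again just a sketch, assuming robocopy from the Resource Kit is available; the server names, share name, and destination path are all placeholders:

    @echo off
    rem Rough sketch: pull recent logs from each web server's existing
    rem log share down to the stats box once a day.
    setlocal
    set DEST=E:\urchin\incoming
    for %%S in (web01 web02 web03 web04 web05 web06 web07 web08 web09) do (
      rem /S         recurse into the per-IP subfolders
      rem /MAXAGE:2  skip files older than 2 days, so yesterday's freshly
      rem            rotated log gets copied without re-pulling the archive
      rem /R:3 /W:10 retry a few times if IIS still has a file locked
      robocopy "\\%%S\logs" "%DEST%\%%S" *.log /S /MAXAGE:2 /R:3 /W:10
    )
    endlocal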
I am still pondering this, but I would love to hear some feedback on this... if you don't mind, share your statistics software setups...
thanks