Go Back   AnandTech Forums > Software > Programming

Forums
· Hardware and Technology
· CPUs and Overclocking
· Motherboards
· Video Cards and Graphics
· Memory and Storage
· Power Supplies
· Cases & Cooling
· SFF, Notebooks, Pre-Built/Barebones PCs
· Networking
· Peripherals
· General Hardware
· Highly Technical
· Computer Help
· Home Theater PCs
· Consumer Electronics
· Digital and Video Cameras
· Mobile Devices & Gadgets
· Audio/Video & Home Theater
· Software
· Software for Windows
· All Things Apple
· *nix Software
· Operating Systems
· Programming
· PC Gaming
· Console Gaming
· Distributed Computing
· Security
· Social
· Off Topic
· Politics and News
· Discussion Club
· Love and Relationships
· The Garage
· Health and Fitness
· Home and Garden
· Merchandise and Shopping
· For Sale/Trade
· Hot Deals with Free Stuff/Contests
· Black Friday 2014
· Forum Issues
· Technical Forum Issues
· Personal Forum Issues
· Suggestion Box
· Moderator Resources
· Moderator Discussions
   

Reply
 
Thread Tools
Old 11-26-2012, 09:12 PM   #1
TechBoyJK
Lifer
 
TechBoyJK's Avatar
 
Join Date: Oct 2002
Posts: 13,587
Default Managing a 'hit' counter

I'm working on an app, and I want to be able to pull stats on how many times certain pages are viewed.

As of now, I have a method that works, but I know it's not efficient.

I keep a log of hits in a table. hitID, pageID, date, ip, user, etc. etc.

In order to compile 'how many' hits, I was just doing a basic 'select hitID where pageID=pageID' and using the returned 'record count'.

However, I'm doing an ajax ticker that can be enabled to see hits scroll through in real time.

What I'm thinking about doing is creating another table called 'hit count' and just doing a +1 to the hitID's hit number everytime a new hit is logged. I can always recompile the number, and I'm thinking having that number available immediately via db rather than having to scan a bunch of records counting will be incredibly faster.

And since I'd have that number availabe, I could do a simple ajax call to a method that returns just the hit count via json. With this, I can check that number before making a larger request to get updated info.

make sense? Thoughts?
__________________
Originally posted by: n0cmonkey
You're being difficult. You have not provided us with the information we need to troubleshoot the problem. You have not given us errors, you're vague about where the problem is, you are not answering the questions we ask. In short, you DON'T KNOW HOW TO THINK. Give back that piece of paper you think makes your intelligent, apologize, destroy your computer and go live in the woods kid.
TechBoyJK is offline   Reply With Quote
Old 11-26-2012, 09:53 PM   #2
Red Squirrel
Lifer
 
Red Squirrel's Avatar
 
Join Date: May 2003
Location: Canada
Posts: 28,373
Default

Yeah doing the +1 route is much more efficient. Also for getting the count you can also do select count(hitID)) as number; and then number will have the value. But storing the actual value somewhere and doing +1 is better.
__________________
~Red Squirrel~
486dx2 @66Mhz turbo, 8MB ram, 512MB HDD, sound blaster 16 + 2x cdrom, Trident 1MB video card @ 640*480, 56k high speed modem.
Red Squirrel is offline   Reply With Quote
Old 11-27-2012, 08:30 AM   #3
beginner99
Platinum Member
 
Join Date: Jun 2009
Posts: 2,217
Default

Quote:
Originally Posted by Red Squirrel View Post
But storing the actual value somewhere and doing +1 is better.
Proof? Doing +1 will require a select + update and hence some kind of locking to be accurate and with high traffic, than can be problematic. Just inserting is trivial compared to that. Also Select count(*) From myTable is usually pretty fast unless you have millions of rows and in that case you can cache it.

Not saying your wrong but just saying it is better without proof is troublesome. It will depend from case to case and if there actual is a performance issue (doubtful) he will need to try himself what works better.

@OP

Is this really probelmatic? Sounds a bit like premature optimization. And as mentioned use select count(*).
beginner99 is offline   Reply With Quote
Old 11-27-2012, 09:40 AM   #4
TechBoyJK
Lifer
 
TechBoyJK's Avatar
 
Join Date: Oct 2002
Posts: 13,587
Default

Quote:
Originally Posted by beginner99 View Post
Proof? Doing +1 will require a select + update and hence some kind of locking to be accurate and with high traffic, than can be problematic. Just inserting is trivial compared to that. Also Select count(*) From myTable is usually pretty fast unless you have millions of rows and in that case you can cache it.

Not saying your wrong but just saying it is better without proof is troublesome. It will depend from case to case and if there actual is a performance issue (doubtful) he will need to try himself what works better.

@OP

Is this really probelmatic? Sounds a bit like premature optimization. And as mentioned use select count(*).
Well, my thinking is that if someone has a busy page, and it gets a lot of hits, it would be easier to

a) do a select on a small table (3 columns max) to a specific record. Since this would be a specific record, a clustered index would help and it would be remarkably more efficient to select this 1 record, and +1 a numeric column everytime there is a hit

versus

b) skip a, but everytime someone wants to view stats on the page (how many hits, etc) I'll need to do a count on the actual hit table, and anytime the page refreshes (could be up to once per 10 seconds) I'll need to find out if there are any more hits before I retrieve the rest of the info.

It seems more efficent to +1 a separate column and be able to ping that column for finding out if there have been any new hits.
__________________
Originally posted by: n0cmonkey
You're being difficult. You have not provided us with the information we need to troubleshoot the problem. You have not given us errors, you're vague about where the problem is, you are not answering the questions we ask. In short, you DON'T KNOW HOW TO THINK. Give back that piece of paper you think makes your intelligent, apologize, destroy your computer and go live in the woods kid.
TechBoyJK is offline   Reply With Quote
Old 11-27-2012, 10:31 AM   #5
beginner99
Platinum Member
 
Join Date: Jun 2009
Posts: 2,217
Default

Quote:
Originally Posted by TechBoyJK View Post
Well, my thinking is that if someone has a busy page, and it gets a lot of hits, it would be easier to

a) do a select on a small table (3 columns max) to a specific record. Since this would be a specific record, a clustered index would help and it would be remarkably more efficient to select this 1 record, and +1 a numeric column everytime there is a hit

versus

b) skip a, but everytime someone wants to view stats on the page (how many hits, etc) I'll need to do a count on the actual hit table, and anytime the page refreshes (could be up to once per 10 seconds) I'll need to find out if there are any more hits before I retrieve the rest of the info.

It seems more efficent to +1 a separate column and be able to ping that column for finding out if there have been any new hits.
Your probably right but I like to think about certain implications in more depth.

How will you do the +1? It requires updating the same database row for every +1. that row will be locked and hence that can become a bottleneck. Once per 10 seconds is not that often. you can easily cache the count for 10 sec then which will reduce load on database a lot.
beginner99 is offline   Reply With Quote
Old 11-27-2012, 04:36 PM   #6
KentState
Diamond Member
 
KentState's Avatar
 
Join Date: Oct 2001
Location: Atlanta, GA
Posts: 5,777
Default

Why even bother with the database? Just use an array at the application level to track a counter by page. You can update the database when the app is destroyed or on certain intervals. At the same time, just do an insert into the database on each page visit to prevent loss of history. When the app loads, get a count from the database and start incrementing the array from there.
KentState is online now   Reply With Quote
Old 11-27-2012, 10:08 PM   #7
brandonbull
Platinum Member
 
brandonbull's Avatar
 
Join Date: May 2005
Posts: 2,522
Default

Try looking into tagging your pages using google analytics. Basically you have javascript code that fires after certain events occur on a web page.
brandonbull is offline   Reply With Quote
Old 11-28-2012, 02:46 PM   #8
Cerb
Elite Member
 
Cerb's Avatar
 
Join Date: Aug 2000
Posts: 15,971
Default

Quote:
Originally Posted by TechBoyJK View Post
Well, my thinking is that if someone has a busy page, and it gets a lot of hits, it would be easier to

a) do a select on a small table (3 columns max) to a specific record. Since this would be a specific record, a clustered index would help and it would be remarkably more efficient to select this 1 record, and +1 a numeric column everytime there is a hit
Except that if it is a busy page, you can only update the record as fast as the DB can commit data. Is your server running a SSD? In MySQL with InnoDB, Postgres, or SQL Server, at the least, this could work fine until the load gets high, and then burn you with disk wait time (also, keep in mind that, "disk wait time," starts at the filesystem buffers, and ends at the disk--the storage device itself is not the only possible bottleneck, when loading up on IOPS to a small number of files).

With pure inserts, the separate inserts can be batched and committed in any order, instead of each update query having to wait on the previous one. Using MySQL with a "faster" engine may fix the performance problem, if using MySQL, but for this kind of case, that just begs for corruption or otherwise bad updates. Using separate inserts, inserts occurring while updating stat summaries will also be able to commit concurrently, both relative to each other, and the stat-gathering query.

Quote:
b) skip a, but everytime someone wants to view stats on the page (how many hits, etc) I'll need to do a count on the actual hit table, and anytime the page refreshes (could be up to once per 10 seconds) I'll need to find out if there are any more hits before I retrieve the rest of teh info.

It seems more efficent to +1 a separate column and be able to ping that column for finding out if there have been any new hits.
It may be less efficient, but it will scale.

There is, however, a middle solution:
1. Have another table, with only summarized results. These summaries include a timestamp column, which is the last page view in the summary, so that you have a data-only means to verify them.
2. Insert into the hit log table, like now.
3. Compare the latest timestamp in the hit log with the latest timestamp in the summary. Stp here if not changed. Otherwise:
4A. When stats get requested, update the summary, and remove those rows used for updating the summary from the main hit table, if there were any new ones.
4B. Alternatively, if the main hit table is used for other things, use the timestamp in the summary to get only newly updated rows.

In either #4, the amount of rows that need to be counted over is minimized, but you still get lazy caching, and the hit log won't need to be able to grow indefinitely, at least with 4A.

Or, if using Postgres, SQL Server, or another more robust DB, just make an indexed view for the summarized results, and move on.
__________________
Quote:
Originally Posted by Crono View Post
I'm 90% certain the hipster movement was started by aliens from another galaxy who have an exaggerated interpretation of earth culture(s).

Last edited by Cerb; 11-28-2012 at 03:16 PM. Reason: I was thinking of a locking event that won't occurr.
Cerb is offline   Reply With Quote
Old 12-01-2012, 04:36 PM   #9
Leros
Lifer
 
Leros's Avatar
 
Join Date: Jul 2004
Location: Austin, TX
Posts: 21,605
Default

Cerb makes some good points. I wouldn't update the hit count every request. If you have a high traffic site, you'll be putting a lot of load on the database for such for small gain. You don't need instant updates to your hit counter. Having a minute or two of lag is quite fine.

You can store hits in memory and update the hit count in the database periodically, which is probably good enough for most use cases. I would do it a bit differently though. Every request would write an entry to a log file. Periodically, a cron job executes that reads in the log file, counts new hits to all the various pages, and updates the hit counter in the database.
Leros is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT -5. The time now is 12:15 AM.


Powered by vBulletin® Version 3.8.7
Copyright ©2000 - 2014, vBulletin Solutions, Inc.