Managing a 'hit' counter

Discussion in 'Programming' started by TechBoyJK, Nov 26, 2012.

  1. TechBoyJK (Lifer, joined Oct 17, 2002)
    I'm working on an app, and I want to be able to pull stats on how many times certain pages are viewed.

    As of now, I have a method that works, but I know it's not efficient.

    I keep a log of hits in a table. hitID, pageID, date, ip, user, etc. etc.

    In order to compile 'how many' hits, I was just doing a basic 'SELECT hitID FROM hits WHERE pageID = ?' and using the returned record count.

    However, I'm doing an ajax ticker that can be enabled to see hits scroll through in real time.

    What I'm thinking about doing is creating another table called 'hit count' and just doing a +1 to that page's hit number every time a new hit is logged. I can always recompile the number from the log, and I'm thinking that having the count available immediately via the db, rather than having to scan and count a bunch of records, will be incredibly faster.

    And since I'd have that number available, I could do a simple ajax call to a method that returns just the hit count via JSON. With that, I can check the number before making a larger request to get updated info.

    make sense? Thoughts?
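    To make the current approach concrete, here's a minimal sketch of the hit-log-plus-count method described above. SQLite is used as a stand-in for whatever database the app actually runs on, and the table and column names just mirror the ones mentioned in the post:

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")  # stand-in for the app's real database
    conn.execute("""CREATE TABLE hits (
        hitID  INTEGER PRIMARY KEY,
        pageID INTEGER,
        date   TEXT,
        ip     TEXT,
        user   TEXT)""")

    # Log a hit, as the app would on each page view.
    conn.execute("INSERT INTO hits (pageID, date, ip, user) VALUES (?, ?, ?, ?)",
                 (42, "2012-11-26", "127.0.0.1", "alice"))

    # The current method: count matching log rows on every stats request.
    count = conn.execute("SELECT COUNT(hitID) FROM hits WHERE pageID = ?",
                         (42,)).fetchone()[0]
    ```

    The cost of that count grows with the size of the log table, which is what motivates keeping a separate running total.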
     
  2. Red Squirrel (Lifer, joined May 24, 2003)
    Yeah, doing the +1 route is much more efficient. Also, for getting the count you can do 'select count(hitID) as number' and then number will have the value. But storing the actual value somewhere and doing +1 is better.
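    A sketch of the stored-count route, again with SQLite standing in for the real database. The single-statement upsert keeps the insert-or-increment atomic (SQLite 3.24+ / Postgres syntax; MySQL would use ON DUPLICATE KEY UPDATE):

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hit_count (pageID INTEGER PRIMARY KEY, hits INTEGER)")

    def record_hit(page_id):
        # One atomic statement: insert the row on the first hit, +1 afterwards.
        conn.execute("""INSERT INTO hit_count (pageID, hits) VALUES (?, 1)
                        ON CONFLICT(pageID) DO UPDATE SET hits = hits + 1""",
                     (page_id,))

    for _ in range(3):
        record_hit(42)
    hits = conn.execute("SELECT hits FROM hit_count WHERE pageID = ?",
                        (42,)).fetchone()[0]
    ```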
     
  3. beginner99 (Platinum Member, joined Jun 2, 2009)
    Proof? Doing +1 will require a select + update and hence some kind of locking to be accurate, and with high traffic that can be problematic. Just inserting is trivial compared to that. Also, 'select count(*) from myTable' is usually pretty fast unless you have millions of rows, and in that case you can cache it.

    Not saying you're wrong, but just saying it is better without proof is troublesome. It will depend from case to case, and if there actually is a performance issue (doubtful) he will need to try for himself what works better.

    @OP

    Is this really problematic? Sounds a bit like premature optimization. And, as mentioned, use select count(*).
     
  4. TechBoyJK (Lifer, joined Oct 17, 2002)
    Well, my thinking is that if someone has a busy page, and it gets a lot of hits, it would be easier to

    a) do a select on a small table (3 columns max) against a specific record. Since this would be a specific record, a clustered index would help, and it would be remarkably more efficient to select this one record and +1 a numeric column every time there is a hit

    versus

    b) skip (a), but every time someone wants to view stats on the page (how many hits, etc.) I'll need to do a count on the actual hit table, and any time the page refreshes (could be up to once per 10 seconds) I'll need to find out if there are any new hits before I retrieve the rest of the info.

    It seems more efficient to +1 a separate column and be able to ping that column to find out if there have been any new hits.
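    The ping-then-fetch flow described here could look something like this. The hit_count table, page id, and starting value are invented for illustration; the point is that the ticker only issues the larger stats request when the cheap single-row read has changed:

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hit_count (pageID INTEGER PRIMARY KEY, hits INTEGER)")
    conn.execute("INSERT INTO hit_count VALUES (5, 100)")  # invented starting state

    last_seen = {}  # what the ticker client saw on its previous poll

    def poll(page_id):
        # Cheap ping: one indexed single-row read, returned to the ticker as JSON.
        hits = conn.execute("SELECT hits FROM hit_count WHERE pageID = ?",
                            (page_id,)).fetchone()[0]
        changed = hits != last_seen.get(page_id)
        last_seen[page_id] = hits
        return changed, hits  # only when changed would the client fetch full stats

    first = poll(5)   # first poll: the client would fetch details
    second = poll(5)  # unchanged: the client skips the larger request
    ```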
     
  5. beginner99 (Platinum Member, joined Jun 2, 2009)
    You're probably right, but I like to think about certain implications in more depth.

    How will you do the +1? It requires updating the same database row for every +1; that row will be locked, and hence it can become a bottleneck. Once per 10 seconds is not that often. You can easily cache the count for 10 seconds then, which will reduce load on the database a lot.
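    Caching the count for the 10-second refresh interval might look like this sketch (SQLite stand-in, invented table names; a real app would likely use its framework's cache instead of a module-level dict):

    ```python
    import sqlite3
    import time

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (hitID INTEGER PRIMARY KEY, pageID INTEGER)")

    _cache = {}       # pageID -> (count, fetched_at)
    CACHE_TTL = 10.0  # seconds, matching the refresh interval discussed above

    def hit_count(page_id):
        cached = _cache.get(page_id)
        if cached and time.monotonic() - cached[1] < CACHE_TTL:
            return cached[0]  # serve from cache, no database work
        count = conn.execute("SELECT COUNT(*) FROM hits WHERE pageID = ?",
                             (page_id,)).fetchone()[0]
        _cache[page_id] = (count, time.monotonic())
        return count

    conn.execute("INSERT INTO hits (pageID) VALUES (7)")
    first = hit_count(7)   # hits the database
    conn.execute("INSERT INTO hits (pageID) VALUES (7)")
    second = hit_count(7)  # within the TTL: served from cache, still the old value
    ```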
     
  6. KentState (Diamond Member, joined Oct 19, 2001)
    Why even bother with the database? Just use an array at the application level to track a counter by page. You can update the database when the app is destroyed or on certain intervals. At the same time, just do an insert into the database on each page visit to prevent loss of history. When the app loads, get a count from the database and start incrementing the array from there.
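    A sketch of this application-level approach, under the same invented schema as before: counters live in memory, every hit still gets a durable insert, the counters are seeded from the log at startup, and a periodic flush persists them:

    ```python
    import sqlite3
    from collections import Counter

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (hitID INTEGER PRIMARY KEY, pageID INTEGER)")
    conn.execute("CREATE TABLE hit_count (pageID INTEGER PRIMARY KEY, hits INTEGER)")

    # On app start: seed the in-memory counters from the durable hit log.
    counters = Counter(dict(conn.execute(
        "SELECT pageID, COUNT(*) FROM hits GROUP BY pageID")))

    def record_hit(page_id):
        counters[page_id] += 1  # cheap in-memory bump for the live counter
        conn.execute("INSERT INTO hits (pageID) VALUES (?)", (page_id,))  # history

    def flush():
        # On an interval (or at shutdown), persist the counters.
        # SQLite 3.24+ upsert syntax; MySQL would use ON DUPLICATE KEY UPDATE.
        for page_id, hits in counters.items():
            conn.execute("""INSERT INTO hit_count (pageID, hits) VALUES (?, ?)
                            ON CONFLICT(pageID) DO UPDATE SET hits = excluded.hits""",
                         (page_id, hits))

    record_hit(1); record_hit(1); record_hit(2)
    flush()
    ```

    The tradeoff: counts bump in memory with no row contention, but anything incremented since the last flush is lost if the app dies, which is why the raw inserts are kept as the source of truth.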
     
  7. brandonbull (Platinum Member, joined May 3, 2005)
    Try looking into tagging your pages with Google Analytics. Basically, you have JavaScript code that fires after certain events occur on a web page.
     
  8. Cerb (Elite Member, joined Aug 26, 2000)
    Except that if it is a busy page, you can only update the record as fast as the DB can commit data. Is your server running an SSD? In MySQL with InnoDB, Postgres, or SQL Server, at the least, this could work fine until the load gets high, and then burn you with disk wait time (also, keep in mind that "disk wait time" starts at the filesystem buffers and ends at the disk--the storage device itself is not the only possible bottleneck when loading up on IOPS to a small number of files).

    With pure inserts, the separate inserts can be batched and committed in any order, instead of each update query having to wait on the previous one. Using MySQL with a "faster" engine may fix the performance problem, if using MySQL, but for this kind of case, that just begs for corruption or otherwise bad updates. Using separate inserts, inserts occurring while updating stat summaries will also be able to commit concurrently, both relative to each other, and the stat-gathering query.

    It may be less efficient, but it will scale.

    There is, however, a middle solution:
    1. Have another table, with only summarized results. These summaries include a timestamp column, which is the last page view in the summary, so that you have a data-only means to verify them.
    2. Insert into the hit log table, like now.
    3. Compare the latest timestamp in the hit log with the latest timestamp in the summary. Stop here if not changed. Otherwise:
    4A. When stats get requested, update the summary, and remove those rows used for updating the summary from the main hit table, if there were any new ones.
    4B. Alternatively, if the main hit table is used for other things, use the timestamp in the summary to get only newly updated rows.

    In either #4, the number of rows that need to be counted over is minimized, but you still get lazy caching, and the hit log won't need to grow indefinitely, at least with 4A.

    Or, if using Postgres, SQL Server, or another more robust DB, just make an indexed view for the summarized results, and move on.
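    The summary-table steps above can be sketched roughly like this (SQLite stand-in, invented names, SQLite 3.24+ upsert syntax; the variant shown is 4A, which deletes log rows once they are folded into the summary):

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits (hitID INTEGER PRIMARY KEY, pageID INTEGER, ts INTEGER)")
    conn.execute("""CREATE TABLE hit_summary
                    (pageID INTEGER PRIMARY KEY, hits INTEGER, last_ts INTEGER)""")

    def stats(page_id):
        row = conn.execute("SELECT hits, last_ts FROM hit_summary WHERE pageID = ?",
                           (page_id,)).fetchone()
        summarized, last_ts = row if row else (0, 0)
        # Step 3: are there log rows newer than the summary's timestamp?
        new_count, new_ts = conn.execute(
            "SELECT COUNT(*), MAX(ts) FROM hits WHERE pageID = ? AND ts > ?",
            (page_id, last_ts)).fetchone()
        if new_count:
            # Step 4A: fold the new rows into the summary, then drop them.
            conn.execute("""INSERT INTO hit_summary (pageID, hits, last_ts)
                            VALUES (?, ?, ?)
                            ON CONFLICT(pageID) DO UPDATE
                            SET hits = hits + excluded.hits,
                                last_ts = excluded.last_ts""",
                         (page_id, new_count, new_ts))
            conn.execute("DELETE FROM hits WHERE pageID = ? AND ts <= ?",
                         (page_id, new_ts))
            summarized += new_count
        return summarized

    for ts in (1, 2, 3):
        conn.execute("INSERT INTO hits (pageID, ts) VALUES (?, ?)", (9, ts))
    total = stats(9)  # folds the three logged hits into the summary
    ```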
     
    #8 Cerb, Nov 28, 2012
    Last edited: Nov 28, 2012
  9. Leros (Lifer, joined Jul 11, 2004)
    Cerb makes some good points. I wouldn't update the hit count on every request. If you have a high-traffic site, you'll be putting a lot of load on the database for such small gain. You don't need instant updates to your hit counter. Having a minute or two of lag is quite fine.

    You can store hits in memory and update the hit count in the database periodically, which is probably good enough for most use cases. I would do it a bit differently though. Every request would write an entry to a log file. Periodically, a cron job executes that reads in the log file, counts new hits to all the various pages, and updates the hit counter in the database.
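    The log-file-plus-cron approach could be sketched like this. Everything here is invented for illustration (the log path, the one-pageID-per-line format, the hit_count table), and a real deployment would rotate the log rather than truncate it in place:

    ```python
    import os
    import sqlite3
    import tempfile
    from collections import Counter

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE hit_count (pageID TEXT PRIMARY KEY, hits INTEGER)")

    # Hypothetical hit log; a real app would use a fixed path plus rotation.
    fd, log_path = tempfile.mkstemp(suffix=".log")
    os.close(fd)

    def log_hit(page_id):
        # Request handler: append-only write, no database work on the hot path.
        with open(log_path, "a") as f:
            f.write(page_id + "\n")

    def roll_up():
        # Cron job: count new hits per page and fold them into the database.
        with open(log_path) as f:
            counts = Counter(line.strip() for line in f if line.strip())
        for page_id, n in counts.items():
            db.execute("""INSERT INTO hit_count (pageID, hits) VALUES (?, ?)
                          ON CONFLICT(pageID) DO UPDATE
                          SET hits = hits + excluded.hits""", (page_id, n))
        open(log_path, "w").close()  # truncate after counting (rotate in real code)

    log_hit("home"); log_hit("home"); log_hit("about")
    roll_up()
    home_hits = db.execute(
        "SELECT hits FROM hit_count WHERE pageID = 'home'").fetchone()[0]
    ```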
     