Forum is being scraped again

Page 35 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

GodisanAtheist

Diamond Member
Nov 16, 2006
Just dropping the following invite link for the AT Discord I made.


It's a ghost town, and rightly so, because Discord is not a great message board replacement, but it can be an alright crash pad if the forums go down for an extended period of time.

Also tried signing up on @Red Squirrel's forum, but the registration email never showed up, so be warned: it's probably a honeypot trap, and now I'm being watched by a dozen intelligence agencies and am listed as a sovereign state extremist somewhere.
 

Red Squirrel

No Lifer
May 24, 2003
www.anyf.ca
Now can't receive password reset email.

Still working on that part. The new DKIM requirement is quite ridiculous: it involves very extensive configuration changes and seems very redundant when I already have SPF and DMARC records too. Google just keeps tacking on new requirements for mail servers to be able to send to them. You should be able to log in with whatever password you already made, though. Or is that not working?
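SPF, DKIM and DMARC, the three mechanisms named above, are all published as DNS TXT records: SPF at the domain itself (starting with `v=spf1`), the DKIM public key at `<selector>._domainkey.<domain>`, and the DMARC policy at `_dmarc.<domain>` (starting with `v=DMARC1`). A minimal Python sketch of checking a domain for all three follows. It assumes the TXT records have already been looked up and are passed in as a dictionary; the domain `example.com`, the selector `mail`, the helper names and the record values are all made up for illustration and say nothing about how anyf.ca is actually configured.

```python
def parse_tag_list(record: str) -> dict[str, str]:
    """Parse a DKIM/DMARC-style 'tag=value; tag=value' TXT record into a dict."""
    tags: dict[str, str] = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';'
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed tag: {part!r}")
        tags[name.strip()] = value.strip()
    return tags


def check_mail_auth(domain: str, selector: str,
                    txt: dict[str, list[str]]) -> dict[str, bool]:
    """Report whether SPF, DKIM and DMARC records are present for a domain.

    `txt` maps DNS names to their TXT strings; in this sketch it is
    hypothetical sample data, not the result of a real DNS query.
    """
    # SPF: the first space-separated token must be the version tag.
    spf = any(v.split(" ", 1)[0] == "v=spf1" for v in txt.get(domain, []))

    dkim = False
    for v in txt.get(f"{selector}._domainkey.{domain}", []):
        tags = parse_tag_list(v)
        # The v= tag is optional for DKIM; a non-empty p= holds the public key.
        if tags.get("v", "DKIM1") == "DKIM1" and tags.get("p"):
            dkim = True

    dmarc = any(parse_tag_list(v).get("v") == "DMARC1"
                for v in txt.get(f"_dmarc.{domain}", []))
    return {"spf": spf, "dkim": dkim, "dmarc": dmarc}


# Made-up records for a hypothetical domain that has SPF and DMARC but no DKIM yet.
records = {
    "example.com": ["v=spf1 mx -all"],
    "_dmarc.example.com": ["v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com"],
}
print(check_mail_auth("example.com", "mail", records))
# {'spf': True, 'dkim': False, 'dmarc': True}
```

Publishing the DKIM record is only half of the job: the mail server also has to sign every outgoing message with the matching private key, which is where most of the configuration work tends to be.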
 

Red Squirrel

No Lifer
May 24, 2003
www.anyf.ca
Pretty sure I'm entering what I think is the right password, but it's not working.

Can you send me a password reset link manually, please?

Until I figure out the DKIM stuff, I can't send mail to Gmail. This must have literally JUST been added, as it worked a few days ago when I was testing.

Try again now. Your account went back into activation mode, so it was rejecting logins; I think it should work now. Don't issue a reset link, just try to log in directly.

If not, I'll just wait until I get the DKIM stuff set up. I have a dozen domains I need to do it on. They keep adding these new requirements, and they're all redundant too. There's another one called ARC that I will have to look into... it's getting a bit ridiculous.
 

lxskllr

No Lifer
Nov 30, 2004
Is someone twiddling knobs here, or are the problems simply intermittent? It goes from awesome to terrible. Right now it's awesome.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
502s and 504s this morning.
Booted all the AI scraping bots... for now. It's cat and mouse these days.
So, for somebody who is not fluent in current lingo: you rebooted the boxes that kill bots? Can you schedule it so these reboot, say, once a week at a predetermined time? We are used to things being offline once a week at a certain time for an hour or two; just let us know when.
 

CParsons

Staff member
Dec 4, 2019
Never rebooted anything to kill the bots; I just blocked their connections to the site through various methods (not going to outline them here, as I would like them to keep working). As @igor_kavinski rightly pointed out, a large portion of the users tuning into AnandTech are 'friendly' bots such as Google and Bing, along with guests. The numbers below are not hard figures, just an example.

Assume the guest count generally hovers around 5,000-10,000 users. Based on that, it's safe to assume most of those are real people just browsing the site, taking in the posted content but never registering or interacting. If that number suddenly shoots up to 350,000+, as we've seen recently, when it's never normally that large, it's a pretty good indication that some bots are behaving badly and not following the rules. At that point you have to start investigating that traffic, working out where it's coming from, and blocking it.
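The heuristic described above (a guest count that normally sits at 5,000-10,000 suddenly jumping past 350,000) boils down to comparing the current count against a recent baseline, then working out which networks the extra traffic comes from. Here is a minimal Python sketch under stated assumptions: the 5× threshold over the median, grouping IPv4 addresses into /24 networks, the function names and the sample data are all hypothetical choices for illustration, not the tooling the forum staff actually use. The IPs come from documentation-reserved ranges.

```python
import ipaddress
from collections import Counter
from statistics import median


def is_guest_spike(history: list[int], current: int, factor: float = 5.0) -> bool:
    """Flag the current guest count if it exceeds `factor` times the recent median."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    return current > factor * median(history)


def top_sources(client_ips: list[str], prefix: int = 24, n: int = 3) -> list[tuple[str, int]]:
    """Group IPv4 addresses into /prefix networks and return the n busiest ones."""
    counts = Counter(
        str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False)) for ip in client_ips
    )
    return counts.most_common(n)


# Hypothetical daily guest counts in the range described in the post.
normal_days = [5_200, 7_800, 9_100, 6_400, 10_000]
print(is_guest_spike(normal_days, 8_500))    # False: within normal variation
print(is_guest_spike(normal_days, 350_000))  # True: far above 5x the median

# Hypothetical request log: most hits come from a single /24.
log = ["203.0.113.7"] * 40 + ["203.0.113.99"] * 35 + ["198.51.100.20"] * 3
print(top_sources(log))  # [('203.0.113.0/24', 75), ('198.51.100.0/24', 3)]
```

Using the median rather than the mean keeps one earlier spike in the history from inflating the baseline. Grouping by network rather than by single address is only a first cut; real scrapers may spread across much wider ranges.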

There's a MASSIVE difference between 5K-10K real users just browsing the site and consuming content, and 350,000+ bots scraping your site, following every possible link, downloading every possible image and cataloguing whatever they possibly can. It sucks up resources and leads to the slowdowns we all see. AnandTech is a large site that has been here for years, so there's a lot for AI companies and the like to glean from it.
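The 'rules' that well-behaved crawlers such as Googlebot and Bingbot follow most likely include the site's `robots.txt` file, which can fence off parts of a site and ask for a delay between requests; a scraper that follows every possible link regardless is ignoring it. Python's standard `urllib.robotparser` module shows what honouring those rules looks like from the crawler's side. This is a sketch only: the rules, the bot names `ExampleAIBot` and `FriendlyBot`, and `forums.example.com` are invented and are not AnandTech's actual `robots.txt`. Compliance is voluntary, which is exactly why badly behaved bots have to be blocked by other means.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt for illustration; not the forum's real file.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /search/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
# Record a "last fetched" time: can_fetch() refuses every URL until the
# parser believes robots.txt has been read.
parser.modified()

thread_url = "https://forums.example.com/threads/123/"
search_url = "https://forums.example.com/search/?q=dkim"

print(parser.can_fetch("ExampleAIBot", thread_url))  # False: this bot is barred from "/"
print(parser.can_fetch("FriendlyBot", thread_url))   # True: falls back to the "*" group
print(parser.can_fetch("FriendlyBot", search_url))   # False: /search/ is disallowed
print(parser.crawl_delay("FriendlyBot"))             # 10 (seconds between requests)
```

A polite crawler would call `can_fetch()` before each request and wait `crawl_delay()` seconds between requests to the same host.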

If you're looking to learn more, toss "ai bots flooding websites" into Google and start reading some articles. It's a massive problem at the moment, and it only seems to be getting worse as more AI companies come online and AI tools become increasingly available to end users.