creating a captcha

Red Squirrel

No Lifer
May 24, 2003
71,312
14,085
126
www.anyf.ca
What are some good techniques to use to avoid bots from being able to read captchas?

The one in phpBB is easily read by bots, even though it looks like it should be hard. The minute I enable registration on my forum I find myself with hundreds of auto-generated accounts.

I'm thinking of varying the font size, using different backgrounds, and adding a bunch of fake codes in the background that are hard for humans to see, so the real code stands out. (Though smart bots will still be able to filter those out.)

What are other good techniques?

Or should I totally change the way it works, like perhaps have randomly generated images of objects, and have to input what it is?
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
I think the captcha is about dead. The bad guys are distributing the work across vast botnets now, and it seems like every week another supposedly strong captcha is cracked.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,836
4,816
75
CAPTCHA is an arms race. If you think about it in terms of evolution, the only way to guarantee winning in the long term (at least some of the time) is to have a CAPTCHA that the best available OCR techniques have already been tried on and failed to read.

That's why I think ReCaptcha is the only CAPTCHA that will work in the long term. (Until, eventually, computers are better at reading than humans; but that should take several more years at least.)
 

troytime

Golden Member
Jan 3, 2006
1,996
1
0
don't use phpbb, or take out all the phpbb references so the spammer registration scripts don't find your site

and then, if you must captcha, use recaptcha
 

chronodekar

Senior member
Nov 2, 2008
721
1
0
It might not just be CAPTCHA. If I remember right, there was a report/article somewhere saying that some under-the-table companies were hiring people to just sit in front of PCs and type in the CAPTCHA words....

You can imagine what a low-paying and *** life those *** live.

... :frown:

Coming back to the OP's point, I think some forums have a mandatory math question when you register with them. Like asking 8 + 35 = ?

It's not perfect, but it should help weed out the undesirables. (and some bots as well)
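A minimal sketch of the math-question idea in Python, just to show the shape of it. The function names `make_challenge` and `check_answer` are made up for illustration, and it assumes the server keeps the expected answer in the user's session rather than sending it to the browser.

```python
import random


def make_challenge(rng=random):
    """Return (question_text, expected_answer) for a simple addition question."""
    a = rng.randint(1, 50)
    b = rng.randint(1, 50)
    return f"{a} + {b} = ?", a + b


def check_answer(user_input, expected):
    """True if the submitted text parses as an integer equal to the expected answer."""
    try:
        return int(user_input.strip()) == expected
    except ValueError:
        # Non-numeric input (or an empty field) simply fails the check.
        return False
```

The question text would be shown on the registration form and the expected answer stored server-side. A bot that can parse and evaluate simple arithmetic will get through, so this is only a speed bump.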
 

Red Squirrel

No Lifer
May 24, 2003
71,312
14,085
126
www.anyf.ca
I was thinking that too, not just have a traditional captcha, but maybe an image with an equation on it, or a picture of an item (actual photo, and then take like 100 photos of the same item to get variation). That would take longer to do though.

And yeah phpbb forums seem to be the most attacked by spammers.

What about an animated captcha, think that would work better? Not sure how to do that though, as I think PNG is really the only image type that is easy to create on the fly.

I was thinking of maybe an audio captcha, where it verbally says the code.

Eventually I want to finish coding my forum software from scratch, so I'll be sure to incorporate some of these ideas in it. Though I'm wondering if simply going away from something widely used may help, as some OCRs are probably designed to read specific types of captchas. Change the background image or the font and it should screw them up right?
 

chronodekar

Senior member
Nov 2, 2008
721
1
0
Just be careful so that even the most silly of your users will be able to de-crypt the 'CAPTCHA'.

:p
 

Red Squirrel

No Lifer
May 24, 2003
71,312
14,085
126
www.anyf.ca
lol yeah, that's the thing: I need to draw the line as to how complicated it is. I was messing with algorithms that threw in extra characters, scrambled them, and added random lines, but it became too hard to tell where the actual code is LOL. I'm also looking at using a phone verification service, but I suppose generating DTMF tones is not all that hard, so it would be money thrown away.

For now I'm hoping the fact that I'm not using something premade will make it harder to decode. I'll also have post moderation set up. At some point I want to link all my web services to one central point of authentication for easier management.

Pretty much have to build this like fort knox. Game servers get attacked all the time by people with too much time on their hands.

I've thought of recaptcha as well, though I want to try to avoid having to rely on a 3rd party server. I'm sure they have tons of redundancy and stuff mind you, so I may consider it anyway.
 

Red Squirrel

No Lifer
May 24, 2003
71,312
14,085
126
www.anyf.ca
Yeah, might just go ahead and use recaptcha. Those look fairly easy to decipher though, or is there more to it? I did notice they have ones where you have to circle an object, that's a bit more advanced and I can't see a bot being able to do that. The kitten one looks interesting too. I'm thinking of maybe having a plain captcha during registration, then when they get the email to confirm, they get another captcha. I might go with that. Should not be too hard to add that to phpbb. I think I'm also going to change all the field names and have JavaScript actually set the names if that's doable (I'll have to look it up, I hardly know js) as some bots might just have them hard coded. Hopefully all these little things help.
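One way the field-renaming idea could look, sketched server-side in Python rather than in JavaScript. Everything here is an assumption for illustration: the `SERVER_SECRET` value, the `field_names` and `decode_form` helpers, and the choice of an HMAC to derive the names. It only defeats bots with hard-coded field names; a bot that parses the form on each visit will still find the inputs.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret; a real deployment would load a fixed value from config.
SERVER_SECRET = secrets.token_bytes(32)

LOGICAL_FIELDS = ("username", "email", "password")


def field_names(session_id):
    """Map each logical field to an opaque name that differs per session."""
    names = {}
    for field in LOGICAL_FIELDS:
        msg = f"{session_id}:{field}".encode()
        digest = hmac.new(SERVER_SECRET, msg, hashlib.sha256).hexdigest()
        names[field] = "f_" + digest[:12]
    return names


def decode_form(session_id, posted):
    """Translate a submitted form back to logical names; None if a field is missing."""
    names = field_names(session_id)
    if not all(opaque in posted for opaque in names.values()):
        return None
    return {field: posted[opaque] for field, opaque in names.items()}
```

The form template would render each input with `field_names(session_id)[...]` as its `name` attribute, and a submission using the stock names such as `username` would be rejected by `decode_form`.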
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
I don't see the point to using two captchas. If the bot can get past the first one but not the second one you would be better off only using the second one during registration. If they can get past the second captcha then chances are they can get past the first one too which makes it only a hassle for legitimate traffic.
 

Cogman

Lifer
Sep 19, 2000
10,286
147
106
If you just do something nonstandard, I think that will foil most bots. For example, make an image that says something like 'If you are human, type "I like cheese"', and vary the phrase to things like "If you aren't a computer", etc. Most captcha crackers aren't able to read, so something else that might be effective is to write short paragraphs like "My brother's name is Bob. He has a cousin named Tim, who has a brother named Steve. Who is my brother?" (Or you could ask who is Bob's brother, who is Steve's brother, etc.)

As long as your webpage doesn't get too popular, I doubt you'll get bots cracking it. (Security through obscurity, it does work sometimes :))

Another method to at least reduce spam is giving new registrants a grace period before they can post. Maybe even let them submit posts, but show the message "You can't post for x days" at the top of the posting box, so that the bot will think it is posting. (Most bots post once and never post again.)
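A rough Python sketch of that grace-period idea, assuming a made-up three-day window and hypothetical `can_post` / `handle_post` helpers. The point it illustrates is that the response looks the same either way, so a bot can't tell its post was held back.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=3)  # hypothetical value, tune to taste


def can_post(registered_at, now):
    """True once the account is older than the grace period."""
    return now - registered_at >= GRACE_PERIOD


def handle_post(registered_at, now, message, visible_posts, held_posts):
    """Accept every submission, but only publish posts from accounts past the grace period."""
    if can_post(registered_at, now):
        visible_posts.append(message)
    else:
        # Held for moderation or silently dropped; the caller can't tell the difference.
        held_posts.append(message)
    return "Post submitted."
```

Held posts could also feed the moderator queue instead of being thrown away.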
 

Red Squirrel

No Lifer
May 24, 2003
71,312
14,085
126
www.anyf.ca
Yeah, in this case I'm pretty much relying on security through obscurity, and then I'll just improve on top of that. I think just the post-email-validation captcha may throw off bots, as they expect to be already registered after clicking the email link. I could even go a step further and make the email send an equation, and the user has to enter the answer plus the captcha. This is going to annoy people though, as I am getting into more complicated stuff.

I also want to enable moderator validation on the first few posts so at least if a spammer does make it through they'll be banned before anybody but me sees the spam. (I usually DDoS those sites or screw around with their servers, if I'm bored)
 

torpid

Lifer
Sep 14, 2003
11,631
11
76
I don't understand how recaptcha provides better security. One of the two words is not even checked for correctness. The entire point of the project is just to apply grid computing / volunteer computing to the problem of OCR accuracy.

Since it is really only checking one of two words, it's therefore just a captcha algorithm that is centralized. Which means that, though it may be a deterrent for a bot trying to hack a multitude of sites, if your site is being specifically targeted, it would be using a well-known algorithm that they could have plenty of experience with. Just apply OCR to the image. Chances are decent that the word it can't scan isn't being checked. What am I missing? How is it not just another arms race?
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,836
4,816
75
The words on ReCaptcha have already been run through OCR software, and it couldn't figure any of them out. If someone develops OCR software that can read some significant fraction of the words, and the technique gets distributed, ReCaptcha can apply that OCR technique in reading the books, and the words remaining to be sent out will be even harder.

Which is why I say ReCaptcha will work until computers can read as well as humans. When that happens, they'll have no more words to send out, and some other CAPTCHA method will be needed.
 

troytime

Golden Member
Jan 3, 2006
1,996
1
0
most captcha cracking applications are geared toward specific captchas
last i checked, there aren't any recaptcha cracking scripts

change your form and website around and you won't be a target
you're not a target due to traffic, you're a target because you're easy and people have written scripts to attack your platform
 

Red Squirrel

No Lifer
May 24, 2003
71,312
14,085
126
www.anyf.ca
Originally posted by: troytime
most captcha cracking applications are geared toward specific captchas
last i checked, there aren't any recaptcha cracking scripts

change your form and website around and you won't be a target
you're not a target due to traffic, you're a target because you're easy and people have written scripts to attack your platform

Yeah that's probably it. The phpbb captcha would be fairly easy to crack for someone who knows what they're doing. Can probably use an algorithm to make each dot bigger until they all touch, then it's just a block letter format font.

Mine varies in angle, has a bunch of lines and other crap thrown in, and there are 3 that show up, and you have to enter the one that looks different, so that's 3 captcha images to read; chances are at least one will render a bit harder to scan.
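The "make each dot bigger until they all touch" step is essentially what image-processing people call binary dilation. Here is a small pure-Python sketch on a toy grid, not on a real phpbb image; the `dilate` function and the sample `dots` grid are made up to show why a dotted font is weak.

```python
def dilate(grid):
    """One pass of binary dilation: every set pixel also sets its 8 neighbours."""
    height, width = len(grid), len(grid[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if grid[y][x]:
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < height and 0 <= nx < width:
                            out[ny][nx] = 1
    return out


# Three isolated dots along a row, standing in for a dotted stroke.
dots = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
merged = dilate(dots)  # the dots grow into one solid bar
```

After a single pass the three dots have merged into one connected block, which is the kind of shape ordinary OCR handles well.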
 

torpid

Lifer
Sep 14, 2003
11,631
11
76
Originally posted by: Ken g6
The words on ReCaptcha have already been run through OCR software, and it couldn't figure any of them out. If someone develops OCR software that can read some significant fraction of the words, and the technique gets distributed, ReCaptcha can apply that OCR technique in reading the books, and the words remaining to be sent out will be even harder.

Which is why I say ReCaptcha will work until computers can read as well as humans. When that happens, they'll have no more words to send out, and some other CAPTCHA method will be needed.

You don't seem to understand how recaptcha works. It doesn't matter if OCR can figure out the second word because it is not checked for accuracy. You see two words... one word has been successfully scanned with OCR, and the other hasn't. When the user types in the two words, if the one that was scanned is correct, they assume the second is correct and submit it to their central DB.

However, troytime's explanation makes sense, but only until someone wants to target your site...
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,836
4,816
75
No, that's not how ReCaptcha works, though I didn't fully understand it either:

For a new word not tested yet, "The identification performed by each computer program is given a value of 0.5 points, and each interpretation by a human is given a full point. Once a given identification hits 2.5 votes, the word is considered called. Those words that are consistently given a single identity by human judges are recycled as control words." (citation)

So I think it's closer to my explanation than yours.
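To make the quoted scoring rule concrete, here is a small Python sketch of it. Only the 0.5 / 1 / 2.5 numbers come from the quote; the `tally` function, the `(source, reading)` vote format, and the assumption that votes are counted in arrival order are mine.

```python
from collections import defaultdict

OCR_WEIGHT = 0.5    # each OCR program's reading
HUMAN_WEIGHT = 1.0  # each human's reading
THRESHOLD = 2.5     # score at which a reading is accepted


def tally(votes):
    """votes: iterable of (source, reading), source being 'ocr' or 'human'.

    Returns the first reading whose score reaches the threshold, else None.
    """
    scores = defaultdict(float)
    for source, reading in votes:
        scores[reading] += OCR_WEIGHT if source == "ocr" else HUMAN_WEIGHT
        if scores[reading] >= THRESHOLD:
            return reading
    return None
```

So one OCR guess plus two agreeing humans is enough to call a word, while humans alone need three matching answers.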
 

torpid

Lifer
Sep 14, 2003
11,631
11
76
I'm going by the reCaptcha website itself. You might want to read it. It seems to make it even more confusing. I'm assuming word #2 is an OCR'd one but maybe it's a word recognized as valid by human standards from a prior recaptcha... but then how did they get the initial seed words? And couldn't the system be overwhelmed by a brute force "false positive" attack?

http://recaptcha.net/learnmore.html

reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.

But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
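The two-word check described in that passage could be sketched like this in Python. The `grade_submission` name, the case-insensitive comparison, and the plain dict of vote counts are all assumptions for illustration, not how reCAPTCHA is actually implemented.

```python
def grade_submission(control_answer, typed_control, typed_unknown, unknown_votes):
    """Pass or fail on the control word only.

    On a pass, the user's reading of the unknown word is recorded as one vote
    in unknown_votes (a dict mapping reading -> count).
    """
    passed = typed_control.strip().lower() == control_answer.lower()
    if passed:
        guess = typed_unknown.strip().lower()
        unknown_votes[guess] = unknown_votes.get(guess, 0) + 1
    return passed
```

This shows both sides of the argument above: the unknown word never affects pass or fail, but a bot still has to read the control word correctly, and it is not told which of the two words that is.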