Why will voice/face recognition require 10 GHz?

draggoon01

Senior member
May 9, 2001
858
0
0
Every once in a while I come across articles talking about future possibilities once more CPU power is available. Almost always, voice and face recognition are mentioned. Can someone explain why these tasks require so much computing power, and why they're limited by today's CPUs?
 

IcemanJer

Diamond Member
Mar 9, 2001
4,307
0
0
My guess would be just image processing. When you take a GOOD quality picture in bitmap format, each pixel is worth 24 bits (8 bits for each of the 3 colours), and when you have a picture that's 1200 x 1600, that's a sh!t load of data to filter through. And on top of that, you have to run some recognition algorithm (like comparing the ratios of distances between facial features) against a database of images.
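Just to put numbers on it, here's a rough back-of-the-envelope sketch in Python (the frame rate is an assumption I picked for illustration):

# Raw data volume for the picture described above:
# 1600 x 1200 pixels, 24 bits (3 bytes) per pixel.
width, height = 1600, 1200
bytes_per_pixel = 3  # 8 bits each for red, green, blue

frame_bytes = width * height * bytes_per_pixel
print(frame_bytes / 1e6, "MB per frame")        # ~5.8 MB

# A camera feed at, say, 15 frames per second:
print(frame_bytes * 15 / 1e6, "MB per second")  # ~86 MB/s to sift through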
 

Noj

Member
Sep 15, 2001
109
0
0
Computers can do some things that humans can't (like doing the same calculation a million times in one second) but perform very badly at things humans find very easy (like pattern recognition). As a result, it takes a lot of computing power to do things like face recognition quickly enough for it to be useful.
 

glugglug

Diamond Member
Jun 9, 2002
5,340
1
81
My old 25MHz 68040 Mac (with a 55MHz DSP chip) could do really GOOD voice recognition.

So whoever says it will take 10GHz is full of it...

unless they are talking about doing it in a Microsoft .NET-based language, hehe.

Face recognition is also already being done, without 10GHz processors.
 

Noj

Member
Sep 15, 2001
109
0
0
Voice recognition is done differently from face recognition: it tends to use complicated heuristics, while face recognition is usually done with neural networks.
I know that face recognition can be done, but at what capacity? Does it recognize just one face or hundreds? How quick is it?
I'm not saying that face recognition DOES require a 10GHz CPU, because I really don't know, but I do know that it's not running on my PC right now.
 

Locutus4657

Senior member
Oct 9, 2001
209
0
0
What makes you say that about .NET? .NET languages are compiled to machine language just before run time, so it shouldn't be any slower than any other language, with the exception of added latency before the program actually runs. In theory it might even run faster, because the code can be optimised for future platforms as they come of age through updated JIT compilers.

Originally posted by: glugglug
My old 25MHz 68040 Mac (with a 55MHz DSP chip) could do really GOOD voice recognition.

So whoever says it will take 10GHz is full of it...

unless they are talking about doing it in a Microsoft .NET-based language, hehe.

Face recognition is also already being done, without 10GHz processors.

 

Locutus4657

Senior member
Oct 9, 2001
209
0
0
You don't need a neural network to do face recognition; all you need is a very good edge detection program backed by a fairly good database program. This can be done on any system, with varying degrees of accuracy.
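A bare-bones sketch of the edge-detection half of that idea in Python (the database half is left out, and the toy image and threshold are made up for illustration):

import numpy as np

def sobel_edges(img):
    # Crude Sobel edge detector on a 2-D grayscale array.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # horizontal gradient
    ky = kx.T                                            # vertical gradient
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            patch = img[y:y + 3, x:x + 3]
            out[y, x] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return out

# Toy image: a bright square on a dark background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 255.0
print((sobel_edges(img) > 100).sum(), "edge pixels found")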

Originally posted by: Noj
Voice recognition is done differently from face recognition: it tends to use complicated heuristics, while face recognition is usually done with neural networks.
I know that face recognition can be done, but at what capacity? Does it recognize just one face or hundreds? How quick is it?
I'm not saying that face recognition DOES require a 10GHz CPU, because I really don't know, but I do know that it's not running on my PC right now.

 

Shalmanese

Platinum Member
Sep 29, 2000
2,157
0
0
Well, there's face/voice recognition and there's GOOD face/voice recognition.

With voice, so far, we can do VERY good recognition with a known voice, finite vocab and no noise; surprisingly good with an unknown voice, semi-finite vocab and some noise (telephone booking systems); and relatively poorly with an unknown voice, infinite vocab and some noise (untrained dictation software).

Face recognition is currently still in the controlled-lighting, looking-straight-at-the-camera region if you want any level of accuracy; however, we can do funky tricks to make accuracy a LOT better in an uncontrolled environment. I heard recently that they have managed to encode face data into something as small as 500 bits while still maintaining differentiability.

However, even the best scanners can still be fooled by simply holding a photo, or a laptop running a video, up to the camera :).
 

Leafblighter

Member
Jul 4, 2002
50
0
0
As it is right now, there are many different systems in place at airports and at events like the Olympics or the Super Bowl to weed out people in a crowd or search for "terrorists." The problem with these programs is that they are not the most reliable. It is not a case of taking one picture and simply comparing it to one taken by a security camera to determine whether the two are a match. The computer goes through thousands of computations on different parts of the image and compares that to a massive database of images and information. Often they will detect a match, but that match is simply a false positive. The only way for them to work correctly 99% of the time (100% is impossible with current technology) is for a person to be standing still and facing directly towards the camera, with the computer having a recent picture of them in the same pose in its database. Something as simple as a fake moustache or a hat can fool most of the programs out there. They are also unable to properly correct for human aging.

On voice recognition... it is as easy to fool these programs as it is to fool the face recognition ones. While one can just hold up a photograph to a camera, a person only has to properly raise or lower their voice to fool the computer. I remember reading about one promising program that splits the voice into 30 different frequency bands for analysis. While this may sound good, one must also remember that the human ear can detect frequencies up to about 20,000 Hz, so there is still a long way to go. Our brains are much more complex than the standard computer program, and that's where parallel computing and neural nets come in. Scientists are modeling their computer programs to mirror the workings of our own minds. What comes so naturally to us is utterly difficult to reproduce in the digital world.
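For what it's worth, here's roughly what "splitting a voice into 30 frequency bands" might look like at its very simplest -- a toy Python sketch, not the program that article described (the fake signal and sample rate are invented):

import numpy as np

def band_energies(signal, sample_rate, n_bands=30):
    # Split the spectrum into n_bands equal-width bands and
    # return the energy in each -- a crude voice "fingerprint".
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    edges = np.linspace(0, freqs[-1] + 1, n_bands + 1)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

# Fake "voice": a 200 Hz tone plus noise, sampled at 8 kHz.
rate = 8000
t = np.arange(rate) / rate
voice = np.sin(2 * np.pi * 200 * t) + 0.1 * np.random.randn(rate)
print(band_energies(voice, rate).round(1))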
 

glugglug

Diamond Member
Jun 9, 2002
5,340
1
81
Originally posted by: Locutus4657
What makes you say that about .NET? .NET languages are compiled to machine language just before run time, so it shouldn't be any slower than any other language, with the exception of added latency before the program actually runs. In theory it might even run faster, because the code can be optimised for future platforms as they come of age through updated JIT compilers.

.NET code runs in a virtual machine, like Java.

JIT compiling is also just like Java, which is the only other popular modern language to use garbage collection.

NONE of those characteristics are associated with performance. JIT compiling drastically reduces performance vs. truly compiled code; it doesn't increase it. It's not like a regular compiler, where the code is completely transformed to machine language before you start to run it -- each section of code is translated right before it is needed and cached. That cache WILL run out quickly, and the next time the same code is run it will have to be JIT compiled again, reducing performance AND increasing the memory footprint for the caching (making garbage collection an even more drastically bad design decision).

JIT compiling code written for a virtual machine is a pretty accurate description of how a Mac runs Windows with any of the various emulator programs (from the Mac's perspective, the PC emulator is a virtual machine running x86 VM code instead of JVM or .NET VM code). Ever hear of a Mac user running the PC version of Photoshop when they have the native PowerPC version available? They don't, because the performance would suck. I don't expect .NET to be any better; in fact I expect it to be worse, because more middle-level libraries will be "upgraded" to run in the VM instead of natively. I fully expect that Windows .NET applications will require an Opteron to get the performance of today's applications on a Pentium MMX.

Of course, I don't have a copy handy to witness this firsthand (and therefore haven't gone through any license agreement prohibiting .NET performance info from being released).

 

glugglug

Diamond Member
Jun 9, 2002
5,340
1
81
2 more things:

Voice recognition is about to become A LOT more common. There is a standard being worked out right now called HTML+V (HTML + voice), which will probably be in the next major revision of each web browser, meaning web pages will no longer require clicking.

On the virtual machine JIT performance (.NET) thing... even some implementations of Java let you compile straight to native Win32 code to get around the terrible performance. .NET does not.
 

DRGrim

Senior member
Aug 20, 2000
459
0
0
Originally posted by: glugglug

NONE of those characteristics are associated with performance. JIT compiling drastically reduces performance vs. truly compiled code; it doesn't increase it. It's not like a regular compiler, where the code is completely transformed to machine language before you start to run it -- each section of code is translated right before it is needed and cached. That cache WILL run out quickly, and the next time the same code is run it will have to be JIT compiled again, reducing performance AND increasing the memory footprint for the caching (making garbage collection an even more drastically bad design decision).
Not true. From here:
When running the executable, the CLR uses Just-In-Time compilation. As each method within the executable gets called, it gets compiled to native code; subsequent calls to the same method don't have to undergo the same compilation, so the overhead is only incurred once.
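That compile-once-then-reuse behaviour is easy to picture with a toy cache (a conceptual Python sketch of the caching idea only, nothing like real CLR internals):

# "Compile" each method the first time it is called, then reuse it.
native_cache = {}

def jit_call(name, source, args):
    if name not in native_cache:            # first call: pay the compile cost
        print("JIT-compiling", name)
        native_cache[name] = compile(source, name, "eval")
    return eval(native_cache[name], {"args": args})

print(jit_call("square", "args[0] * args[0]", [7]))  # compiles, then runs
print(jit_call("square", "args[0] * args[0]", [9]))  # cache hit: just runs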
 

Fatt

Senior member
Dec 6, 2001
339
0
0
My opinion, for what it's worth, is that complex computing problems like this aren't going to be brute-forced by fast processors.

Not that you couldn't, but most likely you wouldn't. Much better to use multiple processors. I almost said "symmetric multiprocessing," but who says that symmetric is always best?

The fundamental concept is that if you have software that can break down a problem into many parts and farm it out to individual processors, then you can really take advantage of a CPU's ability to do a lot of calculations really fast, even though they're really dumb.
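Something like this, say (a minimal Python sketch of the farm-out idea, with made-up feature numbers standing in for real face data):

from multiprocessing import Pool

def match_score(pair):
    # Placeholder comparison of a probe face against one database
    # entry -- imagine real feature-distance math here.
    probe, entry = pair
    return sum(abs(a - b) for a, b in zip(probe, entry))

if __name__ == "__main__":
    probe = [0.42, 1.61, 0.88]  # invented feature ratios
    database = [[0.40, 1.60, 0.90], [0.70, 1.20, 0.50], [0.43, 1.62, 0.87]]
    with Pool() as pool:        # one worker per CPU by default
        scores = pool.map(match_score, [(probe, e) for e in database])
    print("best match: entry", scores.index(min(scores)))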

So essentially, face recognition is a software issue, not a hardware issue.

That is, assuming I have the slightest idea what I'm talking about. :D
 

RedBeard0531

Senior member
Jun 25, 2001
292
0
0
plz help this (<-------) idiot out and explain what a neural net is. Isn't that what was in The Terminator? I think they said it was a machine that learns, but that was just a movie.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
Neural nets are just a bunch of nodes connected to each other, whereby nodes can excite or inhibit the firing of other nodes given certain input. Node excitation and inhibition can eventually give rise to output. That's basically what your whole nervous system does:
1) You see something.
2) Signals travel to the occipital lobe, where neurons are excited or inhibited depending on their triggers (e.g., movement, horizontal line, vertical line, etc.).
3) The output of the occipital lobe neurons excites or inhibits frontal lobe neurons, eventually giving rise to output depending on what the input was and what part of the body the output signals go to (e.g., input = the sight of a tiger lunging at you; output = you crapping your pants).

So for visual or voice recognition we don't necessarily need fast, complex computers. The neurons in our brain only fire at a rate of about 30 Hz, if not less, and pretty much all they do is send and receive excitatory signals, inhibitory signals, or both.
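A single artificial "neuron" is almost trivially simple. A toy Python sketch (the weights and threshold are just numbers I made up):

import numpy as np

def neuron(inputs, weights, threshold):
    # Positive weights excite, negative weights inhibit;
    # the node fires if the combined signal clears the threshold.
    return 1 if np.dot(inputs, weights) >= threshold else 0

weights = np.array([0.6, 0.5, -0.7])  # two excitatory inputs, one inhibitory
print(neuron(np.array([1, 1, 0]), weights, 1.0))  # fires: 1.1 >= 1.0
print(neuron(np.array([1, 1, 1]), weights, 1.0))  # inhibited: 0.4 < 1.0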
 
May 4, 2002
59
0
0
The reason why you need such high speeds on a computer is to get it to understand what you're saying without you having to slow down. Say you're talking to the computer and you say a sentence really fast; it's gonna try so hard just to understand what you are saying, and it's gonna come out all messed up. And everyone's voice is different, which takes tons of computing power to tell apart. One person said that computers can add millions of things way faster than a person. That's true, but the only reason a human can't process it as fast is "human thought". Human thought is what slows a person down, because they sit there and wonder how they're gonna do it. But human thought alone could kill any computer out there. Our imagination has so much computing power: you close your eyes and instantly picture a whole area with whatever you want in it. For a computer to do that at the quality you're thinking of, it would take forever, because it has to go through all the work of making the landscapes, etc. For face recognition, computers need to pick out the face and tell which person it is. Right now it's like talking to a really slow person and waiting for them to answer. One person said you can have a good facial recognition program, but it would still be way too complex to be processed on a computer right now. And even if it did process it and figure out who you were, all we need to do is change the way the face looks, like making a silly face, and then it's thrown off and computing like crazy to see who it is. When you get to speeds of 10GHz or more, then you can have very complex programs that can tell things apart very easily, because there's enough computing power to run them.

One way to see how fast the brain is: just go into a city and look around at everything. You will never see it skip a frame. (If you do, then something is really wrong with you.) Just imagine a computer processing everything the eye does normally; it would be impossible right now. When you look through your eye you don't see jagged edges, and you get amazingly realistic colors. Plus it's all running in real time, which computers cannot do even with computer graphics. So there are reasons for wanting that speed.

 

KillerCow

Member
Jun 25, 2001
142
0
0
Why will voice/face recognition require 10 GHz?
I don't want to start a flame war or anything, but I don't think that more MHz will solve the problems in these technologies. Whenever the media reviews these things, they always say "current computers aren't fast enough" and "more speed is needed to increase accuracy" -- these statements are just cop-outs. If you look back to the 25MHz era, you will see the same statements: "we need 100MHz to make this work right." Then in the 100MHz era: "we need 1GHz." These technologies are always two speed generations away.

Computers are giant calculators. They can do what they are told very well. That is, if we can make a workable algorithm, a computer can follow it perfectly. But we haven't made good algorithms for these problems yet. I remember IBM's ViaVoice product shipping way back when... it did its job reasonably well, on a 100MHz machine. We now have 2GHz machines and nothing has improved. That is because the algorithm is the same... so the product still makes the same mistakes... only faster. We can, of course, use the increase in speed to make the algorithms more complex, but then we encounter another fundamental flaw.

Humans are extremely good at pattern recognition and matching. Our evolution has allowed us to quickly recognise words and faces. I don't know if you have noticed this, but when people speak, they make a constant stream of syllables. There are no pauses between the words. Peopletalklikethis. This can be demonstrated when you listen to someone speak in a language that you don't know -- the more foreign the better. When I hear Cantonese, it is just a constant stream of sounds. The only reason you can understand English is because your brain is good at pattern recognition: it can instantly put separators between the sounds that make words you know. When I listen to Japanese, it is also a constant stream of sounds... except for the words that I know: ogawa, hie, kaijo, ichi, ni, nani, and a few others... everything else is just a blur. Then there is the problem of words that sound the same but mean different things. To a computer, "recognizing speech" is a lot like "wrecking a nice beach".
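You can see the segmentation problem in miniature with a few lines of Python (the tiny vocabulary is contrived to force the ambiguity):

def segmentations(text, vocab, prefix=()):
    # Every way to split 'text' into words from 'vocab' -- the
    # ambiguity a human listener resolves without even noticing.
    if not text:
        return [list(prefix)]
    results = []
    for i in range(1, len(text) + 1):
        if text[:i] in vocab:
            results += segmentations(text[i:], vocab, prefix + (text[:i],))
    return results

vocab = {"people", "talk", "like", "this", "peo", "plet", "alk"}
print(segmentations("peopletalklikethis", vocab))
# Two different readings of the same unbroken sound stream.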

Facial recognition is even worse. The current algorithms measure distances between related facial features. But these features change. If I smile, the sides of my lips are farther apart. If I'm tired, my eyes narrow. If you look at me from an angle... all of the ratios change. If the algorithm is strict, there will be many false negatives. If the algorithm makes allowances for variations, there will be very many false positives. Iris and fingerprint matching are much easier, since those ratios can't change.
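The strict-vs-lenient bind is easy to see with numbers (all of these ratios and tolerances are invented):

def matches(probe, stored, tolerance):
    # 'tolerance' is the strictness knob described above.
    return all(abs(p - s) <= tolerance for p, s in zip(probe, stored))

stored = [1.50, 0.80, 2.10]    # enrolled face
smiling = [1.58, 0.80, 2.10]   # same person, smiling
stranger = [1.56, 0.84, 2.05]  # a different person

for tol in (0.05, 0.10):
    print(tol, matches(smiling, stored, tol), matches(stranger, stored, tol))
# Strict (0.05) rejects the smiling owner; lenient (0.10) accepts the stranger.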

The problem is that these problems don't easily translate into bits that can be filtered. It's not like subtracting two numbers and branching if the result is zero. They are inherently more complex, and seemingly random to a computer. We need to develop better techniques to solve the problem, rather than rely on brute force.

Just my 2 cents, take it as you will.
 

Maverick2002

Diamond Member
Jul 22, 2000
4,694
0
0
It depends on what voice/face recognition is used for. If it's for disability-aid purposes, software that's out there now works quite well (Dragon NaturallySpeaking, for instance).

But biometric authentication comes to mind. Unless some genius figures out a very impressive method, there are always going to be pretty simple ways of getting around it. Not only is the machine going to err a LOT (as people mentioned here, with varying face ratios, etc.), but things like fingerprints and iris patterns are incredibly easy to duplicate (here's a relatively easy read). Face recognition is going to be quite a challenge.