detecting a laser pointer on a projection through a camera and program...

bignateyk

Lifer
Apr 22, 2002
11,288
7
0
Ok, so I have a projector that will be projecting a PowerPoint presentation or something similar onto a wall. Above the projector, a digital camcorder records the projection (and the presenter's laser pointer). The presenter will use a laser pointer to point things out on the projection. I want to be able to determine the coordinates of the laser pointer in relation to the original image.

I would have two images to compare: the image being projected vs. the image being recorded. (The image being recorded, however, is likely to be blurrier or slightly off-center from the actual image.)

Here is what I currently have:

EDIT: See the posts below for an update on the algorithm I am currently using.

Thanks,

-Nate


 

Peter

Elite Member
Oct 15, 1999
9,640
1
0
How 'bout just finding the four corners of the (much brighter than its surroundings) image area, and then finding the absolute brightest red spot in it?

The latter would be the laser pointer, and the corners give you the projection geometry to calculate back to the original image rectangle.
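That two-step idea can be sketched in a few lines. This is only a rough illustration, not anyone's actual code: the brightness threshold (180), the red-dominance score, and the function names are all assumptions, and real footage would need calibrated values.

```python
import numpy as np

def find_projection_corners(gray):
    """Locate the four extreme corners of the bright (projected) region,
    assuming the projection is much brighter than its surroundings."""
    ys, xs = np.nonzero(gray > 180)          # bright pixels (threshold assumed)
    pts = np.stack([xs, ys], axis=1)
    s = pts.sum(axis=1)                      # x + y: small at top-left, big at bottom-right
    d = pts[:, 0] - pts[:, 1]                # x - y: big at top-right, small at bottom-left
    return {
        "top_left": tuple(pts[s.argmin()]),
        "bottom_right": tuple(pts[s.argmax()]),
        "top_right": tuple(pts[d.argmax()]),
        "bottom_left": tuple(pts[d.argmin()]),
    }

def brightest_red(rgb):
    """(x, y) of the pixel where red most dominates green and blue."""
    score = (rgb[..., 0].astype(int) - rgb[..., 1].astype(int)
             - rgb[..., 2].astype(int))
    y, x = np.unravel_index(score.argmax(), score.shape)
    return int(x), int(y)
```

The extreme-point trick (min/max of x+y and x-y) only finds the corners of a roughly axis-aligned quadrilateral; a heavily skewed camera angle would need something sturdier.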
 

bignateyk

Lifer
Apr 22, 2002
11,288
7
0
Here's what I have it doing now...

First, the user points the laser pointer at a predefined spot on an image; this is used to determine how far out of alignment the camera is relative to the projection. After that, I go through the camera image and check how far its color is off from the projection. If the RGB values are more than 10 off, I go through the places where the color is close to red and adjust them. Then I detect where in the image there is a color close to (255, 0, 0) (I use R > 240, G < 40, B < 40 to catch colors close to it). This makes it go a good bit faster than before, because I don't do any pixelating or image subtraction.
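A minimal sketch of that thresholded scan (the cutoffs are the ones quoted above; the centroid step and the function name are my own additions):

```python
import numpy as np

def find_laser(rgb):
    """Return (x, y) of the near-red spot, or None if nothing matches.
    Thresholds R > 240, G < 40, B < 40 are the values from the post."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mask = (r > 240) & (g < 40) & (b < 40)
    if not mask.any():
        return None
    y, x = np.argwhere(mask).mean(axis=0)   # centroid of candidate pixels
    return int(round(x)), int(round(y))
```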
 

Shalmanese

Platinum Member
Sep 29, 2000
2,157
0
0
If both the projector and the camera are going to be fixed in place, why don't you make up a series of "test" patterns to synchronise the camera and the projector? Then you would have rough values for the alignment and the brightness, and you could work from there.
 

Peter

Elite Member
Oct 15, 1999
9,640
1
0
Originally posted by: bignateyk
visual basic

You are not SERIOUSLY writing a digital image processing application in VB? Lesson one for successful home improvement: Pick a suitable tool.


 

AbsolutDealage

Platinum Member
Dec 20, 2002
2,675
0
0
Originally posted by: bignateyk
visual basic

Get yourself a book on C, at least. There are a million different image processing libraries, already highly optimized, that are written in C. VB is altogether too slow for this type of application.
 

Smilin

Diamond Member
Mar 4, 2002
7,357
0
0

Hey, he obviously knows VB pretty well. C++ is nifty, but why use a sledge when a ball-peen will do?

You said the pixelating takes the longest? Don't pixelate the whole thing. Do a section at a time and search for a match. No match? Next section. Put a little intelligence behind it and you might be able to narrow it down to pixelating only a very small section of the screen.
 

bignateyk

Lifer
Apr 22, 2002
11,288
7
0
Hmm, that's a good idea... I'm not sure why I didn't think of that, since I'm doing the same thing with the brightness correction. As of now, I don't think it will be necessary to do any pixelating. So far I have been pretty successful having it recognize the simulated pointer on the one image alone. Right now I am assuming the camera image will be about 320 by 240 pixels or so, and the laser pointer will be about 2-4 pixels, but it might end up being smaller than this, which could be a problem...

Current algorithm:

1) The program takes a screenshot of the computer (identical to what is being projected), resizes it to the same size as the camera image, and uses this as the comparison for the camera images. It always does this at least once for alignment purposes, but it can be done continuously if the two images need to be compared more often.

2) There is a circle drawn on the program itself that the user must point the laser pointer at. (Eventually there will be more than one circle, for more accurate calibration.) This is used to calibrate the alignment of the camera against the projection: the program compares where the user actually pointed the laser pointer with the predefined coordinates of the circle, then corrects the alignment of the image. Once these two are calibrated, the images start to feed in.

3) Check whether the images need to be color- or brightness-corrected. So far, I have determined that the color can be about 10 off in R, G, and B before finding the pointer becomes inaccurate. If it is more than 10 off, I correct it. It does not, however, correct the whole image; I only have it correct areas where, say, the color values are R > 180, G < 60, B < 60. This way it only touches the small portion of pixels that could possibly be the laser pointer.

4) After color correction, it scans through the image looking for a color with R > 220, G < 40, B < 40. It also checks that the area it finds is within 3 or 4 pixels; if it is a big area, it is obviously not the laser pointer, just background or some other part of the image that is similar in color. Once it finds the laser pointer, the procedure ends and exits with laserX and laserY coordinates.

5) Take laserX and laserY and divide them by the size of the camera-generated image to get a ratio of where they are on the screen. From there, it just multiplies by the screen resolution to determine the mouse coordinates.

6) Move the mouse to the given coordinates.

7) Repeat from step 3 or 4. It would probably be OK to skip straight to step 4, since all the images from the current feed would need the same brightness correction applied to them.
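Steps 4 through 6 can be condensed into a short sketch. The thresholds come from the steps above; the pixel-count cutoff (standing in for the spatial-extent check), the function names, and the stand-in screen resolution are assumptions, and the actual mouse move is left out:

```python
import numpy as np

def locate_pointer(frame, max_blob=4):
    """Steps 4-5: find a small near-red blob, reject large regions,
    and return its position as a fraction of the frame size."""
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    candidates = np.argwhere((r > 220) & (g < 40) & (b < 40))
    if len(candidates) == 0 or len(candidates) > max_blob:
        return None                     # nothing found, or too big to be the dot
    cy, cx = candidates.mean(axis=0)    # centroid of the candidate pixels
    h, w = frame.shape[:2]
    return cx / w, cy / h               # ratios in [0, 1)

def to_screen(ratio, screen_w=1024, screen_h=768):
    """Steps 5-6: scale the camera-space ratio up to screen coordinates
    (the resolution here is just a stand-in)."""
    return int(ratio[0] * screen_w), int(ratio[1] * screen_h)
```

The last step would hand the `to_screen` result to whatever mouse-moving API is available.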

Current or foreseen problems:
- I am not quite sure how big the laser pointer will be on the image, because right now I am just using test images. If the laser pointer were too small, it would not work to just scan one image for the pointer, and I would have to compare both images.

- If the background color of the slide is too close to the laser pointer color, it becomes too inaccurate to detect.

- Framerate. I will have it reading frames straight from the camera, but since it takes several milliseconds to process each frame, it will not be able to process all the frames (at least using VB). Maybe I will have it process only every 3rd or 5th frame.


Thoughts for future:
Mouse stuff:
I will be adding left-click, right-click, and drawing capabilities. For left click, I was thinking that if the user turned off the laser pointer and then turned it back on within a second or so, within about 10 pixels of where they turned it off, it would register a left click. A right click could be 2-3 seconds or something. For drawing, I could have a small always-on-top button that the user could move the laser pointer over, and then it would draw anywhere they moved.
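A minimal sketch of that off/on click gesture, assuming a detector feeds it a position each frame (or None while the dot is off). The class name is hypothetical, and the one-second timeout and 10-pixel radius are just the rough numbers from the post:

```python
import time

class ClickDetector:
    """If the dot disappears and reappears within `timeout` seconds and
    `radius` pixels of where it vanished, report a left click."""

    def __init__(self, timeout=1.0, radius=10):
        self.timeout = timeout
        self.radius = radius
        self.last_seen = None           # (x, y, t) of the most recent sighting
        self.visible = False

    def update(self, pos, now=None):
        now = time.monotonic() if now is None else now
        if pos is None:                 # dot currently off
            self.visible = False
            return None
        if not self.visible and self.last_seen is not None:
            x0, y0, t0 = self.last_seen
            near = (abs(pos[0] - x0) <= self.radius
                    and abs(pos[1] - y0) <= self.radius)
            if near and now - t0 <= self.timeout:
                self.visible = True
                self.last_seen = (pos[0], pos[1], now)
                return "left_click"
        self.visible = True
        self.last_seen = (pos[0], pos[1], now)
        return None
```

A right click would be the same idea with a longer timeout window checked before the short one.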

Multiple Users:
Ability to have multiple users or laser pointers acting on the same screen.


It's written in Visual Basic for now; I can port it to C/C++ once I learn it.

I welcome any criticisms or suggestions. Thanks,

-Nate
 

Shalmanese

Platinum Member
Sep 29, 2000
2,157
0
0
I'm not an expert, but would it be possible to use GL extensions to do some of the graphics manipulation and offload it onto the GPU (I'm thinking LOTS of matrix manipulations)? Even better if you can find a way to work with pixel shaders.

That way, you might be able to greatly improve performance.
 

bignateyk

Lifer
Apr 22, 2002
11,288
7
0
Interesting. I still don't understand the equation for mapping the two images together... it was something like

(x, y) = ( (p1·X + p2·Y + p3) / (p7·X + p8·Y + p9) , (p4·X + p5·Y + p6) / (p7·X + p8·Y + p9) )

Then they further simplified it into a matrix. Anyone care to explain this procedure? I didn't really follow it.

-Nate


Edit: Oh, and also, they mention that the images the camera generates are only 160x120 resolution. Do you think you would actually be able to see the laser pointer in an image of that size and quality? I'm not sure how they are detecting it...
 

br0wn

Senior member
Jun 22, 2000
572
0
0
Originally posted by: bignateyk
Interesting. I still don't understand the equation for mapping the two images together... it was something like

(x, y) = ( (p1·X + p2·Y + p3) / (p7·X + p8·Y + p9) , (p4·X + p5·Y + p6) / (p7·X + p8·Y + p9) )

Then they further simplified it into a matrix. Anyone care to explain this procedure? I didn't really follow it.

That equation is based on plane-to-plane homography.
(X, Y) is the point on one plane (the camera image plane, i.e. the image you observe from the camera) and
(x, y) is the point on the other image plane (the original projection slide).
Thus, given four (X, Y) points and the corresponding four (x, y) points in the image plane, you can compute the (3x3) homography matrix.
Given this matrix, you can map any point from (X, Y) to (x, y), and the reverse.

Edit: Oh, and also, they mention that the images the camera generates are only 160x120 resolution. Do you think you would actually be able to see the laser pointer in an image of that size and quality? I'm not sure how they are detecting it...

Yup, a small camera image is sufficient. It is very fast!!!

Even faster if you define in advance where your virtual buttons should be. In my system, I define these buttons in advance; then I can just observe the regions around these virtual buttons. If the intensity changes (compared with the original image) around a button, I know some event has happened (a button touch by hand, or the laser pointer).


 

bignateyk

Lifer
Apr 22, 2002
11,288
7
0
About how many pixels do you think the laser pointer would show up as on a 160 by 120 image? Maybe 1? Or would it be larger?
If so, I don't think just scanning the image for certain colors would work anymore... What method do you use to actually detect the pointer in the camera image? Do you compare it to the projected image to see if there is a difference?


Also, what do p1, p2, p3, etc. stand for?
 

br0wn

Senior member
Jun 22, 2000
572
0
0
Originally posted by: bignateyk
About how many pixels do you think the laser pointer would show up as on a 160 by 120 image? Maybe 1? Or would it be larger?
Not sure; it depends on the angle of the incoming light.

If so, I don't think just scanning the image for certain colors would work anymore... What method do you use to actually detect the pointer in the camera image? Do you compare it to the projected image to see if there is a difference?

You can perform color segmentation and background color subtraction (the projected image without the laser pointer is the background).

Also, what do p1, p2, p3, etc. stand for?

It is easier to visualize in matrix form: p1 through p9 are the elements of the 3x3 homography matrix.
(This matrix is often loosely called an affine transformation matrix, though strictly an affine transform is the special case whose bottom row is [0 0 1].)

It is the same as in computer graphics, where you have a transformation matrix (usually a 3x3 rotation matrix, a 3x1 translation vector, and a 3x3 diagonal scaling matrix).
This 3x3 matrix can be thought of as the product of those matrices; hence it combines rotation, translation, and scaling into a single 3x3 matrix.

Thus, given an input point (x, y, 1) (we work in homogeneous coordinates) and the 3x3 matrix,
you can compute the output point (Xw, Yw, w) (also in homogeneous coordinates):
A x = B
where A is the 3x3 matrix, x is a 3x1 vector (the point in the camera image), and
B is a 3x1 vector (the point in the projected image).

The goal is to find this 3x3 matrix (you can think of it as the transformation from your
camera image plane to the projected wall). Given four points in x and the corresponding
four points in B (the four corner points work great), you can find the matrix via a least-squares method.
You will get eight equations with nine unknowns (the last unknown is the scale factor).
You can solve these equations.

Once you've solved the above equations, you will have the 3x3 matrix. Now, given any point
in the camera image plane, you can always find its corresponding point on the projected wall
(recall that B = Ax), and vice versa.
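The estimation step can be sketched with NumPy. Fixing p9 = 1 (valid whenever the true scale factor is nonzero) reduces the eight equations in nine unknowns to an exactly determined 8x8 linear solve; the function names are my own:

```python
import numpy as np

def solve_homography(src, dst):
    """Find the 3x3 matrix A with A * (X, Y, 1) ~ (x, y, 1) for four
    correspondences src[i] -> dst[i], fixing p9 = 1."""
    M, rhs = [], []
    for (X, Y), (x, y) in zip(src, dst):
        # x * (p7*X + p8*Y + 1) = p1*X + p2*Y + p3, and likewise for y
        M.append([X, Y, 1, 0, 0, 0, -X * x, -Y * x])
        rhs.append(x)
        M.append([0, 0, 0, X, Y, 1, -X * y, -Y * y])
        rhs.append(y)
    p = np.linalg.solve(np.array(M, float), np.array(rhs, float))
    return np.append(p, 1.0).reshape(3, 3)

def apply_homography(A, pt):
    """Map a point through A, dividing out the homogeneous scale w."""
    X, Y, w = A @ np.array([pt[0], pt[1], 1.0])
    return X / w, Y / w
```

Feeding in the four corner correspondences gives A directly; `apply_homography(np.linalg.inv(A), pt)` goes the other way.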




 

bignateyk

Lifer
Apr 22, 2002
11,288
7
0
Thanks, that clears it up somewhat... If it's not too much to ask, could you give an example to explain the math in more detail? If it's too much work then don't bother... it would just be easier to visualise with a more numerical example. Say I found the pointer at (20, 30) in the camera image; then...

Thanks,

-Nate
 

br0wn

Senior member
Jun 22, 2000
572
0
0
The steps are as follows:
1. Find four corresponding points (corner points). Let's use the four corner points, and let's say you are
using a 501x501 image.
Let the four corner points of the original image be
S1 = (0, 0) upper left corner point
S2 = (500, 0) upper right corner point
S3 = (500, 500) lower right corner point
S4 = (0, 500) lower left corner point

The corresponding points in the camera image are:
s1 = (100, 100) (this is the coordinate in camera pixels)
s2 = (200, 100)
s3 = (200, 200)
s4 = (100, 200)

2. Compute the 3x3 homography matrix.
We know that
A * S = s
where S is in original-image coordinates and s is in camera coordinates.

So form the 8x9 matrix (read the paper referred to earlier) to solve for
p1 through p9 in A.

Let's say the resulting matrix A turns out to be (in fact, this might
be the correct resulting matrix):
[ 0.2 0.0 100.0 ]
[ 0.0 0.2 100.0 ]
[ 0.0 0.0 1.0 ]

3. Now, given any point in the original image, I can always predict
where it will be in the camera image using the same equation:
A * S = s

That is, given the point (250, 250) in the original image, after
multiplication by matrix A I get (150, 150), which is the right
answer (from looking at the points, and intuition).
Try it!

Now, you actually want the inverse of A, because you have found the
location of the laser pointer in your camera image, and you want to know
what it corresponds to in original-image coordinates.
So find inverse(A); then
S = inverse(A) * s
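The numbers in that worked example check out; here is a quick numeric check (NumPy used for the matrix algebra, `transform` is just a helper name):

```python
import numpy as np

# The matrix from the worked example: scale by 0.2, shift by (100, 100).
A = np.array([[0.2, 0.0, 100.0],
              [0.0, 0.2, 100.0],
              [0.0, 0.0,   1.0]])

def transform(M, x, y):
    """Apply M in homogeneous coordinates and divide out the scale w."""
    X, Y, w = M @ np.array([x, y, 1.0])
    return float(X / w), float(Y / w)

print(transform(A, 250, 250))                 # -> (150.0, 150.0)
print(transform(np.linalg.inv(A), 150, 150))  # maps back to ~(250, 250)
```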
 

bignateyk

Lifer
Apr 22, 2002
11,288
7
0
Originally posted by: br0wn
The steps are as follows:
1. Find four corresponding points (corner points). Let's use the four corner points, and let's say you are
using a 501x501 image.
Let the four corner points of the original image be
S1 = (0, 0) upper left corner point
S2 = (500, 0) upper right corner point
S3 = (500, 500) lower right corner point
S4 = (0, 500) lower left corner point

The corresponding points in the camera image are:
s1 = (100, 100) (this is the coordinate in camera pixels)
s2 = (200, 100)
s3 = (200, 200)
s4 = (100, 200)

One other thing: where you say camera pixels, are you using a camera image that is the same size in pixels as the projected image? And if so, is the reason the points are not at, like, (0, 0) / (100, 0) etc. because you are taking it to be a trapezoid-like shape or something non-square?
 

br0wn

Senior member
Jun 22, 2000
572
0
0
Originally posted by: bignateyk
One other thing: where you say camera pixels, are you using a camera image that is the same size in pixels as the projected image? And if so, is the reason the points are not at, like, (0, 0) / (100, 0) etc. because you are taking it to be a trapezoid-like shape or something non-square?
The camera image can be of size 200 by 200 while the original image is 500 by 500.

I am assuming that your camera can see more than the projected image, so the projected image is fully contained inside the camera image; thus it doesn't start at (0, 0). Also, the projected image will not be square or rectangular unless your camera and the projection are orthogonal to the wall (no distortion at all). It will be a trapezoidal shape, as you have figured out.

Also, I am assuming this works both ways, right? If I had a point in the camera image (the laser pointer coordinates, for example), I would be able to use the matrix to find the coordinates of where it would be on the original?

Yup, it works both ways: you just compute the inverse of the matrix.

 

bignateyk

Lifer
Apr 22, 2002
11,288
7
0
Edit: OK, I understand how to get to the 8x9 homography matrix now, but what do you do with it? How do you go from the 8x9 matrix to the resulting coordinates?

What numbers do you put into the 8x9 homography matrix, and what do you do from there?

Even if you put all the X's, x's, Y's, and y's into the 8x9 matrix, what do you do from there? Also, what would the eight equations be, if they are needed to solve it?
 

element

Diamond Member
Oct 9, 1999
4,635
0
0
Not sure about that affine matrix mumbo jumbo, but I hope you're not testing each pixel; that would be too slow. Instead, test every other pixel or every 3rd pixel (and every other row or every 3rd row also).

This way you don't need to pixelate the image, just "pixelate" your pixel-test algorithm, so to speak. Also, IIRC OpenGL has pixel-testing commands, but I don't think they are accelerated by the T&L engine, so you're outta luck there; still, I'd hope a modern processor can handle testing a large number of pixels at a decent framerate.

Figure 640x480 = 307,200 pixels to test, but if you do every other row and every other column, you bring that down to 320x240, or 76,800 pixels to test. Sounds manageable, though I don't really know how many you can test per second.
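A sketch of that coarse-grid idea (thresholds reused from earlier in the thread; the step size, refinement window, and function name are assumptions). A step of 2 cannot skip over a dot that spans at least 2x2 pixels:

```python
import numpy as np

def coarse_then_fine(frame, step=2):
    """Scan every `step`-th row/column for a near-red pixel, then refine
    around the first coarse hit with a centroid over a small window."""
    sub = frame[::step, ::step]
    hits = np.argwhere((sub[..., 0] > 220) & (sub[..., 1] < 40)
                       & (sub[..., 2] < 40))
    if len(hits) == 0:
        return None
    cy, cx = hits[0] * step                         # back to full-res coords
    y0, x0 = max(cy - step, 0), max(cx - step, 0)   # refinement window origin
    window = frame[y0:cy + step + 1, x0:cx + step + 1]
    wy, wx = np.argwhere((window[..., 0] > 220) & (window[..., 1] < 40)
                         & (window[..., 2] < 40)).mean(axis=0)
    return int(x0 + wx), int(y0 + wy)
```

The coarse pass touches only 1/step² of the pixels, which is exactly the 307,200-to-76,800 reduction described above for step = 2.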

How are you going about testing the pixels though? Just curious.

Anyway, good luck and hope this helps. Let us know how it goes.