
Panning and zoom detection algorithm

Greetings. There's a programming problem that I've been puzzling over for some time now, and I'm hoping to get some advice. Here's what I'm trying to do.

In video files, especially in anime, the camera often zooms or pans across a still image. What I need to accomplish is to write a program that determines exactly how far the camera has panned or zoomed between one frame and another. In other words, given a later frame, I need to find out how many pixels I would have to shift it, and by what factor I would have to zoom it, in order to recover the original frame.

I realize that despite how simple the problem sounds, and how easy it is for humans to do, coding this will be no small task. From what I've researched so far, I've gathered that what I want to do is similar to the motion estimation algorithms used in video compression. But I could really use some advice and a pointer in the right direction. Can anyone suggest an approach, or tell me where I might find some help?
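To make the question concrete, here's the kind of thing I mean, sketched as phase correlation in Python with NumPy (the library choice and helper name are my own; I've seen this technique mentioned alongside codec-style motion estimation, and this toy only handles whole-pixel translation, no zoom):

```python
import numpy as np

def estimate_shift(frame_a, frame_b):
    """Estimate the integer (dy, dx) translation taking frame_a to frame_b
    using phase correlation (a cousin of codec motion estimation)."""
    Fa = np.fft.fft2(frame_a)
    Fb = np.fft.fft2(frame_b)
    cross = np.conj(Fa) * Fb
    cross /= np.abs(cross) + 1e-9          # keep only the phase information
    corr = np.fft.ifft2(cross).real        # a sharp peak at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around)
    if dy > frame_a.shape[0] // 2:
        dy -= frame_a.shape[0]
    if dx > frame_a.shape[1] // 2:
        dx -= frame_a.shape[1]
    return int(dy), int(dx)
```

On a frame that's been shifted by (5, -7) pixels, this recovers (5, -7) exactly; real frames with noise, revealed edges, and sub-pixel motion would of course be messier.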
 
It doesn't sound even a little bit simple 🙂. You're talking about detecting the physical parameters of the virtual world the scene was drawn in, and then estimating where in that virtual world the camera is positioned at any given time by detecting changes in the displayed images. Even assuming that you can ignore cuts to a new scene, and make some assumptions about the scale of some easily recognizable figure (like a character), what you propose is in the category of "damn hard to do."

Assuming a very constrained problem, i.e. a smooth pan from point A to point B with recognizable features persistent in both and no zooming, you essentially need to detect an object, assign it some dimensions based on scale assumptions or comparison to another object, and then track it from A to B and measure its movement. Handling zooms would be an extension of this in which you measure the object's scale factor relative to its dimensions when it first appeared in the scene.
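That detect-and-track step, restricted to pan plus zoom, reduces to fitting one scale factor and one offset to matched points. A rough Python/NumPy sketch (the model and helper names are my own, and real footage would need a feature detector/matcher and outlier rejection in front of it):

```python
import numpy as np

def fit_zoom_and_pan(src_pts, dst_pts):
    """Least-squares fit of dst ≈ s * src + (tx, ty): a single zoom
    factor s plus a 2-D pan, from matched points in two frames."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    n = len(src)
    # Linear system A @ [s, tx, ty] = b, two rows per point pair
    A = np.zeros((2 * n, 3))
    A[0::2, 0] = src[:, 0]; A[0::2, 1] = 1.0   # x' = s*x + tx
    A[1::2, 0] = src[:, 1]; A[1::2, 2] = 1.0   # y' = s*y + ty
    b = dst.reshape(-1)
    s, tx, ty = np.linalg.lstsq(A, b, rcond=None)[0]
    return s, tx, ty
```

Given clean correspondences generated by a 1.25× zoom and a (10, -4) pan, the fit recovers those numbers; with real, noisy matches you'd want something like RANSAC around this.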

If there are any existing implementations, and I don't doubt there are, they probably live in expensive image/video analysis suites.

The application has a big impact too. If you only need to make this measurement once, interactively, it's a lot easier than trying to do it at runtime without human intervention.
 
I thought about having the user define a reference point at the start and end of the clip, but that would only work for smooth, linear pans. One of the main applications I'd like the program to handle is a scene where the camera shakes or vibrates in random directions (as often happens in anime). Obviously, asking the user to define a reference point for every frame is something I'd like to avoid if possible.
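For the record, here's the kind of semi-automatic compromise I'm imagining: the user marks one reference patch in the first frame, and the program finds it in every later frame on its own. A naive brute-force Python/NumPy sketch (my own toy helper, exhaustive search, far too slow for real video but it shows the idea):

```python
import numpy as np

def track_patch(frame, patch):
    """Return the (row, col) in `frame` where `patch` matches best,
    by exhaustive sum-of-squared-differences search."""
    ph, pw = patch.shape
    fh, fw = frame.shape
    best_ssd, best_pos = np.inf, (0, 0)
    for r in range(fh - ph + 1):
        for c in range(fw - pw + 1):
            ssd = np.sum((frame[r:r + ph, c:c + pw] - patch) ** 2)
            if ssd < best_ssd:
                best_ssd, best_pos = ssd, (r, c)
    return best_pos
```

Running this per frame would give a shift measurement for each frame relative to the first, which is exactly what a shaky-camera scene needs; a real implementation would restrict the search window and use a faster correlation.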

I get that it's not going to be a walk in the park, but I'd really like to give it a shot. Are there any open source (and preferably somewhat simple) projects that deal with things remotely similar to this type of problem?
 