Minimization problem

veri745

Golden Member
Oct 11, 2007
1,163
4
81
I have a minimization problem I need to solve as accurately as possible. I think it's NP-complete, but I'm not entirely sure. I'd like to know if anyone has better ideas for solving it than I have, since my solution takes O(2^n) time, and that's not acceptable.

I have n vectors, each of which can be represented as a k-bit binary number. Each vector has a "weight" w.

I need to select vectors such that the result of an AND between all of the vectors meets a maximum density (there is a max number of 1's in the result) while minimizing w.

An exact solution would be to build a binary tree over the vectors such that each leaf represents one combination of included/excluded vectors, along with the sum of w for the included vectors. The problem is that building this tree takes O(2^n) time.

Is there some way to combine the density of each vector with its weight to come up with a combined ranking of each vector? Feel free to ask for clarification if the problem is unclear.
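For reference, the exhaustive O(2^n) enumeration described above can be sketched in a few lines, holding each bit vector as a Python int (the function and argument names here, like `max_ones` for the density cap, are chosen for illustration):

```python
from itertools import combinations

def brute_force(vectors, weights, max_ones):
    """Try every non-empty subset; return (best_weight, best_subset).

    vectors: list of ints used as bitmasks; weights: parallel list of costs.
    A subset is feasible when the AND of its vectors has at most
    max_ones one-bits. There are 2^n subsets, so this only works for small n.
    """
    n = len(vectors)
    best = (float("inf"), None)
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            acc = vectors[subset[0]]
            for i in subset[1:]:
                acc &= vectors[i]
            if bin(acc).count("1") <= max_ones:
                w = sum(weights[i] for i in subset)
                if w < best[0]:
                    best = (w, subset)
    return best
```

With n in the hundreds this is hopeless, which is exactly the OP's complaint.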
 
Sep 29, 2004
18,656
68
91
I ask for clarification because the problem is unclear.

What are you trying to do? Tell us what data you have and what you want to do with it. Your explanation confused me. You're probably trying to do something simple, but I don't get it.

After I press the post button, I will re-read your OP.

You said minimization problem. Then say some big O notation statement. Did you mean optimization?

EDIT:
you said:
I have a minimization problem I need to solve as accurately as possible.
Did you mean:
I have an optimization problem I need to solve as quickly as possible.
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
I think the limits need to be better explained. Which is it you need:

a) the highest density that is < maximum and the minimum weight

b) the minimum density and the minimum weight

Also, it would be nice to know if there is a relationship between weight and density.

Building a binary tree always takes a fairly long time; it's searching them that is fast. Anyway, I'm not sure a binary sort makes sense here, given that you want to order on two dimensions. There are structures better suited to n-dimensional sorts: BSP trees, octrees, and the like. Have you considered anything along those lines?
 

veri745

Golden Member
Oct 11, 2007
1,163
4
81
I'm trying to minimize w.

w is actually time, and the bits in each vector represent "hits" and "misses". Each vector represents a different trial. I'm trying to minimize the time to have "enough" of the targets hit. All of the variables are... well, variable... so I'd like to come up with a general algorithm.

I need to find a solution that is as close to optimal as possible, but it must be able to complete in reasonable time. By "as accurately as possible", I just mean I'd like to get as close to the ideal solution as I can: minimal time for adequate coverage of the targets.
 

CycloWizard

Lifer
Sep 10, 2001
12,348
1
81
This is a common problem in optimization, usually called a knapsack problem. It is generally stated as, "How can I maximize the value of the items in the knapsack subject to the constraint of the knapsack's capacity?" Since you have binary decision variables, there are a few methods you can use to solve the problem; which is best depends on how accurate the solution needs to be and on the size of the problem. You could use a branch and bound method (similar to the binary tree, except you make the search a little smarter, e.g. by packing higher-value items first and pruning branches that can't beat the best solution found so far). This is a deterministic method, which means it will always converge to the exact solution. You could also use a genetic algorithm, which would likely be very fast but is only a heuristic (i.e. even if it runs forever, there is no guarantee that it will converge to the exact solution). You could also use simulated annealing and many other methods, but this should be enough to point you in the right direction.
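A sketch of that branch-and-bound idea on this problem, assuming vectors are stored as Python ints and weights are non-negative (the function name and the particular pruning rules are illustrative, not a prescribed method):

```python
def branch_and_bound(vectors, weights, max_ones):
    """Exact include/exclude search with pruning; returns the minimum weight.

    Cheap vectors are tried first. A branch is cut when it already costs
    at least as much as the best complete solution found, or when AND-ing
    in every remaining vector still leaves more than max_ones one-bits.
    Worst case is still exponential, but pruning helps in practice.
    """
    order = sorted(range(len(vectors)), key=lambda i: weights[i])
    vs = [vectors[i] for i in order]
    ws = [weights[i] for i in order]
    n = len(vs)
    suffix = [-1] * (n + 1)              # -1 acts as an all-ones mask in Python
    for i in range(n - 1, -1, -1):
        suffix[i] = vs[i] & suffix[i + 1]
    best = [float("inf")]

    def dfs(i, acc, w):
        if w >= best[0]:
            return                       # bound: incumbent is already cheaper
        if acc >= 0 and bin(acc).count("1") <= max_ones:
            best[0] = w                  # feasible; adding vectors only costs more
            return
        if i == n or bin(acc & suffix[i]).count("1") > max_ones:
            return                       # infeasible even if we take all the rest
        dfs(i + 1, acc & vs[i], w + ws[i])   # branch: include vs[i]
        dfs(i + 1, acc, w)                   # branch: exclude vs[i]

    dfs(0, -1, 0)                        # acc starts as the all-ones mask
    return best[0]
```

The suffix-AND table gives a cheap lower bound on achievable density, which is where this beats plain enumeration.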

As far as combining the density/weight into your cost function, there are a variety of methods you could use for this as well. However, in my experience, the best way to do this is using the Courant-Beltrami penalty function, which is simply added on to the value of the cost function. I have taken an entire course on solving this sort of problem, so if you can't figure it out from this, feel free to ask more specific questions and I'll try to help.
 

veri745

Golden Member
Oct 11, 2007
1,163
4
81
Originally posted by: Markbnj
I think the limits need to be better explained. Which is it you need:

a) the highest density that is < maximum and the minimum weight

b) the minimum density and the minimum weight

Also, it would be nice to know if there is a relationship between weight and density.

Building a binary tree always takes a fairly long time. It's searching them that is fast.

I'm looking for the minimum weight that is within the density tolerance.
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
Originally posted by: veri745
Originally posted by: Markbnj
I think the limits need to be better explained. Which is it you need:

a) the highest density that is < maximum and the minimum weight

b) the minimum density and the minimum weight

Also, it would be nice to know if there is a relationship between weight and density.

Building a binary tree always takes a fairly long time. It's searching them that is fast.

I'm looking for the minimum weight that is within the density tolerance.

You beat my edit to the post button :).
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,697
4,658
75
Given what you've said, the solution would seem to be to select only the vectors that already meet the bit-density limit on their own, AND none of them with any others, and pick the one with the smallest w.

I suspect there is a constraint on having some minimum number of vectors ANDed together that you haven't told us about, or that I missed.

Edit: Did you mean OR them together, so you get complete coverage?
 

veri745

Golden Member
Oct 11, 2007
1,163
4
81
"Coverage" is actually represented here by the 0's, so complete coverage would be an all-zero vector. There is no minimum number of vectors, but it is assumed that each vector will be relatively dense (i.e. provide little coverage on its own).

The vectors will be quite large, with bit widths most likely in the high thousands or approaching tens of thousands.

n, the number of vectors, is expected to be in the hundreds.

Thanks, CycloWizard, for your input. I was at a loss for good search terms, and that should get me started.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Try a greedy algorithm:
1) Start with a vector of all 1's.
2) Rank all remaining vectors by added coverage / weight.
3) Add the vector with the best added coverage / weight ratio.
4) Repeat 2 & 3 until you have 'enough' coverage.

Edit: Make sure you're using your native hardware's word length appropriately. vector<bool> won't cut it.
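The greedy loop above, sketched in Python with arbitrary-precision ints standing in for the bit vectors (names are illustrative; `max_ones` is the density cap from the OP, and weights are assumed positive):

```python
def greedy_cover(vectors, weights, max_ones):
    """Greedy selection: best added-coverage-per-weight first.

    Start from an all-ones accumulator, repeatedly AND in the vector
    that clears the most remaining 1-bits per unit weight, and stop
    once the result is sparse enough. Fast, but only a heuristic.
    Returns (chosen indices, final AND).
    """
    acc = (1 << max(v.bit_length() for v in vectors)) - 1  # step 1: all 1's
    remaining = list(range(len(vectors)))
    chosen = []
    while bin(acc).count("1") > max_ones and remaining:
        def gain(i):                     # step 2: added coverage / weight
            cleared = bin(acc).count("1") - bin(acc & vectors[i]).count("1")
            return cleared / weights[i]
        pick = max(remaining, key=gain)
        if gain(pick) == 0:
            break                        # no remaining vector makes progress
        acc &= vectors[pick]             # step 3: take the best ratio
        chosen.append(pick)
        remaining.remove(pick)           # step 4: repeat until 'enough'
    return chosen, acc
```

In C++ the same idea would operate on whole machine words (e.g. blocks of `uint64_t` or `std::bitset`), per degibson's point about word length; per-bit containers like `vector<bool>` make the popcounts needlessly slow.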
 

CycloWizard

Lifer
Sep 10, 2001
12,348
1
81
Originally posted by: veri745
"coverage" is actually being represented here by the 0's, so complete coverage would be a zero vector. There is no "minimum number of vectors", but it is assumed that each vector will be relatively dense (little coverage)

The vectors will be quite large, bit widths most like in the high thousands or approching tens of thousands.

n, the number of vectors, is expected to be in the range of the hundreds.

Thanks, CycleWizard, for your input. I was at a loss for good terms to search, and that should get me started.
No problem. I have written some pretty efficient functions for these sorts of optimizations in MATLAB if you want to take a look at them. My genetic algorithm is very random and not very good, but it's very straightforward and you would at least be able to see the basic ideas involved. The other methods are much simpler and pretty efficient, at least by my standards, but I have no formal training in computer science, so I could be wrong. :p PM me your e-mail address if you want them.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,697
4,658
75
It looks like your problem is a weighted (meaning each vector has a different cost), partial (meaning you don't have to cover all the bits) set cover problem.

Do you really, really need the best answer, or will a good enough answer do? I assume a good answer will do, considering that you're optimizing for time and that finding the best answer would likely take more time than it would save.

Degibson's greedy algorithm will give you a good answer quickly. It may not be the best answer, but it's at least a good starting point. Maybe some math genius can go over this pdf and figure out what it means WRT approximation potentials. It looks to me like (1) greedy is likely to take at least twice as long as the most optimal set covering (but I don't know what that H means), (2) it's hard to find a better approximation, and (3) there is a limit to how good approximations can be.
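To see why greedy "may not be the best answer", here is a small hand-built instance (entirely illustrative) where the coverage-per-weight rule overpays; the greedy routine is restated inline so this snippet stands alone:

```python
def greedy_weight(vectors, weights, max_ones, width):
    """Coverage-per-weight greedy, as suggested earlier in the thread.

    vectors are ints, 0-bits count as covered; returns total weight spent.
    """
    acc = (1 << width) - 1
    remaining = list(range(len(vectors)))
    total = 0.0
    while bin(acc).count("1") > max_ones and remaining:
        def ratio(i):
            cleared = bin(acc).count("1") - bin(acc & vectors[i]).count("1")
            return cleared / weights[i]
        pick = max(remaining, key=ratio)
        if ratio(pick) == 0:
            break
        acc &= vectors[pick]
        total += weights[pick]
        remaining.remove(pick)
    return total

# A 4-bit universe needing full coverage (max_ones = 0):
# A clears bits 0-1, B clears bits 2-3; C is a decoy clearing bits 1-3.
vectors = [0b1100, 0b0011, 0b0001]   # A, B, C
weights = [1.0, 1.0, 1.4]
# Greedy grabs C first (3 bits / 1.4 beats 2 bits / 1.0), then still
# needs A for bit 0: total 2.4. The optimum is A + B at 2.0.
print(greedy_weight(vectors, weights, 0, 4))   # 2.4
```

The gap can be widened with bigger staircase instances, which is what the H (harmonic-number) factor in the approximation bound captures.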
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Bottom line: the OP thinks it might be NP-complete, and as I was reading his problem, the little part of my brain that fears NP problems was chirping like mad. So I'm leaning toward calling this problem NP-hard, too. CycloWizard points out that it is very close to the knapsack problem, which is NP-hard. If we're right, then approximate optimization in some form, be it simulated annealing, a genetic algorithm, greedy, or random search, is the only choice short of exhaustive search.
 

blahblah99

Platinum Member
Oct 10, 2000
2,689
0
0
I dealt with a similar problem once; it was NP-complete. I ended up building the algorithm around a neural network, with a lot of training on real-world data.

You might want to look into doing a genetic algorithm if computational time isn't a factor.
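A bare-bones genetic algorithm along those lines might look like this (a sketch only: the population size, generation count, penalty weight `mu`, and all names are arbitrary choices, not a tuned implementation):

```python
import random

def ga_minimize(vectors, weights, max_ones, pop=30, gens=200, mu=1000.0, seed=0):
    """Minimal GA over 0/1 selection strings for the subset problem.

    Fitness = total weight + mu * (excess 1-bits in the AND)^2, so
    infeasible individuals are allowed but penalized. Returns
    (best cost, best selection).
    """
    rng = random.Random(seed)
    n = len(vectors)

    def cost(bits):
        acc, w = -1, 0.0                 # -1 is the all-ones mask
        for b, v, wt in zip(bits, vectors, weights):
            if b:
                acc &= v
                w += wt
        if acc < 0:
            return float("inf")          # empty selection
        over = max(0, bin(acc).count("1") - max_ones)
        return w + mu * over * over

    popn = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=cost)
        elite = popn[: pop // 2]         # truncation selection: keep best half
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)    # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] ^= 1 # one-bit mutation
            children.append(child)
        popn = elite + children
    best = min(popn, key=cost)
    return cost(best), best
```

As CycloWizard noted, nothing here guarantees the exact optimum; it is a heuristic whose quality depends heavily on the parameters and the penalty term.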