I don't know why you bring up x87 when you mentioned SSE2 vs AVX2. I'm still waiting to hear the cases where AVX2 can give you a 20x speedup over General purpose x86 and 8x over SSE2.
Going from 32-bit scalar to 256-bit vector operations is potentially an 8x gain (for 64-bit ints, only 4x). Haswell with AVX2 offers 256-bit packed-integer multiply, add, shift, etc., which general-purpose x86 does not have (SSE2 and the original AVX only provide integer SIMD at 128 bits). Then there's the more nebulous potential of gather, which will still need some tuning (though with sufficiently high-level code, in languages like C++, maybe not much), but ideally won't require either several loads or pre-packed operands. For instance, it could load several red values from an array of packed RGB pixels with a single instruction, which should also mean only a single load per cache line. Also, FMA can offer up to a 2x speedup for any code currently doing FMUL+FADD, and reduces register writes and reads in the process.
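To make the gather and 256-bit integer points concrete, here is a minimal sketch that sums the red channel of packed 3-byte RGB pixels. It assumes GCC or Clang on x86 (the AVX2 path uses `__attribute__((target("avx2")))` and `__builtin_cpu_supports`, and is compiled out elsewhere), and the function names are hypothetical. One `_mm256_i32gather_epi32` fetches 4 bytes starting at each of 8 red bytes, a mask keeps the red byte, and one `_mm256_add_epi32` accumulates 8 32-bit lanes. This only shows the single-instruction form; how many memory accesses the CPU actually performs per gather is up to the implementation.

```c
/* Sketch: sum the red channel of packed RGB (3 bytes per pixel).
 * Assumes GCC or Clang; the AVX2 path is compiled only on x86.
 * Also assumes 3*n fits in an int and each 32-bit lane does not overflow. */
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#endif

/* Scalar reference: one byte load and one add per pixel. */
static uint64_t sum_red_scalar(const uint8_t *rgb, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += rgb[3 * i];              /* red is the first byte of each triplet */
    return sum;
}

#ifdef HAVE_X86
/* AVX2 path: one gather fetches 8 red bytes (plus neighbouring bytes),
 * one 256-bit add accumulates 8 x 32-bit lanes. */
__attribute__((target("avx2")))
static uint64_t sum_red_avx2(const uint8_t *rgb, size_t n)
{
    const __m256i step    = _mm256_setr_epi32(0, 3, 6, 9, 12, 15, 18, 21);
    const __m256i lowbyte = _mm256_set1_epi32(0xFF);
    __m256i acc = _mm256_setzero_si256();
    size_t i = 0;

    /* Each lane reads 4 bytes starting at a red byte, so keep at least one
     * pixel for the scalar tail; that keeps every read inside the buffer. */
    for (; i + 8 < n; i += 8) {
        __m256i idx = _mm256_add_epi32(step, _mm256_set1_epi32((int)(3 * i)));
        __m256i px  = _mm256_i32gather_epi32((const int *)rgb, idx, 1);
        acc = _mm256_add_epi32(acc, _mm256_and_si256(px, lowbyte));
    }

    uint32_t lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    uint64_t sum = 0;
    for (int k = 0; k < 8; k++)
        sum += lanes[k];
    for (; i < n; i++)                  /* scalar tail */
        sum += rgb[3 * i];
    return sum;
}
#endif

/* Use the AVX2 path only when the CPU reports AVX2 support. */
uint64_t sum_red(const uint8_t *rgb, size_t n)
{
#ifdef HAVE_X86
    if (__builtin_cpu_supports("avx2"))
        return sum_red_avx2(rgb, n);
#endif
    return sum_red_scalar(rgb, n);
}
```

The scalar tail and the runtime check are the design choices that keep it safe: the gather over-reads one byte past the last red value it touches, and the binary still runs on pre-Haswell CPUs.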
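Add to that the fact that some of the speedups come from Haswell v. IVB, not merely from AVX2. For instance, HSW has two FMA units, which doubles AVX FMUL throughput (FADD still runs on only one port, so it stays at one per cycle), and for FMA-friendly code roughly doubles peak FLOPs v. IVB (two 8-wide single-precision FMAs per cycle is 32 FLOP/cycle, v. 16 from IVB's one FMUL plus one FADD), on top of AVX2 itself as an ISA extension. IVB has one FMUL and one FADD unit, and no FMA. I'm not sure how SSE code gets scheduled on these units, though, so the real benefits may only show up with AVX and AVX2.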
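As a sketch of what "FMUL+FADD becomes one FMA" looks like in code, here is `y[i] = a * x[i] + y[i]` written both ways. It assumes GCC or Clang on x86 and FMA3 (which arrived alongside AVX2 on Haswell); the function names are hypothetical, and the runtime check falls back to the plain loop on CPUs without FMA.

```c
/* Sketch: y[i] = a * x[i] + y[i], plain multiply+add v. AVX + FMA3.
 * Assumes GCC or Clang; the FMA path is compiled only on x86. */
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#endif

/* Scalar reference: written as a multiply and a separate add
 * (the compiler may still fuse them if FMA is enabled for the whole file). */
static void axpy_scalar(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

#ifdef HAVE_X86
/* 8 floats per instruction; each _mm256_fmadd_ps replaces a separate
 * multiply and add, and rounds only once. */
__attribute__((target("avx,fma")))
static void axpy_fma(float a, const float *x, float *y, size_t n)
{
    const __m256 va = _mm256_set1_ps(a);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        _mm256_storeu_ps(y + i, _mm256_fmadd_ps(va, vx, vy));
    }
    for (; i < n; i++)                  /* scalar tail */
        y[i] = a * x[i] + y[i];
}
#endif

/* Use the FMA path only when the CPU reports AVX and FMA support. */
void axpy(float a, const float *x, float *y, size_t n)
{
#ifdef HAVE_X86
    if (__builtin_cpu_supports("avx") && __builtin_cpu_supports("fma")) {
        axpy_fma(a, x, y, n);
        return;
    }
#endif
    axpy_scalar(a, x, y, n);
}
```

Because FMA rounds once instead of twice, results can differ in the last bit from the mul+add version for general inputs; with small integer values like those in a quick test, both are exact.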
For code that already works with SSE2 or AVX, there may be a very low upper limit (2x-4x plus whatever gather adds, only the gather gains, or nothing at all). For code that could be vectorized in theory, and works well that way on other ISAs, but hasn't been vectorized on x86, it depends on how pessimal the scalar x86 version is compared with how efficient an AVX2 implementation can be, and/or how easily the code can be made to use AVX2 well. With OoOE and speculative loads and stores, pinning down a single broad figure is all but impossible; "at least" and "up to" are as good as it will get until the chips become widespread and applications can be tested.
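Since the honest answer is "measure it", here is a minimal timing sketch for comparing a baseline kernel against a vectorized candidate on the machine at hand. It uses standard C `clock()` (CPU time) for portability, so resolution is coarse; `best_time`, `measured_speedup`, `sum_ctx` and `sum_kernel` are hypothetical names, and the sum kernel is only a placeholder for real work.

```c
/* Sketch: measure a speedup on this machine instead of quoting one.
 * Uses standard C clock() (CPU time): portable, but coarse. */
#include <stddef.h>
#include <time.h>

typedef void (*kernel_fn)(void *ctx);

/* Run kernel `reps` times (reps >= 1); return the fastest run in seconds. */
double best_time(kernel_fn kernel, void *ctx, int reps)
{
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        clock_t t0 = clock();
        kernel(ctx);
        double s = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}

/* Ratio base/candidate; only meaningful for this CPU, data and build. */
double measured_speedup(kernel_fn base, kernel_fn candidate, void *ctx, int reps)
{
    double b = best_time(base, ctx, reps);
    double c = best_time(candidate, ctx, reps);
    return c > 0.0 ? b / c : 0.0;       /* 0.0 means "too fast to resolve" */
}

/* Hypothetical placeholder kernel: sum an int array into ctx->result. */
struct sum_ctx { const int *data; size_t n; long long result; };

void sum_kernel(void *p)
{
    struct sum_ctx *c = p;
    long long s = 0;
    for (size_t i = 0; i < c->n; i++)
        s += c->data[i];
    c->result = s;
}
```

Taking the best of several runs filters out some scheduling noise, but the number is still specific to one chip, one data set and one compiler build.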
As for x87: I brought it up because SSE2 is often faster than x87 for scalar FP, since x87 is stack-based (its registers are indexable relative to the top of the stack, but it's a stack nonetheless), while SSE has named registers. Binaries built with MS' compiler, for instance, may use both for scalar FP arithmetic.
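One way to see the stack-v.-named-registers difference is to compile the same scalar loop both ways and compare the assembly. A minimal sketch: the GCC flags in the comment (`-m32`, `-mfpmath=387`, `-mfpmath=sse`, `-msse2`, `-S`) are real, but the exact instructions emitted vary by compiler version, so treat the comment as what to look for rather than guaranteed output. MSVC's 32-bit `/arch:IA32` and `/arch:SSE2` switches play a similar role.

```c
#include <stddef.h>

/* A plain scalar double-precision dot product. Compile with, e.g.,
 *   gcc -O2 -m32 -mfpmath=387 -S dot.c
 *     -> x87 code: loads, multiplies and adds via the register stack (fld/fmul/faddp variants)
 *   gcc -O2 -m32 -msse2 -mfpmath=sse -S dot.c
 *     -> SSE2 code: mulsd/addsd on named xmm registers
 * and compare the two .s files. */
double dot(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}
```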