I actually said the exact same thing Tuesday at
[H] (can't go back in time)
That actually is an area where NEON is a bit better, or at least more specifically optimized for RGB. Its vector loads and stores interleave and de-interleave RGB seamlessly, where SSE would require shuffles and blends. NEON also has the equivalent vzip and vuzp instructions to pack and unpack alternating data quickly.
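To make the RGB point concrete, here is a rough sketch of splitting packed RGB bytes into three planes. The function name `rgb_deinterleave` and the buffer layout are my own assumptions for illustration, not anything from a real codebase. On a NEON build the whole de-interleave is a single `vld3q_u8` load; on anything else the sketch falls back to a scalar loop, which is also what handles the leftover pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: split packed RGBRGB... bytes into separate
 * R, G and B planes. n is the number of pixels, so rgb holds 3*n bytes. */
void rgb_deinterleave(const uint8_t *rgb, uint8_t *r, uint8_t *g,
                      uint8_t *b, size_t n)
{
    assert(rgb && r && g && b);
    size_t i = 0;

#if defined(__ARM_NEON)
    /* vld3q_u8 reads 48 bytes and hands back three 16-lane registers,
     * already separated by channel. No shuffles needed. */
    for (; i + 16 <= n; i += 16) {
        uint8x16x3_t px = vld3q_u8(rgb + 3 * i);
        vst1q_u8(r + i, px.val[0]);
        vst1q_u8(g + i, px.val[1]);
        vst1q_u8(b + i, px.val[2]);
    }
#endif

    /* Scalar path: the tail on NEON, everything on other targets. */
    for (; i < n; ++i) {
        r[i] = rgb[3 * i + 0];
        g[i] = rgb[3 * i + 1];
        b[i] = rgb[3 * i + 2];
    }
}
```

The reverse direction is the same idea with `vst3q_u8`. An SSE version of the vector loop would need a few byte shuffles and blends per 48 bytes to get the same result.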
Please take other people's word as the best available information.
I often wonder how ARM's 256-bit SIMD will differ from AVX. Laning has its advantages as well as its annoyances. There were a lot of growing pains in the early days of Intel SIMD. ARM will have to go through the same process to reach parity.
Different architectures have different trade-offs, which is what most of my arguments against the "IPC is greater" claim expand on. However, IMO there is at most a small set of operations where a simple, efficient core can outperform a monolithic x86.
When full reviews are made, I hope I am pleasantly surprised.