MIDI files are pretty basic and fairly standard too. There are really two kinds of audio files: ones that store actual waveform data (uncompressed like a .wav, or compressed like an .mp3) and ones that store instructions for creating the waveforms (like MIDI files).
Given that, a MIDI file can't reproduce music with anywhere near the fidelity of a waveform file. That's also obvious from the file sizes: a typical MP3 of an average song from a CD could be anywhere from 3-10MB, depending on quality settings. MIDI files are usually about two orders of magnitude smaller, which is what you need given your size constraint.
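To put rough numbers on that gap, here's a back-of-the-envelope sketch. The CD audio figures (44.1 kHz, 16-bit, stereo) are standard, but the 4-bytes-per-note figure and the 1000-note song are just assumptions I picked for illustration, not anything from a real format.

```python
# Rough size comparison: raw waveform data vs. a plain list of note events.
SAMPLE_RATE = 44_100      # CD audio: samples per second, per channel
CHANNELS = 2              # stereo
BYTES_PER_SAMPLE = 2      # 16-bit samples

def pcm_bytes(seconds):
    """Size of uncompressed CD-quality audio for the given duration."""
    return seconds * SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE

def note_list_bytes(note_count, bytes_per_note=4):
    """Size of a note list. ASSUMPTION: 4 bytes per note
    (e.g. 2 for frequency, 2 for duration)."""
    return note_count * bytes_per_note

three_minutes = 180
print(pcm_bytes(three_minutes))   # 31752000 bytes, roughly 32MB uncompressed
print(note_list_bytes(1000))      # 4000 bytes for a hypothetical 1000-note tune
```

MP3 compression closes some of that gap, which is how you end up in the 3-10MB range, but stored instructions are still far smaller.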
All that being said, I'd suggest a simple custom file format. All you need in order to play music is a sequence of frequencies and durations, assuming you only have one instrument playing and only need one tone at a time. You've already stated that you have a working device based on that, so the next step would be to allow multiple instruments to play at the same time. Depending on your device (assuming it's hardware based), you would be limited to whatever the audio processor on the board can do. You can see where I'm going here: it quickly gets complex. In the end, coding a MIDI file interpreter will be complex, but so will writing a custom format.
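For the single-instrument case, a minimal sketch of such a format could look like the following. The layout is entirely made up for illustration (two unsigned 16-bit little-endian values per note, with a frequency of 0 meaning a rest), and `encode_song` / `decode_song` are hypothetical names; on the device you'd do the same unpacking in whatever language the firmware uses.

```python
import struct

# HYPOTHETICAL layout: each note is two unsigned 16-bit little-endian values:
# frequency in Hz (0 = rest), then duration in milliseconds.
NOTE = struct.Struct("<HH")

def encode_song(notes):
    """Pack a list of (frequency_hz, duration_ms) pairs into bytes."""
    return b"".join(NOTE.pack(freq, ms) for freq, ms in notes)

def decode_song(data):
    """Unpack bytes back into a list of (frequency_hz, duration_ms) pairs."""
    if len(data) % NOTE.size != 0:
        raise ValueError("truncated song data")
    return [NOTE.unpack_from(data, offset)
            for offset in range(0, len(data), NOTE.size)]

# C, D, E, a short rest, then a longer C (frequencies rounded to whole Hz).
song = [(262, 400), (294, 400), (330, 400), (0, 200), (262, 800)]
blob = encode_song(song)

print(len(blob))                  # 20 bytes for 5 notes
print(decode_song(blob) == song)  # True
```

At 4 bytes per note this stays tiny, but notice there's no room in it for a second voice or an instrument number. Adding those is where it starts to look like a cut-down MIDI.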
My final advice would be to lay out exactly which features you want from the player, then decide whether it will be easier to write something from scratch or to write a simple MIDI file parser.
One more note: most older phones, and still many newer ones, use MIDI files for their ringtones.