API integration - invalid XML and hex characters

etherealfocus

Senior member
Jun 2, 2009
488
13
81
We hired a programmer to do a BigCommerce API integration for us at work (importing a vendor's products into BC). He's telling us that he's having trouble with the vendor's API returning invalid XML and hex characters (I believe the hex for a space is the primary offender). I asked the vendor and they said it should be easy enough to write the program so it just ignores those problems and keeps running; the programmer says that causes other problems and screws up product data.

The latest response I got from our vendor is the following:

"I can help with questions directly regarding our web services, but I will be unable to provide programming support. There are simply too many types of programming languages out there for us to try and support them. If your programmer still has questions in regards to our web services (how to get this information, or which web service provides what information), I can help with those.

Even though it’s our policy not to provide programming support, I would still definitely help if I knew the answer. I am not a programmer, so unfortunately I don’t know how to get around invalid characters.

Our web services return raw XML with no formatting. This means that whatever program your programmer is using to receive that XML is what is throwing the errors. Whether it’s custom or third-party software, this is what needs to be corrected to avoid the stops thrown by the invalid characters."

Our programmer is unfortunately in New York and currently out of commission with the hurricane, so I'm trying to have a solution ready for him whenever he's back on the radar. Unfortunately I'm not a programmer either, so I'm having trouble figuring out who's actually correct here.

The vendor has a fair point that their API is used by lots of companies and apparently nobody else is complaining, so I'm thinking the probable issue is that our programmer just isn't skilled at dealing with this kind of problem.

Can you guys shed some light on this? Is there a reasonable solution I can pass to our programmer?
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
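In valid (strictly, well-formed) XML, some characters in element text and attribute values must be escaped; < and & are the main ones (written &lt; and &amp;), and > is usually escaped as &gt; too.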
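For a concrete picture of what escaping looks like, here is a small Python sketch using the standard library's xml.sax.saxutils.escape, which replaces &, < and > with entities (the sample product text is made up):

```python
from xml.sax.saxutils import escape, unescape

# Made-up product text that will be placed inside an XML element.
raw = 'Fish & Chips <Large> 12" tray'

escaped = escape(raw)  # replaces &, < and > (quotes are left alone by default)
print(escaped)  # Fish &amp; Chips &lt;Large&gt; 12" tray
print(unescape(escaped) == raw)  # True
```

If the vendor's feed contains the left-hand form (a bare & or < inside text), any strict XML parser is right to reject it.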

There is also the question of the character encoding, usually either ASCII / ISO-8859-1 (single-byte) or UTF-8 (multibyte).
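One quick way to test the encoding question on a saved response is to try a strict UTF-8 decode and see where it fails. A minimal Python sketch, assuming the response bytes are already in hand (first_invalid_utf8_offset is a hypothetical helper name, and the "Café" sample is made up):

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset where strict UTF-8 decoding fails, or None if it all decodes."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as err:
        return err.start
    return None


print(first_invalid_utf8_offset("Café".encode("utf-8")))    # None
print(first_invalid_utf8_offset("Café".encode("latin-1")))  # 3
```

If the XML declaration at the top says encoding="UTF-8" but this returns an offset, the bytes are probably in a single-byte encoding like ISO-8859-1, and either the vendor's declaration or your decoding is wrong.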

The questions are
- is the raw XML that they return valid or not?
- if it's not valid, can you show them a specific example and get them to fix it?
- (if they won't fix it, can you work around the bugs?)
- if it is valid, can you fix your own code to deal with it?

If this was my project, the first step would be to save one of their "bad" responses to a log file exactly as received, before your code does any parsing.
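A minimal Python sketch of that logging step; save_raw_response and the api_logs directory are hypothetical names, and how the body is fetched depends on your developer's HTTP code:

```python
import time
from pathlib import Path


def save_raw_response(body: bytes, log_dir: str = "api_logs") -> Path:
    """Write the untouched response bytes to a uniquely named file and return its path."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"response_{time.time_ns()}.xml"
    path.write_bytes(body)  # bytes, not text: no decoding or re-encoding happens
    return path


# Typical use, right after the HTTP call returns (API_URL is a placeholder):
#   import urllib.request
#   body = urllib.request.urlopen(API_URL).read()
#   save_raw_response(body)
```

Writing bytes rather than text matters here: decoding and re-encoding can hide or create exactly the characters you are trying to investigate.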

Open this in other tools, like IE's XML viewer or anything else that can show XML errors. Does it pass that check?
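Python's standard library can do the same kind of check. A minimal sketch (check_well_formed is a hypothetical helper name) that returns the line and column where the parser gives up; note it only checks well-formedness, not a schema:

```python
import xml.etree.ElementTree as ET


def check_well_formed(data: bytes):
    """Return None if the XML parses, otherwise (line, column, message) from the parser."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


print(check_well_formed(b"<name>Fish &amp; Chips</name>"))  # None
print(check_well_formed(b"<name>Fish & Chips</name>"))      # not None: the bare & is an error
```

Running this over the saved "bad" response gives a precise location to quote back to the vendor.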
 

etherealfocus

Thanks Dave. To answer your questions:

- I don't know the encoding but I'll send a note to the vendor and ask.
- The vendor acknowledges that their XML is invalid. They might be willing to fix the invalid hex characters but they won't fix the invalid XML. Their take is that tons of other companies are using it and we're the only ones complaining. They think it should be easy to just work around it. I assume it's possible to work around it since apparently everyone else is, but I don't think our programmer is horribly experienced at dealing with invalid XML and is having some trouble with it.

We've already sent in a couple examples of invalid stuff. I believe they fixed some of the bad hex characters but again they're not gonna deal with the XML. They've got around 30k products - I guess their rationale is that validating everything would be too much work? I dunno.

Also, not sure what this means, but the vendor recommended that we use html encoding instead of url encoding. Or was it the programmer asking about that? Can't remember, but maybe that'll help?

Thanks again!
 

DannyBoy

Diamond Member
Nov 27, 2002
8,820
2
81
www.danj.me

Can you post or DM some examples of this invalid XML data?

If you want, pass it through xmllint and post the error messages and line references you get instead of posting the whole file.
 

etherealfocus

No prob. I just asked our programmer to send us the unfinished app for testing purposes. Unfortunately he's in NY and got nailed by the hurricane, but he's gonna try and get it taken care of today.
 

DaveSimmons

> We've already sent in a couple examples of invalid stuff. I believe they fixed some of the bad hex characters but again they're not gonna deal with the XML. They've got around 30k products - I guess their rationale is that validating everything would be too much work? I dunno.

Making changes could possibly break the fix-ups that other partners have in their code, so I understand that.

> Also, not sure what this means, but the vendor recommended that we use html encoding instead of url encoding. Or was it the programmer asking about that? Can't remember, but maybe that'll help?

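HTML escape = &#nnn; (nnn = a decimal character code, which can be >255 for characters outside Latin-1; &#xhh; is the hex form, and there are named ones like &amp;). Urlencode = %ff (two hex digits per byte, such as %2B = + and %7B = {; a multibyte UTF-8 character becomes several %xx groups).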

Urlencode normally uses a + for spaces (in form-encoded data), but %20 is also valid.
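To make the two styles concrete, a short Python sketch with the standard library (html_numeric_ref is a hypothetical helper, and "Blue Widget" is a made-up product name):

```python
import html
from urllib.parse import quote, quote_plus, unquote_plus


def html_numeric_ref(ch: str) -> str:
    """Decimal numeric character reference (&#NNN;) for a single character."""
    return f"&#{ord(ch)};"


# HTML/XML-style escaping
print(html_numeric_ref("é"))        # &#233;
print(html_numeric_ref("€"))        # &#8364;  (code point above 255)
print(html.escape("Fish & Chips"))  # Fish &amp; Chips

# URL-style (percent) encoding
print(quote("+"))                        # %2B
print(quote("{"))                        # %7B
print(quote("é"))                        # %C3%A9  (one %XX per UTF-8 byte)
print(quote("Blue Widget"))              # Blue%20Widget
print(quote_plus("Blue Widget"))         # Blue+Widget
print(unquote_plus("Blue+Widget%20XL"))  # Blue Widget XL
```

So a space can show up as &#32; or &#x20; in HTML/XML-escaped text and as %20 or + in URL-encoded text; it is worth checking which form the vendor's data actually contains before choosing a decoder.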

If your developer is using an XML library and/or SOAP library to work with the API, they might need to write extra post-processing and pre-processing code to fix up what is sent and received.
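As an example of that kind of pre-processing, here is a minimal Python sketch that repairs two common defects before parsing: characters XML 1.0 does not allow at all (raw control characters, or references to them like &#x1;), and bare & signs. Which defects the vendor's feed really has is an assumption, since nobody in this thread has seen the bad responses yet, and clean_xml_text is a hypothetical helper name. It works on already-decoded text and it deletes data, so logging the before/after of any record it changes would be wise.

```python
import re
import xml.etree.ElementTree as ET

# Raw characters XML 1.0 never allows: most C0 control characters,
# lone surrogates, and U+FFFE/U+FFFF.
ILLEGAL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\uD800-\uDFFF\uFFFE\uFFFF]")

# Numeric character references such as &#x1F; or &#31;.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

# An "&" that does not start one of XML's five predefined entities or a numeric
# reference. Assumption: other named entities (e.g. &nbsp;) are not expected;
# they get escaped here and so come through as literal text.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")


def is_legal_xml_code_point(cp: int) -> bool:
    """The Char production from the XML 1.0 specification."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def drop_illegal_ref(match: re.Match) -> str:
    """Keep a numeric reference only if it points at a legal XML character."""
    hex_digits, dec_digits = match.groups()
    cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
    return match.group(0) if is_legal_xml_code_point(cp) else ""


def clean_xml_text(text: str) -> str:
    """Best-effort repair of decoded XML text before handing it to a parser."""
    text = ILLEGAL_CHARS.sub("", text)
    text = CHAR_REF.sub(drop_illegal_ref, text)
    return BARE_AMP.sub("&amp;", text)


# Usage with a made-up fragment containing all three problems:
raw = "<product><name>Fish & Chips</name><note>bad&#x1;ref\x02</note></product>"
root = ET.fromstring(clean_xml_text(raw))
print(root.findtext("name"))  # Fish & Chips
```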

If they don't know how to do that and the API is small, it might be easier to bypass the SOAP or XML libraries and write custom code to create requests and parse responses.
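If the feed is too broken for any parser, the custom-code route can be as simple as pulling out the fields you need with regular expressions. A rough Python sketch, assuming the fields are simple elements that are never nested inside themselves; extract_fields and the product/sku/name tags are hypothetical, not the vendor's real schema:

```python
import html
import re


def extract_fields(xml_text: str, tag: str) -> list[str]:
    """Return the text of every <tag ...>...</tag> pair, with entities decoded.

    Assumes the element is never nested inside itself; no real XML parsing happens.
    """
    name = re.escape(tag)
    pattern = re.compile(rf"<{name}\b[^>]*>(.*?)</{name}\s*>", re.DOTALL)
    # html.unescape decodes &amp;, &#233;, &#x20; etc. and leaves a bare "&" alone.
    return [html.unescape(m.group(1)).strip() for m in pattern.finditer(xml_text)]


feed = (
    "<products>"
    "<product><sku>A1</sku><name>Fish & Chips</name></product>"
    "<product><sku>B2</sku><name>Bolts &amp; Nuts</name></product>"
    "</products>"
)
print(extract_fields(feed, "sku"))   # ['A1', 'B2']
print(extract_fields(feed, "name"))  # ['Fish & Chips', 'Bolts & Nuts']
```

This is fragile (it ignores nesting, CDATA and comments), so it makes sense as a fallback for responses a real parser rejects, not as a replacement for one.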

You can create XML or SOAP messages without using a canned library, using standard code to build text strings.
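A minimal Python sketch of building a SOAP 1.1 request as a plain string. The envelope namespace is the standard SOAP 1.1 one; the GetProduct operation and Sku parameter are made up, and a real service will usually also want its own namespace on the operation element and a SOAPAction HTTP header, as described in its WSDL:

```python
from xml.sax.saxutils import escape

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_request(operation: str, params: dict[str, str]) -> str:
    """Build a SOAP 1.1 envelope by string concatenation.

    Element names are inserted as-is, so they must already be valid XML names;
    only the values are escaped.
    """
    body = "".join(f"<{name}>{escape(value)}</{name}>" for name, value in params.items())
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<soap:Envelope xmlns:soap="{SOAP_ENV_NS}">'
        f"<soap:Body><{operation}>{body}</{operation}></soap:Body>"
        "</soap:Envelope>"
    )


request_xml = build_request("GetProduct", {"Sku": "A&B<1>"})
print(request_xml)
```

Since the declaration says UTF-8, the string should be sent as request_xml.encode("utf-8"); escaping the values is the one step that cannot be skipped when building XML by hand.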