• We’re currently investigating an issue related to the forum theme and styling that is impacting page layout and visual formatting. The problem has been identified, and we are actively working on a resolution. There is no impact to user data or functionality, this is strictly a front-end display issue. We’ll post an update once the fix has been deployed. Thanks for your patience while we get this sorted.

JSON feed / most recently added doc in a CouchDB using PHP?

scootermaster

Platinum Member
So, here's the drill:

I have this URL that gives me JSON, but it's not guaranteed to be new JSON. I want to grab it and stick it in a CouchDB. Easy enough. But I obviously don't want duplicates.

I've looked at a few posts on StackExchange, but I was hoping for a little more clarity.

The JSON feed itself has an [increasing] burnedID (not the CouchDB _id) for each entry, which is handy (although the server's implementation of this is finicky, but that's their issue, not mine). I also looked in to adding a field via microtime(true), which seems to give me a big number that always increases.

But the actual implementation is sort of escaping me. I know I want to create a view -- which I can't seem to find a way of doing in PHP, since they have to be in JS, right? -- that gives me, say, the burnedIDs in decreasing order, look at the last one, and then only add the [new] incoming JSON if they're greater than that. Or does the view just give me the _ids of the documents in the CouchDB which match, and I then have to use those _ids to get the burnedID, and then check?

Am I overthinking this? Is there an easier way of doing this? I can't imagine I'm the first person to experience this, right?

Any help would be greatly appreciated, and let me know if you need more information.
 
It might help if you explain in more detail which servers are supposed to do what.

Do you have your own "server A" that is fetching JSON data from another "server B", then storing it, in order to buil a page response to send to "browser client C"?
 
It might help if you explain in more detail which servers are supposed to do what.

Do you have your own "server A" that is fetching JSON data from another "server B", then storing it, in order to buil a page response to send to "browser client C"?

No problem. Thanks for the tip.

Server A produces JSON, exposes it via a URL.

My Server, B, queries this URL, grabs JSON, via PHP.

I want to stick this in a CouchDB, also running on B, via PHP.

Eventually, there will be a client, C, that uses a browser to view the JSON stored on B, after B does some searching/sorting/etc. but that's in the future. This will probably be done with DataTables or JqGrid, but again, that's an entirely separate issue.

The issue is that the way A works is it gives you the last, say, 200 entries. If I query it on Monday, and get 200 entries, and then query it Tuesday and get 200 more, it's possible only 50 of those are "new", so I don't want to stick the other 150 that are replicated in the DB. What's the best way of going about this, using PHP and CouchDB/JSON.

Thanks!
 
> The JSON feed itself has an [increasing] burnedID

Does each record in the JSON have an ID or timestamp, or just the set of 200?

If each record has an ID or timetamp there are 2 simple options, do either one before the JSON fetch:

a) query your DB for the MAX (JSON-ID) or newest timestamp. One way to do this for timestamps or some strings is order-by, descending, limit 1

b) keep a second table with just one row where you store the last ID or timestamp that was read in the previous JSON fetch

Given the latest ID or time from a) or b), you do the new fetch and discard any records that are less than or equal to that ID or time. If using b) you then store the new latest ID or timestamp for next time.
 
> The JSON feed itself has an [increasing] burnedID

Does each record in the JSON have an ID or timestamp, or just the set of 200?

If each record has an ID or timetamp there are 2 simple options, do either one before the JSON fetch:

a) query your DB for the MAX (JSON-ID) or newest timestamp. One way to do this for timestamps or some strings is order-by, descending, limit 1

b) keep a second table with just one row where you store the last ID or timestamp that was read in the previous JSON fetch

Given the latest ID or time from a) or b), you do the new fetch and discard any records that are less than or equal to that ID or time. If using b) you then store the new latest ID or timestamp for next time.

Each record in the JSON has an ID provided via the URL server.

Yeah, I think I was going to do a). I just need a little help with the whole view/map/emit thing with CouchDB and PHP. I can handle the JSON fetch, the decode/array convert and whatever other manipulation I need. It's the DB stuff that's confusing me.

I'm looking at some hints provided here: http://stackoverflow.com/questions/2923063/searching-by-key-in-apache-couchdb to give me some background, but any help would be appreciated. Let's say the JSON returned by the URL is like this:

Code:
[
   {
    "AuthorName": "Thomson",
    "CreatedOn": "2013-07-23T22:21:51.822+00:00",
     "SystemID: "101"
  },
  {
    "AuthorName": "Johnson",
    "CreatedOn": "2013-07-23T22:22:51.822+00:00",
     "SystemID: "102"
  }
]

(i.e. that's what it's the CouchDB at this moment)

Any tips on how to set up the view/query for the system ID?

Because once I get that, then I can easily check the records in the incoming JSON stream to see if their SystemID fields are greater, etc.

Thanks!
 
Does it support SQL?

SELECT MAX(SystemID) FROM tablename

-or-

SELECT SystemID
FROM tablename
ORDER BY SystemID DESC
LIMIT 1;
 
Last edited:
Ya, this sounds like a database query issue on A, not a JSON issue. Unless I am missing something. JSON is just a transport mechanism much like XML except alot of languages like PHP handle JSON natively.
 
Does it support SQL?

SELECT MAX(SystemID) FROM tablename

-or-

SELECT SystemID
FROM tablename
ORDER BY SystemID DESC
LIMIT 1;


Ya, this sounds like a database query issue on A, not a JSON issue. Unless I am missing something. JSON is just a transport mechanism much like XML except alot of languages like PHP handle JSON natively.

Uh, yeah, that's my problem.

a). If this was MySQL, then this would be trivial, but it's CouchDB, which has it's own query structure that I can't quite wrap my head around.

b). And yes, the issue isn't JSON itself. I know what JSON does. The issue is that I'm sort of clueless with the CouchDB query -- or they actually call it Map and view and emit and etc. etc. -- structure. I was hoping someone more knowledgeable than I could help out.
 
Last edited:
So your main question is how to query a table in CouchDB, without using SQL?

I can't with help that, but if there is no search similar to what I suggested using SQL, is there a reason not use my method b) -- store the last ID / timestamp to a second table with just 1 row? If there is only 1 row that you keep updating, you just fetch and update that one row over and over.
 
So your main question is how to query a table in CouchDB, without using SQL?

I can't with help that, but if there is no search similar to what I suggested using SQL, is there a reason not use my method b) -- store the last ID / timestamp to a second table with just 1 row? If there is only 1 row that you keep updating, you just fetch and update that one row over and over.

Well, there's a more fundamental issue here, which is that neither I -- nor you apparently, nor the other guy who commented here -- has any idea how to run queries on a CouchDB. Nobody mentioned views and maps/reduce, etc. etc. I was hoping to get some hints in that direction.

Anyway, I sorted it out. I set up a View to do the following:

Code:
function(doc) {
  if (doc.systemID) {
     emit(doc.systemID);

  }
}

Then query the following URL:
Code:
http://myserver.com/my-db/_design/systemID/_view/Get%20systemID?descending=true&limit=1

This returns the largest systemID currently in the database. Well, technically, it returns a JSON document with one record that eventually contains the key matching the largest systemID in the database.

I then query my URL on the server, and iterate through the JSON feed, only adding entries that are greater than this number.

So that's that.

Thanks for all the help.
 
Back
Top