Programming help - need to get data off SEC website

petesamprs

Senior member
Aug 2, 2003
I'm looking for some help to do the following:
Go to the SEC website every day, identify any new 10Ks or 10Qs that were released that day, and then output any 10Ks or 10Qs that contain a specific search term I'm looking for.

The SEC website actually has a page dedicated to "latest filings", which lists any new filings for that day. The URLs for identifying 10Ks and 10Qs for any given day are:

10Q:
http://sec.gov/cgi-bin/browse-edgar?com...er=include&count=100&action=getcurrent

10K:
http://sec.gov/cgi-bin/browse-edgar?com...er=include&count=100&action=getcurrent

That's the easy part. I want to see if there is a way to search the returned filings for some specified text. If it finds it, download the HTML file; otherwise, skip it and go on to the next one.

My wife, who's a software developer, says it's pretty doable, but she's too busy to do it. And she said something about Perl. What do you guys think?
 

xtknight

Elite Member
Oct 15, 2004
Here's what the URL looks like (this particular one is 10-K): http://\sec.gov/cgi-bin/browse-edga...e=10-k&SIC=&State=&CIK=&owner=include&accno=&start=10&count=10

(I put a \ after http: so this forum doesn't shorten that URL.)

Increment start= by count= to get to the next page; so for the next page, go to start=20. The site supports count=100, so just increase start by 100 until you get a page where no results are found. start=0 is the first page no matter what count= is.
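That paging scheme can be sketched as a small URL builder. This is a hypothetical Ruby helper (the thread's other scripting suggestion was Perl); the base URL and parameter names are assumed from the links quoted in this thread:

```ruby
# Build the EDGAR "latest filings" URL for a given results page.
# start= advances by count= per page, and start=0 is always the
# first page regardless of the page size.
BASE_URL = "http://sec.gov/cgi-bin/browse-edgar"

def edgar_page_url(form_type, page, per_page = 100)
  start = page * per_page
  "#{BASE_URL}?action=getcurrent&type=#{form_type}" \
    "&owner=include&count=#{per_page}&start=#{start}"
end
```

So page 0 requests start=0, page 1 requests start=100, and so on, until a page comes back with no results.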

What text do you want to search for? Is the text actually in the body of the individual results themselves or is it on the returned search page (like the title of the items)?

Perl would actually be a good way to do this.
 

EagleKeeper

Discussion Club Moderator / Elite Member
Staff member
Oct 30, 2000
Download the file.

Parse it in your favorite language.

If you do not find what you are looking for, delete it.
 

petesamprs

Senior member
Aug 2, 2003
Thanks for the initial comments.

xtknight - good suggestion on increasing the count. During filing season it's possible to have more than 100 10Ks or 10Qs per day. I'm searching for a specific text string within the actual filing (not in the exhibits)... for example, "Issuer Sale of Equity", to identify whether the issuer sold equity in that period. The text is actually in the filing (so you have to select "TEXT" and find that string, or select "HTML", select the actual filing, and find that string).

eagle - where can you point me to figure out how to actually MASS download the results and auto-parse them one by one?
 

EagleKeeper

Discussion Club Moderator / Elite Member
Staff member
Oct 30, 2000
Teleport Pro is a utility that I have used for a similar purpose. It can actually build a copy of the Web pages on your drive.

Set it up to extract Web pages to your local hard drive. It starts from a URL that you set and can drill down a set number of links within the site and/or links to additional sites.

Then you need to develop some additional tool that can search for the text that you are looking for.

If you know that the text string will be fixed within a web page (i.e., if it exists within a web page, it will be related to what you need), then a simple Windows text search will list all the web pages that contain the text.



There was a guy who wanted horse stats without having to transpose them from paper to a spreadsheet.

We set up the program to download the HTML files and wrote a small program to extract the relevant data and store it in a database.
 

kamper

Diamond Member
Mar 18, 2003
There are screen scrapers written in many languages, so you don't have to go through the pain of trying to interpret the HTML yourself. Before starting from scratch, I'd look for one of those.

But before that, I'd check with the SEC and see if they offer web services or something else designed for computer consumption, so you don't have to worry about HTML at all.
 

EagleKeeper

Discussion Club Moderator / Elite Member
Staff member
Oct 30, 2000
Originally posted by: kamper
... I'd check with the SEC and see if they offer web services or something else designed for computer consumption, so you don't have to worry about HTML at all.
:thumbsup:

 

sunase

Senior member
Nov 28, 2002
I was playing with this in Ruby (similar to Perl) last week just for practice, but with two power outages I didn't get to clean it up and test it until today. ^^ I just return the URL of the HTML link for matching filings. That doesn't change day to day, so it's just as good as downloading, AFAICT.

Apparently there's also an official SEC FTP with EDGAR data files you can access.  I also saw something about an EDGAR search with keywords at NYU, but couldn't find a current URL for that.  Anyway, here's the code in case anyone else is interested in the programmatic solution:
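sunase's original script doesn't appear in the thread. Purely as an illustration, a Ruby sketch along the lines described (page through the latest-filings results, pull out the filing-document links, and print the URL of each filing containing the phrase) might look like the following; the query-string layout and the /Archives/ link pattern are assumptions based on the URLs quoted earlier, and PHRASE is the example string from petesamprs' post:

```ruby
require 'open-uri'

PHRASE = "Issuer Sale of Equity"   # example search string from earlier posts

# Pull filing-document links out of an EDGAR results page. Filing
# documents appear under /Archives/, so a regex over hrefs suffices
# for a sketch (a real script might use an HTML parser instead).
def filing_links(index_html)
  index_html.scan(%r{href="(/Archives/[^"]+)"}i).flatten.uniq
end

# Page through the latest-filings results 100 at a time and print
# the URL of every filing that contains PHRASE.
def scan_latest_filings(form_type = "10-K")
  0.step(by: 100) do |start|
    page = URI.open("http://sec.gov/cgi-bin/browse-edgar" \
                    "?action=getcurrent&type=#{form_type}" \
                    "&owner=include&count=100&start=#{start}").read
    links = filing_links(page)
    break if links.empty?                 # ran past the last page
    links.each do |path|
      url = "http://sec.gov#{path}"
      puts url if URI.open(url).read.include?(PHRASE)
    end
  end
end
```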
 

sunase

Senior member
Nov 28, 2002
Sample output (Merisel had three listings in the EDGAR search results today, with three different but similar text files - no clue why):