Programming help - need to get data off SEC website

petesamprs

Senior member
Aug 2, 2003
I'm looking for some help to do the following:
Go to the SEC website every day, identify any new 10Ks or 10Qs that were released that day, and then output any 10Ks or 10Qs that contain a specific search term I'm looking for.

The SEC website actually has a page dedicated to "latest filings", which lists any new filings for that day. The URLs for identifying 10Ks and 10Qs for any given day are:

10Q:
http://sec.gov/cgi-bin/browse-edgar?com...er=include&count=100&action=getcurrent

10K:
http://sec.gov/cgi-bin/browse-edgar?com...er=include&count=100&action=getcurrent

That's the easy part. I want to see if there is a way to search the returned filings for some specified text. If it finds it, download the HTML file; otherwise, skip it and go on to the next one.

My wife, who's a software developer, says it's pretty doable, but she's too busy to do it. And she said something about Perl. What do you guys think?
 

xtknight

Elite Member
Oct 15, 2004
Here's what the URL looks like (this particular one is 10-K): http://\sec.gov/cgi-bin/browse-edga...e=10-k&SIC=&State=&CIK=&owner=include&accno=&start=10&count=10

(I put a \ after http: so this forum doesn't shorten that URL.)

Increment start= by count= to get to the next page; so for the next page, go to start=20. The site supports count=100, so just increase start by 100 until you get a page where no results are found. start=0 is the first page no matter what count= is.
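That paging scheme can be sketched as a small URL builder. This is a hypothetical Ruby helper (the thread's other scripting suggestion was Perl); the base URL and parameter names are assumed from the links quoted in this thread:

```ruby
# Build the EDGAR "latest filings" URL for a given results page.
# start= advances by count= per page, and start=0 is always the
# first page regardless of the page size.
BASE_URL = "http://sec.gov/cgi-bin/browse-edgar"

def edgar_page_url(form_type, page, per_page = 100)
  start = page * per_page
  "#{BASE_URL}?action=getcurrent&type=#{form_type}" \
    "&owner=include&count=#{per_page}&start=#{start}"
end
```

So page 0 requests start=0, page 1 requests start=100, and so on, until a page comes back with no results.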

What text do you want to search for? Is the text actually in the body of the individual results themselves or is it on the returned search page (like the title of the items)?

Perl would actually be a good way to do this.
 

EagleKeeper

Discussion Club Moderator / Elite Member
Staff member
Oct 30, 2000
Download the file.

Parse it in your favorite language.

If you do not find what you are looking for, delete it.
 

petesamprs

Senior member
Aug 2, 2003
Thanks for the initial comments.

xtknight - good suggestion on increasing the count. During filing season it's possible to have more than 100 10Ks or 10Qs per day. I'm searching for a specific text string within the actual filing (not in the exhibits)... for example, "Issuer Sale of Equity", to identify whether the issuer sold equity in that period. The text is actually in the filing (so you have to select "TEXT" and find that string, or select "HTML", select the actual filing, and find that string).

eagle - where can you point me to figure out how to actually MASS download the results and auto-parse them one by one?
 

EagleKeeper

Discussion Club Moderator / Elite Member
Staff member
Oct 30, 2000
Teleport Pro is a utility that I have used for a similar purpose. It can actually build a copy of the Web pages on your drive.

Set it up to extract Web pages to your local hard drive. It starts from a URL that you set and can drill down a set number of links within the site and/or links to additional sites.

Then you need to develop some additional tool that can search for the text that you are looking for.

If you know that the text string will be fixed within a web page (i.e., if it exists within a web page, it will be related to what you need), then a simple Windows text search will list all the web pages that contain the text.



There was a guy who wanted horse stats without having to transpose them from paper to a spreadsheet.

We set up the program to download the HTML files and wrote a small program to extract the relevant data and store it in a database.
 

kamper

Diamond Member
Mar 18, 2003
There are screen scrapers written in many languages, so you don't have to go through the pain of trying to interpret the HTML yourself. Before starting from scratch, I'd look for one of those.

But before that, I'd check with the SEC and see if they offer web services or something else designed for computer consumption, so you don't have to worry about HTML at all.
 

EagleKeeper

Discussion Club Moderator / Elite Member
Staff member
Oct 30, 2000
Originally posted by: kamper
... I'd check with the SEC and see if they offer web services or something else designed for computer consumption, so you don't have to worry about HTML at all.
:thumbsup:

 

sunase

Senior member
Nov 28, 2002
I was playing with this in Ruby (similar to Perl) last week just for practice, but with two power outages I didn't get to clean it up and test it until today. ^^ I just return the URL of the HTML link for matching filings. That doesn't change day to day, so it's just as good as downloading, AFAICT.

Apparently there's also an official SEC FTP with EDGAR data files you can access.  I also saw something about an EDGAR search with keywords at NYU, but couldn't find a current URL for that.  Anyway, here's the code in case anyone else is interested in the programmatic solution:
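sunase's original script doesn't appear in the thread. Purely as an illustration, a Ruby sketch along the lines described (page through the latest-filings results, pull out the filing-document links, and print the URL of each filing containing the phrase) might look like the following; the query-string layout and the /Archives/ link pattern are assumptions based on the URLs quoted earlier, and PHRASE is the example string from petesamprs' post:

```ruby
require 'open-uri'

PHRASE = "Issuer Sale of Equity"   # example search string from earlier posts

# Pull filing-document links out of an EDGAR results page. Filing
# documents appear under /Archives/, so a regex over hrefs suffices
# for a sketch (a real script might use an HTML parser instead).
def filing_links(index_html)
  index_html.scan(%r{href="(/Archives/[^"]+)"}i).flatten.uniq
end

# Page through the latest-filings results 100 at a time and print
# the URL of every filing that contains PHRASE.
def scan_latest_filings(form_type = "10-K")
  0.step(by: 100) do |start|
    page = URI.open("http://sec.gov/cgi-bin/browse-edgar" \
                    "?action=getcurrent&type=#{form_type}" \
                    "&owner=include&count=100&start=#{start}").read
    links = filing_links(page)
    break if links.empty?                 # ran past the last page
    links.each do |path|
      url = "http://sec.gov#{path}"
      puts url if URI.open(url).read.include?(PHRASE)
    end
  end
end
```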
 

sunase

Senior member
Nov 28, 2002
Sample output (Merisel had three listings in the EDGAR search results today, with three different but similar text files - no clue why):