How hard is it to write a simple crawler in Java?

enwar3

Golden Member
Jun 26, 2005
1,086
0
0
A startup I want to work at wants me to complete a programming assignment (which seems pretty involved for an interview process). They asked for a simple crawler that makes a query to a search site and returns the number of results and the actual results in an array.

My question is, how long does this take to write in Java? Is this a huge project or is this something I can do in a couple hours? I'm pretty proficient in Java but I haven't done any iostream web stuff.
 

Sea Moose

Diamond Member
May 12, 2009
6,933
7
76
I like people who don't nef in our forum

Markbnj
Programming moderator
 
Last edited by a moderator:

degibson

Golden Member
Mar 21, 2008
1,389
0
0
I agree with szechuanpork. The whole notion of a 'homework assignment' is a bit over the top.
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
I've had companies require me to take timed, online tests that took close to an hour, or debug problem code, not to mention multiple interviews that each involved investing at least an hour and sometimes more. I think if a company asked me to submit a small program that might take up to two hours to create I would actually be pleased. I can do it at home, I don't need to put on a tie, and I know I can do a good job at it.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
... I think if a company asked me to submit a small program that might take up to two hours to create I would actually be pleased...

To quote Admiral Ackbar: it's a trap, for two, maybe three, reasons.

1. No question is ever so well asked that there is not wiggle room in a correct answer. Only a complete program spec, covering all situations, is really sufficient. And adherence to a complete spec would take way too long for a take-home interview question. Language? Platform? Language version? Allowed libraries? Environment assumptions? Compiler? Compiler version? Efficiency? Memory usage? I/O requirements? Naming conventions? Program structure? etc.

2. Regardless of the quality of your coding skills, there is going to be something 'wrong' with it. Every company, and consequently, every technical interviewer, has a different coding practice and different coding standards. If the organization has a coding standard, the interviewer is used to that standard. Deviations from that standard automatically seem conspicuous, even if the code is correct.

E.g., How much error checking is expected? Too much and the code looks overly complicated. Too little, and the code looks careless. What is the company culture about this kind of thing? You, the interviewee, don't know, because you don't work for the company...

3. There is no guarantee that whoever reviews your code knows their @$$ from their elbow in the first place. It's very common to find that your interviewer knows less about programming (or whatever the field happens to be) than you do. However, interviews are commonly organized such that the interviewer is expected to be in the position of superiority, both intellectually and otherwise. When those roles are disturbed, people react in funny ways, e.g., with unjust negative opinions.

This point (#3) really only counts for half credit, because it is not specific to take-home coding assignments -- it can happen in any interview -- but a take-home assignment gives the @$$hat interviewer ammunition.
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
The scope would certainly have to be defined, but if you can't get past most of those issues in a discussion of your solution then you probably don't want to work for them anyway.
 

invidia

Platinum Member
Oct 8, 2006
2,151
1
0
I've had companies require me to take timed, online tests that took close to an hour, or debug problem code, not to mention multiple interviews that each involved investing at least an hour and sometimes more. I think if a company asked me to submit a small program that might take up to two hours to create I would actually be pleased. I can do it at home, I don't need to put on a tie, and I know I can do a good job at it.

Agreed. There's no pressure from an employer looking over your shoulder while you program.

What I don't understand is why I have to go through 4-6 rounds of interviews over a period of several months just to get a software development position. I'm on my 4th round so far for a job I applied for back in November. It's really frustrating and nerve-racking to keep waiting and waiting to see if you will advance to the "semi finals".
 
Oct 27, 2007
17,009
5
0
My first ever GUI Java program (first ever self-taught programming experience) was a Java image crawler which I used to collect lots and lots of free porn. It's extraordinarily primitive and very, very poorly coded but you're welcome to look at it if you want.
http://blog.martindoms.com/2009/08/08/image-crawler/

Please don't laugh at my code, I had no idea what I was doing when I wrote that :p
 

wwswimming

Banned
Jan 21, 2006
3,695
1
0
A startup I want to work at wants me to complete a programming assignment (which seems pretty involved for an interview process). They asked for a simple crawler that makes a query to a search site and returns the number of results and the actual results in an array.

My question is, how long does this take to write in Java? Is this a huge project or is this something I can do in a couple hours? I'm pretty proficient in Java but I haven't done any iostream web stuff.

http://www.gotoandlearn.com/

has a similar task, though it's written in ActionScript as part of the "Flex in a Week" class.

(It's either at gotoandlearn or in the Adobe "Flex in a Week" series.)

It queries the Flickr API and returns a table of search results.

Might be worth taking a look at: different code, similar task.
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
Tip: watch out for loops / cycles in the website.

You need to track all of the URLs you've visited so far, while still allowing for similar links with different parameters, e.g. .../contentserve.php?page=1234 is not .../contentserve.php?page=4567
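A minimal sketch of that kind of visited-URL tracking, assuming a plain `HashSet` of normalized URL strings is enough for a small crawl. The class name `VisitedTracker`, its methods, and the `example.com` host used below are made up for illustration. The query string is kept, so `page=1234` and `page=4567` count as different pages, while the `#fragment` is dropped because it never reaches the server.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: remembers which URLs the crawler has already seen.
class VisitedTracker {
    private final Set<String> visited = new HashSet<>();

    // Reduce a URL to a comparable form: resolve "." and ".." path segments
    // and strip the fragment. The query string is deliberately preserved.
    // Note: this sketch does not lowercase the scheme or host.
    static String canonicalize(String url) {
        String normalized = URI.create(url).normalize().toString();
        int hash = normalized.indexOf('#');
        return hash >= 0 ? normalized.substring(0, hash) : normalized;
    }

    // Returns true the first time a URL is seen, false on every repeat,
    // so the crawl loop can skip anything that returns false.
    boolean markVisited(String url) {
        return visited.add(canonicalize(url));
    }

    int size() {
        return visited.size();
    }
}
```

In a crawl loop you would call `markVisited(link)` before queueing each link and only follow it when the call returns `true`. A real crawler may also want a depth or page-count limit, since some sites generate endless distinct URLs.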
 

Sureshot324

Diamond Member
Feb 4, 2003
3,370
0
71
I would look at using HTTP GET/POST methods with an HttpURLConnection object to send the search request and then receive the HTML page with the search results. Once you have the HTML, you can parse it with an XML parser as long as the page is well-formed (XHTML); plain HTML often isn't valid XML, so a strict parser may reject it. There's an XML parser in the Java standard library. From there it should be pretty straightforward.

It shouldn't be that hard of a project but I think it would take more than a couple of hours, depending on how familiar you are with this stuff already.
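Here is a rough sketch of that approach using only the standard library (Java 10 or later for the `URLEncoder` overload). The class name `SearchFetcher`, the `q` parameter name, and the idea that each result is an `<a href>` element are all assumptions for illustration; a real search site will have its own URL format and markup. The parsing step only works if the page is well-formed XML, which many real HTML pages are not.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: query a search page and pull links out of the response.
class SearchFetcher {
    // Build the request URL. The "q" parameter name is an assumption.
    static String buildQueryUrl(String baseUrl, String query) {
        return baseUrl + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    // Send an HTTP GET and return the response body as a string.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    // Parse well-formed XHTML and collect the href of every <a> element.
    // Throws IllegalArgumentException if the markup is not valid XML.
    static List<String> extractLinks(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            NodeList anchors = doc.getElementsByTagName("a");
            List<String> links = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element a = (Element) anchors.item(i);
                if (a.hasAttribute("href")) {
                    links.add(a.getAttribute("href"));
                }
            }
            return links;
        } catch (Exception e) {
            throw new IllegalArgumentException("page is not well-formed XML", e);
        }
    }
}
```

Something like `extractLinks(fetch(buildQueryUrl(base, "java crawler")))` would then give the results as a list; `size()` is the result count and `toArray(new String[0])` turns it into the array the assignment asks for.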