FC Member Blog

Internal Searching with Open Source Software

By William Mathews, Wed Sep 2, 2009 at 2:26 PM
This blog is written by a member of our blogging community and expresses that member's views alone.

The challenge was to find an Open Source search engine to use internally that was comparable to Google’s Search Appliance. This sounded easy enough, but wow, was I wrong. There seem to be a few projects that focus on giving you the tools to build your own search engine, but very few completed search engines. I wasn’t terribly interested in building my own, so using Lucene was kind of out of the question (until I got desperate).

My ticket sat in my queue for quite some time while I worked on other things, Nagzilla and IPIntel to name a couple. Instead of following my own rule when doing development (never base today’s decision off of yesterday’s data), I decided to just start building my own search engine for us after I rolled off those projects. I looked again at Lucene and its friends, which are almost always written in Java. I have a long, documented hate affair with Java, so I looked heavily at PyLucene. This is essentially Lucene for Python (strictly speaking a Python wrapper around the Java library rather than a port), hence the clever name. I got so far as to set up a directory structure for this new project and start some initial prototyping code, then it hit me! “Hey dummy, go see if anything has changed in the 6 months since the last time you looked!” I said to myself. So I did what the voice in my head told me and did some, duh, searching.

I came across much of what I had found originally, until I did some really clever searching and found a relatively new project. Hounder aims to be “a simple and complete search system” out of the box. For the most part it is. Here is one man’s journey.

After a few bumps and bruises (and several discussions with the developers and a few fixes on their part) I was able to follow their “5 minute tutorial” and get most things going. Overall a pretty positive experience with just a few caveats.

Caveats:
1) Hounder is VERY finicky about networking and hostnames.
2) Hounder requires the running user to have ssh keys and passwordless ssh to every interface. (This may not be 100% accurate, but it was on my setup.)
3) Hounder has some documentation and the 5 minute tutorial is pretty good.
4) Do NOT use Wikipedia as your test site as the 5 minute tutorial suggests. Use something you have access to and have permission to crawl.
5) It’s written in Java but I’m going to let that go, for now.
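
Caveats 1 and 2 are the kind of thing you can check before you ever start Hounder. Here is a rough sketch of how I might do that in Python. To be clear, none of this is part of Hounder: the function names (resolve_host, ssh_check_command, passwordless_ssh_works, report) are made up for illustration, and I am assuming a stock OpenSSH client, whose BatchMode option makes ssh fail instead of stopping to ask for a password.

```python
import socket
import subprocess


def resolve_host(name, resolver=socket.gethostbyname):
    """Return the IP address for name, or None if it does not resolve."""
    try:
        return resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def ssh_check_command(host, timeout=5):
    """Build an ssh command that fails rather than prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",               # never ask for a password
        "-o", f"ConnectTimeout={timeout}",   # give up quickly on dead hosts
        host,
        "true",                              # harmless remote command
    ]


def passwordless_ssh_works(host, timeout=5):
    """True only if ssh logs in and runs `true` without any prompt."""
    try:
        result = subprocess.run(
            ssh_check_command(host, timeout),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung
        return False
    return result.returncode == 0


def report(hosts):
    """Print one line per host: what it resolves to and whether ssh works."""
    for host in hosts:
        address = resolve_host(host)
        ssh_ok = address is not None and passwordless_ssh_works(host)
        print(f"{host}: resolves to {address}, passwordless ssh: {ssh_ok}")
```

Calling report(["localhost", socket.gethostname()]) as the user that will run Hounder covers the two names that bit me. Whether Hounder really needs every interface to pass is a guess on my part, as noted above, so treat a failure as a hint rather than a verdict.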

Good things:
1) Hounder is pretty easy to install/configure once everything is working properly.
2) Hounder is distributable which means you can grow your installation with your needs. This could be THE feature that sets it apart from even Google’s GSA.
3) It seemingly can be used to build out your own search solutions but I’m not going to be starting on that work until this week.
4) It seems to keep its index up to date pretty well and is very quick. I’m not much of a benchmarker but I didn’t have to wait very long to be able to search, so that’s good enough for me.
5) The developers were very quick to respond to questions, fixes, etc. This is the focal point of any great Open Source Project.
