ColdFusion & Lucene

by Simon.

One of the many reasons to use ColdFusion MX is that it has a large, standard toolset that enables the creation of full-featured, dynamic Web applications. The tag-based language makes it relatively simple to query a relational database and send e-mail. In a similar way, you can create and search Verity full-text indexes.

However, there are situations where you cannot use the full-text searching capabilities of Verity. For example, Verity only runs on Windows, Linux and Solaris, so running ColdFusion MX on the Apple OS X operating system, whilst advantageous to developers who code on the Apple platform, does not include the ability to use Verity. Furthermore, programmers who work in a hybrid J2EE/ColdFusion MX environment cannot natively use the Verity search capabilities in the J2EE environment. Finally, programmers who need customized searching and indexing capabilities may find the standard Verity integration limiting. There are workarounds, such as installing Verity on a Windows, Linux or Solaris server and configuring your ColdFusion server to use the remote Verity server, but these may be not only impractical but also cost-prohibitive.

Enter Lucene, an open source full-text searching framework from the Apache Jakarta project. Combined with ColdFusion MX, it can run on Apple OS X, can be accessed programmatically by both J2EE and ColdFusion MX developers, and can be fully customized and extended. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially a cross-platform one.

Apache Lucene is an open source project available for free download.

Features

  • Lucene offers powerful features through a simple API.

Scalable, High-Performance Indexing

  • Over 20MB/minute on Pentium M 1.5GHz
  • Small RAM requirements — only 1MB heap
  • Incremental indexing as fast as batch indexing
  • Index size roughly 20-30% the size of text indexed

Powerful, Accurate and Efficient Search Algorithms

  • Ranked searching — best results returned first
  • Many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
  • Fielded searching (e.g., title, author, contents)
  • Date-range searching
  • Sorting by any field
  • Multiple-index searching with merged results
  • Allows simultaneous update and searching
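
To give a feel for what "fielded" and "ranked" searching mean in practice, here is a minimal plain-Java sketch of an inverted index. It is only an illustration and does not use Lucene's API or its scoring model. The class name TinyIndex, its methods, and the "sum of term frequencies" score are all made up for this example.

```java
import java.util.*;

// Toy inverted index: NOT Lucene, just an illustration of the idea.
// All names here (TinyIndex, add, search) are hypothetical.
class TinyIndex {
    // field name -> term -> (document id -> how often the term occurs)
    private final Map<String, Map<String, Map<Integer, Integer>>> postings = new HashMap<>();

    // Index the text of one field of one document.
    public void add(int docId, String field, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) continue; // split can yield a leading empty token
            postings.computeIfAbsent(field, f -> new HashMap<>())
                    .computeIfAbsent(term, t -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Fielded search: only the named field is consulted.
    // Ranked search: documents with more matching term occurrences come first;
    // ties are broken by the lower document id.
    public List<Integer> search(String field, String query) {
        Map<Integer, Integer> scores = new HashMap<>();
        Map<String, Map<Integer, Integer>> terms =
                postings.getOrDefault(field, Collections.emptyMap());
        for (String term : query.toLowerCase().split("\\W+")) {
            Map<Integer, Integer> hits = terms.get(term);
            if (hits == null) continue; // term not present in this field
            hits.forEach((doc, tf) -> scores.merge(doc, tf, Integer::sum));
        }
        List<Integer> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Integer.compare(a, b);
        });
        return ranked;
    }
}
```

Adding two documents with a "title" field and then calling search("title", "lucene") returns the ids of the matching documents, with the one that mentions the term most often first. A real Lucene index does far more (analyzers, phrase and range queries, relevance scoring, on-disk segments), so treat this purely as a mental model.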

ColdFusion & Lucene Implementations

If you don’t fancy writing your own ColdFusion implementation of Lucene, below are a couple of projects that will give you a kick-start along the road to indexing database content. With the addition of other Java projects such as PDFBox, the textual content of a PDF can also be extracted and indexed.

Aaron Johnson

Inspired by Lindex, Aaron Johnson has created a CFX Tag called CFX_Lucene that closely mimics the ColdFusion cfsearch tag, but uses Lucene rather than Verity.

http://cephas.net/blog/lucene/index.html

CFLucene

CFLucene is an open source project that attempts to give developers an easy way to integrate the indexing and searching functions of the Apache Lucene Java library with a ColdFusion web application. CFLucene is a collection of ColdFusion Components that natively call the Lucene Java classes to index and search any sort of textual data.

http://www.cflucene.org/

  • Definitely saving this bookmark to my del.icio.us. I’m at the point where I’m going to be considering which search technology to use with my CF app. The record limit that comes along with CF’s Verity always makes me nervous. (Isn’t it something like 100,000 records on Enterprise?)

    Are there any other CF+Lucene projects that have surfaced since you wrote this a year ago?

  • If I’m not mistaken ColdFusion Standard allows for 100k records, whilst ColdFusion Enterprise allows 250k. Whatever the number, I believe Lucene is a better bet, and indeed I know a number of organisations which have chosen this route over the native Verity.

    Unfortunately I haven’t seen any new projects on ColdFusion and Lucene, which may be a call-to-action for me.

  • Jim

    Excuse my ignorance, but is Lucene compatible with IIS (since it is an Apache product)? We are looking for an alternative to Verity, but are tied to IIS….