ColdFusion & Lucene
by Simon.
One of the many reasons to use ColdFusion MX is that it has a large, standard toolset that enables the creation of full-featured, dynamic Web applications. The tag-based language makes it relatively simple to query a relational database and send e-mail. In a similar way, you can create and search Verity full-text indexes.
However, there are situations where you cannot use the full-text searching capabilities of Verity. For example, Verity only runs on Windows, Linux and Solaris. The ability to run ColdFusion MX on the Apple OS X operating system, whilst advantageous to developers who code on the Apple platform, therefore does not include the ability to use Verity. Furthermore, programmers who work in a hybrid J2EE/ColdFusion MX environment cannot natively use the Verity search capabilities in the J2EE environment. Finally, programmers who need customized searching and indexing capabilities may find the standard Verity integration limiting. There are work-arounds, such as installing Verity on a Windows, Linux, or Solaris server and configuring your ColdFusion server to use the remote Verity server, but these may be both impractical and cost-prohibitive.
Enter Lucene, an open source full-text searching framework from the Apache Jakarta project. When combined with ColdFusion MX, it can be run on Apple OS X, can be programmatically accessed by both J2EE and ColdFusion MX developers, and can be fully customized and extended.
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially one that must run cross-platform.
Apache Lucene is an open source project available for free download.
Features
Lucene offers powerful features through a simple API.
Scalable, High-Performance Indexing
- Over 20MB/minute on Pentium M 1.5GHz
- Small RAM requirements — only 1MB heap
- Incremental indexing as fast as batch indexing
- Index size roughly 20-30% the size of text indexed
Powerful, Accurate and Efficient Search Algorithms
- Ranked searching — best results returned first
- Many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
- Fielded searching (e.g., title, author, contents)
- Date-range searching
- Sorting by any field
- Multiple-index searching with merged results
- Allows simultaneous update and searching
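To make "ranked searching" and "incremental indexing" a little more concrete, here is a minimal sketch of the general idea behind a full-text index: an inverted index that maps each term to the documents containing it, with results ordered by a simple term-count score. This is not Lucene's API and not Lucene's scoring formula. The class `ToyIndex` and its methods are hypothetical names invented for this illustration, and a real Lucene index adds analyzers, fields, on-disk segments and much more.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy in-memory inverted index. Hypothetical illustration only:
// this is NOT Lucene's API and NOT how Lucene scores documents.
class ToyIndex {
    // term -> (document id -> number of times the term occurs in that document)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    // Split text into lowercase alphanumeric terms.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Add one document. Documents can be added at any time,
    // which is the essence of incremental indexing.
    void add(String docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Return the ids of documents matching any query term, best score first.
    // Score = total occurrences of the query terms in the document.
    List<String> search(String query) {
        Map<String, Integer> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            Map<String, Integer> docs =
                    postings.getOrDefault(term, Collections.<String, Integer>emptyMap());
            for (Map.Entry<String, Integer> e : docs.entrySet()) {
                scores.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b); // ties: order by id
        });
        return ranked;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("post-1", "Lucene search with Lucene");
        index.add("post-2", "ColdFusion search");
        System.out.println(index.search("lucene search"));
    }
}
```

The point of the sketch is only that searching consults a prebuilt term-to-document map rather than scanning the text, and that the result order comes from a score. Lucene's own ranking, query types and fielded search are considerably more sophisticated.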
ColdFusion & Lucene Implementations
If you don’t fancy writing your own ColdFusion implementation of Lucene, below are a couple of projects that will give you a kick-start along the road to indexing database content. With the addition of other Java projects such as PDFBox, the textual content of a PDF can also be extracted and indexed.
Aaron Johnson
Inspired by Lindex, Aaron Johnson has created a CFX Tag called CFX_Lucene that closely mimics the ColdFusion cfsearch tag, but uses Lucene rather than Verity.
http://cephas.net/blog/lucene/index.html
CFLucene
CFLucene is an open source project that attempts to give developers an easy way to integrate the indexing and searching functions of the Apache Lucene Java library with a ColdFusion web application. CFLucene is a collection of ColdFusion Components that natively call the Lucene Java classes to index and search any sort of textual data.