How Internal Site Search Works
Internal site search is one of those things that seems very simple: type in a term or phrase on a website, and you'll get a list of results of pages on that site that are likely to contain what you're looking for -- very straightforward. Almost every site has search functionality built in.
It's not quite that simple, of course, even though Google makes Internet search look like the easiest thing in the world.
What Do You Mean By Search?
First, there needs to be clarity on what someone means by talking about "search" -- are we talking about search functionality within the website itself (i.e., internal site search), or are we talking about how people might search to find your site (i.e., external search)? There are approaches and techniques to deal with each type of search, and there is certainly some overlap between them, but they are very different issues. The information here is focused on internal site search.
Where Does That Search Result Come From?
It's important to understand where your search results are really coming from. Most searches are not done in real-time (that is, actually searching your website content at the time you put in your search term), but rather are run against a search index of some kind. A search index is a pre-built, static pool of possible results. The value of a search index is that all the hard work of filtering for possibly relevant terms, weighting relevancy, and optimizing has already been done -- so when a site visitor actually performs a search, it's usually pretty fast and pretty on-target for what that person was looking for. The downside of a search index is that it's a picture of your content at a particular time, so if that content changes (including adding content or deleting content) but the index isn't updated, people won't find that updated content.
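To make the idea of an index concrete, here is a minimal sketch in Python of a very simple kind of search index (an inverted index that maps each word to the pages containing it). The page URLs, page text, and function names are all hypothetical, and real search engines do far more than this; the point is only to show why a search against an index is fast and why it goes stale when content changes.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, body in pages.items():
        for word in tokenize(body):
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical site content, indexed once ahead of time.
pages = {
    "/about": "Our firm has offices in Chicago",
    "/careers": "Join our Chicago team",
}
index = build_index(pages)

# Content added after indexing is invisible until the index is rebuilt.
pages["/news"] = "New Chicago office opening"
stale_hits = search(index, "chicago")
fresh_hits = search(build_index(pages), "chicago")
```

In this sketch, stale_hits is missing "/news" because the index was built before that page existed, while fresh_hits includes it because the index was rebuilt first.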
There are some specialized searches that are done in real-time. For instance, many of Duo's legal sites have specialized attorney searches that filter by office or practice area. Since these searches are running directly against the main content, they are always up-to-date -- but they may not be as fast, and there's no "results weighting" that can be done once the results come back.
Results weighting is also known as relevancy. It is a very important element of your search results. If you search for "dog", you probably want those top results to be really focused on dog information. Search engines generally "guess" at that relevancy through a series of rules or parameters. For instance: is "dog" in the title of the page? Is it a term that's used repeatedly in the body of the content? And so on. Very good search engines have been refining those rules for years now, but they rarely expose those rules to the public, since those rules are their competitive advantage.
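As a rough illustration of the two rules mentioned above (is the term in the title, and how often does it appear in the body), here is a small Python sketch. The weights, the sample pages, and the function names are assumptions made up for this example; no real search engine's rules are this simple or publish values like these.

```python
import re

# Assumed weights for illustration only: a title match counts five times
# as much as a single occurrence in the body.
TITLE_WEIGHT = 5.0
BODY_WEIGHT = 1.0


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page, term):
    """Score a page by weighted counts of the term in its title and body."""
    term = term.lower()
    title_hits = tokenize(page["title"]).count(term)
    body_hits = tokenize(page["body"]).count(term)
    return TITLE_WEIGHT * title_hits + BODY_WEIGHT * body_hits


def rank(pages, term):
    """Return URLs of matching pages, highest score first."""
    scored = [(score(page, term), page["url"]) for page in pages]
    matches = [(s, url) for s, url in scored if s > 0]
    matches.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in matches]


# Hypothetical pages.
pages = [
    {"url": "/cats", "title": "Cat Care",
     "body": "Cats sometimes live with a dog."},
    {"url": "/dogs", "title": "Dog Care Basics",
     "body": "Every dog needs exercise. A dog also needs training."},
    {"url": "/pets", "title": "Pet Policy",
     "body": "We welcome any dog or cat. Dog owners must register."},
]
ranked = rank(pages, "dog")
```

Here the page with "dog" in its title ranks first, followed by the page that mentions the term twice in its body, then the page that mentions it once.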
There are, however, search engines that allow you, as the website owner, to tweak those rules to best match your particular content. The advantage of using one of these engines is that you can fine-tune your website's search results. The downside is that this does require some effort, and that effort is ongoing -- as your content changes over time, you will likely need to periodically revisit the rules you've established in your search engine.
So on a practical level, what are your options for search? Listed below are some of the search options that Duo Consulting uses. This is not, by any means, an exhaustive listing of search options available. It's just meant to provide a sense of the variety of options that exist.
Drupal, another Content Management System, also has search functionality built into it. As with eZ Search, it's relatively simple and basic, although you do have control over how frequently your content is indexed. There are some modules (e.g., Porter-Stemmer) that can be used in conjunction with the default search to raise the level of sophistication.
Acquia Drupal (Solr) Search
Acquia is a commercially supported version of Drupal, and they have a search engine which is also based on Solr technology. It can either be used as an externally hosted service (billing is based on the volume of content), or you can use the module built by Acquia and set up a local Solr server. It offers relevance, author filtering (useful for sites using social media), term highlighting, and content recommendations.
Third-party search engines, such as those by Google, can be seamlessly integrated into your website. You can retain the look and feel of your site, while at the same time leveraging the power of a search engine company that has already spent the time and effort to refine the relevancy rules. The search index itself, however, is something that you will only have limited control over in terms of indexing frequency and relevancy weighting.
There are a few things that you can do to guide third-party search engines. You can create sitemaps (essentially an XML map of your content that's easy for search engines to digest), and some third-party search engines do have an on-demand indexing option.
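For a sense of what a sitemap looks like, here is a minimal Python sketch that builds one using the standard sitemap XML format (a urlset element containing one url entry per page, each with a loc and an optional lastmod). The URLs and dates are placeholders, and the function name is made up for this example; a real site would normally generate this from its content management system.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified) pairs.

    last_modified is a date string in YYYY-MM-DD form.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs and dates.
sitemap_xml = build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/attorneys", "2024-02-01"),
])
```

The resulting string can be saved as a sitemap file and submitted to whichever search engine you are using, following that engine's own instructions.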
A custom search is a search that's so specific to certain content that it makes the most sense to custom-code it into the website. This often works for simple things (again, the example of filtering attorneys by office and/or practice area), and the search results are always up-to-date -- but a more complex search result requirement begins to creep into re-inventing-the-wheel territory, so this should be used judiciously.
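A custom search of the attorney-filter kind can be sketched in a few lines of Python. The Attorney record, its fields, and the sample data below are hypothetical and are not how any particular site stores its content; the sketch only shows that this kind of search is a direct filter over the live content, with no index and no relevancy weighting.

```python
from dataclasses import dataclass


@dataclass
class Attorney:
    name: str
    office: str
    practice_areas: tuple


def find_attorneys(attorneys, office=None, practice_area=None):
    """Filter the live attorney list by office and/or practice area."""
    results = []
    for attorney in attorneys:
        if office is not None and attorney.office != office:
            continue
        if (practice_area is not None
                and practice_area not in attorney.practice_areas):
            continue
        results.append(attorney)
    # There is no relevancy score here, so results are simply sorted by name.
    return sorted(results, key=lambda a: a.name)


# Hypothetical data standing in for the site's main content.
attorneys = [
    Attorney("Cara Lin", "Denver", ("Tax",)),
    Attorney("Ana Ruiz", "Chicago", ("Tax", "Real Estate")),
    Attorney("Ben Cole", "Chicago", ("Litigation",)),
]
chicago_tax = find_attorneys(attorneys, office="Chicago", practice_area="Tax")
all_tax = find_attorneys(attorneys, practice_area="Tax")
```

Because the filter reads the current list each time, a newly added attorney shows up in the very next search, which is the up-to-date behavior described above.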
Planning Your Internal Site Search Strategy
So having said all of this, what kind of search will work best for you? If you can answer the following questions, you'll be well on your way to figuring out the best solution to fit your needs.
How much content will I have?
If you have a lot of website content, then you will probably need a more sophisticated search engine for the internal search on your site. If you don't have a lot of content now, then certainly this is something you'd want to evaluate as the site grows -- what works on Day One of the site launching may not be as useful a year later.
How much traffic will I have?
If your site has a lot of traffic, then you'll probably want to consider moving the search traffic off to a server that's separate from your own site, or a third-party search engine. As with content, this is a situation you'll want to periodically re-evaluate.
How frequently will I want that content indexed, so that the search results are freshest?
Instantly? Hourly? Daily? Weekly? If your content must be 100% fresh 100% of the time, then you've narrowed your choices to search engines that give you direct control over the refresh time. That said, this is something to look at carefully -- does it really have to be that closely in sync with real-time content updates?
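One way to think about freshness is as two separate checks: which pages have changed since they were last indexed, and whether enough time has passed to run the next refresh. The Python sketch below illustrates both, assuming you can get a last-modified time and a last-indexed time for each page. The function names, URLs, and timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def stale_pages(last_modified, last_indexed):
    """Return URLs that changed since they were indexed, or were never indexed."""
    stale = []
    for url, modified in last_modified.items():
        indexed = last_indexed.get(url)
        if indexed is None or modified > indexed:
            stale.append(url)
    return sorted(stale)


def refresh_is_due(last_refresh, now, interval):
    """Return True once the chosen refresh interval has elapsed."""
    return now - last_refresh >= interval


# Hypothetical timestamps.
last_modified = {
    "/about": datetime(2024, 3, 1, 9, 0),
    "/news": datetime(2024, 3, 2, 14, 30),
    "/careers": datetime(2024, 3, 2, 16, 0),
}
last_indexed = {
    "/about": datetime(2024, 3, 2, 0, 0),
    "/news": datetime(2024, 3, 2, 0, 0),
}
needs_reindex = stale_pages(last_modified, last_indexed)

# With an hourly schedule, a refresh that last ran 90 minutes ago is due.
due = refresh_is_due(
    datetime(2024, 3, 2, 15, 0),
    datetime(2024, 3, 2, 16, 30),
    timedelta(hours=1),
)
```

The shorter the interval, the smaller the window in which visitors can miss updated content, at the cost of running the indexing work more often.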
How much fine-tuning and/or control do I want over the search results?
...and how much effort do you want to put towards that fine-tuning and control? If this is important for you (e.g., you want to make sure that content within an employee's primary biography is weighted more heavily in search results than his or her community activities), then you'll want a search engine that gives you that level of control. On the other hand, it will mean a commitment on your part to invest the time to experiment and tweak those settings, and those settings will probably need to be revisited over time as your content changes.
There are still quite a few nuances to talk through beyond this, but tackling these questions at the beginning will certainly get you most of the way there.
What About External Search?
If you are concerned about how public search engines (Google, Yahoo, Bing, etc.) will find your site, then you will be focused on optimizing your site for that: Search Engine Optimization (SEO). The key to good SEO for your site is good content, of course, but beyond that, Duo would be happy to speak with you about detailed approaches and strategies for your site.