E-Commerce On-Site Search 101: Databases, Technologies, and Data

data center.jpg

Understanding the different types of e-commerce platforms and their various functions can be difficult. One critical area of e-commerce website technology that has evolved significantly over the past several years is on-site search capability.

This brief tutorial will help e-commerce, marketing, and IT professionals better understand the core components of e-commerce systems involved in setting up advanced website search.

1. E-commerce Platform Types

E-commerce platforms manage product catalogs, allow customers to add products to a cart, manage wish lists, facilitate checkout, enable shoppers to select shipping options, process payments with a card, track their order status, and perform other buying-related functions.

Let’s divide these platforms into three major “classes”:

  • Class 1: Enterprise e-commerce — usually a custom buildout for large e-commerce businesses like Amazon, Walmart, Zappos, Target, Staples, etc.
  • Class 2: E-commerce-as-a-service for SMBs. These include platforms like Shopify, Wix, Squarespace, and Yahoo Merchant Solutions (which is extremely outdated and should, by all rights, be dead by now). These platforms target very small stores with a small number of products.
  • Class 3: DIY-ish e-commerce platforms — these include e-commerce systems such as Magento, OpenCart, PrestaShop, and WooCommerce, installed on the company’s own hardware or cloud-hosted. These are highly customizable platforms that one can download, install on a server, and modify until there is nothing left of the original platform (a relatively common scenario).

2. Out-of-box site search on common platforms

An e-commerce/website on-site search function allows visitors to search and filter search results. Obviously, most e-commerce platforms come with built-in search.

  • Enterprise commerce platforms (class 1) have few problems with site search or filtering and handle large catalogs of products.
  • SMB e-commerce-as-a-service platforms (class 2) provide some site search functionality. Their built-in features, such as autocomplete, autosuggestions, and filtering, are lacking or unusable for larger catalogs. These platforms are very cheap, so it would be unfeasible for their providers to include advanced search with their offerings.
  • DIY-ish commerce platforms (class 3) come with similar capabilities as the class 2 systems out-of-box, though they provide a better ability to tweak and optimize the data. To simplify things for developers, these platforms typically use relational databases for data storage.

3. Problems with on-site search when using relational databases (MSSQL, MySQL, PostgreSQL)

Relational databases are well-suited for storing and retrieving exact transactional data, but they lack built-in robust mechanisms for full-text search, fuzzy matching, and suggestions.

E-commerce website search systems can be built with relational databases, but they have serious scaling, speed, and relevancy limitations. Fuzzy matching (fuzzy implies results that are similar to, but not an exact match, to the search query) becomes too computationally intensive as the product catalog grows. This means that either the quality of search results must be sacrificed for speed, or consumers will have to wait longer to obtain the most relevant results for their search.

Faceted filter search with relational databases

Search results filtering (faceted filters) becomes an even bigger problem. As more filtering attributes (e.g., size, color, compatibility) are added, searches become slower and slower. According to Google PageSpeed Insights, users will barely wait for ⅕th of a second. With a relational database of a relatively small catalog of 10,000 products, doing a keyword search with four filters selected could easily take several seconds.

Autocomplete is also possible with relational databases, but once again, it is very difficult to make it fast enough to show relevant suggestions as users type.

4. Managed site search systems

Managed search systems, such as Swiftype, Algolia, SearchSpring, BloomReach, or SLI Systems, address the problems of site search speed. They’re specialized site search systems that integrate with the e-commerce platforms above (“class 2” or “class 3”) to provide speed and advanced functionality. These functionalities can include:

  • search results relevancy sorting
  • fuzzy searches
  • results weights
  • product/document boosting
  • autocomplete, autosuggestions - though these sound similar, they work differently from a technical standpoint
  • faceted filters
  • synonyms

These systems are relatively easy to integrate into an e-commerce platform, but do not provide a completely seamless experience. Site admins and marketers will need to manage product information in their e-commerce system and adjust search-specific settings and configurations in external search system admin interfaces. Site visitors may notice some user interface differences between pages powered by the e-commerce platform vs. pages powered by the site search platform. Updating search data frequently enough to account for product availability changes is challenging with such systems. Additionally, analytics and visitor tracking become trickier.

Internally, these systems use something like Solr or Elasticsearch.

If you are interested in reading more about the various search service providers and their underlying technologies, as well as when they came around, check out our overview of on-site search from 1999 to 2018.

5. Unmanaged site search systems

Unmanaged search systems, such as Elasticsearch and Solr, are the most robust options for creating site search functionality that combines the best relevance of results with speed. However, they have very steep learning curves. Most novice or mid-level web developers are unable to handle the complexity of these systems practically.

These systems also require an experienced DevOps engineer (server administrator) to set up and manage them. The way these systems consume computational, memory, and storage resources, and the way they scale is rather different from what relational databases do.

Some e-commerce businesses opt to build their search functionality by integrating Elasticsearch or Solr with their existing e-commerce platform of “class 3” (e.g., Magento, PrestaShop).

It’s a very involved undertaking with many sleepless nights, endless debugging, wasted time, and never-ending maintenance. IT staff will need to set up synchronization between their relational database and search, manage additional servers, and rebuild large parts of their website to retrieve data from Solr or Elasticsearch instead of MySQL or PostgreSQL. They will also need to create admin interfaces to manage weights, synonyms, and boosted products, among other tasks.

Benefits of developing your search functionality

Users will enjoy a completely seamless experience because search becomes a fully integrated part of the e-commerce platform. That means customers stay within one system the entire time they are at the site, even though the system utilizes different data sources for various operations. Users will experience the fastest searches possible, even if your company and catalog grow exponentially, because Elasticsearch and Solr scale to any size an e-commerce company wants – just throw more money at it to add more search servers. Finally, with this setup, site analytics can be easily configured to track anything marketers can imagine with any granularity they desire.

Another challenge with any of the setups described above is with data quality and management of that data. For keyword search, e-commerce companies must ensure the use of a variety of keywords and synonyms, and continually make adjustments based on search history. The technical name of a product isn’t always what people refer to it as, and correct spelling isn’t always how people search for it. For example, “spotting scope” is two words, but 40 percent of people search for “spottingscope”, while “riflescope” is one word, but almost everyone searches for “rifle scope”. At least “telescope” is an easy one.

Another even more challenging data issue concerns product attributes. How do you set up filters based on product attributes for a shoe store when one brand uses US men’s sizes 5–13, another one uses XS, S, M, L, XL, XXL, one more uses women’s sizes, some other brand only lists Euro sizes and their granularity is different from US sizes? What about an eyewear or cosmetics store where nearly the same color has hundreds of different names depending on the brand, collection, and the phase of the moon?

For B2B e-commerce sites, this becomes even more critical. Sometimes, a product must comply with specific standards, so a keyword search may not be sufficient. Users must be able to select ISO-12345 standard as a filter and find exact matches. The same applies to features such as pipe thread or step-motor precision and power.

Safety standards also get complex; one business customer might know the exact standard they require, and another might want to see all thingamabobs that are designed to work in temperatures above 300°F or resist corrosion from specific chemicals.

Two main product filtering challenges

  • Keeping filtering options sane and easy to use — Customers don’t want to select shoe sizes 11½, 12, 12½, L, XL, EU 45, EU 46, and EU 47 and still get irrelevant or incomplete results. Customers also don’t want to stare at size charts; they want to select a size and let the system handle the rest.
  • Maintaining accurate and consistent product attributes data — entering or editing product attributes for each product one by one is a nightmare. Admins will need to ensure that they use consistent attribute names and values across different brands and collections. How does an admin find products that are missing attributes X and Y, part numbers, or UPC codes? How do they deal with periodic catalog updates from their suppliers? How do they deal with inconsistencies in supplier data feeds?

If your current systems don’t let you easily manage data to ensure the quality of your catalog, read up on what product information management (PIM) systems can do for you.

Steersman’s solution

Steersman’s e-commerce system is more than just an e-commerce catalog, shopping cart, and search engine. It’s a complete technology solution that can be customized to any B2B or B2C e-commerce company’s needs.

However, the subject of this article is website search, so let’s examine Steersman’s solution for this challenging aspect of e-commerce.

Data storage used on Steersman e-commerce systems

We use several data storage and retrieval solutions, each best suited for a particular type of task. The main “source of truth” is PostgreSQL — an enterprise-grade, open-source relational database. Keeping the PostgreSQL relational database for data that doesn’t need to be retrieved instantly, our platform automatically synchronizes relevant pieces of data into Elasticsearch, which is then sent to the e-commerce front end.

Thus, the customer-facing e-commerce catalog and search are internally powered by Elasticsearch for a completely seamless, fast, and accurate user experience. Selecting 10 filters is as quick as selecting one.

Attribute filter data quality issues are mitigated in several ways

  • Vendor-specific attribute values can be mapped to a sane list of options visible to customers.
    • E.g., shoe sizes 11½, 12, 12½, L, XL, EU 45, EU 46, and EU 47 can be mapped to 12
    • 10½, 11, 11½, L, EU 43, EU 44, and EU 45 mapped to 11.
    • Notice the overlap highlighted in bold. When searching, customers will be presented with a short list of US shoe sizes, but the system will search for all mapped values.
  • Bulk-editing interfaces and several vendor data ingestion options help with entering attribute data. There is a notion of “product type”, which is something that can be defined to validate product attributes attached to a product. This is particularly helpful during data cleanup and maintenance.
  • Several reports and alerts are built in to track data quality and consistency.

Don’t do it alone

While e-commerce companies have several options for integrating site search into their platforms, almost all are either too costly, too labor-intensive, or don’t provide a viable combination of relevant results and speed.

Steersman’s e-commerce site search is just one part of our comprehensive system, which ensures business process automation and consolidation, from website sales to inventory forecasting, manufacturing management, shipping automation, and financial reporting.

Other information to consider when deciding on e-commerce systems

Steersman advanced e-commerce platform with smart on-site search comes as a component of Steersman Business Suite, an all-in-one business automation system.

Work with Steersman directly or through your web agency to set up Steersman Business Suite and website SaaS. Front-end design, layouts, and coding can be handled by Steersman or a capable design and implementation partner firm.

~ by Art Shipulin, CTO at Steersman