Ecommerce on-site search 101: databases, technologies and data

Ecommerce on-site search 101: databases, technologies and data

Understanding the different types of ecommerce platforms and their various functions can be difficult. One critical area of ecommerce website technology that's evolved significantly during the past several years is on-site search capability.

This brief tutorial will help e-commerce, marketing, and IT professionals better understand the core components of ecommerce systems involved in setting up advanced website search.

  1. Ecommerce Platform Types

    Ecommerce platforms manage product catalogs, allow customers to add products to a cart, manage wish lists, help shoppers proceed to checkout, select shipping options, pay with a card, track their order status and perform other buying-related functions.

    Let's divide these platforms into three major "classes":

    • Class 1 — Enterprise ecommerce — usually a custom buildout for large ecommerce businesses like Amazon, Walmart, Zappos, Target, Staples, etc.
    • Class 2 — Ecommerce-as-a-service for SMBs. These include platforms like Shopify, Wix, Squarespace and Yahoo Merchant Solutions (which is extremely outdated and should, by all rights, be dead by now). These platforms target very small stores with a small number of products.
    • Class 3 — DYI-ish ecommerce platforms — these include ecommerce systems like Magento, OpenCart, PrestaShop, and WooCommerce, installed on company's own hardware or cloud-hosted. These are highly customizable platforms that one can download, install on a server and modify until there is nothing left of the original platform (a rather common scenario).
  2. Out-of-box site search on common platforms

    An ecommerce/website on-site search function allows visitors to search and filter search results. Obviously, most ecommerce platforms come with built-in search.

    • Enterprise commerce platforms (class 1) have few problems with site search or filtering and handle large catalogs of products.
    • SMB e-commerce-as-a-service platforms (class 2) provide some site search functionality. Their built-in features like autocomplete, autosuggestions and filtering are lacking or unusable for larger catalogs. These platforms are very cheap, so it would be unfeasible for their providers to include advanced search with their offerings.
    • DIY-ish commerce platforms (class 3) come with similar capabilities as the class 2 systems out-of-box, though they provide better ability to tweak and optimize the data. To make things simpler for developers, such platforms normally use relational databases for data storage.
  3. Problems with on-site search when using relational databases (MSSQL, MySQL, PostgreSQL)

    Relational databases are very good for storing and retrieving exact transactional data, but they do not have built-in robust mechanisms for full text search, fuzzy matching and suggestions.

    Ecommerce website search systems can be built with relational databases, but they have serious scaling, speed and relevancy limitations. Fuzzy matching (fuzzy implies results that are similar to, but not an exact match to search query) becomes too computationally intensive as product catalog grows. This means either that the quality of search results must be sacrificed for speed, or consumers will have to wait longer in order to get the most relevant results for their search.

    Faceted filter search with relational databases

    Search results filtering (faceted filters) becomes an even bigger problem. As more filtering attributes (e.g., size, color, compatibility) are added, searches become slower and slower. According to Google PageSpeed Insights users will barely wait for 1/5th of a second. With a relational database of a relatively small catalog of 10,000 products, doing a keyword search with four filters selected could easily take several seconds.

    Autocomplete is also possible with relational databases, but, once again, it is very difficult to make it fast enough to show relevant suggestions as users type.

  4. Managed site search systems

    Managed search systems like Swiftype, Algolia, SearchSpring, BloomReach, or SLI Systems address the problems of site search speed. They're specialized site search systems that integrate with the ecommerce platforms above ("class 2" or "class 3") to provide speed and advanced functionality. These functionalities can include:

    • search results relevancy sorting
    • fuzzy searches
    • results weights
    • product/document boosting
    • autocomplete, autosuggestions - though these sound similar, they work differently from a technical standpoint
    • faceted filters
    • synonyms

    These systems are relatively easy to integrate into an ecommerce platform, but do not provide a completely seamless experience. Site admins/marketers will have to manage product information in their ecommerce system, and manage search-specific settings and adjustments in external search system admin interfaces. Site visitors may notice some user interface differences between pages powered by the ecommerce platform vs. pages powered by site search platform. Updating search data frequently enough to account for product availability changes is challenging with such systems. Additionally, analytics and visitor tracking become trickier.

    Internally these systems use something like Solr or Elasticsearch.

    If you are interested in reading more about the various search service providers and their underlying technologies, as well as when they came around, check out our overview of on-site search from 1999 to 2018.

  5. Unmanaged site search systems

    Unmanaged search systems like Elasticsearch and Solr are the most robust options for creating site search functionality that combines the best relevancy of results and speed. However, they have very steep learning curves. Most novice or mid-level web developers cannot handle the complexity of these systems in a practical manner.

    These systems also require an experienced DevOps engineer (server administrator) to set up and manage them. The way these systems consume computational, memory and storage resources, and the way they scale is rather different from what relational databases do.

    Some ecommerce businesses turn to building their own search functionality – integrating Elasticsearch or Solr with their current ecommerce platform of "class 3" (e.g., Magento, PrestaShop).

    Pains of custom-building your on-site search

    It's a very involved undertaking with many sleepless nights, endless debugging, wasted time and never-ending maintenance. IT staff will have to set up synchronization between their relational database and their search, manage more servers and rebuild large parts of their website so it will retrieve data from Solr or Elasticsearch instead of MySQL or PostgreSQL. They will also have to create admin interfaces to manage weights, synonyms, boosted products, etc.

    Benefits to developing your own search functionality

    Users will get a completely seamless experience because search becomes a fully integrated part of the ecommerce platform. That means customers stay within one system the whole time they are at the site, even though the system uses different data sources for different operations. Users will experience the fastest searches possible even if your company and catalog grow exponentially, because Elasticsearch and Solr scale to any size an ecommerce company wants – just throw more money at it to add more search servers. Finally, with this setup, site analytics can be easily configured to track anything marketers can imagine with any granularity they desire.

  6. Data quality for optimal search

    Another challenge with any of the setups described above is with data quality and management of that data. For keyword search, ecommerce companies must make sure to use a variety of keywords and synonyms and keep making adjustments based on search history. The technical name of a product isn't always what people call it and correct spelling isn't always how people search for it. For example, "spotting scope" is two words, but 40 percent of people search for "spottingscope", while "riflescope" is one word but almost everyone searches for "rifle scope". At least "telescope" is an easy one.

    Data for faceted filter search

    Another even more challenging data issue concerns product attributes. How do you set up filters based on product attributes for a shoe store when one brand uses US men's sizes 5–13, another one uses XS, S, M, L, XL, XXL, one more uses women's sizes, some other brand only lists Euro sizes and their granularity is different from US sizes? What about an eyewear or cosmetics store where nearly the same color has hundreds of different names depending on the brand, collection and the phase of the moon?

    For B2B ecommerce sites, this becomes even more critical. Sometimes a product must comply with specific standards, so a keyword search won't be good enough there. Users must be able to select ISO-12345 standard as a filter and find exact matches. The same goes for anything like pipe thread or step-motor precision and power.

    Safety standards also get complex; one business customer might know the exact standard they require, and another might want to see all thingamabobs that are designed to work in temperatures above 300°F or resist corrosion from specific chemicals.

    Two main product filtering challenges

    • Keeping filtering options sane and easy to use — Customers don't want to select shoe sizes 11½, 12, 12½, L, XL, EU 45, EU 46, and EU 47 and still get irrelevant or incomplete results. Customers also don't want to stare at size charts, customers just want to select 12 and let the system deal with the rest.
    • Maintaining accurate and consistent product attributes data — entering or editing product attributes for each product one by one is a nightmare. Admins will need to make sure to use consistent attribute names and values across different brands and collections. How does an admin find products that are missing attributes X and Y, part numbers or UPC codes? How do they deal with periodic catalog updates from their suppliers? How do they deal with inconsistencies in supplier data feeds?

    If your current systems don't let you easily manage data to ensure quality of your catalog, read up on what product information management (PIM) systems can do for you.

Steersman's solution

Steersman's ecommerce system is much more than just an ecommerce catalog, shopping cart and search. It's a complete technology solution that can be customized to any B2B or B2C ecommerce company's needs.

However, the subject of this article is website search, so let's take a look at Steersman's solution for this tricky aspect of ecommerce.

Data storage used on Steersman ecommerce systems

We use several data storage and retrieval solutions, each best suited for a particular type of tasks. The main "source of truth" is PostgreSQL — an enterprise-grade, open-source relational database. Keeping the PostgreSQL relational database for data that doesn't need to be retrieved instantaneously, our platform automatically synchronizes relevant pieces of data into Elasticsearch to then be sent to the ecommerce front end.

Thus, the customer-facing ecommerce catalog and search are internally powered by Elasticsearch for a completely seamless, fast, and accurate user experience. Selecting 10 filters is as fast as selecting one.

Attribute filter data quality issues are mitigated in several ways

  • Vendor-specific attribute values can be mapped to a sane list of options visible to customers.

    E.g. shoe sizes 11½, 12, 12½, L, XL, EU 45, EU 46, and EU 47 can be mapped to 12

    10½, 11, 11½, L, EU 43, EU 44, and EU 45 mapped to 11.

    Notice the overlap highlighted in green. When searching, customers will be presented with a short list of US shoe sizes, but the system will search for all mapped values.

  • Bulk-editing interfaces and several vendor data ingestion options help with entering attributes data. There is a notion of "product type", which is something that can be defined to validate product attributes attached to a product. This is very helpful during data clean up and maintenance.
  • Several reports and alerts are built in to track data quality and consistency.

Don't do it alone

While ecommerce companies have several options for integrating site search into their platforms, almost all are either too costly, too labor-intensive, or don't provide a viable combination of relevant results and speed.

Steersman's ecommerce site search is just one part of our complete system that ensures business process automation and consolidation, from website sales to inventory forecasting, manufacturing management, shipping, and financial reporting.

Other information to consider when deciding on ecommerce systems

How to get Steersman advanced on-site search

Steersman advanced ecommerce platform with smart on-site search comes as a component of Steersman Business Suite, an all-in-one business automation system.

Work with Steersman directly or through your web agency to setup Steersman Business Suite and website SaaS. Front end design, layouts, and coding can be done by Steersman or capable design and implementation partner firms.

— by Art Shipulin, CTO at Steersman

Request a demo

Required
Required
Must be a valid email address
Must be a valid US number

Help us get to know you

Required
Required