In the context of e-commerce, a product search system matches a query with relevant products. A query can take several forms, e.g. a keyword, a short text, an image, or even an audio clip. A product, on the other hand, is usually represented as a data object consisting of attribute-value pairs and array data types. These data objects are indexed in a file system in advance. At query time, the product search system receives a query from a user. The system first uses some algorithm to measure the similarity between the query and every indexed product, then sorts the products by relevance, and finally retrieves the top-K products and displays them to the user.
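The score, sort and retrieve loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the product records, the token-overlap (Jaccard) similarity, and the names `PRODUCTS`, `build_index` and `search` are all hypothetical and are not taken from any real search engine.

```python
import heapq

# Hypothetical products: attribute-value pairs plus an array-valued field.
PRODUCTS = [
    {"id": 1, "brand": "Nike", "color": "red", "category": "shoes",
     "tags": ["running", "sport"]},
    {"id": 2, "brand": "Adidas", "color": "blue", "category": "shirt",
     "tags": ["casual"]},
    {"id": 3, "brand": "Nike", "color": "black", "category": "jacket",
     "tags": ["running"]},
]


def tokens_of(product):
    """Flatten attribute values (strings and arrays) into lowercase tokens."""
    tokens = set()
    for key, value in product.items():
        if key == "id":
            continue
        values = value if isinstance(value, list) else [value]
        tokens.update(str(v).lower() for v in values)
    return tokens


def build_index(products):
    """Index ahead of time: map each product id to its token set."""
    return {p["id"]: tokens_of(p) for p in products}


def similarity(query_tokens, product_tokens):
    """Jaccard overlap, an assumed stand-in for the real similarity measure."""
    union = query_tokens | product_tokens
    return len(query_tokens & product_tokens) / len(union) if union else 0.0


def search(query, index, k=2):
    """Score every indexed product against the query and return the top-k ids."""
    query_tokens = set(query.lower().split())
    scored = ((similarity(query_tokens, toks), pid) for pid, toks in index.items())
    return [pid for _, pid in heapq.nlargest(k, scored)]


INDEX = build_index(PRODUCTS)
print(search("red nike running shoes", INDEX, k=2))
```

Using `heapq.nlargest` avoids sorting the whole catalogue when only the top-K results are needed; a production system would typically replace the exhaustive scan with an inverted index or an approximate nearest-neighbour lookup.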
At Zalando, our in-house search engine currently matches full-text queries with products through a cascaded architecture. Each step in the cascade processes the input in a specific way, for example by locating mentions of brands, spell checking, or disambiguation. Finally, the preprocessed input is used to filter on product attributes (such as color=red, brand=Nike). However, this architecture has several drawbacks, including fragility and limited scalability and extensibility. We are therefore working on replacing the cascade with a single-step, end-to-end deep learning architecture that involves no textual preprocessing and filters directly on image content as well as product metadata. By eliminating all intermediate components of the pipeline, the end-to-end product search system is expected to be simpler, more robust and more scalable than its pipeline counterpart. We also expect it to be considerably easier to maintain and improve.
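To make the cascade idea concrete, here is a toy sketch of such a pipeline. It is not Zalando's implementation: the lookup tables, the catalogue and the function names (`spell_check`, `extract_filters`, `cascade_search`) are all assumptions made up for illustration.

```python
# Hypothetical dictionaries that each cascade step depends on.
KNOWN_BRANDS = {"nike", "adidas"}
KNOWN_COLORS = {"red", "blue", "black"}
SPELLING_FIXES = {"nkie": "nike", "rad": "red"}

# Hypothetical catalogue of products with filterable attributes.
CATALOG = [
    {"id": 1, "brand": "nike", "color": "red"},
    {"id": 2, "brand": "adidas", "color": "blue"},
    {"id": 3, "brand": "nike", "color": "black"},
]


def spell_check(tokens):
    """Step 1: replace known misspellings; unknown tokens pass through."""
    return [SPELLING_FIXES.get(t, t) for t in tokens]


def extract_filters(tokens):
    """Step 2: map recognised tokens to attribute filters."""
    filters = {}
    for token in tokens:
        if token in KNOWN_BRANDS:
            filters["brand"] = token
        elif token in KNOWN_COLORS:
            filters["color"] = token
    return filters


def cascade_search(query, catalog):
    """Run the steps in order, then keep products matching every filter."""
    tokens = spell_check(query.lower().split())
    filters = extract_filters(tokens)
    return [p["id"] for p in catalog
            if all(p.get(key) == value for key, value in filters.items())]


print(cascade_search("red Nkie", CATALOG))   # typo covered by the lookup table
print(cascade_search("redd nike", CATALOG))  # typo not covered: color filter is lost
```

The second query hints at the fragility mentioned above: a misspelling that the first step does not know about silently drops the color filter, so every later step works with degraded input.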
In this talk, I will describe the components involved in such a system, as well as potential advantages and disadvantages.
The event took place on January 26th, 2018.
Find Duncan’s slides on SlideShare.