The retailer is the third largest in the world, operating in twelve countries and employing close to 500,000 people.
The retailer’s inventory comprises over 800,000 unique products. Data is held electronically, using a taxonomy, or structured tree, five layers deep with over 4,000 unique nodes.
Adding a new product to the inventory is laborious: a text-only description of the product (including brand and ingredient list) must be entered manually. The user must then decide into which category to place the item, often facing more than ten possible choices at each layer. With five layers to work through, this complex, time-consuming process is prone to error and creates a major bottleneck.
The project concentrated on classifying products in three major categories (clothing and footwear, fresh groceries and dry groceries), which between them account for approximately 63% of all products sold by the retailer.
First, all of the available text for a particular product was automatically collated into a single paragraph. This paragraph was then converted into a word frequency vector, in which each element holds the number of times a particular word appears in the paragraph. The vectors for all products were stacked to form a matrix, which was used to train a Support Vector Machine classifier.
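The word frequency step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's actual code: the tokenizer, the helper names (`tokenize`, `build_vocabulary`, `to_frequency_vector`) and the two sample product descriptions are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(paragraphs):
    """Map every distinct word across all paragraphs to a column index."""
    words = sorted({w for p in paragraphs for w in tokenize(p)})
    return {word: index for index, word in enumerate(words)}


def to_frequency_vector(paragraph, vocabulary):
    """Count how often each vocabulary word occurs in one paragraph."""
    counts = Counter(tokenize(paragraph))
    # The vocabulary dict iterates in column order (it was built from a sorted list).
    return [counts.get(word, 0) for word in vocabulary]


# Hypothetical collated product text, one paragraph per product.
paragraphs = [
    "Acme cotton shirt blue cotton",
    "Acme oat biscuits oat flour sugar",
]
labels = ["clothing and footwear", "dry groceries"]  # hypothetical categories

vocabulary = build_vocabulary(paragraphs)
# One row per product, one column per vocabulary word.
matrix = [to_frequency_vector(p, vocabulary) for p in paragraphs]
```

A matrix of this shape, together with the list of known categories, is the kind of input an SVM implementation expects. With scikit-learn, for instance, one option would be `LinearSVC().fit(matrix, labels)`. The text does not say which SVM library or kernel the project used.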
Results for items in the clothing and footwear category were strong, with the algorithm achieving 98% accuracy as it matched items all the way to the last layer of classification. For fresh and dry groceries, the results were also promising. The first two layers achieved above 90% accuracy, but results dropped to between 70% and 80% in the final layer. Further analysis revealed that this was in part due to ambiguous and overlapping categories at the last level, and our work helped to draw attention to this fact.
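One plausible way to calculate accuracy per layer is to count an item as correct at a given depth only if its predicted path matches the true path on every layer down to that depth. The sketch below assumes this cumulative definition, which the results above do not spell out. The function name `layer_accuracy` and the sample category paths are hypothetical.

```python
def layer_accuracy(true_paths, predicted_paths, depth):
    """Fraction of items whose predicted path matches the true path
    on every layer from the root down to `depth` (1-based)."""
    if not true_paths:
        raise ValueError("no items to score")
    hits = sum(
        1
        for true, predicted in zip(true_paths, predicted_paths)
        if true[:depth] == predicted[:depth]
    )
    return hits / len(true_paths)


# Hypothetical three-layer paths for four products.
true_paths = [
    ("Grocery", "Dry", "Pasta"),
    ("Grocery", "Dry", "Rice"),
    ("Grocery", "Fresh", "Fruit"),
    ("Clothing", "Mens", "Shirts"),
]
predicted_paths = [
    ("Grocery", "Dry", "Pasta"),
    ("Grocery", "Dry", "Pasta"),   # wrong only at the last layer
    ("Grocery", "Dry", "Fruit"),   # wrong from the second layer onwards
    ("Clothing", "Mens", "Shirts"),
]

accuracy_by_layer = [
    layer_accuracy(true_paths, predicted_paths, depth) for depth in (1, 2, 3)
]
```

Under this definition accuracy can only stay level or fall as the depth increases, which is consistent with the drop reported at the final layer.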
For clothing and footwear, the 98% accuracy, combined with the time saved, is an improvement on the 95% to 98% accuracy achieved with manual entry. For fresh and dry groceries, the algorithm was developed to offer further support to the user by ranking the candidate matches and presenting them in order. As a result, in over 70% of cases the user need only click one button to confirm the selection made by the algorithm.
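The ranking idea can be illustrated as follows, assuming the classifier produces a numeric score per candidate category. The function names `rank_categories` and `one_click_rate`, the `top_k` parameter and the sample scores are hypothetical; how the project actually scored and displayed candidates is not described.

```python
def rank_categories(scores, top_k=3):
    """Return the `top_k` category names, highest score first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered[:top_k]]


def one_click_rate(ranked_lists, correct_categories):
    """Share of items whose top-ranked suggestion is the correct category."""
    if not correct_categories:
        raise ValueError("no items to score")
    hits = sum(
        1
        for ranked, correct in zip(ranked_lists, correct_categories)
        if ranked and ranked[0] == correct
    )
    return hits / len(correct_categories)


# Hypothetical classifier scores for one product at the final layer.
scores = {"Pasta": 1.2, "Rice": 0.4, "Noodles": 0.9, "Flour": -0.3}
suggestions = rank_categories(scores)
```

Here `suggestions` puts "Pasta" first, so a user who agrees confirms it with a single click. `one_click_rate` measures how often that happens across a set of products, which is the kind of figure the "over 70% of cases" result refers to.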