Multi-level Product Category Prediction through Text Classification

Maia, Wesley Ferreira; Carmignani, Angelo; Bortoli, Gabriel; Maretti, Lucas; Luz, David; Guzman, Daniel Camilo Fuentes; Henriques, Marcos Jardel; Neto, Francisco Louzada

arXiv:2403.01638 (cs)

[Submitted on 3 Mar 2024]

Abstract: This article investigates the use of two machine learning models, LSTM and BERT, for text classification that predicts product categories at multiple levels in the retail sector. The study shows that data augmentation and the focal loss function can significantly improve accuracy when classifying products into multiple categories on a robust Brazilian retail dataset. The LSTM model, enriched with Brazilian word embeddings, and BERT, known for its effectiveness in understanding complex contexts, were adapted and optimized for this task. BERT achieved an F1 Macro Score of up to $99\%$ for segments, $96\%$ for categories and subcategories, and $93\%$ for product names, outperforming LSTM at the more detailed category levels. LSTM also achieved high performance, especially after data augmentation and focal loss were applied. These results underscore the effectiveness of NLP techniques in retail and highlight the importance of carefully selecting modelling and preprocessing strategies. This work contributes to the field of NLP in retail, providing insights for future research and practical applications.
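The abstract names focal loss without giving its form. As a rough illustration only, the sketch below uses the commonly cited multi-class formulation, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The function names and the values gamma=2.0 and alpha=1.0 are assumptions chosen for illustration; the abstract does not state the authors' settings or implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one example: -log(p_t)."""
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    return -math.log(p_t)


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example: -alpha * (1 - p_t)^gamma * log(p_t).

    gamma and alpha are illustrative defaults, not values from the paper.
    With gamma=0 and alpha=1 this reduces to plain cross-entropy.
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# An "easy" example (true class 0 is confidently predicted) and a
# "hard" one (true class 0 receives low probability).
easy_logits, hard_logits = [5.0, 0.0, 0.0], [0.0, 2.0, 0.0]

for name, logits in (("easy", easy_logits), ("hard", hard_logits)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.6f} ratio={fl / ce:.6f}")
```

The factor (1 - p_t)^gamma shrinks the loss of well-classified examples far more than that of misclassified ones, so training signal concentrates on hard or rare classes. That is one plausible reason such a loss could help with imbalanced product categories, though the abstract itself does not give this explanation.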

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2403.01638 [cs.CL]
	(or arXiv:2403.01638v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.01638

Submission history

From: Daniel Camilo Fuentes Guzman
[v1] Sun, 3 Mar 2024 23:10:36 UTC (931 KB)
