Classification of Poverty Condition Using Natural Language Processing

Muneton-Santa G, Escobar-Grisales D, Orlando Lopez-Pabon F, Pérez Toro PA, Orozco Arroyave JR (2022)

Publication Type: Journal article

Publication year: 2022


DOI: 10.1007/s11205-022-02883-z


This work introduces a methodology to classify people as poor or extremely poor using Natural Language Processing. The approach serves as a baseline for understanding and classifying poverty from people's discourse using machine learning algorithms. Based on classical and modern word vector representations, we propose two strategies for document-level representations: (1) document-level features based on the concatenation of descriptive statistics and (2) Gaussian mixture models. Three classification methods are systematically evaluated: Support Vector Machines, Random Forest, and Extreme Gradient Boosting. The four best experiments yielded around 55% accuracy, while the embeddings based on GloVe word vectors yielded a sensitivity of 79.6%, which could be of great interest to public policy makers seeking to accurately identify people who need to be prioritized in social programs.
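The first document-level strategy (concatenating descriptive statistics of word vectors) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes the word vectors of a document are already available as a NumPy array of shape (n_words, dim), for example looked up from GloVe. The function name `document_features` and the choice of six statistics (mean, standard deviation, skewness, excess kurtosis, minimum, maximum) are assumptions, since the abstract does not list which descriptive statistics were used.

```python
import numpy as np


def document_features(word_vectors: np.ndarray) -> np.ndarray:
    """Map an (n_words, dim) matrix of word vectors to one fixed-length vector.

    Hypothetical sketch: the six per-dimension statistics below are an
    assumed, illustrative set; the abstract does not specify which were used.
    """
    x = np.asarray(word_vectors, dtype=float)
    if x.ndim != 2 or x.shape[0] == 0:
        raise ValueError("expected a non-empty (n_words, dim) matrix")

    mean = x.mean(axis=0)
    std = x.std(axis=0)  # population standard deviation (ddof=0)
    centered = x - mean

    # Avoid division by zero: constant dimensions get skewness/kurtosis 0.
    safe_std = np.where(std > 0, std, 1.0)
    skew = (centered ** 3).mean(axis=0) / safe_std ** 3
    kurt = (centered ** 4).mean(axis=0) / safe_std ** 4 - 3.0  # excess kurtosis
    skew = np.where(std > 0, skew, 0.0)
    kurt = np.where(std > 0, kurt, 0.0)

    # Concatenate the statistics: output length is 6 * dim for any n_words.
    return np.concatenate([mean, std, skew, kurt, x.min(axis=0), x.max(axis=0)])


# Toy document (made-up numbers): 3 words with 2-dimensional embeddings.
doc = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
print(document_features(doc).shape)  # (12,)
```

Because every document maps to a vector of the same length regardless of how many words it contains, the output can be passed directly to standard classifiers such as Support Vector Machines, Random Forest, or Extreme Gradient Boosting.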

How to cite


Muneton-Santa, G., Escobar-Grisales, D., Orlando Lopez-Pabon, F., Pérez Toro, P.A., & Orozco Arroyave, J.R. (2022). Classification of Poverty Condition Using Natural Language Processing. Social Indicators Research.


Muneton-Santa, Guberney, et al. "Classification of Poverty Condition Using Natural Language Processing." Social Indicators Research (2022).
