The Effect of Dimensionality Reduction on the Real Estate Appraisal Performance Using Tree-Based Machine Learning Models
Jamal A. A. NUMAN¹, Izham Mohamad YUSOFF*²
* Corresponding author
¹ Universiti Sains Malaysia, Institute of Postgraduate Studies, Penang, MALAYSIA
² Universiti Sains Malaysia, Geography Section, Transdisciplinary Research on Environmental Science, Occupational Safety and Health, Penang, MALAYSIA
E-mail: jamalnuman@student.usm.my; ORCID: 0009-0003-7861-4790
E-mail: izham@usm.my; ORCID: 0000-0003-0805-804X
Pages: 15-30. DOI: 10.24193/JSSP.2025.1.02
Received: 03 August 2024
Received in revised form: 22 January 2025
Accepted for publication: 03 June 2025
Available online: 20 June 2025
Cite: Numan J. A. A., Yusoff I. M. (2025), The Effect of Dimensionality Reduction on the Real Estate Appraisal Performance Using Tree-Based Machine Learning Models. Journal of Settlements and Spatial Planning, 16(1), 15-30. DOI: 10.24193/JSSP.2025.1.02
Abstract. Real estate appraisal is essential to economic, financial, and business transactions, including buying and selling, mortgage lending, insurance, and property taxation. Model-based appraisal methods face challenges in performance, interpretability, stability, reliability, scalability, flexibility, simplicity, adaptability, applicability, generalizability, comprehensibility, data availability, and the choice of evaluation metrics. Among these, performance consistently stands out as a key concern for both academic researchers and industry professionals. To investigate the effect of dimensionality reduction (DR) on appraisal performance, this study pursues three objectives: identifying the initial features affecting real estate appraisal in Al Bireh city, Palestine; selecting the most influential features; and comparing, on five statistical metrics, model performance with all features against performance with only the most influential ones. The originality of this research lies in the explicit implementation of DR using multiple feature importance (FI) techniques, multiple models, and multiple evaluation metrics. Specifically, the study uses two FI techniques, inherent FI and Shapley Additive Explanations (SHAP); four models, namely three tree-based models (decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost)) and a linear regression (LR) model used as a benchmark; and five evaluation metrics: MSE, RMSE, MAE, MAPE, and R². The results indicate no performance improvement when DR is applied. However, when DR reduces the features from 28 to 6, the relative decrease in the performance metrics is minor, remaining below 5% for all models except LR and as low as 0.7% in terms of R² for RF. This points to a trade-off between a minor decrease in performance and gains in computational efficiency, hardware resources, and data collection. In practice, DR provides stakeholders with a checklist of the key features influencing appraisal value and increases efficiency by reducing processing time, resources, and data collection effort.
Keywords: dimensionality reduction, feature selection, feature importance, tree-based models, linear regression
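The comparison summarized in the abstract (rank features by importance, keep the top few, then measure the relative change in each metric) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the feature names, and all numeric values below are hypothetical and chosen only for illustration, and the metrics follow their standard textbook definitions, with MAPE expressed in percent.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (percent) and R2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined for a true value of zero; prices are assumed positive here.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    # R2 is undefined when all true values are identical (ss_tot == 0).
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}


def top_k_features(importances, k):
    """Return the k feature names with the largest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


def relative_change_pct(full, reduced):
    """Percent change of each metric when moving from the full to the reduced model."""
    return {
        name: 100.0 * (reduced[name] - full[name]) / abs(full[name])
        for name in full
    }


if __name__ == "__main__":
    # Hypothetical importance scores (e.g. from a tree model or mean |SHAP| values).
    importances = {"area": 0.50, "floor": 0.30, "age": 0.10, "parking": 0.10}
    print(top_k_features(importances, 2))

    # Toy prices and predictions, made up for illustration only.
    y_true = [100.0, 200.0, 300.0, 400.0]
    pred_all_features = [105.0, 195.0, 305.0, 395.0]
    pred_top_features = [110.0, 190.0, 310.0, 390.0]

    full = regression_metrics(y_true, pred_all_features)
    reduced = regression_metrics(y_true, pred_top_features)
    print(relative_change_pct(full, reduced))
```

Note that the sign of the relative change reads differently per metric: for the error metrics (MSE, RMSE, MAE, MAPE) a positive change means worse performance, while for R² a negative change does.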
