Article

A Novel Data Analytics Methodology for Analyzing Real Estate Brokerage Markets with Case Study of Dubai

by Ahmed Saif Al Abdulsalam, Maged Mohammed Al-Baiti Al Hashemi, Mohammed Zayed Sulaiman Aleissaee, Abdelaziz Saleh Husain Almansoori, Gurdal Ertek * and Thouraya Gherissi Labben
College of Business and Economics, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
* Author to whom correspondence should be addressed.
Buildings 2024, 14(10), 3068; https://doi.org/10.3390/buildings14103068
Submission received: 29 June 2024 / Revised: 29 August 2024 / Accepted: 30 August 2024 / Published: 25 September 2024

Abstract

Despite the vast economic impact of real estate markets worldwide, research on real estate brokerage markets remains limited. Specifically, there are few studies that provide a systematic, integrated, and replicable analytical methodology to analyze and benchmark a given real estate brokerage market. To this end, this paper introduces a data analytics methodology for analyzing real estate brokerage markets, integrating various statistical and analytical methods to extract insights from market data, supporting real estate investment decisions. The applicability of the methodology is demonstrated with a case study analyzing data from the top 50 real estate brokerage firms in Dubai, UAE. As shown in the case study, applying this methodology to brokerage market data enables the visual benchmarking of firms, identification of similarities between them, profiling and comparison of clusters of firms, and exploration of the impacts of various categorical and numerical attributes on performance. A notable finding for the Dubai real estate brokerage market is that it takes a minimum of 700 days for a brokerage firm to mature and advance to the next level of business success.

1. Introduction

Real estate is one of the primary investment areas for both institutional and individual investors. According to the most recent statistics [1], the value of real estate in the world reached USD 379.7 trillion. China holds 26% of this value, Europe 43%, and the US 19%. The value of real estate in the GCC countries is around USD 1.36 trillion. It is forecasted that the size of the worldwide real estate market will increase at a compound annual growth rate (CAGR) of 5% [2]. This increase can be explained by the expansion of public–private partnerships, the increase in the global population, and the general desire for home ownership. In the UAE, which is the focus country of this study, the total real estate market value is expected to approach USD 0.71 trillion in 2024 [3]. Increasingly high demand in Dubai has expanded the total volume of real estate transactions, which rose by a record 43.3% in the first six months of 2023 alone [4].
The focus market in this research, to demonstrate the applicability of the developed analytical benchmarking methodology, is Dubai, a city/Emirate within the United Arab Emirates (UAE). UAE is a Gulf country, and a union of seven “Emirates” (states/regions), whose estimated population is >9 million, including both citizens and non-citizen expatriate residents (https://www.britannica.com/place/United-Arab-Emirates accessed on 11 August 2023). While Abu Dhabi Emirate is the home to the capital city of Abu Dhabi, the Emirate/city of Dubai is the most populous, with an estimated population of >3.5 million (https://www.dsc.gov.ae/en-us/Pages/default.aspx accessed on 11 August 2023).
There are multiple unique aspects of the UAE, and particularly Dubai, that deserve special attention and economic/business research: first, Dubai has become a success story in transforming an oil-dependent economy into a completely oil-independent and diversified economy. Dubai is a hub of commerce, finance, tourism, hospitality, and a multitude of other sectors. Second, UAE is one of the most attractive countries to work and live in the Middle East and also the larger geographical region, from West Africa to Southeast Asia. For example, for the 12th year in a row, a survey among the Arab youth ranked UAE as the top country in the world that they would like to live in (https://arabyouthsurvey.com/en/findings/#my-global-citizenship-17 accessed on 11 August 2023). Third, a unique aspect of the UAE is that it is the top country in the world with respect to the percentage of expatriates living in the country as residents, with an estimated 88% of the population being expatriates (https://www.globalmediainsight.com/blog/uae-population-statistics/ accessed on 11 August 2023). This implies a constant flow of international residents into and out of the UAE. Fourth, the UAE ranks among the better countries in the world to do business (https://www.forbes.com/places/united-arab-emirates/ accessed on 11 August 2023), resulting in a further flow of people. Fifth, relating to the presented study, there is a vibrant and growing real estate sector in the UAE, and especially in Dubai, fueled by the tourism and hospitality sectors and expatriates living long term in the country. Due to the listed aspects, UAE, and especially Dubai, is a real estate market worth researching.
Real estate is considered a lower-risk investment alternative, beneficial for portfolio diversification, and protects investors against inflation [5]. In every real estate market, transactions are enabled and facilitated significantly through real estate brokerage firms, which constitute the real estate brokerage market. For example, in Dubai, the number of real estate brokers (agents), which was 5181 in the first half of 2018 [6], has, in about five and a half years, reached 15,367 by December 2023 [7]. Such a high increase, with a tripling of the number of brokers in about five years, also signals a possible over-saturation in the brokerage markets. Thus, it is crucial to investigate the brokerage markets to identify strategies for success in such a saturated market.
The vast amount of data pertinent to the real estate markets is systematically collected, harnessed, processed, and presented as information through business intelligence systems. This information is then used for decision making by investors, real estate developers, real estate brokerage firms, and finally regional, national, and international government and non-profit organizations. Government bodies also publish open data regarding the real estate markets and brokerage markets. For example, the list of data providers for the U.S.-based RERI (Real Estate Research Institute) includes various real estate, finance, and retail associations, advisors, and firms, as well as government organizations (https://www.reri.org/dataproviders/index.cfm accessed on 1 April 2024). Popular online services for accessing UAE real estate market data include DataDollar (https://datadollar.ae accessed on 1 April 2024), Wasila (https://wasiladatabase.com/dubai-real-estate-database/ accessed on 1 April 2024), and Land Sterling (https://landsterling.com/strategic-consultancy/data-services/ accessed on 1 April 2024). The Dubai government, in particular, openly publishes extensive data on both the real estate market and the real estate brokerage market. The focus of this research is the latter, namely the brokerage market [7].
Despite new heights in the real estate market volume in recent decades, there are only limited data analytics studies that focus on the benchmarking of brokerage firms. Notable research on brokerage markets, including notable analytics research, can be found in References [8,9,10,11,12,13,14,15,16]. These and other research studies mostly apply econometrics techniques or simple summary statistics successfully to analyze various aspects of the real estate sector.
There are multiple data-driven studies that focus on real estate brokerage markets. Existing studies were found to apply either qualitative methods (as in Reference [13]), basic statistical summaries (as in Reference [14]), regression (as in References [11,15,16]), or other statistical methods (as in References [8,9]). However, none of the studies that were encountered apply more than two statistical and analytical methods within an integrated methodology. Furthermore, none of the encountered studies on the real estate brokerage markets were found to apply the machine learning (ML) method of multi-dimensional scaling (MDS) or integrate ML with statistical tests of comparison. The benefit of developing a novel analytical methodology that integrates a multitude of methods, as in this paper, is that such an approach enables the derivation of novel insights that could not be obtained otherwise. Indeed, the current paper presents new types of insights that were not encountered in any of the earlier studies on the real estate brokerage markets.
Specifically, during the review of the literature, there was no study encountered that allowed the systematic answering of the following research questions:
  • RQ1: How can brokerage firms be benchmarked visually with respect to various attributes?
  • RQ2: Which brokerage firms are similar to each other and different than others?
  • RQ3: How do the different clusters of brokerage firms differ from each other?
  • RQ4: How do the values of the categorical attributes of brokerage firms affect their other attributes?
  • RQ5: How do the values of the numerical attributes of brokerage firms affect their other attributes?
To this end, the focus of this paper is the development and introduction of a novel, custom-developed, and integrated data analytics methodology. The objective of the methodology is to analyze real estate brokerage markets so as to be able to answer all of the above five research questions within a unified framework.
To support the validity and value of the developed data analytics methodology, it is necessary to demonstrate, through at least one case study, how the methodology is implemented and what insights it can generate. Thus, in this paper, where we introduce the methodology, we also apply it. Specifically, we analyzed the real estate brokerage firms in Dubai, using public data published by Dubai Land Department (RERA, Real Estate Regulatory Agency) [7]. RERA is the government body in Dubai that is solely authorized to regulate, monitor, and govern the real estate sector within the Dubai emirate of UAE (United Arab Emirates). To analyze the data at hand, it was necessary to develop a customized data analytics methodology/workflow that uses various statistical and analytical methods. The developed methodology consists of comparison of means (Student’s t test), visual analytics, hierarchical clustering, k-Means clustering, MDS (Multi-dimensional Scaling), outlier analysis, and linear regression. The methodology was developed using and targeting the analysis of data published by Dubai Land Department (RERA) in particular. Yet, the methodology can be extended to include other types of analysis, for data published for other cities/states/regions/countries around the world.
This Introduction section introduced the topic of study, research questions at hand, the data and methods used, the need for a new approach, and the scientific novelty of the custom-developed analytics methodology. Section 2 provides a summary of the relevant literature. Section 3 presents the custom-developed methodology as an analytics workflow and describes the specific statistical and analytical methods integrated into the methodology. Section 4 describes the data collected and shares the analysis and results obtained. Here, the methodology is applied with data collected from the selected real estate brokerage market, namely that of Dubai, UAE. Section 4 illustrates, one by one, how each of the five research questions shared in the Introduction can be answered through the methodology. Finally, Section 5 summarizes the study, with a discussion of possible threats to validity. The Supplementary Materials to the paper contain additional analysis and results, which have been excluded from the main body of the paper for the sake of succinctness.

2. Literature

To survive, real estate brokerage firms need to stay current and innovate to adjust to changes in the economic and social environments [17]. Technological advancements are transforming real estate brokerages, sustaining the emergence of new approaches and strategies. This emergence was facilitated by the transition of real estate data from the private domain, reserved for professionals, to the public domain [18]. Accordingly, big data and data analytics have demonstrated their efficiency in improving decision-making in different domains, with real estate becoming a significant area of application [19]. According to Reference [18], data analytics is beneficial for real estate agents as it enhances property sales, shortens transaction time, and improves services from the supply and demand sides.
There are many studies that apply various data analytics methods for analyzing real estate markets.
In the era of the internet and digitization, real-time housing data, which is gathered by multiple listing services, is typically further enhanced with spatiotemporal geo-information data. Reference [20] used a dataset of more than three million observations from the German home market to investigate and compare the prediction accuracy of multiple machine learning methods, such as stacked regression, XGBoost, and random forest regressions. Reference [21] predicted several aspects of the real estate market using multilevel structured additive regression models. These models were especially used to identify the characteristics that have a statistically significant impact on real estate valuation. In another study that applied regression as the analytical technique, Reference [22] revealed that in Australia, buildings with high energy ratings, denoting a high level of environmental friendliness, performed better financially overall.
Reference [23] presented a dynamic visual analytics solution, namely HomeSeeker, which serves a wide range of stakeholders involved in the local real estate market. The authors created visual analytics dashboards to help prospective homeowners conveniently view, assess, and evaluate real estate information. Additionally, HomeSeeker suggests a problem abstraction strategy that promotes gradual learning about the local real estate market, especially for consumers with less expertise.
Reference [24] developed an automated, machine-based approach for determining the value of commercial real estate. The authors used a unique dataset on U.S. multifamily properties and applied advanced modeling techniques. Reference [25] applied grey relational analysis-analytical hierarchy process (GRA-AHP) to develop an advanced real estate investment decision-making model.
Since data from real estate markets can be highly complex, it may be difficult for both practitioners and researchers to overcome the complexities. To this end, Reference [26] provided a knowledge graph-based approach for knowledge extraction for the housing market. The authors’ approach involves four main processes, namely entity linking, question answering, data gathering, and finally data cleaning, in order to handle the complexity. The system can make personalized recommendations based on keywords and historical market data. A user-based review highlighted the effectiveness of this method and its usefulness in real-world scenarios. Reference [27] focused on the efficient planning, supply, and administration of building life cycle documentation to develop novel best practices. The authors developed a framework for document classification and demonstrated their framework with an empirical study. The authors analyzed 8965 digital documents from 14 properties belonging to eight different entities, serving as an example of automated information extraction via artificial intelligence in real estate development.
Reference [28] provided a review of how big data is used for decision making in the real estate sector. Reference [29] used an entropy-based approach for ranking the importance of factors that influence residential real estate prices, using data from Auckland, New Zealand (NZ).
Big data and data analytics were also used to study the real estate market in specific conditions. Reference [30] investigated the factors that may impact the price revision of properties during COVID-19. The authors used web scraping to collect data on approximately 19,000 properties and subsequently selected 15 machine learning models to analyze the scraped data. The results confirmed that the real estate market resisted the pandemic impact, as prices did not drop as significantly as expected. Furthermore, the gradient boosting model was the most accurate and demonstrated that the main variable explaining price revision and prediction was the time a property was available on the market for sale.
Beyond predicting financial aspects related to real estate such as price and risk, machine learning (ML) frameworks and models were also used to identify the best location for a real estate investment. To this end, Reference [31] contrasted two approaches. Both approaches rely on decision trees, but the first is based on principal component analysis, and the second on artificial neural networks. Reference [32] applied various machine learning techniques to predict the sale-to-list ratio for the US real estate market. The study revealed, with 85% prediction accuracy, the strong influence of some temporal, pricing-related, and market-condition factors.
Data analytics was also used to facilitate buyers’ and investors’ comparison and selection of properties. One specific application of statistical/analytical methods is for forecasting prices and rents. An example of this line of research is Reference [33], where the authors used the econometrical error correction model (ECM) for forecasting house and unit price changes in the Greater Sydney Area.
Another line of research in data analytics for real estate markets places sentiment analysis at the core of the analytics. One example is Reference [34], where the authors modeled the impact of COVID-19 sentiment on office real estate rents and occupancies in China.
The focus of the current research is the real estate brokerage markets within the real estate sector. To this end, notable studies on real estate brokerage markets are discussed next.
Reference [8] applied data analytics and statistical visualization for better decision-making in the real estate brokerage sector. The study examined data from 62 Nigerian PropTech firms using clustering and linear regression, offering insightful information to agents, managers, and clients. The research advanced knowledge of the Nigerian real estate brokerage industry while providing stakeholders with the facility of data-driven decision making.
Reference [9] identified economies of scope in residential real estate brokerage markets. Specifically, the authors found that it was on average more efficient when sales and listings were carried out by a single firm, rather than two separate firms. Reference [11] analyzed the relations between real estate brokers’ attributes, agents’ attributes, prices, and selling time. Similar to the results of Reference [9], Reference [11] found that there are benefits in economies of scope.
Reference [10] focused on the comparison of the relative performances of franchised vs. independent real estate brokerage firms. The study revealed the significance of and the need to consider the self-selection behavior of the agents, in other words, which type of brokers the agents prefer to choose. Reference [12] analyzed the effect of brokerage firms on the residential real estate prices, revealing that homes sold through brokers realized higher prices.
As mentioned earlier, existing studies were found to apply either qualitative methods (as in Reference [13]), basic statistical summaries (as in Reference [14]), regression (as in References [11,15,16]), or another statistical method (as in References [8,9]). However, none of the studies that were encountered apply more than two statistical and analytical methods within an integrated methodology. No study on the real estate brokerage markets was found to apply the machine learning (ML) method of multi-dimensional scaling (MDS). Thanks to the novel custom-developed analytical methodology that integrates a multitude of methods, this paper enables the derivation of novel insights that could not be obtained in earlier literature.

3. Methods

The research contributions of the presented work are methodological and practical. The methodological contribution is a novel custom-developed analytics methodology for analyzing brokerage firms in the real estate sector. The main practical contribution is the application of the developed methodology for a particular brokerage market, namely the Dubai brokerage market, yielding insights into that market. Figure 1 displays the analytics methodology as a workflow in Orange software (https://orangedatamining.com accessed on 1 April 2024), with a variety of different statistical and analytical methods systematically applied. For example, scatter plots show the relationship between two variables, and linear regression explains how multiple variables affect a response variable.
“Data analytics” is the analysis and modeling of data to discover insights. The term “custom-developed analytics methodology” refers to an analytics methodology that caters to a specific type of data, domain, or objective. Alternatively, such a methodology could appear as an “analysis pipeline” [35], “analytics workflow” or another similar term. Such a methodology, by definition, is not globally applicable to any dataset/domain/objective. Instead, it is tailor-designed to perform effectively and/or efficiently for the specific dataset/domain/objective it is intended for. Such a custom-developed analytics methodology is valuable especially when standard analytics workflows may not be sufficient in generating insights in the given context [36]. Research studies reporting custom-developed analytics methodologies can be found in practically all domains, including digital marketing [37], e-commerce [38], smart manufacturing [39,40], agriculture [41], renewable energy [42], marine sciences [43], and bioinformatics [35]. Therefore, it is reasonable to custom-develop novel analytics methodologies in the real estate domain, for benchmarking real estate brokerage firms.
In data analytics research that custom-develops novel methodologies, the term “method” or “technique” is typically used to refer to the specific statistics/analytics techniques, such as hierarchical clustering or scatter plot visualization, whereas the term “methodology” or “framework” is typically used to refer to the workflow that systematically integrates a portfolio of techniques. In this sense, while the “methods” of statistics/analytics are well-established in the literature, the “methodology” in the paper is novel; because the existing methods/techniques have not been put together within an integrated whole as such and have thus not been applied as the customized whole.
The statistical and analytical methods/techniques integrated into the methodology are as follows:
  • A scatter plot visualization plots two variables on a two-dimensional Cartesian plane to observe the relationship between the two variables. Scatter plots are particularly beneficial to reveal patterns and correlations between the variables.
  • Hierarchical clustering is a cluster analysis technique that creates a hierarchical tree of clusters. At each level of the hierarchy, the distance between the observations within clusters is minimized, and then clusters are broken into smaller clusters [29].
  • k-Means clustering divides the observations in the dataset into k clusters, where each observation is successively assigned to the cluster whose centroid is at the smallest Euclidean distance [44].
  • MDS (multi-dimensional scaling) maps multidimensional data to a two-dimensional Cartesian plane such that data points that are closer in the original space are placed closer to each other. The objective of the selected MDS algorithm is to identify an accurate lower-dimensional representation of the (typically much) higher-dimensional data while maintaining, as much as possible, the distances between the data points.
  • A strip plot maps the distribution of points in the subsets of a dataset, where each subset typically corresponds to data points having a particular value of a categorical attribute [45].
  • A violin plot is similar to a strip plot, yet also includes two symmetric density plots that accompany the strip plot [46]. Furthermore, the points on the strip plot are typically jittered (distorted with small deviations) to distinguish them.
  • A box plot roughly portrays the distribution of points by drawing a rectangular box that shows the median, 25% and 75% quartiles, range, and outliers.
  • Comparison of means test determines whether there is a statistically significant difference between the means of two or more groups [47]. If the groups are normally distributed or are large in size, the Student’s t-test can be applied, with smaller p values suggesting that there is a statistically significant difference between the means of at least two groups.
  • Outlier analysis identifies outlier data points, which are data points that vary from and are significantly outside the other data points in the dataset. If the outliers are not appropriately handled, they might create bias in the statistical results [48].
  • Linear regression describes the connection between one or more independent variables and a dependent variable through a linear equation, where the dependent variable is expressed as a linear function of the independent variable(s) and a random error term [49].
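Two of the listed methods can be illustrated with a minimal sketch in plain Python. The case study itself was run in Orange, so the function names (`student_t`, `fit_line`) and all numbers below are hypothetical and serve only to show the underlying arithmetic: the pooled-variance Student's t statistic for two groups, and ordinary least squares with a single independent variable.

```python
import math
from statistics import mean, variance

def student_t(a, b):
    """Pooled-variance Student's t statistic for two independent samples."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x + error (one predictor)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Hypothetical total points for two groups of firms (illustrative only).
group_a = [65, 62, 70, 68, 66]
group_b = [36, 40, 38, 35, 41]
t_stat = student_t(group_a, group_b)

# Hypothetical predictor/response pairs that lie exactly on y = 10 + 2x.
intercept, slope = fit_line([1, 2, 3, 4], [12, 14, 16, 18])
```

A large absolute value of `t_stat` corresponds to a small p value; converting the statistic to a p value requires the t distribution with n1 + n2 - 2 degrees of freedom, which is left to a statistics package.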

4. Analysis and Results

Our case study analysis, where we demonstrate the developed methodology, is focused on the Real Estate Brokerage Market in Dubai, United Arab Emirates (UAE).

4.1. Data

As the source dataset, we extracted data for the 50 licensed real estate brokers in Dubai [7] with the largest number of agents as of 10 May 2023. The data was obtained from the website of Dubai Land Department (https://dubailand.gov.ae/ accessed on 1 April 2024), which is the government entity in Dubai that regulates the real estate market. After collecting and cleaning the data, we developed a novel data analytics methodology that is customized for this specific dataset and datasets with similar attributes and structure. Then, we applied the methodology using Orange software to analyze the collected data for the selected market.
The Dubai housing market is continuing to grow. From 2022 to July 2024, the Federal Reserve of the United States implemented a series of interest rate hikes to mitigate inflation (https://www.federalreserve.gov/monetarypolicy/openmarket.htm accessed on 11 August 2024). The hikes started on 17 March 2022, with an increase from 0.25% to 0.50%. Following multiple increases, finally, as of 31 July 2024, the Federal Funds interest rate was increased again from 5.25% to 5.50%. As of early 2023, Reference [50] revealed negative spillover effects of these interest rate hikes on the Dubai housing market, causing decreases in demand and prices in Dubai, as well as other countries. Yet, starting February 2023, the annual change in UAE house prices has mostly increased, rising from 6.34% in February 2023 to 16.18% in April 2024 (https://www.globalpropertyguide.com/middle-east/united-arab-emirates/price-history accessed on 11 August 2024). Sales transactions in Dubai showed a 38% increase from 2022 to 2023. Furthermore, in Q1 of 2024, Dubai’s residential sales transactions showed a 20% increase in volume as compared to Q1 of 2023 (https://www.jll-mena.com/content/dam/jll-com/documents/pdf/research/emea/mena/jll-the-uae-real-estate-market-overview-q1-2024.pdf accessed on 11 August 2024). In other words, while the series of interest rate hikes by the Federal Reserve initially had a negative spillover effect on Dubai until 2023, the Dubai housing market recovered and grew in volume from 2023 through Q1 of 2024.
The data attributes are provided in Table 1. The primary target attribute is “RankingTotalPoint” which is numerical. The eight attributes that start with the text “Points” are points/scores corresponding to different dimensions of performance. The remaining are derived attributes, meaning that they were derived from the original dataset through calculations.

4.2. Benchmarking Brokers

The first research question that can be answered through data analytics is the following:
RQ1: How can brokerage firms be benchmarked visually with respect to various attributes?
Figure 2 is a scatter plot with variable “NumberOfBrokers” (Number of Brokers) on the x-axis and “RankingTotalPoint” (Ranking total point) on the y-axis. In addition, other attributes are denoted with color (“DaysSinceLicense”) and size (“NumberOfAwards”). “OfficeCode”, the label, is the name of the real estate broker.
The scatter plot illustrates that there is no single consistent pattern, such as a positive or negative relationship, between the two variables. However, this scatter plot can still yield a highly beneficial insight: a real estate buyer would prefer high values of “RankingTotalPoint” (y-axis) and “NumberOfAwards” (size). Such brokers can be identified from this scatter plot as the larger circles in the upper region of the plot, such as HARBOR, PREMIER ESTATES, REM, HS, ALLSP, and LUXHABITAT. Furthermore, when analyzing the sector, one would be especially interested in identifying younger firms with fewer brokers as employees but with a high level of success. These firms appear as larger circles with darker colors in the upper left region of the plot. All the firms mentioned in this paragraph meet these criteria, except for HARBOR, which was established well before the others mentioned.
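The visual reading of the scatter plot can also be expressed as a simple filter over the firm records. The sketch below is only an illustration: the attribute names follow those used in the scatter plot, but the firm codes, attribute values, and thresholds are hypothetical and would need to be chosen by the analyst.

```python
def shortlist(firms, max_brokers, max_days, min_points, min_awards):
    """Return younger, smaller firms that still score well, best first."""
    picked = [
        f for f in firms
        if f["NumberOfBrokers"] <= max_brokers
        and f["DaysSinceLicense"] <= max_days
        and f["RankingTotalPoint"] >= min_points
        and f["NumberOfAwards"] >= min_awards
    ]
    return sorted(picked, key=lambda f: f["RankingTotalPoint"], reverse=True)

# Hypothetical records; none of these values come from the source dataset.
firms = [
    {"OfficeCode": "FIRM_A", "NumberOfBrokers": 40, "RankingTotalPoint": 70,
     "DaysSinceLicense": 900, "NumberOfAwards": 3},
    {"OfficeCode": "FIRM_B", "NumberOfBrokers": 300, "RankingTotalPoint": 72,
     "DaysSinceLicense": 5000, "NumberOfAwards": 4},
    {"OfficeCode": "FIRM_C", "NumberOfBrokers": 35, "RankingTotalPoint": 30,
     "DaysSinceLicense": 800, "NumberOfAwards": 0},
    {"OfficeCode": "FIRM_D", "NumberOfBrokers": 60, "RankingTotalPoint": 66,
     "DaysSinceLicense": 1200, "NumberOfAwards": 2},
]

result = shortlist(firms, max_brokers=100, max_days=2000, min_points=60, min_awards=1)
codes = [f["OfficeCode"] for f in result]
```

In this toy input, FIRM_B is excluded for being large and long established, and FIRM_C for its low score, which mirrors how HARBOR is set apart from the other high-scoring firms in the discussion above.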

4.3. Mapping Brokers

The second research question that can be answered through data analytics is the following:
RQ2: Which brokerage firms are similar to each other and different than others?
To answer this question, we apply hierarchical clustering, k-means clustering, and multidimensional scaling (MDS).
Figure 3 shows the application of hierarchical clustering, with “OfficeCode” as the label. The attributes used as dimensions are the eight numerical “Points” attributes. The hierarchical clustering algorithm, whose results are illustrated in the dendrogram of Figure 3, creates from 2 to 27 clusters, revealing varying levels of similarity. Some of the clusters consist of a single element. Even at the most detailed level, with 27 clusters, the number of firms in a cluster can reach up to 7. For example, on the upper right of the figure, one can observe that the five brokers CA = (HAMPTONS, AEONTRISL, PROVIDENT, FAM, ESPACE) are grouped together, meaning that these firms are most similar with respect to the values of attributes used in clustering. The largest cluster at this level consists of the seven firms CB = (PATRIOT, LUXURY, BANKE, DRIVEN2, HAUSBAUS, FAM1, EV).
There are multiple implications of the results. For example, while other FAM branches (FAM2, …, FAM6) reside together in a different cluster, the FAM1 branch resides within a cluster consisting of firms under different ownership. The reasons for this divergence are worth investigating, especially given that RankingTotalPoint = 53 for FAM1, whereas it is 36 for other branches. The parent company FAM, which has even higher points, can have FAM1 managers work with other FAM branches to increase their performance.
The level of merging of clusters also is noteworthy. For example, cluster CA merges with a cluster of other firms CC = (BETTER, EXCLUSIVELINKS, ALLSOPP, AZCO) almost immediately, meaning that, before larger clusters are formed, the firms in clusters CA and CC are closer to each other, more than their similarity to other firms. Thus, these firms do better in similar dimensions and do worse also in other similar dimensions.
For a firm, it would be especially interesting to benchmark with a more successful firm that has significantly higher RankingTotalPoint value, yet in the same cluster. Because this means that the more successful company is doing better in similar dimensions of performance. A careful investigation of the source data revealed that the firms within cluster CA had exact same values for the eight dimensions of performance (PointsNumberofTransactions, …, PointsInitiatives). The same situation was observed for cluster CB. For cluster CC, there was only a very slight difference, with BETTER doing only slightly better in the PointsNumberOfBranches dimension compared to the other three firms in that cluster, hence it being a cluster of its own at the most detailed level. However, when clusters are started to get merged, the situation changes. For example, when clusters CA and CC are merged, since the firms in CA all have RankingTotalPoint = 65, which is higher than the score for CC, it is reasonable for the firms in CC to look up to the firms in CA.
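The merging behavior described above can be illustrated with a minimal sketch of agglomerative clustering. The firm labels (A1, B3, and so on) and their attribute values are hypothetical, and average linkage with Euclidean distance is an assumption made for illustration; the linkage and distance settings behind Figure 3 are not restated here.

```python
from itertools import combinations
from math import dist

# Hypothetical firms with four "Points" attributes each (not the study data).
POINTS = {
    "A1": (10, 10, 5, 5), "A2": (10, 10, 5, 5),  # identical profiles
    "B1": (10, 10, 5, 3), "B2": (10, 10, 5, 3),  # identical profiles
    "B3": (10, 10, 6, 3),                        # differs in one dimension
    "C1": (2, 3, 1, 0),                          # very different profile
}


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of firm labels."""
    return sum(dist(points[x], points[y]) for x in a for y in b) / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters and the height (distance) of each merge.
    """
    clusters = [[name] for name in points]
    heights = []
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        heights.append(average_linkage(clusters[i], clusters[j], points))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, heights


three, heights = agglomerate(POINTS, 3)
two, _ = agglomerate(POINTS, 2)
print(three)    # clusters when the dendrogram is cut at three clusters
print(heights)  # merge heights; identical profiles merge at height 0
print(two)      # clusters when the dendrogram is cut at two clusters
```

In this toy setting, firms with identical attribute values merge at height zero, the firm that differs in a single dimension joins its group next, and the two similar groups merge well before the dissimilar firm does, which mirrors the pattern described for clusters CA and CC.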
The next analysis is k-means clustering, whose results depend heavily on the chosen number of clusters. The best choice of k can be determined through Silhouette scores [51]: the k value that maximizes the Silhouette score is chosen. Figure 4 displays the Silhouette scores for different values of k. The Silhouette score is highest for k = 4; thus, we selected k = 4 as the number of clusters. Other pre-processing parameters are provided in Figure 4.
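The selection of k by the Silhouette score can be sketched as follows. This is a simplified illustration using only the Python standard library: the two-dimensional points are synthetic, and the deterministic farthest-first seeding is an assumption chosen to make the run reproducible, not a description of the settings shown in Figure 4.

```python
from math import dist


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from the centres chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist(p, c) for c in centres)))
    return centres


def k_means(points, k, n_iter=50):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centres = farthest_first(points, k)
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: dist(p, centres[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient; points in singleton clusters contribute 0."""
    total = 0.0
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            continue
        a = sum(own) / len(own)                                 # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())   # separation
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic data: four tight groups of five points each.
offsets = [(0, 0), (0.2, 0), (0, 0.2), (-0.2, 0), (0, -0.2)]
group_centres = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [(cx + dx, cy + dy) for cx, cy in group_centres for dx, dy in offsets]

scores = {k: silhouette_score(points, k_means(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```

For the synthetic data, the score peaks at the number of groups that was built into the data, which is the same criterion used to select k = 4 in the case study.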
After applying k-means clustering, each observation is mapped to a cluster. A reasonable next step is to understand how the observations in different clusters are related. To this end, we applied multi-dimensional scaling (MDS), which maps multi-dimensional data onto a two-dimensional Cartesian plane. In the MDS, the “Show similar pairs” option is selected, which automatically turns the MDS results into a graph/network, enabling the application of graph analytics techniques [52] as future work.
Figure 5 shows a zoomed region of the MDS results, with color regions displayed. Color denotes the cluster to which a brokerage firm belongs, and the tone of the color denotes the strength of membership in the cluster, i.e., closeness to the cluster centroid. There are four clusters, yet cluster C2 alone contains most of the observations. Thus, we focus on cluster C2, which is shown in red. Firms that are close to each other on the MDS plot, especially those that are connected, are also similar and close to each other in the higher-dimensional space. Within cluster C2, there are two major subclusters consisting of connected brokers. In graph analytics, these would be referred to as components, since the nodes in each component are reachable from (connected with) each other and yet disconnected from the rest of the graph/network.
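The notion of a component can be made concrete with a short sketch. The firm labels F1 to F7 and the list of similar pairs are hypothetical; in practice, the pairs would come from the connections drawn on the MDS plot.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into components: sets of nodes reachable from each other."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from the start node
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical firms and "similar pair" connections (not the study data).
firms = ["F1", "F2", "F3", "F4", "F5", "F6", "F7"]
similar_pairs = [("F1", "F2"), ("F2", "F3"), ("F4", "F5"), ("F5", "F6")]

components = connected_components(firms, similar_pairs)
print(components)
```

Here F1, F2, and F3 form one component and F4, F5, and F6 form another, while F7 has no similar pair and forms a component of its own.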
The third research question that can be answered through data analytics, which also relates to the mapping of the brokerage firms, is the following:
RQ3: “How do the different clusters of brokerage firms differ from each other?”
To answer RQ3, we apply strip plots, violin plots, box plots, and a statistical hypothesis test for the comparison of means.
Earlier analysis revealed four clusters of firms. Through a violin plot and a statistical comparison test, these clusters can be characterized and analyzed in more depth for insights.
Figure 6 shows “DaysSinceLicense” on the y axis and the cluster on the x axis. Cluster C4 is the oldest, while clusters C2 and C3 show younger firms than cluster C4. Moreover, cluster C1 shows younger firms.
Figure 7 shows “NumberOfBrokers” on the y axis and the cluster on the x axis. The violin plot shows that cluster C2 has the highest range, followed by cluster C4, while cluster C1 has the lowest. Furthermore, cluster C3 is slightly higher than cluster C1 yet still lower than cluster C2.
Figure 8 shows “RankingTotalPoint” on the y axis and the cluster on the x axis. The violin plot shows that cluster C3 has the highest range, followed by cluster C2, while cluster C1 has the lowest of all clusters. Furthermore, cluster C4 is lower than cluster C2.
The results in Figure 6, Figure 7 and Figure 8 can be combined as follows to generate two novel insights for the market: younger firms, with fewer than 700 days since license issue (Figure 6), have far fewer brokers, fewer than 20 (Figure 7), and rank lower (Figure 8) according to the scoring system of the Dubai Land Department.
In other words, to move from cluster C1 of young, low-ranking firms to cluster C2 of higher-ranking firms, at least 700 days (approximately two years) are needed. This “700-day rule”, similar in essence to the “10,000-hour rule” popularized by Malcolm Gladwell’s book “Outliers” [53], states the amount of time (in number of days) needed to mature and advance to the next level of business success in Dubai’s brokerage market.
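One way to read such a threshold from cluster profiles is sketched below. The records are invented purely for illustration and are not the study data; only the attribute names follow Table 1, and the cluster labels follow the C1 and C2 naming used above.

```python
from statistics import median

# Hypothetical records (cluster label, DaysSinceLicense, NumberOfBrokers);
# the values are invented for illustration and are not the study data.
FIRMS = [
    ("C1", 250, 6), ("C1", 480, 12), ("C1", 690, 18),
    ("C2", 900, 35), ("C2", 1500, 60), ("C2", 3200, 110),
]


def profile_clusters(records):
    """Summarise firm age and size for each cluster label."""
    grouped = {}
    for cluster, days, brokers in records:
        grouped.setdefault(cluster, []).append((days, brokers))
    return {
        cluster: {
            "min_days": min(d for d, _ in rows),
            "max_days": max(d for d, _ in rows),
            "median_brokers": median(b for _, b in rows),
        }
        for cluster, rows in grouped.items()
    }


profiles = profile_clusters(FIRMS)
# A candidate maturity threshold lies between the oldest firm of the younger
# cluster and the youngest firm of the next cluster.
gap = (profiles["C1"]["max_days"], profiles["C2"]["min_days"])
print(profiles)
print(gap)
```

With real data, the same per-cluster summary would be computed from the cluster labels assigned by k-means and compared against the distributions shown in the figures.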

4.4. Effects of Categorical Attributes

The fourth research question that can be answered through data analytics is the following:
RQ4: “How do the values of the categorical attributes of brokerage firms affect their other attributes?”
To answer RQ4, we apply strip plots, violin plots, box plots, and a statistical hypothesis test for the comparison of means.
Figure 9 shows a strip plot visualization and comparison of firms based on whether they have an email domain or not. The blue strip is for firms that do not have an email domain, and the red strip is for those with an email domain. The y axis shows “DaysSinceLicense”, and each circle represents a brokerage firm. The value range for the red strip (firms with email domains) is much larger than the value range for the other group. Furthermore, there seem to be many observations within the red strip that have higher values than the observations in the blue strip. These observations from Figure 9 call for the application of a proper statistical test for the comparison of the means of the two groups.
From the box plot in Figure 10, we can also observe that the range and interval for firms that have no “EmailDomain” are narrower, while the range and interval for firms that have an email domain are larger. Figure 10 also displays the results of Student’s t-test, with p = 0.001, suggesting strong statistical support for the difference in means. In other words, firms with an “EmailDomain” have higher values of “DaysSinceLicense”.
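The test statistic behind such a comparison of means can be sketched as follows. The function computes the pooled-variance (Student's) two-sample t statistic and its degrees of freedom; the “DaysSinceLicense” values in the example are hypothetical. Converting the statistic to a p-value requires the t distribution, which the Python standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_ind`) reports both.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Student's two-sample t statistic (equal variances assumed) and its
    degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance, weighted by degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical DaysSinceLicense values for the two groups (not the study data).
with_domain = [2100, 3400, 1800, 4200, 2900]
without_domain = [400, 900, 650, 1200]

t_value, dof = pooled_t_statistic(with_domain, without_domain)
print(t_value, dof)
```

A large positive t value indicates that the first group has a higher mean than the second, relative to the variation within the groups.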
A similar analysis was then conducted for other dependent variables, namely “NumberOfBrokers” and “RankingTotalPoint”, the results of which can be found in the Supplementary Materials [54].

4.5. Effects of Numerical Attributes

The fifth research question that can be answered through data analytics is the following:
RQ5: “How do the values of the numerical attributes of brokerage firms affect their other attributes?”
To answer this question, scatter plot and regression analyses are applied.
The next scatter plot analysis aims at a better understanding of the relation between “DaysSinceLicense”, “NumberOfBrokers”, “NumberOfAwards”, “EmailDomain”, and “IsMobileNumber” as the independent factors and “RankingTotalPoint” as the dependent response. To this end, the following steps are carried out:
  • The five independent attributes and the dependent attribute are plotted in a scatter plot to identify whether any outliers exist.
  • Outlier analysis is carried out, and outlier data points are removed before the regression analysis.
  • Regression analysis is carried out to fit an explicit formula that can represent the relation between the independent factors and the dependent response.
Figure 11 shows a scatter plot of the selected attributes. We can observe a positive and somewhat linear relationship between “RankingTotalPoint” and “DaysSinceLicense”. In Figure 11, we can also observe an outlier observation, represented by the yellow node at the top of the plot. Therefore, to conduct a more reliable analysis, it is important to remove outliers.
Outlier analysis is carried out using the default “Local Outlier Factor” method, with 10% contamination, 20 neighbors, and the Minkowski metric. Five outlier brokers were identified, namely BETTER, DACHA, HAMPTONS, HOMEMATTERS, and A1. These brokers diverge from the rest of the market. The attribute values of these outlier brokers can be compared with those of the inlier brokers to see why and how they diverge.
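The idea behind the Local Outlier Factor can be sketched in a few lines. This is a simplified illustration rather than the implementation used in the study: it uses exactly the k nearest neighbors of each point, the Euclidean distance (the Minkowski metric with p = 2, an assumption), and a small synthetic point set. With 10% contamination, a dataset of 50 firms yields 0.10 × 50 = 5 flagged observations, which matches the five outliers reported above.

```python
from math import dist


def local_outlier_factors(points, k):
    """LOF score per point, using exactly the k nearest neighbours
    (ties broken by input order)."""
    n = len(points)
    neighbours, k_distance = [], []
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )
        neighbours.append(ranked[:k])
        k_distance.append(dist(points[i], points[ranked[k - 1]]))

    def reach_dist(i, j):
        # Reachability distance of point i with respect to its neighbour j.
        return max(k_distance[j], dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in neighbours[i]) for i in range(n)]
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def flag_outliers(points, k, contamination):
    """Return the indices of the highest-scoring points and all LOF scores."""
    scores = local_outlier_factors(points, k)
    n_flagged = round(contamination * len(points))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flagged]), scores


# Synthetic data: a 3 x 3 grid of points plus one distant point (index 9).
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
flagged, scores = flag_outliers(points, k=3, contamination=0.10)
print(flagged)
print([round(s, 2) for s in scores])
```

Points inside the grid receive scores close to 1, whereas the distant point receives a much larger score because its neighborhood is far denser than its own surroundings.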
Linear regression conducted using the “leave one out” option resulted in an R-squared value of 0.313, indicating that the regression explains 31.3% of the total variation in the data. Although this is not as high as one would hope, the model can still be statistically significant; further analysis using proper statistical tests is needed. The resulting regression equation is (1):
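$$
\begin{aligned}
\mathrm{RankingTotalPoint} ={}& 51.0441 + 0.00439035 \times \mathrm{DaysSinceLicense} - 0.0483027 \times \mathrm{NumberOfBrokers} \\
&+ 4.42648 \times \mathrm{NumberOfAwards} + 4.86925 \times (\mathrm{EmailDomain} = 0) - 4.86925 \times (\mathrm{EmailDomain} = 1) \\
&+ 1.66099 \times (\mathrm{IsMobileNumber} = 0) - 1.66099 \times (\mathrm{IsMobileNumber} = 1)
\end{aligned}
\tag{1}
$$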
Brokerage firms can use this regression formula to estimate where they should stand on average and to benchmark themselves without collecting any additional data. If a firm’s actual ranking total point is below the value suggested by the regression function, then the firm knows that it should work harder on the underlying metrics.
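The benchmarking use of the regression formula can be sketched as a small function. The coefficients are those of Equation (1); the terms in parentheses are read as indicators that equal 1 when the stated condition holds and 0 otherwise. The function names and the example firm are hypothetical.

```python
def predicted_ranking_total_point(days_since_license, number_of_brokers,
                                  number_of_awards, email_domain, is_mobile_number):
    """Evaluate regression Equation (1); the two binary inputs must be 0 or 1."""
    # Each binary attribute contributes +coefficient when 0 and -coefficient when 1.
    email_effect = 4.86925 if email_domain == 0 else -4.86925
    mobile_effect = 1.66099 if is_mobile_number == 0 else -1.66099
    return (51.0441
            + 0.00439035 * days_since_license
            - 0.0483027 * number_of_brokers
            + 4.42648 * number_of_awards
            + email_effect
            + mobile_effect)


def benchmark_gap(actual_ranking_total_point, **attributes):
    """Actual minus predicted score; a negative gap means the firm ranks
    below what the regression suggests for its attributes."""
    return actual_ranking_total_point - predicted_ranking_total_point(**attributes)


# Hypothetical firm: licensed 1000 days ago, 10 brokers, 1 award,
# own email domain, no mobile number listed as contact.
firm = dict(days_since_license=1000, number_of_brokers=10, number_of_awards=1,
            email_domain=1, is_mobile_number=0)
predicted = predicted_ranking_total_point(**firm)
gap = benchmark_gap(50, **firm)
print(round(predicted, 2), round(gap, 2))
```

For this hypothetical firm, the predicted score is about 56.17, so an actual score of 50 would fall roughly 6 points below the regression benchmark.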

5. Conclusions

By applying various analyses, we developed an analytics workflow that supports a clear understanding of brokerage firms in the real estate sector. To summarize our main steps and findings, it was first necessary to clean the dataset and identify relationships between its attributes. The statistical methods revealed many patterns and trends that can help a reader or a real estate buyer choose a broker. From the first scatter plot, we concluded that a real estate buyer would favor brokers with high ranking points and a high number of awards. Moreover, hierarchical clustering was applied, and k = 4 was selected as the number of clusters for k-means clustering based on the Silhouette scores. Furthermore, we used the multi-dimensional scaling (MDS) technique to represent complex, multi-dimensional data on a simplified two-dimensional Cartesian plane. This mapping method allows for clearer visualization and easier interpretation of the data by reducing its dimensionality while preserving its inherent structure. In addition, we created several strip plots to show comparisons that help real estate buyers see how firms differ across attribute values. We also used box plots together with statistical tests, whose p-values indicate the level of statistical support for the observed differences. Outlier analysis based on the Minkowski metric identified five outliers. Finally, we fitted linear regression models and interpreted them using test and score results.
The validity of our research could be affected by the following threats:
  • The dataset was analyzed in May 2023, and changes in the market since then could affect the credibility of our results. For instance, some offices might have received awards in the past year.
  • For some brokerage firms, some values, such as the office phone number, were missing from the original dataset.
  • The study is based on the Dubai real estate brokerage market, and the results may vary for other cities.
  • Some real estate brokers might not be included in our dataset, which could also compromise the reliability of the research.
Future research can extend the methodology to encompass other types of analysis and discover other types of insights. Another possible line of future research is analyzing other brokerage markets, possibly comparing multiple markets.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/buildings14103068/s1. Figure S1. Box plot and statistical test for differences in means. Figure S2. Box plot and statistical test for differences in means. Figure S3. Violin plot visualization, where x axis denotes whether firms have email domain or not, and y axis denotes the “NumberOfBrokers”. Figure S4. Box plot and statistical test for differences in means, as a complement to the analysis in Figure S3. Figure S5. Violin plot visualization, where x axis denotes “RankingTotalPoint”, and y axis denotes whether firms have “EmailDomain” or not. Figure S6. Box plot and statistical test for differences in means, as a complement to the analysis in Figure S5.

Author Contributions

Conceptualization, G.E.; Data curation, A.S.A.A., M.M.A.-B.A.H., M.Z.S.A. and A.S.H.A.; Formal analysis, A.S.A.A., M.M.A.-B.A.H., M.Z.S.A., A.S.H.A. and G.E.; Investigation, A.S.A.A., M.M.A.-B.A.H., M.Z.S.A., A.S.H.A. and G.E.; Methodology, A.S.A.A., M.M.A.-B.A.H., M.Z.S.A., A.S.H.A. and G.E.; Project administration, G.E.; Resources, G.E.; Software, A.S.A.A., M.M.A.-B.A.H., M.Z.S.A. and A.S.H.A.; Supervision, G.E. and T.G.L.; Validation, G.E. and T.G.L.; Visualization, A.S.A.A., M.M.A.-B.A.H., M.Z.S.A. and A.S.H.A.; Writing—original draft, A.S.A.A., M.M.A.-B.A.H., M.Z.S.A., A.S.H.A. and G.E.; Writing—review and editing, G.E. and T.G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets presented in this article are not readily available because of copyright restrictions of the Dubai Land Department (RERA). To access the current database and construct an up-to-date dataset, readers should visit the RERA website.

Acknowledgments

We thank Dubai Land Department (RERA) for publicly sharing market data on their website.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Tostevin, P.; Rushton, C. Total Value of Global Real Estate: Property Remains the World’s Biggest Store of Wealth, Savills, Impacts the Future of Global Real Estate, 2023. Available online: https://www.savills.com/impacts/market-trends/the-total-value-of-global-real-estate-property-remains-the-worlds-biggest-store-of-wealth.html#:~:text=The%20total%20value%20of%20the,a%20significant%20store%20of%20wealth (accessed on 1 April 2024).
  2. Grand View Research. Real Estate Market Size, Share & Trends Analysis: Report by Property (Residential, Commercial, Industrial, Land), by Type (Sales, Rental, Lease), by Region, and Segment Forecasts, 2022–2030, Report ID: GVR-2-68038-354-6, 2024. Available online: https://www.grandviewresearch.com/industry-analysis/real-estate-market# (accessed on 1 April 2024).
  3. Statista. Market Insight—Financial—Real Estate Market—United Arab Emirates, 2024. Available online: https://www.statista.com/outlook/fmo/real-estate/united-arab-emirates (accessed on 1 April 2024).
  4. Khan, T. CBRE—Report—Intelligent Investment—2023 UAE Real Estate Market Outlook Mid-Year Review, 2023. Available online: https://www.cbre.ae/insights/reports/2023-uae-real-estate-market-outlook-mid-year-review (accessed on 1 April 2024).
  5. Zhang, W.; Li, B.; Roca, E. Moments and momentum in the returns of securitized real estate: A cross-country study of risk factors driving real estate investment trusts before and during COVID-19. Heliyon 2023, 9, e18476.
  6. Dubai Land Department. Dubai Real Estate Market Attracts 5933 Brokers & 2285 Registered Brokerage Offices, 2017. Available online: https://dubailand.gov.ae/en/news-media/dubai-s-real-estate-market-attracts-5-933-brokers-2-285-registered-brokerage-offices#/ (accessed on 1 April 2024).
  7. Dubai Land Department. Licensed Real Estate Brokers. 2023. Available online: https://dubailand.gov.ae/en/eservices/licensed-real-estate-brokers/licensed-real-estate-brokers-list/#/ (accessed on 10 May 2023).
  8. Obinna, W.K.; Udo, M.J. Improving online Real Estate Management System using data analytics. J. Emerg. Technol. 2022, 2, 66–75.
  9. Locke, S.L. Paying for a name? Comparing the performance of franchised real estate brokerage firms. J. Real Estate Financ. Econ. 2020, 61, 115–128.
  10. Lewis, D. Optimal Economies of Scope in the Residential Real Estate Brokerage Industry. South. Univ. Coll. Bus. E-J. 2022, 7, 1. Available online: https://digitalcommons.subr.edu/cbej/vol7/iss1/1/ (accessed on 1 April 2024).
  11. Turnbull, G.K.; Dombrow, J. Individual agents, firms, and the real estate brokerage process. J. Real Estate Financ. Econ. 2007, 35, 57–76.
  12. Stelk, S.; Zumpano, L.V. Can real estate brokers affect home prices under extreme market conditions? Int. Real Estate Rev. 2017, 20, 51–73. Available online: https://ideas.repec.org/a/ire/issued/v20n012017p51-73.html (accessed on 1 April 2024).
  13. Asensio-Soto, J.C.; Navarro Astor, E. Proptech: A qualitative analysis of online real estate brokerage agencies in Spain. Intang. Cap. 2022, 18, 489–505.
  14. Berggren, B.; Engström, R.; Kopsch, F.; Lind, H. The evolution of the real estate brokerage market: The case of Sweden. Int. J. Eng. Technol. Sci. Innov. 2019, 4, 16–32. Available online: https://ijetsi.org/more2019.php?id=3 (accessed on 1 April 2024).
  15. Rosenthal, S.S.; Strange, W.C.; Urrego, J.A. JUE insight: Are city centers losing their appeal? Commercial real estate, urban spatial structure, and COVID-19. J. Urban Econ. 2022, 127, 103381.
  16. Barwick, P.J.; Wong, M. Competition in the Real Estate Brokerage Industry: A Critical Review. The Brookings Institution Publication 2019. Available online: https://www.brookings.edu/wp-content/uploads/2019/12/ES-12.12.19-Barwick-Wong.pdf (accessed on 28 June 2024).
  17. Kuc, B.R. Enhancing social responsibility and sustainability in real estate industry. Turk. J. Comput. Math. Educ. 2021, 12, 4999–5013.
  18. Javadpour, L.; Khazaeli, M. Business Intelligence in the Real Estate Industry and Effect of Data Analytics Adoption. In Proceedings of the IIE Annual Conference, Nashville, TN, USA, 30 May–2 June 2015; Proceedings. Institute of Industrial and Systems Engineers (IISE): Peachtree Corners, GA, USA, 2015; p. 2212.
  19. Singh, A.; Sharma, A.; Dubey, G. Big data analytics predicting real estate prices. Int. J. Syst. Assur. Eng. Manag. 2020, 11, 208–219.
  20. Cajias, M. Can a machine understand real estate pricing?—Evaluating machine learning approaches with big data. In Proceedings of the 26th Annual European Real Estate Society Conference, Cergy-Pontoise, France, 3–6 July 2019.
  21. Razen, A.; Brunauer, W.; Klein, N.; Kneib, T.; Lang, S.; Umlauf, N. Statistical Risk Analysis for Real Estate Collateral Valuation Using Bayesian Distributional and Quantile Regression. Working Papers in Economics and Statistics (No. 2014-12) 2014. Available online: https://www.econstor.eu/handle/10419/101094 (accessed on 1 April 2024).
  22. Lee, C.L.; Gumulya, N.; Bangura, M. The Role of Mandatory Building Efficiency Disclosure on Green Building Price Premium: Evidence from Australia. Buildings 2022, 12, 297.
  23. Li, M.; Bao, Z.; Sellis, T.; Yan, S.; Zhang, R. HomeSeeker: A visual analytics system of real estate data. J. Vis. Lang. Comput. 2018, 45, 1–16.
  24. Kok, N.; Koponen, E.L.; Martínez-Barbosa, C.A. Big data in real estate? From manual appraisal to automated valuation. J. Portf. Manag. 2017, 43, 202–211.
  25. Zhuang, M.; Pan, W.T.; Shi, Z.; Zhou, Y.; Zhong, Z. Application of Data Mining Technology in Evaluating Real Estate Investment Plan Based on GRA-AHP. J. Phys. Conf. Ser. 2019, 1284, 012037.
  26. Hu, Z.; Zhao, Z.; Rostami, M.; Ilievski, F.; Shbita, B. Demo: Knowledge Graph-Based Housing Market Analysis. In Knowledge Graph Construction 2021 (KGCW 2021), Proceedings of the 2nd International Workshop on Knowledge Graph Construction Co-Located with 18th Extended Semantic Web Conference (ESWC 2021), Online, 2–6 June 2021; 2021; Available online: https://ceur-ws.org/Vol-2873/paper4.pdf (accessed on 1 April 2024).
  27. Bodenbender, M.; Kurzrock, B.M.; Müller, P.M. Broad application of artificial intelligence for document classification, information extraction and predictive analytics in real estate. J. Gen. Manag. 2019, 44, 170–179.
  28. Cheryshenko, M.S.; Pomernyuk, Y.Y. Integration of big data in the decision-making process in the real estate sector. IOP Conf. Ser. Earth Environ. Sci. 2021, 751, 012096.
  29. Ge, X.J.; Du, Y. Main variables influencing residential property values using the entropy method–The case of Auckland. In Proceedings of the 5th International Structural Engineering and Construction Conference, Shunan, Japan, 2007; Available online: https://www.asres2007.um.edu.mo/papers/041%20-%20PAPER.pdf (accessed on 1 April 2024).
  30. Grybauskas, A.; Pilinkienė, V.; Stundžienė, A. Predictive analytics using Big Data for the real estate market during the COVID-19 pandemic. J. Big Data 2021, 8, 105.
  31. Sandeep Kumar, E.; Talasila, V.; Rishe, N.; Suresh Kumar, T.V.; Iyengar, S.S. Location identification for real estate investment using data analytics. Int. J. Data Sci. Anal. 2019, 8, 299–323.
  32. Sobieraj, J.; Metelski, D. Machine Learning Insights: Exploring Key Factors Influencing Sale-to-List Ratio—Insights from SVM Classification and Recursive Feature Selection in the US Real Estate Market. Buildings 2024, 14, 1471.
  33. Shi, S.; Mangioni, V.; Ge, X.J.; Herath, S.; Rabhi, F.; Ouysse, R. House price forecasting from investment perspectives. Land 2021, 10, 1009.
  34. Wang, S.; Lee, C.L.; Song, Y. The COVID-19 Sentiment and Office Markets: Evidence from China. Buildings 2022, 12, 2100.
  35. Wratten, L.; Wilm, A.; Göke, J. Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers. Nat. Methods 2021, 18, 1161–1168.
  36. Edwards, J.S.; Rodriguez, E. Remedies against bias in analytics systems. J. Bus. Anal. 2019, 2, 74–87.
  37. Miklosik, A.; Kuchta, M.; Evans, N.; Zak, S. Towards the adoption of machine learning-based analytical tools in digital marketing. IEEE Access 2019, 7, 85705–85718.
  38. Al Akasheh, M.; Eleyan, N.; Ertek, G. A Predictive Data Analytics Methodology for Online Food Delivery. In Proceedings of the 2022 Ninth International Conference on Social Networks Analysis, Management and Security (SNAMS), Milan, Italy, 29 November–1 December 2022; pp. 1–7.
  39. Wanner, J.; Wissuchek, C.; Welsch, G.; Janiesch, C. A taxonomy and archetypes of business analytics in smart manufacturing. ACM SIGMIS Database DATABASE Adv. Inf. Syst. 2023, 54, 11–45.
  40. Kahveci, S.; Alkan, B.; Mus’ab, H.A.; Ahmad, B.; Harrison, R. An end-to-end big data analytics platform for IoT-enabled smart factories: A case study of battery module assembly system for electric vehicles. J. Manuf. Syst. 2022, 63, 214–223.
  41. Petrea, Ș.M.; Cristea, D.S.; Turek Rahoveanu, M.M.; Zamfir, C.G.; Turek Rahoveanu, A.; Zugravu, G.A.; Nancu, D. Perspectives of the moldavian agricultural sector by using a custom-developed analytical framework. Sustainability 2020, 12, 4671.
  42. Ertek, G.; Kailas, L. Analyzing a Decade of Wind Turbine Accident News with Topic Modeling. Sustainability 2021, 13, 12757.
  43. Schoening, T.; Köser, K.; Greinert, J. An acquisition, curation and management workflow for sustainable, terabyte-scale marine image analysis. Sci. Data 2018, 5, 180181.
  44. Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci. 2023, 622, 178–210.
  45. Newburger, E.; Correll, M.; Elmqvist, N. Fitting bell curves to data distributions using visualization. IEEE Trans. Vis. Comput. Graph. 2022, 29, 5372–5383.
  46. Min, S.H.; Zhou, J. Smplot: An R package for easy and elegant data visualization. Front. Genet. 2021, 12, 802894.
  47. Casella, G.; Berger, R. Statistical Inference; CRC Press: Boca Raton, FL, USA, 2024.
  48. Boukerche, A.; Zheng, L.; Alfandi, O. Outlier detection: Methods, models, and classification. ACM Comput. Surv. (CSUR) 2020, 53, 1–37.
  49. Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2021.
  50. Rashad, A.S.; Farghally, M. The US monetary conditions and Dubai’s real estate market: Twist or tango? Int. J. Hous. Mark. Anal. 2023, 17, 1225–1242.
  51. Shahapure, K.R.; Nicholas, C. Cluster quality analysis using silhouette score. In Proceedings of the 2020 IEEE 7th International Conference on Data science and Advanced Analytics (DSAA), Sydney, Australia, 6–9 October 2020; pp. 747–748.
  52. Brath, R.; Jonker, D. Graph Analysis and Visualization: Discovering Business Opportunity in Linked Data; Wiley: Hoboken, NJ, USA, 2015; ISBN 9781118845844.
  53. Gladwell, M. Outliers: The Story of Success; Back Bay Books: New York, NY, USA, 2011.
  54. Al Abdulsalam, S.A.; Al Hashemi, M.M.A.; Aleissaee, M.Z.S.; Almansoori, A.S.H.; Ertek, G.; Labben, T.G. Supplement to “A Novel Data Analytics Methodology for Analyzing Real Estate Brokerage Markets with Case Study of Dubai”. 2024. Available online: https://ertekprojects.com/ftp/supp/27.pdf (accessed on 28 June 2024).
Figure 1. The data analytics methodology developed and applied in the study.
Figure 2. Scatter plot analysis of multiple attributes at once.
Figure 3. Hierarchical clustering results.
Figure 4. Parameters for k-Means clustering.
Figure 5. Results of multi-dimensional scaling (MDS).
Figure 6. Strip plot for comparing clusters with respect to DaysSinceLicense.
Figure 7. A violin plot for comparing clusters with respect to NumberOfBrokers.
Figure 8. A violin plot for comparing clusters with respect to RankingTotalPoint.
Figure 9. A violin plot visualization, where the x axis denotes whether firms have an email domain or not, and the y axis denotes the “DaysSinceLicense”.
Figure 10. Box plot and statistical test for differences in means, as a complement to the analysis in Figure 9.
Figure 11. Scatter plot of select attributes before regression.
Table 1. Data attributes for the analyzed data [7].

| Attribute | Data Type | Description |
|---|---|---|
| OfficeCode (Key Attribute) | Text | Custom created office code as ID attribute (to be used instead of OfficeNumber) |
| Name | Text | Full name of the office |
| TotalBrokersAllBranches | Numerical | Number of brokers in the whole firm, including all offices of that firm |
| NumberOfBrokers | Numerical | Number of brokers in this particular office (not the whole firm) |
| RankingTotalPoint | Numerical | Total points for an office, calculated as the sum of Points attributes (out of 100) |
| PointsNumberofTransactions | Numerical | Points for number of transactions |
| PointsTransactionTotalWorth | Numerical | Points for transactions total worth |
| PointsLegalNotice | Numerical | Points for legal notice |
| PointsComplianceWithLaws | Numerical | Points for compliance with laws |
| PointsRealEstateExperience | Numerical | Points for real estate experience |
| PointsNumberOfBranches | Numerical | Points for number of branches |
| PointsLocalization | Numerical | Points for localization |
| PointsInitiatives | Numerical | Points for initiatives |
| NumberOfAwards | Numerical | Number of awards that the office has received |
| HasEmailDomain | Binary | 1 if the office has email domain of its own, 0 otherwise |
| IsMobileNumber | Binary | 1 if the office has a mobile number listed as contact, 0 otherwise |
| LicenseIssueYear | Numeric | Year in which license was issued by the Dubai Land Department |
| DaysSinceLicense | Numeric | Number of days since license was issued by the Dubai Land Department |
