VIETNAM NATIONAL UNIVERSITY, HANOI
VIETNAM JAPAN UNIVERSITY
NGUYEN THE HUNG
USING BIG DATA TO CONSTRUCT THE RESIDENTIAL PROPERTY PRICE INDEX IN
VIETNAM:
THE CASE OF HO CHI MINH CITY
MAJOR: PUBLIC POLICY CODE: ………
RESEARCH SUPERVISORS:
Dr Vu Hoang Linh
Hanoi, 2019
TABLE OF CONTENTS
DECLARATION
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF ABBREVIATIONS
LIST OF FIGURES AND TABLES
CHAPTER 1 INTRODUCTION
1.1 Background of the study
1.2 Rationale of the study
1.3 Aims and objectives of the study
1.4 Research instrument
1.5 Structure of the study
CHAPTER 2 LITERATURE REVIEW
2.1 The Handbook on Residential Property Price Index
2.1.1 Median/mean transactions price
2.1.2 Stratification or Mix adjustment
2.1.3 Repeat-sales
2.1.4 Hedonic method
2.2 The previous residential property price indexes
2.2.1 RPPI of Ireland
2.2.2 RPPI of Austria
2.2.3 RPPI of Malta
2.2.4 RPPI of Thailand
2.2.5 RPPI of Indonesia
2.2.6 RPPI of Savills Vietnam
CHAPTER 3 DEVELOPING RPPI IN VIETNAM, THE CASE OF HO CHI MINH CITY
3.1 The overview of real estate transaction in Vietnam
3.2 The data sources on real estate price in Vietnam
3.3 Building big data for RPPI calculating
3.4 Calculating RPPI for apartment in Ho Chi Minh City
CHAPTER 4 FINDINGS AND DISCUSSIONS
CHAPTER 5 POLICY IMPLICATION AND FURTHER STUDY
5.1 Policy implication
5.2 Further study
CHAPTER 6 REFERENCES
ACKNOWLEDGEMENTS
No one can achieve anything without the help of others. This thesis could not have been completed without the invaluable assistance of many people, and I would like to express my gratitude to all of them.
First of all, I would like to express my deepest gratitude to my respected supervisor, Dr Vu Hoang Linh, for his friendly and sympathetic assistance and dedicated involvement throughout the process of this thesis. With his profound knowledge and experience, he helped me improve my research. Without his guidance, the thesis would not have been completed.
Secondly, I am grateful to all my dear professors and the JICA experts at Vietnam Japan University, who taught me numerous courses and shared their knowledge, and to my classmates in the Master of Public Policy programme for their helpful and practical suggestions. I will keep in mind all the memories that we shared during my time at Vietnam Japan University.
Last but not least, I owe a great debt of gratitude to my family and friends for their immeasurable support, both throughout my degree and during the arduous process of this study.
ABSTRACT
Calculating a real estate price index is one of the major challenges for statistical agencies around the world. However, tools to monitor the real estate market are essential at all levels of management, from micro to macro. Therefore, statistical agencies in some countries and some real estate companies, such as Savills Vietnam, have built their own methods for calculating this index based on their actual conditions. As a result, it might be impossible to compare the results.
Recently, international statistical organizations have jointly published a manual that provides general methodological guidance for calculating this indicator.
In addition, the development of information technology has brought many new tools for economic management, including big data sources.
This study attempts to develop a residential property price index (RPPI) for Vietnam, focusing specifically on the apartment market in Ho Chi Minh City, using big data from property advertisement web portals as a prototype. The hedonic regression method is used to calculate this index.
The research results show that calculating a residential property price index from a big data source is feasible, which suggests that big data could also be used to calculate other statistical indicators.
Keywords: Big data, Hedonic Regressions, Ho Chi Minh City apartment, Residential Property Price Index, web crawler.
LIST OF ABBREVIATIONS
ABS: Australian Bureau of Statistics
API: Application Programming Interface
BDP: Big data processing
BI: Bank of Indonesia
CSO: Central Statistics Office of Ireland
Eurostat: The statistical office of the European Union
GDP: Gross Domestic Product
GRDP: Gross Regional Domestic Product
GSO: The General Statistics Office of Vietnam
HoREA: Ho Chi Minh Real Estate Association
ILO: International Labour Organization
IMF: International Monetary Fund
MAD: Median absolute deviation
MPD: Mobile position data
NER: Named Entity Recognition
OECD: The Organisation for Economic Co-operation and Development
RPPI: Residential Property Price Index
RFID: Radio Frequency Identification
SDGs: Sustainable Development Goals
SBV: State Bank of Vietnam
UNECE: The United Nations Economic Commission for Europe
WB: The World Bank
LIST OF FIGURES AND TABLES
List of figures
Figure 1.1 Five characteristics of Big data
Figure 3.1 The house selling/buying flow in Vietnam
Figure 3.2 The flow of building the database
Figure 3.3 Map of apartments advertised in Ho Chi Minh City
Figure 3.4 Extracting data fields from advertisements
Figure 3.5 Distribution of price
Figure 3.6 RPPI_apartment of Ho Chi Minh City with January 2018 as reference
List of tables
Table 3.1 Summary statistics of the database
Table 3.2 Dummy hedonic regression result
Table 3.3 RPPI_apartment in Ho Chi Minh City with March 2018 as reference
Table 3.4 RPPI_apartment in Ho Chi Minh City with January 2018 as reference
CHAPTER 1 INTRODUCTION
1.1 Background of the study
Fluctuations in housing prices have important impacts on the real economy. In the period 2007-2009, the real estate bubble in Vietnam led to macroeconomic instability, such as high inflation and a trade deficit, and affected economic growth. More seriously, a considerable increase in housing prices in the United States and its subsequent reversal contributed to the global financial crisis.
As an asset price in the measurement of inflation, the property price becomes an important leading indicator of the economy's dynamics, since investment in the property sector is a long-term type of investment. Property statistics can provide an early sign of economic cycle movements. Rising property prices often lead to an expansionary phase (boom), whereas falling property prices indicate a contractionary phase (bust) (Eurostat, 2013).
Under such conditions, suitable indexes that record changes in real estate prices with precision are crucial. Such indexes assist not only policy makers but also market participants seeking to identify when housing prices hit a bottom or a peak.
Thus, it is necessary to develop housing price indexes that can adequately capture housing market trends.
The development of such indexes, however, is not an easy task because residential properties are heterogeneous in their characteristics, such as location, size, and facilities. Each house differs from others, to varying degrees, in its location, state of maintenance, and fittings, so no two houses are identical in terms of quality. Even if the location and basic structure remain the same between two periods, the age and quality of a building change over time due to renovation and depreciation of the structure. Furthermore, houses are sold infrequently, so transaction data are limited and it is very difficult to apply the "like with like" comparison used in other price indexes, such as the consumer price index or the producer price index.
Consequently, developing housing price indexes is one of the most methodologically difficult tasks for national statistical agencies. However, because nations need indicators that reflect real estate conditions for the macro management of the economy, they have gone on to construct various methods for calculating property price indexes based on their actual context. Thus, it might be impossible to compare the results.
1.2 Rationale of the study
Recently, many researchers and international statistical agencies have developed methods for compiling appropriate residential real estate price indexes. The Handbook on Residential Property Price Index, co-ordinated by Eurostat, the ILO (International Labour Organization), the IMF (International Monetary Fund), the OECD (Organisation for Economic Co-operation and Development), the UNECE (United Nations Economic Commission for Europe), and the WB (World Bank), was published as a result. This coordination produced the first comprehensive overview of the conceptual and practical issues related to the compilation of residential property price indexes. Moreover, the handbook offers guidance on the compilation of house price indexes and helps increase the international comparability of residential property price indexes. Based on this document, each country has developed its own way of calculating the index, adapted to its practical situation.
In Vietnam, real estate is crucial for economic growth, with a turnover of more than 14 trillion US dollars per year (Dragon Capital, 2017), but up to now the system of real estate indicators reflecting its position in the whole economy, including a price index, has been very poor. At present, only the real estate service provider Savills publishes a housing price index, and its methodology and data sources are not clear to the public. Thus, it is mainly used for Savills' own business purposes rather than by other researchers or policy makers.
Consequently, a system of appropriate real estate indexes is necessary for policy makers, analysts, and financial institutions to gain deeper knowledge of the real estate and financial markets, as well as to monitor impacts on the Vietnamese economy and the health of the financial market.
In recent times, information technology has grown very quickly and has created an extremely large amount of digital data known as "Big Data", "a term that describes the large volume of data - both structured and unstructured - that inundates a business on a day-to-day basis" (SAS Institute Inc.).
Put simply, big data is more abundant and more complex than traditional data such as survey data. In particular, new data sources may originate from the Internet, satellites, posts on social networks, etc. This kind of data can take different forms, such as images and sounds. These data sets are so large and complex that traditional data-processing software cannot analyze them. However, they can be used to address problems that could not be tackled before. Big data has five important characteristics, known as the 5Vs, as shown in the figure below:
Figure 1.1 Five characteristics of Big data
(Source: Yuri Demchenko)
As can be seen from the figure above, the 5Vs of Big Data are Volume, Velocity, Value, Veracity, and Variety.
First, Volume refers to the large amount of data generated every millisecond. It comprises all the emails, messages, pictures, clips, memes, etc. With big data technology, these data can be stored and widely used with the help of distributed systems in different locations.
Velocity refers to how fast new data are generated and spread. Examples include online messages that go viral instantly, or photos captured by satellites and transferred to data-processing software. Big data technology allows scientists to analyze the data while they are being generated, without first putting them into databases.
Next, Variety refers to the different kinds of data that are used. In the past, traditional data were expected to fit into figures, tables, or relational databases, as with financial data. With big data technology, there are multiple types of data (structured and unstructured), including messages, social media conversations, photos, sensor data, and video or voice recordings.
The following term is Veracity, defined as the trustworthiness of the data. Many forms of big data, such as posts on Twitter, Facebook, Instagram, etc., contain false information or distorted news and may be unreliable, inaccurate, and hard to control. A large database therefore often raises issues of quality or accuracy, although analytical software for big data can help users cope with such data. Finally, Value can be regarded as one of the most important Vs in this figure. Businesses collect and process their data in order to make business decisions, and users can easily fall into the hype trap if they start using big data without a clear analysis of costs and benefits.
Besides requiring new data processing and management methods, big data offers several benefits to a range of users, such as government bodies, enterprises, and researchers in their respective fields.
It is even more valuable for statistical and academic institutions if they can create new measurements, reduce the time lag of existing statistics, and provide an advanced source of data for producing official statistics. These advantages and disadvantages of big data have prompted statistical institutions to be more careful when implementing big data as a new source of official statistics. As mentioned by Hammer et al. (2017), "Big Data offers opportunities, challenges, and implications for official statistics that compilers and users of statistics need to be aware of when they start to incorporate big data into their work plan to the extent relevant."
1.3 Aims and objectives of the study
The overarching goal of this research is to calculate the residential property price index in Ho Chi Minh City, specifically for the apartment sector, from a big data source.
To achieve this overall aim, the research pursues the following specific objectives:
(1) Building a tool to crawl real estate advertisements from the Internet
(2) Extracting information from advertisements to build up a real estate database
(3) Calculating the residential property price index in Ho Chi Minh City for the apartment sector
(iii) The Big data processing (BDP) to extract information from the real estate advertisements
1.5 Structure of the study
The study comprises three parts, Introduction, Development, and Conclusion, organized into five chapters.
Chapter 1: Introduction presents the background, rationale, aims and objectives, methods, and design of the study.
Chapter 2: Literature Review gives the theoretical background related to calculating the residential property price index.
Chapter 3: Developing RPPI in Vietnam: the case of Ho Chi Minh City deals with the research orientation and research methods, and presents the situation analysis, data collection instruments, data collection procedures, and RPPI calculation. It focuses on the detailed results of the database and a comprehensive analysis of the data collected.
Chapter 4: Findings and Discussions presents the major findings and discussion of the residential housing price index in Ho Chi Minh City.
Chapter 5: Conclusion and Implications summarizes the issues addressed, recapitulates the research procedure, and makes recommendations on using Big Data to produce indicators and measurements in future research.
CHAPTER 2 LITERATURE REVIEW
Like any other price index, a house price index measures changes in the average price of properties, taking into account changes in the quality mix of properties transacted between two periods of time. There are many areas of society where individuals or organizations use residential property price indices (RPPIs), directly or indirectly, either to influence practical decision making or to inform the formulation and conduct of economic policy (OECD, 2013), so this kind of index has a number of important uses.
Before 2013, international statistical institutes such as the IMF and UNSD had not yet published a manual providing international guidelines for this index, so each country used different methods depending on its actual context, and it might be impossible to compare the results. After the Handbook on Residential Property Price Index was published in 2013 by Eurostat and several major international statistical organizations, some countries developed their indexes based on these guidelines.
2.1 The Handbook on Residential Property Price Index
This handbook is the first comprehensive overview of the conceptual and practical issues related to the compilation of price indexes for residential properties. It also provides international guidance on the compilation of house price indexes and aims to increase the international comparability of residential property price indexes. According to this document, four methods can be applied to calculate this index.
2.1.1 Median/mean transactions price
Using a measure of central tendency from the distribution of prices of houses purchased during a period is one of the easiest ways to measure house prices. As residential property price distributions are often positively skewed (mainly reflecting the heterogeneous nature of housing, the positive skew in income distributions, and the zero lower bound on transaction prices), the median is preferable to the mean. In addition, because only prices, and at most the size or location of the house, are needed to compute a median or mean, with no data on other housing characteristics, a wide range of price data can be used.
However, the main drawback of this method is that it is affected by 'compositional' factors, that is, the share of sales falling within each price level. If many low-value properties and few higher-value properties are sold in an area in a given month, the median or average will appear to decline. If most sales in the following month are of superior (i.e., higher-value) properties, the median and average price will appear to increase even though actual overall values may have dropped. Compositional change and seasonality are thus influential factors on median prices, because the samples of observed transactions are unlikely to be random. Although median prices are widely used, many countries use other methods to tackle the problem of compositional change and obtain better measures of housing prices.
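The compositional effect described above can be illustrated with a small sketch. The prices and sales counts are hypothetical and chosen only for illustration: every property loses 5% of its value between the two months, yet the median rises because the mix of sales shifts toward higher-value properties.

```python
from statistics import median

# Hypothetical prices (arbitrary units) for two market segments.
# Every individual property is 5% cheaper in month 2 than in month 1.
low_m1, high_m1 = 100.0, 300.0
low_m2, high_m2 = low_m1 * 0.95, high_m1 * 0.95

# Month 1: mostly low-value sales; month 2: mostly high-value sales.
month1 = [low_m1] * 8 + [high_m1] * 2
month2 = [low_m2] * 2 + [high_m2] * 8

# Change in the median transaction price versus the like-for-like change
median_change = median(month2) / median(month1) - 1
true_change = low_m2 / low_m1 - 1

print(f"median price change:  {median_change:+.1%}")
print(f"like-for-like change: {true_change:+.1%}")
```

The median moves in the opposite direction to the underlying values, which is why a simple median can be misleading when the composition of sales changes.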
2.1.2 Stratification or Mix adjustment
Using stratification to construct a mix-adjusted measure of house prices is one effective way to control for changes in the mix of properties sold. This method is used by the Australian Bureau of Statistics (ABS) to calculate its house price measures. Mix-adjusted measures are also reported to be used in other countries, such as Canada, Germany, and the United Kingdom. It should be noted that the approaches vary with the specific conditions of each country because housing market characteristics differ across areas. In one approach, small regions (e.g., suburbs) are grouped into larger regions, and average price changes in these larger regions are then compared. Another approach is to stratify by price, which controls for compositional change between lower- and higher-priced suburbs. This approach can be highly effective in minimizing the impact of compositional change. For instance, in any period, properties sold can be classified into groups (or strata) according to the long-run median price of their respective suburbs. The mix-adjusted measure of the city-wide price change is then calculated as the average of the changes in the medians of each group.
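A minimal sketch of the stratified calculation follows. The strata names and prices are hypothetical, and equal weighting of strata is an assumption made for illustration; a statistical agency may weight strata differently (for example, by the value of the housing stock).

```python
from statistics import median

def mix_adjusted_change(base, current):
    """Equal-weighted average of the median price relatives of each stratum."""
    relatives = [median(current[s]) / median(base[s]) for s in base]
    return sum(relatives) / len(relatives)

# Hypothetical sales, grouped by the long-run median price of the suburb.
base = {
    "lower-priced suburbs": [90.0, 100.0, 110.0],
    "higher-priced suburbs": [280.0, 300.0, 320.0],
}
# In the current period, many more sales occur in the higher-priced stratum,
# but prices within each stratum are 5% lower.
current = {
    "lower-priced suburbs": [85.5, 95.0, 104.5],
    "higher-priced suburbs": [266.0, 285.0, 285.0, 285.0, 304.0],
}

change = mix_adjusted_change(base, current)
print(f"mix-adjusted price relative: {change:.3f}")
```

Because medians are compared within each stratum before averaging, the shift in the number of sales between strata does not distort the result.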
2.1.3 Repeat-sales
Instead of focusing on the price level of each transaction, this method relies on the price changes of properties that are sold more than once. It aims to identify the common component of price changes over a specific period of time. However, a drawback of the repeat-sales method is that it can only use data from transactions involving properties with a record of a previous sale. Another limitation is that the estimated price change for a specific period, such as a quarter, continues to be revised as further sales occur in subsequent quarters.
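As an illustration, the sketch below estimates a repeat-sales index by regressing the log price ratio of each sale pair on period dummies. This is one common formulation of the method, not necessarily the one used by any particular agency. The sale pairs are hypothetical and constructed to be consistent with an index of 100, 110, 121.

```python
import math

# Hypothetical repeat sales: (period of first sale, period of second sale,
# first price, second price). Period 0 is the base period.
pairs = [
    (0, 1, 100.0, 110.0),
    (0, 2, 200.0, 242.0),
    (1, 2, 330.0, 363.0),
    (0, 1, 150.0, 165.0),
]

def dummy_row(t_first, t_second):
    """+1 in the period of the second sale, -1 in the period of the first sale."""
    return [int(t == t_second) - int(t == t_first) for t in (1, 2)]

X = [dummy_row(a, b) for a, b, _, _ in pairs]
y = [math.log(p2 / p1) for _, _, p1, p2 in pairs]

# Normal equations (X'X) b = X'y for the two unknown log-index values,
# solved with Cramer's rule.
s11 = sum(r[0] * r[0] for r in X)
s12 = sum(r[0] * r[1] for r in X)
s22 = sum(r[1] * r[1] for r in X)
c1 = sum(r[0] * v for r, v in zip(X, y))
c2 = sum(r[1] * v for r, v in zip(X, y))
det = s11 * s22 - s12 * s12
b1 = (c1 * s22 - s12 * c2) / det
b2 = (s11 * c2 - s12 * c1) / det

index = [100.0, 100.0 * math.exp(b1), 100.0 * math.exp(b2)]
print([round(v, 2) for v in index])
```

Adding a new pair that spans period 1 would change the estimate of `b1`, which is the revision problem noted above.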
2.1.4 Hedonic method
Alongside the repeat-sales method, hedonic regression models have also been used by researchers, and some countries, such as the United Kingdom and the United States, have applied this method as one of their official measures. The hedonic approach treats a property as a bundle of characteristics, such as its price-determining attributes (the house's size, number of bedrooms, location, etc.). The key idea is that these characteristics are "bundled": they are not sold separately and have no individual market prices; only the house as a whole, structure and land together, has a price. A hedonic method can therefore take account of changes in the composition of transactions in each period. In principle, it can also control for quality improvements, although the feasibility of the method depends on the sufficiency and availability of data on the characteristics of residential properties.
The handbook also presents three hedonic approaches: (i) the time dummy approach, (ii) the characteristics approach, and (iii) the imputation approach. This follows previous literature in this area, including Triplett (2006), Diewert, Heravi and Silver (2009), and Hill (2010). One complication is that there are many variants of each approach, depending on the period over which the hedonic coefficients are estimated; which characteristics and weights are held constant; whether double or single imputation is used for prices or weights; whether a direct or indirect formulation is used; whether chained, rolling-window, or fixed baskets of characteristics are used; and more.
Each method requires its own data sources and has both advantages and drawbacks; countries therefore choose the most suitable approach for their actual situation. According to the Bank for International Settlements, 41 of 58 countries use the hedonic method to calculate their RPPIs, with data containing the characteristics of residential properties.
The hedonic method used for the RPPI uses a log-linear functional form. The equation is as follows:
$$\ln(p_{it}) = x_{it}\beta + \delta_t D_t + \mu_{it}$$
where:
$p_{it}$ is the price of dwelling i in period t
$x_{it}$ is a vector of explanatory variables of dwelling i in period t: total floor area (m²), dwelling type (semi-detached/detached/terraced), Eircode routing key, and Deprivation Index
$\beta$ is a vector of explanatory price coefficients
$\delta_t$ is a vector of time period coefficients
$D_t$ is a 'time dummy' (value = 1 if in time period t, otherwise 0)
$\mu_{it}$ is an error term
When the regression is applied to a pool of data covering multiple time periods, the time coefficient $\delta_t$ can be estimated for each period except the reference period (typically the first period), for which the coefficient is fixed at zero so that its antilog equals 1.
For any two successive time periods, t-1 and t, the antilog of $\delta_t$ divided by the antilog of $\delta_{t-1}$ provides an estimate of the aggregate quality-adjusted house price change that has occurred (i.e., the change in house prices after changes in the various known explanatory variables have been accounted for).
Thus the index for period t is given by:
$$I_t = \frac{e^{\delta_t}}{e^{\delta_{t-1}}} \times I_{t-1}$$
where:
$I_t$ is the index in period t
$I_{t-1}$ is the index in period t-1
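The time dummy calculation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any statistical office: the observations are hypothetical and generated from known coefficients, floor area is the only explanatory variable, and the small least-squares solver is written out only to keep the example self-contained.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical observations: (period, floor area in m2). Prices are generated
# from known coefficients so the fit can be checked; they are not real data.
true_delta = [0.0, 0.05, 0.08]
sales = [(0, 50), (0, 80), (1, 60), (1, 100), (2, 70), (2, 90)]
prices = [math.exp(4.0 + 0.01 * area + true_delta[t]) for t, area in sales]

# Design matrix: intercept, floor area, time dummies for periods 1 and 2.
X = [[1.0, float(area), float(t == 1), float(t == 2)] for t, area in sales]
y = [math.log(p) for p in prices]
coef = ols(X, y)
delta = [0.0] + coef[2:]  # reference period coefficient fixed at zero

# Chain the index: I_t = exp(delta_t) / exp(delta_{t-1}) * I_{t-1}
index = [100.0]
for t in range(1, len(delta)):
    index.append(index[-1] * math.exp(delta[t]) / math.exp(delta[t - 1]))
print([round(v, 2) for v in index])
```

In practice a statistics or econometrics library would replace the hand-written solver, and the design matrix would include all the explanatory variables listed for the model.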
2.2.2 RPPI of Austria
In Austria, big data is also used as the data source for the RPPI. The RPPI relies on quotation (asking) prices, which are evaluated in combination with "structural" (price-specific) variables. After this evaluation, about 10,000 observations are available each year. Data are extracted from the advertisements to calculate the RPPI for single-family houses and condominiums, for Vienna and for the whole country. The "location" (address-specific) variables are used to adjust for spatial differences at the census level. The calculation uses a hedonic regression model with a structure that is fixed over time. This approach may produce biased estimates if the effects of the variables change over time. The RPPI for Austria excluding Vienna is composed of the index for condominiums and the index for single-family houses at a ratio of 70% to 30%, with the aggregate index for condominiums combining the index for new condominiums and that for used condominiums at a ratio of 12.7% to 87.3%.
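The two-level weighting can be illustrated as follows. The sub-index values are hypothetical, and combining them as a weighted arithmetic mean is an assumption made for illustration; the text above gives only the weights, not the aggregation formula.

```python
def weighted_index(components):
    """Weighted arithmetic mean of (index value, weight) pairs."""
    total_weight = sum(weight for _, weight in components)
    assert abs(total_weight - 1.0) < 1e-9, "weights must sum to 1"
    return sum(value * weight for value, weight in components)

# Hypothetical sub-index values (base period = 100)
new_condo, used_condo, single_family = 110.0, 105.0, 102.0

# Condominiums: new 12.7%, used 87.3%
condo = weighted_index([(new_condo, 0.127), (used_condo, 0.873)])

# Austria excluding Vienna: condominiums 70%, single-family houses 30%
rppi_ex_vienna = weighted_index([(condo, 0.70), (single_family, 0.30)])

print(f"condominium index: {condo:.3f}")
print(f"RPPI excluding Vienna: {rppi_ex_vienna:.3f}")
```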
2.2.3 RPPI of Malta
In Malta, advertisements for the sale of properties are collected from newspapers. The properties include flats and maisonettes, both in shell and in finished form, together with terraced houses, townhouses, houses of character, and villas. Hedonic regression is also used to calculate this index.
From the aforementioned methods, the author argued that the hedonic method is generally regarded as the preferred methodology for measuring changes in real estate prices. With this method, a number of housing characteristics that affect prices are analyzed, so that the impact of each on the housing price can be calculated and monitored. This allows the calculation of the author's preferred index, which measures the price change for a constant bundle of housing characteristics, in other words, a constant-quality property price index. To apply this method, the data must contain sufficient observations and information on housing characteristics.
2.2.4 RPPI of Thailand
In Thailand, this index is published monthly as a set of house price indices consisting of four measures: single detached house with land, town house with land, apartment (condominium), and land.
The dataset for compiling the RPPI comes from a structured big data source: commercial banks' mortgage loan data.
The indices are compiled from the mortgage loan data of 17 major commercial banks in Bangkok and its vicinity (Bangkok, Samut Prakan, Nonthaburi, Pathum Thani, Nakhon Pathom, and Samut Sakhon). The single detached house with land, town house with land, and condominium price indices are compiled using a rolling-window time dummy hedonic regression method (3-month moving average), controlling for four housing characteristics (age, storey, entrepreneur, and distance to metropolitan transportation services such as the sky train, the underground, and the expressway).
$$\ln P_t = \ln(P_0) + \sum_{k=1}^{6} \beta_k X_k + \sum_{j=1}^{12} \alpha_j TD_j$$
- The distance of the center of the district to the nearest motorway;
- The distance of the center of the district to the nearest Chao Phraya express boat;
Big data preparation and extraction from the web portals' servers are processed using virtual machines and Hadoop software, with approximately 2.2 million ads every month.
For RPPI compilation, only the first instance of a listing for each property at a unique price in each month is included. These data are individual listings with details on asking price, property type, lot size, dwelling size, number of bedrooms, number of bathrooms, address, and additional characteristics recorded as "free text" (such as garage, gated property, or swimming pool). Time dummy hedonic methods are used to estimate property prices based on their characteristics, since the price index can be derived directly from the estimated time dummy regression coefficients and all the available information is utilized.
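The listing filter described above (keep only the first instance of each property at a unique price in each month) could be sketched as follows. The field names `property_id`, `posted`, and `price` and the sample ads are hypothetical; a real portal would need its own rule for deciding that two ads refer to the same property.

```python
def first_instances(listings):
    """Keep the earliest ad for each (property, month, asking price) combination."""
    seen = set()
    kept = []
    for ad in sorted(listings, key=lambda a: a["posted"]):
        # ISO dates sort chronologically; the first 7 characters are "YYYY-MM".
        key = (ad["property_id"], ad["posted"][:7], ad["price"])
        if key not in seen:
            seen.add(key)
            kept.append(ad)
    return kept

# Hypothetical listings
ads = [
    {"property_id": "A", "posted": "2018-01-03", "price": 500},
    {"property_id": "A", "posted": "2018-01-17", "price": 500},  # repost, dropped
    {"property_id": "A", "posted": "2018-01-25", "price": 480},  # new price, kept
    {"property_id": "A", "posted": "2018-02-02", "price": 480},  # new month, kept
    {"property_id": "B", "posted": "2018-01-10", "price": 900},
]

kept = first_instances(ads)
for ad in kept:
    print(ad["property_id"], ad["posted"], ad["price"])
```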