If you've been checking the news recently, you'll have seen a lot of articles referencing
shifting real estate trends around the world. The premise of many of these articles is that
the expensive, high-class office buildings that tend to be the flagships of metro areas are
becoming less and less popular. As interest rates rise and work-from-home continues to become
more common, one can only wonder how these real estate trends will evolve from here, and what
impact they might have on property owners.
At Rearc, we want to do more than just wonder. We want to use the data and tools we have at our
disposal to help our partners and clients make informed decisions about their properties.
Recently, for the Databricks Data + AI Summit 2023,
we decided to zoom in on San Francisco, the host city, and build a product that showcases the
power of the Rearc Data Platform, Databricks, and Delta Sharing in the context of these trends.
What if sentiment analysis, locally relevant economic metrics, reports of closing businesses,
and more could be utilized, not just to remind us of an uncertain industry, but to help
property owners make better decisions about their commercial properties and tenants?
Thanks to a vast amount of privately and publicly available data, as well as the remarkable
capabilities of Databricks, we've been able to create a pilot default risk score.
What We Built
The results of our analysis are presented below. Here you can interact with a selection of
properties across San Francisco. You'll find an assortment of
descriptive information, including the property's address, the name of the business that
occupies the property, and risk scores computed according to each of the following categories:
Financial Risk: Assessing the tenant's financial well-being as well as broader economic factors such as interest rates, GDP growth, and sector employment trends.
Sentiment Risk: Evaluating sentiment and perception related to the property and tenant.
Geographic Risk: Considering geography-specific risks like local crime, as well as weather events, earthquakes, etc.
Tenant/Site Risk: Analyzing tenant history, including payment reliability and tenure.
Overall Risk: A weighted composite score of the above metrics.
Each of these risk scores falls between 0 and 1, where the risk score is intended to communicate the likelihood that a
given tenant/property will default. The higher-risk properties are indicated by a darker shade of red, while relatively
safe properties are represented with green, with varying shades of yellow covering the properties in between.
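As a rough illustration of how a weighted composite and the color scale could fit together, here is a minimal sketch. The weights, the `overall_risk` and `risk_color` helper names, and the exact color ramp are assumptions made for this example; they are not the values or code used in the pilot.

```python
# Hypothetical category weights; the weights used in the pilot are not published.
WEIGHTS = {
    "financial": 0.4,
    "sentiment": 0.2,
    "geographic": 0.2,
    "tenant_site": 0.2,
}


def overall_risk(scores, weights=WEIGHTS):
    """Weighted average of category scores, each expected to lie in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def risk_color(score):
    """Map a 0-1 score to an RGB tuple on a green -> yellow -> red ramp."""
    score = min(max(score, 0.0), 1.0)
    if score <= 0.5:
        # Green to yellow: raise the red channel.
        return (round(510 * score), 255, 0)
    # Yellow to red: lower the green channel.
    return (255, round(510 * (1.0 - score)), 0)


# Made-up category scores for a single property.
example = {"financial": 0.7, "sentiment": 0.4, "geographic": 0.2, "tenant_site": 0.5}
composite = overall_risk(example)
print(round(composite, 2), risk_color(composite))
```

Dividing by the sum of the weights keeps the composite in the 0 to 1 range even if the weights are later changed and no longer sum to one.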
Why Understanding Property Risk Matters
Understanding property risk is crucial for property owners and investors as it enables them to make informed decisions
about their commercial properties and tenants. The applications of this analysis stretch far and wide, but here are a few
examples of how property risk analysis can be used to inform decision-making:
Mitigating Financial Loss: By assessing property risk, owners can identify potential risks and take proactive
measures to mitigate them. This can help prevent financial loss due to factors such as tenant default, economic
downturns, or changes in market conditions.
Optimizing Tenant Selection: Property risk analysis provides insights into the financial stability and reliability
of tenants. By understanding the risks associated with prospective tenants, property owners can make better-informed
decisions when selecting tenants, reducing the likelihood of tenant default and related financial implications.
Identifying Market Trends: Property risk analysis involves evaluating broader economic factors, market trends, and
sentiment related to the property and tenant. This information helps property owners stay abreast of changing market
dynamics, such as shifts in demand, emerging business trends, or changes in perception that may impact the property's
value or attractiveness.
Ensuring Long-Term Sustainability: By proactively managing property risks, owners can ensure the long-term
sustainability and profitability of their investments. Understanding the risks associated with a property enables owners
to implement preventive measures, contingency plans, and risk mitigation strategies to safeguard against potential
threats and maintain the property's value over time.
In summary, understanding property risk is vital for owners seeking to navigate the evolving world of real estate.
However, a comprehensive property risk solution requires more than just understanding the importance of risk analysis.
It necessitates the sourcing, transformation, and harmonization of a diverse data landscape. This is where Rearc's
capabilities shine. With our expertise in all stages of the data life cycle, we make it our mission to minimize the
complexities of this process.
How We Built It
Obtaining the Data
The Rearc Data team has a robust data platform built on top of Apache Airflow, which we have
used in collaboration with multiple partners to deliver a variety of complex data requests over
the years. We were already sourcing and publishing data to Unity Catalog from several of the
sources used in this analysis, including the Bureau of Labor Statistics, the Bureau of Economic
Analysis, and the Federal Reserve.
We also used data from the
San Francisco Government Open Data project. From this source, we extracted building
footprints, local business information, and other locally relevant datasets which we were
easily able to assimilate into our workflow with our Data Platform and Delta Sharing. To use these alongside our
own data, we simply load the Delta tables from Unity Catalog using the Databricks notebook interface.
## Load Rearc Datasets
interest_rates = sqlContext.sql("SELECT * FROM rearc_catalog.fs_federalreserveboard.frb_h15")
sector_employment = sqlContext.sql("SELECT * FROM rearc_catalog.stat_bls.bls_employment_national_data_monthly")
metro_gdp = sqlContext.sql("SELECT * FROM rearc_catalog.stat_employ_usa.employ_usa_gdp_by_county_metro_yearly_bea")

## Load San Francisco Open Data
buildings = sqlContext.sql("SELECT * FROM rearc_catalog.stat_lnd.lnd_usa_sanfrancisco_building_footprints_static_sfgov")
businesses = sqlContext.sql("SELECT * FROM rearc_catalog.stat_lnd.lnd_usa_sanfrancisco_registered_businesses_static_sfgov")
Finally, we incorporated GDELT, a news "firehose" dataset, into our analysis.
GDELT is a very large dataset (more than 8 trillion datapoints!) that indexes almost every
news item in the world. This would be quite difficult to source using standard methods, but
the data can be found and accessed via the Databricks Marketplace.
For this project, we relied heavily on Databricks, a unified
analytics platform that integrates Apache Spark and provides collaborative tools for
processing and analyzing large-scale data. In addition, the Databricks Marketplace
includes a wide assortment of datasets to add to our analysis. Because we have existing
pipelines that publish our data to Databricks, this tool was a natural choice for the data
collection phase of this work. By pulling data from Rearc's data catalog using the
Marketplace and Delta Sharing,
we gain seamless access to diverse and up-to-date data sources, greatly accelerating our
analysis.
Generating Risk Scores using Databricks
The goal of this product is to show how Databricks and Delta Sharing can help estimate a
property risk score. With this score, we want to provide property owners with valuable insights
to inform their decision-making processes.
Because we are already using Databricks to centralize all of our data, we decided to continue to use its
capabilities (particularly notebooks, Spark, and SQL) to generate the risk scores, and we will walk through the process below.
1. Incorporating Historical Features
Historical features such as interest rates, GDP growth, and others were crucial in our risk
analysis. To incorporate trends, we utilized a combination of aggregation and time-series
methods, allowing us to capture important historical patterns and their
potential impact on property risk.
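To make the idea of aggregation and time-series features more concrete, the sketch below derives a trailing average and a relative change from a short monthly series. The numbers are made up, and the `trailing_mean` and `pct_change` helpers and the feature names are assumptions for illustration rather than the features used in the pilot.

```python
from statistics import mean

# Made-up monthly interest rates (percent), oldest first; not real Federal Reserve data.
rates = [4.25, 4.50, 4.50, 4.75, 5.00, 5.00, 5.25]


def trailing_mean(values, window):
    """Average of the last `window` observations (an aggregation feature)."""
    return mean(values[-window:])


def pct_change(values, periods):
    """Relative change between the latest value and the one `periods` steps earlier."""
    earlier, latest = values[-1 - periods], values[-1]
    return (latest - earlier) / earlier


features = {
    "rate_latest": rates[-1],
    "rate_mean_3m": trailing_mean(rates, 3),
    "rate_change_6m": pct_change(rates, 6),
}
print(features)
```

The same pattern could be applied to GDP or sector employment series, with the window lengths chosen to match each dataset's publication frequency.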
2. Handling GDELT Data
GDELT, a large-scale dataset, presented challenges due to its size. However, using Databricks' capabilities and
the code below, we were able to efficiently scan the GDELT database for news items relating to the businesses we found
in San Francisco.
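from pyspark.sql.functions import col
# companies_list is assumed to hold the names of the San Francisco businesses to match
gdelt_extract = (spark.sql("""
    SELECT DATE, TONE, EXPLODE(SPLIT(ORGANIZATIONS, ';')) AS organization
    FROM `external_shares_gdelt`.`<user_catalog>`.`gkg_v1_daily`
""").where(col('organization').isin(companies_list)).toPandas())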
Additionally, we created two sentiment scores: visibility (measuring the level of
recognition for a company on a scale of 0 to 1) and perception (evaluating the positive or
negative perception of a company). See the below plot for an example from March 2023.
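One plausible way to derive two such scores from the extract is sketched below. The definitions here are assumptions, not the formulas used in the pilot: visibility is taken as a company's mention count relative to the most-mentioned company, and perception as the mean of the average tone. The sample rows are made up, and the sketch assumes `TONE` is a comma-delimited string whose first value is the average tone.

```python
from collections import defaultdict

# Made-up rows shaped like gdelt_extract.to_dict("records"); not real GDELT output.
# Assumption: TONE is a comma-delimited string whose first value is the average tone.
rows = [
    {"DATE": "20230301", "TONE": "2.5,4.0,1.5,5.5,20.0,0.0", "organization": "acme coffee"},
    {"DATE": "20230302", "TONE": "-1.5,1.0,2.5,3.5,18.0,0.0", "organization": "acme coffee"},
    {"DATE": "20230302", "TONE": "-4.0,0.5,4.5,5.0,22.0,0.0", "organization": "bay books"},
]

# Collect the average-tone value of every mention, grouped by company.
tones = defaultdict(list)
for row in rows:
    tones[row["organization"]].append(float(row["TONE"].split(",")[0]))

max_mentions = max(len(values) for values in tones.values())
sentiment = {
    org: {
        # Visibility: mention count relative to the most-mentioned company (0 to 1).
        "visibility": len(values) / max_mentions,
        # Perception: mean average tone; negative values indicate negative coverage.
        "perception": sum(values) / len(values),
    }
    for org, values in tones.items()
}
print(sentiment)
```

Normalizing by the most-mentioned company is only one choice; a log-scaled count would dampen the effect of a few heavily covered businesses.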
3. Creating Synthetic Tenants/Properties Data
In addition to the wealth of publicly available data, many property owners also store their own internal data about the
histories of their tenants and/or properties. To demonstrate how property owners could utilize their own data to enrich
this solution, we generated synthetic data which includes:
Tenant Data (Tenure, Missed Payments, Complaints, etc.)
Property Data (Age, Size, Monthly Rent, Occupancy Rate, etc.)
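A minimal sketch of how records like these could be generated is shown below. The field names, value ranges, and the `synthetic_tenant` helper are all hypothetical and chosen only for illustration; they do not reflect the distributions used in the pilot.

```python
import random

rng = random.Random(42)  # fixed seed so the synthetic records are reproducible


def synthetic_tenant(tenant_id):
    """Return one made-up tenant/property record; all ranges are arbitrary."""
    return {
        "tenant_id": tenant_id,
        "tenure_months": rng.randint(1, 240),
        "missed_payments": rng.randint(0, 6),
        "complaints": rng.randint(0, 10),
        "property_age_years": rng.randint(1, 120),
        "size_sqft": rng.randint(500, 50_000),
        "monthly_rent_usd": round(rng.uniform(2_000, 150_000), 2),
        "occupancy_rate": round(rng.uniform(0.5, 1.0), 2),
    }


tenants = [synthetic_tenant(i) for i in range(100)]
print(tenants[0])
```

A property owner would replace this generator with their own internal tenant and property records.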
4. Scaling and Harmonizing Data
To ensure consistency across different risk categories and datasets, we applied scaling and
harmonization techniques. These methods allowed us to normalize and standardize the data,
facilitating a comprehensive assessment of property risk.
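As one example of such a technique, min-max scaling maps features measured in different units onto a shared 0 to 1 range. The sketch below is an assumption about how this could be done, not the pilot's actual method; the `min_max_scale` helper and its `higher_is_riskier` flag are hypothetical names, and the inputs are made up.

```python
def min_max_scale(values, higher_is_riskier=True):
    """Rescale values to [0, 1]; flip the scale when larger values mean lower risk."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no signal, so place it at the midpoint.
        return [0.5 for _ in values]
    scaled = [(value - low) / (high - low) for value in values]
    return scaled if higher_is_riskier else [1.0 - s for s in scaled]


# Made-up inputs on very different scales.
missed_payments = [0, 1, 4, 2]      # more missed payments -> more risk
tenure_months = [120, 36, 6, 60]    # longer tenure -> less risk

missed_scaled = min_max_scale(missed_payments)
tenure_scaled = min_max_scale(tenure_months, higher_is_riskier=False)
print(missed_scaled, tenure_scaled)
```

Flipping the direction for protective features means every scaled column reads the same way, with values near 1 indicating higher risk, which makes the columns easier to combine.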
Conclusion
By harnessing the capabilities of the Rearc Data Platform, Databricks, and Delta Sharing, we can provide
property owners and investors around the world with the tools they need to make informed
decisions. Our data-driven risk analysis facilitates proactive risk management and enables
individuals to navigate the rental property market with confidence.
Our ability to help with your data endeavors doesn't stop here, though. Every day in the data world is marked by
transformative advancements: generative AI, machine learning, data marketplaces, data clean rooms, and the list goes on.
The data landscape is rapidly undergoing massive changes, and Rearc is a partner that is poised to support organizations
on their journey not simply to survive these changes but to thrive in them. Our capabilities and experience enable us
to adapt quickly to these shifting tides, and we'd like to help our partners do the same, allowing them to harness the
full potential of data-driven insights.
More information about this project can be found at our Product Page. Additionally, if you would like access
to the content and slides we presented at the Databricks Data + AI Summit 2023, or if you would like
to know more about how Rearc Data can help you advance your data capabilities, fill out the form below and we'll be in touch!