Property Risk Analysis Pilot Using Databricks

If you've been checking the news recently, you'll have seen a lot of articles about shifting real estate trends around the world. The premise of many of these articles is that the expensive, high-class office buildings that tend to be the flagships of metro areas are becoming less and less popular. As interest rates rise and work-from-home becomes more common, one can only wonder how these real estate trends will evolve from here, and what impact they might have on property owners.

At Rearc, we want to do more than just wonder. We want to use the data and tools we have at our disposal to help our partners and clients make informed decisions about their properties. Recently, for the Databricks Data + AI Summit 2023, we decided to zoom in on San Francisco, the host city, and build a product that showcases the power of the Rearc Data Platform, Databricks, and Delta Sharing in the context of these trends.

What if sentiment analysis, locally relevant economic metrics, reports of closing businesses, and more could be used not just to remind us of an uncertain industry, but to help property owners make better decisions about their commercial properties and tenants? Thanks to a vast amount of private and publicly available data, as well as the capabilities of Databricks, we've been able to create a pilot default risk score.

What We Built

The results of our analysis are presented below. Here you can interact with a selection of properties across San Francisco. You'll find an assortment of descriptive information, including the property's address, the name of the business that occupies the property, and risk scores computed according to each of the following categories:

  • Financial Risk: Assessing the tenant's financial well-being as well as broader economic factors such as interest rates, GDP growth, and sector employment trends.
  • Sentiment Risk: Evaluating sentiment and perception related to the property and tenant.
  • Geographic Risk: Considering geography-specific risks like local crime, as well as weather events, earthquakes, etc.
  • Tenant/Site Risk: Analyzing tenant history, including payment reliability and tenure.
  • Overall Risk: A weighted composite score of the above metrics.

Each of these risk scores falls between 0 and 1 and is intended to communicate the likelihood that a given tenant or property will default. Higher-risk properties are shown in darker shades of red, relatively safe properties in green, and the properties in between in varying shades of yellow.
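To make the idea of a weighted composite concrete, here is a minimal sketch in plain Python. The weights and the color cut-offs are hypothetical values chosen for illustration; the actual weighting behind the pilot score is not published in this post.

```python
# Hypothetical category weights; the real weights are not given in this post.
WEIGHTS = {
    "financial": 0.4,
    "sentiment": 0.2,
    "geographic": 0.2,
    "tenant_site": 0.2,
}


def overall_risk(scores, weights=WEIGHTS):
    """Weighted average of per-category scores, each expected in [0, 1]."""
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} is outside [0, 1]")
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def risk_color(score, low=1 / 3, high=2 / 3):
    """Map a score to a coarse map color; the cut-offs are illustrative."""
    if score < low:
        return "green"
    if score < high:
        return "yellow"
    return "red"


# Made-up category scores for a single property.
example = {"financial": 0.5, "sentiment": 0.25, "geographic": 0.75, "tenant_site": 0.5}
composite = overall_risk(example)
print(round(composite, 3), risk_color(composite))
```

Dividing by the sum of the weights keeps the composite in the 0 to 1 range even if a category is missing for a given property.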

Why Understanding Property Risk Matters

Understanding property risk is crucial for property owners and investors, as it enables them to make informed decisions about their commercial properties and tenants. The applications of this analysis stretch far and wide, but here are a few examples of how property risk analysis can be used to inform decision-making:

  • Mitigating Financial Loss: By assessing property risk, owners can identify potential risks and take proactive measures to mitigate them. This can help prevent financial loss due to factors such as tenant default, economic downturns, or changes in market conditions.
  • Optimizing Tenant Selection: Property risk analysis provides insights into the financial stability and reliability of tenants. By understanding the risks associated with prospective tenants, property owners can make better-informed decisions when selecting tenants, reducing the likelihood of tenant default and related financial implications.
  • Identifying Market Trends: Property risk analysis involves evaluating broader economic factors, market trends, and sentiment related to the property and tenant. This information helps property owners stay abreast of changing market dynamics, such as shifts in demand, emerging business trends, or changes in perception that may impact the property's value or attractiveness.
  • Ensuring Long-Term Sustainability: By proactively managing property risks, owners can ensure the long-term sustainability and profitability of their investments. Understanding the risks associated with a property enables owners to implement preventive measures, contingency plans, and risk mitigation strategies to safeguard against potential threats and maintain the property's value over time.

In summary, understanding property risk is vital for owners seeking to navigate the evolving world of real estate. However, a comprehensive property risk solution requires more than just understanding the importance of risk analysis. It necessitates the sourcing, transformation, and harmonization of a diverse data landscape. This is where Rearc's capabilities shine. With our expertise in all stages of the data life cycle, we make it our mission to minimize the complexities of this process.

How We Built It

Obtaining the Data

The Rearc Data team has a robust data platform built on top of Apache Airflow, which we have used in collaboration with multiple partners to deliver a variety of complex data requests over the years. We were already sourcing and publishing data to Unity Catalog from several of the sources used in this analysis, including the Bureau of Labor Statistics, the Bureau of Economic Analysis, and the Federal Reserve.

We also used data from the San Francisco Government Open Data project. From this source, we extracted building footprints, local business information, and other locally relevant datasets, which we were able to bring into our workflow easily with our Data Platform and Delta Sharing. To use these alongside our own data, we simply load the Delta tables from Unity Catalog in a Databricks notebook.

# Load Rearc datasets from Unity Catalog.
# `spark` is the SparkSession that Databricks notebooks provide; it replaces the older sqlContext.
interest_rates = spark.table(
  "rearc_catalog.fs_federalreserveboard.frb_h15")
sector_employment = spark.table(
  "rearc_catalog.stat_bls.bls_employment_national_data_monthly")
metro_gdp = spark.table(
  "rearc_catalog.stat_employ_usa.employ_usa_gdp_by_county_metro_yearly_bea")

# Load San Francisco Open Data
buildings = spark.table(
  "rearc_catalog.stat_lnd.lnd_usa_sanfrancisco_building_footprints_static_sfgov")
businesses = spark.table(
  "rearc_catalog.stat_lnd.lnd_usa_sanfrancisco_registered_businesses_static_sfgov")

Finally, we incorporated GDELT, a news "firehose" dataset, into our analysis. GDELT is a very large dataset (more than 8 trillion datapoints!) that indexes almost every news item in the world. This would be quite difficult to source using standard methods, but the data can be found and accessed via the Databricks Marketplace.

For this project, we heavily utilized Databricks, a unified analytics platform that integrates Apache Spark and provides collaborative tools for processing and analyzing large-scale data. In addition, the Databricks Marketplace includes a wide assortment of datasets to add to our analysis. Because we have existing pipelines that publish our data to Databricks, this tool was a natural choice for the data collection phase of this work. By pulling data from Rearc's data catalog using the Marketplace and Delta Sharing, we gain seamless access to diverse and up-to-date data sources, greatly accelerating our analysis.

[Figure: diagram of the DAIS architecture]

Generating Risk Scores using Databricks

The goal of this product is to show how Databricks and Delta Sharing can help estimate a property risk score. With this score, we want to give property owners valuable insights to inform their decision-making. Because we were already using Databricks to centralize all of our data, we decided to keep using its capabilities (particularly notebooks, Spark, and SQL) to generate the risk scores. We walk through the process below.

1. Incorporating Historical Features

Historical features such as interest rates, GDP growth, and others were crucial in our risk analysis. To incorporate trends, we utilized a combination of aggregation and time-series methods, allowing us to capture important historical patterns and their potential impact on property risk.
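The post does not spell out which aggregation and time-series methods were used, so the following is only a sketch of two common choices: a trailing mean and a lagged relative change. It is written in plain Python to stay self-contained; in a Databricks notebook the same features would more likely be computed with Spark window functions or pandas. The function names and the sample rates are made up for illustration.

```python
from statistics import mean


def trailing_mean(values, window):
    """Mean of the last `window` observations at each position (None until full)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(values[i + 1 - window : i + 1]))
    return out


def pct_change(values, lag):
    """Relative change versus the observation `lag` steps earlier."""
    out = []
    for i, current in enumerate(values):
        if i < lag or values[i - lag] == 0:
            out.append(None)
        else:
            out.append((current - values[i - lag]) / values[i - lag])
    return out


# Made-up monthly interest rates (percent), for illustration only.
rates = [4.0, 4.25, 4.5, 4.5, 4.75, 5.0]
features = {
    "rate_3m_mean": trailing_mean(rates, window=3),
    "rate_3m_change": pct_change(rates, lag=3),
}
```

The trailing mean smooths month-to-month noise, while the lagged change captures the direction and speed of a trend, which is often more informative for risk than the raw level.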

2. Handling GDELT Data

GDELT's size makes it challenging to work with. However, using Spark on Databricks with the code below, we were able to efficiently scan the GDELT data for news items relating to the businesses we found in San Francisco.

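from pyspark.sql.functions import col

# companies_list is assumed to be a Python list of business names defined earlier in the notebook.
gdelt_extract = (
  spark.sql("""
    SELECT DATE, TONE, EXPLODE(SPLIT(ORGANIZATIONS, ';')) AS organization
    FROM `external_shares_gdelt`.`<user_catalog>`.`gkg_v1_daily`
  """)
  # Keep only the news items that mention one of our businesses
  .where(col('organization').isin(companies_list))
  # Collect the filtered rows to the driver as a pandas DataFrame
  .toPandas()
)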

From this extract, we created two sentiment scores: visibility (measuring the level of recognition for a company on a scale of 0 to 1) and perception (evaluating whether a company is perceived positively or negatively). See the plot below for an example from March 2023.
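The exact formulas behind these two scores are not given here. As one plausible sketch, visibility can be taken as a company's mention count relative to the most-mentioned company, and perception as the average tone of its mentions. Both definitions are assumptions, as is the simplification that each tone value is already a single number.

```python
from collections import defaultdict
from statistics import mean


def sentiment_scores(mentions):
    """mentions: iterable of (organization, tone) pairs, one per news item."""
    tones = defaultdict(list)
    for organization, tone in mentions:
        tones[organization].append(tone)
    if not tones:
        return {}
    most_mentions = max(len(values) for values in tones.values())
    return {
        organization: {
            # Share of the most-mentioned company's coverage, so in (0, 1].
            "visibility": len(values) / most_mentions,
            # Average tone; negative means unfavorable coverage.
            "perception": mean(values),
        }
        for organization, values in tones.items()
    }


# Made-up mentions, for illustration only.
sample = [("acme corp", 2.0), ("acme corp", -1.0), ("acme corp", 0.5), ("bay cafe", -3.0)]
scores = sentiment_scores(sample)
```

With a pandas DataFrame such as the extract above, the same logic would be a group-by on the organization column followed by a count and a mean.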

3. Creating Synthetic Tenants/Properties Data

In addition to the wealth of publicly available data, many property owners also store their own internal data about the histories of their tenants and/or properties. To demonstrate how property owners could utilize their own data to enrich this solution, we generated synthetic data which includes:

  • Tenant Data (Tenure, Missed Payments, Complaints, etc.)
  • Property Data (Age, Size, Monthly Rent, Occupancy Rate, etc.)
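A minimal sketch of how such synthetic records could be generated is shown below. The field names follow the two lists above, but the value ranges and the uniform distributions are arbitrary assumptions, not the ones used in the pilot.

```python
import random


def make_synthetic_tenants(n, seed=0):
    """Generate n fake tenant/property records with a seeded RNG."""
    rng = random.Random(seed)
    records = []
    for tenant_id in range(n):
        records.append({
            "tenant_id": tenant_id,
            # Tenant data
            "tenure_months": rng.randint(1, 240),
            "missed_payments": rng.randint(0, 6),
            "complaints": rng.randint(0, 10),
            # Property data
            "property_age_years": rng.randint(0, 120),
            "size_sqft": rng.randint(500, 50_000),
            "monthly_rent_usd": round(rng.uniform(2_000, 200_000), 2),
            "occupancy_rate": round(rng.uniform(0.5, 1.0), 3),
        })
    return records


tenants = make_synthetic_tenants(5)
```

Seeding the generator makes the synthetic dataset reproducible, which helps when comparing risk scores across runs.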

4. Scaling and Harmonizing Data

To ensure consistency across different risk categories and datasets, we applied scaling and harmonization techniques. These methods allowed us to normalize and standardize the data, facilitating a comprehensive assessment of property risk.
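The specific techniques are not named here, so as an illustration, min-max scaling is one simple way to bring features with different units onto the same 0 to 1 range. The `invert` flag and the assumption that longer tenure means lower risk are ours, added only for the example.

```python
def min_max_scale(values, invert=False):
    """Rescale values to [0, 1]; invert when larger raw values mean lower risk."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # no spread: place everything mid-scale
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


# Made-up values, for illustration only.
missed_payments = [0, 1, 4, 2]      # more missed payments -> more risk
tenure_months = [12, 120, 36, 60]   # longer tenure -> less risk (assumed)

missed_scaled = min_max_scale(missed_payments)
tenure_scaled = min_max_scale(tenure_months, invert=True)
```

Once every feature points in the same direction and shares the same range, the features can be combined into category scores without one unit dominating the others.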

Conclusion

By harnessing the capabilities of the Rearc Data Platform, Databricks, and Delta Sharing, we can provide property owners and investors around the world with the tools they need to make informed decisions. Our data-driven risk analysis facilitates proactive risk management and enables individuals to navigate the rental property market with confidence.

Our ability to help with your data endeavors doesn't stop here, though. Every day in the data world is marked by transformative advancements such as generative AI, machine learning, data marketplaces, and data clean rooms, and the list of cutting-edge technologies goes on. The data landscape is changing rapidly, and Rearc is poised to support organizations on their journey not simply to survive these changes but to thrive in them. Our capabilities and experience enable us to adapt quickly to these shifting tides, and we'd like to help our partners do the same so they can harness the full potential of data-driven insights.

More information about this project can be found at our Product Page. Additionally, if you would like access to the content and slides we presented at the Databricks Data + AI Summit 2023, or if you would like to know more about how Rearc Data can help you advance your data capabilities, fill out the form below and we'll be in touch!
