Deploying data science to find optimal care home locations

Advanced data science solutions – combined with experienced land agents – increase the likelihood of acquiring high-occupancy care home sites.

Choosing an optimal site for a care home is not easy.

  • Does it have the right population demographics?

  • Are there sufficient transport links?

  • Enough green and blue spaces?

  • Is the housing density appropriate?

  • How far will residents and visitors travel?

  • Are air pollution levels too high?

  • Is essential retail accessible on foot?

A good care home site needs to fulfil hundreds of complex environmental and locational requirements in order to achieve a sustainable occupancy rate. At an average resident cost of £1,500 per week in a competitive market, there is little room for error.

Traditionally, land buyers have used decades of hard-earned experience, available public and commercial data, and insights from local agents to inform complex land investment decisions. But national-level data shows that nearly 40% of homes do not achieve the desired occupancy threshold (over 80%) after 5 years of operation.

No human can process the vast amounts of micro- and macro-economic data that exist across every postcode in England & Wales, which makes it nearly impossible to identify the complex correlations and relationships between the hundreds of available datasets. Human experience can get us most of the way to the answer; coupled with the power of data science, it can go significantly further.

Arca Blanca was approached by a large UK care home builder and operator to marry the knowledge and experience of their land agents with our data science capabilities and property data platform. Through this collaboration, we built a powerful Machine Learning model which leverages both internal client data (such as individual care home performance) and over 450 external data sources (demographic, micro- and macro-economic) covering the past 30 years.

The Machine Learning solution

1) Analysing occupancy over time

We set out to estimate the likelihood of a care home achieving over 80% occupancy within the next 5 years by analysing how occupancy changes over time.

Robust occupancy forecasting requires external data – demographics such as local wealth, geography and the area’s green space index – combined with the care home’s internal data. We found a monthly cadence of internal occupancy updates most beneficial, as this helps identify and minimise seasonal effects.
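
As a minimal illustration, the sketch below assembles such a monthly modelling table in pandas. All column names (home_id, occupancy_pct, green_space_index and so on) are illustrative placeholders, not the client’s actual schema.

```python
import pandas as pd

# Internal data: one occupancy reading per care home per month.
occupancy = pd.DataFrame({
    "home_id": ["H1", "H1", "H2", "H2"],
    "month": pd.to_datetime(["2023-01-01", "2023-02-01",
                             "2023-01-01", "2023-02-01"]),
    "occupancy_pct": [62.0, 65.5, 81.0, 83.5],
})

# External data: site-level features from demographic and geographic sources.
external = pd.DataFrame({
    "home_id": ["H1", "H2"],
    "median_household_wealth": [310_000, 455_000],
    "green_space_index": [0.42, 0.71],
})

# Join so every monthly occupancy row carries the site's external features.
panel = occupancy.merge(external, on="home_id", how="left")

# A 12-month rolling mean smooths the seasonal swings in occupancy.
panel["occupancy_12m_avg"] = (
    panel.sort_values("month")
         .groupby("home_id")["occupancy_pct"]
         .transform(lambda s: s.rolling(12, min_periods=1).mean())
)
print(panel.head())
```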

2) Identifying the right algorithm

For each time period (monthly in this case), we can frame the problem either as classification (“Will the occupancy be 80% at the end of the 5th year? Yes or no”) or as regression (“What % of the home will be occupied at the end of the 5th year?”). The regression framing can also be used to predict the occupancy trend throughout the 5 years – a preferred method when data availability is limited.
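
The two framings map directly onto standard scikit-learn estimators. The sketch below uses synthetic data purely to illustrate the distinction; the estimator choice and features are examples, not the project’s actual model.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))               # site features (demographics, etc.)
occ_year5 = rng.uniform(40, 100, size=500)   # occupancy % at the end of year 5

# Framing 1 - classification: will occupancy exceed 80% at the end of year 5?
clf = GradientBoostingClassifier().fit(X, occ_year5 > 80)
print(clf.predict_proba(X[:3])[:, 1])        # probability of clearing the threshold

# Framing 2 - regression: what occupancy % do we expect at the end of year 5?
reg = GradientBoostingRegressor().fit(X, occ_year5)
print(reg.predict(X[:3]))                    # predicted occupancy percentages
```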

To achieve accurate predictions, the chosen approach must be coupled with techniques such as hyperparameter tuning and cross-validation, which identify the model parameters that maximise prediction accuracy on new, unseen data.
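
A hedged sketch of what that tuning step can look like in scikit-learn – the grid values and estimator are examples, not the project’s actual configuration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
y = rng.uniform(40, 100, size=300)           # synthetic occupancy targets

search = GridSearchCV(
    GradientBoostingRegressor(),
    param_grid={"n_estimators": [100, 300],
                "max_depth": [2, 3],
                "learning_rate": [0.05, 0.1]},
    cv=5,                                    # 5-fold cross-validation
    scoring="neg_mean_absolute_error",       # optimise for low prediction error
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```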

3) Validating the model

The algorithm is trained on over 5 years’ worth of internal and external data – but it must also be tested for accuracy on data it has not ‘seen’ before. This is done by setting aside a few sites whose historical occupancy we already know and running them through the trained model: we roll the model back to the date each site went operational to see what it would have predicted at that point versus what was eventually achieved. The absolute difference between actual and predicted occupancy is the prediction error, and the model is tuned iteratively to keep this error as low as possible.
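
A simplified version of this backtest, again on synthetic data: hold out whole sites, train on the rest, and measure the absolute gap between predicted and achieved occupancy.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 8))
y = rng.uniform(40, 100, size=400)
site = rng.integers(0, 40, size=400)         # which care home each row belongs to

# Split by site so the held-out homes are entirely unseen during training.
train_idx, test_idx = next(
    GroupShuffleSplit(test_size=0.2, random_state=0).split(X, y, groups=site)
)
model = GradientBoostingRegressor().fit(X[train_idx], y[train_idx])

pred = model.predict(X[test_idx])
print("mean absolute prediction error:", mean_absolute_error(y[test_idx], pred))
```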

This model was tested on a wide variety of sites and has an average prediction error of only 9% – significantly better than current human-made predictions. This is an incredible result given the historical variations in data quality and availability.

4) Building confidence in the model

Machine learning models exist on a spectrum between high degrees of explainability (white box) and high levels of accuracy (black box). At Arca Blanca we strive to achieve a happy medium between the two. Without any explainability, adoption of the model becomes complex as it will be treated with suspicion. A lack of accuracy creates the same problem in a different way.

On this project we strove for a very high level of accuracy, but attached a degree of confidence to each output based on data availability, the presence of outliers and statistical intervals. This is supplemented by significant levels of local data with a strong relationship to the outputs. Together, these paint a compelling picture of how much confidence to place in each prediction and what may be driving it.
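
One generic way to attach statistical intervals to each forecast – not necessarily the project’s exact method – is to train quantile models alongside the central estimate:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 8))
y = rng.uniform(40, 100, size=300)           # synthetic occupancy targets

# Fit one model per quantile: lower bound, central estimate, upper bound.
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
    for q in (0.1, 0.5, 0.9)
}
lo, mid, hi = (models[q].predict(X[:1])[0] for q in (0.1, 0.5, 0.9))
print(f"predicted occupancy {mid:.1f}% (80% interval: {lo:.1f}%-{hi:.1f}%)")
```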

Changing ways of working

Our client has adopted the model as a critical component of its investment committees. We built a bespoke dashboard to enable faster and more accurate decision-making in board meetings (replacing the cumbersome house-view), allowing the team to run live scenarios and dismiss large numbers of potential sites without lengthy and costly investigations or site visits. All land acquisition opportunities are now quickly prioritised; the local data and model outputs form a daily, essential support to the land acquisition team.

Importantly, the organisation has embraced Machine Learning and the potential it offers – not as a threat to jobs and ways of working but as an essential tool to create unique advantages in a complex and challenging investment market.

“The complexity of interpreting hundreds of variables to define their relationships to success highlights the need for AI-driven models to enhance human decision-making.”

Decision Support, not Decision Making

Combining multiple data sources can offer a comprehensive understanding of the various factors driving occupancy rates. In one use case for a retirement homes builder, we found that indicators of nearby swimming pools were among the top five drivers of occupancy – something a land agent could easily overlook! The complexity of interpreting numerous demographic features, swimming pools, greenness indicators and hundreds of other variables to define their relationships to success highlights the need for AI-driven models to enhance human decision-making.
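
As an illustration of how a driver like nearby swimming pools can surface, the sketch below ranks features by permutation importance; the feature names and data are invented for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(4)
features = ["median_wealth", "pools_within_5km", "green_space_index", "bus_stops"]
X = rng.normal(size=(400, len(features)))
# Synthetic target where pools genuinely matter, so the ranking reflects it.
y = 60 + 5 * X[:, 1] + 3 * X[:, 0] + rng.normal(scale=2, size=400)

model = GradientBoostingRegressor().fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Print features from most to least important.
for name, score in sorted(zip(features, result.importances_mean),
                          key=lambda p: -p[1]):
    print(f"{name:20s} {score:.3f}")
```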

Where AI models fall short, particularly in the property sector, is in interpreting irrational human behaviour. Elderly residents might be willing to travel greater distances if a care home is closer to friends or relatives; they may move to follow a son or daughter who has just relocated for a new job; or they may have no relatives at all and simply want to move further south for the “better” weather and the clearly superior quality of the local fish & chips.

It is also crucial to acknowledge that not all regions of the UK collect demographic data with the same rigour or catalogue it in the same way (Scotland being a notable example). AI models can also only analyse variables for which robust, quality historical data exists – they cannot measure the quality of a view from a particular site, the friendliness of the care home managers, the quality of the food at competing care homes, or the particular qualities of a home’s garden and its schedule of activities. As such, the inherent limitations of these AI tools must be understood: they cannot be the only source of information in decision-making. Until humans stop making irrational decisions, AI will not (yet) replace experienced land agents; such models complement human decision-making rather than replace it.

Ultimately, successful implementation of occupancy prediction models requires a balanced approach that integrates data-driven insights with human expertise and understanding. The use of hyper-local demographic, macro, retail, business and property data to predict occupancy levels extends far beyond the care home industry and can be applied to other asset classes (student accommodation, office, retail, I&L etc.). Marketing teams can leverage the same approach to plan targeted campaigns based on the population density of specific locations, as well as to better understand the ideal number of units or rooms and their optimal pricing levels. By harnessing the power of large-scale data, executives can make better-informed decisions and optimise operations.

This project was run by a joint team of Management Consultants, Data Scientists and Technologists over 16 weeks, in constant collaboration with the client team. It was delivered in two phases: the first, a relatively low-cost, low-commitment 4-week Proof of Concept to confirm that an accurate model could be built; the second, a 12-week phase to strengthen the model with additional data sources and more robust algorithms and to build a bespoke dashboard for users to interact with.