Real estate data warehouses: lake, swamp or asset?
More real estate owners are building their own data infrastructure: a data lake, a warehouse, an environment built on Power BI. The logic is simple. Data is becoming an asset. But a lake is not automatically an asset. Without structure, it becomes a swamp: plenty of data, little value, and no one quite sure what can be trusted.
What large portfolios are already doing
Some of the largest real estate organisations in Europe are several steps ahead. One major pan-European owner is building its own data lake, with a dedicated team working on it daily. Another is taking a similar path through Power BI, pulling together energy data from across its portfolio into one environment.
Both are responding to the same shift. Energy data used to exist mainly to satisfy a yearly reporting requirement. Increasingly, it is being treated as a strategic resource: something to analyse, query and build decisions on.
But having a lake is not the same as having an asset. The question is not whether an organisation has gathered its data into one place. It is whether that data can actually be trusted once it gets there.
The moment the difference becomes visible
"I was in New York for a meeting with a client's climate officer," says Benno Schwarz, Head of Growth at Hello Energy. "You could really feel how deep compliance around data has become embedded in how they talk about it. They asked: if you are missing data, do you document that somewhere? Because we need to be able to show that when something is missing, we know it is missing. If we have that documentation, we have a solid story for compliance."
That question is the difference between a lake and an asset in practice. A lake holds whatever has been collected. An asset comes with a record of what is there, what is missing, and why. Without that record, a genuine drop in consumption and a gap in the data look identical. The drop might mean a tenant moved out. The gap might mean a meter failed three months ago and nobody noticed.
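As a rough illustration of what "knowing that it is missing" could look like in a warehouse, here is a minimal Python sketch. It is not a description of any particular system: the names Reading, GapRecord and find_gaps, the monthly granularity and the reason strings are all assumptions made for the example. The one design point it tries to show is that a missing value (None) is stored differently from a measured zero, and that every missing value gets its own documented entry.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Reading:
    meter_id: str
    month: date             # first day of the month the reading covers
    kwh: Optional[float]    # None means no value was received


@dataclass(frozen=True)
class GapRecord:
    meter_id: str
    month: date
    reason: str             # hypothetical free-text reason, "unknown" if not yet investigated


def find_gaps(readings, known_reasons=None):
    """Return one GapRecord for every reading that has no value."""
    known_reasons = known_reasons or {}
    gaps = []
    for r in readings:
        if r.kwh is None:
            reason = known_reasons.get((r.meter_id, r.month), "unknown")
            gaps.append(GapRecord(r.meter_id, r.month, reason))
    return gaps


readings = [
    Reading("meter-A", date(2025, 1, 1), 1200.0),
    Reading("meter-A", date(2025, 2, 1), None),   # missing value: a gap
    Reading("meter-A", date(2025, 3, 1), 0.0),    # measured zero: not a gap
]
gaps = find_gaps(readings, {("meter-A", date(2025, 2, 1)): "meter offline"})

for g in gaps:
    print(g.meter_id, g.month.isoformat(), g.reason)
```

With a register like this, the March reading of zero stays a real measurement (for example, a vacant unit), while the February gap is documented with its reason instead of silently reading as low consumption.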
Assurance as a stress test for the warehouse
If a data warehouse needs to support compliance reporting, certification or board-level decisions, it needs to survive scrutiny. That scrutiny has a name: assurance.
"We had an American client ask whether we were SOC certified," says Kees van Alphen, Managing Director at Hello Energy. "My first thought was that this was going to cost a lot of money. But we sat down with a senior assurance specialist and went through the entire process: where the data comes from, whether it has been manipulated in any way, how it is stored, what happens when something is missing, which standards and formats we use. We documented all of it. They came back and said it was genuinely reliable."
That process is, in effect, a test of whether a data environment is a lake or an asset. A lake can absorb data from anywhere, in any format, with no record of where it came from. An asset has to be able to answer, for every data point, where it originated and what happened to it along the way.
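One way to picture that requirement is a data point that carries its own origin and change history. The Python sketch below is illustrative only: the DataPoint class, its field names and the example source and correction note are assumptions for this example, not a format used by Hello Energy or required by any assurance standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataPoint:
    value_kwh: float
    source: str                                        # where the value originated
    history: List[str] = field(default_factory=list)   # every change, in order

    def transform(self, new_value, note):
        """Change the value and record what was done and why."""
        self.history.append(f"{self.value_kwh} -> {new_value}: {note}")
        self.value_kwh = new_value


# Hypothetical example: a value taken from an invoice and later corrected.
point = DataPoint(value_kwh=1000.0, source="utility invoice, building 12")
point.transform(1050.0, "corrected for billing period overlap")

print(point.source)
print(point.history)
```

The value can only change through transform, so any adjustment leaves a trace. Asked where the number came from and what happened to it, the warehouse can answer from the record itself.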
The scale of the challenge varies enormously across the industry. According to GRESB, the global ESG benchmark for real estate, energy data coverage across the sector exceeded 75% globally in 2025 for the second consecutive year. That is the level GRESB considers representative of a building's actual energy use. But the average hides large differences: hotels and offices report energy data coverage of 89% and 85% respectively, while residential and retail report only 61% and 52%. For large parts of the market, the lake is still mostly empty, let alone structured.
Four types of data, one warehouse
Even when a warehouse is well filled, it can still become a swamp if everything inside it is treated as equally reliable.
Energy data serves different purposes, and each purpose has a different minimum standard. Certification-grade data needs to be available at the right moment, in the right format, for a specific framework. Analytics-grade data benefits from granularity and frequency. Billing-grade data must be exact, from certified and calibrated meters. Tenant-facing data needs to be correct enough that nobody looks incompetent when a tenant sees it.
A warehouse that does not distinguish between these categories creates a specific risk: data of different reliability levels sitting side by side, indistinguishable to whoever queries it later. A number that looks identical to another number may have come from a certified meter in one case and a rough estimate in the other. Without that distinction recorded somewhere, the warehouse cannot tell the difference, and neither can the person using it.
This is not an argument for collecting less data. It is an argument for knowing, for every data point in the warehouse, which of these categories it belongs to, and therefore what it can and cannot be used for.
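A minimal sketch of that idea in Python: every value is stored with an explicit grade, and a query asks for the grade it needs. The names Grade, TaggedValue and usable_for are hypothetical, and the exact-match rule is a deliberate simplification. A real warehouse might, for instance, decide that billing-grade data is also acceptable for analytics.

```python
from dataclasses import dataclass
from enum import Enum


class Grade(Enum):
    CERTIFICATION = "certification"
    ANALYTICS = "analytics"
    BILLING = "billing"
    TENANT_FACING = "tenant_facing"


@dataclass(frozen=True)
class TaggedValue:
    meter_id: str
    kwh: float
    grade: Grade


def usable_for(values, required):
    """Keep only the values recorded at exactly the required grade."""
    return [v for v in values if v.grade is required]


values = [
    TaggedValue("meter-A", 4200.0, Grade.BILLING),
    TaggedValue("meter-B", 4200.0, Grade.ANALYTICS),  # same number, different grade
]
billing = usable_for(values, Grade.BILLING)

print([v.meter_id for v in billing])
```

The two values are numerically identical, but only one is returned when billing-grade data is requested. The grade travels with the number, so the distinction survives for whoever queries the warehouse later.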
Even the largest portfolios are not there yet
Building this kind of structure is not simply a question of budget or ambition. Some of the best-resourced organisations in the industry are still working through it.
One of the largest real estate companies in the world has spent years building its own data lake, with a significant team dedicated to it. For a long time, that organisation believed it could collect its own energy data directly from its buildings, independent of any external partner. The ambition was there. The resources were there. And yet, integrating something as fundamental as data from its own building sensors into that lake remained unresolved for years.
The lesson is not that scale does not matter. It is that scale alone does not solve the underlying problem. A large, well-funded data lake with gaps, inconsistent formats and no record of data provenance is still a swamp. It is simply a bigger one.
Lake, swamp or asset
The difference between these three is not about how much data an organisation has collected, or how sophisticated the technology behind the warehouse looks. It is about what happens before the data arrives: how it is collected, how gaps are tracked, how different types of data are distinguished from one another, and whether the whole structure can withstand someone asking hard questions about where it all came from.
A lake is a starting point. Whether it becomes an asset or a swamp depends on decisions made long before the data ever reaches it.
________________________________________
This article is part of a series in which Kees van Alphen and Benno Schwarz share what ten years of European energy data collection has taught them.


