How is your data integrity? Where did your data go?
Data is the foundation of any wind farm analytics, benchmarking or reporting. At Clir, we talk a lot about data integrity, of which one key aspect is simply data coverage: how much data is available? How much data is missing?
Since the ability to understand an asset and drive improvements is based on the quality of site data, the answers to these questions can significantly influence how results from analytics should be interpreted.
With Clir Portfolio, we’ve implemented smart logic that automatically labels and categorizes missing data periods based on all available information. Based on our experience, we have also established best-practice benchmarks for data coverage. This enables us to get the most accurate results from our clients' data.
Any period of missing data at a turbine can be categorized in one of two ways:
Causes of missing data include:
Wind farm data coverage is typically 95% or more. This may sound high enough that data coverage isn’t a problem, but consider the following example.
A turbine operated for a year. Data coverage was 97%. Time-based turbine availability was 95% based on the available data.
| | Days |
|---|---|
| Full period length | 365 |
| Time with data | 354 |
| Time with missing data | 11 |
| Time turbine was known to be online | 336.3 |
| Time turbine was known to be offline | 17.7 |
For the period when we have data, turbine availability was 95%, but what is the impact of the missing 3% of data? What was actual turbine availability? This depends on the turbine state during the missing data periods. Two possible scenarios are as follows:
| Scenario | A | B |
|---|---|---|
| Description | The turbine was online and producing during the missing data periods | The turbine was offline and not producing during the missing data periods |
| Time online (days) | 347.3 | 336.3 |
| Time offline (days) | 17.7 | 28.7 |
| Turbine availability | 95.2% | 92.2% |
The missing 3% of the data introduces an uncertainty of three percentage points in turbine availability (anywhere from 92.2% to 95.2%), which is significant. Implications include:
Scenarios A and B represent the range of what could have actually happened at the turbine. The true answer is likely scenario C, which is somewhere in between: the turbine was operating and producing power during some of the missing data periods and offline for the rest of the missing data periods.
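The arithmetic behind these scenarios can be sketched in a few lines of Python. This is a minimal illustration rather than Clir's implementation: the function names and the `operating_share` parameter (the fraction of missing time during which the turbine was actually online, which defines scenario C) are assumptions introduced here.

```python
def availability_bounds(period_days, coverage, availability_when_observed):
    """Return (lower, upper) bounds on time-based availability.

    lower: turbine offline for all missing time (Scenario B)
    upper: turbine online for all missing time (Scenario A)
    """
    observed_days = period_days * coverage
    missing_days = period_days - observed_days
    online_days = observed_days * availability_when_observed
    lower = online_days / period_days
    upper = (online_days + missing_days) / period_days
    return lower, upper


def availability_scenario_c(period_days, coverage, availability_when_observed,
                            operating_share):
    """Availability when the turbine was online for `operating_share`
    (0.0 to 1.0) of the missing time. Hypothetical helper for illustration."""
    lower, upper = availability_bounds(
        period_days, coverage, availability_when_observed
    )
    return lower + operating_share * (upper - lower)


# Example from the text: one year, 97% coverage, 95% availability when observed.
lower, upper = availability_bounds(365, 0.97, 0.95)
midpoint = availability_scenario_c(365, 0.97, 0.95, operating_share=0.5)
print(f"Scenario B: {lower:.2%}, Scenario A: {upper:.2%}, C at 50%: {midpoint:.2%}")
```

With the example inputs, the bounds come out at roughly 92% and 95%, in line with scenarios B and A, and the width of the range equals the share of missing data.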
Clir’s software and services are regularly used to support wind farm transactions, namely buying and selling shares of projects. Through this role, we regularly see how missing data is treated by consultants across the industry.
One common assumption is that periods of available data are representative of periods of unavailable data. This is an easy assumption to make because it implies we don’t need to worry about missing data. The major problem with this approach is that missing data periods are often correlated with outages: data is often missing precisely because the turbine is offline and disconnected from power or communications while being serviced. As a result, this assumption biases turbine availability upward.
Another approach used by the industry is to manually investigate each period of missing data using log books, monthly reports or interviews with operators to understand the turbine state. Although this works, it is not feasible at scale. It is not uncommon for there to be dozens or even hundreds of intermittent periods of missing data at a wind farm in a year.
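Enumerating the gaps themselves is the part that is straightforward to automate. The sketch below is one hedged way to do it, assuming records arrive on a regular 10-minute grid and that the timestamps of the records that are present are sorted; `find_gaps` is a hypothetical helper, not part of any Clir product.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=10)  # assumed SCADA averaging interval


def find_gaps(timestamps, step=STEP):
    """Return (first_missing, last_missing, n_missing) for each run of
    missing records between consecutive present timestamps.

    Assumes `timestamps` is sorted and aligned to a regular grid. Gaps
    before the first or after the last record are not detected.
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = curr - prev
        if delta > step:
            n_missing = delta // step - 1  # timedelta // timedelta gives an int
            gaps.append((prev + step, curr - step, n_missing))
    return gaps


# Records present at steps 0, 1, 2, 5, 6 and 10 of a 10-minute grid.
start = datetime(2023, 1, 1, 0, 0)
present = [start + i * STEP for i in (0, 1, 2, 5, 6, 10)]
for first, last, n in find_gaps(present):
    print(f"{first:%H:%M} to {last:%H:%M}: {n} missing records")
```

A list like this gives each missing period a start, an end and a length, which is the minimum needed before any period can be labeled or investigated.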
Unfortunately, when data is missing at the source, we can’t get it back. On the positive side, there’s a lot we can do to address this problem with the data that is typically ingested.
Clir’s software and data model facilitate the ingestion, standardization and application of any data tag or data feed. There are often dozens or even hundreds of tags in the 10-minute turbine SCADA data, some of which are useful to ascertain turbine state during previous or subsequent missing periods. The following occurs during the automatic enrichment process when new data is ingested:
If the turbine is operating and producing power during most or all periods of missing data, then data integrity is considered poor. There is a problem somewhere along the way with the transmission, logging or storing of turbine SCADA data. Actual turbine availability is similar to what’s indicated by available data only, in line with Scenario A.
If the turbine is not operational during most or all missing data periods, then data integrity is considered good in this regard. Data is missing when the turbine is disconnected from power or communications. Actual turbine availability is significantly lower than what’s indicated by available data only, in line with Scenario B.
At any project, we can look at how frequently the turbine was operating during missing data periods to evaluate data integrity. These results are then benchmarked against results from a peer group of wind farms to provide further insights into data integrity. This is presented to clients through our market insights reports.
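As a rough illustration of this metric, the sketch below computes the share of missing time during which a turbine appears to have been producing. It assumes a cumulative energy counter is available in the records on either side of each gap, and it treats the whole gap as "operating" if the counter advanced at all. That heuristic, the `Gap` structure and the `min_delta_kwh` threshold are all assumptions made for this example and are not a description of Clir's enrichment logic.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    duration_hours: float
    energy_before_kwh: float  # cumulative counter in the last record before the gap
    energy_after_kwh: float   # cumulative counter in the first record after the gap


def produced_during_gap(gap, min_delta_kwh=1.0):
    """Crude heuristic: the turbine produced if the counter advanced."""
    return (gap.energy_after_kwh - gap.energy_before_kwh) >= min_delta_kwh


def operating_share(gaps):
    """Fraction of total missing time in which the turbine appears to
    have been producing. Returns 0.0 when there is no missing time."""
    total = sum(g.duration_hours for g in gaps)
    if total == 0:
        return 0.0
    operating = sum(g.duration_hours for g in gaps if produced_during_gap(g))
    return operating / total


# Made-up gaps for illustration only.
gaps = [
    Gap(duration_hours=2.0, energy_before_kwh=1000.0, energy_after_kwh=1500.0),
    Gap(duration_hours=6.0, energy_before_kwh=1500.0, energy_after_kwh=1500.0),
    Gap(duration_hours=2.0, energy_before_kwh=1500.0, energy_after_kwh=1900.0),
]
share = operating_share(gaps)
print(f"Turbine operating during {share:.0%} of missing time")
```

A share near 1 points toward a data transmission or storage problem (Scenario A), while a share near 0 suggests the data is missing because the turbine was down (Scenario B).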
The figure below shows the percentage of missing data where the turbine was actually operational for a set of eleven wind farms with the same turbine manufacturer for a recent year. Results vary significantly by farm.
Some farms experience more frequent periods of missing turbine SCADA data than others. The extent to which the turbines are actually operating and producing power during these missing data periods varies significantly by farm. Clir supports wind farm owners in improving data integrity by identifying, quantifying and categorizing periods of missing data.
In the market insights reports, available through Clir Portfolio, we grade data practices to help clients become best-in-class for data quality and coverage. Better data coverage enables more robust and accurate turbine performance metrics and lower-uncertainty energy yield assessment results. The resulting increase in P90 can be used to support debt optimization and improved financial returns for the project.
Thanks to Thomas Broatch, Intermediate Software Developer, for the implementation work of this new feature.