A conversation with Peaxy Chief Scientist John Ervin.
In thinking about data and how much to store, we first need to consider how much value is being created from it. The key currency for electricity production and dispatch is the megawatt-hour (MWh). Putting a ballpark dollar value on a MWh is complicated and depends on many factors. To see how complex and varied it can be, just have a look at the pricing published by CAISO or the EIA. But we need a number to understand how much data we can save for batteries, so we’ll use a reasonably realistic $100/MWh just to make the arithmetic easier.
Examples: Wind turbines and batteries
Let’s first take the example of a wind turbine. If we take a 2 MW wind turbine operating at a 25% capacity factor (some will say both these numbers should be bigger for modern installations, but let’s go with those), we have 12 MWh a day, or about $1,200/day from that turbine. That turbine produces hundreds of data points per second, let’s say 200, captured by a typical SCADA or SCADA-type system. Call each point one byte for simplicity. With 86,400 seconds in a day where the clocks don’t change (trust me), we are looking at around 17 MB of data a day (about 16.5 MiB), or roughly 14 KiB of data per dollar generated.
Batteries, however, are a very different story. Again, these are numbers not everyone will agree on, but we have customers with blocks of around 300 kW and a 4-hour duration, made up of 12 strings, with each string producing 1,000 points per second. Assuming one full cycle a day, our daily 1.2 MWh (our customers have 100% efficient batteries, yes they do) is worth $120 and is supported by about 1 GiB of data. That is roughly 8 MiB of data per dollar stored, about 600x more data per dollar than wind.
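The arithmetic behind both examples can be checked with a short sketch. It uses only the round numbers quoted above ($100/MWh, one byte per point, one full battery cycle a day); the function and variable names are illustrative and not part of any Peaxy tooling.

```python
# Back-of-the-envelope check of the data-per-dollar figures above.
# Assumptions taken from the text: $100/MWh, one byte per data point,
# and one full battery cycle per day.

PRICE_PER_MWH = 100.0      # dollars
SECONDS_PER_DAY = 86_400
BYTES_PER_POINT = 1


def bytes_per_dollar(points_per_second, mwh_per_day):
    """Daily telemetry volume divided by daily energy value."""
    daily_bytes = points_per_second * SECONDS_PER_DAY * BYTES_PER_POINT
    daily_dollars = mwh_per_day * PRICE_PER_MWH
    return daily_bytes / daily_dollars


# Wind: 2 MW at a 25% capacity factor, 200 points/second.
wind = bytes_per_dollar(200, 2.0 * 0.25 * 24)

# Battery: 300 kW for 4 hours, 12 strings at 1,000 points/second each.
battery = bytes_per_dollar(12 * 1_000, 300 * 4 / 1_000)

print(f"wind:    {wind / 1024:.1f} KiB per dollar")
print(f"battery: {battery / 1024**2:.1f} MiB per dollar")
print(f"ratio:   {battery / wind:.0f}x")
```

Changing the price, the point size, or the sampling rate shifts the absolute figures, but the gap between the two asset types stays large for any plausible inputs.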
Methods of storing data
To store that data in database constructs, you can generally expect to pay something on the order of $0.10 / GiB / month. So just keeping a year’s worth of (uncompressed) data from our 300 kW battery, roughly 365 GiB, will cost around $438 a year (again, the system is producing $120/day).
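As a rough sketch of how that figure comes about, and of how the bill grows if nothing is ever discarded, the calculation can be written out as follows. The $0.10 / GiB / month price and the rounded 1 GiB/day volume come from the text; the helper names are hypothetical.

```python
# Storage-cost sketch for the 300 kW battery block.
# Assumptions: $0.10 per GiB per month and 1 GiB of uncompressed data per day.

PRICE_PER_GIB_MONTH = 0.10   # dollars, the order-of-magnitude figure from the text
GIB_PER_DAY = 1.0            # daily volume, rounded as in the text


def holding_cost(gib, months):
    """Cost of keeping `gib` GiB on disk for `months` months."""
    return gib * PRICE_PER_GIB_MONTH * months


def monthly_bill(days_retained):
    """Monthly bill once `days_retained` days of data have accumulated."""
    return holding_cost(days_retained * GIB_PER_DAY, 1)


year_of_data = 365 * GIB_PER_DAY
print(f"holding one year of data for a year: ${holding_cost(year_of_data, 12):.0f}")
print(f"monthly bill after two years:        ${monthly_bill(730):.0f}")
```

Because the archive keeps growing, the monthly bill rises linearly with the age of the asset, which is why the cost becomes more visible over time.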
If you aren’t smart about what data you keep, operators will start noticing your storage costs after about two years. At Peaxy, we have come up with a variety of ways to solve this problem for our customers. I’ll address three options: discarding data, compressing it, and resampling it.
The first option is simply to discard data after a certain time. I haven’t met the customer who wants us to do that. Tossing the data after, say, two years isn’t really a solution, but clearly it’s an option in principle.
The second option is compression. Data can of course be compressed, sometimes in clever ways, to dramatically alter what we are discussing here, but the cost is dramatically slower performance of any data-based analytics. Battery telemetry is time series data. Compression scientists can therefore think of it in many of the same ways you might think of a movie playing, where most of the time things aren’t changing that quickly, like, say, the corners of your screen, or the sky or the ground. Much like higher resolution moving pictures drive codecs through H.261 … H.264 … H.265, we are developing new and improved compression to meet the demands of our battery customers’ data.
Compression also need not be lossy. In other words, we can make the dataset small on a computer disk, but then reconstruct it exactly with no loss of information. These aren’t fuzzy JPEG images we are talking about, but lossless compression like say the FLAC format for you audiophiles out there.
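To make the lossless idea concrete, here is a minimal sketch using delta encoding followed by general-purpose deflate (Python's standard zlib module). It is an illustration of the principle only, not Peaxy's compression scheme; the function names and the synthetic millivolt readings are made up for the example.

```python
import struct
import zlib


def compress_series(samples):
    """Delta-encode integer samples, then deflate the result with zlib."""
    samples = list(samples)
    # Store the first value as-is, then only the change from one sample to the next.
    deltas = samples[:1] + [b - a for a, b in zip(samples, samples[1:])]
    packed = struct.pack(f"<{len(deltas)}i", *deltas)  # 4-byte signed integers
    return zlib.compress(packed, 9)


def decompress_series(blob):
    """Invert compress_series exactly: inflate, unpack, then running-sum the deltas."""
    packed = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(packed) // 4}i", packed)
    samples, total = [], 0
    for delta in deltas:
        total += delta
        samples.append(total)
    return samples


# Synthetic, slowly varying readings in millivolts (hypothetical data).
readings = [3700 + (i // 50) % 20 for i in range(10_000)]

blob = compress_series(readings)
restored = decompress_series(blob)

print("raw bytes:       ", len(readings) * 4)
print("compressed bytes:", len(blob))
print("exact round trip:", restored == readings)
```

Because neighboring samples rarely differ, most deltas are zero, which is exactly the kind of redundancy a general-purpose compressor removes well. The price is that analytics must decompress the data before using it.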
The third option, and a more powerful technique for saving data, is resampling. Here we do disregard some amount of data, then combine resampling with compression in order to save our customers storage costs. The rules for how we do this resampling depend on the details, but often this can give an orders-of-magnitude reduction in storage costs while still preserving all of the nuances of the data set and without adversely affecting performance.
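The text does not spell out the resampling rules, so the following is only one possible illustration: a simple deadband filter that keeps a point when the value has moved by more than a tolerance since the last kept point. The function name, the tolerance, and the sample data are all assumptions made for the example.

```python
def deadband_resample(points, tolerance):
    """Keep a (timestamp, value) pair only when the value has moved more than
    `tolerance` away from the last kept value. The first and last points are
    always kept so the time span of the series is preserved."""
    points = list(points)
    if not points:
        return []
    kept = [points[0]]
    for timestamp, value in points[1:]:
        if abs(value - kept[-1][1]) > tolerance:
            kept.append((timestamp, value))
    if kept[-1] != points[-1]:
        kept.append(points[-1])
    return kept


# Hypothetical cell-voltage samples: (seconds, volts).
samples = [(0, 3.700), (1, 3.700), (2, 3.701), (3, 3.750), (4, 3.750), (5, 3.751)]

resampled = deadband_resample(samples, tolerance=0.01)
print(resampled)
print(f"kept {len(resampled)} of {len(samples)} points")
```

A rule like this drops long flat stretches while retaining the moments where something actually changes. In practice the tolerance would have to be chosen per signal, and the output can then be compressed further.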
How much data to keep
So if you ask me how much data battery customers should keep, the answer is simple: keep all of it! But do it smartly, by leveraging the sort of technologies we have developed, in large part by looking at what other computer scientists have done in data-rich applications.
At Peaxy, we provide full access to all battery data threaded by serial number over the lifecycle of the asset, including dynamic remaining useful life (RUL) and degradation computations for each serialized battery asset. We also have experience with computing near-real-time digital twins for each serialized battery over the life of the battery.
Our deployments are typically done in 120 days, often with far more speed and efficiency than an in-house development effort or a generalized analytics platform with extensive customizations.