Telemetry & Asset Health for ESS and UPS
This page explains how to move from simple BMS or UPS alarms to full asset health telemetry. It covers selecting sensors, low-power nodes, uplinks and data integrity mechanisms that turn vibration, temperature and cycle history into actionable, long-term reliability insights for ESS and UPS fleets.
What this page solves: from fault logs to asset health
Traditional BMS and UPS logs only capture threshold-based alarms, not slow degradation. This page explains how to record vibration, temperature trends and cycle-count history, then move that data over LPWAN or cellular with integrity checks so ESS and UPS fleets can be monitored as long-life assets instead of black-box fault sources.
Pain points of fault-only logs
- Alarms fire only when thresholds are crossed, not while assets drift.
- Remote ESS and UPS sites may look “healthy” while joints and racks slowly degrade.
- Warranty and field service planning depend on sporadic inspections and anecdotal evidence.
Benefits of trend telemetry
- Vibration and temperature trends reveal loose structures and rising contact resistance early.
- Cycle-count and stress history enable data-driven replacement and de-rating decisions.
- Operators see a health view, not just a binary “OK/fault” list of events.
Key questions this page answers
- Which vibration, temperature and cycle-count indicators matter for asset health?
- How can low-power telemetry nodes log and upload this history over LPWAN or cellular?
- How is data integrity and time alignment maintained across remote ESS and UPS sites?
This section does not cover electrochemical cell models or fleet-level energy dispatch. Those topics sit with cell/BMU pages and EMS controllers. Here the focus is on capturing long-term degradation signals, storing them safely and moving them over constrained links without losing context.
Asset health targets and degradation symptoms
Asset health telemetry does not watch cells in isolation. It tracks how modules, racks, cabinets, busbars, contactors and cooling hardware age under real loading and environment. The goal is to turn measurable symptoms such as vibration changes, temperature drift, cycle-count growth and event statistics into early indicators of mechanical, electrical and thermal wear.
| Asset / Symptom | Vibration pattern | Temperature trend | Current / events | Cycles / hours | Humidity / environment |
|---|---|---|---|---|---|
| Module / pack | Rising module vibration may indicate loose mounting or transport damage history. | Slowly increasing hot-spot temperature under similar load hints at contact resistance growth. | Occasional over-current events show stress that does not yet trigger BMS faults. | Cycle-count and amp-hour throughput reveal how hard the module has been used. | Ambient humidity and condensation risk affect long-term insulation and connector life. |
| Rack / cabinet / container | Changes in cabinet vibration spectrum suggest loosened frames or shifting foundations. | Temperature gradients across racks point to blocked airflow or uneven loading. | Frequent door-open or shock events can be logged as operational stress statistics. | Operating hours at elevated ambient temperature shorten overall asset lifetime. | High humidity inside containers accelerates corrosion and insulation degradation. |
| Busbars & high-current joints | Local vibration at busbar supports may signal loose clamps or brackets. | Slowly rising joint temperature under similar current indicates growing contact resistance. | Current spikes and imbalance events highlight stressed paths before visible damage. | High-stress operating hours at rated current are a driver for joint ageing. | Condensation near busbars increases the risk of tracking and partial discharge. |
| Contactors / switches | Audible or measured vibration changes can indicate chatter or mechanical wear. | Rising contact temperature under comparable load suggests worn or pitted contacts. | Trip counts, weld-detection events and failed operations expose switching stress. | Total switching cycles and on-time hours help define safe replacement intervals. | Moisture near switching gear accelerates corrosion and reduces dielectric margin. |
| Cooling hardware | Pump or fan vibration changes are early signs of bearing wear and imbalance. | Discharge temperature drift under similar load points to degraded coolant flow. | Fault and restart events for pumps, valves and fans can be logged as stress counters. | Run-hours at high duty cycles indicate how close cooling assets are to maintenance. | High humidity and leaks in cooling compartments create additional corrosion risk. |
Different assets degrade for different physical reasons, but for telemetry design the problem reduces to a small set of observable signals. Vibration, temperature trends, current and event statistics, accumulated cycles and environmental stress together form the foundation of remote asset health monitoring.
Sensing and signal chain overview: from sensors to features
Asset health telemetry starts with vibration, temperature and usage-related measurements but does not forward raw waveforms. Sensor outputs are converted into compact features such as RMS, peak, crest factor, trend slope and hourly or daily extremes so that long-term history can be logged and sent over LPWAN or cellular links efficiently.
Vibration sensing chain: accelerometer to RMS and trend
Cabinets, racks and cooling hardware experience shocks, transport impacts and mechanical fatigue. MEMS accelerometers or IMUs mounted at key locations provide low-g, low-noise measurements across the structural band of interest. A typical chain uses a digital sensor with SPI or I²C output and an internal FIFO, allowing a low-power MCU to wake up briefly, read a burst of samples and return to sleep.
The MCU turns short sample windows into vibration features such as RMS level, peak value, crest factor and simple band-limited metrics. Longer-term indicators like daily averages and week-on-week changes are derived from these window features, giving an asset health view that is compact enough for telemetry.
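The window-level features named above can be sketched in a few lines. This is a minimal illustration of the idea, not firmware; the function name and the dictionary keys are assumptions for this example, and a real node would work on fixed-point FIFO samples rather than Python floats.

```python
import math

def vibration_features(window):
    """Compute compact vibration features from one accelerometer window (values in g)."""
    n = len(window)
    rms = math.sqrt(sum(x * x for x in window) / n)   # energy of the window
    peak = max(abs(x) for x in window)                 # largest excursion
    crest = peak / rms if rms > 0 else 0.0             # peakiness: impacts vs. steady hum
    return {"rms_g": rms, "peak_g": peak, "crest": crest}
```

A rising RMS trend with a stable crest factor suggests broadband loosening, while a rising crest factor at constant RMS points to intermittent impacts; the daily and week-on-week indicators mentioned above are simply aggregates of these per-window values.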
Detailed spectral analysis and model-based diagnostics belong in specialised vibration pages. This overview focuses on feature-level outputs suitable for remote telemetry nodes.
Temperature sensing chain: samples to thermal trends
Temperature telemetry looks at how critical points behave over months rather than milliseconds. NTCs, RTDs or digital temperature sensors on modules, joints and cooling inlets feed a multiplexed ADC or digital bus, with readings captured at intervals that are appropriate for thermal dynamics instead of switching edges.
From these samples the node computes hourly or daily maxima, minima and averages, plus gradients between modules or racks. Slow temperature drift under similar loading becomes a key sign of contact resistance growth, blocked airflow or cooling degradation and is logged as a compact trend dataset rather than a stream of raw values.
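As a sketch of that trend dataset, the summary below reduces one day of samples to extremes, an average and a least-squares slope. The `(hour, temperature)` sampling format and the field names are assumptions made for this illustration.

```python
def daily_temperature_summary(samples):
    """samples: list of (hour, temp_C) pairs for one day; returns compact trend features."""
    temps = [t for _, t in samples]
    n = len(samples)
    mean_h = sum(h for h, _ in samples) / n
    mean_t = sum(temps) / n
    # Least-squares slope in °C per hour: the key drift indicator
    num = sum((h - mean_h) * (t - mean_t) for h, t in samples)
    den = sum((h - mean_h) ** 2 for h, _ in samples)
    slope = num / den if den else 0.0
    return {"t_min": min(temps), "t_max": max(temps),
            "t_avg": mean_t, "slope_c_per_h": slope}
```

Comparing `slope_c_per_h` across days under similar load is what turns a stream of raw readings into the contact-resistance or airflow indicator described above, at a fraction of the storage cost.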
High-precision ADC configuration, calibration routines and detailed correlation with current sensing are covered in BMU and current-sensing topics. Temperature features here are defined for long-term telemetry and fleet analytics.
Cycle-count and usage chain: from current to counters
Asset health also depends on how often and how hard batteries, busbars and contactors have been used. Current information from existing shunt, Hall or flux-gate sensors is integrated by a fuel-gauge device or MCU firmware to build cycle-count, amp-hour or kilowatt-hour throughput and operating-hour counters.
The telemetry node maintains counters for full or partial cycles, stress hours above specified thresholds and key event counts. These usage metrics are small, easy to store and straightforward to correlate with vibration and temperature features when assessing asset health across a fleet.
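The counter update can be reduced to a simple per-sample integration. The threshold values and counter names below are illustrative assumptions, not recommendations for any particular chemistry or busbar rating.

```python
def update_usage_counters(counters, current_a, temp_c, dt_s,
                          stress_current_a=100.0, stress_temp_c=45.0):
    """Integrate one current/temperature sample (dt_s seconds long) into
    cumulative usage counters. Thresholds are illustrative assumptions."""
    counters["ah_throughput"] += abs(current_a) * dt_s / 3600.0
    if abs(current_a) > stress_current_a:
        counters["stress_hours_current"] += dt_s / 3600.0   # time above rated current
    if temp_c > stress_temp_c:
        counters["stress_hours_temp"] += dt_s / 3600.0      # time at elevated ambient
    return counters
```

Because each counter is a single accumulating number, a whole usage profile fits in a few bytes per record, which is what makes it cheap to correlate with the vibration and temperature features.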
Detailed SOH algorithms and electrochemical models that consume these counters are handled in fuel-gauge and online diagnostics material. Telemetry focuses on delivering clean, time-aligned counters that downstream analytics can trust.
Telemetry node architecture for low-power asset health
Turning sensor features into useful asset health data requires a dedicated telemetry node. This node combines low-power processing, local non-volatile storage and LPWAN or cellular connectivity with a simple power tree so that racks, cabinets and cooling assemblies can be monitored continuously without excessive standby consumption.
MCU and power modes
At the centre of the node sits an ultra-low-power MCU with multiple sleep states, RTC wake-up and interrupt inputs from sensors and communication modules. Typical operation cycles between deep sleep and short active bursts: waking to read sensor FIFOs, compute features, manage logs and schedule uplinks before returning to a low-leakage state.
Average power is determined by sleep current, wake-up frequency and active-time budget. Hardware accelerators for CRC or basic cryptography reduce active time per packet and help keep overall energy usage low enough for auxiliary supplies, small batteries or supercapacitor-backed rails.
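The duty-cycle arithmetic behind that budget is worth making explicit. The figures used in the test (2 µA sleep, 5 mA active, 200 ms bursts, 12 wakes per hour) are illustrative assumptions, not targets.

```python
def average_current_ua(sleep_ua, active_ma, active_ms_per_wake, wakes_per_hour):
    """Average supply current in microamps for a sleep/burst duty cycle."""
    active_s_per_hour = wakes_per_hour * active_ms_per_wake / 1000.0
    duty = active_s_per_hour / 3600.0                 # fraction of time awake
    return sleep_ua * (1.0 - duty) + active_ma * 1000.0 * duty
```

With the example numbers the node averages only a few microamps, which shows why shaving milliseconds off each active burst (for example with a hardware CRC engine) matters more than small improvements in sleep current once wake-ups are frequent.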
Local storage hierarchy
Telemetry nodes often use a small RAM buffer for recent feature windows together with SPI Flash or FRAM for persistent history. Flash provides high capacity at low cost but requires page-based erase and wear management, while FRAM offers fast, low-energy writes and excellent endurance at smaller densities.
Data is usually stored as fixed-length records combining timestamp, feature set and status flags. Ring buffers or append-only regions allow efficient logging of days or weeks of history and simplify upload scheduling when connectivity is intermittent or bandwidth constrained.
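A fixed-length record can be sketched with standard struct packing. The layout below is a hypothetical 24-byte record (deployments discussed later on this page assume 40–64 bytes with more feature fields); the field set and the CRC-32 trailer are assumptions for illustration.

```python
import struct
import zlib

# Hypothetical layout: u32 timestamp, u16 sequence, u16 status flags,
# three float32 features, then a CRC-32 over everything before it.
RECORD_FMT = "<IHH3fI"
RECORD_LEN = struct.calcsize(RECORD_FMT)   # 24 bytes

def pack_record(timestamp, seq, flags, features):
    body = struct.pack("<IHH3f", timestamp, seq, flags, *features)
    return body + struct.pack("<I", zlib.crc32(body))

def unpack_record(raw):
    body, (crc,) = raw[:-4], struct.unpack("<I", raw[-4:])
    if zlib.crc32(body) != crc:
        return None   # partially written or corrupted entry: reject
    ts, seq, flags, f1, f2, f3 = struct.unpack("<IHH3f", body)
    return {"ts": ts, "seq": seq, "flags": flags, "features": (f1, f2, f3)}
```

Fixed-length records make ring-buffer arithmetic trivial: the write pointer advances by `RECORD_LEN`, and any entry that fails the CRC check on read-back is simply skipped.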
Connectivity interface block
On the communication side, the MCU connects to a sub-GHz or LPWAN transceiver, a cellular modem or a local gateway interface such as RS-485 or Ethernet. Simple state machines handle network registration, retry behaviour and batching, so that the high-power radio only wakes when enough data is queued or a maintenance session is requested.
The telemetry node may be powered from an auxiliary DC rail, a small dedicated supply or a supercapacitor charged from existing infrastructure. A power-path controller or eFuse protects the node and allows controlled start-up and shutdown during faults or maintenance operations.
LPWAN vs cellular vs local gateway: choosing the uplink
Asset health nodes eventually need a path out of the container or UPS room. Uplink options usually fall into three groups: LPWAN links such as LoRaWAN, sub-GHz proprietary and NB-IoT; direct cellular using LTE Cat-M or Cat-1; and local site gateways or EMS over Ethernet, Wi-Fi or RS-485. The right choice depends on coverage, power budget, recurring cost and how tightly the system must integrate with existing OT and IT networks.
Data volume and latency needs for asset health
Asset health telemetry normally delivers minutes-to-hours views, not millisecond control loops. After feature extraction, a typical record contains timestamp, vibration metrics, temperature extremes and usage counters, giving roughly 40–64 bytes per record. With one record every five minutes, a node produces around 12–18 KB per day, so even large ESS sites can stay within LPWAN or NB-IoT bandwidth if data is batched and compressed.
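The sizing above is simple arithmetic and worth keeping as a reusable check during uplink selection; the helper below is a sketch of that calculation, ignoring protocol overhead and compression.

```python
def daily_feature_volume(record_bytes, period_min):
    """Bytes of feature telemetry per node per day at one record per period."""
    records_per_day = 24 * 60 // period_min
    return records_per_day * record_bytes
```

At one 64-byte record every five minutes this gives 288 records and 18 432 bytes per day, i.e. the upper end of the 12–18 KB range quoted above; even a hundred nodes on one site stay well inside LPWAN-class budgets.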
Higher-bandwidth links only become essential when waveforms, diagnostic snapshots or firmware images must be transferred frequently. For most predictive-maintenance use cases, feature-level logging with periodic uploads is sufficient.
Comparison of uplink options
| Aspect | LPWAN (LoRaWAN / NB-IoT) | Cellular direct (LTE Cat-M / Cat-1) | Local gateway / EMS |
|---|---|---|---|
| Typical data rate | Sized for small packets and feature records; suitable for kbit/s-level asset health and daily summaries. | Comfortably supports frequent packets, diagnostic snapshots and moderate-size file transfers. | Usually not bandwidth-limited; site network handles traffic between node, gateway and control systems. |
| Latency and duty cycle | Seconds to minutes with duty-cycle limits; well matched to batched health reports and rare alarms. | Seconds or better; supports on-demand queries, remote maintenance sessions and rapid event uploads. | Milliseconds to seconds within the site; end-to-end latency depends on backhaul and gateway configuration. |
| Node power consumption | Radios are optimised for long sleep periods and short bursts, enabling battery or supercapacitor-powered nodes. | Registration, attach and idle modes draw more current; nodes often rely on stable auxiliary power rails. | If Ethernet or RS-485 is available, the node radio can be simple and low power, but depends on site wiring. |
| Recurring cost | Self-hosted LoRa networks shift cost to gateways and backhaul; public LPWAN or NB-IoT adds per-device fees. | Per-SIM subscriptions and traffic charges apply, plus logistics for SIM lifecycle and roaming if needed. | Communication uses existing LAN or fieldbus; main cost lies in gateway, security infrastructure and engineering time. |
| Integration complexity | Requires RF planning, gateway deployment or operator integration and server-side handling of uplink formats. | Needs modem control, APN configuration, security hardening and integration into cloud or enterprise endpoints. | Must align with site OT/IT teams on protocols such as Modbus, IEC 60870-5-104 or MQTT, plus VPN and firewalls. |
| Best-fit scenarios | Remote sites with limited IT support where long-range, low-power links are more practical than wired networking. | Sites with good coverage that need richer diagnostics, OTA updates or portable commissioning tools. | Large facilities and substations that already host gateways or EMS platforms and prefer a centralised entry point. |
Practical selection guidelines
- Remote ESS sites without structured IT support often favour LPWAN or NB-IoT, with daily or hourly uploads and local buffering.
- Facilities with strong IT infrastructure and existing site gateways usually benefit from connecting telemetry nodes to those gateways first.
- Applications that require frequent diagnostics, rich datasets or field firmware updates lean toward cellular, sometimes combined with LPWAN for backup.
Detailed cybersecurity design, VPN architecture and EMS or cloud integration belong in dedicated gateway and control-center topics. This section focuses on uplink choices from the perspective of distributed asset health nodes on racks and containers.
Data integrity, security and timestamps for telemetry
Asset health data is only useful when every record is intact, uniquely attributable to a node and tagged with a trustworthy time. The telemetry node therefore needs local protection against corrupted writes, transport-level checks for partial or duplicated packets and a timing scheme that remains credible during outages and delayed uploads.
On-node data integrity: records, CRC and power-fail safety
Local logs typically store fixed-length records that bundle header, feature payload and a checksum. A hardware CRC engine or DMA-friendly routine computes a CRC-16 or CRC-32 over each record in RAM before it is written to Flash or FRAM. This structure allows the node to reject any entry that has been partially written or silently corrupted by bit errors.
Flash-based designs add wear leveling and power-fail protection. Append-only log layouts, block rotation and a dedicated commit flag ensure that records are either fully written or clearly marked invalid. On start-up the node scans only entries with a valid commit flag and CRC, reconstructing a clean tail of data even after abrupt power loss. A watchdog supervises the logging state machine so that stalled writes cannot leave the node in an undefined state.
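The start-up scan can be sketched as a filter over raw log entries. The commit-flag value `0xA5` and the `(flag, payload, crc)` tuple shape are assumptions for this illustration; real firmware would read these fields out of Flash pages.

```python
import zlib

COMMIT_OK = 0xA5   # arbitrary marker written only after the payload and CRC land

def scan_log(entries):
    """entries: list of (commit_flag, payload_bytes, stored_crc) in write order.
    Returns only fully committed, intact records, skipping torn writes."""
    valid = []
    for flag, payload, crc in entries:
        if flag != COMMIT_OK:
            continue                    # power failed before commit: skip
        if zlib.crc32(payload) != crc:
            continue                    # bit error or partial payload: reject
        valid.append(payload)
    return valid
```

Because the commit flag is written last, an interrupted write can never produce an entry that both carries the flag and passes the CRC, which is exactly the invariant the checklist items below are meant to verify.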
Design checklists later in this page should confirm that each record carries its own CRC and commit flag and that power interruptions cannot create entries which pass parsing without a checksum check.
Uplink integrity and retries: packets, sequencing and de-duplication
When records are bundled into uplink packets, the packet builder adds device identity, sequence index and payload length around the data. A second CRC or lightweight MAC covers the transport frame so that link-layer errors can be detected independently of the on-node record checksum. This dual layer makes it clear whether corruption occurred before or during transmission.
The node keeps a queue of unacknowledged packets and resends them according to a bounded retry policy. Sequence numbers allow the backend or site gateway to discard duplicates, accept late arrivals and reassemble a coherent timeline even when links are intermittent. LPWAN systems generally favour small retry counts and scheduled uploads, while cellular links can support more aggressive retry behaviour and ad hoc diagnostics.
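On the receiving side, de-duplication and ordering reduce to keeping the first copy of each (device, sequence) pair. This is a minimal sketch of that backend step, assuming a `(device_id, seq, payload)` packet shape; a production backend would also bound the window and handle sequence-number wraparound.

```python
def deduplicate_and_order(packets):
    """packets: iterable of (device_id, seq, payload) as received, possibly with
    duplicates and out-of-order arrivals. Keeps the first copy of each
    (device, seq) pair and returns records ordered per device and sequence."""
    seen = {}
    for dev, seq, payload in packets:
        seen.setdefault((dev, seq), payload)   # later duplicates are discarded
    return sorted(((dev, seq, payload) for (dev, seq), payload in seen.items()),
                  key=lambda rec: (rec[0], rec[1]))
```

Retransmissions then become harmless: a packet that arrives twice, or hours late, either fills a gap in the sequence or is silently dropped.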
Uplink checklist items should verify that each packet includes a device ID, sequence index and transport-level CRC and that the backend performs de-duplication and ordering within a defined time window.
Time and identity: RTC, secure elements and trusted timestamps
Predictive maintenance depends on knowing when events happened, not just what their values were. Telemetry nodes therefore pair a low-drift RTC and backup supply with periodic time synchronisation from a gateway or cloud service. Every record carries an absolute timestamp and may also track a monotonic counter so that RTC jumps or resets can be detected and corrected during analysis.
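The value of pairing an RTC timestamp with a monotonic counter is that jumps become detectable after the fact. This sketch, under the assumption of `(rtc_unix_time, monotonic_count)` record pairs, flags entries where the RTC moved backwards even though the counter advanced.

```python
def check_timestamps(records):
    """records: list of (rtc_unix_time, monotonic_count) in log order.
    Returns entries whose RTC time went backwards while the counter advanced,
    indicating an RTC reset or bad time sync that analysis must correct."""
    suspect = []
    prev_t, prev_m = None, None
    for t, m in records:
        if prev_m is not None and m > prev_m and t < prev_t:
            suspect.append((t, m))
        prev_t, prev_m = t, m
    return suspect
```

During analysis, flagged spans can be re-anchored using the monotonic counter and the next trusted sync point, so degradation trends stay continuous across an RTC reset.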
Device identity and packet authenticity are handled by a secure element or hardware security module. This device stores keys and credentials, exposes simple services for signing or generating message authentication codes and protects long-term secrets from firmware compromise. Combined with CRC checks and sequence numbers, these signatures allow the backend to trust both the origin and content of each telemetry packet.
A practical checklist should confirm that every record carries a timestamp, RTC drift is accounted for in the design and a secure element protects device identity and any cryptographic material used to sign telemetry.
Telemetry payload and on-node analytics: what to send and where to compute
Telemetry design defines how much information leaves an asset health node and which calculations stay at the edge. Early experiments often stream raw samples, while long-term deployments use feature-based payloads such as min, max, mean, RMS, daily health scores and event counters. Hybrid schemes send features continuously and only include high-resolution raw snapshots when commissioning or when abnormal conditions occur.
Raw-only, feature-only and hybrid payload modes
Raw-only telemetry streams sensor waveforms directly to a backend. This approach is helpful during lab trials and early pilots when vibration or temperature behaviour is still being characterised. However, continuous raw uploads do not scale well in LPWAN or NB-IoT environments and quickly overload storage and analytics pipelines.
Feature-only telemetry keeps processing close to the node. The firmware calculates statistics such as min, max, mean and RMS over fixed windows, trend slopes over hours or days, health scores and event counts (door open, over-vibration, over-temperature). Only these compact features are logged and transmitted, which reduces bandwidth consumption by orders of magnitude while still capturing the underlying degradation patterns.
Hybrid schemes combine both ideas. Asset health nodes run continuously on feature-based telemetry but switch into a higher-detail mode in specific situations, such as commissioning or when thresholds are exceeded. Short raw snapshots then accompany feature records for post-event diagnostics, while day-to-day traffic remains light.
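The switch into higher-detail mode can be expressed as a small trigger rule. The feature keys, limit values and the per-day snapshot cap below are illustrative assumptions; the point is that snapshots are both threshold-gated and rate-limited so hybrid traffic stays bounded.

```python
def select_payload(features, limits, snapshots_today, max_snapshots_per_day=4):
    """Decide whether a feature record should carry a raw snapshot.
    features/limits: dicts keyed by feature name (e.g. a hypothetical 'vib_rms')."""
    triggered = any(features[k] > limits[k] for k in limits if k in features)
    allow = triggered and snapshots_today < max_snapshots_per_day
    return {"send_features": True, "attach_snapshot": allow}
```

Features always go out; snapshots only accompany them while a limit is exceeded and the daily quota has headroom, which keeps worst-case traffic predictable for LPWAN planning.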
Where to compute: edge analytics versus cloud analytics
Edge analytics on the telemetry node are best suited to simple, local decisions: computing statistics from accelerometer and temperature windows, counting events, tracking daily health scores and evaluating thresholds or trend limits. These operations reduce data volume and allow quick local reactions such as flagging a cabinet for inspection or recording an immediate event snapshot.
Cloud or EMS analytics specialise in fleet-wide and long-horizon analysis. The backend correlates feature streams across racks, containers and sites, aligns them with maintenance actions and environmental data and refines alert thresholds over time. Frequency-domain modelling, EIS and advanced SOH estimation remain in the dedicated online diagnostics domain to avoid overlap with this page.
Comparing payload strategies
| Mode | Data volume | Edge analytics complexity | Typical use |
|---|---|---|---|
| Raw-only | Highest, continuous waveforms or dense sample streams. | Low on-node processing; complexity moves to the backend. | Lab characterisation, short-term pilots and troubleshooting. |
| Feature-only | Lowest, compact records with statistics and counters. | Moderate; nodes compute statistics, trends and simple rules. | Volume deployments on LPWAN, NB-IoT or shared gateways. |
| Hybrid | Moderate on average; occasional spikes during snapshots. | Higher; combines feature pipelines with snapshot triggers. | Mature deployments needing both health scores and root-cause data. |
A later design checklist should confirm which payload mode is used for each site type, how thresholds and trend rules enable local alerts and how hybrid snapshots are limited so that LPWAN or cellular links remain within bandwidth and cost budgets.
Deployment patterns and mini-stories for telemetry nodes
Telemetry and asset health functions are deployed very differently in containerised BESS, central UPS rooms, distributed small UPS cabinets and mobile ESS trailers. This section groups typical field conditions, recommended node placement and real-world style outcomes so that engineers can quickly recognise their own scenario and adapt the design without starting from a blank page.
Containerised BESS at remote sites
Containerised BESS at wind and solar sites faces wide temperature swings, structural vibration from wind and nearby machinery and limited on-site staffing. Power is usually available from auxiliary rails, but IT infrastructure is minimal and backhaul often relies on cellular or microwave plus a compact site gateway. The environment favours simple, robust nodes and uplinks such as LPWAN, NB-IoT or a small embedded gateway.
A common pattern is to place one telemetry node per container, with a vibration sensor mounted on the container frame and a small cluster of temperature sensors or taps into existing BMS temperature readings. The node logs feature data every few minutes, computes container-level health scores and sends them via LPWAN or the gateway. Raw snapshots are reserved for commissioning or for cases where abnormal vibration and door-open patterns persist over several days.
- Before telemetry: faults are only visible when the BMS trips or someone visits the site for maintenance.
- After telemetry: rising vibration RMS and extra door-open events highlight mounting and sealing issues weeks before they create water ingress or electrical failures.
Centralised UPS rooms in data centres and hospitals
Centralised UPS rooms in data centres, hospitals or control buildings benefit from stable power, conditioned air and strong IT support. Network access is often available through Ethernet, Wi-Fi or industrial fieldbuses, and existing SCADA or DCIM platforms already monitor power flows and alarms. The main challenges are subtle thermal imbalances, slow degradation in cabling and connections and the risk of unnoticed cabinet-level issues.
Telemetry deployment typically uses one node per row or per group of UPS racks, with vibration sensors on representative cabinets and additional ambient temperature points near battery strings and cable terminations. Nodes feed feature-only health metrics into the site network, where a gateway or EMS integrates asset health scores into existing dashboards and alarm workflows instead of building a parallel system.
- Before telemetry: high connection resistance or airflow blockages are often detected only during yearly inspections or after trips.
- After telemetry: slowly rising cabinet-side temperatures and event counts drive targeted maintenance visits, reducing unplanned downtime without adding new on-site staff.
Distributed small UPS cabinets in buildings
Many commercial and industrial buildings use dozens of small UPS cabinets spread across floors, corridors and equipment rooms. Each cabinet has power but network access is inconsistent, and physical access requires moving between rooms and levels. Manual inspection routes are time-consuming, so minor issues can stay hidden until a power event exposes weak battery strings or over-stressed components.
Deployment patterns often group two or three cabinets under one telemetry node, using simple harnesses or multi-channel inputs for door status, internal temperature and basic load or test-result indicators. LPWAN or NB-IoT uplinks are popular where building networks are difficult to extend, and payloads concentrate on periodic health scores and battery test outcomes rather than high-rate data streams.
- Before telemetry: failing batteries or overheated cabinets are discovered during scheduled rounds or after a power interruption exposes missing runtime.
- After telemetry: repeated weak-test results and abnormal temperature drift mark individual cabinets for early replacement, improving availability while keeping field visits focused.
Mobile ESS trailers and battery trucks
Mobile ESS trailers and battery trucks combine transport stresses with demanding duty cycles at temporary sites. During transport, vibration and shock are severe, while at job sites the system may operate in dusty, hot or cold conditions with intermittent connectivity. Fleet operators need visibility into both mechanical abuse and operational stress in order to schedule maintenance and manage warranty exposure.
A practical pattern is to install one telemetry node per trailer, with vibration sensing on the chassis and temperature points in the battery compartment. The node maintains local logs and uploads feature data and shock event counts over cellular links when coverage is available. When the trailer returns to a depot, Wi-Fi or a local gateway can be used for bulk uploads of any remaining snapshots and diagnostic logs.
- Before telemetry: fleet managers see only “in-service” versus “faulted” trailers and lack data on how units were treated in the field.
- After telemetry: unusual clusters of high-g shock events and elevated operating temperatures identify problematic routes or handling practices and guide both driver training and maintenance planning.
Design checklist and IC mapping for asset health telemetry nodes
This checklist is intended to be a bridge between system requirements and IC selection. It captures sensing needs, node architecture, communication strategy and integrity constraints so that design engineers, FAEs and procurement teams can discuss telemetry nodes for ESS and UPS asset health in concrete, measurable terms.
Design checklist for telemetry and asset health nodes
Sensing and features
- Target vibration bandwidth (for example 0.5–500 Hz) and g-range for structural monitoring.
- Required vibration resolution and noise floor for RMS and trend calculations.
- Temperature accuracy target (for example ±0.5 °C) and number of monitored points per rack or cabinet.
- Update periods for vibration and temperature features (for example every 1, 5 or 15 minutes).
- Cycle-count granularity: kWh, Ah or cycle count per defined depth-of-discharge band.
Node architecture and storage
- Minimum retention time for on-node logs (for example 7, 30 or 90 days of feature data).
- Local memory capacity for feature logs and rare raw snapshots (Flash or FRAM size in Mbit).
- Power budget in sleep, measurement and uplink modes, including peak current limits.
- Supply source options: auxiliary DC rail, small battery, supercapacitor or a combination.
- Required diagnostic interfaces on the PCB: SWD/JTAG, UART console, test pads or connectors.
Uplink, bandwidth and time keeping
- Preferred uplink path: LPWAN/Sub-GHz, NB-IoT/LTE Cat-M, direct Ethernet/Wi-Fi or site gateway.
- Estimated payload volume per node per day in bytes for feature telemetry and event snapshots.
- Acceptable latency for asset health updates (for example minutes vs. hours).
- Time source hierarchy: local RTC only, RTC plus network time, or GNSS-assisted time stamping.
- Expected duration of worst-case network outages and required buffer depth for queued records.
Data integrity and security
- Required record-level protection: CRC-16 or CRC-32 on each feature record and snapshot.
- Mandatory use of a secure element or HSM for device identity and key storage.
- Signature or MAC requirement on telemetry packets to satisfy customer or regulatory demands.
- Maximum tolerable data loss window during power or network disturbances.
- Traceability requirements: unique device ID, firmware version and configuration tags in each upload.
For each project, the completed checklist should live alongside the system specification so that FAEs and suppliers can propose alternative IC combinations without changing the underlying sensing, storage or uplink requirements.
IC function mapping and representative device classes
The following function blocks summarise typical IC roles inside a telemetry and asset health node. Exact choice depends on vendor preference and ecosystem, but the categories remain consistent across most ESS and UPS deployments.
| Function block | Role in telemetry node | Representative IC examples |
|---|---|---|
| MEMS accelerometer with FIFO and self-test | Captures cabinet and container vibration, uses FIFO to reduce MCU wake-ups and provides self-test for in-field diagnostics. | ADXL355 / ADXL357, IIS2ICLX, KX134, MC3635 |
| Digital temperature sensor or multi-channel ADC | Monitors ambient and hot-spot temperatures in racks and cabinets using I²C sensors or multiplexed NTC inputs. | TMP117, LM75, STTS22H, ADS1115, LTC2485, MAX11270 |
| Low-power MCU with RTC and crypto acceleration | Runs feature extraction, logging, power modes and communication stacks while maintaining accurate timestamps. | STM32L4/L5, STM32U5, nRF52840, MSP432, GD32L series |
| SPI Flash or FRAM | Stores circular logs of features and occasional raw snapshots with wear leveling and power-fail-safe commit schemes. | W25X40CL, MX25R6435F, S25FL128, FM25W256, CY15B104Q |
| Sub-GHz transceiver or LPWAN SoC | Implements LoRaWAN or proprietary Sub-GHz links from containers and distributed cabinets to site or cloud gateways. | SX1262, SX1276, STM32WL, RL78/G1H, Semtech LR1110 |
| NB-IoT / LTE Cat-M / Cat-1 module | Provides direct cellular uplink for remote containers and mobile ESS trailers where no site LAN is available. | Quectel BG95/BG96, SIM7080G, u-blox SARA-R4/N4, nRF9160 |
| Secure element / hardware security module | Anchors device identity, stores keys and performs signatures or MAC operations for authenticated telemetry packets. | Microchip ATECC608A, NXP SE050, Infineon OPTIGA Trust |
| Power-path controller / eFuse | Protects the node supply, manages inrush and fault isolation and may provide telemetry-ready power-good and fault pins. | TPS2595, TPS25940, MAX17612, LTC4231 |
When the checklist items above are filled and mapped to function blocks and example devices, it becomes easier to compare BOM proposals from different vendors while keeping the asset health telemetry behaviour aligned with system-level requirements.
Testing and validation of telemetry and asset health systems
Reliable telemetry requires more than a working firmware build. Nodes must preserve data through vibration, temperature changes, power disturbances and network outages and must deliver correctly ordered, accurately time-stamped records to the backend. This section outlines test scenarios that exercise the telemetry chain from sensor through node and uplink to the asset health platform.
Mechanical and environmental tests
Mechanical and environmental validation checks whether the telemetry node continues to operate correctly under vibration, shock and temperature cycling that reflect containerised BESS, UPS rooms and mobile ESS trailers. A shaker table and thermal chamber are typically used to reproduce the target profile while the test bench monitors data loss, timestamp behaviour and sensor drift.
- Vibration profiles representative of transport, wind-induced motion or machinery; verify that loss of records and FIFO overflows remain below defined limits.
- Thermal cycling across the specified range (for example −20 °C to +60 °C) and humidity conditions; confirm that timestamps remain monotonic and that offsets stay within allowed drift.
- Ingress and mounting tests to ensure that sensors remain firmly attached and that enclosure sealing prevents moisture from affecting electronics.
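The bench-side pass/fail logic for these runs can be sketched in a few lines. The following Python fragment is a minimal illustration, assuming each captured record carries a sequence number (`seq`) and timestamp (`ts`); the field names and the loss limit are assumptions for this sketch, not a fixed format.

```python
# Hypothetical pass/fail check a shaker-table or thermal-chamber bench
# might run on records captured from the node under test.

def check_capture(records, max_loss_ratio=0.001):
    """records: list of dicts with 'seq' (monotonic counter) and 'ts' (seconds)."""
    seqs = [r["seq"] for r in records]
    expected = seqs[-1] - seqs[0] + 1          # records the node should have produced
    lost = expected - len(records)             # gaps imply drops / FIFO overflows
    monotonic = all(a["ts"] <= b["ts"] for a, b in zip(records, records[1:]))
    return {
        "lost": lost,
        "loss_ratio": lost / expected,
        "timestamps_monotonic": monotonic,
        "pass": monotonic and (lost / expected) <= max_loss_ratio,
    }
```

The same check can be rerun at each vibration or temperature step, so a drift in loss ratio across the profile points at the step that stresses the node.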
Power and brown-out tests
Power tests focus on how the node behaves when supply rails dip, switch between sources or drop out entirely. The aim is to confirm that local logs are not corrupted, that partially written records are rejected by CRC checks and that nodes resume uploading pending data after recovery without creating gaps or duplicates.
- Repeatable power interruptions using a programmable supply or brown-out generator to emulate UPS transfer, DC rail sag and fuse trips.
- Verification that write-commit logic and CRC fields detect incomplete records and that start-up scanning restores a clean log tail.
- Measurement of maximum tolerable data gap during brown-outs and confirmation that the node resends any unacknowledged packets after power returns.
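The write-commit and start-up scanning behaviour being tested here can be modelled compactly. The sketch below shows the principle in Python with an assumed fixed 28-byte record layout (sequence, timestamp, 16-byte payload, CRC32); real nodes would use their own layout and flash driver, so treat the format as illustrative.

```python
import struct
import zlib

# Illustrative record: seq (u32), ts (u32), 16-byte payload, crc32 (u32).
# The CRC covers everything before it, so a record cut short by a
# brown-out fails the check and the log tail ends cleanly before it.
REC_FMT = "<II16sI"
REC_SIZE = struct.calcsize(REC_FMT)  # 28 bytes

def encode_record(seq, ts, payload16):
    body = struct.pack("<II16s", seq, ts, payload16)
    return body + struct.pack("<I", zlib.crc32(body))

def scan_log_tail(log_bytes):
    """Return valid records in order, stopping at the first torn/corrupt one."""
    good = []
    for off in range(0, len(log_bytes) - REC_SIZE + 1, REC_SIZE):
        seq, ts, payload, crc = struct.unpack_from(REC_FMT, log_bytes, off)
        if zlib.crc32(log_bytes[off:off + REC_SIZE - 4]) != crc:
            break                      # incomplete write: clean tail ends here
        good.append((seq, ts, payload))
    return good
```

A brown-out test then amounts to cutting power mid-write and confirming that `scan_log_tail` recovers every record committed before the interruption and none after it.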
Network impairment and buffer tests
Network tests ensure that the telemetry protocol behaves predictably under packet loss, latency and long outages. LPWAN, NB-IoT and cellular links are especially sensitive to duty-cycle limits and throughput bursts, so buffer sizes and retry strategies must be proven in the lab before field deployment.
- Simulated packet loss, latency and out-of-order delivery using RF shielding, attenuators or network emulators; verify correct handling of sequence numbers and de-duplication on the backend.
- Extended no-coverage windows to confirm that on-node buffers hold the intended amount of feature data without overruns and that the node enters a safe mode when buffers fill.
- Controlled reconnection events to check that uploads resume in batches, respecting link duty cycles while maintaining correct record order.
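The backend behaviour these tests exercise, accepting retried packets at most once and restoring on-node order, can be sketched as a per-device ingest structure. The class and method names below are illustrative, not a specific platform API.

```python
# Minimal sketch of backend-side de-duplication and reordering
# keyed on the per-device sequence number carried in each packet.

class DeviceStream:
    def __init__(self):
        self.seen = set()      # sequence numbers already accepted
        self.records = []      # accepted (seq, record) pairs

    def ingest(self, seq, record):
        if seq in self.seen:
            return False       # duplicate from a retry: drop silently
        self.seen.add(seq)
        self.records.append((seq, record))
        self.records.sort(key=lambda r: r[0])   # restore on-node order
        return True

    def gaps(self):
        """Sequence numbers still missing between first and last seen."""
        if not self.seen:
            return []
        return sorted(set(range(min(self.seen), max(self.seen) + 1)) - self.seen)
```

During a network-impairment run, `gaps()` should shrink to empty as buffered uploads resume; persistent gaps after reconnection indicate records lost on the node rather than in transit.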
Time consistency and ordering tests
Time consistency tests verify that timestamps remain trustworthy even when the RTC drifts, nodes reboot or network time sources are temporarily unavailable. Accurate timing is essential for correlating asset health data with maintenance activities, alarms and operating conditions across multiple sites.
- Forced RTC offsets and drift to ensure that the backend detects jumps and uses monotonic counters or resynchronisation events to reconstruct a consistent timeline.
- Reboot and firmware-update scenarios where the node must mark restart events and continue logging without losing identity or ordering.
- Verification that time synchronisation intervals (via network or GNSS) are adequate to keep timestamp error within defined tolerances over the expected mission time.
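One common reconstruction scheme behind these tests is to timestamp records with a boot counter plus monotonic uptime, and anchor each boot session to UTC via occasional resynchronisation events. The sketch below assumes that record and resync layout; the field layout is an assumption for illustration.

```python
# Sketch of backend timestamp reconstruction. Each record carries
# (boot_counter, monotonic_uptime_s); the node occasionally reports a
# resync event (uptime_at_sync, utc_at_sync) for the current boot.

def reconstruct_utc(records, resyncs):
    """records: [(boot, uptime_s, data)]; resyncs: {boot: (uptime_at_sync, utc_at_sync)}."""
    out = []
    for boot, uptime, data in records:
        if boot in resyncs:
            sync_uptime, sync_utc = resyncs[boot]
            utc = sync_utc + (uptime - sync_uptime)   # anchor to the sync point
        else:
            utc = None          # no anchor for this boot: flag it, never guess
        out.append((utc, boot, uptime, data))
    return out
```

Records from a boot session that never obtained a time anchor stay explicitly unresolved, which is exactly what the reboot and no-coverage test cases should provoke and verify.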
Frequently asked questions about telemetry and asset health
This FAQ groups common questions that appear when adding telemetry and asset health functions to ESS and UPS projects. Each answer points back to earlier sections of this page so readers can move from a quick decision overview to deeper design details when required.
Do all ESS and UPS sites need asset health telemetry?
Asset health telemetry is strongly recommended for remote, lightly staffed or high-value ESS and UPS sites where failures carry high outage or warranty cost. Simple alarms only reveal faults at the moment of trip, while health telemetry exposes long-term trends in vibration, temperature and cycling. For context, see H2-1.
Is temperature monitoring enough, or is vibration sensing also required?
Temperature telemetry covers many electrical and cooling issues but does not reveal structural looseness, shock events or transport abuse. Vibration monitoring is preferred for containerised BESS, mobile ESS trailers and cabinets exposed to machinery or traffic. When mechanical stress or mounting integrity is a concern, vibration sensing should complement temperature. See H2-2.
Can asset health telemetry be added in a later project phase?
Projects can phase in asset health telemetry, but it is important to reserve power, space, connectors and data interfaces during the initial design. Early phases may only log temperatures or manual test results, with full telemetry nodes added when fleet size and service costs grow. Deployment patterns are illustrated in H2-8.
How many telemetry nodes does a typical site need?
A common starting point is one telemetry node per container or row of racks, with additional nodes where vibration, temperature gradients or accessibility justify finer granularity. Distributed UPS cabinets may share one node across several small enclosures. Final placement depends on physical layout and risk hotspots. See H2-8.
Can asset health functions reuse existing BMS and fuel gauge signals?
Asset health functions can reuse many BMS and fuel gauge signals such as cell voltage, current and basic temperatures, provided interfaces and sampling rates are documented. However, cabinet vibration, enclosure ambient and structural hot-spots usually require dedicated sensors tied to the telemetry node. Sensor and signal chain options are discussed in H2-3.
How often should asset health data be reported?
Asset health telemetry usually targets minutes to hours rather than seconds. Typical designs send feature summaries every 5–15 minutes and daily health scores once per day, with additional bursts when thresholds are exceeded. This keeps bandwidth and energy within LPWAN or cellular budgets while preserving trends. See H2-7 and H2-5.
Which uplink is best: LPWAN, cellular or a site gateway?
LPWAN is attractive for remote, low-bandwidth sites with limited IT support. Cellular works well when coverage is good and per-site bandwidth needs are moderate. Site gateways or EMS integration are preferred in facilities with strong Ethernet or fieldbus infrastructure. Selection should balance coverage, cost and integration complexity. See H2-5.
How is data preserved during connectivity outages?
Telemetry nodes typically use circular logs in SPI Flash or FRAM and keep several days or weeks of feature records. Each record carries sequence numbers and timestamps so that, when connectivity returns, data can be uploaded in order with retry and de-duplication logic. On-node buffering and packet integrity are covered in H2-4 and H2-6.
What security does telemetry traffic need?
At a minimum, telemetry packets should carry device identity and CRC protection. For public or shared networks, transport encryption and mutual authentication are preferred, often anchored by a secure element that stores device keys. Regulatory or contractual requirements may additionally demand signed logs and long retention periods. Security and identity options are discussed in H2-6.
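The authentication idea can be shown with a short sketch: a truncated HMAC appended to each packet over the device identity, sequence number and payload. The packet layout and tag length here are illustrative; on real hardware the key would live in a secure element such as the ATECC608A rather than in MCU RAM.

```python
import hashlib
import hmac

# Illustrative authenticated packet: device_id || seq (u32) || payload || 8-byte MAC.
# The MAC is HMAC-SHA256 truncated to 8 bytes, computed with a per-device key.

def make_packet(key, device_id, seq, payload):
    msg = device_id + seq.to_bytes(4, "big") + payload
    tag = hmac.new(key, msg, hashlib.sha256).digest()[:8]
    return msg + tag

def verify_packet(key, packet):
    msg, tag = packet[:-8], packet[-8:]
    expected = hmac.new(key, msg, hashlib.sha256).digest()[:8]
    return hmac.compare_digest(tag, expected)   # constant-time compare
```

Including the sequence number under the MAC means a replayed or tampered packet fails verification on the backend, complementing the transport-level encryption the uplink may already provide.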
How much bandwidth and telecom cost does asset health telemetry add?
Well-designed systems send compact features instead of continuous raw data, so a node often uses only a few kilobytes per day. Occasional high-resolution snapshots are reserved for commissioning or rare events. This keeps telecom cost modest compared with SCADA or historian streams. Payload design and bandwidth control are explained in H2-7.
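A quick back-of-envelope check makes the "few kilobytes per day" claim concrete. The record size and intervals below are assumed values for illustration, consistent with the 5–15 minute reporting windows mentioned earlier.

```python
# Daily-volume estimate: an assumed 24-byte feature record every 5 minutes
# plus one 200-byte daily health-score message.

record_bytes = 24
interval_min = 5
records_per_day = 24 * 60 // interval_min            # 288 records/day
daily_bytes = records_per_day * record_bytes + 200   # 7112 bytes ≈ 7 kB/day
print(records_per_day, daily_bytes)
```

Even tripling the record size or halving the interval stays well inside typical NB-IoT data plans and LoRaWAN duty-cycle budgets, which is why feature summaries rather than raw streams are the default.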
How do asset health telemetry, BMS and EMS divide responsibilities?
Fuel gauges and BMS focus on cell-level protection and short-term electrical behaviour, while asset health telemetry concentrates on cabinet, rack and environmental trends over weeks or months. The EMS orchestrates power flows and operating modes using both alarm and health inputs. Clear boundaries reduce complexity and avoid duplicated functions. See H2-1 and H2-2.
What does a minimal asset health setup look like?
A practical minimum set is cabinet or container ambient temperature, a few hot-spot temperatures, one vibration channel on the structure and basic cycle-count or energy-throughput features. Nodes log data locally for several weeks and send feature summaries every few minutes or hours over an appropriate uplink. More advanced analytics and extra sensors can be added in later phases. See H2-8 and H2-9.