tl;dr because this is going to be a long one… my zigbee bindings to Hue bulbs aren’t as reliable as I had hoped. I sniffed the zigbee traffic and can see odd/wasteful(?) broadcast messages being sent, potentially flooding the network.
Prefacing this with the fact that I’m not a zigbee expert by any means and am learning a lot in the process of my troubleshooting.
I’ve had ~25 Blue series dimmers installed for a month or so now (I got lucky and only had a few in the bad batches) and have had the chance to really understand their behavior. One disappointing thing I’ve found is that binding from the switches to my Hue bulbs is not as reliable as I hoped: I was experiencing failures (Hue bulbs not responding) on maybe 10-20% of my presses at the wall switch.
My first thought was interference on the 2.4 GHz band causing the binding messages to be dropped. I optimized my zigbee channels vs my wireless access point 2.4 GHz channels so that no overlap should be occurring, and scanned the channels to ensure nothing else was crowding the bands I was using. No improvement in reliability after this, but in further testing I noticed that reliability was much worse when I pressed multiple switches at nearly the same time. E.g. if I was trying to turn off all the lights in the basement across 3 switches, I’d press one after the other quickly, and I could get the binding to fail on at least one of those presses nearly every time.
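For anyone wanting to repeat the channel check, here is a minimal sketch of the overlap math. The center-frequency formulas are the standard ones for 2.4 GHz zigbee (channels 11-26) and Wi-Fi (channels 1-13); the 22 MHz Wi-Fi width and 2 MHz zigbee width are assumptions, and the function names are made up for illustration. It only looks at nominal channel edges, so real-world interference can still differ.

```python
def zigbee_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz zigbee (802.15.4) channel."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz zigbee channels are 11-26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz Wi-Fi channel (1-13 only in this sketch)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch covers Wi-Fi channels 1-13")
    return 2407 + 5 * channel


def overlaps(zigbee_channel: int, wifi_channel: int,
             wifi_width_mhz: float = 22.0, zigbee_width_mhz: float = 2.0) -> bool:
    """True if the nominal channel edges overlap (widths are assumptions)."""
    distance = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return distance < (wifi_width_mhz + zigbee_width_mhz) / 2


def clear_zigbee_channels(wifi_channels: list[int]) -> list[int]:
    """Zigbee channels that overlap none of the given Wi-Fi channels."""
    return [z for z in range(11, 27)
            if not any(overlaps(z, w) for w in wifi_channels)]


if __name__ == "__main__":
    # Wi-Fi on the common non-overlapping trio 1 / 6 / 11.
    print(clear_zigbee_channels([1, 6, 11]))
```

Zigbee 26 comes out clear here but some devices transmit at reduced power on it or don't support it, so it is worth checking the hardware before picking it.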
So then I went deep…
I flashed a spare Sonoff 3.0 dongle with sniffing firmware (using the nice instructions on this Reddit thread) and started testing things while monitoring the traffic.
The first thing I noticed was the INCREDIBLE amount of traffic that the energy reporting on these switches produces. I set the min rep interval and min rep change to 1000 in Z2M (I think there is another bug that keeps it from being fully disabled?) and that cut the overall traffic way down, but didn’t solve my binding problems.
Next, I did a simple test comparing the messages sent from a GE zigbee switch vs a Blue series switch when both were bound to the same single Hue bulb. Here is a screenshot of the traffic sent when I pressed down on the GE switch. I highlighted the 4 messages that were produced within a second of the press: the first is the OFF event being sent from the switch to the Hue bulb, the next 2 go to the coordinator for something I’m not sure of, and the last one tells the coordinator the new status of the switch (OFF). 4 messages, which feels about right.
Here is a screenshot of the same test done on the Inovelli switch. We can see the OFF message being sent from the switch to the bulb and the coordinator just as we see with the GE switch, but then the Inovelli switch propagates a [seemingly unnecessary] broadcast across the entire zigbee network, resulting in 60+ messages across the network in a matter of a few hundred milliseconds.
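To put numbers on a comparison like this instead of eyeballing screenshots, here is a rough sketch that counts unicast vs broadcast frames in the window after a button press. It assumes the capture has already been exported into simple records; the Frame class, its field names, and the 1-second window are hypothetical choices for illustration, not any sniffer's actual export format. The four destination addresses are the standard zigbee network-layer broadcast addresses.

```python
from dataclasses import dataclass

# Zigbee NWK broadcast destinations: all devices, rx-on-when-idle devices,
# routers + coordinator, and low-power routers.
BROADCAST_ADDRESSES = {0xFFFF, 0xFFFD, 0xFFFC, 0xFFFB}


@dataclass
class Frame:
    """One sniffed frame (hypothetical record layout)."""
    time_s: float   # capture timestamp in seconds
    nwk_src: int    # 16-bit network source address
    nwk_dst: int    # 16-bit network destination address


def is_broadcast(frame: Frame) -> bool:
    return frame.nwk_dst in BROADCAST_ADDRESSES


def summarize_window(frames: list[Frame], press_time_s: float,
                     window_s: float = 1.0) -> dict[str, int]:
    """Count frames seen in [press_time_s, press_time_s + window_s)."""
    window = [f for f in frames
              if press_time_s <= f.time_s < press_time_s + window_s]
    broadcast = sum(1 for f in window if is_broadcast(f))
    return {"total": len(window),
            "broadcast": broadcast,
            "unicast": len(window) - broadcast}
```

Running summarize_window once per switch press would give a per-press total that can be compared directly between the GE and Inovelli captures.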
Again, I’m not a zigbee engineer or anything so I can’t say if this is concerning behavior, but it does seem to be in line with my binding problems: when these switches hit the network with the broadcast, it seems to break the reliability of the network for the next ~500ms or so. My research also seems to confirm that broadcasting can flood networks and should be used sparingly (pdf source, source).
Anyhow, hoping that somebody on the Inovelli side can take a look at this! The switches are great and I want to love them, but keeping very reliable control of my lights at the switch is a P0 requirement in my house and it’s not great at the moment. Very invested in solving this (if it wasn’t clear from my troubleshooting up to this point).
Notes on my setup:
I have ~100 devices (almost exclusively Hue bulbs and now these switches) split across 2 networks, using tubeszb network coordinators with Zigbee2MQTT. I did this testing on my smaller network of ~40 devices; the larger the network, the worse the broadcast performance hit will be. I recently split into the two networks because I was seeing overall poor network behavior after installing the Blue switches brought me to >100 devices, but now I’m wondering if the Blue switches, with all of their power reporting plus these types of broadcasts, were perhaps partially to blame?
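A back-of-the-envelope sketch of why network size matters for broadcasts. It assumes every router that hears a broadcast relays it once, plus an optional number of retries per router; the function name and the retry parameter are illustrative, and real stacks may suppress or add relays, so treat the result as a rough scaling estimate rather than a prediction of any capture.

```python
def estimated_broadcast_frames(router_count: int,
                               retries_per_router: int = 0) -> int:
    """Rough frame count for one network-wide broadcast.

    Assumption: the originator sends once and every router relays once,
    plus `retries_per_router` extra transmissions each.
    """
    if router_count < 0 or retries_per_router < 0:
        raise ValueError("counts must be non-negative")
    originator_frames = 1
    relay_frames = router_count * (1 + retries_per_router)
    return originator_frames + relay_frames


if __name__ == "__main__":
    # Mains-powered bulbs and switches usually act as routers, so a
    # ~40-device network is modeled here as 40 routers.
    for routers in (40, 100):
        print(routers, estimated_broadcast_frames(routers))
```

The point is only that the cost of a single broadcast grows with the number of routers, while a unicast bind message does not.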