SOLVED: How to build in Redundancy? (ESP8266)

Yesterday the central unit of my floor heating system went offline and stayed offline. After a reboot everything was fine again, but… this is not something I want during the winter. So I was wondering whether it’s possible to build in redundancy: basically have a second ESP8266 with the same firmware as the ‘main’ unit that kicks in when the main unit disconnects. The switchover should somehow be done with Blynk, preferably automatically. Does anyone have experience with such redundant units, or a clue how to fly this home?
The trick would be that they BOTH operate the same relay, but only one is active.

Another thing I’m wondering about is how to set this up (the wiring): if, let’s say, GPIO 1 of both ESPs is connected to channel 1 of the relay, then they’re ALSO directly connected to each other, and if one of them is HIGH and the other LOW, then what happens??

Some insights, suggestions, comments are most welcome!

EDIT: I’ve solved it. The solution consists of two parts:

  1. how to connect the relays, which can be found HERE (not yet tested!!)
  2. how to code the ESPs such that it works, which can be found HERE (tested and working)

Hmmm… It seems possible, but as I see it, with the same token both would be treated as the same device and hence would have to work in parallel. So not a real “either-or”.
But… to me it is better and safer to code proper connection handling and a sort of “standalone” mode for the time the connection is lost. A reconnect shouldn’t take much time.

@wolph42
hello friend

you could use Blynk notify when the master ESP is down to enable the second ESP

Maybe there’s a simpler way - have you thought about using an external watchdog timer board like this:

You choose the timeout period of 1 minute or 5 minutes using a solder pad. Set-up a timer to send a pulse from your ESP to the watchdog every 10 seconds or so - if the watchdog hasn’t received a pulse for the pre-selected time then it reboots the ESP.
Put something in your void setup to alert you to the fact that the ESP has rebooted, so you can investigate the issue.
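A minimal sketch of the pulse timing described above, written as plain C++ so it can be checked off-device. The class name `HeartbeatTimer` and the 10000 ms interval are illustrative assumptions; on the ESP the timestamp would typically come from `millis()` and a due pulse would toggle whichever pin is wired to the watchdog board.

```cpp
#include <cstdint>

// Hypothetical helper: decides when the ESP should pulse the external watchdog.
class HeartbeatTimer {
public:
    explicit HeartbeatTimer(uint32_t intervalMs)
        : intervalMs_(intervalMs), lastPulseMs_(0) {}

    // Returns true when a pulse is due and records the time of that pulse.
    // Unsigned subtraction keeps the comparison correct when the
    // millisecond counter wraps around.
    bool pulseDue(uint32_t nowMs) {
        if (nowMs - lastPulseMs_ >= intervalMs_) {
            lastPulseMs_ = nowMs;
            return true;
        }
        return false;
    }

private:
    uint32_t intervalMs_;
    uint32_t lastPulseMs_;
};
```

Calling `pulseDue()` from the main loop rather than from a hardware timer means a hung loop stops the pulses, which is the condition the watchdog is there to catch.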

This obviously doesn’t fix a totally dead ESP or power supply, but it would solve the problem you had yesterday.

This watchdog module was designed by the guy who runs the SuperHouseTV channel on YouTube and is open source, so you can find PCB files and schematics online if you want to make your own.

Pete.

To all posters above: thank you for the swift replies. Most of what you mention I had already covered:

@marvin7: I have a reconnection routine built in and usually this works; however, yesterday it went offline at 15:00 and I only checked this morning, so sometimes reconnection does take a long time or does not happen at all.

@Blynk_Coeur: yes I’m already working on that, but if I’m in France (my home is in the Netherlands) I can do quite a bit remotely but it remains tricky and I want it to act automatically, hence redundancy

@PeteKnight: also ahead of you. I’ve already built a watchdog (wtd) routine into the central unit to check all the other units (7 thermostats and 4 relays), and I’m currently working on adding the same check routine to the relays to check up on the central unit (and send me a notification when it goes down).
As for the pulse: the central unit was alive. I let it flash an LED every 4 seconds as an indicator, and the LED was flashing BUT the unit was offline. A reset usually does not work (not after a long period of being offline; the only method is resetting the router).

So that’s covered, but I still want redundancy on that particular unit. If one relay or thermostat falls away then it’s only one zone and the rest will pick this up (heat will flow from one zone to the other). If, however, the central unit fails… my house freezes over. So yes, there are other ways and I’m employing those as well, but I do want to explore the redundancy option.

@marvin7: you mention using the same token so they operate simultaneously, but if one goes offline then the relay will start to get mixed signals (one unit stuck on ON while the other, online unit says OFF), so you really need to actively SWITCH from one unit to the other. I’m rather clueless as to how!

Is there some electronic unit that can do this? Thinking out loud here:

The central setup is only

power supply --> esp8266 --> relay --> central heating

and would become

power supply --> esp_Main --->switch??-->relay-->central heating
             --> esp_Backup-->

Where ‘switch’ is some unknown piece of electronics that is operated by Blynk.

Maybe the ‘alive’ pulse should only happen if you can successfully ping an external URL?

Then maybe pop a relay in the power line of your router and reboot it if you’ve restarted the ESP a few times and it’s still not connecting?
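The two suggestions above can be combined into one decision step. What follows is only a sketch under assumptions: `decide`, `Action` and the threshold parameter are hypothetical names, the function is meant to be evaluated once per boot after the ESP has tried to reach an external host, and the failed-boot counter would have to be stored somewhere that survives a restart.

```cpp
// Hypothetical actions the ESP can take after its connectivity check.
enum class Action { FeedWatchdog, WithholdPulse, PowerCycleRouter };

struct Decision {
    Action action;
    unsigned failedBoots;  // value to persist for the next boot
};

// pingOk:           result of reaching an external host on this boot
// failedBootsSoFar: consecutive boots that ended without a connection
// maxFailedBoots:   illustrative threshold before the router is power-cycled
Decision decide(bool pingOk, unsigned failedBootsSoFar, unsigned maxFailedBoots) {
    if (pingOk) {
        // Connected: keep sending the 'alive' pulse and clear the counter.
        return {Action::FeedWatchdog, 0u};
    }
    unsigned failed = failedBootsSoFar + 1u;
    if (failed >= maxFailedBoots) {
        // Several restarts did not help: toggle the relay in the router's
        // power line and start counting again.
        return {Action::PowerCycleRouter, 0u};
    }
    // Not connected yet: stay silent so the watchdog restarts the ESP.
    return {Action::WithholdPulse, failed};
}
```

Resetting the counter after a router power cycle means the router is restarted at most once every `maxFailedBoots` boots, not on every boot.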

Pete.

that’s rather redundant, I already receive an offline notification through Blynk. The flash is only for me to see whether it’s dead or alive. (Note that it only starts pulsing AFTER it establishes a connection after reboot.)

that’s quite an interesting idea… a bit weird, and I’m wondering whether this could potentially lead to an endless loop, but certainly something to think about.

That might solve the immediate issue from the OP…however…other bad things could happen which means I still would want some switch from central main to backup.
Thanks for the idea though!

How about the rule of majority, or as it may be called Triple Modular Redundancy? Three independent devices/projects sharing rule of majority logic and cross communication via bridge and possibly even hardwired. Complex… yes, but then any truly redundant system will be that way.
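The voting part of Triple Modular Redundancy is small. A minimal sketch, assuming each of the three units reports the relay state it wants as a boolean; the function names are illustrative, and how the three values reach the voter (bridge or hardwired) is left open.

```cpp
// Returns the state that at least two of the three units agree on.
bool majorityVote(bool a, bool b, bool c) {
    return (a && b) || (a && c) || (b && c);
}

// Returns the index (0, 1 or 2) of the unit that disagrees with the
// majority, or -1 when all three agree. Useful for sending a notification.
int dissenter(bool a, bool b, bool c) {
    bool m = majorityVote(a, b, c);
    if (a != m) return 0;
    if (b != m) return 1;
    if (c != m) return 2;
    return -1;
}
```

The voter itself is still a single point of failure, which is part of the complexity mentioned above.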

have you tried to spot the reason? This shouldn’t happen.

Yes, that can lead to trouble. So some sort of feedback would be needed: a kind of “I have control” signal. But what if the other unit is completely “dead” while still holding the stick? The “I have control” signal wouldn’t be heard anyway… Then external circuitry needs to judge who has control. So another MCU? :persevere: That has to be done the “usual” way, i.e. reconnect, wait, reconnect, reset, etc.

I’m familiar with the concept but I thought that to be one ‘bridge’ too far :stuck_out_tongue:
I’d rather figure out this dual setup first before I go even further…

given that I have to reset the router… it’s the router. I delved into that a long while ago and I recall it’s a general caching issue caused by bad garbage collection. When it’s ‘busy’ at home (eldest playing Fortnite, youngest watching Netflix) I noticed that the ESPs get disconnected regularly (usually for a few seconds and then they reconnect again). If, however, some ‘memory leak’ occurs EVERY time this happens, then after a while the router starts failing and requires a reboot. This is one of the reasons that I hard-wired my entire house after the rebuild… but these new ESP locations are not wired, so we’re back to using wifi again and a router that starts failing after a while. I don’t know any proper way to solve this, short of buying an expensive router that runs on ‘older-tech’ firmware that has been fine-tuned and debugged over the past few years.

If you are only concerned about it failing during the winter you could use the Normally Closed output of the relay…

If ESP fails to control, the relay will be closed (heating).

If that is not enough you could use a second ESP as an over-temperature protection (not connected to blynk) to avoid baking the house.

I’m not so sure about the NC output. I guess it’s a matter of taste; the reverse issue would be a baked house in the summer. Regardless, it would not work, because the problem with the ‘failing ESP’ is not that it ACTUALLY fails, it’s just disconnected, so whatever state it received last is the state it’s stuck on, regardless of an NC or NO construction. If the ESP actually ‘bricks’ there are two watchdogs watching it to reset it, and the relay port is set OFF (but I could as well set it to ON). So that’s relatively OK.

The second ESP would essentially be the backup. I have 7 ESPs in the house monitoring the heat per floor; they are checking up on the central unit but they canNOT operate the central heating. Hence the need for a second unit in case something goes wrong with the first.

Maybe go for a scheduled reboot of your router at a convenient time every day - say 3am?
Go for a NC relay contact that doesn’t need the relay coil to be energised for 23 hours and 59 minutes every day.

Pete.

@WolfHell
it is possible to connect a Dpin from the main ESP to a Dpin of the second and test the level

ex: pin D7 high = ok

yes that would be possible…but I don’t see what it would achieve. Keep in mind that ‘offline’ is the only issue.

HOWEVER, I do have a thought… let’s see how far I get with this:

  1. if BOTH units are online MAIN takes the lead, BACKUP sets the relevant relay channel to LOW
  2. if BACKUP goes offline, nothing happens, nothing changes, probably a notification after say 15 minutes.
  3. if MAIN goes offline it will know that it’s offline and set the channel to LOW. BACKUP checks regularly whether MAIN is online and if not it takes over the channel.
  4. BOTH go offline for > 60 minutes : router is rebooted.
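The four rules above can be sketched as two small functions, written as plain C++ so the logic can be checked off-device. The names are illustrative, and how BACKUP learns whether MAIN is online (for example through a Blynk connection check) is an assumption left outside the sketch.

```cpp
#include <cstdint>

enum class Role { Main, Backup };

// Decides whether this unit may drive the shared relay channel.
// selfOnline: this unit's own connection state
// mainOnline: MAIN's state as seen by BACKUP (ignored when role is Main)
bool mayDriveRelay(Role role, bool selfOnline, bool mainOnline) {
    if (!selfOnline) {
        return false;        // rules 2 and 3: an offline unit holds its channel LOW
    }
    if (role == Role::Main) {
        return true;         // rule 1: MAIN takes the lead while it is online
    }
    return !mainOnline;      // rule 3: BACKUP takes over only when MAIN is offline
}

// Rule 4: reboot the router once both units have been offline for more than 60 minutes.
bool shouldRebootRouter(uint32_t bothOfflineMinutes) {
    return bothOfflineMinutes > 60;
}
```

With these rules at most one unit drives the channel at any time, provided BACKUP's view of MAIN is accurate; a stale view is the case where both could end up driving it.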

Now the question is: the relay channel is connected to two ESPs. These ESPs are either

  1. both LOW
  2. one HIGH, one LOW

(Note: this does mean I need to use the normally closed setup for the relay.)
In case 1 I know it will be LOW, but in case 2, what happens then?

I’m for example wondering whether it would help to set the ‘slave’ port to ‘input’ instead of ‘output’ .
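To make the question concrete, here is a simplified model of two pins tied to the same relay input. It assumes plain push-pull outputs with nothing in between (no series resistors, diodes or pull-ups) and treats a pin set to input as high-impedance; the type and function names are illustrative.

```cpp
// How a single pin behaves on the shared wire.
enum class PinDrive { High, Low, HighZ };  // HighZ = pin configured as input

// What the shared wire ends up doing.
enum class NetState { High, Low, Floating, Contention };

NetState resolve(PinDrive a, PinDrive b) {
    if (a == PinDrive::HighZ && b == PinDrive::HighZ) {
        return NetState::Floating;    // nobody drives the relay input
    }
    if (a == PinDrive::HighZ) {
        return b == PinDrive::High ? NetState::High : NetState::Low;
    }
    if (b == PinDrive::HighZ) {
        return a == PinDrive::High ? NetState::High : NetState::Low;
    }
    if (a == b) {
        return a == PinDrive::High ? NetState::High : NetState::Low;
    }
    // One output HIGH and the other LOW: the two pins fight each other,
    // which is effectively a short between them.
    return NetState::Contention;
}
```

In this model, setting the ‘slave’ port to input removes the contention case, but if both units release the pin the relay input floats unless something else defines its level.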

hmm, I could also use two relays and connect the central heating (cv) to both, that would always work I think…

yes, you have to use separate relays, never one relay on two boards

why don’t you test the device status?

yes, that’s what I meant with:

it ‘knows’ due to a regular blynk connection check. So thats covered.

why? (not that im planning to do so, but im curious nonetheless)

suppose the relay is on pin D3 of the master and pin D3 of the slave

when the master is ON, its pin shows +5V
if the slave pin is OFF, that pin = GND …

is that correct?

You could potentially use a latching relay and send “pulse” to turn it on or off… So send pulse On, release, and so on… Not a good practice though as one may send On and the other Off at the same time causing unpredictable behaviour.

sure !
but when you send pulse “1” to relay, you send 5V to Slave PIN too
