SOLVED: How to build in Redundancy? (ESP8266)

Emilio · June 7, 2018, 1:52pm

I had a similar issue in my summer house. I solved it by making friends with a neighbor who pushes the reset button for me when needed.

wolph42 · June 7, 2018, 1:57pm

LOL, good one!!

I don’t see how that could work, as all the widgets connected to ‘main’ will not be connected to ‘backup’, so if backup takes over then all my widgets won’t work anymore. I could build a copy of it obviously, but that really sounds like the bad way to do things.

marvin7 · June 7, 2018, 3:09pm

Then I think it would be best to solve the issue by replacing the router/AP. Any “fixes” will probably not entirely help, as even two ESPs can get disconnected, and what then?

wolph42 · June 7, 2018, 3:32pm

actually I’ve always had this issue with all 5 routers I have had in the past, so replacing it is an unlikely solution. I do have an additional router that was expensive and is extremely stable, with which I do not have that issue. However the ‘main’ router belongs to my ISP, so even if I replaced it I wouldn’t know how to set it up such that it would work.

PeteKnight · June 7, 2018, 3:33pm

Having read through all of this, I think this sums-up the situation…

You’re not too worried about having total redundancy for the relay, ESP, ESP power supply etc. Although this would be nice there are too many pitfalls which could make the system less reliable.

Your main concern is the ESP losing its connectivity. When this happens, a reboot of the ESP might fix the problem, but more likely a reboot of the router will be needed.
You have a watchdog timer on the ESP, but it’s not been implemented very well because the watchdog is still being fed even if your ESP loses its connectivity.

Is this correct?

If so, then fixing the watchdog should be your priority, as a reboot of the ESP may fix the issue anyway.
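A minimal sketch of what "feed the watchdog only while connected" could look like, written as plain C++ so the logic can be checked off-device. The class name `ConnectivityWatchdog` and its method are hypothetical helpers, not part of Blynk or the ESP8266 core, and the timeout value is an assumption to tune.

```cpp
#include <cstdint>

// Hypothetical helper (not a Blynk or ESP8266 core class): remembers when the
// link was last seen up and reports when it has been down for too long.
class ConnectivityWatchdog {
public:
  explicit ConnectivityWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

  // Call on every loop pass with the current millis() value and link state.
  // Returns true once the link has been down for at least timeoutMs.
  bool shouldRestart(uint32_t nowMs, bool connected) {
    if (connected) {
      lastOkMs_ = nowMs;          // "feed" only while the link is actually up
      return false;
    }
    // Unsigned subtraction stays correct across the millis() rollover.
    return (nowMs - lastOkMs_) >= timeoutMs_;
  }

private:
  uint32_t timeoutMs_;
  uint32_t lastOkMs_ = 0;
};
```

On the ESP this could be driven from loop() with something like `if (wd.shouldRestart(millis(), Blynk.connected())) ESP.restart();`, so a lost connection is no longer masked by unconditional feeding.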

A way of rebooting the router remotely and/or automatically seems like it would be useful.
I have a holiday home in Spain and occasionally I want to be able to reboot the router when I’m not there. For me, this scenario happens when none of the devices within the network are responding, so using a Blynk command to reboot them wouldn’t work, as that command wouldn’t make it through the router from the internet. Instead I use a SIM900 module that I can send SMS messages to. The message is checked to ensure that it’s from a pool of authorised phone numbers, and I can check the status of the system, reboot the router, or reboot a couple of other devices on the network, depending on what SMS message I send.
I’m not saying that this is what you need, but maybe you should consider a system where the router will be rebooted automatically if certain criteria (such as your ESP being restarted x times and it still not being able to ping an external website) are met.
I understand your concern that this could get into a loop where the router is being constantly rebooted, but in reality, if you design the system and the reboot criteria well then it shouldn’t happen - besides, the reboot is likely to fix the issue anyway, so the only time a reboot cycle is likely to occur is if your ISP is down, in which case it won’t do any harm anyway.
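As a rough illustration of such reboot criteria, here is a sketch that combines "the ESP has restarted x times", "an external ping still fails" and a cool-down between router reboots. The struct, the function name and every threshold are assumptions made for this example; how the ping result is obtained is left out.

```cpp
#include <cstdint>

// Hypothetical policy values; tune them to the installation.
struct RouterRebootPolicy {
  uint32_t maxEspRestarts;      // "x": ESP restarts to try before escalating
  uint32_t minMinutesBetween;   // cool-down so the router is not rebooted in a loop
};

// Returns true only when every criterion for a router reboot is met.
bool shouldRebootRouter(const RouterRebootPolicy& policy,
                        uint32_t espRestarts,
                        bool externalPingOk,
                        uint32_t minutesSinceLastRouterReboot) {
  if (externalPingOk) return false;                        // internet reachable: nothing to fix
  if (espRestarts < policy.maxEspRestarts) return false;   // try restarting the ESP first
  return minutesSinceLastRouterReboot >= policy.minMinutesBetween;
}
```

The cool-down is what keeps an ISP outage from turning into a tight reboot cycle: the router is power-cycled at most once per `minMinutesBetween`.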

Pete.

wolph42 · June 7, 2018, 3:43pm

@PeteKnight yes good point. Note that I can reboot my router remotely, but I’d rather have an independently working system, so indeed when an ESP is offline for say >1 hour it should reboot. I could build in an additional ESP/relay at the router to take care of that (keeping in mind that it needs to check its own offline status as well…hmm probably better to use an Arduino for that, as it will be next to the router so I can use a hard line).
Rebooting ESPs though…I can’t recall one moment that that actually helped, so checking its own status and reboot based on that (which is fairly easy to program) wouldn’t achieve much…I think. Still I could consider
IF esp offline>30 min: reboot ESP
IF esp offline >60 min: reboot router.
this might have some annoying consequences though (say my eldest is playing a game and BOOM router offline) but I guess that could be avoided with a notification and some override in BLYNK…god…this is getting complex…
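The two IF rules plus the override could be sketched as one small decision function. This is only an illustration: the enum, the function name and the two flags are hypothetical, and the 30/60 minute thresholds are simply the ones from the pseudocode above.

```cpp
// Possible outcomes of one escalation check.
enum class Action { None, RebootEsp, RebootRouter };

// offlineMinutes : how long the ESP has been offline
// espRebootDone  : true once the ESP reboot for this outage was already tried
// routerOverride : true while the app override (e.g. "someone is gaming") is on
Action nextAction(unsigned offlineMinutes, bool espRebootDone, bool routerOverride) {
  if (offlineMinutes > 60 && !routerOverride) return Action::RebootRouter;
  if (offlineMinutes > 30 && !espRebootDone)  return Action::RebootEsp;
  return Action::None;
}
```

The override only blocks the router reboot; the cheaper ESP reboot is still attempted once per outage.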

regardless of all this, I still think it’s wise to have the central unit redundant in case of e.g. critical failure… that is, central is offline > 3 hours. I could let the slave sleep and wake up every hour, sync with the server, check the online status of main and, in case of critical failure, become the new central.

ldb · June 7, 2018, 7:25pm

How about having a Raspberry Pi, talking MQTT with ESP and NODE-RED <> Blynk.

The Pi would allow remote access to your network and further router control.

wolph42 · June 7, 2018, 7:40pm

why would I? I already have access to my router through the internet and can reset it whenever I want. Point is that I want an auto-reset of the router when an esp is offline > 1 hour.

I don’t see the added value of MQTT, and IIRC Node-RED does not work well (yet) with Blynk.
note that i already have a local server running on an RPI.

Gunner · June 7, 2018, 7:43pm

Wire up a sonoff to the router, re-programmed (not with Blynk, just standalone) to reboot it if after x amount of time it doesn’t receive a signal (wifi or physical) from the ESP.

wolph42 · June 7, 2018, 7:47pm

that’s also a nice idea, however I already figured that it should be an Arduino, for the simple reason that that connection NEVER (relative to an ESP) fails. I can hardwire it to the router, making it much, much more stable. A Sonoff would once more be an ESP that could go offline.

note that what you suggest would not work in my situation because it’s usually only 1 ESP that goes ‘permanently’ offline, certainly not all of them. So if I were to make it stand-alone it would hardly ever reset the router at the right time.

But enough on this part of the topic. The thing I’m really curious about is how to setup 2 ESP with the same token where one is master and the other slave ?

Gunner · June 7, 2018, 7:50pm

Same token means, as far as Blynk is concerned, same device… thus there is no Master/Slave distinction.

Let me expand on this… same token doesn’t require same code… so you could have separate groups of vPins for each device and thus treat them as “separate”… but I still don’t see the purpose for your double, redundant, redundancy needs. Separate tokens, even separate projects, can still act as backups.

PeteKnight · June 7, 2018, 7:54pm

This is basically the setup that I use, and whilst there are some benefits of using MQTT instead of Blynk Bridge or other ways of getting remote devices to work together, I don’t see the benefit of adding even more complexity to the system - it just adds more potential failure points.

For anyone who is thinking of going down this route then I can assure you that Blynk does work very well with Node-Red.

Pete.

wolph42 · June 7, 2018, 8:04pm

agreed!! that’s why I’m discussing this rather thoroughly!

However to get back to the main topic(s), there are 2 issues:

1. sometimes the router starts to (partially) fail and a reset is required.
2. in case of a critical failure (in the category: throw it away) of the central ESP I want a 2nd ESP to take its place.

Neither Node-RED nor MQTT will do anything to solve these two issues (at least as far as I can see).

1. both software and hardware are covered. No more discussion is required.
2. the hardware part is again covered, no discussion required; the software part, specifically the BLYNK interfacing, is still unclear to me (as in how to do this).

PeteKnight · June 7, 2018, 8:52pm

That’s actually something that would be quite easy with MQTT and Node-Red, but we’re not going down that route.

Another option is to have two identical ESPs (same IP, code, Auth code etc) and only power one or the other of them. That would need a changeover relay and another device to control that relay. You’re adding more hardware and failure points, but at least if this controller fails then one of the ESPs will be powered anyway.

Pete.

RJW · June 12, 2018, 11:49am

I don’t use Blynk as I don’t trust remote applications, but I do have two products that I manufactured, and we implemented brown-out detection on the processor, so if it stops it will auto-reset. Hope that helps.

wolph42 · June 12, 2018, 1:33pm

without any context, reference or links…not really.

wolph42 · June 15, 2018, 12:46pm

SOLUTION

Well I’ve managed to set it all up, tested it under different circumstances and all is peaches. I haven’t tested the connection with the relays yet so I will update that result later.

The 2 esp MASTER and SLAVE both run with the same token and the same code. There is one line that I need to change if I create firmware for either of them:

#define BCK 0   //COMPILE FOR: MAIN(0) or BACKUP(1)

obviously this has influence on how the rest of the code runs. So here are the pieces of code that are related to the redundancy.
Note: in this post I’ll call them MASTER and SLAVE; in the code MASTER is called either MAIN or CENTRAL and SLAVE is called BCK or BCK_CENTRAL.
The gist of the below is a MASTER that runs constantly and a SLAVE that sleeps constantly. SLAVE wakes up every 10 min, checks the online status of MASTER, and if MASTER has been offline for >5 minutes it takes over. Should MASTER get back online again then SLAVE goes back to sleep.
Also: most of my pins are DEFINED so instead of e.g. V14 you will see: BLA_BLA_PIN.

Straight below the above line there is:

#if BCK == 0
  char ESP_NAME[]           = "ESP_CENTRAL";
#endif
#if BCK == 1
  char ESP_NAME[]           = "ESP_BCK_CENTRAL";
#endif
bool bck_active = false;                          // used for CENTRAL_BCK
const unsigned long BCK_TIME_SLEEP = 10 * 60e6;   // sleep time is in us: 60e6 us = 1 minute, so this is 10 minutes
int HTTP_OTA = 0;                                 // when the switch in the app is hit, the ESP searches for a firmware.bin update

The http_ota is not really necessary, but is related to httpOTAupdates which is also explained in this post. (and really handy for sleepy ESP’s).
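As a side note on the sleep constant: `10 * 60e6` is a floating-point expression that gets converted to an integer. A sketch of the same value in pure integer arithmetic is shown below; `minutesToMicros` and `BCK_TIME_SLEEP_US` are hypothetical names, and the 32-bit check assumes `unsigned long` is 32 bits wide, as it is on the ESP8266.

```cpp
#include <cstdint>

// Converts minutes to microseconds using 64-bit integer arithmetic.
constexpr uint64_t minutesToMicros(uint32_t minutes) {
  return static_cast<uint64_t>(minutes) * 60ULL * 1000000ULL;
}

// Same 10-minute sleep as in the post, checked at compile time.
constexpr uint64_t BCK_TIME_SLEEP_US = minutesToMicros(10);
static_assert(BCK_TIME_SLEEP_US == 600000000ULL, "10 minutes is 600 million microseconds");
static_assert(BCK_TIME_SLEEP_US <= UINT32_MAX, "still fits a 32-bit unsigned long");
```

The second check matters when the sleep time grows: a 32-bit microsecond counter overflows just past 71 minutes, so an hourly wake-up still fits but a 2-hour one would not.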

The next part is Blynk_connected, the order in which everything is synced is vital as that is the order the blynk_write() functions are called. So you FIRST want to check whether MASTER is actually online (in which case SLAVE immediately goes asleep again after waking up). This is my entire connect routine:

BLYNK_CONNECTED() {                               // sync pins as stored last on server.
                                                  // order of syncing is important due to BCK unit
  rtc.begin();                                    // sync clock with server AFTER blynk has connected with said server. 
  ESP.wdtFeed();                                  // prevent wdt time-out
  Blynk.syncVirtual(OTA_BCK_update_PIN);          // make sure BCK updates before going to sleep
  Blynk.syncVirtual(offline_PIN);                 // puts BCK back to sleep!! (if main is online)
  Blynk.syncVirtual(OTA_update_PIN);
  Blynk.syncVirtual(1,2,3,4,5,6,7);               // heat states of thermometers
  Blynk.syncVirtual(targetTType_PIN);             // home/night/vacation
  ESP.wdtFeed();                                  // prevent wdt time-out

  updateTerminalLAbel();                          // immediately update terminal else it takes a minute
  initializing = 1;                               // overrides the 'cvOn' state which might be stored during boot!!
  pinMode(HEATER_RELAY_PIN_CV, OUTPUT);           // Make sure that relay pin is setup correctly
  checkCV();                                      // check if state has changed during offline state
  initializing = 0;                               
  
  if(BCK){
    Serial.println("ESP BCK online, attempt to reset CENTRAL: LOW");
    pinMode(CENTRAL_RELAY_PIN, OUTPUT);           // Make sure that relay pin is setup correctly
    digitalWrite(CENTRAL_RELAY_PIN, LOW);         // TURN OFF ESP_CENTRAL.
    delay(1000);
    Serial.println("and back to... HIGH");
    digitalWrite(CENTRAL_RELAY_PIN, HIGH);        // TURN ON ESP_CENTRAL. This basically RESETS CENTRAL ESP
  }
}

Notes:
the OTA…_UPDATE pins are connected to a switch in the app. If the switch is turned on then the ESP downloads the latest firmware from the server (this way it’s easy to update SLAVE as it checks for updates when it wakes up). Obviously I need two separate pins for MASTER and SLAVE.
toTerminal routine is a simple routine that ports the string to both Serial.println() and terminal.println()
updateTerminalLAbel routine you can find below.
The wdt feeds are sometimes required as I ran into wdt resets
I’ve also connected MASTER 5v feed into a second relay. This allows SLAVE to reset the unit (who knows it might help).
Next is the blynk_write routine, here’s where the sh#t goes down. Note that I initially used #if def, but this strangely led to stack errors on my Wemos D1, so I went for a more classic approach. The lot is commented, so no further comment!


BLYNK_WRITE(offline_PIN){                         // put in a separate routine due to stack overflow issue
  Serial.println(String("Write to offline_pin: ") + param.asStr());
  if(!BCK){               // MAIN UNIT
    if(param.asInt() != -1){                        // prevent eternal sync loop
      Blynk.virtualWrite(offline_PIN, -1);          // reset the 'offline' pin when back online. Use -1 as this allows RLY to reset everything and set it automatically to 0 as it uses ++
      APIwriteDevicePin(auth_CENTRAL, offline_PIN, String(-1)); //extra measure to attempt to kick BCK back to sleep as virtualWrite does not always initiate blynk_write for the other unit (with the same token)
    }
    if(!backOnlineNotification){ 
      //@@@ Blynk.notify("Central MAIN is back online");
      backOnlineNotification = true;
    }
  } else {                                        // BCK UNIT ; MAIN unit is offline 
    if(param.asInt() < 0){
      toTerminal(String("Offline_Pin value: ") + param.asStr() + " BACK TO SLEEP");
      ESP.deepSleep(BCK_TIME_SLEEP);              // main is (back) online, go back to sleep (runs setup() first on wakeup, I hope)
    }
    if(param.asInt() > 5 && !bck_active){
      Blynk.notify(String("Central MAIN is offline for ") + param.asStr() + "m now, BACKUP is taking over!");
      toTerminal("CENTRAL BCK TAKING OVER THE WORLD");
      bck_active = true;
      digitalWrite(HEATER_RELAY_PIN_CV, HIGH);        // activate = LOW, deactivate = HIGH

    } 
  }
}
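Stripped of the Blynk calls, the BCK branch of the routine above comes down to a three-way decision on the value of the offline pin. The sketch below restates that decision as a plain function; the enum and function names are hypothetical, and the 5-minute threshold is the one used in the code above.

```cpp
// What the backup unit should do after reading the offline pin.
enum class BackupAction { GoToSleep, TakeOver, KeepCurrentState };

// offlineValue : value of the offline pin (-1 = MAIN is online,
//                0 or more = minutes MAIN has been offline)
// alreadyActive: true once the backup has taken over (bck_active in the post)
BackupAction decideBackup(int offlineValue, bool alreadyActive, int takeOverAfterMin = 5) {
  if (offlineValue < 0) return BackupAction::GoToSleep;        // MAIN wrote -1: it is back
  if (offlineValue > takeOverAfterMin && !alreadyActive)
    return BackupAction::TakeOver;                             // MAIN gone too long
  return BackupAction::KeepCurrentState;                       // wait, or stay in charge
}
```

In the original routine, `GoToSleep` corresponds to `ESP.deepSleep(BCK_TIME_SLEEP)` and `TakeOver` to setting `bck_active` and sending the notification.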

edit (forgot a rather vital part): the comment ‘reset the ‘offline’ pin when back online. Use -1 as this allows RLY to reset everything and set it automatically to 0 as it uses ++’ refers to another ESP that monitors MASTER!! In the end you’ll need a virtual pin that gets updated to a ‘master is offline’ status, and that can only be done by yet another ESP. I initially thought it would be much easier to let SLAVE check this, BUT SLAVE uses the same token as MASTER, so it would return ‘alive’ if you check it.
further notes: the APIwriteDevicePin() call is required to force a blynk_write() call on (both) ESP(s), else the SLAVE does not register the offline reset. This however ALSO forces a blynk_write on MASTER, hence the check for -1, else you end up in a loop. The routine on the OTHER ESP is as follows:

void checkNotifications(){                                                                          // sends messages to app in case of emergency, runs after checkHeater
  String apiResult      =  APIreadDevicePin(auth_CENTRAL, offline_PIN);                             // check whether another (or this) unit already detected the issue and initiated a timer. Note that this bool is reset to -1 when Central comes back online!! So the 3 options are -1/0/1+
  int oldCentralOffline = apiResult.toInt();

  centralOffline        = !deviceAlive(auth_CENTRAL);                                               // check if CENTRAL is actually offline
  toTerminal(String("Api/oldCO/CO: ") + apiResult +"/"+ String(oldCentralOffline) +"/"+ String(centralOffline),0);
  if (!centralOffline && !oldCentralOffline) return;                                                // was and is online; everything is peaches
  if (centralOffline) {                                                                             // if offline
    oldCentralOffline++;
    toTerminal(String("oldCO: ") + String(oldCentralOffline) );
    APIwriteDevicePin(auth_CENTRAL, offline_PIN, String(oldCentralOffline));                        // store updated value as a 'Central' pin, so other THMs can find it. Note that CENTRAL will reset it to -1 when it gets back online!
    if(oldCentralOffline > 20) Blynk.notify("Central is offline for 20m now!");                     // time the amount of minutes offline. (checkNotifications is called every minute), CENTRAL_BCK *should* take over in <16 min. So this line should never happen!!
  }
}

for this you also need a timer:

  timer.setInterval(TIME_CENTRAL_ONLINE_CHECK, checkNotifications);                  // check every minute if still connected to server
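The counter on the offline pin follows a small protocol: -1 means CENTRAL reported itself online, and every check (once per minute) that finds CENTRAL offline adds one. The sketch below isolates that bookkeeping from the HTTP calls so it can be reasoned about separately; the struct and function names are hypothetical, and the 20-minute notification threshold is the one from the routine above.

```cpp
// Result of one monitoring pass.
struct OfflineUpdate {
  bool writePin;   // true if the counter should be written back to the server
  int  value;      // counter value after this check
  bool notify;     // true if the "offline too long" notification is due
};

// storedValue   : current value of the offline pin (-1, 0, 1, 2, ...)
// centralOffline: result of the isHardwareConnected check, inverted
OfflineUpdate checkCentral(int storedValue, bool centralOffline, int notifyAfterMin = 20) {
  if (!centralOffline) return {false, storedValue, false};   // online: leave the pin alone
  const int next = storedValue + 1;                          // -1 -> 0 -> 1 -> ...
  return {true, next, next > notifyAfterMin};
}
```

Because the monitor only ever increments, resetting the pin to -1 remains the job of CENTRAL itself when it reconnects.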

Notes:

I use the APIwriteDevicePin() routines (got those from @wanek ); it’s probably much easier to use Blynk Bridge, but I had these anyway and I find them easier to use. In case you’re looking for them:

/////////////////API BRIDGE FUNCTIONS/////////////////////////////////
bool deviceAlive(String token){
  HTTPClient http;                                // Create:http://192.168.1.93:8080/383d08989c2zzdbdf28bf268807c7c/isHardwareConnected
  String payload = "request failed";
  String url = "http://192.168.1.93:8080/" + token + "/isHardwareConnected";
  http.begin(url);
  int httpCode = http.GET();
  delay(50);

  if (httpCode > 0) {
    payload = http.getString();                   // get response payload as String, value = true or false
  }  else payload = payload + ", httpCode: " + httpCode;
  http.end();
  delay(10);
  return (payload=="true")?1:0;
}

void APIwriteDevicePin(String token, int pin, String value){
// created by WANEK: https://community.blynk.cc/t/a-substitute-for-bridge-for-lazy-people/24128 
  String spin = String(pin);                      // convert pin number to string
  HTTPClient http;                                // Create:http://192.168.1.93:8080/383d08989c2zzdbdf28bf268807c7c/update/v14?value=42
  String url = "http://192.168.1.93:8080/";       // url -- http://IP:port
  url += token;                                   // blynk token
  url += "/update/V";
  url += spin;                                    // pin to update
  url += "?value=";
  url += value;                                   // value to write
  Serial.print("Value send to server: ");
  Serial.println(url);
  http.begin(url);
  http.GET();
  delay(50);
  http.end();
  delay(10);
}

String APIreadDevicePin(String token, int pin){
  String spin = String(pin);                      // convert pin number to string
  HTTPClient http;                                // create: // http://192.168.1.93:8080/383d08989c2zzdbdf28bf268807c7c/get/pin
  String payload = "request failed";
  String url = "http://192.168.1.93:8080/";       // url -- http://IP:port
  url += token;                                   // blynk token
  url += "/get/V";
  url += spin;                                    // pin to read
  http.begin(url);
  int httpCode = http.GET();
  delay(50);
  if (httpCode > 0) {
    payload = http.getString();                   // get response payload as String
    payload.remove(0, 2);
    payload.remove(payload.length() - 2);         // strip [""]
  }
  else payload = payload + ", httpCode: " + httpCode;

  http.end();
  delay(10);
  return payload;
}
/////////////////API BRIDGE FUNCTIONS/////////////////////////////////

(the token is a fake!)
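The three routines build the same kind of URL and the read routine strips the surrounding `["` and `"]` from the response. A hedged sketch of just those string operations, using `std::string` so it can be tested off-device, is shown below. The function names are hypothetical, the base URL is passed without a trailing slash, and the value is not URL-encoded, which is an assumption that only holds for simple values such as numbers.

```cpp
#include <string>

// <base>/<token>/get/V<pin>
std::string pinReadUrl(const std::string& base, const std::string& token, int pin) {
  return base + "/" + token + "/get/V" + std::to_string(pin);
}

// <base>/<token>/update/V<pin>?value=<value>   (value is NOT URL-encoded here)
std::string pinWriteUrl(const std::string& base, const std::string& token,
                        int pin, const std::string& value) {
  return base + "/" + token + "/update/V" + std::to_string(pin) + "?value=" + value;
}

// Removes a leading [" and trailing "] if both are present; any other
// payload (for example an error text) is returned unchanged.
std::string stripArrayQuotes(const std::string& payload) {
  const std::string::size_type n = payload.size();
  if (n >= 4 && payload.compare(0, 2, "[\"") == 0 && payload.compare(n - 2, 2, "\"]") == 0) {
    return payload.substr(2, n - 4);
  }
  return payload;
}
```

Checking the shape before stripping avoids chopping characters off a response that is not the expected one-element array.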

To be complete, here is the HTTPupdate routine, used in case the OTA_update_PIN is TRUE. Note that I handle most pin updates not in blynk_write but in blynk_write_default (it’s a matter of taste).

BLYNK_WRITE_DEFAULT(){                          //this routine is activated when ANY of the CENTRAL Vpins are changed. 
  int pin = request.pin;
  
  if (pin == OTA_update_PIN && !BCK){           //switch in app to check for updates
    HTTP_OTA = param.asInt();                   //on the pin is the old version stored OR its '1' in case an update is due
    if(HTTP_OTA > 1){                           //an OTA update just took place, check if version number updated correctly
      toTerminal("ESP CENTRAL UPDATED: old version: " + String(HTTP_OTA) + " new version: " + FW_VERSION,0);
      HTTP_OTA = 0;                             //reset bool
      Blynk.virtualWrite(OTA_update_PIN, HTTP_OTA);
      Blynk.virtualWrite(version_PIN, FW_VERSION);
    }
  }

 if (pin == OTA_BCK_update_PIN && BCK){         //switch in app to check for updates
    HTTP_OTA = param.asInt();                   //on the pin is the old version stored OR its '1' in case an update is due
    if(HTTP_OTA > 1){                           //an OTA update just took place, check if version number updated correctly
      toTerminal("ESP BCK_CENTRAL UPDATED: old version: " + String(HTTP_OTA) + " new version: " + FW_VERSION,0);
      HTTP_OTA = 0;                             //reset bool
      Blynk.virtualWrite(OTA_BCK_update_PIN, HTTP_OTA);
      Blynk.virtualWrite(version_BCK_PIN, FW_VERSION);
    }
    if(HTTP_OTA == 1){                              //This is required for BCK or else it will never update!! server update (upload file to: http://192.168.1.10/fota/
      HTTP_OTA = FW_VERSION;                          //use H_OTA to store old version number in to compare with new one on reboot. This will also prevent eternal update loop
      Blynk.virtualWrite(OTA_BCK_update_PIN, HTTP_OTA); //use separate pin for BCK and CENTRAL else they influence eachother (same token)
      checkForUpdates(ESP_NAME);          
    }  
  }
  //a lot more not relevant code
}

This will also prevent eternal update loop …unless you start with FW_VERSION 1 (So don’t!!, start with 2, and integer only!!)
In order to use the above, you’ll need:

#include <ESP8266httpUpdate.h>

const char* fwUrlBase = "http://192.168.1.10/fota/";   // used for HTTP OTA

void checkForUpdates(char espName[]) {
  String name = String(espName);                  // firmware file on the server is <name>.bin
  String fwURL = String( fwUrlBase ) + name + ".bin";

  Serial.println( "Checking for firmware updates." );
  Serial.print( "ESP Name: " );
  Serial.println( name );

  Serial.println( "Preparing to update" );
  t_httpUpdate_return ret = ESPhttpUpdate.update( fwURL );

  switch(ret) {
	case HTTP_UPDATE_FAILED:
	  Serial.printf("HTTP_UPDATE_FAILD Error (%d): %s", ESPhttpUpdate.getLastError(), ESPhttpUpdate.getLastErrorString().c_str());
	  break;
	case HTTP_UPDATE_NO_UPDATES:
	  Serial.println("HTTP_UPDATE_NO_UPDATES");
	  break;
  }
}

where http://192.168.1.10/fota/ is the place where the routine can find the firmware.bin file.
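The update pin carries three meanings in one integer: 0 is idle, 1 is "update requested", and anything above 1 is the previous firmware version stored just before flashing. The sketch below spells that out, together with the way the firmware URL is assembled; the enum and function names are hypothetical and the code is plain C++ rather than the Arduino `String` version used above.

```cpp
#include <string>

// Meaning of the value found on the OTA update pin.
enum class OtaState { Idle, UpdateRequested, JustUpdated };

OtaState classifyOtaPin(int pinValue) {
  if (pinValue == 1) return OtaState::UpdateRequested;   // switch in the app was turned on
  if (pinValue > 1)  return OtaState::JustUpdated;       // pin holds the old version number
  return OtaState::Idle;
}

// <base><espName>.bin, e.g. http://192.168.1.10/fota/ESP_BCK_CENTRAL.bin
std::string firmwareUrl(const std::string& base, const std::string& espName) {
  return base + espName + ".bin";
}
```

This also shows why the version numbering has to start at 2: a stored "old version" of 1 would be read back as a fresh update request.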

The actual HTTP OTA is handled in the void loop() same as the ‘usual’ OTA. Hence my loop looks like this:

void loop() {
  timer.run(); 
  if(Blynk.connected()) { Blynk.run(); }
  ArduinoOTA.handle();								//the 'normal' OTA update.
  if(HTTP_OTA == 1){                                //server update (upload file to: http://192.168.1.10/fota/
    HTTP_OTA = FW_VERSION;                          //use H_OTA to store old version number in to compare with new one on reboot. This will also prevent eternal update loop
    Blynk.virtualWrite(BCK?OTA_BCK_update_PIN:OTA_update_PIN, HTTP_OTA); //use separate pin for BCK and CENTRAL else they influence eachother (same token)
    checkForUpdates(ESP_NAME);          
  }  
}

…I really use toTerminal quite a bit, I noticed, so here is the routine:

WidgetTerminal terminal(TERMINAL_PIN);            //initialize Terminal widget

void toTerminal(String input, bool showDate = true) {
  terminal.println(String(showDate?(getDateTime() + "-"):"") + input);
  terminal.flush();
  Serial.println(String("TO TERM.:") + input);
}

and…so you need the gettime routines as well…

WidgetRTC rtc;                                    //initialize Real-Time-Clock. Note that you MUST HAVE THE WIDGET in blynk to sync to!!!

String getTime(){     return String((hour()<10)?"0":"") + hour() + ":"  + String((minute()<10)?"0":"") + minute() + ":" + String((second()<10)?"0":"") + second();  }
String getDate(){     return String(day()) + "/" + month();      }
String getDateTime(){ return getDate() + "-" + getTime();                       }

and…while we’re at it: updateTerminalLAbel, this one (borrowed this one from @Jamin ):

void updateTerminalLAbel(){                       // Digital clock display of the time
 int wifisignal = map(WiFi.RSSI(), -105, -40, 0, 100);

 int gmthour = hour();
  if (gmthour == 24){
     gmthour = 0;
  }
  String displayhour =   String(gmthour, DEC);
  int hourdigits = displayhour.length();
  if(hourdigits == 1){
    displayhour = "0" + displayhour;
  }
  String displayminute = String(minute(), DEC);
  int minutedigits = displayminute.length();  
  if(minutedigits == 1){
    displayminute = "0" + displayminute;
  }  
  // label for terminal
  displaycurrenttimepluswifi = String(ESP_NAME) + " (v.:" + FW_VERSION + ")                       Clock:  " + displayhour + ":" + displayminute + "               Signal:  " + wifisignal +" %";
  Blynk.setProperty(TERMINAL_PIN, "label", displaycurrenttimepluswifi);
}
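One detail of the label routine: Arduino’s `map()` rescales but does not constrain its result, so an RSSI stronger than -40 dBm would show up as more than 100 %. A small sketch of a clamped variant is given below; `rssiToPercent` is a hypothetical helper, and the -105/-40 dBm range is simply the one used in the routine above.

```cpp
// Converts an RSSI reading in dBm to a 0..100 percentage, clamped at both ends.
long rssiToPercent(long rssiDbm, long minDbm = -105, long maxDbm = -40) {
  if (rssiDbm <= minDbm) return 0;
  if (rssiDbm >= maxDbm) return 100;
  return (rssiDbm - minDbm) * 100 / (maxDbm - minDbm);   // integer math, like map()
}
```

In the routine above this would replace the `map(WiFi.RSSI(), -105, -40, 0, 100)` call with `rssiToPercent(WiFi.RSSI())`.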

and …that’s it. In case you’re wondering why there are no timers…I wondered the same at first, but the sleep cycle always starts and ends in setup(), so the sleep duration is the actual timer.

man…looking back at this it looks rather complicated, the essence is actually quite simple, however because ‘everything is connected’ you get a LOT of extra routines that I also use out of convenience.

Anyway, If there are any Q’s lemme know.

Blynk_Coeur · June 15, 2018, 1:22pm

very good work !
I’ll test it ASAP !

wolph42 · June 18, 2018, 12:54pm

made a couple of logic errors, updated the earlier post with corrected code. More testing is required, but so far it (now) looks good. One thing that went wrong, for example: as soon as the SLAVE was online, the RELAY, which checks the online status of MASTER every minute, saw MASTER (which has the same token as SLAVE) as online and said: yay, online!! It updated the status…which forced SLAVE to go back to sleep again. That and other small stuff.

wolph42 · June 19, 2018, 10:08am

ok done!! It was a bit more finicky than I initially expected, but I got it working…repeatedly. I’ve also tested the relay and I added an extra relay to allow the SLAVE to reset the MASTER. This all works. I’ve updated the ‘solution post’.

The only thing I could not properly test is a failure with power still on the MASTER; the reason is that I can’t find a way to disconnect MASTER without also disconnecting SLAVE. Hence I just took the power off the MASTER to check if the SLAVE kicked in. That worked. Reconnecting the MASTER reset everything back to normal.

And that concludes my entire CV setup. I’ve already installed a couple of units that work. Now the rest. I had planned to make a writeup of it all, but that will be a LOT of work. So in due time.