SOLVED: How to build in redundancy? (ESP8266)

without any context, reference or links…not really.

SOLUTION

Well I’ve managed to set it all up, tested it under different circumstances and all is peaches. I haven’t tested the connection with the relays yet so I will update that result later.

The two ESPs, MASTER and SLAVE, both run with the same token and the same code. There is one line I need to change when I build firmware for either of them:

#define BCK 0   //COMPILE FOR: MAIN(0) or BACKUP(1)

Obviously this influences how the rest of the code runs, so here are the pieces of code related to the redundancy.
Note: in this post I’ll call them MASTER and SLAVE; in the code MASTER is called either MAIN or CENTRAL and SLAVE is called BCK or BCK_CENTRAL.
The gist of the below is a MASTER that runs constantly and a SLAVE that sleeps constantly (:thinking:). SLAVE wakes up every 10 minutes, checks the online status of MASTER, and takes over if MASTER has been offline for more than 5 minutes. Should MASTER come back online, SLAVE goes back to sleep.
Also: most of my pins are DEFINED so instead of e.g. V14 you will see: BLA_BLA_PIN.

Straight below the above line there is:

#if BCK == 0
  char ESP_NAME[]           = "ESP_CENTRAL";
#endif
#if BCK == 1
  char ESP_NAME[]           = "ESP_BCK_CENTRAL";
#endif
bool bck_active = false;  // used for CENTRAL_BCK
const unsigned long BCK_TIME_SLEEP = 10 * 60e6;  // sleep cycles are in us, so 60e6=1 minute
int HTTP_OTA = 0;                                 // when the switch in the app is hit, the ESP searches for a firmware.bin update

HTTP_OTA is not strictly necessary here, but it is related to the HTTP OTA updates that are also explained in this post (and really handy for sleepy ESPs).
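A quick sanity check on the sleep constant, as a minimal sketch in plain C++ (not Arduino code). The helper names minutesToMicros and fitsIn32Bits are made up for illustration, and I'm assuming a 32-bit unsigned long as on the ESP8266. As far as I know a sleep time of 0 means 'sleep until reset', hence the assert.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not part of the original sketch.
const uint64_t MICROS_PER_MINUTE = 60ULL * 1000ULL * 1000ULL;

// Convert a sleep duration in minutes to microseconds.
uint64_t minutesToMicros(uint32_t minutes) {
  assert(minutes > 0);  // assumption: 0 would mean "sleep forever" on the ESP8266
  return static_cast<uint64_t>(minutes) * MICROS_PER_MINUTE;
}

// True when the value still fits in a 32-bit unsigned long.
bool fitsIn32Bits(uint64_t micros) {
  return micros <= UINT32_MAX;
}
```

So 10 minutes (600,000,000 us) is fine, but anything over roughly 71 minutes no longer fits in a 32-bit variable.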

The next part is BLYNK_CONNECTED(). The order in which everything is synced is vital, as that is the order in which the BLYNK_WRITE() functions are called. So you FIRST want to check whether MASTER is actually online (in which case SLAVE immediately goes back to sleep after waking up). This is my entire connect routine:

BLYNK_CONNECTED() {                               // sync pins as stored last on server.
                                                  // order of syncing is important due to BCK unit
  rtc.begin();                                    // sync clock with server AFTER blynk has connected with said server. 
  ESP.wdtFeed();                                  // prevent wdt time-out
  Blynk.syncVirtual(OTA_BCK_update_PIN);          // make sure BCK updates before going to sleep
  Blynk.syncVirtual(offline_PIN);                 // puts BCK back to sleep!! (if main is online)
  Blynk.syncVirtual(OTA_update_PIN);
  Blynk.syncVirtual(1,2,3,4,5,6,7);               // heat states of thermometers
  Blynk.syncVirtual(targetTType_PIN);             // home/night/vacation
  ESP.wdtFeed();                                  // prevent wdt time-out

  updateTerminalLAbel();                          // immediately update terminal else it takes a minute
  initializing = 1;                               // overrides the 'cvOn' state which might be stored during boot!!
  pinMode(HEATER_RELAY_PIN_CV, OUTPUT);           // Make sure that relay pin is setup correctly
  checkCV();                                      // check if state has changed during offline state
  initializing = 0;                               
  
  if(BCK){
    Serial.println("ESP BCK online, attempt to reset CENTRAL: LOW");
    pinMode(CENTRAL_RELAY_PIN, OUTPUT);           // Make sure that relay pin is setup correctly
    digitalWrite(CENTRAL_RELAY_PIN, LOW);         // TURN OFF ESP_CENTRAL.
    delay(1000);
    Serial.println("and back to... HIGH");
    digitalWrite(CENTRAL_RELAY_PIN, HIGH);        // TURN ON ESP_CENTRAL. This basically RESETS CENTRAL ESP
  }
}                                                 // end of BLYNK_CONNECTED()

Notes:
the OTA…_UPDATE pins are connected to a switch in the app. If the switch is turned on, the ESP downloads the latest firmware from the server (this way it’s easy to update SLAVE, as it checks for updates when it wakes up). Obviously I need two separate pins for MASTER and SLAVE.
toTerminal is a simple routine that sends the string to both Serial.println() and terminal.println().
updateTerminalLAbel you can find below.
The wdt feeds are sometimes required, as I ran into wdt resets.
I’ve also connected the MASTER 5V feed to a second relay. This allows SLAVE to reset the unit (who knows, it might help).
Next is the BLYNK_WRITE routine, here’s where the sh#t goes down. Note that I initially used #ifdef, but this strangely led to stack errors on my Wemos D1, so I went for a more classic approach. The lot is commented, so no further comment!


BLYNK_WRITE(offline_PIN){                         // put in a separate routine due to stack overflow issue
  Serial.println(String("Write to offline_pin: ") + param.asStr());
  if(!BCK){               // MAIN UNIT
    if(param.asInt() != -1){                        // prevent eternal sync loop
      Blynk.virtualWrite(offline_PIN, -1);          // reset the 'offline' pin when back online. Use -1, as this allows RLY to reset everything and set it automatically to 0, as it uses ++
      APIwriteDevicePin(auth_CENTRAL, offline_PIN, String(-1)); //extra measure to attempt to kick BCK back to sleep as virtualWrite does not always initiate blynk_write for the other unit (with the same token)
    }
    if(!backOnlineNotification){ 
      //@@@ Blynk.notify("Central MAIN is back online");
      backOnlineNotification = true;
    }
  } else {                                        // BCK UNIT ; MAIN unit is offline 
    if(param.asInt() < 0){
      toTerminal(String("Offline_Pin value: ") + param.asStr() + " BACK TO SLEEP");
      ESP.deepSleep(BCK_TIME_SLEEP);              // main is (back) online, go back to sleep (runs setup() first on wakeup, I hope)
    }
    if(param.asInt() > 5 && !bck_active){
      Blynk.notify(String("Central MAIN is offline for ") + param.asStr() + "m now, BACKUP is taking over!");
      toTerminal("CENTRAL BCK TAKING OVER THE WORLD");
      bck_active = true;
      digitalWrite(HEATER_RELAY_PIN_CV, HIGH);        // activate = LOW, deactivate = HIGH

    } 
  }
}                             
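The handler above boils down to a small decision table. Here is a minimal sketch of it in plain C++, with the Blynk calls stripped out so it can be tested on a PC. The names Action and decide are made up for this illustration, and the value convention (-1 = MAIN reported back, n = minutes offline) is how I read the code above.

```cpp
#include <cassert>

// Hypothetical names, used only for this illustration.
enum class Action { None, ResetOfflinePin, GoToSleep, TakeOver };

// value: the number stored on offline_PIN
//   -1 = MAIN has reported back, 0 = online, n > 0 = minutes offline
Action decide(bool isBackup, int value, bool backupActive) {
  if (!isBackup) {
    // MAIN: write -1 once; the check prevents an eternal sync loop.
    return (value != -1) ? Action::ResetOfflinePin : Action::None;
  }
  if (value < 0) return Action::GoToSleep;                  // MAIN is back
  if (value > 5 && !backupActive) return Action::TakeOver;  // offline > 5 min
  return Action::None;                                      // keep waiting
}
```

On the real units ResetOfflinePin corresponds to the virtualWrite plus API write of -1, GoToSleep to ESP.deepSleep(), and TakeOver to the notification plus setting bck_active.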

Edit (forgot a rather vital part): the comment ‘reset the offline pin when back online. Use -1, as this allows RLY to reset everything and set it automatically to 0, as it uses ++’ refers to another ESP that monitors MASTER!! In the end you’ll need a virtual pin that is updated to a ‘master is offline’ status, and that can only be done by yet another ESP. I initially thought it would be much easier to let SLAVE check this, BUT SLAVE uses the same token as MASTER, so the check would return ‘alive’ whenever SLAVE itself is online.
Further notes: the API write is required to force a BLYNK_WRITE() call on (both) ESP(s), else SLAVE does not register the offline reset. This however ALSO forces a BLYNK_WRITE() on MASTER, hence the check against -1, else you end up in a loop.
The routine on the OTHER ESP is as follows:

void checkNotifications(){                                                                          // sends messages to app in case of emergency, runs after checkHeater
  String apiResult      =  APIreadDevicePin(auth_CENTRAL, offline_PIN);                             // check whether another (or this) unit already detected the issue and started a counter. Note that this value is reset to -1 when Central comes back online!! So the 3 options are -1/0/1+
  int oldCentralOffline = apiResult.toInt();

  centralOffline        = !deviceAlive(auth_CENTRAL);                                               // check if CENTRAL is actually offline
  toTerminal(String("Api/oldCO/CO: ") + apiResult +"/"+ String(oldCentralOffline) +"/"+ String(centralOffline),0);
  if (!centralOffline && !oldCentralOffline) return;                                                // was and is online; everything is peaches
  if (centralOffline) {                                                                             // if offline
    oldCentralOffline++;
    toTerminal(String("oldCO: ") + String(oldCentralOffline) );
    APIwriteDevicePin(auth_CENTRAL, offline_PIN, String(oldCentralOffline));                        // store updated value as a 'Central' pin, so other THMs can find it. Note that CENTRAL will reset it to -1 when it gets back online!
    if(oldCentralOffline > 20) Blynk.notify("Central is offline for 20m now!");                     // counts the minutes offline (checkNotifications is called every minute). CENTRAL_BCK *should* take over in <16 min, so this line should never happen!!
  }
}

for this you also need a timer:

  timer.setInterval(TIME_CENTRAL_ONLINE_CHECK, checkNotifications);                  // check every minute if still connected to server
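The counter logic in checkNotifications can be isolated as a pure function, which makes the -1/0/1+ convention easier to see. This is a minimal sketch in plain C++. CounterUpdate and nextOfflineCounter are hypothetical names, and it only models what the watching ESP writes, not the HTTP calls.

```cpp
#include <cassert>

// Hypothetical type: what the watcher should do with the pin this minute.
struct CounterUpdate {
  bool write;  // true when a new value must be sent to the server
  int value;   // the value to store on offline_PIN
};

// stored: current value of offline_PIN (-1, 0, or minutes offline)
// centralOffline: result of this minute's isHardwareConnected check
CounterUpdate nextOfflineCounter(int stored, bool centralOffline) {
  if (!centralOffline) {
    return {false, stored};  // online: the watcher never writes, MAIN resets to -1
  }
  return {true, stored + 1};  // offline: -1 becomes 0, 0 becomes 1, and so on
}
```

Starting from -1 (MAIN reset the pin after coming back), the first offline minute only brings the counter to 0, so the '> 5' test on the BCK side trips one minute later than when starting from 0.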

Notes:

I use the APIwriteDevicePin() routines (got those from @wanek). It’s probably much easier to use Blynk Bridge, but I had these anyway and I find them easier to use. In case you’re looking for them:

/////////////////API BRIDGE FUNCTIONS/////////////////////////////////
bool deviceAlive(String token){
  HTTPClient http;                                // Create:http://192.168.1.93:8080/383d08989c2zzdbdf28bf268807c7c/isHardwareConnected
  String payload = "request failed";
  String url = "http://192.168.1.93:8080/" + token + "/isHardwareConnected";
  http.begin(url);
  int httpCode = http.GET();
  delay(50);

  if (httpCode > 0) {
    payload = http.getString();                   // get response payload as String, value = true or false
  }  else payload = payload + ", httpCode: " + httpCode;
  http.end();
  delay(10);
  return (payload=="true")?1:0;
}

void APIwriteDevicePin(String token, int pin, String value){
// created by WANEK: https://community.blynk.cc/t/a-substitute-for-bridge-for-lazy-people/24128 
  String spin = String(pin);                      // convert pin number to string
  HTTPClient http;                                // Create:http://192.168.1.93:8080/383d08989c2zzdbdf28bf268807c7c/update/v14?value=42
  String url = "http://192.168.1.93:8080/";       // url -- http://IP:port
  url += token;                                   // blynk token
  url += "/update/V";
  url += spin;                                    // pin to update
  url += "?value=";
  url += value;                                   // value to write
  Serial.print("Value send to server: ");
  Serial.println(url);
  http.begin(url);
  http.GET();
  delay(50);
  http.end();
  delay(10);
}

String APIreadDevicePin(String token, int pin){
  String spin = String(pin);                      // convert pin number to string
  HTTPClient http;                                // create: // http://192.168.1.93:8080/383d08989c2zzdbdf28bf268807c7c/get/pin
  String payload = "request failed";
  String url = "http://192.168.1.93:8080/";       // url -- http://IP:port
  url += token;                                   // blynk token
  url += "/get/V";
  url += spin;                                    // pin to read
  http.begin(url);
  int httpCode = http.GET();
  delay(50);
  if (httpCode > 0) {
    payload = http.getString();                   // get response payload as String
    payload.remove(0, 2);
    payload.remove(payload.length() - 2);         // strip [""]
  }
  else payload = payload + ", httpCode: " + httpCode;

  http.end();
  delay(10);
  return payload;
}
/////////////////API BRIDGE FUNCTIONS/////////////////////////////////

(the token is a fake!)
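Two details of the API routines are easy to check off-device: how the URL is put together, and how the [""] wrapper is stripped from the reply. Below is a minimal sketch in plain C++ with std::string instead of the Arduino String class. buildUpdateUrl and stripArrayWrapper are hypothetical names, and unlike the original it leaves the payload alone when the wrapper is missing (for example on a 'request failed' message).

```cpp
#include <cassert>
#include <string>

// Build e.g. http://IP:port/<token>/update/V14?value=42
std::string buildUpdateUrl(const std::string& base, const std::string& token,
                           int pin, const std::string& value) {
  assert(pin >= 0);  // virtual pin numbers are not negative
  return base + token + "/update/V" + std::to_string(pin) + "?value=" + value;
}

// Turn ["42"] into 42; return the payload untouched if it is not wrapped.
std::string stripArrayWrapper(const std::string& payload) {
  const std::string open = "[\"";
  const std::string close = "\"]";
  if (payload.size() < open.size() + close.size()) return payload;
  if (payload.compare(0, open.size(), open) != 0) return payload;
  if (payload.compare(payload.size() - close.size(), close.size(), close) != 0) {
    return payload;
  }
  return payload.substr(open.size(),
                        payload.size() - open.size() - close.size());
}
```

The guard matters because toInt() on a mangled error string would quietly return 0, which reads as 'MASTER online'.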

To be complete, here is the HTTP update routine, used in case OTA_update_PIN is TRUE. Note that I handle most pin updates not in BLYNK_WRITE but in BLYNK_WRITE_DEFAULT (it’s a matter of taste).

BLYNK_WRITE_DEFAULT(){                          //this routine is activated when ANY of the CENTRAL Vpins are changed. 
  int pin = request.pin;
  
  if (pin == OTA_update_PIN && !BCK){           //switch in app to check for updates
    HTTP_OTA = param.asInt();                   //on the pin is the old version stored OR its '1' in case an update is due
    if(HTTP_OTA > 1){                           //an OTA update just took place, check if version number updated correctly
      toTerminal("ESP CENTRAL UPDATED: old version: " + String(HTTP_OTA) + " new version: " + FW_VERSION,0);
      HTTP_OTA = 0;                             //reset bool
      Blynk.virtualWrite(OTA_update_PIN, HTTP_OTA);
      Blynk.virtualWrite(version_PIN, FW_VERSION);
    }
  }

 if (pin == OTA_BCK_update_PIN && BCK){         //switch in app to check for updates
    HTTP_OTA = param.asInt();                   //on the pin is the old version stored OR its '1' in case an update is due
    if(HTTP_OTA > 1){                           //an OTA update just took place, check if version number updated correctly
      toTerminal("ESP BCK CENTRAL UPDATED: old version: " + String(HTTP_OTA) + " new version: " + FW_VERSION,0);
      HTTP_OTA = 0;                             //reset bool
      Blynk.virtualWrite(OTA_BCK_update_PIN, HTTP_OTA);
      Blynk.virtualWrite(version_BCK_PIN, FW_VERSION);
    }
    if(HTTP_OTA == 1){                              //This is required for BCK or else it will never update!! server update (upload file to: http://192.168.1.10/fota/
      HTTP_OTA = FW_VERSION;                          //use H_OTA to store old version number in to compare with new one on reboot. This will also prevent eternal update loop
      Blynk.virtualWrite(OTA_BCK_update_PIN, HTTP_OTA); //use separate pin for BCK and CENTRAL else they influence eachother (same token)
      checkForUpdates(ESP_NAME);          
    }  
  }
  //a lot more not relevant code
}

Storing the old version number on the pin also prevents an eternal update loop …unless you start with FW_VERSION 1 (so don’t!! Start with 2, and integers only!!), because a stored 1 is read as a new update request.
In order to use the above, you’ll need:

#include <ESP8266httpUpdate.h>

const char* fwUrlBase = "http://192.168.1.10/fota/";   // used for HTTP OTA

void checkForUpdates(char ESP_NAME[]) {
  String ESP = String(ESP_NAME);
  String fwURL = String( fwUrlBase ) + ESP + ".bin";

  Serial.println( "Checking for firmware updates." );
  Serial.print( "ESP Name: " );
  Serial.println( ESP );

  Serial.println( "Preparing to update" );
  t_httpUpdate_return ret = ESPhttpUpdate.update( fwURL );

  switch(ret) {
    case HTTP_UPDATE_FAILED:
      Serial.printf("HTTP_UPDATE_FAILED Error (%d): %s\n", ESPhttpUpdate.getLastError(), ESPhttpUpdate.getLastErrorString().c_str());
      break;
    case HTTP_UPDATE_NO_UPDATES:
      Serial.println("HTTP_UPDATE_NO_UPDATES");
      break;
  }
}

where http://192.168.1.10/fota/ is the place where the routine can find the firmware.bin file.

The actual HTTP OTA is handled in the void loop() same as the ‘usual’ OTA. Hence my loop looks like this:

void loop() {
  timer.run(); 
  if(Blynk.connected()) { Blynk.run(); }
  ArduinoOTA.handle();                              //the 'normal' OTA update.
  if(HTTP_OTA == 1){                                //server update (upload file to: http://192.168.1.10/fota/
    HTTP_OTA = FW_VERSION;                          //use H_OTA to store old version number in to compare with new one on reboot. This will also prevent eternal update loop
    Blynk.virtualWrite(BCK?OTA_BCK_update_PIN:OTA_update_PIN, HTTP_OTA); //use separate pin for BCK and CENTRAL else they influence eachother (same token)
    checkForUpdates(ESP_NAME);          
  }  
}
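The HTTP_OTA value does triple duty on the update pin: 0 means idle, 1 means 'update requested', and anything above 1 is the old version number waiting to be compared after the reboot. Here is a minimal sketch of that convention in plain C++. OtaState, classifyOtaPin and valueStoredBeforeUpdate are hypothetical names for illustration only.

```cpp
#include <cassert>

// Hypothetical names, used only to illustrate the pin convention.
enum class OtaState { Idle, UpdateRequested, JustUpdated };

// Interpret the value found on the OTA update pin.
OtaState classifyOtaPin(int pinValue) {
  if (pinValue == 1) return OtaState::UpdateRequested;  // switch in app is on
  if (pinValue > 1) return OtaState::JustUpdated;       // old version stored
  return OtaState::Idle;
}

// Value written to the pin right before the download starts.
int valueStoredBeforeUpdate(int fwVersion) {
  assert(fwVersion >= 2);  // version 1 would look like a fresh request
  return fwVersion;
}
```

With FW_VERSION 1 the stored value would be classified as UpdateRequested again after the reboot, which is exactly the eternal update loop.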

…I noticed I really use toTerminal quite a bit, so here’s the routine:

WidgetTerminal terminal(TERMINAL_PIN);            //initialize Terminal widget

void toTerminal(String input, bool showDate = true) {
  terminal.println(String(showDate?(getDateTime() + "-"):"") + input);
  terminal.flush();
  Serial.println(String("TO TERM.:") + input);
}

and…so you need the time routines as well…

WidgetRTC rtc;                                    //initialize Real-Time-Clock. Note that you MUST HAVE THE WIDGET in blynk to sync to!!!

String getTime(){     return String((hour()<10)?"0":"") + hour() + ":"  + String((minute()<10)?"0":"") + minute() + ":" + String((second()<10)?"0":"") + second();  }
String getDate(){     return String(day()) + "/" + month();      }
String getDateTime(){ return getDate() + "-" + getTime();                       }
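The zero-padding in getTime() is repeated three times; it could be pulled into one helper. This is a minimal sketch in plain C++ using std::string, so it is not a drop-in replacement for the Arduino String version above. twoDigits and formatTime are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Pad a value in the range 0..99 to two digits, e.g. 7 -> "07".
std::string twoDigits(int v) {
  assert(v >= 0 && v < 100);
  return (v < 10 ? "0" : "") + std::to_string(v);
}

// Format hours, minutes and seconds as HH:MM:SS.
std::string formatTime(int h, int m, int s) {
  return twoDigits(h) + ":" + twoDigits(m) + ":" + twoDigits(s);
}
```

For example, formatTime(7, 5, 9) gives 07:05:09.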

and…while we’re at it, updateTerminalLAbel (borrowed this one from @Jamin):

void updateTerminalLAbel(){                       // Digital clock display of the time
 int wifisignal = map(WiFi.RSSI(), -105, -40, 0, 100);

 int gmthour = hour();
  if (gmthour == 24){
     gmthour = 0;
  }
  String displayhour =   String(gmthour, DEC);
  int hourdigits = displayhour.length();
  if(hourdigits == 1){
    displayhour = "0" + displayhour;
  }
  String displayminute = String(minute(), DEC);
  int minutedigits = displayminute.length();  
  if(minutedigits == 1){
    displayminute = "0" + displayminute;
  }  
  // label for terminal
  displaycurrenttimepluswifi = String(ESP_NAME) + " (v.:" + FW_VERSION + ")                       Clock:  " + displayhour + ":" + displayminute + "               Signal:  " + wifisignal +" %";
  Blynk.setProperty(TERMINAL_PIN, "label", displaycurrenttimepluswifi);
}
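One thing to keep in mind with the signal percentage: Arduino's map() does not clamp, so an RSSI stronger than -40 dBm gives more than 100 %. Below is a minimal sketch in plain C++ that reproduces the same integer formula and adds a clamp. linearMap and rssiToPercent are hypothetical names, and the -105/-40 range is simply taken from the routine above.

```cpp
#include <algorithm>
#include <cassert>

// Same integer arithmetic as Arduino's map(); no clamping.
long linearMap(long x, long inMin, long inMax, long outMin, long outMax) {
  assert(inMin != inMax);  // avoid division by zero
  return (x - inMin) * (outMax - outMin) / (inMax - inMin) + outMin;
}

// Map RSSI in dBm to 0..100 %, clamped to the -105..-40 dBm range.
int rssiToPercent(long rssiDbm) {
  long clamped = std::max(-105L, std::min(-40L, rssiDbm));
  return static_cast<int>(linearMap(clamped, -105, -40, 0, 100));
}
```

Without the clamp an RSSI of -30 dBm maps to 115; with it the label tops out at 100 %.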

and…that’s it. In case you’re wondering why there are no timers…I had the same thing at first too, but the sleep cycle always starts and ends in setup(), so the sleep duration is the actual timer.

Man…looking back at this, it looks rather complicated. The essence is actually quite simple; however, because ‘everything is connected’ you get a LOT of extra routines that I also use out of convenience.

Anyway, If there are any Q’s lemme know.


very good work !
I’ll test it ASAP !

Made a couple of logic errors; updated the earlier post with corrected code. More testing is required, but so far it (now) looks good. One thing that went wrong, for example: as soon as SLAVE was online, the RELAY unit, which checks the online status of MASTER every minute (and MASTER has the same token as SLAVE), said: yay, online!! and updated the status…which forced SLAVE to go back to sleep again. That and other small stuff.

OK, done!! It was a bit more finicky than I initially expected, but I got it working…repeatedly. I’ve also tested the relay, and I added an extra relay to allow SLAVE to reset MASTER. This all works. I’ve updated the ‘solution post’.

The only thing I could not properly test is a failure with power still on the MASTER, because I can’t find a way to disconnect MASTER without also disconnecting SLAVE. Hence I just took the power off the MASTER to check whether SLAVE kicked in. That worked. Reconnecting the MASTER reset everything back to normal.

And that concludes my entire CV setup. I’ve already installed a couple of units that work. Now the rest. I had planned to make a writeup of it all, but that will be a LOT of work. So in due time.
