ESP32 millis() overflows break the loop

Ok, I’ll try it again. But as I said, I definitely got the same error even with BlynkTimer.

That’s not the problem.

@nikuz we need to see a minimal sketch that demonstrates your problem.

It doesn’t matter actually, because I have another piece of important code in a loop. It’s a serial communication read, and I can’t remove it.

But what the beauthifull design? If the library can’t work properly if so insignificant piece of code is written beside of blynk api requests?

Yeh, I’ll try to produce it. Yesterday I definitely got that error on the code slightly different from which I posted before. Everytime I need to wait one to few hours to reproduce it.

I only had to wait a few seconds to test your earlier sketches. Just add 4294950000 ms to now / millis() to trigger the rollover without waiting for 71 minutes.

1 Like

Did you see the [0] Heartbeat timeout: 0, 4293769, 4293745 message in the logs? It alwais happens when a real rollover happen. The code of the end of the loop just show a real uptime even the rollover happened. If you can’t change the millis() return value, you can’t trigger the the rollover.

But you should be able to simulate the rollover with code.

How? If blynk library depends on the millis() function. Please ignore my code which just print the uptime. Blynk timers just stop to run after the real rollover. Again, I tried with the internal BlynkTimer, and definitely got this error. I’ll focus reproduce this on the clear code today. I’ll write when I’ll get the error with exact code.

@vshymanskyy how does your skipTimes addition to SimpleTimer handle millis() rollover?

From https://github.com/blynkkk/blynk-library/blob/1de0dfd9e867e6ae26178a8d016538d2cc05ede5/src/utility/BlynkTimer.cpp

                unsigned long skipTimes = (current_millis - timer[i].prev_millis) / timer[i].delay;
                // update time
                timer[i].prev_millis += timer[i].delay * skipTimes;

@vshymanskyy looks like it probably has the effect of stopping timers.

Maybe something like this instead:

                unsigned long skipTimes;
                if(current_millis > timer[i].prev.millis)
                {
                  skipTimes = (current_millis - timer[i].prev_millis) / timer[i].delay;  
                }
                else
                {
                  timer[i].prev.millis =  current_millis;  
                }

By hacking the libraries.

You could have tried adding 4294950000 ms to current_millis / timer[i].prev.millis etc.

I didn’t look at the libraries too closely but I suspect none of the SimpleTimer versions, Blynk’s included handles the rollover that well. The new Blynk version with skipTimes probably works badly in your particular case of very frequent sync’s etc.

Here is the code which gave me the loop break now

#define BLYNK_DEBUG // Optional, this enables lots of prints
#define BLYNK_PRINT Serial

#include <WiFi.h>
#include <BlynkSimpleEsp32.h>

const char *SSID = "";
const char *PSWD = "";

char blynkAuth[] = "";
char blynkDomain[] = "192.168.1.4";
const uint16_t blynkPort = 8080;

const unsigned long blynkConnectAttemptTime = 5L * 1000L;  // try to connect to blynk server during only 5 seconds
BlynkTimer blynkTimer1;
BlynkTimer blynkTimer2;

const unsigned long updateInterval = 1L * 1000L;

void blynkSync1() {
    Blynk.virtualWrite(V1, "1");
}

void blynkSync2() {
    Blynk.virtualWrite(V2, "2");
}

void blynkSync3() {
    Blynk.virtualWrite(V3, "3");
}

void blynkSync4() {
    Blynk.virtualWrite(V4, "4");
}

void blynkSync5() {
    Blynk.virtualWrite(V5, "5");
}

void blynkSync6() {
    Blynk.virtualWrite(V6, "5");
}

void blynkSync7() {
    Blynk.virtualWrite(V7, "5");
}

void blynkSync8() {
    Blynk.virtualWrite(V8, "5");
}

void blynkSync9() {
    Blynk.virtualWrite(V9, "5");
}

void blynkSync10() {
    Blynk.virtualWrite(V10, 1);
}

void blynkSync11() {
    Blynk.virtualWrite(V11, 1);
}

void blynkSync12() {
    Blynk.virtualWrite(V12, 1);
}

void blynkSync13() {
    Blynk.virtualWrite(V13, 1);
}

void blynkSync14() {
    Blynk.virtualWrite(V14, 1);
}

void blynkSync15() {
    Blynk.virtualWrite(V15, 1);
}

void blynkSync16() {
    Blynk.virtualWrite(V16, 1);
}

void blynkSync17() {
    Blynk.virtualWrite(V17, 1);
}

void blynkSync18() {
    Blynk.virtualWrite(V18, 1);
}

void blynkSync19() {
    Blynk.virtualWrite(V19, 1);
}

void blynkSync20() {
    Blynk.virtualWrite(V20, 1);
}

void blynkSync21() {
    Blynk.virtualWrite(V21, 1);
}
void blynkSync22() {
    Blynk.virtualWrite(V22, 1);
}
//

void blynkConnect() {
    if (WiFi.isConnected() && !Blynk.connected()) {
        unsigned long startConnecting = millis();
        while (!Blynk.connected()) {
            Blynk.connect();
            if (millis() > startConnecting + blynkConnectAttemptTime) {
                Serial.println("Unable to connect to Blynk server.\n");
                break;
            }
        }
    }
}

void setup() {
    Serial.begin(115200);
    while (!Serial) { ;
    }

    WiFi.mode(WIFI_STA);
    WiFi.begin(SSID, PSWD);

    for (int loops = 10; loops > 0; loops--) {
        if (WiFi.isConnected()) {
            Serial.println("");
            Serial.print("IP address: ");
            Serial.println(WiFi.localIP());
            break;
        } else {
            Serial.println(loops);
            delay(1000);
        }
    }
    if (!WiFi.isConnected()) {
        Serial.println("WiFi connect failed");
    }

    Blynk.config(blynkAuth, blynkDomain, blynkPort);
    blynkConnect();

    blynkTimer1.setInterval(1000L, blynkConnect);
    blynkTimer1.setInterval(2000L, blynkSync1);
    blynkTimer1.setInterval(2050L, blynkSync2);
    blynkTimer1.setInterval(2100L, blynkSync3);
    blynkTimer1.setInterval(2150L, blynkSync4);
    blynkTimer1.setInterval(2200L, blynkSync5);
    blynkTimer1.setInterval(2250L, blynkSync6);
    blynkTimer1.setInterval(2300L, blynkSync7);
    blynkTimer1.setInterval(2350L, blynkSync8);
    blynkTimer1.setInterval(2400L, blynkSync9);
    blynkTimer1.setInterval(2450L, blynkSync10);
    blynkTimer1.setInterval(2500L, blynkSync11);
    blynkTimer1.setInterval(2550L, blynkSync12);
    blynkTimer1.setInterval(2600L, blynkSync13);
    blynkTimer1.setInterval(3050L, blynkSync14);
    blynkTimer2.setInterval(3100L, blynkSync15);
    blynkTimer2.setInterval(3150L, blynkSync16);
    blynkTimer2.setInterval(3200L, blynkSync17);
    blynkTimer2.setInterval(3250L, blynkSync18);
    blynkTimer2.setInterval(3300L, blynkSync19);
    blynkTimer2.setInterval(3350L, blynkSync20);
    blynkTimer2.setInterval(3400L, blynkSync21);
    blynkTimer2.setInterval(3450L, blynkSync22);
}

void loop() {
    if (Blynk.connected()) {
        Blynk.run();
    }
    blynkTimer1.run();
    blynkTimer2.run();

    uint32_t now = millis();
    static uint32_t lastUpdate = 0;
    static uint32_t overFlowCounter = 0;
    if (now < lastUpdate) { // overflow
        overFlowCounter++;
        lastUpdate = now;
    }
    if (now - lastUpdate > updateInterval){
        lastUpdate = now;
        Serial.print("uptime = ");
        Serial.println((overFlowCounter * 4294968) + now);
    }
}

Logs:

...
uptime = 7817532
[3522581] <[14|00|F6|00|06]vw[00]4[00]4
[3522650] <[14|00|F7|00|06]vw[00]9[00]5
[3522718] <[14|00|F8|00|07]vw[00]13[00]1
[3522787] <[14|00|F9|00|07]vw[00]17[00]1
[3522854] <[14|00|FA|00|07]vw[00]12[00]1
[3523031] <[14|00|FB|00|07]vw[00]16[00]1
[3523181] <[14|00|FC|00|07]vw[00]14[00]1
[3523331] <[14|00|FD|00|06]vw[00]2[00]2
uptime = 7818533
[3523581] <[14|00|FE|00|07]vw[00]21[00]1
[3523681] <[14|00|FF|00|07]vw[00]11[00]1
[3523781] <[14|01|00|00|06]vw[00]5[00]5
[3523849] <[14|01|01|00|07]vw[00]19[00]1
[3523931] <[14|01|02|00|06]vw[00]6[00]5
[3524081] <[14|01|03|00|06]vw[00]3[00]3
[3524151] <[14|01|04|00|07]vw[00]15[00]1
[3524219] <[14|01|05|00|06]vw[00]1[00]1
[3524431] <[14|01|06|00|07]vw[00]10[00]1
uptime = 7819534
[3524581] <[14|01|07|00|06]vw[00]7[00]5
[3524650] <[14|01|08|00|06]vw[00]8[00]5
[3524731] <[14|01|09|00|06]vw[00]4[00]4
[3524981] <[14|01|0A|00|06]vw[00]9[00]5
[3525049] <[14|01|0B|00|07]vw[00]20[00]1
[3525181] <[14|01|0C|00|07]vw[00]13[00]1
[3525249] <[14|01|0D|00|07]vw[00]18[00]1
[3525316] <[14|01|0E|00|07]vw[00]12[00]1
[3525384] <[14|01|0F|00|06]vw[00]2[00]2
...
uptime = 8589719
[4294781] <[14|01|C3|00|06]vw[00]3[00]3
[4294851] <[14|01|C4|00|07]vw[00]13[00]1
[4294919] <[14|01|C5|00|07]vw[00]16[00]1
[0] Heartbeat timeout: 0, 4286091, 4285549
[1] Connecting to 192.168.1.4:8080
[13] <[02|00|01|00] c9caa42869de4ea6a087520f68814788
[3016] Login timeout
[5016] Connecting to 192.168.1.4:8080
[5027] <[02|00|01|00] c9caa42869de4ea6a087520f68814788
[5315] >[00|00|01|00|C8]
[5315] Ready (ping: 286ms).
[5382] <[11|00|02|00]Fver[00]0.5.4[00]h-beat[00]10[00]buff-in[00]1024[00]dev[00]ESP32[00]build[00]Jan 16 2019 16:14:20[00]
[7297] >[00|00|02|00|C8]
[15385] <[06|00|03|00|00]
[16527] >[00|00|03|00|C8]
[25389] <[06|00|04|00|00]
[26985] >[00|00|04|00|C8]
[35393] <[06|00|05|00|00]
[35805] >[00|00|05|00|C8]
[45397] <[06|00|06|00|00]
[45692] >[00|00|06|00|C8]
[55401] <[06|00|07|00|00]
[58402] <[06|00|08|00|00]
[59239] >[00|00|07|00|C8]
[59603] >[00|00|08|00|C8]
[68404] <[06|00|09|00|00]
[71405] <[06|00|0A|00|00]
[74406] <[06|00|0B|00|00]
[77407] <[06|00|0C|00|00]
[78604] Heartbeat timeout: 78604, 59603, 77407
[82408] Connecting to 192.168.1.4:8080
[82421] <[02|00|01|00] c9caa42869de4ea6a087520f68814788
[82938] >[00|00|01|00|C8]
[82939] Ready (ping: 517ms).
[83006] <[11|00|02|00]Fver[00]0.5.4[00]h-beat[00]10[00]buff-in[00]1024[00]dev[00]ESP32[00]build[00]Jan 16 2019 16:14:20[00]
[83912] >[00|00|02|00|C8]
[93009] <[06|00|03|00|00]
[96010] <[06|00|04|00|00]
[99011] <[06|00|05|00|00]
[102012] <[06|00|06|00|00]
[102913] Heartbeat timeout: 102913, 83912, 102012
[107013] Connecting to 192.168.1.4:8080
[107022] <[02|00|01|00] c9caa42869de4ea6a087520f68814788
[107629] >[00|00|01|00|C8]
[107629] Ready (ping: 605ms).
[107696] <[11|00|02|00]Fver[00]0.5.4[00]h-beat[00]10[00]buff-in[00]1024[00]dev[00]ESP32[00]build[00]Jan 16 2019 16:14:20[00]
[107823] >[00|00|02|00|C8]
[117701] <[06|00|03|00|00]
[118191] >[00|00|03|00|C8]
[127705] <[06|00|04|00|00]
[128156] >[00|00|04|00|C8]
[137709] <[06|00|05|00|00]
[138211] >[00|00|05|00|C8]
[147713] <[06|00|06|00|00]
[147778] >[00|00|06|00|C8]
[157718] <[06|00|07|00|00]
[159056] >[00|00|07|00|C8]
[167722] <[06|00|08|00|00]
[168059] >[00|00|08|00|C8]
[177726] <[06|00|09|00|00]
...

Uptime of my main app, which avoid to do something around the millis() rollover is more than 9 hours for now. It’s a new record.

The maximum value of millis should be the max uint32 value, which is 4'294'967'295 ms. That would take 49.7 days for this issue to happen. Dou you know why on ESP32 it overflows on 4294967 ms?

@vshymanskyy that’s a good point about the 50 days.

I rarely use the ESP32 and most of what I do is for ESP8266’s.

uint32 when used for deepSleep() is based on microseconds rather than milliseconds and therefore maximum deepSleep was 71 minutes until recently.

uint64 is now available which in theory gives a maximum deepSleep of thousands of years. Espressif have capped this to a practical level of about 3.5 hours.

Maybe the ESP32 is working on microseconds?

1 Like

Yeh, it was my first question. Here is the answer:

1 Like

Looks like ESP32 still needs a lot of development if it doesn’t understand the basics of a millisecond.

Using unsigned values to calculate time differences has some tricks. In order to determine a time delay from a previously captured time, you have to do a subtract in order for the rollover to be taken care of automatically by overflow and truncation. If you compare the current time to a value + a desired delay, you will have trouble at the rollover.

For instance if the current time recently rolled over and = 5 milliseconds and you previously saved a start time that’s lets say for example, the max value time could be (0xFFFFFFFFUL), and then also, lets assume you’re trying to measure a time delay of 1000 milliseconds.

So, assuming all variables are unsigned long:
SavedTime = 0xFFFFFFFFUL;
CurrentTime = 5;
DesiredDelay = 1000;

In unsigned math, using a subtraction in the comparison yields:
(CurrentTime - SavedTime > DesiredDelay) = (5 - 0xFFFFFFFF > 1000), where 5 - 0xFFFFFFFF = 6. This is the correct comparison: Is the elapsed time (6) greater than 1000 milliseconds.

If you do an addition in the comparison it yields:
(CurrentTime > SavedTime + DesiredDelay) = (5 > 0xFFFFFFFF + 1000), where 0xFFFFFFFF + 1000 = 4095 due to rollover. This is an incorrect comparison: Is 5 greater than 4095. This comparison will be correct except when the current time has rolled over and the saved time has not.

For sure, this line should be rewritten from this:
if (millis() > startConnecting + blynkConnectAttemptTime) {
to this:
if (millis() - startConnecting > blynkConnectAttemptTime) {

I also have questions about rollover affecting the print statement:
Serial.println((overFlowCounter * 4294968) + now);
This could easily not print what you’re expecting. The constant 4294969 is an int. And, “now” can be the max value of an unsigned long. So all together, you could have overflow and wrap around here. Maybe print all 3 values separately to see their true values before the math.

My $0.02. Hope it helps.

Also just noticed this: “((2^32)-1)/1000” from the link to another article you provided. That would defeat using rollover math until this gets fixed. What I said is still value if you don’t get a millis() that has been divided by 1000. %)

4 Likes

@jtthewoodworker thanks. I think you’re completely right, you just expressed my thoughts.

As a conclusion of my initial problem:

  1. I do not use the BlynkTimer, instead I use a slightly modified SimpleTimer with taking care about millis() overflows. Here is the run method:
void MyTimer::run() {
    int i;
    unsigned long current_millis;
    bool overflowIncreased = false;

    // get current time
    current_millis = elapsed();

    for (i = 0; i < MAX_TIMERS; i++) {
        // no callback == no timer, i.e. jump over empty slots
        if (callbacks[i]) {
            // is it time to process this timer ?
            // see http://arduino.cc/forum/index.php/topic,124048.msg932592.html#msg932592
            if (current_millis < prev_millis[i]) { // overflow
                if (!overflowIncreased) {
                    overFlowCounter++;
                    overflowIncreased = true;
                }
                prev_millis[i] = current_millis;
            }

            if (current_millis - prev_millis[i] >= delays[i]) {
                // update time
                prev_millis[i] = current_millis;
                (*callbacks[i])();
            }
        }
    }
}
  1. I do not do any blocking operations 10 seconds around overflows.

And my app working very stable for days. Not a single hang or glitch.