Device goes offline in less than one hour

Dana · August 17, 2022, 5:38pm

Using:
• Arduino Uno WiFi Rev2 on a local WiFi network
• iPhone 13 Pro Max running the Legacy Blynk app
• Connected to Blynk cloud. The IDE is the Arduino online editor.

When powered up, the device connects to the local WiFi network, then connects to the Blynk server and works fine, sending a message (uptime) to the Blynk app once every 15 s. I can also receive data sent to the device from the Blynk app. After about 0.5 to 1 hour the app notifies me that the device is offline.
But the device does not think it’s offline. It checks its connection to the Blynk server, and it can ping internet web hosts, including Blynk. I can query the device over the serial port and it’s online. It pings:
IPAddress webhostName(45, 55, 96, 146); // IP address of “blynk-cloud.com” this is the address to which VM count and uptime are sent
Problem occurs both when in stand-alone mode or when connected to a computer and serial monitor.

When I restart my device, it works again for about a half hour. I have two identical devices, they both do it.

I suspected I might be inadvertently sending traffic to the blynk cloud too frequently, but messages sized at less than 30 characters are spaced 15 seconds apart.

My code detects loss of wifi and loss of internet separately and will automatically restart on the former.
The device keeps working right through this ‘device offline’ notification, but of course my messages fail to appear on the app. When I deliberately interrupt its internet connection by rebooting the WiFi router, the device detects it and restarts, at which point it works fine again once the connection is restored, but only for a while. The time it stays online is somewhat random but mostly varies between 30 minutes and an hour, with some outliers.

Is the legacy blynk app (being deprecated in Dec) trying to give me an incentive to upgrade (I’m working on it) by disconnecting legacy devices periodically? My code is over 1200 lines so I won’t post it but it’s been working for over a year. Yes, I’ve made changes recently but, you know… my changes have nothing to do with the problem!!! Really, all I did was convert some inline code to functions. My void loop () is clean. Everything is done from a 15 sec timer routine.
Thanks for any thoughts.
-Dana

Kamran · August 17, 2022, 7:50pm

Hi,
I faced the same problem when I wrote a lot of functions. My device went offline due to overloaded RAM. To solve it, I divided the functions among different timers (not all on one timer). For example, the temperature sensor reading is not so important for me, so I set the timer for the temperature reading function to about 5 minutes.
Also, some timers can fire at the same moment, and that can also overload RAM depending on the code. For example, 15-second and 60-second timers coincide at 60 seconds, 120 seconds, and so on. It is better to choose intervals that do not coincide so soon, like 17-second and 63-second timers.

Good luck,
Kamran
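
To put numbers on the timer-coincidence point above, here is a minimal sketch in plain C++ (not Arduino-specific; the function names are made up for illustration). It only computes when two timers that start together first fire at the same moment. Whether that coincidence actually causes a RAM problem depends on the sketch.

```cpp
#include <cassert>

// Greatest common divisor by Euclid's algorithm.
unsigned long greatestCommonDivisor(unsigned long a, unsigned long b) {
    while (b != 0) {
        unsigned long remainder = a % b;
        a = b;
        b = remainder;
    }
    return a;
}

// First time (in seconds) at which two timers that start together
// both fire in the same second: the least common multiple of the intervals.
unsigned long firstCoincidenceSeconds(unsigned long intervalA, unsigned long intervalB) {
    assert(intervalA > 0 && intervalB > 0);
    return intervalA / greatestCommonDivisor(intervalA, intervalB) * intervalB;
}
```

With 15 s and 60 s the timers coincide every 60 s. With 17 s and 63 s the first coincidence is at 1071 s, a little under 18 minutes.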

Dana · August 18, 2022, 1:01am

thanks, I’ll try that.
But I would think overloaded RAM would crash the device, not just take it offline.
When your device went offline, was it still running like mine does? My code seems perfectly happy. I have the serial monitor open to which I send status messages indicating everything is working properly.
Does the compiler let you know if you are using too much RAM?
You are right that the extra functions are the only thing that changed so that likely points to the problem, and my timer event is very busy.

I just ran a compile and got these numbers for program and dynamic memory usage.

Sketch uses 36668 bytes (75%) of program storage space. Maximum is 48640 bytes.

Global variables use 2432 bytes (39%) of dynamic memory, leaving 3712 bytes for local variables. Maximum is 6144 bytes.

Kamran · August 18, 2022, 5:15am

Dana,
My sketch was using just 28%, but functions take a lot of RAM for calculations, etc.

The Arduino was working normally, but the serial monitor showed high ping and disconnections from the Blynk server.

Dana · August 23, 2022, 2:26am

I’ve simplified my code a little bit by deleting some functions and the online duration increased.

I now detect within a few seconds of when device goes offline and attempt to reconnect without restarting or resetting, which always works. Then it continues to run for about the same length of time before being knocked offline again. It’s a work-around without really solving the problem.

Is there any way to get a status of RAM usage?

I would like to get a look at RAM right when the device gets knocked offline.

My timer only does about four things and setting up a second timer and splitting the work load doesn’t seem like it would save much so I haven’t tried that yet.

bazzio · August 23, 2022, 10:17am

One tip: Throw your ancient uno in the wastebin and purchase a decent modern microcontroller like the Esp32 for a couple of dollars…

Kamran · August 23, 2022, 5:36pm

Have you used delay?
It is better to post your code here if you want help.

Dana · August 23, 2022, 5:46pm

I need EEPROM, which the esp32 doesn’t have. I suppose I could add it but I got a bunch of devices that all use the Uno board that are 3-4 years old.

Kamran · August 23, 2022, 6:28pm

Then use a Mega WiFi board

Kamran · August 23, 2022, 6:28pm

or separate boards

PeteKnight · August 23, 2022, 6:44pm

Of course it does.

However, EEPROM on any dev board, including Arduinos and ESP8266 and ESP32 boards, has a typical life of around 100k write events. As a result it’s recommended that you don’t use the basic EEPROM library and instead use SPIFFS, or the later LittleFS functionality.

Pete.

Dana · August 24, 2022, 11:24am

This reminds me of the time a cop stopped me one winter morning for driving with snow on the roof. Instead of giving me a ticket he berated my Ford Explorer because at the time they tended to roll over.

I don’t want to change hardware to fix what I think is a software problem unless I can fault-isolate the problem to the hardware, such as insufficient RAM.

I added code to report every 15 sec how much free RAM is available, on suspicion of a memory leak. After four hours, free memory had diminished progressively by 4 bytes, from 2886 to 2882. I think that might be explained by the increased size of two variables that are based on uptime. In any case the free RAM did not diminish to nearly zero, which is what I expected to see if there were a memory leak.
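
As a rough check on the free-RAM figures above, the observed shrink rate can be extrapolated to see how long it would take to run out. This is a plain C++ sketch with a hypothetical helper name, and it assumes the shrinkage stays linear, which may not hold in practice.

```cpp
#include <cassert>

// Hours until free RAM would reach zero if it kept shrinking at the
// observed average rate. Returns -1 when no shrinkage was observed.
double hoursUntilExhausted(int startFreeBytes, int endFreeBytes, double elapsedHours) {
    assert(elapsedHours > 0.0);
    int bytesLost = startFreeBytes - endFreeBytes;
    if (bytesLost <= 0) {
        return -1.0;  // free RAM did not shrink, so nothing to extrapolate
    }
    double bytesPerHour = bytesLost / elapsedHours;
    return endFreeBytes / bytesPerHour;
}
```

Going from 2886 to 2882 bytes over four hours is 1 byte per hour, so at that rate the remaining 2882 bytes would last about 2882 hours. That is far longer than the 30 to 60 minutes between offline events, which fits the conclusion that a leak is not the cause.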

My EEPROM use is limited to one variable, an integer, and it’s stored in EEPROM so that it will survive a power outage. It is written to EEPROM only when it changes, which is rarely, maybe once or twice per day. I don’t expect 100k writes over the life of my device, but during development testing it does get exercised. My plan, if it wears out, was just to move the variable to a different EEPROM location. I’ve thought about routinely rotating its storage location around, say over 100 locations, so as to not wear out one particular location, but so far that’s been unnecessary.
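
The rotation idea can be sketched as follows. This is a desktop simulation only: an in-memory array stands in for EEPROM, and `Slot`, `RotatingStore` and `worstSlotWear` are hypothetical names, not part of any Arduino library. Each write goes to the next slot along with an increasing sequence number, so after a power loss the newest value can be found by scanning for the highest sequence number.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One simulated EEPROM record.
struct Slot {
    uint32_t sequence = 0;    // 0 means "never written"
    int16_t value = 0;        // the stored integer
    uint32_t writeCount = 0;  // simulation only: wear on this slot
};

template <std::size_t N>
class RotatingStore {
public:
    // Store a new value in the slot after the most recently written one.
    void write(int16_t value) {
        std::size_t newest = newestIndex();
        uint32_t nextSequence = slots_[newest].sequence + 1;
        bool empty = slots_[newest].sequence == 0;
        std::size_t target = empty ? 0 : (newest + 1) % N;
        slots_[target].sequence = nextSequence;
        slots_[target].value = value;
        slots_[target].writeCount += 1;
    }

    // Return the most recently written value (0 if nothing was written).
    int16_t read() const { return slots_[newestIndex()].value; }

    // Highest number of writes any single slot has received.
    uint32_t maxWriteCount() const {
        uint32_t worst = 0;
        for (const Slot& slot : slots_) {
            if (slot.writeCount > worst) worst = slot.writeCount;
        }
        return worst;
    }

private:
    // Index of the slot holding the highest sequence number.
    std::size_t newestIndex() const {
        std::size_t best = 0;
        for (std::size_t i = 1; i < N; ++i) {
            if (slots_[i].sequence > slots_[best].sequence) best = i;
        }
        return best;
    }

    std::array<Slot, N> slots_{};
};

// Simulate `writes` updates over 100 slots and report the worst slot's wear.
uint32_t worstSlotWear(int writes) {
    RotatingStore<100> store;
    for (int i = 0; i < writes; ++i) store.write(static_cast<int16_t>(i));
    return store.maxWriteCount();
}
```

With 100 slots, 250 writes put at most 3 writes on any one slot. For scale, at two writes per day a single location rated for about 100k writes would last roughly 137 years, so rotation matters mainly when writes are much more frequent than that.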

Good to know that these newer boards have EEPROM. I could not find that in the specifications the last time I looked.
Thanks

PeteKnight · August 24, 2022, 11:42am

I’m not berating you for your choice of hardware, simply pointing out an incorrect statement that you’ve made. The problem is that others will read this and repeat it if inaccurate statements aren’t challenged.

Having said that, I dislike using hardware like the Uno, Nano, Mega etc for IoT solutions, because they don’t have native support for internet connectivity.
Going down the route of adding WiFi connectivity to 12 year old technology is a bit like adding aircon to a horse drawn cart - it can be done, but why not drive an SUV instead?

TBH, you’re wasting your limited available time messing around with Legacy. You’d be better migrating to Blynk IoT, even if it’s just a free account for testing purposes. If you’re going to do this migration then why not spend a few dollars on buying some up to date hardware as well?

Pete.

Dana · August 25, 2022, 3:34am

Ok. Point taken. Still I would like to migrate with working code.

I don’t think my code or the hardware is causing the device to go offline. I think it’s being booted offline by the Blynk server because of misbehavior. It’s violating a rule, I think, so I need to do a little research into why Blynk would knock a device offline. For instance, I ping the Blynk server every 15 s to see if there’s an internet connection. I assumed that once a ping is started, it pings once and stops. It just occurred to me that maybe the ping keeps pinging until commanded to turn off, which I don’t do.

PeteKnight · August 25, 2022, 6:48am

Maybe you should share your code.

Pete.

Madhukesh · August 26, 2022, 5:49am

If you are flooding the server with 1000’s of requests per second, then the server would stop responding to your device.

Dana · August 30, 2022, 1:49am

Definitely not sending traffic to Blynk server thousands of times per second although I don’t know how to prove that. Maybe with a network analyzer which I don’t own.

In trying to fault-isolate which of my functions might be the culprit, I added some monitoring code that counts the number of times selected functions execute, to see if the disconnect events would correlate with those execution counts. I looked at how many times a function executed before a disconnect and found no correlation. Numbers varied from 50 to 28,000 times between disconnects. Free memory available at the time of disconnect remained constant at about 2600 bytes.

Since 99% of what the program does occurs in myTimer, I made the timer run less frequently by a factor of 10. If the number of device offline disconnects reduces in frequency accordingly then the problem is with code within myTimer. If not, the problem might lie in the void loop().

After increasing the timer interval, results were that the frequency of disconnects did not change much. For example, there were 19 disconnects/reconnects in 33,399 sec uptime. That averages 1 disconnect every 2200 seconds but the time between disconnects is highly variable ranging from 75 seconds to 6225 seconds. Since the timer interval is set at 75 seconds and the !Blynk.connected check is done in the timer it means the disconnect was not discovered in real time, but at the first 75-sec check. It could have been much sooner.

Before the 10x decrease in timer run frequency, the average time-to-disconnect was less, but not 10x less. Again, the averages for sample sizes of more than 16 disconnects varied from 327 sec to 2,187 sec. Given the random variance, these sample sizes are likely too small.

As I write this it’s been running for 13,517 seconds with 13 disconnects averaging a disconnect every 1,039 sec.
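
The running average quoted above is just uptime divided by the number of disconnects. A minimal plain C++ sketch, with a hypothetical function name and integer seconds assumed:

```cpp
#include <cassert>

// Mean time between disconnects, in whole seconds (integer division).
unsigned long meanSecondsBetweenDisconnects(unsigned long uptimeSeconds,
                                            unsigned long disconnectCount) {
    assert(disconnectCount > 0);
    return uptimeSeconds / disconnectCount;
}
```

For 13,517 seconds of uptime and 13 disconnects this gives 1,039 seconds. The mean alone hides how widely the individual intervals vary, so logging each interval is more informative than the average.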

The void loop is calling one function because it must execute every 150 ms so an if-statement is used to schedule its execution based on millis() mod 150. I’ll look at that again but when I commented out the two calls to that one function, so it ran zero times from either the void loop or the timer, the disconnects persisted.
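
One thing worth checking in the 150 ms scheduling: if the test is something like `millis() % 150 == 0` (an assumption, since the sketch isn't shown), it only fires when the loop happens to sample an exact multiple of 150 ms, so runs can be skipped. The usual alternative compares elapsed time. The plain C++ simulation below uses a fake clock in place of `millis()`; `PeriodicTask` and the two counting functions are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Fires once per interval using unsigned elapsed-time arithmetic,
// which also stays correct when the millisecond counter wraps around.
struct PeriodicTask {
    uint32_t intervalMs;
    uint32_t lastRunMs;

    bool due(uint32_t nowMs) {
        if (nowMs - lastRunMs >= intervalMs) {
            lastRunMs += intervalMs;  // keep a steady cadence, no drift
            return true;
        }
        return false;
    }
};

// Count runs of a 150 ms task when the loop samples the clock every stepMs.
int countElapsedRuns(uint32_t stepMs, uint32_t totalMs) {
    assert(stepMs > 0);
    PeriodicTask task{150, 0};
    int runs = 0;
    for (uint32_t now = 0; now < totalMs; now += stepMs) {
        if (task.due(now)) ++runs;
    }
    return runs;
}

// Same loop, but using the modulo test instead.
int countModuloRuns(uint32_t stepMs, uint32_t totalMs) {
    assert(stepMs > 0);
    int runs = 0;
    for (uint32_t now = 0; now < totalMs; now += stepMs) {
        if (now % 150 == 0) ++runs;
    }
    return runs;
}
```

With a loop pass every 7 ms over 1.5 s, the elapsed-time version runs 9 times while the modulo version runs only twice (at 0 ms and 1050 ms). If the loop runs faster than once per millisecond, the modulo test can instead fire several times within the same millisecond.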

Madhukesh · August 30, 2022, 4:02am

You don’t have to own anything. Just post your sketch, properly formatted with (```) at the beginning and end of the sketch.

We will help you.

PeteKnight · August 30, 2022, 6:38am

Pete.

Dana · August 31, 2022, 9:45pm

It’s 1200 lines of code including comments. If I was asked to review it for bugs, I’d be overwhelmed. It’s too much to expect anyone to wade through that much code who is not being paid. And that is why I haven’t posted it even though more than one of you have asked for it. But I’m willing to post it since I’m asking for help and of course the code is what you need.

So, I will redact the auth codes, passwords, phone numbers and email addresses and post it. I’ll be stunned and amazed if anyone figures it out. I’m assuming it’s just a cut and paste.

Should I try to provide some sort of guide? I used a lot of comments, mostly for my own benefit, so that too will overwhelm. A lot of the code was added for debugging, so it’s a lot bigger than it needs to be. I would like to provide an index showing where setup(), void loop(), myTimer and the functions are located, but the line numbers are not copied, so there are no reference points.