Is there a problem with the Blynk v2.0 server?

After a year of ‘dropouts’ with my simple remote monitoring v1.0 project (and two previous years of 100% solid working), I finally upgraded to v2.0. It has worked solidly for two months, but today it went offline again.
During my v1.0 ‘offlines’ I tried various code changes, and a complete hardware replacement, in case that was the issue. I eventually came to the conclusion that it was probably a server issue, and there was no motivation for the operator to fix the problem.
I had expected better with v2.0.
As the only rescue when the system goes offline is an expensive round trip to press the RESET button on the ESP8266 NodeMCU, I am wondering if anyone else has played with a ‘watchdog’-type function.
My experience with the v1.0 failures was that when it was offline, the serial output was a 5sec repeat of ‘Connecting to 139.59.207.133’ (the old host). To me, this implied that the wifi link to the router had failed (other devices on the same router were all working perfectly), and this needed to be re-initialised before that request would be transmitted.
If a similar problem is occurring with v2.0 (I haven’t yet had time to go out and hook up my laptop to see if the same issue has occurred), I’d like to detect that condition and force a system re-boot.
Alternatively, I could set a flag each time a successful interchange has occurred, and force a reboot if the flag is not set.
To save me hours of trial and error, has anyone played with such a scheme?

(An alternative approach might be to force a reset every hour (for example).)
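The flag idea above can be sketched as a small timeout tracker. This is only a minimal sketch under stated assumptions: the class name `LinkWatchdog` and its methods are hypothetical, what counts as a "successful interchange" is left to the caller, and the timestamps are assumed to come from `millis()` on the ESP8266.

```cpp
#include <cstdint>

// Software "link watchdog": reports when no successful exchange has been
// recorded within a timeout. Times are in milliseconds, as returned by
// millis(), a 32-bit counter that wraps after roughly 49.7 days.
class LinkWatchdog {
public:
    explicit LinkWatchdog(uint32_t timeoutMs, uint32_t nowMs = 0)
        : timeoutMs_(timeoutMs), lastOkMs_(nowMs) {}

    // Call whenever an exchange with the server is known to have succeeded.
    void noteSuccess(uint32_t nowMs) { lastOkMs_ = nowMs; }

    // True once the timeout has elapsed without a recorded success.
    // Unsigned subtraction keeps this correct across the millis() wrap.
    bool expired(uint32_t nowMs) const {
        return static_cast<uint32_t>(nowMs - lastOkMs_) >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t lastOkMs_;
};
```

In a sketch, the timer routine would call `noteSuccess(millis())` after each exchange it trusts, and force a reboot (for example with the ESP8266 core's `ESP.restart()`) once `expired(millis())` returns true. A fixed hourly reset is the same check with `noteSuccess()` never called.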

That’s the Legacy server located in Frankfurt, and that’s where my Legacy projects live. I have one project that uses a NodeMCU to write data to the server once every minute, and this has been running continuously since February 16th 2018. The only interruptions have been when I’ve rebooted my router or switched the power off to do electrical work.
I’ve never had to physically go to the device (which is in my loft) to reboot it.

So, the Frankfurt Legacy server hasn’t been an issue during that time, and if you’ve had issues then I’d say that they would most likely have been caused by your ISP, your Router, your internal WiFi network, your hardware or the way that you’re powering your hardware.

I have some devices that are located at our holiday home in Spain, and I don’t run any Blynk code on these. I have previously had issues with them failing to connect to WiFi or my MQTT server over time, so I developed a hardware watchdog that would reboot the devices if they stopped responding.
The hardware watchdog is designed as a “shield” for the Wemos D1 Mini, and I also have a version that works with the ESP32 Devkit V4 boards, but I’ve never bothered to make one for the NodeMCU as it’s not a board I use very often (the Wemos D1 Mini is basically the same, but with a smaller form-factor and with the unusable pins omitted). Here’s a link to the D1 Mini version of the hardware watchdog…

Building the board, and getting (or building) a suitable programmer for the ATtiny85 is a reasonable investment in time, but nothing compared to making a round-trip to Spain from London to reboot a device.

Pete.

Many thanks Pete, and your link is most interesting.
I obviously suspected my local system but, as it had run continuously for a couple of years and the dropouts only began during the run-up to the v1.0 server shutdown, I’d kind of assumed server issues. This seemed to be confirmed when I went to v2.0, and all seemed solid for a couple of months.
If I assume that a glitch on my system has caused an interruption in ESP-Router comms, is there any feedback from Blynk.run or Blynk.virtualWrite (called every second in myTimerEvent) that indicates failure to communicate?
Or is there a Blynk function that returns a ‘healthy’ condition?
Dave

If you’re using Blynk.begin() then your options are limited because it’s a blocking function.

If you switch to managing your own WiFi connection and using Blynk.config() and Blynk.connect() then it’s non-blocking, and you can then use Blynk.connected() to check if you are connected to the Blynk server.

BTW, I didn’t make it clear that the Legacy project is still running perfectly. I intend to use it as my way of knowing when the servers have been taken down.

Pete.

So I think you are suggesting I use Blynk.config() and Blynk.connect() within my setup().
Can I then use Blynk.connected() in my 1sec timer routine, and then Blynk.connect() if failed?

There’s more to it than that, but yes.

You can, but it sounds like an ESP.reset() may be needed at some point.

Pete.

Just trying to learn about these functions at the moment Pete.
Is ESP.reset() already a library function?
Presume it cuts its own throat, which sounds like a more reliable plan than trusting to Blynk.connect()??

It’s a command that is part of the ESP8266 core, which you will have installed (and hopefully updated) at some point (Tools > Board > Boards Manager in the IDE).

I’d expect your Blynk.config() sketch to be doing checks to see if it’s connected to WiFi then Blynk on a regular basis and trying to re-establish connections.
However, as you’ve said that your device requires a hardware reset to resolve the issue then using ESP.reset() has a similar effect to a hardware reset and may be needed as a last resort.
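The "WiFi first, then Blynk" check described above can be reduced to one small decision function. This is a minimal sketch, not the sketch Pete refers to: the names `Step` and `nextStep` are hypothetical, and the two inputs are assumed to come from `WiFi.status() == WL_CONNECTED` and `Blynk.connected()`.

```cpp
// What the periodic connection check should do next.
enum class Step { JoinWifi, ConnectBlynk, Run };

// wifiUp  : assumed to be WiFi.status() == WL_CONNECTED
// blynkUp : assumed to be Blynk.connected()
Step nextStep(bool wifiUp, bool blynkUp) {
    if (!wifiUp)  return Step::JoinWifi;      // no point trying Blynk without WiFi
    if (!blynkUp) return Step::ConnectBlynk;  // WiFi is fine, the server link is not
    return Step::Run;                         // everything is up: keep calling Blynk.run()
}
```

In a sketch built on Blynk.config(), `Step::JoinWifi` would map to restarting the WiFi connection and `Step::ConnectBlynk` to a call to Blynk.connect(). Checking WiFi first matters because a Blynk connection attempt cannot succeed while the WiFi link is down.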

Pete.

Thanks Pete.
I think I’m going mad - I was looking for a list and description of the various Blynk functions - as used to be in the v1.0 documentation. But can’t find anything similar on the current website. Have I missed something?
Dave

Do you mean this?…

https://docs.blynk.io/en/blynk.edgent-firmware-api/configuration#blynk.config

Pete.

Perfect. Thank you. I’d seen ‘edgent’ and assumed it didn’t apply to me as I’m not using it - just Blynk.begin() in setup, then Blynk.run() in the loop.
Do I need to be using BlynkEdgent.run, etc? What is the difference?

I don’t understand your comment & questions in relation to the link I provided. It makes no mention of Blynk Edgent and if you’re planning on going down the Edgent route then that’s a whole different conversation.

Pete.

I’m sorry for any confusion Pete. It is in the URL as …blynk.edgent-firmware… etc.
But you have answered my question. Thank you.
I’ll go away and work out a possible fix now and seek your comments later, if I may.
Dave

This sketch might help point you in the right direction…

It doesn’t have any ESP.reset() option after multiple failed connection attempts, but it should be a useful starting point.
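One way the missing "reset after multiple failed attempts" step might look is a counter of consecutive failures. This is a hedged sketch rather than part of the linked sketch: `RetryBudget` is a hypothetical name, and the failure limit is arbitrary and would need tuning on site.

```cpp
#include <cstdint>

// Counts consecutive failed connection attempts and reports when a
// last-resort reset is due.
class RetryBudget {
public:
    explicit RetryBudget(uint8_t maxFailures) : max_(maxFailures) {}

    // Call after any successful connection; clears the count.
    void recordSuccess() { failures_ = 0; }

    // Call after each failed attempt. Returns true when the caller
    // should reset the device (for example with ESP.reset()).
    bool recordFailure() {
        if (failures_ < max_) ++failures_;  // saturate instead of overflowing
        return failures_ >= max_;
    }

    uint8_t failures() const { return failures_; }

private:
    uint8_t max_;
    uint8_t failures_ = 0;
};
```

Counting only consecutive failures means an occasional hiccup that recovers by itself never triggers a reset.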

Pete.

Thanks.

Had to revisit this as my system has become completely unreliable.
History:

  1. Worked for several years as a legacy system with no dropouts.
  2. Prior to the ending of the legacy system, started to go offline more and more often, requiring a physical reboot to re-connect. I assumed this might be a nudge from the legacy servers to push me into V2.
  3. Converted code to V2 and all good for many months.
  4. Dropouts again.
  5. Added timeout event code as per Peter’s excellent suggestion above.
  6. No improvement.
  7. Went for the ‘nuclear option’! Added an external timeclock that killed ESP power for 5 minutes at midnight.
  8. Good for some months, then it went into a bistable mode where every other boot-up resulted in no connection. So good one day, but off the next.
  9. Added an external physical watchdog. This would pulse the RST line of the ESP if a 1 second toggling output in the timer subroutine didn’t occur. This can’t fail, I thought - it will keep on resetting until the void loop is executing, which only happens when we have a successful connection. And so it did until 12 hours later, and it’s offline again.

This remote monitoring station is a 100 mile round trip away and basically it doesn’t work anymore. It used to.
All suggestions gratefully received.
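One possible gap in step 9, offered as an assumption rather than a diagnosis: once the first connection has succeeded, the loop and timer can keep running while the server link is down, so an unconditional toggle would keep feeding the external watchdog. A minimal sketch of a heartbeat that only toggles while connected follows; the class name `Heartbeat` is hypothetical, and the `linkUp` input is assumed to come from `Blynk.connected()`.

```cpp
// Heartbeat for an external watchdog: the output level only changes while
// the server link is up, so the watchdog stops seeing edges when it drops.
class Heartbeat {
public:
    // Call once per timer tick (1 second in the post above).
    // Returns the level to write to the watchdog pin.
    bool tick(bool linkUp) {
        if (linkUp) level_ = !level_;  // toggle only when connected
        return level_;
    }

private:
    bool level_ = false;
};
```

In the timer routine this would be used as `digitalWrite(WD_PIN, hb.tick(Blynk.connected()))`, where `WD_PIN` is a placeholder for whichever GPIO drives the watchdog. The external watchdog timeout then needs to be longer than a normal reconnect takes, otherwise it will reset the device during recoverable hiccups.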

This sounds like a router (or possibly ISP) issue to me.
Have you tried adding a system to reboot your router regularly?

There’s also a possibility that it’s a DNS issue, with the wrong server being returned by the DNS lookup. This would most likely give an “Invalid Auth token” message in your serial monitor, but I guess you’re not monitoring that.
Bad DNS lookup is more common with GSM modems, due to the nature of the beast, but it can happen with other systems too. I think the Blynk servers try to mitigate this in a way, but I guess that’s not always possible.
The solution to this is probably to include the subdomain for your regional server in the Blynk.begin() command.

Experimenting with SSL or other ports may also be worthwhile.

If it’s a 100 mile round trip then I’d probably have multiple devices at the remote location, each using a different strategy to see which works best.

Pete.

Pete, many thanks for your kind help.
The router is a SIM based 3G/4G device that independently uploads a couple of CCTV cameras, so I can confirm that it is still active. (Interestingly, I have forced it to 3G because in auto mode it would always favour the 4G connection. I discovered that whilst the 4G gave much better download speeds, its upload rate was much worse than the 3G!)
I can understand that it might fail to connect on occasion, but I would have thought the continual rebooting would eventually succeed.
When I have been on-site when connection has failed, the serial debug output confirms router connection, but no response. I’ll try to put together some output text. I can’t remember how to include code/text.

That doesn’t mean much in my experience.
Just because the device has established a WiFi connection to the router and obtained an IP address (assuming that is happening) it doesn’t mean that the router will route the data packets correctly.

Also, having CCTV cameras connected and working doesn’t mean that the router is functioning correctly. It could still be misrouting the Blynk data packets.

I’d certainly try rebooting the router when you have these issues, and I’d put the regional subdomain in the Blynk.begin() command. I’d also ensure that you’re using port 8080 or 443 rather than the default port 80, which the ISP may be blocking/throttling/terminating if they suspect that you’re hosting a web page.

Are you using a Vodafone sim in your router by any chance?

BTW, I just looked up your location to see if you were in South Africa (there have been issues with Vodafone SA in the past) and realised that you are in Nottingham - my neck of the woods originally.

Pete.

Thanks Pete.
Yes, Nottingham and Three.
I’ve set up my bench version and a hotspot on a tablet with a fairly crappy 3G signal (but strong wifi).
Just running now and I see a hiccup. (Phone app not currently running.)
Also occasional and random [06] packets.
What are they, and what is “Cmd error” - I’ve searched the Blynk site and can’t find any reference.

[311566] <[14|02]w[00|06]vw[00]7[00]0
[311633] <[14|02]x[00|06]vw[00]9[00]1
[311700] <[14|02]y[00|07]vw[00]11[00]1
[311767] <[14|02]z[00|07]vw[00]13[00]0
[311834] <[14|02]{[00|07]vw[00]14[00]1
[312365] <[14|02]|[00|0B]vw[00]6[00]11.786
[312388] <[06|02]}[00|00]
[312432] <[14|02]~[00|06]vw[00]1[00]1
[312499] <[14|02|7F|00|06]vw[00]4[00]1
[312566] <[14|02|80|00|06]vw[00]7[00]0
[312633] <[14|02|81|00|06]vw[00]9[00]1
[312700] <[14|02|82|00|07]vw[00]11[00]1
[312767] <[14|02|83|00|07]vw[00]13[00]0
[312834] <[14|02|84|00|07]vw[00]14[00]1
[313365] <[14|02|85|00|0B]vw[00]6[00]11.786
[313432] <[14|02|86|00|06]vw[00]1[00]1
[313499] <[14|02|87|00|06]vw[00]4[00]1
[313566] <[14|02|88|00|06]vw[00]7[00]0
[313633] <[14|02|89|00|06]vw[00]9[00]1
[313700] <[14|02|8A|00|07]vw[00]11[00]1
[313767] <[14|02|8B|00|07]vw[00]13[00]0
[313834] <[14|02|8C|00|07]vw[00]14[00]1
[314365] <[14|02|8D|00|0B]vw[00]6[00]11.802
[314432] <[14|02|8E|00|06]vw[00]1[00]1
[320836] <w[00]1[00]1
[327086] Cmd error
[327388] Connecting to blynk.cloud:80
[hostByName] request IP for: blynk.cloud
[hostByName] Host: blynk.cloud IP: 159.65.55.83
[333391] Connecting to blynk.cloud:8080
[hostByName] request IP for: blynk.cloud
[hostByName] Host: blynk.cloud IP: 159.65.55.83
[339396] Connecting to blynk.cloud:80
[hostByName] request IP for: blynk.cloud
[hostByName] Host: blynk.cloud IP: 159.65.55.83
[343197] <[1D|00|01|00|20]30fSk8zmxzsd3TWGl9SZxPLjePRaB0B1
[343707] >[00|00|01|00|C8]
[343707] Ready (ping: 510ms).
[343707] Free RAM: 45072
[343774] <[11|00|02|00]zmcu[00]0.0.0[00]fw-type[00]TMPLr84IvuN1[00]build[00]Oct[20]14[20]2023[20]17:51:37[00]blynk[00]1.3.2[00]h-beat[00]45[00]buff-in[00]1024[00]dev[00]ESP8266[00]tmpl[00]TMPLr84IvuN1
[343854] >[00|00|02|00|C8]
[344365] <[14|00|03|00|0B]vw[00]6[00]11.725
[344433] <[14|00|04|00|06]vw[00]1[00]1
[344500] <[14|00|05|00|06]vw[00]4[00]1
[344567] <[14|00|06|00|06]vw[00]7[00]0
[344634] <[14|00|07|00|06]vw[00]9[00]1
[344701] <[14|00|08|00|07]vw[00]11[00]1
[344768] <[14|00|09|00|07]vw[00]13[00]0
[344835] <[14|00|0A|00|07]vw[00]14[00]1
[345365] <[14|00|0B|00|0B]vw[00]6[00]11.740
[345433] <[14|00|0C|00|06]vw[00]1[00]1
[345500] <[14|00|0D|00|06]vw[00]4[00]1
[345567] <[14|00|0E|00|06]vw[00]7[00]0
[345634] <[14|00|0F|00|06]vw[00]9[00]1
[345701] <[14|00|10|00|07]vw[00]11[00]1
[345768] <[14|00|11|00|07]vw[00]13[00]0
[345835] <[14|00|12|00|07]vw[00]14[00]1
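
On the [06] question: the bracketed bytes in the log look like a 5-byte header (command, 2-byte message id, 2-byte length), with printable bytes shown as characters, so `[14|02]w[00|06]` is `14 02 77 00 06`. The length field matches the payloads above (for example `vw[00]7[00]0` is 6 bytes). The minimal sketch below decodes such a header. The command names are an assumption from reading the Blynk library's protocol definitions, and `BlynkHeader`, `decodeHeader` and `commandName` are hypothetical names.

```cpp
#include <cstdint>
#include <string>

struct BlynkHeader {
    uint8_t  type;    // command code
    uint16_t msgId;   // message counter, big-endian on the wire
    uint16_t length;  // payload length, or a status code in a response
};

// Decode the 5 header bytes shown in brackets in the debug log.
BlynkHeader decodeHeader(const uint8_t bytes[5]) {
    return BlynkHeader{
        bytes[0],
        static_cast<uint16_t>((bytes[1] << 8) | bytes[2]),
        static_cast<uint16_t>((bytes[3] << 8) | bytes[4]),
    };
}

// Names for the command codes that appear in the log. Treat this mapping
// as an assumption and check it against the library source.
std::string commandName(uint8_t type) {
    switch (type) {
        case 0x00: return "response";
        case 0x06: return "ping";
        case 0x11: return "internal";
        case 0x14: return "hardware";
        case 0x1D: return "login";
        default:   return "unknown";
    }
}
```

On that reading, the [06] packets are keep-alive pings with no payload, and `>[00|00|01|00|C8]` is a response carrying status 200 (0xC8). As far as I can tell, "Cmd error" is logged when the library fails to send a complete command, after which it drops the connection and reconnects, which fits the "Connecting to blynk.cloud" lines that follow it.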