ESP8266 Library, call from a device not registered or wrong account

Apparently the timeouts in thing.handle() in the ESP8266 library are either non-existent or set very long.

When thing.handle() is called every second but gets no positive response from thinger.io (because the device is not registered, the credentials are wrong, or Thinger is temporarily unavailable), the ESP stalls: it stops responding and even randomly refuses uploads.
Even when the configuration is correct, the ESP sometimes stalls on WiFi for 3-4 seconds before carrying on (on bad days several times an hour, on good days practically never), which is harmful if you need to run other tasks on a fixed schedule.
Could you please fix that?
Regards

Hi @rin67630, if your use case goes beyond a simple example of connecting a device, I recommend using an RTOS. We have tested FreeRTOS with Thinger.io on the ESP32, where one task calls thing.handle() while another task runs the normal loop, and it works perfectly. In fact, we are planning to release a new library next month with this functionality and OTA updates (for the ESP32 first).
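To illustrate why the two-task layout helps, here is a portable C++ model, not ESP32/FreeRTOS code: std::thread stands in for the two FreeRTOS tasks, and fake_handle() is a hypothetical placeholder for thing.handle() that stalls on every fifth call. max_tick_gap_ms() (also a hypothetical name) measures the worst gap in a periodic application loop, once with the stalling call inline and once with it in its own task.

```cpp
// Portable model of "one task for handle(), one for the application".
// std::thread stands in for FreeRTOS tasks; fake_handle() is a hypothetical
// placeholder for thing.handle() that occasionally blocks.
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;
using std::chrono::milliseconds;

// Hypothetical stand-in for thing.handle(): every 5th call blocks for `stall`.
void fake_handle(int call, milliseconds stall) {
    std::this_thread::sleep_for(call % 5 == 4 ? stall : milliseconds(1));
}

// Runs `ticks` iterations of a periodic application loop and returns the
// largest gap (ms) observed between consecutive iterations.
//   separate_task == true : fake_handle() runs in its own thread (RTOS-style).
//   separate_task == false: fake_handle() is called inline in the same loop.
long long max_tick_gap_ms(bool separate_task, int ticks,
                          milliseconds period, milliseconds stall) {
    std::atomic<bool> running{separate_task};
    std::thread network;
    if (separate_task) {
        network = std::thread([&running, stall] {
            for (int call = 0; running; ++call) fake_handle(call, stall);
        });
    }

    long long max_gap = 0;
    auto last = Clock::now();
    auto next = last;
    for (int i = 0; i < ticks; ++i) {
        if (!separate_task) fake_handle(i, stall);  // blocking call in the loop
        next += period;
        std::this_thread::sleep_until(next);
        const auto now = Clock::now();
        const long long gap =
            std::chrono::duration_cast<milliseconds>(now - last).count();
        max_gap = std::max(max_gap, gap);
        last = now;
    }

    running = false;
    if (network.joinable()) network.join();  // waits for the current call to end
    return max_gap;
}
```

With the stall inline, the loop's worst gap is at least the stall length; with the stall in its own task, the loop keeps roughly its 20 ms period. On the ESP32 itself the threads would be FreeRTOS tasks (for example created with xTaskCreatePinnedToCore), and any data shared between them needs the same care (atomics, mutexes or queues) as in this model.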

Isn’t it a bit harsh to ask me to rewrite the app for another environment, just because the Thinger handle does not behave nicely?
:frowning:
Frankly, I would have left Thinger…
if it were not by far the best available on the market!
:heart_eyes:

Thanks for your kind words @rin67630, even if the handle method does not behave well for your use case :slight_smile:

It is hard to make it more predictable across every kind of device, since some connect over WiFi, others over cellular or Ethernet, each with its own libraries from different vendors like Espressif, Arduino, etc. That is why I suggested what I consider the best way to solve the problem in big projects.

However, if you want to go further with the libraries, there is an alternative: modify ThingerWifi.h, which contains the timeouts I think you are looking for, one for connecting to the WiFi and another for getting the IP address (both 30 seconds). Take a look at the code here: https://github.com/thinger-io/Arduino-Library/blob/master/src/ThingerWifi.h

```cpp
    virtual bool connect_network(){
        if(wifi_ssid_==NULL){
            THINGER_DEBUG("NETWORK", "Cannot connect to WiFi. SSID not set!");
            return false;  // without an SSID there is nothing to connect to
        }

        unsigned long wifi_timeout = millis();
        THINGER_DEBUG_VALUE("NETWORK", "Connecting to network ", wifi_ssid_);

        if(wifi_password_!=NULL){
            WiFi.begin((char*)wifi_ssid_, (char*) wifi_password_);
        }else{
            WiFi.begin((char*)wifi_ssid_);
        }

        // Timeout 1: wait up to 30 s for the WiFi association.
        while( WiFi.status() != WL_CONNECTED) {
            if(millis() - wifi_timeout > 30000) return false;
            #ifdef ESP8266
            yield();  // keep the ESP8266 background tasks (and watchdog) running
            #endif
        }
        THINGER_DEBUG("NETWORK", "Connected to WiFi!");
        wifi_timeout = millis();
        THINGER_DEBUG("NETWORK", "Getting IP Address...");
        // Timeout 2: wait up to 30 s for an IP address (DHCP).
        while (WiFi.localIP() == INADDR_NONE) {
            if(millis() - wifi_timeout > 30000) return false;
            #ifdef ESP8266
            yield();
            #endif
        }
        THINGER_DEBUG_VALUE("NETWORK", "Got IP Address: ", WiFi.localIP());
        return true;
    }
```
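Both waits above follow the same poll-until-ready-or-timeout pattern with a hard-coded 30000 ms. A minimal sketch of that pattern as a reusable helper with the timeout as a parameter; wait_until() and timed_out() are hypothetical names, and the clock and idle hooks are injected so the logic can run off-device (on the ESP8266 they would be millis() and yield()).

```cpp
// Sketch of the poll-until-ready-or-timeout pattern used in connect_network().
// Hypothetical helper names; clock and idle hook are injected for testing.
#include <cassert>
#include <cstdint>
#include <functional>

// Unsigned subtraction is modulo 2^32, so `now - start` stays correct even
// when a 32-bit millisecond counter wraps around (about every 49.7 days).
bool timed_out(std::uint32_t start, std::uint32_t now, std::uint32_t timeout_ms) {
    return static_cast<std::uint32_t>(now - start) > timeout_ms;
}

// Polls `ready` until it returns true (-> true) or `timeout_ms` elapses (-> false).
bool wait_until(const std::function<bool()>& ready,
                const std::function<std::uint32_t()>& now_ms,
                const std::function<void()>& idle,
                std::uint32_t timeout_ms) {
    const std::uint32_t start = now_ms();
    while (!ready()) {
        if (timed_out(start, now_ms(), timeout_ms)) return false;
        idle();  // e.g. yield() on the ESP8266
    }
    return true;
}
```

Lowering timeout_ms shortens the worst-case stall, but the call still blocks for up to that long; it bounds the wait rather than removing it.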

Hope it helps!

OK, you are the expert, but I do not believe that
virtual bool connect_network()
is the right place to find (or to insert) a timeout for the Thinger handle().
virtual bool connect_network() is supposed to run during setup to establish the WiFi connection, isn't it?
The Thinger handle() in question runs later, in the loop.
Regards and stay healthy
Laszlo

The handle method takes care of network connectivity, server authentication, and message processing, so it does call this method from the loop, for sure (I wrote those libraries :slight_smile:). What you do in setup is just set the parameters used for that connection; the connection is not made until the handle method is executed. This is necessary because an IoT device should be permanently connected: if the WiFi router is restarted, a cellular device moves from one cell tower to another, the server is restarted or crashes, etc., the device keeps trying to re-establish the connection. That would not be possible if the connection were only made in setup.
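As described, the current handle() waits inside each stage (network, connection, authentication) until it completes or times out. For comparison, a minimal sketch of a non-blocking alternative: a state machine where each call advances at most one stage and returns immediately, falling back a stage when a layer drops so reconnection still happens from loop(). The Link interface, the Connection class and the State names are hypothetical, not part of the Thinger library.

```cpp
// Hypothetical non-blocking connection manager: each step() performs at most
// one check and returns, instead of waiting inside the call.
// Link is an assumed interface standing in for WiFi/socket/auth calls.
#include <cassert>

struct Link {
    virtual bool network_up() = 0;      // e.g. WiFi associated with an IP
    virtual bool socket_open() = 0;     // TCP connection to the server is up
    virtual bool authenticated() = 0;   // server accepted the credentials
    virtual void process_messages() = 0;
    virtual ~Link() = default;
};

enum class State { Network, Socket, Auth, Ready };

class Connection {
public:
    explicit Connection(Link& link) : link_(link) {}

    // Call from loop(); never waits. When a lower layer drops, fall back
    // so that later calls reconnect automatically.
    State step() {
        switch (state_) {
        case State::Network:
            if (link_.network_up()) state_ = State::Socket;
            break;
        case State::Socket:
            if (!link_.network_up()) state_ = State::Network;
            else if (link_.socket_open()) state_ = State::Auth;
            break;
        case State::Auth:
            if (!link_.socket_open()) state_ = State::Socket;
            else if (link_.authenticated()) state_ = State::Ready;
            break;
        case State::Ready:
            if (!link_.network_up()) state_ = State::Network;
            else if (!link_.socket_open()) state_ = State::Socket;
            else link_.process_messages();
            break;
        }
        return state_;
    }

private:
    Link& link_;
    State state_ = State::Network;
};
```

A real implementation would still need per-stage timeouts and back-off (for example when the server keeps rejecting unknown credentials), and the underlying socket and read calls would themselves have to be non-blocking for this to help.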

I believe the problem does not lie in a lost WiFi connection, but in the server-client handshake procedure.
WiFi stability is a given. Fair enough.
The stalls happen 100% of the time when the connection to Thinger is established but the device is not recognized, and sporadically when Thinger (busy elsewhere?) does not respond quickly enough to the handle() messages.

Have a nice Sunday!

Do you have any proof of that 100%? Does it also happen without your custom code in the sketch? There are a lot of reasons why the handle can wait longer than expected (in the worst case it should not take more than 30 seconds), and they do not directly depend on the server being busy or not responding quickly enough (the server is running idle, so that should not be the problem). The most common causes are a broken TCP connection that the device tries to read from or write to, or a WiFi connection that is reconnecting (even with a good WiFi signal, devices often disconnect), which involves handshakes, DHCP, etc. And as I said, the handle will wait for these to complete.

Moreover, plenty of factors can affect communications and look like a library problem: a power supply that cannot deliver enough current for the WiFi peaks (this happens quite a lot with mobile modems), failing I2C communication with sensors that halts the sketch, router problems, interference, running out of memory, etc. So, if you want our handle to be fixed, we would need a reproducible problem…

You can also take a look at other timeouts the library uses while reading from the server, like DEFAULT_READ_TIMEOUT or RECONNECTION_TIMEOUT.
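Whether constants like DEFAULT_READ_TIMEOUT and RECONNECTION_TIMEOUT can be changed without editing the library depends on how its header defines them. A sketch of the common overridable-default pattern, using hypothetical MYLIB_* names and placeholder values (not the Thinger library's): it only works if the header wraps each constant in #ifndef; if it uses a plain #define, the header itself has to be edited.

```cpp
// Overridable compile-time defaults. Names and values are hypothetical.
#include <cassert>
#include <cstdint>

// (1) In the sketch, BEFORE the library header is included:
#define MYLIB_READ_TIMEOUT_MS 2000  // user override, in milliseconds

// (2) What a guarded library header would contain:
#ifndef MYLIB_READ_TIMEOUT_MS
#define MYLIB_READ_TIMEOUT_MS 10000     // placeholder default
#endif
#ifndef MYLIB_RECONNECT_TIMEOUT_MS
#define MYLIB_RECONNECT_TIMEOUT_MS 5000 // placeholder default
#endif

// (3) Library code then sees the override where one was given,
//     and the default otherwise.
constexpr std::uint32_t kReadTimeoutMs = MYLIB_READ_TIMEOUT_MS;
constexpr std::uint32_t kReconnectTimeoutMs = MYLIB_RECONNECT_TIMEOUT_MS;
```

A #define in the sketch only affects code compiled in the same translation unit, so a constant that is also used in a library's separate .cpp files has to be set as a compiler flag instead (for example -DMYLIB_READ_TIMEOUT_MS=2000).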

"where the handle can be waiting more than expected (it should not take in the worst case more than 30 seconds).

It is never sooo bad.
However, I would have expected the timeout to be around 0.5 seconds. Usually thing.handle(), even with a lot of payload, takes just a few milliseconds.

When I find some time, I will build a measuring example based on the default sketch.


Nice @rin67630, it would be so helpful!

Here is the test routine:

```cpp
#include <ThingerESP8266.h>

#define SSID "Your SSID"
#define SSID_PASSWORD "Your Password"

unsigned long LastMillis;  // millis() returns unsigned long
int MillisDiff;            // duration of the last thing.handle() call, in ms
int MaxMillisDiff;         // longest handle() call seen so far, in ms

ThingerESP8266 thing("YourThinger Name", "Your Device Name", "Your device credentials");

void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
  Serial.begin(9600);

  Serial.println("Getting WiFi...");
  // Only stores the credentials; the connection itself is made inside thing.handle().
  thing.add_wifi(SSID, SSID_PASSWORD);
  Serial.println("Got WiFi!");

  thing["led"] << digitalPin(LED_BUILTIN);

  // resource output example (i.e. reading a sensor value)
  thing["millis"] >> outputValue(millis());

  // more details at http://docs.thinger.io/arduino/
}

void loop() {
  digitalWrite(LED_BUILTIN, LOW);   // LED on while handle() runs (active-low on most ESP8266 boards)
  LastMillis = millis();
  thing.handle();
  MillisDiff = millis() - LastMillis;
  MaxMillisDiff = max(MillisDiff, MaxMillisDiff);
  digitalWrite(LED_BUILTIN, HIGH);  // LED off
  Serial.printf("MillisDiff= %i, MaxMillisDiff= %i\n", MillisDiff, MaxMillisDiff);
  if (millis() < 5000) MaxMillisDiff = 0;  // discard the maximum during the first 5 s
  delay(1000);
}
```

Using a device that Thinger does not know (wrong device or wrong credentials):

17:37:34.848 -> Getting WiFi…
17:37:34.949 -> Got WiFi!
17:37:44.536 -> MillisDiff= 9600, MaxMillisDiff= 9600
17:37:51.205 -> MillisDiff= 5677, MaxMillisDiff= 9600
17:37:57.880 -> MillisDiff= 5688, MaxMillisDiff= 9600
17:38:04.548 -> MillisDiff= 5689, MaxMillisDiff= 9600
17:38:11.270 -> MillisDiff= 5708, MaxMillisDiff= 9600
17:38:18.447 -> MillisDiff= 6145, MaxMillisDiff= 9600
17:38:25.116 -> MillisDiff= 5687, MaxMillisDiff= 9600
17:38:31.792 -> MillisDiff= 5689, MaxMillisDiff= 9600
17:38:38.513 -> MillisDiff= 5717, MaxMillisDiff= 9600
17:38:45.183 -> MillisDiff= 5685, MaxMillisDiff= 9600
17:38:51.907 -> MillisDiff= 5684, MaxMillisDiff= 9600
17:38:58.583 -> MillisDiff= 5694, MaxMillisDiff= 9600
17:39:05.253 -> MillisDiff= 5693, MaxMillisDiff= 9600
17:39:11.976 -> MillisDiff= 5690, MaxMillisDiff= 9600
17:39:18.652 -> MillisDiff= 5693, MaxMillisDiff= 9600
17:39:25.420 -> MillisDiff= 5750, MaxMillisDiff= 9600

IMHO the handle should not block that long…
If you call the handle every second, the device is effectively frozen.
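The log spacing follows from the loop structure: each iteration spends the handle() time plus the 1 s delay(). Using the typical value from the log above:

```latex
T_{\text{loop}} \approx t_{\text{handle}} + t_{\text{delay}}
  \approx 5.69\,\mathrm{s} + 1.00\,\mathrm{s} = 6.69\,\mathrm{s},
\qquad
\frac{t_{\text{handle}}}{T_{\text{loop}}} \approx \frac{5.69}{6.69} \approx 0.85
```

This matches the roughly 6.7 s between timestamps (e.g. 17:37:44.536 to 17:37:51.205) and means that about 85% of every cycle is spent inside handle(), ignoring the small Serial.printf() overhead.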

Using a recognized device, and letting it run:

18:05:54.810 -> Getting WiFi…
18:05:55.443 -> Got WiFi!
18:05:59.526 -> MillisDiff= 4613, MaxMillisDiff= 4613
18:06:00.529 -> MillisDiff= 0, MaxMillisDiff= 0
18:06:01.533 -> MillisDiff= 0, MaxMillisDiff= 0
18:06:02.536 -> MillisDiff= 0, MaxMillisDiff= 0
18:06:03.539 -> MillisDiff= 0, MaxMillisDiff= 0
18:06:04.562 -> MillisDiff= 0, MaxMillisDiff= 0
18:06:05.561 -> MillisDiff= 0, MaxMillisDiff= 0
18:06:06.518 -> MillisDiff= 0, MaxMillisDiff= 0
18:06:07.521 -> MillisDiff= 0, MaxMillisDiff= 0
18:06:08.525 -> MillisDiff= 0, MaxMillisDiff= 0
18:06:09.528 -> MillisDiff= 0, MaxMillisDiff= 0
18:06:10.532 -> MillisDiff= 0, MaxMillisDiff= 0
18:06:11.535 -> MillisDiff= 0, MaxMillisDiff= 0
18:06:12.538 -> MillisDiff= 0, MaxMillisDiff= 0
18:06:13.542 -> MillisDiff= 0, MaxMillisDiff= 0
18:06:14.546 -> MillisDiff= 1, MaxMillisDiff= 1
18:06:15.549 -> MillisDiff= 1, MaxMillisDiff= 1
18:06:16.553 -> MillisDiff= 1, MaxMillisDiff= 1
18:06:17.533 -> MillisDiff= 0, MaxMillisDiff= 1
…
18:06:53.606 -> MillisDiff= 0, MaxMillisDiff= 1
18:06:54.610 -> MillisDiff= 1, MaxMillisDiff= 1
18:06:55.613 -> MillisDiff= 24, MaxMillisDiff= 24
18:06:56.616 -> MillisDiff= 2, MaxMillisDiff= 24
18:06:57.619 -> MillisDiff= 0, MaxMillisDiff= 24
…

Based on that, I changed the report to show only the cycles longer than 1 ms:

```cpp
if (MillisDiff > 1) Serial.printf("MillisDiff= %i, MaxMillisDiff= %i\n", MillisDiff, MaxMillisDiff);
```

18:25:27.725 -> Getting WiFi…
18:25:27.872 -> Got WiFi!
18:25:32.442 -> MillisDiff= 4593, MaxMillisDiff= 4593
18:26:28.482 -> MillisDiff= 21, MaxMillisDiff= 21
18:27:28.585 -> MillisDiff= 39, MaxMillisDiff= 39
18:27:29.541 -> MillisDiff= 2, MaxMillisDiff= 39
18:28:28.638 -> MillisDiff= 22, MaxMillisDiff= 39
18:28:29.642 -> MillisDiff= 2, MaxMillisDiff= 39
18:29:28.687 -> MillisDiff= 18, MaxMillisDiff= 39
18:30:28.741 -> MillisDiff= 20, MaxMillisDiff= 39
18:30:29.745 -> MillisDiff= 2, MaxMillisDiff= 39
18:31:28.794 -> MillisDiff= 27, MaxMillisDiff= 39

That is acceptable; tomorrow I will check how high MaxMillisDiff got…

Could you run the sketch on an ESP with your credentials inside the server's own LAN, so we could see whether the freezes are due to Internet propagation or to server responses?

Regards.

P.S. After running for about one hour:

19:37:32.378 -> MillisDiff= 22, MaxMillisDiff= 60
19:37:33.382 -> MillisDiff= 2, MaxMillisDiff= 60
19:38:32.440 -> MillisDiff= 22, MaxMillisDiff= 60
19:38:33.444 -> MillisDiff= 2, MaxMillisDiff= 60
19:38:38.461 -> MillisDiff= 2, MaxMillisDiff= 60
19:39:32.486 -> MillisDiff= 11, MaxMillisDiff= 60
19:39:34.238 -> MillisDiff= 743, MaxMillisDiff= 743
19:40:33.295 -> MillisDiff= 16, MaxMillisDiff= 743
19:40:34.298 -> MillisDiff= 2, MaxMillisDiff= 743
19:41:27.321 -> MillisDiff= 2, MaxMillisDiff= 743
19:41:33.342 -> MillisDiff= 19, MaxMillisDiff= 743

Remarkably, a long cycle is very frequently followed by a slightly delayed cycle one second later.

IMHO, anything up to 100 ms is OK, but beyond that, the handle should abort the transaction.
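One way to express a 100 ms budget is a step that stops working at the deadline and resumes on the next call, rather than dropping the connection. A minimal sketch under that assumption; BudgetedProcessor is a hypothetical class, not part of the Thinger library, and the clock and per-item work are injected (the clock would be millis() on the ESP8266).

```cpp
// Hypothetical time-budgeted processing step: handle as much pending work as
// fits in `budget_ms`, then return and resume on the next call.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

class BudgetedProcessor {
public:
    using NowFn = std::function<std::uint32_t()>;  // e.g. millis() on the ESP8266
    using WorkFn = std::function<void(int)>;       // processes one pending item

    BudgetedProcessor(NowFn now, WorkFn work, std::uint32_t budget_ms)
        : now_(std::move(now)), work_(std::move(work)), budget_ms_(budget_ms) {
        assert(budget_ms_ > 0);
    }

    void enqueue(int item) { pending_.push_back(item); }
    std::size_t pending() const { return pending_.size(); }

    // Works through pending items until the queue is empty or the budget is
    // spent; leftovers stay queued for the next call. The budget is checked
    // only between items, so a single slow item can still overrun it.
    std::size_t step() {
        const std::uint32_t start = now_();
        std::size_t done = 0;
        while (!pending_.empty() &&
               static_cast<std::uint32_t>(now_() - start) < budget_ms_) {
            work_(pending_.front());
            pending_.pop_front();
            ++done;
        }
        return done;
    }

private:
    NowFn now_;
    WorkFn work_;
    std::uint32_t budget_ms_;
    std::deque<int> pending_;
};
```

Because the deadline is only checked between items, an item that blocks (a resource callback containing delay(), or a blocking socket read) still overruns the budget, which is the limitation the following reply points out.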

Hi! Thanks for the code. I cannot put an ESP8266 on the “server” LAN. In fact, the public server is a cluster of 5 instances in the Amazon Cloud, in the US, Europe, Indonesia, etc., plus a cluster of 3 MongoDB servers.

IMHO, the handle cannot time out at an arbitrary threshold, as it greatly depends on the WiFi network, the router, the Internet, etc. Setting it as low as 100 ms would certainly cause many server disconnections. In your case 100 ms may be fine, but anyone with a poor connection would suffer while connecting, transmitting data, etc. Moreover, a handle call where the device does nothing (no data in the TCP socket, like the calls taking 1 or 2 ms) is not the same as one that streams resources, like a periodic sensor transmission. Also, a delay inside a resource will affect the handle call as well. So it is not possible to control the handle timeout in that way. I hope you understand.

If you need to execute your tasks in a more predictable manner, take a look at timers or tasks. For example, the MQTT Arduino library uses the CooperativeMultitasking library (only available for SAMD) to solve such problems. For the ESP8266 we can use other libraries, like TaskScheduler.

An example of Thinger used with the task scheduler could look like this:

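```cpp
#define _DEBUG_  // enable Thinger debug output (must come before the includes)
#include <TaskScheduler.h>
#include <ThingerESP8266.h>

#define USERNAME "username"
#define DEVICE_ID "device"
#define DEVICE_CREDENTIAL "credential"

#define SSID "SSID"
#define SSID_PASSWORD "SSIDPASS"

ThingerESP8266 thing(USERNAME, DEVICE_ID, DEVICE_CREDENTIAL);
Scheduler scheduler;

// Application task: runs every 1000 ms, forever.
Task t1(1000, TASK_FOREVER, []{
  Serial.printf("I am running at: %lu\n", millis());  // millis() is unsigned long
});

// Network task: calls thing.handle() on every scheduler pass. The lambda
// captures nothing (thing is a global), so it converts to the plain function
// pointer TaskScheduler expects as a callback by default.
Task t2(0, TASK_FOREVER, []{
  thing.handle();
});

void setup() {
  Serial.begin(115200);
  thing.add_wifi(SSID, SSID_PASSWORD);

  thing["millis"] >> outputValue(millis());

  scheduler.init();
  scheduler.addTask(t1);
  scheduler.addTask(t2);
  t1.enable();
  t2.enable();
}

void loop() {
  scheduler.execute();  // runs the tasks that are due; each must return for the others to run
}
```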

Hope it helps!

You are right, I forgot that the servers are in the cloud. :grimacing:

Any cooperative multitasking library requires the code to be non-blocking.
That is precisely what the Thinger handle, without a built-in timeout, currently is not.

Why a delay in network transactions happens is not the point (by the way, the ESP reconnects within a millisecond).
The point is that delays do happen, and the handle should abort the transaction in time to guarantee cooperative multitasking.
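The argument can be checked with a toy model: a minimal cooperative scheduler on a simulated millisecond clock. It is a simplified stand-in for the run-to-completion model of libraries like TaskScheduler, not their real implementation; CoopScheduler and SimTask are hypothetical names. One task is due every 1000 ms; the other stands in for handle() and blocks once for 5 s.

```cpp
// Toy cooperative scheduler on a simulated clock (hypothetical names).
// Shows why cooperative multitasking needs every task to return quickly.
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

struct SimTask {
    std::uint32_t interval_ms;
    std::uint32_t next_run_ms;
    std::function<void()> run;
};

class CoopScheduler {
public:
    std::uint32_t now_ms = 0;  // simulated clock; a blocking task advances it

    void add(std::uint32_t interval_ms, std::function<void()> fn) {
        tasks_.push_back({interval_ms, now_ms, std::move(fn)});
    }

    // One pass over all tasks. A due task runs to completion before the next
    // one is even checked: nothing can preempt a task that blocks.
    void execute() {
        for (auto& t : tasks_) {
            if (now_ms >= t.next_run_ms) {
                t.run();
                t.next_run_ms = now_ms + t.interval_ms;
            }
        }
        ++now_ms;  // 1 ms passes between scheduler passes
    }

private:
    std::vector<SimTask> tasks_;
};
```

In the test, the 1000 ms task runs at 0, 1000, 6500 and 7500 ms: a 5.5 s gap instead of 1 s, because the scheduler only regains control when the blocking task returns.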

Here is a plot of the last 3 hours:
[Image: 3 hours of thinger handle…]

But OK: I accept that 99.9% of users just don't care…
My bad luck… :roll_eyes:

This was really extreme.
The cause was in the WAN: it hit both of my devices simultaneously, but not the device of another user, a friend of mine who lives in another town.
It looks like there is some kind of timeout at 12 seconds. That high!

16:33:16.006 -> MillisDiff= 2, MaxMillisDiff= 901
16:33:17.009 -> MillisDiff= 2, MaxMillisDiff= 901
16:33:23.028 -> MillisDiff= 2, MaxMillisDiff= 901
16:33:27.341 -> MillisDiff= 303, MaxMillisDiff= 901
16:33:39.880 -> MillisDiff= 11526, MaxMillisDiff= 11526
16:33:52.874 -> MillisDiff= 11998, MaxMillisDiff= 11998
16:34:05.860 -> MillisDiff= 11999, MaxMillisDiff= 11999
16:34:18.854 -> MillisDiff= 11998, MaxMillisDiff= 11999
16:34:31.841 -> MillisDiff= 11998, MaxMillisDiff= 11999
16:34:44.835 -> MillisDiff= 11998, MaxMillisDiff= 11999
16:34:57.866 -> MillisDiff= 11998, MaxMillisDiff= 11999
16:35:10.862 -> MillisDiff= 11999, MaxMillisDiff= 11999
16:35:23.852 -> MillisDiff= 11998, MaxMillisDiff= 11999
16:35:25.556 -> MillisDiff= 719, MaxMillisDiff= 11999
16:35:26.559 -> MillisDiff= 2, MaxMillisDiff= 11999
16:35:39.969 -> MillisDiff= 2, MaxMillisDiff= 11999

Just for info…

Did you try the task example? As I said, you may encounter longer timeouts.

The cooperative scheduler you suggested cannot perform miracles as long as the Thinger handle does not return control in time. Only an RTOS could, but for me that means rewriting a full year of hard work and moving to a completely new IDE with potentially no libraries available.
But OK, I am alone with my problem. My bad luck, I fully understand that.
If I were you, I would not care either about one single case (not even a paying one) out of thousands of users who are happy with it as it is.
Let us close that subject.
Thank you for your time; maybe we have both gained a bit of experience. At least that.
Maybe I will dig into your impressive work, fork the handle on GitHub and propose a merge…

@rin67630, I think it is hard to fully address every use case with a single library. However, did you try tuning the timeouts I pointed out in the libraries? I am rewriting the core protocol and libraries for Arduino/Linux, so I will try to look into it (I hope I can make something parametrizable). Thanks for your insight.

Thank you.
If you work on the Linux library, would you consider a Python variant?
The C++ ecosystem on, e.g., a Raspberry Pi is far less complete than on Arduino.
The Raspberry Pi runs on Python; everything else is exotic and has no community to support you.

Yes, I am rewriting the core protocol (and giving it a name: IOTMP) to support different encoding formats in addition to the current PSON. That will make it easier to work with other languages and the widely available libraries for JSON, MessagePack, CBOR, etc. I am in the process of testing the new C++ libraries/server, but the next target is to write a client in another language (Python is a great candidate). I am also documenting IOTMP (currently about 70% done), so anyone can better understand the protocol, create clients, etc.

I would like to give you some information by PM.
Could you PM me at lazlo.lebrun?gmail.com?