ESP8266 Library, call from a device not registered or wrong account

Nice @rin67630, it would be so helpful!

Here is the test routine:

#include <ThingerESP8266.h>

#define SSID "Your SSID"
#define SSID_PASSWORD "Your Password"

long LastMillis;
int MillisDiff;
int LastMillisDiff;
int MaxMillisDiff;

ThingerESP8266 thing("YourThinger Name", "Your Device Name", "Your device credentials");

void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
  Serial.begin(9600);

  Serial.println("Getting WiFi…");

  thing.add_wifi(SSID, SSID_PASSWORD);

  Serial.println("Got WiFi!");

  thing["led"] << digitalPin(LED_BUILTIN);

  // resource output example (i.e. reading a sensor value)
  thing["millis"] >> outputValue(millis());

  // more details at http://docs.thinger.io/arduino/
}

void loop() {
  digitalWrite(LED_BUILTIN, LOW);
  LastMillis = millis();
  thing.handle();
  MillisDiff = millis() - LastMillis;
  MaxMillisDiff = max(MillisDiff, MaxMillisDiff);
  digitalWrite(LED_BUILTIN, HIGH);
  Serial.printf("MillisDiff= %i, MaxMillisDiff= %i\n", MillisDiff, MaxMillisDiff);
  if (millis() < 5000) MaxMillisDiff = 0;
  delay(1000);
}

Using a device that Thinger does not know (wrong device or wrong credentials):

17:37:34.848 → Getting WiFi…
17:37:34.949 → Got WiFi!
17:37:44.536 → MillisDiff= 9600, MaxMillisDiff= 9600
17:37:51.205 → MillisDiff= 5677, MaxMillisDiff= 9600
17:37:57.880 → MillisDiff= 5688, MaxMillisDiff= 9600
17:38:04.548 → MillisDiff= 5689, MaxMillisDiff= 9600
17:38:11.270 → MillisDiff= 5708, MaxMillisDiff= 9600
17:38:18.447 → MillisDiff= 6145, MaxMillisDiff= 9600
17:38:25.116 → MillisDiff= 5687, MaxMillisDiff= 9600
17:38:31.792 → MillisDiff= 5689, MaxMillisDiff= 9600
17:38:38.513 → MillisDiff= 5717, MaxMillisDiff= 9600
17:38:45.183 → MillisDiff= 5685, MaxMillisDiff= 9600
17:38:51.907 → MillisDiff= 5684, MaxMillisDiff= 9600
17:38:58.583 → MillisDiff= 5694, MaxMillisDiff= 9600
17:39:05.253 → MillisDiff= 5693, MaxMillisDiff= 9600
17:39:11.976 → MillisDiff= 5690, MaxMillisDiff= 9600
17:39:18.652 → MillisDiff= 5693, MaxMillisDiff= 9600
17:39:25.420 → MillisDiff= 5750, MaxMillisDiff= 9600

IMHO the handle should not block for that long…
If you call the handle every second, the device is effectively frozen.

Using a recognized device, and letting it run:

18:05:54.810 → Getting WiFi…
18:05:55.443 → Got WiFi!
18:05:59.526 → MillisDiff= 4613, MaxMillisDiff= 4613
18:06:00.529 → MillisDiff= 0, MaxMillisDiff= 0
18:06:01.533 → MillisDiff= 0, MaxMillisDiff= 0
18:06:02.536 → MillisDiff= 0, MaxMillisDiff= 0
18:06:03.539 → MillisDiff= 0, MaxMillisDiff= 0
18:06:04.562 → MillisDiff= 0, MaxMillisDiff= 0
18:06:05.561 → MillisDiff= 0, MaxMillisDiff= 0
18:06:06.518 → MillisDiff= 0, MaxMillisDiff= 0
18:06:07.521 → MillisDiff= 0, MaxMillisDiff= 0
18:06:08.525 → MillisDiff= 0, MaxMillisDiff= 0
18:06:09.528 → MillisDiff= 0, MaxMillisDiff= 0
18:06:10.532 → MillisDiff= 0, MaxMillisDiff= 0
18:06:11.535 → MillisDiff= 0, MaxMillisDiff= 0
18:06:12.538 → MillisDiff= 0, MaxMillisDiff= 0
18:06:13.542 → MillisDiff= 0, MaxMillisDiff= 0
18:06:14.546 → MillisDiff= 1, MaxMillisDiff= 1
18:06:15.549 → MillisDiff= 1, MaxMillisDiff= 1
18:06:16.553 → MillisDiff= 1, MaxMillisDiff= 1
18:06:17.533 → MillisDiff= 0, MaxMillisDiff= 1

18:06:53.606 → MillisDiff= 0, MaxMillisDiff= 1
18:06:54.610 → MillisDiff= 1, MaxMillisDiff= 1
18:06:55.613 → MillisDiff= 24, MaxMillisDiff= 24
18:06:56.616 → MillisDiff= 2, MaxMillisDiff= 24
18:06:57.619 → MillisDiff= 0, MaxMillisDiff= 24

Based on that, I changed the report to show only the cycles longer than 1 ms:

if (MillisDiff > 1) Serial.printf("MillisDiff= %i, MaxMillisDiff= %i\n", MillisDiff, MaxMillisDiff);

18:25:27.725 → Getting WiFi…
18:25:27.872 → Got WiFi!
18:25:32.442 → MillisDiff= 4593, MaxMillisDiff= 4593
18:26:28.482 → MillisDiff= 21, MaxMillisDiff= 21
18:27:28.585 → MillisDiff= 39, MaxMillisDiff= 39
18:27:29.541 → MillisDiff= 2, MaxMillisDiff= 39
18:28:28.638 → MillisDiff= 22, MaxMillisDiff= 39
18:28:29.642 → MillisDiff= 2, MaxMillisDiff= 39
18:29:28.687 → MillisDiff= 18, MaxMillisDiff= 39
18:30:28.741 → MillisDiff= 20, MaxMillisDiff= 39
18:30:29.745 → MillisDiff= 2, MaxMillisDiff= 39
18:31:28.794 → MillisDiff= 27, MaxMillisDiff= 39

That is acceptable. I will see tomorrow what MaxMillisDiff has reached…

Could you let the sketch run on an ESP with your credentials, within the server’s own LAN, so we could see whether the freezes are due to Internet propagation or to server responses?

Regards.

P.S. After about one hour of running:

19:37:32.378 -> MillisDiff= 22, MaxMillisDiff= 60
19:37:33.382 -> MillisDiff= 2, MaxMillisDiff= 60
19:38:32.440 -> MillisDiff= 22, MaxMillisDiff= 60
19:38:33.444 -> MillisDiff= 2, MaxMillisDiff= 60
19:38:38.461 -> MillisDiff= 2, MaxMillisDiff= 60
19:39:32.486 -> MillisDiff= 11, MaxMillisDiff= 60
19:39:34.238 -> MillisDiff= 743, MaxMillisDiff= 743
19:40:33.295 -> MillisDiff= 16, MaxMillisDiff= 743
19:40:34.298 -> MillisDiff= 2, MaxMillisDiff= 743
19:41:27.321 -> MillisDiff= 2, MaxMillisDiff= 743
19:41:33.342 -> MillisDiff= 19, MaxMillisDiff= 743

Remarkably, a delayed cycle is very often followed by a slightly delayed one a second later.

IMHO everything up to 100 ms is OK, but beyond that the handle should abort the transaction.

Hi! Thanks for the code. I cannot put an ESP8266 on the “server” LAN. In fact, the public server is a cluster of 5 instances in the Amazon Cloud, in the US, Europe, Indonesia, etc. plus a cluster of 3 MongoDB servers.

IMHO the handle cannot time out after an arbitrary interval, as the right value greatly depends on the WiFi network, the router, the Internet, etc. Setting it as low as 100 ms would surely cause frequent server disconnections. In your case 100 ms may be OK, but anyone with a poor connection would suffer while connecting, transmitting data, etc. Moreover, a handle call in which the device has nothing to do (there is no data in the TCP socket, like the calls taking 1 or 2 ms) is not the same as a call that streams some resources, such as a periodic sensor transmission. Also, you can introduce a delay inside a resource, which will affect the handle call as well. So it is not possible to control the handle timeout in such a way. Hope you understand.

If you need to execute your tasks in a more predictable manner, you should take a look at timers or tasks. For example, the MQTT Arduino library uses the CooperativeMultitasking library (only available on SAMD) to solve such problems. For the ESP8266 we can use another library, like:

An example of Thinger being used with the task scheduler could look like this:

#define _DEBUG_
#include <TaskScheduler.h>
#include <ThingerESP8266.h>

#define USERNAME "username"
#define DEVICE_ID "device"
#define DEVICE_CREDENTIAL "credential"

#define SSID "SSID"
#define SSID_PASSWORD "SSIDPASS"

ThingerESP8266 thing(USERNAME, DEVICE_ID, DEVICE_CREDENTIAL);
Scheduler scheduler;

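Task t1(1000, TASK_FOREVER, []{
    // millis() returns an unsigned long, so print it with %lu
    Serial.printf("I am running at: %lu\n", millis());
});

// 'thing' is a global, so the lambda needs no capture; a capture-less
// lambda also converts to the plain callback pointer the Task expects
Task t2(0, TASK_FOREVER, []{
  thing.handle();
});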

void setup() {
  Serial.begin(115200);
  thing.add_wifi(SSID, SSID_PASSWORD);
  
  thing["millis"] >> outputValue(millis());
  scheduler.init();
  scheduler.addTask(t1);
  scheduler.addTask(t2);
  t1.enable();
  t2.enable();
}

void loop() {
  scheduler.execute();
}

Hope it helps!

You are right, I forgot that the servers are in the cloud. :grimacing:

Any cooperative multitasking library requires the code it runs to be non-blocking.
That is precisely what the Thinger handle, without a built-in timeout, currently is not.

Why a delay in network transactions happens is not the point (by the way: the ESP reconnects within a millisecond).
The point is that delays do happen, and the handle should abort the transaction in good time to guarantee cooperative multitasking operation.
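
To make that point concrete, here is a minimal simulation in plain C++. It uses no Arduino or TaskScheduler calls; the function name, the simulated clock and the 1 ms idle step per pass are all assumptions made for illustration. It models a cooperative loop in which a 1000 ms periodic task shares the loop with a call that blocks for a given time, and it returns how late the periodic task runs in the worst case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical simulation, not library code: returns the worst lateness (ms)
// of a 1000 ms periodic task that shares a cooperative loop with a call
// costing handle_cost_ms every pass.
uint32_t worst_heartbeat_lateness(uint32_t handle_cost_ms, uint32_t run_for_ms) {
    assert(run_for_ms > 0);
    const uint32_t period_ms = 1000;
    uint32_t now_ms = 0;               // simulated millis()
    uint32_t next_due_ms = period_ms;  // when the periodic task should run next
    uint32_t worst_ms = 0;

    while (now_ms < run_for_ms) {
        // Task 1: periodic task. It can only run when the loop reaches it.
        if (now_ms >= next_due_ms) {
            worst_ms = std::max(worst_ms, now_ms - next_due_ms);
            next_due_ms = now_ms + period_ms;  // reschedule from "now"
        }
        // Task 2: the blocking call. Nothing else runs until it returns.
        now_ms += handle_cost_ms;
        // One idle millisecond per pass, so a zero-cost call still advances time.
        now_ms += 1;
    }
    return worst_ms;
}
```

Under these assumptions, a call costing 2 ms delays the periodic task by at most 2 ms, while a call that blocks for 11998 ms makes it run almost 11 s late, whichever scheduler sits on top of the loop.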

Here is a graphical plot of the last 3 hours:
[Plot: 3 hours of thinger handle…]

But OK: I do accept that 99.9% of the users just don’t care…
My whammy…:roll_eyes:

This was really extreme:
The cause was in the WAN: it affected my two devices simultaneously, but not the device of another user, a friend of mine living in another town.
It looks like there is some kind of timeout at 12 seconds. That high!

16:33:16.006 → MillisDiff= 2, MaxMillisDiff= 901
16:33:17.009 → MillisDiff= 2, MaxMillisDiff= 901
16:33:23.028 → MillisDiff= 2, MaxMillisDiff= 901
16:33:27.341 → MillisDiff= 303, MaxMillisDiff= 901
16:33:39.880 → MillisDiff= 11526, MaxMillisDiff= 11526
16:33:52.874 → MillisDiff= 11998, MaxMillisDiff= 11998
16:34:05.860 → MillisDiff= 11999, MaxMillisDiff= 11999
16:34:18.854 → MillisDiff= 11998, MaxMillisDiff= 11999
16:34:31.841 → MillisDiff= 11998, MaxMillisDiff= 11999
16:34:44.835 → MillisDiff= 11998, MaxMillisDiff= 11999
16:34:57.866 → MillisDiff= 11998, MaxMillisDiff= 11999
16:35:10.862 → MillisDiff= 11999, MaxMillisDiff= 11999
16:35:23.852 → MillisDiff= 11998, MaxMillisDiff= 11999
16:35:25.556 → MillisDiff= 719, MaxMillisDiff= 11999
16:35:26.559 → MillisDiff= 2, MaxMillisDiff= 11999
16:35:39.969 → MillisDiff= 2, MaxMillisDiff= 11999

Just for info…

Did you try the task example? As I said, you may encounter bigger timeouts.

The cooperative scheduler you suggested cannot perform miracles as long as the Thinger handle does not return control in time. Only an RTOS could, but for me that would mean rewriting a full year of hard work and working with a completely new IDE, with potentially no libraries available.
But OK, I am alone with my problem. My bad luck, I fully understand that.
If I were you, I would not care either about one single case (not even a paying one) out of thousands who are happy with it as it is.
Let us close that subject.
Thank you for your time; maybe we have both gained a bit of experience. At least that.
Maybe I will try to dig into your impressive work, fork the handle on GitHub and propose a merge…

@rin67630, I think it is a bit complex to fully address all use cases with just one single library. However, did you try tuning the timeouts I pointed out in the libraries? I am rewriting the core protocol and libraries for Arduino/Linux, so I will try to take a look at it (I hope I can make something configurable). Thanks for your insight.

Thank you.
If you work on the Linux library, would you consider a Python variant?
The C++ ecosystem for e.g. a Raspberry Pi is nowhere near as complete as on Arduino.
The Raspberry Pi lives on Python; everything else is exotic and has no community to support you.

Yes, I am rewriting the core protocol (and giving it a name: IOTMP) to support different encoding formats in addition to the current PSON. So, it will be easier to work with other languages and widely available libraries for JSON, MessagePack, CBOR, etc. I am in the process of testing the new C++ libraries/server, but the next target is to write a client in another language (Python is a great candidate). Moreover, I am documenting IOTMP (currently at 70%), so that anyone can better understand the protocol, create clients, etc.

I would like to give you some information in PM.
Could you PM me at lazlo.lebrun?gmail.com?

Yes, already wrote you!

"Did you try tuning the timeouts I pointed out in the libraries?"

The disruption is not between the device and the router, which is what your timeout strategies address.
The disruption occurs between the ISP and your servers, somewhere in the meanders of the Internet.
When a TCP message does not get through, the ESP resends it over and over.
IMHO, the messages to and from Thinger are not critical enough to require the transmission integrity of TCP; UDP would have been quite sufficient and even gentler on the device’s consumption, wouldn’t it?

You did, using this email address: noreply@community.thinger.io
A PM reply to this address will probably never reach you, right?

Those emails are sent automatically on post responses. I wrote to you again.

I still get only the “noreply” mail address.
I can’t PM you that way.

The issue of the thing.handle() call taking 12 seconds to return when the device is unknown or the credentials are wrong is really harmful when using remote ESP8266 devices programmed over the air.

Upload the wrong device name or the wrong Thinger credentials just once, and your device is no longer remotely accessible: it is busy with 100% useless calls, has no time left for OTA, and hence gives you absolutely no chance to correct it, unless you physically go to the device and upload over USB.

That is really weird!

I have now found a workaround to overcome that problem:

byte GracePause;
long MillisMem;

// In loop, every second
  yield();
  if (GracePause) GracePause--;
  MillisMem = millis();
  if (not GracePause) thing.handle();               // skip the Thinger call while the grace pause is running
  if (millis() - MillisMem > 100) GracePause = 60;  // if the Thinger call took longer than 100 ms, pause for 60 s before retrying

That means that if thing.handle() takes longer than 100 ms to execute, the sketch suspends further thing.handle() calls for a minute, leaving enough “airtime” to upload another sketch.

That also prevents misbehaving sketches from “hammering” your site.
Maybe you could include this kind of protection in the thing.handle() call itself?
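
The same logic can be sketched as a small helper that is testable off-device. This is only an illustration in plain C++: the class name GraceGate and its methods are hypothetical and not part of the Thinger library, and the timestamps are passed in so that no Arduino call is needed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the Thinger library): decides once per
// tick whether the blocking call may run, and backs off after a slow call.
class GraceGate {
public:
    GraceGate(uint32_t slow_threshold_ms, uint32_t pause_ticks)
        : slow_threshold_ms_(slow_threshold_ms), pause_ticks_(pause_ticks) {
        assert(pause_ticks > 0);
    }

    // Call once per tick (e.g. once per second). Counts the pause down and
    // returns true if the blocking call should be made during this tick.
    bool should_call() {
        if (remaining_ > 0) {
            --remaining_;
        }
        return remaining_ == 0;
    }

    // Report when the call started and ended; a slow call arms the pause.
    void report(uint32_t start_ms, uint32_t end_ms) {
        // Unsigned subtraction stays correct across a 32-bit counter rollover.
        if (end_ms - start_ms > slow_threshold_ms_) {
            remaining_ = pause_ticks_;
        }
    }

private:
    uint32_t slow_threshold_ms_;
    uint32_t pause_ticks_;
    uint32_t remaining_ = 0;
};
```

In a sketch this would presumably be used as `if (gate.should_call()) { uint32_t t0 = millis(); thing.handle(); gate.report(t0, millis()); }`, called once per second with `GraceGate gate(100, 60);`.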

Regards,
Laszlo.

Hi @rin67630, the grace pause is a good idea for your use case, but I am still not convinced about establishing a threshold of 100 ms just for that. As we have discussed before, there are plenty of ways this could happen, even in normal operation (and it can drop your connection with the server or make the device unresponsive during the pause). I suggest improving your approach by setting the GracePause depending on the current device state. As described in the post "Check Cloud Connection on Arduino", you can already monitor what is happening, i.e., whether the credentials are bad or the device cannot connect to the network. So, set the GracePause at that point. You can even improve the logic by no longer calling thing.handle() if the device fails to authenticate more than x times, or fails to connect to the network more than y times, so that the device is completely available for OTA.

switch(state){
    case NETWORK_CONNECTING:
        break;
    case NETWORK_CONNECTED:
        break;
    case NETWORK_CONNECT_ERROR:
        // connect error. grace pause here?
        break;
    case SOCKET_CONNECTING:
        break;
    case SOCKET_CONNECTED:
        break;
    case SOCKET_CONNECTION_ERROR:
        break;
    case SOCKET_DISCONNECTED:
        break;
    case SOCKET_TIMEOUT:
        break;
    case THINGER_AUTHENTICATING:
        break;
    case THINGER_AUTHENTICATED:
        break;
    case THINGER_AUTH_FAILED:
        // bad credentials. grace pause here?
        break;
    case THINGER_STOP_REQUEST:
        break;
}
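
A minimal sketch of the "more than x times" idea, in plain C++ so it can be tried off-device. Everything here is an assumption made for illustration: the HandleGuard class and the LinkState enum are hypothetical stand-ins (only the state names are borrowed from the switch above), and how the states actually reach on_state() is left open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the library's state values; only the states
// that matter for the counters are listed, everything else maps to OTHER.
enum class LinkState {
    NETWORK_CONNECT_ERROR,
    THINGER_AUTHENTICATED,
    THINGER_AUTH_FAILED,
    OTHER
};

// Hypothetical helper: stops further handle() calls after too many
// consecutive failures, so the device stays free for OTA.
class HandleGuard {
public:
    HandleGuard(uint32_t max_auth_failures, uint32_t max_network_errors)
        : max_auth_(max_auth_failures), max_net_(max_network_errors) {
        assert(max_auth_failures > 0 && max_network_errors > 0);
    }

    // Feed every state change into the guard.
    void on_state(LinkState state) {
        switch (state) {
            case LinkState::THINGER_AUTH_FAILED:
                if (auth_failures_ < max_auth_) ++auth_failures_;
                break;
            case LinkState::NETWORK_CONNECT_ERROR:
                if (network_errors_ < max_net_) ++network_errors_;
                break;
            case LinkState::THINGER_AUTHENTICATED:
                // A successful login clears both failure streaks.
                auth_failures_ = 0;
                network_errors_ = 0;
                break;
            case LinkState::OTHER:
                break;
        }
    }

    // True while calling handle() is still considered worthwhile.
    bool allow_handle() const {
        return auth_failures_ < max_auth_ && network_errors_ < max_net_;
    }

private:
    uint32_t max_auth_;
    uint32_t max_net_;
    uint32_t auth_failures_ = 0;
    uint32_t network_errors_ = 0;
};
```

With this design the guard stays closed once a limit is reached, so the device remains offline from Thinger until it is rebooted or re-flashed; combining it with a timed grace pause would be an alternative if automatic retries are wanted.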

Could you give me a hint how to sense the connection state, without creating a new class?

ThingerESP8266::thinger_state_listener(THINGER_STATE byte t_state);
if (t_state == THINGER_AUTHENTICATED) thing.handle();

does not compile.
On the other hand, does it make sense to call thing.handle() if the state is not THINGER_AUTHENTICATED?
Surely not, does it?
Can’t you catch that directly inside the thing.handle() function and return immediately with an error code?

The current version of the library does not support listening to events without creating a subclass. The next release will support it.

Yes, it is required to call handle, especially when the state is not THINGER_AUTHENTICATED, as it takes care of starting the network connection, the socket connection, the authentication, and so on.

It would be possible to return a code from the thing.handle call, but one call may generate more than one “internal event”, so I prefer the state listener approach coming in the next release.