CubeCell Board RX1 Timing Issue

jwyatt · January 28, 2020, 5:13am

Hi All,

I’m hoping to get some help with the CubeCell boards I have just received. I have a couple of RAK2245 concentrator boards connected to Chirpstack and have had other devices connect using ABP and OTAA and pass both unconfirmed and confirmed uplinks properly, receiving downlinks or ACKs correctly. The gateways are connected via ethernet cable to the same switch as the server running Chirpstack, and the network delays are minimal.

I have downloaded the codebase and flashed the Cube Cell boards using the Arduino environment.

All of the devices can activate via ABP or OTAA perfectly fine - i.e. they send a few packets, eventually find the correct channel, open an RX window 5s later and receive a join accept, then get ADR parameters. This works reasonably consistently.

Then the random failures start to happen. I can see confirmed data uplinks in Chirpstack, then I see the unconfirmed downlink (the ACK) in the live LoRaWAN frame view. The gateway receives the ACK packet and sends it with the 1s delay it is supposed to use, on the correct channel and DR based on the uplink channel. However, many times the CubeCell does not receive the packet and hence tries to send again. Because the network server has already seen these packets with the same sequence number, they are ignored (no ACK is sent), and the CubeCell keeps sending uplinks until it times out, then goes to sleep for a bit and tries again, with the same sequence being followed. Rarely, one of the devices will recover from this scenario and start receiving the ACKs.

I ordered 3 boards, and all 3 have suffered this issue. I believe there is a problem in the LoRaWAN stack - similar to LMIC - where many people have to relax the timings to get ACKs or even downlink packets to work correctly in RX1.

Since we have SOME of the source code - but there is definitely a .a in the tree (i.e. a binary portion that can’t be updated and recompiled) - can someone tell me if there is access in the code to relax these timings, or if I have to wait for a firmware update, or if this can’t be fixed - I suppose abandon the product in favour of another. If anyone has run into this issue, did you resolve it, and if so how?

Thanks in advance.

dserrano · January 23, 2021, 2:34am

I have exactly the same problem.

jasonXu · January 25, 2021, 2:22am

hi,

Could you set this parameter to 1? When it is set to 1, the DR won’t drop.

superslot · January 25, 2021, 8:03am

Hi Jason,

I’ve seen the same problem on one board out of 8 devices.

I do not think that setting NbTrials is the correct solution because we do need ADR for a proper product.

What we need is a parameter to “tune” the TX/RX windows for gateways that send the ACK with a little more delay…

thanks

jwyatt · January 26, 2021, 1:37am

Hi Jason,

This is definitely not the correct solution - it will stop the DR from dropping, which has nothing to do with not receiving the ACK. If the problem was “in the field” with many km between the device and gateway, I would perhaps give this a try, but this happens even if the device is only a few meters from the gateway.

This is a known problem in other LoRaWAN stacks and usually there is a parameter to allow for adjusting the window time (both in start and length), in case timing is off at the gateway or the microcontroller.
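To make the request concrete, here is a minimal sketch of how a stack can derive the window start and length from an assumed worst-case timing error. It is modelled from memory on the approach in Semtech's reference LoRaMac-node stack, so treat the constants as assumptions. The names `RxWindow`, `symbolTimeMs` and `computeRxWindow` are made up for this illustration and are not part of the CubeCell SDK.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical result type for this sketch (not a CubeCell API).
struct RxWindow {
    uint32_t timeoutSymbols; // how many symbols the receiver waits for a preamble
    int32_t  offsetMs;       // correction added to the nominal RX delay (negative = open earlier)
};

// LoRa symbol duration in milliseconds: 2^SF / BW.
double symbolTimeMs(int sf, double bandwidthHz) {
    assert(sf >= 7 && sf <= 12 && bandwidthHz > 0.0);
    return std::ldexp(1.0, sf) / bandwidthHz * 1000.0;
}

// Widen the window by the assumed timing error on both sides and open it
// earlier, so that it stays centred on the expected preamble.
RxWindow computeRxWindow(double tSymbolMs, uint32_t minRxSymbols,
                         uint32_t maxRxErrorMs, uint32_t wakeUpTimeMs) {
    double symbols = std::ceil(((2.0 * minRxSymbols - 8.0) * tSymbolMs +
                                2.0 * maxRxErrorMs) / tSymbolMs);
    // Never listen for fewer symbols than the radio needs to detect a preamble.
    symbols = std::max(symbols, static_cast<double>(minRxSymbols));
    const uint32_t timeout = static_cast<uint32_t>(symbols);
    const int32_t offset = static_cast<int32_t>(
        std::ceil(4.0 * tSymbolMs - (timeout * tSymbolMs) / 2.0 - wakeUpTimeMs));
    return {timeout, offset};
}
```

For example, `computeRxWindow(symbolTimeMs(7, 125000.0), 6, 10, 0)` models SF7 at 125 kHz with a tolerated error of 10 ms. Raising the error value lengthens the window and moves its start earlier, which is the kind of tuning being asked for here.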

If this is not available in the Heltec devices and never planned to be added to the firmware, please let us know, so we can find a different device that does and move on.

Regards,

Justin

jasonXu · January 26, 2021, 2:53am

hi,

Could you try DR_5? I suspect that the frequencies of the gateway and node are not aligned. The lower the data rate, the tighter the matching requirement.

According to our test results, if the TX frequency is a bit lower than the RX frequency, the message can be received. But if the TX frequency is higher than the RX frequency, the message may not be received.

If DR_5 has no problem, we think this is the reason. You can refer to this topic:

jwyatt · January 26, 2021, 7:13am

Hi Jason,

I’m going to say it again - this happens no matter the distance or data rate and has happened on 3 different gateways (2 of which are different hardware). There are other devices on the network that have no issues whatsoever and the gateway is set up to follow the Australian frequency plan. ADR is turned on, so the devices move through different DRs over time based on conditions. This issue has been shown to happen at various distances and across all data rates (spreading factors) and frequencies. The link you sent is about Class C downlink and you are asking about the downlink size and suggesting changing frequency offset - in my case the problem is with an ACK - which obviously falls within the required number of bytes for any DR and is a very small packet so takes up very little air time even at SF12. Stating that it is “time on air” dependent (i.e. that frequency instability is bad enough that using SF12 for an ACK causes a problem) or that it is frequency offset dependent doesn’t make sense, and quite frankly if it’s true, then your hardware is defective, which is a much bigger issue.
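For anyone who wants to put numbers on the air time of an ACK at different data rates, the sketch below uses the time-on-air formula from Semtech's SX127x datasheets. The function names are made up for illustration. The explicit header, the 4/5 coding rate and the 8-symbol preamble are assumed defaults, and low data rate optimisation is assumed to switch on when a symbol lasts longer than 16 ms.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// LoRa symbol duration in milliseconds: 2^SF / BW.
double loraSymbolMs(int sf, double bandwidthHz) {
    assert(sf >= 7 && sf <= 12 && bandwidthHz > 0.0);
    return std::ldexp(1.0, sf) / bandwidthHz * 1000.0;
}

// Time on air in milliseconds for an explicit-header LoRa frame.
// codingRate is 1..4, meaning 4/5..4/8.
double timeOnAirMs(int payloadBytes, int sf, double bandwidthHz, bool crcOn,
                   int preambleSymbols = 8, int codingRate = 1) {
    const double tSym = loraSymbolMs(sf, bandwidthHz);
    // Assumption: low data rate optimisation is enabled for symbols longer than 16 ms.
    const int de = (tSym > 16.0) ? 1 : 0;
    // Explicit header, so the datasheet's "- 20 * IH" term is zero.
    const double numerator =
        8.0 * payloadBytes - 4.0 * sf + 28.0 + (crcOn ? 16.0 : 0.0);
    const double denominator = 4.0 * (sf - 2 * de);
    const double payloadSymbols =
        8.0 + std::max(std::ceil(numerator / denominator) * (codingRate + 4), 0.0);
    // The radio adds 4.25 symbols of sync to the programmed preamble.
    return (preambleSymbols + 4.25 + payloadSymbols) * tSym;
}
```

As a usage example, `timeOnAirMs(12, 12, 125000.0, false)` estimates a 12-byte frame (an ACK with no FOpts and no payload, assumed here) at SF12 and 125 kHz with the PHY CRC off. Pass `500000.0` for plans whose downlinks use 500 kHz.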

As per Semtech’s document AN1200.22, section 4.1.5:
“Doppler Resistant
Doppler shift causes a small frequency shift in the LoRa pulse which introduces a relatively negligible shift in the time axis of the baseband signal. This frequency offset tolerance mitigates the requirement for tight tolerance reference clock sources. LoRa is ideal for mobile data communications links such as wireless tire-pressure monitoring systems, drive-by applications such as toll booth and mobile tag readers, and trackside communications for railroad infrastructure.”

So if there IS a frequency offset problem, then people using your devices on something that moves around are going to have significantly more problems than those with a stationary device, and it would seem that either the software or the hardware doesn’t conform to the standard, as Doppler shift can be in either direction depending on whether the device is moving towards or away from the gateway. Note that the frequency shift causes a TIME change in the baseband as per the documentation - and this may be exactly why people using other codebases have provided the very thing we are asking for - the ability to tune the timing windows for RX1 / RX2.
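The frequency-to-time relationship in that quote can be sketched numerically. A LoRa chirp sweeps the full bandwidth once per symbol, so a carrier offset looks like a shift along the time axis of offset divided by chirp slope. The function name below is made up for illustration, and this is a back-of-the-envelope model rather than anything taken from the Heltec firmware.

```cpp
#include <cassert>
#include <cmath>

// Apparent timing shift, in microseconds, that a carrier frequency offset
// produces in a LoRa chirp of the given spreading factor and bandwidth.
double chirpTimeShiftUs(double offsetHz, int sf, double bandwidthHz) {
    assert(sf >= 5 && sf <= 12 && bandwidthHz > 0.0);
    const double symbolTimeS = std::ldexp(1.0, sf) / bandwidthHz; // 2^SF / BW
    const double chirpSlopeHzPerS = bandwidthHz / symbolTimeS;    // sweep rate
    return offsetHz / chirpSlopeHzPerS * 1e6;
}
```

For instance, `chirpTimeShiftUs(1000.0, 12, 125000.0)` gives about 262 µs for a 1 kHz offset at SF12 and 125 kHz, which is small next to the 1 s RX1 delay.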

To directly answer your comment - the issue has been seen to occur on SF7 through SF12, so your suggestion does not help at this point. I await your answer to the original question - can we adjust the RX1/RX2 timing.

Justin

jwyatt · January 26, 2021, 7:16am

Jason - also note - I have had this issue for 12 months with no reply until now, which is also concerning.

dserrano · January 26, 2021, 1:49pm

Me too. I have 150 pcs of AB02 devices with the same problem: ACK reception fails.

dserrano · January 27, 2021, 10:43pm

I ran all the tests suggested here and the RX problems continue.

I post my results here:

Supporter · January 28, 2021, 8:21am

Hi jwyatt
Is the code running our examples or your modified code? If it’s modified code, can you try the examples to see if the problem occurs? And can you give me your code?
[image]
Did you add code here?

Supporter · January 28, 2021, 9:24am

A delay can be added before LoRaWAN.send(). If a delay is added, or code runs for a long time, after LoRaWAN.send(), the chip will be too late to process radio events. Any pending event is processed in Radio.IrqProcess() (in the old version, Radio.IrqProcess() is called inside LoRaWAN.sleep()).

You can try updating the SDK from our git and arranging your code like this:
[image]

Since seeing this topic, we have updated the SDK so that you can simply read the LoRaWAN status from the parameter “LoRaMacState”.
When LoRaMacState != LORAMAC_IDLE, other code may break the LoRaWAN process.
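Since the screenshot is not available, here is a rough off-target model of that idea: hold back slow user code until the MAC reports idle. `IdleGate` and `kMacIdle` are hypothetical names for this sketch. The only thing taken from the SDK is the notion that LoRaMacState equals LORAMAC_IDLE when no TX/RX cycle is in progress, and the value 0 used for it here is an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Assumption: stands in for the SDK's LORAMAC_IDLE value.
constexpr uint32_t kMacIdle = 0;

// Holds one piece of slow work (display update, sensor read, ...) and runs
// it only when the MAC is idle, so it cannot delay an RX window.
class IdleGate {
public:
    void defer(std::function<void()> work) { pending_ = std::move(work); }

    // Call on every pass of the main loop with the current MAC state.
    // Returns true if the deferred work ran during this call.
    bool poll(uint32_t macState) {
        if (macState != kMacIdle || !pending_) {
            return false;
        }
        auto work = std::move(pending_);
        pending_ = nullptr; // clear before running so the work may defer again
        work();
        return true;
    }

private:
    std::function<void()> pending_;
};
```

On the device, the main loop would call something like `gate.poll(LoRaMacState)` once per pass, after the radio interrupts have been serviced.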

jasonXu · January 28, 2021, 10:23am

hi,
It will automatically recover after a failed send.

dserrano · January 28, 2021, 11:38am

I think LoRaWAN.displayAck(); was the problem.
I made this change:

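case DEVICE_STATE_SLEEP:
{
  // Service pending radio events first, so RX window handling is not delayed.
  Radio.IrqProcess();
  LoRaWAN.sleep();
  // Only update the display while the MAC is idle (no TX/RX in progress).
  if(LoRaMacState == LORAMAC_IDLE){
    LoRaWAN.displayAck();
  }
  break;
}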

Supporter · January 29, 2021, 4:10am

Does it work with the change?

dserrano · January 30, 2021, 12:38am

This helps, but it is not totally fixed. I will keep working on this and update here.

jwyatt · September 16, 2021, 8:46am

Update:

I ordered a bunch more of the boards, and I’d say this issue was a QC problem with a batch of boards, most likely with whatever crystal was being used at the time. Not a single one of the new boards I’ve received displays this behaviour, with the same code, the same PCBs they plug into, etc.
I still have missed ACKs, but you can tell from the received data that there were marginal RF conditions in those cases, and after dropping DR or a second / third attempt the ACK is picked up, the device goes to sleep and wakes up properly.

I threw the first 3 boards away (they never got any better and it was obvious that there was a serious timing issue) and now all is good (except a slightly lighter wallet from 3 broken boards).