Bug #665

DLR lost on kannel restart

Added by mr bugreporter almost 8 years ago. Updated almost 8 years ago.

Status: New
Priority: High
Assignee: -
Category: Bearerbox DLR handling
Target version:
Start date: 11/02/2012
Due date:
% Done: 0%
Estimated time:
Affected version: latest

Description

I'm using a bearerbox<->sqlbox setup to push MT messages out of Kannel.

I use MySQL storage for the DLR queue and the spool store for bearerbox's internal queue.

This is the order in which I shut Kannel down:

1) first stop sqlbox instances

test "x$START_SQLBOX" = "x1" && (
    for i in $SQLBOX;
    do
        echo -n " sqlbox_$i"
        start-stop-daemon --stop --retry 5 --quiet \
            --pidfile $PIDFILES/sqlbox_$i.pid \
            --exec $BOXPATH/run_kannel_box
    done
)

2) stop bearerbox

echo -n " bearerbox"
start-stop-daemon --stop --quiet \
    --pidfile $PIDFILES/bearerbox.pid \
    --exec $BOXPATH/run_kannel_box

3) I also wait until bearerbox releases its port via

while [ -n "$(netstat -nl|grep ":${PORTCONF}.*LISTEN")" ]; do sleep 1; done
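
A wait like the one above can hang forever if bearerbox never lets go of the port. Below is a minimal sketch of the same wait with a timeout. The function names port_is_listening and wait_port_released are made up for illustration, and the netstat pattern assumes the usual Linux "netstat -nl" output.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Kannel or its init scripts.

# Return 0 if something is listening on TCP port $1.
# Assumes Linux-style "netstat -nl" output (local address, then LISTEN).
port_is_listening() {
    netstat -nl 2>/dev/null | grep -q ":$1 .*LISTEN"
}

# Wait until port $1 is no longer listening.
# $2 is the timeout in seconds (default 30).
# Returns 0 once the port is free, 1 if the timeout expires first.
wait_port_released() {
    port=$1
    timeout=${2:-30}
    waited=0
    while port_is_listening "$port"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the stop script this could be called as: wait_port_released "$PORTCONF" 60 || echo "bearerbox still holds port $PORTCONF"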

In the operator log I see:

2012-11-02 08:49:13 [17221] [6] ERROR: SMPP[smsc]: got DLR but could not find message or was not interested in it id<7691947519> dst<XXXXXXXXX>, type<2>
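
To get a rough idea of how many DLRs are hit, the error above can be counted in the log. This is only a sketch: count_unmatched_dlrs is a made-up helper name, and the log path is whatever your own configuration uses.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that report a DLR for which
# bearerbox found no matching stored entry.
# $1: path to the log file.
count_unmatched_dlrs() {
    # grep -c prints the number of matching lines (0 if none).
    grep -c 'got DLR but could not find message' "$1"
}
```

Comparing the count taken before and after a restart gives the number of DLRs dropped around that restart.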

I checked the operator's (SMSC) web interface, and it says the DLR went through just fine.

DLRs are also lost if MySQL goes down for some reason and/or bearerbox can't connect to MySQL.

NOTE: I tested this under load. Under normal circumstances everything works fine, because no messages are being submitted when I restart.

The priority is quite high because lost DLRs can affect the client base and billing systems.

History

#1 Updated by mr bugreporter almost 8 years ago

http://www.kannel.org/pipermail/users/2009-April/006853.html

I've found a similar discussion.

I know Alex has implemented the dlr-retry-count patch, but that is a somewhat different idea: it only covers the case where INSERT-ing into the database is slower than receiving the DLR from the SMSC.

We really need this solved, because we currently lose about 10-20% of DLRs whenever Kannel has to be restarted while a queue is active.

I also encountered another bug: when I tried to restart, bearerbox closed its admin port (13000 by default) while the smsbox port (13001) was still active.

bearerbox kept running in the background, eating CPU and memory, but I couldn't reach its web interface, and it didn't work until I restarted it manually again.

2012-11-02 08:34:11 [16057] [8] DEBUG: send_msg: sending msg to boxc: <sqlbox>
2012-11-02 08:34:11 [16057] [8] DEBUG: boxc_sender: sent message to <127.0.0.1>
2012-11-02 08:34:11 [16057] [7] DEBUG: boxc_receiver: got ack
2012-11-02 08:34:11 [16057] [7] DEBUG: boxc_receiver: got ack
2012-11-02 08:34:11 [16057] [7] ERROR: Error reading from fd 38:
2012-11-02 08:34:11 [16057] [7] ERROR: System error 104: Connection reset by peer
2012-11-02 08:34:11 [16057] [7] ERROR: Connection to box <127.0.0.1> broke.
2012-11-02 08:34:11 [16057] [8] DEBUG: send_msg: sending msg to boxc: <sqlbox>
2012-11-02 08:34:11 [16057] [8] ERROR: Error writing 16 octets to fd 38:
2012-11-02 08:34:11 [16057] [8] ERROR: System error 32: Broken pipe
2012-11-02 08:34:11 [16057] [8] ERROR: Couldn't write Msg to box <127.0.0.1>, disconnecting
2012-11-02 08:34:11 [16057] [8] DEBUG: Thread 8 (gw/bb_boxc.c:boxc_sender) terminates.
2012-11-02 08:34:11 [16057] [0] WARNING: Killing signal or HTTP admin command received, shutting down...
2012-11-02 08:34:11 [16057] [0] DEBUG: Shutting down Kannel...
2012-11-02 08:34:11 [16057] [0] DEBUG: shutting down smsc
2012-11-02 08:34:11 [16057] [0] DEBUG: Shutting down SMSCConn SMPP:smpp.server.xxx:xxx:NULL (slow)
2012-11-02 08:34:11 [16057] [0] DEBUG: shutting down udp
2012-11-02 08:34:11 [16057] [3] WARNING: smsbox_list empty!
2012-11-02 08:34:11 [16057] [3] WARNING: smsbox_list empty!
2012-11-02 08:34:11 [16057] [3] WARNING: smsbox_list empty!
2012-11-02 08:34:11 [16057] [3] WARNING: smsbox_list empty!
2012-11-02 08:34:11 [16057] [7] ERROR: Error writing 16 octets to fd 38:
2012-11-02 08:34:11 [16057] [7] ERROR: System error 32: Broken pipe
2012-11-02 08:34:11 [16057] [7] DEBUG: Thread 7 (gw/bb_boxc.c:function) terminates.
2012-11-02 08:34:11 [16057] [3] WARNING: smsbox_list empty!
2012-11-02 08:34:11 [16057] [3] WARNING: smsbox_list empty!
2012-11-02 08:34:13 [16057] [6] DEBUG: sms_router: handling message (0x7fc2304acaa0 vs 0x7fc2304acaa0)
2012-11-02 08:34:13 [16057] [6] DEBUG: Routing failed, re-queued.
2012-11-02 08:34:13 [16057] [6] DEBUG: Thread 6 (gw/bb_smscconn.c:sms_router) terminates.
2012-11-02 08:34:22 [16057] [3] DEBUG: Thread 3 (gw/bb_boxc.c:sms_to_smsboxes) terminates.
2012-11-02 08:34:22 [16057] [4] DEBUG: Thread 4 (gw/bb_boxc.c:smsboxc_run) terminates.
2012-11-02 08:34:22 [16057] [0] INFO: All flow threads have died, killing core
2012-11-02 08:34:22 [16057] [0] DEBUG: final clean-up for SMSCConn
2012-11-02 08:34:22 [16057] [0] DEBUG: MO concatenated message handling cleaned up
2012-11-02 08:34:22 [16057] [0] INFO: Total WDP messages: received 0, sent 0
2012-11-02 08:34:22 [16057] [0] DEBUG: Remaining SMS: 52 incoming, 36604 outgoing
2012-11-02 08:34:22 [16057] [0] INFO: Total SMS messages: received 0, dlr 19783, sent 9602, dlr 0
2012-11-02 08:34:22 [16057] [0] DEBUG: Immutable octet strings: 239.
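
A bearerbox that has dropped its admin port but is still running, as described above, could be caught in the restart script by checking the pidfile before starting a new instance. This is a minimal sketch: pidfile_alive is a made-up helper name, and it assumes the pidfile holds the PID of the process you actually care about (with the run_kannel_box wrapper that may be the wrapper's PID rather than bearerbox's).

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the PID stored in pidfile $1
# belongs to a process that still exists, 1 otherwise.
pidfile_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] || return 1
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}
```

For example: pidfile_alive "$PIDFILES/bearerbox.pid" && echo "bearerbox is still running"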
