Handling unclean shutdown of brokers in a cluster

MQ 4.0

The cluster of brokers works well as long as the brokers are shut down cleanly.

However, it seems that if a broker is not shut down cleanly, creating consumers fails on any of the remaining live brokers.

Try to do the following:

1) Set up a cluster of 2 brokers on 2 separate physical hosts. The destinations that will be used must be auto-created.

2) Set up the administered objects for the client so that the broker address list uses PRIORITY instead of RANDOM. For example, say first priority is broker A, second is broker B.

3) Start up producers and consumers.

So far so good. Now:

4) Unplug the network cable on broker B.

5) Create a new connection on broker A, a new session, and create a consumer. In this case, creating the consumer will time out.
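For step 2, the address-list attributes of the connection factory might look roughly like the sketch below. The attribute names `imqAddressList` and `imqAddressListBehavior` are the standard MQ connection factory attributes; the two addresses are assumptions taken from the IPs that appear in the logs further down (broker A as 192.168.0.188, broker B as 192.168.0.76), and the class and method names are purely illustrative. How the properties are applied (imqobjmgr, or setting them on the connection factory in code) is left out.

```java
import java.util.Properties;

final class PriorityAddressList {
    // Builds the address-list attributes for a connection factory.
    // Addresses are given in priority order: the first one is tried first.
    static Properties build(String... brokerAddresses) {
        Properties p = new Properties();
        p.setProperty("imqAddressList", String.join(",", brokerAddresses));
        p.setProperty("imqAddressListBehavior", "PRIORITY");
        return p;
    }

    public static void main(String[] args) {
        // Assumed addresses: broker A first, broker B second.
        Properties p = build("mq://192.168.0.188:7676/jms",
                             "mq://192.168.0.76:7676/jms");
        p.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```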

The log on broker A will say something like this:

[01/Sep/2006:14:51:24 EST] [B1065]: Accepting: guest@192.168.0.188:42138->jms:40746. Count: service=92 broker=92

[01/Sep/2006:14:51:24 EST] Created new session 1516883388605893120 on connection 1516883388605891072

[01/Sep/2006:14:51:24 EST] Created new consumer [consumer:1516883388605894400, type=NONE] on destination Q:mBloxUKThreeP1G1DeliveryReceipts with selector null

[01/Sep/2006:14:51:24 EST] Sending resource lock request for : queue:mBloxUKThreeP1G1DeliveryReceipts_5

Expecting responses from :

soft3-0/192.168.0.76:7676 (imqbroker, mq://192.168.0.76:7676/, [3263998569023120384])

[01/Sep/2006:14:51:24 EST] SENDING PACKET :

Packet: 11

Version: 350 Size: 125 Magic: 2147476418

Timestamp: 0 Sequence: 0 Bits: B

Prop Size: 89:{I=queue:mBloxUKThreeP1G1DeliveryReceipts_5, SH=false, TS=0, X=1157086284688}

Payload Size: 0

[01/Sep/2006:14:52:24 EST] WARNING [B2062]: Lock request timed out for resid = queue:mBloxUKThreeP1G1DeliveryReceipts_5.

Following brokers did not respond :

soft3-0/192.168.0.76:7676 (imqbroker, mq://192.168.0.76:7676/, [3263998569023120384])

and the client side will throw an exception like this:

2006-09-01 15:16:17,144 [FailOverBackup-soft3-0] ERROR - Unable to complete JMS initialisation for [mBloxUK - ThreeP1G1]

com.sun.messaging.jms.JMSException: [C4000]: Packet acknowledge failed. user=guest, broker=192.168.0.188:7676(48038)

at com.sun.messaging.jmq.jmsclient.ProtocolHandler.writePacketWithAck(ProtocolHandler.java:694)

at com.sun.messaging.jmq.jmsclient.ProtocolHandler.writePacketWithAck(ProtocolHandler.java:555)

at com.sun.messaging.jmq.jmsclient.ProtocolHandler.writePacketWithReply(ProtocolHandler.java:410)

at com.sun.messaging.jmq.jmsclient.ProtocolHandler.addInterest(ProtocolHandler.java:2163)

at com.sun.messaging.jmq.jmsclient.WriteChannel.addInterest(WriteChannel.java:75)

at com.sun.messaging.jmq.jmsclient.ConnectionImpl.addInterest(ConnectionImpl.java:1227)

at com.sun.messaging.jmq.jmsclient.Consumer.registerInterest(Consumer.java:134)

at com.sun.messaging.jmq.jmsclient.MessageConsumerImpl.addInterest(MessageConsumerImpl.java:160)

at com.sun.messaging.jmq.jmsclient.MessageConsumerImpl.init(MessageConsumerImpl.java:147)

at com.sun.messaging.jmq.jmsclient.QueueReceiverImpl.<init>(QueueReceiverImpl.java:68)

at com.sun.messaging.jmq.jmsclient.UnifiedSessionImpl.createReceiver(UnifiedSessionImpl.java:139)

at com.sun.messaging.jmq.jmsclient.UnifiedSessionImpl.createConsumer(UnifiedSessionImpl.java:621)

at com.sun.messaging.jmq.jmsclient.UnifiedSessionImpl.createConsumer(UnifiedSessionImpl.java:539)

at softgame.gateway.connectors.smpp.DeliveryReceiptJMSConsumerImpl.init(DeliveryReceiptJMSConsumerImpl.java:69)

You can see from the above that the lock timeout is 60 seconds. I tried adding the following in the config.properties for my instance:

imq.cluster.locktimeout=50

... but this does not seem to have any effect, although I could see references to this property in the source code and on the net.
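The 60-second figure comes from the gap between the "Sending resource lock request" line and the B2062 warning in the broker log. A small sketch of how that gap could be computed from two log lines is shown here; the class and method names are made up for illustration, and the time zone suffix is ignored on the assumption that both lines come from the same broker.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class LockTimeoutFromLog {
    // Matches the timestamp part of a prefix such as "[01/Sep/2006:14:51:24 EST]".
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss", Locale.ENGLISH);

    // Parses the timestamp between '[' and the first space after it.
    // The zone abbreviation that follows is deliberately ignored.
    static LocalDateTime stampOf(String logLine) {
        int open = logLine.indexOf('[');
        int space = logLine.indexOf(' ', open);
        return LocalDateTime.parse(logLine.substring(open + 1, space), STAMP);
    }

    // Elapsed time between the lock request line and the timeout warning line.
    static Duration between(String requestLine, String timeoutLine) {
        return Duration.between(stampOf(requestLine), stampOf(timeoutLine));
    }

    public static void main(String[] args) {
        String sent = "[01/Sep/2006:14:51:24 EST] Sending resource lock request for : "
                + "queue:mBloxUKThreeP1G1DeliveryReceipts_5";
        String timedOut = "[01/Sep/2006:14:52:24 EST] WARNING [B2062]: Lock request timed out "
                + "for resid = queue:mBloxUKThreeP1G1DeliveryReceipts_5.";
        System.out.println(between(sent, timedOut).getSeconds() + " seconds");
    }
}
```

Running the same check after changing imq.cluster.locktimeout would show whether the broker actually picked up the new value.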

What can be done in this case?

By [j.salvo] at [2007-11-14]
# 1
Forgot to mention that I am using file-based persistence.
jsalvo at 2007-7-7 > top of java,Application & Integration Servers,Sun Java System Message Queue...
# 2

Looks like there is no workaround for this problem except to restart the remaining brokers.

Creating a producer is not a problem, because the client sends its messages to its home broker. Creating a consumer is different: the home broker has to ask the other brokers in the cluster to grant a lock, and because the unplugged broker never responds, the request times out and the home broker throws a lock timeout exception.

jsalvo at 2007-7-7
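The behaviour described in reply # 2 can be illustrated with a toy model. This is not the broker's actual implementation, only a sketch under the assumption that the home broker waits for every other broker in the cluster to answer a lock request before the consumer can be created. All names here are hypothetical, and the timeout is shortened to milliseconds so the example finishes quickly.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ClusterLockModel {
    // Each peer broker is modelled as a future that completes when the peer
    // answers the lock request. A peer with its cable unplugged never completes.
    static boolean acquire(List<CompletableFuture<Boolean>> peerReplies, long timeoutMillis) {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(peerReplies.toArray(new CompletableFuture[0]));
        try {
            all.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return false; // at least one peer did not respond in time
        } catch (ExecutionException e) {
            return false; // a peer reply failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        // The lock is granted only if every peer agreed.
        return peerReplies.stream().allMatch(f -> Boolean.TRUE.equals(f.getNow(false)));
    }

    public static void main(String[] args) {
        CompletableFuture<Boolean> liveBroker = CompletableFuture.completedFuture(true);
        CompletableFuture<Boolean> unpluggedBroker = new CompletableFuture<>(); // never answers

        System.out.println("all peers alive: " + acquire(List.of(liveBroker), 200));
        System.out.println("one peer unplugged: " + acquire(List.of(liveBroker, unpluggedBroker), 200));
    }
}
```

In this model the lock can only succeed again once the unreachable peer is no longer in the list of brokers to wait for, which is consistent with restarting the remaining brokers being the only way out.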