Bugzilla – Bug 1268
ACE_Select_Reactor deadlocks with ACE_HAS_REACTOR_NOTIFICATION_QUEUE
Last modified: 2003-10-13 12:03:27
Christian Gebauer Tveen <ctj@navicon.dk> reported:
--------------------------------------------------------------------
I have further information about my problem. The notification queue eventually stops working when the send mechanism is activated for each new notification; the notification queue is limited by the size of the write channel. We have tried some workarounds that enable the queuing mechanism, but in the end it appears that we have a performance problem.

I would therefore like to increase the size of the write buffer. Is this possible using pipes? I could try disabling ACE_HAS_STREAM_PIPES to use sockets instead of pipes. Is there a good reason to use pipes instead of sockets?

> Hello,
> I would like to ask whether this is the proper place for such a question.
> Should it have been reported as a bug instead, or is the question
> just not formulated in an understandable way?
>
> Regards
>
> Christian G. Tveen
>
> Christian Gebauer Tveen wrote:
>
>> Hello, I would very much appreciate it if anybody could have a look at
>> this problem, as I'm about to get stuck.
>>
>> ACE VERSION: 5.2.3
>>
>> HOST MACHINE and OPERATING SYSTEM:
>> SparcIII (Ultra 60), Solaris 2.6
>>
>> COMPILER NAME AND VERSION (AND PATCHLEVEL):
>> CC-4.2 patch 104631-07
>>
>> CONTENTS OF $ACE_ROOT/ace/config.h:
>> #define ACE_LEGACY_MODE
>> #define ACE_HAS_REACTOR_NOTIFICATION_QUEUE
>> +
>> config-sunos5.6.h
>>
>> CONTENTS OF $ACE_ROOT/include/makeinclude/platform_macros.GNU (unless
>> this isn't used in this case, e.g., with Microsoft Visual C++):
>> platform_sunos5_sunc++.GNU
>>
>> AREA/CLASS/EXAMPLE AFFECTED:
>> ACE_Select_Reactor
>>
>> DOES THE PROBLEM AFFECT:
>> EXECUTION?
>> Application deadlocks
>>
>> SYNOPSIS:
>> A default-reactor-based consumer thread deadlocks in the
>> message queue when the queue is loaded with a lot of messages from a
>> producer thread.
>>
>> DESCRIPTION:
>> Especially at startup I get lots of messages queued up, and when the
>> internal notification_queue reaches approx. 1155 messages the
>> message_queue/reactor deadlocks. Please see the stack traces at the
>> end. It appears that deque_head tries to acquire a message_queue lock,
>> but the lock is held by putq, which cannot 'put' because of a
>> blocking send on a socket. Maybe the block happens if the socket has
>> an 8kb internal queue and each send puts 8 bytes on the socket?
>>
>> My problem might be related to bug 1175.
>>
>> REPEAT BY:
>> heavy load
>>
>> SAMPLE FIX/WORKAROUND:
>> Still only in our code

---------------------------- Cut Here --------------------------------
The reason why this occurs:

- The TP_Reactor dispatches threads on a per-event basis, i.e. as soon as it sees an event it dispatches a thread for the event handler. Since a notification is an event, we need to dispatch to the notification handler in the same way.
- The TP_Reactor cannot work with just one message in the pipe as the Select_Reactor does. The TP_Reactor reads a message off the pipe, then goes ahead (removing just one message off the queue if needed) and dispatches.
- If a message is removed off the pipe, we need more notifications for the others in the queue. If we don't have them, the TP_Reactor may block on select ().
- When Bala was fixing things for the TP_Reactor before 1.2, Bala and Irfan decided to err on the side of the TP_Reactor.
- Bala can fix it for the Select_Reactor, but it is going to be a problem for the ACE_TP_Reactor anyway.

In short, this deadlocks because it was decided to make it that way for the ACE_TP_Reactor.

Dr. Schmidt's suggestion was this:
--------------------- Cut Here --------------------------------------
. For the ACE_Select_Reactor let's reapply the approach that the Siemens guys had, since that'll enable people to have a very scalable solution.
. For the ACE_TP_Reactor we can simply change this stuff so that rather than writing/reading 8 bytes to the pipe, we'll simply write/read 1 byte to the pipe and store the ACE_Notification_Buffer in the message queue. This isn't as scalable as the ACE_Select_Reactor approach, but it'll be 8 times larger than the current approach!
-------------------------------------------------------------------
We are going with Dr. Schmidt's suggestion.
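The arithmetic behind the "8 times larger" remark can be illustrated with a small, library-free sketch. This is not the attached patch and uses no ACE code: SimulatedPipe, Record, OneByteNotifier and notifications_before_block are hypothetical names invented for this illustration, and the 8192-byte capacity is only the reporter's guess ("8kb internal queue"), not a measured value. The pipe is modelled as a byte counter, and a false return marks the point where a real blocking write would stall.

```cpp
#include <cstddef>
#include <deque>

// Simulated pipe: tracks only how many bytes are currently buffered.
// try_write returns false where a real blocking write() would stall.
class SimulatedPipe {
public:
    explicit SimulatedPipe(std::size_t capacity)
        : capacity_(capacity), used_(0) {}

    bool try_write(std::size_t n) {
        if (n > capacity_ - used_) return false;  // no room left
        used_ += n;
        return true;
    }

    // Consume up to n buffered bytes.
    void read(std::size_t n) { used_ -= (n < used_ ? n : used_); }

    std::size_t used() const { return used_; }

private:
    std::size_t capacity_;
    std::size_t used_;
};

// Hypothetical stand-in for ACE_Notification_Buffer.
struct Record {
    int handler_id;
    int mask;
};

// One wake-up byte goes into the pipe per notification; the record
// itself is kept in an in-memory queue instead of in the pipe.
class OneByteNotifier {
public:
    explicit OneByteNotifier(SimulatedPipe& pipe) : pipe_(pipe) {}

    bool notify(const Record& r) {
        if (!pipe_.try_write(1)) return false;  // pipe full
        queue_.push_back(r);
        return true;
    }

    // Consumer side: read one byte, then pop the matching record.
    bool handle_one(Record& out) {
        if (queue_.empty()) return false;
        pipe_.read(1);
        out = queue_.front();
        queue_.pop_front();
        return true;
    }

    std::size_t pending() const { return queue_.size(); }

private:
    SimulatedPipe& pipe_;
    std::deque<Record> queue_;
};

// How many notifications fit before a write would block, when each
// notification costs bytes_per_notify bytes of pipe space.
std::size_t notifications_before_block(std::size_t pipe_capacity,
                                       std::size_t bytes_per_notify) {
    if (bytes_per_notify == 0) return 0;  // avoid an endless loop
    SimulatedPipe pipe(pipe_capacity);
    std::size_t count = 0;
    while (pipe.try_write(bytes_per_notify)) ++count;
    return count;
}
```

Under the assumed 8192-byte capacity, notifications_before_block(8192, 8) gives 1024 and notifications_before_block(8192, 1) gives 8192, which is the factor of 8 mentioned in the suggestion. The limit is raised, not removed: a producer that outruns the consumer by more than the pipe capacity would still block.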
Assigning it to Bala
Accepting it
Created an attachment (id=213): Proposed patch based on discussions with Bala
Created an attachment (id=214): Regression test
Created an attachment (id=223): New patch based on the 1.3.1 beta kit
I found the same problems in the 5.3.1 beta kit (or bug-fix-only release, or whatever it is called). Without this patch I saw many of the TAO tests fail.
Fixed! Sun Oct 12 17:20:40 2003 Balachandran Natarajan <bala@dre.vanderbilt.edu> Thanks