Topic: GNE not working when server and client are in the same process

Haukinger

  • User
  • *
  • Posts: 2
GNE not working when server and client are in the same process
« on: January 10, 2007, 04:48:08 PM »
Hi all!

I'm using GNE to develop a game on WinXP with Visual Studio 2005. Everything works as it should except for one thing: if the server and the client live inside the same process, waitForConnect() never returns. It works fine if I start the server and the client separately and have the client connect to localhost, but the server process can't connect to itself.

I use gnelib r653 from svn/trunk with HawkNL 1.68.

gne.log:
Code:
ServerConnectionListener.cpp, line   42, thrd     main: 1309150: created
ServerConnectionListener.cpp, line   98, thrd     main: 1309150: Registering listen socket 0
                  Thread.cpp, line  124, thrd     main: 1309994: Thread created
        ClientConnection.cpp, line   47, thrd     main: 1309960: created
                  Thread.cpp, line  124, thrd     main: 1309bb0: Thread created
             EventThread.cpp, line   37, thrd     main: 1309bb0: created
                  Thread.cpp, line  318, thrd     main: 1309994: Starting Thread CliConn
        ClientConnection.cpp, line  118, thrd  CliConn: 1309960: Trying to connect to 127.0.0.1:1488
        ClientConnection.cpp, line  205, thrd  CliConn: 1309960: Sending the CRP.
        ClientConnection.cpp, line  211, thrd  CliConn: 1309960: Waiting for the CAP.
                  Thread.cpp, line  124, thrd EventGen: 12e91e4: Thread created
        ServerConnection.cpp, line   50, thrd EventGen: 12e91b0: created
                  Thread.cpp, line  124, thrd EventGen: 12e9788: Thread created
             EventThread.cpp, line   37, thrd EventGen: 12e9788: created
ServerConnectionListener.cpp, line  140, thrd EventGen: 1309150: Spawning a new ServerConnection 12e91b0 on socket 2
                  Thread.cpp, line  318, thrd EventGen: 12e91e4: Starting Thread SrvrConn
        ServerConnection.cpp, line  106, thrd SrvrConn: 12e91b0: New connection incoming from 127.0.0.1:1025
        ServerConnection.cpp, line  166, thrd SrvrConn: 12e91b0: Waiting for the client's CRP.
                  Thread.cpp, line  124, thrd SrvrConn: 12e9d80: Thread created
            PacketStream.cpp, line  380, thrd SrvrConn: 12e9d80:   Negotiated current rate: 0, rate step: 0
            PacketStream.cpp, line   63, thrd SrvrConn: 12e9d80: PacketStream negotiated: max: 0 requested: 0
            PacketStream.cpp, line   64, thrd SrvrConn: 12e9d80: created
        ServerConnection.cpp, line  181, thrd SrvrConn: 12e91b0: Got CRP, now sending CAP.
        ServerConnection.cpp, line  261, thrd SrvrConn: 12e91b0: Sent a CAP with 12 bytes.
        ServerConnection.cpp, line  189, thrd SrvrConn: 12e91b0: Unreliable connection not requested or refused.
        ServerConnection.cpp, line  115, thrd SrvrConn: 12e91b0: GNE Protocol Handshake Successful.
                  Thread.cpp, line  124, thrd  CliConn: 12ea420: Thread created
          SyncConnection.cpp, line   39, thrd SrvrConn: 12ea590: created
            PacketStream.cpp, line  380, thrd  CliConn: 12ea420:   Negotiated current rate: 0, rate step: 0
          SyncConnection.cpp, line   40, thrd SrvrConn: 12ea590: Wrapping Connection 12e91b0 into a SyncConnection.
            PacketStream.cpp, line   63, thrd  CliConn: 12ea420: PacketStream negotiated: max: 0 requested: 0
            PacketStream.cpp, line   64, thrd  CliConn: 12ea420: created
        ClientConnection.cpp, line  218, thrd  CliConn: 1309960: Unreliable connection not requested.

Gillius

  • Administrator
  • User
  • *****
  • Posts: 147
    • http://www.gillius.org/
Re: GNE not working when server and client are in the same process
« Reply #1 on: January 10, 2007, 05:13:50 PM »
To me, this looks like it could be a deadlock. Or it could be an event-thread scheduling issue: I doubt I've ever tested connecting a client and a server socket in the same process, and when I combined everything into a single thread I may have introduced a deadlock. Really, the thing to do here is to see what state all of the threads are in when the process hangs (via the debugger); then I can see the deadlock condition. I had a lot of problems some years ago with deadlocks in GNE when I was not as proficient with threads, and I wouldn't be at all surprised if this is another one of them.
Gillius
Gillius's Programming http://www.gillius.org/

Haukinger

Re: GNE not working when server and client are in the same process
« Reply #2 on: January 11, 2007, 03:50:02 AM »
My work PC (the one where it doesn't work ;) ) is one of those nifty dual-core Intels, while the notebook runs on a single core. One more thing pointing to a deadlock...

EDIT: When I bind the process to one core (using Task Manager), everything works fine. Just tested.

EDIT: It looks like the main thread and the client thread both hang in void Mutex::markAcquired(). If compiled as a release build, everything works.
« Last Edit: January 11, 2007, 04:31:36 AM by Haukinger »

Gillius

Re: GNE not working when server and client are in the same process
« Reply #3 on: January 11, 2007, 08:11:48 AM »
Hmm, I just bought a dual-core PC three weeks ago, but I haven't run any GNE programs on it yet. I will try the one that you sent me. I'll need to look into it when I can find time, perhaps when I get home from work today.

Gillius

Re: GNE not working when server and client are in the same process
« Reply #4 on: January 11, 2007, 10:26:37 PM »
I have found your deadlock bug. I analyzed all of the threads from your trace:

Code:
CEG thread is waiting to get mapSync for currentThread
 has: mapCtrl

Cli conn thread is in the middle of getting current thread
 has: doTrace lock

main thread is joining on cli conn thread

server conn thread is in the middle of acquiring doTrace lock
 has: mapSync

new thread starting is trying to acquire mapSync
 has: nothing
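
One way to read a trace like this is as a wait-for graph: each thread waits for the thread that holds the lock it wants, and a cycle in that graph is the deadlock. The following is only a sketch of that reading, not GNE code. `ThreadState`, `findDeadlocked`, and `traceFromPost` are hypothetical names, and the main thread is left out because it is waiting on a join rather than on a lock.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one trace entry: the locks a thread holds
// and the single lock it is currently trying to acquire.
struct ThreadState {
    std::string name;
    std::vector<std::string> holds;
    std::string wants;  // empty if the thread is not waiting on a lock
};

// Returns the names of all threads that sit on a wait-for cycle.
std::set<std::string> findDeadlocked(const std::vector<ThreadState>& threads) {
    // Map each held lock to its owning thread.
    std::map<std::string, std::string> owner;
    for (const auto& t : threads) {
        assert(!t.name.empty());
        for (const auto& lock : t.holds) owner[lock] = t.name;
    }

    // waitsFor[a] == b means: thread a wants a lock that thread b holds.
    std::map<std::string, std::string> waitsFor;
    for (const auto& t : threads) {
        auto it = owner.find(t.wants);
        if (!t.wants.empty() && it != owner.end() && it->second != t.name)
            waitsFor[t.name] = it->second;
    }

    // Each thread has at most one outgoing edge, so following the chain
    // for at most N steps is enough to see whether it leads back to the start.
    std::set<std::string> deadlocked;
    for (const auto& t : threads) {
        std::string cur = t.name;
        for (std::size_t step = 0; step < threads.size(); ++step) {
            auto it = waitsFor.find(cur);
            if (it == waitsFor.end()) break;
            cur = it->second;
            if (cur == t.name) { deadlocked.insert(t.name); break; }
        }
    }
    return deadlocked;
}

// The lock states listed in the trace above, transcribed by hand.
std::vector<ThreadState> traceFromPost() {
    return {
        {"CEG", {"mapCtrl"}, "mapSync"},
        {"CliConn", {"doTrace"}, "mapSync"},
        {"SrvrConn", {"mapSync"}, "doTrace"},
        {"NewThread", {}, "mapSync"},
    };
}
```

Calling `findDeadlocked(traceFromPost())` reports the client and server connection threads as the cycle. The CEG thread and the newly started thread are blocked behind that cycle but are not part of it.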

The problem is the system that I put in place to help debug mutex deadlocks in the first place! When a mutex is acquired, it is marked as acquired by the current thread. In order to do that I call currentThread, which has to lock a mutex itself to do a lookup in the thread map. This thread map is under the protection of "mapSync".
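
To make the mechanism concrete, here is a rough sketch of an owner-tracking mutex of the kind described above. It is not GNE's actual code: `DebugMutex`, `threadMap`, and `registerCurrentThread` are hypothetical names, and standard C++11 primitives stand in for whatever GNE used at the time. The point is only that acquiring any tracked mutex quietly takes a second lock, `mapSync`.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-ins for the structures named in the post.
std::mutex mapSync;                                // guards threadMap
std::map<std::thread::id, std::string> threadMap;  // registry of known threads

void registerCurrentThread(const std::string& name) {
    std::lock_guard<std::mutex> guard(mapSync);
    threadMap[std::this_thread::get_id()] = name;
}

// Looking up the current thread needs mapSync. This is the hidden lock.
std::string currentThread() {
    std::lock_guard<std::mutex> guard(mapSync);
    auto it = threadMap.find(std::this_thread::get_id());
    return it == threadMap.end() ? "unknown" : it->second;
}

// A mutex that records which thread owns it, for deadlock debugging.
class DebugMutex {
public:
    void acquire() {
        mutex_.lock();
        owner_ = currentThread();  // takes mapSync while already holding mutex_
    }
    void release() {
        assert(!owner_.empty());   // release() must follow acquire()
        owner_.clear();
        mutex_.unlock();
    }
    const std::string& owner() const { return owner_; }

private:
    std::mutex mutex_;
    std::string owner_;
};
```

With this shape, every `acquire()` implies the order "this mutex, then `mapSync`". Any code path that holds `mapSync` and then acquires a `DebugMutex` takes the two locks in the opposite order.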

Here is the deadlock -- startThread calls doTrace (debug logging) while mapSync is locked. doTrace calls currentThread, but to make messages come out sequentially, doTrace has its own mutex, and when doTrace acquires that mutex the mutex debugging code also calls currentThread, which needs mapSync. So one thread can hold mapSync while waiting for the doTrace lock, while another holds the doTrace lock while waiting for mapSync. I think the only line vulnerable to this bug is the gnedbgo1 call in Thread::start on line 318, since the lock reversal can only happen when mapSync is locked while doTrace is called, and this happens only in Thread::start.
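
The lock reversal can be checked mechanically by recording, for each code path, which locks were already held when a new one was acquired. The sketch below does this single-threaded, so it never actually deadlocks. `LockOrderChecker` and the two `record...` helpers are hypothetical, and the acquisition sequences are transcribed from the description above, not from GNE's source.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Records "held lock -> newly acquired lock" pairs and reports
// whether any two locks were ever taken in both orders.
class LockOrderChecker {
public:
    // `held` lists the locks the code path already holds when it acquires `lock`.
    void onAcquire(const std::vector<std::string>& held, const std::string& lock) {
        assert(!lock.empty());
        for (const auto& h : held) edges_.emplace(h, lock);
    }

    bool hasInversion() const {
        for (const auto& e : edges_)
            if (edges_.count(std::make_pair(e.second, e.first)) > 0) return true;
        return false;
    }

private:
    std::set<std::pair<std::string, std::string>> edges_;
};

// Path 1: Thread::start holds mapSync and then logs, which takes the doTrace lock.
void recordThreadStartPath(LockOrderChecker& checker) {
    checker.onAcquire({}, "mapSync");
    checker.onAcquire({"mapSync"}, "doTrace");
}

// Path 2: a plain log call takes the doTrace lock, and the mutex debugging
// code then calls currentThread, which takes mapSync.
void recordTracePath(LockOrderChecker& checker) {
    checker.onAcquire({}, "doTrace");
    checker.onAcquire({"doTrace"}, "mapSync");
}
```

Either path alone is harmless. Only after both are recorded does `hasInversion()` return true, which matches the observation that the hang needs two threads running these paths at the same time.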

The solution can be to disable the mutex debugging code (the acquisition tracking), or to comment out the gnedbgo1 line in Thread::start. Another alternative is to move that gnedbgo1 statement out of the lock, but if you do so, the "starting thread" debug statement might actually be written after the other thread has started.
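
The "move the statement out of the lock" option could look roughly like the sketch below: build the message while `mapSync` is held, but only write it once `mapSync` has been released. All names here (`registry`, `logLines`, `trace`, `startThread`) are hypothetical stand-ins, not the real GNE functions.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

std::mutex mapSync;                 // hypothetical: guards the thread registry
std::mutex traceLock;               // hypothetical: keeps log lines sequential
std::vector<std::string> registry;  // stands in for the thread map
std::vector<std::string> logLines;  // stands in for gne.log

void trace(const std::string& message) {
    std::lock_guard<std::mutex> guard(traceLock);
    logLines.push_back(message);
}

void startThread(const std::string& name) {
    assert(!name.empty());
    std::string message;
    {
        std::lock_guard<std::mutex> guard(mapSync);
        registry.push_back(name);
        message = "Starting Thread " + name;  // only prepare the text here
    }  // mapSync is released at this brace
    trace(message);  // traceLock is never requested while mapSync is held
}
```

The trade-off mentioned above still applies: once `mapSync` is released, the new thread may run and log its own lines before the "Starting Thread" line is written.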

I'm not sure which solution I will take, but it is clear there is a lot of complex locking going on in statements that you assume are "safe": calling the debug log, for example, and of course locking a mutex shouldn't cause a deadlock by itself. I'm not sure how to get around this in C++ without resorting to thread-specific variables, which I don't know how to use and which I'm not sure can be done portably. Really, if I could put the current Thread into a thread-specific variable, this problem would be solved and the code would be a lot cleaner.
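
For readers finding this thread later: C++11 standardized the `thread_local` keyword, which gives a portable thread-specific variable. Before that, the usual choices were compiler extensions or the platform's thread-local storage API. Below is a minimal sketch of the idea, assuming a C++11 compiler. `ThreadInfo`, `tlsCurrent`, `runAs`, and `nameSeenInside` are hypothetical names, and this is not the change that was committed to GNE.

```cpp
#include <cassert>
#include <string>
#include <thread>

struct ThreadInfo {
    std::string name;
};

// One pointer per thread: reading it needs no mutex and no map lookup.
thread_local ThreadInfo* tlsCurrent = nullptr;

ThreadInfo* currentThread() { return tlsCurrent; }

// Runs `body` on a new thread with `info` installed as that thread's
// current ThreadInfo, then waits for the thread to finish.
template <typename Fn>
void runAs(ThreadInfo& info, Fn body) {
    std::thread worker([&info, body]() mutable {
        tlsCurrent = &info;  // affects only this worker's copy
        body();
        tlsCurrent = nullptr;
    });
    worker.join();
}

// Returns the name that currentThread() reports from inside the worker.
std::string nameSeenInside(ThreadInfo& info) {
    std::string seen;
    runAs(info, [&seen] {
        assert(currentThread() != nullptr);
        seen = currentThread()->name;
    });
    return seen;
}
```

Because `currentThread()` no longer touches `mapSync`, owner tracking inside a mutex would not need a second lock, and the reversal described above could not occur through this path.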
« Last Edit: January 11, 2007, 10:28:30 PM by Gillius »

Gillius

Re: GNE not working when server and client are in the same process
« Reply #5 on: January 29, 2007, 04:22:31 PM »
I forgot to mention: I committed a change that fixes this deadlock in the latest Subversion revision, for anyone else who was following along.