Author Topic: client not properly shutdown, can not reconnect

archer

  • User
  • *
  • Posts: 2
client not properly shutdown, can not reconnect
« on: November 16, 2008, 08:28:42 PM »
Hi all,

I’m using Delta3D 2.1 under Windows XP with Visual Studio 2005 and GNE 0.7.

I have a server running and accepting connections. A client connects to the server and then quits improperly, for example because its process is ended from taskmgr.exe. The next time the client tries to reconnect to the server, it blocks forever. I think it blocks in Thread::join.
------------------------------------------
void Thread::join() {
  assert( started );
  if ( started ) {
    LockMutex lock( joinSync );
    if ( !joined ) {
#ifdef WIN32
      valassert(WaitForSingleObject( id->hThread, INFINITE), WAIT_OBJECT_0);
#else
      valassert(pthread_join( id->thread_id, NULL ), 0);
#endif
      joined = true;
    }
  }
}
-------------------------------------------
I tried using ‘3000’ instead of ‘INFINITE’. With that change the client no longer blocks, but it still cannot connect to the server unless the server is restarted.

--------------------------------------------
Here is the net.log
Thread.cpp, line  124, thrd     main: 5d8f0ec: Thread created
ClientConnection.cpp, line   47, thrd     main: 5d8f0b8: created
Thread.cpp, line  124, thrd     main: 5d90280: Thread created
EventThread.cpp, line   37, thrd     main: 5d90280: created
Thread.cpp, line  324, thrd     main: 5d8f0ec: Starting Thread CliConn
ClientConnection.cpp, line  118, thrd  CliConn: 5d8f0b8: Trying to connect to 192.168.1.32:4444
ClientConnection.cpp, line  205, thrd  CliConn: 5d8f0b8: Sending the CRP.
ClientConnection.cpp, line  211, thrd  CliConn: 5d8f0b8: Waiting for the CAP.
Thread.cpp, line  124, thrd  CliConn: 5d90a08: Thread created
PacketStream.cpp, line  380, thrd  CliConn: 5d90a08:   Negotiated current rate: 0, rate step: 0
PacketStream.cpp, line   63, thrd  CliConn: 5d90a08: PacketStream negotiated: max: 0 requested: 0
PacketStream.cpp, line   64, thrd  CliConn: 5d90a08: created
ClientConnection.cpp, line  215, thrd  CliConn: 5d8f0b8: Setting up the unreliable connection
ClientConnection.cpp, line  136, thrd  CliConn: 5d8f0b8: GNE Protocol Handshake Successful.
SyncConnection.cpp, line   39, thrd  CliConn: 5d91c58: created
SyncConnection.cpp, line   40, thrd  CliConn: 5d91c58: Wrapping Connection 5d8f0b8 into a SyncConnection.
Thread.cpp, line  324, thrd  CliConn: 5d90a08: Starting Thread PktStrm
Thread.cpp, line  324, thrd  CliConn: 5d90280: Starting Thread EventThr
Connection.cpp, line  348, thrd  CliConn: 5d8f0b8: Registered reliable socket 0
Connection.cpp, line  352, thrd  CliConn: 5d8f0b8: Registered unreliable socket 1
ClientConnection.cpp, line  163, thrd  CliConn: 5d8f0b8: Starting onConnect r: 0, u: 1
SyncConnection.cpp, line  138, thrd  CliConn: 5d91c58: Releasing Connection 5d8f0b8
SyncConnection.cpp, line   58, thrd  CliConn: 5d91c58: destroyed
Thread.cpp, line  112, thrd  CliConn: 5d8f0ec: Thread CliConn Ending
--------------------------------------------

Is this a bug? Or could you give me some suggestions?
Thank you very much

archer

Gillius

  • Administrator
  • User
  • *****
  • Posts: 147
    • http://www.gillius.org/
Re: client not properly shutdown, can not reconnect
« Reply #1 on: November 16, 2008, 09:33:05 PM »
I wonder if you are affected by the following bug fixed since 0.70:

Quote
Moved a "thread starting" debugging call that was causing a deadlock, typically on multi-CPU or dual-core systems.

In general, 0.70 is very old (of course, so is the latest unreleased version). I haven't worked on GNE actively for several years so it is hard for me to say. Do you know for certain that it blocks in join? The join should be pretty safe, and it is expected that it would block there waiting for some thread. However, if the thread isn't ending (because it is deadlocked), then the join waiting on it will never end either (but it isn't the problem).
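The point about join can be shown with a minimal sketch in standard C++ (not GNE code). `StuckWorker` and `joinFor` are hypothetical names invented for this illustration, and the worker waiting on a condition variable only stands in for a thread that is blocked or deadlocked. A timed join returns to the caller, but the worker itself is still stuck until whatever it waits on is released.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a thread that is stuck waiting on something
// that never arrives (for example, data from a peer that has vanished).
class StuckWorker {
public:
    StuckWorker() : done_(finished_.get_future()) {
        thread_ = std::thread([this] {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return released_; });  // blocks until release()
            finished_.set_value();                         // signal "thread is ending"
        });
    }

    ~StuckWorker() {
        release();
        if (thread_.joinable()) thread_.join();
    }

    // Timed join: returns false if the worker has not finished in time.
    // Returning false does NOT unblock the worker; it only stops the caller waiting.
    bool joinFor(std::chrono::milliseconds timeout) {
        assert(timeout.count() >= 0);
        if (done_.wait_for(timeout) != std::future_status::ready) return false;
        if (thread_.joinable()) thread_.join();
        return true;
    }

    // Removes the cause of the blockage, which is the actual fix.
    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            released_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool released_ = false;
    std::promise<void> finished_;
    std::future<void> done_;
    std::thread thread_;
};
```

With a `StuckWorker w;`, calling `w.joinFor(std::chrono::milliseconds(50))` returns false while the worker is blocked, and it returns true only after `w.release()` has removed the blockage. This mirrors replacing INFINITE with a finite timeout: the caller stops hanging, but the underlying problem remains.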
Gillius
Gillius's Programming http://www.gillius.org/

archer

  • User
  • *
  • Posts: 2
Re: client not properly shutdown, can not reconnect
« Reply #2 on: November 18, 2008, 04:23:23 AM »
I appreciate your quick reply.

I think I have found the problem.
"int numsockets = nlPollGroup(group, NL_READ_STATUS, sockBuf, NL_MAX_GROUP_SOCKETS, 250);"
This call is in ConnectionEventGenerator.cpp. If the client is not properly shut down, nlPollGroup reports no error and gives no other indication. As a result, the call "listener->onReceive();" blocks forever waiting for data to receive, so the server will not accept any other new client.
So it should be a HawkNL problem.

archer

Gillius

  • Administrator
  • User
  • *****
  • Posts: 147
    • http://www.gillius.org/
Re: client not properly shutdown, can not reconnect
« Reply #3 on: November 20, 2008, 08:49:03 AM »
Sorry for the delayed reply.

So, normally when a socket in the group is closed, that method returns immediately with an error. But, you are saying if a client is terminated improperly it doesn't? OK, is this client on the same machine as the server, or a different machine? Right now GNE does use TCP under the hood as well as UDP. If a remote machine goes away completely and doesn't send a disconnect, then the server might not know about it for some time. However, I thought that most operating systems will close a "leaked" socket from a crashed app.
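One common way to cope with a peer that disappears without sending a disconnect is an application-level idle timeout. The following is a minimal sketch under stated assumptions, not something GNE or HawkNL provides: `IdleTracker`, `touch`, and `reapExpired` are hypothetical names, connections are identified by plain integers, and the caller is assumed to call `touch` whenever any packet (or heartbeat) arrives.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: remembers when each connection was last heard from
// and reports the ones that have been silent for longer than the timeout.
class IdleTracker {
public:
    explicit IdleTracker(Clock::duration timeout) : timeout_(timeout) {
        assert(timeout > Clock::duration::zero());
    }

    // Record that traffic was seen on connId at time 'now'.
    void touch(int connId, Clock::time_point now) { lastSeen_[connId] = now; }

    // Remove and return every connection that has been silent too long.
    std::vector<int> reapExpired(Clock::time_point now) {
        std::vector<int> expired;
        for (auto it = lastSeen_.begin(); it != lastSeen_.end();) {
            if (now - it->second > timeout_) {
                expired.push_back(it->first);
                it = lastSeen_.erase(it);
            } else {
                ++it;
            }
        }
        return expired;
    }

    std::size_t size() const { return lastSeen_.size(); }

private:
    Clock::duration timeout_;
    std::map<int, Clock::time_point> lastSeen_;
};
```

A server loop could call `reapExpired(Clock::now())` once per iteration and close whatever connections come back. Passing the time in as a parameter, instead of reading the clock inside the class, keeps the logic easy to test.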

But, what I am talking about wouldn't cause your original problem with not being able to reconnect. Unless, what you are saying is that the nlPollGroup returned that the sockets were ready and GNE interpreted that as some data was received and it tried to read it and blocked?
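The scenario in that last question, where a poll reports a socket as ready and the following read then blocks, can be guarded against by never issuing a blocking read after the poll. Below is a minimal sketch using plain BSD-style sockets as a stand-in; it does not use the HawkNL or GNE APIs. The names `tryRead`, `isReadable`, and `ReadResult` are hypothetical, and `MSG_DONTWAIT` is assumed to be available (it is on Linux and the BSDs, but not in Winsock, where the socket would have to be put in non-blocking mode instead).

```cpp
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum class ReadResult { Data, WouldBlock, PeerClosed, Error };

// Returns true if the descriptor has data to read or the peer has hung up.
bool isReadable(int fd, int timeoutMs) {
    pollfd p{};
    p.fd = fd;
    p.events = POLLIN;
    return poll(&p, 1, timeoutMs) > 0 && (p.revents & (POLLIN | POLLHUP)) != 0;
}

// Reads without ever blocking: if no data is available, the call returns
// WouldBlock instead of stalling the thread that services all connections.
ReadResult tryRead(int fd, char* buf, std::size_t len, ssize_t& got) {
    got = recv(fd, buf, len, MSG_DONTWAIT);
    if (got > 0) return ReadResult::Data;
    if (got == 0) return ReadResult::PeerClosed;  // orderly shutdown by the peer
    if (errno == EAGAIN || errno == EWOULDBLOCK) return ReadResult::WouldBlock;
    return ReadResult::Error;
}
```

With this shape, a "ready" report that turns out to be wrong costs one failed read rather than a thread that is blocked forever. A read that returns zero bytes is treated as the peer having closed the connection, so the caller can clean up instead of waiting for more data.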

I think I'll have to look into this a little more to understand what you've found. It's been too long since I've worked with this code.

Gillius

  • Administrator
  • User
  • *****
  • Posts: 147
    • http://www.gillius.org/
Re: client not properly shutdown, can not reconnect
« Reply #4 on: November 20, 2008, 08:54:40 AM »
Oh, the point that I ended up wanting to make is that HawkNL hasn't been maintained for quite a few years (and of course, GNE since 2004). So, we should first check to make sure that there is no fault in GNE. If there isn't, then we should check if there is a reasonable way to work around the problem. If there isn't, then we should try to fix HawkNL. But, I bet that Mr. Frisbie (author of HawkNL) probably is in the same state as I am with GNE and not really able to give time to maintain the library, so a fork might be needed in this case, unless it was fixed in the 1.7 beta series (but I think when I tried that with GNE before there were problems).