Network problem?? causing applications freeze



mauriceb
10-27-2007, 10:12 PM
I have 6 networked computers in my SIM. Prior to my dismantling everything to build & install my rudders everything was working fine. After putting everything back together, SIM worked OK for about an hour and then all **** broke loose.

I'm now getting intermittent lockups of all PM & Phidgets applications. It can work from one to 5 hours and then all applications freeze except FS9. I can access all PCs via VNC viewer so network connectivity seems OK and there is nothing abnormal in all event viewers. I can recover by restarting the various wide clients & applications without re-booting PCs.

I replaced the Ethernet switch where all PCs are connected with no change. The only error indication I am getting is in the wide server log in the main FS PC (I use UDP instead of TCP as is recommended). Here is a typical error:

20081000 Connected to computer "SIM3" running WideClient version 6.710 (IP=192.168.1.103) UDP
20081000 **** ERROR! Sumcheck or length fails on received socket 7624 block, len=206 (time=0)
20086156 Connected to computer "SIM4" running WideClient version 6.700 (IP=192.168.1.104) UDP.

So, can anyone offer any suggestions on what to try next? I'm rapidly running out of ideas here. My next attempt will likely be to use a new Ethernet card in main FS PC instead of built in Ethernet port on motherboard. I doubt this is the problem as there is no other indication of network problems on main FS PC. Access to Internet is quick and nothing unusual in event viewer.


Any help/suggestions would be greatly appreciated, especially from Peter Dowson :wink: who I hope can shed some light on the wide server log.

Thanks,
Maurice

Deesystems
10-27-2007, 10:45 PM
"I replaced the Ethernet switch where all PCs are connected with no change"

I'd look at the cables; they're cheaper than a router and NICs.

Do a netstat -a at a cmd prompt and see what it shows.

Post the info here if needed.


Dee

Hit anyuser to continue.

mauriceb
10-27-2007, 11:01 PM
[QUOTE=Deesystems;44239]
I'd look at the cables; they're cheaper than a router and NICs.

Do a netstat -a at a cmd prompt and see what it shows.

Dee
[/QUOTE]

Hi Dee,
Replacing the cable between the main FS PC & the Ethernet switch was the first thing I tried. All other cables cannot all be bad.
As far as netstat -a is concerned, I'll try that next time I power up & fly (it has been a very long day already :-)

Thanks,
Maurice

Peter Dowson
10-28-2007, 06:32 AM
I use UDP instead of TCP as is recommended.

Recommended where? UDP is faster for reliable networks but awful for bad networks as there's little error checking and no recovery built in. This is actually what I say in the Dox for WideFS some place.

Also, avoid UDP if you have any "loops" in the Network, as happens for instance with a Firewire looped daisy-chain arrangement. What can happen, if there are two or more routes between two of the PCs, is that blocks can arrive out of order. TCP doesn't allow that, it puts them back into order.
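
As a rough illustration of the out-of-order point above, here is a minimal Python sketch. It assumes each block carries a sequence number, which is an assumption made for the example and not a description of the actual WideFS block format. The function name `reorder_blocks` is hypothetical too. It does by hand what TCP does for you: put blocks back in order and report which ones never arrived.

```python
def reorder_blocks(received):
    """Sort (seq, payload) pairs by sequence number and list missing numbers.

    `received` is a list of (sequence_number, payload) tuples in arrival
    order. The sequence number is a hypothetical field for this sketch.
    """
    if not received:
        return [], []
    by_seq = dict(received)  # a duplicated seq keeps the last copy seen
    first, last = min(by_seq), max(by_seq)
    ordered = [(seq, by_seq[seq]) for seq in sorted(by_seq)]
    missing = [seq for seq in range(first, last + 1) if seq not in by_seq]
    return ordered, missing


# Blocks 2 and 3 arrive swapped, and block 4 is lost on the way.
arrived = [(1, "a"), (3, "c"), (2, "b"), (5, "e")]
ordered, missing = reorder_blocks(arrived)
print(ordered)  # [(1, 'a'), (2, 'b'), (3, 'c'), (5, 'e')]
print(missing)  # [4]
```

With plain UDP nothing like this happens unless the application does it itself, which is why a lost or swapped block shows up as an error instead of being quietly repaired.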

You are also using well out-of-date versions of WideClient (at least -- presumably WideServer too?). Current is 6.75 (with a 6.756 version of WideClient only due this week).

Not that any of these things will solve network errors, though TCP may provide recovery and thereby hide them of course.

Regards

Pete

mauriceb
10-28-2007, 09:07 AM
Thanks Pete,

I can't remember where I read about UDP but I'm sure I did not dream it, although I may have 'overlooked' the possible issues with that protocol since one is always looking for better performance in this damn 'hobby' of ours :D

I figured this was a simple enough network that there shouldn't be any problems with UDP and it was working quite well until I took it apart and put it back together.

But anyway, I'm not about to disregard your suggestions. I will change it back to TCP and update my wide clients and hope it cures my problems.

Thanks a lot for your fast response. Greatly appreciated.

Regards,
Maurice

Peter Dowson
10-28-2007, 01:06 PM
I figured this was a simple enough network that there shouldn't be any problems with UDP and it was working quite well until I took it apart and put it back together.

Yes, I agree. Something has been changed which makes it less reliable. Maybe it is also too unreliable for TCP. Even if it isn't, if it is forcing TCP to adopt recovery actions (e.g. re-sending blocks) then it could obviously impinge upon performance.

Normally a local home network should easily be reliable enough for UDP to be fault-free. TCP, of course, is designed for traffic around the world, hence the checking and recovery.

So really you do need to try to find out which of your components is not as good as it used to be.


But anyway, I'm not about to disregard your suggestions. I will change it back to TCP and update my wide clients and hope it cures my problems.

It's less likely to cure them than hide them. If that gives you adequate performance then it may not matter that much.

Regards

Pete

mauriceb
10-28-2007, 09:06 PM
It's less likely to cure them than hide them. If that gives you adequate performance then it may not matter that much.
Regards
Pete

Well Pete, I did the changes you suggested and I ran the simulator for more than 8 hours without any more problems. I don't know if this was a cure or merely hiding the problem, but 'frankly my dear, I don't give a damn' :D. It's working and I am a happy camper again.

I did get 6 errors below in the log file over the 8 hours, but this is nothing compared to the many dozens I was getting before.

1113203 **** ERROR! Sumcheck or length fails on received socket 7568 block, len=39 (time=1289734)
3905656 **** ERROR! Sumcheck or length fails on received socket 7568 block, len=81 (time=4082156)
4483672 **** ERROR! Sumcheck or length fails on received socket 7568 block, len=39 (time=4660125)
13762406 **** ERROR! Sumcheck or length fails on received socket 7568 block, len=81 (time=13938609)
18487000 **** ERROR! Sumcheck or length fails on received socket 7568 block, len=81 (time=18663076)
21713969 **** ERROR! Sumcheck or length fails on received socket 7568 block, len=81 (time=21889953)
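
For anyone who wants to keep an eye on these over time, here is a small Python sketch that tallies the error lines above. The regular expression is written against the lines exactly as pasted in this post. Reading the first column as milliseconds since WideServer started is an assumption, and `summarize_errors` is just a name made up for the example.

```python
import re
from collections import Counter

# Pattern matched against the error lines as they appear in this thread.
ERROR_RE = re.compile(
    r"^(\d+) \*{4} ERROR! Sumcheck or length fails on received "
    r"socket (\d+) block, len=(\d+) \(time=(\d+)\)"
)


def summarize_errors(lines):
    """Count sumcheck/length errors per socket and per block length."""
    per_socket = Counter()
    per_length = Counter()
    stamps = []
    for line in lines:
        m = ERROR_RE.match(line.strip())
        if not m:
            continue  # not an error line, e.g. a "Connected to computer" entry
        stamp, socket_id, length, _ = (int(g) for g in m.groups())
        per_socket[socket_id] += 1
        per_length[length] += 1
        stamps.append(stamp)
    return per_socket, per_length, stamps


log = """\
1113203 **** ERROR! Sumcheck or length fails on received socket 7568 block, len=39 (time=1289734)
3905656 **** ERROR! Sumcheck or length fails on received socket 7568 block, len=81 (time=4082156)
4483672 **** ERROR! Sumcheck or length fails on received socket 7568 block, len=39 (time=4660125)
13762406 **** ERROR! Sumcheck or length fails on received socket 7568 block, len=81 (time=13938609)
18487000 **** ERROR! Sumcheck or length fails on received socket 7568 block, len=81 (time=18663076)
21713969 **** ERROR! Sumcheck or length fails on received socket 7568 block, len=81 (time=21889953)
"""

per_socket, per_length, stamps = summarize_errors(log.splitlines())
print(dict(per_socket))  # {7568: 6}
print(dict(per_length))  # {39: 2, 81: 4}
# Assuming the first column is milliseconds, the last error is about 6 hours in.
print(round(stamps[-1] / 3_600_000, 2))  # 6.03
```

All six errors are on the same socket, so a tally like this might help narrow down which client connection to look at first.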

Anyway, thanks again Pete for your help.

Best regards,
Maurice

Michael Carter
10-28-2007, 09:14 PM
Are those the errors Peter spoke of that TCP fixes automatically?

mauriceb
10-28-2007, 09:24 PM
Are those the errors Peter spoke of that TCP fixes automatically?

I have no idea. The only thing I know for sure is that I was getting a constant stream of such errors before.

Maurice

Peter Dowson
10-28-2007, 09:55 PM
Well Pete, I did the changes you suggested and I ran the simulator for more than 8 hours without any more problems. I don't know if this was a cure or merely hiding the problem

It's just hiding them.


I did get 6 errors below in the log file over the 8 hours, but this is nothing compared to the many dozens I was getting before.

No, but you shouldn't get any. There's something wrong somewhere, but maybe not bad enough to fix ... yet.

Regards

Pete

Michael Carter
10-28-2007, 10:16 PM
Could a LAN or ethernet card be failing? Maybe subjected to a static shock?

Do you have a cable diagnostics box?

mauriceb
10-29-2007, 07:20 AM
Could a LAN or ethernet card be failing?
Anything is possible. Would be a strange coincidence though

Maybe subjected to a static shock?
Not likely. Never touched the inside of any of the 5 PCs. Only reconnected network & USB cables. I did replace the Ethernet switch at some point to troubleshoot, but that made no difference.

Do you have a cable diagnostics box?
No. Easier to just replace cables (I've got lots of them :)

Maurice

mauriceb
10-29-2007, 07:24 AM
There's something wrong somewhere, but maybe not bad enough to fix ... yet.
Regards
Pete

I'll keep that in mind for sure, but for now, unless anything freezes again, I'm going to leave well enough alone. The performance does not seem to be affected at all, so why quibble over a few lost packets? :D

I'm sure this will come back to haunt me though at some time in the future :roll:

Maurice

joaquim Sa Nogueira
10-29-2007, 01:41 PM
Hi,

I wonder if your problem is not a USB power related issue!
Did you check on each computer that USB power management is set so that the device cannot be switched off to save power?
Just a suggestion.
Regards
Joaquim

mauriceb
10-29-2007, 01:58 PM
Hi,

I wonder if your problem is not a USB power related issue!
Did you check on each computer that USB power management is set so that the device cannot be switched off to save power?
Just a suggestion.
Regards
Joaquim


I don't think so Joaquim. I use several powered USB hubs but more importantly I think, the Wideserver/Wideclient communications do not happen over USB, but over Ethernet.

Also, I was able to recover by just restarting the wide client and the associated PM program without re-booting the PC. Once a USB device is taken off-line, it does not recover until you re-boot the computer I think.

At any rate, if one or more USB ports were to fail, it should only cut off whatever devices are connected to it and not freeze all the PM programs in 5 computers I would think (at least not in theory :-)

Thanks,

Maurice

AndyT
10-29-2007, 06:29 PM
I'm thinking MTU and a couple of other registry values here. Also, NetBEUI and the QoS Packet Scheduler.

This is looking like dropped packets to me. I'd reset my protocol settings to a default state and see if I'm still getting these errors.

michelmvd
10-30-2007, 05:03 AM
Hi Andy,
Can you explain in a little more detail what influence MTU, NetBEUI and the packet scheduler have on the network? Do we need NetBEUI with the TCP/IP protocol?
Many thanks for your most appreciated answer.
B. Rgds
Michel

AndyT
10-30-2007, 01:18 PM
NetBEUI is a really old Windows networking protocol for small local networks; it cannot be routed. Most of what it did is now handled over TCP/IP. I just checked the TCP/IP tab in the network controls and I do not see NetBEUI, so you probably do not need to worry about it. You do not need it alongside TCP/IP.

MTU = Maximum Transmission Unit. This tells your machine the largest packet size it can send over the network in one piece. If it is set too small you might get fragmented or dropped packets in the data stream. I have mine at 1500, the standard for Ethernet. Over the internet, an MTU that high can cause a bit of slowdown, but nothing drastic.
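
For a sense of the numbers involved, here is a small Python sketch of the arithmetic for UDP over IPv4. It assumes a 20-byte IPv4 header with no options plus the 8-byte UDP header, and the function names are made up for the example. A UDP block that does not fit in one packet gets split into IP fragments, and losing any one fragment loses the whole block.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def max_udp_payload(mtu):
    """Largest UDP payload that fits in a single unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER


def fragments_needed(payload_len, mtu):
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    total = payload_len + UDP_HEADER  # the UDP header rides in the first fragment
    return -(-total // per_fragment)  # ceiling division


print(max_udp_payload(1500))         # 1472
print(fragments_needed(206, 1500))   # 1
print(fragments_needed(1473, 1500))  # 2
print(fragments_needed(4000, 1500))  # 3
```

At an MTU of 1500, a block the size of the 206-byte one in the log at the top of the thread fits in a single packet with plenty of room to spare.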

QoS Packet Scheduler provides network traffic control, including rate-of-flow and prioritization services. Without it you may sit for some time waiting for a receiver to open up to allow new data to flow in. It's the traffic police for data.