RHEL305 x86_64 on W2100z X logout kernel crash
When I log out of an X session (gnome/kde/fvwm2) the system
hangs fairly often, and a power cycle is required.
This happens with both the nv and nvidia (x86_64-1.0-7667) drivers,
under kdm or gdm. With the nvidia driver, the nvidia splash
screen stays up permanently after the logout.
It happens on workstations with NVS280 and FX3000 nvidia cards.
Has anyone experienced similar issues, or can anyone
give any pointers on whether it's hardware or software?
Below are messages from /var/log/messages after the crashes:
Aug 11 13:30:09 fourier gdm(pam_unix)[3699]: session closed for user
Aug 11 13:30:39 fourier kernel: NMI Watchdog detected LOCKUP on CPU0, rip ffffffff80123e85, registers:
Aug 11 13:30:39 fourier kernel: CPU 0
Aug 11 13:30:39 fourier kernel: Pid: 13272, comm: X Not tainted
Aug 11 13:30:39 fourier kernel: RIP: 0010:[<ffffffff80123e85>]{.text.lock.fork+39}
Aug 11 13:30:39 fourier kernel: RSP: 0018:00000100202e7e38 EFLAGS: 00000086
Aug 12 10:08:14 fourier gpm[3581]: oops() invoked from gpm.c(164)
Aug 12 10:08:14 fourier gpm[3581]: /dev/tty0: Input/output error
Aug 12 10:08:14 fourier kernel: NVRM: Xid: 16, Head 00000000 Count 004ab839
Aug 12 10:08:14 fourier kernel: NVRM: Xid: 8, Channel 0000001e
Aug 12 10:08:14 fourier kernel: NVRM: Xid: 16, Head 00000000 Count 004ab83a
Aug 12 10:08:14 fourier kernel: NVRM: Xid: 8, Channel 0000001e
Aug 12 10:08:14 fourier kernel: NVRM: Xid: 16, Head 00000000 Count 004ab83b
Aug 12 10:08:14 fourier kernel: NVRM: Xid: 8, Channel 0000001e
Aug 12 10:08:14 fourier kernel: NMI Watchdog detected LOCKUP on CPU0, rip ffffffff80123e85, registers:
Aug 12 10:08:14 fourier kernel: CPU 0
Aug 12 10:08:14 fourier kernel: Pid: 7618, comm: X Tainted: P
Aug 12 10:08:14 fourier kernel: RIP: 0010:[<ffffffff80123e85>]{.text.lock.fork+39}
Aug 12 10:08:14 fourier kernel: RSP: 0018:0000010079d45e38 EFLAGS: 0000008
Aug 12 13:10:00 fourier kde(pam_unix)[5751]: session closed for user
Aug 12 13:10:30 fourier kernel: NVRM: Xid: 16, Head 00000000 Count 0000b182
Aug 12 13:10:30 fourier kernel: NVRM: Xid: 8, Channel 0000001e
Aug 12 13:10:30 fourier kernel: NVRM: Xid: 16, Head 00000000 Count 0000b183
Aug 12 13:10:30 fourier kernel: NVRM: Xid: 8, Channel 0000001e
Aug 12 13:10:30 fourier kernel: NVRM: Xid: 16, Head 00000000 Count 0000b184
Aug 12 13:10:30 fourier kernel: NVRM: Xid: 8, Channel 0000001e
Aug 12 13:10:30 fourier kernel: NMI Watchdog detected LOCKUP on CPU0, rip ffffffff80123e85, registers:
Aug 12 13:10:30 fourier kernel: CPU 0
Aug 12 13:10:30 fourier kernel: Pid: 4200, comm: X Tainted: P
Aug 12 13:10:30 fourier kernel: RIP: 0010:[<ffffffff80123e85>]{.text.lock.fork+39}
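For anyone comparing their own logs against the excerpt above, here is a minimal Python sketch that groups the NVRM Xid lines with the NMI watchdog lockup that follows them. It assumes the syslog line format shown above; the function name `summarize` and the regular expressions are illustrative and not taken from any existing tool. In the excerpt, all three lockups report the same rip (ffffffff80123e85), and the Aug 11 one, logged as "Not tainted", has no Xid lines before it.

```python
import re
import sys
from collections import Counter

# Patterns written against the log lines quoted above (assumed format).
LOCKUP_RE = re.compile(
    r"kernel: NMI Watchdog detected LOCKUP on CPU(\d+), rip ([0-9a-f]+)")
XID_RE = re.compile(r"kernel: NVRM: Xid: (\d+),")
PID_RE = re.compile(
    r"kernel: Pid: (\d+), comm: (\S+) (Not tainted|Tainted: \S+)")


def summarize(lines):
    """Return one dict per watchdog lockup, with the Xid codes logged before it."""
    events = []
    pending_xids = Counter()  # Xid codes seen since the previous lockup
    for line in lines:
        m = XID_RE.search(line)
        if m:
            pending_xids[int(m.group(1))] += 1
            continue
        m = LOCKUP_RE.search(line)
        if m:
            events.append({
                "timestamp": line[:15],  # e.g. "Aug 12 10:08:14"
                "cpu": int(m.group(1)),
                "rip": m.group(2),
                "xids": dict(pending_xids),
                "comm": None,
                "tainted": None,
            })
            pending_xids = Counter()
            continue
        m = PID_RE.search(line)
        if m and events and events[-1]["comm"] is None:
            # The "Pid:" line follows the lockup line in the register dump.
            events[-1]["comm"] = m.group(2)
            events[-1]["tainted"] = m.group(3) != "Not tainted"
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python lockups.py /var/log/messages
    with open(sys.argv[1], errors="replace") as fh:
        for event in summarize(fh):
            print(event)
```

If every lockup shows the same rip regardless of whether Xid lines precede it, that suggests the hang is not tied to one driver, which matches what is described above for nv and nvidia.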

<table border="0" align="center" width="90%" cellpadding="3" cellspacing="1"><tr><td class="SmallText"><b>davidch wrote on Thu, 18 August 2005 18:40</b></td></tr><tr><td class="quote">
Ronnie,
I'm having the exact same problem with the 32-bit version of the driver, x86_32-1.0-7667, under gnome/kde running RHEL 3 (Kernel 2.4.21-32 EL-SMP). Basically, when a user logs out the machine freezes (about 80% of the time) and the only way to get it back is a hard reboot. No ssh or ping access is available. The freezes also sometimes occur when doing an init 3 (as root) or just rebooting.
This is currently happening on 4 out of 4 Sun W2100z machines (AMD 250 Proc, 4G RAM, Nvidia FX3000). I tried downgrading to the 6229 driver included on the newest Sun Supplemental CDROM 2.1, recently posted on Sun's support download site. It seems slightly better but definitely does not fix the problem. I also have 2 W1100z machines (AMD 150, 1G RAM, Nvidia FX500) and neither has had a lockup. Resolutions are 1920x1200 on Sun 24" FP monitors and 1200x1024x60 on Sun 19" FP monitors.
Any help on this is greatly appreciated.
- David
</td></tr></table>
Hi David,
Although it's of little consolation to you, I'm sort of pleased
that somebody else is having the same issues!
I too tried the NVidia driver on the Sun Supplemental CDROM 2.1
and also upgraded the BIOS from this CD. As you say, I found it
to be better, but it still hung unacceptably often.
I decided to try an install of RHEL4.1 on Friday to see if that would
result in an improvement. No problems thus far, 3 days in, which
at least is giving me some hope.
If this too falls over, my next step will be to buy RHEL support
through Sun.

<table border="0" align="center" width="90%" cellpadding="3" cellspacing="1"><tr><td class="SmallText"><b>davidch wrote on Wed, 24 August 2005 16:11</b></td></tr><tr><td class="quote">
Ronnie,
Glad to hear about the 4.1 success, but that's not a short-term option for me. I work for a large university and we just got 3 going, with a large number of seats working outside of the W2100z, and there are no plans to move to 4.x until spring at the earliest. The machines are "officially" certified to run RHEL 3, so this should work. Other machines that have been successful with our RHEL 3 builds are the Dell Optiplex and the Sun W1100z.
Anyway, I've done some more testing. I replaced the FX3000 with the FX500 from a W1100z and it still routinely freezes, which leads me to believe it's specific to the W2100z motherboard. I still haven't received any solutions from Sun, but I still have a "case" open.
- David
</td></tr></table>
The machine running 4.1 has now been up for almost a week.
I should confess that it is Scientific Linux (a RHEL rebuild) we are
using rather than RHEL itself.
I've had 3.0x on Optiplex GX270/GX280s for several months with
no problems.
As most of the W2100z machines are academic desktops, I may still standardize
on 3.0x for our other workstations. But I'm just dealing with a single university department.
As an afterthought, have you tried installing a 2.6 kernel with
RHEL3.0x to see if that makes a difference, or does that create
too many dependency issues?
Please let me know if you get any resolution from Sun.

Hello,
this post was "imported" from the Sun Hardware Support Forum.
I have no clue when it was originally posted, but the last answer wasn't posted before September 2005...
- All "imported" messages from supportforum.sun.com bear the same date (Mar 30, 2006).
Someone who isn't aware of this may ask for further information on a
problem/question that is already a few months old.
- Due to the fact that the members of supportforums.sun.com had to re-register,
if someone answers one of these imported messages, no notification is sent, because
the original poster is un-registered (even if he/she re-registered with the same
"screenname").
If you can display the profile when clicking on the username, it's likely that a notification is sent (if enabled in the profile), otherwise the user is unregistered and won't get a notification.
maal/maalatft