RHEL305 x86_64 on W2100z X logout kernel crash

When I log out of an X session (gnome/kde/fvwm2), the system fairly often hangs and needs a power cycle.

This happens with both the nv and nvidia (x86_64-1.0-7667) drivers, and with both kdm and gdm. When I'm using the nvidia driver, the nvidia splash screen stays up permanently after the logout.

This happens on workstations with an NVS280 and with an FX3000 nvidia card.

Has anyone experienced similar issues, or can anyone give any pointers as to whether it's hardware or software?

Below are messages from /var/log/messages after a crash:

Aug 11 13:30:09 fourier gdm(pam_unix)[3699]: session closed for user

Aug 11 13:30:39 fourier kernel: NMI Watchdog detected LOCKUP on CPU0, rip ffffffff80123e85, registers:

Aug 11 13:30:39 fourier kernel: CPU 0

Aug 11 13:30:39 fourier kernel: Pid: 13272, comm: X Not tainted

Aug 11 13:30:39 fourier kernel: RIP: 0010:[<ffffffff80123e85>]{.text.lock.fork+39}

Aug 11 13:30:39 fourier kernel: RSP: 0018:00000100202e7e38 EFLAGS: 00000086

Aug 12 10:08:14 fourier gpm[3581]: oops() invoked from gpm.c(164)

Aug 12 10:08:14 fourier gpm[3581]: /dev/tty0: Input/output error

Aug 12 10:08:14 fourier kernel: NVRM: Xid: 16, Head 00000000 Count 004ab839

Aug 12 10:08:14 fourier kernel: NVRM: Xid: 8, Channel 0000001e

Aug 12 10:08:14 fourier kernel: NVRM: Xid: 16, Head 00000000 Count 004ab83a

Aug 12 10:08:14 fourier kernel: NVRM: Xid: 8, Channel 0000001e

Aug 12 10:08:14 fourier kernel: NVRM: Xid: 16, Head 00000000 Count 004ab83b

Aug 12 10:08:14 fourier kernel: NVRM: Xid: 8, Channel 0000001e

Aug 12 10:08:14 fourier kernel: NMI Watchdog detected LOCKUP on CPU0, rip ffffffff80123e85, registers:

Aug 12 10:08:14 fourier kernel: CPU 0

Aug 12 10:08:14 fourier kernel: Pid: 7618, comm: X Tainted: P

Aug 12 10:08:14 fourier kernel: RIP: 0010:[<ffffffff80123e85>]{.text.lock.fork+39}

Aug 12 10:08:14 fourier kernel: RSP: 0018:0000010079d45e38 EFLAGS: 0000008

Aug 12 13:10:00 fourier kde(pam_unix)[5751]: session closed for user

Aug 12 13:10:30 fourier kernel: NVRM: Xid: 16, Head 00000000 Count 0000b182

Aug 12 13:10:30 fourier kernel: NVRM: Xid: 8, Channel 0000001e

Aug 12 13:10:30 fourier kernel: NVRM: Xid: 16, Head 00000000 Count 0000b183

Aug 12 13:10:30 fourier kernel: NVRM: Xid: 8, Channel 0000001e

Aug 12 13:10:30 fourier kernel: NVRM: Xid: 16, Head 00000000 Count 0000b184

Aug 12 13:10:30 fourier kernel: NVRM: Xid: 8, Channel 0000001e

Aug 12 13:10:30 fourier kernel: NMI Watchdog detected LOCKUP on CPU0, rip ffffffff80123e85, registers:

Aug 12 13:10:30 fourier kernel: CPU 0

Aug 12 13:10:30 fourier kernel: Pid: 4200, comm: X Tainted: P

Aug 12 13:10:30 fourier kernel: RIP: 0010:[<ffffffff80123e85>]{.text.lock.fork+39}
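
The pattern in these excerpts (an NMI watchdog lockup at the same rip each time, preceded by NVRM Xid lines when the nvidia driver is loaded) can be tallied with a short script. What follows is only a minimal sketch in Python 3, assuming the log lines look exactly like the ones above; the function name `summarize` and the regular expressions are illustrative and not part of any driver or distribution tool.

```python
import re
from collections import Counter

# Syslog lines of the form "Aug 11 13:30:39 host kernel: message".
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s[\d:]{8})\s(\S+)\skernel:\s(.*)$")
# Message formats copied from the excerpts above (assumed to be stable).
LOCKUP_RE = re.compile(r"NMI Watchdog detected LOCKUP on CPU(\d+), rip ([0-9a-f]+)")
XID_RE = re.compile(r"NVRM: Xid: (\d+)")


def summarize(lines):
    """Return (lockups, xid_counts) for an iterable of syslog lines.

    lockups    -- list of (timestamp, cpu, rip) tuples, one per watchdog lockup
    xid_counts -- Counter mapping each NVRM Xid code to its number of occurrences
    """
    lockups = []
    xid_counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a kernel message, e.g. gdm or gpm lines
        stamp, _host, msg = m.groups()
        lock = LOCKUP_RE.search(msg)
        if lock:
            lockups.append((stamp, int(lock.group(1)), lock.group(2)))
            continue
        xid = XID_RE.search(msg)
        if xid:
            xid_counts[int(xid.group(1))] += 1
    return lockups, xid_counts
```

To use it, pass an open file such as a copy of /var/log/messages to `summarize()`. If every lockup reports the same rip, as in the excerpts here, that suggests the hangs share one code path rather than being random.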

[3313 byte] By [] at [2007-12-21]
# 1

Ronnie,

I'm having the exact same problem with the 32-bit version of the driver, x86_32-1.0-7667, under gnome/kde running RHEL 3 (Kernel 2.4.21-32 EL-SMP). Basically, when a user logs out the machine freezes (about 80% of the time) and the only way to get it back is a hard reboot... No ssh or ping access is available. The freezes also sometimes occur when doing an init 3 (as root) or just rebooting.

This is currently happening on 4 out of 4 Sun W2100z machines (AMD 250 Proc, 4G Ram, Nvidia FX3000). I tried downgrading to the 6229 driver included on the newest Sun Supplemental CDROM 2.1, recently posted on Sun's support download. It seems slightly better but definitely does not fix the problem. I also have 2 W1100z machines (AMD 150, 1G Ram, Nvidia FX500) and neither has had a lockup. Resolutions are 1920x1200 on Sun 24" FP Monitors and 1200x1024x60 on Sun 19" FP Monitors.

Any help on this greatly appreciated.

- David

at 2007-7-5 > top of java,Sun Hardware,Other Sun Hardware...
# 2

In reply to davidch's post of Thu, 18 August 2005 18:40 (# 1):

Hi David,

Although it's of little consolation to you, I'm sort of pleased that somebody else is having the same issues!

I too tried the NVidia driver on the Sun Supplemental CDROM 2.1 and also upgraded the BIOS from this CD. Like you say, I found it to be better, but it still hung unacceptably often.

I decided to try an install of RHEL4.1 on Friday to see if that would result in an improvement. No problems thus far, 3 days in, which at least is giving me some hope.

If this too falls over, my next step will be to buy RHEL support through Sun.

# 3

Ronnie,

Glad to hear about the 4.1 success, but that's not a short-term option for me. I work for a large university; we just got 3 going, with a large number of seats working on machines other than the W2100z, and we have no plans to move to 4.x until spring at the earliest. The machines are "officially" certified to run RHEL 3, so this should work. Other machines that have been successful with our RHEL 3 builds are the Dell Optiplex and the Sun W1100z.

Anyway, I've done some more testing. I replaced the FX3000 with the FX500 from a W1100z and it still routinely freezes, which leads me to believe it's specific to the W2100z motherboard. I still haven't received any solutions from Sun, but I still have a "case" open.

- David

# 4

In reply to davidch's post of Wed, 24 August 2005 16:11 (# 3):

The machine running 4.1 has now been up for almost a week. I should confess that it is Scientific Linux (a RHEL rebuild) we are using rather than RHEL itself.

I've had 3.0x on Optiplex GX270/GX280s for several months with no problems.

As most of the W2100z machines are academic desktops, I may still standardize on 3.0x for our other workstations. But I'm just dealing with a single University department.

As an afterthought, have you tried installing a 2.6 kernel with RHEL3.0x to see if that makes a difference, or does that create too many dependency issues?

Please let me know if you get any resolution from Sun.

# 5
Hi, to resolve these crashes:

1) install nvidia 1.0-8178
2) install RHEL3 Kernel 2.4.21-38.ELsmp from the RHEL3 beta channel (or wait for RHEL3 Update 7)

gmyy
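
To check whether a given machine already meets the two versions named in this post, something like the following could be used. This is a rough sketch in Python 3, not a vendor tool: the helper names are made up for illustration, and it assumes the driver reports its version in /proc/driver/nvidia/version on a line beginning "NVRM version:" that contains a string such as 1.0-7667.

```python
import re


def version_tuple(text):
    """Turn '2.4.21-38.ELsmp' or '1.0-8178' into a tuple of its integer fields."""
    return tuple(int(n) for n in re.findall(r"\d+", text))


# Minimum versions named in the post above.
MIN_KERNEL = version_tuple("2.4.21-38")
MIN_NVIDIA = version_tuple("1.0-8178")


def kernel_ok(release):
    """release is the string printed by `uname -r`, e.g. '2.4.21-32.ELsmp'."""
    # Compare only the first four numeric fields (2.4.21-38).
    return version_tuple(release)[:4] >= MIN_KERNEL


def nvidia_ok(version_text):
    """version_text is the content of /proc/driver/nvidia/version (format assumed)."""
    m = re.search(r"NVRM version:.*?(\d+\.\d+-\d+)", version_text)
    return m is not None and version_tuple(m.group(1)) >= MIN_NVIDIA
```

For example, `kernel_ok("2.4.21-32.ELsmp")` is false for the kernel mentioned earlier in the thread, and `nvidia_ok()` is false for both the 7667 and 6229 drivers. Passing the checks only means the versions match this suggestion, not that the lockups are confirmed fixed.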
# 6

I too am seeing similar issues. Can you please provide the bug number(s), if you know them, or details of where this is stated to be resolved? Or was this found by trial and error? It would be great to be able to pass this resolution on to others.

Thanks very much for providing this resolution.

MichaelJohnson at 2007-7-5 > top of java,Sun Hardware,Other Sun Hardware...
# 7

Hello,

this post was "imported" from the Sun Hardware Support Forum.

I have no clue when it was originally posted, but the last answer wasn't posted before September 2005...

- All "imported" messages from supportforum.sun.com bear the same date (Mar 30, 2006). Someone who isn't aware of this fact may ask for further information on a problem/question that is already a few months old.

- Because the members of supportforums.sun.com had to re-register, no notification is sent when someone answers one of these imported messages: the original poster counts as unregistered (even if he/she re-registered with the same "screenname").

If you can display the profile when clicking on the username, it's likely that a notification is sent (if enabled in the profile); otherwise the user is unregistered and won't get a notification.

maal/maalatft

MAALATFT at 2007-7-5 > top of java,Sun Hardware,Other Sun Hardware...