Corrupt Filesystem :( *FIXED*

Crusty

Lifer
Sep 30, 2001
12,684
2
81
I've been trying to get Debian installed on a box. It's an old dual PIII rig w/ 1 GB of RAM and a 40 GB drive as the boot drive. Once I reboot from the installer and start selecting the packages I want to install through base-config, everything starts getting fishy.

All the packages download just fine, but once it starts installing them, something happens to the filesystem and it gets remounted read-only, and then apt starts going crazy because it can't write to the drive.

The only useful info I've found in the log files is:

EXT3-fs error (device hda1): ext3_free_blocks: Freeing blocks in system zones - Block = 512, count =1
Remounting filesystem read-only
EXT3-fs error (device hda1) in start_transaction: Readonly filesystem


I've changed IDE cables, and put in a more powerful power supply and reinstalled, yet it still does it.

The data on the drive is not important; it's just a base Debian install.

What else can I do to troubleshoot this?

Edit: it turned out to be some bad RAM, see my last post for more details.
 

drag

Elite Member
Jul 4, 2002
8,708
0
0

Maybe try running fsck on it and see what you get from that. It may be able to correct the errors; force it to do a thorough check of your FS.

Maybe it's a screwed-up hard drive.

If there is nothing you need on the hard drive, you can try to zero it out (this wipes the whole disk, partition table included) and make a new filesystem.
dd if=/dev/zero of=/dev/hda

I have a couple of old hard drives with some damage on the platters that shows up when I do that. Also, if your hard drive supports SMART, you can probably find out how screwed up it is from that.

For instance, my laptop's SMART log reports that it has had 59 errors over its lifetime, which is OK... Laptop drives have a hard life. My desktop's WD 80 GB drive, which I've had running for a long time now, reports zero errors.
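
A minimal sketch of the forced check suggested above, assuming the partition is /dev/hda1 (the device named in the kernel log) and that it is run as root from a rescue or live CD with the filesystem unmounted. The function name force_check is made up for illustration; -f and -c are standard e2fsck options.

```shell
# Force a full check of an ext2/ext3 partition even if it is marked clean.
# The filesystem should not be mounted while this runs.
force_check() {
    part=$1
    # -f: check even if the filesystem looks clean
    # -c: also run a read-only bad-block scan via badblocks(8)
    e2fsck -f -c "$part"
}

# Example (as root, filesystem unmounted):
#   force_check /dev/hda1
```

If the errors keep coming back after a clean pass, that is a hint the problem may sit somewhere other than the filesystem itself.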
 

silverpig

Lifer
Jul 29, 2001
27,703
12
81
Look in your /etc/fstab to see what the boot options are for that partition. If there's an "ro" under the boot options next to the entry for that partition, get rid of it, or maybe try adding a "umask=0777" line (I think that's right...), and then rebooting.
 

Crusty

Originally posted by: silverpig
Look in your /etc/fstab to see what the boot options are for that partition. If there's an "ro" under the boot options next to the entry for that partition, get rid of it, or maybe try adding a "umask=0777" line (I think that's right...), and then rebooting.


The drive gets mounted fine. I can use the system for a little bit, but as soon as I start using the drive, i.e. by updating my system, it gets remounted read-only.

I ran fsck on it and it fixed a good 15-20 errors, but after a reboot it still doesn't work, and it keeps going through the cycle of getting remounted read-only after a few minutes.

I will enable SMART in the BIOS and see what it says. Are there any tools that will read the SMART data, or can I only go off of what gets shown during bootup?
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
or maybe try adding a "umask=0777" line (I think that's right...), and then rebooting.

That option doesn't work for most filesystems because they support proper Unix permissions; only 'odd' ones like FAT and NTFS need a umask option to get the simulated permissions you want.

I will enable SMART in the BIOS and see what it says. Are there any tools that will read the SMART data, or can I only go off of what gets shown during bootup?

There are SMART tools for Linux; get a copy of Knoppix and run smartctl. You can read the SMART log and run some SMART self-tests.
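
Roughly, reading the log and kicking off a self-test could look like the sketch below. It assumes smartmontools is installed (as on Knoppix) and that the drive is /dev/hda; the function names smart_check and smart_selftest_log are hypothetical wrappers, while the smartctl options themselves are standard.

```shell
# Query the drive's own health verdict and error log, then start a short
# self-test. The test runs inside the drive and takes a couple of minutes.
smart_check() {
    dev=$1
    smartctl -H "$dev"          # overall health self-assessment
    smartctl -l error "$dev"    # errors the drive itself has logged
    smartctl -t short "$dev"    # start the short self-test
}

# Show the results of past self-tests once the test has finished.
smart_selftest_log() {
    smartctl -l selftest "$1"
}

# Example (as root):
#   smart_check /dev/hda
#   ...wait a few minutes...
#   smart_selftest_log /dev/hda
```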
 

Crusty

Okay, I'll try and get the SMART info now :)

Every time I run apt-get install <package> I get this:

syntax error at /usr/sbin/install-info line 144, near "$1 unless"
Execution of /usr/sbin/install-info aborted due to compilation errors.
dpkg: error processing findutils (--configure):
subprocess post-installation script returned error exit status 255
Errors were encountered while processing:
findutils
E: Sub-process /usr/bin/dpkg returned an error code (1)
 

drag

Zero out the hard drive. It sounds like that install is toast. Maybe completely wiping out the partitions will solve some of the problems. Sometimes, if a filesystem gets completely wasted and you try to format over it, it still gets screwed up. Also, if you have a damaged platter, dd will puke on the screwed-up sectors. On a healthy drive it should run through and only report an error when it runs out of space at the end of the drive.
dd if=/dev/zero of=/dev/hda

To see what errors your hard drive has logged (SMART is built into most modern hard drives' hardware), you would run something like:
smartctl -l error /dev/hda

 

Crusty

Yeah, I just did a cat /dev/zero > /home/test

after about 10 seconds kernel messages started flooding my screen. kjournald seemed to be having the most fun!

I know what SMART is, I just didn't know how to access its info from within Linux ;). During bootup it didn't throw any errors... I'm gonna zero out the drive and see what happens. Will post back after that.
 

Crusty

OMG! even doing dd if=/dev/zero of=/dev/hda I get kernel panics!!

The debug info it spits out is Process: swapper

and the Call Trace ends in end_buffer_async_write

There is definitely something wrong with either my IDE controller or my hdd. :( I hope it's the latter.
 

drag

You just boot up with Knoppix and use the SMART commands. It doesn't take anything fancier than that.
Unless of course Knoppix doesn't include the smartmontools package. I am guessing that it does.

drag@spock:~$ apt-cache search smartd
smartmontools - control and monitor storage systems using S.M.A.R.T.

from "man smartctl"
DESCRIPTION
smartctl controls the Self-Monitoring, Analysis and Reporting Technology (SMART) system
built into many ATA-3 and later ATA, IDE and SCSI-3 hard drives. The purpose of SMART
is to monitor the reliability of the hard drive and predict drive failures, and to
carry out different types of drive self-tests. This version of smartctl is compatible
with ATA/ATAPI-5 and earlier standards (see REFERENCES below)

smartctl is a command line utility designed to perform SMART tasks such as printing the
SMART self-test and error logs, enabling and disabling SMART automatic testing, and
initiating device self-tests. Note: if the user issues a SMART command that is
(apparently) not implemented by the device, smartctl will print a warning message but
issue the command anyway (see the -T, --tolerance option below). This should not cause
problems: on most devices, unimplemented SMART commands issued to a drive are ignored
and/or return an error.

The package includes a daemon, but you don't need to run it just to look at the error logs; the hard drives themselves keep track of that stuff. Some hard drives don't support SMART, even some modern ones.
 

drag

Try using hdparm to turn off all the drive features for maximum compatibility.

Turn off DMA access, turn off 32-bit I/O, turn off all that stuff.

(Except keepsettings, of course, if you want it to last through a reboot. Also try a different kernel; you may have a buggy driver for your IDE controller.)
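
Something along these lines, as a sketch: it assumes the drive is /dev/hda and the old IDE driver that hdparm was written for, and ide_safe_mode is just a made-up wrapper name. The flags are standard hdparm options. As far as I know, -k1 (keepsettings) preserves the settings across a drive reset rather than a full reboot, so for a setting that sticks the command would have to be rerun from a boot script.

```shell
# Drop an IDE drive back to its most conservative transfer settings.
ide_safe_mode() {
    dev=$1
    # -d0: DMA off (fall back to PIO)
    # -c0: 32-bit I/O off
    # -u0: interrupt unmasking off
    # -m0: multiple-sector I/O off
    hdparm -d0 -c0 -u0 -m0 "$dev"
}

# Example (as root):
#   ide_safe_mode /dev/hda
#   hdparm /dev/hda        # print the current settings to confirm
```

Expect the drive to be much slower in this mode; the point is only to see whether the corruption goes away.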
 

Crusty

Originally posted by: drag
Try using hdparm to turn off all the drive features for maximum compatibility.

Turn off DMA access, turn off 32-bit I/O, turn off all that stuff.

(Except keepsettings, of course, if you want it to last through a reboot. Also try a different kernel; you may have a buggy driver for your IDE controller.)

Unfortunately I misplaced my Knoppix CD, but I'm downloading the ISO atm.

I've tried kernels 2.4.26, 2.6.7, 2.6.8, and 2.6.9. But my chipset is old, so I don't think there are bugs in the drivers. My mobo is an ASUS CUV4X-D.

The Debian install is completely hosed atm; it won't even boot anymore. So until Knoppix downloads I'm out of luck :(
 

jameslast

Junior Member
Dec 19, 2004
1
0
0
Are you running the latest BIOS for your board?
AFAIK you can get it at:
ftp://ftp.asuscom.de/pub/ASUSC...Apollo_Pro_133Z/CUV4X/

Also make sure that you are running the VIA IDE kernel module (post your lsmod output). Is your system overclocked? If so, please run it at standard speeds. (Debian uses PIO for IDE disks by default; when overclocked, you have to tell the kernel the speed of your PCI bus to avoid data corruption, using the boot option idebus=xx, where xx is your PCI bus speed.) You may also want to try an older Debian release first (e.g. woody), and then upgrade to sarge.

Good luck

James
 

Crusty

Originally posted by: jameslast
Are you running the latest BIOS for your board?
AFAIK you can get it at:
ftp://ftp.asuscom.de/pub/ASUSCOM/BIOS/Socket_370/VIA_Chipset/Apollo_Pro_133Z/CUV4X/

Also make sure that you are running the VIA IDE kernel module (post your lsmod output). Is your system overclocked? If so, please run it at standard speeds. (Debian uses PIO for IDE disks by default; when overclocked, you have to tell the kernel the speed of your PCI bus to avoid data corruption, using the boot option idebus=xx, where xx is your PCI bus speed.) You may also want to try an older Debian release first (e.g. woody), and then upgrade to sarge.

Good luck

James

Yes, my board has the latest BIOS, and nothing is overclocked. I've checked all my voltages and they are right on, temps are fine as well. I'm burning Knoppix as we speak. So we'll see what SMART says.
 

Crusty

root@ttyp1[knoppix]# smartctl -a /dev/hda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: MAXTOR 6L040J2
Serial Number: 662134617974
Firmware Version: A93.0500
Device is: In smartctl database [for details use: -P show]
ATA Version is: 5
ATA Standard is: ATA/ATAPI-5 T13 1321D revision 1
Local Time is: Sun Dec 19 08:35:06 2004 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 35) seconds.
Offline data collection
capabilities: (0x1b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 21) minutes.

SMART Attributes Data Structure revision number: 11
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0029 100 253 020 Pre-fail Offline - 0
3 Spin_Up_Time 0x0027 080 080 020 Pre-fail Always - 2544
4 Start_Stop_Count 0x0032 100 100 008 Old_age Always - 125
5 Reallocated_Sector_Ct 0x0033 100 100 020 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 023 Pre-fail Always - 0
9 Power_On_Hours 0x0012 069 069 001 Old_age Always - 20952
10 Spin_Retry_Count 0x0026 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0013 100 100 020 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 008 Old_age Always - 114
13 Read_Soft_Error_Rate 0x000b 100 100 023 Pre-fail Always - 0
194 Temperature_Celsius 0x0022 088 081 042 Old_age Always - 31
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 4686
196 Reallocated_Event_Count 0x0010 100 100 020 Old_age Offline - 0
197 Current_Pending_Sector 0x0032 100 100 020 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x001a 199 199 000 Old_age Always - 1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 20952 -

Device does not support Selective Self Tests/Logging
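
For picking the interesting rows out of an attribute table like the one above, a small filter can help. This is only a sketch: smart_nonzero is a made-up name, the choice of attribute IDs (5, 196, 197, 198, 199) is my own shortlist of the usual trouble indicators, and it assumes the ten-column layout smartctl prints here.

```shell
# Read `smartctl -A` (or `-a`) output on stdin and print the name and raw
# value of selected attributes whose raw count is above zero.
smart_nonzero() {
    awk '
        # attribute rows start with a numeric ID and have at least 10 columns
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            id = $1 + 0
            if ((id == 5 || id == 196 || id == 197 || id == 198 || id == 199) && $10 + 0 > 0)
                print $2, $10
        }
    '
}

# Example (as root):
#   smartctl -A /dev/hda | smart_nonzero
```

Fed the table above, it prints a single line, UDMA_CRC_Error_Count 1. No sectors are reallocated or pending, and one CRC error over the drive's lifetime usually points at the cable or controller link rather than the platters, so the drive itself looks healthy.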
 

Crusty

Well, I'm running memtest86 atm and I don't think I've ever seen so many errors in my life! All of the errors are in the upper end of RAM (above 800 MB) :(

At least I know what the problem is now. Hopefully it's only one stick, so I can still have 768 MB in the machine.
 

rmrf

Platinum Member
May 14, 2003
2,872
0
0
Originally posted by: jameslast
Are you running the latest BIOS for your board?
AFAIK you can get it at:
ftp://ftp.asuscom.de/pub/ASUSCOM/BIOS/Socket_370/VIA_Chipset/Apollo_Pro_133Z/CUV4X/

Also make sure that you are running the VIA IDE kernel module (post your lsmod output). Is your system overclocked? If so, please run it at standard speeds. (Debian uses PIO for IDE disks by default; when overclocked, you have to tell the kernel the speed of your PCI bus to avoid data corruption, using the boot option idebus=xx, where xx is your PCI bus speed.) You may also want to try an older Debian release first (e.g. woody), and then upgrade to sarge.

Good luck

James

Thank you for that site! I've been pulling my hair out trying to find the latest BIOS for my CUBX board, but I couldn't find it anywhere.