Data Corruption Bug In Kernel 3.6.3 - EXT4

lxskllr

No Lifer
Nov 30, 2004
60,031
10,523
126
It can only happen under fairly specific circumstances, but it's something to be aware of.

Date Tue, 23 Oct 2012 18:19:13 -0400
From Theodore Ts'o <>
Subject Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)


On Tue, Oct 23, 2012 at 09:57:08PM +0100, Nix wrote:
>
> It is now quite clear that this is a bug introduced by one or more of
> the post-3.6.1 ext4 patches (which have all been backported at least to
> 3.5, so the problem is probably there too).
>
> [ 60.290844] EXT4-fs error (device dm-3): ext4_mb_generate_buddy:741: group 202, 1583 clusters in bitmap, 1675 in gd
> [ 60.291426] JBD2: Spotted dirty metadata buffer (dev = dm-3, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
>

I think I've found the problem. I believe the commit at fault is commit
14b4ed22a6 (upstream commit eeecef0af5e):

jbd2: don't write superblock when if its empty

which first appeared in v3.6.2.

The reason why the problem happens rarely is that the effect of the
buggy commit is that if the journal's starting block is zero, we fail
to truncate the journal when we unmount the file system. This can
happen if we mount and then unmount the file system fairly quickly,
before the log has a chance to wrap. After the first time this has
happened, it's not a disaster, since when we replay the journal, we'll
just replay some extra transactions. But if this happens twice, the
oldest valid transaction will still not have gotten updated, but some
of the newer transactions from the last mount session will have gotten
written by the very latest transactions, and when we then try to do
the extra transaction replays, the metadata blocks can end up getting
very scrambled indeed.

*Sigh*. My apologies for not catching this when I reviewed this
patch. I believe the following patch should fix the bug; once it's
reviewed by other ext4 developers, I'll push this to Linus ASAP.

- Ted

commit 26de1ba5acc39f0ab57ce1ed523cb128e4ad73a4
Author: Theodore Ts'o <tytso@mit.edu>
Date: Tue Oct 23 18:15:22 2012 -0400

jbd2: fix a potential fs corrupting bug in jbd2_mark_journal_empty

Fix a potential file system corrupting bug which was introduced by
commit eeecef0af5ea4efd763c9554cf2bd80fc4a0efd3: jbd2: don't write
superblock when if its empty.

We should only skip writing the journal superblock if there is nothing
to do --- not just when s_start is zero.

This has caused users to report file system corruptions in ext4 that
look like this:

EXT4-fs error (device sdb3): ext4_mb_generate_buddy:741: group 436, 22902 clusters in bitmap, 22901 in gd
JBD2: Spotted dirty metadata buffer (dev = sdb3, blocknr = 0). There's a risk of filesystem corruption in case of system crash.

after the file system has been corrupted.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 0f16edd..0064181 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1351,18 +1351,20 @@ void jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
 static void jbd2_mark_journal_empty(journal_t *journal)
 {
     journal_superblock_t *sb = journal->j_superblock;
+    __be32 new_tail_sequence;
 
     BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex));
     read_lock(&journal->j_state_lock);
-    /* Is it already empty? */
-    if (sb->s_start == 0) {
+    new_tail_sequence = cpu_to_be32(journal->j_tail_sequence);
+    /* Nothing to do? */
+    if (sb->s_start == 0 && sb->s_sequence == new_tail_sequence) {
         read_unlock(&journal->j_state_lock);
         return;
     }
     jbd_debug(1, "JBD2: Marking journal as empty (seq %d)\n",
               journal->j_tail_sequence);
 
-    sb->s_sequence = cpu_to_be32(journal->j_tail_sequence);
+    sb->s_sequence = new_tail_sequence;
     sb->s_start = cpu_to_be32(0);
     read_unlock(&journal->j_state_lock);
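
For anyone who wants to see what the changed condition does in isolation, here is a minimal userspace sketch in C. It is not kernel code: the struct toy_jsb and the functions skip_write_old, skip_write_new and mark_empty are hypothetical names made up for illustration, and the model ignores locking, byte order and the actual disk write. It only shows that the old check leaves a stale s_sequence in place when s_start is already 0, while the patched check updates it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the two journal superblock fields the patch looks at.
 * Hypothetical type: the real journal_superblock_t stores big-endian values. */
struct toy_jsb {
    uint32_t s_start;    /* 0 means the on-disk journal is marked empty */
    uint32_t s_sequence; /* sequence number recorded in the superblock */
};

/* Pre-patch check: skip the write whenever s_start is already 0. */
static bool skip_write_old(const struct toy_jsb *sb, uint32_t tail_sequence)
{
    (void)tail_sequence; /* the old check never looked at the sequence */
    return sb->s_start == 0;
}

/* Patched check: skip only if the recorded sequence is also up to date. */
static bool skip_write_new(const struct toy_jsb *sb, uint32_t tail_sequence)
{
    return sb->s_start == 0 && sb->s_sequence == tail_sequence;
}

/* Model of "mark the journal empty"; returns the s_sequence left behind. */
static uint32_t mark_empty(struct toy_jsb *sb, uint32_t tail_sequence,
                           bool use_fix)
{
    assert(sb != NULL);
    bool skip = use_fix ? skip_write_new(sb, tail_sequence)
                        : skip_write_old(sb, tail_sequence);
    if (!skip) {
        sb->s_sequence = tail_sequence;
        sb->s_start = 0;
    }
    return sb->s_sequence;
}
```

Starting from a superblock with s_start 0 and s_sequence 10, calling mark_empty with a tail sequence of 15 leaves 10 in place under the old check and stores 15 under the patched one.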

https://lkml.org/lkml/2012/10/23/690
 

Red Squirrel

No Lifer
May 24, 2003
70,565
13,802
126
www.anyf.ca
Looks like I'm safe.

ryan@falcon:~$ uname -a
Linux falcon 3.2.0-26-generic #41-Ubuntu SMP Thu Jun 14 17:49:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
ryan@falcon:~$


Though I did try to run a system update today and it went to hell. Not sure if that would have updated the kernel or not, but I'm kinda glad it failed now. I'll wait to try again.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
You Archers... :shakes head:

Good old Debian, slow and stable. I'm on 3.2.0

:^P :^D
I'm an Archer, and I'm on a 3.4.x-ck :), with 3.2.29 on my virtual dev instance (Slackware, because Awt and GTK+ are borked, ATM). But, still, yikes!

The worst will be for users of new hardware, where an older kernel could mean poor feature support (such as unstable virtualization, or poor GPU or NIC support).
 

Jodell88

Diamond Member
Jan 29, 2007
8,762
30
91
Based on comments by the devs working on fixing this bug on the phoronix forums, I don't believe that I have anything to fear. It is a very difficult bug to trigger, and one of the two people that this bug triggered for was running experimental features. I'm confident that nothing will happen to me. :)
 

VinDSL

Diamond Member
Apr 11, 2006
4,869
1
81
www.lenon.com
SOURCE: http://www.h-online.com/open/news/i...-ext4-data-corruption-bug-Update-1736110.html
Update 25-10-12:
Theodore Ts'o has continued his investigation of the bug and has found that the problem was more esoteric than was first thought. The user who reported the problem was using umount -l, which immediately unmounts the filesystem without waiting for it to stop being busy. The bug is now thought to be caused when the machine is being shut down while it is in the process of unmounting the filesystem with an already compromised journal.

The developers are still working to pinpoint the exact problem and it might actually involve more kernel components than just the ext4 drivers. In any case, it has become clear that the bug needs a very specific configuration to surface and is unlikely to affect most users.
 

Red Squirrel

That's the nice thing with Linux. Most of the time these very serious bugs are hard to trigger, and they fix them up pretty fast.
 

lxskllr

Fixed...

author Linus Torvalds <torvalds@linux-foundation.org>
Tue, 30 Oct 2012 22:35:16 +0000 (15:35 -0700)
committer Linus Torvalds <torvalds@linux-foundation.org>
Tue, 30 Oct 2012 22:35:16 +0000 (15:35 -0700)
Pull ext4 bugfix from Ted Ts'o:
"This fixes the root cause of the ext4 data corruption bug which raised
a ruckus on LWN, Phoronix, and Slashdot.

This bug only showed up when non-standard mount options
(journal_async_commit and/or journal_checksum) were enabled, and when
the file system was not cleanly unmounted, but the root cause was the
inode bitmap modifications were not being properly journaled.

This could potentially lead to minor file system corruptions (pass 5
complaints with the inode allocation bitmap) after an unclean shutdown
under the wrong/unlucky workloads, but it turned into major failure if
the journal_checksum and/or journal_async_commit was enabled."

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix unjournaled inode bitmap modification
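
Since the pull message ties the major failures to two mount options, a quick way to see whether a given mount is in that group is to scan its option string. The following is a rough C sketch, assuming the string comes from the options field (fourth column) of a /proc/mounts line and that the kernel lists these options there when they are enabled. The function names has_mount_option and uses_flagged_journal_options are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the comma-separated list `opts` contains `name`,
 * either as a bare option or in name=value form. */
static bool has_mount_option(const char *opts, const char *name)
{
    assert(opts != NULL && name != NULL);
    size_t n = strlen(name);
    const char *p = opts;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        /* Match the whole token, or the part before an '=' sign. */
        if (len >= n && strncmp(p, name, n) == 0 &&
            (len == n || p[n] == '='))
            return true;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return false;
}

/* True if either option named in the pull message is present. */
static bool uses_flagged_journal_options(const char *opts)
{
    return has_mount_option(opts, "journal_async_commit") ||
           has_mount_option(opts, "journal_checksum");
}
```

Passing a string such as "rw,relatime,data=ordered" returns false, while any list that includes journal_checksum or journal_async_commit returns true. Matching whole tokens rather than substrings keeps a longer option name that merely contains one of these from being counted.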

https://git.kernel.org/?p=linux/ker...ff;h=8c673cbc7682b3f2862fe42f8069cac20c09e160