Understanding Fractured block and ORA-1578 error
ORA-1578 error could be an indication of a Fractured Block. A Fractured Block means that the block is incomplete and considered a Physical Corruption. Information from the block header does not match the block end/tail. It is a clear symptom of issues within the Operating System (OS)/Hardware layers.
In order to understand why a Fractured Block happens, we need to understand how a block is written into disk. The block size at OS level does not match the block size at the Oracle level, so in order to write an Oracle block, the Operating System needs to perform more than one write. As an example: if OS block size is 512 bytes and Oracle block size is 8K, the OS performs 16 writes in order to complete the write process.
Oracle keeps track off the header of each block and before writing down to disk updates a 4 byte field/value in the tail of each block (tailchk) to guarantee afterward consistency check that the block is complete after written.
Example of a Fractured (Broken) block:
Page 264462 is influx - most likely media corrupt
Corrupt block relative dba: 0x0284090e (file 10, block 264462)
Fractured block found during dbv:
Data in bad block:
type: 6 format: 2 rdba: 0x0284090e
last change scn: 0x0003.da17adf8 seq: 0x1 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0xaead0601 <---------- tailchk
check value in block header: 0x8564
computed block checksum: 0x355 <----- Checksum result is different than 0x0
tailchk: (current: 0xaead0601)
It is built, at the footer of the block to guarantee the beginning and end of the block corresponds to the same version. It is four bytes built as:
- lower order two bytes of scn base - 0x….adf8
- block type - 0x06
- scn sequence number - 0x01
tailchk should have been 0xadf80601 in this block whilst it is 0xaead0601 hence the block is fractured / incomplete.
This value was right on the block when the RDBMS sent the write to the Operating System; this is also confirmed by the fact that the checksum result is different than 0x0 which means that the checksum is different than when it was calculated before write and the block is not the same version that the RDBMS request to write before. The write did not complete as a whole and only partial write was done or the write completed but invalid information was actually written in the block.
The tail check only verifies if the header and the tail of the block have been written correctly, but does not warrant that the complete block was written. In order to warranty this, the database has the parameter DB_BLOCK_CHECKSUM. When the parameter is set to TRUE / TYPICAL (default value), a checksum value is calculated for the complete block and this value is stored in the block header before writing the block. When the block is read again, this value is recomputed and compared with the one at the block header.
There are checks that may be run against datafiles to ensure the validity of all tail values on all blocks of them. RMAN Validate or DBVerify catch this kind of failures and may be used against the Database file(s) to check for Physical Corruption. Identically, there is a clear path to follow when this happens. These blocks are incorrectly written by the Operating System / Hardware and as such, Oracle operations over the blocks affected are correct (otherwise, a different kind of error would have been printed out).
In addition, restoring and recovering the block is not introducing the issue again, which indicates that the redo changes generated are correct, so the issue is clearly related to other layers (most probably the hardware layer).