
How to use Undrop For InnoDB to Recover InnoDB Data

APPLIES TO:

MySQL Server - Version 4.1 to 5.6 [Release 4.1 to 5.6]

Information in this document applies to any platform.

GOAL

How to extract data from corrupted tables using undrop-for-innodb
 

SOLUTION

Using the open-source UnDROP tool for InnoDB from twindb.com, it is sometimes possible to recover data from tables that cannot be read even using innodb_force_recovery.

The undrop tool is intended to be used against ibdata files that are not in active use by a live DB server.  

In general terms, the tool works by extracting index pages from the monolithic ibdata file(s) and/or from independent InnoDB tablespace files (.ibd files where innodb_file_per_table is in use).  Where applicable, blob pages are extracted to a separate subdirectory.  

Once the data is extracted to index pages, the next step is to retrieve the index ID of the table's primary key (or of its general clustered index) from the data dictionary, then extract that data into a file suitable for use with LOAD DATA INFILE.
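In outline, a recovery run with the tools described below looks like this (a rough sketch only; the paths and names are placeholders, and each step is covered in detail in the following sections):

# 1. Split the shared tablespace (and any .ibd files) into per-index page files
./stream_parser -f /path/to/ibdata1

# 2. Load the InnoDB data dictionary into the 'test' schema on a helper server
./recover_dictionary.sh

# 3. Dump one table's rows from its clustered-index pages, using the index ID found in the dictionary
./c_parser -5f pages-ibdata1/FIL_PAGE_INDEX/<index_id>.page -t <table>.sql > <table>.dump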

If possible, start with at least a schema dump of the database(s) to be recovered, and recover as much data as possible by dumping the tables, using innodb_force_recovery if necessary.  Even an out-of-date backup is better than nothing.  Although UnDROP can sometimes extract a valid table definition from the ibdata file, it's not perfect at handling all column types.  If you have NO backup at all, the .frm files can be used to rebuild the table definitions.  If you don't have a backup or .frm files, then as a last resort the table definition that UnDROP can extract from ibdata can be used to try to get at least some of the data back.
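As a rough sketch of that first step (the recovery level, credentials, and file names below are illustrative, and the setting should be removed again once the dump is complete), the server can be started with forced recovery and dumped with mysqldump:

# my.cnf: start at innodb_force_recovery = 1 and raise it only as far as needed
[mysqld]
innodb_force_recovery = 1

# Dump the table definitions first, then as much data as the server will still return
mysqldump --no-data --all-databases --user=root -p > schema_dump.sql
mysqldump --all-databases --user=root -p > full_dump.sql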

The UnDROP tools are as follows, in the order they are used to extract data:

 stream_parser

stream_parser is the tool that is used to extract pages from the ibdata.  Its use is straightforward:

./stream_parser -f <path_to_ibdata>

Pages will be extracted by default to "pages-<ibdata_file_name>".  Index pages are stored in the subdirectory FIL_PAGE_INDEX and blob pages are stored in the subdirectory FIL_PAGE_TYPE_BLOB.
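For example (the paths below are illustrative; the server should be stopped, or the files copied elsewhere, before parsing), both the system tablespace and any file-per-table tablespaces can be parsed:

# Parse the shared tablespace; pages are written under pages-ibdata1/
./stream_parser -f /var/lib/mysql/ibdata1

# With innodb_file_per_table, parse each .ibd file as well; pages go to a
# pages-<file_name> directory next to the tool
./stream_parser -f /var/lib/mysql/moodle2/mdl2_user.ibd

# Index pages and externally stored blob pages end up in separate subdirectories
ls pages-ibdata1/FIL_PAGE_INDEX pages-ibdata1/FIL_PAGE_TYPE_BLOB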

To extract all the data for a table, it is necessary to identify the data dictionary's index ID number for the table's primary key (or general index in the absence of a primary key).  This can be accomplished by loading the dictionary data from the extracted index pages into the 'test' schema on a running server, using the "recover_dictionary.sh" script that comes with the UnDROP tools, like this:

$ ./recover_dictionary.sh 
Generating dictionary tables dumps... OK
Creating test database ... OK
Creating dictionary tables in database test:
SYS_TABLES ... OK
SYS_COLUMNS ... OK
SYS_INDEXES ... OK
SYS_FIELDS ... OK
All OK
Loading dictionary tables data:
SYS_TABLES ... 1845 recs OK
SYS_COLUMNS ... 22029 recs OK
SYS_INDEXES ... 4994 recs OK
SYS_FIELDS ... 6070 recs OK
All OK

 

Now the dictionary can be queried to find the index ID that corresponds to any given table.  The example below is for the table mdl2_user in the moodle2 schema:

mysql> SELECT SYS_TABLES.NAME TABLE_NAME, SYS_TABLES.ID TABLE_ID, SYS_INDEXES.NAME INDEX_NAME, SYS_INDEXES.ID INDEX_ID
    -> FROM SYS_TABLES LEFT JOIN SYS_INDEXES ON SYS_TABLES.ID = SYS_INDEXES.TABLE_ID
    -> WHERE SYS_TABLES.NAME LIKE 'moodle2/mdl2_user'
    -> AND SYS_INDEXES.NAME IN ('PRIMARY', 'GEN_CLUST_INDEX');
+-------------------+----------+------------+----------+
| TABLE_NAME        | TABLE_ID | INDEX_NAME | INDEX_ID |
+-------------------+----------+------------+----------+
| moodle2/mdl2_user |      646 | PRIMARY    |     1867 |
+-------------------+----------+------------+----------+
1 row in set (0.00 sec)

The INDEX_ID returned corresponds to the name of one of the extracted page files:

 $ ls pages-ibdata1/FIL_PAGE_INDEX/*1867.page

pages-ibdata1/FIL_PAGE_INDEX/0000000000001867.page

 

c_parser 

At this point, with a known table definition, the data can be recovered with c_parser.  In the following example, mdl2_user.sql contains the table definition:

$ ./c_parser -b "./pages-ibdata1/FIL_PAGE_TYPE_BLOB" -p "dumps/moodle2" -l dumps/moodle2/mdl2_user.load -5f pages-ibdata1/FIL_PAGE_INDEX/0000000000001867.page -t mdl2_user.sql > dumps/moodle2/mdl2_user
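Once the rows have been extracted, the .load file written by -l is what is used to actually load them back.  A minimal loading sketch (assuming the table is first re-created from mdl2_user.sql, and that the LOAD DATA statement in the .load file points at the recovered dump file):

# Re-create the schema and the table from the known definition
mysql -uroot -p -e "CREATE DATABASE IF NOT EXISTS moodle2"
mysql -uroot -p moodle2 < mdl2_user.sql

# Replay the LOAD DATA statement written by c_parser (--local-infile may be
# required if the statement uses LOAD DATA LOCAL INFILE)
mysql --local-infile=1 -uroot -p moodle2 < dumps/moodle2/mdl2_user.load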

Using the following script, and given table definitions in the dumps directory for each schema, all of the tables found in the data dictionary can be extracted into files suitable for LOAD DATA INFILE, along with separate .load files to actually load them.  Uncomment the sys_parser line only as a last resort, if original table definitions cannot be provided or extracted from .frm files.  If original table definitions can be provided, they should be stored as separate .sql files in dumps/<schema>/<table>.sql:

#!/bin/bash
RECOVERY_DB="test"
USER="root"
PASS="somepass"
DUMPS="dumps"

# Create the dumps directory and a schema.sql that re-creates every recovered database
mkdir -p ${DUMPS}
echo > ${DUMPS}/schema.sql

# Derive the list of schemas from the recovered data dictionary
for DB in `mysql --user=${USER} --password=${PASS} -NBe "SELECT NAME FROM ${RECOVERY_DB}.SYS_TABLES" | sed -r "s/^(.*)\/.*$/\1/" | grep -v SYS_ | sort -u `
do
    mkdir -p ${DUMPS}/${DB}

    echo "Creating schema for $DB..."

    echo >> ${DUMPS}/schema.sql
    echo "CREATE DATABASE IF NOT EXISTS $DB;" >> ${DUMPS}/schema.sql

    for TABLE in `mysql ${RECOVERY_DB} --user=${USER} --password=${PASS} -NBe "SELECT NAME FROM SYS_TABLES WHERE NAME LIKE '${DB}/%'"`
    do
       echo $TABLE
       # Last resort only: generate the table definition from the dictionary with sys_parser
       # ./sys_parser -u${USER} -p${PASS} -d ${RECOVERY_DB} ${TABLE} | tee ${DUMPS}/${TABLE}.sql >> ${DUMPS}/schema.sql
       # Look up the index ID of the table's PRIMARY key in the recovered dictionary
       PKEY=`mysql ${RECOVERY_DB} --user=${USER} --password=${PASS} -BNe "SELECT SYS_INDEXES.ID FROM SYS_TABLES LEFT JOIN SYS_INDEXES ON (SYS_TABLES.ID = SYS_INDEXES.TABLE_ID) WHERE SYS_TABLES.NAME = \"${TABLE}\" AND SYS_INDEXES.NAME=\"PRIMARY\""`
       echo "pkey = $PKEY"
       PAGE="pages-ibdata1/FIL_PAGE_INDEX/`printf '%016u' ${PKEY}`.page"
       echo "PAGE = $PAGE"
       # Extract the rows into ${DUMPS}/${TABLE} and write the matching LOAD DATA statement to ${DUMPS}/${TABLE}.load
       ./c_parser -b "./pages-ibdata1/FIL_PAGE_TYPE_BLOB" -p "./${DUMPS}/${DB}" -l ${DUMPS}/${TABLE}.load -5f ${PAGE} -t ${DUMPS}/${TABLE}.sql > ${DUMPS}/${TABLE}
    done
done
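Once the extraction loop has finished, a matching loop can re-create the schemas and replay the generated .load files.  This is only a sketch under the same naming assumptions as the script above (dumps/<schema>/<table>.sql for the table definitions, dumps/<schema>/<table>.load for the LOAD DATA statements):

#!/bin/bash
USER="root"
PASS="somepass"
DUMPS="dumps"

# Re-create the recovered databases
mysql --user=${USER} --password=${PASS} < ${DUMPS}/schema.sql

for LOADFILE in ${DUMPS}/*/*.load
do
    DB=$(basename $(dirname ${LOADFILE}))
    TABLE=$(basename ${LOADFILE} .load)
    echo "Loading ${DB}.${TABLE}"
    # Create the table from its saved definition, then replay the LOAD DATA statement
    mysql --user=${USER} --password=${PASS} ${DB} < ${DUMPS}/${DB}/${TABLE}.sql
    mysql --local-infile=1 --user=${USER} --password=${PASS} ${DB} < ${LOADFILE}
done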

 

sys_parser 

As noted above, sys_parser can be used to extract a table definition from the ibdata file, but it should be used only as a last resort.  Some guesswork may be required, and it may be impossible to fully extract the available data from ibdata if valid table definitions cannot be provided.
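Its invocation mirrors the commented-out line in the script above; for example (credentials and table name are placeholders):

# Reconstruct a best-guess CREATE TABLE statement from the dictionary loaded into 'test'
./sys_parser -uroot -psomepass -d test moodle2/mdl2_user > mdl2_user.sql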

 

Extraction of a valid table definition from a .frm file

It is possible, using mysqlfrm from the MySQL Utilities package, to extract a valid table definition from a non-corrupted .frm file.  Note that mysqlfrm in --diagnostic mode is susceptible to the same kinds of problems as extracting from ibdata with sys_parser, so it is important to use mysqlfrm with the --server option instead.  The output must also be tweaked to work with c_parser, because c_parser will die if comment lines, warnings, or default charset information appear in the table definition.  Here is a way to extract table definitions, along with a single schema.sql used to create the schema(s), from all .frm files found in a given directory:

 

#!/bin/bash

# Run mysqlfrm against every .frm file under the datadir and feed its output
# to extract_schema.pl; flag any .frm file that mysqlfrm cannot read
for FRM in `find ../datadir/ -type f -wholename "*.frm" | sort`
do
    mysqlfrm --server=root:somepass@localhost:../datadir/mysql.sock --port=33307 $FRM | ./extract_schema.pl

    if [ ${PIPESTATUS[0]} -ne 0 ]; then
         echo "$FRM is corrupt"
    fi
done

 extract_schema.pl looks like this:

#!/usr/bin/perl

open (SCHEMAFILE, '>>', "schema.sql") or die "Can not write to schema.sql $!";

$schema = "";
$table = "";
while (<STDIN>) {
    $origline = $_;
    chomp;
    if (/^CREATE TABLE.*$/) {
        m/.*CREATE TABLE \`(.*)\`\.\`(.*)\`[[:space:]]\(/;
        $schema = $1;
        $table = $2;
        print "Creating $schema.$table\n";
        unless (-e $schema or mkdir $schema) {
            die "Unable to create dir for schema $schema";
        }
        print SCHEMAFILE "CREATE DATABASE IF NOT EXISTS $schema;\n";
        print SCHEMAFILE "USE $schema;\n";
        print SCHEMAFILE "$origline";
        open (TABLEFILE, '>', "$schema/$table.sql") or die "Can not write to $schema/$table.sql";
        print TABLEFILE "CREATE TABLE $table (\n";

    } else {
        s/ENGINE=(.*?)[[:space:]].*/ENGINE=\1;/;
        s/`PRIMARY`//;
        s/^#.*$//;
        s/^WARNING.*$//;
        $origline =~ s/(.*ENGINE.*$)/\1;/;
        $origline =~ s/^#.*$//;
        $origline =~ s/^WARNING.*$//;
        print SCHEMAFILE "$origline" if (!/^\s*$/);
        print TABLEFILE "$_\n" if (!/^\s*$/);
    }

}

While it's exciting that the UnDROP tool is available, getting into a situation where it must be used is a very, very bad idea.  There is no guarantee that any data  can be recovered when a database is corrupted.  The way to avoid being in this situation is to use MySQL Enterprise Backup to create reliable backups on a regular basis, along with at least one replication slave.  Smart administrators who care about their data will also establish a replication slave in a remote location and take regular backups of that server as a disaster recovery precaution.