Tuesday, July 16, 2013

Immigrate data from SQLite3 to MySQL on Linux

This is a classical problem of how to immigrate data from SQLite3 to MySQL. I tried lots of scripts found online and the following one is the best suitable to me.

The code is from https://github.com/athlite/sqlite3-to-mysql/blob/master/sqlite3-to-mysql  and the author should take the whole credit.
#/usr/bin/env sh
sed \
-e '/PRAGMA.*;/ d' \
-e '/BEGIN TRANSACTION.*/ d' \
-e '/COMMIT;/ d' \
-e '/.*sqlite_sequence.*;/d' \
-e 's/"/`/g' \
-e 's/CREATE TABLE \(`\w\+`\)/DROP TABLE IF EXISTS \1;\nCREATE TABLE \1/' \
-e 's/\(CREATE TABLE.*\)\(PRIMARY KEY\) \(AUTOINCREMENT\)\(.*\)\();\)/\1AUTO_INCREMENT\4, PRIMARY KEY(id)\5/' \
-e "s/'t'/1/g" \
-e "s/'f'/0/g" \
$1
 Here are the full process of immigrating SQLite3 to MySQL in Linux (Ubuntu for me).
1) Make sure that you have installed sqlite3 which is a management software for sqlite database. Otherwise,
    $  sudo apt-get install sqlite3
2) Save the above code in a file like sqlite32mysql.sh and assign execute privilege.
    $  chmod +x sqlite32mysql.sh
3) If you don't have a database, then create one
     mysql -u xx -p xx -h 127.0.0.1 "CREATE DATABASE sample IF NOT EXIST"
4) If you'd like only import the data to old tables,
      $  sqlite3 your_sqlite3_db.db .dump|grep "^INSERT"| ./sqlite32mysql.sh >dump.sql
   Otherwise create new tables,
     $   sqlite3 your_sqlite3_db.db .dump| ./sqlite32mysql.sh >dump.sql
5) If there are still some records cannot be imported correctly, do it separately using the following command   to grab them out.
     $ sed -n "starter_num, end_line_num p" dump.sql > manual.sql
6) Load the dump.sql using mysql.
    $ mysql -u xx -p xx -h 127.0.0.1 sample <dump.sql

BINGO!

Remove duplicated rows

Problem: 
Assume you have a table with structure as follows.
    Comment [id, app_id, reviewer_id,review_id ....] where id is primary key.
And you have to remove duplicated records with the same (app_id, and review_id).
Here are some solutions to fulfill your requirement.

Solution 1: [For small size of table]

 1) Find out which records are duplicated.
    SELECT id FROM (SELECT COUNT(*) AS num, id, app_id, review_id FROM Comment GROUP BY app_id,review_id) AS result WHERE result.num>2;
 NOTE: the above sql statement will list one of duplicated id;

2) Discard duplicated records
   DELETE FROM Comment WHERE id in (SELECT id FROM (SELECT COUNT(*) AS num, id, app_id, review_id FROM Comment GROUP BY app_id,review_id) AS result WHERE result.num>2)


Solution 2: [Once for all]

    ALTER IGNORE TABLE Comment ADD UNIQUE INDEX idx_name (app_id,review_id);

This solution works for me so that I did not examine its limitation in some extreme cases. 

The credit of this solution goes to http://stackoverflow.com/a/3312066 . 

Monday, July 15, 2013

EPFImporter with error [WARNING]: Incorrect string value: '\xE6\xB0\x91\xE8\xA8\xB4...' for column 'title' at row 15

When using EPFImporter provided by Apple .Inc to import EPF data into table, it reported the following errors.

[WARNING]: Incorrect string value: '\xE6\xB0\x91\xE8\xA8\xB4...' for column 'title' at row 15

The reason is that the encoding of the column title is utf8 (assuming it is utf8) which cannot store characters larger than 3 bytes. For mysql, utf8 is designed as 3 bytes. After MySQL 5.5, they add another type utf8mb4 to deal with such problem  which is 4-byte encoding.

The solution is to make sure your table is look like this:
CREATE TABLE `epf_application_detail_tmp` (
`export_date` BIGINT(20) NULL DEFAULT NULL,
`application_id` INT(11) NOT NULL DEFAULT '0',
`language_code` VARCHAR(20) NOT NULL DEFAULT '' COLLATE 'utf8mb4_unicode_ci',
`title` VARCHAR(1000) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`description` LONGTEXT NULL COLLATE 'utf8mb4_unicode_ci',
`release_notes` LONGTEXT NULL COLLATE 'utf8mb4_unicode_ci',
`company_url` VARCHAR(1000) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`support_url` VARCHAR(1000) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`screenshot_url_1` VARCHAR(1000) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`screenshot_url_2` VARCHAR(1000) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`screenshot_url_3` VARCHAR(1000) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`screenshot_url_4` VARCHAR(1000) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`screenshot_width_height_1` VARCHAR(20) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`screenshot_width_height_2` VARCHAR(20) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`screenshot_width_height_3` VARCHAR(20) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`screenshot_width_height_4` VARCHAR(20) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`ipad_screenshot_url_1` VARCHAR(1000) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`ipad_screenshot_url_2` VARCHAR(1000) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`ipad_screenshot_url_3` VARCHAR(1000) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`ipad_screenshot_url_4` VARCHAR(1000) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`ipad_screenshot_width_height_1` VARCHAR(20) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`ipad_screenshot_width_height_2` VARCHAR(20) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`ipad_screenshot_width_height_3` VARCHAR(20) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
`ipad_screenshot_width_height_4` VARCHAR(20) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
PRIMARY KEY (`application_id`, `language_code`)
)
COLLATE='utf8mb4_unicode_ci'
ENGINE=InnoDB;

For convenience, modify the default setting of mysql.

add the following to my.cnf
[mysqld]
collation-server=utf8mb4_unicode_ci
character-set-server=utf8mb4

Then, restart mysql and then create database e.g., epf to store EPF data. If you create epf already, make sure to change its default setting.