mysql

Experience Unicode Enabling MySQL:

Thomas Emerson
Senior Software Engineer
Basis Technology Corp.

Overview:

- Introduction
- What is MySQL?
- Character Set Architecture in MySQL
- Phased Implementation
- Summary
- Q & A

Preliminaries:

- Cell Phones? Just say vibrate.
- If you need to take a call, please get up and leave.
- If you fall asleep, you will be ridiculed.

Assumptions:

- Unicode and Unicode Terminology
- Basic RDBMS concepts

Introduction:

- Who am I and why am I here?
- Large amounts of linguistic and lexicographic data
  - Simplified and Traditional Chinese
  - Japanese
  - Korean
  - Thai
  - Western and Eastern European Languages
- Accessibility
  - Across Platforms
  - Web-based Interface
- Low Impact
  - Could not take cycles (hard-, soft-, or wet-) from our Oracle 8i system and its DBA.
  - Didn’t have big iron available
  - No budget

What is MySQL?:

- GPL’d buzz-word compliant SQL engine
  - High Performance
  - Robust
  - Popular
- Supports Industry Standards
  - Entry-level SQL92
  - ODBC Level 0-2
- Extensions
  - Advanced (though complex) authentication system
  - Extra datatypes, including ENUM and SET
- Excellent Support for Legacy Encodings
  - Big Five, GB 2312, and GBK
  - EUC-JP and ShiftJIS
  - TIS 620
  - ISO-Latin-1
  - KOI-8R
- C and C++ APIs, and bindings for Python, Perl, PHP, and others.

I18N Architecture in MySQL:

- Server can be built to support multiple encodings
- Databases can only contain a single character set
- Support for single- and double-byte character sets.

Phased Implementation:

1. UTF-8 in and out
2. UTF-8 as a multibyte encoding
3. UCS-2 as the internal encoding

Phase I:

- No Unicode-specific features.
- Unicode support is piggy-backed as ISO-Latin-1.
- This is surprisingly effective, but:
  - Wild card searches are awkward (since each “character” is composed of up to three Latin 1 characters)
  - No regular expression support
  - No collation support
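The wildcard problem can be reproduced outside the database. The following is a minimal sketch in Python, assuming the server simply treats the stored UTF-8 bytes as ISO-Latin-1 characters. The helper `like_underscore_match` is hypothetical: it only mimics the SQL `_` (single-character) wildcard and is not MySQL code.

```python
import re


def as_latin1(text: str) -> str:
    """Return what a Latin-1-only server 'sees' when handed UTF-8 bytes."""
    return text.encode("utf-8").decode("latin-1")


def like_underscore_match(pattern: str, value: str) -> bool:
    """Hypothetical stand-in for SQL LIKE that supports only the '_' wildcard."""
    regex = "".join("." if ch == "_" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


word = "日本"              # two Unicode characters
stored = as_latin1(word)   # the same data viewed as Latin-1 "characters"
print(len(word), len(stored))

# One '_' per intended character does not match the stored value;
# one '_' per UTF-8 byte does.
print(like_underscore_match("__", stored))
print(like_underscore_match("______", stored))
```

Because each of these two characters occupies three bytes in UTF-8, a pattern has to count bytes rather than characters, which is the awkwardness the slide refers to.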

Phase I (cont.):

- The Front End problem was solved with PHP (www.php.org)
- An HTML front end using UTF-8 as the document charset
- PHP not Unicode aware, but it just doesn’t matter!

Phase II:

- Treat UTF-8 as a multibyte character set
- Simple collation model
- Still no regular expression support
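Treating UTF-8 as a multibyte character set means the engine must be able to tell where each character starts and ends. The sketch below shows one way to do that in Python, using the lead byte to determine the sequence length. The function names are illustrative only and do not come from the MySQL source. Continuation bytes are not validated, to keep the example short.

```python
from typing import List


def utf8_sequence_length(lead_byte: int) -> int:
    """Return the byte length of the UTF-8 sequence that starts with lead_byte."""
    if lead_byte < 0x80:
        return 1                      # ASCII
    if 0xC2 <= lead_byte <= 0xDF:
        return 2
    if 0xE0 <= lead_byte <= 0xEF:
        return 3
    if 0xF0 <= lead_byte <= 0xF4:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte: {lead_byte:#04x}")


def split_characters(data: bytes) -> List[bytes]:
    """Split a UTF-8 byte string into one byte sequence per character."""
    chars = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chars.append(data[i:i + n])
        i += n
    return chars


sample = "aé日".encode("utf-8")
print([len(c) for c in split_characters(sample)])
```

With character boundaries known, a single-character wildcard can consume one whole sequence instead of one byte.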

Phase II (cont.):

- Rosette is used as the Unicode layer
- No longer limited to a single character set
- But now we need to differentiate between language and script!

Phase III:

- Use UCS-2 as the internal character representation.
- Transcoding to legacy encodings as needed, so existing databases will continue to work.
- Each column can have a *different* legacy encoding
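The idea of a UCS-2 internal form with per-column legacy encodings can be sketched as follows. This is an illustration in Python, not the planned MySQL implementation: the column names and the `column_encodings` map are hypothetical, and Python's `utf-16-be` codec is used as UCS-2 by rejecting anything outside the Basic Multilingual Plane.

```python
def to_ucs2(raw: bytes, legacy_encoding: str) -> bytes:
    """Transcode legacy-encoded bytes to big-endian UCS-2 (BMP only)."""
    text = raw.decode(legacy_encoding)
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the BMP cannot be stored as UCS-2")
    return text.encode("utf-16-be")


def from_ucs2(internal: bytes, legacy_encoding: str) -> bytes:
    """Transcode the internal UCS-2 form back to a column's legacy encoding."""
    return internal.decode("utf-16-be").encode(legacy_encoding)


# Hypothetical table in which every column declares its own legacy encoding.
column_encodings = {
    "title_ja": "shift_jis",
    "title_zh": "big5",
    "title_ru": "koi8_r",
}

row = {
    "title_ja": "日本語".encode("shift_jis"),
    "title_zh": "中文".encode("big5"),
    "title_ru": "Привет".encode("koi8_r"),
}

# Inbound data is converted once; outbound data is converted back on demand.
internal = {col: to_ucs2(raw, column_encodings[col]) for col, raw in row.items()}
round_trip = {col: from_ucs2(val, column_encodings[col]) for col, val in internal.items()}
print(round_trip == row)
```

Existing clients keep sending and receiving their legacy bytes, while comparisons inside the engine can operate on one fixed-width representation.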

Phase III (cont.):

- Data can be imported, transcoded and filtered using Rosette’s full transform functionality.
  - Hankaku/Zenkaku transformation
  - Case Conversion
  - SGML Entity Folding
  - Ad nauseam
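To give a feel for the kind of import filter listed here, the sketch below chains three comparable transforms using only the Python standard library. It is a stand-in, not Rosette's API: NFKC normalization is used as an approximation of Hankaku/Zenkaku (half-width/full-width) folding and also folds other compatibility characters, and `html.unescape` covers HTML character references rather than arbitrary SGML entities.

```python
import html
import unicodedata


def fold_width(text: str) -> str:
    """Fold half-width katakana to full width and full-width ASCII to ASCII."""
    return unicodedata.normalize("NFKC", text)


def fold_entities(text: str) -> str:
    """Replace character references such as &eacute; with the character itself."""
    return html.unescape(text)


def import_filter(text: str) -> str:
    """Hypothetical import pipeline: entities, then width folding, then lower case."""
    return fold_width(fold_entities(text)).lower()


print(fold_width("ｶﾀｶﾅ"))
print(fold_entities("caf&eacute;"))
print(import_filter("ＭｙＳＱＬ &amp; ｶﾅ"))
```

Running such filters at import time means the stored data is already in a canonical form when it is searched or sorted.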

Status:

- Phase I is complete and live.
- Phase II is underway, as time allows.
  - UTF-8 support in place.
  - Collation still going.
- Phase III is planned, but not yet started.

Status (cont.):

- Removal of Rosette to use glibc features and/or ICU
- Measure and improve performance
- All changes will be released to the MySQL code base

Q&A:

Tom Emerson
tree@basistech.com
Slides and other information available at http://cymru.basistech.com/iuc17