Search |
||
MySQL for DevelopersPosted by caroljmcdonald on September 16, 2008 at 7:22 PM PDT
this is a write up of info I gathered from Jay Pipes, High Performance MySQL , Colin Charles and others (see References below) for a Sun Tech Days MySQL presentation. MySQL powers many high-volume Web sites, including industry leaders such as Yahoo!, Alcatel-Lucent, Google, Nokia, YouTube, and Zappos.com. MySQL for DevelopersIf you are a developer using MySQL, you should learn enough to take advantage of its strengths, because having an understanding of the database can help you develop better-performing applications. This can be especially important for Hibernate or JPA developers, since ORM frameworks tend to obscure the database schema and SQL for the developer, which can lead to poorly-performing index and schema strategies and inefficient SQL.MySQL Storage Engine Architecture
A key difference between MySQL and other database platforms is the pluggable storage engine architecture of MySQL, which allows you to select a specialized storage engine for a particular application need such as data warehousing, transaction processing, high availability... A storage engine is responsible for storing and retrieving all the data stored . The storage engines have different functionality, capabilities and performance characteristics, in many applications choosing the right storage engine can greatly improve performance. Here is a brief summary of some of the more common engines:
Schema, the basic foundation of performanceDatabase normalization minimizes duplication of information, this makes updates simpler and faster because the same information doesn't have to be updated in multiple tables. In a denormalized database, information is duplicated, or stored in multiple places. With a normalized database:
Data TypesIn general, try to use the smallest data type that you can. Small and simple data types usually give better performance because it means fewer disk accesses (less I/O), more data in memory, and less CPU to process operations.Numeric Data TypesMySQL has 9 numeric data types.Whole Numbers: TINYINT, SMALLINT, MEDIUMINT, INT,
BIGINT: require 8, 16, 24, 32, and 64 bits of storage
space, respectively. They can store values from –2(n-1)
to 2(n-1)–1, where N is the number of bits of
storage space they use.
Use UNSIGNED when you don't need negative numbers, this doubles the bits of storage space. BIGINT is not needed for AUTO_INCREMENT, INT UNSIGNED stores 4.3 billion values! INT(1) does not mean 1 digit! The number in parentheses is the ZEROFILL argument, and specifies the number of characters some tools reserve for display purposes. For storage and computational purposes, INT(1) is identical to INT(20). Integer data types work best for primary key data types. Real Numbers: FLOAT, DOUBLE: supports approximate
calculations with standard floating-point math.
DECIMAL: use DECIMAL when you need exact results, always use for monetary/currency fields. Other: BIT: to store 0,1 values. BIT(1)
stores 1 bit, BIT(2) stores 2... max is 64.
Character Data TypesThe CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters to store.VARCHAR(n) stores variable-length character strings. VARCHAR uses only as much space as it needs, which helps performance because it saves disk space. Use when the maximum column length is larger than the average length and when updates to the field are rare, so fragmentation is not a problem. VARCHAR gives more efficient disk usage, however MySQL internal memory buffers are fixed width, so internal memory usage isn't more efficient, therefore try to keep VARCHAR Max Length short. CHAR(n) is fixed-length: MySQL allocates enough space for the specified number of characters. Useful to store very short strings, when all the values are nearly the same length, and for data that's changed frequently. Use NOT NULLAlways define columns as NOT NULL unless there is a very good reason not to do so:
IndexesIndexes are data structures that help retrieve row data with specific column values faster. Indexes can especially improve performance for larger data bases. Most MySQL storage engines support B-tree indexes. a B-tree is (remember data strucutres 101 ?) a tree data structure that sorts data values, tree nodes define the upper and lower bounds of the values in the child nodes. Leaf nodes have pointers to the data instead of pointers to child nodes. B-trees are kept balanced by requiring that all leaf nodes are at the same depth. MyISAM leaf nodes refer to the indexed row by the position of the row, InnoDB leaf nodes refers to the index by its primary key values.InnoDB's clustered indexes store the row data in the leaf nodes, it's called clustered because rows with close primary key values are stored close to each other. This can make retrieving indexed data fast, since the data is in the index. But this can be slower for updates , secondary indexes, and for full table scans. Covering Indexes are indexes that contain all the data values needed for a query, these queries can improve performance because the row does not have to be read. Know Every SQL StatementYou need to understand the SQL queries your application makes and evaluate their performance. For this you can harness the MySQL slow query log and use EXPLAIN.Consider adding
an index for queries
that are taking too long, or cause a lot of load. Know how your query
is executed by MySQL, consider disk I/O in
particular. Basically you want to make your queries access less
data, Indexes are a good way to reduce data access. Know Every SQL Statement for Hibernate/JPA developers If you are using Hibernate, JPA, or another ORM framework you want to look at the generated SQL-statements. If you are using Hibernate, enable the Hibernate SQL log. If you are using Glassfish with Toplink as the JPA provider, you can log the SQL that is being sent to the database by adding this property in the definition of the persistence unit in the persistence.xml
file, as follows: <property name="toplink.logging.level"
value="FINE">. Then run a use case of your application and
examine the SQL statements which are executed by JPA or your ORM
framework. You want to make sure you are only retrieving the data your
application needs, that is you want to optimize the number of SQL
statements executed (see lazy loading below), and you want to examine
the execution plan for queries. Optimize data accessBasically you want to make your queries access less data:
Understanding EXPLAINEXPLAIN provides the execution plan chosen by the MySQL optimiser for a specific SELECT statement. It is important to run EXPLAIN on all SELECT statements that your code is executing against the database. This step ensures that missing indexes are picked up early in the development process and gives developers insight into how the MySQL optimizer has chosen to execute the query.To use it just precede a SELECT statement
with the keyword EXPLAIN and MySQL
will display information from the optimizer about the query execution
plan, which explains how it would process the SELECT,
including information about how tables are joined and in which order.
The EXPLAIN EXTENDED SELECT variant, followed by SHOW WARNINGS gives
some additional information, including the final rewritten query as
used by the optimizer. With the help of EXPLAIN, you can see
where you should add indexes to tables to get a faster SELECT that uses indexes to find rows. You can
also use EXPLAIN to check whether the
optimizer joins the tables in an optimal order. returns a row of
information for each "table"
used in the SELECT statement, which shows
each part and the order of the execution plan. The "table" can
mean a real schema table, a derived or temporary table, a subquery, a
union result. Here is an explanation of some of the output for each
row:
In the EXPLAIN output, the “type” for the table/set will be
“ALL” or “index”. “ALL” means a full table data record scan is
performed. “index” means a full index record scan. Avoid
this by ensuring indexes are on columns that are used in the WHERE, ON,
and GROUP BY clauses.
Scans and seeksA seek jumps into a random place (on disk or in memory) to fetch data. A scan will jump to the start of the data, and sequentially read (from either disk or memory) until the end of the data. With large amounts of data, sequentially scanning through contiguous data on disk or in memory is faster than performing many random seek operations. But scans can also be a sign of poor indexing, if an index contains many rows with the same value it is not very useful. MySQL keeps stats about the uniqueness of values in an index in order to estimate the rows returned (rows in the explain output). If the rows is greater than a certain % of rows in the table, then MySQL will do a scan.Indexed columns and functions don't mixIndexes can quickly find the rows that match a WHERE clause, however this works only if the index is NOT used in a function or expression in the WHERE clause. Below is an example where an index is used :mysql> EXPLAIN SELECT * FROM film WHERE title LIKE 'Tr%'\GIn the example above, a fast range "access strategy" is chosen by the optimizer, and the index scan on title is used to winnow the query results down. Below is an example where an index can not be used : mysql> EXPLAIN SELECT * FROM film WHERE LEFT(title,2)
= 'Tr' \GA slow full table scan (the ALL"access strategy") is used because a function (LEFT) is operating on the title column. Operating on an indexed column with a function (in this case the LEFT() function) means the optimizer cannot use the index to satisfy the query. Typically, you can rewrite queries in order to not operate on an indexed column with a function. Solving multiple problems in one queryHere is an example of improving a query:SELECT * FROM Orders WHERE TO_DAYS(CURRENT_DATE()) – TO_DAYS(order_created) <= 7;First, we are operating on an indexed column (order_created)
with a function TO_DAYS – let's fix that:SELECT * FROM Orders WHERE order_created >= CURRENT_DATE() - INTERVAL 7 DAY;Although we rewrote the WHERE expression to remove the function on the index, we still have a non-deterministic function CURRENT_DATE()
in the statement, which eliminates this query from being placed in the
query cache. Any time a
non-deterministic function is used in a SELECT statement, the query
cache ignores the query. In read-intensive applications, this can
be a significant performance problem. –
let's fix that:SELECT * FROM Orders WHERE order_created >= '2008-01-11' - INTERVAL 7 DAY;We replaced the function with a constant (probably using our application programming language). However, we are specifying SELECT * instead of the actual fields we need from the table. What if there is a TEXT field in Orders called order_memo that we don't need to see? Well, having it included in the result means a larger result set which may not fit into the query cache and may force a disk-based temporary table. – let's fix that: SELECT order_id, customer_id, order_total, order_created FROM Orders WHERE order_created >= '2008-01-11' - INTERVAL 7 DAY; A Few Things to consider for JPA/Hibernate devlopers:Lazy loading and JPAWith JPA many-to-one and many-to-many relationships lazy load by default , meaning they will be loaded when the entity in the relationship is accessed. Lazy loading is usually good, but if you need to access all of the "many" objects in a relationship, it will cause n+1 selects where n is the number of "many" objects. You can change the relationship to be loaded eagerly as follows :public class Employee{
However you should be careful with eager loading which could cause
SELECT statements that fetch too much data. It can cause a Cartesian
product if you eagerly load entities with several related
collections. If you want to temporarily override the LAZY fetch type, you could use Fetch Join. For example this query would eagerly load the employee addresses:
Optimistic locking and JPAThe Java Persistence API 1.0 persistence providers support optimistic locking, where your application will get an exception whenever a transaction tries to commit an object that was updated in the database since this transaction began. To enable this for a entity, you need to add a version attribute:@EntityIn Conclusion
ReferencesHigh Performance MySQL bookMySQL Pluggable Storage Engine Architecture MySQL Storage Engine Architecture, Part 2: An In-Depth Look Optimizing Queries with EXPLAIN Java Persistence with Hibernate book Jay Pipes blog Colin Charles blog mysql performance blog Ronald Bradford blog Taking JPA for a Test Drive Pro EJB 3: Java Persistence API Pro MySQL, Chapter 6: Benchmarking and Profiling »
Comments
Comments are listed in date ascending order (oldest first)
Submitted by dserodio on Thu, 2008-09-18 11:42.
Very thorough article, thanks.
Submitted by crackbrained on Thu, 2008-09-25 04:49.
nice article, thanks.
I though have an issue regarding the optimistic lock. Is there a way to increment not the version number of an entity when a property of it having an OneToMany, ManyToMany or ManyToOne relation is dirty?
Submitted by caroljmcdonald on Fri, 2008-09-26 11:20.
JPA defines that when any of the object's tables changes the version is updated. However it is less clear on relationships. If Basic, Embedded, or a OneToOne, ManyToOne relationship changes, the version will be updated. But if OneToMany, ManyToMany changes making an update to the version may depend on the JPA provider.
|
||
|
|