
  • >> Okay. Hi, guys. So this is Jay Pipes. He's a MySQL employee. He's in charge of North

  • American Community Relations. I met him this week at the MySQL User Conference which is

  • down in Santa Clara. Now, the deal is that we're going to host this on Google video.

  • So if you have confidential questions, if you could leave them till the cameras are

  • turned off, that would be cool. With that, no more ado, Jay Pipes.

  • >> PIPES: All right. Nice to meet all of you. I'm going to be giving a talk that I gave

  • on, I think, Tuesday or Wednesday, on "Performance Tuning Best Practices for MySQL." Some of

  • the things are version specific but for the most part, it's pretty general. We're going

  • to be covering Profiling and Benchmarking Core Concepts, Sources of Problems that you

  • might encounter, Index Guidelines, Schema Guidelines, Coding Techniques, and lastly,

  • we'll go over some general things about tuning server parameters. Since the audience is,

  • I think, sort of mixed levels of experience, just shout out questions or tell me

  • if you want more in-depth stuff or so. All right, Benchmarking Concepts. The general idea behind

  • benchmarking MySQL is to, obviously, have a track record. You know, when you change

  • things in your code, in your application, or the configuration files with MySQL, or

  • your schema design, or you're indexing, you want to make sure you have a track record

  • of what you're doing. Because a lot of times, if you--if you don't keep a track record and

  • don't do benchmarking, you won't really know whether a specific change that you made had

  • a detrimental impact or whether it actually increased performance. So, the baseline--and

  • you always want to have your baseline for your starting point. And always give yourself

  • a target. Don't just do it so it's a moving start--moving target, you always want to have

  • an end goal, you know. You either want to get, you know, 20% increase in performance.

  • So that's your goal. But just don't have it, you know, just kind of end with nothing because

  • then you'll just spend years and years and years trying to, you know, tune milliseconds

  • out of your--out of your application when there's really no benefit to it. So, change

  • only one thing at a time. A pretty basic concept but it really--it goes a long way in benchmarking.

  • If you change two things and you rerun a test, how do you know which one of those two things

  • actually affected the difference between the two tests? So, always change one thing at

  • a time and rerun the benchmarks. And record everything, configuration files, information

  • on the hardware, information on obviously the operating system, the version of MySQL,

  • and everything to do with the benchmarks. Keep it in a separate folder and keep everything

  • for historical record. A lot of times, you'll think that you won't need information when

  • you're running benchmarks. And then, six months later, you realize that, "Wow! It would be

  • really useful if I had--oh, I threw it away". So, always keep information from your benchmarks

  • so you have a historical record of what you did. And specifically with MySQL, you want

  • to disable the query cache whenever you're running benchmarks. The reason is it will

  • skew the results that you get from read-intensive queries and write-intensive queries as well.

  • So, disable the query cache by setting the query cache size equal to zero when you--when

  • you run benchmarks, so. Yes. Go ahead. >> [INDISTINCT] benchmarking for [INDISTINCT]

  • problem with the operating system, caches [INDISTINCT]?

  • >> PIPES: Right. Yes. The question was--what you were experiencing that the operating system

  • caching was skewing the results of the benchmark. Generally, the idea is either you can disable

  • caching if it's actually getting in the way or if you bump up the actual number of runs

  • in the benchmark tests you can minimize the skew effect. So that's usually what we recommend.

  • But, when you get into specific caches there's very specific things to do. So it kind of

  • depends on the exact scenario that you're doing, so. But, yes. Bumping up the number of runs

  • in the benchmarks generally tends to mitigate a lot of that, so. Does that answer your question?
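Coming back to the query-cache tip from a moment ago, a minimal sketch of the setting, assuming a pre-8.0 MySQL server (the query cache was removed entirely in MySQL 8.0):

```sql
-- Disable the query cache before a benchmark run so cached result sets
-- don't skew read-intensive (and write-intensive) measurements.
SET GLOBAL query_cache_size = 0;

-- Confirm it took effect.
SHOW VARIABLES LIKE 'query_cache_size';
```

The same setting can go in my.cnf (`query_cache_size = 0`) so it survives a server restart between runs.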

  • Okay. Profiling Concepts, profiling is a little different than benchmarking. Generally, you're

  • profiling a production or running a test system that you're trying to figure out, you know,

  • diagnosing what's going on while it's running. So, instead of doing--you know finding the

  • stress or the load that the system can take, you're actually trying to pinpoint where the

  • bottlenecks are in the code while it's running. With MySQL, the best tool that you have is

  • the EXPLAIN command. So, regardless of what version of MySQL you are running, you really

  • want to get familiar with everything in the EXPLAIN command. Understand what the access

  • types--does everyone know what the EXPLAIN command is? Everyone use it? Okay. Does everyone

  • know what every single type of access type is? Okay. It's sort of a poorly named column

  • in the--in the EXPLAIN results. But, it actually describes the path or the optimization pattern

  • at which the optimizer's saying, "Okay well, I'm going to, you know, access, you know,

  • these results at and join it to these results at." And each of those things, the ref, the

  • ref_or_null, the eq_ref, the range--all the different access types mean different things. And the

  • performance that you get from each of them, you can go into MySQL manual, depending on

  • your version. You can see which ones have a better performance for different types of

  • queries. It's not always the same. So, you want to make sure you understand those. Understanding

  • obviously that if you see ALL in the access type, it means it's doing a full table scan.

  • You know just little things like that. Be aware of what they all mean and what the performance

  • impact of the different ones are. So, by using the Slow Query Log and mysqldumpslow to parse

  • the query log, very easy to set up, Log Slow Queries and then you give it a long query

  • time, anything above that, it will log to the--to the Slow Query Log. There's also--I

  • think in 4.1, we put in an option that you can--anything that--any table, any select

  • statement that does not use an index on a table can get logged to the Slow Query Log

  • regardless of how fast it runs. And that's--and that's useful for when you're doing an indexing

  • strategy and you're trying to determine, okay, my patterns of access--do I--do I have indexes

  • on those fields that I'm actually accessing? You know, in my WHERE clause and ON clause

  • and that kind of thing. So, low-hanging fruit, law of diminishing returns: tackle stuff that's

  • going to get you the absolute best return on investment for your time. Don't concentrate

  • on, you know, I know you guys are Google. So, a millisecond definitely does count. But,

  • if you got a--if you got a choice of, you know, going from five milliseconds to four

  • milliseconds, versus going from a minute to, you know, a second, it's a no-brainer. You

  • know, concentrate on the stuff that--that's going to give you the biggest impact. Using

  • Mytop. Jeremy Zawodny wrote a nice little tool. Most of you are probably familiar with

  • the top utility in UNIX. It's almost identical but it's just for MySQL. And it gives you

  • an ability to on a delay--on a repeating delay, you know, show you the process list of what's

  • going on in MySQL, and show, you know, show various show commands, the status variables

  • and gives you a nice little interface to profile into the system while it's running. And one

  • nice thing about Mytop that I really like is that you can--you can use it for remote

  • hosts as well. So, you can--you can run it on your local machine and access a remote

  • host. So that it is a nice little tool. Okay. Sources of Problems. This is kind of my personal

  • opinion of where I rank the law of diminishing returns, like what I actually like go for

  • immediately. The performance team tends to think that, you know, sometimes you can--you

  • can tune a server parameter and that's like the biggest bang for the least, you know,

  • amount of time. But I think, really, where performance problems come in are the--are

  • the definitely the top three here. Poor Indexing Choices will kill an application faster than

  • really anything else. If you're querying on a field and you don't have an index on it

  • and you're repeatedly hitting that, it's going to kill the application faster than, you know,

  • whether sort buffer size can be adjusted. So, same thing with the Inefficient or Bloated

  • Schema Design, you know, I talked in the--in the performance talk. I've seen a number of

  • cases where people designing new applications will actually de-normalize from the beginning

  • because they think it's like going to give them better performance. Don't do that. You

  • know, there's specific cases where you can de-normalize the schema and you can get some

  • performance benefits. But don't do it right away. You know, just because you think it's

  • going to get, you know, a performance benefit. Use good, efficient, normalized schemas. And

  • we'll talk--we'll go into detail on this later. Bad coding practices. MySQL has some inefficiency

  • problems with sub-queries. But in general, using joins is a lot more efficient than using

  • sub-queries. We're going to show some specific examples where you can really get a performance

  • increase by just simply changing the way that you--the way that you code SQL. Server Variables

  • Not Tuned Properly, that's a source of problems. But in my opinion, very specific situations

  • can be--can be helped by tuning server parameters. It's not some catch-all, you know, it totally

  • depends on, you know, how much memory you have in the machine, what the architecture

  • is, what operating system you're running on, what version of MySQL, what storage engines

  • you're using, your, you know, the type of the application that you have, whether it's

  • OLTP or heavy reads, you know, all these things. Yes. Tuning server parameters can help but

  • it's very specific to certain situations. And it also requires retesting and rerunning.

  • But, you know, you tweak one thing, you have to re, you know, run the benchmark test to

  • see if--see if you got the performance gain out of it. It can be time-consuming to tweak

  • server variables whereas it's pretty easy to, you know, change some seq--some SQL statements

  • and immediately rerun it and see if you got a huge performance gain from doing that. And

  • finally, Hardware and--and Network Bottlenecks. I/O, CPU bound, network latency. Right. There

  • is a good book that I brought along that I recommend for that and when we get to the

  • last slide, I'll show you that, so. All right, Indexing Guidelines. A poor or missing index,

  • definitely the fastest way to kill an application, like I said. Look for covering index opportunities.

  • We're going to go into some slides here to kind of explain what that is. But the covering

  • index is--especially for my--who's using MyISAM versus InnoDB? InnoDB, raise hands. MyISAM,

  • raise hands. Nobody is using MyISAM? For a single thing, okay.

  • >> [INDISTINCT] >> PIPES: Say again?

  • >> [INDISTINCT] >> PIPES: Got you, okay. Well, we'll explain

  • some things about covering index opportunities that are very specific to InnoDB. But covering

  • index, basically what it means is that MySQL can get all the information from the

  • index records which are by definition slimmer than the data records because they don't contain

  • all the fields in the table and use that information to complete whatever the query was without

  • having to go into the data records. One of the key concepts to this, the slimmer that

  • your index records are, the more that you can fit into a single index block and the

  • more that you can fit into an index block, the fewer reads you're going to do, the faster

  • your application's going to be. So, covering index opportunities definitely look around

  • for those. When you're deciding on which fields you actually want to index, make sure you

  • take a look at what the selectivity of the--of the field is. Selectivity is the cardinality

  • or the number of unique values that are contained in that field divided by the total number

  • of values in a table. So, obviously a unique index would have a selectivity of one. Because

  • it's, you know, unique over the total count. That's obviously the best-case scenario but

  • you can get situations where you've got awful selectivity. If you're querying just on that

  • index or just on that field you may get awful performance from that because--or MySQL may just

  • decide not to use the index because it ceases to be useful. I think generally like, [INDISTINCT]

  • could talk a little bit more about this but I think the number is about 30%. If your

  • optimizer sees that, okay, you do--you do a query, select, you know, star from whatever,

  • where column A equals two. And it will take a look and estimate, from the information

  • that it has on the--on the index, whether this particular query returns a third--you know, around 30%

  • or more of the rows in the table. If it does, it's actually much more efficient to just

  • sequentially scan the table than it is to go into the index and randomly seek and grab

  • a bunch of records and then do lookups back to the data records. So, be aware of--of the

  • uniqueness of the values in your index. For lower-selectivity fields where you--where you

  • don't have very much selectivity you can sometimes tack them on, you know, to a multicolumn index

  • so that, especially with GROUP BY queries--and we'll show an example of this--you get

  • some efficiencies there. But just be aware of the selectivity, so. On multicolumn indexes,

  • like those I kind of explained just now, the order of the fields is very important. You

  • have to look at the queries that you're running against the database. You know, if you have

  • the columns mixed up or they're not optimal, the optimizer just won't use the index. And

  • I'll show you a perfect example of something that happens all the time with this on group

  • by queries. And as the database grows you want to ensure that the distribution is good.
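One quick way to re-check the selectivity he defined earlier as the data grows; the table and column names here are invented for illustration:

```sql
-- Selectivity = number of distinct values / total number of rows.
-- A unique column gives 1.0; values near 0 suggest the optimizer
-- may skip the index in favor of a sequential scan.
SELECT COUNT(DISTINCT status) / COUNT(*) AS selectivity
FROM orders;

-- MySQL's own per-index cardinality estimates:
SHOW INDEX FROM orders;
```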

  • You know if you set up an application, writing new application and, you know, you go through

  • a couple of months and you got a couple of hundred thousand records in there. Okay, great.

  • Everything looks great. The index strategy that you designed for it is wonderful but

  • that doesn't necessarily mean that in nine months, you know, you've got millions of records

  • in there that the index strategy that you initially picked out is going to be optimal,

  • you know. Make sure that you understand how your data changes over time, you know. If

  • business practices can change. And so, if you've got values, let's say you got seven

  • values in a--in a field, right. And you've got an index on that, seven distinct values

  • that you're putting into like a status field. Okay. And over time 90% of the field values

  • contain one specific thing, that index ceases to be as useful as it was when you initially

  • designed the application because 90% of the--the rows contain the same value. So just be aware

  • that as your database grows and your application changes that you need to reexamine, you know,

  • what you initially thought was a good index strategy. Okay. Remove redundant indexes for

  • faster write performance. Obviously, every index that you put on a field you get a, you

  • know, slight performance impact on writes because it's got to write an index record

  • as well as the data record. Every time you do an insert and every time you update. If,

  • you know, update the key value, it's got to write to the index. So unfortunately, MySQL

  • allows you to create redundant indexes on the exact same things. So, I've seen cases

  • where you get 40 indexes on the table and, you know, 10 of them are redundant. You know,

  • because over time teams of people work on it and no one wants to change other people's

  • stuff, you know. So, they just add their own indexes and with different names and all that

  • but, they're redundant. They're on the same field. So, you know, as time goes by make

  • sure you check, remove redundant indexes to get some write performance. Okay. Common Index

  • Problem here. Kind of a web 2.0 example. You got a--you got a tags table which has got tag

  • ID and a text for the tag and a products table. Both of these tables have an auto-incrementing

  • primary key and then you've got a relationship table that stores the many to many relationship

  • between products and tags. And if you noticed that the primary key on these Products2Tags

  • table is product ID--tag ID. Makes sense? Pretty basic? Okay. Here's two examples of

  • pretty common queries run on this Products2Tags table. Only the top one's going to use the

  • index on it. And the reason is because of the order of the columns in the index. On

  • the top one, you're grouping by the--what's essentially the product ID. And so, it's going

  • to be able to take that left side of the index and use it in order to count the right side

  • value which is the tag ID. Now in the bottom you're reversing that. You're saying for each

  • distinct tag give me the--give me the count of products. You can't use the index. You

  • can't use that primary key index. So, you won't use it, you get no read performance

  • from it. The remedy would be to create an index that has tag on the left side. Okay.
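The Products2Tags example being walked through might look roughly like this; the table and index names are reconstructions from the description, not the actual slide:

```sql
CREATE TABLE Products2Tags (
  product_id INT UNSIGNED NOT NULL,
  tag_id     INT UNSIGNED NOT NULL,
  PRIMARY KEY (product_id, tag_id)      -- product_id is the left side
) ENGINE=InnoDB;

-- Can use the primary key: grouping on the leftmost column.
SELECT product_id, COUNT(*) AS num_tags
FROM Products2Tags
GROUP BY product_id;

-- Cannot use it: tag_id is the right side of the index.
SELECT tag_id, COUNT(*) AS num_products
FROM Products2Tags
GROUP BY tag_id;

-- Remedy for InnoDB: the primary key is already appended to every
-- secondary index record, so indexing tag_id alone is enough.
CREATE INDEX ix_tag ON Products2Tags (tag_id);

-- Remedy for MyISAM: build the covering index explicitly.
-- CREATE INDEX ix_tag ON Products2Tags (tag_id, product_id);
```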

  • Now, there's two examples here where it says create index, the--the top example, create

  • index, ix_tag, on just the tag ID--for all you guys using InnoDB, that's exactly what

  • you'd want to do. From MyISAM, you'd want to do the bottom one to create a covering

  • index opportunity. Does anyone know why the top one should be used for InnoDB and not

  • the bottom one? What's the difference between the secondary indexes in InnoDB versus MyISAM?

  • >> Because it went back to the primary key. >> PIPES: Exactly. For it--it--because it's

  • a clustered index organization in InnoDB, every time you add an index on a field, that's

  • not the primary--every non-primary key index, the primary key is actually appended to every

  • single index record. So, if you've got a primary key product ID, tag ID and you add an index

  • on tag ID, product ID you're being redundant because you've already got the product ID

  • actually in the index record in InnoDB. So, it's important to understand what the difference

  • between the cluster organization and the MyISAM which is non-clustered where it's just a record

  • number that's attached to each index record instead of the clustering key. So, be aware

  • of those differences. So, I was talking about redundant indexes. That one on the bottom

  • would be redundant for InnoDB. Did the mic just change?

  • >> [INDISTINCT]. >> PIPES: Okay. Okay. Sorry. Schema Guidelines.

  • Inefficient schema, another great way to kill your application. Use the smallest data types

  • possible that you really need. If--if you don't have four billion rows in the table,

  • don't use big int. I mean, it's just the basics, you know. Keep them as trim as possible,

  • you know, if you have a maximum of a hundred values, don't use int. Use tiny int, you

  • can fit, you know, 255 unique values in it. Don't use char(200), you know, when a varchar

  • or you know, smaller char will do. Little things like this make a huge impact in performance.
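The data-type trimming advice, as a before/after sketch; the table and column names are made up for illustration:

```sql
-- Bloated: every column wider than the data requires.
CREATE TABLE accounts_bloated (
  id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  status INT NOT NULL,        -- only a handful of distinct values exist
  note   CHAR(200) NOT NULL   -- mostly short, variable-length text
);

-- Trimmed: smallest types that still fit the data.
CREATE TABLE accounts_trim (
  id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  status TINYINT UNSIGNED NOT NULL,  -- 0-255 covers ~100 values
  note   VARCHAR(200) NOT NULL
);
```

Slimmer rows mean more records fit per block, which is the same fewer-reads argument made for covering indexes earlier.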

  • And the