Issue
I've read about the ORDER BY RAND() and its performance problems -- do these only apply to queries that return large datasets? For example, if I have a table with 100,000 rows and return a dataset with 10 records using a WHERE clause and then use ORDER BY RAND() LIMIT 1, will this ORDER BY RAND() be applied AFTER my table has been filtered down to records matching the WHERE clause, and thus have negligible performance issues?
Solution
You're right, it will apply the ORDER BY after reducing the number of rows with WHERE, GROUP BY, and HAVING. But it will apply ORDER BY before LIMIT.
So if you filter the number of rows down sufficiently, then yes, the ORDER BY RAND() may achieve what you want without a great performance impact. There's a legitimate benefit to code that is simple and easily readable.
The trouble comes when you think your query should reduce the rows to something small, but over time as your data grows, the number of rows it needs to sort becomes large again. Since your query then does LIMIT 10 on the sorted result hides the fact that you're performing ORDER BY RAND() on 500k rows. You just see performance mysteriously getting worse.
I have written about alternative methods for picking a random row in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming, or in other answers here on Stack Overflow:
- Selecting random rows with MySQL
- randomizing large dataset
- quick selection of a random row from a large table in mysql
Answered By - Bill Karwin Answer Checked By - Dawn Plyler (PHPFixing Volunteer)
0 Comments:
Post a Comment
Note: Only a member of this blog may post a comment.