SQL Server Performance Optimisation
The scenario is probably familiar to everyone: on the test system the application worked perfectly and quickly, now it's rolled out and the users are not satisfied with the performance. In smaller projects there is no database administrator to take care of this problem, so developers have to take care of it themselves.
This blog is intended to show the software developer how the data access for an application can be made more efficient, with higher performance
Insight in brief
- The scenario is probably familiar to everyone: on the test system the application worked perfectly and quickly, now it's rolled out and the users are not satisfied with the performance.
- In smaller projects there is no database administrator to take care of this problem, so developers have to take care of it themselves.
- This blog is intended to show the software developer how the data access for an application can be made more efficient, with higher performance.
- The first step is to measure the performance. This includes, for example, the response times or the number of read accesses for individual database queries.
- In the second step a bottleneck must then be identified. It is important to select the particular bottleneck that, when eliminated, will bring most benefit to the user. It will be of little use to them if the monthly report can be produced twice as fast as before, but they are still slowed down in their daily work.
- This specific bottleneck can now be eliminated.
- The most important step in the whole process is to verify the optimisation. The same measurements can now be made again and compared with the results from step 1. It is important at this stage not to limit oneself to the point that has been optimised, but to consider the whole application. We would have missed the target if only one part of the application runs faster, but the rest has become even slower.
What are the factors in lost performance?
The question now is where does performance go missing. There are four layers that we can consider and examine individually:
- At the bottom we have the hardware. This includes, above all, the disks on which our data is ultimately stored. But the performance is also influenced by the machine on which our SQL Server instance is running.
- The physical database structure includes the tables and stored procedures as they exist in the database.
- The logical database structure deals with the entities as they occur in the application.
- On the top layer we have the application and its database access.
- The integrated performance counters in Windows can also give us an indication of a bottleneck. Especially interesting are the two counters Avg. Disk Queue Length and SQL Server: Wait Statistics. These show how heavily the hard disks are being used (especially those disks with the data files) and what the SQL Server is having to wait for.
Using the values that have been measured, we can start looking for the cause. The simplest explanation is that the hardware is just too slow. Although it sounds like a lame excuse, it can and does happen that the hardware is no longer able to cope with the volumes of data. Another reason is that the SQL Server has to compete with other server applications for resources. Virus scanners, web servers and application servers can deprive the database server of valuable resources if they are all running on the same hardware. The same applies to the databases of other applications if they all run on the same SQL Server instance. All these databases are competing for the Server's resources. The hard disk can be another bottleneck, although this point is no longer so important since the introduction of SSDs in the server area. If all data files, log files, the TempDB and the SystemDB are on the same disk, the positioning of the read head becomes the bottleneck. This situation is made even worse if the files are heavily fragmented on the disc.
We have now seen various causes. The next step is to remedy them:
- Ideally, of course, our application database should have its own dedicated server. All services that are not required are switched off, and the SQL Server has all the resources at its sole disposal.
- If there are several disks, it makes sense to place the data and the log files on different disks. This also applies to the System DB and the TempDB. These should be stored on a different disk from the application data. You should also make sure that the files are not fragmented.
- Another way to reduce the load on the hardware is to store the data in compressed form. The SQL Server supports compressed storage of tables and indexes. It means that less data has to be read, but the CPU load increases.
With large tables, it may also make sense to partition them and store them on multiple disks. This allows the data to be loaded in parallel.
After a bottleneck has been eliminated, the last step in the optimisation process must not be forgotten, namely the validation of the hopefully now better performance.
The execution plan can also be displayed for a query to an SQL Azure database. The query can be executed in the Management Portal and the execution plan is displayed immediately.
If Full Table Scans appear in the execution plan, this is an indication that an index could possibly improve performance. The SQL Server Management Studio even displays this as a tip. SQL Server stores information about indexes that it could have used for queries if they had been there. We can also retrieve this information directly from the table sys.dm_db_missing_index_group_stats. In the same table, the SQL Server also makes suggestions for "included columns". These are fields of a table which do not belong to the actual index but are stored in addition. Thus, when these fields are accessed using the index, the data can be returned directly. It is not only missing indexes that can have a negative impact on performance; there may also be too many indexes. The SQL Server has to update them every time data changes, but may never have used them for a query. The table sys.dm_db_index_usage_stats stores the usage statistics for the indexes. If the number of updates is greater than the number of accesses, it may be possible to delete the index. However, the entire workload must always be considered. Perhaps this is the very index that will be used for the monthly report.
Much as with a hard disk, indexes can also be fragmented. This means that (too) many datapages have to be read when accessing the data, which understandably has a negative effect on performance. The fragmentation level can be queried with the function sys.dm_db_index_physical_stats. If this value is greater than 30%, the index should be rebuilt (ALTER INDEX REBUILD). If it is between 5% and 30%, reorganising it is enough (ALTER INDEX REORGANIZE). Values smaller than 5% should not be considered for action, because the effort (reorganising requires resources and puts additional load on our server) is greater than the benefit.
One reason for poor execution plans can be outdated statistics. The Query Optimizer uses the statistics to decide how to access the tables and how they should be linked. If the statistics are not updated, the optimizer may choose the wrong strategy. The statistics should therefore be updated regularly. This can be done manually using the UPDATE STATISTICS command. The SQL Server can also do this automatically, however. This is set in the options for each individual database.
Microsoft provides the "Database Engine Tuning Advisor" in addition to the tools already mentioned. This can be found in the SQ Server Management Studio under the menu item Tools. This tool allows you to specify the database that is to be analysed for a specific workload.
After the analysis is complete, suggestions are displayed, including SQL statements. These suggestions should not simply be implemented blindly at this point. Depending on the selected workload, they may well improve performance, but only within a certain range.
An Extended Event Session displays many interesting features. All queries and statements that are sent to the server are visible here. This gives a good insight into which queries are executed how often, how many resources (CPU, Reads, Writes) they require and how long the execution time is.
The Query Store has also been available since SQL Server Version 2016. If this is activated, all queries are saved with their associated runtime statistics and can be analysed at a later time.
So what can be the cause of poor performance? One cause can be that the data normalisation is too "rigid". Although data should not be stored redundantly in a relational database, there are situations in which the targeted insertion of redundancies can greatly improve the execution speed of a query. However, if the main activity is the execution of modification operations (as opposed to Select Statements), the whole thing can also tilt in the other direction. The redundant data must then always be updated, which in turn requires resources.
SQL Server is not always to blame for poor performance. Sometimes it is the application itself that brings the SQL Server to its knees. Two tools that we already know are used to detect errors or malfunctions in the application. One is the SQL Server Profiler, the other the Performance Counters.
With the performance counters, this time we are interested in the number of user connections and the number of transactions (found under SQL Server: General Statistics). If the number of connections increases with every click in the application and does not decrease again until the application is closed, this indicates incorrect or completely absent connection management in the application. The connections are opened, but are not closed afterwards. The same applies to transactions. If the number of open transactions only increases, proper transaction management in the application has been omitted. The only remedy in this case is a review of the connection- and transaction-management.
The SQL Server Profiler gives us information about the queries that the application makes on the database. The focus here is on certain patterns of queries that occur repeatedly. If, for example, the same master data (e.g. countries) is queried repeatedly, this indicates that a cache does not exist in the application. In such a case of data that is used over and over again (and which changes very rarely), it is of course helpful to cache it in the application and not to reload it every time. This makes all the more sense the further away (e.g. in the cloud) the database is located. If an ORM (Object-Relation Mapper) such as Entity Framework is used in the application, special attention must be paid to the 1+n problem. For each record (e.g. person), the associated child elements (the telephone numbers of the person) are queried individually, although this would be possible with a JOIN in one single statement. Many ORM's offer a remedy here by making it possible, when querying, to specify which objects are to be additionally loaded. With Entity Framework, this is possible with the Include() method.
Besides all the patterns, the SQL Server Profiler also identifies all queries that are not optimal or even point to errors in the application. If, for example, the data is all transferred to the client and then aggregated there, the situation is certainly not optimal. In this case it helps to go through the individual queries. Queries that take a very long time to run or are invoked very frequently can then be optimised in a targeted manner.
Tips and tricks
Finally, a few tips and tricks that you should pay attention to when optimising:
- Never optimise without measuring. As described in the optimisation process, the first step is always to measure performance. Only then can the optimisation and modification of the system be started.
- Carry out all measurements on the live system. It is very difficult to simulate exactly the same server loading and workload on a test system as on the live system. In addition, it is usually very complicated to generate the same volume of data as on the productive system.
- Change only one thing at a time. If 3 changes are made at once, this might be two steps forward and one back. It is very difficult to find out which change caused an improvement, and which one did not.
- Always look at the system as a whole. The purpose of performance optimisation is to ensure that the system responds more quickly to user interaction and that the user can work better. If only individual parts of the system are considered, then they may well become faster, but that is far from meaning that the user experiences any benefit.
Thomas Lips is Principal Consultant at Zühlke Engineering AG in Zürich. Over the last years he has worked in several .Net projects as a software engineer, software architect and consultant. His areas of expertise are databases and the development of persistence- and business layer using .NET technologies.