Cross-database query performance in Azure SQL Database

Problem

Indexes play an important role in improving performance of queries, by allowing SQL Server to scan a limited number of rows. When queries involve tables from the same database, SQL Server can choose the best indexes available. Can SQL Server take advantage of indexes when querying multiple databases?  In this tip, I’ll explore the performance of cross-database queries in Azure.

Solution

Provisioning test environment

For the purpose of this exercise, I’ll provision two Azure SQL databases on the same Azure logical server. To provision the first database I’ve downloaded and created a database from the sample Contoso data warehouse from here. Then I deployed the Contoso database to Azure using ‘Deploy Database to SQL Azure Wizard’, described in this tip. Finally, I created a copy of the Azure SQL database.  I named the Azure databases ContosoRetailDW and ContosoRetailDW2 and assigned them the ‘S1 Standard’ price tier.

Through this tip I’ll refer to the ContosoRetailDW database as the main database and ContosoRetailDW2 as the external database.

Using the below script, I’ve created an Extended Event to capture query text and execution plans from the external database:

CREATE EVENT SESSION [SqlText_ExecPlanExt] ON DATABASE 
ADD EVENT sqlserver.query_post_execution_showplan(
ACTION(sqlserver.database_name,sqlserver.sql_text,sqlserver.tsql_stack)
WHERE ([sqlserver].[like_i_sql_unicode_string]([sqlserver].[sql_text],N'%FactOnlineSales%'))),
ADD EVENT sqlserver.sql_statement_completed(
ACTION(sqlserver.database_name,sqlserver.sql_text,sqlserver.tsql_stack)
WHERE ([sqlserver].[like_i_sql_unicode_string]([sqlserver].[sql_text],N'%FactOnlineSales%')))
ADD TARGET package0.ring_buffer
WITH (STARTUP_STATE=OFF)
GO

ALTER EVENT SESSION [SqlText_ExecPlanExt] ON DATABASE STATE = START
GO

The output from this trace will be stored in memory (ring buffer) in XML format and I’ll use the following query to extract the query text and execution plans from the XML data:

DECLARE @extevents XML
SELECT @extevents = CAST([t].[target_data] AS XML)
FROM sys.[dm_xe_database_sessions] AS s
JOIN [sys].[dm_xe_database_session_targets] AS t
ON [t].[event_session_address] = [s].[address]
WHERE [s].[name] = N'SqlText_ExecPlanExt' AND
[t].[target_name] = N'ring_buffer' ;

SELECT DATEADD(hh, DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), n.value('(@timestamp)[1]', 'datetime2')) AS [timestamp],
n.value('(data[@name="duration"]/value)[1]', 'bigint') as duration,
n.value('(action[@name="sql_text"]/value)[1]', 'varchar(max)') as sql_text,
replace(replace(cast(n.query('(data[@name="showplan_xml"]/value/.)[1]') as varchar(max)),'<value>','<?xml version="1.0" encoding="utf-16"?>'),'</value>','') as exec_plan2
FROM @extevents.nodes('RingBufferTarget/event[@name=''query_post_execution_showplan'']') AS q(n)

Performance test exploring local queries

Let’s start with a simple query running locally within the main database:

--Q1
SELECT [StoreKey],[SalesAmount]
FROM [dbo].[FactSales] 
WHERE storeKey = 223

This query took around 9 seconds and if we check the execution plan, we’ll notice SQL has scanned the entire clustered index to get the results.

Now, let’s create an index including the required fields and re-run the query again:

CREATE INDEX IDX_FactOnlineSales ON [dbo].[FactOnlineSales] (StoreKey,[SalesAmount]) 

As expected, the new execution was much faster (1 ms) with the following plan:

Let’s make the query more complex by joining a table:

--Q2
SELECT F.[StoreKey],F.[SalesAmount],S.StoreDescription
FROM [dbo].[FactSales] F 
JOIN DimStore S ON F.StoreKey=S.StoreKey
WHERE F.StoreKey = 223

The new query is still fast (49 ms) and is using the index we created:

Finally, let’s try an aggregate query:

--Q3
SELECT S.[StoreKey], SUM(F.[SalesAmount]) 
FROM [dbo].[FactSales] F 
JOIN DimStore S ON F.StoreKey=S.StoreKey
GROUP BY S.StoreKey

It took 10906 ms and produced the following plan:

Elastic queries

It’s time to investigate how Azure SQL would behave when the query involves external tables. To refer to tables in other Azure SQL databases we need to use the Azure elastic queries feature. This feature allows you to split your data with vertical or horizontal partitions (sharding) (see more about it here).

I’ll focus on the vertical partitions feature, for the purpose of this exercise.

Let’s create a master key, credential, data source and external table using the below script inside main database:

CREATE MASTER KEY ;

CREATE DATABASE SCOPED credential ext_cred with identity ='fazizov' ,secret='Password201)';

CREATE EXTERNAL DATA SOURCE ExtDS
WITH (
TYPE=RDBMS,
LOCATION='fasqlsrv.database.windows.net',
DATABASE_NAME='ContosoRetailDW2',
CREDENTIAL= ext_cred);

CREATE EXTERNAL TABLE FactSales_Ext (
[SalesKey] [int] NOT NULL,
[DateKey] [datetime] NOT NULL,
[channelKey] [int] NOT NULL,
[StoreKey] [int] NOT NULL,
[ProductKey] [int] NOT NULL,
[PromotionKey] [int] NOT NULL,
[CurrencyKey] [int] NOT NULL,
[UnitCost] [money] NOT NULL,
[UnitPrice] [money] NOT NULL,
[SalesQuantity] [int] NOT NULL,
[ReturnQuantity] [int] NOT NULL,
[ReturnAmount] [money] NULL,
[DiscountQuantity] [int] NULL,
[DiscountAmount] [money] NULL,
[TotalCost] [money] NOT NULL,
[SalesAmount] [money] NOT NULL,
[ETLLoadID] [int] NULL,
[LoadDate] [datetime] NULL,
[UpdateDate] [datetime] NULL)
WITH
(
DATA_SOURCE = ExtDS,
SCHEMA_NAME='dbo',
OBJECT_NAME = N'FactSales'); 
GO

Please note, this external table points to the same FactSales table located in the external database.

Since we’ve already seen benefit of the index, let’s create the index IDX_FactOnlineSales inside the external database as well:

CREATE INDEX IDX_FactOnlineSales ON [dbo].[FactOnlineSales] (StoreKey,[SalesAmount]) 

Now, let’s query the external table using a query similar to the one we’ve used earlier (Q1) and execute it inside the main database:

SELECT [StoreKey],[SalesAmount]
FROM [dbo].[FactSales_Ext] 
WHERE storeKey = 223

Although we were are querying same table with the same index, it took much longer (74 ms), when called from the external database, which could be explained by overhead related to the bridge between the databases.

Examining the execution plan from main database doesn’t allow us to confirm whether or not the index has been used.  We can see this is shows as a Remote Query:

Let’s switch to external database and use the Extended Events related to the query from above to get the SQL queries and execution plans from the ring buffer.  Here’s the query text and execution plan that we find in the Extended Events:

SELECT [T1_1].[StoreKey] AS [StoreKey],[T1_1].[SalesAmount] AS [SalesAmount] 
FROM (SELECT [T2_1].[StoreKey] AS [StoreKey],[T2_1].[SalesAmount] AS [SalesAmount] 
FROM [dbo].[FactSales] AS T2_1 
WHERE ([T2_1].[StoreKey] = CAST ((223) AS INT))) AS T1_1

As you can see, the SQL has parameterized the filter value and used the index.

Now, let’s run a cross-database version of the join query we’ve ran earlier (Q2):

SELECT F.[StoreKey],F.[SalesAmount],S.StoreDescription
FROM [dbo].[FactSales_ext] F 
JOIN DimStore S ON F.StoreKey=S.StoreKey
WHERE S.storeKey =223

It took 81 ms with the following execution plan from the main and external databases accordingly:

Although our filtering parameter was linked to the main table, amazingly SQL was smart enough to push the parameter to the external table. Here is the query captured from external database:

 SELECT [T1_1].[StoreKey] AS [StoreKey],[T1_1].[SalesAmount] AS [SalesAmount] 
FROM (SELECT [T2_1].[StoreKey] AS [StoreKey],[T2_1].[SalesAmount] AS [SalesAmount] 
FROM [dbo].[FactSales] AS T2_1 WHERE ([T2_1].[StoreKey] = CAST ((223) AS INT))) AS T1_1

Encouraged by this result, I ran a cross-database version of the aggregate query from above (Q3):

 SELECT S.storeKey,SUM(F.[SalesAmount]) 
FROM [dbo].[FactSales_ext] F 
JOIN DimStore S ON F.StoreKey=S.StoreKey 
GROUP BY S.storeKey

The execution time was the same as its local equivalent and as you already guessed, it was also fast because SQL intelligently re-shaped the external database related part of it as follows:

 SELECT [T1_1].[StoreKey] AS [StoreKey],[T1_1].[col] AS [col] 
FROM (SELECT SUM([T2_1].[col]) AS [col],[T2_1].[StoreKey] AS [StoreKey] 
      FROM (SELECT SUM([T3_1].[SalesAmount]) AS [col], [T3_1].[StoreKey] AS [StoreKey] 
            FROM [dbo].[FactSales] AS T3_1 
            GROUP BY [T3_1].[StoreKey]) AS T2_1 
      GROUP BY [T2_1].[StoreKey]) AS T1_1

To confirm the index usage, let’s examine execution plan from external database:

Conclusion

Azure SQL Database can intelligently optimize performance of cross-database queries. However, care must be taken when allocating tables into different Azure databases, as in some cases overhead of merging data from different databases could be significant.

Next Steps

Last Update:
2018-04-25

About the author

Fikrat Azizov has been working with SQL Server since 2002. He’s currently working as Business Intelligence consultant at Avanade Canada.

View all my tips

Bir Cevap Yazın

E-posta hesabınız yayımlanmayacak. Gerekli alanlar * ile işaretlenmişlerdir

TOP