Thursday, February 28, 2019

Parallel Execution of multiple queries in SQL Server

Parallel Execution with Logging =>


Sometimes it is necessary for a process to execute several queries in parallel. There are a few ways to do it. Let's look at them.

1. Using parallel tasks in an SSIS package. That is the most appropriate way to execute parallel queries: you can have the queries hard coded, and you get full execution logging and error handling. This method requires SSIS knowledge and a Visual Studio installation. It fits best for processes that already use SSIS, where parallelism comes naturally;

2. Creating an extended DLL function to run queries on the server. This method requires some programming skills in VB, C#, C++, or another language that can produce a DLL. It might be too complex for an individual to develop proper error handling, and it can cause unexpected problems during application migration;

3. Using SQL Agent. That is the easiest, but least secure, way. First, it requires SQL Agent to be available and running. Second, it uses dynamic SQL for its executions, which might be pretty dangerous.
However, because it is the easiest way, I'll demonstrate how you can use it.

Here is a stored procedure that does the trick:
DROP PROC IF EXISTS sp_Parallel_Execution;
GO
CREATE PROC sp_Parallel_Execution
@SQLCommand NVARCHAR(4000),
@QueueLen INT = 100
AS
SET NOCOUNT ON
DECLARE @DBName sysname = DB_NAME();
DECLARE @SQL NVARCHAR(4000);

/* Check the current SQL Agent queue length */
IF @QueueLen < ( 
       SELECT COUNT(a.queued_date)
       FROM msdb.dbo.sysjobs as j WITH (NOLOCK)
       INNER JOIN msdb.dbo.sysjobactivity as a WITH (NOLOCK)
       ON j.job_id = a.job_id
       WHERE j.name like 'Parallel_Execution_Command_%'
              and a.queued_date is NOT null
)
BEGIN
  RAISERROR('Can''t execute a query. SQL Agent queue length is bigger than current threshold %d.',16,1, @QueueLen);
  RETURN -1;
END

/* Generate a new job name: prefix + GUID + hash of the command.
   Note: sysname holds at most 128 characters, so the hash tail may be truncated;
   the GUID alone already guarantees uniqueness. */
DECLARE @CommandName sysname = 'Parallel_Execution_Command_'
       + REPLACE(CAST(NEWID() as NVARCHAR(36)),'-','_')
       + '_' + CONVERT(NVARCHAR(150), HASHBYTES ('SHA2_512', @SQLCommand), 1);
EXEC msdb.dbo.sp_add_job @job_name = @CommandName;

/* Generate the first job step, which executes the submitted code */
SET @SQL = 'DECLARE @SQL NVARCHAR(4000);' + CHAR(10)
       + 'SET @SQL = ''' + REPLACE(@SQLCommand,'''','''''') + ''';' + CHAR(10)
       + 'EXEC sp_executesql @SQL;'
EXEC msdb.dbo.sp_add_jobstep @job_name = @CommandName, @step_name = N'Parallel_Execution_Command_1', @subsystem = N'TSQL', 
       @database_name = @DBName, @command = @SQL, @on_success_action = 3, @on_fail_action = 3;

/* Generate the second job step, which deletes the job itself */
SET @SQL = 'EXEC msdb.dbo.sp_delete_job  @job_name = ''' + @CommandName + ''';';
EXEC msdb.dbo.sp_add_jobstep @job_name = @CommandName, @step_name = N'Parallel_Execution_Command_2', @subsystem = N'TSQL', 
       @database_name = @DBName, @command = @SQL;

/* Add the job to the server and start it */
EXEC msdb.dbo.sp_add_jobserver @job_name = @CommandName;
EXEC msdb.dbo.sp_start_job @job_name = @CommandName;
RETURN 0;
GO

Now we can test it. Run the following code to see how it works:
CREATE TABLE ##Test_Parallel_Execution(F VARCHAR(20));
GO
DECLARE @SQL NVARCHAR(4000);
DECLARE @i INT = 30;            -- number of parallel jobs to submit
DECLARE @DelaySeconds INT = 30; -- how long each job keeps inserting rows, in seconds

WHILE @i > 0
BEGIN
       /* Each job inserts one row per second, counting down from @DelaySeconds */
       SET @SQL = '
       DECLARE @i INT = ' + CAST(@DelaySeconds as VARCHAR) + '
       WHILE @i > 0
       BEGIN
         INSERT INTO ##Test_Parallel_Execution(F)
         SELECT ''A' + CAST(@i as VARCHAR) + '-'' + CAST(@i as VARCHAR);
         WAITFOR DELAY ''00:00:01'';
         SET @i -= 1;
       END'
  EXEC sp_Parallel_Execution @SQL;
  SET @i -= 1;
END


SELECT * FROM ##Test_Parallel_Execution;

If you monitor the content of the "##Test_Parallel_Execution" table, you'll see that for 30 seconds it grows by 30 records each second. Just do not forget to drop the temporary table after the test.
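If you want to watch that growth from another session, here is a minimal polling sketch; it assumes the test above is still running and the global temporary table still exists:

/* Poll the row count once per second for 10 seconds */
DECLARE @n INT = 10;
WHILE @n > 0
BEGIN
       SELECT COUNT(*) AS RowsSoFar FROM ##Test_Parallel_Execution;
       WAITFOR DELAY '00:00:01';
       SET @n -= 1;
END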
If you take a look at the list of your SQL Agent jobs in SSMS during the test, you will see a set of jobs named "Parallel_Execution_Command_" followed by a GUID and a hash.
After all jobs have executed, they are supposed to disappear from the list.
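You can check the same list with a query instead of SSMS; a simple sketch against the msdb catalog:

/* List the temporary execution jobs created by the procedure */
SELECT j.name, j.date_created
FROM msdb.dbo.sysjobs AS j WITH (NOLOCK)
WHERE j.name LIKE 'Parallel_Execution_Command_%'
ORDER BY j.date_created;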

Here is an explanation of how the "sp_Parallel_Execution" procedure works:
1. To run any query in parallel, you just pass it as a parameter to the procedure (see the example after this list);
2. Running too many parallel queries via SQL Agent can be dangerous. If you create a million jobs for execution, it will bloat the SQL Server "msdb" database even if your queries do nothing. To avoid that, I've added the "@QueueLen" parameter, which is set to 100 by default. If the SQL Agent queue reaches 100 jobs, the next requested execution raises an error. You can adjust that parameter for your environment.
3. After checking the queue, the stored procedure generates a new job name. It contains the wording "Parallel_Execution_Command_" plus a hash value of your query in addition to a uniqueidentifier, which is supposed to make the new execution job totally unique.
4. Your query is added as the first step of the new job. It can be any dynamic query against the current database or a call to a local stored procedure.
5. The second step of the job deletes the job itself. This prevents accumulation of executed jobs in SQL Agent.
6. At the end, the procedure starts the newly created job and finishes.
7. The job is supposed to delete itself right after executing the submitted query, without any check of whether the execution was successful.
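For illustration, a call can be as simple as the following; the procedure "dbo.Some_Local_Procedure" is just a hypothetical placeholder for your own code:

/* Fire-and-forget execution of a single command */
EXEC sp_Parallel_Execution @SQLCommand = N'EXEC dbo.Some_Local_Procedure;';

/* The same call with a stricter queue threshold */
EXEC sp_Parallel_Execution @SQLCommand = N'EXEC dbo.Some_Local_Procedure;', @QueueLen = 10;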

Disclosure:

1. The stored procedure is provided "as is" and used at your own risk. Do not run it in production before comprehensive testing in a Dev environment.
2. This method is very insecure. Whoever has the rights to run the procedure can damage your whole SQL Server.
3. Because the SQL Agent job is self-deleting, you won't see any of its execution results. That means you have to do your own error handling. The preferred way is to wrap all executed queries inside hard-coded stored procedures with their own error handling (see the sketch after this list). Later on, I'll come up with a more comprehensive version of this procedure that provides error handling and an execution check.
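As a rough illustration of that wrapping idea, here is a minimal sketch; the names "dbo.Parallel_Execution_Errors" and "dbo.Wrapped_Work" are hypothetical, and the workload is just a placeholder:

/* Hypothetical error log table */
CREATE TABLE dbo.Parallel_Execution_Errors(
       ErrorTime DATETIME NOT NULL DEFAULT GETDATE(),
       ErrorNumber INT,
       ErrorMessage NVARCHAR(4000));
GO
/* Hypothetical wrapper: runs the workload inside TRY/CATCH and logs failures */
CREATE PROC dbo.Wrapped_Work
AS
BEGIN TRY
       /* ... the real workload goes here ... */
       WAITFOR DELAY '00:00:01';
END TRY
BEGIN CATCH
       INSERT INTO dbo.Parallel_Execution_Errors(ErrorNumber, ErrorMessage)
       SELECT ERROR_NUMBER(), ERROR_MESSAGE();
END CATCH
GO
/* Submit the wrapper instead of raw dynamic SQL */
EXEC sp_Parallel_Execution @SQLCommand = N'EXEC dbo.Wrapped_Work;';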

Parallel Execution with Logging =>

7 comments:

  1. It is an interesting approach, thank you.
    I like to use Service Broker for async/parallel stored procedure execution.
    It would be interesting to read an article about it too.

    Replies
    1. Thanks for the comment. I did not mention Service Broker because I have little experience with it, but I'll take a look at it to compare those methods.

    2. I happen to have written a script and a blog post about this exact thing a few years ago: https://eitanblumin.com/2018/10/31/advanced-service-broker-sample-multi-threading/

  2. I have used the same concept many times over the years when I wanted to do asynchronous SQL. I can recommend the concept!

  3. We use this concept for many things, like setting up Availability Groups, where tasks are performed on multiple servers, or heavy data manipulation, where we can multi-thread the process to improve performance. But in our implementation we configure the job to delete itself on success, and not if an error occurs, so the problems can be addressed. We do that with the "Automatically delete job" setting under the notifications configuration of the job.

    Replies
    1. Good catch! Auto-delete might be more appropriate for that purpose.
      I understand that you delete only on success because you need to collect error information in case of failure.
      I plan to modify my procedure to include error checking and logging. That should solve the issue.
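      For reference, the same behavior can be scripted: "sp_add_job" accepts a "@delete_level" parameter, where 1 means delete on success only. A sketch, reusing the job-creation call from the procedure above:

      /* Let SQL Agent delete the job on success only, instead of a self-deleting step */
      EXEC msdb.dbo.sp_add_job @job_name = @CommandName, @delete_level = 1;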
