Thursday, September 11, 2014

Experimenting with "Bulk Insert" Fast load

Tried to use scientific approach in measuring data load speed using "Bulk Insert" command.

Experiment Goal:
Load text data file into SQL Server table as fast as possibly using "Bulk Insert" command with different parameters.
Experiment does not involve using indexes or any ETL activities - just simple load of one file to one table.

Experiment Environment:
SQL Server 2012 Enterprise

Experiment Preparations:
Generated 1.3 Gb file with 10 million records from sys.messages table.

Experiment Scenarios:
Variations of SQL Database transactional mode: Full, Bulk_Logged, Simple.
Variations of "BULK INSERT" command parameters: TABLOCK, ROWS_PER_BATCH, BATCHSIZE;
Using Pre-Growth for Data and Log files.

Used Script:

Experiment Metrics/Results:

Metrics explanation:
1. ## - Experiment number. Some experiments were omitted (blank lines).
2. Mode - SQL Database transactional mode (Full, Bulk_Logged, Simple).
3. "BULK INSERT" command parameters (TABLOCK, ROWS_PER_BATCH, BATCHSIZE). For "BATCHSIZE" were used following batch sizes: 1 million records, 0.1 million and 5 million.
4. Database Pre-Growth - Data and log files were pre-grown to following values, depending on expected results: 2500Mb for Data file and 6200MB, 2600MB and 100MB for Log file.
5. Experiment Timing in seconds - measured Full time of the experiment, including DB creation, Back Up file Pre-Growth, Data Load and drop of the database. Also measured time for Data Loading process itself.
6. File size growth - The final Data and Log files sizes. In some cases Log file growth indicated amount of space needed to perform the load.

Interpreting Results:
For all three transactional modes the best performance was achieved with use of "TABLOCK" option and pre-growing Data and Log files (see yellow lines).

Expected conclusions:
1. "TABLOCK" option behaved as expected and dramatically improved loading performance.
2. Pre-growing of Data and Log Files also significantly improved performance. That is another point against "Autoshrink" database option.

Unexpected conclusions:
1. "BULK INSERT" in FULL transactional mode can be as fast as in BULK_LOGGED and SIMPLE modes: 24 sec vs 22 sec. (See lines #6 and #16 & #26). That might be explained only by the fact that Data and Log files were created on different physical drives and writes into both files were dome in parralel.
2. Even though use of "ROWS_PER_BATCH" and "BATCHSIZE" parameters showed little performance improvement it was not significant comparing to "TABLOCK" option. Most probably that server just had enough memory to load whole file without necessity divide it in chunks.

Please ask any questions if you think experiment was not held properly or result interpretation is not clear.

1 comment: