Post-it notes: Chapter 8 – ETL Techniques

April 8, 2013

Flirting with the idea of taking exam [70-432: Microsoft SQL Server 2008, Implementation and Maintenance] I created loads of post-it notes based on the book: “The Real MCTS SQL Server 2008 Exam 70-432 Prep kit”. I’ve decided to re-visit these notes (before the stickiness wears off and I lose them) and transfer them onto this blog. This will ultimately be a great knowledge refresher and quick reference guide for me. Hopefully you’ll find them useful too. Thanks for reading.

bcp

BCP

  • CMD line (runs outside SQL process)
  • Limited to SQL and Flat files
  • Case sensitive options
  • Use double quotes when export using query
  • No field terminator specified – it will use tab
  • No row terminator specified – it will us line feed (\n)
  • (-F) format files (-X) xml format
  • (-b) batch size, run in parallel, distribute load
  • SQL Server > BCP > Text file and vice-versa

BCP Options (case sensitive)

  • Query  (instead of table)
  • -N (Unicode)
  • -n (native)
  • -c (Character data type)
  • -t (field terminator)
  • -r (row terminator)
  • -f (format file) –x (xml)
  • -F (first row skipped) –L (last)
  • -b (batch size)
  • Native format best for in between SQL Servers
    • Regular native (-n)
    • Unicode native (-N) – best likelihood of transferring all chars in nchar and nvarchar

Bulk Insert

  • T-SQL
  • Only import data
  • No export (like BCP)
  • BULK INSERT (db..table) FROM (datafile) WITH
  • BULK INSERT and BCP work with FLAT FILES only.

Bulk Insert Options

  • ‘char’ – character format
  • ‘native’ – native format (performance, better than char)
  • ‘ widechar’ – Unicode
  • ‘widenative’ – native Unicode format (performance, better than widechar)

Recovery Model + Bulk Operations

  • Full – everything is logged, slower, but recover in full
  • Bulk logged
    • Logs normal things in full, bulk operations minimally
    • Instead of logging details of each row, it records which extents and pages where updated
    • Less I/O = faster
    • Downside – log file does not contain all the information
    • When you back up the log, the changes pages/extents are backed up
    • Log file can remain small, but the backup file could be big
    • You should perform a backup after bulk insert
    • If you had full recovery model, change to bulk, do the import, then change back to full and then backup the log file
    • Minimul recovery requires the following:
      • Not replicated
      • Tablock placed
      • No indices (unless empty)

Optimize Bulk load

  • Multiple batches = each batch = 1 tran
    • Only fails that particular batch
    • Can have multiple clients doing the processing
    • Single batch is default, entire load is success or failure, cannot clear log file until complete
    • Use bulk-logged model
    • Pre-sort to match key

Ad-hoc distributed queries

  • Allows clients to write queries that access external data
  • Enable advanced options ‘ad-hoc distributed queries’, 1
  • Each provider (allow or dissalow)
  • Option is a negative property (tick to disable)
  • Sys_admins can still use even if disabled

OPEN ROWSET()

  • Access data across a data source
  • Can be used for bulk load (must use format file)
  • Linked server not required
  • Returns a table e.g. select * from openrowset(…)
  • Can be used with INSERT, UPDATE, DELETE

OPEN ROWSET(Bulk,…)

  • Must use a format file
  • Reads entire data file as single value
    • SINGLE_BLOG – large binary
    • SINGLE_CLOG – character value
    • SINGLE_NCLOB – Unicode value
    • Reads data as a single column

Differences

  • OPENROWSET – use in place of a table – can pass query
  • OPENDATASOURCE – use in place of instance name/FQ Name
  • OPENQUERY – Pass through query like OPENROWSET but used linked servers!

Linked Servers

  • Sites on top of OLEDB
  • Can specify a connection user
  • Replace instance with (linkedserver.database.schema.table)
    • Sp_addlinkedserver (add the server)
    • Sp_serveroption (configuration)
    • Sp_addlinkedserverLogin (security
    • Data Access Option – true – allows select/insert/update/delete
    • RPC in/out/ – true – allows use of stored procedres
    • Master.sys.server – list of linked servers

Distributed Transactions

  • Changes to remote and local system done as a transaction
  • Select/insert/update/delete on remote servers
  • BEGIN DISTRIBUTED TRANSACTION
  • Uses windows MSDTC
    • Transaction manager manages the across resource manager
    • Resource managers are the servers involved
    • 2 phase commit
      • Prepare – locks and verify that it will work
      • Commit – DTC instructs to commit otherwise rollback

SSIS

  • Extract, transform and load
  • SSIS is an xml file with .dtsx extension
  • Use BIDS to create
  • Package configuration  
    • XML
    • SQL Table
    • Variable
    • Registry
  • Create SSIS using
    • Import/export wizard
    • Copy database wizard
    • BIDS
    • *maintenance plan
  • SSIS Service
    • One per server even if multiple instances
    • Manages storage, security on package
    • SSMS connect to IS, only uses Win authentication
  • Storage
    • Msdb (sysssispackages)
    • File system (…\100\dts\packages\)
  • Deploying SSIS
    • Deployment wizard – creates manifest file
    • Object explorer – import/export wizard
    • Dtutil – cmd untility
  • MSDB roles
    • Db_ssisadmin
    • Db_ssisltduser
    • Db_ssisoperator
  • Execute SSIS
    • DTExec
    • DTExecUI
    • Object Explorer
    • SQL Agent
    • Programmatically
  • SSIS logging
    • Windows Event log
    • Text files
    • Xml files
    • SQL Tables
    • SQL profiler trace

Copy Database Wizard

  • Only works between SQL Servers
  • Copies entire database
  • Can use for transfer 2000 to 2008
  • Method 1 – Detach/Attach
    • Efficient
    • Source must go offline
    • SQL agent on destination must be able to see mdf/ldf etc.
    • Method2 – SQL Management Object
      • Slower
      • Creates objects then populates
      • Agent service does not require access like method 1
      • Copy wizard also prompts for following:
        • Logins
        • User defined error msgs
        • SP’s in master
        • SQL Server agent definition
        • End points

 Import Export Wizard

  • SQL Server to any format
  • Any format to SQL Server

 

Thanks for reading.

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: