CI Artifact Creation For SQL Server

This article is part of my SQL Server CI/CD Series. In it I will be talking about creating artifacts related to SQL Server in the CI part of the build chain.

What is an artifact?

First, let's define an artifact. Within the narrow scope of Continuous Integration (CI), an artifact is anything created during a build that is retained for download. Typically this is a zip file or other package of the compiled application and an HTML rendering of test reports.

Artifacts we will produce

We are going to produce the following artifacts:

  • A backup of the database taken after all tests are run
  • A dacpac of the schema
  • Output of some queries we run through SQLCMD

We could produce more artifacts, like SQL unit test reports, database diagrams, or even a script of the entire database. We will do that in a future article. For now, we will stick to these three artifacts.

Why do we want to produce these artifacts?

Just because you can doesn't mean you should. You don't want to create artifacts unless they will be used. So what is the point of producing each of these artifacts?

Database Backup

In a CI process that involves SQL Server, you're probably going to create a database. You are then going to modify it as part of your testing, and you will end up with a "gently used" database. You will likely have all your schema set up, any "static" data, and some test data. You'll have some statistics and fragmentation as well. You might want to restore the backup to your laptop and inspect all this, perhaps manually verifying that the assertions in your tests cover what you think they should.

You also might want a copy of the database that someone can restore locally to run their code or reports against.

Finally, if any of your CD pipelines are designed to create a fresh database, you're going to need this artifact in that CD pipeline. As a matter of fact, you might end up creating two backups: one at the point where your database is in a clean "initial state," and another after all your automated testing has run.

Dacpac

I hold the strong opinion that the best way to deploy schema changes to a database is a dacpac. You're welcome to disagree with me, and like everything else in SQL Server, the answer is "it depends." However, my advice in these articles will always rest on the assumption that the dacpac is the gold standard for automated deployments of schema changes.

That being said, if you prefer to use scripts or another tool to manage schema changes, then you are wasting storage and build server CPU creating a dacpac.
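
For the curious, consuming that dacpac downstream is usually a single sqlpackage.exe call. Here is a minimal sketch with placeholder server and database names; nothing in this article's build actually runs it, and you would add the authentication options your environment needs.

REM Placeholder target server and database; this is a CD-side sketch, not a build step.
sqlpackage.exe /Action:Publish /SourceFile:"AlwaysEncryptedSample.dacpac" /TargetServerName:"myserver.example.com" /TargetDatabaseName:"AlwaysEncryptedSample"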

Sqlcmd output

Sometimes you just want to generate a quick and dirty report in T-SQL to look at. Maybe there is information you want to be aware of but don't necessarily want to fail a build over, like index fragmentation after the tests are run. Perhaps someone just wants a report to look at after a test run.

Regardless, sometimes sqlcmd -i inputfile.sql -o outputfile.txt is good enough to get you home.
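
For example, the fragmentation check I have in mind looks roughly like the sketch below. This isn't the exact script from my build, just an illustration of the kind of input file you might feed sqlcmd:

-- Rough fragmentation report; adjust the mode and the 10% filter to taste.
SELECT
        OBJECT_NAME(ips.object_id)       AS [Table],
        i.name                           AS [Index],
        ips.avg_fragmentation_in_percent
    FROM
        sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
        INNER JOIN sys.indexes i
            ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    WHERE
        ips.avg_fragmentation_in_percent > 10
    ORDER BY
        ips.avg_fragmentation_in_percent DESC;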

How do we create these artifacts in AppVeyor?

In our introduction I stated we would use AppVeyor as our CI tool, because it's free for open source projects, and because its simplicity makes it easy to show explicitly how we accomplish tasks. The steps below are all segments of appveyor.yml.

First of all, we need to start SQL Server. We use the services section of appveyor.yml to specify which services to start. I happen to use SQL Server 2016 because that's the version in which Always Encrypted was introduced.

services:
- mssql2016

Now, every step we use to get the database to its final state is a subject for another post. I'll simplify it to this:

  1. We use environment variables to configure the connection information.
  2. We create the database with sqlcmd.
  3. Other stuff happens to the database.

The relevant environment variables in our case are:

environment:
  SQL_SERVER_INSTANCE: (local)\SQL2016
  SQL_SERVER_USER: AlwaysEncryptedOwner
  SQL_SERVER_PASSWORD: 7aO!z@xUu!4r6EvD#D&l$sz6&h^rhxL6fzAHMpnOga@LO*WdsEdpfh4^Egtl
  SQL_SERVER_DATABASE: AlwaysEncryptedSample
  SQL_SERVER_BACKUP_FILE: $(SQL_SERVER_DATABASE).bak
  SQL_SERVER_DACPAC: $(SQL_SERVER_DATABASE).dacpac
  SQL_SERVER_VERIFICATION_LOG: $(SQL_SERVER_DATABASE).verification.log

Our creation script is run here:

before_build:
- cmd: sqlcmd -S "%SQL_SERVER_INSTANCE%" -i .\appveyor\init.sql

The script itself is:

CREATE DATABASE [$(SQL_SERVER_DATABASE)];
GO
CREATE LOGIN $(SQL_SERVER_USER) WITH PASSWORD = '$(SQL_SERVER_PASSWORD)';
GO
-- Needed for dacpac creation
-- GRANT VIEW ANY DEFINITION TO $(SQL_SERVER_USER);
-- GO
USE [$(SQL_SERVER_DATABASE)];
GO
CREATE USER $(SQL_SERVER_USER);
ALTER ROLE db_owner ADD MEMBER $(SQL_SERVER_USER);
GO

It's worth noting that all SQL Server services on AppVeyor build machines use mixed mode authentication, and that the user executing AppVeyor steps is a sysadmin on all SQL Server instances.

Creating the SQL login/user is superfluous here, and a step I might remove from that script.

The other stuff happening in the database is the part I'm going to brush over. We now have an empty database that we can perform artifact-creating operations on.

Creating the Artifacts

As of the time I wrote this article, all my SQL artifacts are created in after_build, which runs before the unit tests. None of my tests or any subsequent steps currently modify the database, so these artifacts operate on a "fresh" database. Moving them to after_test would give me artifacts that include the test result data, if any (a sketch of that follows the after_build block below).

after_build:
- cmd: sqlcmd -S "%SQL_SERVER_INSTANCE%" -d %SQL_SERVER_DATABASE% -W -i .\appveyor\schema_verification.sql -o %SQL_SERVER_VERIFICATION_LOG%
- cmd: sqlcmd -S "%SQL_SERVER_INSTANCE%" -Q "BACKUP DATABASE [$(SQL_SERVER_DATABASE)] TO DISK='$(SQL_SERVER_BACKUP_FILE)' WITH FORMAT, COMPRESSION, STATS=10;"
- ps: Move-Item -Path (Join-Path $env:SQL_SERVER_BACKUP_DIRECTORY "$($env:SQL_SERVER_BACKUP_FILE)") -Destination $env:APPVEYOR_BUILD_FOLDER
- cmd: sqlpackage.exe /Action:Extract /TargetFile:"%SQL_SERVER_DACPAC%" /SourceServerName:"%SQL_SERVER_INSTANCE%" /SourceDatabaseName:%SQL_SERVER_DATABASE%
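
For completeness, if I did want the post-test state captured, the backup step could simply be repeated under after_test. Here is a sketch lifted straight from the commands above, with a hypothetical file name so it doesn't clobber the first backup; I don't currently do this, and you would also add a matching entry to the artifacts section.

after_test:
- cmd: sqlcmd -S "%SQL_SERVER_INSTANCE%" -Q "BACKUP DATABASE [$(SQL_SERVER_DATABASE)] TO DISK='$(SQL_SERVER_DATABASE).posttest.bak' WITH FORMAT, COMPRESSION, STATS=10;"
- ps: Move-Item -Path (Join-Path $env:SQL_SERVER_BACKUP_DIRECTORY "$($env:SQL_SERVER_DATABASE).posttest.bak") -Destination $env:APPVEYOR_BUILD_FOLDER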

Right now schema_verification.sql looks like this:

PRINT 'Tables in database:'
SELECT 
        QUOTENAME(s.name) + '.' + QUOTENAME(t.name) AS [Table Name] 
    FROM 
        sys.tables t INNER JOIN sys.schemas s
            ON s.schema_id = t.schema_id;

I was actually using this to manually verify that the tables were being created. It's a stopgap for actual automated tests of the SQL schema.
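
Eventually I'd rather have checks that fail the build when something is missing. A rough sketch of the idea, using a made-up table name, and assuming the sqlcmd call is run with -b so a raised error produces a non-zero exit code:

-- Example only: fail loudly if an expected table is missing.
IF OBJECT_ID(N'dbo.SomeExpectedTable', N'U') IS NULL
    RAISERROR (N'Expected table dbo.SomeExpectedTable was not found.', 16, 1);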

Also, a note on the Move-Item: AppVeyor only lets you create artifacts from files inside the folder where the git repo is checked out. I could probably just specify the full path of the backup file in the BACKUP DATABASE T-SQL statement; I'm pretty sure the image is set up so that there would be no permission issues.
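
If that's right, the Move-Item goes away entirely. A sketch of the idea, assuming sqlcmd resolves APPVEYOR_BUILD_FOLDER from the environment the same way it resolves the other variables, and that the SQL Server service account can write to that folder:

- cmd: sqlcmd -S "%SQL_SERVER_INSTANCE%" -Q "BACKUP DATABASE [$(SQL_SERVER_DATABASE)] TO DISK='$(APPVEYOR_BUILD_FOLDER)\$(SQL_SERVER_BACKUP_FILE)' WITH FORMAT, COMPRESSION, STATS=10;"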

Finally, a note on the environment variables used by the Move-Item invocation. SQL_SERVER_BACKUP_DIRECTORY comes from the registry, so I need PowerShell to set it dynamically, like so:

install:
- ps: $env:SQL_SERVER_BACKUP_DIRECTORY=(Get-ItemProperty "HKLM:\Software\Microsoft\Microsoft SQL Server\MSSQL13.SQL2016\MSSqlServer").BackupDirectory

And APPVEYOR_BUILD_FOLDER is provided to us already. That’s the folder where the git repo is checked out.

So now that we have created these artifacts, we must publish them. This is where the artifacts section of the YAML file comes in.

artifacts:
  - path: $(SQL_SERVER_BACKUP_FILE)
    name: Database backup
    type: file
  - path: $(SQL_SERVER_DACPAC)
    name: DACPAC
    type: file
  - path: $(SQL_SERVER_VERIFICATION_LOG)
    name: Verification Query Results
    type: file
  - path: $(MSBUILD_LOG_FILE)
    name: MSBuild Binary Log
    type: file

So as you can see, any file we can make via PowerShell, the command line, or any of the tooling the AppVeyor images provide can become an artifact. Artifacts are by no means limited to SQL Server; that's simply the focus of this blog series.

Introducing the SQL CI/CD series

Articles in the series

About the series

Perhaps the hardest part of your software stack to integrate into a CI/CD (Continuous Integration/Continuous Deployment) chain is your database objects. Microsoft has made some great strides here, and it's actually not bad at all if you happen to follow all the Microsoft best practices and have a simple use case, i.e.:

  • You’re using Agile, Scrum, Kanban, etc. on a closed source project with a single production instance.
  • A private git repo with a .NET Core MVC App and an OLTP database preferably hosted on GitHub or Azure DevOps (formerly Visual Studio Team Services online).
  • Schema managed by an SSDT Project in Visual Studio
  • Any ORMs are not generating schema, and developers are manually handling creating models (C# classes that map to the RDBMS tables).
  • All code coverage and unit tests reports will be viewed in Visual Studio or from the Azure DevOps Dashboard.
  • Build Pipeline Creating a DACPAC in Azure DevOps
  • Release Pipeline does a full DB backup and Deploys a DACPAC in Azure DevOps
  • Release pipeline failure plan is to restore DB backup and rerun last good release
  • Every merged PR gets deployed to dev
  • Manual gates for Production Deploys
  • Database deployed to Azure SQL DB PaaS

If you stay within these lines, and you don't have any schema or data changes that require a migration strategy SSDT can't handle, then things are easy.

However, reality doesn't always match this ideal, especially if the DBAs were brought in after the fact. Even if DBAs were brought in, you could be in a shop that doesn't bleed Microsoft blue. Or you could simply be overruled on something. Heck, you could even be a DBA who doesn't want to do some of the things on that list.

Perhaps your reality, for whatever reason, looks like the following:

  • This is an open source project.
  • This project is a demo for training, conference talks, user group talks, and blog articles, so you need many different artifacts produced.
  • Your Git repo is a complex monorepo or contains submodules with projects in different languages
  • Your main web app is .NET 4.6.1 with MVC 5 and EF 6.2
  • You’re using a third party CI/CD tool such as AppVeyor
  • You’re using other third party services in your CI/CD chain such as codecov.io.
  • Part of your schema is from a Code First EF6 application
  • Your build pipeline is creating several artifacts:
    • Website deploy zip file
    • NuGet packages for some of the DLLs
    • NuGet packages of PowerShell modules
    • SQL DACPAC
    • SQL schema generation script
    • Backup of the database unit tests were run against
    • Code coverage and unit test reports in HTML and JUnit formats
  • Merges to master trigger a GitHub release, and all generated artifacts are sent there.

I happen to have such a project: the AlwaysEncrypted sample application I wrote around the release of SQL Server 2016, which I've recently dusted off to improve my DevOps skills. I've decided to document some of the steps in my journey as a series of articles on this blog.

About the sample project

While I'm no fan of code-first ORMs, I wanted this app to demonstrate that Always Encrypted "just worked" even if your app wasn't built around it. Instead of Azure DevOps, I went with AppVeyor because it's free, and because it's simpler than Azure DevOps build pipelines and is configured by YAML, I can illustrate the how and why better than "here is a GUI step in Azure DevOps."

A simple Azure SQL DB to Azure SQL DB External Table How To

I'm working on an app for a client with some interesting requirements. The app has its own Azure SQL DB that's ultimately for staging data, but it is part of a project with a data warehouse. We needed some office location data from the data warehouse to populate a drop-down in the app, and I decided to try the EXTERNAL DATA SOURCE feature.

External data sources are similar in concept to linked servers but quite different in practice. They let you connect to Hadoop data or Azure Blob Storage, run elastic queries, and expose a table in one database to another. The last one is the least sexy, but it's the one I want. It's also the option with the least attention in Books Online and blogs. I'm here to fix that.

Defining our problem

Of course, these are not the real names of my client's servers, databases, and tables.

The data warehouse is on a server called mywarehouse.database.windows.net. The database name is myWarehouse, and it has a table called dbo.dimFacilities. We want to expose a subset of that table's columns to a database called loadingDb on myapp.database.windows.net.

Doing the work

loadingDb is going to need to authenticate to myWarehouse and perform some selects. Therefore, we need to create a user in myWarehouse for loadingDb and add it to the db_datareader role.

CREATE USER app_dw_reader WITH PASSWORD = '************';
GO
EXEC sp_addrolemember @rolename = 'db_datareader', @membername = 'app_dw_reader';
GO

It's probably better to create a custom role with just the SELECT permissions on the table we want, but that's largely an exercise for the reader; a minimal sketch is below. You are not limited to contained users here. You can also use Azure AD users or users associated with a login.
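
Here is that least-privilege version, run in myWarehouse; the role name is one I made up for illustration:

-- Grant only SELECT on the one table instead of using db_datareader.
CREATE ROLE warehouse_facility_reader;
GO
GRANT SELECT ON dbo.dimFacilities TO warehouse_facility_reader;
GO
ALTER ROLE warehouse_facility_reader ADD MEMBER app_dw_reader;
GO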

Now it's time to turn our attention to loadingDb. The first two objects we will create are a MASTER KEY and a DATABASE SCOPED CREDENTIAL.

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**************';
GO
CREATE DATABASE SCOPED CREDENTIAL dw_reader WITH IDENTITY = 'app_dw_reader', SECRET = '********';
GO

Note that the MASTER KEY password is a distinct password, while the SCOPED CREDENTIAL secret must match the user's password. The master key is used to encrypt the scoped credential. When you change the password for app_dw_reader, you need to use ALTER DATABASE SCOPED CREDENTIAL to update the secret.
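
A minimal sketch of that rotation, run in loadingDb once app_dw_reader's password has been changed in myWarehouse:

-- Keep the credential's secret in sync with app_dw_reader's new password.
ALTER DATABASE SCOPED CREDENTIAL dw_reader
    WITH IDENTITY = 'app_dw_reader', SECRET = '************';
GO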

Now that we have those, the next object is the external data source. This is somewhat analogous to adding a linked server entry.

CREATE EXTERNAL DATA SOURCE data_warehouse
WITH (
TYPE = RDBMS,
LOCATION = 'mywarehouse.database.windows.net',
DATABASE_NAME = 'myWarehouse',
CREDENTIAL = dw_reader
);
GO

Note that this command will succeed even if the credential is invalid, as long as the credential object exists. You don't get an error until you try to query the external table. Luckily, it's a useful error.

Now we need to define the table and the columns. You can define just a subset of the columns, but the column definitions you do include must match exactly, or you get an error on select. Luckily, the error returned when the select fails names the column in question.

CREATE EXTERNAL TABLE [dbo].[dimFacility]
(
[facilityId] int NOT NULL,
[facilityName] NVARCHAR(100) NULL,
[country] NVARCHAR(64) NULL
)
WITH (
DATA_SOURCE = data_warehouse
);
GO

A tangential note on column definitions: if you are using sys.columns.max_length to get the length of NVARCHAR columns, divide it by two, because max_length is the length in bytes. The Azure portal query editor doesn't show column lengths, so I decided to use the DMVs.
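
For what it's worth, the query I ran against myWarehouse looked roughly like this; the division by two only applies to the N-prefixed types:

-- max_length is in bytes, so halve it for NVARCHAR/NCHAR columns.
SELECT
        c.name,
        t.name AS type_name,
        CASE WHEN t.name IN (N'nvarchar', N'nchar') THEN c.max_length / 2 ELSE c.max_length END AS [length],
        c.is_nullable
    FROM
        sys.columns c INNER JOIN sys.types t
            ON t.user_type_id = c.user_type_id
    WHERE
        c.object_id = OBJECT_ID(N'dbo.dimFacilities');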

Now, if everything worked out well, you should be able to run SELECT * FROM dbo.dimFacility; in loadingDb and get your data. Congratulations!
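
And since the whole point was a drop-down, the query the app runs against the external table can be as plain as this sketch:

-- Populate the facility drop-down straight from the external table.
SELECT facilityId, facilityName
    FROM dbo.dimFacility
    ORDER BY facilityName;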