Quantcast
Channel: Virtualization – Jonathan Kehayias
Viewing all articles
Browse latest Browse all 23

SQL Server and VMware: A Potentially Fatal Combination? That’s Nonsense!

$
0
0

This morning Paul received an email from a member of the community that was asking if it is safe to run SQL Server inside of a VM and whether we would suggest running databases in virtual machines or not.  The root of the question being asked was an article that the person had read back in January titled SQL Server and VMware: A Potentially Fatal Combination.  This article is an amazing example of how misinformation on the internet can lead to confusion and an incorrect decisions about what options are feasible for a given environment or not.  Most people that read my blog know that I have been running SQL Server virtualized on VMware in production environments since late 2004/early 2005.  In all that time, not once have I ever experienced a data corruption issue associated with SQL Server running inside of a virtual machine, even on my laptop where the machine has powered off due to loss of battery life and the machines experienced a hard crash during operation.  Why?

Analyzing the Source of the Misinformation

Lets start out at the top of the article and debunk the misinformation that is contained in it about running SQL Server on VMware.  The first thing is that the article makes some very broad claims about VMware based on the use of VMware Player, which is akin to Microsoft Virtual PC for Windows XP Mode in Windows 7.  This is not a server class hypervisor, nor is it in any way representative of VMware’s main products ESX/ESXi or vSphere.  This is like comparing SQL Server to Microsoft Access, they aren’t even in the same class of functionality or features.

The next claim in the article is that the database is at risk due to hypervisor caching of the writes being performed by SQL Server, which uses the FILE_FLAG_WRITE_THROUGH flag when calling the CreateFile function to open the database, instructing the system to write through any cache directly to disk.  The article claims that VMware Player caches I/O operations and can result in lost writes.  Unfortunately, the example provided is a machine running on Windows Vista which as a hosted platform always uses unbuffered I/O for virtual machines, so it is not caching the writes being performed.  You don’t have to take my word for this, it is documented in the Storage IO crash consistency with VMware products KB article on their site:

For hosted products, write handling depends on the host operating system.

On Linux hosts, VMware does not use unbuffered IO, because it is not safe or supported across all the Linux versions that VMware supports. So currently, VMware hosted products on Linux hosts always use buffered IO.

On Windows hosts, VMware hosted products use unbuffered IO by default.

Unfortunately the entire foundation of the article is based on misinformation that can be proven by reading VMware’s documentation.  The VMware ESX hypervisors DO NOT cache I/O under any circumstance, which is also covered in the same KB article:

VMware ESX acknowledges a write or read to a guest operating system only after that write or read is acknowledged by the hardware controller to ESX. Applications running inside virtual machines on ESX are afforded the same crash consistency guarantees as applications running on physical machines or physical disk controllers. 

If this is the case, you might be wondering how the database in the article became corrupt? Unfortunately it is impossible to know because the necessary parameters surrounding the test configuration are not provided in the article.  Instead the article provides a Disclaimer that tells you the specifics are “rather unimportant”

Disclaimer:
In many articles you will find the execution times of different queries. Please do not interpret these results scientifically or as official benchmarks. You will not even find the configuration of the computer used for these experiments because it is rather unimportant.

I disagree that this kind of information is unimportant, especially when making claims that SQL Server will suffer database corruption issues if running under VMware. 

Setting up a Repeatable Test Scenario

For scientific purposes, I decided to take one of my standard SQL Server 2008R2 VMs running inside of my Dell M6500 laptop and run a series of repeatable tests using VMware Workstation 7 on Windows 7.  The VM is configured with 2 vCPUs and 2GB RAM, and like the article a thumb drive will be used to store the database data and log files.  To setup the environment for the tests, the following script will be used:

– Create a "Safe" database on the VMDK stored on my laptop SSD
CREATE DATABASE [VMSafe]
ON PRIMARY
( NAME = N’VMSafe’,
  FILENAME = N’C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\VMSafe.mdf’,
  SIZE = 6144KB,
  FILEGROWTH = 1024KB )
LOG ON
( NAME = N’VMSafe_log’,
  FILENAME = N’C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\VMSafe_log.ldf’,
  SIZE = 1024KB ,
  FILEGROWTH = 1024KB)
GO

– Create a "Test" database on a removable thumb drive
CREATE DATABASE [VMTest]
ON PRIMARY
( NAME = N’VMTest’,
  FILENAME = N’E:\SQLskills\VMTest.mdf’,
  SIZE = 6144KB ,
  FILEGROWTH = 1024KB )
LOG ON
( NAME = N’VMTest_log’,
  FILENAME = N’E:\SQLskills\VMTest_log.ldf’ ,
  SIZE = 1024KB ,
  FILEGROWTH = 1024KB)
GO

– Create the baseline table in the "Safe" database
USE [VMSafe];
GO
IF OBJECT_ID(‘dbo.TestTable’) IS NOT NULL
BEGIN
    DROP TABLE dbo.TestTable;
END
GO

CREATE TABLE dbo.TestTable
(RowID INT IDENTITY PRIMARY KEY,
Col1 INT,
Col2 CHAR(400) DEFAULT(‘ABC123′),
Col3 NCHAR(400) DEFAULT(’789XYZ’))
GO
DECLARE @StartTime DATETIME = CURRENT_TIMESTAMP
DECLARE @loop INT = 1
BEGIN TRANSACTION
WHILE @loop <=100000
BEGIN
    INSERT INTO TestTable (Col1) VALUES (@loop);
    SET @loop = @loop + 1;
END
COMMIT TRANSACTION
SELECT DATEDIFF(ms, @StartTime, CURRENT_TIMESTAMP) AS Table1_Time
GO
UPDATE TestTable
SET Col1 = RowID;
GO

– Create a duplicate of the table in the "Test" database
USE [VMTest]
GO
CREATE TABLE dbo.TestTable
(RowID INT IDENTITY PRIMARY KEY,
Col1 INT,
Col2 CHAR(400) DEFAULT(‘ABC123′),
Col3 NCHAR(400) DEFAULT(’789XYZ’))
GO
SET IDENTITY_INSERT dbo.TestTable ON
INSERT INTO TestTable (RowID, Col1, Col2, Col3)
SELECT RowID, Col1, Col2, Col3
FROM VMSafe.dbo.TestTable
SET IDENTITY_INSERT dbo.TestTable OFF
GO

– Issue CHECKPOINT to flush dirty buffer pages to disk
CHECKPOINT
GO

Now that we have our databases created and identical data in them so that we can validate the consistency of the environment in the event of a forced crash by removing the USB thumbdrive from the computer, lets issue a UPDATE against all the rows in the VMTest databases to prefix them with a 1_ for the Col2 column.

SQL Server and VMware: A Potentially Fatal Combination? Thats Nonsense!   image thumb

With the UPDATE complete, pull out the USB thumbdrive and then issue another CHECKPOINT operation to force a flush of dirty buffers to the disk.  This will raise a 9001 error for the VMTest database since the log is not available to write the CHECKPOINT log records to.

SQL Server and VMware: A Potentially Fatal Combination? Thats Nonsense!   image thumb

At this point we have a crashed database, so we can plug in our USB thumbdrive and restart the SQL Server instance to allow crash recovery to run for the databases.  Immediately upon instance restart we will see:

SQL Server and VMware: A Potentially Fatal Combination? Thats Nonsense!   image thumb

OMG!  It has the same status as the article, that must mean we have a problem!  NO! The In Recovery status tells you that crash recovery is running for the database.  If the database is In Recovery you should not set it to Emergency Mode as the article states.  Instead open up the ErrorLog for the instance and look at what is happening with the recovery, the database engine logs informational messages about the progression of crash recovery for the databases while it is running.

SQL Server and VMware: A Potentially Fatal Combination? Thats Nonsense!   image thumb

SQL Server and VMware: A Potentially Fatal Combination? Thats Nonsense!   image thumb

In this case, within a few minutes of opening the database the estimated time for crash recovery to be performed is already at 86 seconds.  The only thing you have to do here is be patient and allow the Database Engine to perform the Undo/Redo operations the way it is designed to.

SQL Server and VMware: A Potentially Fatal Combination? Thats Nonsense!   image thumb

SQL Server and VMware: A Potentially Fatal Combination? Thats Nonsense!   image thumb

All it took was 103 seconds for crash recovery to be performed and the database is available online and in a consistent manner.  The log records describing the transaction were hardened in the transaction log before the commit of the update operation occurred ensuring that the database could be recovered in a consistent state.

I have run this series of tests a number of different ways and the database always starts up in a crash consistent state.  Feel free to test this yourself by pulling the USB thumbdrive out while the UPDATE is running, and it will force a rollback of the transaction during crash recovery.

Summary

In Summary, there is nothing inherently unsafe about running SQL Server on VMware.  Microsoft has supported SQL Server on VMware for nearly three years at this point as a validated Server Virtualization Validation Program (SVVP) configuration (http://support.microsoft.com/?id=956893).  This article is further evidence of the danger of misinformation on the internet, and the need to validate information online before making decisions based on the information.  A SQL Server running as a virtual machine in a SVVP validated configuration is going to be as safe as a SQL Server running on physical hardware.

The post SQL Server and VMware: A Potentially Fatal Combination? That’s Nonsense! appeared first on Jonathan Kehayias.


Viewing all articles
Browse latest Browse all 23

Trending Articles