How Does Microsoft Azure Security Work?

In this article, we’ll delve into the mechanics of security in the cloud. This is not to be confused with Security-as-a-Service (fee-based subscriptions such as security event management); our interest lies in demonstrating how to properly secure applications, services, and data – wherever they reside in the cloud services stack (IaaS or PaaS).

Specifically, we will focus on workloads running in Microsoft Azure. By integrating the capabilities of Barracuda Networks’ best-in-class virtualized NG Firewall and Web Application Firewall (WAF) with Microsoft Azure’s native security features, you’ll be in a superior position to deploy reliable and resilient cloud services for your enterprise, partners, and customers.

Introduction

Cloud computing has faced many challenges when it comes to reliability, privacy, and security. With early incarnations, such as those from commercial service providers, sensitive corporate information was kept at a distance from public platforms. Instead, hosting was used for lightweight applications with limited needs for access to such data, including websites, collaboration tools, and real-time communications. As cloud services matured, businesses became more willing to place proprietary data in the hands of trusted SaaS partners, such as salesforce.com for CRM. And when the depth and breadth of available services evolved into more standardized infrastructure – the “as-a-Service” model – organizations started large-scale moves into the cloud with their most critical data.

Today, cloud computing has become a “must-have” for a majority of the enterprise IT community, for reasons ranging from economic gains to technology benefits. But one of the major concerns carrying over from traditional IT – data and application security – has not changed, and requires the same diligence in the cloud as with on-premises solutions.

Solution Profile

When you migrate data, applications, and processes to the cloud, you take with you the requirements to safely manage both corporate and customer information. And in most cases, you are still subject to the privacy and compliance directives of your industry, whether HIPAA, SOX, PCI, or others.

So, while the cloud computing model promises great flexibility, cost savings, scalability, and other benefits, it’s essential to understand the differences between implementing effective on-premises information security and deploying the same protections in the cloud. These considerations include (but are not limited to) data governance, auditing, leak prevention, threat detection/remediation, privacy and confidentiality, information integrity, and reliability/availability.

Historically, many of these needs could be met and enforced at the edge of your corporate network. A selection of physically-wired routers, firewalls, gateways, IPS devices, and VPNs worked together to keep the bad traffic out and the good in.

But in the cloud, you can only rely on what the platform vendor offers, either natively or as value-added services on top of your subscriptions. You do not have a say in how the infrastructure behaves or what mechanisms are used to secure it, other than in how your applications interact. You cannot deploy your own firewall in the server rack, and you can’t configure the ACLs on the fabric routers.

In this sense, your previous approach to security is no longer suitable. You must trade physical control for virtual stewardship using the combination of the cloud platform’s capabilities, your application design, deployment methodology, and layered virtualized security. Indeed, you may not own the hardware, but you can certainly own what runs on it – and implement a security solution that fits your needs.

A fully-functional virtualized security appliance deployed within the framework of your environment can deliver all of the benefits of a physical device, with the flexibility only possible in a fluid software form-factor. The Barracuda NG Firewall and Barracuda Web Application Firewall are optimized to run on Microsoft Azure, making it easy to protect mission-critical enterprise applications and data in the cloud.

Application Security Overview

The origins of application security revolve around the protocols, commands, data types, credentials, and policies associated with providing external access to internal corporate resources – whether by employees or otherwise. As traditional client/server architecture transitions to web-based solutions using XML and HTTP, forms-and-claims-based authentication becomes more common. Likewise, application firewalls are transforming into web-based systems that deal with a broad threat landscape.

How Does Microsoft Azure Security Work?

Microsoft Azure provides most of the same controls and security features that you would expect from Windows Server within an on-premises data center – with the one notable exception that it must operate and scale across thousands of tenant environments simultaneously. Core security and access components include:

  • Microsoft Azure Active Directory: Identity and access management capabilities in the cloud
  • Access Control Services: Cloud-based service that provides simplified authentication and authorization
  • Microsoft Azure Virtual Network: Logically isolated section in Microsoft Azure that connects over IPsec to your on-premises data center
  • Microsoft Azure Multi-Factor Authentication: Additional layer of authentication via mobile services

With these capabilities, augmented by Windows Server’s native security functionality within a virtual machine, customers can either host their IT resources in Microsoft Azure, or easily extend their on-premises infrastructure to the cloud, without adversely affecting their security posture. Some examples of architecture-level security features in Microsoft Azure include:

  • Storage, network, process isolation: prevents different customers’ virtual machines (VMs) or cloud services from interfering with one another’s operation
  • VM-VM/host-VM packet filtering: host-level and VM-level firewalls block unwanted traffic from crossing VM boundaries, as well as blocking tenant-to-host connections that could impact the Microsoft Azure fabric
  • Port restrictions: by default, all new VMs are deployed with deny-all policies on the native Windows Firewall, and all ports are closed to external traffic
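The deny-all default described in the last bullet can be pictured with a small model. This is only an illustrative sketch in Python, not Azure's or Windows Firewall's actual implementation; the `PortPolicy` class and the choice of port 443 are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PortPolicy:
    """Default-deny inbound policy: a port is closed unless explicitly opened."""

    allowed_ports: set[int] = field(default_factory=set)

    def open_port(self, port: int) -> None:
        """Explicitly open one inbound port (hypothetical helper)."""
        if not 0 < port < 65536:
            raise ValueError(f"invalid port: {port}")
        self.allowed_ports.add(port)

    def is_allowed(self, port: int) -> bool:
        """Anything not explicitly opened is denied."""
        return port in self.allowed_ports


policy = PortPolicy()   # a freshly deployed VM: nothing is open
policy.open_port(443)   # hypothetical choice: expose HTTPS only
```

With this policy, `policy.is_allowed(443)` is true while every other port, such as 3389, stays closed until someone opens it deliberately.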

But are these capabilities, taken by themselves, enough to protect your IT environment from all attacks and exploits? As we’ll see below, the answer is “no.”

Bringing Together Application Security and Microsoft Azure

The future data center is one which spans both on-premises physical infrastructure and cloud-based virtualized services.

Whether you use Platform-, Infrastructure-, or Software-as-a-Service, or a mix of all three, your security strategy needs to encompass your applications and data wherever they reside. Applying a “Defense-in-Depth” (DiD) approach means using more than one type of security measure in the data path. As a simple example, consider how your corporate email system has anti-malware, anti-spam, a front-end gateway, router ACLs, and a firewall in front of it.

Putting that same resource into the cloud requires the same effort, but security capabilities also have to bridge between your local systems/users and your virtualized data center. The latter is true for both a fully-hosted IT environment in the cloud, and extended IT from existing on-premises infrastructure.

So while Microsoft Azure’s core infrastructure security stands well on its own with comprehensive authentication/authorization/access control technologies, encryption, data and storage protection, etc., it makes sense to augment it with advanced security functionality to safeguard your critical assets.

Web World

Microsoft Azure websites are built on top of the same secure infrastructure as Bing, Microsoft.com, and Office 365. In terms of reliability, it is the same platform that drives some of the world’s biggest websites and collaboration services.

However, challenges emerge as you move from service-level offerings (SaaS) from the provider to custom-developed solutions (or packaged applications) running within customers’ virtual environments (IaaS/PaaS). The cloud infrastructure should not, by its very nature, obstruct the operations of a tenant’s workload.

Thus, Microsoft Azure cannot block a poorly-designed web application from running on a VM, regardless of the security risks it might pose. However, any VM found to be the source of a DDoS or malware attack will be removed from the network by Microsoft Azure data center administrators.

In an ideal world, every web-facing application would be designed and thoroughly reviewed according to strict security-development practices, penetration-tested, and deployed using the latest embedded filtering technologies.

This is a rare occurrence. As was the trend with rollouts of data-driven web systems in the past, many corporate cloud applications will get released with little security testing and limited system hardening.

When web application firewalls came along, they solved this problem by making up for inadequate design by locking down every transport, protocol, method, command, and data structure used in modern web services. Developers could now rely on the web gateway to protect the app, and CIOs could accelerate the push towards getting more of their in-house resources onto the Internet for increased employee productivity.
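To make the idea of "locking down" methods and data structures concrete, the following Python sketch shows an allowlist-style request check. It is a simplified illustration only and does not describe how any particular WAF product works; the allowed methods, path prefixes, and size limit are hypothetical values.

```python
from urllib.parse import urlsplit

# Hypothetical allowlists for illustration; a real gateway's policy is far richer.
ALLOWED_METHODS = {"GET", "POST"}
ALLOWED_PATH_PREFIXES = ("/app/", "/static/")
MAX_BODY_BYTES = 64 * 1024


def inspect_request(method: str, url: str, body: bytes = b"") -> tuple[bool, str]:
    """Return (allowed, reason); anything not explicitly permitted is rejected."""
    if method.upper() not in ALLOWED_METHODS:
        return False, "method not allowed"
    path = urlsplit(url).path
    if ".." in path.split("/"):
        return False, "path traversal"
    if not path.startswith(ALLOWED_PATH_PREFIXES):
        return False, "path not allowed"
    if len(body) > MAX_BODY_BYTES:
        return False, "body too large"
    return True, "ok"
```

The point of the design is that the application behind the gateway only ever sees requests that passed every check, regardless of how carefully the application itself was written.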

The present situation is nearly identical: CIOs want to move applications and services out of the expensive data center and into the commodity cloud; however, the cloud does not natively have the same protections in place as on-premises corporate resources.

As before, will developers begin rewriting their applications specifically to run effectively and securely in the cloud, or will economics win out once again, rapidly driving corporate assets onto the Internet without sufficient security considerations?

The answer is that economics always win. Hackers know this, and will continue to exploit every new IT medium that is introduced. It means that IT customers must once again take an active role in protecting their investments at multiple levels, without relying on just the cloud provider, or just developer resources to address Internet threats.

Now, web application security is known for preventing advanced attacks that may hide in scripts, code, downloads, data streams, images, protocols, tunneled/encrypted traffic, program execution/ applets, forms, and more.

Thus the idea of an application firewall has become too narrow, as only a comparatively small number of exploits are targeted at protocol or IP stack vulnerabilities (most of which are known and easily deterred at this point). The rich functionality, power, and depth of data in web applications represents a treasure trove for malevolent individuals and organizations to exploit.

There’s no difference between securing a workload or application that runs in the cloud, or an on-premises service in your data center. The basic concepts of encryption, filtering, and access control apply equally well in a hosted scenario. The primary change is in the breadth of considerations, as well as how you deploy and configure these security attributes.

One of the proven tenets of application protection is that the closer the security capability is to the resource, the better it will function. In the data center, this means installing access and security gateways on the same network, in the same DMZ, as the target servers. Such a deployment topology prevents traffic from reaching the application without first being inspected and filtered – in some cases, at multiple points along the way.

However, in the public cloud, no such topology exists because a customer cannot walk into a global data center, find the rack their applications are running on, and install a firewall. The very nature of cloud computing stands in stark contrast to hardware-level protection, since the fabric can autonomously move your workload to another rack, or to another data center altogether.

As a result, application security requirements take on a transitory nature that must be as mobile as the workload itself. Apart from security capabilities built into the cloud platform, there are only two options: Security-as-a-Service, or virtualized security (i.e., virtual appliances).

Conclusion

Medha Hosting protects users, applications, and data for organizations worldwide, and has developed a global reputation as a trusted partner that provides powerful, easy-to-hire, affordable IT solutions. The company’s proven customer-centric business model focuses on delivering high-value, subscription-based IT solutions for security and data protection. For additional information, please visit Azure Consulting, Google Cloud Consulting, and AWS Consulting.

How to Run Oracle DB on Microsoft Azure?

This article provides a high-level overview of how to run Oracle Database on Azure. Oracle architectures differ in some significant ways on Azure, and there are some additional opportunities as well. This article specifically covers how to map on-premises Oracle architectures to Azure.

IaaS vs PaaS

Customers originally had two options for running Oracle Database on Azure: IaaS and PaaS. Microsoft no longer provides Oracle DB as a PaaS service, so Oracle on Azure is only possible under the IaaS model.

Licensing

Azure offers only bring-your-own-license (BYOL) options for running Oracle. As per Oracle’s published licensing document, a core factor of 0.5 should be applied when calculating the licenses needed on Azure for Oracle Enterprise Edition. Evaluating the licenses needed requires careful analysis of the CPU performance mapping. On-premises hardware is almost always legacy, without an upgraded chipset, whereas CPU capabilities on Azure are constantly evolving and moving to the next generation: a 2 GHz CPU of a previous generation is less powerful than a 2 GHz CPU of the next generation on Azure. Please see the table below with the CPU capability matrix, which could be handy when comparing on-premises capacity to Azure capacity for Intel servers. Also, for a variety of reasons, on-premises hardware could be oversized and underutilized. Moving to the Azure cloud gives you the opportunity to optimize workloads and licenses. In a nutshell, you might need fewer licenses on Azure than on-premises.
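As a rough illustration of the arithmetic, the Python sketch below applies the 0.5 core factor quoted above to a core count and rounds up. The function name and the example core counts are hypothetical, and the factor should always be checked against Oracle's current licensing document before it is relied on.

```python
import math

# Core factor quoted in the text; verify against Oracle's current licensing document.
CORE_FACTOR = 0.5


def licenses_needed(cores: int, core_factor: float = CORE_FACTOR) -> int:
    """Processor licenses = cores x core factor, rounded up to a whole license."""
    if cores < 0:
        raise ValueError("cores must be non-negative")
    return math.ceil(cores * core_factor)


# Hypothetical comparison: an oversized 16-core on-premises server
# versus a right-sized 8-core Azure VM.
on_premises_licenses = licenses_needed(16)  # 8
azure_licenses = licenses_needed(8)         # 4
```

The comparison is where the saving comes from: if newer CPUs let the same workload run on fewer cores, the license count falls with the core count.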

 

Storage

Storage on Azure is triple-replicated by default, so SAN-like redundancy is built in, although in a different way. The choice of storage is often one of the critical components for database performance. In the on-premises world, a SAN from a top vendor such as EMC or Hitachi could handle the IOPS requirement very well. In the cloud, however, the primary choices are Standard Storage (SATA) and Premium Storage (SSD).

Disks & IOPS

If your DB runs on a filesystem, the general recommendation is to place the OS disks on Standard Storage (they can certainly be placed on SSD for even better performance) and the data disks on Premium SSD disks. By default, you will get about 500 IOPS of 1 KB payload on a given disk on a Standard Tier VM, or 5000 IOPS on a Premium Storage disk. To increase throughput, you can opt for RAID 0 to stripe across disks, similar to on-premises: https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-linux-configure-raid?toc=%2fazure%2fvirtual-machines%2flinux%2ftoc.json

Striping in this way scales IOPS to meet your requirements. The choice of Premium Storage also provides better stability and availability. Since Azure retains three copies of data for any kind of storage, RAID 10 is not necessary and RAID 0 should be sufficient.
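A back-of-the-envelope way to size a RAID 0 stripe set is to divide the target IOPS by the per-disk figure. The Python sketch below uses the per-disk numbers quoted above and assumes, as a simplification, that striped IOPS add up linearly and that no VM-level limit applies; the 12,000 IOPS target is a hypothetical workload.

```python
import math

STANDARD_DISK_IOPS = 500   # per-disk figure quoted in the text
PREMIUM_DISK_IOPS = 5000   # per-disk figure quoted in the text


def disks_for_iops(target_iops: int, per_disk_iops: int) -> int:
    """Disks to stripe (RAID 0) to reach target_iops.

    Assumes IOPS scale linearly across the stripe and ignores VM-level caps.
    """
    if target_iops <= 0 or per_disk_iops <= 0:
        raise ValueError("IOPS values must be positive")
    return math.ceil(target_iops / per_disk_iops)


# Hypothetical workload needing 12,000 IOPS:
standard_disks = disks_for_iops(12_000, STANDARD_DISK_IOPS)  # 24
premium_disks = disks_for_iops(12_000, PREMIUM_DISK_IOPS)    # 3
```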

Oracle ASM

Don’t confuse Oracle ASM with Azure ASM. Oracle ASM is Automatic Storage Management, a volume manager and filesystem for Oracle databases. Azure ASM is Azure Service Management, the older way of working with Azure. Oracle recommends placing databases on Oracle ASM for better performance and manageability. Oracle databases can be built on Oracle ASM on Azure on both Linux and Windows. The standard Oracle installation procedure is recommended.

Scalability

Scaling RDBMS systems has always been expensive and complicated, and Azure is no different. Horizontal scalability on Azure is not feasible, primarily because of two constraints: IP multicast and shared storage. Azure Oracle VMs follow a shared-nothing architecture. This means that running clusters using Oracle RAC is not possible; vertical scalability, however, is feasible. RAC provides fault tolerance, added capacity, shared workloads, and so on. Oracle RAC is a 10-year-old legacy technology from a time when hardware failures, limited vertical scalability, and single-core sockets were common. Today, CPUs are much more powerful, vertical scalability has increased greatly, and multi-core sockets are available. With proper planning, Oracle RAC workloads can be moved quite easily to standalone VMs on Azure.

High Availability

Oracle Maximum Availability Architecture (MAA) suggests Oracle RAC plus an Oracle physical standby using Oracle Data Guard for local HA, and another Oracle standby for DR. Failover can be automated using an Oracle node running the Oracle Data Guard broker with the Fast-Start Failover configuration. Similar architectures can be built on Azure with the assistance of Availability Sets.

Availability Sets

An Availability Set is an Azure capability and virtual abstraction layer that enables fault tolerance for Azure VMs. When multiple virtual machines are created within an Availability Set, each VM is placed in a separate fault domain and update domain, on separate hardware. However, disks in different storage accounts could potentially reside on the same storage cluster. With the introduction of managed disks, currently in private preview, this will change, and storage and compute together will have fault tolerance.
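The placement idea can be sketched as a simple round-robin assignment. The Python below is a conceptual model only, not Azure's placement algorithm; the domain counts and VM names are hypothetical parameters.

```python
def place_vms(vm_names, fault_domains=2, update_domains=5):
    """Spread VMs round-robin across fault and update domains.

    The domain counts are hypothetical defaults for illustration;
    the real values depend on the platform and deployment model.
    """
    return {
        name: {
            "fault_domain": index % fault_domains,
            "update_domain": index % update_domains,
        }
        for index, name in enumerate(vm_names)
    }


# Hypothetical two-node Oracle pair in one Availability Set.
placement = place_vms(["ora-primary", "ora-standby"])
```

In this model the two VMs never share a fault domain or an update domain, so a single hardware fault or a single planned update does not take both down at once.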

Oracle HA Architecture

The recommended approach to HA is to place two VMs in an Availability Set. With the primary running on VM1, a physical standby should be built on VM2 using Oracle Data Guard. This enables an active-passive architecture in a given region. Active-active architectures can be achieved using replication tools such as Oracle GoldenGate or Attunity. DR with an active-passive or active-active architecture can be achieved across geographically distributed locations using an Oracle physical standby or Oracle GoldenGate. Alternatively, there are newer systems that appear to provide real-time HA automation and failover capabilities; ScaleArc appears able to provide this with some customization of the product.

Alternatively, one of our customers is looking at dynamically detaching the Azure page blob disks in a VM-down scenario, spinning up a new VM using a PowerShell script, and attaching the blobs to the new VM. This is unthinkable in an on-premises scenario, but the Azure cloud gives you a new method of providing HA for Oracle DBs. The following link gives the high-level steps of this approach: https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-linux-classic-detach-disk

Backups

Standard backup to disk using RMAN, including full and incremental options, can be performed on Azure VMs. Azure currently provides the capability to configure Windows or Linux VMs for backups using Azure Backup. Daily, weekly, monthly, and yearly retention configurations are possible. However, Oracle backups on Windows are file-system consistent, while backups of Linux VMs are currently only crash consistent. Several capabilities for faster, higher-throughput backups will be available in the near future.

In addition to Azure Backup, backup software such as NetBackup and CommVault is commonly chosen by customers to provide enterprise-wide backup services using Azure blob storage underneath.

Are you ready to run Oracle on Azure? We are here to help you. Please find more information at our Azure Management Consulting section.

Performance Guidance for SQL Server in Microsoft Azure Virtual Machines

SQL Server Technical Article

Summary: Developers and IT professionals should be fully knowledgeable about how to optimize the performance of SQL Server workloads running in Microsoft Azure Infrastructure Services and in more traditional on-premises environments. This technical article discusses the key factors to consider when evaluating performance and planning a migration to SQL Server in Azure Virtual Machines. It also provides certain best practices and techniques for performance tuning and troubleshooting when using SQL Server in Microsoft Azure Infrastructure Services.

Contents

Introduction

Quick check list

Azure Infrastructure Services fundamentals

Azure VM configuration options

Virtual machine size

Network bandwidth

Disk types and configurations

Disk cache settings in Azure virtual machines

Planning a virtual machine configuration in Azure for optimal SQL Server performance

Different performance characteristics and considerations for major sub-systems between the cloud and on-premises

I/O sub-system

CPU and memory

Network

Raw storage performance testing

Best practices and recommendations for optimizing SQL Server performance in Azure VMs

Virtual machine sizes

Azure virtual machine disks and cache settings

Operating system disk vs. data disk

Temporary disk

Data disks performance options and considerations

Single data disk configuration

Multiple data disk configuration

Adding multiple data disks to Azure virtual machine

Disk striping options for Azure Virtual Machines

Placement of database files

TempDB

Effects of warm-up on data disks

Single vs. multiple storage accounts for data disks attached to a single VM

NTFS allocation unit size

Data compression for I/O bound workloads

Restore performance – instant file initialization

Other existing best practices

Performance troubleshooting fundamental concepts

Traditional factors that govern performance in SQL Server

Factors that govern performance for SQL Server in Azure Virtual Machines

Performance monitoring methodology for SQL Server in Azure Virtual Machine

Appendix

Raw storage performance testing scripts

Guidance on defining key performance indicators

How to use KPIs

SQL Server troubleshooting scripts

Snapshot wait stats script

Requests executing on the system script

Top query statements and plan by total CPU time

Snapshot spinlock stats

Snapshot I/O stats

Performance monitor

Introduction

The goal of this technical article is to provide guidance to developers and IT professionals on how to optimize the performance of Microsoft SQL Server workloads running in an Azure Virtual Machine environment. The article first describes the key concepts in Azure that have a direct impact on the performance of SQL Server workloads. It then introduces the key factors to take into account when evaluating performance and when planning a migration to the Azure platform. Next, it introduces additional considerations for performance troubleshooting in Azure Infrastructure Services. The article uses results from specific test scenarios to provide guidance and best practices for optimizing the performance of SQL Server workloads running in an Azure virtual machine (VM).

The online documentation for SQL Server in Azure Virtual Machines covers getting started, migration, deployment, high availability, security, connectivity, backup and restore, creating a new virtual machine using one of our pre-built SQL Server platform images in the Image Gallery, and other topics. For more information, see SQL Server in Azure Virtual Machines.

Quick check list

The following is a quick check list that you can follow:

• Use minimum Standard Tier A2 for SQL Server VMs.

• Keep the storage account and SQL Server VM in the same region.

• Disable Azure geo-replication on the storage account.

• Avoid using operating system or temporary disks for database storage or logging.

• Avoid using Azure data disk caching options (caching policy = None).

• Stripe multiple Azure data disks to get increased IO throughput.

• Format with documented allocation sizes.

• Separate data and log file I/O paths to obtain dedicated IOPS for data and log.

• Enable database page compression.

• Enable instant file initialization for data files.

• Limit or disable autogrow on the database.

• Disable autoshrink on the database.

• Move all databases to data disks, including system databases.

• Move SQL Server error log and trace file directories to data disks.

• Apply SQL Server performance fixes.

• Set up default locations.

• Enable locked pages.

• Back up directly to blob storage.

For more information, please follow the guidelines provided in the following sub sections.

Azure Infrastructure Services fundamentals

This section provides an overview of the Azure Infrastructure Services including some key considerations that can have a direct impact on the performance of your SQL Server workloads.

Azure Infrastructure Services lets you access scalable, on-demand infrastructure using Virtual Machines and Virtual Networks. You can use Azure to store data, to create virtual machines for development and test, and to build web applications or run other applications. Before you start reading this article, you should understand the fundamental concepts of Azure, such as cloud services, virtual machines, virtual networks, storage accounts, and so on. For more information, see Azure Documentation.

Azure VM configuration options

Virtual machine size

Azure virtual machines are available in different sizes and resource options in terms of number of CPU cores, memory capacity, maximum disk space, bandwidth, and so on. For the latest information about the supported virtual machine sizes and resource options, see Virtual Machine and Cloud Service Sizes for Azure. We recommend that you review all these virtual machine size options and choose the ones best suited for your SQL Server workloads.

Network bandwidth

The network traffic includes all traffic between client applications and SQL Server in an Azure VM, and any other communication that involves a SQL Server process or other processes in a virtual machine, such as ETL packages or backup and restore operations. We recommend that you choose a larger virtual machine size with more network bandwidth.

Important note: Network communications generated by accessing data disks and the operating system disk attached to the virtual machine are not considered part of the bandwidth limits.

Disk types and configurations

Azure Virtual Machines provide three types of disks:

  • Operating system disk (persistent): Every virtual machine has one attached operating system disk (C: drive) that has a limit of 127 GB. You can upload a virtual hard disk (VHD) that can be used as an operating system disk, or you can create a virtual machine from a platform image, in which case an operating system disk is created for you. An operating system disk contains a bootable operating system. It is mostly dedicated to operating system usage and its performance is optimized for operating system access patterns such as boot up.
  • Data disk (persistent): A data disk is a VHD that you can attach to a virtual machine to store application data. Currently, the largest single data disk is 1 terabyte (TB) in size. You can specify the disk size and add more data disks (up to 16, depending upon virtual machine size). This is useful when the expected database size exceeds the 1 TB limit or the throughput is higher than what one data disk can provide.
  • Temporary local disk (non-persistent): Each virtual machine that you create has a temporary local disk, the D: drive, which is labeled TEMPORARY STORAGE. This disk is used by applications and processes that run in the virtual machine for transient storage of data. It is also used to store page files for the operating system. Temporary disk volumes are hosted in the local disks on the physical machine that runs your virtual machine (VM). Temporary disk volumes are not persistent. In other words, any data on them may be lost if your virtual machine is restarted. Temporary disk volumes are shared across all other VMs running on the same host.

Important note: Proper configuration of your Azure disks (operating system and data) is one of the most important areas to focus on when optimizing the performance of your SQL Server workloads. Both operating system disks and data disks are stored as VHD files in the individual page blobs hosted in the Azure Storage Blob Service. Each blob has its own capacity limits based on the intrinsic storage architecture. For more information, see the following blog posts:

Each data disk can be a maximum of 1 TB in size. Depending upon your virtual machine size, you can attach up to 16 data disks to your virtual machine. I/Os per second (IOPS) and bandwidth are the most important factors to consider when determining how many data disks you need.

Storage capacity planning is similar to the traditional ‘spindle’ count in an on-premises storage design. In this article, we discuss various approaches you can use to spread I/O operations across one or more data disks, and discuss how to use SQL Server performance features, such as compression, to maximize the performance of your SQL Server workloads.
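The capacity side of this planning can be sketched in a few lines of Python. The 1 TB and 16-disk limits come from the text above; the per-disk IOPS figure is not stated in this article, so it is left as a parameter, and the 500 used in the example is an assumption for illustration only.

```python
import math

MAX_DISK_TB = 1       # largest single data disk, per the text
MAX_DATA_DISKS = 16   # upper bound per the text; the actual limit depends on VM size


def data_disks_needed(db_size_tb, target_iops, per_disk_iops, max_disks=MAX_DATA_DISKS):
    """Number of data disks that satisfies both capacity and IOPS targets."""
    by_capacity = math.ceil(db_size_tb / MAX_DISK_TB)
    by_iops = math.ceil(target_iops / per_disk_iops)
    needed = max(by_capacity, by_iops, 1)
    if needed > max_disks:
        raise ValueError(f"{needed} disks needed, but only {max_disks} can be attached")
    return needed


# Hypothetical workload: 2.5 TB database, 3,000 IOPS, assumed 500 IOPS per disk.
disk_count = data_disks_needed(2.5, 3000, per_disk_iops=500)  # 6
```

In this example IOPS, not capacity, determines the disk count, which mirrors the point above that IOPS and bandwidth usually drive the number of data disks.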

Disk cache settings in Azure virtual machines

In Azure virtual machines, data of persisted drives is cached locally to the host machine, thus bringing the data closer to the virtual machine. Azure disks (operating system and data) use a two-tier cache:

  • Frequently accessed data is stored in the memory (RAM) of the host machine.
  • Less recently accessed data is stored on the local hard disks of the host machine. There is cache space reserved for each virtual machine operating system and data disks based on the virtual machine size.

Cache settings help reduce the number of transactions against Azure Storage and can reduce disk I/O latency in some scenarios.

Azure disks support three different cache settings: Read Only, Read Write, and None (disabled). There is no configuration available to control the size of the cache.

  • Read Only: Reads and writes are cached for future reads. All writes are persisted directly to Azure Storage to prevent data loss or data corruption while still enabling read cache.
  • Read Write: Reads and writes are cached for future reads. Non-write-through writes are persisted to the local cache first, then lazily flushed to the Azure Blob service. For SQL Server, writes are always persisted to Azure Storage because it uses write-through. This cache setting offers the lowest disk latency for light workloads. It is recommended for the operating system and for data disks with sporadic disk access. Because the physical local disks are shared by all tenants on the host, you may observe increased latency when the total workload exceeds the performance of the physical host local disks.
  • None (disabled): Requests bypass the cache completely. All disk transfers are completed against Azure Storage. This cache setting prevents the physical host local disks from becoming a bottleneck. It offers the highest I/O rate for I/O intensive workloads. Note that I/O operations to Azure Storage do incur transaction costs but I/O operations to the local cache do not.

The following table demonstrates the supported disk cache modes:

Disk type | Read Only | Read Write | None (disabled)
Operating system disk | Supported | Default mode | Not supported
Data disk | Supported | Supported | Default mode

Note that temporary disks are not included in this table because they are not Azure disks, but are implemented using local attached storage.

Important notes:

  • Cache can be enabled for up to 4 data disks per virtual machine (VM).
  • Changes to caching require a VM restart to take effect.
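The rules in the table and notes above can be expressed as a small check, which may help when reviewing a planned disk layout. This is a minimal sketch: the mode labels ("ReadOnly", "ReadWrite", "None") and the function name are hypothetical strings chosen for this example, not values read from any Azure API.

```python
# Supported cache modes per disk type, taken from the table above.
SUPPORTED_MODES = {
    "os": {"ReadOnly", "ReadWrite"},
    "data": {"ReadOnly", "ReadWrite", "None"},
}
# Caching can be enabled on at most 4 data disks per VM (see notes above).
MAX_CACHED_DATA_DISKS = 4

def check_cache_plan(os_mode, data_modes):
    """Return a list of problems found in a planned cache configuration."""
    problems = []
    if os_mode not in SUPPORTED_MODES["os"]:
        problems.append(f"OS disk does not support cache mode {os_mode!r}")
    for lun, mode in enumerate(data_modes):
        if mode not in SUPPORTED_MODES["data"]:
            problems.append(f"data disk {lun}: unknown cache mode {mode!r}")
    cached = sum(1 for mode in data_modes if mode in ("ReadOnly", "ReadWrite"))
    if cached > MAX_CACHED_DATA_DISKS:
        problems.append(
            f"{cached} data disks have caching enabled; "
            f"the limit is {MAX_CACHED_DATA_DISKS}"
        )
    return problems

# Eight uncached data disks with the default OS disk setting: no problems.
print(check_cache_plan("ReadWrite", ["None"] * 8))  # []
```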

For specific recommendations and best practices for cache settings and performance behaviors, see the Azure virtual machine disks and cache settings section in this article.

Planning a virtual machine configuration in Azure for optimal SQL Server performance

Before you migrate your SQL Server workloads to Azure Infrastructure Services, consider the hardware, design, and architectural differences between your on-premises environment and Azure Infrastructure Services. Knowing these differences will help you identify which workloads are appropriate for this environment. The Microsoft Assessment and Planning (MAP) Toolkit can help you assess your current IT infrastructure for various platform migrations, including Azure Virtual Machines. You should also consider adopting an approach that is well suited to measure the performance of your SQL Server workloads in Azure Infrastructure Services.

Different performance characteristics and considerations for major sub-systems between the cloud and on-premises

I/O sub-system

Storage optimization for SQL Server transactional and analytical workloads is a very important task that requires careful planning and analysis. There is already a tremendous amount of information that explains how to deal with I/O subsystems with different performance characteristics in a traditional on-premises environment, such as spindles, host bus adapters (HBAs), disk controllers, and so on. For more information, see Analyzing I/O Characteristics and Sizing Storage Systems for SQL Server Database Applications.

Azure disks are implemented as a service, so they do not offer the same range of complex configuration options available in traditional on-premises I/O subsystems. This has both benefits and costs. For instance, Azure disks offer built-in local redundancy and optional geo-redundancy for disaster recovery through the use of replicas. To achieve the same level of redundancy in on-premises deployments, you would need to set up multiple disk arrays in multiple locations and a synchronization mechanism, such as storage area network (SAN) replication. On the other hand, Azure disk performance is not as predictable as that of an on-premises disk I/O subsystem due to several factors:

  • Azure Infrastructure Services is a shared, multi-tenant service. Resources like host machines, storage services and network bandwidth are shared among multiple subscribers.
  • Performance may vary depending upon where and when you provision your virtual machines due to a variety of factors including differences in hardware. Your virtual machine may get moved to a different host due to a hardware replacement necessitated by a failure or lifecycle refresh.
  • The performance and availability of your virtual machines may be impacted (positively or negatively) by maintenance operations such as platform upgrades or performance and reliability fixes.
  • When you use cloud-based storage options in Azure, you sacrifice granular control and deep performance optimization options for lower costs, simplicity and out-of-the-box redundancy.
  • Azure disks are connected to virtual machines through a network infrastructure, which can introduce higher latency than the locally attached disks of an on-premises environment.

For a detailed discussion on different storage configurations, see the Best practices and recommendations for optimizing SQL Server performance in Azure VMs section.

CPU and memory

Although Azure Storage introduces most of the differences between on-premises and Azure deployments, other system components, such as CPU and memory, need to be considered as well when you evaluate performance. In Azure, the only configurable option for CPU is the number of CPU cores assigned to a virtual machine deployment, currently from one shared core (A0) to sixteen dedicated cores (A9) in the Standard Tier, which is subject to increase or change in the future. These CPU cores may not be the same as the SKUs customers can find in expensive on-premises servers. This can lead to significant performance differences in CPU-bound workloads and needs to be taken into account during testing and baselining.

For memory, the current offering spans from 768 MB in A0 to 112 GB in an A9 Compute Intensive Instance in the Standard Tier. Again, when you compare the performance characteristics of an existing on-premises application, you need to consider the right virtual machine size and its other options to avoid any performance impact due to inadequate memory sizing for buffer pools or other internal memory structures.

Network

Network latency in Azure Infrastructure Services can be higher than that of a traditional on-premises environment, due to several reasons, such as virtualization, security, load balancers, and so on. This means that reducing network round trips between application layers in a cloud solution has a strong positive impact on the performance when compared to on-premises solutions.

For “chatty” applications, where communications between application layers and components are frequent, we recommend that you consolidate multiple application layers on the same virtual machine. This reduces the number of tiers and the amount of communication that your application needs, resulting in better performance.
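To see why round trips matter more in the cloud, consider the simple model below, in which the time spent on the network is the number of round trips multiplied by the per-trip latency. All latency values and call counts are hypothetical numbers chosen for illustration, not measurements of Azure.

```python
def network_time_ms(round_trips, latency_ms):
    """Time spent waiting on the network for one user request."""
    return round_trips * latency_ms

# Hypothetical request that issues 200 calls between application tiers.
on_premises = network_time_ms(200, 0.5)   # assumed 0.5 ms per round trip
cloud_chatty = network_time_ms(200, 2.0)  # assumed 2 ms per round trip
cloud_batched = network_time_ms(20, 2.0)  # same work batched into 20 calls

print(on_premises, cloud_chatty, cloud_batched)  # 100.0 400.0 40.0
```

Under these assumptions, batching calls recovers more than the extra latency costs, which is why reducing round trips is usually the first optimization to try.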

When this is not possible or even counterproductive, consider the following recommendations to optimize network communication:

Raw storage performance testing

To achieve optimal results in any complex storage environment, you need to follow a systematic, data-driven methodology. This section summarizes the approach that we use internally within Microsoft to develop best practices on how to configure Azure Infrastructure Services and SQL Server for various types of workloads.

To perform I/O benchmarking tests on different Azure disk configurations, we use the Microsoft SQLIO Disk Subsystem Benchmark Tool and analyze the various performance metrics as described in the Analyzing I/O Characteristics and Sizing Storage Systems for SQL Server Database Applications article through the various performance counters and dynamic management views (DMVs). The primary goal of these tests is to determine the I/O capacity of various configurations. Then, we compare these configurations with different SQL Server workloads (for example, OLTP, point lookup queries, analytical queries with aggregations) and I/O patterns. Our goal is to verify which configuration is most appropriate for each type of workload. The Best practices and recommendations for optimizing SQL Server performance in Azure VMs section contains our findings and recommendations based on these results.

Note that precise I/O performance depends on a number of factors, including data center capacity and network utilization, so you should consider these generalized findings as subject to change. We hope that they will be useful for capacity planning as you consider which SQL Server workloads you want to migrate to the Azure environment.

We recommend that you also run your own validation tests. You can see a detailed example of reusable scripts in the Appendix.

Best practices and recommendations for optimizing SQL Server performance in Azure VMs

Many of the same techniques used to optimize SQL Server performance in your on-premises environment can be used to tune your SQL Server workloads in Azure Infrastructure Services. Having said that, running your database workload in a hosted multi-tenant cloud service like Azure Infrastructure Services is fundamentally different and if you want to be successful you will need to consider some new best practices. This section provides new best practices and recommendations for optimizing SQL Server performance in Azure Infrastructure Services.

Virtual machine sizes

In general, smaller VM sizes are best suited for lightweight development and test systems or applications. For production and intensive workload deployments, bigger virtual machine sizes (such as A3 or high memory instances) are often a better choice because they can provide more capacity in terms of virtual cores, memory and data disks.

For SQL Server production workloads, we recommend that you use Standard Tier A2 as the minimum VM size, or bigger instances. Starting in May 2014, new VM sizes (A8 and A9) have been introduced with faster Intel Xeon processors and increased memory sizes. Based on various performance tests, these VMs provide important benefits in terms of CPU performance, I/O throughput, and bandwidth. If you plan to run very heavy SQL Server workloads in Azure Virtual Machines, we recommend that you consider these new VM sizes.

Azure virtual machine disks and cache settings

Azure Virtual Machines provide three types of disks: operating system (OS) disk, temporary disk, and data disks. For a description of each disk type, see section Azure Infrastructure services fundamentals in this article.

Operating system disk vs. data disk

When placing your data and log files you should consider disk cache settings in addition to size limits. For a description of cache settings, see section Azure Infrastructure services fundamentals in this article.

While the “Read Write” cache (the default setting) for the operating system disk helps improve overall operating system performance and boot times, and reduces read latency for the I/O patterns that the OS usually generates, we recommend that you do not use the OS disk for hosting system and user database files. Instead, we recommend that you use data disks. When the workload demands a high rate of random I/Os (such as a SQL Server OLTP workload) and throughput is important to you, the general guideline is to keep the cache set to the default value of “None” (disabled). Because Azure storage is capable of more IOPS than a direct attached storage disk, this setting causes the physical host local disks to be bypassed, therefore providing the highest I/O rate.

Temporary disk

Unlike Azure disks (operating system and data disks) which are essentially VHDs stored as page blobs in Azure Storage, the temporary disk (labeled as D:) is not persistent and is not implemented using Azure Storage. It is reserved by the operating system for the page file and its performance is not guaranteed to be predictable. Any data stored on it may be lost after your virtual machine is restarted or resized. Hence, we do not recommend the D: drive for storing any user or system database files, including tempdb.

Data disks performance options and considerations

This section discusses the best practices and recommendations on data disk performance options based on testing done by Microsoft. You should be familiar with how SQL Server I/O operations work in order to interpret the test results reported in this section. For more information, see Pages and Extents Architecture.

It is important to note that the results we provide in this section were achieved without SQL Server High Availability and Disaster Recovery Solutions enabled (such as AlwaysOn Availability Groups, database mirroring, or log shipping). We recommend that you deploy one of these features to maintain multiple redundant copies of your databases across at least two virtual machines in an availability set in order to be covered by the Azure Cloud Services, Virtual Machines, and Virtual Network Service Level Agreement. Enabling any of these features affects performance, so you should consider incorporating one of them in your own performance testing to get more accurate results.

As a general rule, we recommend that you attach the maximum number of disks allowed by the VM size (such as 16 data disks for an A7 VM) for throughput-sensitive applications. While latency may not necessarily improve by adding more data disks when your workload is within the maximum IOPS limit, the additional IOPS and bandwidth that you get from the additional disks can help you avoid reaching the single-disk limit of 500 IOPS. Note that reaching this limit might trigger throttling events that increase disk response times and disk latency.

Single data disk configuration

In our performance tests, we executed several SQL Server I/O measurements to understand data disk response characteristics with respect to the typical I/O patterns generated by SQL Server for different kinds of workloads. The results for a single disk configuration on an A7 VM instance are summarized here:

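Metric | Random I/O (8 KB pages), reads | Random I/O (8 KB pages), writes | Sequential I/O (64 KB extents), reads | Sequential I/O (64 KB extents), writes
IOPS | 500 | 500 | 500 | 300
Bandwidth | 4 MB/s | 4 MB/s | 30 MB/s | 20 MB/s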

Note: Because Azure Infrastructure Services is a multi-tenant environment, performance results may vary. You should consider these results as an indication of what you can achieve, but not a guarantee. We suggest you repeat these tests and measurements based on your specific workload.
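The IOPS and bandwidth rows in the single-disk table are related by the I/O size: bandwidth is approximately IOPS multiplied by the block size. The short sketch below checks the reported figures against that relationship. The function name is hypothetical, and the inputs are the values from the table.

```python
def bandwidth_mb_per_s(iops, block_kb):
    """Bandwidth implied by an IOPS rate at a given block size."""
    return iops * block_kb / 1024

# (pattern, IOPS, block size in KB, bandwidth reported in the table)
rows = [
    ("random read/write, 8 KB", 500, 8, 4),
    ("sequential read, 64 KB", 500, 64, 30),
    ("sequential write, 64 KB", 300, 64, 20),
]
for name, iops, block_kb, reported in rows:
    implied = bandwidth_mb_per_s(iops, block_kb)
    print(f"{name}: implied {implied:.1f} MB/s, reported {reported} MB/s")
```

The implied values (about 3.9, 31.3, and 18.8 MB/s) agree with the rounded figures in the table, so either row can be used for planning as long as the block size of the workload is known.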

Multiple data disk configuration

If your workload exceeds or is close to the I/O performance numbers mentioned in the previous section, we recommend that you add multiple disks (depending on your virtual machine size) and stripe multiple disks in volumes. This configuration gives you the ability to create volumes with specific throughput and bandwidth, based on your data and log performance needs by combining multiple data disks together.

Adding multiple data disks to Azure virtual machine

After you create a virtual machine in Azure, you can attach a data disk to it using either the Azure Management Portal or the Add-AzureDataDisk Azure PowerShell cmdlet. Both techniques allow you to select an existing data disk from a storage account, or create a new blank data disk.

If you choose to create a new blank data disk at the Management Portal, you can choose the storage account that your virtual machine was created in but not a different storage account.

To place your existing data disk (.vhd file) into a specific storage account, you need to use the Azure PowerShell cmdlets. The following example demonstrates how to update a virtual machine using the Get-AzureVM and the Add-AzureDataDisk cmdlets. The Get-AzureVM cmdlet retrieves information on a specific virtual machine. The Add-AzureDataDisk cmdlet creates a new data disk with specified size and label in a previously created Storage Account.

Get-AzureVM "CloudServiceName" -Name "VMName" | Add-AzureDataDisk -CreateNew -DiskSizeInGB 100 -MediaLocation "https://<storageaccount>.blob.core.windows.net/vmdisk/Disk1.vhd" -DiskLabel "disk1" -LUN 1 | Update-AzureVM

To create a new storage account, use the New-AzureStorageAccount cmdlet as follows:

New-AzureStorageAccount -StorageAccountName "StorageAccountX" -Label "StorageAccountX" -Location "North Central US"

For more information about Azure PowerShell cmdlets, see Azure PowerShell on MSDN and Azure command line tools.

Disk striping options for Azure Virtual Machines

For Azure VMs running on Windows Server 2008 R2 and previous releases, the only striping technology available is striped volumes for dynamic disks. You can use this option to stripe multiple data disks into volumes that provide more throughput and bandwidth than what a single disk can provide.

Starting with Windows Server 2012, Storage Pools are introduced and operating system software RAID capabilities are deprecated. Storage Pools enable you to virtualize storage by grouping industry-standard disks into “pools”, and then create virtual disks called Storage Spaces from the available capacity in the storage pools. You can then configure these virtual disks to provide striping capabilities across all disks in the pool, combining good performance characteristics. In addition, it enables you to add and remove disk space based on your needs.

During our tests, after adding a number of data disks (4, 8 and 16) as shown in the previous section, we created a new storage pool by using the following Windows PowerShell command:

New-StoragePool -FriendlyName StoragePool1 -StorageSubsystemFriendlyName "Storage Spaces*" -PhysicalDisks (Get-PhysicalDisk -CanPool $True)

Next, we created a virtual disk on top of the new storage pool and specified resiliency setting and virtual disk size.

$disks = Get-StoragePool -FriendlyName StoragePool1 -IsPrimordial $false | Get-PhysicalDisk

New-VirtualDisk -StoragePoolFriendlyName StoragePool1 -FriendlyName VirtualDisk1 -ResiliencySettingName Simple -NumberOfColumns $disks.Count -UseMaximumSize -Interleave 256KB

Important Note: For performance, it is very important that the -NumberOfColumns parameter is set to the number of disks used to create the underlying Storage Pool. Otherwise, I/O requests cannot be evenly distributed across all data disks in the pool and you will get suboptimal performance.

The -Interleave parameter enables you to specify the number of bytes written to each underlying data disk in a virtual disk. We recommend that you use 256 KB for all workloads.
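To make the roles of the column count and the interleave concrete, the sketch below uses a simplified striping model (plain round-robin, as in RAID 0) to map a logical byte offset to a disk. It is an illustration of the general technique and is not a description of the internal on-disk layout of Storage Spaces. The function name is hypothetical.

```python
def stripe_location(offset_bytes, columns, interleave_bytes=256 * 1024):
    """Map a logical offset to (disk index, offset on that disk).

    Simplified round-robin striping: each consecutive interleave-sized
    unit goes to the next disk (column), wrapping around after the last.
    """
    stripe_unit = offset_bytes // interleave_bytes
    disk = stripe_unit % columns
    row = stripe_unit // columns
    offset_on_disk = row * interleave_bytes + offset_bytes % interleave_bytes
    return disk, offset_on_disk

# A 1 MB sequential write issued as 64 KB requests on a 4-column volume
# lands on all four disks.
disks_touched = {stripe_location(o, columns=4)[0]
                 for o in range(0, 1024 * 1024, 64 * 1024)}
print(sorted(disks_touched))  # [0, 1, 2, 3]
```

In this model, a column count lower than the number of disks in the pool leaves some disks idle for any given stripe row, which is consistent with the recommendation above to match the two values.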

Lastly, we created and formatted the volume to make it usable to the operating system and applications by using the following Windows PowerShell commands:

Get-VirtualDisk -FriendlyName VirtualDisk1 | Get-Disk | Initialize-Disk -PassThru | New-Partition -AssignDriveLetter -UseMaximumSize | Format-Volume -AllocationUnitSize 64KB

Once the volume is created, it is possible to dynamically increase the disk capacity by attaching new data disks. To achieve optimal capacity utilization, consider the number of columns your storage spaces have and add disks in multiples of that number. See Windows Server Storage spaces Frequently Asked Questions for more information.

Using Storage Pools instead of traditional Windows operating system striping in dynamic disks brings several advantages in terms of performance and manageability. We recommend that you use Storage Pools for disk striping in Azure Virtual Machines.

During our internal testing, we implemented the following scenarios with different numbers of disks and different disk volume configurations. We tested each scenario with configurations of 4, 8, and 16 data disks, and, as expected, we observed increased IOPS for each data disk added:

  • We arranged multiple data disks as simple volumes and leveraged the Database Files and Filegroups feature of SQL Server to stripe database files across multiple volumes.
  • We used Windows Server Storage Pools to create larger volumes, which contain multiple data disks, and we placed database and log files inside these volumes.

It’s important to note that using multiple data disks provides performance benefits but creates more management overhead. In addition, partial unavailability of one of the striped disks can result in unavailability of a database. Therefore, for such configurations, we recommend that you consider enhancing the availability of your databases using high availability and disaster recovery capabilities of SQL Server as described in High Availability and Disaster Recovery for SQL Server in Azure Virtual Machines.

The following tables summarize the results of tests that we performed using multiple data disks configurations at Microsoft.

Aggregated throughput and bandwidth across 4 data disks

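Metric | Random I/O (8 KB pages), reads | Random I/O (8 KB pages), writes | Sequential I/O (64 KB extents), reads | Sequential I/O (64 KB extents), writes
IOPS | 2000 | 2000 | 1600 | 1200
Bandwidth | 16 MB/s | 16 MB/s | 100 MB/s | 75 MB/s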

Aggregated throughput and bandwidth across 8 data disks

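Metric | Random I/O (8 KB pages), reads | Random I/O (8 KB pages), writes | Sequential I/O (64 KB extents), reads | Sequential I/O (64 KB extents), writes
IOPS | 4000 | 4000 | 2400 | 2400
Bandwidth | 30 MB/s | 30 MB/s | 150 MB/s | 150 MB/s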

Aggregated throughput and bandwidth across 16 data disks

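Metric | Random I/O (8 KB pages), reads | Random I/O (8 KB pages), writes | Sequential I/O (64 KB extents), reads | Sequential I/O (64 KB extents), writes
IOPS | 8000 | 8000 | 2400 | 4000
Bandwidth | 60 MB/s | 60 MB/s | 150 MB/s | 250 MB/s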

Note: Because Azure Infrastructure Services is a shared, multi-tenant environment, performance results may vary. You should consider these results as an indication of what you can achieve, but not a guarantee. We recommend that you repeat these tests and measurements based on your specific workload.

By using the newly introduced Intel-based A8 and A9 VM sizes, we repeated our I/O performance tests and noticed a significant increase in bandwidth and throughput for larger sequential I/O requests. If you use Intel-based A8 and A9 VM sizes, you can get a performance increase for 64 KB (and bigger) read and write operations. If your workload is I/O intensive, these new VM sizes (A8 and A9) can help in achieving more linear scalability compared to smaller VM sizes, but always within the limit of 500 IOPS per disk. For more information, see About the A8 and A9 Compute Intensive Instances.

Based on our tests, we have made the following observations about the Azure Virtual Machine environment:

  • Spreading your I/O workload across a number of data disks benefits smaller random operations (more common in OLTP scenarios) where IOPS and bandwidth scale in a nearly linear fashion.
  • As the I/O block size increases, for read operations adding more data disks does not result in higher IOPS or bandwidth. This means that if your workload is read intensive with more analytical queries, adding more disks will not necessarily help.
  • For write intensive workload, adding more data disks can increase performance in a nearly linear fashion. This means that you can benefit from placing each transaction log for multiple databases on a separate data disk.
  • For large sequential I/O block sizes (such as, 64 KB or greater), writes generally perform better than reads.
  • A8 and A9 VM sizes provide increased throughput for IO sensitive workloads.
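These observations can be checked against the IOPS figures reported in the tables above. The sketch below computes a scaling efficiency for each disk count, defined here as the measured aggregate IOPS divided by the disk count times the single-disk IOPS. The dictionary keys and the function name are hypothetical; the numbers are the ones reported in this article.

```python
# Single-disk IOPS from the single data disk table.
SINGLE = {"random_read": 500, "random_write": 500,
          "seq_read": 500, "seq_write": 300}

# Aggregate IOPS from the 4, 8, and 16 data disk tables.
MEASURED = {
    4:  {"random_read": 2000, "random_write": 2000,
         "seq_read": 1600, "seq_write": 1200},
    8:  {"random_read": 4000, "random_write": 4000,
         "seq_read": 2400, "seq_write": 2400},
    16: {"random_read": 8000, "random_write": 8000,
         "seq_read": 2400, "seq_write": 4000},
}

def scaling_efficiency(disks, pattern):
    """1.0 means perfectly linear scaling from the single-disk result."""
    return MEASURED[disks][pattern] / (disks * SINGLE[pattern])

for pattern in SINGLE:
    ratios = [round(scaling_efficiency(d, pattern), 2) for d in (4, 8, 16)]
    print(pattern, ratios)
```

Random reads and writes stay at 1.0 for every disk count, sequential 64 KB reads fall from 0.8 to 0.3 as disks are added, and sequential writes stay close to linear, which matches the first four observations in the list.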

Placement of database files

Depending on how you configure your storage, you should place the data and log files for user and system databases accordingly to achieve your performance goals. This section provides guidance on how you should place database files when using SQL Server in Azure Virtual Machines:

  • Option 1: You can create a single striped volume using Windows Server Storage Spaces leveraging multiple data disks, and place all database and log files in this volume. In this scenario, all your database workload shares aggregated I/O throughput and bandwidth provided by these multiple disks, and you simplify the placement of database files. Individual database workloads are load balanced across all available disks, and you do not need to worry about single database spikes or workload distribution. You can find the graphical representation of this configuration below:

  • Option 2: You can create multiple striped volumes, each composed of the number of data disks required to achieve specific I/O performance, and carefully place user and system database files on these volumes accordingly. You may have one important production database with a write-intensive workload that has high priority, and you may want to maximize the database and log file throughput by segregating them on two separate 4-disk volumes (each volume providing around 2000 IOPS and 100 MB/sec). For example, use:
  • 4-disks volume for hosting TempDB data and log files.
  • 4-disks volume for hosting other minor databases.

This option can give you precise file placement by optimizing available IO performance. You can find the graphical representation of this configuration below:

You can still create single disk volumes and leverage SQL Server files and filegroups placement for your databases. While this can still offer some benefits in terms of flexible storage layout organization, it introduces additional complexity and also limits the I/O performance of a single file (data or log) to what a single Azure data disk can provide, such as 500 IOPS and 60 MB/sec.

Although Azure data disks have different behaviors than traditional rotating spindles (in which competing random and sequential operations on the same disks can impact performance), we still recommend that you keep data and log files in different paths to achieve dedicated IOPS and bandwidth for them.

To help understand your I/O requirements and performance while running your SQL Server workloads on Azure Virtual Machines, you need to analyze the output of the following three tools and combine the results carefully:

  • SQL Server IO statistics: They reflect the database management system view of the IO subsystem.
  • Windows Server Logical Disk Performance Counters: They show how the operating system performs on IOs.
  • Azure Storage Analytics: Azure hosts data disks’ VHD files in Azure Storage. You can turn on logging and metrics for the storage account that hosts your data disks, and get useful information such as the number of successful and failed requests, timeout, throttling, network, authorization, and other errors. You can configure and get data from these metrics on the Azure Portal, or via PowerShell, REST APIs, and .NET Storage Client library.

By leveraging all this information, you can understand:

  • If your IO related stalls or wait types in SQL Server (manifesting as increased disk response times in OS Perf Counters) are related to throttling events happening in Azure Storage. And,
  • If rebalancing your data and log files across different volumes (and underlying disks) can help maintain throughput and bandwidth within storage performance limits.
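One way to combine these sources is sketched below. It derives an average I/O latency from two snapshots of cumulative counters that use the column names of the SQL Server sys.dm_io_virtual_file_stats dynamic management function (num_of_reads, io_stall_read_ms, and their write equivalents), and then compares it with a throttled-request count taken from Azure Storage Analytics for the same interval. The function names, the 20 ms threshold, and the sample numbers are hypothetical assumptions for illustration.

```python
def average_latency_ms(before, after, kind="read"):
    """Average stall per I/O between two cumulative counter snapshots.

    before/after are dicts keyed like sys.dm_io_virtual_file_stats
    columns, e.g. num_of_reads and io_stall_read_ms.
    """
    ios = after[f"num_of_{kind}s"] - before[f"num_of_{kind}s"]
    stall = after[f"io_stall_{kind}_ms"] - before[f"io_stall_{kind}_ms"]
    return stall / ios if ios else 0.0

def io_diagnosis(latency_ms, throttled_requests, threshold_ms=20.0):
    """Rough triage; threshold_ms is an assumed value, tune it per workload."""
    if latency_ms <= threshold_ms:
        return "latency within threshold"
    if throttled_requests > 0:
        return "high latency with storage throttling: consider rebalancing files"
    return "high latency without throttling: investigate other causes"

# Hypothetical snapshots taken a few minutes apart for one data file.
before = {"num_of_reads": 1000, "io_stall_read_ms": 5000}
after = {"num_of_reads": 3000, "io_stall_read_ms": 65000}
latency = average_latency_ms(before, after)
print(latency, io_diagnosis(latency, throttled_requests=12))
```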

TempDB

As mentioned in section Azure virtual machine disks and cache settings, we recommend that you place tempDB on data disks instead of the temporary disk (D:). Following are the three primary reasons for this recommendation based on our internal testing with SQL Server test workloads.

  • Performance variance: In our testing, we noticed that you can get the same level of performance you get on D:, if not more IOPS, from a single data disk. However, the performance of D: drive is not guaranteed to be as predictable as the operating system or data disk. This is because the size of the D: drive and the performance you get from it depends on the size of the virtual machine you use, and the underlying physical disks shared between all VMs hosted by the same server.
  • Configuration upon VM downtime: If the virtual machine is shut down (for planned or unplanned reasons), the service account under which the SQL Server service is started needs local administrator privileges in order for SQL Server to recreate tempDB on the D: drive. In addition, the common practice with on-premises SQL deployments is to keep database and log files (including tempDB) in a separate folder, in which case the folder needs to be created before SQL Server starts. For most customers, this extra reconfiguration overhead is not worth the return.
  • Performance bottleneck: If you place tempDB on the D: drive and your application workloads use tempDB heavily, this can cause a performance bottleneck because the D: drive can introduce constraints in terms of IOPS throughput. Instead, place tempDB on data disks to gain more flexibility. For more information on configuration best practices for optimizing tempDB, see Compilation of SQL Server TempDB IO Best Practices.

We strongly recommend that you perform your own workload testing before implementing a desired SQL Server file layout strategy.

Effects of warm-up on data disks

With Azure disks, we have observed a “warm-up effect” that can result in a reduced rate of throughput and bandwidth for a short period of time. In situations where a data disk is not accessed for a period of time (approximately 20 minutes), adaptive partitioning and load balancing mechanisms kick in. If the disk is accessed while these algorithms are active, you may notice some degradation in throughput and bandwidth for a short period of time (approximately 10 minutes), after which they return to their normal levels. This warm-up effect happens because of the adaptive partitioning and load balancing mechanism of Azure, which dynamically adjusts to workload changes in a multi-tenant storage environment. You may observe similar effects in other widely known cloud storage systems as well. For more information, see Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency.

This warm-up effect is unlikely to be noticed for systems that are in continuous use. But we recommend you consider it during performance testing or when accessing systems that have been inactive for a while.
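When you benchmark a disk that has been idle, one simple way to account for this effect is to exclude the first minutes of samples from the averages. The sketch below assumes samples are collected as (elapsed seconds, IOPS) pairs and uses the approximate 10-minute warm-up period mentioned above as the cut-off. The function names and sample values are hypothetical.

```python
def drop_warm_up(samples, warm_up_seconds=600):
    """Keep only samples taken after the assumed warm-up period."""
    return [(t, iops) for t, iops in samples if t >= warm_up_seconds]

def mean_iops(samples):
    return sum(iops for _, iops in samples) / len(samples)

# Hypothetical samples from a test started on a disk that had been idle.
samples = [(0, 300), (300, 350), (600, 500), (900, 500)]
steady = drop_warm_up(samples)
print(mean_iops(samples), mean_iops(steady))  # 412.5 500.0
```

Reporting both numbers makes it clear how much of a low result comes from warm-up rather than from the steady-state capability of the configuration.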

Single vs. multiple storage accounts for data disks attached to a single VM

To simplify management and reduce the risk of inconsistency in case of failures, we recommend that you leave all the data disks attached to a single virtual machine in the same storage account. Storage accounts are implemented as a recovery unit in case of failures, so keeping all the disks in the same account makes recovery operations simple. There is no performance improvement if you store data disks attached to a single VM in multiple storage accounts. If you have multiple VMs, we recommend that you consider the storage account limits for throughput and bandwidth during capacity planning. In addition, distribute VMs and their data disks to multiple storage accounts if the aggregated throughput or bandwidth is higher than what a single storage account can provide. For information on storage account limits, see Azure Storage Scalability and Performance Targets. For information on max IOPS per disk, see Virtual Machine and Cloud Service Sizes for Azure.
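The capacity planning step for multiple VMs can be sketched as follows. The calculation keeps every disk of a VM in one storage account, as recommended above, and works out how many accounts are needed so that no account exceeds its IOPS limit. The account limit is passed in as a parameter because it must come from the current Azure Storage scalability targets; the value of 20,000 used in the example is a placeholder assumption, and the function name is hypothetical.

```python
import math

def storage_accounts_needed(vm_count, disks_per_vm, account_iops_limit,
                            iops_per_disk=500):
    """Number of storage accounts when each VM's disks share one account."""
    iops_per_vm = disks_per_vm * iops_per_disk
    vms_per_account = account_iops_limit // iops_per_vm
    if vms_per_account == 0:
        raise ValueError("one VM alone exceeds the storage account limit")
    return math.ceil(vm_count / vms_per_account)

# Five VMs with 16 data disks each, assuming a 20,000 IOPS account limit:
# each VM can drive 8,000 IOPS, so two VMs fit per account.
print(storage_accounts_needed(5, 16, account_iops_limit=20000))  # 3
```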

NTFS allocation unit size

NTFS volumes use a default cluster size of 4 KB. Based on our performance tests, we recommend changing the default cluster size to 64 KB during volume creation for both single disk and multiple disks (storage spaces) volumes.

Data compression for I/O bound workloads

Some I/O intensive workloads can gain performance benefits through data compression. Compressed tables and indexes store more data in fewer pages and hence require reading fewer pages from disk, which in turn can improve the performance of workloads that are I/O intensive.
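The effect on I/O can be approximated with simple arithmetic: the number of 8 KB pages a scan must read shrinks in proportion to the compression ratio. The sketch below ignores page header overhead and partially filled pages, and the 2:1 ratio in the example is a hypothetical value, since real ratios depend on the data. The function name is also hypothetical.

```python
import math

PAGE_BYTES = 8 * 1024  # SQL Server page size

def pages_to_read(data_bytes, compression_ratio=1.0):
    """Approximate pages read by a full scan of data_bytes of row data."""
    return math.ceil(data_bytes / compression_ratio / PAGE_BYTES)

table_bytes = 10 * 1024**3  # hypothetical 10 GB table
print(pages_to_read(table_bytes))                         # 1310720
print(pages_to_read(table_bytes, compression_ratio=2.0))  # 655360
```

Fewer pages means fewer I/O requests against the data disks, at the cost of the extra CPU time needed to decompress the data.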

For a data warehouse workload running on SQL Server in Azure VM, we found significant improvement in query performance by using page compression on tables and indexes, as shown in Figure 1.

Figure 1: Query Performance with Data Compression

Figure 1 compares performance of one query with no compression (NONE) and page compression (PAGE). As illustrated, the logical and physical reads are significantly reduced with page compression, and so is the elapsed time. As expected, CPU time of the query does go up with page compression, because SQL Server needs to decompress the data while returning results to the query. Your results will vary, depending upon your workload.

For an OLTP workload, we observed significant improvements in throughput (as measured by business transactions per second) by using page compression on selected tables and indexes that were involved in the I/O intensive workload. Figure 2 compares the throughput and CPU usage for the OLTP workload with and without page compression.

Figure 2: OLTP Throughput and CPU Usage with Data Compression

Note that you may see different results when you test your workloads in Azure Virtual Machine environment. But we recommend that you test data compression techniques for I/O intensive workloads and then decide which tables and indexes to compress. For more information, see Data Compression: Strategy, Capacity Planning and Best Practices.
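As a minimal sketch of how such a test might start, the following T-SQL first estimates the savings from page compression and then rebuilds one table and one of its indexes with page compression. The table dbo.FactSales and the index IX_FactSales_DateKey are hypothetical names used only for illustration, and the sketch assumes an edition of SQL Server that supports data compression.

```sql
-- Hypothetical object names: replace dbo.FactSales and IX_FactSales_DateKey
-- with a table and index from your own I/O intensive workload.

-- Estimate the space that page compression would save, without changing anything.
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'FactSales',
    @index_id         = NULL,    -- NULL = all indexes on the table
    @partition_number = NULL,    -- NULL = all partitions
    @data_compression = 'PAGE';
GO

-- Rebuild the table (heap or clustered index) with page compression.
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);
GO

-- Rebuild a nonclustered index with page compression.
ALTER INDEX IX_FactSales_DateKey ON dbo.FactSales
    REBUILD WITH (DATA_COMPRESSION = PAGE);
GO
```

Rebuilds are themselves I/O and CPU intensive, so it is usually safer to run them in a maintenance window and to compare reads, elapsed time, and CPU time before and after.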

Restore performance – instant file initialization

For databases of any significant size, enabling instant file initialization can improve the performance of some operations involving database files, such as creating a database or restoring a database, adding files to a database or extending the size of an existing file, autogrow, and so on. For information, see How and Why to Enable Instant File Initialization.

To take advantage of instant file initialization, you grant the SQL Server (MSSQLSERVER) service account the SE_MANAGE_VOLUME_NAME privilege by adding it to the Perform Volume Maintenance Tasks security policy. If you are using a SQL Server platform image for Azure, the default service account (NT Service\MSSQLSERVER) isn’t added to the Perform Volume Maintenance Tasks security policy. In other words, instant file initialization is not enabled in a SQL Server Azure platform image.

After adding the SQL Server service account to the Perform Volume Maintenance Tasks security policy, restart the SQL Server service.

The following figure illustrates observed test results for creating and restoring a 100 GB database with and without instant file initialization.

Figure 3: Performance Impact of Instant File Initialization

For more information, see Database File Initialization.
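After the restart, one way to confirm the setting is to query the services DMV, as in the sketch below. It assumes a SQL Server build in which sys.dm_server_services exposes the instant_file_initialization_enabled column. Older builds do not have this column, and the query fails on them.

```sql
-- Assumption: this build of SQL Server exposes instant_file_initialization_enabled.
SELECT servicename,
       service_account,
       instant_file_initialization_enabled   -- 'Y' when instant file initialization is on
FROM sys.dm_server_services
WHERE servicename LIKE 'SQL Server (%';      -- Database Engine service only
```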

Other existing best practices

Many of the best practices when running SQL Server on premises are still relevant in Azure Virtual Machines, including:

  • Limit or disable autogrow on the database: Autogrow is considered to be merely a contingency for unexpected growth. Do not manage your data and log growth on a day-to-day basis with autogrow. If autogrow is used, pre-grow the file using the Size switch.
  • Disable autoshrink on the database: Make sure autoshrink is disabled to avoid unnecessary overhead that can negatively affect performance. For more information about autogrow and autoshrink, see Considerations for the “autogrow” and “autoshrink” settings in SQL Server.
  • Establish locked pages to reduce IO and any paging activities: Lock pages in memory is a Windows policy that determines which accounts can use a process to keep memory allocations pinned in physical memory. It prevents the system from paging the data to virtual memory on disk. When the SQL Server service account is granted this user right, buffer pool memory cannot be paged out by Windows. For more information about enabling the Lock pages in memory user right, see How to: Enable the Lock Pages in Memory Option (Windows).
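
The practices above can be checked from T-SQL. The following is a sketch rather than a complete audit: the database name SalesDb, the logical file name SalesDb_data, and the 50 GB target size are hypothetical values chosen only for illustration.

```sql
-- Databases that still have autoshrink enabled.
SELECT name AS database_name
FROM sys.databases
WHERE is_auto_shrink_on = 1;

-- Current size and autogrow setting of every database file.
SELECT DB_NAME(database_id) AS database_name,
       name AS logical_file_name,
       CAST(size AS bigint) * 8 / 1024 AS size_mb,   -- size is stored in 8-KB pages
       CASE WHEN growth = 0 THEN 'autogrow disabled'
            WHEN is_percent_growth = 1 THEN CAST(growth AS varchar(10)) + ' percent'
            ELSE CAST(CAST(growth AS bigint) * 8 / 1024 AS varchar(20)) + ' MB'
       END AS autogrow_increment
FROM sys.master_files;

-- Hypothetical names and size: disable autoshrink and pre-grow a data file.
-- The new SIZE must be larger than the current file size.
ALTER DATABASE SalesDb SET AUTO_SHRINK OFF;
ALTER DATABASE SalesDb MODIFY FILE (NAME = SalesDb_data, SIZE = 50GB);

-- A nonzero value suggests that Lock pages in memory is in effect.
SELECT locked_page_allocations_kb
FROM sys.dm_os_process_memory;
```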

Performance troubleshooting fundamental concepts

This section provides an overview of how you troubleshoot SQL Server when running on an Azure virtual machine.

Traditional factors that govern performance in SQL Server

Performance analysis in SQL Server is well documented. The Troubleshooting Performance Problems in SQL Server 2008 article provides comprehensive information on this subject. You can also find other advanced performance-related blogs and articles on topics such as latch and spinlock analysis at SQLCAT.com. Here, let’s summarize the main performance factors:

  • Plan change/plan choice issues:
    • SQL Server Query Optimizer searches and chooses the optimal plan that can improve the system performance. The query optimizer’s choices are usually correct, but there are cases where it can be suboptimal. This can be caused by a number of factors, including out-of-date input data (such as statistics), poor index coverage and, in some cases, issues with the optimizer model or functionality. Query tuning and appropriate index creation and management can help you tune your I/O, CPU, and memory usage and optimize your workload performance.
    • As discussed previously, running SQL Server in an Azure virtual machine has additional implications for the performance of your workload. Due to the platform’s multi-tenant nature and its various capping and throttling mechanisms, you might notice different I/O performance results over time. That’s why we recommend that you minimize the number of reads and writes that your queries require by tuning query plans, and that you also apply SQL Server performance optimization techniques, such as compression. For more information about the Optimizer, see the Optimizer chapter in the book SQL Server 2008 Internals or the upcoming SQL Server 2012 Internals.
  • Improperly configured software or hardware: One common cause of I/O bottleneck is when I/O intensive files, such as data and log files for a high throughput database, are placed on the operating system disk drive. As a general rule, we recommend that you add more data disks and balance your data and log files according to your IOPS requirements when using SQL Server in Azure Virtual Machines.
  • Locking and latching: Excessive locking and latching can be caused by plan choice, by concurrency volumes due to overall system throughput, or by certain schema and workload patterns. Locking is used to allow multiple users to make consistent changes to the same data in a controlled manner. Latching is used to adjudicate multi-user access to structures within SQL Server. Latching is somewhat orthogonal to locking; both issues can apply and the core resolution is often similar. If there is a hot latch or lock, the typical answer is to change the calling pattern (either by forcing different plan shapes or rearranging the calling code) to mitigate the impact of the blocking latch or lock on overall throughput. If you observe high PAGEIOLATCH waits on your system, this means that a user requested a page that is not in the buffer pool and the I/O system is slow to respond. In this case, we recommend that you spread your I/O workload across more data disks or gain more memory by increasing the virtual machine instance size to improve the performance within the Azure virtual machine.
  • Multi-user operations and system resource usage: Some user operations may not run, or they may run at a reduced level of performance because of insufficient system resources. Some types of maintenance and schema management operations can increase resource utilization. For example, index rebuild operations can cause increased I/O workload on your system. You may think that your application’s queries or operations are taking longer than usual, but the underlying cause is pressure on one or more resources. We recommend that you consider the resource usage of both the user application’s queries and the maintenance tasks in your capacity planning. Especially for I/O intensive operations, we recommend that you choose your virtual machine size appropriately and plan your maintenance windows to avoid your application’s peak periods.
  • Checkpoints and system operations: Flushing I/Os to disk during the checkpoint process can cause a spike in I/O operations. This might slow the performance of queries and impact throughput. It is important to determine your I/O workload and test for a period longer than the checkpoint frequency.
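
As a sketch of how the misplaced-file case in the list above might be detected, the following query lists user database files that sit on the operating system disk or the temporary disk. It assumes the default Azure virtual machine layout, in which C: is the operating system disk and D: is the temporary disk; adjust the drive letters if your layout differs.

```sql
-- Assumption: C: is the operating system disk and D: is the temporary disk.
SELECT DB_NAME(database_id) AS database_name,
       type_desc,                 -- ROWS (data file) or LOG (log file)
       physical_name
FROM sys.master_files
WHERE database_id > 4             -- skip master, tempdb, model, and msdb
  AND (physical_name LIKE 'C:\%' OR physical_name LIKE 'D:\%');
```

Any rows returned are candidates for moving to attached data disks. The filter skips the system databases, so review tempdb placement separately.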

Factors that govern performance for SQL Server in Azure Virtual Machines

Although the SQL Server binaries running in traditional on-premises environments and in Microsoft Azure virtual machine environments are identical, there are some infrastructure differences that affect the way applications perform on SQL Server in Azure Virtual Machines compared to an on-premises dedicated enterprise server. A useful analogy for running SQL Server in an Azure virtual machine is an on-premises virtualized SQL Server environment with dedicated virtualized resources (such as processor and memory) but no Hyper-V over-commit of memory or processors. In addition, I/O performance varies in Microsoft Azure storage because it is a managed shared disk environment, and performance at a point in time depends on other tenants’ activity and overall system load; this can happen in shared on-premises storage environments as well.

All the traditional SQL Server performance factors described in the previous section are still applicable when running SQL Server in an Azure virtual machine environment. The following table summarizes the most common resource bottleneck issues and provides a list of actions that you can take to resolve them.

Issue: SQL Server CPU at or near 80%

  • Key performance indicators (KPIs) to monitor: % Processor Time (_Total); SOS_SCHEDULER_YIELD waits
  • Actions to consider:
    • Increase number of CPUs by increasing your SQL Server virtual machine instance size (if possible).
    • Identify top consuming queries and tune.
    • Split out workload (for example, move a database off the SQL Server instance).

Issue: Near I/O capacity limits or I/O latency increases

  • Key performance indicators (KPIs) to monitor: Average disk reads per second; Average disk writes per second; Disk reads per second; Disk writes per second; sys.dm_io_virtual_file_stats; PAGEIOLATCH waits; SQL Server: Buffer Manager\Page Life Expectancy
  • Actions to consider:
    • Check whether the Page Life Expectancy counter value is low (<300 seconds). This can indicate that memory pressure on the system is causing increased disk I/O. Consider increasing instance size (if possible).
    • Identify which database and log files have an I/O bottleneck.
    • Add more data disks and separate data files if you are at or near IOPS limits per disk. Note: This can apply to any user created or tempdb databases.
    • Tune queries to reduce reads and writes.
    • Consider enabling row or page compression to reduce the number of I/Os.

Issue: Memory resource pressure

  • Key performance indicators (KPIs) to monitor: Memory: Available Bytes; Memory: Pages per second; SQL Server: Buffer Manager\Page Life Expectancy; Process: Working Set (for SQL Server); RESOURCE_SEMAPHORE waits
  • Actions to consider:
    • Check max server memory setting for SQL Server.
    • Increase memory by increasing instance size or use a high memory instance if possible.
    • Check which component of SQL Server utilizes memory (such as CLR, high memory grants for application queries, and so on) and tune appropriately.

As described in the table, you can resolve performance issues by following different approaches and actions. In a traditional on-premises environment, you might prefer adding or purchasing more hardware resources to alleviate performance problems. In the Azure environment, the resources that are available per machine are fewer and less powerful than those of typical on-premises enterprise servers. For example, adding a data disk may increase disk throughput by 25 percent, whereas tuning query workloads might reduce the overall I/O requirement by 90 percent in some cases. Therefore, we recommend that you always follow a systematic approach that involves analyzing, tuning, and redesigning to achieve better performance results.

Optimizing your application for the Azure Virtual Machine environment will provide valuable cost benefits because you can achieve a higher density per unit of compute. Unlike the traditional on-premises environment, Azure allows you to reduce the number and size of virtual machines immediately to reduce operational costs. In addition, you can dynamically re-balance the size of machines based on seasonal usage peaks.

It is important that you develop a systematic monitoring and troubleshooting methodology to effectively run your SQL Server in Azure Virtual Machine environment.
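
As a small starting point for such a methodology, two of the checks from the table above (Page Life Expectancy and the max server memory setting) can be read directly from T-SQL. This is a sketch only. The 300-second figure quoted in the table is a rule of thumb, not a hard limit.

```sql
-- Page Life Expectancy, in seconds, as reported by the Buffer Manager object.
SELECT RTRIM([object_name]) AS [object_name],
       RTRIM(counter_name)  AS counter_name,
       cntr_value           AS page_life_expectancy_seconds
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Buffer Manager%'
  AND counter_name = 'Page life expectancy';

-- Current max server memory setting, in MB.
SELECT value_in_use AS max_server_memory_mb
FROM sys.configurations
WHERE name = 'max server memory (MB)';
```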

Performance monitoring methodology for SQL Server in Azure Virtual Machine

This section explains the approaches that can be used to identify and isolate performance issues while running SQL Server in Azure Virtual Machine environment:

  • Define key performance indicators (KPIs) to monitor resource utilization: We recommend that you define KPIs for SQL Server and each important application tier. These should include Windows Performance Monitor (Performance Monitor) counters for your application’s important tiers and components as well as SQL Server. In addition, you should monitor SQL Server’s performance related dynamic management views (DMVs) to identify the underlying causes of performance problems. We recommend that you define five to eight KPIs for each major application entity or component, such as SQL Server, application-related counters, and caching component counters. For more information, see the Guidance on defining key performance indicators section in the Appendix.
  • Monitor your KPIs to track resource utilization: We recommend that you track and monitor KPIs by using tools such as Performance Monitor and SQL Server Management Studio. If application requests increase by a certain percentage, the proportional increase in the underlying system resources depends on how uniform the workload is. Make sure you track a latency metric, particularly for web applications running on SQL Server in an Azure virtual machine.
  • Examine trends and patterns to identify issues as the workload increases: As the number of users of your application increases, some system resources (such as processor, I/O, memory, and network) might come under pressure. For example, if the SQL Server workload reaches its sustainable I/O limits, it becomes I/O bound. When this happens, the latency of the I/O subsystem increases first. Therefore, you may notice a corresponding increase in individual query execution times, which increases latency for end users. Secondly, throughput, such as the number of concurrent user queries that SQL Server can support, begins to level off or decrease. Finally, if the pressure on a specific resource increases, you may notice unavailability issues, such as query and application timeouts. By monitoring Performance Monitor and the sys.dm_os_wait_stats DMV, you can identify the potential performance problems before end users notice them. For example, this scenario would initially cause an increase in disk response time as measured by the logical disk performance counters, and an increase in the number of PAGEIOLATCH or log related waits that SQL Server reports.
  • Monitor DMVs to determine what resources your application is competing and waiting for:
    • Monitor and identify instance level waits by using sys.dm_os_wait_stats. Make sure to review each wait type’s wait time as a percentage of the total wait time. The following table describes the wait profiles that are indicative of Page Latch contention.

Other common profiles depend on the predominant wait type and include: SOS_SCHEDULER_YIELD, which indicates that the operation is waiting on the CPU to become free; long PAGEIOLATCH waits, which indicate that SQL Server is generating I/O requests faster than the I/O system can process them; and RESOURCE_SEMAPHORE, which indicates memory pressure. See the Snapshot wait stats script provided in the Appendix. It demonstrates how to automate the calculation of waits during a time interval. This is useful for identifying the dominant underlying wait in SQL Server when symptoms appear with your workload.

    • Monitor query resource consumers by using sys.dm_exec_query_stats to identify top resource consumers and monitor query execution and efficiency
    • Monitor I/O consumptions and characteristics by using sys.dm_io_virtual_file_stats and the logical disk Performance Monitor counters. The following table summarizes the key performance counters to monitor.

Logical disk counters: Disk reads / second; Disk writes / second
  • Typical storage term: IOPS
  • Suggested actions in Microsoft Azure virtual machine environment: Measure the number of I/Os per second. Consider adding more data disks in line with your IOPS requirements.

Logical disk counters: Average disk sec / read; Average disk sec / write
  • Typical storage term: Latency
  • Suggested actions in Microsoft Azure virtual machine environment: Measure disk latency. Note: Numbers might vary; look at averages over time.

Logical disk counters: Average disk bytes / read; Average disk bytes / write
  • Typical storage term: Block size
  • Suggested actions in Microsoft Azure virtual machine environment: Measure the size of I/Os being issued. Note: Larger I/Os tend to have higher latency, such as those associated with BACKUP/RESTORE.

Logical disk counters: Average / current disk queue length
  • Typical storage term: Outstanding or waiting IOPS
  • Suggested actions in Microsoft Azure virtual machine environment: Provides insight into the application’s I/O pattern.

Logical disk counters: Disk read bytes/sec; Disk write bytes/sec
  • Typical storage term: Throughput or aggregate throughput
  • Suggested actions in Microsoft Azure virtual machine environment: Measure of total disk throughput. Note: Ideally, larger block scans should be able to heavily utilize connection bandwidth (for example, your throughput can be higher with a smaller number of larger IOPS).

    • Take snapshots of currently executing SQL Server requests by using sys.dm_exec_requests to check for locking, blocking, latching, and other performance issues caused by resource contention.
    • Monitor the application and SQL Server event logs to identify errors.
  • Use a delta approach to monitor beyond DMVs: Some DMVs provide cumulative information from the last time that the SQL Server process was started, such as sys.dm_os_wait_stats. Others contain a snapshot from a point in time, such as sys.dm_exec_requests. The performance of a query is affected by factors including plan selection and resource availability. An effective approach that works in on-premises and Azure Virtual Machine environments is to combine the usage of sys.dm_os_wait_stats and sys.dm_exec_query_stats. This helps you to understand the query performance and resource constraints, which can be inferred from the system-wait information. To identify locking and blocking issues, you should routinely monitor active request information using sys.dm_exec_requests. In summary, we recommend that you:
    • Take regular periodic snapshots of query stats, wait stats, and exec requests.
    • Calculate the delta between two snapshots to understand what happened during that period. You can use the sample script Snapshot wait stats in the Appendix for this purpose.
  • Monitor Spinlock and Backoff events: Spinlocks are used to synchronize access to key memory regions that SQL Server uses for internal constructs. Use sys.dm_os_spinlock_stats to monitor the number of spins. For more information, see the Diagnosing and Resolving Spinlock Contention on SQL Server article.
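
The instance-level wait review described in the list above can be sketched as a single query that reports each wait type’s share of the total wait time. The excluded wait types are a subset of the benign waits filtered out by the Snapshot wait stats script in the Appendix. The percentages are relative to the waits that remain after that filter, and the figures are cumulative since the SQL Server process last started.

```sql
-- Top waits since the last restart, with each wait type's share of total wait time.
SELECT TOP 10
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       CAST(100.0 * wait_time_ms / SUM(wait_time_ms) OVER ()
            AS decimal(5, 2)) AS pct_of_total_wait_time
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0
  AND wait_type NOT IN ('LAZYWRITER_SLEEP', 'SQLTRACE_BUFFER_FLUSH', 'SLEEP_TASK',
                        'WAITFOR', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT',
                        'XE_DISPATCHER_WAIT', 'BROKER_TO_FLUSH', 'BROKER_TASK_STOP',
                        'CLR_AUTO_EVENT', 'FT_IFTS_SCHEDULER_IDLE_WAIT',
                        'SQLTRACE_INCREMENTAL_FLUSH_SLEEP')
ORDER BY wait_time_ms DESC;
```

Because the values are cumulative, this view is best for a first impression. For a specific time window, use the delta approach described above.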

Appendix

The scripts and guidance in this section support the information given in this article:

  • Guidance on defining key performance indicators (KPIs): Provides information on KPIs that you can use to monitor application tier performance and user characteristics while running SQL Server in Azure.
  • Raw storage performance testing scripts: Discusses the usage of SQLIO.
  • Snapshot wait stats script: Demonstrates how to automate the calculation of waits during a time interval.
  • Requests executing on the system script: Provides a view of requests executing on SQL Server sorted by total elapsed time, includes the corresponding plan for each statement.
  • Top query statements and plan by total CPU time script: Provides information from Query Stats on the top overall resource consumers.
  • Snapshot spinlock stats script: Uses a temporary table to provide the delta of spinlock information since the last execution of this script.
  • Snapshot I/O stats script: Uses a temporary table to provide the delta of I/O information since the last execution of this script.

Raw storage performance testing scripts

You can run the following SQLIO scripts in Azure Virtual Machine to generate some of the most common I/O patterns that SQL Server utilizes and measure related performance results on different storage configurations.

The following SQLIO test scripts demonstrate testing random 8K reads/writes (a typical OLTP I/O pattern), sequential writes for log files, and large sequential reads and writes for table scans and OLAP workloads.

The script uses the following options:

    • The -k option to specify the I/O operation type (read or write)
    • The -s option to specify the test duration in seconds
    • The -f option to specify the type of I/O access (sequential or random)
    • The -o option to specify the number of outstanding requests
    • The -b option to specify the size of the I/O request (block size) in KB
    • The -LS option to capture disk latency information
    • The -F option to specify the name of the parameter file that lists the test files to run SQLIO against

Copy and save the following script in a file called exectests.bat.

::Test random 8K reads/writes
sqlio -kW -s300 -frandom -o32 -b8 -LS -Fparam.txt
sqlio -kR -s300 -frandom -o32 -b8 -LS -Fparam.txt
::Test random 32K reads/writes (for example, SQL Server Analysis Services I/O pattern)
sqlio -kW -s300 -frandom -o32 -b32 -LS -Fparamnew.txt
sqlio -kR -s300 -frandom -o32 -b32 -LS -Fparamnew.txt
::Test small sequential writes
sqlio -kW -s180 -fsequential -o1 -b4 -LS -Fparam.txt
sqlio -kW -s180 -fsequential -o1 -b8 -LS -Fparam.txt
sqlio -kW -s180 -fsequential -o1 -b16 -LS -Fparam.txt
sqlio -kW -s180 -fsequential -o1 -b32 -LS -Fparam.txt
sqlio -kW -s180 -fsequential -o1 -b64 -LS -Fparam.txt
::Test large sequential reads/writes
sqlio -kR -s180 -fsequential -o8 -b8 -LS -Fparam.txt
sqlio -kR -s180 -fsequential -o8 -b64 -LS -Fparam.txt
sqlio -kR -s180 -fsequential -o8 -b128 -LS -Fparam.txt
sqlio -kR -s180 -fsequential -o8 -b256 -LS -Fparam.txt
sqlio -kR -s180 -fsequential -o8 -b512 -LS -Fparam.txt
sqlio -kW -s180 -fsequential -o8 -b8 -LS -Fparam.txt
sqlio -kW -s180 -fsequential -o8 -b64 -LS -Fparam.txt
sqlio -kW -s180 -fsequential -o8 -b128 -LS -Fparam.txt
sqlio -kW -s180 -fsequential -o8 -b256 -LS -Fparam.txt
sqlio -kW -s180 -fsequential -o8 -b512 -LS -Fparam.txt

The following is a copy of the param.txt configuration file that we referenced in our test scripts. Basically, the test scripts run a series of tests against the drive or drives specified in the param.txt file. This file should reside in the same directory as SQLIO.exe. The options on each line of the param.txt file are as follows, where 0x0 is a mask value:

PathToTestFile NumberofThreadsPerTestFile 0x0 TestFileSizeinMegaBytes

This param.txt file skips the operating system and temporary disks (C: and D:) and tests a volume made up of 16 data disks (F:), using 1 thread per disk, with a test file size of 50 GB.

#c:\testfile.dat 2 0x0 100
#d:\testfile.dat 1 0x0 1000
f:\testfile.dat 16 0x0 50000

Open a Command Prompt window, then run the test batch file as follows.

exectests.bat > results.txt

This operation captures all test results in a text file that can be processed manually or automatically to extract relevant disk performance figures later.

Guidance on defining key performance indicators

We recommend that you define key performance indicators (KPIs) for all of the components in your application scenario. Typically, you can use performance counters to define your KPIs.

This section lists KPIs that we’ve used to monitor application tier performance and user characteristics while running SQL Server in Azure. Note that the KPIs listed in this section are meant to be guidance only. You should define your KPIs based on the characteristics of your application’s components.

Typical SQL Server KPIs:

  • Maximum value for \Process(sqlservr)\% Processor Time
  • Average value for \Process(sqlservr)\% Processor Time
  • Maximum value for \Processor(_Total)\% Processor Time
  • Average value for \Processor(_Total)\% Processor Time
  • Maximum value for \SQLServer:SQL Statistics\Batch Requests/sec
  • Average value for \SQLServer:SQL Statistics\Batch Requests/sec

If your workload is likely to be I/O bound, you should also add the Logical Disk Performance Monitor counters referenced earlier. If you use additional SQL Server features, such as AlwaysOn, it is recommended that you add the appropriate performance monitor counters.
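
If Performance Monitor is not at hand, the Batch Requests/sec KPI can also be approximated from T-SQL. The sketch below relies on the fact that per-second counters in sys.dm_os_performance_counters are exposed as cumulative values, so the rate has to be derived from two samples. The 10-second interval is an arbitrary choice for illustration.

```sql
DECLARE @first bigint, @second bigint;

-- First sample of the cumulative counter.
SELECT @first = cntr_value
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%SQL Statistics%'
  AND counter_name = 'Batch Requests/sec';

WAITFOR DELAY '00:00:10';   -- sample interval: 10 seconds (arbitrary)

-- Second sample of the same counter.
SELECT @second = cntr_value
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%SQL Statistics%'
  AND counter_name = 'Batch Requests/sec';

-- Average rate over the interval.
SELECT (@second - @first) / 10.0 AS batch_requests_per_sec;
```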

Typical web application tier KPIs:

  • Maximum value for \ASP.NET Applications(__Total__)\Requests/Sec
  • Average value for \ASP.NET Applications(__Total__)\Requests/Sec
  • Average value for \Memory\Available MBytes
  • Maximum value for \Processor(_Total)\% Processor Time
  • Average value for \Processor(_Total)\% Processor Time
  • Average value for \ASP.NET\Request Wait Time
  • Average value for \ASP.NET\Request Execution Time
  • Average value for \ASP.NET\Requests Queued
  • Average value for \ASP.NET\Requests Rejected
  • Average value for \ASP.NET\Requests Current

Typical user/test characteristics KPIs:

  • Number of concurrent users
  • Average/Max request execution time
  • Number of web servers
  • Ramp up period, test method
  • Start and end time of test

How to use KPIs

First, identify five to eight KPIs for each application component, such as SQL Server and the application tier, and a similar number for the test characteristics KPIs. The KPIs you choose should be a subset of your overall performance counter collection. Note that these KPIs represent the overall performance of the application during different time intervals. In a production environment, you may notice that KPIs return different information during the day. We recommend that you calculate and analyze KPIs regularly by using tools such as Excel. For example, you should be able to make assertions such as “When we scale from one application server to two application servers, the CPU utilization on my SQL Server increases by 2.5x”. In this example, you could continue investigating with detailed Performance Monitor logs and DMV stats to understand why SQL Server CPU utilization has increased by 2.5x, and whether this is a normal characteristic of your workload or an issue that you need to investigate.

You can automate the collection of the raw performance logs and the calculation of the KPIs by using the Logman and Log Parser tools.

SQL Server troubleshooting scripts

Snapshot wait stats script

/* Snapshot the current wait stats and store them so that they can be compared over a time period.
Return the statistics between this point in time and the last collection point in time.
*/
use tempdb
go
if exists (select * from sys.objects where name = 'snap_waits')
    drop procedure snap_waits
go
create procedure snap_waits
as
declare @current_snap_time datetime
declare @previous_snap_time datetime
set @current_snap_time = GETDATE()
if not exists(select name from tempdb.sys.sysobjects where name like 'wait_stats%')
    create table wait_stats
    (
    wait_type varchar(128)
    ,waiting_tasks_count bigint
    ,wait_time_ms bigint
    ,avg_wait_time_ms int
    ,max_wait_time_ms bigint
    ,signal_wait_time_ms bigint
    ,avg_signal_wait_time int
    ,snap_time datetime
    )
insert into wait_stats (
    wait_type
    ,waiting_tasks_count
    ,wait_time_ms
    ,max_wait_time_ms
    ,signal_wait_time_ms
    ,snap_time
)
select
    wait_type
    ,waiting_tasks_count
    ,wait_time_ms
    ,max_wait_time_ms
    ,signal_wait_time_ms
    ,@current_snap_time
from sys.dm_os_wait_stats
--get the previous collection point
select top 1 @previous_snap_time = snap_time from wait_stats
where snap_time < (select max(snap_time) from wait_stats)
order by snap_time desc
--get delta in the wait stats
select top 10
    s.wait_type
    , (e.wait_time_ms - s.wait_time_ms)/((e.waiting_tasks_count - s.waiting_tasks_count)) as [avg_wait_time_ms]
    ,(e.signal_wait_time_ms - s.signal_wait_time_ms)/((e.waiting_tasks_count - s.waiting_tasks_count)) as [avg_signal_time_ms]
    , (e.waiting_tasks_count - s.waiting_tasks_count) as [waiting_tasks_count]
    , (e.wait_time_ms - s.wait_time_ms) as [wait_time_ms]
    , (e.max_wait_time_ms) as [max_wait_time_ms]
    , (e.signal_wait_time_ms - s.signal_wait_time_ms) as [signal_wait_time_ms]
    , s.snap_time as [start_time]
    , e.snap_time as [end_time]
    , DATEDIFF(ss, s.snap_time, e.snap_time) as [seconds_in_sample]
from wait_stats e
inner join (
    select * from wait_stats
    where snap_time = @previous_snap_time
) s on (s.wait_type = e.wait_type)
where
    e.snap_time = @current_snap_time
    and s.snap_time = @previous_snap_time
    and e.wait_time_ms - s.wait_time_ms > 0
    and e.waiting_tasks_count - s.waiting_tasks_count > 0
    and e.wait_type NOT IN ('LAZYWRITER_SLEEP', 'SQLTRACE_BUFFER_FLUSH'
    , 'SOS_SCHEDULER_YIELD', 'DBMIRRORING_CMD', 'BROKER_TASK_STOP'
    , 'CLR_AUTO_EVENT', 'BROKER_RECEIVE_WAITFOR', 'WAITFOR'
    , 'SLEEP_TASK', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT'
    , 'FT_IFTS_SCHEDULER_IDLE_WAIT', 'BROKER_TO_FLUSH', 'XE_DISPATCHER_WAIT'
    , 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP')
order by (e.wait_time_ms - s.wait_time_ms) desc

--clean up table
delete from wait_stats
where snap_time < @current_snap_time
go
exec snap_waits

Requests executing on the system script

/* requests executing on the system
*****************************************************************/
select r.session_id
    ,r.blocking_session_id
    ,r.wait_type
    ,r.wait_time
    ,r.wait_resource
    ,r.status
    ,r.cpu_time
    ,r.total_elapsed_time
    ,r.reads
    ,s.reads [session reads]
    ,s.logical_reads [session logical reads]
    ,r.writes
    ,r.logical_reads
    --,r.scheduler_id
    ,s.host_name
    ,qt.dbid
    ,qt.objectid
    ,substring(substring(qt.text,r.statement_start_offset/2+1,
        (case when r.statement_end_offset = -1
        then len(convert(nvarchar(max), qt.text)) * 2
        else r.statement_end_offset end - r.statement_start_offset)/2)
        , 1, 255) --substring
        as statement_text
    --,qp.query_plan
from sys.dm_exec_requests r
inner join sys.dm_exec_sessions s on (s.session_id = r.session_id)
cross apply sys.dm_exec_sql_text(r.sql_handle) as qt
cross apply sys.dm_exec_query_plan (r.plan_handle) as qp
where r.session_id > 50
--and r.session_id = 55
order by r.total_elapsed_time desc --r.status -- r.scheduler_id, r.status, r.session_id

Top query statements and plan by total CPU time

/* Top statements by total CPU time
************************************************/
SELECT TOP 25
    SUBSTRING(qt.text,qs.statement_start_offset/2+1,
        (case when qs.statement_end_offset = -1
        then len(convert(nvarchar(max), qt.text)) * 2
        else qs.statement_end_offset end - qs.statement_start_offset)/2)
        as statement_text,
    --substring (qt.text , 1, 512) as batch_text,
    qs.total_worker_time/qs.execution_count as average_cpu_time,
    qs.total_elapsed_time/qs.execution_count as average_elapsed_time,
    qs.total_logical_reads/qs.execution_count as average_logical_reads,
    qs.total_logical_writes/qs.execution_count as average_logical_writes,
    qs.execution_count,
    qs.plan_generation_num,
    qs.total_worker_time,
    qs.total_elapsed_time,
    cast((cast(qs.total_worker_time as decimal) / cast(qs.total_elapsed_time as decimal) * 100) as int) as cpu_vs_elapsed_percentage,
    qs.total_logical_reads,
    qs.total_logical_writes,
    db_name(qt.dbid) as [database name],
    qs.plan_handle,
    qt.objectid
    ,qp.query_plan
FROM sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as qt
cross apply sys.dm_exec_query_plan(qs.plan_handle) as qp
--order by qs.total_logical_reads/qs.execution_count desc
--ORDER BY total_logical_reads desc
--ORDER BY total_elapsed_time desc
ORDER BY total_worker_time desc

Snapshot spinlock stats

–115982764 (with lock)

–31250708

/* Snapshot the current spinlock stats and store so that this can be compared over a time period

Return the statistics between this point in time and the last collection point in time.*/

use tempdb

go

if exists (select * from sys.objects where name = 'snap_spins')

drop procedure snap_spins

go

create procedure snap_spins

as

declare @current_snap_time datetime

declare @previous_snap_time datetime

set @current_snap_time = GETDATE()

if not exists(select name from tempdb.sys.sysobjects where name like 'spin_waits%')

create table spin_waits

(

lock_name varchar(128)

,collisions bigint

,spins bigint

,sleep_time bigint

,backoffs bigint

,snap_time datetime

 

)

--capture the current stats

insert into spin_waits

(

lock_name

,collisions

,spins

,sleep_time

,backoffs

,snap_time

)

select name

,collisions

,spins

,sleep_time

,backoffs

,@current_snap_time

from sys.dm_os_spinlock_stats

 

select top 1 @previous_snap_time = snap_time from spin_waits

where snap_time < (select max(snap_time) from spin_waits)

order by snap_time desc

--get delta in the spin locks stats

select top 10

spins_current.lock_name

, (spins_current.spins - spins_previous.spins) as spins

,(spins_current.backoffs - spins_previous.backoffs) as backoffs

, (spins_current.collisions - spins_previous.collisions) as collisions

, (spins_current.sleep_time - spins_previous.sleep_time) as sleep_time

, spins_previous.snap_time as [start_time]

, spins_current.snap_time as [end_time]

, DATEDIFF(ss, @previous_snap_time, @current_snap_time) as [seconds_in_sample]

from spin_waits spins_current

inner join (

select * from spin_waits

where snap_time = @previous_snap_time

) spins_previous on (spins_previous.lock_name = spins_current.lock_name)

where

spins_current.snap_time = @current_snap_time

and spins_previous.snap_time = @previous_snap_time

and spins_current.spins > 0

order by (spins_current.spins - spins_previous.spins) desc

 

--clean up table

delete from spin_waits

where snap_time < @current_snap_time

go

exec snap_spins

Snapshot I/O stats

/* Snapshot the current file stats and store them so that they can be compared over a time period

Return the statistics between this point in time and the last collection point in time.

 

This uses a persisted table in tempdb. After the needed data is captured, drop this table.

use tempdb

go

drop table _iostats_

 

*/

use tempdb

go

if not exists(select name from tempdb.sys.sysobjects where name like '_iostats_')

create table _iostats_

(

database_id int

,file_id int

,file_guid uniqueidentifier

,num_of_bytes_read bigint

,num_of_bytes_written bigint

,num_of_reads bigint

,num_of_writes bigint

,io_stall_write_ms bigint

,io_stall_read_ms bigint

,size_on_disk_bytes bigint

,physical_name nvarchar(260)

,type_desc nvarchar(60)

,snap_time datetime

)

declare @current_snap_time datetime

declare @previous_snap_time datetime

set @current_snap_time = GETDATE()

insert into _iostats_ (

database_id

,file_id

,file_guid

,num_of_bytes_read

,num_of_bytes_written

,num_of_reads

,num_of_writes

,io_stall_write_ms

,io_stall_read_ms

,size_on_disk_bytes

,physical_name

,type_desc

,snap_time

)

select

vfs.database_id

,vfs.file_id

,mf.file_guid

,vfs.num_of_bytes_read

,vfs.num_of_bytes_written

,vfs.num_of_reads

,vfs.num_of_writes

,vfs.io_stall_write_ms

,vfs.io_stall_read_ms

,vfs.size_on_disk_bytes

,mf.physical_name

,mf.type_desc

,@current_snap_time

from sys.dm_io_virtual_file_stats(null, null) as vfs

join sys.master_files as mf on vfs.database_id = mf.database_id and vfs.file_id = mf.file_id

where vfs.database_id > 4 or vfs.database_id = 2

order by vfs.database_id, vfs.file_id

select top 1 @previous_snap_time = snap_time from _iostats_

where snap_time < (

select max(snap_time) from _iostats_) order by snap_time desc

print 'Current snap time: ' + convert(varchar(32), @current_snap_time)

print 'Previous snap time: ' + convert(varchar(32), @previous_snap_time)

declare @tick_count_between_snaps bigint

set @tick_count_between_snaps = DATEDIFF(ms, @previous_snap_time, @current_snap_time)

print '@tick_count_between_snaps: ' + convert(varchar(32), @tick_count_between_snaps)

select

time_in_sample_secs = DATEDIFF(s, @previous_snap_time, @current_snap_time)

,db = db_name(iostats_now.database_id)

,iostats_now.physical_name

,iostats_now.type_desc

 

,iostats_now.file_id

,(iostats_now.num_of_bytes_read - iostats_lastsnap.num_of_bytes_read) num_of_bytes_read

,(iostats_now.num_of_bytes_written - iostats_lastsnap.num_of_bytes_written) num_of_bytes_written

,(iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) num_of_reads

,(iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) num_of_writes

 

,avg_read_IOPs = case when (iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) = 0 then 0 else ((iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) /(@tick_count_between_snaps/1000)) end

,avg_read_bytes_sec = case when (iostats_now.num_of_bytes_read - iostats_lastsnap.num_of_bytes_read) = 0 then 0 else ((iostats_now.num_of_bytes_read - iostats_lastsnap.num_of_bytes_read)/(@tick_count_between_snaps/1000)) end

,avg_read_stall_ms = case when (iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) = 0 then 0 else ((iostats_now.io_stall_read_ms - iostats_lastsnap.io_stall_read_ms) /(iostats_now.num_of_reads - iostats_lastsnap.num_of_reads)) end

,avg_read_size = case when (iostats_now.num_of_reads - iostats_lastsnap.num_of_reads) = 0 then 0 else ((iostats_now.num_of_bytes_read - iostats_lastsnap.num_of_bytes_read)/(iostats_now.num_of_reads - iostats_lastsnap.num_of_reads)) end

 

,avg_write_IOPs = case when (iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) = 0 then 0 else ((iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) /(@tick_count_between_snaps/1000)) end

,avg_write_bytes_sec = case when (iostats_now.num_of_bytes_written - iostats_lastsnap.num_of_bytes_written) = 0 then 0 else ((iostats_now.num_of_bytes_written - iostats_lastsnap.num_of_bytes_written)/(@tick_count_between_snaps/1000)) end

,avg_write_stall_ms = case when (iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) = 0 then 0 else ((iostats_now.io_stall_write_ms - iostats_lastsnap.io_stall_write_ms) /(iostats_now.num_of_writes - iostats_lastsnap.num_of_writes)) end

,avg_write_size = case when (iostats_now.num_of_writes - iostats_lastsnap.num_of_writes) = 0 then 0 else ((iostats_now.num_of_bytes_written - iostats_lastsnap.num_of_bytes_written)/(iostats_now.num_of_writes - iostats_lastsnap.num_of_writes)) end

 

,iostats_now.size_on_disk_bytes

,filegrowth = iostats_now.size_on_disk_bytes - iostats_lastsnap.size_on_disk_bytes

 

,iostats_now.file_guid

 

from _iostats_ as iostats_now

inner join

(select * from _iostats_

where snap_time = @previous_snap_time)

iostats_lastsnap on ( --iostats_lastsnap.file_guid = iostats_now.file_guid

iostats_lastsnap.file_id = iostats_now.file_id AND iostats_lastsnap.database_id = iostats_now.database_id

)

 

where (iostats_now.database_id > 4 or iostats_now.database_id = 2)

and iostats_now.snap_time = @current_snap_time

and iostats_lastsnap.snap_time = @previous_snap_time

order by iostats_now.database_id asc, iostats_now.file_id asc

--clean up

delete from _iostats_

where snap_time = @previous_snap_time

--drop table _iostats_
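
The final SELECT in the script above derives its per-file averages by subtracting the previous snapshot from the current one, then dividing either by the elapsed seconds or by the number of operations. The Python sketch below reproduces that delta arithmetic for the read columns only. The class name, function name, and sample numbers are hypothetical, and the extra guard for a sample shorter than one second is an addition that the script itself does not have.

```python
from dataclasses import dataclass


@dataclass
class FileSnapshot:
    """Cumulative read counters for one file (a subset of the _iostats_ columns)."""
    num_of_reads: int
    num_of_bytes_read: int
    io_stall_read_ms: int


def read_deltas(prev: FileSnapshot, now: FileSnapshot, elapsed_ms: int) -> dict:
    """Return read averages for the interval between two snapshots."""
    reads = now.num_of_reads - prev.num_of_reads
    bytes_read = now.num_of_bytes_read - prev.num_of_bytes_read
    stall_ms = now.io_stall_read_ms - prev.io_stall_read_ms
    # Whole seconds, matching the integer division @tick_count_between_snaps/1000.
    seconds = elapsed_ms // 1000

    if reads == 0:
        # Same rule as the CASE expressions: no reads means every average is 0.
        return {
            "avg_read_IOPs": 0,
            "avg_read_bytes_sec": 0,
            "avg_read_stall_ms": 0,
            "avg_read_size": 0,
        }
    return {
        # The "if seconds" guard is an assumption added here to avoid dividing by zero.
        "avg_read_IOPs": reads // seconds if seconds else 0,
        "avg_read_bytes_sec": bytes_read // seconds if seconds else 0,
        "avg_read_stall_ms": stall_ms // reads,
        "avg_read_size": bytes_read // reads,
    }


if __name__ == "__main__":
    previous = FileSnapshot(1000, 8_192_000, 5000)
    current = FileSnapshot(1600, 13_107_200, 8000)
    print(read_deltas(previous, current, elapsed_ms=60_000))
```

Because the counters are cumulative since the instance started, only the difference between two snapshots describes the workload during the sample window.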

Performance monitor

In addition to the scripts provided earlier, you should capture the following performance monitor counters over the same time period.

\LogicalDisk(*)\Avg. Disk Bytes/Read

\LogicalDisk(*)\Avg. Disk Bytes/Write

\LogicalDisk(*)\Avg. Disk Queue Length

\LogicalDisk(*)\Avg. Disk sec/Read

\LogicalDisk(*)\Avg. Disk sec/Write

\LogicalDisk(*)\Current Disk Queue Length

\LogicalDisk(*)\Disk Read Bytes/sec

\LogicalDisk(*)\Disk Reads/sec

\LogicalDisk(*)\Disk Write Bytes/sec

\LogicalDisk(*)\Disk Writes/sec

\Memory\Available MBytes

\Memory\Free System Page Table Entries

\Memory\Pages/sec

\Network Interface(*)\*

\Process(*)\*

\Processor(*)\% Privileged Time

\Processor(*)\% Processor Time

\SQLServer:Availability Replica(*)\*

\SQLServer:Access Methods\*

\SQLServer:Buffer Manager\*

\SQLServer:Buffer Node\*

\SQLServer:Databases(*)\*

\SQLServer:Database Replica(*)\*

\SQLServer:General Statistics\*

\SQLServer:Latches\*

\SQLServer:Locks(_Total)\Average Wait Time (ms)

\SQLServer:Locks(_Total)\Lock Requests/sec

\SQLServer:Memory Manager\*

\SQLServer:Plan Cache\*

\SQLServer:Wait Statistics\*

\SQLServer:SQL Statistics\*

\System\Context Switches/sec

\System\Processor Queue Length

Note that we also recommend that you capture all replication counters when using replication.

The logman utility can be used to create the Performance Monitor counter set. Copy the counter definitions listed above into a file, such as counters.txt, and then run the following at the command prompt:

logman create counter <counterfilename> -cf counters.txt
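
As a rough illustration of that step, the Python sketch below builds the contents of a counter file (one counter path per line) and the matching logman argument list. Only a few of the counters from the list above are included, and the collection name sqlperf and both function names are hypothetical. The sketch only prints the command; it does not run logman.

```python
# A short subset of the counters listed above; raw strings keep the backslashes intact.
COUNTERS = [
    r"\LogicalDisk(*)\Avg. Disk sec/Read",
    r"\LogicalDisk(*)\Avg. Disk sec/Write",
    r"\Processor(*)\% Processor Time",
    r"\SQLServer:Buffer Manager\*",
]


def counter_file_text(counters: list) -> str:
    """Return the counter file contents: one counter path per line."""
    return "\n".join(counters) + "\n"


def logman_command(collection_name: str, counter_file: str) -> list:
    """Build the argument list for 'logman create counter <name> -cf <file>'."""
    return ["logman", "create", "counter", collection_name, "-cf", counter_file]


if __name__ == "__main__":
    print(counter_file_text(COUNTERS), end="")
    # "sqlperf" is a made-up collection name for this example.
    print(" ".join(logman_command("sqlperf", "counters.txt")))
```

Keeping the counter list in one place makes it easier to capture the same set on every server being compared.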

Reverse DNS in Azure: Quick Tips on How to Set Up Azure DNS

Reverse DNS in Azure: What Are Reverse DNS Records?

A reverse DNS (rDNS) record maps an IP address back to a host name, which is the opposite of a forward A record. Reverse DNS records are used in a variety of situations to weakly authenticate the caller, and they are widely used by email servers and by any application or server that sends transactional email. For example, reverse DNS records help combat email spam: the receiver verifies that the message was sent from a host that has a reverse DNS record and, optionally, that the host is recognized as one authorized to send email for the originating domain.
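
To make the lookup concrete, the Python sketch below shows the PTR name that a reverse lookup queries for a given IP address, and one common form of the weak authentication check described above, often called forward-confirmed reverse DNS: resolve the IP to a host name, resolve that host name back to addresses, and confirm the original IP is among them. The function names and the sample address are illustrative assumptions, and the check handles IPv4 only. The lookup functions can be swapped out, so the logic can be tried without network access.

```python
import ipaddress
import socket


def ptr_name(ip: str) -> str:
    """Return the DNS name that holds the PTR record for this IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


def forward_confirmed(ip: str, reverse_lookup=None, forward_lookup=None) -> bool:
    """Check that the IP's reverse name resolves back to the same IP (IPv4 only)."""
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses).
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, ipv4_addresses).
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        addresses = forward_lookup(host)
    except OSError:
        # No PTR record, or the host name does not resolve.
        return False
    return ip in addresses


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-only address used as a placeholder.
    print(ptr_name("203.0.113.10"))  # prints: 10.113.0.203.in-addr.arpa
```

A mail server that fails this kind of check is not necessarily malicious, which is why the text calls the authentication weak.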

How to set up reverse DNS in Azure

Step 1: Log in to the Azure portal with your credentials.

Step 2: Identify the virtual machine whose IP address needs reverse DNS.

Azure > Virtual machines

Step 3: Make the public IP address static and give the server a DNS name, such as mail.centralus.cloudapp.azure.com.

Go to your virtual machine > Network interface > select the network adapter's properties > click Overview > click the public IP address that you want to make static and assign reverse DNS to.

Step 4: Set up the static IP address and its DNS name.

Click Configuration, set Assignment to Static, and enter a name in the DNS name label field, such as mail (or whatever you prefer).

Step 5: Set up an A record or a CNAME record at your domain registrar.

If you choose an A record, create the record and point it to your Azure IP address, or
if you choose a CNAME record, create the record and point its value to mail.centralus.cloudapp.azure.com (depending on the value you set in step 4).
Note: Wait until DNS replication at your domain registrar is complete (this can take anywhere from less than five minutes to 24 hours, depending on your DNS provider).

Step 6: Once DNS replication is complete at your domain registrar, set up reverse DNS using Azure PowerShell.
Open Windows Azure PowerShell.
In PowerShell, log in to your account with this command:
Login-AzureRMAccount
An Azure login window opens; enter your username and password to log in.

Step 7: You are almost done. Look up the resource group name and the name of the public IP address resource in the Azure portal.


Step 8: Run the following commands to set up reverse DNS in Azure.

Note: Run the commands in sequence.
$ipName = "xxxxmedhahosting-ip"
$locName = "Central US"
$rgName = "xxxxmedhahosting"
$rvFqdn = "mail.yourdomainname.com"
$dnLabel = "yourazurednslabel"
New-AzureRmPublicIpAddress -Name $ipName -ResourceGroupName $rgName -Location $locName -ReverseFqdn $rvFqdn -AllocationMethod Static -DomainNameLabel $dnLabel
That's it: reverse DNS is now set up for your Azure IP address.
