End-to-End Analytics with Azure Synapse Analytics – Modern Data Warehouses in Azure

Azure Synapse Analytics is an enterprise analytics system that integrates multiple services that serve analytical workloads in a single environment. Through the Azure Synapse workspace, users can leverage the following services to build a modern data warehouse solution:

  • Synapse Studio is a unified environment where users can manage all components of the Azure Synapse Analytics ecosystem. The following tasks can be performed with Synapse Studio:
    • Build ETL and ELT workflows that can be automated to run at predetermined times or after specific events.
    • Configure and deploy dedicated SQL, Apache Spark, and Data Explorer pools.
    • Develop SQL, Spark, or KQL code to analyze data with SQL, Spark, or Data Explorer pools.
    • Monitor resource utilization, query performance, and user access across SQL, Spark, and Data Explorer pools.
    • Integrate with CI/CD and data catalog services such as Azure DevOps and Azure Purview.
  • Dedicated SQL pools are analytical data stores that use a scale-out, massively parallel processing (MPP) architecture to effectively manage several petabytes of data. Storage and compute are decoupled, allowing users to easily scale compute power without having to move data. Azure Synapse workspaces can have one or more dedicated SQL pools.
  • Serverless SQL pool is a serverless query service that allows analysts to use T-SQL to interactively query files in Azure Storage. It does not have local storage or ingestion capabilities. Every Azure Synapse workspace comes with exactly one serverless SQL pool endpoint, named “Built-in,” which cannot be deleted.
  • Apache Spark pools are managed, open-source Apache Spark clusters in the Azure Synapse ecosystem. Users can set the number of compute nodes in a cluster, with an option to automatically scale clusters up and down based on the workload. Cluster nodes can be configured with predefined node sizes, ranging from Small (4 vCores, 32 GB of memory) to XXXLarge (80 vCores, 504 GB of memory). With Synapse notebooks, data engineers can use an Apache Spark pool to analyze data with Python, SQL, R, Scala, Java, or .NET code. More information about Azure Synapse Analytics Apache Spark pools can be found at https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-pool-configurations.
  • Synapse pipelines are orchestration workflows that define a set of actions to perform on data. This service has the same functionality as ADF but is available through the Azure Synapse workspace, making it a better fit for users who want to manage their analytical data stores, data engineering activities, and orchestration pipelines from the same environment. The concepts covered previously in this chapter for ADF also apply to Azure Synapse pipelines.
  • Synapse Link is a hybrid transactional and analytical processing (HTAP) tool that enables users to run near real-time analytical queries over transactional data. With Azure Synapse Link, users do not need to build complex ETL workflows that move data from a transactional data store to an analytical one. Instead, Synapse Link synchronizes data from transactional data stores like Azure Cosmos DB and Azure SQL Database with a column-oriented analytical data store that can be explored with the Azure Synapse Analytics serverless SQL pool or an Azure Synapse Analytics Apache Spark pool. More information about Azure Synapse Link can be found at https://docs.microsoft.com/en-us/azure/cosmos-db/synapse-link.
  • Data Explorer pools are optimized for telemetry analytics. Azure Synapse Data Explorer automatically indexes the free-text and semi-structured data found in telemetry, such as logs and time series data. The concepts covered previously in this chapter for Azure Data Explorer also apply to Azure Synapse Data Explorer.
  • Power BI is a reporting service that can be used to develop dashboards, reports, and datasets for self-service BI. Azure Synapse Analytics allows users to connect a Power BI workspace to an Azure Synapse Analytics workspace for a seamless development experience. This provides analysts with a single environment for analyzing data, developing insightful reports, and sharing those reports with business users. Power BI workspaces will be described in further detail in Chapter 6, “Reporting with Power BI.”
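
To make the serverless SQL pool description more concrete, the following Python sketch builds the kind of T-SQL statement an analyst might run against the “Built-in” pool to read Parquet files in place with OPENROWSET. It is only an illustration under stated assumptions: the helper name build_openrowset_query and the storage account, container, and folder in the URL are hypothetical, and the sketch only produces the query text rather than connecting to a workspace.

```python
def build_openrowset_query(file_url: str, top: int = 10) -> str:
    """Return a T-SQL query that reads Parquet files in place via OPENROWSET.

    Hypothetical helper for illustration; it only builds the query text.
    """
    if top < 1:
        raise ValueError("top must be a positive integer")
    # Double any single quotes so the URL stays inside the T-SQL string literal.
    safe_url = file_url.replace("'", "''")
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{safe_url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


# Hypothetical storage account, container, and folder; replace with real values.
query = build_openrowset_query(
    "https://examplelake.dfs.core.windows.net/sales/orders/*.parquet", top=5
)
print(query)
```

Because the serverless SQL pool has no local storage, a query like this reads the files directly from the data lake each time it runs. The resulting text could be pasted into a SQL script in Synapse Studio or sent through any SQL client connected to the workspace's serverless endpoint.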

As you can see, Azure Synapse Analytics allows users to leverage several different technologies to build modern data warehouse solutions in the same environment. The following sections describe how to get started with Azure Synapse Analytics, including how to deploy a workspace and how to navigate Synapse Studio. Afterward, we will examine the two categories of SQL pools, dedicated and serverless, and when to use them.
