Azure Data Factory
Azure Data Factory is a platform as a service (PaaS) ETL technology that orchestrates data movement and data transformation activities. Using Azure Data Factory’s native data store connectors, users can quickly build connections to on-premises and cloud data stores. Developers can then use those established connections, called linked services, to build datasets that are used in Azure Data Factory pipelines. Pipelines consist of activities that process datasets and store the results in formats that data science and reporting applications can use.
One of the core components of Azure Data Factory is its Copy Data activity. Developers can use this activity to move large amounts of data from on-premises and cloud data stores to a central data repository in Azure Storage. The Copy activity is typically the first step in an Azure Data Factory ETL pipeline, consolidating raw source data in a single Azure Data Lake Storage (ADLS) account. This activity is also used to migrate binary objects such as videos, images, and audio files to Azure Blob Storage.
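To make the relationship between pipelines, activities, and datasets more concrete, the following minimal Python sketch assembles a dictionary shaped like a pipeline definition that contains a single Copy activity. The function name, the pipeline and dataset names, and the choice of binary source and sink types are assumptions made for illustration; the source and sink types in a real pipeline depend on the connectors and dataset formats involved.

```python
import json


def build_copy_pipeline(pipeline_name, source_dataset, sink_dataset):
    """Return a dict shaped like a pipeline definition with one Copy activity.

    All names passed in are hypothetical; the datasets they refer to would
    have to exist in the data factory and point at linked services.
    """
    copy_activity = {
        "name": "CopyRawData",  # hypothetical activity name
        "type": "Copy",
        # Datasets are referenced by name rather than embedded in the pipeline.
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            # Assumed here: a binary copy of files such as videos or images.
            "source": {"type": "BinarySource"},
            "sink": {"type": "BinarySink"},
        },
    }
    return {"name": pipeline_name, "properties": {"activities": [copy_activity]}}


if __name__ == "__main__":
    pipeline = build_copy_pipeline("IngestRawData", "OnPremMediaFiles", "AdlsRawZone")
    print(json.dumps(pipeline, indent=2))
```

The sketch only builds and prints the definition; it does not deploy anything to Azure. It shows that a pipeline is a named list of activities, and that each Copy activity points to an input dataset and an output dataset instead of holding connection details itself.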

FIGURE 4.25 Upload Files pop-up page
Creating Azure Data Factory resources such as linked services, datasets, pipelines, and pipeline activities is covered in further detail in Chapter 5, “Modern Data Warehouses in Azure.”
Azure Data Box
For some organizations, migrating data programmatically from an on-premises appliance to Azure over the network takes longer than is acceptable. Microsoft supports organizations facing this issue with Azure Data Box, a physical storage device that organizations load with data and ship to an Azure datacenter. It is typically used to migrate datasets larger than 40 TB in scenarios with limited or no network connectivity. Azure Data Box is used in the following scenarios:
- Moving large amounts of media data such as videos, images, and audio files to Azure
- Migrating several VMs to Azure at once
- Migrating large amounts of historical data that is used by distributed analytics solutions
Azure Data Box can also be used to export data from Azure Storage to an on-premises datacenter.
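Whether a network-based migration "takes longer than is acceptable" can be estimated with simple arithmetic. The following Python sketch is a rough illustration only: the function names, the 80 percent link utilization, and the acceptable-duration threshold are assumptions and do not come from Azure guidance.

```python
SECONDS_PER_DAY = 86_400


def network_transfer_days(size_tb, bandwidth_mbps, utilization=0.8):
    """Estimate the days needed to upload size_tb terabytes over a link.

    utilization is an assumed fraction of the nominal bandwidth that the
    migration can actually use.
    """
    size_bits = size_tb * 1e12 * 8            # decimal terabytes to bits
    effective_bps = bandwidth_mbps * 1e6 * utilization
    return size_bits / effective_bps / SECONDS_PER_DAY


def offline_transfer_worth_considering(size_tb, bandwidth_mbps, max_days):
    """Return True when the estimated upload exceeds the acceptable duration."""
    return network_transfer_days(size_tb, bandwidth_mbps) > max_days


if __name__ == "__main__":
    days = network_transfer_days(40, 100)
    # 40 TB over a 100 Mbps link at 80% utilization is roughly 46 days.
    print(f"Estimated upload time: {days:.1f} days")
    print(offline_transfer_worth_considering(40, 100, max_days=14))
```

Under these assumptions, a 40 TB dataset on a 100 Mbps link needs more than six weeks of continuous upload, which is the kind of result that makes shipping a physical device attractive.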
The following steps describe the workflow used to migrate data to Azure with Azure Data Box:
- Order the device through the Azure Portal. Provide shipping information and the destination Azure storage account for the data.
- Once the device is delivered, connect it to your network using a wired connection. Make sure the computer you will copy from has access to the source data.
- Copy the data to the device.
- Once the data has finished copying, turn off the device and ship it back to the Azure datacenter that hosts the destination storage account.
- After the device arrives at the datacenter, the data is moved to the designated Azure storage account.
More information about procuring and managing an Azure Data Box device can be found at https://docs.microsoft.com/en-us/azure/databox/data-box-overview.