If your Data Platform runs on Azure, you are very likely using Azure Data Factory (ADF) for Extraction, Transformation and Loading (ETL) of your data. Once you’ve created and published pipelines in Azure Data Factory, you can monitor all of your pipeline runs natively in the Azure Data Factory user experience. However, this default view lacks several management features that you need to take control of your Data Platform. Let’s take a close look at the primary risks your organization faces when it cannot get deeper insights into its Azure Data Factory environment(s).
Time is money in Azure Data Factory and in Azure in general. The more time each pipeline takes to get the job done, the longer the Integration Runtime (IR) runs. If you are on a Pay-As-You-Go billing model, this translates directly into higher consumption bills. Knowing how long a job takes now, and how long it took historically, shows you where optimization will pay off. By taking a deeper look at key metrics like Top 10 Longest Running Jobs, you can take action to make your data pipelines more performant.
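As a minimal sketch of how a metric like Top 10 Longest Running Jobs could be derived, assume the pipeline run history has already been exported into plain records. The field names `pipeline` and `duration_s`, the pipeline names and the sample values are hypothetical and do not reflect the actual ADF schema.

```python
from collections import defaultdict


def top_longest_running(runs, n=10):
    """Return the n pipelines with the highest total run time, longest first."""
    totals = defaultdict(float)
    for run in runs:
        # Accumulate run time per pipeline across all recorded runs.
        totals[run["pipeline"]] += run["duration_s"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical sample data; real records would come from your own export.
runs = [
    {"pipeline": "load_sales", "duration_s": 1200},
    {"pipeline": "load_sales", "duration_s": 1500},
    {"pipeline": "load_customers", "duration_s": 300},
    {"pipeline": "refresh_dwh", "duration_s": 2400},
]

for name, seconds in top_longest_running(runs, n=2):
    print(f"{name}: {seconds / 60:.0f} min")
```

Ranking by total run time rather than by the single longest run is a design choice here: a moderately slow pipeline that runs every hour can cost more than a very slow one that runs once a week.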
Because Azure bills for use (consumption), you pay not only for succeeding jobs but also for failing ones. ADF pipelines are recurring in nature, so if errors are not fixed, the failures recur as well. Knowing which jobs fail can save you money and move you from paying for use towards paying for value. In the default ADF monitoring view it is very difficult to get insights like Top 10 Failing Jobs, which can greatly help you prioritize your efforts towards the jobs with the highest impact.
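A ranking like Top 10 Failing Jobs could be sketched as follows, again assuming exported run records. The fields `pipeline`, `status` and `duration_s`, the status value `"Failed"` and the sample data are assumptions made for illustration only.

```python
from collections import defaultdict


def top_failing(runs, n=10):
    """Rank pipelines by run time spent on failed runs.

    Returns a list of (pipeline, failure_count, wasted_seconds) tuples.
    """
    counts = defaultdict(int)
    wasted = defaultdict(float)
    for run in runs:
        if run["status"] == "Failed":
            counts[run["pipeline"]] += 1
            wasted[run["pipeline"]] += run["duration_s"]
    ranked = sorted(wasted, key=lambda name: wasted[name], reverse=True)
    return [(name, counts[name], wasted[name]) for name in ranked[:n]]


# Hypothetical sample data.
runs = [
    {"pipeline": "load_sales", "status": "Failed", "duration_s": 900},
    {"pipeline": "load_sales", "status": "Failed", "duration_s": 1100},
    {"pipeline": "load_customers", "status": "Succeeded", "duration_s": 300},
    {"pipeline": "refresh_dwh", "status": "Failed", "duration_s": 600},
]

for name, failures, seconds in top_failing(runs):
    print(f"{name}: {failures} failed runs, {seconds:.0f} s of billed run time")
```

Sorting by wasted run time instead of by failure count puts the failures that cost the most at the top, which matches the goal of prioritizing by impact.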
The complexity of an unmonitored Data Platform usually grows over time, as jobs take longer to complete and more data is transferred. As processing time grows, the cost of running the pipelines, including the failing ones, grows with it. Knowing historical runtimes and data transfer volumes helps you channel your efforts in the right direction. By looking at the amount of data transferred in each run and building insights like Top 10 Highest Data Volume Runs, you can dig into the velocity and volume of the data moving through your Data Factory environment.
Operational details are often hidden in Azure. Uncovering them requires considerable effort: you have to enable Azure logging options or call APIs and then store the results in a consumable format. Getting access to the data is the first hurdle; making sense of it is the next. To monitor routine operations and be assured that it is business as usual, you need both granular and high-level insights into metrics such as cost, usage, runtime and error trends. KPIs such as Azure Costs, Runtime, Errors and Alerts give you a top-level view and a firmer grip on your Azure Data Factory.
When an error occurs in an Azure Data Factory pipeline, or when a pipeline runs longer than usual, you need some insight into what is going on behind the scenes. To recognize that a pipeline has taken more time than usual, you first have to analyze its average run time over a period of time and then set a threshold accordingly. Recording the average, minimum and maximum execution time for each pipeline is tedious when done manually, but it gives you a reliable picture of the usual operating window.
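The idea of deriving a threshold from historical run times can be illustrated with a small sketch. The threshold rule used here (a fixed multiple of the historical average), the default factor of 1.5 and the sample durations are assumptions chosen for illustration, not a recommended setting.

```python
from statistics import mean


def duration_stats(durations):
    """Summarize historical run times (in seconds) for one pipeline."""
    return {"avg": mean(durations), "min": min(durations), "max": max(durations)}


def exceeds_threshold(duration_s, history, factor=1.5):
    """Return True if a run took longer than factor times the historical average.

    The factor of 1.5 is an arbitrary illustrative default.
    """
    return duration_s > factor * mean(history)


# Hypothetical run times of one pipeline, in seconds.
history = [600, 660, 540, 600]

print(duration_stats(history))
print(exceeds_threshold(1000, history))  # 1000 s is above 1.5 * 600 s
print(exceeds_threshold(700, history))   # 700 s is within the usual window
```

A multiple of the average is the simplest possible rule. If run times vary a lot, a threshold based on the historical maximum or on a percentile may produce fewer false alarms.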
Setting up proper monitoring incurs extra (hidden) cost. So how can you navigate around the challenges above? Drawing on specialists who excel at gathering the relevant data and interpreting it can be a great option. Instead of reinventing the wheel by investing your time, money and effort in building a monitoring and alerting system in-house, you can onboard to a pre-built product and get results from day one.
The idea for the FullData ADF Insights monitoring and alerting service was born while investigating failing customer ADF jobs that kept running nonstop for days without anybody noticing. Our solution helps you avoid wasting precious time (and money) on unexpected errors or long-running jobs. It takes just one simple step to get one step ahead. With the FullData ADF Insights monitoring and alerting service, you receive an email alert when duration thresholds are exceeded. You can track and visualize the duration of all your pipeline executions and get notified when it is not business as usual. The FullData service will bring you peace of mind.
At FullData we are experts in Data Platform products and their implementation. We help you uncover the telemetry of your Data Platforms and turn it into actionable insights. We detect trends, provide early warnings, keep an eye on relevant events and capture volatile information that would otherwise be lost. This provides the basis for long-term trend analysis and the deep insights needed to keep your data platform healthy.
Ready to embark on a journey to make your Data Platform smarter, more efficient and connected?