
Azure Data Explorer: Real-Time Analytics — Fortinet Logs

Last Updated on July 26, 2023 by Editorial Team

Author(s): Rory McManus

Originally published on Towards AI.


Azure Data Explorer (ADX) is a fully managed data analytics service for real-time analysis on large volumes of data streaming from applications, websites, and IoT devices.

The primary use of ADX is the ingestion of structured, semi-structured, and unstructured data for big data analytics, ingesting at up to 200 MB/s per node (scaling to 1,000 nodes) and returning query results in under a second across billions of records.

As more businesses open their networks to a wide variety of IoT devices and applications, it becomes increasingly vital for network and security teams to react proactively to threatening events in a timely and cost-effective manner.

I recently employed ADX with a government client to migrate an existing Kafka workload which ingests and transforms Fortinet, Paloalto, and Bluecoat web security logs. During Covid-19, their workload increased 10-fold, with an associated 5-fold increase in costs. The migration of this workload resulted in a 60% cost reduction, a simplified solution, and an improvement in data reliability.

How can data be ingested into Azure Data Explorer?

Automated Pipelines — Ingestion Methods

  • Event Grid Blob Created — when a blob is created in the Azure Storage account, an event fires that triggers the Data Explorer ingestion pipeline.
  • Event Hub
  • IoT Hub
  • Azure Data Factory
  • LightIngest — a command-line tool for historical loads, used to minimize cost.

Supported Formats

  • Uncompressed Formats — ApacheAvro, Avro, CSV, JSON, MultiJSON, ORC, Parquet, PSV, RAW, SCsv, SOHsv, TSV, TSVE, TXT, W3CLOGFILE

When the source data carries a schema (e.g., Avro, Parquet, W3CLOGFILE), it can be inserted directly into the final destination table with the expected data types, column names, etc.

  • Compressed Formats — GZip, Zip

Transformations

Data is transformed in ADX using its native language, KQL (Kusto Query Language). This is a simple yet powerful language for querying structured, semi-structured, and unstructured data. It assumes a relational data model of tables and columns with a minimal set of data types, and it is expressive and easy to read, making the query intent clear.
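To give a feel for the syntax, here is a minimal, hypothetical query (the FortinetLogs table and its column names are illustrative, not from the original article):

// Hypothetical example: top 10 source IPs by denied requests in the last hour
FortinetLogs
| where Timestamp > ago(1h)
| where Action == "deny"
| summarize DeniedCount = count() by SrcIp
| top 10 by DeniedCount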

Ingesting Fortinet Logs from Azure Storage to ADX

In this article, I will demonstrate how to create an ingestion pipeline to ingest and transform Fortinet web security log files, uploaded hourly to an Azure Storage account and accumulating to roughly 400 GB per day when uncompressed.

Each file is a compressed .gz file whose records are split across three different formats:

  1. Space delimited values
  2. Pipe delimited values
  3. Pipe delimited Key-Value Pairs

Solution

The solution follows the high-level steps below:

  1. Fortinet log files are uploaded/created on Azure Storage (ADLS Gen2). This in turn triggers the ingestion process via an Event Grid subscription.
  2. The file is ingested into an ADX staging table.
  3. An ADX user-defined update policy reads the newly ingested data in the staging table, transforms it, and appends it to the destination table as required.

Ingestion Pipeline

Prerequisites

  • Install Kusto Explorer and connect to the ADX cluster. Alternatively, the Web UI can be used.

https://docs.microsoft.com/en-us/azure/data-explorer/kusto/tools/kusto-explorer

  • Microsoft recommends that each file be about 1 GB uncompressed for optimal ingestion, and no larger than 4 GB.
  • Register Event Grid with the Azure Subscription.

To create the ingestion pipeline, the following steps must be completed:

  1. Create a container on Azure Storage — ADLS Gen2.
  2. Create an ADX staging table.
  3. Set a retention policy on the ADX staging table.
  4. Create an ADX query function to read and transform the data landing in the staging table.
  5. Create an ADX destination table for the curated data.
  6. Create an ADX update policy. The update policy instructs ADX to automatically append data to the target table whenever new data is inserted into the staging table, based on the transformation function created in step 4.
  7. Create an Event Grid ingestion method, ingesting data into Data Explorer via Event Grid from ADLS.
  8. Test 🙂

Steps

1. Create a container on Azure Storage — ADLS Gen2.

2. Create an ADX staging table with a single column of data type string.
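A minimal sketch of this step, assuming a staging table named FortinetStaging (the name is illustrative; the single string column holds each raw log line):

// Staging table with one string column for the raw record
.create table FortinetStaging (RawRecord: string)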

3. Set a retention policy on the ADX staging table to keep only 14 days of data.
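For example, assuming the FortinetStaging table above, a 14-day soft-delete retention policy can be applied like this:

// Keep staging data for 14 days only; disable recoverability to save storage
.alter-merge table FortinetStaging policy retention softdelete = 14d recoverability = disabled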

4. Create an ADX Function.

The function reads and transforms the data from the staging table into the desired output. Only a subset of the source columns is required in the output.
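The exact parsing logic depends on the Fortinet record layout, which isn't reproduced here. As a minimal sketch, assuming key=value records and the FortinetStaging table above (the function and all column names are illustrative):

// Parse raw key=value records and project a typed subset of columns
.create-or-alter function FortinetTransform() {
    FortinetStaging
    | parse-kv RawRecord as (date: string, time: string, srcip: string, dstip: string, action: string)
        with (pair_delimiter=' ', kv_delimiter='=')
    | project Timestamp = todatetime(strcat(date, ' ', time)),
              SrcIp = srcip,
              DstIp = dstip,
              Action = action
}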

5. Create an ADX Destination Table for the curated data.

The ingestion function can be used to create the schema for the destination table using the following script:
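One common pattern, assuming the function above and an illustrative destination table named FortinetLogs, is to run the function with zero rows so ADX creates the table with the function's output schema:

// Creates FortinetLogs with FortinetTransform()'s schema without ingesting any rows
.set-or-append FortinetLogs <| FortinetTransform() | take 0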

NOTE: Ensure the datetime and numeric columns are typed correctly, as ADX stores metadata and statistics for each column, including the minimum and maximum values per extent (data shard). When a query filters on these columns, only the relevant extents are scanned and returned in the results.

6. Create an ADX Update Policy.

The Update Policy instructs ADX to automatically append data to the target table whenever new data is inserted into the staging table, based on the transformation function created above.
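A sketch of the policy, assuming the staging table, destination table, and function names used above:

// Append transformed rows to FortinetLogs whenever FortinetStaging receives data
.alter table FortinetLogs policy update
@'[{"IsEnabled": true, "Source": "FortinetStaging", "Query": "FortinetTransform()", "IsTransactional": true}]'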

7. Create an Event Grid Ingestion Method.

The chosen method ingests data into Data Explorer via Event Grid from ADLS.

  • Log in to the Azure Portal.
  • Navigate to the ADX Cluster ➜ Databases (select the appropriate database) ➜ Data connections.
  • Add a data connection.
  • Fill in the ingest properties, then click through to Review + create.

NOTE: TXT files do not have mappings. Mappings are only used for CSV, JSON, AVRO, and W3CLOGFILE files.

8. Test 🙂

Upload a file to the Azure Storage container. If the ingestion fails, run the query below to check why.
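One way to check, using ADX's built-in ingestion-failures command (the time filter is illustrative):

// Show recent ingestion failures with their error details
.show ingestion failures
| where FailedOn > ago(1h)
| order by FailedOn desc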

Conclusion

If you would like a copy of my code, please drop me a message on LinkedIn.

I hope you have found this helpful and that it saves your company time and money getting started with Azure Data Explorer.

Please share your thoughts, questions, corrections, and suggestions. All feedback and comments are very welcome.

Data Mastery | LinkedIn

au.linkedin.com

If you liked this article, here are some other articles you may enjoy:

Databricks: Upsert to Azure SQL using PySpark

An Upsert is an RDBMS feature that allows a DML statement’s author to automatically either insert a row, or if the row…

rorymcmanus.medium.com

Azure Cognitive Services Sentiment Analysis V3 — Using PySpark

What is Azure Cognitive Services — Text Analytics?

pub.towardsai.net
