8/13/25, 1:53 AM                                              What is Azure Databricks?
- Azure Databricks | Microsoft Learn
      What is Azure Databricks?
      05/06/2025
      Azure Databricks is a unified, open analytics platform for building, deploying, sharing, and
      maintaining enterprise-grade data, analytics, and AI solutions at scale. The Databricks Data
      Intelligence Platform integrates with cloud storage and security in your cloud account, and
      manages and deploys cloud infrastructure for you.
      Azure Databricks uses generative AI with the data lakehouse to understand the unique semantics
      of your data. Then, it automatically optimizes performance and manages infrastructure to match
      your business needs.
https://learn.microsoft.com/en-us/azure/databricks/introduction/                                                               1/5
      Natural language processing learns your business's language, so you can search and discover
      data by asking a question in your own words. Natural language assistance helps you write code,
      troubleshoot errors, and find answers in documentation.
      Managed open source integration
      Databricks is committed to the open source community and manages updates of open source
      integrations with the Databricks Runtime releases. The following technologies are open source
      projects originally created by Databricks employees:
              Delta Lake and Delta Sharing
              MLflow
              Apache Spark and Structured Streaming
              Redash
              Unity Catalog
      Common use cases
      The following use cases highlight some of the ways customers use Azure Databricks to
      accomplish tasks essential to processing, storing, and analyzing the data that drives critical
      business functions and decisions.
      Build an enterprise data lakehouse
      The data lakehouse combines enterprise data warehouses and data lakes to accelerate, simplify,
      and unify enterprise data solutions. Data engineers, data scientists, analysts, and production
      systems can all use the data lakehouse as their single source of truth, providing access to
      consistent data and reducing the complexities of building, maintaining, and syncing many
      distributed data systems. See What is a data lakehouse?.
      ETL and data engineering
      Whether you're generating dashboards or powering artificial intelligence applications, data
      engineering provides the backbone for data-centric companies by making sure data is available,
      clean, and stored in data models for efficient discovery and use. Azure Databricks combines the
      power of Apache Spark with Delta and custom tools to provide an unrivaled ETL experience. Use
      SQL, Python, and Scala to compose ETL logic and orchestrate scheduled job deployment with a
      few clicks.
      Lakeflow Declarative Pipelines further simplifies ETL by intelligently managing dependencies
      between datasets and automatically deploying and scaling production infrastructure to ensure
      timely and accurate data delivery to your specifications.
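
To make the idea of managing dependencies between datasets concrete, the following is a minimal sketch in plain Python. It is not the Lakeflow Declarative Pipelines API; the dataset names and the `refresh_order` helper are hypothetical, and the sketch only shows how a refresh order can be derived from declared inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical datasets; each name maps to the set of datasets it reads from.
dependencies = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "orders_by_customer": {"clean_orders", "raw_customers"},
}


def refresh_order(deps):
    """Return an order in which every dataset comes after all of its inputs."""
    return list(TopologicalSorter(deps).static_order())


order = refresh_order(dependencies)
print(order)
```

Because the order is derived from the declared inputs, adding a dataset only requires stating what it reads from, not where it belongs in the schedule.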
      Azure Databricks provides tools for data ingestion, including Auto Loader, an efficient and
      scalable tool for incrementally and idempotently loading data from cloud object storage and data
      lakes into the data lakehouse.
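
The terms incremental and idempotent can be illustrated with a small, self-contained sketch. This is plain Python and does not use or reproduce Auto Loader; the `load_new_files` helper, the in-memory `processed` set, and the file names are assumptions made for illustration. Each run picks up only files it has not seen, and rerunning with no new files changes nothing.

```python
import tempfile
from pathlib import Path


def load_new_files(source_dir, processed, sink):
    """Append rows from unseen CSV files to sink; return how many files were loaded."""
    loaded = 0
    for path in sorted(Path(source_dir).glob("*.csv")):
        if path.name in processed:  # already ingested, so skip it
            continue
        sink.extend(path.read_text().splitlines())
        processed.add(path.name)  # remember the file so a rerun is a no-op
        loaded += 1
    return loaded


with tempfile.TemporaryDirectory() as landing:
    processed, table = set(), []
    Path(landing, "a.csv").write_text("1,apple\n")
    first = load_new_files(landing, processed, table)   # loads a.csv
    second = load_new_files(landing, processed, table)  # nothing new to load
    Path(landing, "b.csv").write_text("2,banana\n")
    third = load_new_files(landing, processed, table)   # loads only b.csv
```

In a real deployment the record of processed files would need to be stored durably rather than in memory; the sketch keeps it in a set only to stay short.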
      Machine learning, AI, and data science
      Azure Databricks machine learning expands the core functionality of the platform with a suite of
      tools tailored to the needs of data scientists and ML engineers, including MLflow and Databricks
      Runtime for Machine Learning.
      Large language models and generative AI
      Databricks Runtime for Machine Learning includes libraries like Hugging Face Transformers that
      allow you to integrate existing pre-trained models or other open source libraries into your
      workflow. The Databricks MLflow integration makes it easy to use the MLflow tracking service with
      transformer pipelines, models, and processing components. Integrate OpenAI models or
      solutions from partners like John Snow Labs in your Databricks workflows.
      With Azure Databricks, you can customize an LLM on your data for your specific task. With the
      support of open source tooling, such as Hugging Face and DeepSpeed, you can efficiently take a
      foundation LLM and continue training it on your own data to improve accuracy for your domain
      and workload.
      In addition, Azure Databricks provides AI functions that SQL data analysts can use to access
      LLMs, including models from OpenAI, directly within their data pipelines and workflows. See
      Apply AI on data using Azure Databricks AI Functions.
      Data warehousing, analytics, and BI
      Azure Databricks combines user-friendly UIs with cost-effective compute resources and infinitely
      scalable, affordable storage to provide a powerful platform for running analytic queries.
      Administrators configure scalable compute clusters as SQL warehouses, allowing end users to
      execute queries without worrying about any of the complexities of working in the cloud. SQL
      users can run queries against data in the lakehouse using the SQL query editor or in notebooks.
      Notebooks support Python, R, and Scala in addition to SQL, and allow users to embed the same
      visualizations available in legacy dashboards alongside links, images, and commentary written in
      markdown.
      Data governance and secure data sharing
      Unity Catalog provides a unified data governance model for the data lakehouse. Cloud
      administrators configure and integrate coarse access control permissions for Unity Catalog, and
      then Azure Databricks administrators can manage permissions for teams and individuals.
      Privileges are managed with access control lists (ACLs) through either user-friendly UIs or SQL
      syntax, making it easier for database administrators to secure access to data without needing to
      scale on cloud-native identity and access management (IAM) and networking.
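
As a rough illustration of managing privileges through SQL syntax, the sketch below builds a `GRANT` statement as a string. The `grant_statement` helper, the table name `main.sales.orders`, and the group name `analysts` are hypothetical; on a workspace, such a statement would typically be run from a SQL editor or passed to `spark.sql` in a notebook.

```python
def grant_statement(privilege, securable_type, securable_name, principal):
    """Build a GRANT statement; the principal is quoted with backticks."""
    return f"GRANT {privilege} ON {securable_type} {securable_name} TO `{principal}`"


# Hypothetical table and group names, used only for illustration.
stmt = grant_statement("SELECT", "TABLE", "main.sales.orders", "analysts")
print(stmt)
```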
      Unity Catalog makes running secure analytics in the cloud simple, and provides a division of
      responsibility that helps limit the reskilling or upskilling necessary for both administrators and end
      users of the platform. See What is Unity Catalog?.
      The lakehouse makes data sharing within your organization as simple as granting query access to
      a table or view. For sharing outside of your secure environment, Unity Catalog features a
      managed version of Delta Sharing.
      DevOps, CI/CD, and task orchestration
      The development lifecycles for ETL pipelines, ML models, and analytics dashboards each present
      their own unique challenges. Azure Databricks allows all of your users to leverage a single data
      source, which reduces duplicate efforts and out-of-sync reporting. Azure Databricks also
      provides a suite of common tools for versioning, automating, scheduling, and deploying code and
      production resources, which simplifies your overhead for monitoring, orchestration, and operations.
      Jobs schedule Azure Databricks notebooks, SQL queries, and other arbitrary code. Databricks
      Asset Bundles allow you to define, deploy, and run Databricks resources such as jobs and
      pipelines programmatically. Git folders let you sync Azure Databricks projects with a number of
      popular Git providers.
      For CI/CD best practices and recommendations, see Best practices and recommended CI/CD
      workflows on Databricks. For a complete overview of tools for developers, see Develop on
      Databricks.
      Real-time and streaming analytics
      Azure Databricks leverages Apache Spark Structured Streaming to work with streaming data and
      incremental data changes. Structured Streaming integrates tightly with Delta Lake, and these
      technologies provide the foundations for both Lakeflow Declarative Pipelines and Auto Loader.
      See Structured Streaming concepts.
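
The idea of processing incremental data changes can be sketched without Spark. The example below is plain Python, not the Structured Streaming API; the `update_counts` helper and the event names are assumptions for illustration. It shows running state being updated one micro-batch at a time instead of recomputing over all data seen so far.

```python
from collections import Counter


def update_counts(state, micro_batch):
    """Fold one micro-batch of event keys into the running counts."""
    state.update(micro_batch)
    return state


running = Counter()
# Hypothetical micro-batches of event types arriving over time.
batches = [["click", "view"], ["click"], ["view", "view", "click"]]
for batch in batches:
    update_counts(running, batch)

print(dict(running))
```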