Data Lakehouse Platforms Reviews and Ratings
What are Data Lakehouse Platforms?
A lakehouse is a converged infrastructure environment that combines the semantic flexibility of a data lake with the production optimization and delivery capabilities of a data warehouse. Data lakehouses are considered transformational and can serve as the foundational analytic data store for an organization. They are designed to unify the capabilities of data warehouses and data lakes in a single platform that supports the complete data management and AI lifecycle.
Product Listings
Dremio provides an agentic lakehouse platform designed to support AI-driven analytics and automation. It enables AI agents and users to access and analyze data across sources through federated query capabilities, unstructured data processing, and an AI-powered semantic layer that adds business context. The platform automates performance management and query optimization, reducing manual administration and supporting scalable, self-managing data operations. Dremio is built on open standards, including Apache Iceberg, Apache Polaris, and Apache Arrow, and is used by global enterprises across industries to accelerate data access and insights.
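Because Dremio builds on Apache Arrow, query results can be fetched over Arrow Flight. As a rough sketch (not official Dremio sample code), the example below uses pyarrow to run a SQL query against a Flight endpoint; the host, credentials, and dataset name are assumptions, and Dremio's Flight service (commonly exposed on port 32010) must be enabled.

```python
# A minimal sketch of querying a Dremio Arrow Flight endpoint with pyarrow.
# Host, port, credentials, and the dataset name are all hypothetical.
from pyarrow import flight

client = flight.FlightClient("grpc+tcp://dremio.example.com:32010")

# Basic auth returns a bearer-token header to attach to subsequent calls.
token = client.authenticate_basic_token("user", "password")
options = flight.FlightCallOptions(headers=[token])

query = "SELECT * FROM sales.orders LIMIT 10"  # hypothetical dataset
info = client.get_flight_info(flight.FlightDescriptor.for_command(query), options)

# Fetch the result stream and materialize it as an Arrow table.
reader = client.do_get(info.endpoints[0].ticket, options)
table = reader.read_all()
print(table)
```

Because results stream back as Arrow record batches, they can be handed to downstream analytics tools without row-by-row serialization overhead.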
Amazon SageMaker is a service that enables developers and data scientists to build, train, and deploy machine learning models at scale. It offers a managed environment that supports various machine learning frameworks and algorithms, including built-in tools for data labeling, model tuning, and data preparation. It provides infrastructure automation for distributed training, as well as model hosting for real-time and batch inference. Users can take advantage of integrated Jupyter notebooks for data exploration and preprocessing. SageMaker supports deployment across cloud and edge environments, helping organizations accelerate and standardize machine learning workflows, and addresses the challenges of operationalizing machine learning by streamlining development and deployment processes.
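As an illustration of that build-train-deploy flow, the sketch below uses the SageMaker Python SDK with a scikit-learn estimator. The IAM role ARN, S3 path, and train.py script are hypothetical placeholders, not part of any product documentation.

```python
# A minimal sketch of training and deploying a model with the SageMaker
# Python SDK. The role ARN, S3 URI, and training script are hypothetical.
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical

estimator = SKLearn(
    entry_point="train.py",        # hypothetical local training script
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    framework_version="1.2-1",
    sagemaker_session=session,
)

# Launch a managed training job against data in S3 (hypothetical path).
estimator.fit({"train": "s3://my-bucket/training-data/"})

# Host the trained model on a real-time endpoint.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")

# Invoke the endpoint; the accepted input format depends on the
# inference code inside the (hypothetical) train.py.
print(predictor.predict([[0.1, 0.2, 0.3]]))
```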
Microsoft Fabric is data analytics software that integrates multiple tools for data integration, data engineering, data warehousing, data science, real-time analytics, and business intelligence into a single platform. It supports connectivity across sources and provides a unified experience for data preparation, transformation, and modeling. Microsoft Fabric enables organizations to store, manage, and analyze data from various sources through its lakehouse architecture, semantic models, and reporting features. It addresses challenges related to disparate data tools and data silos by offering centralized governance, security, and collaboration functions aimed at streamlining analytics processes for business decision-making.
Google Cloud BigLake is software designed to unify data lakes and warehouses, enabling organizations to manage and analyze structured and unstructured data across cloud storage and data warehouses. The software enforces governance and fine-grained access controls over stored data and supports multi-engine query capabilities. It provides centralized management for various data formats and integrates with analytics and machine learning tools. BigLake addresses the challenge of disparate data silos by providing a centralized platform to streamline data storage, security, and processing while supporting interoperability across multiple analytics engines, helping organizations improve efficiency in data analysis and management.
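As a sketch of how a BigLake table over object storage might be created programmatically, the example below submits external-table DDL through the google-cloud-bigquery Python client. The project, dataset, connection, and bucket names are hypothetical, and a Cloud resource connection must already exist.

```python
# A minimal sketch, assuming the google-cloud-bigquery client library and
# an existing Cloud resource connection; all names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

# Define a BigLake external table over Parquet files in Cloud Storage.
ddl = """
CREATE EXTERNAL TABLE `my-project.analytics.events`
WITH CONNECTION `my-project.us.my-connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-bucket/events/*.parquet']
)
"""
client.query(ddl).result()  # run the DDL and wait for completion

# The table can now be queried like any other BigQuery table, with
# fine-grained access controls applied by the service.
rows = client.query(
    "SELECT COUNT(*) AS n FROM `my-project.analytics.events`"
).result()
for row in rows:
    print(row.n)
```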
IBM watsonx.data is a hybrid, open data lakehouse that helps organizations access, integrate, and analyze structured and unstructured data across hybrid cloud and on-premises environments. It combines data lakes and data warehouses to support enterprise AI, analytics, and real-time workloads, using open engines such as Apache Spark, Cassandra, and Presto, along with real-time data services such as DataStax, optimized for price and performance.
IBM watsonx.data prioritizes data governance and security with end-to-end lineage, consistent access control, and open formats/APIs to prevent vendor lock-in. By unifying data sources, enabling flexible analytics, and providing no-code, low-code, and pro-code interfaces, watsonx.data helps break down silos, streamline workflows, and prepare data for reliable AI and analytics.
IOMETE is a unified data platform designed to streamline analytics infrastructure and data engineering workflows. It enables organizations to manage, process, and analyze large datasets efficiently by integrating data lake, data warehouse, and ETL capabilities within a single environment. Its features include centralized data management, real-time data processing, scalable storage, and compatibility with various data formats and BI tools. The platform addresses the problem of fragmented data systems by providing a cohesive solution for data ingestion, transformation, and access, facilitating collaboration among data teams.
Onehouse is software designed to manage and optimize data lakes by automating data ingestion, transformation, and management workflows. It supports open data lake formats and provides data versioning, indexing, and compaction to enhance data reliability and query performance. It integrates with data processing engines and offers governance capabilities for schema evolution and access control. Onehouse addresses the challenges of managing large-scale analytics infrastructure by delivering tools for seamless data operations and reducing the complexity of building and maintaining data lake architectures.
Oracle Autonomous Data Warehouse is cloud-based software designed to automate database management and optimize data analytics workloads. It uses machine learning techniques to handle routine tasks such as patching, upgrading, and tuning without human intervention, and provides scalable storage and compute resources tailored for analytical processing and reporting. Oracle Autonomous Data Warehouse integrates with various business intelligence tools and data sources, enabling organizations to aggregate, analyze, and visualize large volumes of data efficiently. The software addresses business demands for secure data storage, automated performance optimization, and simplified management, helping organizations focus on extracting insights rather than on database maintenance.
Features of Data Lakehouse Platforms
Updated October 2025
Mandatory Features:
Data ingestion: Collecting data from sources and transferring it to the lakehouse, including batch ingestion, CDC, stream ingestion and file transfer.
Persistent storage: Leverages simple object storage, with data expected to be in an open table format (OTF) that may be complemented by other data types.
Data catalog: Ability to identify and discover data objects, along with governance, security, lineage and metadata management for the information associated with data assets, enhancing integration, access and utility across an organization.
Data management: The lakehouse must ensure that tasks such as capacity planning, backup and disaster recovery are performed; these can be handled by the lakehouse itself or delegated to other services.
Unified data management: Multimodal data storage, schema flexibility and processing for a broad range of data types.
Converged design architecture: Unifies the architecture and workloads of a data warehouse and data lake on a single platform.
Data sources: Types of information sources that are utilized as inputs, including unstructured, semistructured, structured and streaming data.
Data science/machine learning: Involves the application of predictive and prescriptive analytics methods to extract insights and build models.
Query engine(s): Execute queries from one or more query engines that share the same metadata and physical assets of the lakehouse (see the sketch following this list).
Workload management: Ability to execute different workloads without conflicts and with acceptable availability and performance.
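To ground the ingestion, persistent-storage and query-engine features above, here is a minimal PySpark sketch against an Apache Iceberg table. It assumes a Spark session with the Iceberg runtime on the classpath; the catalog name, warehouse path and schema are illustrative and not tied to any vendor listed above.

```python
# A minimal lakehouse sketch: an Iceberg (open table format) catalog over
# object storage, queried through Spark SQL. Requires the Apache Iceberg
# Spark runtime jar; catalog name, bucket and schema are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("lakehouse-demo")
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "hadoop")
    .config("spark.sql.catalog.lakehouse.warehouse", "s3a://my-bucket/warehouse")
    .getOrCreate()
)

# Batch ingestion into an open-table-format table.
spark.sql("CREATE NAMESPACE IF NOT EXISTS lakehouse.db")
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.db.events (
        event_id BIGINT, event_type STRING, ts TIMESTAMP
    ) USING iceberg
""")
spark.sql("INSERT INTO lakehouse.db.events VALUES (1, 'click', current_timestamp())")

# Any engine sharing this catalog's metadata can query the same
# physical table, which is the core lakehouse property.
spark.sql("""
    SELECT event_type, COUNT(*) AS n
    FROM lakehouse.db.events
    GROUP BY event_type
""").show()
```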