Gartner defines the market for cloud database management systems (DBMSs) as the market for software products that store and manipulate data and that are primarily delivered as software as a service (SaaS) in the cloud. Cloud DBMSs may optionally be capable of running on-premises or in hybrid, multicloud or intercloud configurations. They can be used for transactional work, analytical work or both, and they may have features that enable them to participate in a wider data ecosystem. Must-have capabilities for this market include availability as SaaS on provider-managed public or private cloud systems, and management of data within cloud storage. That is, cloud DBMSs are not hosted on infrastructure as a service (IaaS), such as in a virtual machine or a container managed by the customer.
Gartner defines a data science and machine learning (DSML) platform as an integrated set of code-based libraries and low-code tooling that support the independent use by, and collaboration between, data scientists and their business and IT counterparts through all stages of the data science life cycle. These stages include business understanding, data access and preparation, experimentation and model creation, and sharing of insights. The platforms also support machine learning engineering workflows, including the creation of data, feature, deployment and testing pipelines. They are provided via a desktop client or browser with supporting compute instances, as a fully managed cloud offering, or both. DSML platforms are designed to allow a broad range of users to develop and apply a comprehensive set of predictive and prescriptive analytical techniques. By combining data from distributed sources, a cutting-edge user experience, and native machine learning and generative AI (GenAI) capabilities, these platforms help augment and automate decision making across an enterprise. They provide a range of proprietary and open-source tools that enable data scientists and domain experts to find patterns in data, which can then be used to forecast financial metrics, understand customer behavior, predict supply and demand, and address many other use cases. Models can be built on all types of data, including tabular data, images, video and text, for applications that require computer vision or natural language processing.
Hadoop distributions provide scalable, distributed computing against data held in on-premises and cloud-based file stores. Distributions are composed of commercially packaged and supported editions of open-source Apache Hadoop-related projects, and they provide access to applications, query and reporting tools, machine learning and data management infrastructure components. First introduced as collections of components for any use case, distributions are now often delivered as part of a specific solution for data lakes, machine learning or other uses. They may then grow into expanded roles, competing both with older technologies such as database management systems (DBMSs) and with newer ones such as Apache Spark.