Data preparation is an iterative and agile process for finding, combining, cleaning, transforming and sharing curated datasets for various data and analytics use cases, including analytics/business intelligence (BI), data science/machine learning (ML) and self-service data integration. Data preparation tools promise faster time to delivery of integrated and curated data by allowing business users, including analysts, citizen integrators, data engineers and citizen data scientists, to integrate internal and external datasets for their use cases. Furthermore, these tools allow users to identify anomalies and patterns and to review and improve the quality of their data in a repeatable fashion. Some tools embed ML algorithms that augment and, in some cases, fully automate certain repeatable and mundane data preparation tasks. Reduced time to delivery of data and insight is at the heart of this market. A minimal hand-coded sketch of such a preparation flow appears below.
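To illustrate the kind of repeatable find/combine/clean/transform/share flow these tools automate, the following Python/pandas sketch joins a hypothetical internal dataset with an external reference dataset, flags quality issues and shares a curated result. The file names and columns are assumptions for illustration, not part of any specific product.

```python
# Illustrative only: a hand-coded version of a typical data preparation flow.
# File names and column names are hypothetical.
import pandas as pd

# Find and combine: join an internal dataset with an external reference dataset.
orders = pd.read_csv("internal_orders.csv")             # e.g. order_id, customer_id, amount, country
countries = pd.read_csv("external_country_codes.csv")   # e.g. country, region
combined = orders.merge(countries, on="country", how="left")

# Clean: remove duplicates, flag anomalies and fill gaps in a repeatable way.
combined = combined.drop_duplicates(subset="order_id")
combined["amount"] = pd.to_numeric(combined["amount"], errors="coerce")
anomalies = combined[combined["amount"] < 0]             # candidate data quality issues to review
combined["region"] = combined["region"].fillna("UNKNOWN")

# Transform and share: derive a curated, analysis-ready dataset for BI or ML use.
curated = combined.groupby("region", as_index=False)["amount"].sum()
curated.to_csv("curated_revenue_by_region.csv", index=False)
print(f"{len(anomalies)} potential anomalies flagged for review")
```

In a data preparation tool, each of these steps would typically be captured as a reusable, auditable recipe rather than code, which is what makes the process repeatable for business users.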
The market for ESP platforms consists of software subsystems that perform real-time computation on streaming event data. They execute calculations on unbounded input data continuously as it arrives, enabling immediate responses to current situations and/or the storage of results in files, object stores or other databases for later use. Examples of input data include clickstreams; copies of business transactions or database updates; social media posts; market data feeds; images; and sensor data from physical assets, such as mobile devices, machines and vehicles.
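To make "continuous computation on unbounded input data" concrete, here is a minimal sketch, not tied to any ESP product, of a tumbling-window aggregation over a simulated clickstream. The event source, window length and page names are assumptions for illustration; a real ESP platform would run this pattern at scale with fault tolerance and high throughput.

```python
# Illustrative only: a tiny tumbling-window aggregation over an unbounded event
# stream, the core pattern an ESP platform executes at scale. The simulated
# clickstream source and the 5-second window are assumptions for illustration.
import random
import time
from collections import Counter
from typing import Iterator

def clickstream() -> Iterator[dict]:
    """Simulate an unbounded stream of click events arriving over time."""
    pages = ["/home", "/product", "/checkout"]
    while True:
        yield {"page": random.choice(pages), "ts": time.time()}
        time.sleep(0.1)

def tumbling_window_counts(events: Iterator[dict], window_seconds: float = 5.0) -> Iterator[Counter]:
    """Emit per-page counts each time a window closes; results could drive an
    immediate response or be stored in a file, object store or database."""
    window_end = time.time() + window_seconds
    counts: Counter = Counter()
    for event in events:
        if event["ts"] >= window_end:
            yield counts                      # act on the current situation
            counts = Counter()
            window_end += window_seconds
        counts[event["page"]] += 1

for window in tumbling_window_counts(clickstream()):
    print("page views in last window:", dict(window))
```

The key point the sketch shows is that the computation never "finishes": results are emitted continuously as the unbounded input arrives, rather than after a batch job completes.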