Make your development, test and support teams succeed on Spark.
Accelerate your development and deployment process.
A unique and intuitive design environment that helps you build superior workflows on Spark.
Code = Visual IDE
Two views of Spark code in Git for different developers.
Spark code in Git is 100% open source.
Standardized components that are extensible.
Step-by-step interactive execution on Spark for development and debugging.
Unit test, integration test and data quality test support.
An enterprise-grade metadata system that gives you faster, smarter insights into your data assets.
Manage All Assets
Store Organizational, Technical and Execution metadata with support for browsing, graph traversal and text search interfaces.
Multi Spark Integration
We support multiple Hive and Spark metastores, integrating data distribution and statistics into Prophecy Hub. We also integrate with external metastores such as on-premise Hadoop and the AWS Glue catalog.
Use Prophecy Hub API to add new entities (such as datasets) and new aspects (such as schemas). We support adding custom aspects (json) to existing entities.
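As an illustration of attaching a custom aspect to an existing entity, the sketch below builds such a JSON payload. The entity id, aspect name, and field names are hypothetical, not the actual Prophecy Hub API schema:

```python
import json

def make_custom_aspect_payload(entity_id, aspect_name, aspect_body):
    """Build a JSON payload attaching a custom aspect to an existing
    entity. Field names here are illustrative, not the real Hub schema."""
    return json.dumps({
        "entityId": entity_id,
        "aspect": {"name": aspect_name, "body": aspect_body},
    })

# Example: attach a schema aspect to a dataset (hypothetical identifiers).
payload = make_custom_aspect_payload(
    "dataset/sales_orders",
    "schema",
    {"columns": [{"name": "order_id", "type": "long"}]},
)
```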
Get a single view of your metadata across multiple clouds.
Enhance governance and trust for your data.
Column Level Lineage
View column-level lineage at the system, project and workflow level. For each column, get a summary of the transformations applied to the source data to produce its value.
Propagate column level tags across data engineering pipelines to ensure that your sensitive data is appropriately tagged with privacy and security restrictions.
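Tag propagation amounts to pushing a column's tags to every column derived from it. A minimal sketch, assuming lineage is available as a mapping from each source column to its downstream columns (the column names are made up):

```python
from collections import deque

def propagate_tags(edges, tags):
    """Propagate column tags (e.g. PII) downstream along column-level
    lineage edges. `edges` maps a column to the columns derived from it;
    `tags` maps a column to its initial tag set."""
    result = {col: set(t) for col, t in tags.items()}
    queue = deque(result)
    while queue:
        col = queue.popleft()
        for nxt in edges.get(col, []):
            downstream = result.setdefault(nxt, set())
            new = result[col] - downstream
            if new:  # only re-visit a column when its tag set grew
                downstream |= new
                queue.append(nxt)
    return result
```

A PII tag on a raw column thus reaches every column computed from it, however many pipeline stages away.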
Import External Sources
Add lineage for existing sources by running Prophecy Crawler on your Git repositories. We currently support Spark and Hive code.
Simplified Cloud Execution
All clouds provide a Spark service. Executing data engineering workflows on these services requires scheduling, cluster lifetime management, and cluster sizing.
Prophecy provides a single view across multiple execution Fabrics. Workflows can run on different Fabrics when equivalent Datasets are present on them.
A Fabric is our virtual equivalent of an on-premise big data cluster. It lets you maintain separate environments for test, integration and production, and is the basic unit that supports multi-cloud and continuous deployment.
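The "equivalent Datasets" rule above can be sketched as a simple eligibility check: a workflow can run on a Fabric only if that Fabric holds every Dataset the workflow reads. The Fabric and Dataset names are illustrative:

```python
def runnable_fabrics(workflow_datasets, fabrics):
    """Return the Fabrics a workflow can run on: those holding an
    equivalent of every Dataset the workflow reads.
    `fabrics` maps a Fabric name to its available Dataset names."""
    needed = set(workflow_datasets)
    return [name for name, datasets in fabrics.items()
            if needed <= set(datasets)]

# Example: only the dev Fabric has both Datasets the workflow needs.
eligible = runnable_fabrics(
    ["orders", "customers"],
    {"dev": ["orders", "customers"], "prod": ["orders"]},
)
```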
Prophecy installs as a Kubernetes Operator on any Kubernetes cluster, on premise or in the public cloud. In the public cloud, we can run within your VPC, in addition to being available as a hosted service.
Bring the speed of DevOps to Data Engineering with automation tools
We auto-generate high-coverage (90%+) unit tests for your existing workflows and accelerate writing new ones. Every modification (Git commit) runs the unit tests to maintain high workflow quality.
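A generated unit test typically pins a workflow step's output for a captured sample of input rows. A minimal sketch in plain Python; the transformation and the sample data are invented for illustration, not taken from the product:

```python
def dedupe_orders(rows):
    """Example workflow step: keep the latest record per order_id."""
    latest = {}
    for row in rows:
        prev = latest.get(row["order_id"])
        if prev is None or row["updated_at"] > prev["updated_at"]:
            latest[row["order_id"]] = row
    return list(latest.values())

def test_dedupe_orders():
    """A generated test: replay a captured sample, pin the output."""
    sample = [
        {"order_id": 1, "updated_at": 1, "amount": 10},
        {"order_id": 1, "updated_at": 2, "amount": 12},
    ]
    assert dedupe_orders(sample) == [
        {"order_id": 1, "updated_at": 2, "amount": 12}
    ]
```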
Integration tests. Data quality tests.
We accelerate data quality tests (including auto-generation of suggested tests) that ensure correct data is produced for downstream consumers and warn of unexpected changes in data patterns.
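Suggested data quality tests tend to be simple rules over the output rows, such as null-rate thresholds and value-range constraints. A sketch with invented column names and thresholds:

```python
def null_rate(rows, column):
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)

def check_quality(rows):
    """Two example checks: keys stay populated, amounts stay
    non-negative. Columns and thresholds are illustrative."""
    failures = []
    if null_rate(rows, "customer_id") > 0.01:
        failures.append("customer_id null rate above 1%")
    if any(r.get("amount", 0) < 0 for r in rows):
        failures.append("negative amount")
    return failures
```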
On a promotion request, we run the current and new workflows on the same data in parallel and give you a data comparison, a performance comparison, and downstream impact analysis of changed data columns, enabling easy and fast promotion.
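The data-comparison step boils down to diffing the two runs' outputs on a key: rows present in only one run, and rows whose values changed. A minimal sketch (the function name and report shape are assumptions, not the product's API):

```python
def compare_runs(current, new, key):
    """Compare the outputs of the current and new workflow runs,
    keyed by `key`. Returns rows unique to either run and the keys
    whose row values changed between runs."""
    cur = {r[key]: r for r in current}
    nxt = {r[key]: r for r in new}
    changed = {k: (cur[k], nxt[k])
               for k in set(cur) & set(nxt) if cur[k] != nxt[k]}
    return {
        "only_current": sorted(set(cur) - set(nxt)),
        "only_new": sorted(set(nxt) - set(cur)),
        "changed": changed,
    }
```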
Convert legacy ETL workflows to Spark with high automation.
Prophecy provides transpilers (cross-compilers) for the formats and languages of legacy ETL providers, enabling highly automated modernization of your existing ETL workflows. Our experience migrating multiple enterprises helps you move faster, more reliably, and with lower risk.