What's new in CloudQuery Plugin Protocol v3
July 10, 2023
We are thrilled to announce the release of the new gRPC Plugin Version v3 (opens in a new tab), which brings exciting enhancements to writing CloudQuery plugins!
The gRPC protocol (opens in a new tab) is the underlying protocol which enables CloudQuery plugins to be decoupled from one another, which is crucial for a system with an unlimited number of integrations.
This blog covers the gRPC changes. For language-specific changes, see Go SDK V4.
Overview
Let's begin with a high-level summary before diving deeper into each of the updates:
- Apache Arrow - protobuf V3 now fully supports Apache Arrow, which was introduced in protobuf v2 (Sources) and protobuf v1 (Destinations).
- Unified Protocol for Sources and Destinations - We have streamlined the gRPC protocol to utilize a single protocol. This allows CloudQuery plugins to function as both a source and a destination simultaneously, simplifying gRPC versioning and updates.
- Transition Sync/Write to Streaming API - We have transitioned the
Sync
andWrite
operations to a streaming-based API, enabling new use cases like Change Data Capture (CDC).
Apache Arrow
We extensively discussed this update in our previous blog post. Now, with all our destinations migrated to SDK V4, they support over 30 different Apache Arrow data types (opens in a new tab)!
All fields that are sending CloudQuery tables are encoded as Apache Arrow Schemas, and all fields sending data are Apache Arrow records (Fields are commented in the code).
Unified Protocol
Updates to plugin-pb (opens in a new tab) are vital for introducing new features and enhancing the developer experience for plugin authors and end users.
Having a single protocol version ensures better manageability of upgrades and maintains backward compatibility, providing users and developers with ample time for transitioning. It also enables plugin authors to create plugins that function as both sources and destinations.
Moreover, any of the CloudQuery destinations can now be utilized as backends for incremental syncs. For example, you can now specify the following for sources with incremental syncs:
kind: source
spec:
name: aws
path: cloudquery/aws
version: "v22.15.2"
destinations: ["postgresql"]
spec:
backend_options:
table_name: "test_state_table"
connection: "@@plugins.postgresql.connection"
...
---
kind: destination
spec:
name: postgresql
path: "cloudquery/postgresql"
version: "v6.1.2"
spec:
connection_string: ....
@@plugins.X
can reference any of the destinations specified in your config.
Lastly, this also opens the door for more easily creating a CloudQuery SDK for new languages, so stay tuned!
Streaming API
One noteworthy use case on top of CloudQuery is Change Data Capture (CDC), which involves operations like creating or deleting tables during a sync.
With the new streaming support, the Write
method now supports three messages:
MessageMigrateTable
- Migrate table to a specific target table safely or force (dropping and re-creating)MessageInsert
- Insert Arrow RecordMessageDeleteStale
- This is a specific message that might be generalized later to a more general delete message. This indicates a plugin to delete data from a table with_cq_source_name
=source_name
and_cq_sync_time
<sync_time
.
With this change to streaming that contains messages, we can easily extend the protocol in a backward-compatible way to support more messages like DeleteTable
, DeleteRecord
to support use cases such as CDC or other use cases that require those kinds of data updates.
Multiple messages, such as CreateTable
, can easily be accommodated to enhance and facilitate such use cases. All available messages are available here (opens in a new tab).
Migration Guide
To learn how to migrate Go plugins, please take a look at Go SDK V4 Update.
Summary
We are thrilled about this major release! Don't hesitate to try your hand at developing a new CloudQuery plugin. Feel free to reach out to us on GitHub (opens in a new tab) or Discord (opens in a new tab) with any questions or feedback.