Google announced its Apache Kafka for BigQuery cloud service at its conference Google Cloud Next 2024 in Las Vegas. Welcome to the data streaming club, joining Amazon, Microsoft, IBM, Oracle, Confluent, and others. This blog post explores this new managed Kafka offering for GCP, reviews the current status of the data streaming landscape, and shares some criteria to evaluate when Kafka in general and Google Apache Kafka in particular should (not) be used.
Welcome Google Apache Kafka to the Data Streaming Club
Better late than never… Google announced a brand new Apache Kafka cloud service for GCP at Google Cloud Next 2024. All other major cloud providers already have one, including AWS, Azure, Oracle, IBM, and Alibaba. Various other software vendors provide Kafka services, including Confluent, Aiven, Redpanda, WarpStream, and many more. Most leverage the open-source Kafka project as their core component, while others re-implement the Kafka protocol.
Apache Kafka and Apache Flink dominate the open-source data streaming ecosystem. Vendors and cloud solutions provide cloud-native offerings. Some developers, data engineers, and business people still struggle with a paradigm shift: Continuous data processing enables better data quality, reduced cost, and faster time to market with innovative new applications. Kafka and Flink are a match made in heaven for data streaming.
Use cases for data streaming exist across all industries. Google Apache Kafka for BigQuery is potentially a good fit for some of them, but not for others.
Google Apache Kafka for BigQuery: What Is It?
What is Google Apache Kafka for BigQuery? Quoting Google’s website: “Apache Kafka for BigQuery is a managed service that operates highly available Apache Kafka clusters. It is compatible with open source versions of Apache Kafka and includes first-party Google Cloud IAM, monitoring, logging, key management, organization policy, networking, and more.” Here are a few more thoughts:
- Asynchronous messaging with true decoupling of producers and consumers using the publish/subscribe pattern is already possible with the GCP proprietary service Google Pub/Sub. Why did Google now introduce a Kafka service? Limitations of Google Pub/Sub, or because Kafka became the standard (e.g., to migrate on-premise Kafka workloads from customers)? I guess a bit of both.
- Google re-uses open-source Kafka instead of re-implementing the Kafka protocol (like Microsoft Azure’s Event Hubs). I like this approach, as a new implementation always creates several new challenges like missing completeness, delays of new features, and unexpected behavior. The compatibility with open-source Kafka is mentioned several times. My personal assumption is that Google’s main strategic goal for the new Kafka service is to migrate existing on-premise workloads into Google Cloud.
- I really like that the service is secure out of the box. It is integrated with and supports Google Cloud IAM, customer-managed encryption keys (CMEK), and Virtual Private Cloud (VPC) from the beginning. This is important, as most enterprise workloads require this.
- Including the term ‘BigQuery’ is only a marketing strategy: “Data engineers often rely on Apache Kafka to build pipelines that stream data into BigQuery and other analytics systems. Apache Kafka for BigQuery can be used for real-time and batch use cases”. There is no requirement to use BigQuery for analytics. Google’s Kafka service is usable with other analytics platforms, too.
- Google emphasizes analytics use cases everywhere around its Kafka service; NOT transactional workloads. This approach is similar to Amazon MSK. Hopefully, the Google terms and conditions do not exclude Kafka support when the service is GA (that is what MSK does; unfortunately, too many people do not read the T&C and just use a cloud service in production).
Data Streaming Is a NEW Software Category
Data streaming represents a new software category that revolutionizes the way businesses harness and process data in real-time. Unlike traditional batch processing methods, data streaming enables continuous ingestion, analysis, and processing of data as it flows through systems.
The Data Streaming Landscape 2024
Many software companies have emerged in the data streaming category in the last few years. And several mature players in the data market added support for data streaming in their platforms or cloud service ecosystem. Most software vendors use Kafka for their data streaming platforms. However, there is more out there than solutions powered by open-source Kafka. Some vendors only use the Kafka protocol (e.g., Azure Event Hubs) or completely different APIs (like Amazon Kinesis).
The following Data Streaming Landscape 2024 summarizes the current status of relevant products and cloud services for data streaming around Kafka and other stream processing engines.
Forrester Wave for Streaming Data and IDC MarketScape for Stream Processing
Apache Kafka became the de facto standard for data streaming, similar to how Amazon S3 became the de facto standard for object storage.
In December 2023, the research company Forrester published “The Forrester Wave™: Streaming Data Platforms, Q4 2023.” Get free access to the report here. The leaders are Microsoft, Google, and Confluent, followed by Oracle, Amazon, Cloudera, and a few others.
In April 2024, IDC named Confluent a leader in the IDC MarketScape for Worldwide Analytic Stream Processing 2024.
It would not be a surprise if we see a Gartner Magic Quadrant for Data Streaming soon, too. Gartner reports mention Kafka and related vendors more and more year by year.
When Not To Choose Google Apache Kafka for BigQuery
Qualifying out a technology is often the easier option. Why evaluate a service if it doesn’t meet the requirements? Let’s explore when NOT to use Kafka at all, and especially when the Google Apache Kafka service is probably NOT the right choice for you.
When Not To Use Apache Kafka
Apache Kafka has overlaps with technologies like message brokers (such as IBM MQ, TIBCO, or RabbitMQ) and other streaming analytics platforms, and it actually is a database, too. But Apache Kafka is not an allrounder to solve every problem.
Apache Kafka is NOT:
- A replacement for your favorite database, data warehouse, or data lake. Instead, it complements and integrates with these platforms.
- An analytics platform for AI/ML model training, though model scoring is often done within the streaming platform for critical or low-latency use cases.
- A proxy for thousands of clients in bad networks.
- An API Management solution, though you can connect REST/HTTP producers and consumers against Kafka.
- An IoT gateway, though direct integration with IoT protocols like MQTT or OPC-UA is possible.
- Hard real-time for safety-critical embedded workloads.
Read the thorough analysis “When NOT to use Apache Kafka?” for more details.
When To Choose Another Kafka Instead of Google’s
If Apache Kafka is the right choice for your project, you still have plenty of options.
Here are a few criteria that let you easily disqualify Google Apache Kafka for BigQuery:
- Non-GCP: If your use case requires on-premise, multi-cloud, hybrid cloud, or edge deployments, then you need another offering.
- Critical SLAs: If you need 24/7 critical support and consulting expertise, a dedicated Kafka vendor like Confluent is the better choice. Kafka is not just for analytics, but shines for transactional workloads, too. Google’s Managed Apache Kafka service is not GA yet. This will probably happen in the second half of 2024. Hence, do not even consider it for critical applications before GA.
- Serverless: A managed service is not always a truly managed service. The future will show where Google goes with Kafka. But right now, Google Apache Kafka is not serverless like, e.g., Confluent Cloud. Pricing is capacity-based, and cluster capacity management is required. Amazon even created a second service, Amazon MSK Serverless, to address this issue with its traditional MSK offering.
- Complete data streaming platform: A data streaming platform requires more than just messaging: data integration with first- and third-party systems, stream processing for continuous data correlation, flexible (long-term) retention with Tiered Storage, data governance, and more. The future will show us where Google’s Kafka service goes. Google is a car, but not (yet) a Porsche (complete luxury car) and not yet a Google Waymo (self-driving car level 5). Google Apache Kafka even misses basic features for data streaming best practices, like defining data contracts in schemas for building data products with good data quality.
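To make the data contract idea from the last criterion more concrete, here is a minimal sketch in plain Python. It is only an illustration of the concept, not how Google's service, Confluent, or any schema registry implements it. The topic fields in `ORDER_CONTRACT` and the `violations` helper are hypothetical names made up for this example; in practice, a contract is usually expressed in a schema format such as Avro, Protobuf, or JSON Schema and enforced by the platform.

```python
# Hypothetical data contract for an "orders" topic:
# field name -> Python type that the field's value must have.
ORDER_CONTRACT = {"order_id": str, "amount_cents": int, "currency": str}


def violations(event, contract):
    """Return a list of contract violations (empty if the event conforms)."""
    problems = []
    # Every field named in the contract must be present with the right type.
    for field, expected in contract.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    # Fields that the contract does not know about are rejected as well.
    for field in event:
        if field not in contract:
            problems.append(f"unexpected field: {field}")
    return problems


good = {"order_id": "o-1", "amount_cents": 1999, "currency": "EUR"}
bad = {"order_id": "o-2", "amount_cents": "19.99"}

print(violations(good, ORDER_CONTRACT))  # conforming event: no violations
print(violations(bad, ORDER_CONTRACT))   # wrong type and a missing field
```

A producer would run such a check before sending an event, so that bad data is rejected at the source instead of being discovered later by every consumer.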
The Evolution of Data Streaming Is Not Stopping
If you did not qualify out Kafka in general or Google Apache Kafka in particular yet, that’s great. Start evaluating Google’s Managed Apache Kafka cloud service and compare it against self-managed open source Kafka and other semi-managed or fully-managed Kafka cloud services on GCP.
As we look ahead, the future possibilities for data streaming are boundless, promising more agile, intelligent, and real-time insights into the ever-increasing streams of data.
I often get asked whether I am worried about the growing competition, as I work for Confluent, where we “only do data streaming”.
No, I am not! Actually, the new Google Apache Kafka cloud service is great news for the industry! Data Streaming established itself as a new software category. Research analysts like Forrester and IDC already created dedicated waves and comparisons. What could be better than working with the people who invented Kafka and the company that created this software category across all industries and continents? And competition is always good for innovation, too.
Real-time data beats slow data. That’s true in almost every use case. At Confluent, we are now ~3000 people working only on one thing: Data Streaming. I think we should celebrate this Google announcement and look forward to more mass adoption of data streaming around the world.
And because Confluent is a strategic Google partner, customers can:
- Leverage GCP credits to consume Confluent Cloud
- Leverage GCP’s security and private networking infrastructure
- Integrate via fully managed connectors into various GCP services like Google BigQuery or Google Cloud Storage and third-party cloud solutions like MongoDB, Snowflake, or Databricks.
Are you excited about the new Google Apache Kafka cloud service? Or do you still plan to use open-source Kafka or another cloud service like Confluent Cloud? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.