Browsing Tag: Kafka

A Critical Detail About Kafka Partitioners

Apache Kafka is the de facto standard for event streaming today. Part of what makes Kafka so successful is its ability to handle tremendous volumes of data; throughput of millions of records per second is not unheard of in production environments. One part of Kafka's design that makes this possible is partitioning. Kafka uses partitions to spread the load of data across brokers in a cluster, and the partition is also Kafka's unit of parallelism: more partitions mean higher throughput. Since Kafka works with key-value pairs,…
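
The record key determines which partition a record lands in. A minimal sketch of the idea behind key-based partitioning (the real Java client's default partitioner uses murmur2 hashing; the function and names here are illustrative only):

    def choose_partition(key: bytes, num_partitions: int) -> int:
        # Records with the same (non-null) key always map to the same
        # partition, which preserves per-key ordering.
        # Python's built-in hash() is a stand-in here: it is randomized
        # per process, so real clients use a stable hash like murmur2.
        return hash(key) % num_partitions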

HLD or High Level System Design of Apache Kafka Startup

Apache Kafka is a distributed data store optimized for ingesting and processing streaming data in real time with low latency. It can handle the constant inflow of data generated sequentially and incrementally by thousands of data sources. Why use Kafka in the first place? Let's look at the problem that inspired it at LinkedIn. The problem is simple: LinkedIn was getting a lot of logging data, like log messages, metrics, events, and other monitoring/observability data from multiple services. They wanted to…

Introducing Quix Streams: an open-source Python library for Kafka | by Tomáš Neubauer | Mar, 2023

Easily produce and consume time-series data streams with a Pandas-like interface. You might be wondering why the world needs another Python framework for Kafka. After all, there are a lot of existing libraries and frameworks to choose from, such as kafka-python, Faust, PySpark, and so on. The focus of Quix Streams, however, is time-series and telemetry data, so its features are optimized for telemetry-related use cases. This could be device telemetry (it was originally road-tested on sensor data from Formula 1…

Comparing the Top 3 Schema Management Tools

Before diving into the different supporting technologies, let's establish a baseline about schemas and message brokers, or async server-to-server communication. Schema = struct: the shape and format of a "message" built and delivered between different applications/services/electronic entities. Schemas can be found in SQL and NoSQL databases, in the different shapes of the data the database expects to receive (for example, first_name:string, first.name, etc.). An unfamiliar or noncompliant schema will result in a drop, and…
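
To make the idea concrete, here is a hedged sketch of schema validation in Python using the fastavro library; the record shape and field names are assumptions chosen for illustration:

    from fastavro import parse_schema
    from fastavro.validation import validate

    # An Avro-style schema: the contract both sides of a message agree on.
    schema = parse_schema({
        "type": "record",
        "name": "User",
        "fields": [
            {"name": "first_name", "type": "string"},
            {"name": "age", "type": "int"},
        ],
    })

    # A compliant message passes validation...
    assert validate({"first_name": "Ada", "age": 36}, schema)
    # ...while a noncompliant one fails and would typically be dropped.
    assert not validate({"first_name": 42}, schema, raise_errors=False)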

Using Apache Kafka for Data Streaming | by Wei-Meng Lee | Mar, 2023

Learn how to install and use Kafka to send and receive messages. Apache Kafka is an open-source application used for real-time streaming of big data. It is a publish-subscribe messaging system that you can use to send messages between processes, applications, and servers. (Diagram: a high-level architecture overview of Apache Kafka.) Unlike other messaging systems, Kafka also has additional features such as partitioning and replication, and has higher…
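
As a taste of the send/receive flow, a hedged sketch using the kafka-python package; the broker address and topic name are assumptions:

    from kafka import KafkaProducer, KafkaConsumer

    # Produce a message to the "demo" topic on a local broker.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("demo", b"hello kafka")
    producer.flush()

    # Consume it back, starting from the earliest offset.
    consumer = KafkaConsumer(
        "demo",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,  # stop iterating once the topic is idle
    )
    for message in consumer:
        print(message.value)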

Custom Kafka metrics using Apache Spark PrometheusServlet | by Vitor Teixeira | Feb, 2023

Creating and exposing custom Kafka consumer streaming metrics in Apache Spark using PrometheusServlet. In this blog post, I will describe how to create and enhance current Spark Structured Streaming metrics with Kafka consumer metrics and expose them using the Spark 3 PrometheusServlet, which can be targeted directly by Prometheus. In previous Spark versions, one had to set up either a JmxSink/JmxExporter, GraphiteSink/GraphiteExporter, or a custom sink deploying metrics to a PushGateway…
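
For orientation, a hedged sketch of enabling the PrometheusServlet sink from PySpark; the property names follow Spark's monitoring configuration, but treat the exact keys and paths as assumptions to verify against your Spark version:

    from pyspark.sql import SparkSession

    # Route Spark's metrics system to the built-in PrometheusServlet sink
    # so Prometheus can scrape the driver UI directly.
    spark = (
        SparkSession.builder
        .appName("kafka-metrics-demo")
        .config("spark.metrics.conf.*.sink.prometheusServlet.class",
                "org.apache.spark.metrics.sink.PrometheusServlet")
        .config("spark.metrics.conf.*.sink.prometheusServlet.path",
                "/metrics/prometheus")
        .getOrCreate()
    )
    # Metrics are then served at http://<driver-host>:4040/metrics/prometheus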

How to send tabular time series data to Apache Kafka with Python and Pandas | by Tomáš Neubauer | Jan, 2023

Learn how to produce and consume data in Kafka using a sample log of online retail transactions. Time-series data comes in all shapes and sizes, and it's often produced at high frequency in the form of sensor data and transaction logs. It's also produced in huge volumes, where records are separated by milliseconds rather than hours or days. But what kind of system can handle such a constant stream of data? An older approach would be to dump the raw data in a data lake and process it in…
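
A hedged sketch of the basic pattern, streaming rows of a pandas DataFrame into Kafka as JSON with kafka-python; the file name, topic, and broker address are made-up placeholders:

    import json

    import pandas as pd
    from kafka import KafkaProducer

    # Hypothetical transaction log with one row per retail transaction.
    df = pd.read_csv("online_retail.csv")

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    # Send each row as its own message so consumers see a record stream.
    for row in df.to_dict(orient="records"):
        producer.send("retail-transactions", row)
    producer.flush()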

Nell Zink: ‘Guys who like Kafka are insufferable’ | Fiction

Was it important to you to write unsatirically about a millennial protagonist?

She's Generation Z, I think, but I don't think of the struggles of young people without money as amusing enough to satirise. By the standards of most American novelists, I'm from the wrong side of the tracks. Most people doing this job are solidly middle class, and they have anxieties that I don't feel about the kind of disgrace poverty would be. That's something I could satirise, whereas the struggles of someone like Bran, who's down and out…

Building a Data Mesh on the Kafka Ecosystem | by Sven Balnojan | Dec, 2022

A data mesh for data-heavy-product companies. (That's how I picture a data mesh on Kafka looking.) Finding a good technical architecture for a data mesh is hard, and a very company-specific process. Kafka-based data meshes are a great choice for companies that already build data-heavy products. If your software engineers aren't comfortable processing large amounts of data and can't get into the Kafka ecosystem, then this data mesh technical architecture is not for you. But if you do…

Streaming Iceberg Table, an Alternative to Kafka? | by Jean-Claude Cote | Dec, 2022

Spark Structured Streaming supports a Kafka source and a file source, meaning it can treat a folder as a source of streaming messages. Can a solution based entirely on files really compare to a streaming platform such as Kafka? In this article, we explore using an Iceberg table as a source of streaming messages. To do this, we create a Java program that writes messages to an Iceberg table and a PySpark Structured Streaming job that reads these messages. Azure Event Hubs is a big data…
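
The reading side can be sketched as follows; a hedged PySpark example, assuming the iceberg-spark-runtime package is on the classpath and using a hypothetical catalog/table name:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("iceberg-stream-demo").getOrCreate()

    # Treat the Iceberg table as a streaming source: new snapshots
    # arrive as new micro-batches.
    stream = (
        spark.readStream
        .format("iceberg")
        .load("local.db.messages")  # hypothetical catalog.db.table
    )

    query = (
        stream.writeStream
        .format("console")
        .option("checkpointLocation", "/tmp/iceberg-stream-ckpt")
        .start()
    )
    query.awaitTermination()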