Azure Event Hubs provides a distributed stream processing platform, with low latency and seamless integration with services inside and outside of Azure.
In this article, we will take a look at Event Hubs and some basic concepts around it.
What is Event Hubs?
Event Hubs is very useful for real-time event processing and can handle millions of events per second. By default, an event hub retains messages for 1 day, but it can be configured to retain them for up to 7 days.
Event Hubs messages can easily be captured to a storage account or integrated with Azure Stream Analytics.
Because this is a PaaS offering, configuring it in Azure and using it in your application is remarkably easy. Event Hubs is also easy to use from Java, C, Go, Node.js, or Python.
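As a sketch of what the client SDKs look like, the following Python snippet sends a small batch of events using the `azure-eventhub` package. The connection string and event hub name are placeholders you would supply from your own namespace, so this is an illustration rather than a runnable sample.

```python
# Sketch only: requires `pip install azure-eventhub`, plus a real
# connection string and event hub name (placeholders below).
from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="<your-namespace-connection-string>",
    eventhub_name="<your-event-hub>",
)

with producer:
    # Batching keeps the client within the service's message size limits.
    batch = producer.create_event_batch()
    batch.add(EventData("First event"))
    batch.add(EventData("Second event"))
    producer.send_batch(batch)  # delivers the whole batch to the event hub
```

The other SDKs follow the same pattern: create a producer client for one event hub, build a batch, and send it.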
The following are some of the scenarios where you can use Event Hubs:
- Anomaly detection (fraud/outliers)
- Application logging
- Analytics pipelines, such as clickstreams
- Live dashboarding
- Archiving data
- Transaction processing
- User telemetry processing
- Device telemetry streaming
Architecture and Basic Concepts
The diagram below shows the high-level architecture of Event Hubs. The important terms in the diagram are explained in this section.
Event producers: Any entity that sends data to an event hub.
Partitions: Each consumer reads only a specific subset of the message stream; each subset is called a partition.
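To make the partition idea concrete, here is a small illustrative sketch. This is not Event Hubs' actual partition-assignment algorithm; it only shows the principle that a stable hash of a partition key always maps a given key to the same partition, which is how per-key ordering is preserved.

```python
import zlib

def pick_partition(partition_key: str, partition_count: int) -> int:
    # Stable hash of the key, reduced modulo the partition count,
    # so the same key always lands on the same partition.
    return zlib.crc32(partition_key.encode("utf-8")) % partition_count

# Events carrying the same key always go to the same partition:
print(pick_partition("device-42", 4) == pick_partition("device-42", 4))  # True
```

Events sent without a partition key are instead distributed across partitions by the service.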
Consumer groups: A view (state, position, or offset) of an entire event hub. Each consuming application can have its own consumer group and read messages independently, at its own pace.
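The following toy model (hypothetical names, no real Event Hubs API) shows what these independent views mean: each consumer group tracks its own offset into the same shared stream, so one group can race ahead while another lags behind.

```python
# Toy model of consumer groups: one shared stream, one offset per group.
stream = ["event-0", "event-1", "event-2", "event-3"]
offsets = {"dashboard": 0, "archiver": 0}

def read(group: str, count: int) -> list:
    # Each group reads from its own offset; the stream itself is shared.
    start = offsets[group]
    events = stream[start:start + count]
    offsets[group] += len(events)
    return events

read("dashboard", 3)  # the dashboard group races ahead
read("archiver", 1)   # the archiver group lags behind
print(offsets)        # {'dashboard': 3, 'archiver': 1}
```

Because offsets are per group, adding a new consuming application never disturbs the position of existing ones.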
Throughput units: Pre-purchased units of capacity that control the throughput capacity of Event Hubs.
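As a rough sizing sketch: at the time of writing, one throughput unit allows up to 1 MB/s or 1,000 events/s of ingress (check the current limits in the Azure documentation). You need enough units to cover whichever dimension is the bottleneck:

```python
import math

def throughput_units_needed(ingress_mb_per_s: float, events_per_s: float) -> int:
    # Assumed published limits per throughput unit:
    # 1 MB/s of ingress, or 1,000 events/s, whichever is hit first.
    by_bytes = math.ceil(ingress_mb_per_s / 1.0)
    by_events = math.ceil(events_per_s / 1000.0)
    return max(by_bytes, by_events)

print(throughput_units_needed(2.5, 1800))  # 3 (2.5 MB/s is the bottleneck)
```

In practice the Auto-inflate feature described later can adjust this number for you as load grows.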
Event receivers: Any entity that reads event data from an event hub. The Event Hubs service delivers events through a session as they become available. Kafka consumers connect via the Kafka protocol 1.0 and later.
This section highlights some of the important features of Event Hubs:
Fully Managed PaaS
Unlike Apache Kafka, Azure Event Hubs is completely PaaS based. You do not need to create or maintain any clusters, so you can focus solely on your business applications.
Real-Time and Batch Processing
With Azure Event Hubs, you can ingest, buffer, store, and process your stream in real time to get actionable insights. Event Hubs also uses partitions to let consumers process events independently, at their own pace.
Scalable
With Event Hubs, you can start with data streams in megabytes and grow to gigabytes or terabytes. The Auto-inflate feature is one of the many options available to scale the number of throughput units to meet your usage needs.
Capture
Using Event Hubs Capture, you specify your own Azure Blob storage account and container, or Azure Data Lake Store account, one of which is used to store the captured data. You can enable Capture from the Azure portal and specify a minimum size and time window to perform the capture. Captured data is written in the Apache Avro format.
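The size-or-time trigger can be pictured as follows; the window values here are illustrative assumptions, not necessarily your configuration:

```python
# Sketch of the Capture trigger rule: a capture window closes when either
# the size window or the time window is reached, whichever comes first.
# The default values below are assumptions chosen for illustration.
def should_capture(bytes_buffered: int, seconds_elapsed: float,
                   size_window_bytes: int = 300 * 1024 * 1024,   # 300 MB
                   time_window_seconds: float = 300.0) -> bool:  # 5 minutes
    return (bytes_buffered >= size_window_bytes
            or seconds_elapsed >= time_window_seconds)

print(should_capture(10_000, 301.0))  # True: the time window has elapsed
```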
Support for Apache Kafka
Event Hubs also provides a Kafka endpoint, which can be used by your existing applications that talk to Apache Kafka, as an alternative to running an Apache Kafka server. Event Hubs supports Apache Kafka protocol 1.0 and later.
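For example, an existing Kafka client can typically be pointed at an Event Hubs namespace with configuration along these lines, where `<namespace>` and the key values are placeholders for your own namespace and shared access key:

```
bootstrap.servers=<namespace>.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="$ConnectionString" \
  password="Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<key-name>;SharedAccessKey=<key>";
```

No code changes are needed in the Kafka application itself; only the client configuration points at the new endpoint.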
I hope you found this information useful. Please comment and let me know your thoughts, and share your experiences with Apache Kafka or Event Hubs.