Data is valuable only when there is an easy way to process it and get timely insights from data sources.
Azure Event Hubs provides a distributed stream-processing platform with low latency and seamless integration with services inside and outside of Azure.
In this article, we will have a look at Event Hubs and some basic concepts around it.
What is Event Hub?
Event Hubs is very useful for real-time event processing. It can handle millions of events per second. By default, an event hub stores messages for 1 day, but it can be configured to hold messages for up to 7 days.
Event hub messages can be easily logged to a storage account, or they can be easily integrated with Azure Stream Analytics.
Also, this is a PaaS offering, so configuring it in Azure and using it in your application is remarkably easy. It is also easy to use Event Hubs from Java, C, Go, Node.js, or Python.
The following are some of the scenarios where you can use Event Hubs:
- Anomaly detection (fraud/outliers)
- Application logging
- Analytics pipelines, such as clickstreams
- Live dashboarding
- Archiving data
- Transaction processing
- User telemetry processing
- Device telemetry streaming
Architecture and Basic Concepts
The diagram below shows the high-level architecture of Event Hubs. The important terms in the diagram are explained in this section.
Event Producers
Any entity that sends data to an event hub.
Data can be sent to an event hub using the HTTP or AMQP protocols. You can also use Apache Kafka 1.0 or above to send data, but only if you have enabled Apache Kafka on the event hub.
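When sending over plain HTTPS, each request must carry a Shared Access Signature (SAS) token in the Authorization header. As a minimal sketch, the function below builds such a token using only the standard library; the namespace, hub, policy name, and key shown are placeholders, not real credentials.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def generate_sas_token(resource_uri: str, policy_name: str, key: str,
                       ttl_seconds: int = 3600) -> str:
    """Build a SharedAccessSignature token for an Event Hubs HTTPS/AMQP call."""
    expiry = int(time.time()) + ttl_seconds
    # The string to sign is the URL-encoded resource URI plus the expiry time.
    string_to_sign = urllib.parse.quote_plus(resource_uri) + "\n" + str(expiry)
    signature = base64.b64encode(
        hmac.new(key.encode("utf-8"),
                 string_to_sign.encode("utf-8"),
                 hashlib.sha256).digest()
    )
    return "SharedAccessSignature sr={}&sig={}&se={}&skn={}".format(
        urllib.parse.quote_plus(resource_uri),
        urllib.parse.quote_plus(signature),
        expiry,
        policy_name,
    )


# Placeholder values -- substitute your own namespace, hub, policy, and key.
token = generate_sas_token(
    "https://my-namespace.servicebus.windows.net/my-hub",
    "RootManageSharedAccessKey",
    "fake-key-for-illustration",
)
print(token)
```

The resulting token is passed as the `Authorization` header when POSTing events to the hub's `/messages` endpoint.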
Partitions
Each consumer reads only a specific subset of the message stream; each subset is called a partition.
Generally, it is recommended to have a 1:1 ratio of throughput units to partitions. The partition count cannot be changed after the event hub instance has been created in Azure.
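Events sent with the same partition key always land on the same partition, which preserves ordering per key. The service's actual hash function is internal, so the snippet below is only an illustrative model of key-to-partition assignment, not the real algorithm:

```python
import hashlib

# Illustrative only: the partition count is fixed when the hub is created.
PARTITION_COUNT = 4


def assign_partition(partition_key: str,
                     partition_count: int = PARTITION_COUNT) -> int:
    """Model of key-based routing: hash the key, take it modulo the count."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# The same key always maps to the same partition, so per-device ordering holds.
p1 = assign_partition("device-42")
p2 = assign_partition("device-42")
```

This is why the partition count cannot change after creation: rehashing keys onto a different number of partitions would break the ordering guarantee.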
Consumer Groups
A view (state, position, or offset) of an entire event hub. Each consuming application can have its own consumer group and can read messages independently, at its own pace.
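The idea that each group keeps only its own position over a shared stream can be shown with a tiny in-memory model (the group names here are made up for illustration):

```python
# Minimal model of consumer groups: one shared stream, per-group offsets.
stream = ["e0", "e1", "e2", "e3", "e4"]
offsets = {"dashboard": 0, "archiver": 0}


def read(group: str, count: int):
    """Return the next `count` events for this group and advance its offset."""
    pos = offsets[group]
    batch = stream[pos:pos + count]
    offsets[group] = pos + len(batch)
    return batch


fast = read("dashboard", 4)  # a live dashboard races ahead
slow = read("archiver", 1)   # an archiver trickles along behind it
```

Neither group's reads affect the other: the events are not consumed away, only each group's offset moves.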
Throughput Units
Pre-purchased units of capacity that control the throughput capacity of Event Hubs.
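Per the published quotas at the time of writing, one throughput unit covers up to 1 MB/s or 1,000 events/s of ingress, whichever limit is reached first, so sizing is a simple ceiling calculation (verify the current limits before relying on them):

```python
import math


def throughput_units_needed(ingress_mb_per_s: float,
                            ingress_events_per_s: float) -> int:
    """Size TUs against both published ingress limits: 1 MB/s and 1,000
    events/s per TU; the tighter limit wins, with a minimum of 1 TU."""
    return max(
        math.ceil(ingress_mb_per_s / 1.0),
        math.ceil(ingress_events_per_s / 1000.0),
        1,
    )


# Example workload: 2.5 MB/s and 1,800 events/s of ingress.
tus = throughput_units_needed(ingress_mb_per_s=2.5, ingress_events_per_s=1800)
# The bandwidth limit dominates here, so 3 TUs are needed.
```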
Event Consumers
Any entity that reads event data from an event hub. All Event Hubs consumers connect via an AMQP 1.0 session, and the service delivers events through the session as they become available. All Kafka consumers connect via Kafka protocol 1.0 or later.
Features
This section highlights some of the important features of Event Hubs:
Fully Managed PaaS
Unlike Apache Kafka, Azure Event Hubs is a fully managed PaaS offering. You do not need to create or maintain any clusters, so you can focus solely on your business applications.
Real-Time and Batch Processing
With Azure Event Hubs, you can ingest, buffer, store, and process your stream in real time to get actionable insights. Event Hubs also uses partitions to let consumers process events independently, at their own pace.
You can also capture your data in near-real time to Azure Blob Storage or Azure Data Lake Storage for long-term retention or micro-batch processing.
Scalable
With Event Hubs, you can start with data streams in megabytes and grow to gigabytes or terabytes. The Auto-inflate feature is one of the many options available to scale the number of throughput units to meet your usage needs.
Capture
You can enable Capture from the Azure portal and specify a minimum size and time window for performing the capture. Using Event Hubs Capture, you specify your own Azure Blob Storage account and container, or Azure Data Lake Store account, one of which is used to store the captured data. Captured data is written in the Apache Avro format.
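Captured blobs follow a documented default naming convention of `{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}`, which makes downstream batch jobs easy to organize. As a small sketch (the namespace and hub names below are placeholders), a blob name can be split back into its components like this:

```python
def parse_capture_blob_name(blob_name: str) -> dict:
    """Split a Capture blob name that follows the default
    {Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}
    convention (the file extension is .avro)."""
    parts = blob_name.rsplit(".", 1)[0].split("/")
    keys = ["namespace", "eventhub", "partition",
            "year", "month", "day", "hour", "minute", "second"]
    return dict(zip(keys, parts))


# Placeholder names for illustration.
info = parse_capture_blob_name("my-namespace/my-hub/0/2020/05/17/13/05/00.avro")
```

A micro-batch job can use the date components of the path to pick up, say, only the previous hour's files.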
Support for Apache Kafka
Event Hubs also provides a Kafka endpoint, which can be used by your existing applications that talk to Apache Kafka, as an alternative to running your own Apache Kafka cluster. Event Hubs supports Apache Kafka protocol 1.0 and later.
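Pointing an existing Kafka client at an event hub is typically a configuration-only change: the broker address becomes the namespace on port 9093, and authentication uses SASL PLAIN with the literal username `$ConnectionString`. The dict below sketches those settings in the shape a client such as kafka-python expects; the namespace name and connection string are placeholders:

```python
# Placeholder values -- substitute your namespace and real connection string.
NAMESPACE = "my-namespace"
CONNECTION_STRING = (
    "Endpoint=sb://my-namespace.servicebus.windows.net/;"
    "SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=..."
)

kafka_config = {
    # Event Hubs exposes its Kafka endpoint on port 9093.
    "bootstrap_servers": f"{NAMESPACE}.servicebus.windows.net:9093",
    "security_protocol": "SASL_SSL",
    "sasl_mechanism": "PLAIN",
    # With SASL PLAIN, the username is the literal string "$ConnectionString"
    # and the password is the namespace-level connection string.
    "sasl_plain_username": "$ConnectionString",
    "sasl_plain_password": CONNECTION_STRING,
}
```

The rest of the producer or consumer code stays exactly as it was against a self-hosted Kafka cluster.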
I hope you found this information useful. Please do comment and let me know your thoughts and your experiences with Apache Kafka or Event Hubs.