Apache Pulsar Source for Numaflow
This document details the setup of an Apache Pulsar source within a Numaflow pipeline. The integration of Apache Pulsar as a source allows for the efficient ingestion of data from Apache Pulsar topics into a Numaflow pipeline for further processing and routing to various sinks like Redis.
Quick Start
Follow this guide to set up an Apache Pulsar source in your Numaflow pipeline.
Prerequisites
Step-by-Step Guide
Ensure that you have an Apache Pulsar topic and a corresponding subscription ready for use. You can create these using the Pulsar admin CLI or the Pulsar dashboard.
2. Deploy a Numaflow Pipeline with Apache Pulsar Source
Create a Kubernetes manifest (e.g., pulsar-source-pipeline.yaml
) with the following configuration:
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
name: apache-pulsar-source
spec:
limits:
readBatchSize: 10
vertices:
- name: in
source:
udsource:
container:
image: "quay.io/numaio/numaflow-go/apache-pulsar-source-go:latest"
env:
- name: PULSAR_TOPIC
value: "test-topic"
- name: PULSAR_SUBSCRIPTION_NAME
value: "test-subscription"
- name: PULSAR_HOST
value: "pulsar-broker.numaflow-system.svc.cluster.local:6650"
- name: PULSAR_ADMIN_ENDPOINT
value: "http://pulsar-broker.numaflow-system.svc.cluster.local:8080"
- name: log-sink
sink:
log: {}
edges:
- from: in
to: log-sink
Replace test-topic
, test-subscription
, and the Pulsar host details with your specific Apache Pulsar configuration.
Apply the configuration to your cluster:
kubectl apply -f pulsar-source-pipeline.yaml
3. Verify the Pipeline
Check the Numaflow pipeline's logs to ensure that it successfully connects to Apache Pulsar and ingests data from the specified topic.
4. Clean up
To delete the Numaflow pipeline:
kubectl delete -f pulsar-source-pipeline.yaml
# Remove Docker volumes
# Replace with your volume names
docker volume rm pulsar-data-volume pulsar-config-volume
Additional Resources