Mongo Kafka Source Connector

It is a recovery tool that restores data from MongoDB to Druid. It is a standalone application: after processing completes, it terminates itself.

It pulls data from MongoDB and pushes it to a Kafka topic. It performs no validation on the data while fetching it or pushing it to the Kafka topic. It fetches all records from MongoDB and pushes them one at a time: it iterates over all the records and publishes a single record per message, without any validation. A sketch of this flow is shown below.
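A minimal sketch of the fetch-and-push loop, assuming the standard MongoDB Java driver (mongodb-driver-sync) and the Kafka producer client; the connection strings, database name, and serializer settings here are assumptions for illustration, not values taken from this application:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.bson.Document;

import java.util.Properties;

public class MongoToKafkaSketch {
    public static void main(String[] args) {
        // Variable names follow the deployment table in this document.
        String collectionName = System.getenv("Mongodb.collection.name");
        String topic = System.getenv("kafka.topic");

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        long found = 0;
        long processed = 0;
        try (MongoClient mongo = MongoClients.create("mongodb://localhost:27017"); // assumed URI
             KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            MongoCollection<Document> collection =
                    mongo.getDatabase("ifix").getCollection(collectionName); // database name assumed
            try (MongoCursor<Document> cursor = collection.find().iterator()) {
                while (cursor.hasNext()) {
                    Document record = cursor.next();
                    found++;
                    // One record per message, no validation, as described above.
                    producer.send(new ProducerRecord<>(topic, record.toJson()));
                    processed++;
                }
            }
            producer.flush();
        }
        // Counts like these are what the application reports in its log.
        System.out.printf("Records found: %d, records processed: %d%n", found, processed);
    }
}
```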

Data Validation

It generates a record processing count: how many records it found and how many it has processed. These details can be fetched from its log.

After the application has executed, the data is expected to appear in Druid. For validation, make sure that the record count in Druid matches the record count reported in the Mongo-Kafka-Source-Connector log. One way to query the Druid count is sketched below.
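A minimal sketch of a count query against Druid's SQL endpoint, assuming a Druid router at localhost:8888 and a hypothetical datasource named ifix-datasource; both are assumptions to adjust for your cluster:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DruidCountCheck {
    public static void main(String[] args) throws Exception {
        // Router address and datasource name are assumptions, not values from this application.
        String druidSqlUrl = "http://localhost:8888/druid/v2/sql";
        String query = "{\"query\": \"SELECT COUNT(*) FROM \\\"ifix-datasource\\\"\"}";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(druidSqlUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(query))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // Compare this count with the "records processed" count from the connector log.
        System.out.println("Druid count response: " + response.body());
    }
}
```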

Deployment Details

It requires the configurations below for deployment. All other properties and configuration are built into the application itself; simply deploying the application starts its process, and it ends by itself.

| Environment Variable | Description |
| --- | --- |
| Mongodb.collection.name | The MongoDB collection (schema) name; it is used for the data pull. |
| kafka.topic | After fetching the data, the application pushes all of it to this Kafka topic. |

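A minimal sketch of how these variables might be read and validated at startup; the fail-fast check shown here is an illustration, not necessarily what the application does:

```java
public class ConnectorConfig {
    public static void main(String[] args) {
        // Variable names come from the table above; the fail-fast behavior is an assumption.
        String collection = System.getenv("Mongodb.collection.name");
        String topic = System.getenv("kafka.topic");
        if (collection == null || topic == null) {
            System.err.println("Missing Mongodb.collection.name or kafka.topic; aborting.");
            System.exit(1);
        }
        System.out.printf("Pulling from collection '%s', pushing to topic '%s'%n", collection, topic);
    }
}
```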

Note:

Before executing this application, make sure that no data processing is currently running. All data processing from iFix should have completed beforehand, and any activity that could lead to data ingestion in Druid should be stopped.

Assumptions:

Druid data should be cleaned or otherwise handled before pulling data from MongoDB; otherwise the run could produce duplicate or spurious data. One way to drop an existing datasource is sketched below.
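A minimal sketch of disabling an existing datasource through the Druid coordinator API before re-ingesting; the coordinator address and datasource name are assumptions. Note that this call marks the datasource's segments as unused rather than physically deleting them, so a separate kill task is needed for permanent removal:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DruidCleanup {
    public static void main(String[] args) throws Exception {
        // Coordinator address and datasource name are assumptions; adjust for your cluster.
        String url = "http://localhost:8081/druid/coordinator/v1/datasources/ifix-datasource";

        HttpClient client = HttpClient.newHttpClient();
        // DELETE marks all segments of the datasource as unused (they remain in deep storage).
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .DELETE()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Coordinator response code: " + response.statusCode());
    }
}
```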