DIGIT Architecture and Technical Overview

DIGIT is India’s largest open source platform for urban governance. It is built on OpenAPI (OAS 2.0) and provides API-based access to a variety of urban/municipal services. This enables state governments and city administrators to offer citizens relevant new services, integrate existing systems into the platform, and run it seamlessly on any commercial or on-premises cloud infrastructure with scale and speed.

  • DIGIT is a microservices based platform

  • DIGIT follows a layered architecture, which increases flexibility, maintainability, and scalability. Every layer consists of a set of microservices.

  • Backend microservices are small Spring Boot services. Each service is packaged into a Docker container and orchestrated in a Kubernetes cluster to run with scale and speed.

  • Frontend is built with React and Node.js (web and mobile WebView)

  • Kafka: every service publishes its messages to Kafka topics, and consumers such as the persister, indexer, searcher or any other consumer read from them.

  • PostgreSQL database

  • Zuul API gateway

  • Elasticsearch for large data sets and analytics

  • NGINX Ingress to route traffic

 

Highlights:

  • DIGIT is a microservices-based platform built to scale. Microservices are small, autonomous and developer-friendly services that work together.

  • A large software system can be broken down into multiple small components or services. These components can be designed, developed and deployed independently without compromising the integrity of the application.

  • Parallelism in development: microservices are organised around business capabilities, so separate teams can work on different services at the same time.

  • Microservices have smart endpoints that process information and apply logic. They receive requests, process them, and generate a response accordingly.

  • Decentralized control between teams, so that developers strive to produce useful tools that others can then use to solve the same problems.

  • A microservices architecture lets neighbouring services keep functioning when one service goes out of service. It also scales to absorb sudden spikes in client demand.

  • Microservices are ideal for evolutionary systems where it is difficult to anticipate the types of devices that may access the application.

 

Multilayer Architecture or N-Tier Architecture:

DIGIT follows a multilayer (n-tier) distributed architecture pattern. As the diagram above shows, there are several horizontal layers, each with its own set of components (for example the data access layer, infra services, business services, the layer of individual modules, and client apps), along with some vertical adapters. Every layer consists of a set of microservices, and each layer has a specific role and responsibility within the application.

  • Layered architecture increases flexibility, maintainability, and scalability.

  • Multiple applications can reuse the components.

  • Parallelism

  • Different components of the application can be independently deployed, maintained, and updated, on different time schedules.

  • Layered architecture also makes it possible to configure different levels of security for different components.

  • Layered architecture also helps you to test the components independently of each other.


Read/Write Frequency Bias (Kafka - asynchronous):

DIGIT is a write-heavy platform that benefits from asynchronous execution. Asynchronous execution removes the need to wait for a response, thereby decoupling the execution of two or more services. Asynchronous messaging is often built on a protocol such as AMQP (Advanced Message Queuing Protocol); Kafka, which DIGIT uses as its message broker, has its own binary protocol over TCP. In either case the client or service usually doesn't wait for a response: it simply sends the message to a Kafka topic (or another message broker) and carries on.
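The fire-and-forget behaviour described above can be sketched in a few lines. This is only an illustration, not DIGIT code: an in-process `queue.Queue` stands in for a Kafka topic, a list stands in for the database, and the names `create_record` and `consumer` are hypothetical.

```python
import queue
import threading

# In-process stand-in for a Kafka topic (assumption: real services use a Kafka client).
topic = queue.Queue()
stored = []  # stand-in for the database


def create_record(payload):
    """Hypothetical service handler: publish the message and return immediately."""
    topic.put(payload)
    return {"status": "ACCEPTED", "id": payload["id"]}


def consumer():
    """Hypothetical consumer: drains the topic and performs the (slow) write."""
    while True:
        message = topic.get()
        if message is None:  # sentinel used only to stop this sketch
            break
        stored.append(message)


worker = threading.Thread(target=consumer)
worker.start()

# The caller gets its responses without waiting for the writes to happen.
responses = [create_record({"id": i}) for i in range(3)]

topic.put(None)  # shut the consumer down
worker.join()
```

The handler's response only says that the message was accepted. Whether and when it reaches storage is the consumer's concern, which is what decouples the two sides.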

 

Persister:

The persister is an independently running service that simply reads the designated Kafka topics and puts the messages into the database. By design, we write a yml configuration and put its file path in application.properties.

Features supported: 

  • Insert or update incoming Kafka messages in the database.

  • Add to or modify the Kafka message before putting it into the database.
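As a rough sketch of the idea, a configuration-driven persister maps each topic to a query and to the message fields that fill it. Everything below is assumed for illustration: the configuration keys, topic names and table are invented, the mapping is a Python dict rather than the yml file the real service reads, and SQLite stands in for PostgreSQL.

```python
import json
import sqlite3

# Hypothetical mapping; the keys and topic names are invented for this sketch.
PERSISTER_CONFIG = {
    "save-complaint": {
        "query": "INSERT INTO complaint (id, status, city) VALUES (?, ?, ?)",
        "json_paths": ["complaint.id", "complaint.status", "complaint.address.city"],
    },
    "update-complaint": {
        "query": "UPDATE complaint SET status = ? WHERE id = ?",
        "json_paths": ["complaint.status", "complaint.id"],
    },
}


def extract(message, dotted_path):
    """Walk a dotted path such as 'complaint.address.city' through nested dicts."""
    value = message
    for key in dotted_path.split("."):
        value = value[key]
    return value


def persist(connection, topic, raw_message):
    """Run the query configured for `topic` against one JSON message."""
    mapping = PERSISTER_CONFIG[topic]
    message = json.loads(raw_message)
    params = [extract(message, path) for path in mapping["json_paths"]]
    connection.execute(mapping["query"], params)
    connection.commit()


db = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
db.execute("CREATE TABLE complaint (id TEXT PRIMARY KEY, status TEXT, city TEXT)")

persist(db, "save-complaint", json.dumps(
    {"complaint": {"id": "C-1", "status": "OPEN", "address": {"city": "Pune"}}}))
persist(db, "update-complaint", json.dumps(
    {"complaint": {"id": "C-1", "status": "RESOLVED"}}))

rows = db.execute("SELECT id, status, city FROM complaint").fetchall()
```

With this approach, supporting a new message type means adding a configuration entry rather than writing new persistence code.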

 

Indexer:

The indexer is an independently running service that performs the indexing tasks of the DIGIT platform. It reads records posted on specific Kafka topics and picks the corresponding index configuration from the yml file provided by the respective module.

Features supported:

  • Multiple indexes of a record posted on a single topic

  • Provision for custom index id

  • Performs both bulk and non-bulk indexing

  • Supports custom JSON indexing with field mappings and enrichment of the input object on the queue

  • Handles Elasticsearch downtime

  • Application properties (application.properties of the citizen-indexer application)

  • Key : egov.indexer.yml.repo.path
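The first few features (several indexes for one record, custom index ids, field mappings) can be illustrated with a small sketch. The configuration shape, index names and field names are hypothetical and do not reflect the actual yml schema. The sketch only builds the documents and leaves out the call to Elasticsearch.

```python
# Hypothetical index configuration; the real service loads its mappings from yml files.
INDEX_CONFIG = {
    "save-complaint": [
        {
            "index": "complaints",
            "id_template": "{tenant}-{id}",  # custom document id
            "fields": {"id": "id", "status": "status", "tenant": "tenant"},
        },
        {
            "index": "complaint-status-report",
            "id_template": "{id}",
            "fields": {"complaintId": "id", "currentStatus": "status"},
        },
    ],
}


def build_documents(topic, record):
    """Return one (index, document id, document body) triple per configured index."""
    documents = []
    for mapping in INDEX_CONFIG.get(topic, []):
        doc_id = mapping["id_template"].format(**record)
        body = {target: record[source] for target, source in mapping["fields"].items()}
        documents.append((mapping["index"], doc_id, body))
    return documents


docs = build_documents(
    "save-complaint", {"id": "C-1", "status": "OPEN", "tenant": "city-a"})
```

A single record on one topic yields two documents here, each with its own id scheme and field names, ready to be sent individually or batched into a bulk request.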

The diagram below shows how the services, Kafka, the database, Elasticsearch, the persister and the indexer work together. Services post their data to different topics; the persister and indexer pull that data from Kafka and push it to the database and Elasticsearch. As you can see, clients and services do not wait for a response confirming that the data has been inserted or updated in the database and Elasticsearch.

 

Tech Stack

  • Defined in OpenAPI Specifications - Swagger 2.0

  • SpringBoot

    • REST layer

  • Kafka

    • Asynchronous processing for scale and extensions

  • TestNG/Postman

    • Tested with best tools

  • ElasticSearch

    • Faster search

  • Tech Agnostic

    • New APIs can be built on any stack

  • ReactJS

    • Mobile First UI

  • HTML5/CSS3

  • Designed to include modularized UI (Your UI within DIGIT app)

 

Spring Boot: Although there were multiple frameworks available, we decided to go with Spring Boot for our microservice development. Its clear advantages come from the Spring Framework legacy it carries: it is highly configurable and lightweight, makes dependencies easy to maintain, is annotation-driven, and has a wide range of open source jars and libraries plus a highly active open source community, all of which speed up our development. Netflix, for example, builds many of its major services on Spring Boot.

 

NodeJS: Node.js is a fast-growing JavaScript runtime that startups and enterprises have adopted widely. We have some microservices, such as Telemetry and Content Share, in Node.js and plan to build more services in Node.js in future.

 

ReactJs: I do not think ReactJs needs any introduction here. It was the most obvious choice for our frontend, and we recognised React’s ability to remain relevant in the market for a long time.

ReactJs was developed at Facebook in 2011 and open sourced in 2013. It has captured a large share of the frontend market: startups and giant tech enterprises alike use ReactJs.

 

API Gateway (Zuul): Zuul is an open source API gateway developed by Netflix and one of the most popular and widely used gateways. It provides a unified interface to a set of microservices so that clients do not need to know the details of the microservices' internals. DIGIT uses Zuul as an edge service that proxies requests to multiple backend services, which allows any browser, mobile app or other user interface to consume the underlying services.
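The core idea of an edge service, a single entry point that maps request paths to backend services, can be sketched as a prefix lookup. This is a generic illustration and not Zuul's implementation or configuration format; the route table and service addresses are invented, and the sketch forwards the full path unchanged, which real gateways usually make configurable.

```python
# Hypothetical route table: path prefix -> backend service address (all names invented).
ROUTES = {
    "/user": "http://user-service:8080",
    "/billing": "http://billing-service:8080",
    "/complaints": "http://complaint-service:8080",
}


def resolve(path):
    """Return the backend URL for a request path, or None when no route matches."""
    # Try longer prefixes first so that more specific routes win.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return ROUTES[prefix] + path
    return None


target = resolve("/billing/v1/_search")
```

Clients only ever see the gateway's address. Which service sits behind `/billing`, and where it runs, can change without any client noticing.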

 

Kafka: Kafka is an open source, distributed, replicated messaging platform originally developed at LinkedIn. It is now widely used as the backbone of microservices and distributed computing architectures, and it is the backbone of the DIGIT platform. Every stream of data goes through Kafka and is later stored in persistent storage (RDBMS) and Elasticsearch.

ELK Stack or Elastic Stack (Elasticsearch, Logstash & Kibana): The Elastic Stack is one of the world's most popular open source stacks for search, log management and analytics visualisation. It combines three platforms developed by Elastic: Elasticsearch, Logstash and Kibana. Together they form an end-to-end solution for searching, analysing and visualising data. Tech giants such as Facebook, LinkedIn, Microsoft and Netflix use the ELK stack.

  • Elasticsearch is built on the Lucene search library and is mainly used for storing, indexing and searching data. You can think of it as a NoSQL database, although it is not exactly one. It works best alongside persistent storage (an RDBMS or a NoSQL database).

  • Logstash is a log pipeline that takes data from various sources and applies transformations to it.

  • Kibana is the visualisation layer on top of Elasticsearch. DIGIT’s visualisation dashboards are built on Kibana and are used by government, citizens and our internal team.
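To give a feel for why a Lucene-based engine searches text quickly, here is a toy inverted index: a map from each term to the documents that contain it, so a query becomes a few set lookups instead of a scan over every record. It is a deliberately simplified sketch with invented sample documents. Real engines add analysis, scoring, and on-disk structures that are not shown here.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


documents = {
    1: "water connection request pending",
    2: "property tax payment pending",
    3: "water bill payment received",
}
index = build_inverted_index(documents)
matches = search(index, "water pending")
```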

 

PostgreSQL: We use PostgreSQL for persistent storage, though any RDBMS such as MySQL or MariaDB could be used. PostgreSQL isn't just relational, it's object-relational, which gives it some advantages over other open source SQL databases like MySQL, MariaDB and Firebird.


Tools & Technology Infra

  • Cloud - AWS, Azure, Google Cloud, On-Premises … - run anywhere

  • Containers - Docker

    • Allows Polyglot stack, Faster Deployment

  • Orchestration - Kubernetes

    • Managing cluster

  • Monitoring and Alerting - ElasticSearch/Kibana/Prometheus

  • CI/CD pipeline - Jenkins, Spinnaker

  • Repositories - GitHub

 

Docker & Kubernetes: Maintaining hundreds of services in production can be a nightmare. As a developer you can manage a small set of services running in your development environment for build and test, but a production environment needs to be more robust, reliable, easy to maintain and highly scalable. In the last few years containers have changed the face of software development and deployment. Each of our microservices is packed into a Docker container with all its dependencies and deployed into a Docker environment. Kubernetes (K8s), a container orchestration system, helps us automate the deployment, scaling and management of these containerized services.

 

Jenkins & Spinnaker: Spinnaker is an open source, cloud-agnostic, multi-cloud deployment tool that automates deployments with built-in deployment strategies and pipelines, ACLs and manual approvals. It integrates with DevOps ecosystems such as Git, CI (Jenkins) and Docker repositories, and bakes the Docker container so that it can be deployed efficiently across clusters and infrastructure using canary analysis or Red/Black strategies. This is what our CI/CD pipeline looks like.

 

We also use many open source and other tools to manage production servers and the entire development life cycle: Prometheus for monitoring and alerting on production servers, Swagger 2.0 for defining APIs, Atlassian Jira for managing development sprints, as well as Vagrant, Ansible, Sonar, Codacy, JMeter, Postman, and GitHub as the code repository.