Big data architecture is the foundation for big data analytics. Think of big data architecture as an architectural blueprint of a large campus or office building. Architects begin by understanding the goals and objectives of the building project, and the advantages and limitations of different approaches. It’s not an easy task, but it’s perfectly doable with the right planning and tools.
Lambda architecture and Kappa architecture are two of the most popular big data architectures. Let’s review them.
What is Lambda architecture?
Lambda architecture, as described by Nathan Marz, is a generic, scalable, and fault-tolerant data processing architecture that addresses both batch and speed (low-latency) processing scenarios.
The basic principles of Lambda architecture, as described at http://lambda-architecture.net and illustrated in the preceding diagram, are:
- All data is pushed into both the batch layer and speed layer.
- The batch layer has a master dataset (immutable, append-only set of raw data) and pre-computes the batch views.
- The serving layer indexes the batch views so that they can be queried with low latency.
- The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.
- Any query can be answered by merging results from the batch views and real-time views, or by querying each of them individually.
Advantages of Lambda architecture:
- The batch layer manages historical data in fault-tolerant distributed storage, which keeps the risk of errors or data loss low even if the system crashes.
- It strikes a good balance between speed and reliability.
- It provides a fault-tolerant and scalable architecture for data processing.
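Disadvantages of Lambda architecture:
- It can lead to coding overhead, because the same processing logic typically has to be implemented and maintained in both the batch layer and the speed layer.
- The batch layer reprocesses the data in every batch cycle, which is wasteful in some scenarios.
- Data modeled with Lambda architecture is difficult to migrate or reorganize.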
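The batch, speed, and serving roles described above can be sketched in a few lines of Python. This is a minimal, in-memory illustration under stated assumptions, not a production design: the page-view events, the names `batch_view`, `SpeedLayer`, `query`, and `ingest`, and the use of a simple counter as the "view" are all hypothetical. It also simplifies how the speed layer is reset after a batch run; a real system discards only the real-time results that the new batch view already covers.

```python
from collections import Counter

# Hypothetical event shape: (user_id, page) for a page-view record.
Event = tuple[str, str]


def batch_view(master_dataset: list[Event]) -> Counter:
    """Batch layer: recompute page-view counts from the full, immutable master dataset."""
    return Counter(page for _, page in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally counts only events not yet covered by a batch view."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        _, page = event
        self.realtime_view[page] += 1

    def reset(self) -> None:
        # Simplification: drop the real-time view once a batch run has absorbed its events.
        self.realtime_view.clear()


def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving side: answer a query by merging the batch view with the real-time view."""
    return batch[page] + speed.realtime_view[page]


master: list[Event] = []  # append-only master dataset of raw events
speed = SpeedLayer()


def ingest(event: Event) -> None:
    """All data is pushed into both the batch layer's master dataset and the speed layer."""
    master.append(event)
    speed.on_event(event)


for e in [("u1", "/home"), ("u2", "/home"), ("u1", "/docs")]:
    ingest(e)
batch = batch_view(master)  # periodic batch run over everything ingested so far
speed.reset()               # those events are now covered by the batch view

ingest(("u3", "/home"))     # arrives after the batch run, so only the speed layer has counted it
print(query("/home", batch, speed))  # 3
```

The counting logic appears twice, once in `batch_view` and once in `SpeedLayer.on_event`; keeping two such code paths in sync is the kind of coding overhead listed among the disadvantages above.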
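What is Kappa architecture?
Kappa architecture is a simplification of Lambda architecture. A Kappa architecture system is like a Lambda architecture system with the batch processing system removed. In place of batch processing, all data, both historical and new, is fed through the streaming system; when results need to be recomputed, the retained input data is replayed through the stream processor.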
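A rough sketch of the Kappa idea, assuming an in-memory list stands in for the retained input log (in practice this is often a log with long retention, such as an Apache Kafka topic). There is a single streaming code path; when that code changes, a new job version is started, the log is replayed through it from the beginning, and queries switch over once it has caught up. The job classes and the `replay` helper are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Iterable, Protocol, TypeVar

# Hypothetical event shape: (user_id, page) for a page-view record.
Event = tuple[str, str]


class StreamJob(Protocol):
    view: Counter

    def on_event(self, event: Event) -> None: ...


class PageViewJob:
    """Version 1 of the single streaming job: counts views per page, one event at a time."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        _, page = event
        self.view[page] += 1


class UniqueVisitorJob:
    """Version 2, after a code change: counts distinct users per page."""

    def __init__(self) -> None:
        self.view: Counter = Counter()
        self._seen: set[Event] = set()

    def on_event(self, event: Event) -> None:
        if event not in self._seen:
            self._seen.add(event)
            self.view[event[1]] += 1


J = TypeVar("J", bound=StreamJob)


def replay(log: Iterable[Event], job: J) -> J:
    """Reprocessing: feed the retained log from the beginning through a (new) job version."""
    for event in log:
        job.on_event(event)
    return job


log: list[Event] = []  # stand-in for the retained, append-only input log
live = PageViewJob()
for e in [("u1", "/home"), ("u1", "/home"), ("u2", "/home")]:
    log.append(e)     # every event is retained in the log...
    live.on_event(e)  # ...and processed as it arrives
print(live.view["/home"])  # 3

# The code changed: replay the whole log through the new version, then switch queries to it.
new_job = replay(log, UniqueVisitorJob())
print(new_job.view["/home"])  # 2
```

Because reprocessing is just another run of the same streaming code over the log, there is no separate batch implementation to maintain, which is the main simplification over Lambda.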
Advantages of Kappa architecture:
- Kappa architecture can be used to build data systems that are online learners and therefore do not need a batch layer.
- Re-processing is required only when the code changes.
- It can be deployed with fixed memory.
- It can be used for horizontally scalable systems.
- Fewer resources are required, because machine learning is done in real time.
Disadvantages of Kappa architecture:
- The absence of a batch layer might result in errors during data processing or while updating the database, which requires an exception manager to reprocess or reconcile the data.
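One way to picture the exception manager mentioned above is a component that parks records whose processing failed, lets the stream keep running, and retries them later (for example, after a bug fix). The sketch below is a hypothetical illustration: `ExceptionManager`, `process_stream`, the record shape, and the dictionary standing in for a database are all assumptions, and real systems often use a dead-letter queue for the same purpose.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable

Record = dict[str, Any]  # hypothetical record shape
Handler = Callable[[Record], None]


@dataclass
class ExceptionManager:
    """Hypothetical component that parks records whose processing failed, so they can be retried."""

    failed: list[tuple[Record, str]] = field(default_factory=list)

    def record_failure(self, record: Record, error: Exception) -> None:
        self.failed.append((record, repr(error)))

    def reprocess(self, handler: Handler) -> int:
        """Retry every parked record; keep those that fail again. Returns the number that succeeded."""
        still_failing: list[tuple[Record, str]] = []
        succeeded = 0
        for record, _ in self.failed:
            try:
                handler(record)
                succeeded += 1
            except Exception as exc:
                still_failing.append((record, repr(exc)))
        self.failed = still_failing
        return succeeded


def process_stream(records: Iterable[Record], handler: Handler, manager: ExceptionManager) -> None:
    """Process records one by one; a failing record goes to the manager instead of halting the stream."""
    for record in records:
        try:
            handler(record)
        except Exception as exc:
            manager.record_failure(record, exc)


db: dict[str, float] = {}  # stand-in for the database being updated


def write_v1(record: Record) -> None:
    db[record["id"]] = float(record["amount"])  # raises ValueError on "1,200"


def write_v2(record: Record) -> None:
    # Fixed version: strips thousands separators before converting.
    db[record["id"]] = float(str(record["amount"]).replace(",", ""))


manager = ExceptionManager()
process_stream(
    [{"id": "a", "amount": "10.5"}, {"id": "b", "amount": "1,200"}, {"id": "c", "amount": 3}],
    write_v1,
    manager,
)
print(sorted(db), len(manager.failed))  # ['a', 'c'] 1
print(manager.reprocess(write_v2), db["b"])  # 1 1200.0
```

Parking failures instead of stopping lets the stream stay live, at the cost of the database being temporarily incomplete until the parked records are reprocessed.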