What is Apache Kafka?
Apache Kafka is an open-source distributed event streaming platform.
Kafka as a Messaging System: It allows systems to communicate with each other by sending messages. However, Kafka can handle far more data, at far larger scale, than typical messaging systems.
Real-Time Streaming: Kafka handles data in real-time, meaning the moment data is generated, it can be sent to systems for processing.
Key Components of Kafka
There are several key concepts in Kafka:
Producer: This is an application that sends data (messages) to Kafka. Imagine you have a website where users post comments. The website (producer) sends the comment data to Kafka.
Consumer: This is an application that reads data (messages) from Kafka. For example, an application that analyzes user comments or a reporting system that looks at the comments data would be a consumer.
Broker: Kafka runs on a cluster of servers called brokers. Brokers are responsible for receiving messages from producers and storing them until consumers fetch them. Think of a broker as a middleman that stores and distributes the data.
Topic: Kafka stores data in categories called "topics." You can think of topics as a channel where messages are grouped. For instance, you could have a topic called user-comments that stores all the comments from your website.
Partition: Topics are divided into smaller chunks called partitions. Each partition can be stored on different brokers, which helps Kafka to scale and balance the load. Think of partitions like sub-folders inside your main folder (the topic).
Offset: Kafka keeps track of each message in a partition with an offset. This is a sequential ID within that partition that tells consumers where to start reading. It lets consumers pick up messages exactly where they left off.
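The relationship between topics, partitions, and offsets can be sketched with a tiny in-memory model. This is illustrative only, not a real Kafka client; the `Topic` class and its methods are invented for this sketch:

```python
# Toy in-memory model of a Kafka topic (illustrative only; not the real Kafka API).
class Topic:
    def __init__(self, name, num_partitions=2):
        self.name = name
        # Each partition is an ordered, append-only log of messages.
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Kafka routes messages with the same key to the same partition,
        # which preserves ordering per key.
        partition = hash(key) % len(self.partitions)
        self.partitions[partition].append(value)
        offset = len(self.partitions[partition]) - 1  # offset = position in the partition
        return partition, offset

    def consume(self, partition, offset):
        # A consumer reads from a given offset onward within one partition.
        return self.partitions[partition][offset:]

topic = Topic("user-comments")
p, o = topic.produce(key="user-42", value="Great article!")
topic.produce(key="user-42", value="Thanks for the reply.")
print(topic.consume(p, o))  # both of user-42's comments, in order
```

Because both messages share the key `user-42`, they land in the same partition, and reading from the first message's offset returns them in production order.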
ZooKeeper: Kafka traditionally uses Apache ZooKeeper for coordinating and managing the Kafka cluster. ZooKeeper helps Kafka keep track of which brokers are active, manage metadata, and so on. However, newer versions of Kafka can run without ZooKeeper by using KRaft (Kafka Raft) mode instead.
Real Life Example of Kafka
Let’s say you have an e-commerce website. You have several systems:
- User Actions (add to cart, purchase, etc.)
- Inventory Management
- Email Notification System
Producer
When a user adds an item to their cart, the website's backend acts as the producer, sending a message to a Kafka topic called user-actions.
Broker
Kafka’s brokers store these messages. Each message will contain details like:
- Item added
- User ID
- Timestamp
Topic
The messages from different users go into the user-actions topic. As more users interact with the website, more messages are added to this topic.
Consumer
The Inventory Management System is a consumer. It reads the messages from the user-actions topic to update stock levels.
The Email System is another consumer. It reads messages from the user-actions topic to send an email to the user confirming their action.
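The key point is that both consumers read the same topic independently, each tracking its own offset, so one consumer never "steals" messages from the other. A toy sketch of this idea (invented names, not the real Kafka consumer API):

```python
# Illustrative sketch: two independent consumers of one message log,
# each tracking its own offset (not the real Kafka consumer API).
user_actions = []  # stands in for the user-actions topic (single partition)

def produce(event):
    user_actions.append(event)

class Consumer:
    def __init__(self, name):
        self.name = name
        self.offset = 0  # each consumer remembers where it left off

    def poll(self):
        # Read everything from the stored offset onward, then advance it.
        new_events = user_actions[self.offset:]
        self.offset = len(user_actions)
        return new_events

produce({"user": 1, "action": "add_to_cart", "item": "book"})

inventory = Consumer("inventory")
emailer = Consumer("email")

print(inventory.poll())  # the inventory system sees the event...
print(emailer.poll())    # ...and the email system sees the same event, independently
```

Each consumer's offset advances separately, which is why adding a new consumer (say, an analytics system) requires no changes to the producer or the other consumers.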
Why Use Apache Kafka?
Scalability: Kafka can handle very high throughput, making it ideal for applications that need to process large amounts of data continuously.
Fault Tolerance: Kafka stores copies of messages across multiple brokers. If one broker goes down, Kafka ensures that the data is still available from other brokers.
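The replication idea can be shown with a small sketch: each message is copied to several brokers, so losing one broker does not lose the data. This is a toy model (broker names and the `replicate` helper are invented for illustration):

```python
# Toy replication sketch (illustrative only): each message is copied
# to multiple brokers so it survives a single broker failure.
brokers = {"broker-1": [], "broker-2": [], "broker-3": []}

def replicate(message, targets):
    # Store a copy of the message on each target broker.
    for b in targets:
        brokers[b].append(message)

replicate("order-123", ["broker-1", "broker-2"])  # replication factor of 2

brokers.pop("broker-1")  # broker-1 goes down
print("order-123" in brokers["broker-2"])  # the replica on broker-2 survives
```

In real Kafka, the replication factor is configured per topic, and the cluster automatically elects a new leader for a partition when the broker holding the leader replica fails.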
Real-Time Data Processing: Since Kafka can process data in real-time, it’s commonly used in scenarios like stream processing, monitoring, or logging.