If you want to use a separate disk for Cassandra, add a disk, format it, and mount it on /var/lib/cassandra/:
mkfs.ext4 /dev/sdb
mkdir /var/lib/cassandra
nano /etc/fstab
/dev/sdb /var/lib/cassandra ext4 defaults 0 0
mount -a
If you later want to resize the disk, you don't need to shut down the server: extend the virtual disk in VMware, Proxmox, or whatever environment you use, then grow the filesystem online with
resize2fs /dev/sdb
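To sanity-check that the new filesystem really is mounted where Cassandra expects its data, compare the reported size with the disk you added (df -h /var/lib/cassandra tells you the same thing). A quick sketch:
import shutil

# Reports the filesystem backing this path, so the numbers confirm
# whether the new disk is actually mounted on /var/lib/cassandra
total, used, free = shutil.disk_usage("/var/lib/cassandra")
print(f"total={total / 2**30:.1f} GiB, free={free / 2**30:.1f} GiB")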
Let's start by adding the APT source for Cassandra.
echo "deb https://debian.cassandra.apache.org 41x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
sudo apt-get update
sudo apt-get install cassandra
The package starts Cassandra automatically after installation, so stop the service and remove everything Cassandra has created by default under /var/lib/cassandra:
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/*
Now let's modify the Cassandra YAML configuration file:
nano /etc/cassandra/cassandra.yaml
cluster_name: 'MyCassandraCluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.10.0.120"
listen_address: 10.10.0.120
rpc_address: 10.10.0.120
endpoint_snitch: GossipingPropertyFileSnitch
For listen_address and rpc_address you can either set an IP address or use the corresponding interface options (listen_interface, rpc_interface) with a device name such as "eth0".
You can add more entries to the "seeds" field or keep just one. Seeds are not masters; they are the contact points a joining node gossips with first, and from there cluster information spreads to all the other nodes. Note that we use GossipingPropertyFileSnitch rather than the default SimpleSnitch; with SimpleSnitch, the datacenter and rack values we set in the next step would be ignored.
Now edit cassandra-rackdc.properties
nano /etc/cassandra/cassandra-rackdc.properties
dc=DC1
rack=RAC1
Now let’s start Cassandra.
systemctl start cassandra
We should now see files being created under /var/lib/cassandra, and the logs should show some startup info; check /var/log/cassandra/system.log.
Running nodetool status shows the state of the cluster and lists all of its nodes; in this case we only have one node.
root@cassandra-node1:~# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.10.0.120  23.88 MiB  254     99.9%             8d07dc57-5832-43fe-8f92-f03c0c8c43e9  RAC1
Before we try to insert some data with Python, let's create our keyspace. Connect to the node with cqlsh:
root@cassandra-node1:~# cqlsh 10.10.0.120
To create a keyspace, run the following command, adjusting the name and replication as you wish:
CREATE KEYSPACE dem_linux
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
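SimpleStrategy with replication_factor 1 is fine for this single-node test. On a real multi-node cluster you would normally use NetworkTopologyStrategy instead, which replicates per datacenter. As a sketch, here is how that could look through the Python driver we install below (the keyspace name dem_linux_prod is just a placeholder, and 'DC1': 2 assumes at least two nodes in the DC1 datacenter we defined in cassandra-rackdc.properties); the CQL statement also works verbatim in cqlsh:
from cassandra.cluster import Cluster

# Connect without selecting a keyspace; adjust the IP to your node
cluster = Cluster(['10.10.0.120'])
session = cluster.connect()

# 'DC1' is the dc from cassandra-rackdc.properties; 2 is the number
# of replicas to keep in that datacenter
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS dem_linux_prod
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2}
""")

cluster.shutdown()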
When this is done, let's try to insert some data into that keyspace. Install the Python driver first:
apt install python3-pip
pip install cassandra-driver
Now let's create a new Python file, I will call it dem-linux.py, and paste in the following code. Adjust the cluster IP 10.10.0.120 to your IP and "dem_linux" to your keyspace.
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement
import uuid

# Connect to your Cassandra cluster and keyspace
cluster = Cluster(['10.10.0.120'])
session = cluster.connect('dem_linux')

# Create a table if it doesn't exist
create_table_query = """
CREATE TABLE IF NOT EXISTS your_table (
    id UUID PRIMARY KEY,
    data text
)
"""
session.execute(create_table_query)

# Rows per batch (adjust as needed, but keep batches small: Cassandra
# rejects batches above batch_size_fail_threshold, 50 KiB by default)
batch_size = 100
total_data_size = 900000  # Total number of rows to insert

prepared_statement = session.prepare("INSERT INTO your_table (id, data) VALUES (?, ?)")

batch = BatchStatement()
for i in range(total_data_size):
    data = "dem_linux data #" + str(i)
    batch.add(prepared_statement, (uuid.uuid4(), data))
    # Execute the batch every batch_size rows and start a new one
    if (i + 1) % batch_size == 0:
        session.execute(batch)
        batch = BatchStatement()

# Execute whatever is left over in the last partial batch
if total_data_size % batch_size != 0:
    session.execute(batch)

# Close the session and cluster
session.shutdown()
cluster.shutdown()
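A side note on performance: batches in Cassandra exist for atomicity, not as a bulk-loading optimization, and batching rows from different partitions (as we do here, since every id is a random UUID) adds coordinator overhead. For raw insert throughput, the driver's concurrent execution helper is usually a better fit; a minimal sketch against the same table:
from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args
import uuid

cluster = Cluster(['10.10.0.120'])
session = cluster.connect('dem_linux')

insert = session.prepare("INSERT INTO your_table (id, data) VALUES (?, ?)")
params = [(uuid.uuid4(), "dem_linux data #" + str(i)) for i in range(100000)]

# Keeps up to `concurrency` requests in flight at once instead of
# sending one statement at a time
execute_concurrent_with_args(session, insert, params, concurrency=100)

session.shutdown()
cluster.shutdown()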
Run the script with python3 dem-linux.py, then connect to the Cassandra cluster with cqlsh and we can inspect the data we are inserting with Python.
Select the keyspace with the following command:
cqlsh> use dem_linux;
And to select from our table called "your_table", run the following:
cqlsh:dem_linux> SELECT * FROM your_table;
The output should look something like this (rows come back in token order, not insertion order, so yours will differ):
id | data
--------------------------------------+------------------------
530acbb5-8a3a-4e4b-a5dc-4323d49b20d2 | dem_linux data #76491
cca35035-f6a3-4511-a53a-0c2a4bd520d3 | dem_linux data #275773
c40af766-5afe-4610-a3d6-e72a1f0dd534 | dem_linux data #262751
03a70955-02d3-4f49-bedf-65ecb5f8b115 | dem_linux data #166656
2ea0a41a-9c56-4545-9e30-c1d47af808fe | dem_linux data #268583
83c1fe80-d22f-4801-8752-2d9b6709a477 | dem_linux data #218708
d1bd3e5a-730f-4287-a18c-ba6568c48570 | dem_linux data #463446
f57c207c-847c-4be3-a7db-e46a7dea09d1 | dem_linux data #189919
fab727f7-fdaf-4834-b131-a81d9e153b16 | dem_linux data #16194
ac55be94-f59d-44a8-8673-5e514073396e | dem_linux data #389065
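You can of course read the same data back from Python as well; a small sketch:
from cassandra.cluster import Cluster

cluster = Cluster(['10.10.0.120'])
session = cluster.connect('dem_linux')

# LIMIT keeps the example cheap; a full scan of 900k rows would be slow
rows = session.execute("SELECT id, data FROM your_table LIMIT 10")
for row in rows:
    print(row.id, row.data)

session.shutdown()
cluster.shutdown()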
If you add more nodes, this data will spread around the cluster, and if you leave the script running while checking nodetool status, you will see the nodes accumulating more data.
To add more nodes, install Cassandra the same way, but change the listen and rpc addresses and keep the seeds the same (or add the second node's IP to the seeds list as well). A few of your nodes should be seeds, but not all of them.
Now let's add one more node so we get a multi-node cluster.
Run through the same installation process; we only need to modify the configuration file.
nano /etc/cassandra/cassandra.yaml
Now set the listen and rpc addresses to the new node's own IP:
cluster_name: 'MyCassandraCluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.10.0.120"
listen_address: 10.10.0.121
rpc_address: 10.10.0.121
endpoint_snitch: GossipingPropertyFileSnitch
The last part is to edit cassandra-rackdc.properties again, with the same datacenter and rack values as on the first node:
nano /etc/cassandra/cassandra-rackdc.properties
dc=DC1
rack=RAC1
When that is done, start Cassandra on the second node, and we should soon see it when running nodetool status:
root@cassandra-node1:~# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.10.0.240  352.15 MiB  254     73.7%             8d07dc57-5832-43fe-8f92-f03c0c8c43e9  RAC1
UN  10.10.0.237  353.16 MiB  254     60.4%             dafe78cd-b255-4631-9a81-68400921490e  RAC1
UN  10.10.0.238  371.25 MiB  254     65.9%             dd3b7396-a49c-4c29-85c2-46353eba7154  RAC1
In my case I have three nodes (hence the different IPs above), but in your case you would see two servers, 10.10.0.120 and 10.10.0.121.
So if you run the script again, you can see how the data reaches all the nodes.
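If you want to confirm from Python which nodes the driver sees, the cluster metadata exposes them; a small sketch:
from cassandra.cluster import Cluster

cluster = Cluster(['10.10.0.120'])
session = cluster.connect()

# The driver discovers the remaining nodes through the contact point
for host in cluster.metadata.all_hosts():
    print(host.address, host.datacenter, host.rack, "up" if host.is_up else "down")

cluster.shutdown()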