If you want to use a separate disk for Cassandra, add a disk, format it and mount it on /var/lib/cassandra.

Bash
mkfs.ext4 /dev/sdb
mkdir /var/lib/cassandra
nano /etc/fstab
# add the following line to /etc/fstab:
/dev/sdb /var/lib/cassandra ext4 defaults 0 0
mount -a
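
To double-check the entry before relying on it at boot, the fstab line and the mount can be verified with a few lines of Python. This is only a minimal sketch and not part of the setup itself; the helper names parse_fstab_line and is_mounted are made up for illustration.

```python
import os

# The six whitespace-separated fields of an /etc/fstab entry, in order.
FSTAB_FIELDS = ("device", "mount_point", "fs_type", "options", "dump", "fsck_pass")


def parse_fstab_line(line):
    """Split one /etc/fstab entry into its six fields.

    fstab itself allows the last two fields to be omitted; this sketch
    expects all six, as in the line used in this post.
    """
    parts = line.split()
    if len(parts) != len(FSTAB_FIELDS):
        raise ValueError("expected 6 fields, got %d" % len(parts))
    return dict(zip(FSTAB_FIELDS, parts))


def is_mounted(path):
    """Return True if path is currently a mount point."""
    return os.path.ismount(path)


entry = parse_fstab_line("/dev/sdb /var/lib/cassandra ext4 defaults 0 0")
print(entry["mount_point"], entry["fs_type"])
```

After mount -a, calling is_mounted("/var/lib/cassandra") on the server should return True.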

If you want to resize the disk later, you don’t need to shut down the server. Extend the disk in your virtual environment (VMware, Proxmox or whatever you use) and run:

Bash
resize2fs /dev/sdb
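
To confirm that the filesystem really grew, you can compare the size before and after the resize. The following is a small sketch using only the Python standard library; the function name disk_report is my own and not something Cassandra provides.

```python
import shutil


def disk_report(path):
    """Return total, used and free space of the filesystem holding path, in GiB."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        "total_gib": round(usage.total / gib, 2),
        "used_gib": round(usage.used / gib, 2),
        "free_gib": round(usage.free / gib, 2),
    }


# "." is used here so the example runs anywhere; on the Cassandra server
# you would pass "/var/lib/cassandra" instead.
print(disk_report("."))
```

Run it once before and once after resize2fs; total_gib should be larger the second time.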

Let’s start by adding the apt source for Cassandra.

Bash
echo "deb https://debian.cassandra.apache.org 41x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
sudo apt-get update
sudo apt-get install cassandra

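Stop the Cassandra service and remove everything Cassandra creates by default under /var/lib/cassandra. The node has already started once with the default settings, and Cassandra refuses to start when the cluster name saved on disk differs from the one in the configuration.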

Bash
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/*

Now let’s modify the Cassandra yaml configuration file

Bash
nano /etc/cassandra/cassandra.yaml
YAML
cluster_name: 'MyCassandraCluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.10.0.120"
listen_address: 10.10.0.120
rpc_address: 10.10.0.120
endpoint_snitch: SimpleSnitch

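You can either set an IP address, as above, or set the interface (such as “eth0”) with listen_interface and rpc_interface instead. Set only one of the two for each pair.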

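You can add more seeds to the “seeds” field, separated by commas, or just keep one. Seeds are not “masters”: they are the contact points a starting node asks first to learn about the rest of the cluster, and from there the information spreads between all nodes, including the ones that are not in the seeds list.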

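Now edit cassandra-rackdc.properties. Note that this file is read by GossipingPropertyFileSnitch; if you keep endpoint_snitch set to SimpleSnitch, Cassandra ignores these values and reports the defaults datacenter1 and rack1.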

Bash
nano /etc/cassandra/cassandra-rackdc.properties
# set the following values in the file:
dc=DC1
rack=RAC1

Now let’s start Cassandra.

Bash
systemctl start cassandra

We should now see files getting created in /var/lib/cassandra, and the log should start to fill up. Check /var/log/cassandra/system.log for errors.
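
Reading the whole log by eye gets tedious, so here is a rough sketch that pulls out only the warnings and errors. It assumes each log entry starts with its level (INFO, WARN, ERROR), which is how the log looks with the default logging configuration; if you changed the log pattern, adjust it. The function names and the sample lines in the comments are made up for illustration.

```python
def find_problems(log_lines, levels=("WARN", "ERROR")):
    """Return the log lines whose first token is one of the given levels.

    Continuation lines such as stack traces do not start with a level
    and are therefore skipped.
    """
    problems = []
    for line in log_lines:
        tokens = line.split(None, 1)
        if tokens and tokens[0] in levels:
            problems.append(line.rstrip("\n"))
    return problems


def check_log(path="/var/log/cassandra/system.log"):
    """Scan the Cassandra system log and return its WARN and ERROR lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_problems(handle)


# Example on the server:
# for line in check_log():
#     print(line)
```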

Running nodetool status shows the state of Cassandra and lists all the nodes. In this case we only have one node.

Bash
root@cassandra-node1:~# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.10.0.120  23.88 MiB  254     99.9%             8d07dc57-5832-43fe-8f92-f03c0c8c43e9  RAC1

Before we try to insert some data with Python, let’s create our keyspace. Connect to the node with cqlsh:

Bash
root@cassandra-node1:~# cqlsh 10.10.0.120

To create a keyspace run the following command, adjust the name and replication as you wish.

Bash
CREATE KEYSPACE dem_linux
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
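
If you would rather create the keyspace from Python than from cqlsh, the statement can be built like this. This is just a sketch: create_keyspace_cql is a helper name I made up, and it only covers SimpleStrategy as used above.

```python
def create_keyspace_cql(name, replication_factor=1):
    """Build a CREATE KEYSPACE statement that uses SimpleStrategy."""
    # Keep the name to letters, digits and underscores so it is safe
    # to place directly into the statement.
    if not name.replace("_", "").isalnum():
        raise ValueError("unexpected characters in keyspace name: %r" % name)
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (
        "CREATE KEYSPACE IF NOT EXISTS %s "
        "WITH replication = {'class': 'SimpleStrategy', "
        "'replication_factor': %d};" % (name, replication_factor)
    )


print(create_keyspace_cql("dem_linux"))

# With the cassandra-driver package installed, the statement could be run as:
# from cassandra.cluster import Cluster
# session = Cluster(['10.10.0.120']).connect()
# session.execute(create_keyspace_cql("dem_linux"))
```

With a replication factor of 1, every row is stored on exactly one node. Once the cluster has two or more nodes, a factor of 2 would keep each row on two of them.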

When this is done, let’s try to insert some data into that keyspace.

Bash
apt install python3-pip
pip install cassandra-driver

Now let’s create a new Python file. I will call it dem-linux.py. Paste the following code and change the cluster IP 10.10.0.120 to your IP and “dem_linux” to your keyspace.

Python
from cassandra.cluster import Cluster
import uuid

# Connect to your Cassandra cluster
cluster = Cluster(['10.10.0.120'])
session = cluster.connect('dem_linux')

# Create a table if it doesn't exist
create_table_query = """
CREATE TABLE IF NOT EXISTS your_table (
   id UUID PRIMARY KEY,
   data text
)
"""
session.execute(create_table_query)

total_data_size = 900000  # Total number of rows to insert

# Prepare the INSERT once so the server only has to parse it a single time
prepared_statement = session.prepare("INSERT INTO your_table (id, data) VALUES (?, ?)")

for i in range(total_data_size):
    data = "dem_linux data #" + str(i)

    # Each row has its own partition key, so it is sent as a single
    # prepared INSERT instead of being wrapped in a one-statement batch
    session.execute(prepared_statement, (uuid.uuid4(), data))

# Close the session and cluster
session.shutdown()
cluster.shutdown()
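
The script above waits for every insert to finish before sending the next one, which is fine for a demo but slow for 900000 rows. A possible speed-up, sketched below, is the driver's execute_concurrent_with_args helper, which keeps several inserts in flight at once. The helper names generate_rows, chunked and insert_rows, and the values for chunk_size and concurrency, are my own assumptions for illustration and not tuned numbers.

```python
import itertools
import uuid


def generate_rows(total):
    """Yield (id, data) tuples in the same shape the script above inserts."""
    for i in range(total):
        yield (uuid.uuid4(), "dem_linux data #" + str(i))


def chunked(iterable, size):
    """Yield lists of at most size items so not all rows sit in memory at once."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


def insert_rows(session, prepared_statement, total, chunk_size=1000, concurrency=50):
    """Insert total rows, sending up to concurrency requests in parallel."""
    # Imported here so the helpers above also work without the driver installed.
    from cassandra.concurrent import execute_concurrent_with_args

    for chunk in chunked(generate_rows(total), chunk_size):
        # Raises on the first failed insert (the driver's default behaviour).
        execute_concurrent_with_args(
            session, prepared_statement, chunk, concurrency=concurrency
        )
```

In dem-linux.py you would call insert_rows(session, prepared_statement, total_data_size) in place of the for loop.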

Run the script with python3 dem-linux.py, then connect to the Cassandra cluster with cqlsh to inspect the data we are inserting with Python.

Select the keyspace with the following command:

Bash
cqlsh> use dem_linux;

And to select from our table called “your_table”, run the following:

Bash
cqlsh:dem_linux> SELECT * FROM your_table;

The output should look similar to this (the IDs are random, so yours will differ):

Bash
id | data
--------------------------------------+------------------------
530acbb5-8a3a-4e4b-a5dc-4323d49b20d2 | dem_linux data #76491
cca35035-f6a3-4511-a53a-0c2a4bd520d3 | dem_linux data #275773
c40af766-5afe-4610-a3d6-e72a1f0dd534 | dem_linux data #262751
03a70955-02d3-4f49-bedf-65ecb5f8b115 | dem_linux data #166656
2ea0a41a-9c56-4545-9e30-c1d47af808fe | dem_linux data #268583
83c1fe80-d22f-4801-8752-2d9b6709a477 | dem_linux data #218708
d1bd3e5a-730f-4287-a18c-ba6568c48570 | dem_linux data #463446
f57c207c-847c-4be3-a7db-e46a7dea09d1 | dem_linux data #189919
fab727f7-fdaf-4834-b131-a81d9e153b16 | dem_linux data #16194
ac55be94-f59d-44a8-8673-5e514073396e | dem_linux data #389065
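
The same check can be done from Python. The sketch below fetches only a handful of rows, since the table will hold a lot of data by now. It assumes a session that is already connected to the dem_linux keyspace and the driver's default row type, where columns are available as attributes; sample_rows is a made-up helper name.

```python
def sample_rows(session, limit=10):
    """Fetch a few rows from your_table and return them as (id, data) tuples."""
    query = "SELECT id, data FROM your_table LIMIT %d" % int(limit)
    rows = session.execute(query)
    return [(row.id, row.data) for row in rows]


# With the cassandra-driver package installed, this could be used as:
# from cassandra.cluster import Cluster
# session = Cluster(['10.10.0.120']).connect('dem_linux')
# for row_id, data in sample_rows(session):
#     print(row_id, "|", data)
```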

Now if you add more nodes, this data will be spread across the cluster. If you leave the script running and check “nodetool status”, you will see the Load of the cluster growing.

If you want to add more nodes, install Cassandra in the same way, but change the listen IP and leave the seeds the same. You can also add the second node to the seeds. You should have a few nodes as seeds, but not all of them.

Now let’s add one more node so we get a multi-node cluster.

Run the same installation process on the new node, including stopping the service and emptying /var/lib/cassandra. Only the configuration file differs.

Bash
nano /etc/cassandra/cassandra.yaml

Now set the listen and RPC addresses to the IP of the second node. The cluster name and the seeds stay the same.

YAML
cluster_name: 'MyCassandraCluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.10.0.120"
listen_address: 10.10.0.121
rpc_address: 10.10.0.121
endpoint_snitch: SimpleSnitch
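
To make it clear what actually changes from node to node, here is a small sketch that builds the relevant settings for each node and compares them. The function names node_settings and differing_keys are made up for illustration; the values are the ones used in this post.

```python
def node_settings(node_ip, seed_ips, cluster_name="MyCassandraCluster"):
    """Return the cassandra.yaml values that matter when adding a node."""
    return {
        "cluster_name": cluster_name,
        # seeds is a single comma-separated string in cassandra.yaml
        "seeds": ",".join(seed_ips),
        "listen_address": node_ip,
        "rpc_address": node_ip,
    }


def differing_keys(first, second):
    """Return the sorted names of settings whose values differ."""
    return sorted(key for key in first if first[key] != second[key])


seeds = ["10.10.0.120"]
node1 = node_settings("10.10.0.120", seeds)
node2 = node_settings("10.10.0.121", seeds)
print(differing_keys(node1, node2))
```

Only listen_address and rpc_address differ between the two nodes; cluster_name and seeds have to match.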

The last part is to edit cassandra-rackdc.properties with the same dc and rack values as on the first node.

Bash
nano /etc/cassandra/cassandra-rackdc.properties

When that is done, start Cassandra on the second node. We should soon see the node when running nodetool status:

Bash
root@cassandra-node1:~# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.10.0.240  352.15 MiB  254     73.7%             8d07dc57-5832-43fe-8f92-f03c0c8c43e9  RAC1
UN  10.10.0.237  353.16 MiB  254     60.4%             dafe78cd-b255-4631-9a81-68400921490e  RAC1
UN  10.10.0.238  371.25 MiB  254     65.9%             dd3b7396-a49c-4c29-85c2-46353eba7154  RAC1
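
If you want to wait for a new node in a script instead of watching the output by hand, the status lines can be parsed. The sketch below assumes the output looks like the listing above, with each node line starting with a two-letter code such as UN; the function names are my own.

```python
import re
import subprocess

# Two-letter code (Up/Down + Normal/Leaving/Joining/Moving), then the address.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)\s+")


def run_nodetool_status():
    """Run nodetool status on this node and return its text output."""
    result = subprocess.run(
        ["nodetool", "status"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_nodetool_status(output):
    """Return a list of (status, address) tuples, one per node line."""
    nodes = []
    for line in output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            nodes.append((match.group(1), match.group(2)))
    return nodes


def all_up_and_normal(output, expected_nodes):
    """True if exactly expected_nodes nodes are listed and all are UN."""
    nodes = parse_nodetool_status(output)
    return len(nodes) == expected_nodes and all(s == "UN" for s, _ in nodes)
```

On a node you would call all_up_and_normal(run_nodetool_status(), 2) and retry after a short sleep until it returns True.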

In my case I have three nodes, but in your case you would see two servers: 10.10.0.120 and 10.10.0.121.

If you run the script again, you can see how the data gets distributed across all the nodes.
