If you want to use a separate disk for Cassandra, add a disk, format it, and mount it on /var/lib/cassandra/:
mkfs.ext4 /dev/sdb
mkdir /var/lib/cassandra
nano /etc/fstab
/dev/sdb /var/lib/cassandra ext4 defaults 0 0
mount -a
If you later want to resize the disk, you don't need to shut down the server: extend the virtual disk in VMware, Proxmox, or whatever environment you use, then grow the filesystem online with
resize2fs /dev/sdb
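To sanity-check that the new filesystem really is mounted where Cassandra expects its data, compare the reported size with the disk you added (df -h /var/lib/cassandra tells you the same thing). A quick sketch:
import shutil

# Reports the filesystem backing this path, so the numbers confirm
# whether the new disk is actually mounted on /var/lib/cassandra
total, used, free = shutil.disk_usage("/var/lib/cassandra")
print(f"total={total / 2**30:.1f} GiB, free={free / 2**30:.1f} GiB")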
Let's start by adding the APT source for Cassandra.
echo "deb https://debian.cassandra.apache.org 41x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
sudo apt-get update
sudo apt-get install cassandra
The package starts Cassandra automatically after installation, so stop the service and remove everything Cassandra has created by default under /var/lib/cassandra:
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/*
Now let's modify the Cassandra YAML configuration file:
nano /etc/cassandra/cassandra.yaml
cluster_name: 'MyCassandraCluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.10.0.120"
listen_address: 10.10.0.120
rpc_address: 10.10.0.120
endpoint_snitch: GossipingPropertyFileSnitch
For listen_address and rpc_address you can either set an IP address or use the corresponding interface options (listen_interface, rpc_interface) with a device name such as "eth0".
You can add more entries to the "seeds" field or keep just one. Seeds are not masters; they are the contact points a joining node gossips with first, and from there cluster information spreads to all the other nodes. Note that we use GossipingPropertyFileSnitch rather than the default SimpleSnitch; with SimpleSnitch, the datacenter and rack values we set in the next step would be ignored.
Now edit cassandra-rackdc.properties
nano /etc/cassandra/cassandra-rackdc.properties
dc=DC1
rack=RAC1
Now let’s start Cassandra.
systemctl start cassandra
We should now see files being created under /var/lib/cassandra, and the logs should show some startup info; check /var/log/cassandra/system.log.
Running nodetool status shows the state of the cluster and lists all of its nodes; in this case we only have one node.
root@cassandra-node1:~# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.10.0.120  23.88 MiB  254     99.9%             8d07dc57-5832-43fe-8f92-f03c0c8c43e9  RAC1
Before we try to insert some data with Python, let's create our keyspace. Connect to the node with cqlsh:
root@cassandra-node1:~# cqlsh 10.10.0.120
To create a keyspace, run the following command, adjusting the name and replication as you wish:
CREATE KEYSPACE dem_linux
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
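SimpleStrategy with replication_factor 1 is fine for this single-node test. On a real multi-node cluster you would normally use NetworkTopologyStrategy instead, which replicates per datacenter. As a sketch, here is how that could look through the Python driver we install below (the keyspace name dem_linux_prod is just a placeholder, and 'DC1': 2 assumes at least two nodes in the DC1 datacenter we defined in cassandra-rackdc.properties); the CQL statement also works verbatim in cqlsh:
from cassandra.cluster import Cluster

# Connect without selecting a keyspace; adjust the IP to your node
cluster = Cluster(['10.10.0.120'])
session = cluster.connect()

# 'DC1' is the dc from cassandra-rackdc.properties; 2 is the number
# of replicas to keep in that datacenter
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS dem_linux_prod
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2}
""")

cluster.shutdown()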
When this is done, let's try to insert some data into that keyspace. Install the Python driver first:
apt install python3-pip
pip install cassandra-driver
Now let's create a new Python file, I will call it dem-linux.py, and paste in the following code. Adjust the cluster IP 10.10.0.120 to your IP and "dem_linux" to your keyspace.
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement
import uuid

# Connect to your Cassandra cluster and keyspace
cluster = Cluster(['10.10.0.120'])
session = cluster.connect('dem_linux')

# Create a table if it doesn't exist
create_table_query = """
CREATE TABLE IF NOT EXISTS your_table (
    id UUID PRIMARY KEY,
    data text
)
"""
session.execute(create_table_query)

# Rows per batch (adjust as needed, but keep batches small: Cassandra
# rejects batches above batch_size_fail_threshold, 50 KiB by default)
batch_size = 100
total_data_size = 900000  # Total number of rows to insert

prepared_statement = session.prepare("INSERT INTO your_table (id, data) VALUES (?, ?)")

batch = BatchStatement()
for i in range(total_data_size):
    data = "dem_linux data #" + str(i)
    batch.add(prepared_statement, (uuid.uuid4(), data))
    # Execute the batch every batch_size rows and start a new one
    if (i + 1) % batch_size == 0:
        session.execute(batch)
        batch = BatchStatement()

# Execute whatever is left over in the last partial batch
if total_data_size % batch_size != 0:
    session.execute(batch)

# Close the session and cluster
session.shutdown()
cluster.shutdown()
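A side note on performance: batches in Cassandra exist for atomicity, not as a bulk-loading optimization, and batching rows from different partitions (as we do here, since every id is a random UUID) adds coordinator overhead. For raw insert throughput, the driver's concurrent execution helper is usually a better fit; a minimal sketch against the same table:
from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args
import uuid

cluster = Cluster(['10.10.0.120'])
session = cluster.connect('dem_linux')

insert = session.prepare("INSERT INTO your_table (id, data) VALUES (?, ?)")
params = [(uuid.uuid4(), "dem_linux data #" + str(i)) for i in range(100000)]

# Keeps up to `concurrency` requests in flight at once instead of
# sending one statement at a time
execute_concurrent_with_args(session, insert, params, concurrency=100)

session.shutdown()
cluster.shutdown()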
Run the script with python3 dem-linux.py, then connect to the Cassandra cluster with cqlsh and we can inspect the data we are inserting with Python.
Select the keyspace with the following command:
cqlsh> use dem_linux;
And to select from our table called "your_table", run the following:
cqlsh:dem_linux> SELECT * FROM your_table;
The output should look something like this (rows come back in token order, not insertion order, so yours will differ):
id | data
--------------------------------------+------------------------
530acbb5-8a3a-4e4b-a5dc-4323d49b20d2 | dem_linux data #76491
cca35035-f6a3-4511-a53a-0c2a4bd520d3 | dem_linux data #275773
c40af766-5afe-4610-a3d6-e72a1f0dd534 | dem_linux data #262751
03a70955-02d3-4f49-bedf-65ecb5f8b115 | dem_linux data #166656
2ea0a41a-9c56-4545-9e30-c1d47af808fe | dem_linux data #268583
83c1fe80-d22f-4801-8752-2d9b6709a477 | dem_linux data #218708
d1bd3e5a-730f-4287-a18c-ba6568c48570 | dem_linux data #463446
f57c207c-847c-4be3-a7db-e46a7dea09d1 | dem_linux data #189919
fab727f7-fdaf-4834-b131-a81d9e153b16 | dem_linux data #16194
ac55be94-f59d-44a8-8673-5e514073396e | dem_linux data #389065
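You can of course read the same data back from Python as well; a small sketch:
from cassandra.cluster import Cluster

cluster = Cluster(['10.10.0.120'])
session = cluster.connect('dem_linux')

# LIMIT keeps the example cheap; a full scan of 900k rows would be slow
rows = session.execute("SELECT id, data FROM your_table LIMIT 10")
for row in rows:
    print(row.id, row.data)

session.shutdown()
cluster.shutdown()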
If you add more nodes, this data will spread around the cluster, and if you leave the script running while checking nodetool status, you will see the nodes accumulating more data.
To add more nodes, install Cassandra the same way, but change the listen and rpc addresses and keep the seeds the same (or add the second node's IP to the seeds list as well). A few of your nodes should be seeds, but not all of them.
Now let's add one more node so we get a multi-node cluster.
Run through the same installation process; we only need to modify the configuration file.
nano /etc/cassandra/cassandra.yaml
Now set the listen and rpc addresses to the new node's own IP:
cluster_name: 'MyCassandraCluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.10.0.120"
listen_address: 10.10.0.121
rpc_address: 10.10.0.121
endpoint_snitch: GossipingPropertyFileSnitch
The last part is to edit cassandra-rackdc.properties again, with the same datacenter and rack values as on the first node:
nano /etc/cassandra/cassandra-rackdc.properties
dc=DC1
rack=RAC1
When that is done, start Cassandra on the second node, and we should soon see it when running nodetool status:
root@cassandra-node1:~# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.10.0.240  352.15 MiB  254     73.7%             8d07dc57-5832-43fe-8f92-f03c0c8c43e9  RAC1
UN  10.10.0.237  353.16 MiB  254     60.4%             dafe78cd-b255-4631-9a81-68400921490e  RAC1
UN  10.10.0.238  371.25 MiB  254     65.9%             dd3b7396-a49c-4c29-85c2-46353eba7154  RAC1
In my case I have three nodes (hence the different IPs above), but in your case you would see two servers, 10.10.0.120 and 10.10.0.121.
So if you run the script again, you can see how the data reaches all the nodes.
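If you want to confirm from Python which nodes the driver sees, the cluster metadata exposes them; a small sketch:
from cassandra.cluster import Cluster

cluster = Cluster(['10.10.0.120'])
session = cluster.connect()

# The driver discovers the remaining nodes through the contact point
for host in cluster.metadata.all_hosts():
    print(host.address, host.datacenter, host.rack, "up" if host.is_up else "down")

cluster.shutdown()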