Friday 23 October 2020

JDBI 3 - How to enable sql logging

 

Sometimes we need to print the SQL statements along with their execution time while debugging issues or doing PSR testing.

Following are the steps to enable SQL logging, with the time taken to execute each statement, in JDBI 3:

1. Register a SqlLogger on the injected/managed Jdbi object to enable SQL logging:

if (config.isEnableSqlLogging()) {
    SqlLogger sqlLogger = new SqlLogger() {
        @Override
        public void logAfterExecution(StatementContext context) {
            log.info("sql {}, parameters {}, timeTaken {} ms", context.getRenderedSql(),
                    context.getBinding().toString(), context.getElapsedTime(ChronoUnit.MILLIS));
        }
    };
    jdbi.setSqlLogger(sqlLogger);
}

2. Optionally, add a config parameter to control whether SQL logging is enabled in each environment:

enableSqlLogging : false
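
For context, a minimal sketch of the configuration class backing this flag, assuming a Jackson/YAML-based configuration similar to Dropwizard (the class name and wiring here are illustrative, not from the original setup):

import com.fasterxml.jackson.annotation.JsonProperty;

// Illustrative configuration class exposing the "enableSqlLogging" YAML flag above.
public class ServiceConfiguration {

    @JsonProperty
    private boolean enableSqlLogging = false;

    // Used as config.isEnableSqlLogging() in the snippet above.
    public boolean isEnableSqlLogging() {
        return enableSqlLogging;
    }
}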




Sunday 18 October 2020

Puzzle - 100 people standing in a circle - Answer

If the total number of people is 2^n, the winner will always be the person at the starting point.

You can verify this 2^n behaviour with small examples such as 2, 4 or 8 people.

For the case of 100, the largest power of 2 not exceeding 100 is 64, so we need to eliminate 36 people; whoever holds the marker at that point is effectively at the start of a fresh 64-person circle and will be the winner.

While eliminating those 36 people, the even positions 2, 4, ..., 72 are removed, so the last person eliminated is 72 and the marker then passes to 73.
So, the answer is 73. In general, if the number of people is N = 2^m + L, the winner is at position 2L + 1.


A java program to solve this puzzle is as follows:
@Test
public void testPuzzle() {
    int numOfPeople = 8;
    int powOfTwo = 1;
    while (powOfTwo < numOfPeople) {
        powOfTwo = powOfTwo * 2;
    }

    int winnerIndex = 1;
    if (powOfTwo > numOfPeople) {
        powOfTwo = powOfTwo / 2;
        int diff = numOfPeople - powOfTwo;
        winnerIndex = diff * 2 + 1;
    }
    System.out.println(winnerIndex);
}
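
As a cross-check, here is a brute-force simulation sketch of the same game (assuming JUnit 4 and java.util imports, as in the other snippets):

@Test
public void testPuzzleBySimulation() {
    int numOfPeople = 100;
    List<Integer> circle = new LinkedList<>();
    for (int i = 1; i <= numOfPeople; i++) {
        circle.add(i);
    }
    // Index of the person currently holding the marker.
    int markerIndex = 0;
    while (circle.size() > 1) {
        // The person next to the marker holder gets marked and eliminated.
        int eliminatedIndex = (markerIndex + 1) % circle.size();
        circle.remove(eliminatedIndex);
        // The marker passes to the person after the one just eliminated.
        markerIndex = eliminatedIndex % circle.size();
    }
    System.out.println(circle.get(0)); // prints 73 for 100 people
}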

Puzzle - 100 people standing in a circle

There are 100 people standing in a circle, numbered 1 to 100. No. 1 has a marker. He marks the next person (i.e. no. 2) and gives the marker to the next (i.e. no. 3). Whoever gets marked is eliminated from the game. Everyone does the same until only one person is left. What is the position of the person who remains unmarked and wins the game?

Also, you can try writing a program which prints the solution for any given number of people in the game.



Elastic Search basics

 

The ELK stack consists of Elasticsearch, Logstash and Kibana.

Logstash and Beats are the connectors used to bring data into the Elasticsearch cluster; they have inputs for database changes, log files, etc.

Elasticsearch ingests the data from Logstash/Beats, indexes it, and distributes it across the nodes in its cluster.

It provides an interface to search the documents and has algorithms to index and score the data.

Kibana - provides a nice web UI on top of elastic search for searching the data


Elastic Search use cases:

log analytics

security analytics, anomaly detection

Marketing based on the data

Operational needs like server health, web app response time etc


An Elasticsearch cluster is organized as follows:

1. A cluster consists of nodes

2. Each node holds indexes; an index in Elasticsearch is backed by an inverted index, which maps terms (words) to the documents containing them

3. Each index is further divided into types for storing documents - like order type, product type etc (mapping types are deprecated in newer Elasticsearch versions, where an index holds a single document type)

4. Each type consists of documents

An index is distributed across shards, and each shard has replicas to handle node failures.

By default an index used to be created with 5 shards and one replica per shard; from Elasticsearch 7.0 onwards the default is a single primary shard with one replica.
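
The shard and replica counts can also be set explicitly when creating an index (the index name here is just an example):

PUT my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}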


Downloading and installing Elasticsearch and Kibana is straightforward and can be done from the Elastic site.

Kibana by default starts on http://localhost:5601/


Sample search queries on Elasticsearch, using the sample data provided by Kibana:

# show me everything

GET kibana_sample_data_ecommerce/_search


# show only documents whose category contains clothing

GET kibana_sample_data_ecommerce/_search

{
  "query": {
    "match": {
      "category": "clothing"
    }
  }
}


# filtering on multiple conditions - similarly, must_not can be used for negative matching

GET kibana_sample_data_ecommerce/_search

{
  "query": {
    "bool": {
      "must": [
        { "match": { "category": "clothing" } },
        { "match": { "currency": "EUR" } },
        { "range": {
          "taxful_total_price": {
            "gte": 36,
            "lte": 100
          }
        }}
      ]
    }
  }
}


Kibana provides a nice UI for searching these directly.



Completion suggesters:


Elasticsearch provides a completion suggester for typeahead scenarios where you want to show suggestions to the user as they type.

A typical data structure used for serving such use cases is a trie.

A trie stores data in a tree-like format, with the characters that make up the words forming the nodes of the tree.


Elasticsearch uses an in-memory, trie-like data structure for the completion suggester.

While creating the ES index, you need to map the field to be suggested with type "completion".

The completion suggester is a prefix suggester, so by default only entries starting with the typed prefix are returned.

However, you can configure multiple input entries for a suggestion while indexing the data.

You also have a fuzzy option to return results even when the query contains typos.

The ranking of suggestions can be influenced by maintaining a weight on the indexed suggestions, for example based on how often they are hit.
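
A minimal sketch of a completion suggester in the same Kibana Dev Tools style (index and field names are only examples): map a field with type completion, index a suggestion with multiple inputs and a weight, and then query it with a prefix and the fuzzy option.

PUT products
{
  "mappings": {
    "properties": {
      "name_suggest": { "type": "completion" }
    }
  }
}

PUT products/_doc/1
{
  "name_suggest": {
    "input": ["laptop", "notebook"],
    "weight": 10
  }
}

POST products/_search
{
  "suggest": {
    "product_suggest": {
      "prefix": "lap",
      "completion": {
        "field": "name_suggest",
        "fuzzy": { "fuzziness": "AUTO" }
      }
    }
  }
}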




Wednesday 14 October 2020

Oracle Cloud Foundation Associate Certification Exam - Notes

 Cloud:

provides on-demand self-service provisioning of computing capabilities over a network

resources are pooled to optimize utilization

provides elasticity

measured service - pay for usage, with visibility into what we are paying for


IaaS - OS, middleware, runtime env, applications, data need to be managed by the customer

PaaS - Application and Data needs to be managed by the customer

SaaS - customer can directly use the application as a service


Disaster recovery:

RTO - recovery time objective

RPO - recovery point objective


CAPEX (Capital Expense) Vs OPEX (Operational Expense)

Cloud allows replacing CAPEX with OPEX


Regions - OCI has around 22 regions + 14 planned

Each region has ADs (Availability Domain)

Each availability domain has 3 Fault Domains. FDs use separate hardware, so they do not share a single point of failure


Compute Service:

Bare Metal, Dedicated Virtual Host, Virtual Machine, Container Engine, Oracle functions

compute instances are placed on virtual cloud networks (VCN)

VM - you have to take care of patching the OS, configuring the network, firewall, scaling etc

block volumes are of 2 types - boot volumes for boot data and block volumes for application data; they are kept remote from the instance for maintaining HA

NIC - Network Interface Card - a virtual NIC is placed in a subnet of a VCN and has a private/public IP

Autoscaling - create a gold image - config file - os image, metadata, shape of VM, vNICs, storage, subnet

you can specify initial pool size and max pool size and scale it based on rules like 70% cpu utilization etc


Container Engine - a managed Kubernetes offering; we just need to give our Docker image for deployment to Container Engine and scaling is taken care of.

Oracle functions - serverless coding, charged based on execution resources.


Autoscaling - no cost - instance pool - define min and max


Storage Service:

Block volume, local NVMe, file storage, object storage, archive storage


Block volume of compute instance is kept remote for the instance for maintaining HA.

If backup is configured, the backup copy is stored in object storage which can be restored to a block volume in same region.

There is also provision to copy backup of block volume cross-region


Block Volume tiers - Basic, Balanced, High Performance

KMS - key management service can be used for data encryption


Local NVMe (Non-Volatile Memory Express) - temporary local storage - for applications that require high-performance local storage like NoSQL/in-memory databases. It is not durable and not encrypted, as it is storage local to the compute instance; if the instance dies, the data is gone


File Storage - shared file system storage for various compute instances.

NFS - Network File System; SMB - Server Message Block

File Storage snapshots can be maintained and used for restoring in case of corruption.

to access your file system, you need to create a mount target and a mount target can contain 100 file systems


Object Storage:

data stored in buckets, flat hierarchy - means faster access; each object also carries metadata, making it easy to index

detect and auto repair corrupt data

size allowed 10TB per object

bucket - for grouping storage of objects

can have private as well as public bucket

data is stored with encryption - AES (Advanced Encryption Standard) 256

data is replicated across ADs and FDs to maintain HA and durability


there are standard (hot storage) and archive (cold storage) options - cold storage can be used for data that is rarely accessed and requires long retention


pre-authenticated requests - can be used to generate urls that can be shared for accessing a private object in a secured way.


Network Service:

Virtual Cloud Network (VCN) - software defined private network for OCI - assigned an address space or IP address range.

subnets allow a VCN to be divided into subnetworks and a VCN should have at least one subnet

compute instances have to be placed inside a subnet of a VCN

subnets can be further isolated and secured

Internet gateway - provides a path for internet traffic to communicate with your compute instance in a VCN

NAT gateway (Network Address Translation) - allows outbound communication to the internet but does not allow any inbound communication initiated from the internet

Service gateway - allows resources in a VCN to access public OCI services such as object storage; traffic goes via the OCI network fabric and does not go over the internet

Dynamic Routing Gateway (DRG) - provides a path for private traffic between a VCN and a private network. Can be used in the hybrid cloud model for connecting an on-premise network and a VCN.

DRG uses IPSec VPN (IPSec encryption) or FastConnect (dedicated private connectivity using BGP (Border Gateway Protocol)) to establish the connection with the on-premise network.


Oracle Kubernetes Engine (OKE):

OCIR - managed Docker container registry service that can be used by Kubernetes workloads on Oracle Kubernetes Engine

OCIR provides full integration with OKE


Security List - VCN can further add security list to control ips that can send requests egress or ingress to the VCN

NSG (Network Security Group) can be further used to control egress/ingress to a set of VNICs inside a VCN


Local VCN peering - process of connecting two VCNs in the same region using a local peering gateway.

Similarly Remote VCN peering option is available for VCNs in different regions


Load Balancer - performs tasks such as service discovery, health check, algorithm for request distribution for scaling.

In cloud, load balancers will have a stand by to avoid single point of failure.


IAM:

user/instance principal

users can't directly have permissions; they need to be part of a group, and the same goes for instance/service principals


Authentication - either using user/password (UI), using API signing key (SDK/REST API/cli) or using auth token


Authorization - policies of the form allow/admit <group> to <policy verb> <resource-type> in tenancy/compartment, as in the example below
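
For example, a typical policy statement looks like this (the group, resource type and compartment names are placeholders):

Allow group Developers to manage instance-family in compartment ProjectA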


Database Services:

VM Database - can scale storage, but not CPU core

BareMetal Database - can scale CPU core but not storage

RAC

Exadata

ATP-S, ATP Dedicated


Database is created in a separate VCN (Virtual Cloud Network) and is given a private IP address

Backup and recovery - backup is stored in object storage via service gateway with private access although object storage is public


Data Guard - maintains copies of the DB and switches over in case of disaster recovery/data corruption etc

Active Data Guard - has additional capabilities like data protection and availability

switchover - planned migration; failover - unplanned migration


Datasafe:

unified database security control center

monitor user activity and mask sensitive data for test and development


OCI Security:

Oracle's responsibility for security - physical security, compute, data center/storage isolation, network, IAM framework, infra services such as load balancer, WAF etc

Customer's responsibility of security - maintain user credentials and other info, data encryption, VCN configuration, adding security list, route table, setting up strong IAM policies, patching etc


MFA - Multi-Factor Authentication is supported

Federation - using the organization's IdP, the user can log in to the cloud without needing a separate user/password


data protection - data is encrypted at rest and in transit, there is also KMS (Key Management Service)


OS Management Service - security/compliance reporting to make sure all critical fixes are in place


Web Application Firewall - server side plugin to filter http/s requests with a set of rules

OCI has a managed WAF Service


OCI Pricing, Billing and Cost Management:

Pricing models:

PAYG - Pay as you go - depends on usage

Monthly Flex (Universal Credit) - min 1k USD/month for a year - discounts based on deal

BYOL - Bring Your Own Licenses

compute instance - pricing is per OCPU

Block Volume pricing - GB/month, for performance - cost per VPUs for Balanced and high performance block volumes

pricing per region - OCI has same pricing for all regions, some cloud providers have different pricing for different regions

data transfer costs - free within an AD and between ADs of the same region, but there is an egress charge for data transfer between regions

cost tags can be used for tracking spending by different cost center

create budget can be used for setting limit on spending by resources in a compartment etc

Usage reports give details of the spending on each resource - generated daily

OCI free tier - $300 credit for 30 days - 5TB free - Always Free: 2 VMs, 2 ATP instances


SLA: calculated for a month, example SLA provided by OCI Vault service

99-99.9% up time - means 10% credit to customer

95-99% - 25% credit

less than 95% up time - 100% credit

Oracle offers end-to-end SLAs covering availability, performance and manageability (control plane)


control plane - administration of OCI resources

data plane - for usage of the resources - other cloud providers give SLA only for data plane operations


OCI services - status monitoring page - https://ocistatus.oraclecloud.com - can also subscribe to notification emails


My Oracle Support (MOS) account - free of cost - can link with OCI account - CSI number to identify the customer, you also need tenancy and resource info

only paid accounts can have a support account - can raise an SR request for:

technical issues, password reset, changing tenancy admin, service limit increase.


Cloud customer connect - OCI general forum - in which customer can raise queries



Java 8 Functions, Lambda, Streams

 

Most commonly used functional interfaces in Java 8, with examples

Predicate: takes one argument and returns true/false

@Test
public void testPredicate() {
    // using the test method of Predicate
    Predicate<String> stringLen = (s) -> s.length() < 10;
    Assert.assertTrue(stringLen.test("Apples"));
    Assert.assertFalse(stringLen.test("Applesssdnvonwoe"));
}

Consumer: accepts one argument, returns no value

//Consumer example uses accept method
Consumer<String> consumerStr = (s) -> System.out.println(s.toLowerCase());
consumerStr.accept("ABCD");

Function: accepts one argument, returns result of execution of a different type

//Function example        
Function<Integer,String> converter = (num)-> Integer.toString(num);
System.out.println("length of 26: " + converter.apply(26).length());

Unary Operator: almost the same as Function, except that the return value has to be of the same type as the input argument

//Unary Operator example
UnaryOperator<String> str = (msg)-> msg.toUpperCase();
System.out.println(str.apply("This is my message in upper case"));

Supplier: Does not accept any argument, returns result of execution

//Supplier example
Supplier<String> s = ()-> "Java is fun";
System.out.println(s.get());

Binary Operator: accepts two arguments, returns result of execution, all of same type

//Binary Operator example
BinaryOperator<Integer> add = (a, b) -> a + b;
System.out.println("add 10 + 25: " + add.apply(10, 25));


Compute and Merge functions in Map:

If we are not using Java 8, then to create a map with the frequency of each element in a list, we would do the following:

@Test
public void testListToMapNoJava8() {

    List<String> dataList = new ArrayList<>();
    dataList.add("v1");
    dataList.add("v1");
    dataList.add("v2");
    dataList.add("v3");
    dataList.add("v3");
    dataList.add("v3");
    Map<String, Integer> dataCountMap = new HashMap<>();
    for (String data : dataList) {
        if (dataCountMap.containsKey(data)) {
            dataCountMap.put(data, dataCountMap.get(data) + 1);
        } else {
            dataCountMap.put(data, 1);
        }
    }
    Assert.assertEquals(2, dataCountMap.get("v1").intValue());
    Assert.assertEquals(3, dataCountMap.get("v3").intValue());
}

In Java 8, there are a few methods on Map that make this task simpler.

Using Compute function:

@Test
public void testMapCompute() {

    List<String> dataList = new ArrayList<>();
    dataList.add("v1");
    dataList.add("v1");
    dataList.add("v2");
    dataList.add("v3");
    dataList.add("v3");
    dataList.add("v3");
    Map<String, Integer> dataCountMap = new HashMap<>();
    for (String data : dataList) {
        dataCountMap.compute(data, (key, count) -> count == null ? 1 : count + 1);
    }
    System.out.println(dataCountMap);
    Assert.assertEquals(2, dataCountMap.get("v1").intValue());
    Assert.assertEquals(3, dataCountMap.get("v3").intValue());
}

Using Merge function:

@Test
public void testMapMerge() {

    List<String> dataList = new ArrayList<>();
    dataList.add("v1");
    dataList.add("v1");
    dataList.add("v2");
    dataList.add("v3");
    dataList.add("v3");
    dataList.add("v3");
    Map<String, Integer> dataCountMap = new HashMap<>();
    for (String data : dataList) {
        dataCountMap.merge(data, 1, (value, replacement) -> value + 1);
    }
    System.out.println(dataCountMap);
    Assert.assertEquals(2, dataCountMap.get("v1").intValue());
    Assert.assertEquals(3, dataCountMap.get("v3").intValue());
}

Using Streams, Collectors.toMap with Merge function:

@Test
public void testStreamsToMap() {

    List<String> dataList = new ArrayList<>();
    dataList.add("v1");
    dataList.add("v1");
    dataList.add("v2");
    dataList.add("v3");
    dataList.add("v3");
    dataList.add("v3");
    Map<String, Integer> dataCountMap = dataList.stream()
            .collect(Collectors.toMap(Function.identity(), value -> 1, (value, replacement) -> value + 1));
    Assert.assertEquals(2, dataCountMap.get("v1").intValue());
    Assert.assertEquals(3, dataCountMap.get("v3").intValue());
}
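
The same frequency map can also be built with Collectors.groupingBy and Collectors.counting (note that the counts come back as Long values; java.util.Arrays is assumed to be imported along with the other imports):

@Test
public void testStreamsGroupingBy() {

    List<String> dataList = Arrays.asList("v1", "v1", "v2", "v3", "v3", "v3");
    // groupingBy with counting() builds the frequency map directly.
    Map<String, Long> dataCountMap = dataList.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    Assert.assertEquals(2L, dataCountMap.get("v1").longValue());
    Assert.assertEquals(3L, dataCountMap.get("v3").longValue());
}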



Saturday 3 October 2020

Cassandra Basics


Apache Cassandra is a scalable open-source distributed NoSQL database that manages large amounts of data across multiple distributed locations. It is designed for huge volumes of data and heavy write throughput, and serves reads well when tables are modelled around the query patterns.


Architecture

A Cassandra cluster can be visualized as a ring because internally it uses a consistent hashing algorithm to distribute data. 

When a cluster of Cassandra nodes is created, the data is dynamically partitioned across the cluster so that every node in the cluster holds a proportionate range of data. 

If the data volume grows, a new node is added to the cluster and the ring arranges itself automatically so that each node in the cluster again holds an equal proportion of data. 

This is why Cassandra can scale incrementally with the ability to dynamically partition data across nodes.

When a node joins the Cassandra ring, it is assigned a token range that determines its position in the cluster. 

Each node is responsible for the range of token values and the associated data assigned to that token range.

Cassandra is a distributed system that implements a peer-to-peer architecture. It uses a gossip protocol for internal communication. There is no master node and hence no single point of failure, so every node is able to handle both reads and writes.

Cassandra requires a keyspace to store the data, which is like a schema in an RDBMS, and a Cassandra table is similar to an RDBMS table.

Cassandra Query Language (CQL) is mostly similar to SQL, but query filtering can be done only on the primary key columns, in the order they were defined; for any other query, a new secondary index is required.

The first member of the primary key list in Cassandra is used as the partition key, which determines on which node a row gets stored.

The rest of the primary key is used as clustering columns, which determine the order in which data is stored within a partition.

This default ordering can be changed by specifying the clustering order at the time of creating the table.


Cassandra supports collection data types such as list, map and set.

These collections help in modelling tables to save space while storing denormalized data.

Index on collections - for map collections, an index can be created on keys, values or entries; for list and set collections, indexes are more straightforward. An indexed collection is queried using CONTAINS, CONTAINS KEY, or a condition on a specific key.


Replication Strategies

SimpleStrategy - Use for a single data center replication only. In a simple strategy, cassandra usually replicates data to the next nodes in clockwise order.

NetworkTopologyStrategy - if data replication is to be done (or planned) across data centres, this strategy should be used so that replication across data centres is ensured to overcome outages.


Write Consistency Levels - specifies the number of replicas on which the write must succeed before returning acknowledgement to the client. 

ANY - the write must be accepted by at least one node.

ALL - all replica nodes must successfully acknowledge the write - results in lower availability.

ONE, TWO, QUORUM are other main options.

Read Consistency Levels - specifies the number of replicas to be queried before returning the most recent result.

For example, a read consistency level of ALL will fail if even one replica does not respond with data, whereas a read consistency level of ONE returns data from the first queried replica that has it.

Reads support the same set of consistency levels as writes; a sample of setting the level in cqlsh is shown below.
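
For instance, in cqlsh the consistency level for subsequent statements can be set with the CONSISTENCY command (QUORUM here is just an example):

cqlsh> CONSISTENCY QUORUM;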

In Cassandra, tables should be modelled based on the queries to be executed; data is stored in a denormalized way and, if needed, duplicated across different tables to meet the query requirements. Data needs to be stored in sorted order, since ordering can't be applied arbitrarily at query time.


Materialized views

In Cassandra, the table is defined by the query requirement. 

Sometimes there would be requirement for multiple queries and secondary index may not be useful for the use case.

A secondary index is generally recommended only for columns of moderate cardinality: if every record has a unique value (very high cardinality), the secondary index will not be useful, and extremely low cardinality columns don't benefit much either.

In such scenarios, a materialized view can be used, which replicates the data from the original table with a new primary key combination that supports querying by the other column. There is roughly a 10% write overhead while using materialized views.
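
As a sketch, a materialized view over the app_instance table used in the examples below, keyed by app_name (Cassandra requires every primary key column of the view to be restricted with IS NOT NULL, and the view key must include all primary key columns of the base table):

CREATE MATERIALIZED VIEW perfmonitor.app_instance_by_name AS
    SELECT * FROM perfmonitor.app_instance
    WHERE app_name IS NOT NULL AND host_id IS NOT NULL AND proc_id IS NOT NULL
    PRIMARY KEY (app_name, host_id, proc_id);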


Committing data in cassandra

Commit logs - is an append only log. all data is first written to commit logs in the order in which it is received.

MemTables - data is also written to memtables, which act as a write-back cache: they keep the data in memory, serve reads/writes, and are periodically flushed to disk. If the in-memory data is lost due to a restart, it is replayed from the commit log. This helps achieve low latency and fast reads/writes. Rows are maintained in sorted order.

SSTables - when a memtable reaches a threshold, its data is flushed to SSTables (Sorted String Tables) on disk. SSTables are immutable; data written to them can't be modified in place.


Bloom filter

When a node in cassandra receives a query, bloom filter is used to determine which SSTable has the data. 

A Bloom filter is a data structure used to test whether an element is part of a dataset.

A Bloom filter may return a false positive (saying an element might be present when it is not), but it never returns a false negative: if it says the element is not present, it definitely is not. It is fast and requires little memory.

So this is an efficient way of narrowing down the SSTables required to be searched for data.


Tombstones

Data written in SSTables are not deleted but marked as deleted using tombstones. This is a write operation and if not synced there could be ghost data till the replica is synced. Data marked with tombstones are freed using a process called compaction.


Compaction

During compaction, new SSTables are created with the relevant data and the old SSTables eventually get purged, much like garbage collection in Java.


Creating a keyspace and table in cassandra - sample commands

https://www.datastax.com/try-it-out can be leveraged to run cassandra query language (cql) on cqlsh directly from your browser

cqlsh> CREATE KEYSPACE perfmonitor

   ... WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 1};

cqlsh> USE perfmonitor;

cqlsh:perfmonitor> CREATE TABLE app_instance (

               ...             app_id int,

               ...     app_name varchar,

               ...     proc_id varchar,

               ...     host_id varchar,

               ...     os_priority int,

               ...     cpu_time int,

               ...     num_io_ops int,

               ... PRIMARY KEY (host_id, proc_id)

               ... )

               ... WITH CLUSTERING ORDER BY (proc_id DESC);

cqlsh:perfmonitor> Describe app_instance;

cqlsh:perfmonitor> insert into app_instance

               ...     (app_id, host_id, proc_id, app_name,os_priority,cpu_time,num_io_ops)

               ... values

               ...    (1,'Host1','Proc1','App1',90,145,250);

cqlsh:perfmonitor>

cqlsh:perfmonitor> insert into app_instance

               ...     (app_id, host_id, proc_id, app_name,os_priority,cpu_time,num_io_ops)

               ... values

               ...    (2,'Host2','Proc2','App2',60,155,550);

cqlsh:perfmonitor> 

cqlsh:perfmonitor> Select * from app_instance;


 host_id | proc_id | app_id | app_name | cpu_time | num_io_ops | os_priority

---------+---------+--------+----------+----------+------------+-------------

   Host2 |   Proc2 |      2 |     App2 |      155 |        550 |          60

   Host1 |   Proc1 |      1 |     App1 |      145 |        250 |          90


Query filtering can be done only on the primary key columns, in the same order; for any other query, a new secondary index is required:

cqlsh:perfmonitor> Select * from app_instance where host_id = 'Host1';


 host_id | proc_id | app_id | app_name | cpu_time | num_io_ops | os_priority

---------+---------+--------+----------+----------+------------+-------------

   Host1 |   Proc1 |      1 |     App1 |      145 |        250 |          90


(1 rows)

cqlsh:perfmonitor> Select * from app_instance where host_id = 'Host1' and proc_id = 'Proc1';


 host_id | proc_id | app_id | app_name | cpu_time | num_io_ops | os_priority

---------+---------+--------+----------+----------+------------+-------------

   Host1 |   Proc1 |      1 |     App1 |      145 |        250 |          90


(1 rows)

cqlsh:perfmonitor> Select * from app_instance where proc_id = 'Proc1';

InvalidRequest: Error from server: code=2200 .............

cqlsh:perfmonitor> select * from app_instance where app_name ='App1';

InvalidRequest: Error from server: code=2200 .............

cqlsh:perfmonitor> Create index appname_idx on app_instance(app_name);

cqlsh:perfmonitor> select * from app_instance where app_name ='App1';


 host_id | proc_id | app_id | app_name | cpu_time | num_io_ops | os_priority

---------+---------+--------+----------+----------+------------+-------------

   Host1 |   Proc1 |      1 |     App1 |      145 |        250 |          90


(1 rows)

cqlsh:perfmonitor>



Create table samples using collections

cqlsh:perfmonitor> CREATE TABLE devices (

                    id uuid,                         

                    device_name text,

                    ip_address  set<text>,

                    location map<text, text>,

                    installation_date date,

                    installation_year int,

                    manufacturer text,

                    serial_number text,

                    PRIMARY KEY (id));

cqlsh:perfmonitor> Insert into devices

                  (id, device_name, ip_address, location, installation_date, installation_year, manufacturer, serial_number)

               Values

                (uuid(), 'Server3', {'192.168.0.4'}, {'data center':'DC2', 'rack':'Rack3'}, '2016-12-10', 2016, 'Acme', 'SN7891852');

cqlsh:perfmonitor> Insert into devices  

                (id, device_name, ip_address, location, installation_date, installation_year, manufacturer, serial_number) 

                Values 

                (uuid(), 'Server1', {'192.168.0.1'}, {'data center':'DC1', 'rack':'Rack1'}, '2015-01-20', 2015, 'Acme', 'SN12345');

cqlsh:perfmonitor> Insert into devices  

                (id, device_name, ip_address, location, installation_date, installation_year, manufacturer, serial_number) 

                Values 

                (uuid(), 'Server2', {'192.168.0.2','192.168.0.3'}, {'data center':'DC1', 'rack':'Rack1'}, '2016-02-10', 2016, 'Acme', 'SN32415746');

cqlsh:perfmonitor> select * from devices;


 id                                   | device_name | installation_date | installation_year | ip_address                     | location           | manufacturer | serial_number

--------------------------------------+-------------+-------------------+-------------------+--------------------------------+-----------------------------------------+--------------+---------------

 cfd4f71b-fc8c-4566-9e74-7c475969d284 |     Server2 |        2016-02-10 |              2016 | {'192.168.0.2', '192.168.0.3'} | {'data center': 'DC1', 'rack': 'Rack1'} |         Acme |    SN32415746

 1b63f842-3f9e-4223-85b9-5dd4c8e9b72e |     Server3 |        2016-12-10 |              2016 |                {'192.168.0.4'} | {'data center': 'DC2', 'rack': 'Rack3'} |         Acme |     SN7891852

 45b9c6c0-9789-44c3-adbc-61c42f1a146e |     Server1 |        2015-01-20 |              2015 |                {'192.168.0.1'} | {'data center': 'DC1', 'rack': 'Rack1'} |         Acme |       SN12345


(3 rows)

cqlsh:perfmonitor> create index testIndex on devices(location);

cqlsh:perfmonitor> select * from devices where location contains 'DC1';


 id                                   | device_name | installation_date | installation_year | ip_address                     | location           | manufacturer | serial_number

--------------------------------------+-------------+-------------------+-------------------+--------------------------------+-----------------------------------------+--------------+---------------

 cfd4f71b-fc8c-4566-9e74-7c475969d284 |     Server2 |        2016-02-10 |              2016 | {'192.168.0.2', '192.168.0.3'} | {'data center': 'DC1', 'rack': 'Rack1'} |         Acme |    SN32415746

 45b9c6c0-9789-44c3-adbc-61c42f1a146e |     Server1 |        2015-01-20 |              2015 |                {'192.168.0.1'} | {'data center': 'DC1', 'rack': 'Rack1'} |         Acme |       SN12345


(2 rows)



Friday 2 October 2020

Redis Basics

 Redis is an open source, in-memory data structure store, used as a database, cache and message broker. 

It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams.


Redis persistence - Data is saved in memory for fast access.

Persistence can be managed by the below options:

RDB - Redis Database File - keeps snapshots of Redis state - helps in disaster recovery - default

AOF - Append Only File - logs every write operation by appending it to a file

Redis conf file can be updated for enabling AOF or updating snapshot frequency with RDB

A password for accessing Redis can be enabled in redis.conf using the requirepass directive; a sample of these settings is shown below.
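
A minimal redis.conf sketch for these options (the values are only examples):

# take an RDB snapshot if at least 1 key changed in the last 900 seconds
save 900 1
# enable AOF persistence and fsync the AOF file once per second
appendonly yes
appendfsync everysec
# require a password for client connections
requirepass s3cret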


Redis Architecture

Redis follows master-slave replication architecture. 

It allows replica Redis instances to be exact copies of master instances. 

The replica will automatically reconnect to the master every time the link breaks, and will attempt to be an exact copy of it regardless of what happens to the master.

Redis uses asynchronous replication by default, which gives low latency and high performance.

Redis Cluster - has a set of masters and slaves/replicas. If a master node crashes, a replica takes over.

Redis Sentinel - if it is not possible to run a fully fledged Redis Cluster, Redis Sentinel can be used for automatic failover to the replicas.

A Redis Cluster's key space is divided among 16,384 hash slots, which is also the theoretical maximum number of master nodes (shards) in a cluster.

Since most clusters consist of a much smaller number of nodes, these hash slots are logical divisions of the keys. 

In an oversimplified example of a 4-node cluster, we’d have the following layout:

------------------------------------
Slots                Node
------------------------------------
0 – 4,095            Node #0
4,096 – 8,191        Node #1
8,192 – 12,287       Node #2
12,288 – 16,383      Node #3
------------------------------------

The data gets distributed according to the hash slots. If more nodes are added, the hash slots will need to be rebalanced.
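
You can check which hash slot a given key maps to with the CLUSTER KEYSLOT command; it returns the slot number (0 – 16,383) whose owning node will serve that key:

127.0.0.1:6379> cluster keyslot user:1000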

Redis cluster is not able to guarantee strong consistency. 

That’s because when you send a write request to a Redis cluster, the master writes the data first on itself and immediately returns success to the client. Then, the replication to the slave nodes asynchronously starts. 

What happens if the master node crashes before the data gets replicated and a slave node is promoted to be the new master? 

Basically, the client will receive success but the write was actually lost.


Basic Redis features with sample usage using redis-cli

Redis string key/value - can be set/retrieved using simple set/get of key-value, expiry can also be specified:

127.0.0.1:6379> set zip 432985792 ex 10

OK

127.0.0.1:6379> get zip

"432985792"

127.0.0.1:6379> get zip

(nil)

127.0.0.1:6379> exists zip

(integer) 0


127.0.0.1:6379> set counter 100

OK

127.0.0.1:6379> incr counter

(integer) 101

127.0.0.1:6379> incrby counter 50

(integer) 151

127.0.0.1:6379> decr counter

(integer) 150

127.0.0.1:6379> expire counter 5000

(integer) 1


Redis Hash data structure

Keeps data as field-value pairs, like an individual HashMap in Java.

can be used for storing data equivalent to what we store in an RDBMS record, but is really flexible.

It can be used for storing denormalized data.

127.0.0.1:6379> hmset user:1000 username antirez birthyear 1977 verified 1

OK

127.0.0.1:6379> hgetall user:1000

1) "username"

2) "antirez"

3) "birthyear"

4) "1977"

5) "verified"

6) "1"

127.0.0.1:6379> hget user:1000 username

"antirez"

127.0.0.1:6379> hget user:1000 birthyear

"1977"

127.0.0.1:6379> hexists user:1000 firstname

(integer) 0

127.0.0.1:6379> hexists user:1000 username

(integer) 1

127.0.0.1:6379> hmset user:1001 username prasune birthyear 1984 verified 1

OK


Redis List

can be used for storing items in the order of insertion, there is support for insertion at the beginning as well as the end.

LPUSH can be used to add new items at the head of the list, for example timeline entries, tweets or messages kept in an ordered way.

127.0.0.1:6379> rpush chat_messages hi "how are you?"

(integer) 2

127.0.0.1:6379> lrange chat_messages 0 -1

1) "hi"

2) "how are you?"

127.0.0.1:6379> lpush chat_messages "before hi"

(integer) 3

127.0.0.1:6379> lrange chat_messages 0 -1

1) "before hi"

2) "hi"

3) "how are you?"

127.0.0.1:6379> rpush chat_messages "can we meet today?"

(integer) 4

127.0.0.1:6379> lrange chat_messages 0 -1

1) "before hi"

2) "hi"

3) "how are you?"

4) "can we meet today?"

127.0.0.1:6379> lpop chat_messages

"before hi"

127.0.0.1:6379> lrange chat_messages 0 -1

1) "hi"

2) "how are you?"

3) "can we meet today?"


Redis Set

sinter can be used for scenarios like finding all online users who are friends of a particular user -

by maintaining set of all online users and another set of all friends of the particular user.


127.0.0.1:6379> sadd set1 1 2 3

(integer) 3

127.0.0.1:6379> sadd set2 3 4 5

(integer) 3

127.0.0.1:6379> sinter set1 set2

1) "3"

127.0.0.1:6379> smembers set1

1) "3"

2) "1"

3) "2"


Redis Sorted Set

can be used for scenarios where we want to keep score for an exam and display toppers as well as rank of a particular user.


127.0.0.1:6379> zadd score 1544 prasune

(integer) 1

127.0.0.1:6379> zadd score 555 dfmo

(integer) 1

127.0.0.1:6379> zadd score 123 xyz

(integer) 1


127.0.0.1:6379> zrevrange score 0 -1

1) "prasune"

2) "dfmo"

3) "xyz"


127.0.0.1:6379> zrange score 0 -1

1) "xyz"

2) "dfmo"

3) "prasune"


127.0.0.1:6379> zscore score prasune

"1544"

127.0.0.1:6379> zrank score prasune

(integer) 2

127.0.0.1:6379> zrevrank score prasune

(integer) 0


127.0.0.1:6379> zpopmax score

1) "prasune"

2) "1544"


Redis publish/subscribe

Redis also has a publish/subscribe mechanism, using a channel (topic) name.

Useful for chat-application-like scenarios, to publish a message to all the users that have subscribed to a channel.

127.0.0.1:6379> publish news dfoibgnvomdobe 

(integer) 0

127.0.0.1:6379> publish news dfoibgnvomdobe 

(integer) 1

127.0.0.1:6379> publish news testing

(integer) 1


127.0.0.1:6379> subscribe news

Reading messages... (press Ctrl-C to quit)

1) "subscribe"

2) "news"

3) (integer) 1

1) "message"

2) "news"

3) "dfoibgnvomdobe"

1) "message"

2) "news"

3) "testing"


Handling geospatial indexes with radius queries

This is useful for use cases like finding a nearby cab for a customer in a taxi aggregator app.

geoadd uberdb 13.138 15.3583 cab1 14.4363 14.43766 cab2

(integer) 2

127.0.0.1:6379> georadius uberdb 14 15 200 km withdist

1) 1) "cab1"

   2) "100.7491"

2) 1) "cab2"

   2) "78.1989"

127.0.0.1:6379> georadius uberdb 14 15 200 km withcoord

1) 1) "cab1"

   2) 1) "13.13799887895584106"

      2) "15.35830007022889276"

2) 1) "cab2"

   2) 1) "14.43630069494247437"

      2) "14.43765892850307608"