Introduction to Cassandra
- Massively scalable
- Linearly scalable: if 2 nodes handle x traffic, then 4 nodes handle roughly 2x.
- NoSQL Database
- Failed nodes can be replaced with no downtime.
- Integrates with Hadoop and has MapReduce support. Also supports Apache Pig and Hive.
- Can store structured and unstructured data alike.
- Masterless architecture: all nodes play the same role, so there is no single point of failure.
- Data is automatically distributed among all nodes in the ring.
- Replication factor is configurable.
- Replication can also be configured to span multiple data centers.
- When a failed node is replaced or removed, its data is streamed from the surviving replicas to restore the replication factor.
- Cassandra emphasizes denormalization.
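The ring distribution and replication factor described above can be illustrated with a small sketch. This is not Cassandra's implementation: the `Ring` class, the `token` helper, and the node names are hypothetical, each node is assumed to own a single token, MD5 stands in for the real partitioner hash, and replicas are simply taken as the next nodes clockwise on the ring.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is a stand-in hash here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical token ring: one token per node, replicas chosen clockwise."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # Sort nodes by token so the list order matches their order on the ring.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, row_key: str):
        """Return the nodes that would hold copies of the given row key."""
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self._tokens, token(row_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("user:42")
print(owners)
```

Because every node can compute the same placement from the key alone, no master is needed to route a request. With a replication factor of 3, any single node in this sketch can be lost while two copies of each row remain.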
Language Features (CQL 3.0)
- CQL drivers are available for Java, Python, and Node.js, among other languages.
- Does not support joins or subqueries, except for batch analysis through Hive.
- Supports lightweight transactions through the IF keyword in INSERT and UPDATE statements (for example, INSERT ... IF NOT EXISTS).
- Initial support for triggers.
- Cassandra is essentially a hybrid between a key-value and a column-oriented (or tabular) database.
- A Cassandra column family resembles a table in an RDBMS.
- Column families contain rows and columns.
- Each row has multiple columns, each of which has a name, value, and a timestamp.
- Different rows in the same column family do not have to share the same set of columns.
- Columns may be added to one or multiple rows at any time.
- Each row key identifies exactly one row and its columns, much like a primary key.
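The column-family model described above can be pictured as a map from row key to a map of named columns. The following is a minimal in-memory sketch, not a Cassandra API: the `Column` and `ColumnFamily` classes and the sample data are hypothetical. It assumes that a write with a newer timestamp replaces an older one, and that on a tie the existing value is kept (a simplification made for this sketch).

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Column:
    """A single cell: name, value, and the write timestamp."""
    name: str
    value: Any
    timestamp: int


class ColumnFamily:
    """Hypothetical model: row key -> {column name -> Column}."""

    def __init__(self, name: str):
        self.name = name
        self.rows: Dict[str, Dict[str, Column]] = {}

    def insert(self, row_key: str, column_name: str, value: Any, timestamp: int) -> None:
        """Write one column; only a strictly newer timestamp overwrites."""
        row = self.rows.setdefault(row_key, {})
        current = row.get(column_name)
        if current is None or timestamp > current.timestamp:
            row[column_name] = Column(column_name, value, timestamp)

    def get(self, row_key: str) -> Dict[str, Any]:
        """Return the row as {column name: value}; empty if the key is unknown."""
        return {name: col.value for name, col in self.rows.get(row_key, {}).items()}


users = ColumnFamily("users")
users.insert("alice", "email", "alice@example.com", timestamp=1)
users.insert("alice", "city", "Paris", timestamp=1)
users.insert("bob", "email", "bob@example.com", timestamp=1)           # bob has no "city" column
users.insert("alice", "email", "alice@new.example.com", timestamp=2)   # newer write wins
users.insert("alice", "email", "stale@example.com", timestamp=1)       # older write is ignored

print(users.get("alice"))
print(users.get("bob"))
```

The two rows live in the same column family but carry different sets of columns, and a column can be added to a single row at any time without touching the others.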