article | 28 Jan 2022
10 Apache Cassandra use cases in 5 Big Data directions
Dima Pleczko
Thanks to its speed, reliability and other advantages, Apache Cassandra, a distributed NoSQL DBMS, is widely used in Big Data projects around the world. In this article, we have collected some interesting examples of Cassandra's real-world use in 5 key areas of modern IT.



Production solutions based on Cassandra are deployed at Cisco, IBM, Cloudkick, Reddit, Digg, Rackspace, Twitter and a host of other big data companies. For example, Expedia, a major U.S. travel company, uses Cassandra to store billions of constantly updated prices from 140,000 hotels. Apple has more than 100,000 Cassandra nodes in production, which illustrates how well this DBMS scales. Another data-driven organization, the international ride-hailing company Uber, runs Cassandra in several data centers as the data store for its rides.

Having analyzed the information on the use of Apache Cassandra in real Big Data projects, we identified 5 main areas of the practical application of this distributed DBMS:

  • Product catalogs in online stores and playlists. In particular, Apache Cassandra is used by Spotify, a well-known audio streaming service that lets you listen legally and free of charge to more than 50 million music tracks, audiobooks, and podcasts without downloading them to your device. For example, this DBMS backs a service that stores key-value pairs under a two-part key and consists of two components:
    - A real-time pipeline that records data every time a song is streamed on Spotify. Each record is keyed by an anonymous user ID and the name of the feature being recorded.
    - A client that periodically reads all features for an anonymous user ID in a single batch.

In this case, the SLA (Service Level Agreement) requires a very low average response latency (less than 5 milliseconds) and the highest possible number of operations per second, even at peak load. Cassandra's high read and write throughput allowed the Spotify team to implement such a service.
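The two-part key described above can be illustrated with a small in-memory sketch. It is not Spotify's implementation and does not talk to Cassandra; the class name `FeatureStore` and the sample IDs and feature names are hypothetical. It only shows the access pattern: writes address one (user ID, feature name) pair, while reads fetch everything stored under one user ID at once.

```python
from collections import defaultdict


class FeatureStore:
    """In-memory stand-in for a table keyed by (user_id, feature_name).

    Hypothetical illustration only: the outer key plays the role of a
    partition key, the inner key the role of a clustering column.
    """

    def __init__(self):
        self._rows = defaultdict(dict)

    def record(self, user_id, feature_name, value):
        # Write path: upsert one feature for one anonymous user.
        self._rows[user_id][feature_name] = value

    def read_all(self, user_id):
        # Read path: return every feature of a user, sorted by feature name.
        return dict(sorted(self._rows.get(user_id, {}).items()))


store = FeatureStore()
store.record("anon-42", "last_track", "track-123")
store.record("anon-42", "streams_today", 7)
store.record("anon-42", "streams_today", 8)  # a later write overwrites the value
features = store.read_all("anon-42")
```

Grouping all features of a user under the first key part is what makes the periodic batch read a single lookup rather than one lookup per feature.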

Similarly, Apache Cassandra is used by Netflix and Expedia to store user viewing data and support the streaming API.

  • Recommendation systems and personalization of marketing offers. Cassandra helps track user actions by storing data on what content (movies, games, articles, or songs) the consumer interacted with and how much time they spent on each action. This information can then be fed from Cassandra into an analytics tool that recommends something similar to the customer. For example, the event management platform Eventbrite uses Cassandra instead of MySQL for its mobile apps that let users know what events are happening around them. Outbrain, a web advertising platform that displays links to website pages alongside sponsored content and earns revenue from the latter, uses Cassandra to support content discovery, helping companies increase revenue by surfacing relevant third-party articles that may interest users [4]. The already mentioned streaming service Spotify has also built its recommendation system on Apache Cassandra.

  • Internet of Things (IoT), including the Industrial Internet of Things. Due to its architecture, Apache Cassandra is designed for intensive workloads and fast writes of large amounts of data. These qualities make this DBMS very useful for IoT sensors and other smart devices in various industries, from logistics to agriculture. Regardless of sensor type, Cassandra quickly and reliably ingests the incoming data stream, leaving it available for subsequent analysis with other Big Data tools. In particular, NREL (National Renewable Energy Laboratory), the main US research laboratory for renewable energy and energy efficiency, uses Cassandra to store data from its smart sensors and analyze it for water and energy conservation. And i2O Water Ltd, which designs and installs intelligent pressure management solutions for water utilities, has used Cassandra to create a product that helps its customers save more than 235 million liters of water a day that would otherwise be wasted. That is roughly the volume of 94 Olympic swimming pools (at about 2.5 million liters each).

  • Messaging systems (chat rooms, collaboration apps, mobile messengers, etc.) are a great use case for Cassandra, as is data from IoT sensors, since these workloads rarely update existing records and instead need fast writes and quick reads. Cassandra writes new incoming messages at high speed, lets you read them quickly, and removes obsolete ones through deletion markers (tombstones) and compaction, which we covered in a separate article.

  • Fraud detection. Despite limited ACID transaction support (remember that, starting from version 1.1, Cassandra provides ACID-style guarantees only at the level of a single record, i.e. for a set of columns sharing one key) [8], banks can use this non-relational DBMS within their antifraud systems to detect and prevent fraudulent transactions in time. This is possible thanks to Cassandra's high speed and to real-time analytics through integration with Big Data tools such as Apache Spark and its MLlib library of machine learning algorithms [6]. Also worth noting in the context of security is the experience of Internet Identity, which has used Cassandra to protect its customers' data [4]. Cassandra can also be used to detect spam in social networks by spotting messages with the same content in real time. This makes it possible to analyze spammers' activity, identifying the patterns of unwanted posts and how often they appear. In particular, Orange, a French company that is one of the world's leading telecommunications and mobile operators as well as an Internet service provider, uses Cassandra together with the already mentioned Apache Spark MLlib to detect fraudulent transactions in real time.
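The spam-detection idea mentioned above, spotting messages with the same content, can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any production system: the function names, the whitespace and case normalization, and the repeat threshold are all hypothetical choices.

```python
import hashlib
from collections import Counter


def fingerprint(text):
    # Normalize case and whitespace so trivially altered copies hash alike.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_repeated(messages, threshold=3):
    # Count how often each fingerprint occurs, then keep the messages
    # whose content appears at least `threshold` times.
    counts = Counter(fingerprint(m) for m in messages)
    return [m for m in messages if counts[fingerprint(m)] >= threshold]


messages = ["Buy now!!", "buy   NOW!!", "hello", "BUY now!!"]
flagged = find_repeated(messages, threshold=3)
```

In a real deployment the fingerprint would likely serve as a lookup key, so that counting repeats of a given message is a cheap read keyed by its hash.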
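For the messaging use case in the list above, the write-once, read-recent, expire-old pattern can be sketched in memory. The class `ChatLog`, its methods, and the time-to-live value are hypothetical; the sketch only mimics the behavior (append-only writes, time-ordered reads, periodic removal of expired messages) and is not Cassandra code.

```python
import bisect


class ChatLog:
    """Append-only, time-ordered message log with expiry (illustrative only)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._messages = {}  # room -> list of (timestamp, text), kept sorted

    def append(self, room, timestamp, text):
        # Messages are only ever added, never updated in place.
        bisect.insort(self._messages.setdefault(room, []), (timestamp, text))

    def latest(self, room, now, limit=10):
        # Expired messages are hidden from reads even before cleanup runs.
        live = [m for m in self._messages.get(room, []) if now - m[0] < self.ttl]
        return live[-limit:]

    def compact(self, now):
        # Physically drop expired messages; returns how many were removed.
        removed = 0
        for room in list(self._messages):
            kept = [m for m in self._messages[room] if now - m[0] < self.ttl]
            removed += len(self._messages[room]) - len(kept)
            self._messages[room] = kept
        return removed


log = ChatLog(ttl_seconds=60)
log.append("general", 0, "hello")
log.append("general", 50, "still here")
log.append("general", 30, "late arrival")
recent = log.latest("general", now=70)
removed = log.compact(now=70)
```

Separating "hidden from reads" from "physically removed" loosely mirrors the split between marking data as obsolete and reclaiming its space later.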





Another interesting case of Cassandra's practical use in a large-scale Big Data project is a healthcare analytics system built on top of it. Such a system has to process streaming data in real time while avoiding data inconsistency in a distributed computing environment. It works with multiple sources of information: drug databases, electronic medical records, anonymized case histories, test results, etc.

The data arrive in large volumes and at high speed, including information from special equipment (sensors and monitors). Such a distributed system requires a Hadoop environment with a non-relational DBMS to serve as the document or key-value store. For real-time stream processing it is reasonable to use Apache Storm; however, it does not persist application state itself. So an external persistent store is needed, a role that Apache Cassandra can also fill, alongside storing and quickly processing the medical data.

Thanks to its decentralized architecture with no single point of failure, Cassandra can keep data safe even if several nodes or an entire data center fail (assuming the data is replicated across multiple data centers). This property is especially important in a critical industry such as medicine.
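How many node failures replicated data can tolerate depends on the replication factor and on how many replicas must acknowledge each operation. As a rough sketch, assuming quorum-style reads and writes (a majority of replicas, computed as half the replication factor rounded down, plus one), the arithmetic looks like this. The function names are hypothetical and the replication factors are example values, not figures from the project described above.

```python
def quorum(replication_factor):
    # A majority of replicas: floor(RF / 2) + 1.
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    # Replicas that may be down while a quorum can still respond.
    return replication_factor - quorum(replication_factor)


rf3 = (quorum(3), tolerated_failures(3))
rf5 = (quorum(5), tolerated_failures(5))
```

With three replicas, for example, two must respond, so one replica can be unavailable without interrupting quorum operations.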
