Udacity Data Engineering Capstone Project

Project Summary The project follows the follow steps: Step 1: Scope the Project and Gather Data Step 2: Explore and Assess the Data Step 3: Define the Data Model Step 4: Run ETL to Model the Data Step 5: Complete Project Write Up Step 1: Scope the Project and Gather Data Scope The project is […]

A Comparison between Cassandra and MySQL

Introduction Cassandra is a distributed, no single point of failure, continuously available and scalable. NoSQL database that manages a large amount of data across many data centres and cloud servers. It offers both operation simplicity and capacity to scale linearly. While MySQL is the world’s most popular, cost-effective, high-performance relational database(Kumar, 2016). It comes with a […]

Effect on Facebook’s Stock Price after Cambridge Analytica Scandal

Recently, Facebook is in news almost daily for its inability to prevent Cambridge Analytica(CA) from gathering personal data from 87 million users. CA used the harvested data to profile users to predict voting patterns in 2016 US presidential elections. Some of the important dates for the event are: March 16, 2018: The news broke of […]

Text Analytics in the Healthcare Industry: Data Warehousing and Applications

Abstract— Text analytics is the method of extracting information from text. It involves structuring the text to evaluate, discover patterns and interpret the output. It enhances meaning to data and finds nuggets of information from both transaction-based and decision support systems by removing the barrier between structured and unstructured data. Analysis of text data helps […]