Solutions to Support Real-Time Data Analytics
Data Analysis Resources, Spark

Organisations embracing big data use non-traditional strategies and technologies to gather, organise and process large datasets and to derive insights from them. However, these solutions do not support real-time analytics. Real-time analytics requires technology that can handle data generated at high velocity and sent simultaneously by many sources as small records. Such data must be processed sequentially and incrementally on a record-by-record…
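Record-by-record, incremental processing can be illustrated without a cluster. The sketch below is plain Python, not Spark's streaming API: each incoming record updates a per-key sliding-window average in constant time, so nothing is recomputed over the full history. The `(sensor_id, reading)` record shape, the class and function names and the window size are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple


class SlidingWindowAverage:
    """Average of the most recent `size` values, updated one record at a time."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.window: Deque[float] = deque()
        self.total = 0.0

    def update(self, value: float) -> float:
        # Add the new value; evict the oldest one once the window is full.
        # Each update is O(1), independent of how many records came before.
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)


def process_stream(
    records: Iterable[Tuple[str, float]], size: int
) -> Iterator[Tuple[str, float]]:
    """Consume hypothetical (sensor_id, reading) records sequentially."""
    averages: Dict[str, SlidingWindowAverage] = {}
    for sensor_id, reading in records:
        if sensor_id not in averages:
            averages[sensor_id] = SlidingWindowAverage(size)
        # Emit an updated result immediately instead of waiting for a batch.
        yield sensor_id, averages[sensor_id].update(reading)


if __name__ == "__main__":
    stream = [("a", 10.0), ("a", 20.0), ("b", 5.0), ("a", 30.0), ("a", 40.0)]
    for sensor_id, avg in process_stream(stream, size=3):
        print(sensor_id, avg)
```

Because `process_stream` is a generator, it works the same way on an unbounded source (for example, an iterator over a message queue) as on the finite list used here.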

Text Analytics in the Healthcare Industry: Data Warehousing and Applications
Data Analysis Resources, Predictive Analysis

Abstract: Text analytics is the process of extracting information from text. It involves structuring the text so that patterns can be discovered, evaluated and interpreted. It adds meaning to data and finds nuggets of information in both transaction-based and decision-support systems by removing the barrier between structured and unstructured data. Analysis of text data helps to discover new relationships from…
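The "structuring the text" step can be shown in miniature: free-text notes become rows of term counts that could be loaded into a table and aggregated. This is a toy sketch, not a clinical NLP pipeline; the notes, the stop-word list and the helper names are hypothetical, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Deliberately simple: lower-case, keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def to_term_counts(docs: Dict[str, str], stop_words: Set[str]) -> Dict[str, Counter]:
    # Each unstructured document becomes a structured row of term -> count.
    return {
        doc_id: Counter(tok for tok in tokenize(text) if tok not in stop_words)
        for doc_id, text in docs.items()
    }


def top_terms(rows: Dict[str, Counter], n: int) -> List[Tuple[str, int]]:
    # Aggregate across documents to surface the most frequent terms.
    total: Counter = Counter()
    for counts in rows.values():
        total.update(counts)
    return total.most_common(n)


if __name__ == "__main__":
    notes = {  # hypothetical clinical notes
        "note1": "Patient reports chest pain and shortness of breath.",
        "note2": "No chest pain. Patient denies fever.",
        "note3": "Fever and cough; chest X-ray ordered.",
    }
    rows = to_term_counts(notes, stop_words={"and", "of", "no"})
    print(rows["note2"])
    print(top_terms(rows, 1))
```

Counting terms ignores context (for example, "denies fever" still counts "fever"), which is one reason real text analytics adds richer processing on top of this structuring step.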

Feature Selection for Determining House Prices?
Kaggle, Predictive Analysis, scikit-learn

Home values are influenced by many factors, which fall into two broad groups: environmental information, including location, local economy, school district and air quality; and characteristics of the property itself, such as lot size, house size and age, number of rooms, heating/AC systems and garage. When people consider buying a home, the location usually has…
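The excerpt does not show which selection method the post uses. One simple filter approach, sketched here as an assumption, is to rank candidate features by the absolute Pearson correlation between each feature and the sale price. The feature names and values below are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Root of the summed squared deviations; the 1/n factors cancel in r.
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def rank_features(
    features: Dict[str, List[float]], target: List[float]
) -> List[Tuple[str, float]]:
    # Score each feature by |r| with the target and sort, strongest first.
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    price = [200.0, 250.0, 300.0, 350.0, 400.0]  # made-up prices (thousands)
    features = {
        "house_size": [1000.0, 1250.0, 1500.0, 1750.0, 2000.0],
        "age": [30.0, 20.0, 25.0, 10.0, 15.0],
        "air_quality": [3.0, 1.0, 4.0, 1.0, 5.0],
    }
    for name, score in rank_features(features, price):
        print(f"{name}: {score:.3f}")
```

Correlation only captures linear, one-feature-at-a-time relationships, so a ranking like this is a starting point rather than a final feature set.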

Submission for Kaggle's Titanic Competition
Kaggle, Machine Learning, Predictive Analysis

The following is my submission for Kaggle's Titanic competition.

import pandas as pd
import numpy as np

# Load the training data (local path on the author's machine)
df_train = pd.read_csv(r'C:\Users\piush\Desktop\Dataset\Titanic\train.csv')
df_train.head(2)

   PassengerId  Survived  Pclass  Name                     Sex   Age   SibSp  Parch  Ticket     Fare    Cabin  Embarked
0            1         0       3  Braund, Mr. Owen Harris  male  22.0      1      0  A/5 21171  7.2500  NaN    S
1            2         1       1  Cumings, Mrs. John…
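The excerpt stops before the model, but a Titanic submission always ends the same way: a CSV file with a `PassengerId,Survived` header and one 0/1 prediction per test passenger. The sketch below writes that file with the standard `csv` module; the `write_submission` helper and the IDs and predictions in the example are hypothetical, not taken from the post.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_submission(predictions: Iterable[Tuple[int, int]], out: TextIO) -> None:
    """Write (PassengerId, Survived) pairs as a two-column CSV with a header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])  # header row
    for passenger_id, survived in predictions:
        # Survived must be a hard 0/1 label, not a probability.
        if survived not in (0, 1):
            raise ValueError(f"Survived must be 0 or 1, got {survived!r}")
        writer.writerow([passenger_id, survived])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_submission([(892, 0), (893, 1)], buffer)  # hypothetical predictions
    print(buffer.getvalue())
```

With pandas, the equivalent is a DataFrame holding those two columns written with `to_csv('submission.csv', index=False)`, so that the index is not exported as an extra column.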

Submission for Predicting Red Hat Business Value
Kaggle, Predictive Analysis

In this competition, the task is to build a classification algorithm that accurately identifies which customers have the most potential business value for Red Hat, based on their characteristics and activities. For more information, please visit: https://www.kaggle.com/c/predicting-red-hat-business-value

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline  # Jupyter magic: render plots inside the notebook
import warnings
warnings.filterwarnings('ignore')  # silence library warnings in the output

Loading the Dataset
…
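Because the goal is to rank customers by how likely they are to have business value, a ranking metric such as the area under the ROC curve (AUC) is a natural way to judge a classifier here; the competition page describes its official evaluation metric. The hypothetical `roc_auc` function below computes AUC from its probabilistic meaning: the chance that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for a sketch, slow for large data
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

For a dataset of realistic size, `sklearn.metrics.roc_auc_score` computes the same quantity without the pairwise loop.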

Predictive Analysis, Binary Classification (Cookbook) – 7
Data Analysis Resources, Machine Learning, Predictive Analysis

This notebook contains my notes on predictive analysis for binary classification and is meant to be used as a cookbook. It continues the previous post on assessing the performance of predictive models. For deployment: retrain the model on the full data set and pull out the coefficients corresponding to the best alpha, that is, the one determined to minimize out-of-sample error, which is estimated in…
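The deployment step can be sketched end to end: estimate out-of-sample error for each candidate alpha with k-fold cross-validation, pick the alpha with the lowest error, then refit on the full data set with that alpha. To stay self-contained, the sketch uses one-feature ridge regression without an intercept, whose coefficient has the closed form β = Σxy / (Σx² + α). This is a stand-in, not necessarily the penalized model used in the notes, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def ridge_coef(x: Sequence[float], y: Sequence[float], alpha: float) -> float:
    # One-feature ridge with no intercept (assumes centred data):
    # beta = sum(x*y) / (sum(x*x) + alpha)
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + alpha)


def cv_error(x: Sequence[float], y: Sequence[float], alpha: float, k: int) -> float:
    # k-fold estimate of out-of-sample mean squared error.
    n = len(x)
    squared_error = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th example is held out
        train = [i for i in range(n) if i not in held_out]
        beta = ridge_coef([x[i] for i in train], [y[i] for i in train], alpha)
        squared_error += sum((y[i] - beta * x[i]) ** 2 for i in held_out)
    return squared_error / n


def fit_for_deployment(
    x: Sequence[float], y: Sequence[float], alphas: List[float], k: int = 5
) -> Tuple[float, float, Dict[float, float]]:
    errors = {alpha: cv_error(x, y, alpha, k) for alpha in alphas}
    best_alpha = min(errors, key=errors.get)  # lowest out-of-sample error
    # Retrain on the full data set; this coefficient is what gets deployed.
    return best_alpha, ridge_coef(x, y, best_alpha), errors


if __name__ == "__main__":
    x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]  # made-up, centred feature
    y = [-5.1, -2.9, -1.2, 0.8, 3.1, 4.9]  # made-up target
    best, beta, errors = fit_for_deployment(x, y, [0.0, 0.1, 1.0, 10.0], k=3)
    print(best, beta, errors)
```

Cross-validation is used only to choose alpha; the final coefficient comes from the refit on all the data, because that fit uses every available example.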

Predictive Analysis, Binary Classification (Cookbook) – 6
Data Analysis Resources, Machine Learning, Predictive Analysis

This notebook contains my notes on predictive analysis for binary classification and is meant to be used as a cookbook. It continues the previous post on Pearson's correlation and discusses how to assess the performance of predictive models. One of the most commonly used measures is the misclassification error, that is, the fraction of examples that the function pred() predicts incorrectly.

Reading and Arranging Data
…
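Misclassification error is the number of wrong predictions divided by the number of examples. The sketch below assumes the model outputs a real-valued score that is turned into a 0/1 label by a threshold; `predict_label` is a hypothetical stand-in for the `pred()` function mentioned in the notes, whose definition is not shown here.

```python
from typing import Sequence


def predict_label(score: float, threshold: float = 0.5) -> int:
    # Hypothetical stand-in for pred(): label 1 when score > threshold.
    return 1 if score > threshold else 0


def misclassification_error(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> float:
    """Fraction of examples whose predicted label differs from the true label."""
    if len(labels) != len(scores) or not labels:
        raise ValueError("labels and scores must be non-empty and equal in length")
    wrong = sum(
        1 for y, s in zip(labels, scores) if predict_label(s, threshold) != y
    )
    return wrong / len(labels)


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 1]
    scores = [0.9, 0.2, 0.4, 0.7, 0.6]  # made-up model outputs
    for t in (0.3, 0.5):
        print(t, misclassification_error(labels, scores, t))
```

The error depends on the threshold: on the made-up data above, lowering it from 0.5 to 0.3 cuts the error from 0.4 to 0.2, so it is worth checking more than one threshold before comparing models.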
