Saturday, 30 March 2013

IT BAL - Session # 10 - Plotting in R

PLOTTING IN R

Assignment 1:

Create three vectors x, y and z of equal length, fill them with random values, and bind them together. Then create three-dimensional plots of the result.

Data Set Creation Commands and DataSet :
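The creation commands themselves are not reproduced in this post. A minimal sketch of how such a data set could be built is given here, assuming 1000 normally distributed values per vector (the length is chosen to match the rainbow(1000) calls used for the plots; the distribution and the seed are assumptions). The name T is taken from the plotting commands in this post.

```r
# Assumed setup: 1000 random values per vector; the seed is arbitrary.
set.seed(42)
n <- 1000
x <- rnorm(n)
y <- rnorm(n)
z <- rnorm(n)

# Bind the three equal-length vectors column-wise into one matrix.
# Note: T is also R's shorthand for TRUE, so this name masks it.
T <- cbind(x, y, z)

dim(T)    # 1000 rows, 3 columns
head(T)
```

Because cbind() keeps the vector names as column names, T[, 1:3] selects the columns x, y and z in that order.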

Normal Plot (plot3d() is provided by the rgl package, so run library(rgl) first): plot3d(T[, 1:3])

Colour Plot: plot3d(T[, 1:3], col = rainbow(1000))

Colour Plot of spheres: plot3d(T[, 1:3], col = rainbow(1000), type = 's')
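With col = rainbow(1000), the colours are handed to the points in row order, so the colour of a point says nothing about its position. The sketch below shows one possible variation in which colour follows the z value instead. The data, the seed and the names cols_by_row and cols_by_z are assumptions made for illustration.

```r
# Assumed example data: 1000 random points, as in the plots above.
set.seed(42)
n <- 1000
T <- cbind(x = rnorm(n), y = rnorm(n), z = rnorm(n))

# rainbow(n) returns n colours; used directly, they follow row order.
cols_by_row <- rainbow(n)

# To make colour follow height, give each point the colour at the rank of its z.
cols_by_z <- rainbow(n)[rank(T[, "z"])]

# Draw only in an interactive session with rgl installed.
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(T[, 1:3], col = cols_by_z, type = "s")
}
```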

Assignment 2:

Choose 2 random variables, x and y.
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a third variable z with 5 different categories and cbind it to x and y)
3. Colour-code and draw the graph
4. Smooth and best-fit line for the curve

Data set creation for two random variables and then introducing third variable z
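The commands for this step are not shown in the post. A minimal sketch under stated assumptions: 500 observations, uniform random values kept strictly positive so that the logarithmic plot is defined, and z coded as the integers 1 to 5. The names d and z_factor are hypothetical.

```r
# Assumed setup: 500 observations; the seed and distributions are arbitrary.
set.seed(1)
n <- 500

# Two random variables, kept strictly positive so that log(x) and log(y) exist.
x <- runif(n, min = 1, max = 100)
y <- x * runif(n, min = 0.5, max = 1.5)

# Third variable with 5 categories, coded 1 to 5.
z <- sample(1:5, size = n, replace = TRUE)

# cbind() returns a numeric matrix, so z is stored as the codes 1..5.
d <- cbind(x, y, z)
head(d)

# For discrete colours in a plot, treat z as a factor instead.
z_factor <- factor(z)
levels(z_factor)
```

With z left numeric, a colour mapping treats it as a continuous scale; passing factor(z) gives one distinct colour per category.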

Plots (qplot() is provided by the ggplot2 package):

> library(ggplot2)

> qplot(x, y)

> qplot(x, z)

Semi-transparent plot

> qplot(x,z, alpha=I(2/10))

Colour plot

> qplot(x,y, color=z)

Logarithmic colour plot

> qplot(log(x),log(y), color=z)

Best Fit and Smooth curve using "geom"

> qplot(x,y,geom=c("path","smooth"))

> qplot(x,y,geom=c("point","smooth"))

> qplot(x,y,geom=c("boxplot","jitter"))
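Recent ggplot2 releases mark qplot() as deprecated, so the same plots can also be written with ggplot(). The following is a sketch of rough equivalents, not the commands used in the session. The data frame d and the plot names are assumptions. A box plot is normally drawn per category, so the last plot puts z on the x-axis rather than the continuous x.

```r
# Assumed example data; in the post, x, y and z come from the data set above.
set.seed(1)
d <- data.frame(x = runif(200, min = 1, max = 100))
d$y <- d$x * runif(200, min = 0.5, max = 1.5)
d$z <- factor(sample(1:5, size = 200, replace = TRUE))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Scatter plot coloured by category, like qplot(x, y, color = z).
  p_colour <- ggplot(d, aes(x, y, colour = z)) + geom_point()

  # Points with a smoothed curve, like geom = c("point", "smooth").
  p_smooth <- ggplot(d, aes(x, y)) + geom_point(alpha = 0.2) + geom_smooth()

  # Points with a straight best-fit line from a linear model.
  p_linear <- ggplot(d, aes(x, y)) + geom_point() +
    geom_smooth(method = "lm", se = FALSE)

  # One box per category of z, with the raw points jittered on top.
  p_box <- ggplot(d, aes(z, y)) + geom_boxplot() + geom_jitter(width = 0.2)
}
```

Typing a plot name such as p_linear at the console, or calling print(p_linear), draws the plot.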

Sunday, 24 March 2013

ITBAL Session # 9 ( Data Visualization)

Many Eyes:

Developed by IBM, Many Eyes allows you to quickly build visualizations from publicly available or uploaded data sets, and features a wide range of analysis types including the ability to scan text for keyword density and saturation.

Visualization Options Available in Many Eyes

Choosing a visualization type:

Analyze a text

Word Tree

See a branching view of how a word or phrase is used in a text. Navigate the text by zooming and clicking.

Compare a set of values

Bar Chart

How do the items in your data set stack up? A bar chart is a simple and recognizable way to compare values. You can display several sets of bars for multivariate comparisons.

Block Histogram

This versatile chart lets you get a quick sense of how a single set of data is distributed. Each item in the data is an individually identifiable block.

Bubble Chart

Have so many items that your bar chart is baffling? Do the values vary so much that one bar pushes to the top of the screen while another virtually disappears? Try our bubble chart, which displays values as circles of different sizes.

See relationships among data points

Scatter Plot

Plot one variable across the x-axis, the other up the y-axis. The size of a dot can represent a third variable. The classic scatterplot gives you a bird's eye view of how your factors relate to each other.

Matrix Chart

A grid-based view of multidimensional data.

Network Diagram

Is your data all about relationships? Take a set of links -- say flight departure and arrival points or romantic pairings -- and see the connections laid out as a network.

See the parts of a whole

Pie Chart

Each component is a slice of the big pie. A simple and popular classic.

Treemap

The pie chart's big brother. Treemaps divide up a rectangle into hierarchical categories, letting you see relationships among large numbers of components. This lets you get an overview of a complex whole -- and drill down.

Treemap for Comparisons

Want to map a comparison of now vs. then? City vs. highway? Decaf vs. regular? This version of the treemap lets you directly compare two different takes on a set of categorized items.

Track rises and falls over time

Line Graph

Put the value you're measuring on the y-axis and draw lines to watch items change over time. (Think stock prices.)

Stack Graph

Track the changing values of items that add together to make a whole, like the components of a budget or the sales figures of multiple divisions. Also known as an "area chart."

Stack Graph for Categories

This version of the stacked graph is designed for items arranged into a set of categories and subcategories.

Data Set : World Development Indicators ( Source: World Bank)

http://goo.gl/2iWnI

Data Visualization

Bubble Chart :

Scatter Plot :

Pie Chart :

Customizing Stack Graph for Categories:

Visualizing Your Data With IBM’s Many Eyes :

Many Eyes is a powerful tool that enables a user to create visualizations from any kind of data set.

Here’s where it gets fun: a user can upload their own data set, but Many Eyes is also a community-powered tool. There are over 150,000 data sets to choose from, and many are pre-visualized.

Topic Centers allow teams of people to collaborate on visualizations. Topic Centers are organized around certain topics as well as teams of people at organizations and classes.

But selecting a dataset from the community is not always the best option, because the metadata associated with many of the datasets is inaccurate or incomplete. Fortunately, Many Eyes accepts any type of data as long as it is in a structured format. The data needs to be pre-formatted in Microsoft Excel (or similar spreadsheet software) and then pasted into Many Eyes’ Web interface.

Then the user is presented with an array of visualization options, from tag clouds and word trees to assorted graphs and even maps.

Friday, 15 March 2013

IT BAL Assignment - Session 8

Assignment: Load the "Produc" data set and carry out a panel data analysis.

We will fit three types of model:
Pooled model
Fixed effects model
Random effects model

Then we will determine which model fits best using the following test functions:
pFtest: for choosing between fixed and pooled
plmtest: for choosing between pooled and random
phtest: for choosing between random and fixed

Commands:

Loading the package and data:
> library(plm)
> data(Produc, package = "plm")
> head(Produc)

Pooled Model

> pool <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "pooling", index = c("state", "year"))

> summary(pool)

Fixed Effects Model:

> fixed <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "within", index = c("state", "year"))

> summary(fixed)

Random Effects Model:

> random <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "random", index = c("state", "year"))

> summary(random)

Comparison

Each comparison between the models is a hypothesis test based on the following idea:

H0 (null hypothesis): the individual (index) and time-based parameters are all zero

H1 (alternative hypothesis): at least one of the individual or time-based parameters is non-zero

Pooled vs Fixed

Null Hypothesis: Pooled Model

Alternate Hypothesis: Fixed Effects Model

Command:

> pFtest(fixed,pool)

Result:

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16

Alternative hypothesis: significant effects

Since the p-value is negligible, we reject the null hypothesis and accept the alternative, which favours the Fixed Effects Model.
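The degrees of freedom reported by the F test can be checked by hand. The sketch below assumes that the Produc panel covers 48 states over 17 years (1970 to 1986, 816 rows in total), as described in the plm documentation. The variable names are chosen here for illustration.

```r
n_states <- 48   # assumed: number of states in Produc
n_years  <- 17   # assumed: yearly observations from 1970 to 1986
k        <- 7    # slope coefficients in the model formula

n_obs <- n_states * n_years      # 816 rows in the panel

# Numerator df: restrictions needed to force all state intercepts to be equal.
df1 <- n_states - 1

# Denominator df: residual df of the fixed effects (within) model.
df2 <- n_obs - n_states - k

c(df1 = df1, df2 = df2)          # df1 = 47, df2 = 761
```

These values match the df1 = 47 and df2 = 761 shown in the test output.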

Pooled vs Random

Null Hypothesis: Pooled Model

Alternate Hypothesis: Random Effects Model

Command :

> plmtest(pool)

Result:

Lagrange Multiplier Test - (Honda)

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

normal = 57.1686, p-value < 2.2e-16

Alternative hypothesis: significant effects

Since the p-value is negligible, we reject the null hypothesis and accept the alternative, which favours the Random Effects Model over the Pooled Model.

Random vs Fixed

Null Hypothesis: No correlation between the individual effects and the regressors (Random Effects Model)

Alternate Hypothesis: Fixed Effects Model

Command:

> phtest(fixed,random)

Result:

Hausman Test

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

chisq = 93.546, df = 7, p-value < 2.2e-16

Alternative hypothesis: one model is inconsistent

Since the p-value is negligible, we reject the null hypothesis and accept the alternative, which favours the Fixed Effects Model.

Conclusion:

After making all the comparisons, we conclude that the Fixed Effects Model is best suited to the panel data analysis of the "Produc" data set.

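Hence, we conclude that the states differ from one another through state-specific effects that stay constant over time, and that the coefficients are best estimated from the variation within each state.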
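The whole comparison can also be run as one short script. This is a sketch that reuses the model formula and the three test functions from this post; the names f, idx and p_values are introduced here for illustration, and the block does nothing if the plm package is not installed.

```r
# Shared model formula, copied from the commands in this post.
f <- log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) +
  log(gsp) + log(emp) + log(unemp)

if (requireNamespace("plm", quietly = TRUE)) {
  library(plm)
  data("Produc", package = "plm")
  idx <- c("state", "year")

  # Fit the three candidate models on the same formula and index.
  pool   <- plm(f, data = Produc, model = "pooling", index = idx)
  fixed  <- plm(f, data = Produc, model = "within",  index = idx)
  random <- plm(f, data = Produc, model = "random",  index = idx)

  # Collect the p-values of the three comparisons in one place.
  p_values <- c(
    pooled_vs_fixed  = unname(pFtest(fixed, pool)$p.value),
    pooled_vs_random = unname(plmtest(pool)$p.value),
    random_vs_fixed  = unname(phtest(fixed, random)$p.value)
  )
  print(p_values)
}
```

Keeping the formula in a single object ensures that all three models, and therefore all three tests, are based on exactly the same specification.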

Bharat Creations
