Saturday, 30 March 2013

IT BAL - Session # 10 - Plotting in R



PLOTTING IN R


Assignment 1:

Create 3 vectors x, y and z, fill them with random values of equal length, and bind them together. Then create 3-dimensional plots of the result.

Data Set Creation Commands and DataSet :
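
The commands used for this step are not reproduced in the post, so here is a minimal sketch of how such a data set could be built. The sample size, the seed and the normal distribution are assumptions; the size of 1000 is chosen only because the plots below use rainbow(1000). The matrix is called T to match the plot commands.

```r
# Sketch only: sample size, seed and distribution are assumptions.
set.seed(42)                 # make the random draws reproducible
n <- 1000                    # one point per colour in rainbow(1000)

x <- rnorm(n)                # three random vectors of equal length
y <- rnorm(n)
z <- rnorm(n)

T <- cbind(x, y, z)          # bind the vectors column-wise into an n x 3 matrix
                             # (note: this masks T as shorthand for TRUE)
head(T)

# plot3d() comes from the rgl package; draw only if it is installed
# and the session is interactive.
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(T[, 1:3], col = rainbow(n), type = "s")
}
```

Naming the matrix something other than T would avoid shadowing the TRUE shorthand; the name is kept here only for consistency with the commands in this post.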





Normal Plot (plot3d comes from the rgl package):   plot3d(T[, 1:3])


Colour Plot: plot3d(T[, 1:3], col = rainbow(1000))


Colour Plot of spheres:  plot3d(T[, 1:3], col = rainbow(1000), type = 's')



Assignment 2:

Choose 2 random variables, x and y.
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a third variable z with 5 different categories and cbind it to x and y)
3. Colour code and draw the graph
4. Smooth and best fit line for the curve


Data set creation for two random variables and then introducing third variable z
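
The original commands for this step are missing from the post. A minimal sketch, assuming uniformly distributed positive values (so that the logarithmic plot further down is defined) and a z that takes one of 5 category codes, might look like this. The sample size, seed, distributions and the name xyz are all assumptions.

```r
# Sketch only: sample size, seed and distributions are assumptions.
set.seed(1)
n <- 500

x <- runif(n, min = 1, max = 100)       # strictly positive, so log(x) is defined
y <- x * runif(n, min = 0.5, max = 1.5) # positive and loosely related to x

# Third variable: one of 5 category codes per observation
z <- sample(1:5, n, replace = TRUE)

xyz <- cbind(x, y, z)                   # bind the three vectors column-wise
head(xyz)
```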



Plots:

> qplot(x,y)

> qplot(x,z)



Semi-transparent plot

> qplot(x,z, alpha=I(2/10))


Colour plot

> qplot(x,y, color=z)
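
When z is numeric, the colour scale is a continuous gradient. For the X-Y|Z plot with 5 categories, one option is to convert z to a factor, which gives one discrete colour per category, and to draw one panel per category with facets. The sketch below uses ggplot() directly rather than qplot(); the data and the name dat are assumptions, not the data used in this post.

```r
# Sketch only: the data below are made up for illustration.
set.seed(1)
n <- 500
dat <- data.frame(
  x = runif(n, 1, 100),
  y = runif(n, 1, 100),
  # factor => discrete colour scale, one colour per category
  z = factor(sample(1:5, n, replace = TRUE), levels = 1:5)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One colour per category of z
  p_colour <- ggplot(dat, aes(x, y, colour = z)) + geom_point()
  # X-Y conditioned on z: one panel per category
  p_facets <- ggplot(dat, aes(x, y)) + geom_point() + facet_wrap(~ z)
  print(p_colour)
  print(p_facets)
}
```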




Logarithmic colour plot

> qplot(log(x),log(y), color=z)



Best Fit and Smooth curve using "geom"

> qplot(x,y,geom=c("path","smooth"))


> qplot(x,y,geom=c("point","smooth"))


> qplot(x,y,geom=c("boxplot","jitter"))
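
The "smooth" geom draws a local smoother by default rather than a straight line. If a straight best-fit line is wanted, one way is to fit it with lm() and ask geom_smooth() for method = "lm". The sketch below assumes made-up data with a linear relationship; the coefficients 3 and 2 and the noise level are illustrative only.

```r
# Sketch only: the data below are made up for illustration.
set.seed(1)
n <- 200
x <- runif(n, 1, 100)
y <- 3 + 2 * x + rnorm(n, sd = 10)   # assumed linear relationship plus noise

fit <- lm(y ~ x)                     # ordinary least squares best-fit line
coef(fit)                            # estimated intercept and slope

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(x, y), aes(x, y)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE, colour = "red") +  # straight best-fit line
    geom_smooth(method = "loess", se = FALSE)                 # local smooth curve
  print(p)
}
```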

Sunday, 24 March 2013

ITBAL Session # 9 ( Data Visualization)


Many Eyes:

Developed by IBM, Many Eyes allows you to quickly build visualizations from publicly available or uploaded data sets, and features a wide range of analysis types including the ability to scan text for keyword density and saturation. 



Visualization Options Available in Many Eyes






Choosing a visualization type: 


Analyze a text 

Word Tree 

See a branching view of how a word or phrase is used in a text. Navigate the text by zooming and clicking.

Tag Cloud 

How are you using your words? This enhanced tag cloud shows how popular each word is in the given text.

Phrase Net 

Display networks of related words and ideas.

Word Cloud Generator 

Word Cloud Generator is a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text.

Compare a set of values 


Bar Chart 

How do the items in your data set stack up? A bar chart is a simple and recognizable way to compare values. You can display several sets of bars for multivariate comparisons.

Block Histogram 

This versatile chart lets you get a quick sense of how a single set of data is distributed. Each item in the data is an individually identifiable block.

Bubble Chart 

Have so many items that your bar chart is baffling? Do the values vary so much that one bar pushes to the top of the screen while another virtually disappears? Try our bubble chart, which displays values as circles of different sizes.

See relationships among data points 


Scatter Plot 

Plot one variable across the x-axis, the other up the y-axis. The size of a dot can represent a third variable. The classic scatterplot gives you a bird's eye view of how your factors relate to each other.
  
Matrix Chart 

A grid-based view of multidimensional data.

Network Diagram 

Is your data all about relationships? Take a set of links -- say flight departure and arrival points or romantic pairings -- and see the connections laid out as a network.

See the parts of a whole 

Pie Chart 

Each component is a slice of the big pie. A simple and popular classic.

Treemap 

The pie chart's big brother. Treemaps divide up a rectangle into hierarchical categories, letting you see relationships among large numbers of components. This lets you get an overview of a complex whole -- and drill down.

Treemap for Comparisons 

Want to map a comparison of now vs. then? City vs. highway? Decaf vs. regular? This version of the treemap lets you directly compare two different takes on a set of categorized items.


Track rises and falls over time 

Line Graph 

Put the value you're measuring on the y-axis and draw lines to watch items change over time. (Think stock prices.)

Stack Graph 

Track the changing values of items that add together to make a whole, like the components of a budget or the sales figures of multiple divisions. Also known as an "area chart."

Stack Graph for Categories 

This version of the stacked graph is designed for items arranged into a set of categories and subcategories.



Data Set : World Development Indicators (Source: World Bank)
http://goo.gl/2iWnI




Data Visualization 


Bubble Chart :



Scatter Plot :



Pie Chart :



Customizing Stack Graph for Categories:




Visualizing Your Data With IBM’s Many Eyes :

Many Eyes is a powerful tool that enables a user to create visualizations from any kind of data set.

Here’s where it gets fun: while a user can upload their own data set, Many Eyes is a community-powered tool. There are over 150,000 data sets to choose from, and many are pre-visualized.

Topic Centers allow teams of people to collaborate on visualizations. They are organized around particular topics as well as around teams at organizations and classes.

But selecting a dataset from the community is not always the best option: the metadata associated with many of the datasets is inaccurate or incomplete. Fortunately, Many Eyes accepts any type of data, so long as it is in a structured format, which is what makes it such a versatile tool. The data needs to be pre-formatted in Microsoft Excel (or similar spreadsheet software) and then pasted into the Many Eyes Web interface.

Then the user is presented with an array of visualization options, from tag clouds and word trees to assorted graphs and even maps.

Friday, 15 March 2013

IT BAL Assignment - Session 8


Assignment :- Load the data "Produc" and do the Panel Data Analysis.

We will estimate three types of model :
      Pooled model
      Fixed effects model
      Random effects model

Then we will determine which model is best using the following functions:
       pFtest : for deciding between fixed and pooled
       plmtest : for deciding between pooled and random
       phtest: for deciding between random and fixed


Commands:

Loading data:
> library(plm)
> data(Produc , package ="plm")
> head(Produc)

Pooled Model

> pool <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model="pooling", index = c("state","year"))

> summary(pool)


Fixed Effects Model:

> fixed <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model="within", index = c("state","year"))

> summary(fixed)


Random Effects Model:

> random <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model="random", index = c("state","year"))

> summary(random)


Comparison

The comparison between the models is a hypothesis test based on the following concept:

H0: Null Hypothesis: the individual (index) and time-based parameters are all zero
H1: Alternate Hypothesis: at least one of the individual (index) and time-based parameters is non-zero

Pooled vs Fixed

Null Hypothesis: Pooled Model
Alternate Hypothesis : Fixed Effects Model

Command:
> pFtest(fixed,pool)

Result:

data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp) 
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16

Alternative hypothesis: significant effects 
Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis: the Fixed Effects Model is preferred over the Pooled Model.


Pooled vs Random

Null Hypothesis: Pooled Model
Alternate Hypothesis: Random Effects Model

Command :
> plmtest(pool)

Result:

        Lagrange Multiplier Test - (Honda)
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp)
normal = 57.1686, p-value < 2.2e-16
Alternative hypothesis: significant effects 

Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis: the Random Effects Model is preferred over the Pooled Model.


Random vs Fixed

Null Hypothesis: the individual effects are not correlated with the regressors (Random Effects Model)
Alternate Hypothesis: Fixed Effects Model

Command:
> phtest(fixed,random)

Result:

        Hausman Test
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp)
chisq = 93.546, df = 7, p-value < 2.2e-16
Alternative hypothesis: one model is inconsistent 

Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis: the Fixed Effects Model is preferred over the Random Effects Model.

Conclusion

After making all the comparisons, we conclude that the Fixed Effects Model is best suited to the panel data analysis of the "Produc" data set.

Hence, we conclude that the individual effect is specific to each id, i.e. to each "state", and does not vary within the same "state".
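
The sequence of comparisons above can be condensed into a small decision rule. The sketch below is one possible way to write it, not part of the plm package: choose_panel_model is a hypothetical helper, and the 0.05 significance level is an assumption. The second half re-runs the three models and feeds the p-values of the tests into the helper, provided plm is installed.

```r
# Hypothetical helper (not part of plm): pick a model from the three p-values.
# alpha = 0.05 is an assumed significance level.
choose_panel_model <- function(p_f, p_lm, p_hausman, alpha = 0.05) {
  fixed_beats_pool  <- p_f  < alpha   # pFtest rejects the pooled model
  random_beats_pool <- p_lm < alpha   # plmtest rejects the pooled model
  if (!fixed_beats_pool && !random_beats_pool) return("pooling")
  if (fixed_beats_pool && !random_beats_pool) return("within")
  if (!fixed_beats_pool && random_beats_pool) return("random")
  # Both beat pooling: the Hausman test decides between them.
  if (p_hausman < alpha) "within" else "random"
}

if (requireNamespace("plm", quietly = TRUE)) {
  library(plm)
  data("Produc", package = "plm")

  # Same formula as in the commands above, written once and reused
  f <- log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) +
    log(gsp) + log(emp) + log(unemp)
  idx <- c("state", "year")

  pool   <- plm(f, data = Produc, model = "pooling", index = idx)
  fixed  <- plm(f, data = Produc, model = "within",  index = idx)
  random <- plm(f, data = Produc, model = "random",  index = idx)

  # Each test returns an "htest" object whose p-value is in $p.value
  print(choose_panel_model(
    p_f       = pFtest(fixed, pool)$p.value,
    p_lm      = plmtest(pool)$p.value,
    p_hausman = phtest(fixed, random)$p.value
  ))
}
```

Storing the formula in a variable keeps the three plm() calls identical apart from the model argument, which makes it harder to compare models that were accidentally specified differently.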