Saturday, 30 March 2013

IT BAL - Session # 10 - Plotting in R



PLOTTING IN R


Assignment 1:

Create three vectors x, y and z of equal length with random values and bind them together. Create three-dimensional plots of the result.

Data Set Creation Commands and DataSet :
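
A minimal sketch of how such a data set could be built, assuming 1000 draws per vector (to match rainbow(1000) in the plot commands below) from arbitrarily chosen distributions. The matrix is named T only because the plot commands use that name; note that this masks R's shorthand T for TRUE in the session. The rgl package, which provides plot3d, is assumed to be installed.

```r
# Assumed: 1000 random values per vector; the distributions are illustrative.
set.seed(10)
n <- 1000
x <- rnorm(n)                      # standard normal
y <- rnorm(n, mean = 5, sd = 2)    # normal with another mean and spread
z <- runif(n)                      # uniform on [0, 1]

# Bind the three equal-length vectors column-wise into a 1000 x 3 matrix.
# Naming it T masks the built-in alias for TRUE in this session.
T <- cbind(x, y, z)
head(T)

# Draw the 3D scatter plot only in an interactive session with rgl available.
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(T[, 1:3], col = rainbow(n))
}
```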





Normal Plot:   plot3d(T[, 1:3])


Colour Plot: plot3d(T[, 1:3], col = rainbow(1000))


Colour plot of spheres: plot3d(T[, 1:3], col = rainbow(1000), type = 's')



Assignment 2:

Choose 2 random variables, x and y.
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a third variable z with 5 different categories and cbind it to x and y)
3. Colour-coded graph
4. Smooth curve and best-fit line


Data set creation for two random variables and then introducing third variable z
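
A minimal sketch of one way to create such a data set. The sample size, the distributions and the relationship between x and y are assumptions made for illustration. A data.frame is used here in place of the cbind mentioned in the assignment so that z remains a categorical variable.

```r
# Assumed: 200 observations; y depends on x so that a fitted line is meaningful.
set.seed(42)
n <- 200
x <- runif(n, min = 1, max = 10)        # positive, so log(x) is defined
y <- 2 * x + 5 + runif(n, -2, 2)        # bounded noise keeps y positive
# Third variable: one of 5 categories, assigned at random.
z <- factor(sample(1:5, n, replace = TRUE))

# data.frame is used instead of cbind so that z stays a factor
# (cbind on a factor would replace it by its integer codes).
d <- data.frame(x, y, z)
str(d)
table(d$z)
```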



Plots:

> library(ggplot2)

> qplot(x,y)

>qplot(x,z)



Semi-transparent plot

> qplot(x,z, alpha=I(2/10))


Colour plot

> qplot(x,y, color=z)




 Logarithmic colour plot

> qplot(log(x),log(y), color=z)



Best Fit and Smooth curve using "geom"

> qplot(x,y,geom=c("path","smooth"))


> qplot(x,y,geom=c("point","smooth"))


> qplot(x,y,geom=c("boxplot","jitter"))
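
The "smooth" geom adds a fitted curve to the scatter plot. As a rough base-R analogue (a sketch of the idea, not what qplot runs internally), the following fits a straight least-squares line with lm and a locally weighted smooth with lowess. The data are hypothetical stand-ins for the x and y used above.

```r
# Hypothetical data standing in for the x and y used above.
set.seed(1)
x <- runif(100, 1, 10)
y <- 3 + 2 * x + rnorm(100)

fit <- lm(y ~ x)          # least-squares best-fit line
sm  <- lowess(x, y)       # locally weighted smooth, returned sorted by x

plot(x, y, main = "Best-fit line and smooth curve")
abline(fit, col = "red")            # straight line from lm
lines(sm, col = "blue", lwd = 2)    # smooth curve from lowess
coef(fit)
```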

Sunday, 24 March 2013

ITBAL Session # 9 ( Data Visualization)



Many Eyes:

Developed by IBM, Many Eyes allows you to quickly build visualizations from publicly available or uploaded data sets, and features a wide range of analysis types including the ability to scan text for keyword density and saturation. 



Visualization Options Available in Many Eyes






Choosing a visualization type: 


Analyze a text 

Word Tree 

See a branching view of how a word or phrase is used in a text. Navigate the text by zooming and clicking.

Tag Cloud 

How are you using your words? This enhanced tag cloud shows how popular each word is in the given text.

Phrase Net 

Display networks of related words and ideas.

Word Cloud Generator 

Word Cloud Generator is a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text.

Compare a set of values 


Bar Chart 

How do the items in your data set stack up? A bar chart is a simple and recognizable way to compare values. You can display several sets of bars for multivariate comparisons.

Block Histogram 

This versatile chart lets you get a quick sense of how a single set of data is distributed. Each item in the data is an individually identifiable block.

Bubble Chart 

Have so many items that your bar chart is baffling? Do the values vary so much that one bar pushes to the top of the screen while another virtually disappears? Try our bubble chart, which displays values as circles of different sizes.

See relationships among data points 


Scatter Plot 

Point one variable across the x-axis, the other up the y-axis. The size of a dot can represent a third variable. The classic scatterplot gives you a bird's eye view of how your factors relate to each other.
  
Matrix Chart 

A grid-based view of multidimensional data.

Network Diagram 

Is your data all about relationships? Take a set of links -- say flight departure and arrival points or romantic pairings -- and see the connections laid out as a network.

See the parts of a whole 

Pie Chart 

Each component is a slice of the big pie. A simple and popular classic.

Treemap 

The pie chart's big brother. Treemaps divide up a rectangle into hierarchical categories, letting you see relationships among large numbers of components. This lets you get an overview of a complex whole -- and drill down.

Treemap for Comparisons 

Want to map a comparison of now vs. then? City vs. highway? Decaf vs. regular? This version of the treemap lets you directly compare two different takes on a set of categorized items.


Track rises and falls over time 

Line Graph 

Put the value you're measuring on the y-axis and draw lines to watch items change over time. (Think stock prices.)

Stack Graph 

Track the changing values of items that add together to make a whole, like the components of a budget or the sales figures of multiple divisions. Also known as an "area chart."

Stack Graph for Categories 

This version of the stacked graph is designed for items arranged into a set of categories and subcategories.



Data Set: World Development Indicators (Source: World Bank)
http://goo.gl/2iWnI




Data Visualization 


Bubble Chart :



Scatter Plot :



Pie Chart :



Customizing Stack Graph for Categories:




Visualizing Your Data With IBM’s Many Eyes :

Many Eyes is a powerful tool that enables a user to create visualizations from any kind of data set.

Here’s where it gets fun: while a user can upload their own data set, Many Eyes is a community-powered tool. There are over 150,000 data sets to choose from, and many are pre-visualized.

Topic Centers allow teams of people to collaborate on visualizations. They are organized around particular topics as well as around teams of people at organizations and classes.

But selecting a dataset from the community is not always the best option: the metadata associated with many of the datasets is inaccurate or incomplete. Fortunately, Many Eyes accepts any type of data as long as it is in a structured format. Data needs to be formatted in Microsoft Excel (or similar spreadsheet software) and then pasted into the Many Eyes Web interface.

Then the user is presented with an array of visualization options, from tag clouds and word trees to assorted graphs and even maps.

Friday, 15 March 2013

IT BAL Assignment - Session 8


Assignment: Load the data "Produc" and do the panel data analysis.

We will estimate three types of model:
      Pooled model
      Fixed effects model
      Random effects model

Then we will determine which model is best using the functions:
       pFtest: for choosing between fixed and pooled
       plmtest: for choosing between pooled and random
       phtest: for choosing between random and fixed


Commands:

Loading data:
> library(plm)
> data(Produc , package ="plm")
> head(Produc)

Pooled Model:

> pool <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("pooling"), index = c("state","year"))

> summary(pool)


Fixed Effects Model:

> fixed <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("within"), index = c("state","year"))

> summary(fixed)


Random Effects Model:
> random <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("random"), index = c("state","year"))

> summary(random)


Comparison

The comparison between the models is a hypothesis test based on the following concept:

H0: Null hypothesis: the individual (index) and time-based parameters are all zero
H1: Alternative hypothesis: at least one of the individual or time-based parameters is non-zero

Pooled vs Fixed

Null Hypothesis: Pooled Model
Alternative Hypothesis: Fixed Effects Model

Command:
> pFtest(fixed,pool)

Result:

data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp) 
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16

Alternative hypothesis: significant effects 
Since the p-value is negligible, we reject the null hypothesis and accept the alternative hypothesis, i.e. the Fixed Effects Model.
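
The F statistic above compares the residual sum of squares of the pooled model with that of the fixed effects model. Its degrees of freedom are consistent with 48 states, 17 years and 7 regressors: df1 = 48 - 1 = 47 and df2 = 816 - 48 - 7 = 761. A minimal sketch of the same comparison using only base R on simulated data follows; the panel, its dimensions and the variable names are all hypothetical. Here the fixed effects model is fitted as a regression with one dummy per individual, which gives the same slope estimates as the within estimator.

```r
# Hypothetical simulated panel: 10 individuals observed over 8 periods.
set.seed(8)
N <- 10; Tn <- 8
id    <- factor(rep(1:N, each = Tn))
alpha <- rnorm(N, sd = 2)[as.integer(id)]   # individual-specific intercepts
x     <- rnorm(N * Tn)
y     <- 1 + 0.5 * x + alpha + rnorm(N * Tn, sd = 0.5)

pooled <- lm(y ~ x)          # one common intercept
fixed  <- lm(y ~ x + id)     # one intercept per individual (dummy variables)

# F statistic from the two residual sums of squares.
K    <- 1                                  # number of slope regressors
rss0 <- sum(resid(pooled)^2)
rss1 <- sum(resid(fixed)^2)
df1  <- N - 1
df2  <- N * Tn - N - K
F_stat  <- ((rss0 - rss1) / df1) / (rss1 / df2)
p_value <- pf(F_stat, df1, df2, lower.tail = FALSE)
c(F = F_stat, df1 = df1, df2 = df2, p = p_value)

# anova() on the nested models reports the same F test.
anova(pooled, fixed)
```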


Pooled vs Random

Null Hypothesis: Pooled Model
Alternative Hypothesis: Random Effects Model

Command :
> plmtest(pool)

Result:

        Lagrange Multiplier Test - (Honda)
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp)
normal = 57.1686, p-value < 2.2e-16
Alternative hypothesis: significant effects 

Since the p-value is negligible, we reject the null hypothesis and accept the alternative hypothesis, i.e. the Random Effects Model.


Random vs Fixed

Null Hypothesis: no correlation between the individual effects and the regressors (Random Effects Model)
Alternative Hypothesis: Fixed Effects Model

Command:
 > phtest(fixed,random)

Result:

        Hausman Test
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp)
chisq = 93.546, df = 7, p-value < 2.2e-16
Alternative hypothesis: one model is inconsistent 

Since the p-value is negligible, we reject the null hypothesis and accept the alternative hypothesis, i.e. the Fixed Effects Model.

Conclusion

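So after making all the comparisons, we come to the conclusion that the Fixed Effects Model is best suited for the panel data analysis of the "Produc" data set.

Hence, we conclude that there are significant individual effects for each id, i.e. each "state", and that they are correlated with the regressors; the fixed effects model therefore estimates the coefficients from the variation within each state over time.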






Wednesday, 13 February 2013

IT Business Application Lab : Session # 6


Assignment 1: Create log returns data and calculate its historical volatility.
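
A minimal sketch of the calculation on a simulated price series; the prices are hypothetical. Annualising with 252 trading days per year is an assumption that matches the deltat=1/252 used in Session 5.

```r
# Hypothetical daily closing prices (a simulated random walk in logs).
set.seed(6)
price <- 100 * exp(cumsum(rnorm(250, mean = 0, sd = 0.01)))

# Log return for day t: log(P_t) - log(P_{t-1}).
log_ret <- diff(log(price))

# Historical volatility: standard deviation of the log returns,
# annualised by sqrt(252) under the assumed 252 trading days per year.
daily_vol  <- sd(log_ret)
annual_vol <- daily_vol * sqrt(252)
c(daily = daily_vol, annual = annual_vol)
```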




Assignment 2: Create ACF plot of log returns and do Augmented Dickey-Fuller test.
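
A minimal sketch of both steps on hypothetical log returns. acf is part of base R; adf.test is taken from the tseries package, which is assumed to be installed, so that call is skipped when the package is missing.

```r
# Hypothetical log returns; in the assignment these come from the price series.
set.seed(6)
log_ret <- rnorm(249, mean = 0, sd = 0.01)

# Autocorrelation function of the log returns (base R, stats package).
acf_out <- acf(log_ret, lag.max = 20, main = "ACF of log returns")

# Augmented Dickey-Fuller test; H0 is a unit root (non-stationary series).
# adf.test() comes from the tseries package, assumed to be installed.
if (requireNamespace("tseries", quietly = TRUE)) {
  print(tseries::adf.test(log_ret))
}
```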




Thursday, 7 February 2013

IT Business Application Lab : Session # 5



IT Business Application Lab Assignment: #Session 5

Assignment #1:  Calculate returns after converting a data set into Time Series format. 

Data set used: S&P CNX 500, 01-01-2012 to 31-12-2013
http://goo.gl/XwXaQ

Commands Used:


> z<-read.csv(file.choose(),header=T)
> head(z)
> open<-z$Open[10:95]
> open.ts<-ts(open,deltat=1/252)
> open.ts
> summary(open.ts)
> z.diff<-diff(open.ts)
> z.diff
> returns<-cbind(open.ts,z.diff,lag(open.ts,k=-1))
> returns
> plot(returns)
> returns<-z.diff/lag(open.ts,k=-1)
> returns
> plot(returns)




Assignment #2: Data for observations 1-700 is available. Predict the outcome for observations 701-850 using GLM estimation with a logit link.

Commands Used:



> z<-read.csv(file.choose(),header=T)
> p1<-z[1:700,1:9]
> head(p1)
> p1$ed<-factor(p1$ed)
> p1.est<-glm(default ~ age + ed + employ + address + income, data=p1, family ="binomial")
> summary(p1.est)
> forecast<-z[701:850,1:8]
> forecast$ed<-factor(forecast$ed)
> forecast$probability<-predict(p1.est,newdata=forecast,type="response")
> head(forecast) 


Workspace Text File: http://goo.gl/HgyQl

Wednesday, 23 January 2013

IT Business Application Lab - Session 3


Assignment 1(a): Regression Analysis

Given Data:

mileage    groove
0          394.33
4          329.5
8          291
12         255.17
16         22.33
20         204.83
24         179
28         163.83
32         150.33


Residual Plot: A linear model is not appropriate, as the residuals are not randomly scattered.
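
A minimal sketch of how the regression and its residual plot can be produced from the data given above. The values are copied as printed; the object names are chosen here for illustration.

```r
# Data copied from the table above, values as printed.
mileage <- c(0, 4, 8, 12, 16, 20, 24, 28, 32)
groove  <- c(394.33, 329.5, 291, 255.17, 22.33, 204.83, 179, 163.83, 150.33)

fit <- lm(groove ~ mileage)   # simple linear regression
coef(fit)

# Residuals against mileage: a random scatter around zero supports a
# linear fit, while a visible pattern suggests the model is inadequate.
plot(mileage, resid(fit), ylab = "Residual", main = "Residual plot")
abline(h = 0, lty = 2)
```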




Assignment 1(b) :


Given Data :



Regression and Residual Plot:



QQ Plot and QQ Line :


A linear model is appropriate, as the residual plot shows a random scatter.

Assignment 2 : ANOVA


As the p-value is high, we fail to reject the null hypothesis (H0).
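
A minimal sketch of a one-way ANOVA and the decision rule, using hypothetical data in which all three groups share the same mean. The group labels, sample sizes and the 5% level are assumptions made for illustration.

```r
# Hypothetical data: a numeric response measured in three groups.
set.seed(3)
group <- factor(rep(c("A", "B", "C"), each = 10))
value <- rnorm(30, mean = 50, sd = 5)   # same mean in every group

fit <- aov(value ~ group)               # one-way ANOVA
tab <- summary(fit)[[1]]                # ANOVA table as a data frame
tab

# p-value of the F test for H0: all group means are equal.
p_value <- tab[["Pr(>F)"]][1]
if (p_value > 0.05) {
  message("Fail to reject H0 at the 5% level")
} else {
  message("Reject H0 at the 5% level")
}
```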

Wednesday, 16 January 2013

Business Application Lab - Session 2

Assignment 1 : Usage of cbind



> a<-c(10,31,42,91,17,23,35,25,16)
> dim(a)<-c(3,3)
> a
     [,1] [,2] [,3]
[1,]   10   91   35
[2,]   31   17   25
[3,]   42   23   16
> b<-c(1,3,4,9,11,3,5,21,11)
> dim(b)<-c(3,3)
> b
     [,1] [,2] [,3]
[1,]    1    9    5
[2,]    3   11   21
[3,]    4    3   11
> cbind(a,b)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   10   91   35    1    9    5
[2,]   31   17   25    3   11   21
[3,]   42   23   16    4    3   11


> cbind(a[3,],b[3,])
     [,1] [,2]
[1,]   42    4
[2,]   23    3
[3,]   16   11
> cbind(a[,3],b[,3])
     [,1] [,2]
[1,]   35    5
[2,]   25   21
[3,]   16   11

> cbind(a[,3],b[,1])
     [,1] [,2]
[1,]   35    1
[2,]   25    3
[3,]   16    4
>

Assignment 2,3:
Plotting the regression Line,Residuals and getting the intercepts:

> nse<-read.csv(file.choose(),header=T)
> nse
        Date    Open    High     Low   Close Shares.Traded Turnover..Rs..Cr.
1  01-Oct-12 5704.75 5722.95 5694.00 5718.80     123138510           4798.17
2  03-Oct-12 5727.70 5743.25 5715.80 5731.25     165037864           6654.02
3  04-Oct-12 5751.55 5807.25 5751.35 5787.60     171404290           6954.74
4  05-Oct-12 5815.00 5815.35 4888.20 5746.95     255569804          12995.80
.
.
.
62 01-Jan-13 5937.65 5963.90 5935.20 5950.85      77902745           3298.74
63 02-Jan-13 5982.60 6006.05 5982.00 5993.25     116057389           4992.90
64 03-Jan-13 6015.80 6017.00 5986.55 6009.50      99989933           4883.13
65 04-Jan-13 6011.95 6020.75 5981.55 6016.15     113232990           5191.38
> high<-nse[,2]   # note: column 2 of nse is Open
> open<-nse[,3]   # note: column 3 of nse is High, so the two names are swapped
> regression<-lm(high~open,data=nse)
> plot(regression)


> regression<-lm(high~open,data=nse)
> regression

Call:
lm(formula = high ~ open, data = nse)

Coefficients:
(Intercept)         open  
    -67.107        1.008  

> regression<-lm(high~open)
> regression

Call:
lm(formula = high ~ open)

Coefficients:
(Intercept)         open  
    -67.107        1.008  
> residuals(regression)
          1           2           3           4           5           6 
  5.9196164   8.4171333 -32.2136015  23.0755554  23.9025346   3.5768010 
          7           8           9          10          11          12 
  9.0434099 -33.4664873 -19.1957821   4.8893273  15.7868442  21.1595596 
         13          14          15          16          17          18 
-23.0770034  15.8041206 -29.8198675  18.9857661  -5.7988354  10.6630371 
         19          20          21          22          23          24 
 -8.7952256  -9.1821291  -2.7901270 -15.2305431   9.2571252   8.0728993 
         25          26          27          28          29          30 
-12.6393487 -34.9886327 -11.5422560   3.3036613  -6.2999621  15.7551500 
         31          32          33          34          35          36 
 23.3551851  -0.6835477   9.6476114  16.0402458 -12.8085788   9.9675304 
         37          38          39          40          41          42 
 22.4595947  24.1235882 -50.6573763 -73.5107780 -26.3494972   1.8960932 
         43          44          45          46          47          48 
 -5.3223585  11.4560041   6.2200949   6.5652611  18.7398544 -19.0496646 
         49          50          51          52          53          54 
 15.8049260  15.6337479 -16.3058819  -2.6555063  -9.4538581   0.3937561 
         55          56          57          58          59          60 
 19.3572767  22.8798463  20.1007811 -29.6902402  21.9583548  -5.9285974 
         61          62          63          64          65 
  4.8469903  -3.9402752  -1.4568842  20.7108651  13.0826970 

Assignment 4: Plotting a normal distribution for the data.
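
A minimal sketch of one way to do this: estimate the mean and standard deviation of the data and overlay the corresponding normal density on a histogram. The data vector here is hypothetical.

```r
# Hypothetical data standing in for the series used in the assignment.
set.seed(2)
data_vals <- rnorm(65, mean = 5900, sd = 100)

m <- mean(data_vals)
s <- sd(data_vals)

# Histogram on the density scale with the fitted normal curve on top.
hist(data_vals, freq = FALSE, main = "Data with fitted normal density")
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red", lwd = 2)

# The same density evaluated on a grid, for inspection.
grid <- seq(m - 3 * s, m + 3 * s, length.out = 101)
dens <- dnorm(grid, mean = m, sd = s)
```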