Saturday, 30 March 2013

IT BAL - Session # 10 - Plotting in R

PLOTTING IN R

Assignment 1:

Create 3 vectors, x, y and z, fill them with random values of equal length, and bind them together. Create 3-dimensional plots of the result.

Data Set Creation Commands and DataSet :

Normal Plot: plot3d(T[, 1:3])

Colour Plot: plot3d(T[, 1:3], col = rainbow(1000))

Colour Plot of spheres: plot3d(T[, 1:3], col = rainbow(1000), type = 's')
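The data set creation step is not shown above, so here is a minimal sketch of how it might look. The sample size of 1000 is an assumption chosen to match rainbow(1000), the normal distribution is an arbitrary choice, and the matrix is called pts here rather than T (T is also R's shorthand for TRUE, so reusing it can cause surprises). The plot3d calls assume the rgl package and only run in an interactive session.

```r
set.seed(10)

n <- 1000            # assumed: one point per colour in rainbow(1000)
x <- rnorm(n)        # assumed distribution; any random values would do
y <- rnorm(n)
z <- rnorm(n)

# Bind the three equal-length vectors into a 1000 x 3 matrix
pts <- cbind(x, y, z)
print(dim(pts))

# 3D plots need the rgl package and a display, so guard the calls
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(pts[, 1:3])                                     # normal plot
  rgl::plot3d(pts[, 1:3], col = rainbow(n))                   # colour plot
  rgl::plot3d(pts[, 1:3], col = rainbow(n), type = "s")       # spheres
}
```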

Assignment 2:

Choose 2 random variables, x and y.
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a third variable z with 5 different categories and cbind it to x and y)
3. Colour-coded graph
4. Smooth curve and best-fit line

Data set creation for two random variables and then introducing third variable z
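The creation commands themselves are not reproduced here, so the following is only a sketch under assumptions: the sample size, the distributions and the seed are all hypothetical. x and y are kept strictly positive so that log(x) and log(y) are defined, and z takes 5 category codes.

```r
set.seed(2013)

n <- 500                                  # hypothetical sample size
x <- runif(n, min = 1, max = 100)         # positive, so log(x) is defined
y <- x * exp(rnorm(n, sd = 0.3))          # positive and related to x
z <- sample(1:5, n, replace = TRUE)       # third variable with 5 categories

# cbind gives a numeric matrix with one column per variable
dat <- cbind(x, y, z)
print(head(dat))
print(table(dat[, "z"]))
```

Because cbind produces a numeric matrix, z is stored as the codes 1 to 5; wrapping it as factor(z) when plotting gives one discrete colour per category instead of a continuous gradient.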

Plots:

> qplot(x,y)

> qplot(x,z)

Semi-transparent plot

> qplot(x,z, alpha=I(2/10))

Colour plot

> qplot(x,y, color=z)

Logarithmic colour plot

> qplot(log(x),log(y), color=z)

Best Fit and Smooth curve using "geom"

> qplot(x,y,geom=c("path","smooth"))

> qplot(x,y,geom=c("point","smooth"))

> qplot(x,y,geom=c("boxplot","jitter"))
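Recent ggplot2 releases mark qplot() as deprecated, so the same kinds of plot can also be written with ggplot(). The sketch below is not the code used in the session: the data frame plot_data and its values are hypothetical, and the block only builds the plots if ggplot2 is installed.

```r
set.seed(7)

n <- 200
plot_data <- data.frame(
  x = runif(n, 1, 100),
  z = factor(sample(1:5, n, replace = TRUE))   # 5 categories
)
plot_data$y <- plot_data$x * exp(rnorm(n, sd = 0.3))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  base <- ggplot(plot_data, aes(x = x, y = y))

  p_alpha  <- base + geom_point(alpha = 0.2)          # semi-transparent points
  p_colour <- base + geom_point(aes(colour = z))      # colour-coded by z
  p_log    <- ggplot(plot_data,
                     aes(x = log(x), y = log(y), colour = z)) +
    geom_point()                                      # logarithmic colour plot
  p_smooth <- base + geom_point() +
    geom_smooth(method = "loess", formula = y ~ x)    # smooth curve
  p_fit    <- base + geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # best-fit line

  print(p_fit)
}
```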

Sunday, 24 March 2013

ITBAL Session # 9 ( Data Visualization)

Many Eyes:

Developed by IBM, Many Eyes allows you to quickly build visualizations from publicly available or uploaded data sets, and features a wide range of analysis types including the ability to scan text for keyword density and saturation.

Visualization Options Available in Many Eyes

Choosing a visualization type:

Analyze a text

Word Tree

See a branching view of how a word or phrase is used in a text. Navigate the text by zooming and clicking.

Compare a set of values

Bar Chart

How do the items in your data set stack up? A bar chart is a simple and recognizable way to compare values. You can display several sets of bars for multivariate comparisons.

Block Histogram

This versatile chart lets you get a quick sense of how a single set of data is distributed. Each item in the data is an individually identifiable block.

Bubble Chart

Have so many items that your bar chart is baffling? Do the values vary so much that one bar pushes to the top of the screen while another virtually disappears? Try our bubble chart, which displays values as circles of different sizes.

See relationships among data points

Scatter Plot

Plot one variable along the x-axis, the other up the y-axis. The size of a dot can represent a third variable. The classic scatterplot gives you a bird's eye view of how your factors relate to each other.

Matrix Chart

A grid-based view of multidimensional data.


Network Diagram

Is your data all about relationships? Take a set of links -- say flight departure and arrival points or romantic pairings -- and see the connections laid out as a network.


See the parts of a whole

Pie Chart

Each component is a slice of the big pie. A simple and popular classic.

Treemap

The pie chart's big brother. Treemaps divide up a rectangle into hierarchical categories, letting you see relationships among large numbers of components. This lets you get an overview of a complex whole -- and drill down.

Treemap for Comparisons

Want to map a comparison of now vs. then? City vs. highway? Decaf vs. regular? This version of the treemap lets you directly compare two different takes on a set of categorized items.

Track rises and falls over time

Line Graph

Put the value you're measuring on the y-axis and draw lines to watch items change over time. (Think stock prices.)

Stack Graph

Track the changing values of items that add together to make a whole, like the components of a budget or the sales figures of multiple divisions. Also known as an "area chart."

Stack Graph for Categories

This version of the stacked graph is designed for items arranged into a set of categories and subcategories.

Data Set : World Development Indicators ( Source: World Bank)

http://goo.gl/2iWnI

Data Visualization

Bubble Chart :

Scatter Plot :

Pie Chart :

Customizing Stack Graph for Categories:

Visualizing Your Data With IBM’s Many Eyes :

Many Eyes is a powerful tool that enables a user to create visualizations from any kind of data set.

Here’s where it gets fun: while a user can upload their own data set, Many Eyes is a community-powered tool. There are over 150,000 data sets to choose from, and many are pre-visualized.

Topic Centers allow teams of people to collaborate on visualizations. Topic Centers are organized around certain topics as well as teams of people at organizations and classes.

But selecting a dataset from the community is not always the best option: the metadata associated with many of the datasets is inaccurate or incomplete. Rest assured, because what makes Many Eyes such a versatile tool is that any type of data is accepted, so long as it is in a structured format. Data needs to be pre-formatted in Microsoft Excel (or similar spreadsheet software), then pasted into Many Eyes’ Web interface.

Then the user is presented with an array of visualization options, from tag clouds and word trees to assorted graphs and even maps.

Friday, 15 March 2013

IT BAL Assignment - Session 8

Assignment: Load the "Produc" data set and carry out a panel data analysis.

We will fit three types of model:
Pooled effects model
Fixed effects model
Random effects model

We will then determine which model fits best using the following test functions:
pFtest: to choose between fixed and pooled
plmtest: to choose between pooled and random
phtest: to choose between random and fixed

Commands:

Loading data:
> library(plm)
> data(Produc, package = "plm")
> head(Produc)

Pooled Effects Model:

> pool <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "pooling", index = c("state","year"))

> summary(pool)

Fixed Effects Model:

> fixed <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "within", index = c("state","year"))

> summary(fixed)

Random Effects Model:

> random <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "random", index = c("state","year"))

> summary(random)

Comparison

Each comparison between the models is a hypothesis test of the following form:

H0 (null hypothesis): the individual (index) and time-based parameters are all zero

H1 (alternative hypothesis): at least one of the individual or time-based parameters is non-zero

Pooled vs Fixed

Null Hypothesis: Pooled Effects Model

Alternative Hypothesis: Fixed Effects Model

Command:

> pFtest(fixed,pool)

Result:

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16

Alternative hypothesis: significant effects

Since the p-value is negligible, we reject the null hypothesis and prefer the fixed effects model over the pooled model.

Pooled vs Random

Null Hypothesis: Pooled Effects Model

Alternative Hypothesis: Random Effects Model

Command :

> plmtest(pool)

Result:

Lagrange Multiplier Test - (Honda)

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

normal = 57.1686, p-value < 2.2e-16

Alternative hypothesis: significant effects

Since the p-value is negligible, we reject the null hypothesis and prefer the random effects model over the pooled model.

Random vs Fixed

Null Hypothesis: the individual effects are uncorrelated with the regressors, so the Random Effects Model is appropriate

Alternative Hypothesis: Fixed Effects Model

Command:

> phtest(fixed,random)

Result:

Hausman Test

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

chisq = 93.546, df = 7, p-value < 2.2e-16

Alternative hypothesis: one model is inconsistent

Since the p-value is negligible, we reject the null hypothesis: the random effects estimates are inconsistent, so we prefer the fixed effects model.

Conclusion:

After making all three comparisons, we conclude that the fixed effects model is best suited to the panel data analysis of the "Produc" data set.

Hence each "state" has its own time-invariant intercept, and the slope coefficients are estimated from the variation within each state over time.
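The three pairwise tests can be combined into one decision rule. The sketch below is only an illustration: choose_panel_model is a hypothetical helper written for this post, not part of plm, and the 5% significance level is an assumption. The second half re-runs the tests on "Produc" only if plm is installed.

```r
# Hypothetical helper: pick a model from the three test p-values.
#   p_f       p-value of pFtest (pooled vs fixed)
#   p_lm      p-value of plmtest (pooled vs random)
#   p_hausman p-value of phtest (random vs fixed)
choose_panel_model <- function(p_f, p_lm, p_hausman, alpha = 0.05) {
  fixed_beats_pooled  <- p_f < alpha
  random_beats_pooled <- p_lm < alpha
  if (!fixed_beats_pooled && !random_beats_pooled) return("pooling")
  if (fixed_beats_pooled && !random_beats_pooled) return("within")
  if (!fixed_beats_pooled && random_beats_pooled) return("random")
  # Both beat pooling: the Hausman test decides between them
  if (p_hausman < alpha) "within" else "random"
}

if (requireNamespace("plm", quietly = TRUE)) {
  library(plm)
  data("Produc", package = "plm")
  f <- log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) +
    log(gsp) + log(emp) + log(unemp)
  pool   <- plm(f, data = Produc, model = "pooling", index = c("state", "year"))
  fixed  <- plm(f, data = Produc, model = "within",  index = c("state", "year"))
  random <- plm(f, data = Produc, model = "random",  index = c("state", "year"))
  print(choose_panel_model(
    p_f       = pFtest(fixed, pool)$p.value,
    p_lm      = plmtest(pool)$p.value,
    p_hausman = phtest(fixed, random)$p.value
  ))
}
```

In the helper, "within" is plm's name for the fixed effects model and "pooling" is its name for the pooled model.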

Wednesday, 13 February 2013

IT Business Application Lab : Session # 6

Assignment 1: Create log of returns data and calculate its historical volatility.

Assignment 2: Create ACF plot of log returns and do Augmented Dickey-Fuller test.
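No commands were recorded for this session, so the following is a minimal sketch of both assignments under stated assumptions: the price series is simulated (hypothetical data, not the series used in class), volatility is annualised with 252 trading days, and the Augmented Dickey-Fuller test is taken from the tseries package, which may not be the package used in the lab.

```r
set.seed(42)

# Hypothetical daily prices: a geometric random walk starting at 100
prices <- 100 * exp(cumsum(rnorm(250, mean = 0, sd = 0.01)))

# Assignment 1: log returns and historical volatility
log_returns     <- diff(log(prices))
hist_vol_daily  <- sd(log_returns)
hist_vol_annual <- hist_vol_daily * sqrt(252)   # assumes 252 trading days
print(c(daily = hist_vol_daily, annual = hist_vol_annual))

# Assignment 2: autocorrelation function of the log returns
acf_out <- acf(log_returns, plot = FALSE)
plot(acf_out, main = "ACF of log returns")

# Augmented Dickey-Fuller test (needs the tseries package)
if (requireNamespace("tseries", quietly = TRUE)) {
  print(tseries::adf.test(log_returns))
}
```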

Thursday, 7 February 2013

IT Business Application Lab : Session # 5

IT Business Application Lab Assignment: #Session 5

Assignment #1: Calculate returns after converting a data set into Time Series format.

Data set used: S&P CNX 500 01-01-2012-31-12-2013

http://goo.gl/XwXaQ

Commands Used:

> z<-read.csv(file.choose(),header=T)

> head(z)

> open<-z$Open[10:95]
> open.ts<-ts(open,deltat=1/252)
> open.ts
> summary(open.ts)

> z.diff<-diff(open.ts)

> z.diff

> returns<-cbind(open.ts,z.diff,lag(open.ts,k=-1))
> returns
> plot(returns)
> returns<-z.diff/lag(open.ts,k=-1)
> returns
> plot(returns)
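To see what the diff/lag step computes without the CSV file, here is a small sketch on five made-up prices (hypothetical values). Dividing the differenced series by the series lagged one step gives the simple return (p_t - p_{t-1}) / p_{t-1} for each day after the first.

```r
# Hypothetical prices, used only to illustrate the calculation
prices   <- c(100, 102, 101, 105, 104)
price_ts <- ts(prices, deltat = 1/252)     # daily data, 252 trading days

price_diff <- diff(price_ts)               # p_t - p_{t-1}
lagged     <- stats::lag(price_ts, k = -1) # p_{t-1}, aligned with time t

# ts arithmetic keeps only the overlapping time points
simple_returns <- price_diff / lagged
print(simple_returns)

# The same numbers computed directly on the plain vector
manual_returns <- diff(prices) / head(prices, -1)
print(manual_returns)
```

Calling stats::lag explicitly avoids picking up a different lag function if another package that defines one is attached.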

Assignment #2: Data is available for rows 1-700. Predict the outcome for rows 701-850 using GLM estimation with a logit link.

Commands Used:

> z<-read.csv(file.choose(),header=T)
> p1<-z[1:700,1:9]
> head(p1)
> p1$ed<-factor(p1$ed)
> p1.est<-glm(default ~ age + ed + employ + address + income, data=p1, family ="binomial")
> summary(p1.est)
> forecast<-z[701:850,1:8]
> forecast$ed<-factor(forecast$ed)
> forecast$probability<-predict(p1.est,newdata=forecast,type="response")
> head(forecast)
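The same train-then-predict workflow can be tried without the original file. The sketch below simulates a hypothetical data set with the column names used above; the values and the relationship between the columns are invented purely for illustration.

```r
set.seed(1)

n <- 850
# Hypothetical data with the same column names as the commands above
loans <- data.frame(
  age     = sample(20:60, n, replace = TRUE),
  ed      = factor(sample(1:5, n, replace = TRUE)),  # factor before splitting
  employ  = sample(0:30, n, replace = TRUE),
  address = sample(0:30, n, replace = TRUE),
  income  = round(runif(n, 15, 150))
)
# Invented relationship, only so that the model has something to fit
true_logit    <- -1 - 0.1 * loans$employ + 0.01 * loans$income
loans$default <- rbinom(n, 1, plogis(true_logit))

train   <- loans[1:700, ]
holdout <- loans[701:850, setdiff(names(loans), "default")]

# Logit model: binomial family uses the logit link by default
fit <- glm(default ~ age + ed + employ + address + income,
           data = train, family = binomial)

# type = "response" returns predicted probabilities, not log-odds
holdout$probability <- predict(fit, newdata = holdout, type = "response")
print(head(holdout))
```

Converting ed to a factor before splitting keeps the factor levels identical in the training and prediction sets.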

Workspace Text File : http://goo.gl/HgyQl

Wednesday, 23 January 2013

IT Business Application Lab - Session 3

Assignment 1(a): Regression Analysis

Given Data:

mileage	groove
0	394.33
4	329.5
8	291
12	255.17
16	22.33
20	204.83
24	179
28	163.83
32	150.33

Residual Plot: The residuals follow a systematic pattern rather than a random scatter, so a linear model is not appropriate.
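The commands for this regression are not shown, so here is a minimal sketch of how the fit and the residual plot might be produced. The values are copied exactly as listed in the table above; the object names tyre, fit and res are hypothetical.

```r
# Data copied as given in the table above
tyre <- data.frame(
  mileage = c(0, 4, 8, 12, 16, 20, 24, 28, 32),
  groove  = c(394.33, 329.5, 291, 255.17, 22.33, 204.83, 179, 163.83, 150.33)
)

# Simple linear regression of groove depth on mileage
fit <- lm(groove ~ mileage, data = tyre)
print(coef(fit))

# Residual plot: a random scatter around zero would support linearity
res <- residuals(fit)
plot(tyre$mileage, res, xlab = "mileage", ylab = "residual",
     main = "Residuals vs mileage")
abline(h = 0, lty = 2)

# QQ plot and QQ line to check normality of the residuals
qqnorm(res)
qqline(res)
```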

Assignment 1(b) :

Given Data :

Regression and Residual Plot:

QQ Plot and QQ Line :

A linear model is appropriate here, as the residuals are randomly scattered.

Assignment 2 : ANOVA

As the p-value is high, we fail to reject the null hypothesis (H0).

Wednesday, 16 January 2013

Business Application Lab - Session 2

Assignment 1 : Usage of cbind

> a<-c(10,31,42,91,17,23,35,25,16)
> dim(a)<-c(3,3)
> a
     [,1] [,2] [,3]
[1,]   10   91   35
[2,]   31   17   25
[3,]   42   23   16
> b<-c(1,3,4,9,11,3,5,21,11)
> dim(b)<-c(3,3)
> b
     [,1] [,2] [,3]
[1,]    1    9    5
[2,]    3   11   21
[3,]    4    3   11
> cbind(a,b)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   10   91   35    1    9    5
[2,]   31   17   25    3   11   21
[3,]   42   23   16    4    3   11

> cbind(a[3,],b[3,])
     [,1] [,2]
[1,]   42    4
[2,]   23    3
[3,]   16   11
> cbind(a[,3],b[,3])
     [,1] [,2]
[1,]   35    5
[2,]   25   21
[3,]   16   11

> cbind(a[,3],b[,1])
     [,1] [,2]
[1,]   35    1
[2,]   25    3
[3,]   16    4

Assignment 2,3:
Plotting the regression line, residuals and getting the intercepts:

> nse<-read.csv(file.choose(),header=T)
> nse
        Date    Open    High     Low   Close Shares.Traded Turnover..Rs..Cr.
1  01-Oct-12 5704.75 5722.95 5694.00 5718.80     123138510           4798.17
2  03-Oct-12 5727.70 5743.25 5715.80 5731.25     165037864           6654.02
3  04-Oct-12 5751.55 5807.25 5751.35 5787.60     171404290           6954.74
4  05-Oct-12 5815.00 5815.35 4888.20 5746.95     255569804          12995.80
...
62 01-Jan-13 5937.65 5963.90 5935.20 5950.85      77902745           3298.74
63 02-Jan-13 5982.60 6006.05 5982.00 5993.25     116057389           4992.90
64 03-Jan-13 6015.80 6017.00 5986.55 6009.50      99989933           4883.13
65 04-Jan-13 6011.95 6020.75 5981.55 6016.15     113232990           5191.38

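> # Note: in this file column 2 is Open and column 3 is High, so the two
> # names below are swapped. The regression output that follows was
> # produced with the assignments exactly as written.
> high<-nse[,2]
> open<-nse[,3]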

> regression<-lm(high~open,data=nse)
> plot(regression)
Waiting to confirm page change...
> regression<-lm(high~open,data=nse)
> regression

Call:
lm(formula = high ~ open, data = nse)

Coefficients:
(Intercept)         open
    -67.107        1.008

> regression<-lm(high~open)
> regression

Call:
lm(formula = high ~ open)

Coefficients:
(Intercept)         open
    -67.107        1.008

> residuals(regression)
           1           2           3           4           5           6
   5.9196164   8.4171333 -32.2136015  23.0755554  23.9025346   3.5768010
           7           8           9          10          11          12
   9.0434099 -33.4664873 -19.1957821   4.8893273  15.7868442  21.1595596
          13          14          15          16          17          18
 -23.0770034  15.8041206 -29.8198675  18.9857661  -5.7988354  10.6630371
          19          20          21          22          23          24
  -8.7952256  -9.1821291  -2.7901270 -15.2305431   9.2571252   8.0728993
          25          26          27          28          29          30
 -12.6393487 -34.9886327 -11.5422560   3.3036613  -6.2999621  15.7551500
          31          32          33          34          35          36
  23.3551851  -0.6835477   9.6476114  16.0402458 -12.8085788   9.9675304
          37          38          39          40          41          42
  22.4595947  24.1235882 -50.6573763 -73.5107780 -26.3494972   1.8960932
          43          44          45          46          47          48
  -5.3223585  11.4560041   6.2200949   6.5652611  18.7398544 -19.0496646
          49          50          51          52          53          54
  15.8049260  15.6337479 -16.3058819  -2.6555063  -9.4538581   0.3937561
          55          56          57          58          59          60
  19.3572767  22.8798463  20.1007811 -29.6902402  21.9583548  -5.9285974
          61          62          63          64          65
   4.8469903  -3.9402752  -1.4568842  20.7108651  13.0826970
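Selecting columns by name avoids depending on their position in the file. The sketch below uses only the eight rows printed above (rows 1-4 and 62-65), so its coefficients will differ from the full 65-row fit; the object names nse_sample and fit_by_name are hypothetical.

```r
# Open and High values for the eight rows shown in the listing above
nse_sample <- data.frame(
  Open = c(5704.75, 5727.70, 5751.55, 5815.00,
           5937.65, 5982.60, 6015.80, 6011.95),
  High = c(5722.95, 5743.25, 5807.25, 5815.35,
           5963.90, 6006.05, 6017.00, 6020.75)
)

# Regress High on Open, referring to the columns by name
fit_by_name <- lm(High ~ Open, data = nse_sample)
print(coef(fit_by_name))       # intercept and slope
print(residuals(fit_by_name))  # one residual per row

# Scatter plot with the fitted regression line
plot(nse_sample$Open, nse_sample$High, xlab = "Open", ylab = "High")
abline(fit_by_name)
```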
