Dummy variables in panel data.

March 21, 2013, 3:47 am

Hi

I am working on a regression looking at income inequality before and after financial crises across the G-7 countries between 1975 and 2006. I have included a dummy variable for banking crisis years in the 7 countries over this period.

I want to add two more dummy variables, for the pre- and post-banking-crisis periods. I would also like to run the same model several times over different time horizons, e.g. looking at 1 year before and after the crisis and also at 2, 3, 4, and 5 years before and after each crisis took place, because distributional effects may show up with a lag.

Is there any way to do this? I have used the following paper as a basis for my analysis: http://www3.eeg.uminho.pt/economia/n...WP_30_2011.pdf

Thanks in advance

ajl0411

note: I am looking at all banking crises not just the current 2007-08 crisis.
I am also using Stata to analyse my data.
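
One possible way to build the window dummies is sketched below. It assumes the panel has been declared with xtset, that country is a numeric country identifier, year is the calendar year, and crisis is the existing 0/1 crisis-year dummy. All three names are hypothetical and not taken from the actual data set.

```stata
* Sketch only: country, year and crisis are assumed variable names.
xtset country year

forvalues k = 1/5 {
    gen byte pre`k'  = 0
    gen byte post`k' = 0
    forvalues j = 1/`k' {
        * F`j'.crisis is the crisis dummy j years ahead,
        * L`j'.crisis is the crisis dummy j years back.
        replace pre`k'  = 1 if F`j'.crisis == 1 & crisis == 0
        replace post`k' = 1 if L`j'.crisis == 1 & crisis == 0
    }
}
```

Each horizon k then gets its own regression that includes crisis, pre`k' and post`k'. When two crises in the same country are close together, a year can fall in the post window of one and the pre window of the next, so the overlap rule is a modelling choice rather than something this sketch settles.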

↧

running maximum

March 21, 2013, 5:23 am

I have the following series,

obs x
1 1.0000
2 1.1306
3 1.0142
4 1.0389
5 1.0781
6 1.1120
7 1.0829
8 1.1277
9 1.0926
10 1.1628
11 1.2528
12 1.2552
13 1.1949
14 1.2505

what I'm hoping to achieve is,

obs x y
1 1.0000 1.0000
2 1.1306 1.1306
3 1.0142 1.1306
4 1.0389 1.1306
5 1.0781 1.1306
6 1.1120 1.1306
7 1.0829 1.1306
8 1.1277 1.1306
9 1.0926 1.1306
10 1.1628 1.1628
11 1.2528 1.2528
12 1.2552 1.2552
13 1.1949 1.2552
14 1.2505 1.2552

where y is the running maximum of x: at each observation, y holds the largest value of x seen so far. All I really need is to generate y by carrying the maximum of x forward.

I've tried the following, but none of them does the above exactly:

Code:

gen y=max(x, x[_n-1])



gen y=0

replace y= max(y, x[_n-1])

This can easily be done in Excel by typing "=MAX(X2,Y1)" in [Y2] and copying [Y2] down to the end [Y14].

Happy to provide further clarification if needed. Thanks a lot in advance!
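
A minimal sketch of one way to get y, assuming the data are already sorted in the intended order. It relies on replace working through the observations one at a time, so y[_n-1] has already been updated when observation _n is processed.

```stata
* Sketch: running maximum of x in the current sort order.
gen y = x
replace y = max(y, y[_n-1]) in 2/L
```

With the series above this carries 1.1306 forward from observation 2 through observation 9, then steps up at observations 10, 11 and 12. For panel data the same two lines could be run under a by-group prefix, which is an extension not covered in the question.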

↧

First differences

March 21, 2013, 6:21 am

I have a panel data set covering a 5-year period, on which I have run pooled OLS, fixed-effects and random-effects estimation. A Hausman test indicates that fixed effects is the most appropriate model.
I have been advised that I need to run a first-difference regression. Is it possible to run such a regression over a period of more than two years? If so, what steps need to be taken?
I am using Stata 12 if that helps.
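
A minimal sketch of a first-difference regression with more than two periods, assuming hypothetical names: id for the panel identifier, year for the time variable, y for the outcome and x1, x2 for the regressors.

```stata
* Sketch: id, year, y, x1 and x2 are hypothetical names.
xtset id year

* D. takes the difference from the previous year within each panel unit.
regress D.y D.x1 D.x2, vce(cluster id)
```

With 5 years of data each unit contributes up to 4 differenced observations. Clustering on id is one common way to allow for serial correlation in the differenced errors, and it is a choice made for this sketch.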

↧

Stata Maths Problem Help

March 22, 2013, 6:36 am

File is here: http://www.principlesofeconometrics....stata/star.dta

Question
(a) Using children who are in either a regular-sized class or a small class, estimate the regression model explaining students’ combined aptitude scores as a function of class size, TOTALSCORE = B1 + B2*SMALL + e_i. Interpret the estimates. Based on this regression result, what do you conclude about the effect of class size on learning?

(b) Repeat part (a) using dependent variables READSCORE and MATHSCORE. Do you observe any differences?

(c) Using children who are in either a regular-sized class or a regular-sized class with a teacher aide, estimate the regression model explaining students’ combined aptitude scores as a function of the presence of a teacher aide, TOTALSCORE = y1 + y2*AIDE. Interpret the estimates. Based on this regression result, what do you conclude about the effect on learning of adding a teacher aide to the classroom?

(d) Repeat part (c) using dependent variables READSCORE and MATHSCORE. Do you observe any differences?

I would really appreciate if someone could help me with this problem.

↧

ttest problem

March 22, 2013, 11:59 am

Hello Stata Users!

I have to test whether autocratic regimes produce higher rates of mortality in children aged under 5 than democratic regimes.
The p_polity2 variable has 10 values ranging from 0 to 10.
ihme_fmort is the mortality rate under 5.
I tried:

ttest ihme_fmort if p_polity2<=5| p_polity2>6, by( p_polity2)

but it does not work; the error is: more than 2 groups found, only 2 allowed.
How can I do that?
thanks a lot!
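
The error arises because by() needs a variable with exactly two groups. A minimal sketch of one way around it is to collapse p_polity2 into a two-group indicator first. The cutoff at 5 and the name democracy are assumptions made for illustration.

```stata
* Sketch: the cutoff at 5 is an assumption, not part of the original data.
gen byte democracy = p_polity2 > 5 if !missing(p_polity2)

* by() now sees exactly two groups (0 = autocratic, 1 = democratic).
ttest ihme_fmort, by(democracy)
```

The ttest output reports one-sided as well as two-sided p-values, so the directional hypothesis about autocratic regimes can be read from the same table.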

↧

infix Dictionary issues

March 23, 2013, 9:48 pm

I'm having an unexpected problem with a dictionary I created to use with infix.

I'm using Stata 12.1 for 64-bit Windows.

Here is my dictionary:
I've replaced my file path.

infix dictionary using "MyData.dat" {
str state 1-2
q1-q4 17-20
q5 21-28
q6 29-32
q7 33-38
q8-q86 39-117
qn8-qn86 185-263
qnfrcig 350
qnanytob 351
qndepo 352
qndepopl 353
qndual 354
qnfrvg 355
qnfrvg2 356
qnfruit 357
qnveg 358
qndlype 359
qnpa0day 360
qnpa7day 361
qnowt 362
qnobese 363
weight 364-373
stratum 374-376
bmipct 383-387
raceeth 389-389
}

This is the response I get in the log.
...
stratum 374-376
bmipct 383-387
raceeth 389-389
dictionary invalid

I feel that this should be a simple issue to resolve, but I have not been able to.
Thank you for any help
-Dave

↧

Adding values for one variable given similar observations of another variable

March 24, 2013, 9:49 am

I am wondering how to add observations for var1 given that var2 and var3 have the same value for each observation.

For example:

var2 var3 var1
4 1/2/03 5.3
4 1/2/03 12.1
4 1/2/04 10.0
5 1/2/03 1.7

In this case, I want to add 5.3 + 12.1 since the same value of var2 corresponds to the same date in var3.

How would I execute this?
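
A minimal sketch of one way to do this, using the variable names from the example. The new variable name var1_total is hypothetical.

```stata
* Sum var1 within each combination of var2 and var3,
* keeping every original observation.
bysort var2 var3: egen var1_total = total(var1)

* Alternative that keeps one row per var2-var3 combination:
* collapse (sum) var1, by(var2 var3)
```

In the example, both rows with var2 = 4 and var3 = 1/2/03 would get var1_total = 17.4.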

↧

Creating new variable about relationship between observations

March 25, 2013, 4:23 pm

Hi Everybody,

I have a tricky question (well, I find it tricky) about creating a new variable. Just as an illustration, suppose I have the heights of 3 people; let's call them A, B and C. I want to create a variable which is the absolute difference in height between each pair of people. That is, I want to construct a variable which tells me Height(A) - Height(B), Height(A) - Height(C), Height(B) - Height(C). With more than 3 people, as you can see, it gets rather complicated.

I have been trying to construct this manually in Excel, but because it is a rather large dataset it has become a nightmare. Does anybody have any idea how I could code something like this? Thank you very much!

Nick
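
A minimal sketch of one way to build all pairs, assuming hypothetical variable names id (a unique person identifier) and height. It pairs the data set with a renamed copy of itself using cross.

```stata
* Sketch: id and height are assumed variable names; id must be unique.
preserve
rename (id height) (id_b height_b)
tempfile others
save `others'
restore

* Form every pairwise combination of observations.
cross using `others'

* Keep each unordered pair once and drop self-pairs.
keep if id < id_b
gen diff = abs(height - height_b)
```

The result has one row per pair, so N people give N(N-1)/2 rows. For a large data set that can grow quickly, which is worth checking before running it.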

↧

Homework Problem

March 25, 2013, 6:32 pm

Stata newbie here. Cannot figure out how to do the following...

Build a scale called Americanism, which measures a Latino respondent's belief that being an American embodies specific behaviors. More specifically, the scale will measure the extent to which one believes in a restrictive definition of an American. This scale should consist of the following variables: AMERBORN (that an American is born in the U.S.); AMERENGL (that an American speaks English); AMERWHIT (that an American is white); and AMERCHRT (that an American is a Christian). Each variable is measured from 1 to 3, with 1 equal to Not Important and 3 equal to Very Important.

Any help greatly appreciated

↧

Plotting least squares residuals

March 29, 2013, 6:00 am

[SOLVED] xx

↧

Log linear model (How to fit it on a graph?)

March 29, 2013, 6:31 am

What do you type to add a log-linear fitted line to a graph? For example, I know that the quadratic fit is qfit, but I don't know the code for the log-linear model.

I know the code is: twoway (scatter price sqft) (qfit price sqft)

What replaces qfit for a log-linear model?

regards
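
As far as I know there is no twoway plot type that fits ln(price) on sqft directly, so a minimal sketch of one workaround is to fit the model by hand and plot the back-transformed prediction. The new variable names below are hypothetical.

```stata
* Sketch: fit ln(price) on sqft, then plot the prediction in levels.
gen lnprice = ln(price)
regress lnprice sqft
predict lnprice_hat, xb
gen price_hat = exp(lnprice_hat)

twoway (scatter price sqft) (line price_hat sqft, sort)
```

Taking exp() of the fitted log value ignores any retransformation correction, so the curve is a rough guide to the fitted relationship rather than an estimate of mean price.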

↧

Need HELP with my model!! PLEASE

March 30, 2013, 9:07 am

Hi everyone,
I am stuck with my model. It has been more than two years since I last took an econometrics course and I have forgotten some parts...
So, here's the thing: I want to see the effect of farmland values on urban sprawl and, as a proxy for sprawl, I am using county-level "land permits" data for Kansas.
As explanatory variables, I have income, population density, farmland price and a dummy variable equal to 1 for a county adjacent to the metropolitan area and 0 otherwise.
When I run the model with the dummy, the dummy turns out to be insignificant, and the model seems good without it (R-squared, p-values and signs are as expected), BUT

1- My data are not normally distributed (but the sample is quite large)
2- hettest shows I have a heteroskedasticity problem.
3- My residuals are not normally distributed.
4- ovtest shows that my model has some omitted variables (I tried to add some others but they are not statistically significant! :( )

I know it's awful, but I have no idea WHERE I should start. I need some help.

Here are the regression results:

****
reg permits nilv income density dummy

      Source |       SS       df       MS              Number of obs =     105
-------------+------------------------------           F(  4,   100) =   45.51
       Model |  1642023.55     4  410505.888           Prob > F      =  0.0000
    Residual |   901959.21   100   9019.5921           R-squared     =  0.6455
-------------+------------------------------           Adj R-squared =  0.6313
       Total |  2543982.76   104  24461.3727           Root MSE      =  94.972

------------------------------------------------------------------------------
     permits |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        nilv |  -.0356831   .0147927    -2.41   0.018    -.0650313   -.0063349
      income |   .0052909   .0016778     3.15   0.002     .0019622    .0086197
     density |    .958824    .127327     7.53   0.000     .7062109    1.211437
       dummy |   31.06607   20.52525     1.51   0.133    -9.655439    71.78757
       _cons |  -186.5026   70.08598    -2.66   0.009    -325.5512   -47.45402
------------------------------------------------------------------------------

and here it is without dummy:

****
reg permits nilv income density

      Source |       SS       df       MS              Number of obs =     105
-------------+------------------------------           F(  3,   101) =   59.16
       Model |  1621361.06     3  540453.687           Prob > F      =  0.0000
    Residual |  922621.701   101  9134.86832           R-squared     =  0.6373
-------------+------------------------------           Adj R-squared =  0.6266
       Total |  2543982.76   104  24461.3727           Root MSE      =  95.577

------------------------------------------------------------------------------
     permits |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        nilv |  -.0343146   .0148591    -2.31   0.023     -.063791   -.0048382
      income |   .0052326   .0016881     3.10   0.003     .0018839    .0085813
     density |   .9651885   .1280682     7.54   0.000     .7111357    1.219241
       _cons |  -176.8083   70.23728    -2.52   0.013    -316.1401   -37.47641
------------------------------------------------------------------------------

** ovtest

Ramsey RESET test using powers of the fitted values of permits
Ho: model has no omitted variables
F(3, 98) = 22.88
Prob > F = 0.0000

** hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of permits

chi2(1) = 533.52
Prob > chi2 = 0.0000

* sktest r, noadj

                    Skewness/Kurtosis tests for Normality
                                                         ------- joint ------
    Variable |    Obs   Pr(Skewness)   Pr(Kurtosis)   chi2(2)    Prob>chi2
-------------+---------------------------------------------------------------
           r |    105      0.0000         0.0000        65.22       0.0000

Any idea where I should start?

↧

winsorising problem

April 2, 2013, 1:26 am

Hi guys

I installed the package "winsor" to winsorise my data set. Now I'd like to winsorise my data set so that all values lie between 0 and 1.
I have only found out how to winsorise a percentage or a certain amount of my observations, but not how I can set the value the data should be winsorised to.
Does anyone know how this can be done?
Thank you!

The command to winsorise by percentage would be:
winsor var, p(.01) gen(wvar)
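
Forcing values into a fixed range is clipping rather than percentile-based winsorising, so a minimal sketch can do it without the winsor package. The names var and wvar follow the command above; the bounds 0 and 1 come from the question.

```stata
* Sketch: set values below 0 to 0 and values above 1 to 1.
* The if condition keeps missing values missing, because
* max() and min() would otherwise ignore the missing argument.
gen wvar = min(max(var, 0), 1) if !missing(var)
```

Values already between 0 and 1 are left unchanged.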

↧

dealing with indicator variables in STATA

April 2, 2013, 10:49 am

Hi all, I am trying to analyze a data set with a variable for total years of education.

The variable is partially an indicator variable and is set up like this:

5 years or less education = 1
5 to 8 years education = 2
9 years of education = 3
10 years of education = 4
11 years of education = 5
12 years of education = 6
13 years of education = 7
14 to 17 years education = 8
18 or more years = 9

I am trying to find a way to regress years of education against salary, but to do that I need to generate a new variable. I'm not sure how to proceed with this. Any thoughts?
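
One option that avoids inventing a years value for the grouped categories is to treat the codes as categories. A minimal sketch, assuming the coded variable is called educ and the outcome is called salary (both names are hypothetical):

```stata
* Sketch: educ holds the codes 1-9 listed above; salary is the outcome.
* i.educ creates one indicator per category, with code 1 as the base.
regress salary i.educ
```

Each coefficient is then the average salary difference relative to the lowest education category. Recoding the categories to approximate years would be an alternative, but it needs assumed values for the open-ended groups.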

↧

Blundell-Bond estimation of dynamic panel data in Stata 12

April 4, 2013, 7:40 am

Hello everyone,

For my master's thesis, I need to estimate the following equation with the Blundell-Bond GMM method:

Leverage(i,t) = logtotalassets(i,t-1) + intangibles(i,t-1) + dividends(i,t-1) + r&dexpenditures(i,t-1) + leverage(i,t-1)

As I am working with panel data, I would also like to include cross-sectional fixed effects.

I know I must use the command 'xtdpdsys', but can anyone tell me which expression I need to use to estimate my equation?

Thanks a lot in advance.
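
A minimal sketch of what the call could look like, under several assumptions: firm and year are hypothetical panel and time identifiers, the short regressor names are hypothetical, and the lagged regressors are treated as strictly exogenous.

```stata
* Sketch: firm, year and the short variable names are hypothetical.
* "r&d" is not a legal Stata variable name, so it is called rd here.
xtset firm year

* lags(1) adds one lag of the dependent variable (leverage at t-1).
xtdpdsys leverage L.logta L.intang L.dividends L.rd, lags(1) twostep vce(robust)
```

The estimator removes the firm-level effects itself, so no separate fixed-effects option is needed. If some regressors should be treated as predetermined or endogenous, the pre() and endogenous() options would be the place to declare that.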

↧

Regression: why aren't all observations included?

April 6, 2013, 2:02 am

Hi everybody

I ran a simple OLS regression in Stata on a panel dataset (group: years, individuals: firms). I have a total of 117960 observations, but the regression uses only 97365 of them. It's clear that some observations aren't included in the regression because some values are missing. So I counted how many observations have missing values and therefore can't be included: there are 11369 of them, which should leave 106591 observations in the regression.
So why are there around 10,000 more observations which aren't included in the regression?
Did I forget to set something?

If someone could help me I would be very grateful!

Ps: I did the regression as follows:
regress cr wmsdcf_at wmtbr realsi wcf_at wsnwc wcapxr wlev wrrdsa divdum wacac if fyear>=1980 & fyear<=2006, vce(robust)
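
A minimal sketch of one way to reconcile the counts: the regression also applies the fyear restriction, so the count of complete observations should use the same condition. The variable name nmiss is hypothetical.

```stata
* Number of missing values per observation across the model variables.
egen nmiss = rowmiss(cr wmsdcf_at wmtbr realsi wcf_at wsnwc wcapxr wlev wrrdsa divdum wacac)

* Complete observations overall, then within the estimation period.
count if nmiss == 0
count if nmiss == 0 & fyear >= 1980 & fyear <= 2006
```

If the second count matches 97365, the gap comes from observations outside 1980-2006 or with a missing fyear.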

↧

likelihood ratio test for SUR model in stata?

April 7, 2013, 3:39 am

Could anyone please tell me how to do a likelihood ratio test for a sureg model? For instance, I run constrained and unconstrained models with the 'sureg' command and I only get chi2 values. Where would I get the log-likelihood values to construct a likelihood ratio?

↧

Looking for help! Reshaping my database

April 7, 2013, 4:45 am

Hello everyone,
I am having trouble reshaping my data. The data currently look as follows:

Firm Variables 1990 1991 1992 etc.
x-----1
x-----2
x-----3
y-----1
y-----2
y-----3
z-----1
z-----2
z-----3

I would like the data to look as follows:

Firm Year Variable1 Variable2 Variable3
x---1990
x---1991
x---1992
y---1990
y---1991

I started off with the command: reshape long y, i(Name Variable) j(Year)
Now I have all my data long. Like this:

Firm Year Variables Data
x---1990----1
x---1991----2
x---1992----3
y---1990----1
y---1991----2
y---1992----3

Next I want to make my variables wide. I have already encoded the variable names, because these are string variables, and used the command: reshape wide Encoded_Variables, i(Name Year y) j(Variables). But when I do this it scrambles my database: it does not correctly assign my data across variables, firms and years. The variable columns are correct, but the variable names also appear as values in the data itself. Like this:

Firm Year Data Variable1 Variable2 Variable3
x---1990---------V1-----V2
x---1991---------V1-----V2
x---1992---------V1-----V2
y---1990
y---1991---------V1-----V3

If anybody could help, it would really be appreciated.
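
A minimal sketch of the two-step reshape, assuming the year columns are named y1990, y1991, ... and that Name and Variables together identify each row of the original layout. The key difference from the attempt above is that the stub in reshape wide is the data variable y, while the variable labels go in j().

```stata
* Step 1: years from columns to rows; y holds the data values.
reshape long y, i(Name Variables) j(Year)

* Step 2: one column per entry of Variables, one row per firm-year.
* The string option is needed only if Variables is a string variable.
reshape wide y, i(Name Year) j(Variables) string
```

The new columns are named y followed by each value of Variables, so those values need to be usable as part of a variable name. With an encoded numeric version of Variables, the same command would be used without the string option.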

↧

heteroskedasticity in fixed effect

April 7, 2013, 8:29 am

Hi guys,

I have recently been testing for heteroskedasticity in a fixed-effects model using xttest3. I found Prob>chi2 = 1.0000. Can I assume my model does not have a heteroskedasticity problem because Prob>chi2 is higher than 10%?

Thanks,

↧

STATA help with variable renaming!

April 8, 2013, 7:56 am

Hello Stata crew,

I've encountered a peculiar problem and can't find a good fix. In short, I'm running surveys, exporting the data to Excel, and then appending to my Stata master file on a regular basis using insheet or the import function (I'm using Stata 12). I do this on a weekly basis and have had no problems. Until now. I just installed Office 365 and was greeted with the following message when I tried to import the latest Excel file:

Unable to load excel file
Error: Mandatory element missing

A quick google search shows that a few other people have had this issue, but no one reports a fix.

No problem, I thought, I'll either save as a comma-delimited file or copy and paste into the data editor. Both get the data into Stata, but both have the same problem -- when I specify that the first line of data should be saved as variable names, all of the variable names are changed to lower case. In my original data set, the first letter of each variable name is capitalized and the rest is lower case (ex: Bio1). This means that I can't append the data sets, because the variable names don't match (Stata saves Bio1 as bio1).

I have several hundred columns (due to the way Qualtrics handles survey experiments and randomization), so renaming each variable by hand would take a really long time, and I'd need to do it each time I import new data. So, two questions:

1. Any idea why I can't import the Excel file properly anymore? I can't find an explanation for the problem either.
2. Any idea how I can import the data manually (copy and paste or csv), but keep Stata from changing the variable names to all lower case?

Really appreciate any help! I'm far away from the office at the moment, so there are no doors for me to knock on!
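
On the second question, a minimal sketch of the csv route, assuming the export is saved under the placeholder name survey.csv. To my knowledge insheet has a case option that keeps the capitalisation of the names in the first row, and import excel has a matching case(preserve) option.

```stata
* Sketch: file names are placeholders.
* The case option keeps variable names as written in the header row.
insheet using "survey.csv", comma case clear

* Equivalent for a workbook, if the file can be read:
* import excel using "survey.xlsx", firstrow case(preserve) clear
```

With the names preserved, Bio1 stays Bio1 and the append to the master file should line up as before.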

↧