PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0
Showing posts with label linear-regression. Show all posts
Showing posts with label linear-regression. Show all posts

Sunday, October 9, 2022

[FIXED] Why is my variables linearly dependent? Regression, Diff-n-diff, interaction term and dummies

 October 09, 2022     linear-regression, r, regression, statistics     No comments   

Issue

I've created a small dataframe for testing differences-in-differences, in order to gain some intuition about the method and theory. I guess I have two questions.

  1. Why is the correlation between free_cookies and free_cookies*teenager = 1?
  2. Is there a way to fix the data so that the regression lm(cookies_eaten ~ teenager + free_cookies + teenager*free_cookies, data), does not drop the interaction term(free_cookies*teenager)?

It should be possible to run a regression with the format

outcome ~ dummy1 + dummy2 + dummy1*dummy2

and get coefficient estimates for all independent variables, which I've seen work elsewhere. To be clear: teenager and free_cookies are dummy variables. I'm guessing I've just done something silly when I constructed my sample data.

# cookie eating data
data <- read.table(text = "

year    cookies_eaten   teenager    free_cookies
2000    110 1   0
2001    110 1   0
2002    120 1   0
2003    120 1   0
2004    125 1   0
2005    125 1   0
2006    125 1   0
2007    145 1   1
2008    155 1   1
2009    160 1   1
2010    160 1   1
2000    100 0   0
2001    100 0   0
2002    110 0   0
2003    110 0   0
2004    115 0   0
2005    115 0   0
2006    115 0   0
2007    115 0   0
2008    115 0   0
2009    120 0   0
2010    120 0   0", header=TRUE)


# Regressions
one <- lm(cookies_eaten ~ teenager, data)
summary(one)

two <- lm(cookies_eaten ~ teenager + free_cookies, data)
summary(two)

three <- lm(cookies_eaten ~ teenager + free_cookies + teenager*free_cookies, data)
summary(three) # Coefficients: (1 not defined because of singularities)

# four without free_cookies
four <- lm(cookies_eaten ~ teenager + teenager*free_cookies, data)
summary(four) # Coefficients: (1 not defined because of singularities)

# Corrolation testing
attach(data)
cor(free_cookies, free_cookies*teenager, method = c("pearson", "kendall", "spearman"))
# = 1
cor(cookies_eaten, free_cookies*teenager, method = c("pearson", "kendall", "spearman"))
# = 0.9090648
detach(data)


Solution

Looking at the data one can easily see that whenever teenager == 0 there is also free_cookies==0 So these data are in perfect alignment. When teenager==1 every value of free_cookies is multiplied by 1 so that does not change anything on free_cookies so that is why free_cookies and teenager times free_cookies is always the same value so the correlation is 1. With these data you cannot investigate interactions. You need to sample some data where teenager == 0 and free_cookies ==1.

data <- read.table(text = "
year    cookies_eaten   teenager    free_cookies
2000    110 1   0
2001    110 1   0
2002    120 1   0
2003    120 1   0
2004    125 1   0
2005    125 1   0
2006    125 1   0
2007    145 1   1
2008    155 1   1
2009    160 1   1
2010    160 1   1
2000    100 0   0
2001    100 0   0
2002    110 0   0
2003    110 0   0
2004    115 0   0
2005    115 0   0
2006    115 0   0
2007    115 0   0
2008    115 0   0
2009    120 0   0
2010    120 0   0", header=TRUE)

data$interaction <- data$teenager * data$free_cookies

print(data[, c("free_cookies", "interaction")])

any(data$free_cookies != data$interaction)


Answered By - Bernhard
Answer Checked By - Willingham (PHPFixing Volunteer)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

Saturday, October 8, 2022

[FIXED] How to force zero interception in linear regression?

 October 08, 2022     linear-regression, numpy, python, scipy, statistics     No comments   

Issue

I have some more or less linear data of the form:

x = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 20.0, 40.0, 60.0, 80.0]
y = [0.50505332505407008, 1.1207373784533172, 2.1981844719020001, 3.1746209003398689, 4.2905482471260044, 6.2816226678076958, 11.073788414382639, 23.248479770546009, 32.120462301367183, 44.036117671229206, 54.009003143831116, 102.7077685684846, 185.72880217806673, 256.12183145545811, 301.97120103079675]

I am using scipy.optimize.leastsq to fit a linear regression to this:

def lin_fit(x, y):
    '''Fits a linear fit of the form mx+b to the data'''
    fitfunc = lambda params, x: params[0] * x + params[1]    #create fitting function of form mx+b
    errfunc = lambda p, x, y: fitfunc(p, x) - y              #create error function for least squares fit

    init_a = 0.5                            #find initial value for a (gradient)
    init_b = min(y)                         #find initial value for b (y axis intersection)
    init_p = numpy.array((init_a, init_b))  #bundle initial values in initial parameters

    #calculate best fitting parameters (i.e. m and b) using the error function
    p1, success = scipy.optimize.leastsq(errfunc, init_p.copy(), args = (x, y))
    f = fitfunc(p1, x)          #create a fit with those parameters
    return p1, f    

And it works beautifully (although I am not sure if scipy.optimize is the right thing to use here, it might be a bit over the top?).

However, due to the way the data points lie it does not give me a y-axis interception at 0. I do know though that it has to be zero in this case, if x = 0 than y = 0.

Is there any way I can force this?


Solution

I am not adept at these modules, but I have some experience in statistics, so here is what I see. You need to change your fit function from

fitfunc = lambda params, x: params[0] * x + params[1]  

to:

fitfunc = lambda params, x: params[0] * x 

Also remove the line:

init_b = min(y) 

And change the next line to:

init_p = numpy.array((init_a))

This should get rid of the second parameter that is producing the y-intercept and pass the fitted line through the origin. There might be a couple more minor alterations you might have to do for this in the rest of your code.

But yes, I'm not sure if this module will work if you just pluck the second parameter away like this. It depends on the internal workings of the module as to whether it can accept this modification. For example, I don't know where params, the list of parameters, is being initialized, so I don't know if doing just this will change its length.

And as an aside, since you mentioned, this I actually think is a bit of an over-the-top way to optimize just a slope. You could read up linear regression a little and write small code to do it yourself after some back-of-the envelope calculus. It's pretty simple and straightforward, really. In fact, I just did some calculations, and I guess the optimized slope will just be <xy>/<x^2>, i.e. the mean of x*y products divided by the mean of x^2's.



Answered By - Abhranil Das
Answer Checked By - Cary Denson (PHPFixing Admin)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

Friday, October 7, 2022

[FIXED] How do I find the estimated decay constants from my linear mixed effects model?

 October 07, 2022     linear-regression, lme4, mixed-models, r, statistics     No comments   

Issue

I am trying to figure out how to find the estimated slope for my linear mixed effects model, where I am looking at the decay rate of log-transformed copies of DNA/RNA over time. My lmer model is as follows:

model1 <- lmer(log.copies ~ 0 + sample + hours + sample:hours + 
                (1 | uniqueID), data)

I am trying to find the slope estimate (the decay rate) of each sample, but I am not sure how to find that information. Any insight would be highly appreciated.

Here is the data frame for the model:

uniqueID <- c('4a', '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c',  '4a',   '4b',   '4c',   '7a',   '7b',   '7c',   '10a',  '10b',  '10c')
hours <- c('0', '0',    '0',    '0',    '0',    '0',    '0',    '0',    '0',    '1',    '1',    '1',    '1',    '1',    '1',    '1',    '1',    '1',    '3',    '3',    '3',    '3',    '3',    '3',    '3',    '3',    '3',    '5',    '5',    '5',    '5',    '5',    '5',    '5',    '5',    '5',    '12',   '12',   '12',   '12',   '12',   '12',   '12',   '12',   '12',   '24',   '24',   '24',   '24',   '24',   '24',   '24',   '24',   '24',   '48',   '48',   '48',   '48',   '48',   '48',   '48',   '48',   '48',   '96',   '96',   '96',   '96',   '96',   '96',   '96',   '96',   '96',   '168',  '168',  '168',  '168',  '168',  '168',  '168',  '168',  '168',  '336',  '336',  '336',  '336',  '336',  '336',  '336',  '336',  '336',  '504',  '504',  '504',  '504',  '504',  '504',  '504',  '504',  '504',  '720',  '720',  '720',  '720',  '720',  '720',  '720',  '720',  '720',  '0',    '0',    '0',    '0',    '0',    '0',    '0',    '0',    '0',    '1',    '1',    '1',    '1',    '1',    '1',    '1',    '1',    '1',    '3',    '3',    '3',    '3',    '3',    '3',    '3',    '3',    '3',    '5',    '5',    '5',    '5',    '5',    '5',    '5',    '5',    '5',    '12',   '12',   '12',   '12',   '12',   '12',   '12',   '12',   '12',   '24',   '24',   '24',   '24',   '24',   '24',   '24',   '24',   '24',   '48',   '48',   '48',   '48',   '48',   '48',   '48',   '48',   '48',   '96',   '96',   '96',   '96',   '96',   '96',   '96',   '96',   '96',   '168',  '168',  '168',  '168',  '168',  '168',  '168',  '168',  '168',  '336',  '336',  '336',  '336',  '336',  '336',  '336',  '336',  '336',  '504',  '504',  '504',  '504',  '504',  '504',  '504',  '504',  '504',  '720',  '720',  '720',  '720',  '720',  '720',  '720',  '720',  '720',)
log.copies <- c('1.45521893628741', '1.45521893628741', '1.45521893628741', '1.45521893628741', '1.45521893628741', '1.45521893628741', '1.45521893628741', '1.45521893628741', '1.45521893628741', '1.19441603388691', '1.24355334723342', '1.40690849045049', '1.31002191752687', '1.39124834261194', '1.29235753729037', '1.11936230018844', '1.1374556403934',  '1.39758382302761', '1.68018850963636', '1.25191207562977', '1.23641746512146', '1.39679718172058', '1.38090501093842', '1.24605609103987', '1.21030026698775', '1.19544292124108', '1.17601950399624', '1.31273309142159', '1.26185528648986', '1.25966627646237', '1.31215362175609', '1.1797649397335',  '1.06962991082736', '1.06648871526762', '1.01999877463483', '1.0713647526277',  '1.33727512803554', '1.32714821681501', '1.36095600888816', '1.20806814685821', '1.01506836010089', '1.16512723872603', '1.26641479125218', '1.09785021474546', '1.13833983315088', '1.23033598728608', '1.40813602545636', '1.4646125625919',  '1.00258079576351', '1.17601950399624', '1.06788775208508', '1.04454452321033', '1.05059619746453', '1.03045177956328', '1.37430638458471', '1.41452140821272', '1.75913091767113', '0.77346280655772', '0.767002923329097',    '1.01188172732087', '0.74623404934399', '0.944256557003784',    '1.05150543231433', '1.2251192503473',  '1.29174874539612', '1.43997134954361', '-0.0296134397450946',  '-0.286448979024821',   '-0.156823792139869',   '0.839166900510674',    '0.724365376990744',    '0.678610613822442',    '0.786777342262311',    '1.29478377172835', '1.37297391495997', '-0.82434417051388',    '-0.82434417051388',    '-0.496558364627612',   '0.0829498390130417',   '0.427489632415672',    '0.158891649802325',    '1.49438314328631', '0.729149969776687',    '0.819893243439029',    '-0.82434417051388',    '-1.1900799975662', '-1.1900799975662', '-0.717704531413791',   '-0.398265926539503',   '-0.362000706739917',   '-0.196897956423786',   '0.64307632093276', '0.675913769256037',    '-0.717704531413791',   '-1.1900799975662', '-1.1900799975662', '0.0555326061931463',   '-0.822304338094534',   '-0.507174381784299',   '0.180394322691316',    '0.640136342503974',    '0.393190666987189',    '-1.1900799975662', '-0.967508136400586',   '-0.967508136400586',   '-0.823663106508838',   '-0.967990407588446',   '-0.967026427628234',   '0.171315600631744',    '0.171315600631744',    '0.171315600631744',    '0.171315600631744',    '0.171315600631744',    '0.171315600631744',    '0.171315600631744',    '0.171315600631744',    '0.171315600631744',    '-0.386292183084873',   '-0.0999771866567739',  '0.0867221503952878',   '-0.525277163372224',   '0.00383206432719449',  '-0.0999771866567739',  '-0.199592846032811',   '0.047352829483974',    '0.408978520878843',    '-0.386292183084873',   '-0.617450004710004',   '-0.525277163372224',   '-0.0714786306395753',  '-0.163651471977355',   '0.171315600631744',    '0.067506349647776',    '-0.0448200118538059',  '0.047352829483974',    '-0.617450004710004',   '-0.617450004710004',   '-0.525277163372224',   '-0.331135008281905',   '0.0261653532722218',   '-0.0197780887736235',  '-0.0448200118538059',  '-0.163651471977355',   '-0.0197780887736235',  '-0.282482932100905',   '-0.163651471977355',   '-0.903765001138104',   '-0.331135008281905',   '-0.199592846032811',   '-0.238962166944125',   '-0.525277163372224',   '-0.386292183084873',   '-0.386292183084873',   '-1.1900799975662', '-0.903765001138104',   '-0.903765001138104',   '-1.1900799975662', '-1.1900799975662', '-0.903765001138104',   '-0.903765001138104',   '-0.903765001138104',   '-1.1900799975662', '-1.1900799975662', '-0.903765001138104',   '-1.1900799975662', '-1.1900799975662', '-0.903765001138104',   '-0.903765001138104',   '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-0.903765001138104',   '-0.903765001138104',   '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-0.903765001138104',   '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-0.903765001138104',   '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-0.903765001138104',   '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-0.903765001138104',   '-0.903765001138104',   '-0.903765001138104',   '-1.1900799975662', '-0.903765001138104',   '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-1.1900799975662', '-0.903765001138104',   '-1.1900799975662')
sample <- c('4eDNA',    '4eDNA',    '4eDNA',    '7eDNA',    '7eDNA',    '7eDNA',    '10eDNA',   '10eDNA',   '10eDNA',   '4eDNA',    '4eDNA',    '4eDNA',    '7eDNA',    '7eDNA',    '7eDNA',    '10eDNA',   '10eDNA',   '10eDNA',   '4eDNA',    '4eDNA',    '4eDNA',    '7eDNA',    '7eDNA',    '7eDNA',    '10eDNA',   '10eDNA',   '10eDNA',   '4eDNA',    '4eDNA',    '4eDNA',    '7eDNA',    '7eDNA',    '7eDNA',    '10eDNA',   '10eDNA',   '10eDNA',   '4eDNA',    '4eDNA',    '4eDNA',    '7eDNA',    '7eDNA',    '7eDNA',    '10eDNA',   '10eDNA',   '10eDNA',   '4eDNA',    '4eDNA',    '4eDNA',    '7eDNA',    '7eDNA',    '7eDNA',    '10eDNA',   '10eDNA',   '10eDNA',   '4eDNA',    '4eDNA',    '4eDNA',    '7eDNA',    '7eDNA',    '7eDNA',    '10eDNA',   '10eDNA',   '10eDNA',   '4eDNA',    '4eDNA',    '4eDNA',    '7eDNA',    '7eDNA',    '7eDNA',    '10eDNA',   '10eDNA',   '10eDNA',   '4eDNA',    '4eDNA',    '4eDNA',    '7eDNA',    '7eDNA',    '7eDNA',    '10eDNA',   '10eDNA',   '10eDNA',   '4eDNA',    '4eDNA',    '4eDNA',    '7eDNA',    '7eDNA',    '7eDNA',    '10eDNA',   '10eDNA',   '10eDNA',   '4eDNA',    '4eDNA',    '4eDNA',    '7eDNA',    '7eDNA',    '7eDNA',    '10eDNA',   '10eDNA',   '10eDNA',   '4eDNA',    '4eDNA',    '4eDNA',    '7eDNA',    '7eDNA',    '7eDNA',    '10eDNA',   '10eDNA',   '10eDNA',   '4eRNA',    '4eRNA',    '4eRNA',    '7eRNA',    '7eRNA',    '7eRNA',    '10eRNA',   '10eRNA',   '10eRNA',   '4eRNA',    '4eRNA',    '4eRNA',    '7eRNA',    '7eRNA',    '7eRNA',    '10eRNA',   '10eRNA',   '10eRNA',   '4eRNA',    '4eRNA',    '4eRNA',    '7eRNA',    '7eRNA',    '7eRNA',    '10eRNA',   '10eRNA',   '10eRNA',   '4eRNA',    '4eRNA',    '4eRNA',    '7eRNA',    '7eRNA',    '7eRNA',    '10eRNA',   '10eRNA',   '10eRNA',   '4eRNA',    '4eRNA',    '4eRNA',    '7eRNA',    '7eRNA',    '7eRNA',    '10eRNA',   '10eRNA',   '10eRNA',   '4eRNA',    '4eRNA',    '4eRNA',    '7eRNA',    '7eRNA',    '7eRNA',    '10eRNA',   '10eRNA',   '10eRNA',   '4eRNA',    '4eRNA',    '4eRNA',    '7eRNA',    '7eRNA',    '7eRNA',    '10eRNA',   '10eRNA',   '10eRNA',   '4eRNA',    '4eRNA',    '4eRNA',    '7eRNA',    '7eRNA',    '7eRNA',    '10eRNA',   '10eRNA',   '10eRNA',   '4eRNA',    '4eRNA',    '4eRNA',    '7eRNA',    '7eRNA',    '7eRNA',    '10eRNA',   '10eRNA',   '10eRNA',   '4eRNA',    '4eRNA',    '4eRNA',    '7eRNA',    '7eRNA',    '7eRNA',    '10eRNA',   '10eRNA',   '10eRNA',   '4eRNA',    '4eRNA',    '4eRNA',    '7eRNA',    '7eRNA',    '7eRNA',    '10eRNA',   '10eRNA',   '10eRNA',   '4eRNA',    '4eRNA',    '4eRNA',    '7eRNA',    '7eRNA',    '7eRNA',    '10eRNA',   '10eRNA',   '10eRNA')

What I'd like to find now is the slope of each of the "samples", e.g. the slope of the log.copies of 4eRNA against hours, etc.

Here's an example of what I'm looking for (these are just dummy values)

1


Solution

After cleaning up the data a bit (converting to numeric):

d <- data.frame(log.copies, sample, hours, uniqueID)
library(lme4)
## use `0 + sample + sample:hours` to fit intercept & slope
##  separately for each sample
model1 <- lmer(log.copies ~ 0 + sample + sample:hours + (1 | uniqueID), d)
## fixed-effect coefficient table
cc <- coef(summary(model1))
cc[grepl("hours", rownames(cc)),]

Results:

                       Estimate   Std. Error    t value
sample10eDNA:hours -0.003205399 0.0003059636 -10.476405
sample10eRNA:hours -0.001238915 0.0003059636  -4.049224
sample4eDNA:hours  -0.001525579 0.0003059636  -4.986143
sample4eRNA:hours  -0.001060511 0.0003059636  -3.466135
sample7eDNA:hours  -0.003824935 0.0003059636 -12.501274
sample7eRNA:hours  -0.001456568 0.0003059636  -4.760591

If you load lmerTest after or instead of lme4 your table will include df and Pr(>|t|) columns as in your example.

To get the specific table you're looking for, I think you want

emmeans::emtrends(model1, ~sample, var="hours")
 sample hours.trend       SE  df lower.CL  upper.CL
 10eDNA    -0.00321 0.000306 198 -0.00381 -0.002602
 10eRNA    -0.00124 0.000306 198 -0.00184 -0.000636
 4eDNA     -0.00153 0.000306 198 -0.00213 -0.000922
 4eRNA     -0.00106 0.000306 198 -0.00166 -0.000457
 7eDNA     -0.00382 0.000306 198 -0.00443 -0.003222
 7eRNA     -0.00146 0.000306 198 -0.00206 -0.000853

Note that the estimates are the same up to rounding as the previous table, although emmeans is more convenient because it computes CI for you and you can specify the model any way you want, e.g. as ~ sample*hours + (1|uniqueID), and still extract the same table.



Answered By - Ben Bolker
Answer Checked By - Robin (PHPFixing Admin)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Older Posts Home
View mobile version

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
All Comments
Atom
All Comments

Copyright © PHPFixing