Employment status of women 40+ on the Sunshine Coast, Qld
R
Summary: among women aged 40 or over who live on the Sunshine Coast and are employed by a company they do not own, the most common age group is 40-44 years. In addition, a woman over 40 who lives on the Sunshine Coast is likely to be working full time.
Data source: Australian Bureau of Statistics (ABS) 2022, 2021 Census of Population and Housing, working population profile of the Sunshine Coast. https://www.abs.gov.au/census/find-census-data/community-profiles/2021/316
Descriptive statistics & mode
> full_time<-read.table("Full_Time.csv", header=TRUE, sep=',')
> barplot(full_time[ ,2],
+ names.arg=full_time[ ,1],
+ col=terrain.colors(8),
+ main="Employment Status of Women over 40 on the Sunshine Coast, Qld")
> df<-read.table("Employment_status.csv", header=TRUE, sep=',')
> getmode<- function(full){
+ unique<-unique(full)
+ unique[which.max(tabulate(match(full,unique)))]}
> result<-getmode(full_time[ ,2])
> print(result)
[1] 3803
> age<-df$X
> result<-getmode(age)
> print(result)
[1] "40-44 years"
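The behaviour of the mode helper can be checked on small vectors. The sketch below restates the same function with a differently named local variable and runs it on two inputs: `ages` is a made-up vector for illustration only, and `counts` holds the six Worked.full.time values printed by head(input) in the bivariate section. When every value occurs exactly once, as in `counts`, which.max() returns the first position, so the function simply returns the first element.

```r
# Mode helper: most frequent value; ties are resolved in favour of the
# value that appears first in the input
getmode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical age labels, for illustration only
ages <- c("40-44 years", "45-49 years", "40-44 years", "50-54 years")

# The six Worked.full.time values shown by head(input); all are distinct
counts <- c(3803, 4624, 4548, 3678, 2106, 677)

mode_age <- getmode(ages)      # "40-44 years" occurs twice
mode_count <- getmode(counts)  # no repeats, so the first value is returned

print(mode_age)
print(mode_count)
```

This suggests reading a mode of 3803 with care: for a column of distinct counts it identifies the first row rather than a value that repeats.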
Bivariate analysis: is there a correlation between being an employee and working full time?
> scatter<-read.csv("Master_data.csv")
> input<-scatter[,c('Employee', 'Worked.full.time')]
> print(head(input))
  Employee Worked.full.time
1     6978             3803
2     7866             4624
3     7636             4548
4     6917             3678
5     4737             2106
6     1783              677
> plot(x=input$Employee, y=input$Worked.full.time,
+ xlab="Employee",
+ ylab="Worked Full Time",
+ main="If they work for a company, how many work full time")
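The scatter plot shows the relationship visually; a correlation coefficient puts a number on it. The sketch below is a minimal example that uses only the six rows printed by head(input), re-entered by hand in a data frame named `input_head` (a name chosen here for illustration), so its result will differ slightly from one computed on all eight rows. For the full data, the Multiple R-squared of 0.9831 reported in the regression section corresponds to a correlation of about 0.99, because R-squared equals the squared correlation when there is a single predictor.

```r
# First six rows as printed by head(input); the full data set has eight rows
input_head <- data.frame(
  Employee = c(6978, 7866, 7636, 6917, 4737, 1783),
  Worked.full.time = c(3803, 4624, 4548, 3678, 2106, 677)
)

# Pearson correlation (the default method of cor)
r <- cor(input_head$Employee, input_head$Worked.full.time)

print(r)
print(r^2)  # squared correlation, comparable to R-squared from lm()
```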
K-means clustering
> set.seed(111)
> scatter_scaled<-scale(scatter)
> scatter_kmeans <- kmeans(scatter_scaled, 3, nstart=25)
> scatter_kmeans
K-means clustering with 3 clusters of sizes 3, 4, 1
Cluster means:
     Employee Worked.full.time
1 -1.14500412       -1.1053132
2  0.84530726        0.8738486
3  0.05378333       -0.1794545
Clustering vector:
[1] 2 2 2 2 3 1 1 1
Within cluster sum of squares by cluster:
[1] 0.2019506 0.2520749 0.0000000
(between_SS / total_SS = 96.8 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size" "iter" "ifault"
> library(factoextra)
> fviz_cluster(scatter_kmeans,
+ data = scatter_scaled,
+ palette = c("#4C424D", "#00AFBB", "#ffd700"),
+ geom = "point",
+ ellipse.type = "convex",
+ ggtheme = theme_minimal())
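The between_SS / total_SS figure in the output above is the share of the total variation that is explained by the cluster assignment. The sketch below shows where that number comes from, using base R only. It is an illustration under stated assumptions: it uses just the six rows printed by head(input), held in a data frame named `input_head` for this example, and two clusters instead of three, so its values will not match the transcript.

```r
# First six rows as printed by head(input); the full data set has eight rows
input_head <- data.frame(
  Employee = c(6978, 7866, 7636, 6917, 4737, 1783),
  Worked.full.time = c(3803, 4624, 4548, 3678, 2106, 677)
)

set.seed(111)

# Centre each column and divide by its standard deviation, so both
# variables contribute equally to the distance calculation
scaled <- scale(input_head)

# Two clusters here (an arbitrary choice for six points); nstart = 25
# repeats the algorithm from 25 random starts and keeps the best result
km <- kmeans(scaled, centers = 2, nstart = 25)

# Share of total variation explained by the clustering,
# the quantity printed as between_SS / total_SS
explained <- km$betweenss / km$totss

print(km$size)
print(round(100 * explained, 1))
```

Because the data are scaled, the total sum of squares is fixed at (number of rows - 1) x (number of columns), so only the split between within-cluster and between-cluster variation changes with the number of clusters.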
Linear regression & predictions
> linear<-lm(Employee ~ Worked.full.time, data = scatter)
> linear
Call:
lm(formula = Employee ~ Worked.full.time, data = scatter)
Coefficients:
     (Intercept)  Worked.full.time
         444.005             1.675
> summary(linear)
Call:
lm(formula = Employee ~ Worked.full.time, data = scatter)
Residuals:
    Min      1Q  Median      3Q     Max
-432.82 -350.49  -47.71  231.35  764.71
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      444.0048   274.9262   1.615    0.157
Worked.full.time   1.6753     0.0898  18.656 1.53e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 464.1 on 6 degrees of freedom
Multiple R-squared: 0.9831, Adjusted R-squared: 0.9802
F-statistic: 348 on 1 and 6 DF, p-value: 1.531e-06
> abline(linear, col="purple")
> predict(linear, data.frame(Worked.full.time=6))
1
454.0569
> abline(v=0.8, col="green")
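The value returned by predict() can be reproduced by hand from the fitted line, Employee = intercept + slope x Worked.full.time. The sketch below uses the rounded coefficients printed by summary(linear), so it agrees with predict() only to about three decimal places. The second input, 3000, is an arbitrary illustrative value chosen to fall inside the range of the rows shown by head(input).

```r
# Coefficients as printed by summary(linear), rounded to four decimals
intercept <- 444.0048
slope <- 1.6753

# Manual prediction for the value used in the transcript
manual_6 <- intercept + slope * 6

# Same calculation for a value within the observed range
# (3000 is an arbitrary choice for illustration)
manual_3000 <- intercept + slope * 3000

print(manual_6)
print(manual_3000)
```

Note that abline(linear) draws Employee as a function of Worked.full.time, so it lines up with a scatter plot only when Worked.full.time is on the horizontal axis and Employee on the vertical axis.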