Employment status of women 40+ on the Sunshine Coast, Qld

R

Summary: among women aged 40 or over who live on the Sunshine Coast and are employed by a company they do not own, the largest group is aged 40-44 years. Additionally, a woman over 40 who lives on the Sunshine Coast is likely to be working full time.

Data source: Australian Bureau of Statistics (ABS) 2022, 2021 Census of Population and Housing, working population profile of the Sunshine Coast. https://www.abs.gov.au/census/find-census-data/community-profiles/2021/316

Descriptive statistics & mode

> full_time<-read.table("Full_Time.csv", header=TRUE, sep=',')

> barplot(full_time[ ,2],
+         names.arg=full_time[ ,1],
+         col=terrain.colors(8),
+         main="Employment Status of Women over 40 on the Sunshine Coast, Qld")

> df<-read.table("Employment_status.csv", header=TRUE, sep=',')

> getmode<- function(full){
+   uniq<-unique(full)   # distinct values; renamed so it does not shadow unique()
+   uniq[which.max(tabulate(match(full,uniq)))]}   # most frequent value, first one on ties

> result<-getmode(full)

> print(result)

[1] 3803

> age<-df$X

> result<-getmode(age)

> print(result)

[1] "40-44 years"
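A note on reading the two mode results above, offered as a minimal sketch rather than part of the original analysis: when every value in a vector is distinct, each frequency is 1, so which.max() picks the first position and getmode() returns the first element. The example reuses the helper with the six Employee counts printed by head(input) later in this session. The names employee, mode_value and largest_row are illustrative only. If the aim is to find the row with the largest count, which.max() on the counts themselves may be closer to what is wanted.

```r
# Mode helper, equivalent to the one defined in the session above.
getmode <- function(v) {
  uniq <- unique(v)
  uniq[which.max(tabulate(match(v, uniq)))]
}

# First six Employee counts as printed by head(input); all values are distinct.
employee <- c(6978, 7866, 7636, 6917, 4737, 1783)

# Every value occurs once, so the "mode" is just the first element.
mode_value <- getmode(employee)
print(mode_value)    # 6978

# Position of the largest count, which is a different question from the mode.
largest_row <- which.max(employee)
print(largest_row)   # 2
```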

Bivariate analysis - is there a correlation between being an employee and working full time?

> master<-read.csv("Master_data.csv")

> scatter<-master[,c('Employee', 'Worked.full.time')]

> input<-scatter

> print(head(input))

  Employee Worked.full.time
1     6978             3803
2     7866             4624
3     7636             4548
4     6917             3678
5     4737             2106
6     1783              677

> plot(x=input$Employee, y=input$Worked.full.time,
+      xlab="Employee",
+      ylab="Worked Full Time",
+      main="If they work for a company, how many work full time")
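The scatter plot addresses the correlation question visually. One way to put a number on it is Pearson's correlation coefficient via cor(). The sketch below is an assumption about how that could be done, not a step from the original session. It uses only the six rows printed by head(input), so the value will differ from one computed on the full Master_data.csv. The names six_rows and r are illustrative.

```r
# The six rows shown by head(input); the full data set has more rows.
six_rows <- data.frame(
  Employee         = c(6978, 7866, 7636, 6917, 4737, 1783),
  Worked.full.time = c(3803, 4624, 4548, 3678, 2106, 677)
)

# Pearson correlation between the two counts (the default method for cor()).
r <- cor(six_rows$Employee, six_rows$Worked.full.time)
print(r)

# cor() is symmetric, so swapping the arguments gives the same value.
print(cor(six_rows$Worked.full.time, six_rows$Employee))
```

A value close to 1 would indicate that age groups with more employees also tend to have more full-time workers. With so few rows, the estimate should be treated as indicative only.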

K-means clustering

> set.seed(111)

> scatter_scaled<-scale(scatter)

> scatter_kmeans <- kmeans(scatter_scaled, 3, nstart=25)

> scatter_kmeans

K-means clustering with 3 clusters of sizes 3, 4, 1

Cluster means:

     Employee Worked.full.time
1 -1.14500412       -1.1053132
2  0.84530726        0.8738486
3  0.05378333       -0.1794545

Clustering vector:

[1] 2 2 2 2 3 1 1 1

Within cluster sum of squares by cluster:

[1] 0.2019506 0.2520749 0.0000000

(between_SS / total_SS = 96.8 %)

Available components:

[1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size" "iter" "ifault"

> library(factoextra)   # provides fviz_cluster()

> fviz_cluster(scatter_kmeans,
+              data = scatter_scaled,
+              palette = c("#4C424D", "#00AFBB", "#ffd700"),
+              geom = "point",
+              ellipse.type = "convex",
+              ggtheme = theme_minimal())
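The line "(between_SS / total_SS = 96.8 %)" in the k-means output is the share of total variation explained by the clustering, and it can be recomputed from the "betweenss" and "totss" components listed above. The sketch below illustrates this on the six rows printed by head(input), so its percentage will not match the 96.8 % obtained on the full data. The names pts, km and ratio are illustrative.

```r
# Six rows from head(input); the original run used the full 8-row data set.
pts <- data.frame(
  Employee         = c(6978, 7866, 7636, 6917, 4737, 1783),
  Worked.full.time = c(3803, 4624, 4548, 3678, 2106, 677)
)

set.seed(111)                      # same seed as the session, for repeatability
pts_scaled <- scale(pts)           # centre each column and divide by its sd
km <- kmeans(pts_scaled, centers = 3, nstart = 25)

# Share of total sum of squares that lies between clusters.
ratio <- km$betweenss / km$totss
print(round(100 * ratio, 1))

# totss = betweenss + tot.withinss, so the ratio is always between 0 and 1.
print(all.equal(km$totss, km$betweenss + km$tot.withinss))
```

After scale(), each column has a sum of squares of n - 1, so totss here is 2 * (6 - 1) = 10. With very few points and three clusters, a high ratio is expected almost regardless of the data.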

Linear regression & predictions

> linear<-lm(Employee ~ Worked.full.time, data = scatter)

> linear

Call:

lm(formula = Employee ~ Worked.full.time, data = scatter)

Coefficients:

     (Intercept)  Worked.full.time
         444.005             1.675

> summary(linear)

Call:

lm(formula = Employee ~ Worked.full.time, data = scatter)

Residuals:

    Min      1Q  Median      3Q     Max
-432.82 -350.49  -47.71  231.35  764.71

Coefficients:

                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      444.0048   274.9262   1.615    0.157
Worked.full.time   1.6753     0.0898  18.656 1.53e-06 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 464.1 on 6 degrees of freedom

Multiple R-squared: 0.9831, Adjusted R-squared: 0.9802

F-statistic: 348 on 1 and 6 DF, p-value: 1.531e-06

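> plot(x=scatter$Worked.full.time, y=scatter$Employee,   # axes match the model: Employee ~ Worked.full.time
+      xlab="Worked Full Time",
+      ylab="Employee")

> abline(linear, col="purple")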

> predict(linear, data.frame(Worked.full.time=6))

1

454.0569
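The prediction above can be checked by hand from the fitted coefficients: predicted Employee = intercept + slope * Worked.full.time. The sketch below uses the rounded estimates from summary(linear) (444.0048 and 1.6753), which is why the result agrees with predict() only to about three decimal places. The helper name predict_employee is illustrative and not part of the original session.

```r
# Rounded coefficients as printed by summary(linear).
intercept <- 444.0048
slope     <- 1.6753

# Manual version of predict() for this one-variable model.
predict_employee <- function(worked_full_time) {
  intercept + slope * worked_full_time
}

# Same input as the predict() call in the session.
print(predict_employee(6))      # 454.0566

# An input inside the observed range, e.g. the first row's full-time count.
print(predict_employee(3803))
```

An input of 6 full-time workers lies far below the observed values, which run from the hundreds into the thousands, so that prediction is an extrapolation and sits close to the intercept.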

> abline(v=0.8, col="green")