Which stats package should I use?
Data and Decision Science Network
Part of the UOW Data and Decision Science Initiative
Marijka Batterham, Bradley Wakefield, Alberto Nettel-Aguirre
Outline
Which Stats package should I use?
• Introductions
• Data and Decision Science Network – why are we giving this talk?
• The first step – Why do you need the stats package? What is your research question?
• How many stats packages are there?
• Let’s have a look at:
  • SPSS
  • R (RStudio)
  • STATA
  • Jamovi
  • SAS
  • Python
  • Excel
• Comparison – t test
• How will you choose?
• Take home message
• Where to from here?
Introductions
Professor Marijka Batterham
• Co-Ordinator Data & Decision Science Initiative
• Director NIASRA
• Director Stats Consulting Centre
• Passionate about data literacy
• Use RStudio/SPSS most often
• Favourite analysis: logistic regression
• Mostly use: mixed models
• Likes learning machine learning/data mining & exploring new packages

Brad Wakefield
• Statistical Consultant in the Stats Consulting Centre.
• RStudio is my go-to but commonly use other packages in teaching and consulting.
• Interests in data privacy, probability theory, statistical inference, and data analytics.
• Passion for ethical applications of data science methods in research and industry.
• Enjoys learning and collaborating with other disciplines and solving real-world problems.
• Always up for a chat.

Professor Alberto Nettel-Aguirre
• Director CHSA
• Crusader for correct use and understanding of biostatistics
• Enjoys teaching stats-without-pain to other disciplines
• R/RStudio preference, STATA/SPSS due to collaborations.
• Python exposure due to consultancy work
• Always wanting to learn and try new techniques
UOW Data & Decision Science Initiative
• The Data and Decision Science Initiative is part of the UOW Strategic Plan (2.5 Transformative technologies)
• Developed from a 2019 review and recommendations of “Big Data” and Health Informatics at UOW
• Updated to reflect UOW in 2021; presented at VCAG in April and commenced in July 2021

Data Science is the extraction of actionable knowledge directly from data through a process of discovery, or hypothesis formation and hypothesis testing.
[Figure: diagram relating Data Science to Statistics, Computer science and Domain knowledge]
Data & Decision Science Initiative: four key areas of focus
Research: virtual network and working groups of Data and Decision Science researchers
• Focal point for coordinating the development of Data Science at UOW
• Composed of researchers actively using or interested in Data Science methods
• Themed meetings emphasising translation: Data and Decision Science Network (DDSN)
• Strategic collaborations through the DDSI give a competitive advantage in translation
Education: Training in data science and reproducibility of research.
• Internal and external training and education in data science
• Upskilling research students & staff (particularly ECRs) in data & decision science methods
• Workshops (GRS, Statistical Consulting Centre)
T-shaped graduates: Reviewing service subjects to refocus on data science.
• Review of service subjects in statistics and quantitative methods offered through SMAS to give a data science focus
• Graduates literate in data science and reproducible research
External/Industry engagement: Capitalising on existing links
• Provide enhanced opportunities for external engagement
Choosing a Stats Package
• Why do you need a stats package?
  • Data manipulation
  • Descriptive statistics
  • Visualisation
  • Modeling/inference
  • Teaching
• What is your research question?
• What does your data look like?
What is your research question?
• Are you describing a sample/population?
• Are you looking at differences or relationships between groups?
• Is visualization (pretty figures and graphs) important?
• Are you analysing survey data?
• Are you investigating a change over time?
• Is there missing data?
• Is there clustered/multilevel data?
• Will your model be complicated (non-linearities, assumption violations, interactions)?
What does your data look like?

• How big is your dataset?
• Can you do it on your laptop, or do you need special computing facilities such as high performance computing?
• Do you have to link datasets?
• Is your data complicated (linked, relational, administrative)?
• Are you going to be working on this project for a long time?
• Are you working with collaborators in other schools, disciplines, UOW (different packages)?
• Are you teaching with stats software?
How many Stats packages are there?
Stats/Data Science packages: open source
• ADaMSoft – a generalized statistical software with data mining and data management
• ADMB – non-linear statistical modeling (programming language)
• Chronux – for neurobiological time series data
• DAP – free replacement for SAS
• Environment for DeveLoping KDD-Applications Supported by Index-Structures (ELKI) – data mining in Java
• Epi Info – statistical software for epidemiology developed by the CDC
• Fityk – nonlinear regression software
• GNU Octave – programming language very similar to MATLAB with statistical features
• gretl – gnu regression, econometrics and time-series library
• intrinsic Noise Analyzer (iNA) – analyzing intrinsic fluctuations in biochemical systems
• JASP – a free software alternative to IBM SPSS Statistics with additional option for Bayesian methods
• Just another Gibbs sampler (JAGS) – a program for analyzing Bayesian hierarchical models
• JMulTi – for econometric analysis, univariate and multivariate time series analysis
• KNIME – analytics platform built with Java and Eclipse using modular data pipeline workflows
• LIBSVM – C++ support vector machine libraries
• mlpack – open-source library for machine learning
• Mondrian – data analysis & interactive statistical graphics with a link to R
• Neurophysiological Biomarker Toolbox – data-mining of neurophysiological biomarkers
• OpenBUGS
• OpenEpi – epidemiology and statistics
• OpenNN – neural networks, deep learning
• OpenMx – a package for structural equation modeling running in R
• Orange – a data mining, machine learning, and bioinformatics software
• Pandas – High-performance computing (HPC) data analysis tools for Python in Python and Cython (statsmodels, scikit-learn)
• Perl Data Language – scientific computing with Perl
• Ploticus – software for generating a variety of graphs from raw data
• PSPP – a free software alternative to IBM SPSS Statistics
• R – free implementation of the S (programming language)
• ROOT – data storage, processing and analysis, developed by CERN and used to find the Higgs boson
• Salstat – menu-driven statistics software
• SciPy – Python library for scientific computing
• scikit-learn – extends SciPy with a host of machine learning models (classification, clustering, regression, etc.)
• statsmodels – extends SciPy with statistical models and tests
• Shogun – large-scale machine learning toolbox that provides several SVM (Support Vector Machine) implementations
• Simfit – simulation, curve fitting, statistics, and plotting
• SOFA Statistics – desktop GUI program focused on ease of use, learn as you go, and beautiful output
• Stan – open-source package for obtaining Bayesian inference
• Statistical Lab – R-based and focusing on educational purposes
• TOPCAT – graphical analysis and manipulation package for astronomers
• Torch – a deep learning software library written in Lua
• Weka – machine learning software
Source: Wikipedia
Some proprietary stats packages
• Alteryx – statistical models; R and Python integration
• Analytica – visual analytics and statistics package
• Angoss – data mining algorithms
• ASReml – for restricted maximum likelihood analyses
• BMDP – general statistics package
• DataGraph – visual analysis and regression
• DB Lytix – 800+ in-database models
• EViews – for econometric analysis
• FAME – managing time-series databases
• GAUSS – programming language for statistics
• Genedata – experimental data in life science R&D
• GenStat – general statistics package
• GLIM – generalized linear models
• GraphPad InStat
• GraphPad Prism – biostatistics, nonlinear regression
• IMSL Numerical Libraries – software library
• JMP – visual analysis and statistics package
• LIMDEP – statistics and econometrics
• LISREL – structural equation modeling
• Maple – programming language with statistical features
• Mathematica – some statistical features
• MATLAB – programming language with statistics
• MaxStat Pro – general statistical software
• MedCalc – for biomedical sciences
• Microfit – econometrics package, time series
• Minitab – general statistics package
• MLwiN – multilevel models (free to UK academics)
• Nacsport Video Analysis Software – analysing sports
• NAG Numerical Library – math and statistics library
• Neural Designer – commercial deep learning package
• NCSS – general statistics package
• NLOGIT – statistics and econometrics package
• nQuery Sample Size Software – sample size/power
• O-Matrix – programming language
• OriginPro – statistics and graphing
• PASS Sample Size Software (PASS) – power/sample size
• Plotly – plotting library for R, Python, MATLAB, Julia, Perl
• Primer-E Primer – environmental and ecological specific
• PV-WAVE – data analysis/visualization
• Qlucore Omics Explorer – data analysis software
• RapidMiner – machine learning toolbox
• Regression Analysis of Time Series (RATS) – econometrics
• SAS – comprehensive statistical package
• SHAZAM – econometrics and statistics package
• Simul – econometric tool, multidimensional modeling
• SigmaStat – package for group analysis
• SmartPLS – partial least squares path modeling (PLS)
• SOCR – teaching statistics and probability theory
• Speakeasy – statistical and econometric analysis features
• SPSS Modeler – data mining and text analytics workbench
• SPSS Statistics – comprehensive statistics package
• Stata – comprehensive statistics package
• StatCrunch – comprehensive statistics package
• Statgraphics – general statistics package
• Statistica – comprehensive statistics package
• StatsDirect – statistics for public health, health science
• StatXact – exact nonparametric and parametric statistics
• Systat – general statistics package
• SuperCROSS – comprehensive statistics package
• S-PLUS – general statistics package
• Unistat – general statistics package
• The Unscrambler – multivariate analysis
• WarpPLS – structural equation modeling
• Wolfram Language – some statistical capabilities
• World Programming System (WPS) – supports use of Python, R and SAS within single user program
• XploRe
Source: Wikipedia
Commonly used packages at UOW
IBM® SPSS® Statistics
• IBM commercial product
• Statistical Package for the Social Sciences, first released 1968
• Widely used in teaching
• Many online resources
• Menu driven GUI
• Good for standard and most common advanced methods
• Nice missing data/multiple imputation options
• Bayesian analyses
• New meta-analysis capacity in V28
• New workbook facility incorporating syntax in version 28.
• https://www.ibm.com/products/spss-statistics
IBM® SPSS® Statistics - cons

• Menu use is repetitive and time consuming; switch to syntax if using frequently
• Outputs everything for some analyses, can be overwhelming
• Graphing capacity is limited but editable.
• Major interface change is currently in beta testing
R (R Studio) Pros
• Free and open source
© https://www.r-project.org/logo/
• Released in 1995; developed by Ross Ihaka and Robert Gentleman at the University of Auckland, based on the S language (also the basis of the S-PLUS package)
• Relies on an active user community to develop and maintain discipline-specific packages
• R is an open-source programming language designed for statistical analysis
• R is rarely used stand-alone; it is almost always used through an Integrated Development Environment (IDE). RStudio is the most widely used; there are others, e.g. Emacs.
• RStudio has a commercial arm which supports business and funds the free development.
• Extensive standard and advanced statistical methods
• Constantly increasing number of statistical packages and discipline-specific packages
• You can develop your own statistical package
• Encourages reproducible research
R (R Studio) Cons
• Steep learning curve
• Dependencies, relies on user community to maintain packages
• Some packages dependent on others may cease working
• Workarounds require advanced knowledge (note that you can at least save the package versions used, as part of reproducible research)
• Changing constantly – stay up to date
• Base R versus the Tidyverse (two ways to use R)
R User interface packages
• There are many of these
• Jamovi https://www.jamovi.org/
• JASP https://jasp-stats.org/
• BlueSky https://blueskystatistics.com
• Rcommander https://socialsciences.mcmaster.ca/jfox/Misc/Rcmdr/
• R-Instat www.r-instat.org
• Deducer www.deducer.org
• RKWard https://rkward.kde.org/
• Rattle https://rattle.togaware.com/
• RAF https://r.analyticflow.com/en/
STATA® - Pros
www.stata.com
• First released in 1985 by StataCorp (Bill Gould)
• Code driven and GUI
• Reasonably priced for academics/students
• Good for standard and many advanced methods including robust analysis
• Nice for survey analysis, many Australian surveys have STATA code for weighting
• Nice for meta-analysis (more advanced options in R)
• Some user written ado files eg stepped wedge designs
• Multiple imputation for missing data
• SEM
• Used extensively in epidemiology, public health, social science
STATA® - Cons

• Learning the code

• Menu driven options not as user friendly as other packages
• Only used in some discipline areas
Jamovi - Pros
• Open source project, 2 of 3 founders are Australian
• Looks like SPSS
• Has good support/longevity
• Nice modules for commonly performed analyses
• Immediately visible output
• Great for introductory teaching
• Has free online textbook, many online resources
• Used in teaching at UOW
Jamovi - Cons
• Output to pdf, html
• Output to word through editing pdfs
• Unable to edit graphs
• Dependent on existing modules (this is constantly increasing)
• Currently no machine learning, AI modules
SAS® - Pros
www.sas.com
• Commercial product developed from 1966-76, SAS Institute Inc.
• Statistical Analysis System
• AI business market focus
• Substantial investment in AI capacity
• Planning for IPO listing in 2024
• Gold standard for pharmaceutical trials and governments (? R use increasing)
• Code driven, using DATA and PROC statements
• Available free for academic use if used in the cloud (SAS® OnDemand for Academics)
• SAS Studio has pull down menu options
SAS® - Cons
• Menu driven SAS Studio not as intuitive as other packages
• Learning the code, unique to SAS
• Still currently used in health, pharma and business. Undergoing
generational change as new analysts come through
Python - Pros
• Open source general-purpose programming language, cross-platform
• Development began in 1989 and the first release followed in 1991; created by Guido van Rossum at Centrum Wiskunde & Informatica in the Netherlands and named after Monty Python.
• Nice interface with low-level languages and GPU acceleration.
• Used extensively in web development.
• Gained popularity for machine learning/deep learning/data science
• Supported by user community
• Used in scalable production environments
• Many libraries are available
• Can be used in an IDE or in notebooks such as Jupyter, a web-based, interactive computing notebook environment.
• Can be run locally or on a server (e.g. Google Colaboratory)
• No braces, no semi-colons, indentation is used to structure code.
• Not so steep learning curve
• Currently Number 1 programming language https://www.tiobe.com/tiobe-index/
Python - Cons
• Not as many stats packages as R (though many general programming packages)
• No braces, no semi-colons, indentation is used to structure code.
• Reading someone else's code can be tough for beginners
• Visualizations possible but not as good as others (R)
• High memory usage
Do Not Use Excel for Data Analysis

• No capacity to store code of changes made

• Analysis is not reproducible
• Formulas are hidden in cells and can be accidentally overwritten
• Easy to accidentally change numbers and no way to trace this
• Limited data size
• Encourages chartjunk
• “Friends don’t let Friends use excel for statistics”
(Cryer, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.617.4297&rep=rep1&type=pdf )
Demonstration of t test in different packages
Research question
Sample dataset compares Body Mass Index, BMI (kg/m2), between people with and without diabetes. Simulated from the Pima Indian Dataset (Dabelea et al. Journal of Maternal Fetal Medicine 2000;9:83-88).
BMI is continuous and it is reasonable to assume it is normally distributed.
Diabetes is categorical (binary): 0 = no diabetes, 1 = diabetes.
Research question: Is there a significant difference in mean BMI between those with and those without diabetes?
T test procedure

1. Always, Always, Always plot your data – side by side boxplots
2. Check assumptions
   • Equality of variance (Levene’s test)
   • Normality of groups (Shapiro-Wilk)
3. Perform t test
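The equality-of-variance check in step 2 can be made concrete. Below is a minimal sketch in Python (standard library only) of the median-centred form of Levene's statistic. The function name `levene_statistic` and the BMI values are hypothetical, chosen for illustration, and are not the dataset used in this talk. Only the statistic W and its degrees of freedom are computed; the p value comes from an F distribution with those degrees of freedom and is left to a statistics package.

```python
from statistics import mean, median


def levene_statistic(*groups):
    """Median-centred Levene statistic W and its degrees of freedom (df1, df2)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each value from its own group median.
    z = [[abs(x - median(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


# Hypothetical BMI values (kg/m2), for illustration only.
bmi_no_diabetes = [27.0, 31.0, 29.0, 33.0, 30.0]
bmi_diabetes = [34.0, 36.0, 33.0, 38.0, 39.0]

w, dfs = levene_statistic(bmi_no_diabetes, bmi_diabetes)
print(f"Levene W = {w:.3f}, df = {dfs}")
```

A small W relative to the F distribution suggests the group variances are similar, which supports using the equal-variance form of the t test in step 3.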
One way it might be written for a paper
• Methods: Data analysis
• An independent two-sided t test was used to determine if the difference in BMI between those with and those without diabetes was statistically significant. Assumptions were tested visually prior to analysis, homogeneity of variance was assessed using the Levene’s test and normality using the Shapiro-Wilk test. An alpha level < 0.05 was considered statistically significant. Data was analysed using (Stats package, Version, company)
• Results:
• Report difference (CI or SD), t statistics and df.
• The mean difference in BMI between those with and without diabetes was 4.65 (SD 5.47) kg/m2, (CI 3.39, 5.90), t=7.27 (df=330), P<0.001.

Group          Number   Mean (SD) BMI kg/m2
No diabetes    223      31.21 (5.68)
Diabetes       109      35.86 (5.01)
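The reported difference, SD, t statistic and df can be cross-checked from the group summaries alone. The sketch below, in Python using only the standard library, recomputes them from the rounded group sizes, means and SDs given above. The variable names are illustrative, and small discrepancies in later decimal places are expected because the inputs are rounded.

```python
import math

# Group summaries as reported above (rounded values).
n_no, mean_no, sd_no = 223, 31.21, 5.68      # no diabetes
n_yes, mean_yes, sd_yes = 109, 35.86, 5.01   # diabetes

# Degrees of freedom for the equal-variance (pooled) two-sample t test.
df = n_no + n_yes - 2

# Pooled variance: each group's variance weighted by its own degrees of freedom.
pooled_var = ((n_no - 1) * sd_no**2 + (n_yes - 1) * sd_yes**2) / df
pooled_sd = math.sqrt(pooled_var)

# Standard error of the difference in means, then the t statistic.
diff = mean_yes - mean_no
se_diff = pooled_sd * math.sqrt(1 / n_no + 1 / n_yes)
t_stat = diff / se_diff

print(f"difference = {diff:.2f}, pooled SD = {pooled_sd:.2f}")
print(f"t = {t_stat:.2f}, df = {df}")
```

This reproduces the reported 4.65, 5.47, 7.27 and 330, and shows that the "SD 5.47" quoted with the mean difference is the pooled standard deviation of the two groups.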
• Performing the t test in SPSS, RStudio, STATA, Jamovi, SAS and Python
Package   Good for
Jamovi    Teaching, infrequent use of stats (easy to pick up again if you have a break), basic analysis, some advanced methods, easy to learn, good default outputs
Python    Machine learning, AI, in demand skill, regular users, good for research collaboration and integration to web platforms
RStudio   Data manipulation, visualisation, advanced analysis, in demand skill, reproducible research, advanced missing data options, regular users
SAS       Good overall package for most standard and many advanced methods, regular user, big data, good for pharma and govt
SPSS      Good overall package for most standard and many advanced methods, easy to learn, infrequent use
STATA     Good overall package, has many useful advanced procedures, used regularly in some professions, particularly good for survey analysis, meta analysis, SEM
Take home message
• If you are analysing data, it is likely you will need to use more than one
package during your career
• All packages are changing significantly over time as more methods
become available and computing power increases
• If you publish - Learn a package that encourages reproducible research
• To have a competitive advantage now for your career use R (or Python)
• If you don’t use stats much stick to what you know, and ask a
professional
• Regardless of the package, you will need to understand the stats to perform the analysis and interpret the output.
• If doing advanced, specialised or machine learning methods, the “best” package will depend on ease of use for that analysis. Some packages do not have advanced methods: e.g. SPSS does not have a menu option for Generalised Additive Models (GAMs), although these can be accessed through the R plugin in SPSS, and Jamovi has only the modules developed so far.
SPSS
RStudio
Write your code in the Script window
Select Run or CTRL + ENTER to run your code.
INSTALLING PACKAGES
• When using R and RStudio you may need to install packages in order to
run the analyses.

• Whilst many functions are included in base R, installing packages is easy!

• To install packages all you need to do is call

install.packages("package-name")
R will look online, download it, and install it for you.
install.packages('tidyverse')
install.packages('car')

• You will need to load the library to use it!

library(tidyverse)
READING IN YOUR DATA
Loading in your data is pretty straightforward!
data <- read_csv("pathto/ttestdiabetes.csv")

Or you can just find it in your files window in RStudio!

THE DATA IS NOW LOADED
data <- read_csv("ttestdiabetes.csv")
head(data)
## # A tibble: 6 x 8
## npreg gluc bp skin ped age Diabetes BMI
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 5.72 80 11 0.491 22 0 17.9
## 2 1 5.11 62 25 0.482 25 0 17.9
## 3 1 5.55 74 12 0.149 28 0 18.3
## 4 1 5.27 66 13 0.334 25 0 18.5
## 5 6 7.16 90 7 0.582 60 0 20.2
## 6 0 5.83 68 22 0.236 22 0 20.5
data$Diabetes <- factor(data$Diabetes)
levels(data$Diabetes) <- c("No Diabetes","Diabetes")
OBTAINING A BOXPLOT
boxplot(BMI ~ Diabetes,data = data,col = c('blue'))
OBTAINING A BOXPLOT (GGPLOT2)
ggplot(data)+ geom_boxplot(aes(x=Diabetes,y=BMI))
WHAT ABOUT YOUR DESCRIPTIVES?
You can go line by line…
data0 <- filter(data,Diabetes == "No Diabetes")
data1 <- filter(data,Diabetes == "Diabetes")

mean(data0$BMI)
## [1] 31.21093
sd(data0$BMI)
## [1] 5.675752
summary(data0$BMI)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 17.89 27.33 31.21 31.21 34.88 49.05
mean(data1$BMI)
## [1] 35.85649
sd(data1$BMI)
## [1] 5.009008
summary(data1$BMI)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 21.28 32.41 35.94 35.86 38.96 47.72
WHAT ABOUT YOUR DESCRIPTIVES?
But there are always multiple ways in R…
data <- group_by(.data=data,Diabetes)
summarise(.data=data,
Avg = mean(BMI),Std_Dev = sd(BMI),Min = min(BMI),
Max = max(BMI),Median = median(BMI),Q1 = quantile(BMI,0.25),
Q3 = quantile(BMI,0.75))
## # A tibble: 2 x 8
## Diabetes Avg Std_Dev Min Max Median Q1 Q3
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 No Diabetes 31.2 5.68 17.9 49.0 31.2 27.3 34.9
## 2 Diabetes 35.9 5.01 21.3 47.7 35.9 32.4 39.0
data %>% group_by(Diabetes) %>%
summarise(N = length(BMI), Avg = mean(BMI), Std_Dev = sd(BMI),
Min = min(BMI), Max = max(BMI), Median = median(BMI),
Q1 = quantile(BMI,0.25), Q3 = quantile(BMI,0.75))
CHECKING ASSUMPTIONS
You only get out what you ask… Here’s the Shapiro Wilk Test
shapiro.test(data0$BMI)
##
## Shapiro-Wilk normality test
##
## data: data0$BMI
## W = 0.99531, p-value = 0.7283
shapiro.test(data1$BMI)
##
## Shapiro-Wilk normality test
##
## data: data1$BMI
## W = 0.9926, p-value = 0.8256
CHECKING ASSUMPTIONS
But you can always manipulate in a better way!

summarise(.data=group_by(.data=data,Diabetes),
Shapiro_W = shapiro.test(BMI)$statistic,
Shapiro_p = shapiro.test(BMI)$p.value)
## # A tibble: 2 x 3
## Diabetes Shapiro_W Shapiro_p
## <fct> <dbl> <dbl>
## 1 No Diabetes 0.995 0.728
## 2 Diabetes 0.993 0.826
CHECKING ASSUMPTIONS
You’ll need to load a library for the homogeneity of variances check!

library(car)

leveneTest(BMI~Diabetes,data=data)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 2.433 0.1198
## 330
PERFORMING THE T-TEST
Make sure to specify the correct parameters.
t.test(BMI~Diabetes,data=data,var.equal = TRUE)
##
## Two Sample t-test
##
## data: BMI by Diabetes
## t = -7.2715, df = 330, p-value = 2.599e-12
## alternative hypothesis: true difference in means between group No Diabetes and group Diabetes is not equal to 0
## 95 percent confidence interval:
## -5.902336 -3.388790
## sample estimates:
## mean in group No Diabetes mean in group Diabetes
## 31.21093 35.85649
PERFORMING THE T-TEST
If you are unsure of what parameters are needed and what their default values are, just ask for help
help("t.test")

t.test(x, y = NULL,
alternative = c("two.sided", "less", "greater"),
mu = 0, paired = FALSE, var.equal = FALSE,
conf.level = 0.95, ...)
STATA
STATA syntax
Jamovi
The window shown below should be open.

[Screenshot: jamovi window with the Tab Bar, Spreadsheet View and Results View labelled]

Go to variables view and delete the three default variables.
Go to File (≡); Import; Browse and then select your Dataset.
Select the Data Tab and you should see your Data.
QUALITATIVE VARIABLES IN JAMOVI
Categories of qualitative variables are referred to as Levels in jamovi.

Levels can be given text labels in the Levels list.
Exploratory Data Analysis is always essential.
You can perform basic EDA with Analyses; Exploration; Descriptives
Results are shown in an editable and dynamic results window.

[Screenshot labels: Variable, Factor]
A range of Descriptive statistics can be produced.
Plots are produced immediately and with some customisations.
To perform the T-Test select
Analyses; T-Tests; Independent Samples T-Test
Output is generated as options are selected.
Assumption Checks can be found and selected easily!
Unpooled (Welch’s), non-parametric (Mann-Whitney U),
confidence intervals, and alternate hypothesis options are all
found in the same window!
To see the code, select Options, then select Syntax mode.
This is R code!
jamovi outputs can be generated directly in R with the jmv package and the code.
SAS Studio
Click on Tasks to open the statistics menu.
Select Data Exploration, click on the spreadsheet icon to identify the dataset, click + to add the analysis variable and the category, click on Plots, check BOX PLOT, then click Run.
Levene’s test is done through one-way ANOVA.
The normality test is done as part of the t test.
Python
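A minimal sketch of the same equal-variance independent t test in Python, written with only the standard library so that each step of the calculation is visible. In practice a statistics library would normally be used instead (for example `scipy.stats.ttest_ind` with `equal_var=True`). The function name `pooled_t_test` and the BMI values are hypothetical, chosen for illustration, and are not the talk's dataset.

```python
import math
from statistics import mean, variance


def pooled_t_test(group_a, group_b):
    """Equal-variance two-sample t test; returns (t statistic, degrees of freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se_diff, df


# Hypothetical BMI values (kg/m2), for illustration only.
bmi_no_diabetes = [27.0, 31.0, 29.0, 33.0, 30.0]
bmi_diabetes = [34.0, 36.0, 33.0, 38.0, 39.0]

t_stat, df = pooled_t_test(bmi_no_diabetes, bmi_diabetes)
print(f"t = {t_stat:.3f}, df = {df}")
```

The t statistic is negative because the first group has the lower mean, the same sign convention as the R output. Turning it into a p value needs the t distribution, which the standard library does not provide.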

No ratings yet
Inspect S50: Easy To Use Mainstream SEM Enabling Quick, Accurate Answers
4 pages
AI-HPC Is Happening Now
No ratings yet
AI-HPC Is Happening Now
16 pages
Hospital IT System Overview
No ratings yet
Hospital IT System Overview
31 pages
Current Log
No ratings yet
Current Log
55 pages
020 - BCA - 2nd & 4th SEMESTER - REVISED REAPPEAR RESULT - 11 STUDENTS - NOVEMBER, 2020
No ratings yet
020 - BCA - 2nd & 4th SEMESTER - REVISED REAPPEAR RESULT - 11 STUDENTS - NOVEMBER, 2020
14 pages
Steven Slate Drums 3.5 Guide
No ratings yet
Steven Slate Drums 3.5 Guide
61 pages
How To Download A Scientific Paper: To My Dear Advisor: Simonina Ol 'Ga Aleksandrovna
No ratings yet
How To Download A Scientific Paper: To My Dear Advisor: Simonina Ol 'Ga Aleksandrovna
19 pages
Invoice - Bitrefill
No ratings yet
Invoice - Bitrefill
2 pages
Datareader in C#: John Hudai Godel
No ratings yet
Datareader in C#: John Hudai Godel
6 pages
No Software Will Be Installed or Removed.: Installation Summary
No ratings yet
No Software Will Be Installed or Removed.: Installation Summary
1 page
FM1100 Simple User Guide For Recommended Configuration V2.0
No ratings yet
FM1100 Simple User Guide For Recommended Configuration V2.0
8 pages
Week 3
No ratings yet
Week 3
3 pages
Ujwal Maharjan IT CV & Experience
No ratings yet
Ujwal Maharjan IT CV & Experience
2 pages
Translam College Timetable
No ratings yet
Translam College Timetable
6 pages
Final Report Gtu
No ratings yet
Final Report Gtu
39 pages
Maximum Supported Hopping Rate Measurements Using The Universal Software Radio Peripheral Software Defined Radio
No ratings yet
Maximum Supported Hopping Rate Measurements Using The Universal Software Radio Peripheral Software Defined Radio
7 pages
Tutorial - 5 and 6
100% (1)
Tutorial - 5 and 6
2 pages
Milfy City Game Guide & Tips
83% (6)
Milfy City Game Guide & Tips
14 pages
2010-08-18 Zernik, J: Data Mining of Online Judicial Records of The Networked US Federal Courts, International Journal On Social Media: Monitoring, Measurement, Mining, 1:69-83 (2010)
No ratings yet
2010-08-18 Zernik, J: Data Mining of Online Judicial Records of The Networked US Federal Courts, International Journal On Social Media: Monitoring, Measurement, Mining, 1:69-83 (2010)
13 pages
Assignment MET1233
No ratings yet
Assignment MET1233
12 pages
Gideon Intel Drop 56 Justpasteit
No ratings yet
Gideon Intel Drop 56 Justpasteit
12 pages


