GEB2405 Understanding Data Science: Home

Course Description

We live in a world surrounded by data and technology. With China investing massively in artificial intelligence, data science is believed to be a key skill of the 21st century. Capturing, framing, and analyzing data is already a crucial challenge for companies and scientists. These skills are also critical for students, since data science can be applied to discourse analysis, economic prediction, marketing, the social sciences, and so forth. For those reasons, many universities in the US already offer “introduction to data science” as part of their general education program. As a data science GE class, this course provides students with basic proficiency in data analysis: organizing, managing, examining, preparing, analyzing, and visualizing data using RStudio. Students also discover how data science can help explain social and natural phenomena in various domains by exploring real datasets (social science surveys, the Titanic death toll, economic development indicators, health surveys, etc.).

Recommended Books

Quantitative Social Science Data with R

This book covers the basics of R, how to get data and manipulate variables, and introductory to intermediate data analysis, all in R. It guides readers through the basics of RStudio, the differences between primary and secondary data, how to examine, clean, and subset data and variables, how to create good empirical hypotheses, univariate and descriptive statistics, measures of central tendency and dispersion, how to graphically display data and quantitative relationships, the foundations and origins of hypothesis testing and the notion of statistical significance, testing bivariate relationships, etc.

R for Data Science: Import, Tidy, Transform, Visualize, and Model Data

This book introduces R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. It teaches how to wrangle (transform your datasets into a form convenient for analysis); how to program (use powerful R tools for solving data problems with greater clarity and ease); how to explore (examine your data, generate hypotheses, and quickly test them); how to model (provide a low-dimensional summary that captures true signals in your dataset); and how to communicate (use R Markdown for integrating prose, code, and results).

Applied Logistic Regression Analysis

This book includes detailed discussions of goodness of fit, indices of predictive efficiency, and standardized logistic regression coefficients, with examples using SAS and SPSS. It explicates the estimation, interpretation, and diagnostics of logistic regression models. The logistic counterparts to the OLS statistics (the R², the standard error of estimate, the t ratio, and the slope) are systematically presented. Traditional regression diagnostics (the studentized residual, leverage, and dbeta) are included in an innovative logistic protocol of diagnostics. The last chapter dissects the problem of a polytomous dependent variable, with multiple ordered or unordered categories.

Applied Logistic Regression

This third edition introduces the logistic regression (LR) model and highlights its power by examining the relationship between a dichotomous outcome and a set of covariables. Beginning with an introduction to the logistic regression model, the book discusses the multiple logistic regression model, interpretation of the fitted logistic regression model, model-building strategies and methods for logistic regression, assessing the fit of the model, application of logistic regression with different sampling models, logistic regression for matched case-control studies, logistic regression models for multinomial and ordinal outcomes, logistic regression models for the analysis of correlated data, and some special topics.

Recommended Databases