Data Analytics Boot Camp

Presented by China Data Lab

The China Data Lab is hosting a Data Analytics Boot Camp this quarter in two parts: training and group projects.

Part 1: Training on Wednesday, February 12th, 9 AM – 1 PM (Village 15B)

Part 2: Project Presentations on Friday, February 14th, 12 – 4 PM (GPS 3106)

Social science research often requires large quantities of data. Owing to the development of the Internet and social media in recent years, many novel datasets have become available, offering researchers new opportunities to obtain micro-level evidence.

This boot camp is designed to train emerging social scientists to explore some of the China Data Lab’s datasets and to hone techniques for working with emerging large-scale social data.

In this boot camp, you will put the skills, tools, and techniques you are taught to work. You will receive hands-on instruction in text data analysis and learn practical data exploration techniques, including data manipulation, cleaning, description, and visualization, based on real examples. You will then be introduced to several distinct datasets, divided into small groups of four to work on one particular dataset for 36 hours, and invited back to give a short presentation with critique and coaching by the faculty.

We are looking for 24 highly motivated students, both graduate and undergraduate, who are familiar with R and, in the case of some datasets, have Chinese language skills, to participate in the boot camp.

Text Analytics

Margaret ROBERTS, Associate Professor of Political Science, UC San Diego

Text analytics are expanding the types of data that social scientists can use to discover and measure new concepts, make predictions, and establish causality. This section will provide an overview of these tools, with an emphasis on how they can be applied to the Chinese language.

Data exploration techniques and tools

Yang (Young) YANG, Postdoctoral Fellow, 21st Century China Center, UC San Diego

“Garbage in, garbage out” is the first rule when you’re dealing with real-world data. This section will give you efficient tools and techniques to transform your raw data into analysis-ready material.

Building straightforward, inspiring, interactive visuals with emerging tools is both the first and the last step of data-related research. This section will also introduce techniques to visualize different datasets within a few clicks and to build your own interactive, customized visuals using software such as ggplot2, Shiny, and D3.js.

Tentative Schedule

Time	Topic	Instructor
Wednesday, February 12 (@Village 15B)
9:00 – 10:00	Text data analysis	Margaret ROBERTS
10:00 – 11:00	Data exploration techniques and tools	Young YANG
11:00 – 12:00	Introduce datasets and problems	Young / Molly / Weiyi etc.
12:00 – 13:00	Lunch and planning by student groups
13:00 – 15:00	Room available for group projects
Friday, February 14 (@GPS 3106)
12:00 – 14:00	Group presentations	Various
14:00 – 15:30	Research Workshop Presentation	Lizhi Liu, Georgetown
15:30 – 15:45	Conclusion	Margaret ROBERTS

Host

China Data Lab

Date

February 12 – 14, 2020

Location

Wednesday, February 12
The Village 15B (15th Floor)
UC San Diego
2202 Scholars Dr North
La Jolla, CA 92093

Friday, February 14
GPS Robinson Building Complex 3106
International Lane
University of California San Diego
San Diego, CA 92093

Sign up

Click here to sign up