Data Analytics Boot Camp
Presented by China Data Lab
Social science research often requires the use of a large quantity of data. Owing to the development of the Internet and social media in recent years, many novel datasets have become available, and they offer researchers new opportunities for obtaining micro-level evidence.
China Data Lab will host a one-day boot camp to help future social scientists develop methods and techniques to utilize the emerging big social data sources.
In this boot camp, you will put the skills, tools, and techniques you are taught to work. You will receive hands-on instruction on how to get data online, carry out panel data analysis and text analysis, and create interactive visualizations. You will leave the boot camp with practical knowledge and ideas for future research.
NOTE: Due to limited space, priority will be given to students at UC San Diego and participating institutions. Final participants will be selected from the pool of registrants after a qualification process that assesses each registrant’s current level of skill in data analysis.
Panel Data Analysis
Panel data are commonly used in the social sciences to study the causal effects of policy interventions on certain outcomes. Yet the assumptions under which popular panel data methods, such as two-way fixed effects models, perform well are often not met. We will discuss an emerging literature on causal inference with panel data, focusing on the special setting of a single, dichotomous treatment variable. We will review a variety of methods, including difference-in-differences, the synthetic control method, latent factor models, and related matching and reweighting approaches, and provide recommendations for applied researchers.
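To give a flavor of the simplest method on that list, the sketch below computes a two-group, two-period difference-in-differences estimate. It is only an illustration under stated assumptions: the toy panel, the unit labels, and the helper names `group_mean` and `did_estimate` are hypothetical and do not come from the session materials. The estimate has a causal reading only if the treated and control groups would have followed parallel trends in the absence of treatment.

```python
from statistics import mean

# Hypothetical toy panel: (unit, period, treated_group, outcome).
# period 0 is before the intervention, period 1 is after it.
panel = [
    ("a", 0, 1, 10.0), ("a", 1, 1, 15.0),
    ("b", 0, 1, 12.0), ("b", 1, 1, 17.0),
    ("c", 0, 0, 8.0), ("c", 1, 0, 10.0),
    ("d", 0, 0, 9.0), ("d", 1, 0, 11.0),
]


def group_mean(rows, treated, period):
    """Average outcome for one group (treated or control) in one period."""
    return mean(y for _, t, g, y in rows if g == treated and t == period)


def did_estimate(rows):
    """Change in the treated group minus change in the control group."""
    treated_change = group_mean(rows, 1, 1) - group_mean(rows, 1, 0)
    control_change = group_mean(rows, 0, 1) - group_mean(rows, 0, 0)
    return treated_change - control_change


estimate = did_estimate(panel)
print(f"Difference-in-differences estimate: {estimate:.2f}")
```

In this toy panel the treated group's average rises by 5 and the control group's by 2, so the estimate is 3. With many units, many periods, or staggered treatment timing, this simple comparison is no longer sufficient, which is where the other methods listed above come in.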
Text Analytics
Text analytics are expanding the types of data that social scientists can use to discover and measure new concepts, make predictions, and establish causality. This section will provide an overview of these tools, with an emphasis on how they can be applied to the Chinese language.
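One practical difference with Chinese is that words are not separated by spaces, so text must be split into units before anything can be counted. The sketch below uses overlapping character bigrams as a simple stand-in for word segmentation. This is an assumption made for illustration, not the method taught in the session; the function names and the two sample sentences are hypothetical.

```python
import re
from collections import Counter

# Matches runs of characters in the basic CJK Unified Ideographs block.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def char_ngrams(text, n=2):
    """Return overlapping character n-grams, never crossing punctuation."""
    grams = []
    for run in CJK_RUN.findall(text):
        grams.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return grams


def bag_of_ngrams(docs, n=2):
    """Turn each document into a Counter of its character n-grams."""
    return [Counter(char_ngrams(doc, n)) for doc in docs]


# Two hypothetical example sentences.
docs = ["社会科学研究需要数据。", "数据分析帮助社会科学研究。"]
bags = bag_of_ngrams(docs)

# Bigrams that appear in both documents.
shared = set(bags[0]) & set(bags[1])
print(sorted(shared))
```

Counts like these can feed the same downstream tools used for other languages, such as document-term matrices or classifiers. A dedicated word segmenter would usually give cleaner units than raw bigrams.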
Web Scraping Techniques
Leo Y. YANG, PhD Student in Political Science, UC San Diego
As datasets grow larger, collecting them becomes more difficult. Rather than gathering data manually, social scientists need to command automated tools that collect data with higher precision, greater speed, and better reliability. In this section, the instructor will introduce basic web scraping techniques, including the HTTP mechanism and some useful crawling frameworks in R and Python, and illustrate them with several real-world cases. This section presumes very basic knowledge of computer programming.
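The two basic steps of scraping, sending an HTTP request and pulling structured pieces out of the returned HTML, can be sketched with Python's standard library alone. This is a minimal illustration rather than one of the crawling frameworks covered in the session; the `fetch` and `extract_links` helpers, the user-agent string, and the sample page are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Parse an HTML string and return the link targets in page order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def fetch(url, user_agent="bootcamp-example/0.1", timeout=10):
    """Send an HTTP GET request and return the response body as text."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# A hypothetical page, used here so the example runs without network access.
sample_page = (
    '<html><body><a href="/data/2016.csv">2016</a> '
    '<a href="/data/2017.csv">2017</a> <a name="top">no link</a></body></html>'
)
print(extract_links(sample_page))
```

On a real site, `extract_links(fetch(url))` would chain the two steps. Before scraping at scale, check the site's terms of use and pause between requests.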
Social Data Visualization and Toolbox Introduction
A picture is worth a thousand words. Using emerging tools to build straightforward, inspiring, interactive visuals is both the first and the last step of data-related research. This section will introduce techniques for visualizing different data sets within a few clicks and for building your own interactive, customized visuals using software such as D3.js, Shiny, and Tableau.
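The idea these tools share is a mapping from data values to visual marks. The sketch below shows that idea in its barest form by writing a horizontal bar chart as SVG, with a hover tooltip on each bar as the simplest kind of interactivity. It is a stand-in written in plain Python, not an example of D3.js, Shiny, or Tableau; the function name, layout parameters, and counts are hypothetical.

```python
from html import escape


def bar_chart_svg(data, width=320, bar_height=24, gap=6, label_width=90):
    """Render (label, value) pairs as a horizontal SVG bar chart."""
    max_value = max(value for _, value in data) or 1  # avoid dividing by zero
    height = len(data) * (bar_height + gap)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    for i, (label, value) in enumerate(data):
        y = i * (bar_height + gap)
        # Scale each value so the largest bar fills the space beside the labels.
        bar_len = (width - label_width) * value / max_value
        parts.append(
            f'<text x="0" y="{y + bar_height * 0.7:.1f}">{escape(label)}</text>'
        )
        # The <title> child is shown by browsers as a tooltip on hover.
        parts.append(
            f'<rect x="{label_width}" y="{y}" width="{bar_len:.1f}" '
            f'height="{bar_height}" fill="steelblue">'
            f"<title>{escape(label)}: {value}</title></rect>"
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical counts, for illustration only.
data = [("Panel data", 12), ("Web scraping", 30), ("Text analytics", 18)]
svg = bar_chart_svg(data)
print(svg)
```

Saving the returned string to a file ending in `.svg` and opening it in a browser shows the chart. Dedicated tools add axes, transitions, and linked views on top of this same data-to-mark mapping.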
| Time | Session | Instructor |
| 9:00 a.m.–9:15 a.m. | Basic Introduction | |
| 9:15 a.m.–9:20 a.m. | 5-minute break | |
| 9:20 a.m.–10:50 a.m. | Panel Data Analysis | Yiqing XU |
| 10:50 a.m.–11:00 a.m. | 10-minute break | |
| 11:00 a.m.–12:30 p.m. | Web Scraping Techniques | Leo YANG |
| 12:30 p.m.–1:30 p.m. | Lunch (will be served) | |
| 1:30 p.m.–3:00 p.m. | Social Data Visualization | Young YANG |
| 3:00 p.m.–3:10 p.m. | 10-minute break | |
| 3:10 p.m.–4:40 p.m. | Text Analytics | Margaret ROBERTS |