Summer School 2025: All courses

Summer School

The CEMFI Summer School aims to provide practitioners and academics from all over the world with an opportunity to update their training in fields within CEMFI's range of expertise.

A variety of one-week courses are offered each year, during late August and early September.

The courses are taught by leaders in their fields. They are based on innovative teaching practices that combine regular lectures and personalized interaction between the instructor and course participants. Courses typically combine formal lectures, discussion sessions and, in some cases, workshop sessions where some participants can discuss their work. In more applied courses, practical classes outside the regular schedule are organized to provide additional hands-on experience. A course manager is assigned to each course to coordinate all activities.

In person courses: Each course has between 10 and 36 participants. It consists of five daily sessions that lasts three and a half hours, including a 30-minute break, and can take place either in the morning or in the afternoon. In the evening of the second day of each course, the School organizes a course dinner aimed at providing participants and instructors with an occasion to interact in a relaxed atmosphere.

Online courses: All courses take place from Monday to Friday between 15:30 and 18:30 CEST. with a maximum enrollment of 24 people. The interaction between the instructor and course participants is complemented with up to two daily hours of online office time.

Difference-in-Differences with Panel Data

Instructor: Jeffrey Wooldridge (Michigan State University)
Dates: 18-22 August 2025
Hours: 9:30 to 13:00 CEST
Format: In person

Intended for

Empirical researchers and applied econometricians with an interest in recent advances in difference-in-differences estimation.

Prerequisites

A masters level course in probability and statistics - including a basic understand of asymptotic tools - and a course in econometrics at the level of W.H. Greene (2018), Econometric Analysis, 8th edition, F. Hayashi (2000), Econometrics, or B. Hansen (2023), Econometrics. Participants are expected to have a working knowledge of ordinary least squares, generalized least squares, two-way fixed effects, and basic nonlinear estimation methods. It is helpful to know about treatment effect estimation assuming unconfounded assignment. Derivations will be kept to a minimum except where making an essential point.

Overview

The purpose of the course is to provide applied economists with an update on some developments in intervention analysis using what are generally known as “difference-in-differences” methods. One theme is that flexible regression methods, whether based on pooled OLS or two-way fixed effects, can be very effective - in both common and staggered timing settings. Other methods that rely on commonly used treatment effects estimators, such as inverse probability weighting combined with regression adjustment, are easily motivated in a common framework. Imputation approaches to estimation, and their relationship to pooled methods, also will be discussed.

Special situations such as all units treated, exit from treatment, and time-varying controls also will be discussed. Event study approaches for detecting the presence of pre-trends, and accounting for heterogeneous trends using both regression and treatment effects estimator, also will be covered. As time permits, strategies with non-binary treatments also will be discussed.

The main focus is on microeconometric applications where the number of time periods is small. Nevertheless, some coverage of “large-T” panels is also included, including cases with few treated units. Simple strategies for small-N inference will be discussed and compared with synthetic control methods.

The course will end with a treatment of nonlinear difference-in-differences methods, with focus on binary, fractional, and nonnegative outcomes (including counts). Logit and Poisson regression an especially attractive for such applications.

Topics

Introduction and Overview. The T = 2 Case. No Anticipation and Parallel Trends. Controlling for Covariates via Regression Adjustment and Propensity Score Methods.
General Common Intervention Timing. Event Study Estimators and Heterogeneous Trends. Flexible Estimation with Covariates.
Staggered Interventions. Identification and Imputation. Pooled OLS and Extended TWFE. Aggregation.
All Units Eventually Treated. Event Study Estimators. Testing and Correcting for Violations of Parallel Trends. Equivalences of Estimators.
Imputation using Unit Fixed Effects. Rolling Methods. Long Differencing. Propensity Score and Doubly Robust Methods.
Strategies with Exit. Time-Varying Covariates. Unbalanced Panels.
Non-Binary Treatments.
Inference with few Treated Units. Time Series Approaches. Synthetic Control.
Nonlinear DiD. Binary, Fractional, and Nonnegative Responses.
DiD with Repeated Cross Sections.

Professor: Jeffrey M. Wooldridge

Data Science for Economics: Mastering Unstructured Data

Instructor: Christopher Rauh (University of Cambridge)
Dates: 18-22 August 2025
Hours: 9:30 to 13:00 CEST
Format: In person

Intended for

Academic researchers, policy analysts, data analysts, and consultants who are using, or who wish to use, unstructured data sources such as text, detailed surveys, images, or speech data in their work.

Prerequisites

A basic familiarity with probability and statistics at advanced undergraduate level. The hands-on classes will require students to work through Python notebooks that will be prepared in advance. Extension problems will involve the modification of these notebooks, which requires familiarity with the basics of Python. An introductory session to Python will be provided by a teaching assistant. Therefore, previous programming experience in other languages is sufficient.

Overview

Over the past decade, the use of unstructured data - such as text and images - has grown significantly in economics and related disciplines. The emergence of large language models (LLMs) like ChatGPT, as well as broader advancements in generative AI, has transformed how researchers analyze and interpret these data sources. These technologies not only enable text classification and sentiment analysis but also facilitate more complex tasks such as text generation, forecasting, and model fine-tuning. As a result, researchers now have unprecedented opportunities to extract insights, automate tasks, and develop AI-enhanced economic models.
By combining practical implementation with intuitive theoretical insights, this course prepares participants to effectively leverage unstructured data, large language models, and generative AI in economic research. Participants will gain hands-on experience in fine-tuning LLMs, developing AI-powered analytical pipelines, and using generative models to push the boundaries of modern economic analysis.
The course is structured around five key components:

Analytical Techniques: Key statistical and machine learning methods for analyzing unstructured data. Topics include Bayesian updating, matrix factorization, and predictive modeling using neural networks and random forests. Special attention will be given to how these techniques apply to natural language processing (NLP) and generative AI. Rather than focusing on technical derivations, the emphasis is on developing an intuitive understanding of these algorithms and their applications.
Large Language Models (LLMs) and Generative AI: Architecture, training methods, and practical applications in economics. Topics include:

Text embeddings and transformers: How models like BERT, GPT, and LLaMA process and generate text.
Fine-tuning and prompt engineering: Customizing LLMs for domain-specific economic research.
Generative AI for synthetic data: Using AI to create simulated datasets for economic modeling and forecasting.
Challenges and biases: Addressing interpretability, fairness, and limitations of generative models.

Economic Applications: topic modeling and sentiment analysis to analyze policy debates and financial markets; Fine-tuning LLMs for economic forecasting and macroeconomic policy analysis; generative AI for survey automation and synthetic data generation.
Hands-on Implementation: Through guided coding sessions, students will develop the skills to integrate AI-driven techniques into their own research projects. Participants will work with real-world datasets using Python and Hugging Face Transformers. They will learn fine-tuning techniques for LLMs, including training models on domain-specific text (e.g., financial reports, economic policy papers). They will implement custom NLP pipelines to analyze economic data.
Data Collection and Preparation: Methodologies for collecting, processing, and structuring unstructured data for AI-driven analysis. This includes web scraping and APIs for extracting textual and financial data, preprocessing pipelines for cleaning and structuring data for machine learning models and ethical considerations when working with LLMs and AI-generated data.

Topics

Topic modeling and probabilistic approaches: Latent Dirichlet Allocation (LDA)
Large language models: Transformer architectures, pretraining, and fine-tuning for domain-specific tasks
Evaluating AI predictions: Accuracy, precision-recall, and interpretability in unstructured data models
Image analysis and classification: Convolutional neural networks (CNNs) and transfer learning
Web scraping and automated data extraction from online sources
Speech-to-text processing and sentiment analysis of spoken language
Generative AI: Text generation, synthetic data, and AI-assisted research methods

Practical Classes

There will be between two and three voluntary sessions in the afternoon (from 15:00 to 17:00) led by a teaching assistant. Exact dates will be announced before the beginning of the course.

Professor: Christopher Rauh

Causal Inference for Health and Social Scientists

This is course is co-organized with

CAUSAab

Instructor: Miguel Hernán (Harvard University)
Dates: 25-29 August 2025
Hours: 9:30 to 13:00 CEST
Format: In person

Intended for

Researchers who work with large databases of individual-level data, health policy practitioners.

Prerequisites

Working knowledge of study design and regression analysis, interest in evaluation of policy interventions for public health and medicine.

Overview

The course introduces a general-purpose causal inference framework that integrates methods for both experimental and non-experimental data. The framework has two steps: 1) specification of the (hypothetical) target experiment or trial that would answer the causal question of interest, and 2) emulation of the target trial using the available data. The course explores key challenges for target trial emulation and critically reviews methods proposed to overcome those challenges. The methods are presented in the context of the evaluation of the comparative effectiveness of health interventions using existing databases of administrative and clinical data. At the end of the course students should be able to:

Formulate sufficiently well-defined causal questions
Specify the protocol of the target trial
Design analyses of observational data that emulate the target trial
Identify key assumptions for a correct emulation of the target trial

Topics

Causal inference as a key component of decision making
Target trial emulation as a unifying concept for causal inference
Target trial emulation to avoid self-inflicted biases in causal inference
Point interventions vs. sustained policies
G-methods to evaluate sustained policies

Professor: Miguel Hernán

Macroeconomics with Frictional Financial Markets

Instructor: Luigi Bocola (Stanford University)
Dates: 25-29 August 2025
Hours: 9:30 to 13:00 CEST
Format: In person

Intended for

Academic researchers and researchers in policy institutions interested in the intersection of Macroeconomics and Finance.

Prerequisites

Participants are assumed to have knowledge of macroeconomics at the master/PhD level. That includes basic knowledge of numerical methods for the solution of dynamic stochastic general equilibrium models.

Overview

The past fifteen years have seen a growing body of theoretical, quantitative, and empirical research focused on understanding the macroeconomic implications of shocks to the financial sector and on studying what policymakers can do to reduce the risk of financial crises. The aim of this course is to cover state-of-the-art research at the intersection of macroeconomics and finance and, specifically, contributions that focus on the propagation of financial shocks to the rest of the economy. We will start by introducing a canonical macroeconomic model where financial intermediaries face leverage constraints, discuss its implications for the propagation of shocks, and study different algorithms for its numerical solution. We will discuss how to use this framework for the analysis of financial crises and for the design of financial regulations. The course will conclude with a set of advanced topics at the intersection of macroeconomics and finance.

Topics

A macroeconomic model with financial intermediation
Local and global numerical solutions methods
Shock propagation
Welfare properties and optimal financial regulation
Advanced topics: i) From micro estimates to macro effects; ii) The financing of supply chains and non-bank intermediaries; iii) Financial crises in open economies

Professor: Luigi Bocola

Using Textual Data in Empirical Monetary Economics

Instructors: Michael McMahon (University of Oxford)
Dates: 25-29 August 2025
Hours: 15:00 to 18:30 CEST
Format: In person

Intended for
There will be between two and three voluntary sessions in the afternoon (from 11:00 to 13:00) led by a teaching assistant. Exact dates will be announced before the beginning of the course.

Intended for
Academic researchers and researchers in policy institutions who are interested in using unstructured textual data in their work.

Prerequisites

A basic familiarity with macroeconomics, econometrics, probability, and statistics at advanced undergraduate level. Some familiarity with the basics of Python will be useful for the applied sessions.

Overview

Central banks are charged with setting interest rates, and other related policies, to manage the business cycle. They are a part of the 24-hour news cycle, make their decision announcements with statements and press conferences, and they actively try to engage not only their traditional audiences, financial markets, but also a wider public. Communication became a policy tool in its own right when nominal interest rates hit their effective lower bound in the midst of the global financial crisis.

But this revolution of increasing transparency and an increasing role of communication brings challenges to central banks, and to the academics who study them and their policies. The changing framework shifts the emphasis from interest rates to words. But most of economic analysis is not designed to study words. To do so, requires the use of tools from computer science and the field of natural language processing. This course will explore how we, as researchers, rise to these challenges by combining new data, new approaches and innovative use of new statistical methodologies from outside economics.

There are three parts to the course. The first part will introduce how we think about central bank communication within the context of traditional models from monetary economics. And how do we answer perhaps the most fundamental empirical question in monetary economics: what is the effect of monetary policy on economic and financial variables? This will highlight how existing studies typically do not isolate the effect of communication separate to policy action, but such understanding is needed to design communication strategies.

The second part explores a range of tools to deploy for the study of unstructured textual data. Broadly, this part will cover tools that allow us to measure and understand the 3 Ts of communication - Tone, Topic and Temporal dimension. This part will introduce specific tools, but given that they are evolving at a fast pace, the focus will be on how the tools can be deployed to answer the questions identified in the first part. But, of course, the focus on broad use of the tools will mean that participants should be able to deploy these tools in other areas of economics or other topics of study.

The final part will discuss how we can bring such analysis to move towards a measure of narrative. Narrative is a concept studied across a wide variety of research disciplines, ranging from classic linguistics to social sciences, engineering, and computer science. The definitions of what constitutes a narrative, however, vary widely between those disciplines. And even within economics, we are yet to converge on a precise definition of narrative; ‘Narrative economics’ defines narratives as “popular stories that affect individual and collective economic behaviour” (Shiller, 2020). Narrative economics recognises that semantic text features and especially sentiment are key in defining what narratives are; this means that narratives must be thought of as simultaneously embodying textual and meta-textual feature of discourse.

Topics

The Monetary Transmission Mechanism and the role of communication
Shocks vs surprises
Measuring communication using NLP:
1. Probability topic models: latent Dirichlet allocation
2. Word embedding models: word2vec, GloVe
3. Large language models such as BERT and ChatGPT
Towards an empirical implementation of narrative

Professor: Michael McMahon

The Econometrics of Macro and Micro Interactions

Instructor: Frank Schorfheide (University of Pennsylvania)
Dates: 1-5 September 2025
Hours: 9:30 to 13:00 CEST
Format: In person

Intended for

Practitioners, researchers, and academics interested in methods to study the interaction of aggregate and cross-sectional data.

Prerequisites

A solid background in statistics and econometrics (masters level or first-year Ph.D. level) will be useful to follow the class, but no familiarity with the Bayesian approach is required, as the course will start with a brief introduction to Bayesian econometrics.

Overview

The course focuses on modeling the joint dynamics of macroeconomic aggregates and cross-sectional data. For instance, the macroeconomic aggregates could include a measure of productivity, gross domestic product, and the unemployment rate. The cross-sectional data could include administrative or survey data on labor earnings. Such models could be used, for instance, to examine the effects of an aggregate shock, such as a productivity shock, on the cross-sectional distribution of income. The course focuses on model specification and estimation using Bayesian techniques.

Topics

Introduction to Bayesian inference and computation
Empirical Bayes methods
Functional autoregressive models
Bayesian panel data analysis
Modeling distributional versus unit-level dynamics
Empirical applications: the effect of technology and monetary policy shocks on cross-sectional outcomes

Professor: Frank Schorfheide

New Developments in the Econometrics of Heterogeneous Workers and Firms

Instructor: Elena Manresa (New York University)
Dates: 1-5 September 2025
Hours: 15:00 to 18:30 CEST
Format: In person

Intended for

Researchers, economists, and policy practitioners.

Prerequisites

Participants should be comfortable with a Master’s level course in econometrics.

Overview

In recent years there has been an upsurge of interest in new methods for the analysis of matched employer-employee data to study the labor market, partly motivated by the increasing availability of this type of datasets. Existing methods are also finding applicability in other areas such as international trade, economic geography, environmental economics or intergenerational mobility. A main theme is how to deal with multiple heterogeneities and their potential interactions. This course will introduce newly developed approaches to deal with unobserved heterogeneity in conventional and matched panel data sets with an emphasis on discrete-classification methods. It will also provide an overview of traditional methods to decompose earnings variability into worker and firm-level effects as well as the more recent distributional approaches. Finally, the course will review methods for studying transitions and dynamic responses.

Topics

Introduction to clustering methods in panel data analysis
Grouped fixed-effects estimation
AKM decompositions of two-sided heterogeneity from matched panel data
Applications to labor markets, international trade, and the environment
Distributional approaches for linked employer-employee data
Models of worker and firm dynamics

Professor: Elena Manresa

Empirical Analysis of Firm Performance

Instructor: Jan de Loecker (KU Leuven)
Dates: 8-12 September 2025
Hours: 9:30 to 13:00 CEST
Format: In person

Practical Classes

There will be between two and three voluntary sessions in the afternoon (from 15:00 to 17:00) led by a teaching assistant. Exact dates will be announced before the beginning of the course.

Intended for

Academics, researchers, practitioners, and graduate students interested in industrial organization and firm performance.

Prerequisites

First-year Master or PhD level in econometrics and microeconomics.

Overview

The objective of this course is to introduce the main questions and methodologies in the analysis of firm performance from the point of view of the Industrial Organization literature. Participants will learn how to critically evaluate empirical work in Industrial Organization, and related fields studying firm performance (such as trade, macro, and development economics) and develop tools for research. The empirical work covered will typically have a close tie to a theoretical model. The basic structure of the course will involve presentation and discussion of papers that should be read in advance. Problem sets will be made available to implement the estimation routines and analysis presented during the lectures.

Topics

Cost and production: An IO perspective
Production Analysis: Estimating production functions
An empirical framework to study market power
Market power and pass-through
Market power and misallocation

Professor: Jan De Loecker

Local Projection Methods for Time Series and Panel Data

Instructor: Òscar Jordà (Federal Reserve Bank of San Francisco and UC Davis)
Dates: 8-12 September 2025
Hours: 9:30 to 13:00 CEST
Format: In person

Intended for

Academic researchers and policy analysts who are interested in modern multivariate time series methods to compute the dynamic effect of policy interventions and the method of local projections in particular.

Prerequisites

Some basic knowledge of probability or statistics is expected. Individuals with undergraduate degrees in economics, statistics or related disciplines should be able to follow the course. The emphasis will be on applications and practical aspects rather than on deep theory. The applications will primarily use the statistics software package STATA.

Overview

Applied economists are often interested in how an intervention will affect an economic outcome. When the data come in the form of a vector time series or a panel of data of a vector of variables for individual units observed over time, it is important to characterize the dynamic features of the problem in as general a manner as possible. The main objective of the course is thus to introduce the method of local projections (LPs) to examine how interventions affect outcomes over time in the context of general dynamic systems. The flexibility of LPs allow for convenient extensions to explore nonlinearities, state-dependence and policy evaluation more generally, in an easy and accessible way.

Over the past few years, there have been numerous extensions to LPs that will be discussed. These include estimation of multipliers and interpretation of impulse responses; new results on impulse response inference; a decomposition of the impulse response into the direct versus indirect effects of an intervention, and small-sample composition effects; simple linear in parameter methods to estimate time-varying impulse responses; stratification of impulse responses as a function of economic conditions and other nonlinear extensions, to name a few.

More recently, it has become more common to analyze panel data in macroeconomics. Panel data structures allow for richer options, especially on identification. The course will take advantage of these new developments, particularly in the area of difference-in-differences (DiD) identification. LP-DiD methods accommodate a wide range of recently proposed estimators of staggered, heterogeneous, treatment effects.

The breadth of topics covered limits the rigor with which each result will be discussed, though appropriate references will be provided for those interested. The goal of the course is to guide practitioners to appropriate methods for their problems, and to elicit fruitful extensions and avenues for new research. Applications of the methods discussed in class will use the econometrics software package STATA.

Topics

Introduction to the main questions of interest: a local projection as the dynamic version of traditional policy evaluation. Connection to vector autoregressions and their impulse responses. Multipliers and interpretation of impulse responses under different specifications.
Inference with local projections.
Identification.
Smoothing methods and economic interpretation.
Matching methods for estimation of Euler equations. Optimal policy perturbations.
Nonlinearities, Stratification, decomposition, and time-varying impulse responses.
Panel data structures and inference.
Staggered, heterogeneous treatment effects in difference-in-difference studies using LPs. Panel data applications.

Professor: Òscar Jordà

Geoeconomics: The Economic Effects of Geopolitical Risks

Instructor: Dario Caldara (Federal Reserve Board)
Dates: 8-12 September 2025
Hours: 15:00 to 18:30 CEST
Format: In person

Intended for

Academic researchers and researchers in policy institutions who are interested in geopolitics and international economics.

Prerequisites

Master level courses in macroeconomics and econometrics.

Overview

This course examines the intersection of economics and geopolitics, focusing on the measurement and economic impact of geopolitical risks. We will explore various approaches to quantifying geopolitical risks and assess how different countries, industries, and sectors are exposed to them. The course will analyze the economic and financial consequences of geopolitical shocks, including their effects on economic activity, financial markets, and cross-border linkages.

A key emphasis will be on the methodological tools used in geoeconomic analysis. Students will develop analytical skills in textual analysis and working with (un)structured large datasets, applying these methods to real-world economic and policy challenges. Additionally, we will examine selected aspects of international policy coordination in response to geopolitical risks, covering monetary, fiscal, and trade policy considerations.

This course will equip participants with the tools needed to analyze the economic impact of geopolitical developments in an increasingly uncertain global landscape.

Topics

Measurement of geopolitical risks
Overview of textual analysis in economic applications
Quantification of the economic effects of geopolitical risks
Selected topics in international policy coordination

Professor: Dario Caldara