Tracking consumer price statistics in real-time with Walmart data (part 1.1)

Alvaro A. Montoya
May 25, 2022
Source: Diariofedecamaras.com

With causes ranging from the post-pandemic economic rebound to the war in Ukraine, inflation is soaring all over the world in 2022. From advanced economies to low-income nations, most countries are experiencing rising prices. Inflation is a key economic indicator because it relates directly to people’s well-being, and the current surge is eroding most people’s purchasing power. Monitoring inflation is therefore important, especially for open economies that are price-takers, as most of the developing world is.

Most inflation monitoring in the developing world is still based on surveying key retail sectors every month and combining these prices with a pre-defined set of weights for each product, derived from consumption-expenditure surveys. This produces lagged price monitoring, in contrast with the possibility of tracking prices in real time. With this background in mind, I set out to develop a project to monitor prices in developing countries using web-scraped data from the online stores of supermarket chains. For the pilot stage of the project, I chose to download grocery price data from Walmart Central America. The retail giant’s regional operations provide similar websites for Mexico, Guatemala, El Salvador, Honduras, and Nicaragua.

This document presents the first results with data from Nicaragua. This country offers an ideal case study for at least two reasons: it is the second-poorest country in the Americas after Haiti, and it is governed by an autocratic regime that has manipulated official statistics in the past. On the first point, since Walmart offers relatively economical shopping, I expect it to hold a relatively large market share among Nicaraguans. The second reason is, in my opinion, the most important, because there are reasons to believe the Ortega regime is actively hiding the current price hike.

Although it has no single cause, inflation is a reality for most sectors of the economy, and it is particularly high for food prices. Using official Consumer Price Index (CPI) data, Figure 1 depicts a surge in prices at double-digit rates, with a year-over-year inflation rate of almost 14% for food prices in March 2022. The official inflation estimates are published each month by Nicaragua’s statistical institute, INIDE (Instituto Nacional de Información para el Desarrollo).

Figure 1. Nicaragua: Recent inflation trends

Interestingly, INIDE also publishes reports on the Basic Consumption Basket (BCB, Canasta básica in Spanish), which contains 53 consumption goods considered essential for the monthly living expenses of a typical family of five. In Nicaragua, the evolution of the BCB is widely followed by the population and extensively covered by the few free media outlets left in the country. The firing of the Economics Minister in 2020, because he did not know the prices within the BCB at the time, is a testament to this relevance. For this project, I chose to replicate Nicaragua’s BCB using representative goods that are equivalent to those included in the official BCB and are also found in the scraped data.

Figure 2 shows the composition of this official consumption basket, with the relative importance of each item given by its share of the monthly cost for March 2022. During March, the cost of the 53 items within Nicaragua’s BCB rose to 16,998 Córdobas, approximately 475 US dollars. A brief look at Figure 2 shows that food items were the main component of the total basket value during March. Indeed, the cost of acquiring food items corresponds to 70% of the total BCB. Products like tortillas, milk, and beef were the main representatives of the typical Nicaraguan diet as depicted by the BCB.

Figure 2. Nicaragua: Official monthly consumption basket breakdown for March 2022

It is worth noting that while prices vary month to month, the ‘consensus’ quantity of consumption for each product is determined by INIDE and fixed over time. To cite a few examples, for this typical family of five (two adults and three children), INIDE calculates a monthly consumption need of 30 liters of milk, 9 pounds of cheese, and 7 dozen eggs. These quantities are based on caloric consumption needs calculated in turn by Central America’s Institute of Nutrition. This leads to one of the project’s main premises: by using Walmart as the source of price movements together with INIDE’s item quantities, you can potentially recreate Nicaragua’s BCB in real time.

The data for replicating Nicaragua’s BCB relies on regularly updated web-scraped price information from Walmart.com.ni. In particular, I used a Selenium Chrome driver to download the data into .csv files for each country on a weekly basis. The web scraping code crawls through the store items and downloads their name, price, weight, and discount information. Using these fields, a further Natural Language Processing (NLP) step helped me generate harmonized variables and create a time series from the downloaded datasets. I started writing the code in January and February 2022, and it has since evolved to include more categories and countries (I started with two countries and three categories). For Nicaragua, each weekly download results in over 1,500 individual items on average.
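As a rough illustration of what a harmonization step like this could look like, the sketch below pulls a package size out of a product name with a regular expression and converts it to a common base unit, so that prices can be compared per gram or per milliliter. This is not the project's actual code: the unit spellings, the conversion table, the sample product names, and the function names `parse_size` and `unit_price` are all assumptions made for the example.

```python
import re
from typing import Optional, Tuple

# Hypothetical unit table: maps a unit spelling to (base unit, factor).
# The spellings are assumptions about how product names might be written.
UNIT_FACTORS = {
    "kg": ("g", 1000.0),
    "gr": ("g", 1.0),
    "g": ("g", 1.0),
    "lb": ("g", 453.592),
    "oz": ("g", 28.3495),
    "ml": ("ml", 1.0),
    "lt": ("ml", 1000.0),
    "l": ("ml", 1000.0),
}

# A number (decimal point or comma allowed), optional spaces, then a unit.
SIZE_PATTERN = re.compile(
    r"(\d+(?:[.,]\d+)?)\s*(kg|gr|g|lb|oz|ml|lt|l)\b", re.IGNORECASE
)


def parse_size(name: str) -> Optional[Tuple[float, str]]:
    """Return (amount in base unit, base unit), or None if no size is found."""
    match = SIZE_PATTERN.search(name)
    if match is None:
        return None
    amount = float(match.group(1).replace(",", "."))
    base_unit, factor = UNIT_FACTORS[match.group(2).lower()]
    return amount * factor, base_unit


def unit_price(price: float, name: str) -> Optional[float]:
    """Price per gram or per milliliter, or None if the size is unknown."""
    size = parse_size(name)
    if size is None or size[0] == 0:
        return None
    return price / size[0]


# Hypothetical product names, for illustration only.
for product in ["Arroz 400 g", "Leche Entera 1 L", "Queso 2,5 kg", "Pan"]:
    print(product, "->", parse_size(product))
```

Names without a recognizable size return `None`, which makes it easy to set those rows aside for manual review instead of guessing.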

Once the information was saved in relational database format, I obtained the BCB’s consensus quantities for each product from Nicaragua’s INIDE website. To match the official items against the thousands of products downloaded each time, I developed different measures of text similarity as a first-tier filter for identifying which items to include in the index. Specifically, I experimented with the Difflib, FuzzyWuzzy, and RecordLinkage libraries for string matching and text similarity. These libraries, along with a simpler approach using keyword search with Pandas’ string-matching functions (e.g. pandas.Series.str.contains), helped me identify 21 official BCB items recurrently found in the web-scraped database. It is worth mentioning that in the final stage of the NLP I chose the 21 items (fewer for the first months of script build-up, e.g. January and February) by manually comparing the price and weight information of about 240 items. These 240 were the output of the matching libraries and NLP heuristics mentioned above. Part of the final selection was based on prior knowledge (as a consumer) about common brands and product variants offered by Walmart in Nicaragua. For the most part, these are food items, but they also include personal hygiene and household articles. The next section introduces statistics about the price evolution of these BCB items.
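To give a sense of how such a first-tier filter can work, here is a minimal sketch that combines a keyword check with a similarity score from Python's standard-library `difflib`. It is an illustration under assumptions, not the project's code: the function names, the default threshold, and the sample product names are hypothetical, and plain Python substring matching stands in for pandas.Series.str.contains to keep the example dependency-free. FuzzyWuzzy and RecordLinkage are left out for the same reason.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_matches(
    basket_item: str,
    keyword: str,
    product_names: List[str],
    threshold: float = 0.2,  # hypothetical cut-off, to be tuned by hand
) -> List[Tuple[str, float]]:
    """Keep products that contain the keyword, ranked by similarity."""
    hits = []
    for name in product_names:
        if keyword.lower() not in name.lower():
            continue  # keyword filter: drop unrelated products early
        score = similarity(basket_item, name)
        if score >= threshold:
            hits.append((name, score))
    # Highest-scoring candidates first, ready for manual review.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical scraped names, for illustration only.
products = [
    "Arroz Faisan 80/20 1 lb",
    "Leche Entera 1 L",
    "Arroz Precocido 400 g",
]
for name, score in candidate_matches("Arroz", "arroz", products):
    print(f"{score:.2f}  {name}")
```

A filter like this only narrows the list; as described above, the final choice still depends on manually comparing prices, weights, and brands.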

Initially, the main objective for gathering this data was to track consumer prices in real or near-real time and to match price movements under the official CPI or BCB methodologies. At this initial stage, while the time series is still short, I will focus on replicating Nicaragua’s BCB instead of Nicaragua’s CPI. This is because the BCB methodology is more parsimonious and, as the results in this section suggest, the BCB is more clearly represented in the products available at Walmart stores. The 21 articles discussed above accounted for 87% of the food component of Nicaragua’s official BCB cost for March 2022 (60% of the total basket value for that month). In quantity (weight), they also represent about 60% of the overall consensus consumption.

Figure 3 is a stacked bar plot illustrating the cost of buying the minimum quantities of each of the BCB goods identified in the scraped files. Each value results from multiplying INIDE’s consensus quantities by the average monthly price for each of these products on Walmart’s site. From January to April, each column includes more articles. This reflects the iteration on the code during the first months of the project, as I mentioned earlier. Figure 3 shows that in April these 21 products added up to 10,131 Córdobas (around 280 dollars). As a reference, the same 21 articles summed to 10,239 Córdobas in INIDE’s March 2022 BCB.
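The calculation behind each bar segment can be sketched in a few lines: average the scraped prices for an item within a month, then multiply by the fixed consensus quantity. The quantities below for milk and eggs are the ones cited earlier from INIDE; the prices, item labels, and function name are hypothetical, and prices are assumed to be already expressed per the same unit as the quantity (per liter, per dozen).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def monthly_basket_cost(
    quantities: Dict[str, float],
    observations: Iterable[Tuple[str, str, float]],
) -> Dict[str, Dict[str, float]]:
    """Cost per month and item: consensus quantity x average monthly price.

    observations holds (month, item, unit price) rows, one per scrape.
    """
    prices = defaultdict(list)
    for month, item, price in observations:
        prices[(month, item)].append(price)

    costs: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (month, item), item_prices in prices.items():
        if item in quantities:  # ignore products outside the basket
            costs[month][item] = quantities[item] * mean(item_prices)
    return dict(costs)


# Monthly consensus quantities cited from INIDE: 30 liters of milk, 7 dozen eggs.
quantities = {"milk": 30, "eggs": 7}

# Hypothetical weekly unit prices in Córdobas, for illustration only.
observations = [
    ("2022-04", "milk", 30.0),
    ("2022-04", "milk", 32.0),
    ("2022-04", "eggs", 60.0),
]

costs = monthly_basket_cost(quantities, observations)
total_april = sum(costs["2022-04"].values())
print(costs["2022-04"])  # milk: 30 * 31.0 = 930.0, eggs: 7 * 60.0 = 420.0
print(total_april)       # 1350.0
```

Because the quantities are fixed, any change in a month's total comes only from price movements or from items entering the matched set, which is why the early columns in Figure 3 grow as more articles are identified.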

The gaps or price differences can be explained by the fact that INIDE’s price surveys are broader than mine, which has Walmart as its sole source. This determines which products I find, as well as their prices and packaging, all of which affect each product’s price trend and its comparability with the official items inspected regularly by INIDE. The gap, however, is very small, equivalent to 3 US dollars. Comparing both sets of price data (INIDE’s March 2022 with my web-scraped data for April 2022) results in an implied monthly deflation of 1.7% for April 2022. This apparent contradiction, since we know from official data that inflation is climbing in Nicaragua, is again a result of collecting price data from a single retailer, and Walmart is known internationally as an economical retail store. I hope to continue downloading data for the rest of 2022 to confirm current inflation patterns. This exercise will hopefully help me find other explanations for this trend.

Figure 3. The monthly cost of consumption goods found at Walmart.com.ni

The results for Nicaragua indicate that it is possible to recreate official consumer statistics based on information extracted programmatically from online retail stores. Despite its limitations, I believe this type of project is scalable, easily replicable, and most importantly, provides a good source of consumer prices for countries where authoritarian regimes may be misrepresenting or hiding inflation statistics.

In the future, I plan to continue populating the series with bimonthly downloads of new items and prices for the five countries. This time series of foods and beverages will be the basis for projecting BCB costs, and also for creating a Walmart-based CPI for Nicaragua. I also plan to complement this information with research to assess the external validity of my main results, focusing on understanding Walmart’s approximate market shares. This could potentially be addressed by including other retail stores in the project.

The first blog post for this series can be found here. Let me know what you think!

Alvaro A. Montoya

Economist, MSc Data Science student at Georgetown University.