
This vignette demonstrates the core functionality of the consumptionsurveyindia package: connecting to data, browsing available datasets and items, and running basic queries.

Connect to the data

After running the data pipeline (data-raw/download_all.R through data-raw/harmonize_consumption.R), connect to the harmonised Parquet file:

library(consumptionsurveyindia)
library(dplyr)
library(ggplot2)

con <- ics_connect("~/data/consumption_parquet")
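Before connecting, it can help to confirm that the pipeline actually wrote Parquet output to the expected location. The helper below is a hypothetical sketch and not part of the package. It assumes the harmonised output is a directory holding one or more `.parquet` files, which may not match the real layout.

```r
# Hypothetical helper (not part of consumptionsurveyindia): count the
# Parquet files under a directory, stopping early if the path is missing.
count_parquet_files <- function(path) {
  path <- path.expand(path)
  if (!dir.exists(path)) {
    stop("Directory not found: ", path, call. = FALSE)
  }
  files <- list.files(path, pattern = "\\.parquet$", recursive = TRUE)
  length(files)
}

# Demonstration on a throwaway directory with two empty placeholder files.
demo_dir <- file.path(tempdir(), "parquet_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.parquet", "b.parquet"))))
n_demo_files <- count_parquet_files(demo_dir)
```

A count of zero, or an error, would suggest re-running the pipeline scripts before calling `ics_connect()`.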

What data do we have?

avail <- ics_available_data(con)
avail |>
  select(nss_round, round_year, era, n_records, n_items, n_states, description) |>
  # Show the year as text so big.mark does not render 1988 as "1,988"
  mutate(round_year = as.character(round_year)) |>
  knitr::kable(
    format.args = list(big.mark = ","),
    caption = "Available survey rounds in the database"
  )
Available survey rounds in the database

| nss_round | round_year | era | n_records | n_items | n_states | description |
|:--|:--|:--|--:|--:|--:|:--|
| 43rd | 1988 | pre2022 | 700,172 | 20 | 31 | NSS Schedule 1.0, 30-day uniform recall |
| 45th | 1990 | pre2022 | 11,483 | 6 | 0 | NSS Schedule 1.0, 30-day uniform recall |
| 47th | 1991 | pre2022 | 5,171 | 6 | 32 | NSS Schedule 1.0, 30-day uniform recall |
| 48th | 1992 | pre2022 | 4,892 | 6 | 32 | NSS Schedule 1.0, 30-day uniform recall |
| 49th | 1993 | pre2022 | 1,544,269 | 236 | 32 | NSS Schedule 1.0, 30-day uniform recall |
| 50th | 1994 | pre2022 | 1,896,656 | 85 | 32 | NSS Schedule 1.0, 30-day uniform recall |
| 51st | 1995 | pre2022 | 2,394,586 | 219 | 32 | NSS Schedule 1.0, 30-day uniform recall |
| 52nd | 1996 | pre2022 | 195,085 | 21 | 32 | NSS Schedule 1.0, 30-day uniform recall |
| 53rd | 1997 | pre2022 | 2,466,992 | 219 | 32 | NSS Schedule 1.0, mixed recall |
| 54th | 1998 | pre2022 | 1,218,197 | 219 | 32 | NSS Schedule 1.0, mixed recall |
| 55th | 2000 | pre2022 | 5,049,897 | 177 | 32 | NSS Schedule 1.0, mixed recall |
| 56th | 2001 | pre2022 | 719,734 | 24 | 35 | NSS Schedule 1.0, mixed recall |
| 57th | 2002 | pre2022 | 3,320,540 | 158 | 30 | NSS Schedule 1.0, mixed recall |
| 58th | 2002 | pre2022 | 1,507,939 | 175 | 35 | NSS Schedule 1.0, mixed recall |
| 59th | 2003 | pre2022 | 1,524,908 | 157 | 35 | NSS Schedule 1.0, mixed recall |
| 60th | 2004 | pre2022 | 1,026,092 | 158 | 28 | NSS Schedule 1.0, mixed recall |
| 61st | 2005 | pre2022 | 4,551,390 | 158 | 28 | NSS Schedule 1.0, mixed recall |
| 62nd | 2006 | pre2022 | 1,509,725 | 160 | 28 | NSS Schedule 1.0, mixed recall |
| 63rd | 2007 | pre2022 | 3,110,721 | 177 | 35 | NSS Schedule 1.0, mixed recall |
| 64th | 2008 | pre2022 | 2,751,946 | 196 | 35 | NSS Schedule 1.0, mixed recall |
| 66th-T1 | 2010 | pre2022 | 5,310,709 | 189 | 35 | NSS Schedule 1.0, mixed recall |
| 66th-T2 | 2010 | pre2022 | 4,813,463 | 189 | 35 | NSS Schedule 1.0, mixed recall |
| 68th-T1 | 2012 | pre2022 | 5,763,152 | 183 | 0 | NSS Schedule 1.0, mixed recall |
| 68th-T2 | 2012 | pre2022 | 5,277,850 | 183 | 35 | NSS Schedule 1.0, mixed recall |
| HCES-2022-23 | 2023 | post2022 | 13,694,415 | 185 | 36 | HCES three-visit design (FDQ+CSQ+DGQ) |
| HCES-2023-24 | 2024 | post2022 | 14,511,701 | 183 | 36 | HCES three-visit design (FDQ+CSQ+DGQ) |
| NA | NA | NA | 543,859 | 0 | 0 | NSS Schedule 1.0, mixed recall |

The table has 27 rows: 26 identified rounds spanning 1988 to 2024, plus one row of 543,859 records with no round label (NA). Together they hold 85,425,544 item-household observations.
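Rows with a missing round label or with zero states covered stand out in the table above and are worth inspecting before analysis. The sketch below rebuilds a handful of rows by hand to show one way of flagging them. `avail_demo` is an illustrative stand-in, and the same filter could be applied to `avail` itself, assuming it is an ordinary data frame with these columns.

```r
# A few rows copied from the table above; the real object is `avail`.
avail_demo <- data.frame(
  nss_round = c("43rd", "45th", "68th-T1", "HCES-2023-24", NA),
  n_records = c(700172, 11483, 5763152, 14511701, 543859),
  n_states  = c(31, 0, 0, 36, 0),
  stringsAsFactors = FALSE
)

# Rows with no round label, or with no state coverage, deserve a closer look.
flagged <- avail_demo[is.na(avail_demo$nss_round) | avail_demo$n_states == 0, ]

# Records that can be attributed to an identified round.
labelled_records <- sum(avail_demo$n_records[!is.na(avail_demo$nss_round)])
```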

Browse the item catalogue

The package classifies food items into 13 groups based on the HCES questionnaire structure:

ics_food_groups() |>
  knitr::kable(caption = "Food group categories with item counts and recall periods")
Food group categories with item counts and recall periods

| category | n_items | recall_days | sections |
|:--|--:|--:|--:|
| Beverages | 10 | 7 | 6.8 |
| Cereals | 39 | 30 | 5.1 |
| Edible Oil | 10 | 7 | 6.6 |
| Egg, Fish & Meat | 8 | 7 | 6.5 |
| Fruits (Dry) | 11 | 7 | 6.4 |
| Fruits (Fresh) | 20 | 7 | 6.3 |
| Milk & Milk Products | 14 | 7 | 6.1 |
| Packaged Processed Food | 14 | 7 | 7.2 |
| Pulses | 15 | 30 | 5.2 |
| Served Processed Food | 7 | 7 | 7.1 |
| Spices | 13 | 7 | 6.7 |
| Sugar & Salt | 10 | 30 | 5.3 |
| Vegetables | 18 | 7 | 6.2 |
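Recall periods differ across groups (7 or 30 days), so values reported over different windows are not directly comparable. One common approach is to scale everything to a 30-day basis. This is shown purely as an assumption: the vignette does not say whether the harmonised values are already normalised, and `to_30_day()` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: scale a value reported over `recall_days` to a
# 30-day equivalent. Check whether the data are already normalised first.
to_30_day <- function(value, recall_days) {
  value * 30 / recall_days
}

# Made-up expenditure figures, with recall periods from the table above.
demo <- data.frame(
  category    = c("Cereals", "Vegetables"),
  value       = c(300, 70),
  recall_days = c(30, 7)
)
demo$value_30d <- to_30_day(demo$value, demo$recall_days)
```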

Search for specific items:

ics_search_items("rice")
#> # A tibble: 7 × 5
#>   item_code item_name                  category              section recall_days
#>   <chr>     <chr>                      <chr>                   <dbl>       <int>
#> 1 061       Rice (free/PMGKAY)         Cereals                   5.1          30
#> 2 101       Rice (PDS)                 Cereals                   5.1          30
#> 3 102       Rice (other sources)       Cereals                   5.1          30
#> 4 103       Chira (flattened rice)     Cereals                   5.1          30
#> 5 105       Muri (puffed rice)         Cereals                   5.1          30
#> 6 106       Other rice products        Cereals                   5.1          30
#> 7 015       Noodles (cup/rice noodles) Packaged Processed F…     7.2           7
ics_search_items("chicken")
#> # A tibble: 1 × 5
#>   item_code item_name category         section recall_days
#>   <chr>     <chr>     <chr>              <dbl>       <int>
#> 1 195       Chicken   Egg, Fish & Meat     6.5           7
ics_search_items("oil")
#> # A tibble: 10 × 5
#>    item_code item_name                              category section recall_days
#>    <chr>     <chr>                                  <chr>      <dbl>       <int>
#>  1 075       Edible oil (free/PMGKAY)               Edible …     6.6           7
#>  2 095       Edible oil: others                     Edible …     6.6           7
#>  3 181       Mustard oil                            Edible …     6.6           7
#>  4 182       Groundnut oil                          Edible …     6.6           7
#>  5 183       Coconut oil                            Edible …     6.6           7
#>  6 184       Refined oil (sunflower/soyabean/palm/… Edible …     6.6           7
#>  7 185       Other edible oil                       Edible …     6.6           7
#>  8 188       Edible oil (PDS)                       Edible …     6.6           7
#>  9 189       Edible oil: sub-total                  Edible …     6.6           7
#> 10 260       Oilseeds                               Spices       6.7           7

Query a specific item across all years

Track rice expenditure from 1993 to 2024:

rice <- ics_query_item(con, "rice")
rice |>
  # Show the year as text so big.mark does not render 1993 as "1,993"
  mutate(round_year = as.character(round_year)) |>
  knitr::kable(digits = 1, format.args = list(big.mark = ","))

| nss_round | round_year | mean_value | mean_qty | total_value | n | item |
|:--|:--|--:|--:|--:|--:|:--|
| 49th | 1993 | 33.1 | 173.1 | 50,609,812 | 29,892 | rice |
| 53rd | 1997 | 19.2 | 161.9 | 3,855,658,939 | 59,469 | rice |
| 54th | 1998 | 17.7 | 156.5 | 3,620,969,646 | 30,261 | rice |
| 55th | 2000 | 225.5 | 235.4 | 59,977,189,749 | 168,712 | rice |
| 58th | 2002 | 196.5 | 20.2 | 58,388,560,858 | 46,438 | rice |
| 57th | 2002 | 251.4 | 26.8 | 57,773,289,294 | 97,681 | rice |
| 59th | 2003 | 258.3 | 27.4 | 578,304,659 | 44,559 | rice |
| 61st | 2005 | 250.1 | 25.6 | 574,532,178 | 139,041 | rice |
| 62nd | 2006 | 250.9 | 24.8 | 598,991,847 | 43,600 | rice |
| 63rd | 2007 | 201.7 | 18.9 | 672,553,646 | 96,870 | rice |
| 64th | 2008 | 215.5 | 18.3 | 712,737,285 | 75,037 | rice |
| 66th-T1 | 2010 | 251.6 | 16.7 | 901,025,182 | 157,103 | rice |
| 66th-T2 | 2010 | 252.3 | 16.8 | 894,725,004 | 155,500 | rice |
| 68th-T2 | 2012 | 246.3 | 15,082.9 | 1,031,823,976 | 170,515 | rice |
| HCES-2022-23 | 2023 | 242.2 | 9.1 | 126,369,772,982 | 497,625 | rice |
| HCES-2023-24 | 2024 | 273.9 | 7.8 | 148,946,675,703 | 507,796 | rice |

if (nrow(rice) > 0) {
  ggplot(rice, aes(round_year, mean_value)) +
    geom_line(linewidth = 1, colour = "#E69F00") +
    geom_point(size = 3, colour = "#E69F00") +
    labs(
      title = "Rice: Mean Expenditure Across Survey Rounds",
      x = "Survey year", y = "Weighted mean expenditure (Rs.)"
    ) +
    theme_minimal(base_size = 12)
}
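In the rice table, `mean_qty` ranges from 7.8 to 15,082.9. That spread may point to unit or scaling differences between rounds, although the source does not confirm this. The sketch below copies the `mean_qty` column by hand and flags rounds that sit far from the cross-round median. The 5x threshold is arbitrary and chosen only for illustration.

```r
# mean_qty values copied from the rice table above.
rice_qty <- data.frame(
  nss_round = c("49th", "53rd", "54th", "55th", "58th", "57th", "59th", "61st",
                "62nd", "63rd", "64th", "66th-T1", "66th-T2", "68th-T2",
                "HCES-2022-23", "HCES-2023-24"),
  mean_qty = c(173.1, 161.9, 156.5, 235.4, 20.2, 26.8, 27.4, 25.6,
               24.8, 18.9, 18.3, 16.7, 16.8, 15082.9, 9.1, 7.8)
)

# Ratio of each round's mean quantity to the median across rounds.
ratio <- rice_qty$mean_qty / median(rice_qty$mean_qty)

# Flag rounds more than 5x above or below the median (arbitrary threshold).
suspect_rounds <- rice_qty$nss_round[abs(log(ratio)) > log(5)]
```

Flagged rounds are candidates for a closer look at the harmonisation step, not proof of an error.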

Query by food group

shares <- ics_expenditure_shares(con)
if (nrow(shares) > 0) {
  shares |>
    filter(!is.na(food_group)) |>
    group_by(food_group) |>
    filter(n() >= 3) |>
    ungroup() |>
    ggplot(aes(round_year, spending_share, colour = food_group)) +
    geom_line(linewidth = 0.8) +
    geom_point(size = 2) +
    scale_y_continuous(labels = scales::percent_format()) +
    labs(
      title = "Food Group Expenditure Shares Over Time",
      x = "Survey year", y = "Share of total food expenditure",
      colour = "Food group"
    ) +
    theme_minimal(base_size = 11) +
    theme(legend.position = "right")
}
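The y-axis label suggests that `spending_share` is each group's expenditure divided by total food expenditure in the same round. How `ics_expenditure_shares()` computes it internally is not shown here, so the following is only a sketch of that definition on made-up totals.

```r
# Made-up group totals for two hypothetical survey years.
totals <- data.frame(
  round_year = c(2012, 2012, 2023, 2023),
  food_group = c("Cereals", "Vegetables", "Cereals", "Vegetables"),
  value      = c(60, 40, 30, 70)
)

# ave() returns the per-year sum aligned to each row, so the division
# yields shares that add up to 1 within every year.
year_total <- ave(totals$value, totals$round_year, FUN = sum)
totals$spending_share <- totals$value / year_total
```

Because shares sum to 1 within a round, a rising share for one group implies a falling share somewhere else, which is worth keeping in mind when reading the plot.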

Query by state

state_exp <- ics_consumption_by_state(con,
  round = "HCES-2022-23",
  min_obs = 100
)
if (nrow(state_exp) > 0) {
  state_exp |>
    filter(!is.na(state_name)) |>
    arrange(desc(mean_value)) |>
    head(20) |>
    ggplot(aes(reorder(state_name, mean_value), mean_value)) +
    geom_col(fill = "#2171B5") +
    coord_flip() +
    labs(
      title = "Mean Food Expenditure by State (HCES 2022-23)",
      x = NULL, y = "Mean expenditure (Rs.)"
    ) +
    theme_minimal(base_size = 11)
}

Cleanup