Getting Started with consumptionsurveyindia
This vignette demonstrates the core functionality of the
consumptionsurveyindia package: connecting to the data,
browsing the available datasets and items, and running basic queries.
Connect to the data
After running the data pipeline (the scripts from data-raw/download_all.R
through data-raw/harmonize_consumption.R), connect to the
harmonised Parquet data:
```r
library(consumptionsurveyindia)
library(dplyr)
library(ggplot2)

con <- ics_connect("~/data/consumption_parquet")
```

What data do we have?
```r
avail <- ics_available_data(con)

avail |>
  select(nss_round, round_year, era, n_records, n_items, n_states, description) |>
  # Keep years as text so big.mark does not print 1988 as "1,988"
  mutate(round_year = as.character(round_year)) |>
  knitr::kable(
    format.args = list(big.mark = ","),
    caption = "Available survey rounds in the database"
  )
```

| nss_round | round_year | era | n_records | n_items | n_states | description |
|---|---|---|---|---|---|---|
| 43rd | 1988 | pre2022 | 700,172 | 20 | 31 | NSS Schedule 1.0, 30-day uniform recall |
| 45th | 1990 | pre2022 | 11,483 | 6 | 0 | NSS Schedule 1.0, 30-day uniform recall |
| 47th | 1991 | pre2022 | 5,171 | 6 | 32 | NSS Schedule 1.0, 30-day uniform recall |
| 48th | 1992 | pre2022 | 4,892 | 6 | 32 | NSS Schedule 1.0, 30-day uniform recall |
| 49th | 1993 | pre2022 | 1,544,269 | 236 | 32 | NSS Schedule 1.0, 30-day uniform recall |
| 50th | 1994 | pre2022 | 1,896,656 | 85 | 32 | NSS Schedule 1.0, 30-day uniform recall |
| 51st | 1995 | pre2022 | 2,394,586 | 219 | 32 | NSS Schedule 1.0, 30-day uniform recall |
| 52nd | 1996 | pre2022 | 195,085 | 21 | 32 | NSS Schedule 1.0, 30-day uniform recall |
| 53rd | 1997 | pre2022 | 2,466,992 | 219 | 32 | NSS Schedule 1.0, mixed recall |
| 54th | 1998 | pre2022 | 1,218,197 | 219 | 32 | NSS Schedule 1.0, mixed recall |
| 55th | 2000 | pre2022 | 5,049,897 | 177 | 32 | NSS Schedule 1.0, mixed recall |
| 56th | 2001 | pre2022 | 719,734 | 24 | 35 | NSS Schedule 1.0, mixed recall |
| 57th | 2002 | pre2022 | 3,320,540 | 158 | 30 | NSS Schedule 1.0, mixed recall |
| 58th | 2002 | pre2022 | 1,507,939 | 175 | 35 | NSS Schedule 1.0, mixed recall |
| 59th | 2003 | pre2022 | 1,524,908 | 157 | 35 | NSS Schedule 1.0, mixed recall |
| 60th | 2004 | pre2022 | 1,026,092 | 158 | 28 | NSS Schedule 1.0, mixed recall |
| 61st | 2005 | pre2022 | 4,551,390 | 158 | 28 | NSS Schedule 1.0, mixed recall |
| 62nd | 2006 | pre2022 | 1,509,725 | 160 | 28 | NSS Schedule 1.0, mixed recall |
| 63rd | 2007 | pre2022 | 3,110,721 | 177 | 35 | NSS Schedule 1.0, mixed recall |
| 64th | 2008 | pre2022 | 2,751,946 | 196 | 35 | NSS Schedule 1.0, mixed recall |
| 66th-T1 | 2010 | pre2022 | 5,310,709 | 189 | 35 | NSS Schedule 1.0, mixed recall |
| 66th-T2 | 2010 | pre2022 | 4,813,463 | 189 | 35 | NSS Schedule 1.0, mixed recall |
| 68th-T1 | 2012 | pre2022 | 5,763,152 | 183 | 0 | NSS Schedule 1.0, mixed recall |
| 68th-T2 | 2012 | pre2022 | 5,277,850 | 183 | 35 | NSS Schedule 1.0, mixed recall |
| HCES-2022-23 | 2023 | post2022 | 13,694,415 | 185 | 36 | HCES three-visit design (FDQ+CSQ+DGQ) |
| HCES-2023-24 | 2024 | post2022 | 14,511,701 | 183 | 36 | HCES three-visit design (FDQ+CSQ+DGQ) |
| NA | NA | NA | 543,859 | 0 | 0 | NSS Schedule 1.0, mixed recall |
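The database holds 27 round entries spanning 1988 to 2024: 26 identified rounds plus one row of 543,859 records that carry no round metadata (the NA row). Together they contain 85,425,544 item-household observations.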
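The headline figures can be reproduced from the table itself. A minimal base-R sketch, with the round labels and record counts copied by hand from the ics_available_data() output (with a live connection you would work on avail directly), that separates the unassigned row from the identified rounds:

```r
# Round labels and record counts copied from the ics_available_data() table;
# NA marks the row whose records carry no round metadata.
rounds <- data.frame(
  nss_round = c("43rd", "45th", "47th", "48th", "49th", "50th", "51st",
                "52nd", "53rd", "54th", "55th", "56th", "57th", "58th",
                "59th", "60th", "61st", "62nd", "63rd", "64th",
                "66th-T1", "66th-T2", "68th-T1", "68th-T2",
                "HCES-2022-23", "HCES-2023-24", NA),
  n_records = c(700172, 11483, 5171, 4892, 1544269, 1896656, 2394586,
                195085, 2466992, 1218197, 5049897, 719734, 3320540, 1507939,
                1524908, 1026092, 4551390, 1509725, 3110721, 2751946,
                5310709, 4813463, 5763152, 5277850,
                13694415, 14511701, 543859),
  stringsAsFactors = FALSE
)

identified <- !is.na(rounds$nss_round)

# Totals with and without the records that lack a round
total_all        <- sum(rounds$n_records)
total_identified <- sum(rounds$n_records[identified])
n_rounds         <- sum(identified)

cat("Identified rounds:", n_rounds, "\n")
cat("All records:", format(total_all, big.mark = ","), "\n")
cat("Records with a round:", format(total_identified, big.mark = ","), "\n")
```

Whether the unassigned records should be dropped or traced back to a round depends on how the pipeline produced them; this vignette does not say.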
Browse the item catalogue
The package classifies food items into 13 groups based on the HCES questionnaire structure:
```r
ics_food_groups() |>
  knitr::kable(caption = "Food group categories with item counts and recall periods")
```

| category | n_items | recall_days | sections |
|---|---|---|---|
| Beverages | 10 | 7 | 6.8 |
| Cereals | 39 | 30 | 5.1 |
| Edible Oil | 10 | 7 | 6.6 |
| Egg, Fish & Meat | 8 | 7 | 6.5 |
| Fruits (Dry) | 11 | 7 | 6.4 |
| Fruits (Fresh) | 20 | 7 | 6.3 |
| Milk & Milk Products | 14 | 7 | 6.1 |
| Packaged Processed Food | 14 | 7 | 7.2 |
| Pulses | 15 | 30 | 5.2 |
| Served Processed Food | 7 | 7 | 7.1 |
| Spices | 13 | 7 | 6.7 |
| Sugar & Salt | 10 | 30 | 5.3 |
| Vegetables | 18 | 7 | 6.2 |
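Because the groups are collected over different recall periods (7 or 30 days), raw values are only comparable once they are put on a common reference period. A minimal sketch, assuming a reported value covers exactly its item's recall period and that spending is spread evenly over time; to_30_days() is a hypothetical helper, and this vignette does not show whether the ics_* functions already apply such scaling. It also tallies the catalogue by recall period using the counts in the table above:

```r
# Food-group catalogue copied from the ics_food_groups() table
groups <- data.frame(
  category = c("Beverages", "Cereals", "Edible Oil", "Egg, Fish & Meat",
               "Fruits (Dry)", "Fruits (Fresh)", "Milk & Milk Products",
               "Packaged Processed Food", "Pulses", "Served Processed Food",
               "Spices", "Sugar & Salt", "Vegetables"),
  n_items     = c(10, 39, 10, 8, 11, 20, 14, 14, 15, 7, 13, 10, 18),
  recall_days = c(7, 30, 7, 7, 7, 7, 7, 7, 30, 7, 7, 30, 7),
  stringsAsFactors = FALSE
)

# Number of catalogue items collected under each recall period
items_by_recall <- tapply(groups$n_items, groups$recall_days, sum)
print(items_by_recall)

# Hypothetical helper: rescale a value reported over `recall_days`
# to a 30-day equivalent, assuming uniform spending over time.
to_30_days <- function(value, recall_days) {
  value * 30 / recall_days
}

# Example: Rs. 70 reported over a 7-day recall is Rs. 300 per 30 days
to_30_days(70, 7)
```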
Search for specific items:
```r
ics_search_items("rice")
#> # A tibble: 7 × 5
#> item_code item_name category section recall_days
#> <chr> <chr> <chr> <dbl> <int>
#> 1 061 Rice (free/PMGKAY) Cereals 5.1 30
#> 2 101 Rice (PDS) Cereals 5.1 30
#> 3 102 Rice (other sources) Cereals 5.1 30
#> 4 103 Chira (flattened rice) Cereals 5.1 30
#> 5 105 Muri (puffed rice) Cereals 5.1 30
#> 6 106 Other rice products Cereals 5.1 30
#> 7 015 Noodles (cup/rice noodles) Packaged Processed F… 7.2 7
```
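The search appears to match any item name containing the term, so "rice" also returns processed rice products and a noodle item from another group. A minimal sketch of narrowing the result to the grain itself, using the rows printed above as a plain data frame (with a live connection you would filter the tibble returned by ics_search_items() directly); the "name starts with Rice" rule is an illustrative assumption, not a package convention:

```r
# Rows returned by ics_search_items("rice"), copied from the output above
rice_items <- data.frame(
  item_code = c("061", "101", "102", "103", "105", "106", "015"),
  item_name = c("Rice (free/PMGKAY)", "Rice (PDS)", "Rice (other sources)",
                "Chira (flattened rice)", "Muri (puffed rice)",
                "Other rice products", "Noodles (cup/rice noodles)"),
  category  = c(rep("Cereals", 6), "Packaged Processed Food"),
  stringsAsFactors = FALSE
)

# Keep cereal items whose name starts with "Rice" (the grain from its
# different sources) and drop processed rice products and noodles.
grain_rice <- subset(
  rice_items,
  category == "Cereals" & startsWith(item_name, "Rice")
)
print(grain_rice)
```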
```r
ics_search_items("chicken")
#> # A tibble: 1 × 5
#> item_code item_name category section recall_days
#> <chr> <chr> <chr> <dbl> <int>
#> 1 195 Chicken Egg, Fish & Meat 6.5 7
```
```r
ics_search_items("oil")
#> # A tibble: 10 × 5
#> item_code item_name category section recall_days
#> <chr> <chr> <chr> <dbl> <int>
#> 1 075 Edible oil (free/PMGKAY) Edible … 6.6 7
#> 2 095 Edible oil: others Edible … 6.6 7
#> 3 181 Mustard oil Edible … 6.6 7
#> 4 182 Groundnut oil Edible … 6.6 7
#> 5 183 Coconut oil Edible … 6.6 7
#> 6 184 Refined oil (sunflower/soyabean/palm/… Edible … 6.6 7
#> 7 185 Other edible oil Edible … 6.6 7
#> 8 188 Edible oil (PDS) Edible … 6.6 7
#> 9 189 Edible oil: sub-total Edible … 6.6 7
#> 10 260 Oilseeds Spices 6.7 7
```

Query a specific item across all years
Track rice expenditure from 1993 to 2024:
```r
rice <- ics_query_item(con, "rice")

rice |>
  # Years as text so they are not printed with a thousands separator
  mutate(round_year = as.character(round_year)) |>
  knitr::kable(digits = 1, format.args = list(big.mark = ","))
```

| nss_round | round_year | mean_value | mean_qty | total_value | n | item |
|---|---|---|---|---|---|---|
| 49th | 1993 | 33.1 | 173.1 | 50,609,812 | 29,892 | rice |
| 53rd | 1997 | 19.2 | 161.9 | 3,855,658,939 | 59,469 | rice |
| 54th | 1998 | 17.7 | 156.5 | 3,620,969,646 | 30,261 | rice |
| 55th | 2000 | 225.5 | 235.4 | 59,977,189,749 | 168,712 | rice |
| 58th | 2002 | 196.5 | 20.2 | 58,388,560,858 | 46,438 | rice |
| 57th | 2002 | 251.4 | 26.8 | 57,773,289,294 | 97,681 | rice |
| 59th | 2003 | 258.3 | 27.4 | 578,304,659 | 44,559 | rice |
| 61st | 2005 | 250.1 | 25.6 | 574,532,178 | 139,041 | rice |
| 62nd | 2006 | 250.9 | 24.8 | 598,991,847 | 43,600 | rice |
| 63rd | 2007 | 201.7 | 18.9 | 672,553,646 | 96,870 | rice |
| 64th | 2008 | 215.5 | 18.3 | 712,737,285 | 75,037 | rice |
| 66th-T1 | 2010 | 251.6 | 16.7 | 901,025,182 | 157,103 | rice |
| 66th-T2 | 2010 | 252.3 | 16.8 | 894,725,004 | 155,500 | rice |
| 68th-T2 | 2012 | 246.3 | 15,082.9 | 1,031,823,976 | 170,515 | rice |
| HCES-2022-23 | 2023 | 242.2 | 9.1 | 126,369,772,982 | 497,625 | rice |
| HCES-2023-24 | 2024 | 273.9 | 7.8 | 148,946,675,703 | 507,796 | rice |
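The mean_qty column changes scale abruptly between some rounds (from 235.4 in the 55th round to 26.8 in the 57th, and up to 15,082.9 in the 68th-T2), which suggests that quantity units or definitions may differ across rounds. A minimal sketch for flagging such breaks before comparing quantities over time; the values are copied from the table above in round order, and the factor-of-five threshold is an arbitrary assumption:

```r
# Rice mean_qty by round, copied from the ics_query_item() table
# (57th placed before 58th so rows follow round order)
rice_qty <- data.frame(
  nss_round = c("49th", "53rd", "54th", "55th", "57th", "58th", "59th",
                "61st", "62nd", "63rd", "64th", "66th-T1", "66th-T2",
                "68th-T2", "HCES-2022-23", "HCES-2023-24"),
  mean_qty  = c(173.1, 161.9, 156.5, 235.4, 26.8, 20.2, 27.4,
                25.6, 24.8, 18.9, 18.3, 16.7, 16.8,
                15082.9, 9.1, 7.8),
  stringsAsFactors = FALSE
)

# Ratio of each round's mean quantity to the previous round's
ratio <- rice_qty$mean_qty / c(NA, head(rice_qty$mean_qty, -1))

# Flag changes larger than a factor of `threshold` in either direction
threshold <- 5
breaks <- !is.na(ratio) & (ratio > threshold | ratio < 1 / threshold)

print(rice_qty$nss_round[breaks])
```

Flagged rounds are candidates for checking against the survey documentation, not proof of an error.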
```r
if (nrow(rice) > 0) {
  ggplot(rice, aes(round_year, mean_value)) +
    geom_line(linewidth = 1, colour = "#E69F00") +
    geom_point(size = 3, colour = "#E69F00") +
    labs(
      title = "Rice: Mean Expenditure Across Survey Rounds",
      x = "Survey year", y = "Weighted mean expenditure (Rs.)"
    ) +
    theme_minimal(base_size = 12)
}
```
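The plot places rounds by round_year, but several survey years hold more than one round (in the rice data, the 57th and 58th rounds in 2002 and both halves of the 66th round in 2010), so the line joins two points at the same x position. A minimal sketch for listing these years from the round/year pairs in the available-data table; plotting against the round label or an ordinal round index is one way to avoid the vertical segments:

```r
# Round/year pairs copied from the ics_available_data() table
# (the row without round metadata is left out)
round_years <- data.frame(
  nss_round  = c("43rd", "45th", "47th", "48th", "49th", "50th", "51st",
                 "52nd", "53rd", "54th", "55th", "56th", "57th", "58th",
                 "59th", "60th", "61st", "62nd", "63rd", "64th",
                 "66th-T1", "66th-T2", "68th-T1", "68th-T2",
                 "HCES-2022-23", "HCES-2023-24"),
  round_year = c(1988, 1990, 1991, 1992, 1993, 1994, 1995,
                 1996, 1997, 1998, 2000, 2001, 2002, 2002,
                 2003, 2004, 2005, 2006, 2007, 2008,
                 2010, 2010, 2012, 2012,
                 2023, 2024),
  stringsAsFactors = FALSE
)

# Group round labels by year and keep years with more than one round
shared <- split(round_years$nss_round, round_years$round_year)
shared <- shared[lengths(shared) > 1]
print(shared)
```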
Query by food group
```r
shares <- ics_expenditure_shares(con)

if (nrow(shares) > 0) {
  shares |>
    filter(!is.na(food_group)) |>
    # Keep only food groups with at least three data points, enough for a trend line
    group_by(food_group) |>
    filter(n() >= 3) |>
    ungroup() |>
    ggplot(aes(round_year, spending_share, colour = food_group)) +
    geom_line(linewidth = 0.8) +
    geom_point(size = 2) +
    scale_y_continuous(labels = scales::percent_format()) +
    labs(
      title = "Food Group Expenditure Shares Over Time",
      x = "Survey year", y = "Share of total food expenditure",
      colour = "Food group"
    ) +
    theme_minimal(base_size = 11) +
    theme(legend.position = "right")
}
```
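Before reading the shares as a composition, it can help to check that they add up to roughly one within each round. A minimal sketch, assuming spending_share is a fraction (as the percent axis implies) computed over all food groups in a round; share_totals() is a hypothetical helper, not part of the package:

```r
# Hypothetical helper: total spending_share per round, ignoring rows
# without a food group (as the plot above does). Several rounds share a
# survey year, so per-year totals can exceed one; if the data has a
# round identifier column, pass its name as `round_col` instead.
share_totals <- function(shares, round_col = "round_year") {
  keep <- !is.na(shares$food_group)
  tapply(shares$spending_share[keep], shares[[round_col]][keep], sum)
}

# Usage with a live connection (not run here):
# totals <- share_totals(ics_expenditure_shares(con))
# totals[abs(totals - 1) > 0.01]
```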
Query by state
```r
state_exp <- ics_consumption_by_state(con,
  round = "HCES-2022-23",
  min_obs = 100
)

if (nrow(state_exp) > 0) {
  state_exp |>
    filter(!is.na(state_name)) |>
    # Top 20 states by mean expenditure
    arrange(desc(mean_value)) |>
    head(20) |>
    ggplot(aes(reorder(state_name, mean_value), mean_value)) +
    geom_col(fill = "#2171B5") +
    coord_flip() +
    labs(
      title = "Mean Food Expenditure by State (HCES 2022-23)",
      x = NULL, y = "Mean expenditure (Rs.)"
    ) +
    theme_minimal(base_size = 11)
}
```
Cleanup
```r
ics_disconnect(con)
```
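In a script or function, the connection can be left open if an error occurs before ics_disconnect() runs. Base R's on.exit() ties the cleanup to the calling function instead. A minimal sketch; with_connection() is a hypothetical helper, and it relies only on what this vignette shows: ics_connect() takes a path and ics_disconnect() takes the connection.

```r
# Hypothetical helper: open a connection, run `fun` on it, and close the
# connection when the function exits, even if `fun` throws an error.
with_connection <- function(open, close, fun) {
  con <- open()
  on.exit(close(con), add = TRUE)
  fun(con)
}

# Usage with this package (not run here; requires the harmonised data):
# avail <- with_connection(
#   open  = function() ics_connect("~/data/consumption_parquet"),
#   close = ics_disconnect,
#   fun   = ics_available_data
# )
```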