2020 has seen the yusho (top-division championship) won twice by rank-and-file maegashira: Tokushoryu in January and Terunofuji in July. How unusual is this?
We’ll be working with another data set from the sumo data collection.
Import results.csv with hard-coded column types:
library(tidyverse)
df <- read_csv(
"results.csv",
col_types = "ciiccciciccci"
)
head(df)
## # A tibble: 6 x 13
## basho day rikishi1_id rikishi1_rank rikishi1_shikona rikishi1_result
## <chr> <int> <int> <chr> <chr> <chr>
## 1 1983~ 1 4140 J13w Chikubayama 0-1 (7-8)
## 2 1983~ 1 4306 Ms1e Ofuji 1-0 (6-1)
## 3 1983~ 1 1337 J12w Tochitsukasa 1-0 (9-6)
## 4 1983~ 1 4323 J13e Shiraiwa 0-1 (3-12)
## 5 1983~ 1 4097 J12e Tamakiyama 0-1 (8-7)
## 6 1983~ 1 4319 J11w Harunafuji 1-0 (5-10)
## # ... with 7 more variables: rikishi1_win <int>, kimarite <chr>,
## # rikishi2_id <int>, rikishi2_rank <chr>, rikishi2_shikona <chr>,
## # rikishi2_result <chr>, rikishi2_win <int>
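The col_types string above uses readr's compact notation: one letter per column, with "c" for character and "i" for integer, so 13 letters for 13 columns. A minimal sketch on a made-up three-column CSV (the toy_csv string is an assumption for illustration, not part of results.csv) shows the effect; note that basho stays a character string such as "1983.01" instead of being parsed as a number.

```r
library(readr)

# Hypothetical inline CSV; a string containing newlines is read as literal data.
toy_csv <- "basho,day,rikishi1_rank\n1983.01,1,J13w\n1983.01,2,Ms1e\n"

# "cic" = character, integer, character (one letter per column, in order).
toy <- read_csv(toy_csv, col_types = "cic")

str(toy$basho)  # character vector, leading/trailing digits preserved
str(toy$day)    # integer vector
```

Hard-coding the types this way skips readr's type guessing, so the result does not depend on what the first rows of the file happen to contain.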
In this data set, every bout is represented by two rows. If A fought B, there is one row where A is rikishi 1 and B is rikishi 2 (with corresponding values for id/shikona/result/win) and another row where B is rikishi 1 and A is rikishi 2. We’ll work only with the rikishi 1 columns: every wrestler appears as rikishi 1 exactly once per bout, so the “reflected” rows need no special handling.
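To make the two-rows-per-bout layout concrete, here is a small sketch on a hypothetical toy table (the ids and results are made up, and only a few of the real columns are mimicked). Summing rikishi1_win per rikishi1_id counts each wrestler's wins exactly once, without touching the rikishi 2 columns.

```r
library(dplyr)
library(tibble)

# Hypothetical data: on day 1 wrestler 1 beats wrestler 2,
# on day 2 wrestler 3 beats wrestler 1. Each bout appears twice.
toy <- tibble(
  day          = c(1L, 1L, 2L, 2L),
  rikishi1_id  = c(1L, 2L, 3L, 1L),
  rikishi1_win = c(1L, 0L, 1L, 0L),
  rikishi2_id  = c(2L, 1L, 1L, 3L),
  rikishi2_win = c(0L, 1L, 0L, 1L)
)

# Wins and bouts per wrestler, using rikishi 1 columns only.
toy_wins <- toy %>%
  group_by(id = rikishi1_id) %>%
  summarise(
    wins  = sum(rikishi1_win),
    bouts = n(),
    .groups = "drop"
  )

toy_wins
```

In this toy table the total number of wins equals the number of bouts (half the number of rows), which is a cheap consistency check for data in this layout.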
As in the banzuke data, the first one or two letters of the rank column indicate the wrestler’s division:
divisions <- c("Jk", "Jd", "Sd", "Ms", "J", "M", "K", "S", "O", "Y")
df_yusho <- df %>%
# keep only the top division (ranks above juryo, "J")
mutate(
division = ordered(
str_extract(
rikishi1_rank,
"^\\D+"
),
levels = divisions
)
) %>%
filter(
division > "J"
) %>%
# number of wins for each tournament/wrestler
group_by(
basho,
id = rikishi1_id,
rank = rikishi1_rank,
division,
shikona = rikishi1_shikona
) %>%
summarise(
wins = sum(
rikishi1_win
),
.groups = "drop"
) %>%
# highest number of wins = champion
group_by(
basho
) %>%
slice_max(
order_by = wins
) %>%
ungroup()
This gives us the winner of every basho (tournament) since 1983:
head(
df_yusho
)
## # A tibble: 6 x 6
## basho id rank division shikona wins
## <chr> <int> <chr> <ord> <chr> <int>
## 1 1983.01 4112 O1w O Kotokaze 15
## 2 1983.03 1354 Y1e Y Chiyonofuji 15
## 3 1983.05 1363 S1e S Hokutenyu 14
## 4 1983.07 4104 O1e O Takanosato 14
## 5 1983.09 4104 Y1w Y Takanosato 15
## 6 1983.11 1354 Y1w Y Chiyonofuji 14
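The division step in the pipeline above can be checked in isolation. This sketch reuses the same regular expression and level order on a handful of rank strings (the ranks vector is picked for illustration) and shows how the comparison with "J" separates top-division ranks from the rest.

```r
library(stringr)

# Same level order as in the pipeline: lowest division first.
divisions <- c("Jk", "Jd", "Sd", "Ms", "J", "M", "K", "S", "O", "Y")

# Illustrative rank strings in the data set's format.
ranks <- c("J13w", "Ms1e", "M17e", "O1w", "Y1e")

# "^\\D+" takes the leading run of non-digit characters: the division prefix.
division <- ordered(
  str_extract(ranks, "^\\D+"),
  levels = divisions
)

# For an ordered factor, > compares positions in `levels`, not alphabetical order.
is_top <- division > "J"

data.frame(ranks, division, is_top)
```

Using an ordered factor is what makes the filter meaningful: a plain string comparison would rank "Ms" above "J" alphabetically, while the level order places it below.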
A quick sanity check of the title counts against Wikipedia (keep in mind that only titles from 1983 onward are counted here):
df_yusho %>%
group_by(
id
) %>%
summarise(
# shikona can change throughout career
shikona = last(
shikona
),
n = n(),
.groups = "drop"
) %>%
arrange(
desc(n)
)
## # A tibble: 46 x 3
## id shikona n
## <int> <chr> <int>
## 1 1123 Hakuho 44
## 2 878 Asashoryu 25
## 3 1354 Chiyonofuji 24
## 4 2 Takanohana 22
## 5 4 Musashimaru 12
## 6 1 Akebono 11
## 7 1111 Harumafuji 9
## 8 1339 Hokutoumi 8
## 9 1219 Kakuryu 6
## 10 3 Wakanohana 5
## # ... with 36 more rows
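One more check that may be worth running: slice_max() keeps ties by default (with_ties = TRUE), so a basho in which two wrestlers share the highest win count would contribute two rows to df_yusho. Whether results.csv actually contains such ties is not established here; the sketch below only demonstrates the behaviour on a hypothetical toy table.

```r
library(dplyr)
library(tibble)

# Hypothetical win totals: in basho "A" two wrestlers tie on 13 wins.
toy_totals <- tibble(
  basho = c("A", "A", "A", "B", "B"),
  id    = c(1L, 2L, 3L, 1L, 2L),
  wins  = c(13L, 13L, 10L, 14L, 12L)
)

# Same selection step as in the pipeline above.
toy_yusho <- toy_totals %>%
  group_by(basho) %>%
  slice_max(order_by = wins) %>%
  ungroup()

# Number of top rows per basho; anything above 1 signals a tie.
rows_per_basho <- toy_yusho %>%
  count(basho, name = "n_top")

rows_per_basho
```

Applying the same count() to df_yusho and filtering for n_top > 1 would list any tournaments that need a closer look.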
Now let’s look at the tournaments won by rank-and-file wrestlers (maegashira). There have been only 13 since 1983:
df_yusho %>%
filter(
division == "M"
) %>%
arrange(
desc(basho)
)
## # A tibble: 13 x 6
## basho id rank division shikona wins
## <chr> <int> <chr> <ord> <chr> <int>
## 1 2020.07 11927 M17e M Terunofuji 13
## 2 2020.01 11726 M17w M Tokushoryu 14
## 3 2019.05 12291 M8w M Asanoyama 12
## 4 2018.01 6599 M3w M Tochinoshin 14
## 5 2012.05 41 M7w M Kyokutenho 13
## 6 2001.09 876 M2e M Kotomitsuki 13
## 7 2000.03 18 M14e M Takatoriki 13
## 8 1998.11 12 M12w M Kotonishiki 14
## 9 1992.07 28 M1w M Mitoizumi 13
## 10 1992.01 2 M2e M Takahanada 14
## 11 1991.09 12 M5e M Kotonishiki 13
## 12 1991.07 1303 M13e M Kotofuji 14
## 13 1984.09 1352 M12w M Tagaryu 13
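As you can see, two maegashira titles in a single year is not new in itself (it also happened in 1991 and 1992), but the rank of the 2020 winners is: Tokushoryu and Terunofuji were both ranked M17, lower than any other maegashira champion in this data. The previous lowest was Takatoriki at M14e in 2000.03.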
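To put a number on how low M17 is, the rank strings can be reduced to their numeric part. The sketch below retypes the 13 rows from the output above into a small tibble (so it runs without results.csv); the column name rank_number is introduced here for illustration.

```r
library(dplyr)
library(stringr)
library(tibble)

# The 13 maegashira champions, copied from the output above.
maegashira_yusho <- tibble(
  basho = c("2020.07", "2020.01", "2019.05", "2018.01", "2012.05",
            "2001.09", "2000.03", "1998.11", "1992.07", "1992.01",
            "1991.09", "1991.07", "1984.09"),
  rank = c("M17e", "M17w", "M8w", "M3w", "M7w", "M2e", "M14e",
           "M12w", "M1w", "M2e", "M5e", "M13e", "M12w"),
  shikona = c("Terunofuji", "Tokushoryu", "Asanoyama", "Tochinoshin",
              "Kyokutenho", "Kotomitsuki", "Takatoriki", "Kotonishiki",
              "Mitoizumi", "Takahanada", "Kotonishiki", "Kotofuji",
              "Tagaryu")
)

# Numeric part of the rank: a larger number means a lower-ranked wrestler.
ranked <- maegashira_yusho %>%
  mutate(rank_number = as.integer(str_extract(rank, "\\d+"))) %>%
  arrange(desc(rank_number), desc(basho))

# Years with more than one maegashira champion.
per_year <- maegashira_yusho %>%
  count(year = str_sub(basho, 1, 4)) %>%
  filter(n > 1)

head(ranked, 3)
per_year
```

Sorted this way, the two 2020 titles come first at rank number 17, followed by Takatoriki at 14, while per_year lists 1991, 1992 and 2020 as the years with two maegashira champions.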