2020 has seen the yusho (top-division championship) won twice by rank-and-file maegashira: by Tokushoryu in January and by Terunofuji in July. How unusual is this?
We’ll be working with another data set from the sumo data collection.
Import results.csv with hard-coded column types:
```r
library(tidyverse)

df <- read_csv(
  "results.csv",
  col_types = "ciiccciciccci"
)

head(df)
```
```
## # A tibble: 6 x 13
##   basho   day rikishi1_id rikishi1_rank rikishi1_shikona rikishi1_result
##   <chr> <int>       <int> <chr>         <chr>            <chr>
## 1 1983~     1        4140 J13w          Chikubayama      0-1 (7-8)
## 2 1983~     1        4306 Ms1e          Ofuji            1-0 (6-1)
## 3 1983~     1        1337 J12w          Tochitsukasa     1-0 (9-6)
## 4 1983~     1        4323 J13e          Shiraiwa         0-1 (3-12)
## 5 1983~     1        4097 J12e          Tamakiyama       0-1 (8-7)
## 6 1983~     1        4319 J11w          Harunafuji       1-0 (5-10)
## # ... with 7 more variables: rikishi1_win <int>, kimarite <chr>,
## #   rikishi2_id <int>, rikishi2_rank <chr>, rikishi2_shikona <chr>,
## #   rikishi2_result <chr>, rikishi2_win <int>
```
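The compact col_types string assigns one type letter per column, in file order. Here is a quick base R check that the 13 letters line up with the 13 columns. It's a sketch: the column names are copied from the head(df) output, and we assume the readr convention that c means character and i means integer.

```r
# The compact col_types string from the read_csv() call above
col_types <- "ciiccciciccci"

# Column names as printed by head(df), in file order
cols <- c(
  "basho", "day",
  "rikishi1_id", "rikishi1_rank", "rikishi1_shikona", "rikishi1_result",
  "rikishi1_win", "kimarite",
  "rikishi2_id", "rikishi2_rank", "rikishi2_shikona", "rikishi2_result",
  "rikishi2_win"
)

# One letter per column: c = character, i = integer
type_letters <- strsplit(col_types, "")[[1]]
stopifnot(length(type_letters) == length(cols))

types <- setNames(c(c = "character", i = "integer")[type_letters], cols)
print(types)
```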
In this data set, every bout is represented by two rows. If A fought B, there’ll be one row where A is rikishi 1 and B is rikishi 2 (with corresponding values for id/shikona/result/win) and another row where B is rikishi 1 and A is rikishi 2. We’ll focus on rikishi 1, effectively ignoring the “reflected” rows: since every wrestler appears as rikishi 1 once for each bout they fought, the rikishi 1 columns alone are enough to reconstruct everyone’s record.
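To see why the rikishi 1 columns are enough, here is a small base R sketch on made-up data. The bout column and the three wrestler ids are hypothetical, not taken from results.csv. Two bouts are stored as four rows and tallied using only the rikishi 1 columns.

```r
# Hypothetical toy data: wrestler 1 beats wrestler 2 (bout 1),
# then beats wrestler 3 (bout 2). As in results.csv, each bout is
# stored twice, once from each wrestler's point of view.
bouts <- data.frame(
  bout         = c(1L, 1L, 2L, 2L),
  rikishi1_id  = c(1L, 2L, 3L, 1L),
  rikishi2_id  = c(2L, 1L, 1L, 3L),
  rikishi1_win = c(1L, 0L, 0L, 1L)
)

# Tally using only the rikishi1 columns: each wrestler shows up once
# as rikishi1 for every bout they fought
wins   <- tapply(bouts$rikishi1_win, bouts$rikishi1_id, sum)
fought <- table(bouts$rikishi1_id)

record <- data.frame(
  id     = as.integer(names(wins)),
  wins   = as.vector(wins),
  fought = as.vector(fought[names(wins)])
)
print(record)
```

Summing wins over both the rikishi 1 and rikishi 2 columns would count every bout twice, which is why the pipeline groups by the rikishi 1 columns only.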
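As with the banzuke data, the first one or two letters of the rank column indicate the wrestler’s division. The vector below lists them from lowest to highest: Jk (jonokuchi), Jd (jonidan), Sd (sandanme), Ms (makushita), J (juryo), then the top-division ranks M (maegashira), K (komusubi), S (sekiwake), O (ozeki) and Y (yokozuna). Turning the prefix into an ordered factor lets us keep the top division with a single comparison, division > "J":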
```r
divisions <- c("Jk", "Jd", "Sd", "Ms", "J", "M", "K", "S", "O", "Y")

df_yusho <- df %>%
  # division = leading letters of the rank, as an ordered factor
  mutate(
    division = ordered(
      str_extract(
        rikishi1_rank,
        "^\\D+"
      ),
      levels = divisions
    )
  ) %>%
  # top division
  filter(
    division > "J"
  ) %>%
  # number of wins for each tournament/wrestler
  group_by(
    basho,
    id = rikishi1_id,
    rank = rikishi1_rank,
    division,
    shikona = rikishi1_shikona
  ) %>%
  summarise(
    wins = sum(
      rikishi1_win
    ),
    .groups = "drop"
  ) %>%
  # highest number of wins = champion
  group_by(
    basho
  ) %>%
  slice_max(
    order_by = wins
  ) %>%
  ungroup()
```
This gives us the winner of every basho (tournament) since 1983:
```r
head(
  df_yusho
)
```
```
## # A tibble: 6 x 6
##   basho      id rank  division shikona      wins
##   <chr>   <int> <chr> <ord>    <chr>       <int>
## 1 1983.01  4112 O1w   O        Kotokaze       15
## 2 1983.03  1354 Y1e   Y        Chiyonofuji    15
## 3 1983.05  1363 S1e   S        Hokutenyu      14
## 4 1983.07  4104 O1e   O        Takanosato     14
## 5 1983.09  4104 Y1w   Y        Takanosato     15
## 6 1983.11  1354 Y1w   Y        Chiyonofuji    14
```
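For readers less used to dplyr, here is the core logic of the df_yusho pipeline sketched in base R on made-up numbers. The basho labels, ranks and wrestler names are all hypothetical. The steps are: extract the division prefix, keep the top division, then pick the wrestler(s) with the most wins in each basho. Note that the juryo wrestler with 14 wins is dropped before the maximum is taken, and that a tie on wins leaves two rows for basho A.

```r
# Hypothetical per-wrestler win totals, already summed per basho
totals <- data.frame(
  basho   = c("A", "A", "A", "A", "B", "B"),
  rank    = c("J1e", "M3w", "O1e", "M10e", "Y1e", "S1w"),
  shikona = c("w1", "w2", "w3", "w4", "w5", "w6"),
  wins    = c(14L, 13L, 12L, 13L, 14L, 11L),
  stringsAsFactors = FALSE
)

divisions <- c("Jk", "Jd", "Sd", "Ms", "J", "M", "K", "S", "O", "Y")

# Leading non-digit characters, mirroring str_extract(rank, "^\\D+")
prefix <- sub("^(\\D+).*$", "\\1", totals$rank)
totals$division <- factor(prefix, levels = divisions, ordered = TRUE)

# Ordered comparison: everything above juryo is the top division
top <- totals[totals$division > "J", ]

# Keep every row whose wins equal the maximum within its basho;
# like slice_max() with its default with_ties = TRUE, ties are all kept
is_max <- top$wins == ave(top$wins, top$basho, FUN = max)
champions <- top[is_max, ]
print(champions)
```

On the real data, a tie on wins would likewise give a basho more than one row in df_yusho, so counting rows per basho is a cheap way to check whether that ever happens.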
A quick sanity check against Wikipedia: count the titles for each wrestler (keep in mind that only titles from 1983 onward are included):
```r
df_yusho %>%
  group_by(
    id
  ) %>%
  summarise(
    # shikona can change during a career; keep the latest one
    shikona = last(
      shikona
    ),
    n = n(),
    .groups = "drop"
  ) %>%
  arrange(
    -n
  )
```
```
## # A tibble: 46 x 3
##       id shikona         n
##    <int> <chr>       <int>
##  1  1123 Hakuho         44
##  2   878 Asashoryu      25
##  3  1354 Chiyonofuji    24
##  4     2 Takanohana     22
##  5     4 Musashimaru    12
##  6     1 Akebono        11
##  7  1111 Harumafuji      9
##  8  1339 Hokutoumi       8
##  9  1219 Kakuryu         6
## 10     3 Wakanohana      5
## # ... with 36 more rows
```
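The count relies on df_yusho being sorted by basho (as the head() output suggests), so that last() picks each wrestler’s most recent shikona. The same counting can be sketched in base R on a hypothetical table, where the basho labels t1 to t4, the ids and the names are made up, and wrestler 7 changes shikona after his first title:

```r
# Hypothetical yusho table, in chronological order; wrestler 7 changes
# shikona after his first title
yusho <- data.frame(
  basho   = c("t1", "t2", "t3", "t4"),
  id      = c(7L, 9L, 7L, 7L),
  shikona = c("Oldname", "Other", "Newname", "Newname"),
  stringsAsFactors = FALSE
)

# One data frame per wrestler id (row order is preserved within each)
per_id <- split(yusho, yusho$id)

titles <- data.frame(
  id      = as.integer(names(per_id)),
  # last row = most recent title, so this is the latest shikona
  shikona = vapply(per_id, function(d) tail(d$shikona, 1), character(1),
                   USE.NAMES = FALSE),
  n       = vapply(per_id, nrow, integer(1), USE.NAMES = FALSE)
)

# Most titles first, like arrange(-n)
titles <- titles[order(-titles$n), ]
print(titles)
```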
Let’s look at the tournaments won by rank-and-file wrestlers (maegashira). There have been only 13 since 1983:
```r
df_yusho %>%
  filter(
    division == "M"
  ) %>%
  arrange(
    desc(basho)
  )
```
```
## # A tibble: 13 x 6
##    basho      id rank  division shikona      wins
##    <chr>   <int> <chr> <ord>    <chr>       <int>
##  1 2020.07 11927 M17e  M        Terunofuji     13
##  2 2020.01 11726 M17w  M        Tokushoryu     14
##  3 2019.05 12291 M8w   M        Asanoyama      12
##  4 2018.01  6599 M3w   M        Tochinoshin    14
##  5 2012.05    41 M7w   M        Kyokutenho     13
##  6 2001.09   876 M2e   M        Kotomitsuki    13
##  7 2000.03    18 M14e  M        Takatoriki     13
##  8 1998.11    12 M12w  M        Kotonishiki    14
##  9 1992.07    28 M1w   M        Mitoizumi      13
## 10 1992.01     2 M2e   M        Takahanada     14
## 11 1991.09    12 M5e   M        Kotonishiki    13
## 12 1991.07  1303 M13e  M        Kotofuji       14
## 13 1984.09  1352 M12w  M        Tagaryu        13
```
As you can see, the 2020 titles by Tokushoryu and Terunofuji, both ranked at M17, are unprecedented in recent history: before 2020, the lowest-ranked maegashira to win since 1983 was Takatoriki at M14 in March 2000.
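The comparison can be made explicit by pulling the rank number out of the rank string. Here is a base R sketch using the 13 rows from the table above. The number column is our own addition, and it relies on a higher number meaning a lower position on the banzuke.

```r
# Basho and rank of the 13 maegashira champions, copied from the table above
mae <- data.frame(
  basho = c("2020.07", "2020.01", "2019.05", "2018.01", "2012.05", "2001.09",
            "2000.03", "1998.11", "1992.07", "1992.01", "1991.09", "1991.07",
            "1984.09"),
  rank  = c("M17e", "M17w", "M8w", "M3w", "M7w", "M2e", "M14e", "M12w",
            "M1w", "M2e", "M5e", "M13e", "M12w"),
  stringsAsFactors = FALSE
)

# Strip the division prefix and the east/west side, keep the rank number
mae$number <- as.integer(sub("^\\D+(\\d+)\\D*$", "\\1", mae$rank))

# Lowest-ranked winners overall, then the lowest rank among all the others
lowest <- mae[mae$number == max(mae$number), ]
previous_lowest <- max(mae$number[!mae$basho %in% lowest$basho])
print(lowest)
print(previous_lowest)
```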