
Low-ranked champions

2020 has seen the yusho (top-division championship) won by two rank-and-file maegashira: Tokushoryu in January and Terunofuji in July. How unusual is this?

We’ll be working with another data set from the sumo data collection.

Import results.csv with hard-coded column types:

library(tidyverse)
# col_types gives one letter per column: c = character, i = integer
df <- read_csv(
    "results.csv",
    col_types = "ciiccciciccci"
)
head(df)
## # A tibble: 6 x 13
##   basho   day rikishi1_id rikishi1_rank rikishi1_shikona rikishi1_result
##   <chr> <int>       <int> <chr>         <chr>            <chr>          
## 1 1983~     1        4140 J13w          Chikubayama      0-1 (7-8)      
## 2 1983~     1        4306 Ms1e          Ofuji            1-0 (6-1)      
## 3 1983~     1        1337 J12w          Tochitsukasa     1-0 (9-6)      
## 4 1983~     1        4323 J13e          Shiraiwa         0-1 (3-12)     
## 5 1983~     1        4097 J12e          Tamakiyama       0-1 (8-7)      
## 6 1983~     1        4319 J11w          Harunafuji       1-0 (5-10)     
## # ... with 7 more variables: rikishi1_win <int>, kimarite <chr>,
## #   rikishi2_id <int>, rikishi2_rank <chr>, rikishi2_shikona <chr>,
## #   rikishi2_result <chr>, rikishi2_win <int>
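The compact col_types string is easy to misread, so here is a quick way to check which letter goes with which column. The column names are copied from the head(df) output above, and the snippet uses only base R.

```r
# Column names as printed by head(df) above
col_names <- c(
  "basho", "day",
  "rikishi1_id", "rikishi1_rank", "rikishi1_shikona", "rikishi1_result", "rikishi1_win",
  "kimarite",
  "rikishi2_id", "rikishi2_rank", "rikishi2_shikona", "rikishi2_result", "rikishi2_win"
)

# Split the compact readr spec into one letter per column
spec_letters <- strsplit("ciiccciciccci", "")[[1]]
stopifnot(length(spec_letters) == length(col_names))

# Pair each column name with its type letter
col_spec <- setNames(spec_letters, col_names)
print(col_spec)
```

A more verbose alternative is readr's cols() with named col_character() and col_integer() entries, which makes the same mapping explicit in the call itself.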

In this data set, every bout is represented by two rows. If A fought B, there’s one row where A is rikishi 1 and B is rikishi 2 (with the corresponding id/shikona/result/win values) and another row where B is rikishi 1 and A is rikishi 2. Since every wrestler appears as rikishi 1 once for each of their bouts, we can focus on the rikishi 1 columns and effectively ignore the “reflected” rows.
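A minimal sketch of why summing over rikishi 1 is enough, using a hypothetical two-row tibble that mirrors the layout above (only the shikona and win columns are kept). It assumes every bout has exactly one winner, which is an assumption worth checking on the real data.

```r
library(dplyr)

# Hypothetical toy data: one bout between A and B, stored twice ("reflected")
bouts <- tibble(
  rikishi1_shikona = c("A", "B"),
  rikishi1_win     = c(1L, 0L),
  rikishi2_shikona = c("B", "A"),
  rikishi2_win     = c(0L, 1L)
)

# Assumption: each row describes a bout with exactly one winner
stopifnot(all(bouts$rikishi1_win + bouts$rikishi2_win == 1L))

# Summing rikishi1_win per wrestler counts each bout once per participant
wins <- bouts %>%
  group_by(shikona = rikishi1_shikona) %>%
  summarise(wins = sum(rikishi1_win), .groups = "drop")
print(wins)
```

On the full data set, the same sanity check would be all(df$rikishi1_win + df$rikishi2_win == 1), assuming there are no missing values in the win columns.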

As in the banzuke data, the first one or two letters of the rank column indicate which sumo division the wrestler belongs to (listed here from lowest to highest):

divisions <- c("Jk", "Jd", "Sd", "Ms", "J", "M", "K", "S", "O", "Y")
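To show how the prefix extraction and the ordered comparison behave, here they are on a few sample ranks made up in the same format as the rank column. Comparing an ordered factor with a string uses the level order, so division > "J" keeps maegashira and everything above. A rank with no leading letters would give NA, and dplyr's filter() drops rows where the condition is NA.

```r
library(stringr)

# Division codes from lowest to highest, as in the post
divisions <- c("Jk", "Jd", "Sd", "Ms", "J", "M", "K", "S", "O", "Y")

# Sample ranks in the same format as the rank column
ranks <- c("J13w", "Ms1e", "M17e", "K1w", "O1w", "Y1e")

# "^\\D+" grabs the leading run of non-digits: "Ms1e" -> "Ms", "M17e" -> "M"
division <- ordered(str_extract(ranks, "^\\D+"), levels = divisions)

# Ordered comparison against "J": TRUE for the top division (M, K, S, O, Y)
top_division <- division > "J"
print(data.frame(ranks, division, top_division))
```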

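df_yusho <- df %>% 
    # extract the division prefix from the rank as an ordered factor
    mutate(
        division = ordered(
            str_extract(
                rikishi1_rank,
                "^\\D+"
            ),
            levels = divisions
        )
    ) %>% 
    # keep the top division only (maegashira and above)
    filter(
        division > "J"
    ) %>% 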
    # number of wins for each tournament/wrestler
    group_by(
        basho,
        id = rikishi1_id,
        rank = rikishi1_rank,
        division,
        shikona = rikishi1_shikona
    ) %>% 
    summarise(
        wins = sum(
            rikishi1_win
        ),
        .groups = "drop"
    ) %>% 
    # highest number of wins = champion
    group_by(
        basho
    ) %>% 
    slice_max(
        order_by = wins
    ) %>% 
    ungroup()

This gives us the winner of every basho (tournament) since 1983:

head(
    df_yusho
)
## # A tibble: 6 x 6
##   basho      id rank  division shikona      wins
##   <chr>   <int> <chr> <ord>    <chr>       <int>
## 1 1983.01  4112 O1w   O        Kotokaze       15
## 2 1983.03  1354 Y1e   Y        Chiyonofuji    15
## 3 1983.05  1363 S1e   S        Hokutenyu      14
## 4 1983.07  4104 O1e   O        Takanosato     14
## 5 1983.09  4104 Y1w   Y        Takanosato     15
## 6 1983.11  1354 Y1w   Y        Chiyonofuji    14
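One caveat: slice_max() keeps ties by default (with_ties = TRUE), while in sumo a tie on wins is settled by a playoff. Whether results.csv records playoff bouts is an open assumption here; if it does not, a playoff basho would appear with more than one row in df_yusho. The hypothetical toy data below shows the effect and a simple way to flag such rows.

```r
library(dplyr)

# Hypothetical toy data: two wrestlers tie on 13 wins in the first basho
toy <- tibble(
  basho   = c("2000.01", "2000.01", "2000.01", "2000.03", "2000.03"),
  shikona = c("A", "B", "C", "D", "E"),
  wins    = c(13L, 13L, 10L, 14L, 12L)
)

# Same step as in the pipeline: keep the row(s) with the most wins per basho
top <- toy %>%
  group_by(basho) %>%
  slice_max(order_by = wins) %>%
  ungroup()

# Flag tournaments with more than one "champion" row (a tie on wins)
ties <- top %>% count(basho) %>% filter(n > 1)
print(ties)
```

On the real data, df_yusho %>% count(basho) %>% filter(n > 1) would list any such tournaments.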

A quick sanity check of title counts against Wikipedia (keep in mind that only titles from 1983 onward are included):

df_yusho %>% 
    group_by(
        id
    ) %>% 
    summarise(
        # shikona can change during a career; keep the last one listed
        shikona = last(
            shikona
        ),
        n = n(),
        .groups = "drop"
    ) %>% 
    arrange(
        desc(n)
    )
## # A tibble: 46 x 3
##       id shikona         n
##    <int> <chr>       <int>
##  1  1123 Hakuho         44
##  2   878 Asashoryu      25
##  3  1354 Chiyonofuji    24
##  4     2 Takanohana     22
##  5     4 Musashimaru    12
##  6     1 Akebono        11
##  7  1111 Harumafuji      9
##  8  1339 Hokutoumi       8
##  9  1219 Kakuryu         6
## 10     3 Wakanohana      5
## # ... with 36 more rows
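last() returns the final row of each group, so the shikona reported above depends on row order. df_yusho appears to be sorted by basho already (as the head() output suggests), which makes last() pick the most recent name. The hypothetical toy data below shows what happens when rows are out of order.

```r
library(dplyr)

# Hypothetical toy data: wrestler 99 changes ring name after the 1990 title;
# rows are deliberately out of chronological order
titles <- tibble(
  basho   = c("1995.03", "1999.05", "1990.01"),
  id      = c(99L, 99L, 99L),
  shikona = c("NewName", "NewName", "OldName")
)

# Without sorting, last() simply returns the final row: the old name
unsorted <- titles %>%
  group_by(id) %>%
  summarise(shikona = last(shikona), n = n(), .groups = "drop")

# "YYYY.MM" strings sort chronologically as text, so arrange(basho)
# makes last() return the most recent name
sorted <- titles %>%
  arrange(basho) %>%
  group_by(id) %>%
  summarise(shikona = last(shikona), n = n(), .groups = "drop")

print(unsorted)
print(sorted)
```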

Let’s look at the tournaments won by rank-and-file wrestlers (maegashira). There are only 13 since 1983:

df_yusho %>% 
    filter(
        division == "M"
    ) %>% 
    arrange(
        desc(basho)
    )
## # A tibble: 13 x 6
##    basho      id rank  division shikona      wins
##    <chr>   <int> <chr> <ord>    <chr>       <int>
##  1 2020.07 11927 M17e  M        Terunofuji     13
##  2 2020.01 11726 M17w  M        Tokushoryu     14
##  3 2019.05 12291 M8w   M        Asanoyama      12
##  4 2018.01  6599 M3w   M        Tochinoshin    14
##  5 2012.05    41 M7w   M        Kyokutenho     13
##  6 2001.09   876 M2e   M        Kotomitsuki    13
##  7 2000.03    18 M14e  M        Takatoriki     13
##  8 1998.11    12 M12w  M        Kotonishiki    14
##  9 1992.07    28 M1w   M        Mitoizumi      13
## 10 1992.01     2 M2e   M        Takahanada     14
## 11 1991.09    12 M5e   M        Kotonishiki    13
## 12 1991.07  1303 M13e  M        Kotofuji       14
## 13 1984.09  1352 M12w  M        Tagaryu        13

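As you can see, Tokushoryu and Terunofuji, both ranked M17, won from lower on the banzuke than any other maegashira champion since 1983; the next lowest was Takatoriki at M14 in March 2000. Their titles are unprecedented at least since 1983.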
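The comparison behind this can be made explicit by pulling the maegashira number out of the rank. This sketch uses the 13 rows listed above as a small hand-copied data frame rather than df_yusho, so it runs on its own.

```r
library(stringr)

# Ranks and shikona copied from the 13 maegashira champions listed above
champions <- data.frame(
  shikona = c("Terunofuji", "Tokushoryu", "Asanoyama", "Tochinoshin",
              "Kyokutenho", "Kotomitsuki", "Takatoriki", "Kotonishiki",
              "Mitoizumi", "Takahanada", "Kotonishiki", "Kotofuji", "Tagaryu"),
  rank    = c("M17e", "M17w", "M8w", "M3w", "M7w", "M2e", "M14e", "M12w",
              "M1w", "M2e", "M5e", "M13e", "M12w"),
  stringsAsFactors = FALSE
)

# The digits are the maegashira number; a higher number means a lower rank
champions$number <- as.integer(str_extract(champions$rank, "\\d+"))

# Sort from the lowest-ranked to the highest-ranked champion
champions <- champions[order(-champions$number), ]
print(head(champions, 4))
```

With the full data, the same extraction fits into the pipeline as mutate(number = as.integer(str_extract(rank, "\\d+"))).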