
Best comeback

library(tidyverse)

Grand sumo tournaments last 15 days. The first thing every wrestler wants is to achieve kachikoshi: more wins than losses, which over 15 bouts means at least 8 wins.

I was curious to see if a wrestler with a 0-7 record (seven straight losses) after the first 7 days has ever managed a miraculous comeback to 8-7 by the end of the tournament.

We’ll be using results.csv from my sumo dataset, which contains all results in the top two divisions since January 1983 (far from the complete history of modern sumo):

results <- read_csv(
    "results.csv",
    col_types = "ciiccciciccci"
)
glimpse(results)
## Rows: 218,824
## Columns: 13
## $ basho            <chr> "1983.01", "1983.01", "1983.01", "1983.01", "1983....
## $ day              <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ rikishi1_id      <int> 4140, 4306, 1337, 4323, 4097, 4319, 4109, 4129, 41...
## $ rikishi1_rank    <chr> "J13w", "Ms1e", "J12w", "J13e", "J12e", "J11w", "J...
## $ rikishi1_shikona <chr> "Chikubayama", "Ofuji", "Tochitsukasa", "Shiraiwa"...
## $ rikishi1_result  <chr> "0-1 (7-8)", "1-0 (6-1)", "1-0 (9-6)", "0-1 (3-12)...
## $ rikishi1_win     <int> 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1,...
## $ kimarite         <chr> "yorikiri", "yorikiri", "oshidashi", "oshidashi", ...
## $ rikishi2_id      <int> 4306, 4140, 4323, 1337, 4319, 4097, 4129, 4109, 41...
## $ rikishi2_rank    <chr> "Ms1e", "J13w", "J13e", "J12w", "J11w", "J12e", "J...
## $ rikishi2_shikona <chr> "Ofuji", "Chikubayama", "Shiraiwa", "Tochitsukasa"...
## $ rikishi2_result  <chr> "1-0 (6-1)", "0-1 (7-8)", "0-1 (3-12)", "1-0 (9-6)...
## $ rikishi2_win     <int> 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0,...

Each bout corresponds to two rows, with the rikishi1 and rikishi2 columns swapped around, so it’s enough to consider only the rikishi1 columns.
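
The mirrored layout is easy to check on a toy example. The sketch below uses a hypothetical two-bout table with made-up ids and a reduced set of columns (these are not rows from results.csv) and confirms that each bout shows up once from each wrestler's side:

```r
library(dplyr)

# Hypothetical miniature of the results table: two bouts, each stored twice.
toy <- tibble(
    basho = "1983.01",
    day = 1L,
    rikishi1_id = c(1L, 2L, 3L, 4L),
    rikishi1_win = c(0L, 1L, 1L, 0L),
    rikishi2_id = c(2L, 1L, 4L, 3L),
    rikishi2_win = c(1L, 0L, 0L, 1L)
)

# Give both copies of a bout the same key by ordering the two ids.
bouts <- toy %>%
    mutate(
        lo = pmin(rikishi1_id, rikishi2_id),
        hi = pmax(rikishi1_id, rikishi2_id)
    ) %>%
    count(basho, day, lo, hi)

# Every bout should appear exactly twice.
all(bouts$n == 2)
```

Running the same grouping on the real table would be one way to confirm that dropping the rikishi2 columns loses nothing.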

The rikishi1_result column contains the wrestler’s record at the end of the corresponding day, as well as at the end of the tournament (in brackets). By looking at rows with day == 7, we can extract the number of wins both after Day 7 and after Day 15:

df <- results %>% 
    filter(
        day == 7
    ) %>% 
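    mutate(
        # wins so far: the digits before the first hyphen
        day7 = as.integer(
            str_match(
                rikishi1_result,
                "^(\\d+)-"
            )[,2]
        ),
        # final wins: the digits right after the opening bracket
        day15 = as.integer(
            str_match(
                rikishi1_result,
                "\\((\\d+)-"
            )[,2]
        )
    )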
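
To see what the two patterns capture, here is a quick check on a few hand-written record strings in the same format as rikishi1_result (the strings are illustrative, not pulled from the data). str_match() returns a matrix whose first column is the full match and whose second column is the first capture group, hence the [, 2]:

```r
library(stringr)

# Hand-written record strings in the "day (final)" format of rikishi1_result.
records <- c("0-7 (7-8)", "1-6 (9-6)", "7-0 (15-0)")

# Wins so far: digits at the start of the string, before the first hyphen.
day7 <- as.integer(str_match(records, "^(\\d+)-")[, 2])

# Final wins: digits right after the opening bracket.
day15 <- as.integer(str_match(records, "\\((\\d+)-")[, 2])

day7
## [1] 0 1 7
day15
## [1]  7  9 15
```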

Let’s do a sanity check. The number of wins after Day 7 has a nice, normal-looking distribution:

df %>% 
    ggplot(
        aes(
            day7
        )
    ) +
    geom_histogram(
        binwidth = 1,
        colour = "black",
        fill = "white"
    ) +
    labs(
        title = "Wins after 7 days",
        x = NULL,
        y = NULL
    ) +
    scale_x_continuous(
        breaks = 0:7
    ) +
    scale_y_continuous(
        expand = expansion(
            mult = c(0, .05)
        )
    ) +
    theme_bw() +
    theme(
        panel.grid = element_blank()
    )

The number of wins after Day 15 has a spike at 8, which makes sense: that’s the kachikoshi. If you’re a wrestler with a 7-7 record after Day 14, you’re fighting for your life on Day 15. In contrast, if you’re at 8-6, having already secured a kachikoshi, you can take it easy on the last day:

df %>% 
    ggplot(
        aes(
            day15
        )
    ) +
    geom_histogram(
        binwidth = 1,
        colour = "black",
        fill = "white"
    ) +
    labs(
        title = "Wins after 15 days",
        x = NULL,
        y = NULL
    ) +
    scale_x_continuous(
        breaks = 0:15
    ) +
    scale_y_continuous(
        expand = expansion(
            mult = c(0, .05)
        )
    ) +
    theme_bw() +
    theme(
        panel.grid = element_blank()
    )

Now, let’s plot wins after 7 days against wins after 15 days:

df %>% 
    count(
        day7,
        day15
    ) %>% 
    ggplot(
        aes(
            day7,
            day15,
            fill = n
        )
    ) +
    coord_flip() +
    geom_tile(
        colour = "white"
    ) +
    geom_text(
        aes(
            label = n
        ),
        colour = "white"
    ) +
    labs(
        title = "Wins",
        x = "after 7 days",
        y = "after 15 days"
    ) +
    scale_x_continuous(
        breaks = 0:7,
        expand = expansion()
    ) +
    scale_y_continuous(
        breaks = 0:15,
        expand = expansion()
    ) +
    theme_bw() +
    theme(
        legend.position = "none",
        panel.grid = element_blank()
    )

So, a wrestler with 0 wins after 7 days (bottom row) has never, at least since January 1983, won all 8 of the remaining bouts. On seven occasions, the wrestler managed to get to 7-8, but that’s not exactly a comeback, as it’s still a losing record.
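
For a rough sense of scale, here is a back-of-the-envelope baseline. It assumes every remaining bout is an independent 50/50 coin flip, which is my simplification and not something the data supports (a wrestler sitting at 0-7 is probably injured or outclassed, so the real odds are likely worse):

```r
# Assumption (not from the data): each remaining bout is an independent
# 50/50 coin flip.
p_win <- 0.5
remaining <- 8

# Probability of winning all 8 remaining bouts (0-7 to 8-7).
p_kachikoshi <- dbinom(8, size = remaining, prob = p_win)

# Probability of winning exactly 7 of the 8 (0-7 to 7-8).
p_seven <- dbinom(7, size = remaining, prob = p_win)

p_kachikoshi
## [1] 0.00390625
p_seven
## [1] 0.03125
```

Under this crude model, only about 1 in 256 winless starts would end at 8-7, and 7-8 finishes would be eight times as common.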

The next best thing is a comeback from 1-6 to 9-6, which has happened five times:

df %>% 
    filter(
        day7 == 1,
        day15 == 9
    )
## # A tibble: 5 x 15
##   basho   day rikishi1_id rikishi1_rank rikishi1_shikona rikishi1_result
##   <chr> <int>       <int> <chr>         <chr>            <chr>          
## 1 2000~     7          39 K1w           Wakanosato       1-6 (9-6)      
## 2 2006~     7        2818 M3e           Tokitenku        1-6 (9-6)      
## 3 2009~     7          41 M1w           Kyokutenho       1-6 (9-6)      
## 4 2012~     7        9079 J3e           Kotoyuki         1-6 (9-6)      
## 5 2019~     7       12239 M1e           Hokutofuji       1-6 (9-6)      
## # ... with 9 more variables: rikishi1_win <int>, kimarite <chr>,
## #   rikishi2_id <int>, rikishi2_rank <chr>, rikishi2_shikona <chr>,
## #   rikishi2_result <chr>, rikishi2_win <int>, day7 <int>, day15 <int>

Always liked Hokutofuji’s spirit.