
I'm in the process of writing a bot that places bets on the website Betfair using their Python API. I want to place bets on football (soccer) matches when they are in-play.

I've coded an XML feed to give me live data from the games; however, the XML feed doesn't always use the same names for football teams as Betfair does.

For example, when referring to Manchester United, Betfair might use "Man Utd", whilst the XML feed might use "Man United" or some other variant. I am not limited to popular markets, so building up a standard Betfair-to-XML name conversion table isn't feasible.

I'm trying to use some kind of probabilistic string matching to give me some indication that the two data sources are referring to the same teams.

So far I've played with Reverend, which seems to do some Bayesian calculations. However, I don't think I'm using it properly, as I have to break the string down into characters to train the guesser. I then simply average the probability that each letter is associated with each name. I'm aware this is mathematically incorrect, but I thought it could be a feasible heuristic test.

Here is my code:

    import scorefeed

    from reverend.thomas import Bayes

    guesser = Bayes()
    teams = ['home', 'away']

    def train(team_no, name):
        # Train the pool for this team on each character of the name.
        for char in name:
            guesser.train(teams[team_no], char)

    def untrain(team_no, name):
        # Undo the training so the guesser can be reused for the next game.
        for char in name:
            guesser.untrain(teams[team_no], char)

    def guess(name):
        home_guess = 0.0
        away_guess = 0.0
        for char in name:
            # Each result is a (pool name, probability) pair.
            for result in guesser.guess(char):
                if result[0] == teams[0]:
                    home_guess = home_guess + result[1]
                    print(home_guess)
                if result[0] == teams[1]:
                    away_guess = away_guess + result[1]
                    print(away_guess)
        # Average the per-character probabilities (a heuristic only).
        home_guess = home_guess / float(len(name))
        away_guess = away_guess / float(len(name))
        probs = [home_guess, away_guess]
        return probs

    def game_match(betfair_game_string, feed_home, feed_away):
        # Split "Home V Away" at the first 'V'. The end index of the home
        # slice is assumed to be the space just before the 'V'.
        sep = betfair_game_string.find('V')
        home_team = betfair_game_string[0:sep - 1]
        away_team = betfair_game_string[sep + 2:len(betfair_game_string)]
        train(0, home_team)
        train(1, away_team)
        probs = []
        probs.append(guess(feed_home)[0])
        probs.append(guess(feed_away)[1])
        untrain(0, home_team)
        untrain(1, away_team)
        return probs

    print(game_match("Man Utd V Lpool", "Manchester United", "Liverpool"))

The probability produced with the current setup is [0.4705411764705883, 0.5555]. I would be really grateful for any ideas or improvements.

EDIT: I've had another thought. I want the probability that it is the same match on Betfair and the feed, but the code above gives me two separate numbers: the probability that the first name matches and the probability that the second name matches. I need the probability that the first AND second names match. I have therefore coded up the following function, which seems to give me more reasonable results:

    def prob_match(probs):
        prob_not_home = 1.0 - probs[0]
        prob_not_away = 1.0 - probs[1]
        prob_not_home_and_away = prob_not_home * prob_not_away
        prob_home_and_away = 1.0 - prob_not_home_and_away
        return prob_home_and_away
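A quick check on what prob_match computes may help here. Assuming the home and away matches are independent events (an assumption, and the two inputs are heuristic scores rather than true probabilities), 1 - (1 - p_home)(1 - p_away) is the chance that at least one of the two names matches, while the chance that both match is the plain product. The function names below are illustrative only:

```python
def prob_either(p_home, p_away):
    """P(home matches OR away matches), assuming independence.

    This is the same quantity that prob_match returns.
    """
    return 1.0 - (1.0 - p_home) * (1.0 - p_away)


def prob_both(p_home, p_away):
    """P(home matches AND away matches), assuming independence."""
    return p_home * p_away


# The two scores quoted in the question.
probs = [0.4705411764705883, 0.5555]
print(prob_either(*probs))
print(prob_both(*probs))
```

With the scores from the question, the "either" value is about 0.76 and the "both" value is about 0.26, so the higher number comes from the OR-style combination and not from the AND that was intended.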

I would still appreciate any suggestions for different methods or recommendations of existing libraries that do the same thing, or tips on correcting my probability calculations.
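As one possible different method, here is a minimal sketch that uses difflib.SequenceMatcher from the Python standard library in place of Reverend. It scores each pair of names by character-sequence similarity and multiplies the two scores to rank candidate fixtures. The helper names (name_similarity, split_fixture, fixture_score) and the " V " separator are assumptions for illustration, and the result is a similarity score, not a calibrated probability:

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity in [0, 1] between two team names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def split_fixture(fixture, sep=" V "):
    """Split 'Home V Away' into (home, away) on the first separator.

    Raises ValueError if the separator is not present.
    """
    home, away = fixture.split(sep, 1)
    return home.strip(), away.strip()


def fixture_score(betfair_fixture, feed_home, feed_away):
    """Heuristic score that both names refer to the same teams."""
    home, away = split_fixture(betfair_fixture)
    return name_similarity(home, feed_home) * name_similarity(away, feed_away)


# Compare the Betfair fixture against a correct and an incorrect feed entry.
print(fixture_score("Man Utd V Lpool", "Manchester United", "Liverpool"))
print(fixture_score("Man Utd V Lpool", "Manchester City", "Everton"))
```

Splitting on " V " with surrounding spaces avoids cutting a team name that itself contains a capital V, such as "Aston Villa". In practice one would score the Betfair fixture against every candidate game in the feed and accept the best one only if it clears a threshold chosen from real data.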

by (108k points)

Have a look at Match, a project for probabilistic entity detection and matching in Python.

The project aims to bring common-sense entity detection and string matching to Python. According to its description, Match is:

• Fast

• Lightweight (no heavy dependencies)