Here are the blog/news entries related to the sports prediction programs, Predict and Predict_Games.

11/13/2007 Note: This blog is discontinued. I have a new Wordpress blog with a link in the top section of the first page. There will be no more entries in this blog.

8/12/2007 I figured out the noise problem in the parameter sweep, so that works better now. I'm seeing a little too much variability in the sweeps where I optimize for NFL point_spread_success, so I won't be pretend wagering on NFL until a better set of control variables exists. College football, however, is tweaked out to a 76% success rate, so I'm hoping for an actual rate of 60% or better. That's in the point spread bet as well.

The new site is up! We've gotten traffic of 20 unique visitors per day since the site opened a few days ago, and it seems to be growing. I'll keep working on the site for a little while until I get inspiration to do something else. By the way, I'm responsive to user requests, so if you want a new feature, send email.

8/5/2007 Yesterday I spent about ten or twelve hours programming and added a very nice feature to the Predict program. It is a parameter sweep. Now you can sweep any control variable in up to 10 data points as a +/- percent range. This will save you tedious hours of sweeping control variables manually. Just go to the Parameter sweep section and adjust the pulldown menus, then hit run. If you have set memory to a large value and you use all the points of the sweep, then you may run into timeout problems with the server/browser setup, but overall it works quite nicely. The output of the sweep is a data table and an ASCII plot that illustrates the data table. Unfortunately, at this time there are some noise problems and the data should be taken as a general indicator, not an exact one. However, I hope to clear up the noise issue with a data destructor in the for loop. Good luck with it.
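The sweep described in the 8/5/2007 entry can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual Predict code: `sweep_values`, `run_sweep`, `ascii_plot`, and the toy scoring function are all hypothetical names, and the real program's control variables and scoring are not shown in this blog.

```python
def sweep_values(center, percent_range, points):
    """Return `points` evenly spaced values spanning center +/- percent_range %."""
    if not 1 <= points <= 10:
        raise ValueError("the sweep supports 1 to 10 data points")
    if points == 1:
        return [center]
    low = center * (1 - percent_range / 100.0)
    high = center * (1 + percent_range / 100.0)
    step = (high - low) / (points - 1)
    return [low + i * step for i in range(points)]


def run_sweep(evaluate, center, percent_range, points):
    """Score the model at each swept value and return (value, score) rows."""
    return [(v, evaluate(v)) for v in sweep_values(center, percent_range, points)]


def ascii_plot(rows, width=40):
    """Render the data table as a crude horizontal bar chart, one row per line."""
    scores = [score for _, score in rows]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores are equal
    lines = []
    for value, score in rows:
        bar = "#" * (1 + int((score - lo) / span * (width - 1)))
        lines.append(f"{value:10.3f} | {score:7.3f} | {bar}")
    return "\n".join(lines)


def toy_success_rate(k):
    """Made-up stand-in for a real program run; peaks when k is 30."""
    return 0.60 - 0.0005 * (k - 30) ** 2


if __name__ == "__main__":
    # Sweep a control variable centered on 32 over +/- 25% in 5 points.
    print(ascii_plot(run_sweep(toy_success_rate, center=32, percent_range=25, points=5)))
```

Each row of the output pairs a swept value with its score, which is the same information as the data table plus ASCII plot mentioned above.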
7/29/2007 Now that the new program, Predict, is beginning to stabilize, I thought it would be a good idea to make a table for users to refer to when choosing a wagering tool. Not all bets in all sports are well predicted by either of my two programs, and there is no data for some of the bets. I have begun to collect complete data but I do not know where to get past data on odds. In the table I put the program that is best at predicting each bet. Predict_Games is the old program and Predict is the new one. So have a look at the table and choose your tool.

              | NFL           | MLB     | NBA     | WNBA          | NCAAFB        | NCAABK
--------------+---------------+---------+---------+---------------+---------------+--------
Money Line    | no data       | Predict | no data | Predict       | no data       | no data
Point Spread  | Predict_Games | no data | neither | Predict_Games | Predict_Games | neither
Over/Under    | neither       | neither | neither | both          | neither       | neither

7/29/2007 A lot has happened with the new program. I continue to work on removing errors and tuning. Recently I have written an ASCII plotting module, and it allows me to add plots to the end of Predict's output. This is very handy for tuning. For example, I can now optimize the prob_multiplier control variable independently. I created a plot of the Vegas point spread versus the rating difference. This plot looks like a cloud of dots, but there is a discernible slope to it. Calculating that slope and multiplying by -400 tells me what value to put in prob_multiplier. Actually, I don't just use that value; I try some neighboring values and pick the best one. For example, in NFL the number came out to be 27 and I found that 29 worked best. This is a significant improvement in tuning the point spread bet. Similarly, the table columns of success percentages are helpful as well. I just tune the k_factor to minimize the lower right number and I have the right k_factor. This made a big difference in tuning NFL. Also, it is important to note that the database must not contain all-star games in order to do this properly.
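As a rough illustration of the prob_multiplier tuning step in the entry above, here is a minimal Python sketch. It assumes an ordinary least-squares fit for the slope (the entry does not say how the slope was calculated), and the function names and the sample data are made up for the example.

```python
def least_squares_slope(xs, ys):
    """Slope of the best-fit line y = a + slope * x through the points."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def prob_multiplier_estimate(rating_diffs, vegas_spreads):
    """Slope of Vegas spread versus rating difference, multiplied by -400."""
    return -400.0 * least_squares_slope(rating_diffs, vegas_spreads)


def candidate_values(estimate, radius=2):
    """Neighboring integer values to try around the estimate."""
    center = round(estimate)
    return list(range(center - radius, center + radius + 1))


if __name__ == "__main__":
    # Made-up data lying exactly on a line with slope -0.0675.
    rating_diffs = [-100, 0, 100, 200]
    vegas_spreads = [6.75, 0.0, -6.75, -13.5]
    estimate = prob_multiplier_estimate(rating_diffs, vegas_spreads)
    print(estimate, candidate_values(estimate))
```

Real data is a cloud of dots rather than a clean line, so the estimate is only a starting point; each candidate value would then be run through the program and the best-performing one kept.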
7/18/2007 I have been working feverishly on the new software and the new web page. Both of the programs now have a web interface, so once I get the page on a webserver, anyone can access the Predict_Games program without having to do a complicated installation and muck around with data files. I have also designed a neat new web page, my first use of tables to position things on the page. It will be called www.fatcatodds.com, an easy-to-remember domain name. The whole site is designed for viewing on the iPhone so gamblers with an iPhone can access the programs in the Gambler's Toolbox inconspicuously and universally. I have also written detailed articles about the programs, how they work, and how to use them, so fear not, that poor documentation has been expanded.

7/18/2007 Baseball: too good to be true! I found some problems with the way that I was calculating the money line that were driving up the baseball success percentage, and also the same thing for the over/under, so baseball is not quite the panacea I thought it was. There is hope, however. Here is what the program reports for the baseball money line:

Success rate of money line wagers = 41.0% of 690 bets with balance of $148.50

That's a modest profit for that number of ten dollar bets, but it is a profit. More importantly, the money line parlays are looking pretty good. I just started paper testing them yesterday, with a two game parlay, a three game parlay, and a four game parlay. The two and three game parlays both hit, returning a total of $76 on $30 risked, not bad at all! Of course that may be just beginner's luck, but you can be sure I will be paper testing the baseball money line parlays.

7/3/2007 Baseball: too good to be true? "Baseball been very very good to me." Remember that old skit from Saturday Night Live in the late 70's? Well, that's how I feel when I look at the baseball money line history plot for this season!
Or it would have been good to me if I had pretend wagered the money line on it three months ago. Looking at the plot, which I will post on the main page, we can ignore the wiggly stuff in the beginning because that's an initial period when the program doesn't know much yet due to lack of historical data, and we can concentrate more on the phenomenal upward trend of the plot. It's so good, I wonder if it's too good to be true. And it could be. I'm just beginning to understand how the money line works and it's possible that I got the math wrong. Also, I changed a variable called the k-factor from 32 down to 8 to slow the speed at which the rating system learns, for baseball only, because it seemed to improve the money line initially. This may be deceptive. It has the effect of squeezing the team ratings together so that the odds of winning are rarely below 40% or above 60%. That could be a mistake or could be a characteristic of baseball, I'm not sure. Also, in the middle of the rise are some nasty losing streaks. If I had jumped into the game just before one of those downhill slides it would have cleaned out my entire account balance! For these reasons I think I'll play around with it a lot more and think it through before pretend wagering on baseball money lines.

Another encouraging fact about the program's response to baseball is that the over/under wager is at a very good 64% success rate with 293 more won bets than lost, translating into over $1,000 profit with five dollar bets for the season so far. Again, this sounds too good to be true, so I'll keep an eye on it for a while. In fact, I think I'll make an over/under history plot like the attached one to see if there are traps in there as well. I'm going to play around with that k-factor thingie and double check all my math and coding for a while because I'm not sure everything is kosher, but if it is we could be looking at some serious pretend wagering profits from baseball!
Wow, it's like the fun never ends with this hobby!

7/1/2007 I have released version 1.1 of the program and it includes all of the features that I have been discussing in this blog to date. So if you're interested in running it, you can.

7/1/2007 Now that the program has some new features, it is a good time to look at the predicted success rates for the various sports. These vary widely depending on bet type and sport, so I have listed them below.

NFL
money line percentage = 63.1%
point spread percentage = 66.7%
over/under percentage = 52.7%

NCAAFB
money line percentage = 73.7%
point spread percentage = 72.9%
over/under percentage = 50.0%

NBA
money line percentage = 67.2%
point spread percentage = 50.0%
over/under percentage = 51.1%

NCAABK
money line percentage = 72.0%
point spread percentage = 57.8%
over/under percentage = 52.2%

WNBA
money line percentage = 61.5%
point spread percentage = 74.1%
over/under percentage = 67.7%

The first thing to notice is the money line percentages. The money line is the bet where you state which team will win regardless of the score, which is the way most friendly wagers and office pools are bet. Looking at the list we see that both NCAAFB and NCAABK have over 70% money line percentage, which indicates that the program would be most suitable for wagering on college ball. In fact, a new feature of the program is that given two teams in a predict game (both scores set to zero), the program will tell you the percentage chance that the favored team will win.

Next we have the point spread wager, which is the primary design and scope of the program. For this bet we can see that NFL is reasonable, but the real profit is in NCAAFB and WNBA. I have been following WNBA lately since the season is under way, and the actual wagers have been in line with this percentage (since the program improvements).

Finally there is the over/under bet. This one shows profit potential only for WNBA.
So far in following the games I have found that the actual performance is 57%, not 67.7%, which appears to be typical: actual success rates are usually lower than predicted success rates by about 10%.

7/1/2007 I have figured out an equation that describes the average long term profit for a given actual success percentage. The formula is:

P = N * B * (S * (2+V) / (1+V) - 1)

where
P is the total profit
N is the number of bets made
B is the bet amount
S is the success fraction
V is the vigorish fraction

Now let's consider an example wagering account. After 157 bets the account balance is $187.45 and we have kept track of the success percentage, which is 55%. That may seem low, but you only need 52.38% to break even with the vigorish, and anything above that is profit. The vigorish is 10% and the bet amount is $5.00, so we have all we need for the formula.

P = 157 * $5.00 * (0.55 * (2+0.1) / (1+0.1) - 1)
P = $39.25

Now the initial deposit on the account was $100 and there was a $50 bonus, so we subtract that from the balance, leaving $37.45, which is approximately equal to $39.25, so there you go, the formula works.

Now to get a feel for how important success percentage is, we can make a table with N=100, B=$5.00, and V=0.10:

S       | P
--------+---------
0.5238  | $0.00
0.55    | $25.00
0.60    | $72.73
0.65    | $120.45
0.70    | $168.18
0.75    | $215.91

The real question here is: why is the actual success rate only 55% when the predicted success rate varies from 60% to 65% or more? Part of it is because the predicted success rate is based on past data and future data patterns vary from past data patterns, but why that should make as much as 10% difference, I do not yet know. For the time being I will just continue to improve the program and hope for the best.

6/30/2007 Wow, I have made some major changes and both the success percentages and the number of bet games went up significantly.
I decided to add another method, method 2, of selecting the bet games from the confidence factors, and it paid off. First I created a new plot called confidence.pov which shows the internal data of confidence factors as a bunch of dots in 3D space, where the axes are nearest neighbor, Gaussian index, and average score difference (the three confidence factors). Then I copied the nearest neighbor code and modified it to find nearest neighbors in this new confidence-space. The result is a data plot that illustrates the space, allowing me to optimize it by adjusting the control variables for the confidence-space nearest neighbor routine. In simpler words, I made a plot that let me see the data and then wrote a new method of selecting bet games from that data.

The plot is very complex, with dozens to thousands of data points depending on the sport and the number of games being plotted, so I made a movie of it by rotating the camera around in a 360 degree circle - just like the Gaussian data movie. I put a labeled set of axes in it and made bet games cubes with nobet games spheres, just like in the point spread plot, and there you go - effective data visualization.

One thing I learned was that using such tight Gaussian functions in the Gaussian index was crowding most of the data into one plane, which lowered selectability and therefore kept the success rate and number of bet games low. All I had to do was widen those Gaussians with a factor of 0.25 instead of 0.05 and the data points jumped off the plane and into the 3D space where they could be more easily discerned by the bet selection algorithm. The result is slightly higher success rates and a much higher number of games to bet on. For example, in NCAAFB I found a set of control variables that produced a 70% success rate with 45 bet games, doubling the number of bet games that method 1 could find. Now I'll work on the other sports and see if things get better for them as well.
6/23/2007 Good news - I have improved the sports prediction program by adding two new features. The first is an expected value function that tells you the probability of each team winning. For example, the program can now report that Philly will defeat Miami with 59% probability or whatever it is. After learning about this excellent feature of the Elo rating system, I wondered if it might be a better basis for converting ratings into point spreads. Sure enough, when I plugged it in it made the success ratio go up slightly, but more importantly it more than tripled the number of bet games. Of course when you're winning, more bets = more profits, so that is good news indeed!

The second feature is a running average for the points scored part of the over/under bet. What you do is factor in new data as it arrives, rather than summing all the data and dividing by the number of data points to get an average. This allows the program to learn new data and "forget" old data as fast or as slow as you want. I set the control variable to 0.98 and it selected two more over/under victories for a total of 27 this year in WNBA. At 5% return on investment each, that's real money!

6/20/2007 After all this time working on two player sports, I finally figured out a good way to adapt the sports prediction program to multiplayer sports such as NASCAR and horse racing. It's quite simple really: all I have to do is put in a separate rating adjustment for each possible combination of competitors in each event and divide by the number of competitors minus one. For example, if we have three competitors A, B, and C, then the input file must contain AvsB, AvsC, and BvsC, divided by two. First I'll write a data grabber for each sport to get the historical data from the web, which saves a lot of typing and typos. Second I'll write an expander program to convert the results into the individual ratings adjustments in the scores file.
Third there are a couple of minor adjustments to be made to the sports prediction program itself, most notably to accommodate the divide-by thing. And finally there is the laborious eternity of getting all the data from the web. That's boring and time consuming even with a data grabber. It should be worthwhile, however, since some of my friends are into NASCAR and horse racing, not to mention golf, poker, and all the other multiplayer sporting events. Really the program is not complete without them. Maybe one day I'll rewrite it all in one big program instead of maintaining all these separate little programs and do it in C, but that day is far off as there is plenty of work to be done on the present system.

6/19/2007 The over/under predictions are working better now, thanks to realizing that there should be some dependence on the rating difference of the two teams. If the teams are equally matched then we can expect average performance, but if they are not then it is likely that the stronger team will run up the score, leading to a greater likelihood of an over event. So I put that feature into the sports prediction program and the self-check on past data showed a dramatic improvement. Interestingly, WNBA and NFL over/unders are more accurate if I remove the offensive and defensive strength factors; however, NCAAFB seems to want to have that defensive strength factor left in place. This leads me to hypothesize that the outcome of a college football game is significantly affected by how good the defenses of the two competing teams are. This is also reflected in the huge score differences that we often see in NCAAFB, with point spreads of 30 points or more. It's obviously due to RUTSing, or running up the score, but how is RUTSing possible? I think it may be because the defense, especially the passing defense, at collegiate level is not as well developed as the offense. If a team doesn't have those key defensive players, then they are easy prey to one that does.
Maybe so, but I'm no expert on sports myself, so you think about it and decide for yourself. In the meantime I'll continue wagering the over/under to see how well the new formula does.

6/13/2007 I have completed a paper test of the over/under bet and found it to have an excellent 63% success rate! If I had made my small $5 bets this WNBA season for the past 54 games, I would have added a phenomenal $70 to my little bankroll, which is a large percentage gain, especially for only a month of 20 minutes a day working. Now that the paper test is complete and successful, I will begin pretend wagering the over/under bet and we will see what happens.

6/11/2007 After the release of my software under the GNU GPL, I have been working on alternate bet types. For example, in basketball you can wager the side bet (point spreads), the money line (choose the winner), or the over/under (guess total of team scores). So far the program has only been designed for the side bet, but I have added the money line and the over/under now. I found these to be surprisingly easy to add because the data behind them is already built into the program. For example, the money line is just choosing the winner without regard to point spreads, so I just removed the point spread information for that one. Then I thought about the possibility that some transfer function might exist between the point spread and the money line odds. After plotting some data points, sure enough I discovered that the transfer function was none other than the old familiar sigmoid function, 1/(1+e^(-x/xo)). So I plugged that into a spreadsheet and fit the curve so that the program is now capable of estimating money line odds. For the over/under I used the average team scores plus an estimate of offensive and defensive strength. The offensive strength is calculated from how well a team does in scoring relative to the average score amount, and vice-versa for defensive strength.
These statistics, representing actual points scored, are kept separately for each team. So the over/under estimate is just average score + offensive strength - defensive strength, totaled for both teams. I will be paper-testing the two wagers for the next week before deciding to wager on them or not.
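
The sigmoid transfer function and the over/under estimate from this entry can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's actual code: the names are hypothetical, the scale value used for xo in the example is made up (the fitted value is not given here), the strengths are taken as inputs rather than computed, and the defensive term is applied to each team's own total exactly as the sentence above reads, which may not match the real program.

```python
import math
from dataclasses import dataclass


def sigmoid_win_probability(x, xo):
    """Sigmoid transfer function 1 / (1 + e^(-x/xo)).

    x is the point spread margin in favor of a team and xo is the fitted
    scale; both names are assumptions for this sketch.
    """
    return 1.0 / (1.0 + math.exp(-x / xo))


@dataclass
class TeamStats:
    average_score: float       # the team's average points scored
    offensive_strength: float  # scoring relative to the average score amount
    defensive_strength: float  # the corresponding defensive figure


def expected_points(team):
    """average score + offensive strength - defensive strength for one team."""
    return team.average_score + team.offensive_strength - team.defensive_strength


def over_under_estimate(team_a, team_b):
    """Total of the two per-team estimates."""
    return expected_points(team_a) + expected_points(team_b)


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    print(sigmoid_win_probability(3.5, xo=7.0))
    print(over_under_estimate(TeamStats(75.0, 3.0, 1.0), TeamStats(70.0, -2.0, 2.0)))
```

A margin of zero gives a 50% chance, and the probabilities for a margin and its negative add up to 100%, which is what you would want from a curve that converts point spreads into money line odds.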