mlwave.com
Predicting repeat buyers using purchase history | MLWave
http://mlwave.com/predicting-repeat-buyers-vowpal-wabbit
Predicting repeat buyers using purchase history. April 15, 2014. Another Kaggle contest means another chance to try out Vowpal Wabbit. This time on a data set of nearly 350 million rows. We will discuss feature engineering for the latest Kaggle contest and how to get a top 3 public leaderboard score (0.59347 AUC). A short competition description. The competition is to predict repeat buyers. You can download all data. With only a few shoppers. For a short description of the 4 benchmarks. If you check out...
mlwave.com
A Clustered Google Maps of 10k Dutch Traffic Accidents | MLWave
http://mlwave.com/a-clustered-google-maps-of-dutch-traffic-accidents
A Clustered Google Maps of 10k Dutch Traffic Accidents. March 14, 2014. The Open Data Portal is a website made by the Dutch Government. It includes a data set of all registered traffic accidents in the province of North-Holland from 2005 to 2009. We munge the data and place it all on a Google Map. We use MarkerClusterer to deal with the 10k markers. The Open Data Portal. The Dutch have an Open Data Portal. With data sets. These data sets come from a variety of government institutions. The Netherlands us...
mlwave.com
Detecting Counterfeit Webshops. Part 1: Feature engineering | MLWave
http://mlwave.com/detecting-counterfeit-webshops-part-1-feature-engineering
Detecting Counterfeit Webshops. Part 1: Feature engineering. August 5, 2014. The number of fake webshops is rising. From 2010 to 2012 the Dutch authority on internet scams received 81,000 complaints. Spammers have moved from running their own webshops to hacking websites or registering expired domain names. This makes classification more difficult. Update: New Google Research: The underground market fueling for-profit abuse. Consumer safety at a webshop relies on:. The safety of the payment process.
mlwave.com
Predicting CTR with online machine learning | MLWave
http://mlwave.com/predicting-click-through-rates-with-online-machine-learning
Predicting CTR with online machine learning. June 25, 2014. Good clicklog datasets are hard to come by. Luckily CriteoLabs released a week’s worth of data (a whopping 11GB!) for a new Kaggle contest. The task is to predict the click-through-rate for ads. We will use online machine learning with Vowpal Wabbit to beat the logistic regression benchmark and get a no. 1 position on the leaderboard. From this contest added to Vowpal Wabbit. Now that this contest is over: Go here. Our team got 29th place.
mlwave.com
Kaggle Connectomics: Python Benchmark Code | MLWave
http://mlwave.com/kaggle-connectomics-python-benchmark-code
Kaggle Connectomics: Python Benchmark Code. March 8, 2014. For the Connectomics contest on Kaggle the task is to write a brain connectivity estimator using neuron activation time series data. Benchmark code for Discretization Pearson Correlation was available in C and Matlab. Now here in Python too! This article is under construction for the duration of the contest. The competition admins have released their own Python with correlation benchmark code. Check out their Github repo. A 5 minute tutorial.
mlwave.com
Human Ensemble Learning | MLWave
http://mlwave.com/human-ensemble-learning
July 20, 2014. Wisdom of the crowds and ensemble machine learning techniques are similar in principle. Could insights into group learning provide insights into machine learning and vice versa? In this article we will touch upon a variety of more (or less) related concepts and try to build an ensemble view of our own. Charles Mackay (1841), Extraordinary Popular Delusions and the Madness of Crowds. Wisdom of the crowds. The concept of Wisdom of the crowds originated with the book. Oinas-Kukkonen (2008 –...
mlwave.com
k-Nearest Neighbors and Clustering on Compressed Binary Files | MLWave
http://mlwave.com/k-nn-clustering-compressed-binary-files-ncd
K-Nearest Neighbors and Clustering on Compressed Binary Files. March 22, 2014. Normalized Compression Distance (Cilibrasi and Vitanyi) returns a similarity measure between binary files. This similarity measure allows for nearest neighbors search, clustering and classification. We are going to try some of these methods and review the results. This is a draft version. Pdf]” contains a formal introduction. If one spam word appears a lot, it will be saved in a dictionary (or compression table), and the compr...
mlwave.com
Movie Review Sentiment Analysis with Vowpal Wabbit | MLWave
http://mlwave.com/movie-review-sentiment-analysis-with-vowpal-wabbit
Movie Review Sentiment Analysis with Vowpal Wabbit. March 11, 2014. Kaggle is hosting another cool knowledge contest, this time it is sentiment analysis on the Rotten Tomatoes Movie Reviews data set. We are going to use Vowpal Wabbit to test the waters and get our first top 10 leaderboard score. The Rotten Tomatoes movie review data set is a corpus of movie reviews used for sentiment analysis. Originally collected by Pang and Lee. In their work on sentiment treebanks, Socher et al. 0 – negative.
mlwave.com
Titanic – Machine Learning From Disaster with Vowpal Wabbit | MLWave
http://mlwave.com/tutorial-titanic-machine-learning-from-distaster
Titanic – Machine Learning From Disaster with Vowpal Wabbit. February 25, 2014. Kaggle is hosting a contest where the task is to predict survival rates of people aboard the Titanic. A train set is given with a label 1 or 0, denoting ‘survived’ or ‘died’. We are going to use Vowpal Wabbit to get a score of about 0.79426 AUC (top 10%). In this Kaggle contest. Under construction. Come back soon. Getting started with Excel. Getting started with Python. Getting started with Scikit random forests. For this co...