EuroLotto Predictor

Python, NLP, Web Scraping, Machine Learning

2025

Overview

A narrow-domain conversational system that generates lottery number predictions on request. The project combines NLP-based intent classification, data acquisition through web scraping, and custom model training. The system responds only to lottery prediction requests and politely declines all other queries.

Challenge

Creating a specialized model that can understand various ways users might ask for lottery predictions while filtering out unrelated queries presents several challenges: gathering quality training data, balancing model size with performance requirements, and implementing natural language understanding within a narrow domain without relying on large pre-trained models.

Solution

I developed an end-to-end solution that combines real-world data acquisition with custom model training.

Development Process

Step 1: Dataset Generation

I created a sophisticated dataset generation script that produces diverse natural language queries for lottery predictions along with appropriate responses. The script generates various phrasings, includes intentional variations in capitalization and occasional typos to improve robustness, and also creates negative examples (non-lottery queries) to train the model to recognize what it shouldn't respond to.


# generate_lottery_dataset.py
# This script creates a comprehensive dataset of lottery prediction requests
# with various phrasings, natural language variations, and negative examples.

import random
import json
import csv
import re
import nltk
from nltk.corpus import wordnet

# Core function to generate diverse lottery requests in English
def generate_english_lottery_dataset(num_samples=8000):
    # Various lottery types from around the world
    lottery_types = [
        "EuroMillions", "Powerball", "Mega Millions", "EuroJackpot", 
        "Lotto", "National Lottery", "Lotto 6/49", "Thunderball"
    ]
    
    # Request verbs and phrases for variation
    request_verbs = [
        "predict", "generate", "give", "tell me", "share", "provide", 
        "create", "get", "show", "display", "reveal", "suggest"
    ]
    
    # Processing and saving 8000 diverse examples
    # 70% positive (lottery requests) and 30% negative (general queries)
    # ...
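
The generation idea can be illustrated with a much smaller sketch. This is not the project's script: the helper names (`vary_case`, `add_typo`, `generate_samples`), the negative queries, and the 10% typo rate are assumptions made for illustration; only the 70/30 split and the case and typo variations come from the description above.

```python
import random

LOTTERY_TYPES = ["EuroMillions", "Powerball", "EuroJackpot", "Lotto"]
REQUEST_VERBS = ["predict", "generate", "give me", "suggest"]
# Hypothetical negative examples; the project's real list is not shown.
NEGATIVE_QUERIES = [
    "what is the weather today",
    "tell me a joke",
    "how do I bake bread",
    "who won the match last night",
]


def vary_case(text, rng):
    """Randomly pick one of a few capitalization styles."""
    style = rng.choice(["lower", "upper", "capitalize", "asis"])
    if style == "lower":
        return text.lower()
    if style == "upper":
        return text.upper()
    if style == "capitalize":
        return text.capitalize()
    return text


def add_typo(text, rng):
    """Swap two adjacent characters to imitate a typing slip."""
    if len(text) < 4:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def generate_samples(num_samples=10, positive_ratio=0.7, typo_rate=0.1, seed=0):
    """Return a shuffled list of {"text", "label"} dicts (1 = lottery request)."""
    rng = random.Random(seed)
    num_positive = round(num_samples * positive_ratio)
    samples = []
    for i in range(num_samples):
        if i < num_positive:
            text = f"{rng.choice(REQUEST_VERBS)} {rng.choice(LOTTERY_TYPES)} numbers"
            label = 1
        else:
            text = rng.choice(NEGATIVE_QUERIES)
            label = 0
        text = vary_case(text, rng)
        if rng.random() < typo_rate:
            text = add_typo(text, rng)
        samples.append({"text": text, "label": label})
    rng.shuffle(samples)
    return samples
```

Seeding a private `random.Random` instance keeps the dataset reproducible, which makes it easier to compare model runs trained on the same examples.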

Step 2: Web Scraping Historical Lottery Data

Instead of using randomly generated numbers, I developed a web scraper to collect real historical EuroJackpot results. This scraper demonstrates advanced web scraping techniques including multiple source fallback, robust error handling, and respectful rate limiting. The collected data is then processed into both raw data files and structured training examples.


# scrape_eurolotto_numbers.py
# This script collects historical EuroJackpot results from official sources
# using advanced scraping techniques with fallback mechanisms

import requests
from bs4 import BeautifulSoup
import pandas as pd
import json
import time
import random
from datetime import datetime
import os

class EurolottoScraper:
    def __init__(self, output_dir="data"):
        self.output_dir = output_dir
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
            'Accept-Language': 'en-US,en;q=0.9',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Referer': 'https://www.google.com/'
        }
        
    def scrape_eurojackpot_official(self, num_pages=10):
        """Scrape from the official Eurojackpot results page"""
        base_url = "https://www.euro-jackpot.net/en/results-archive"
        results = []
        
        # Implementing respectful scraping with delays
        # Extracting dates, main numbers, and Euro numbers
        # ...
                                
    # Alternative source fallback mechanism
    def scrape_eurolotto_alternative(self, num_results=100):
        """Fallback scraper used when the primary source fails"""
        # ...
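
The fallback chain itself can be sketched without any network code. The function names (`collect_with_fallback`, `mock_results`), the result dictionary layout, and the delay range below are assumptions for illustration, not the project's implementation. Each source is passed in as a plain callable, so real scraper methods or test stubs can be plugged in. The mock draws assume the current EuroJackpot format of 5 main numbers from 1 to 50 and 2 Euro numbers from 1 to 12.

```python
import random
import time


def mock_results(num_results, seed=0):
    """Last-resort placeholder draws so downstream steps can still run."""
    rng = random.Random(seed)
    return [
        {
            "main": sorted(rng.sample(range(1, 51), 5)),
            "euro": sorted(rng.sample(range(1, 13), 2)),
            "source": "mock",
        }
        for _ in range(num_results)
    ]


def collect_with_fallback(sources, num_results=100,
                          delay_range=(1.0, 3.0), sleep=time.sleep):
    """Try each (name, fetch) pair in order; use mock data if all fail.

    `fetch` is any callable that takes `num_results` and returns a list
    of draws. An empty list counts as a failure.
    """
    for name, fetch in sources:
        try:
            results = fetch(num_results)
        except Exception as exc:  # network errors, parse errors, blocked requests
            print(f"{name} failed: {exc}")
            results = []
        if results:
            return results
        # Pause before contacting the next site to keep the request rate low
        sleep(random.uniform(*delay_range))
    return mock_results(num_results)
```

Tagging mock draws with `"source": "mock"` is a deliberate choice in this sketch: it keeps placeholder data distinguishable from scraped results later in the pipeline.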

Step 3: Model Training

For the model training phase, I developed a custom training pipeline using PyTorch that builds a lightweight yet effective neural network for text classification. The approach prioritizes efficiency and performance on modest hardware while maintaining high accuracy for the specific task domain.


# train_lottery_model.py
# Custom training pipeline for lottery query classification

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
import pandas as pd
import json
import os
from datetime import datetime
from tqdm import tqdm

class SimpleTextClassifier(nn.Module):
    """Simple but effective text classifier with embedding and LSTM layers"""
    def __init__(self, vocab_size, embedding_dim=100, hidden_dim=128, num_classes=2):
        super(SimpleTextClassifier, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.dropout = nn.Dropout(0.2)
        self.fc = nn.Linear(hidden_dim, num_classes)
    
    def forward(self, x):
        embedded = self.embedding(x)
        lstm_out, (hidden, cell) = self.lstm(embedded)
        dropout_out = self.dropout(hidden.squeeze(0))
        output = self.fc(dropout_out)
        return output

# Training loop with learning rate scheduling and early stopping
def train(model, train_loader, val_loader, criterion, optimizer, epochs, device):
    best_val_accuracy = 0.0
    
    for epoch in range(epochs):
        # Training phase
        model.train()
        train_loss = 0.0
        train_correct = 0
        train_total = 0
        
        # Progress tracking with tqdm
        progress_bar = tqdm(train_loader, desc=f"Epoch {epoch+1}/{epochs}")
        
        for batch in progress_bar:
            # Training step implementation
            # (elided in this write-up)
            ...
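
The early-stopping part of such a loop can be sketched independently of PyTorch. The `EarlyStopping` class, its `patience` and `min_delta` parameters, and the validation accuracies below are illustrative assumptions, not values from the project.

```python
class EarlyStopping:
    """Track validation accuracy and signal when it has stopped improving."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # smallest gain that counts as improvement
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_accuracy):
        """Record one epoch and return (improved, should_stop)."""
        if val_accuracy > self.best + self.min_delta:
            self.best = val_accuracy
            self.bad_epochs = 0
            return True, False
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience


# Usage inside an epoch loop (the accuracies are made-up numbers)
stopper = EarlyStopping(patience=2)
history = []
for epoch, val_acc in enumerate([0.80, 0.88, 0.87, 0.88, 0.91], start=1):
    improved, stop = stopper.step(val_acc)
    history.append((epoch, improved, stop))
    if improved:
        pass  # this is where the best checkpoint would be saved
    if stop:
        break  # two epochs without improvement: stop after epoch 4
```

For the learning-rate half, PyTorch provides `torch.optim.lr_scheduler.ReduceLROnPlateau`, which lowers the learning rate when a monitored metric stalls; whether the project uses that scheduler or another one is not stated.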

The training process implements several best practices including:

Custom Text Preprocessor: Creates a vocabulary from the training data and transforms text into numerical sequences
Embedding + LSTM Architecture: Combines word embeddings with LSTM layers for effective sequence classification
Evaluation and Monitoring: Saves detailed training metrics and model checkpoints
Hyperparameter Optimization: Carefully selected hyperparameters for optimal performance on consumer hardware
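
A minimal sketch of what a text preprocessor of this kind might look like follows. The class name, the `load` method, `vocab_size`, and `transform` mirror how the preprocessor is used elsewhere in the project; the tokenization rule, the `max_len` and `min_freq` parameters, and the JSON layout are assumptions.

```python
import json
import re
from collections import Counter


class TextPreprocessor:
    """Map text to fixed-length lists of token ids (0 = padding, 1 = unknown)."""

    def __init__(self, max_len=20, min_freq=1):
        self.max_len = max_len
        self.min_freq = min_freq
        self.word_to_id = {"<pad>": 0, "<unk>": 1}

    @property
    def vocab_size(self):
        return len(self.word_to_id)

    @staticmethod
    def tokenize(text):
        # Lowercase and keep runs of letters, digits, and apostrophes
        return re.findall(r"[a-z0-9']+", text.lower())

    def fit(self, texts):
        """Build the vocabulary from training texts, most frequent words first."""
        counts = Counter(tok for text in texts for tok in self.tokenize(text))
        for token, count in counts.most_common():
            if count >= self.min_freq and token not in self.word_to_id:
                self.word_to_id[token] = len(self.word_to_id)
        return self

    def transform(self, text):
        """Convert one string to exactly max_len ids, truncating or padding."""
        ids = [self.word_to_id.get(tok, 1) for tok in self.tokenize(text)]
        ids = ids[: self.max_len]
        return ids + [0] * (self.max_len - len(ids))

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"max_len": self.max_len, "min_freq": self.min_freq,
                       "word_to_id": self.word_to_id}, f)

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        pre = cls(max_len=data["max_len"], min_freq=data["min_freq"])
        pre.word_to_id = data["word_to_id"]
        return pre
```

Reserving id 0 for padding matches the `padding_idx=0` argument of the embedding layer. One design point worth noting: a classifier that reads the final LSTM hidden state sees the trailing padding last, so padding on the left or packing the sequences may work better than the right-padding used in this sketch.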

The final model reached 93% accuracy on the validation set, and training completed in under 20 minutes on an RTX 4070.

Step 4: Inference Engine

I developed a robust inference engine that handles real-time classification of user queries and generates appropriate responses. The engine includes several critical components that work together to provide a seamless user experience:


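# inference.py
# Inference engine for the lottery prediction model

import os

import torch

# TextPreprocessor, SimpleTextClassifier, and LotteryNumberGenerator are
# defined in the project's other modules (their imports are not shown here).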

class LotteryQueryProcessor:
    """Main class for processing lottery queries"""
    def __init__(self, model_dir="model", confidence_threshold=0.85):
        # Load the preprocessor, model, and supporting components
        self.preprocessor = TextPreprocessor.load(os.path.join(model_dir, "text_preprocessor.json"))
        
        # Initialize model with trained weights
        self.model = SimpleTextClassifier(
            vocab_size=self.preprocessor.vocab_size,
            embedding_dim=100,
            hidden_dim=128
        )
        # map_location lets weights trained on a GPU load on a CPU-only machine
        self.model.load_state_dict(
            torch.load(os.path.join(model_dir, "best_model.pth"), map_location="cpu")
        )
        self.model.eval()  # Set model to evaluation mode

        # Load responses and initialize number generator
        self.responses = self._load_responses(model_dir)
        self.number_generator = LotteryNumberGenerator()
        
        # Lottery-specific keywords for enhanced classification
        self.lottery_keywords = ["lottery", "lotto", "eurojackpot", "numbers", "predict"]
        self.confidence_threshold = confidence_threshold
    
    def process_query(self, query_text):
        """Process a user query and return (response, is_lottery_query, confidence)"""
        # Preprocess the text; transform() is assumed to return a flat list of
        # token ids, so add a batch dimension to get shape (1, seq_len)
        sequence = self.preprocessor.transform(query_text)
        sequence_tensor = torch.tensor(sequence, dtype=torch.long).unsqueeze(0)
        
        # Get model prediction with confidence
        with torch.no_grad():
            outputs = self.model(sequence_tensor)
            probabilities = torch.softmax(outputs, dim=1)[0]
            class_0_prob = probabilities[0].item()  # Not lottery
            class_1_prob = probabilities[1].item()  # Lottery
            
        # Enhanced decision logic with keyword verification
        is_lottery_query = self._determine_intent(
            query_text, class_1_prob, class_0_prob
        )
        
        # Generate appropriate response; the CLI unpacks all three values
        if is_lottery_query:
            response = self.number_generator.generate_prediction()
        else:
            response = self.responses["non_lottery"]
        return response, is_lottery_query, class_1_prob

The inference engine implements several advanced features:

Enhanced Classification Logic: Combines model confidence scores with keyword-based heuristics
Intelligent Number Generation: Generates lottery numbers for the next scheduled EuroJackpot drawing (Tuesday or Friday)
Response Formatting: Creates natural-sounding responses with proper date awareness
Error Handling: Gracefully handles unexpected inputs and edge cases
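
The date-aware part of the generator can be sketched as follows. The function names and the rule that a draw day counts as the "next" draw are assumptions, and uniform random sampling stands in for whatever the project's `LotteryNumberGenerator` actually does. The number ranges assume the current EuroJackpot format (5 of 50 main numbers, 2 of 12 Euro numbers).

```python
import random
from datetime import date, timedelta

DRAW_WEEKDAYS = (1, 4)  # Monday is 0, so 1 = Tuesday and 4 = Friday


def next_draw_date(today):
    """Return the first Tuesday or Friday on or after `today`."""
    for offset in range(7):
        candidate = today + timedelta(days=offset)
        if candidate.weekday() in DRAW_WEEKDAYS:
            return candidate


def generate_prediction(today, rng=random):
    """Format a response with random numbers for the next draw date."""
    main = sorted(rng.sample(range(1, 51), 5))
    euro = sorted(rng.sample(range(1, 13), 2))
    draw = next_draw_date(today)
    return (
        f"For EuroJackpot drawing on {draw.isoformat()}, the main numbers are "
        f"{', '.join(map(str, main))} and the Euro numbers are "
        f"{', '.join(map(str, euro))}. Good luck!"
    )


print(next_draw_date(date(2025, 5, 10)))  # a Saturday, so the next draw is Tuesday 2025-05-13
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the date logic easy to test.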

Step 5: Command-Line Interface

I completed the project with a clean, intuitive command-line interface that allows users to interact with the model in a natural way. The interface provides a conversational experience while demonstrating the model's capabilities:


def main():
    """Interactive command-line interface for the lottery query processor"""
    print("\n=== EuroLotto Predictor ===")
    print("Type 'quit' or 'exit' to end the session.")
    print("Type 'debug on' or 'debug off' to toggle debug mode.\n")
    
    debug_mode = False
    
    try:
        processor = LotteryQueryProcessor()
        
        while True:
            query = input("\nEnter your query: ").strip()
            command = query.lower()
            
            # Handle special commands
            if command in ['quit', 'exit', 'q']:
                print("\nThank you for using EuroLotto Predictor. Goodbye!")
                break
            
            if command in ['debug on', 'debug off']:
                debug_mode = command == 'debug on'
                processor.debug_mode = debug_mode
                print(f"Debug mode turned {'ON' if debug_mode else 'OFF'}")
                continue
            
            # Process the query and display response
            response, is_lottery_query, confidence = processor.process_query(query)
            print(f"\n{response}")
            if debug_mode:
                print(f"[debug] lottery query: {is_lottery_query}, confidence: {confidence:.2f}")
    except (KeyboardInterrupt, EOFError):
        # Ctrl+C or end of input ends the session quietly
        print("\nSession ended.")
            
if __name__ == "__main__":
    main()

Example Interaction:

=== EuroLotto Predictor ===
Type 'quit' or 'exit' to end the session.

Enter your query: Hello, how are you today?

I'm sorry, but I'm not a chatbot. I can only generate lottery number predictions.

Enter your query: Can you predict some numbers for EuroJackpot?

For EuroJackpot drawing on 2025-05-13, the main numbers are 7, 19, 23, 28, 41 and the Euro numbers are 3, 9. Good luck!

(Disclaimer: This is a prediction for entertainment purposes only. Lottery results are completely random.)

Enter your query: Will these numbers win?

I'm sorry, but I'm not a chatbot. I can only generate lottery number predictions.

Enter your query: What are my chances of winning with those numbers?

I'm sorry, but I'm not a chatbot. I can only generate lottery number predictions.

The implementation demonstrates successful domain-specific behavior, correctly identifying which queries are lottery prediction requests and providing appropriate responses while maintaining a focused scope.

Key Technical Features

Natural Language Understanding: Recognizes diverse ways users might request lottery predictions
Domain Specificity: Responds only to lottery-related queries, politely declining other requests
Data Acquisition: Web scraper with fallback mechanisms for resilient data collection
Custom Training: Purpose-built model trained on specialized dataset
Realistic Outputs: Generates number predictions based on historical patterns
Date Awareness: Provides predictions for the next actual drawing date (Tuesday/Friday)
Robust Classification: Combined model confidence scoring with keyword verification for high accuracy

Challenges and Solutions

During development, I encountered and solved several significant challenges:

Challenge: Unreliable Data Sources

Initial web scraping attempts encountered blocked requests and site structure changes.

Solution: Implemented a multi-source fallback system with automated mock data generation as a final fallback, ensuring the pipeline is never blocked.

Challenge: Classification Bias

Early model versions showed bias toward classifying all inputs as lottery requests.

Solution: Adjusted confidence thresholds, implemented hybrid decision logic combining model output with keyword verification, and rebalanced the training data.
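
One way such hybrid decision logic could look is sketched below. The function name, the whole-word matching, and the rule that both conditions must hold are assumptions; the keyword list and the 0.85 threshold are taken from the inference engine's defaults.

```python
import re

LOTTERY_KEYWORDS = ("lottery", "lotto", "eurojackpot", "numbers", "predict")


def determine_intent(query_text, lottery_prob,
                     keywords=LOTTERY_KEYWORDS, threshold=0.85):
    """Accept a query only if the model is confident AND a keyword is present."""
    # Match whole words so that "unpredictable" does not count as "predict"
    words = set(re.findall(r"[a-z0-9]+", query_text.lower()))
    has_keyword = any(keyword in words for keyword in keywords)
    return has_keyword and lottery_prob >= threshold
```

Requiring a keyword as well as a confident score counters a model that leans toward the lottery class: a high probability alone is not enough to trigger a prediction.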

Challenge: Dependency Conflicts

Initial attempts to use more complex ML libraries created dependency conflicts.

Solution: Developed a lightweight custom solution using core PyTorch components, eliminating dependency issues while maintaining high performance.

Future Enhancements

While the current implementation demonstrates the core functionality, several enhancements are planned:

Expanding to additional lottery types and languages
Implementing more sophisticated number generation algorithms
Adding a conversational component for follow-up questions
Creating a web application interface with visualization components
Implementing statistical analysis of generated numbers versus actual drawings

Live Demo: Try the Lottery Predictor

Interact with the machine learning model directly. Ask for lottery predictions in different ways, or try non-lottery questions to see how the model responds.

How It Works

This demo connects to a real machine learning model running on a dedicated server. The model uses an LSTM neural network to analyze your text input and determine if it's a lottery prediction request. When it detects a lottery request with high confidence, it generates EuroJackpot numbers for the next drawing date. For all other queries, it provides a standard response explaining its limited functionality.

The classification confidence is displayed in real-time, allowing you to see how the model interprets different phrasings.