Machine Learning Competition 2022
Here is the registration form link for the MLC Competition next Friday (December 09)!
Google registration form
You can compete individually or with a team of up to 3.
If you are planning on competing as a team, please have each member register individually using the SAME TEAM NAME.
The contest will have three questions: two regression problems, and one numeric classification problem.
Prizes (gift cards) will be awarded to the teams with the best evaluation metrics: high accuracy for classification and low validation loss for regression (with no significant overfitting).
Also, there is a list of professors and classes offering extra credit! The list can be found in the registration form.
Please make sure to register by 5:00 PM Thursday, December 08!!
For any questions, please contact Mason, Manish, or Professor Aos!
Thanks!
Resources for the Competition:
>> Download the datasets below and use them for the competition.
>> Each dataset is a CSV file.
Question 1: Regression: Admission Chance prediction
Predict “Chance of Admit”
Use the ‘admission_chance.csv’ dataset to create a machine learning model that will predict the chance of admission for an aspiring college student.
The team with the lowest loss and the smallest difference between loss and val_loss will win this question.
Admission Chance
Question 2: Regression: Bike dataset count prediction
Use the ‘bike_dataset.csv’ dataset to create a machine learning model that will estimate (predict) the count (cnt) of bikes rented that day from a bike rental company.
The team with the lowest loss and the smallest difference between loss and val_loss will win this question.
Bike Dataset
Question 3: Classification: Diabetes prediction
Numerical Classification
Use the ‘diabetes_dataset.csv’ dataset to create a machine learning model that will classify whether an individual is predicted to have diabetes (outcome 1) or not have diabetes (outcome 0).
The team with the highest F1 score will win this question.
Diabetes dataset
Start by importing the necessary libraries:

import numpy as np
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt

Then, load the data using the pandas library:

df = pd.read_csv('filename.csv')
Then, use DataFrame.iloc to separate x and y from the data (adjust the column indices to match your dataset):

x = df.iloc[:, 0:4].values
y = df.iloc[:, 4].values
Then, use the sklearn library to split the data into training and testing sets (note: this approach doesn't work as easily for image classification):

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
Then, use the sklearn library to scale the data with MinMaxScaler or StandardScaler (pick carefully based on the data; sometimes you don't need to scale):

from sklearn.preprocessing import StandardScaler
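The scaler import above can be used like this. A minimal sketch with made-up numbers: fit the scaler on the training data only, then reuse those statistics on the test data, so no information leaks from the test set:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Tiny example arrays standing in for your real x_train / x_test splits
x_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
x_test = np.array([[2.0, 300.0]])

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)  # fit on training data only
x_test_scaled = scaler.transform(x_test)        # reuse training statistics
```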
Then, use the tensorflow library to build the model.
>> Tips on ways to optimize and improve your model:
Change the number of neurons in each layer
Change the number of layers
Change the activation functions (relu, tanh, sigmoid…)
Change the optimizer when you are compiling the model (adam, Nadam)
Change the number of epochs
Change the learning rate within the optimizer (tf.keras.optimizers.Adam(learning_rate=0.001))
Change the train_test split percentages (from 80/20 to maybe 75/25)
Use “stratify” in your train_test_split() function
Use kernel_regularizer='l1' in your first model layer (don't rely on this; it can also harm your model)
ALWAYS START with a bigger model; you can make it smaller after evaluating the losses.
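Putting a few of the tips above together, here is a minimal, untuned Keras sketch for one of the regression questions. It assumes 4 input features (matching the iloc example earlier); the layer sizes, learning rate, and regularizer are starting points, not a recommended solution:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(4,),
                          kernel_regularizer='l1'),  # optional; can hurt
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1),  # single output for regression
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='mse')
# history = model.fit(x_train_scaled, y_train, epochs=100,
#                     validation_split=0.2)
```

From there, try the tips one at a time (layer sizes, activations, optimizer, epochs) and watch how loss and val_loss respond.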
>> Once done with the model, paste this code for evaluation (for the classification question):

from sklearn.metrics import precision_score, f1_score
p_score = precision_score(y_test, np.round(model.predict(x_test_scaled), 0))
fscore = f1_score(y_test, np.round(model.predict(x_test_scaled), 0))
print(p_score)
print(fscore)
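For the two regression questions, the winning metrics come from the training history instead. A hedged sketch of reading them out; the `hist` dict below stands in for `model.fit(...).history`, with made-up example values:

```python
# Stand-in for history.history returned by model.fit(..., validation_split=0.2)
hist = {'loss': [0.90, 0.40, 0.12], 'val_loss': [0.95, 0.50, 0.18]}

final_loss = hist['loss'][-1]
final_val_loss = hist['val_loss'][-1]
gap = abs(final_val_loss - final_loss)  # a large gap signals overfitting

print(final_val_loss, round(gap, 2))
```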