Three ways of splitting Train and Test in RStudio
I have put together three ways of splitting the dataset into a training set and a test set before running the model.
1. ifelse
This approach creates a new column named ‘train’ and attaches it to the dataset. The runif() function generates random values from a uniform distribution, and the number of values it generates is the number of rows in the dataset. By default, the minimum value it can generate is 0 and the maximum is 1. If a generated value is smaller than 0.8, the ifelse() function assigns the value 1 to the respective row, and otherwise it assigns the value 0. So, about 80% of the rows get the value 1 and about 20% get the value 0. Beautiful!
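A minimal sketch of this step, assuming the built-in iris data frame stands in for the dataset and that a fixed seed is wanted for reproducibility (both are assumptions, as is the variable name dataset):

```r
# Assumption: iris stands in for the real dataset
dataset <- iris

# Assumption: fixed seed so the random split is reproducible
set.seed(42)

# One uniform random number per row; rows below 0.8 are flagged 1, the rest 0
dataset$train <- ifelse(runif(nrow(dataset)) < 0.8, 1, 0)

# Count how many rows received each flag
table(dataset$train)
```

Because the flags are drawn at random, the split is only approximately 80/20 and the exact counts change with the seed.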
Next, I create the training set and the test set: the rows whose ‘train’ value equals 1 go to trainset, and the rows whose ‘train’ value equals 0 go to testset.
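The row selection could look like the sketch below. It rebuilds the ‘train’ column first so that it runs on its own; the iris data frame and the seed are assumptions used only for illustration.

```r
# Assumption: iris stands in for the real dataset, with a fixed seed
dataset <- iris
set.seed(42)
dataset$train <- ifelse(runif(nrow(dataset)) < 0.8, 1, 0)

# Rows flagged 1 form the training set, rows flagged 0 form the test set
trainset <- dataset[dataset$train == 1, ]
testset  <- dataset[dataset$train == 0, ]

# Every row ends up in exactly one of the two sets
nrow(trainset) + nrow(testset) == nrow(dataset)
```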
I need to remove the ‘train’ column from both sets before running the prediction model, as it is needed only for separating the data. I find the index of the ‘train’ column with the grep() function and then remove that column from both trainset and testset.
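A sketch of the column removal, again assuming iris as the dataset; the name trainColNum is illustrative, and the anchored pattern "^train$" is my choice so that a column such as ‘training’ would not match by accident.

```r
# Assumption: iris stands in for the real dataset, with a fixed seed
dataset <- iris
set.seed(42)
dataset$train <- ifelse(runif(nrow(dataset)) < 0.8, 1, 0)
trainset <- dataset[dataset$train == 1, ]
testset  <- dataset[dataset$train == 0, ]

# Position of the helper column among the column names
trainColNum <- grep("^train$", names(trainset))

# Negative indexing drops that column from both sets
trainset <- trainset[, -trainColNum]
testset  <- testset[, -trainColNum]
```

If grep() finds no match it returns an empty vector, and negative indexing with an empty vector drops every column, so it is worth checking that trainColNum is not empty on a real dataset.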
2. createDataPartition
To split the data into the training set and the test set, we use the createDataPartition() function, part of the 'caret' library. It takes the argument p=0.8, which sets the proportion of the data that goes to training, and the argument list, which in this case is FALSE, meaning I don’t want the result returned as a list but as a matrix of row indices.
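A minimal sketch of this approach, assuming the caret package is installed, iris stands in for the dataset, and Species is the outcome column (all three are assumptions, as is the name trainIndex):

```r
library(caret)

# Assumption: iris stands in for the real dataset, Species is the outcome
dataset <- iris
set.seed(42)  # assumption: fixed seed for reproducibility

# Matrix of row indices covering roughly 80% of the data
trainIndex <- createDataPartition(dataset$Species, p = 0.8, list = FALSE)

# Selected rows go to training, all remaining rows go to test
trainset <- dataset[trainIndex, ]
testset  <- dataset[-trainIndex, ]
```

Unlike the ifelse approach, createDataPartition() samples within the levels of the outcome, so the class balance of the training set tends to follow that of the full data.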
3. sample
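A minimal sketch of a split with base R's sample() function, assuming iris stands in for the dataset, a fixed seed, and the same 80/20 ratio as in the other two methods; the names n and trainRows are illustrative.

```r
# Assumption: iris stands in for the real dataset, with a fixed seed
dataset <- iris
set.seed(42)

n <- nrow(dataset)

# Draw 80% of the row numbers at random, without replacement
trainRows <- sample(n, size = floor(0.8 * n))

# Drawn rows go to training, all remaining rows go to test
trainset <- dataset[trainRows, ]
testset  <- dataset[-trainRows, ]
```

Here the training set always has exactly floor(0.8 * n) rows, whereas the ifelse approach gives only an approximate 80/20 split.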