Archive for the 'Statistics' Category

How to Scale a Vector of Numbers to One

Let’s say you have a vector of numbers: 1, 2, and 5. And let’s say you have another vector of numbers: 3, 4, 9. These vectors have different ranges, but you believe they actually should have the same range. This scenario probably sounds strange, but it can happen, for example, in cases where genetic data [...] Read more »
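As a rough illustration of putting both vectors on a common range, here is a minimal Python sketch. It assumes that "scaling to one" means min-max rescaling onto the interval from 0 to 1. The excerpt above does not say which method the full post uses, and the function name `rescale_to_unit` is made up for this example.

```python
def rescale_to_unit(values):
    """Min-max rescale: the smallest value maps to 0.0, the largest to 1.0.

    Assumption: "scaling to one" is taken to mean min-max rescaling.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# The two vectors from the excerpt above.
first = rescale_to_unit([1, 2, 5])
second = rescale_to_unit([3, 4, 9])
print(first)
print(second)
```

After rescaling, both vectors run from 0 to 1, so their values can be compared on the same footing.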

How to Create a Simple ROC Curve

I’m using R, the statistical package, along with its ROCR package. Both are free and very flexible, but that flexibility can make it unclear how to accomplish a relatively simple task. I am doing a data-mining (machine-learning) project in which I predict a cancer patient’s prognosis. The algorithms [...] Read more »
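To show what an ROC curve is made of, here is a minimal sketch in plain Python rather than R and ROCR, so it is not the post's method. It assumes each prediction comes with a score (higher means more likely positive) and a 0/1 label, and that both classes are present. The function name `roc_points` and the sample data are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    scores: predicted scores, where higher means "more likely positive".
    labels: true classes, 1 for positive and 0 for negative.
    Assumes at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called positive
    # Lower the threshold step by step, calling more and more cases positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical scores and labels, for illustration only.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
points = roc_points(scores, labels)
for fpr, tpr in points:
    print(fpr, tpr)
```

Plotting the false positive rate on the x-axis against the true positive rate on the y-axis gives the curve.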

How to Access Slots in S4 Classes in R: Area Under Curve Example

In the R statistical package, you have various ways of representing and packaging data. The simplest is a single variable. More complex representations include vectors, lists, matrices, and data frames. More complex still are S4 classes, which are intended to support object-oriented programming in R.
I was using an R package that [...] Read more »

How Training and Testing Work in Data Mining / Machine Learning

Imagine you wanted to come up with a “classifier” to predict whether Georgia would win a given American football game. So you might get (training) data from all their games from the previous two years. Then you might come up with rules based on that data. For example, if the quarterback throws for 300+ yards, [...] Read more »
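The idea of building rules on training data and then checking them on separate test data can be sketched in a few lines of Python. Everything here is hypothetical: the game records are invented, and because the excerpt cuts off before stating what the 300-yard rule predicts, the sketch simply assumes it predicts a win.

```python
def predict_win(passing_yards, threshold=300):
    """Assumed rule: predict a win when the quarterback throws for 300+ yards."""
    return passing_yards >= threshold


def accuracy(games, threshold=300):
    """Fraction of games where the rule's prediction matches the real outcome."""
    correct = sum(1 for yards, won in games if predict_win(yards, threshold) == won)
    return correct / len(games)


# Invented (passing yards, won?) records, for illustration only.
training_games = [(320, True), (280, False), (350, True),
                  (210, False), (305, True), (290, True)]
test_games = [(310, True), (250, False), (330, False), (190, False)]

train_accuracy = accuracy(training_games)  # how well the rule fits games it was built on
test_accuracy = accuracy(test_games)       # how well it holds up on games it never saw
print(train_accuracy, test_accuracy)
```

The test accuracy is the number to trust, because the rule was not tuned on those games.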

Perform Log Base 2 Transformation in Java

Java has functionality built into it to transform a number using the natural logarithm. This can be done using the java.lang.Math.log() method. However, to my knowledge there is no built-in method for base-2 logarithms.
Please don’t let me get started on how silly this is!!
To do a base-2 log transformation:

public static double [...] Read more »
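The usual workaround rests on the change-of-base identity: the base-2 log of x equals ln(x) divided by ln(2). The excerpt cuts off before the post's Java method, so the following is not that method. It is a minimal Python sketch of the identity, with a made-up helper name; the same division should carry over to Java using Math.log().

```python
import math


def log_base_2(x):
    """Base-2 logarithm via change of base: ln(x) / ln(2)."""
    return math.log(x) / math.log(2)


print(log_base_2(8))     # close to 3, since 2 ** 3 == 8
print(log_base_2(1024))  # close to 10, since 2 ** 10 == 1024
```

Because the result comes from floating-point division, it may differ from the exact integer by a tiny rounding error.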

Modifying Text Size of Axis Labels in R

When you create a plot in R, you can easily modify the text size of the labels on the axes using the cex.lab graphical parameter. This stands for “character expansion of labels.” For this value, you specify a relative size (compared to the default) that you want the text to be. The following code shows how [...] Read more »

Find the Mean/Average of a Number List in Python

To my knowledge, there is no built-in function in Python to find the mean of a list of numbers. You can use statistics packages to do this, such as statpy, but if you just want a lightweight solution to do the trick you can use the function below. Note that on the first line I [...] Read more »
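A lightweight version might look like the sketch below. The excerpt is cut off before the post's own function, so this is an independent example and the name `mean` is just a natural choice. Note also that Python 3.4 and later ship a `statistics` module in the standard library whose `statistics.mean()` does the same job.

```python
import statistics


def mean(numbers):
    """Arithmetic mean of a non-empty list of numbers."""
    if not numbers:
        raise ValueError("mean() requires at least one number")
    # float() keeps the division from truncating on Python 2 integers.
    return float(sum(numbers)) / len(numbers)


values = [1, 2, 5]
hand_rolled = mean(values)
from_stdlib = statistics.mean(values)  # standard library, Python 3.4+
print(hand_rolled)
print(from_stdlib)
```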

Log Transformations in Python

It is very simple to do a log transformation in Python. Log transformations are sometimes performed on a set of numbers to reduce skew in the data (make it look more “Normal”), which can help in performing certain statistical analyses.
The default in Python is the natural log. So for demonstration purposes, I will first find e^10. [...] Read more »
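A minimal sketch of that demonstration, using only the standard `math` module, could look like this. The variable names and the sample list are chosen for illustration and are not taken from the full post.

```python
import math

# e^10, then its natural log, which should bring us back to roughly 10.
e_to_the_10 = math.exp(10)
round_trip = math.log(e_to_the_10)
print(e_to_the_10)
print(round_trip)

# Log-transform a whole list of (positive) numbers.
values = [1, 10, 100, 1000]
logged = [math.log(v) for v in values]
print(logged)

# Other bases: math.log10() for base 10, or pass the base as a second argument.
base_10 = math.log10(1000)
base_2 = math.log(8, 2)
print(base_10, base_2)
```

Keep in mind that `math.log()` raises a `ValueError` for zero or negative input, so the data must be strictly positive before transforming.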