Credit goes to seanny1986.wordpress.com

Policy Gradient Algorithms

In reinforcement learning, there are multiple different approaches one can take to train a function approximator (often a neural network) that is capable of intelligent behavior. Broadly speaking, the algorithms fall into several different classes:

  1. Value function based methods, in which the agent learns to estimate the relative value of different states. In this paradigm, if we know the relative value of each state, we can take an optimal sequence of actions by moving to states with the highest value.
  2. Policy based methods, in which we learn some function that maps a given state to a given action. More generally, a policy tells us how we make a decision — in the case of value-based methods, the policy is fixed, and determined by how we select an action (e.g. epsilon-greedy, or softmax over actions).
  3. Actor-critic methods that combine both a policy (the actor) and a value function (the critic).

In addition to these types of algorithms, we also have the distinction between on-policy and off-policy decision-making. On-policy decision-making is where you always follow the policy, whereas in off-policy learning you don’t (you might occasionally take an action that goes against the policy in order to explore the space a bit more). Value function techniques can be either on- or off-policy, depending on the algorithm; Q-learning — a well-known value-based method — is off-policy, but SARSA, an almost identical technique, is on-policy (in a nutshell, SARSA does its Q-update with the action actually taken in the next state, whereas Q-learning uses the highest-valued action in the next state for its update). Policy-based techniques are, as the name implies, on-policy by definition, though you can change this up a bit with actor-critic methods, which can be either on- or off-policy depending on the implementation.
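In tabular form, the contrast in that parenthetical looks like this (a sketch, not code from this post; `Q` is a state-action value table, and `alpha` and `gamma` are illustrative hyperparameters):

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # SARSA (on-policy): bootstrap with the action the policy actually takes next
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # Q-learning (off-policy): bootstrap with the highest-valued next action
    td_target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
```

The only difference between the two updates is the bootstrap term, which is exactly what makes one on-policy and the other off-policy.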

My area of interest is typically policy gradients, because they’re the family of algorithms that have had the most success in the types of continuous action spaces that define robotics. As their name implies, we directly learn the policy by using the gradient to do gradient ascent on a reward signal. The intuition here is that when you take an action, there will likely be some error associated with it; how wrong was the action, and which components contributed the most to the error? Say we have a robot that needs to control a bunch of torques in order to walk — how wrong was each torque value, and in what direction should each torque move in order to maximize the cumulative reward (say, points for how fast it is walking)?

To keep it simple, we’re going to stick with a discrete action space for this article, and go over the simplest policy gradient algorithms. We have an agent in an environment, and the agent receives some kind of state feedback as input to a neural net. The agent’s neural net has a series of outputs, which map directly to a set of actions (say, two neurons with a softmax layer for moving left, or moving right). In this case, the outputs represent the probability that the agent should take each action. The agent samples an action from this distribution, moves to a new state, and receives some reward signal as feedback.
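In PyTorch, that state-to-action-probability mapping (and the sampling step that follows from it) can be sketched as below; the two-output network, four-dimensional state, and seed are all illustrative:

```python
import torch

torch.manual_seed(0)
policy = torch.nn.Sequential(torch.nn.Linear(4, 2), torch.nn.Softmax(dim=-1))
state = torch.randn(4)                     # illustrative state observation
probs = policy(state)                      # probabilities for [left, right]
dist = torch.distributions.Categorical(probs)
action = dist.sample()                     # stochastic action: 0 or 1
log_prob = dist.log_prob(action)           # kept for the policy gradient update
```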

As with all machine learning problems, we have 4 components:

  1. A differentiable function approximator (in this case, a neural net);
  2. A cost function that we are trying to minimize or maximize;
  3. An optimizer — in the case of neural networks, this is generally some variant of the backpropagation algorithm (I use Adam here); and,
  4. A dataset. In the case of RL, this dataset consists of observations we make regarding the environment, usually in the form of a tuple (state, action, next state, reward).

Our cost function is relatively straightforward: we want the agent to maximize the total reward it receives over a trajectory. Generally, we discount future rewards so that the agent cares more about immediate rewards than ones that occur in the distant future. Mathematically, we can express this as:

J(\theta) = \mathbb{E}[R(\tau)] = \mathbb{E}[\sum_{t=0}^T \gamma^t R_{t}]

T can be a finite value, in which case, we might set gamma to be 1, or it could be infinite, in which case gamma needs to be in the interval [0, 1] to ensure that the sum converges. In order to step the network parameters in the direction of maximum reward, we need to take the derivative of J with respect to theta, but therein lies a problem — how much did each action contribute to the reward? This is known as the credit assignment problem.
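In code, the discounted sum inside the expectation is a single backward pass over a recorded reward sequence (a sketch; the rewards and gamma are illustrative):

```python
def discounted_returns(rewards, gamma=0.99):
    # Accumulate from the end of the trajectory, so returns[t] holds
    # the discounted sum of all rewards from step t onwards
    G = 0.0
    returns = []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    return list(reversed(returns))
```

`discounted_returns(rewards)[0]` is the quantity inside the expectation for a single sampled trajectory.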

We can tackle this problem in two ways — through the pathwise derivative, or using the REINFORCE trick. Given that the second formulation is more common (for discrete actions anyway — DDPG and other pathwise derivative techniques are more common for continuous action problems), we’ll use the REINFORCE trick. First of all, let’s rewrite the cost function J as:

J(\theta) =  \int_{\tau} P(\tau) r(\tau) d\tau

Where tau is a trajectory, or a sequence of state-action-reward tuples <s0, a0, r0, s1, a1, r1, … , sn-1, an-1, rn-1, sn>. Our gradient with respect to the network parameters theta is:

\nabla_{\theta} J(\theta) =  \int_{\tau} \nabla_\theta P(\tau) r(\tau) d \tau

We can make a clever observation here that:

\nabla_\theta P(\tau) = \left( \frac{P(\tau)}{P(\tau)} \right)\nabla_\theta P(\tau)

And use the identity:

\nabla_x \log f(x) = \frac{\nabla_x f(x)}{f(x)}

To turn our cost function derivative into:

\nabla_\theta J(\theta) = \int_{\tau} P(\tau) \nabla_\theta \log P(\tau) r(\tau) d\tau

This is useful because it reduces to an expectation:

\nabla_\theta J(\theta) = \mathbb{E}[\nabla_\theta \log P(\tau) r(\tau)]

Now, why are we talking about trajectories, and what does the probability of a trajectory even mean? One more step and it should become clear. We can write the probability of a trajectory as:

P(\tau) = P(x_0) \prod_{k=0}^H P(x_{k+1}|x_k, u_k)\pi(u_k|x_k)

Where the policy pi is the probability of taking an action conditioned on the current state. Importantly, the policy is the only part of this function that depends on theta, which implies that:

\nabla_\theta J(\theta) = \mathbb{E}[\sum_{k=0}^{K} \nabla_\theta \log \pi(u_k|x_k) r_k]

What this means is that if we have a neural network policy that takes in a state and outputs a probability distribution over a set of discrete actions (i.e. moving left or right), we can sample from that distribution to get an action, and record the log probability of that action. Our cost function becomes:

J(\theta) = - \sum_{k=0}^{K} \log \pi(u_k|x_k) r_k
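In PyTorch, this cost is just the negative sum of the stored log probabilities weighted by their rewards; here is a minimal sketch with made-up numbers standing in for a real rollout:

```python
import torch

# Illustrative values: in practice these come from sampling the policy
log_probs = torch.tensor([-0.7, -0.4, -0.9], requires_grad=True)  # log pi(u_k|x_k)
rewards = torch.tensor([1.0, 0.0, 2.0])                           # r_k from the environment

loss = -(log_probs * rewards).sum()  # negative so that minimizing maximizes reward
loss.backward()                      # gradients now flow back into the policy
```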

We store the log probabilities of our actions, and the reward feedback that we get from the environment when we take these actions, and we backpropagate them through our network. The way this is typically done in practice is that we subtract a baseline b from the reward function in order to minimize variance (since our current estimator is very high variance):

J(\theta) = - \sum_{k=0}^{K} \log \pi(u_k|x_k) (r_k - b)

We can do this because if the baseline is not dependent on theta, its derivative with respect to theta is 0. By further recognizing that we only care about future returns, we can see that it makes sense to use the Q-function in place of the reward (and this will give a stronger reward signal, since the Q-function is by definition the expected return from state x_k):

J(\theta) = - \sum_{k=0}^{K} \log \pi(u_k|x_k) (Q(x_k,u_k) - b)

Where:

Q(x_k,u_k) = r_k+\gamma Q(x_{k+1},u_{k+1})

This gives us the so-called actor-critic, in which the policy is our actor, and the Q-function is our critic. It also points the way towards the (now fairly standard) advantage actor-critic, in which the advantage function is used as the critic.
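Under the recursion above, a single actor-critic step can be sketched as follows (the function and variable names are hypothetical, and `detach` stops gradients flowing through the bootstrap target, a standard choice):

```python
import torch

def actor_critic_losses(log_prob, q_value, r, q_next, gamma=0.99):
    # Critic target from the recursion Q(x_k,u_k) = r_k + gamma * Q(x_{k+1},u_{k+1})
    target = r + gamma * q_next.detach()
    critic_loss = (q_value - target) ** 2        # squared TD error for the critic
    actor_loss = -log_prob * q_value.detach()    # log prob weighted by the critic
    return actor_loss, critic_loss
```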

Phew!

For the code today, I’m not going to reinvent the wheel; PyTorch has an excellent example of the REINFORCE algorithm and the actor-critic algorithm here and here respectively.

I’ll leave it there for now, and hopefully I’ll be posting here more regularly in the coming months.

Ciao!

Implementing the Neural Physics Engine in PyTorch

So a little while ago I put up an article on an elastic collision simulator I wrote, with the intention of later replicating the work of a group at MIT (paper linked here). Well, I finally got around to that, and thought I would post the implementation up here.

My reason for wanting to implement this model has to do with my research — I’m interested in model-based RL in dynamic environments where the number of objects can potentially change. I’d like to do direct policy search using a model of the environment that takes into account abstractions (objects) that the agent uses for decision-making. Rather than just making a decision based on raw pixels or state, the agent chooses an action using a policy that is also conditioned on what it believes the future state of other objects is. Intuitively, this is sort of like looking at the Mario game screen, extracting the type and state of the enemies, and then making a decision based on what you believe the enemies are going to do next.

This is a powerful idea, because it allows a neural net to learn the underlying dynamics of a system, rather than just overfit to the actions that let it complete a level. I pitched a network architecture to my supervisors that I thought could handle this type of factorization, though I later stumbled onto the NPE which is conceptually very similar, but simpler than what I had in mind.

So how does the NPE work? In a nutshell, it’s a dynamic encoder-decoder architecture. We have a focus object with a state we’ll call F, and an arbitrary number of context objects with states A, B, C, …, etc. The idea is to factor interactions according to pairs — for example, (F, A), (F, B), (F, C), …, and so on. The encoder produces a latent variable z that encodes the interaction between each factored pair, and we sum all of these together. We pass this value to the decoder, along with F. We then train our encoder and decoder to predict the next state of the object. A diagram from the original paper has been given below:

[Diagram from the original paper: pairwise encoders over (focus, context) states, summed into a latent code that feeds the decoder]

It’s important to point out that the same encoder is used for each factored pairwise interaction. The reason for this is that intuitively, it’s a model that encodes the interaction between the focus object and other relevant objects of the same type. Though I’ve only tried this with balls, if we added squares into the mix, we might need to include two other encoders to deal with interactions between balls and squares, and interactions between squares and squares. In theory, however, this is an object classification task that we’re already pretty good at.

The final reason we want to recycle the encoder is that it’s statistically very efficient. Our encoder is exposed to many times more data than a non-dynamic network would be, because it gets replicated for each pairwise interaction. The encoder doesn’t care if two objects aren’t interacting, so we make sure to pass it information only when interactions occur. This also helps to bump up its efficiency by removing sparsity. The percentage of time steps in which a physical interaction occurs between two objects in a simulation is very small — far less than 1%. Only passing interaction data lets the network learn directly from those instances, rather than trying to learn through noise. Intuitively, this is similar to having an attention mechanism that identifies the important interactions, though in this case, the attention mechanism isn’t something that’s learned — rather, it’s a function Att(t) that we specify, and use to feed data to the dynamic encoder.

As a final quick point before the implementation, we don’t need to make any changes to the way backpropagation is done in order to train our model. The reason for this is that we are summing the latent codes — if we want to use the chain rule to backpropagate into the encoder, and we consider the sum operation to be a node, we get the following operation:

\frac{\partial E}{\partial \phi} = \frac{\partial E}{\partial Z_T} \frac{\partial Z_T}{\partial Z_i} \frac{\partial Z_i}{\partial \phi}

Z_T = \sum_{i=1}^N Z_i

\frac{\partial Z_T}{\partial Z_i} = 1

This means that we can propagate the error into the encoder without any difficulty or modification.
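This is easy to verify numerically (using current PyTorch, where tensors take `requires_grad` directly rather than being wrapped in `Variable`):

```python
import torch

z = torch.ones(3, requires_grad=True)  # stand-in for the per-pair latent codes Z_i
z_total = z.sum()                      # the summation node Z_T
z_total.backward()                     # compute dZ_T/dZ_i for every i
print(z.grad)                          # tensor([1., 1., 1.])
```

Each latent code receives the full upstream gradient, so the shared encoder gets a clean learning signal from every pairwise interaction.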

For the implementation, I used PyTorch for its dynamic computational graphs (the authors of the original paper used Torch). It’s certainly possible to put this together in TensorFlow using a static computation graph, but it’s clunky. You would need to create branches for all possible interactions, and mask them. In PyTorch, we can just chop the data into a jagged list of interactions, and then feed this into our network, looping over the interactions. Though TensorFlow has just implemented dynamic graph structures, I’m not yet familiar enough with them to attempt something like this.

So without further ado, here is the PyTorch implementation of the NPE:

import torch
from torch.autograd import Variable

class DynamicEncoder(torch.nn.Module):
    def __init__(self, encoder, decoder):
        super(DynamicEncoder, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, F, X):
        m = self.encoder.topology[-1]
        enc = Variable(torch.zeros(m))      # summed latent code; stays zero if no interactions
        check = X.data.numpy().size
        if check != 0:
            for x in X:
                inp = torch.cat([F, x])     # factored pair: focus state + one context state
                effect = self.encoder(inp)
                enc = enc+effect            # sum the pairwise latent codes
        h = torch.cat([F, enc])
        output = self.decoder(h)
        return output

Yeah, it’s really not much. We provide encoder and decoder models of arbitrary topology to the NPE. When we call the forward method, we pass the network inputs F for the focus object, and X for the context objects. I check the size of X to make sure it’s not empty (since, quite often it will be), and if it includes data, we loop through it with our encoder, and sum the latent variables. If it’s empty, we just pass a zero vector to the decoder, along with F.

I made sure to abstract the NPE a bit since I plan on trying out different autoencoders (I’m a big fan of VAEs), and using it recursively (passing an instance of the NPE as an encoder for a second NPE instance — this is to allow higher level reasoning such as choosing an action based on the predicted next state of objects in the environment).

Now for our encoder and decoder methods, they’re actually one and the same:

import torch
import torch.nn.functional as Func

class FullyConnectedNetwork(torch.nn.Module):
    def __init__(self, topology):
        super(FullyConnectedNetwork,self).__init__()
        self.topology = topology
        m = len(topology)
        self.layers = torch.nn.ModuleList()
        for i in range(0,m-1):
            self.layers.append(torch.nn.Linear(topology[i],topology[i+1]))

    def forward(self, X):
        i = 0
        while i < len(self.layers)-1:
            X = Func.relu(self.layers[i](X))
            i += 1
        out = self.layers[-1](X)
        return out

I wrote a class to build a basic MLP with ReLU activation functions in the hidden layer, and a linear output (since we’re outputting a real value). This was more for convenience than anything, since it lets me quickly change the topology of the encoder and decoder networks, and try out new things. In practice, a single hidden layer of 2500 neurons seemed to be more than enough to do the trick.

Now for the data handling:

import math

def cutData(DATA, thresh):
    print('Prepping data...')

    l,m,n = DATA.shape
    clist = []          # context object
    xlist = []          # input data
    ylist = []          # output data
    X = []              # aggregated input data
    Y = []              # aggregated output data

    for i in range(0,m-1):
        for j in range(0,n):
            for k in range(0,n):
                if j != k:
                    rel_x = DATA[0,i,j]-DATA[0,i,k]
                    rel_y = DATA[1,i,j]-DATA[1,i,k]
                    rel_dist = math.sqrt(rel_x**2+rel_y**2)
                    total_rad = DATA[6,i,j]+DATA[6,i,k]
                    if rel_dist-total_rad <= thresh:
                        clist.append(DATA[:,i,k])
            ylist.append(DATA[:,i+1,j])
            xlist.append([DATA[:,i,j], clist])
            clist = []
        X.append(xlist)
        Y.append(ylist)
        xlist = []
        ylist = []
    return X, Y

I’m not going to lie, this was a bit of a pain to write, and where the bulk of the effort went. In a nutshell, I recorded all of the state transitions for a 5 second period of time, for a bunch of particles. Once I’d done that, I needed to turn this raw data into pairwise interactions between particles. I created a threshold of 7 pixels for particle interactions, and looped through the whole list. For each particle, I check to see if there are any particles within this threshold (finding the absolute distance, and then subtracting the sum of the two particles’ radii); if there are, I then add that nearby object to the list of context objects for the current particle. Finally, I add the focus object’s state in the next time step to the Y list that our neural net is going to try to predict. This is repeated for all particles, for each time step up to T-1.

When it’s time to train the network, I loop through the number of training epochs. At each epoch, I loop from 0 to time T-1, and then through the state of each particle at time t. I grab the state of the particle, the list of context object states, and the Y label (the state of the focus object at t+1), wrap it in a variable so that autograd can do the work, and then pass it to the NPE. The loss criterion is the mean squared error (since we have a linear output) and the optimizer is just standard SGD. I zero the gradient, and then backpropagate the error:

# initialize neural net
encoder_topology = [18, 2500, 1]
decoder_topology = [10, 2500, 9]
encoder = dae.FullyConnectedNetwork(encoder_topology)
decoder = dae.FullyConnectedNetwork(decoder_topology)
model = dae.DynamicEncoder(encoder, decoder)

criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-16, momentum=0.9)
iterations = 5
for i in range(0,iterations):
    total_loss = 0
    time = 0
    for t in X:
        c = 0
        for obj in t:
            focus = obj[0]
            context = obj[1]
            f = Variable(torch.FloatTensor(focus))
            x = Variable(torch.FloatTensor(context))
            y = Y[time][c]
            y = Variable(torch.FloatTensor(y))

            # Forward pass: Compute predicted y by passing x to the model
            y_pred = model(f,x)

            # Compute and print loss
            loss = criterion(y_pred, y)
            total_loss = total_loss+loss.data[0]

            # Zero gradients, perform a backward pass, and update the weights.
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            c += 1
        time += 1
    print('Iteration: ' + str(i+1) + '/' + str(iterations) + ', Loss: ' + str(total_loss))

Training something like this is actually very hard — the error explodes very quickly if the learning rate is too high. I eventually settled for 1e-16, which seems to work fairly well.

Once the network is trained, I use it to run the simulation by predicting the velocity of each particle in the next time step, and using the physics model’s state update to step the simulation forward. The code for this is:

# initialize simulation with neural engine running the update function
running = True
thresh = 7
t = 0
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    screen.fill([255, 255, 255])
    for p in env.particles:
        focus = [p.X[0][0], p.X[0][1], p.V[0][0], p.V[0][1], p.A[0][0], p.A[0][1], p.radius, p.mass, p.density]
        context = []
        for q in env.particles:
            if p != q:
                rel_x = p.X[0][0]-q.X[0][0]
                rel_y = p.X[0][1]-q.X[0][1]
                rel_dist = math.sqrt(rel_x**2+rel_y**2)
                total_rad = p.radius+q.radius
                if rel_dist-total_rad <= thresh:
                    c = [q.X[0][0], q.X[0][1], q.V[0][0], q.V[0][1], q.A[0][0], q.A[0][1], q.radius, q.mass, q.density]
                    context.append(c)
        focus = Variable(torch.FloatTensor(focus))
        context = Variable(torch.FloatTensor(context))
        out = model(focus, context).data.numpy()[2:4]
        p.addVelocity(out/1000)
    env.update()
    display(env)
    pygame.display.flip()
    image.save(screen,'frame'+str(t)+'.jpeg')
    t += 1
    if t == int(simtime):
        running = False

I loop through all of the particles, and check to see if there are any context particles within the distance threshold. If there are, I add the state of those particles to the context list for the focus particle. From there, I feed the input to the network and make a prediction for the next state of the focus particle, and add the velocity to the focus particle’s state. The environment update function is called as normal, and the whole system is progressed forward one time step.

In this case, I divided the output velocity by 1000 to slow everything down, since I haven’t decoupled the frame rate and the physics update. My justification for this is that I’m more interested in whether or not the interaction is being captured, which makes the direction of the velocity vector more interesting than the overall magnitude. This is especially true given that I’ve only trained the encoder/decoder architecture for 5 epochs, so we’re looking at a system that still has high losses. Dividing the output velocity by some constant is in this case equivalent to reducing the timestep by a constant value.

Here is a short GIF of the model in action:

https://streamable.com/sypdf

And there you have it! All in all, I’m actually quite happy with the results. As I mentioned above, I have a few interesting ideas that I plan to use this for — in theory it can be used for just about any instance in which the environment is dynamic, or in which the interactions we care about are very, very rare. The GitHub code can be found here, and if you have any questions, feel free to ask. As always, feedback, comments and advice are appreciated.

Cheers!

A Strip Theory Aerodynamics Model for Small Fixed Wing Aircraft, Implemented in Python and Tested Against Tornado VLM

So this is a write-up for something I implemented a while ago. If you go back over some of the earlier posts on my site, you’ll see I wrote a vectorized BEMT solver to calculate the thrust and efficiency of propellers. The same technique can actually be used to solve for the lift and drag forces acting on an aircraft, which is what I eventually did. The reasons for this are that blade element analysis is actually very simple, and — after Roskam/DATCOM techniques for estimating aircraft stability derivatives — is probably the simplest body force solver to implement. It’s also very powerful; as this paper shows, it’s fairly close to vortex lattice and other panel-based methods in terms of accuracy (especially for conventional configurations), and it’s much faster (O(n) at worst as opposed to O(n^2) for panel methods). Finally, with the right data, it’s capable of enabling full-envelope flight simulation in real-time, which — to my knowledge — no other technique can currently do. This means we can use the technique for real-time simulation of complicated aerobatic maneuvers like tail slides, which are typically fairly difficult to capture.

For this post, I’ll go over the basic mathematics of the technique, and then compare it to TornadoVLM for calculation of stability derivatives for a simple lifting body. I’m going to assume a linearized lift-curve slope since it makes the maths easier, and we can test it very quickly. This is generally in line with what other groups have done in the literature (see here, for example).

To begin with, I’m going to assume a standard axis system for the aircraft:

[Figure: the standard aircraft body axis system and inertial frame]

We’ll also be using the flat Earth assumption and use Euler rotation matrices to convert between the inertial and body axis frames. This lets us rotate the wing and subject it to a velocity in our virtual wind tunnel, so that we can calculate the stability derivatives for any direction and rotation. It’s also necessary further down the line if we want to calculate the frequency response of the vehicle, or use it in a flight simulator. I’ll briefly go over how this is done towards the end of this article.

I should also make a note on the architectural convention I tend to use for the simulations I build. First off, I like to define an environment that holds all of our basic parameters like air density, the gravity vector, the simulation time step, and the total time it will run for. This environment contains a list of vehicles, and each vehicle is built up of numerous parts — wings, propulsion, bodies, etc — and each of these sub-components have a solver attached to them. This lets us test out different solvers on the same geometry. The vehicle tends to hold parameters such as the inertia tensor, mass, and cg location, whilst a wing holds information such as the reference area, wingspan, root and tip chords, etc. The solver attached to the wing accesses these variables in order to run its calculation, and the forces and moments are stored as part of the vehicle’s state. For this write-up, I’ll keep it simple, and we’ll just be looking at the wing, though some of these architectural considerations will become apparent in the functions (which is why it’s important that I mention them).

For our wing, we want to calculate the total lift force generated in the body axis coordinate system — that is, the force components in our x, y and z axes respectively. The expression for this is similar to the expression you might have seen in my BEMT article:

L' = C_L(\alpha) \frac{1}{2} \rho V(y)^2 c(y) dy
D' = C_D(\alpha) \frac{1}{2} \rho V(y)^2 c(y) dy

For a rotating aircraft, the local velocity will change along the wing, which we can find using:

V = \omega \times (R_i-R)

Where R is the location of the CG in [x,y,z], and R_i is the location of a particle in [x,y,z]. If we know the local velocity, we can generate an angle function and integrate along the wing:

X = \frac{\rho}{2} \int V(y)^2 c(y) \left(C_D(\alpha(y))cos(\alpha(y)) - C_L(\alpha(y))sin(\alpha(y)) \right) dy

Y = \frac{\rho}{2} \int V(z)^2 c(z) \left(C_L(\beta(z))cos(\beta(z)) + C_D(\beta(z))sin(\beta(z)) \right) dz

Z = \frac{\rho}{2} \int V(y)^2 c(y) \left(C_L(\alpha(y))cos(\alpha(y)) + C_D(\alpha(y))sin(\alpha(y)) \right) dy

Unfortunately, these integrals have no solution expressible using elementary functions (if one existed, aircraft simulation would be very quick and easy, since you could just plug parameters into an equation). So to calculate these forces, we need to use a numerical method to approximate the integral.

Those of you who took high school calculus might remember that we can approximate an integral using a Riemann sum of finite elements:

[Figure: a Riemann sum approximating the area under a curve with finite rectangles]

Which is exactly what we do with our wing! We divide it up into strips, we calculate the force in the body X,Y,Z from each strip, and then we sum them up. We assume that each strip is independent of the others, and we can see that as the number of strips approaches infinity, we’ll get a better and better approximation of the exact value (though in practice we really don’t need that many).
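As a toy version of that summation, here is the strip-wise lift calculation for an untapered, unswept wing at uniform conditions, with a linearized 2*pi*alpha lift slope standing in for C_L(alpha); all numbers are illustrative:

```python
import numpy as np

def strip_lift(span=2.0, chord=0.3, V=20.0, alpha=0.05, rho=1.225, n_strips=50):
    # Divide the wing into strips and sum the elemental lift from each one
    dy = span / n_strips
    y = np.linspace(-span / 2 + dy / 2, span / 2 - dy / 2, n_strips)  # strip midpoints
    cl = 2 * np.pi * alpha                       # linearized lift-curve slope
    dL = cl * 0.5 * rho * V**2 * chord * dy      # elemental lift (uniform wing)
    return float(np.sum(np.full_like(y, dL)))
```

Because the conditions are uniform here, the sum recovers the closed-form answer; the payoff of the strip approach comes when chord, velocity, and angle of attack all vary along the span.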

For each strip, we need the local chord, the local velocity, the local angle of attack (calculated using the velocity) and the local lift and drag coefficients (calculated using the angle of attack).

So in Python, we can build a vector of locations as follows:

def buildElementPositions(self):
        COORDS = self.wing.COORDS
        theta = self.wing.dihedral
        phi = self.wing.qtrchordsweep
        K = self.el/2*math.tan(phi)
        if COORDS[0][1]-COORDS[1][1] == 0:
            A = min(COORDS[0][2],COORDS[1][2])
            B = max(COORDS[0][2],COORDS[1][2])
            YS = np.linspace(COORDS[0][1], COORDS[1][1], self.elements)
            ZS = np.linspace(A+self.el/2,B-self.el/2,self.elements)
            if A == COORDS[0][2]:
                XS = np.linspace(COORDS[0][0]-0.25*self.wing.rc+K, COORDS[1][0]-0.25*self.wing.tc-K, self.elements)
            else:
                XS = np.linspace(COORDS[1][0]-0.25*self.wing.tc-K, COORDS[0][0]-0.25*self.wing.rc+K, self.elements)
        else:
            A = min(COORDS[0][1],COORDS[1][1])
            B = max(COORDS[0][1],COORDS[1][1])
            C = min(COORDS[0][2],COORDS[1][2])
            D = max(COORDS[0][2],COORDS[1][2])
            YS = np.linspace(A+self.el/2, B-self.el/2, self.elements)
            if A == COORDS[0][1]:
                XS = np.linspace(COORDS[0][0]-0.25*self.wing.rc+K, COORDS[1][0]-0.25*self.wing.tc-K, self.elements)
                if C == COORDS[0][2]:
                    ZS = np.linspace(COORDS[0][2]+self.el/2*math.tan(theta), COORDS[1][2]-self.el/2*math.tan(theta), self.elements)
                else:
                    ZS = np.linspace(COORDS[0][2]-self.el/2*math.tan(theta), COORDS[1][2]+self.el/2*math.tan(theta), self.elements)
            else:
                XS = np.linspace(COORDS[1][0]-0.25*self.wing.tc-K, COORDS[0][0]-0.25*self.wing.rc+K, self.elements)
                if C == COORDS[0][2]:
                    ZS = np.linspace(COORDS[1][2]-self.el/2*math.tan(theta), COORDS[0][2]+self.el/2*math.tan(theta), self.elements)
                else:
                    ZS = np.linspace(COORDS[1][2]+self.el/2*math.tan(theta), COORDS[0][2]-self.el/2*math.tan(theta), self.elements)
        WING = []
        for i in range(0,self.elements):
            WING.append([XS[i],YS[i],ZS[i]])
        self.WING = np.asarray(WING)

The wing is stored as a vector of four location coordinates using the following convention:

[Figure: the wing coordinate convention, with each wing panel defined by four corner coordinates]

Where we define both the left and right wing as separate entities that are added to the vehicle container.

I access the wing’s parameters, and create a vector of stations along the wing in the body [x,y,z] coordinate system. This is a little bit involved, since I use the same wing object for both vertical and horizontal tail sections, as well as wings, so I need to check whether I’m dealing with a vertical object or not. From there, I create a coordinate vector for the local chord of each wing element:

def buildElementChords(self):
    COORDS = self.wing.COORDS
    theta = math.atan2(COORDS[1][2]-COORDS[0][2],COORDS[1][1]-COORDS[0][1])
    if COORDS[0][2]-COORDS[1][2] == 0:
        STATIONS = abs(self.WING[:,1])
    else:
        STATIONS = abs(self.WING[:,1]/math.cos(theta))
    A = 2*self.wing.Sref/((1+self.wing.lmbd)*self.wing.halfspan)
    B = (1-self.wing.lmbd)/(self.wing.halfspan)
    self.CHORD = A*(1-B*STATIONS)

There’s a formula for this, linked here, though it assumes a flat wing. The theta value is the dihedral angle, and I use it to get the true step distance along the wing, and calculate the chord. On this note, I actually use a magnitude for the wing span, which is technically not the way it’s normally done (the span of a dihedral wing is normally measured tip-to-tip).
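That chord calculation is easy to sanity-check in isolation: at the root it should return the root chord, and at the tip the tip chord. Here is a standalone sketch (parameter names are illustrative, and Sref is taken as the half-wing area, matching the code above):

```python
def local_chord(y, Sref, taper, halfspan):
    # Root chord from half-wing area: Sref = halfspan * c_root * (1 + taper) / 2
    c_root = 2 * Sref / ((1 + taper) * halfspan)
    # Linear taper from root (y = 0) to tip (y = halfspan)
    return c_root * (1 - (1 - taper) * y / halfspan)
```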

From here, we need to write up a function to do numerical integration of lift and drag forces and moments over the wing. We’ll take in the velocity components in the body frame (u, v, w) and calculate the local angle of attack for each element, and from there, we’ll use these to calculate the forces in X,Y,Z using the previous equations. For the moments, I’m using:

L = \sum_{i=1}^n Z_i (y_R-y_i)+Y_i (z_R-z_i)
M = -\sum_{i=1}^n Z_i (x_R-x_i)+X_i (z_R-z_i)
N = \sum_{i=1}^n X_i (y_R-y_i)+Y_i (x_R-x_i)

We aren’t going to iteratively calculate downwash or wake effects in this case, which greatly simplifies matters. We’re also going to do something a bit clever — we’ll do the lift calculation in an axis system in which the z-direction is aligned with the wing normal vector, and the x-axis travels in the same direction as the body x-axis (i.e. the wing is never yawed relative to the body, but it could have dihedral or incidence). The reason for this is that we can set our integral in the y direction to zero, and then rotate our forces back to the body frame to get the X,Y,Z components. In code, the process for this is:

def solveForces(self, U, P):
        V = U+np.cross(P, self.WING-self.CG)                                # calculate element velocities (body frame)
        projN = np.einsum("ij,ij->i",V,self.WINGNORM)                       # project velocity vector to wing norm axis
        projX = np.einsum("ij,ij->i",V,self.bodX)                           # project velocity vector to body X axis
        PNsq = np.einsum("i,i->i",projN,projN)                              # squared magnitude of proj N
        PXsq = np.einsum("i,i->i",projX,projX)                              # squared magnitude of proj X
        Vsq = PNsq+PXsq                                                     # calculate squared magnitude of local V
        ALPHA = np.arctan2(projN,projX)                                     # calculate local angle of attack
        cl = self.liftfunction.CL(ALPHA)                                    # calculate lift coefficients using aoa
        cm = self.liftfunction.CM(ALPHA)                                    # calculate moment coefficients using aoa
        cd = cl**2/self.K1                                                  # calculate drag coefficients using cl
        LIFT = cl*self.K2*Vsq*self.dA                                       # calculate elemental lift vector
        DRAG = cd*self.K2*Vsq*self.dA                                       # calculate elemental drag vector
        PITCH = cm*self.K2*Vsq*self.dA*self.CHORD                           # calculate elemental pitching moment vector
        locXS = DRAG*np.cos(ALPHA)-LIFT*np.sin(ALPHA)                       # local X component
        locYS = np.zeros(np.size(locXS))                                    # local Y component (always zero)
        locZS = LIFT*np.cos(ALPHA)+DRAG*np.sin(ALPHA)                       # local Z component
        locF = np.asarray([locXS, locYS, locZS]).T                          # build local elemental force vector array
        projF = np.einsum("ij,ij->i",self.WINGNORM,locF)                    # project local force vector to wing norm
        F = np.einsum("ij,i->ij",self.WINGNORM,projF)                       # calculate wing norm force vector
        XS = np.einsum("ij,ij->i",self.bodX,locF)                           # project into body Xs for each element
        YS = np.einsum("ij,ij->i",self.bodY,F)                              # project into body Ys for each element
        ZS = np.einsum("ij,ij->i",self.bodZ,F)                              # project into body Zs for each element
        X, Y, Z = np.sum(XS), np.sum(YS), np.sum(ZS)                        # sum linear forces in body X, Y, Z
        L = np.sum(ZS*self.dY)+np.sum(YS*self.dZ)                           # sum moments about X axis (roll)
        M = -np.sum(ZS*self.dX)-np.sum(XS*self.dZ)                          # sum moments about Y axis (pitch)
        N = np.sum(XS*self.dY)+np.sum(YS*self.dX)                           # sum moments about Z axis (yaw)
        return np.asarray([[X, Y, Z], [L, M, N]])

The first step is to take the cross product to determine the velocity of each element control point. I then project the velocity vector onto the wing normal vector, and the body x-axis using Einstein summation to take the dot product (this is convenient for dealing with large arrays of vectors). From there, I find the squared magnitude of this velocity vector, the angle of attack, and the magnitude of the lift/drag forces and the pitching moment. I find the components in the local X,Y,Z axis using conventional strip theory (where, as mentioned, the y component is always zero), and then project it back onto the wing norm vector (which is in the body frame). From there, it can be projected back into the body axis system, after which it’s a fairly straightforward process of finding the forces and moments.
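For readers unfamiliar with the einsum pattern used above, "ij,ij->i" is just a row-wise dot product over an array of vectors. A minimal standalone example:

```python
import numpy as np

# Row-wise dot products via einsum: equivalent to np.sum(a*b, axis=1),
# i.e. multiply matching elements of each row, then sum over columns j
a = np.array([[1., 0., 0.],
              [0., 2., 0.]])
b = np.array([[3., 4., 0.],
              [1., 5., 0.]])
proj = np.einsum("ij,ij->i", a, b)   # one dot product per row
```

This is why it’s convenient for projecting a whole array of element velocity vectors at once, rather than looping over elements.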

The reason for doing it this way is that if we wanted to create a wing with dihedral, the dihedral reduces the local angle of attack. I wanted to code in such a way that if I wanted to, I could play around with these types of configuration (though that’s still mostly untested and unvalidated).

Now to test it out. I built a wing with the following shape and dimensions:

CG = [0,0,0]
Span = 1.8m
Root Chord = 0.25
Lambda = 0.44
LE Sweep = 15 degrees
Dihedral = 0 degrees

Plot it, along with the element locations:

[Figure 1: wing planform with element locations]

Now, let’s pass it a velocity vector (body frame), and calculate the forces:

V = [15, 0, 1]
F = [-1.16360633, 0., 19.56530196]
M = [0.00000000e+00, -2.90892967e+00, -2.77555756e-17]

Seems reasonable. Roll is zero, and yaw is very close to zero (likely numerical round-off), which is what you would expect. We have a Z component of 19.5 N, equivalent to lifting roughly 2 kg, which certainly seems ballpark.

For reference, the dimensions of a wing I built whilst on exchange at NUAA in China give a Z force component of 26 N for a velocity vector of [15,0,1]. That aircraft cruised at around 15 m/s, with a total weight of 2.85 kg. The value we’re getting is pretty close to that, which is a good sign.

Now let’s calculate our stability derivatives using:

CX = \frac{F}{q S_{ref}}
CL = \frac{L}{q S_{ref}b}
CM = \frac{M}{q S_{ref} \bar{c}}
CN = \frac{N}{q S_{ref} b}
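The non-dimensionalization can be sketched in a few lines (a sketch, not the post’s code; rho, s_ref, b, and cbar are assumed inputs here):

```python
import numpy as np

def coefficients(F, M, rho, V, s_ref, b, cbar):
    # Dynamic pressure q = 0.5 * rho * |V|^2
    q = 0.5*rho*np.dot(V, V)
    CF = F/(q*s_ref)                          # force coefficients [CX, CY, CZ]
    CM = M/(q*s_ref*np.array([b, cbar, b]))   # moment coefficients [Cl, Cm, Cn]
    return CF, CM

# With rho = 2 and |V| = 1, q = 1, so for unit reference quantities the
# coefficients reduce to the raw forces and moments
CF, CM = coefficients(np.array([1., 2., 3.]), np.array([4., 5., 6.]),
                      rho=2.0, V=np.array([1., 0., 0.]),
                      s_ref=1.0, b=1.0, cbar=1.0)
```

Note that the roll and yaw moments are normalized by the span b, while pitch uses the mean chord.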

And compare our calculation to one done in TornadoVLM, which uses the vortex lattice method to calculate the lifting force of a wing divided into quadrilateral panels. We’ll use a velocity vector of roughly 10 m/s magnitude, with a 5 degree angle of attack:

V = [9.961, 0, 0.872]
\omega = [0, 0, 0]

Strip theory gives us:

F = [-0.88130927, 0., 11.31456245]
M = [0., -1.68222634, 0.]

CX = [-0.04302242, 0., 0.55233719]
CL = [ 0., -0.15258542, 0.]

And Tornado gives us:

CX = [-0.0357, 0., 0.5690]
CL = [0., -0.1871, 0.]

They’re actually pretty close, and this is without doing any kind of 3D correction (I’m using the ideal lift-curve slope of 2 \pi). Let’s put in some angular velocity and see how they compare:

V = [9.961, 0, 0.872]
\omega = [\pi/2, \pi/4, 0]

Strip theory:
CX = [-0.08657472, 0., 0.62970771]
CL = [-0.11854038, -0.17696761, 0.02177258]

And VLM:

CX = [-0.0169, 0., 0.6116]
CL = [-0.1074, -0.1682, -0.0272]

And a bit of crosswind:

V = [9.807, 1.754, 0.858]
\omega = [0, 0, 0]

Strip theory:
CX = [-0.04317599, 0., 0.55331712]
CL = [0., -0.15285613, 0.]

VLM:
CX = [-0.0298, 0, 0.4653]
CL = [-0.0413, -0.1428, -0.0015]

We seem to be overestimating the forces in the x- and z-directions, but otherwise it looks ballpark. The model is still in the process of being validated, so it’s possible that there are still some errors or oversights in the code. That said, I’m always happy to take feedback and suggestions to improve it, so if you have any thoughts, let me know.

Further down the line, I’ll tidy up the environment and do a few more experiments to verify the code. From there, the plan is to start playing around with a full rigid-body dynamics simulation; the code for this already exists, but it’s very much a rough draft. Each vehicle belongs to an environment, and has its own state update function that calculates the forces acting on the body. These forces are used to determine the linear and angular acceleration in the body axis system, and a rotation matrix is used to find components in the inertial frame. A numerical integrator is used to step the simulation forward by some timestep dt (usually using a semi-implicit Euler method).

As always, the code for this is available on my GitHub, and I’m always happy to take feedback, corrections, and/or improvements to my code. Until next time!

Ciao

 

Simulation of Elastic Collisions in Python

So this is a short write-up of an elastic collision simulation I wrote in Python. The reasons for this were 1) a bit of fun, and 2) an interest in playing around with neural-network-based reinforcement learning in an object oriented framework, similar to this work being done at MIT. I believe model-based reinforcement learning is more biologically plausible than model-free methods, and that the extra layer of abstraction allows for more powerful behavior.

So for the structure of our simulation, I’d like to separate the physics from the visualization. To do this, I create an Environment class that is a container for all of our ball objects, and a separate Particle class that handles the state of each particle. The Environment object tracks not only the balls, but also runs the simulation update. It has a series of dimensions that define the size of the space, a list of particles within the space, and functions that handle bouncing (when a particle collides with the walls of our space) and collisions (when two or more particles collide with one another).

The Particle class holds the environment it belongs to, the properties of the particle (radius, mass, density, color) and also its state (vectors X, V, A, denoting position, velocity, and acceleration, respectively). The particle also includes functions for adding a force, position, velocity, and acceleration, which makes life easier for us when detecting collisions, and if we want to do things like add a constant gravity vector. We can also make an attract function that will let us do n-body gravitational simulations such as planet formation and orbits.

For quick reference of the structure, the Environment class will look like this:

class Environment():
    def __init__(self, DIM, GRAVITY, dt):
        self.DIM = DIM
        self.GRAVITY = GRAVITY
        self.dt = dt
        self.particles = []

    def update(self):
        # code goes here

    def addParticle(self, p):
        # code goes here

    def bounce(self, p):
        # code goes here

    def elasticCollision(self, p1, p2):
        # code goes here

    def plasticCollision(self):
        # code goes here

Where DIM is a vector containing the dimensions of the environment, GRAVITY is a vector that defines gravitational acceleration, and dt is the finite time step of the simulation.

And the Particle class will look like this:

class Particle():
    def __init__(self, env, X, V, A, radius, mass, density):
        self.env = env
        self.X = X
        self.V = V
        self.A = A
        self.radius = radius
        self.mass = mass
        self.density = density
        self.colour = (0, 0, int((density-5)/95*240+15))

    def addForce(self, F):
        # code goes here

    def addAcceleration(self, acc):
        # code goes here

    def addVelocity(self, vel):
        # code goes here

    def addPosition(self, pos):
        # code goes here

    def attract(self, particle):
        # code goes here

    def stateUpdate(self):
        # code goes here

Where we initialize a particle with its environment, state variables, and physical properties. For the sake of brevity, I’ll leave the attract function and plastic collisions for another day. For this post, I’ll just be going over the implementation of basic elastic collisions. For those who are curious, the color in the above code is defined by the density of the object, with the intention that denser objects are a darker shade of blue than comparatively less dense objects.

The addForce, addPosition, addVelocity, and addAcceleration functions are relatively straightforward. If we use a vectorized implementation (which we should, since it makes the code much less complicated, and more extensible should we wish to go into 3D), we can just do:

    def addForce(self, F):
        self.A += F/self.mass

    def addAcceleration(self, acc):
        self.A += acc

    def addVelocity(self, vel):
        self.V += vel

    def addPosition(self, pos):
        self.X += pos

For the state update we can do:

    def stateUpdate(self):
        self.V += self.A*self.env.dt
        self.X += self.V*self.env.dt

This is a semi-implicit Euler method, which is generally the gold standard for numerical integration in games (it’s much more stable than the explicit Euler method).
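To see why, here’s a quick standalone comparison on a unit harmonic oscillator (x'' = -x): explicit Euler pumps energy into the system every step, while the semi-implicit version stays bounded.

```python
# Compare explicit vs. semi-implicit Euler on a unit harmonic oscillator
# (x'' = -x), starting from x = 1, v = 0 with dt = 0.1
def explicit_step(x, v, dt):
    # position and velocity both use the *old* state
    return x + v*dt, v - x*dt

def semi_implicit_step(x, v, dt):
    # velocity first, then position with the *new* velocity
    v = v - x*dt
    return x + v*dt, v

dt, steps = 0.1, 1000
xe, ve = 1.0, 0.0
xs, vs = 1.0, 0.0
for _ in range(steps):
    xe, ve = explicit_step(xe, ve, dt)
    xs, vs = semi_implicit_step(xs, vs, dt)

energy_explicit = 0.5*(xe**2 + ve**2)   # grows without bound
energy_semi = 0.5*(xs**2 + vs**2)       # stays near the initial 0.5
```

The explicit scheme multiplies the total energy by (1 + dt^2) every step, so the particles would speed up on their own; the semi-implicit scheme is symplectic and keeps the energy bounded.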

For our environment class, our addParticle function is fairly straightforward:

    def addParticle(self, p):
        self.particles.append(p)
        p.addAcceleration(self.GRAVITY)

When we initialize a particle, we give it a state, and then pass it to the environment where it is added to the list of all particles. We then add the environment’s gravitational acceleration, which — in most cases — will either be a vector [0, 0], or something similar to Earth’s gravitational acceleration [0, 9.81]. You could have a bit of fun here, but for our purposes we’ll just stick with zero vectors most of the time.

Next, we need to handle bouncing off the walls of the simulation so that our particles won’t drift out of view. This raises an interesting predicament: since we have a finite time step, we are likely to get situations in which the edge of our particle passes through the wall or another particle:

Screen Shot 2017-10-01 at 22.25.30

I’m not sure what the conventional terminology is here, but I’ll refer to it as clipping, since it’s conceptually similar to wall clipping in old FPS games. This problem is inherent to all discrete-time simulations, since by definition we can’t have an infinitesimal time step. To handle it, you either need to use an a priori technique (scanning ahead of time so that your particle never clips the wall) or an a posteriori technique (detecting clipping after it has occurred, and then running a collision routine). Very high fidelity simulations use a priori techniques, but most games use a posteriori detection since it’s much easier to implement. In this case, I went with the a posteriori method.

When we detect a wall collision using an a posteriori technique, we need to do two things: 1) move the edge of the particle back to the edge of the wall, and 2) reverse the velocity component normal to the wall (i.e. when bouncing off of a vertical wall, we want to flip the velocity vector in the x-direction). The reason for the second task is obvious, but what about the first one? We need to move the particle because if we reverse the velocity vector and then step the simulation forward, the particle may be deep enough in the wall that it is still clipped in the next time step. When this happens, the velocity vector will keep reversing, and the particle will vibrate in place. Because of this, resetting the position of the particle is necessary.

Thankfully, the implementation of this is pretty straightforward:

    def bounce(self, p):
        for p in self.particles:
            i = 0
            for x in p.X[0]:
                if x > self.DIM[i]-p.radius:
                    dist = np.zeros(np.size(p.X))
                    dist[i] = -(p.radius-(self.DIM[i]-x))
                    p.addPosition(dist)
                    tmp = np.zeros(np.size(p.V))
                    tmp[i] = -2*p.V[0][i]
                    p.addVelocity(tmp)
                elif x < p.radius:
                    dist = np.zeros(np.size(p.X))
                    dist[i] = p.radius-x
                    p.addPosition(dist)
                    tmp = np.zeros(np.size(p.V))
                    tmp[i] = -2*p.V[0][i]
                    p.addVelocity(tmp)
                i += 1

First off, we loop through all particles. This isn’t particularly efficient, but with n-body simulations, very little is. I create a counter i, and loop through the dimensions x of the particle p. There are two cases where our particle clips: 1) when the particle’s coordinate is greater than the dimension of the environment in that axis minus the particle’s radius (that is, we’ve exceeded the maximum boundary), or 2) when the particle’s coordinate is less than its radius (we’ve exceeded the minimum boundary). We calculate the distance by which the particle has exceeded the boundary, and then use the addPosition function to correct its location along that axis. Then, we create a zero vector, set twice the velocity component in the negative direction (for that dimension), and use the addVelocity function to reverse the velocity of the particle.

Astute readers might note that I’m using 2D arrays here, when only 1D vectors are needed. This was an issue I had with numpy, and I’m not quite sure how to fix it. I eventually decided to just roll with it.

Now for the elastic collision function. There are two ways that we can do this, either using vectors, or angles. The vector implementation is much easier, so I’ll stick with that. From Wikipedia, the velocity of two moving particles after a collision in 2D will be:

v_1'=v_1-\frac{2m_2}{m_1+m_2}\frac{\langle v_1-v_2,x_1-x_2 \rangle}{\|x_1-x_2\|^2}(x_1-x_2)
v_2'=v_2-\frac{2m_1}{m_1+m_2}\frac{\langle v_2-v_1,x_2-x_1 \rangle}{\|x_2-x_1\|^2}(x_2-x_1)

As with the bouncing function, we will also need to check for clipping, and reset the position of the particles. This could get a bit hairy in instances where three or more particles are clipping, but that’s pretty difficult to control for, and fairly rare, so we’ll just deal with the case where two particles are clipping one another. In this instance, we can calculate the distance particles need to shift by using:

c = \|x_2-x_1\|-(r_1+r_2)

Where c is the clipping distance. Though it’s probably not correct to do so, I move each particle half of the distance of the offset, in the direction opposite to the collision. In reality, the repositioning should probably be proportional to the speed of each particle in the collision, but I’ll save that for another day. The easiest way to do reposition particles is to calculate a normalized vector describing the direction of the collision, and then move each particle in that direction:

x_1' = x_1 - \frac{x_1-x_2}{\|x_1-x_2\|}\frac{c}{2}, \quad x_2' = x_2 + \frac{x_1-x_2}{\|x_1-x_2\|}\frac{c}{2}

In code, this is:

    def elasticCollision(self, p1, p2):
        dX = p1.X-p2.X
        dist = np.sqrt(np.sum(dX**2))
        if dist < p1.radius+p2.radius:
            offset = dist-(p1.radius+p2.radius)
            p1.addPosition((-dX/dist)*offset/2)
            p2.addPosition((dX/dist)*offset/2)
            total_mass = p1.mass+p2.mass
            dv1 = -2*p2.mass/total_mass*np.inner(p1.V-p2.V,p1.X-p2.X)/np.sum((p1.X-p2.X)**2)*(p1.X-p2.X)
            dv2 = -2*p1.mass/total_mass*np.inner(p2.V-p1.V,p2.X-p1.X)/np.sum((p2.X-p1.X)**2)*(p2.X-p1.X)
            p1.addVelocity(dv1)
            p2.addVelocity(dv2)

Our function takes in two particles as arguments, and then calculates the vector dX that describes the direction of the collision. The total distance between the particles is the magnitude of this vector, which is compared to (r_1+r_2). If the distance is less than this value, we know we have a collision, and so we calculate the clipping distance (offset) and reposition each particle using the addPosition function (similar to how we did with the bounce function). From here we solve for the change in velocity for both particles as above, and update the velocity of each.
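As a sanity check (not part of the simulation itself), the velocity update above should conserve both momentum and kinetic energy. A quick standalone verification with arbitrary test masses and states:

```python
import numpy as np

# Verify the 2D elastic collision update conserves momentum and
# kinetic energy for an arbitrary pair of particles
def collide(m1, m2, x1, x2, v1, v2):
    d = x1 - x2
    k = np.dot(d, d)
    dv1 = -2*m2/(m1+m2)*np.dot(v1-v2, d)/k*d
    dv2 = -2*m1/(m1+m2)*np.dot(v2-v1, -d)/k*(-d)
    return v1 + dv1, v2 + dv2

m1, m2 = 2.0, 3.0
x1, x2 = np.array([0., 0.]), np.array([1., 0.])
v1, v2 = np.array([1., 0.5]), np.array([-1., 0.])
v1p, v2p = collide(m1, m2, x1, x2, v1, v2)

p_err = m1*v1 + m2*v2 - (m1*v1p + m2*v2p)   # momentum difference
ke_before = 0.5*m1*np.dot(v1, v1) + 0.5*m2*np.dot(v2, v2)
ke_after = 0.5*m1*np.dot(v1p, v1p) + 0.5*m2*np.dot(v2p, v2p)
```

Both the momentum difference and the kinetic energy difference come out to zero (up to floating point), which is a good check that the formulas were transcribed correctly.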

Finally, we have an update function in the Environment class that runs the simulation for all particles in the container. We loop through each particle and run the state update and bounce functions, and then loop through all other particles to check for collisions. In code:

    def update(self):
        for p1 in self.particles:
            p1.stateUpdate()
            self.bounce(p1)
            for p2 in self.particles:
                if p1 != p2:
                    self.elasticCollision(p1, p2)

This isn’t a particularly efficient way to do things, but it’s very simple, and more than adequate for our needs.
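If the double pair visiting ever became a bottleneck, one option (my own suggestion, not the post’s code) is itertools.combinations, which visits each unordered pair exactly once: n(n-1)/2 checks instead of n(n-1).

```python
from itertools import combinations

# Sketch of an update loop that visits each unordered pair once
# (assumes env has .particles, .bounce, and .elasticCollision as above)
def update(env):
    for p in env.particles:
        p.stateUpdate()
        env.bounce(p)
    for p1, p2 in combinations(env.particles, 2):
        env.elasticCollision(p1, p2)

# combinations yields n*(n-1)/2 pairs, e.g. 6 pairs for 4 particles
pairs = list(combinations(range(4), 2))
```

Since elasticCollision already updates both particles, visiting each pair once is sufficient.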

In principle, any visualization library can be used to view the simulation (which is why I made a point of separating out the Environment and Particle functions, so that we could isolate the physics). In this case, I used pygame to create a window, and draw our objects on the screen. The code for this is given below:

import physics as pf
import pygame
import pygame.gfxdraw
import numpy as np

# initialize physics simulation
DIM = np.asarray([400, 400])
GRAVITY = np.asarray([0, 0])
dt = 0.01
env = pf.Environment(DIM, GRAVITY, dt)

pygame.init()
screen = pygame.display.set_mode((DIM[0], DIM[1]))
pygame.display.set_caption('Elastic Collision Particle Simulation')

number_of_particles = np.random.randint(5, 10)

for n in range(number_of_particles):
    radius = np.random.randint(10, 20)
    density = np.random.randint(50, 75)
    mass = (4/3)*density*np.pi*radius**3
    X = np.random.rand(1, 2)*(DIM-radius)+radius
    V = np.random.rand(1, 2)*75
    A = np.asarray([0, 0])
    particle = pf.Particle(env, X, V, A, radius, mass, density)
    env.addParticle(particle)

def display(env):
    for p in env.particles:
        pygame.gfxdraw.filled_circle(screen, int(p.X[0][0]), int(p.X[0][1]), p.radius, p.colour)

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    screen.fill([255, 255, 255])
    env.update()
    display(env)
    pygame.display.flip()

If you go to my GitHub here and download the code, when you run it, you should see something that looks like this:

Screen Shot 2017-10-01 at 22.20.11

It’ll keep running and the particles will keep bouncing off of one another until you get bored of it. At slow speeds you sometimes get particles sticking, which results in slightly unrealistic behavior, but this is rare. I would likely need to write code for stiction in order to handle this, but that will have to wait for another day. As always, feel free to comment, use the code, and make your own changes.

Ciao!

A quad-rotor on Mars

How much bigger would the blade radius of a quadrotor on mars need to be than a quadrotor on Earth? Alternatively, how much more power would you need to hover with an equivalent quadrotor on Mars, as compared to Earth? This post will go through it analytically, step-by-step, to give you the exact ratio — based on surface atmospheric density and gravity — for a hovering vehicle.

Using Froude’s Momentum theory, our thrust and power equations are:

F_T = \dot{m} \Delta v \ \ \ \ (1)
P_T = F_T v_D \ \ \ \ (2)

Where F_T is the thrust force, \dot{m} is mass flow rate, and \Delta v is the change in velocity through a hypothetical control volume enclosing the rotor disc. The power (P_T) at the disc is the thrust force multiplied by the induced velocity at the disc. We can get an expression for the disc velocity (v_D) using the following equation:

P_T = F_T v_D = \frac{1}{2}\dot{m}(v_2^2-v_1^2) \ \ \ \ (3)

Factorizing using the difference of two squares, we get:

F_T v_D = \frac{1}{2}\dot{m}(v_2-v_1)(v_2+v_1) \ \ \ \ (4)

Where:

\dot{m}(v_2-v_1) = \dot{m}\Delta v = F_T \ \ \ \ (5)

Thus:

v_D = \frac{v_1+v_2}{2} \ \ \ \ (6)

We can plug this into Eqns. 1 and 2 to get:

F_T = \rho A_D (\frac{v_1+v_2}{2})(v_2-v_1)=\frac{1}{2} \rho A_D (v_2^2-v_1^2) \ \ \ \ (7)

P_T = \frac{1}{4} \rho A_D (v_2^2-v_1^2)(v_1+v_2) \ \ \ \ (8)

Since we are hovering, we set v_1 to zero, leaving us with:

F_T = \frac{1}{2} \rho A_D v_2^2 \ \ \ \ (9)

P_T = \frac{1}{4} \rho A_D v_2^3 \ \ \ \ (10)

This is the ideal performance of our rotor/prop – that is, it is theoretically the best we can ever do. In reality, our performance will be much worse, but for the case of a quick sanity check, this is a good starting point.
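Plugging some illustrative numbers into Eqns. 9 and 10 (a hypothetical 1 kg quad with four 0.1 m-radius rotors at sea level; these values are mine, not from any real vehicle):

```python
import math

# Ideal hover power for a hypothetical 1 kg quad with four rotors of
# 0.1 m radius at sea level (illustrative numbers, not a real vehicle)
m, g, rho = 1.0, 9.81, 1.225
A = 4*math.pi*0.1**2          # total disc area of the four rotors
F = m*g                       # thrust required to hover
v2 = math.sqrt(2*F/(rho*A))   # far-field wake velocity, from Eqn. 9
P = 0.25*rho*A*v2**3          # ideal hover power, Eqn. 10
```

This comes out to roughly 55 W, and matches the closed form P = sqrt(F^3 / (2 rho A)) derived below; real rotors will of course need noticeably more than this ideal figure.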

To compare the performance of a Martian quad, and an Earth quad, we want to put these into a form where we can take the ratio of either the power or disc area needed in order to hover on Mars compared to Earth. This requires fixing our thrust to ensure that we are hovering in both cases, and then solving for the ratio of the two. We start by calculating the far-field velocity behind the rotor, v_2:

v_2 = \sqrt{\frac{2F_T}{\rho A_D}} \ \ \ \ (11)

P_T = \frac{1}{4} \rho A_D (\frac{2F_T}{\rho A_D})^{\frac{3}{2}} \ \ \ \ (12)

P_T = (2^{-2})(2^{\frac{3}{2}}) (\rho^1)(\rho^{-\frac{3}{2}}) (A_D^1)(A_D^{-\frac{3}{2}})(F_T^{\frac{3}{2}}) \ \ \ \ (13)

P_T = (2^{-\frac{1}{2}}) (\rho^{-\frac{1}{2}})(A_D^{-\frac{1}{2}})(F_T^{\frac{3}{2}}) = \sqrt{\frac{F_T^3}{2 \rho A_D}} \ \ \ \ (14)

The ratio of Martian gravity to Earth’s gravity is 0.37. Likewise, Mars’ surface air density is roughly 0.02 kg/m^3, compared to Earth’s 1.225 kg/m^3. Mass is constant in both cases. This gives us the following relations:

\frac{P_{Tm}}{P_{Te}} = \frac{\sqrt{\frac{F_{Tm}^3}{2 \rho_m A_D}}}{ \sqrt{\frac{F_{Te}^3}{2 \rho_e A_D}}} = ( \frac{\rho_e F_{Tm}^3}{\rho_m F_{te}^3} )^\frac{1}{2} \ \ \ \ (15)

Where:

F_{Tm} = mg_m = mg_e( \frac{g_m}{g_e} ) = 0.37mg_e \ \ \ \ (16)

Going through and canceling from Eqn. 15 leaves us with:

\frac{P_{Tm}}{P_{Te}} = ( \frac{\rho_e (0.37mg_e)^3}{\rho_m (mg_e)^3} )^\frac{1}{2} = (\frac{1.225 \times 0.37^3}{0.02})^\frac{1}{2} = 1.76139 \ \ \ \ (17)

Therefore, for a disc of any given area, we would need 1.76 times more power to be able to hover in the Martian atmosphere (at sea level) than we would on Earth. What about if we wanted to fix the power and compare the disc area? How big would our disc need to be in order to hover on Mars using equivalent power as a vehicle on Earth? In that case, we would take the following ratio:

A_D = \frac{F_T^3}{2 \rho P_T^2} \ \ \ \ (18)

\frac{A_{Dm}}{A_{De}} = \frac{\frac{F_{Tm}^3}{2 \rho_m P_{T}^2}}{ \frac{F_{Te}^3}{2 \rho_e P_{T}^2}} = \frac{0.37^3 \times 1.225}{0.02} = 3.1 \ \ \ \ (19)

Thus, for a fixed power, you would need roughly 3 times the disc area to produce the same amount of static thrust on Mars as on Earth. This translates to a blade radius roughly 1.76 times greater on Mars than on Earth, the same factor as our power ratio.
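The two ratios in Eqns. 17 and 19 can be reproduced in a few lines:

```python
import math

# Hover power and disc-area ratios from ideal momentum theory,
# using the surface gravity and density ratios quoted above
g_ratio = 0.37               # g_mars / g_earth
rho_e, rho_m = 1.225, 0.02   # surface air densities in kg/m^3

power_ratio = math.sqrt((rho_e/rho_m)*g_ratio**3)   # Eqn. 17, fixed disc area
area_ratio = (rho_e/rho_m)*g_ratio**3               # Eqn. 19, fixed power
radius_ratio = math.sqrt(area_ratio)                # blade radius scaling
```

Note that the radius ratio and the power ratio are exactly the same number: both are the square root of (rho_e / rho_m) times the gravity ratio cubed.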

This is actually a lot closer than I was expecting, but there are many other factors that could have an impact. For example, it only takes into account the surface atmospheric densities of Mars and Earth, when density actually varies with height. I haven’t looked into whether the two atmospheres’ densities fall off with altitude at similar rates, and it would be difficult to guess without actually looking at the numbers.

The second factor that warrants a mention is efficiency. Factors such as compressibility can have an impact on the efficiency of the rotor, which means that the difference in the speed of sound on Earth compared to Mars could limit how closely each vehicle is able to get to the theoretical limit. For the sake of argument, if the speed of sound is very low on Mars, shock formation on the blade tips will increase the drag of the prop, and increase the power needed to achieve the same induced velocity at the disc. The effects of compressibility are likely to be quite different on Mars as compared to Earth, and – whilst these should have no bearing on the theoretical performance that can be achieved – will result in a much lower figure of merit (efficiency).

Handwritten digit prediction using a neural network written in MATLAB/Octave

So this is a quick follow-up to my previous post, where I went through a neural network implementation for Octave. This time I’m going to be using the same network to classify a series of handwritten digits, the source of which you can find here.

The basic procedure is as follows:

  1. Unpack the data and put it into a form that our network can use;
  2. Visualize the data so that we can verify it;
  3. Convert labels to an output form that is useful to us (in this case, zeros and ones in a 10 label set);
  4. Train the network using the training data and converted labels;
  5. Use the trained network to classify the test data set;
  6. Visualize the test set along with the network’s prediction.

It seems like a lot, but most of this is fairly routine data manipulation. In addition to this, some of the work has already been done for us thanks to the good folks at Mathworks.

So, step one:

clear all;
close all;

% import neural network functions
addpath(genpath('/home/seanny/Dropbox/Documents/Octave Scripts/blaze'))

tr = csvread('train.csv', 1, 0);                  % read train.csv
sub = csvread('test.csv', 1, 0);                  % read test.csv

First off, I’m adding the path to the neural network functions to our current load path. This means that we can access those functions without having to copy the files directly over to our working directory. The next thing this code does is read both the test and training data from the csv files.

For our next snippet of code:

figure                                          % plot images
colormap(gray)                                  % set to grayscale
for i = 1:25                                    % preview first 25 samples
    subplot(5,5,i)                              % plot them in 6 x 6 grid
    digit = reshape(tr(i, 2:end), [28,28])';    % row = 28 x 28 image
    imagesc(digit)                              % show the image
    title(num2str(tr(i, 1)))                    % show the label
    set(gca, 'XTick', []);
    set(gca, 'YTick', []);
end

Credit where credit is due – this piece of code is actually from Mathworks, and is useful for visualizing the data to make sure the representation is correct. In this case, the training data comes in an m×785 array, where m is the number of test cases. Why 785? Well, 785 is 28^2 (the size of an image that is 28 pixels wide, and 28 pixels high) plus a single column for the training data label. To make this data usable for our network, we need to put it into the format our network needs:

% pull out data
DATA = tr(:,2:end);
LABEL = tr(:,1);

% put data into the shape we want it (each example is a column vector)
DATA = DATA';
LABEL = LABEL';
TEST = sub';

% map data labels to values 0 and 1
CLASSIFICATION = zeros(10,size(LABEL,2));
for i=1:size(LABEL,2),
  CLASSIFICATION(LABEL(i)+1,i) = 1;
end

The reason for this is that our network takes in a column vector as input, since this is intuitively how most neural network diagrams are drawn. For m examples, we have an n×m array, where n is the number of input neurons, and m is the number of examples in our dataset.

It’s also useful to turn our data labels into column vectors as well. This lets us represent, say, the number 7 as follows:

7 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}

Our first element represents the digit 0, and each subsequent element represents the digits 1 through 9. If our digit is a 7, we then “activate” the 8th output neuron (since the first output neuron corresponds to 0). The advantage of doing it this way is that we can use sigmoid activation functions for our neurons. Since a sigmoid outputs a value between 0 and 1, we cannot intuitively get a ‘7’ without also using some kind of linear function in the output layer. Sigmoids are useful in this context because we don’t need a deep network to learn digit identification, and because they’re well known. Since we can get away with a simpler model by mapping the outputs to ones and zeros, it’s worth doing in this context.
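The same encoding can be sketched in Python with numpy (a sketch, not the Octave code): taking the argmax over rows inverts the encoding, which is exactly what the prediction step at the end of the post does.

```python
import numpy as np

# Round trip of the label encoding: digit -> one-hot column -> argmax.
# Row 0 corresponds to digit 0, row 7 to digit 7, and so on.
def encode(labels, n_classes=10):
    Y = np.zeros((n_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1.0
    return Y

labels = np.array([7, 0, 3])
Y = encode(labels)               # shape (10, 3), one column per example
decoded = np.argmax(Y, axis=0)   # recovers the original digits
```

Each example becomes one column with a single 1 in it, matching the column-vector convention for the network’s inputs and outputs.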

Once that’s done, we build our network using the functions outlined in the previous write-up:

% create neural network
TOPOLOGY = [784 50 10];
[THETAs Xs] = nnbuild(TOPOLOGY);
ACTFNS = cell(1,size(TOPOLOGY,2)-1);
ACTFNS(1:end) = @sigmoid;

784 is the size of the image when put into the form of a single vector (each neuron corresponds to one pixel), 50 is the number of neurons in the hidden layer, and we have 10 outputs to represent our 10 integers from 0-9. So why 50 hidden neurons? Well, it was a bit of an arbitrary choice. I recalled from the Coursera machine learning course that they used 25 hidden neurons to get a pretty effective result. With many more training examples, I decided to try 50 out, and it seemed to work.

Next, we set our options and train the network:

% train the network on the dataset
options = optimset('MaxIter',1000);
THETAs = nntrain(options,THETAs,Xs,DATA,CLASSIFICATION,TOPOLOGY,ACTFNS,@crssenterr,0);
OUT = cell2mat(nnfeedforward(THETAs,Xs,TEST,ACTFNS)(end));
TEST = TEST';

To train the network, I opted for 1000 iterations, though for a serious problem you’d try to minimize the error as much as possible. Once we’ve done this, we use our trained network to make a prediction using the test set. I take the test data and put it back into its original form, since we will use the same routine as above to visualize the first 25 examples of the test set.

Finally, it’s time to use the same techniques as above to convert our prediction into its integer value, and visualize the results:

% convert prediction back into its integer value
PREDICTION = zeros(size(OUT,2),1);
for i=1:size(OUT,2),
  PREDICTION(i) = find(OUT(:,i)==max(OUT(:,i)))-1;
end

% prediction on test set for first 25 samples
figure
colormap(gray)
for i = 1:25
    subplot(5,5,i)
    digit = reshape(TEST(i, 1:end), [28,28])';
    imagesc(digit)
    title(num2str(PREDICTION(i, 1)))
    set(gca, 'XTick', []);
    set(gca, 'YTick', []);
end

First we allocate memory for the prediction vector, and then we loop through the columns of our network output, mapping the index of the maximum value to an integer (since we are doing this numerically, our neuron outputs will never be exactly 1 or 0). This gives the following training and prediction outputs:

[Figure: training images and labels]
[Figure: predicted labels on the test set]

As you can see, with 1000 training iterations (about 5 minutes), our network gets most of its predictions correct, even the relatively difficult ‘4’ in the middle of the test output (example 13). Surprisingly, it gets images 4 and 5 wrong, even though — conceptually — they seem like they might be simpler examples. Still, 23/25 isn’t bad for a bit of fun; if the rest of the dataset yields a similar error rate, that would put us at a 92% correct prediction rate (though given that there are thousands of such images, I wouldn’t recommend checking this by hand).

As a potential upgrade to the code as it currently stands, I might try implementing a convnet using MATLAB/Octave’s inbuilt convolution functions, and FFTW. This would involve writing a new builder and topological structure, and potentially a new feedforward method, though I suspect that with some clever code, the existing backpropagation function could still be used. I’ll save that for a rainy day though, since next time, I’ll be putting up an article on my ‘deep’ q network, and outlining some of the problems I encountered while putting it together.

As always, the code is available on my github page; feel free to use it, distribute it, and make your own improvements. If you want to try this example for yourself, you’ll need to download my neural network functions into your working directory, and/or change the path in the classifier.m file to point towards whatever directory you store it in. You’ll also need to download the MNIST data from the link above, as it’s too large for me to upload to github.

Ciao until next time.

Fully vectorized, general topology neural network implementation in GNU Octave

This is the as-promised second article in my machine learning series. In this write-up, I’ll go over the maths and implementation of a neural network framework I built in Octave. This is likely to be a pretty long read, so you’ll have to bear with me. We’ll be using Octave again, since we can lean on its inbuilt solvers and functions to make life easier for us. Given that this implementation will be completely vectorized, this will also allow us to take advantage of Octave’s underlying linear algebra capabilities, which will speed up training considerably.

So first of all, what is a neural network? Well, it’s a network of neurons. And what’s a neuron?

A neuron is a type of cell in your brain, and consists of three main parts – the axon, the body, and the dendrite. Briefly, the neuron receives an input signal into its dendrites, from a neighboring neuron’s axon. This signal travels along its dendrite into the cell’s body, where it is transformed, and a new signal is sent out to neighboring neurons along its own axon. This can be seen from the figure below:

[Figure: diagram of a biological neuron]

We can model this process mathematically, and create networks of such artificial neurons. So-called artificial neural networks allow us to perform complex regression and classification tasks that machines typically struggle with, and are currently a major area of research in the broader field of machine learning. Our artificial model is a fairly simple one that assumes several fully connected layers of such neurons, and is modeled using a chained series of bipartite graphs:

[Figure: fully connected neural network drawn as a chained series of bipartite graphs]

From the figure, we can see that each neuron in a given layer (the circles) is connected to each neuron in both the previous and following layers (if they exist). Any given neuron takes in the outputs from the previous layer, multiplies them by a corresponding weighting value (in this case, W11 to Wm1 for the input to hidden layer H1, and then V11 to Vn1 for H1 to the output layer), sums them, and then passes this value to an activation function that transforms it in some way. This transformed value is then the output of our current neuron, which is passed to the next layer of neurons, and multiplied by the next layer of weighting values. In matrix notation, this is:

a_l = f(z_l) \ \ \ \ (1)
z_l = \theta_l^T X_{l-1} \ \ \ \ (2)

Where \theta_l is our weight matrix (W_{11} … W_{m1}, V_{11} … V_{n1}, etc.), and X_{l-1} is a feature vector composed of the outputs of the previous layer. So for the hidden layer in the diagram above, X would be the input, \theta would be the matrix of weights W, and these would be passed to an activation function f(z_l). For the output layer, the feature vector X is the output of the hidden layer, and \theta is the matrix of weights containing all values V_{11} … V_{n1}.
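
As a concrete illustration of equations (1) and (2), here is a minimal Python sketch of a single layer's forward pass (names and numbers are my own, not from the Octave code later in the article):

```python
import math

def sigmoid(z):
    # logistic activation: maps any real z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(theta, x):
    # theta: one row per neuron, each row = [bias, w1, w2, ...]
    # x: outputs of the previous layer (the feature vector X)
    x_b = [1.0] + list(x)  # prepend the bias unit
    z = [sum(w * xi for w, xi in zip(row, x_b)) for row in theta]  # z = theta^T X
    return [sigmoid(zi) for zi in z]                               # a = f(z)

# two inputs feeding a three-neuron hidden layer
theta = [[0.0, 1.0, -1.0],
         [0.5, 0.5,  0.5],
         [-1.0, 2.0, 0.0]]
a = layer_forward(theta, [1.0, 0.5])
```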

The process of training a neural network is the process of refining the weighting values so that for a given input, the neural network produces the correct output. To do this, we use a process called backpropagation. This is where we first feed forward through the network to get an output value, and then find the error between our expected output, and what we actually got. We use this information to minimize a cost function using some form of iterative procedure (generally gradient descent). Mathematically:

Cost function error at the output:

J(\theta_o) = \frac{1}{2}(a_o-y_o)^2 \ \ \ \ (3)

Where a_o is the output of our neural net, y_o is the expected output value, and the 0.5 scalar out the front is purely for mathematical convenience. We want to find the minimum of this function by finding the error caused by the weighting values, and then stepping in the direction that minimizes this error:

\Delta \theta_l = \alpha \frac{\partial J}{\partial \theta_l} \ \ \ \ (4)
\theta_l \leftarrow \theta_l - \Delta \theta_l \ \ \ \ (5)

Where \alpha is a step size that we choose (say, 0.1 or 0.01). The critical challenge, then, is to calculate \frac{\partial J}{\partial \theta_l}, which is done using the chain rule. For three weight matrices (a four layer network):

Output layer to H2:

\frac{\partial J}{\partial \theta_o} = \frac{\partial J}{\partial f_o(z_o)} \frac{\partial f_o(z_o)}{\partial z_o} \frac{\partial z_o}{\partial \theta_o} \ \ \ \ (5)

H2 to H1:

\frac{\partial J}{\partial \theta_{o-1}} = \frac{\partial J}{\partial f_o(z_o)} \frac{\partial f_o(z_o)}{\partial z_o} \frac{\partial z_o}{\partial f_{o-1}(z_{o-1})} \frac{\partial f_{o-1}(z_{o-1})}{\partial z_{o-1}} \frac{\partial z_{o-1}}{\partial \theta_{o-1}} \ \ \ \ (6)

H1 to input layer:

\frac{\partial J}{\partial \theta_{o-2}} = \frac{\partial J}{\partial f_o(z_o)} \frac{\partial f_o(z_o)}{\partial z_o} \frac{\partial z_o}{\partial f_{o-1}(z_{o-1})} \frac{\partial f_{o-1}(z_{o-1})}{\partial z_{o-1}} \frac{\partial z_{o-1}}{\partial f_{o-2}(z_{o-2})} \frac{\partial f_{o-2}(z_{o-2})}{\partial z_{o-2}} \frac{\partial z_{o-2}}{\partial \theta_{o-2}} \ \ \ \ (7)

Working through these derivatives in order:

\frac{\partial J}{\partial f_o(z_o)} = (f_o(z_o)-y_o) \ \ \ \ (8)

\frac{\partial f_o(z_o)}{\partial z_o} = f_o'(z_o) \ \ \ \ (9)

z_o = \theta_o^T f_{o-1}(z_{o-1}) \ \ \ \ (10)

\frac{\partial z_o}{\partial \theta_o} = f_{o-1}(z_{o-1}) \ \ \ \ (11)

\frac{\partial z_o}{\partial f_{o-1}(z_{o-1})} = \theta_o^T \ \ \ \ (12)

\frac{\partial f_{o-1}(z_{o-1})}{\partial z_{o-1}} = f_{o-1}'(z_{o-1}) \ \ \ \ (13)

z_{o-1} = \theta_{o-1}^T f_{o-2}(z_{o-2}) \ \ \ \ (14)

\frac{\partial z_{o-1}}{\partial \theta_{o-1}} = f_{o-2}(z_{o-2}) \ \ \ \ (15)

\frac{\partial z_{o-1}}{\partial f_{o-2}(z_{o-2})} = \theta_{o-1}^T \ \ \ \ (16)

\frac{\partial f_{o-2}(z_{o-2})}{\partial z_{o-2}} = f_{o-2}'(z_{o-2}) \ \ \ \ (17)

z_{o-2} = \theta_{o-2}^T input \ \ \ \ (18)

\frac{\partial z_{o-2}}{\partial \theta_{o-2}} = input \ \ \ \ (19)

Careful inspection of the derivatives shows that this is a repeating pattern, whereby the number of derivatives grows by two for each layer as we travel backwards through the network. We can chain these together to get a recursive algorithm for backpropagation, by setting:

\delta_o = (f_o(z_o)-y_o) f_o'(z_o) \ \ \ \ (20)

\delta_{o-1} = \delta_o \theta_o^T f_{o-1}'(z_{o-1}) \ \ \ \ (21)

\delta_{o-2} = \delta_{o-1} \theta_{o-1}^T f_{o-2}'(z_{o-2}) \ \ \ \ (22)

Where:

\frac{\partial J}{\partial \theta_o} = \delta_o f_{o-1}(z_{o-1}) \ \ \ \ (23)

\frac{\partial J}{\partial \theta_{o-1}} = \delta_{o-1} f_{o-2}(z_{o-2}) \ \ \ \ (24)

\frac{\partial J}{\partial \theta_{o-2}} = \delta_{o-2} \ input \ \ \ \ (25)

The delta value for any neuron in any hidden layer is thus:

\delta_{o-n} = \delta_{o-n+1} \theta_{o-n+1}^T f_{o-n}'(z_{o-n}) \ \ \ \ (26)

And the cost function gradient for that hidden layer is:

\frac{\partial J}{\partial \theta_{o-n}} = \delta_{o-n} f_{o-n-1}(z_{o-n-1}) \ \ \ \ (27)

The algorithm is then to calculate the deltas and gradients in the output layer, and then use these values to calculate the deltas and gradients in each previous layer successively, until we get to the input layer. At this point, we know the cost function and the gradients, and we can use these to step in the direction of the minimum cost, as per equations 4 and 5.
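
To make the recursion concrete, here is a minimal Python sketch of one forward and backward pass through a tiny network with one neuron per layer, so the weights are scalars and the transposes disappear (the variable names and values are mine, not the Octave code's):

```python
import math

def f(z):
    # sigmoid activation
    return 1.0 / (1.0 + math.exp(-z))

def fprime(z):
    # derivative of the sigmoid: f'(z) = f(z) * (1 - f(z))
    s = f(z)
    return s * (1.0 - s)

# one input -> one hidden neuron -> one output neuron (biases omitted)
x, y = 1.0, 0.0        # input and target output
th1, th2 = 0.5, -0.3   # hidden-layer and output-layer weights

# forward pass, keeping the z values for the backward pass
z1 = th1 * x
a1 = f(z1)
z2 = th2 * a1
a2 = f(z2)

# backward pass: deltas as in equations (20) and (26)
delta_o = (a2 - y) * fprime(z2)
delta_h = delta_o * th2 * fprime(z1)

# gradients as in equations (23) and (27)
grad_th2 = delta_o * a1
grad_th1 = delta_h * x

# gradient descent step, as in equations (4) and (5)
alpha = 0.1
th2 -= alpha * grad_th2
th1 -= alpha * grad_th1
```

Repeating many such steps drives a2 towards the target y; the vectorized Octave implementation below does exactly this pattern with matrices in place of scalars.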

Astute readers might notice that as the number of derivatives that are chained together increases with the number of layers, this could potentially present a problem for training. In fact, this is exactly what happens: layers close to the output tend to train faster than layers closer to the input, because each extra layer multiplies the gradient by another derivative term, and as those terms tend towards zero the gradients at earlier layers vanish. The way to get around this is to use non-sigmoidal activation functions such as the rectifier, or to train deep networks using a ‘deep belief’ method that pre-trains successive layers as autoencoders or restricted Boltzmann machines.

Though this is a fair bit to go through in one sitting, an implementation of a neural network can help solidify the concept. To do this, we turn to Octave, where we implement the following:

  1. A function to build the matrices for our specified neural network;
  2. A function to feed forward through the network;
  3. A function that implements the backpropagation algorithm; and,
  4. A training function that calls a solver and uses it in conjunction with the previous function.

So, the first of these is our builder function:

function [THETAs Xs]= nnbuild(TOPOLOGY)
  % this function builds a fully connected neural network based on an input
  % topology vector, and returns a structure that contains the nn layers
  % -- Sean Morrison, 2017

  % initialize weight matrices based on topology
  N = max(TOPOLOGY);
  L = size(TOPOLOGY,2);
  THETAs = cell(L-1,1);
  Xs = cell(L-1,1);

  % step through each layer and generate random theta weightings, create output
  % matrices initialized to zero
  for i=2:L,
    THETAs{i-1} = rand(TOPOLOGY(i),TOPOLOGY(i-1)+1);
    Xs{i-1} = zeros(TOPOLOGY(i),1);
  end
end

This function relies on an input topology vector that specifies the number of neurons and layers in the network. For example, a topology of [2 3 3 2] would correspond to an input layer with two neurons, two hidden layers with three neurons each, and an output layer with two neurons. A cell structure is used to store the weight matrices, as each layer will generally have a differently sized matrix. If you print the output of this function, you’ll get a series of matrices, whereby the number of rows in each matrix corresponds to the number of neurons in that layer, and the number of columns corresponds to the number of features being fed to each neuron (the number of neurons in the previous layer, plus an additional bias unit).
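
As a cross-check, here's a hedged Python sketch of the same builder logic, using nested lists in place of Octave cells (names are mine), showing the matrix shapes to expect for a [2 3 3 2] topology:

```python
import random

def nnbuild(topology):
    # one (n x (j + 1)) weight matrix per layer transition:
    # n = neurons in this layer, j = neurons in the previous layer,
    # plus one extra column for the bias unit
    thetas = []
    for prev, cur in zip(topology, topology[1:]):
        thetas.append([[random.random() for _ in range(prev + 1)]
                       for _ in range(cur)])
    return thetas

thetas = nnbuild([2, 3, 3, 2])
shapes = [(len(t), len(t[0])) for t in thetas]
# shapes == [(3, 3), (3, 4), (2, 4)]
```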

The next part of our code is the feedforward function:

function Xs = nnfeedforward(THETAs, Xs, INPUT, ACTFNS)
  % this function feeds all inputs forward through the network. Full
  % connectivity is assumed. This function returns weightings theta, and
  % output values A at all nodes through all layers, for all input samples.
  % -- Sean Morrison, 2017

  % feed input into first layer
  _THETA = cell2mat(THETAs(1));
  _Z = nnpush(_THETA,INPUT);
  Xs{1} = ACTFNS{1}(_Z,0);

  % loop through remaining layers to feed forward through network
  L = size(THETAs,1);
  for i=2:L,
    _THETA = cell2mat(THETAs(i));
    _X = cell2mat(Xs(i-1));
    _Z = nnpush(_THETA,_X);
    Xs{i} = ACTFNS{i}(_Z,0);
  end
end

This function takes both the theta weight matrices and the Xs output vectors as arguments, along with an input matrix, and a cell structure that contains the activation functions for each layer. Matrices are retrieved from the relevant cell structures and converted back into matrix form. A push function is used to carry out multiplication:

function _Z = nnpush(_THETA, _X)
  _X = [ones(1,size(_X,2)); _X];
  _Z = _THETA*_X;
end

The reason for this is that we will also use this function during backpropagation (we need our z-values, remember?). An alternative would be to store the z-values in their own cell structure, in the same way that we’ve stored our X values, and pass that cell structure as an argument to our backpropagation function. In this case, I felt that creating a push function for the z-values was fairly straightforward, and resulted in fewer input arguments to the backpropagation function. You might also create an inverse function to recover the z-values from the activations, but that may be difficult for some activation functions.

Next up, we have our cost function — that is, the function that works out not only the current error of the neural network, but also the gradients for each weight value (using backpropagation).

function [J grad] = nncostfunction(_THETA,THETAs,Xs,INPUT,OUTPUT,TOPOLOGY,errfunc,ACTFNS,lambda)
  % this function takes in a vector of theta values, and calculates the cost
  % function of the network based on a given error function.
  % -- Sean Morrison, 2017

  % get size parameters and create
  m=size(INPUT,2);
  L=size(TOPOLOGY,2);

  % use topology to repack thetas into our NN, so that we can use nnfeedforward
  THETAs = nnpack(_THETA,TOPOLOGY);

  % feedforward to calculate all output values through the network
  Xs = nnfeedforward(THETAs,Xs,INPUT,ACTFNS);

  % evaluate cost function
  [J err] = errfunc(cell2mat(Xs(end)),OUTPUT);

  % calculate the regularization value and gradients. We're so fancy, we're going
  % to build the gradient vector backwards (bottom to top).
  count = size(_THETA,1);                                                       % count the number of elements in column vector _THETA
  grad = zeros(count, 1);                                                       % allocate memory to the gradient vector that we want to return
  reg = 0;                                                                      % initialize regularization parameter to zero
  for i=L:-1:2,                                                                 % loop backwards through layers in our network
    n = TOPOLOGY(i);                                                            % get number of neurons in the current layer
    j = TOPOLOGY(i-1);                                                          % get number of neurons in the layer in front of the current one
    _X = cell2mat(Xs(i-1));                                                     % get output of neurons in the layer in front of the current one
    _Tn = cell2mat(THETAs(i-1));                                                % get thetas for the previous layer
    if i==L,                                                                    % check to see if the current layer is the final output layer
      _Xn_1 = cell2mat(Xs(i-2));
      _DLT = err.*ACTFNS{i-1}(nnpush(_Tn,_Xn_1),1);                             % calculate delta values at the output layer
      _GRAD = _DLT*cell2mat(Xs(i-2))';                                          % calculate gradient values from the delta values
      _GRAD = [sum(_DLT,2); _GRAD(:)];                                          % collect bias and weight gradient values into a single column vector
    elseif i==2,
      _THETA = cell2mat(THETAs(i));
      _DLT = _THETA(:,2:end)'*_TMP.*ACTFNS{i-1}(nnpush(_Tn,INPUT),1);
      _GRAD= _DLT*INPUT';
      _GRAD = [sum(_DLT,2); _GRAD(:)];
    else
      _Xn_1 = cell2mat(Xs(i-2));
      _THETA = cell2mat(THETAs(i));
      _DLT = (_THETA(:,2:end)'*_TMP).*ACTFNS{i-1}(nnpush(_Tn,_Xn_1),1);
      _GRAD= _DLT*cell2mat(Xs(i-2))';
      _GRAD = [sum(_DLT,2); _GRAD(:)];
    end
    grad((count-n*(j+1)+1):count) = _GRAD(:);                                   % fill our gradient column vector up from the final layer to the first
    count = count - n*(j+1);                                                    % update count value
    _TMP = _DLT;
    _THETA = cell2mat(THETAs(i-1));
    reg = reg+sum(sum(_THETA(:,2:end).^2));
  end

  % final calculation of J and cost function gradient.
  reg = lambda/2/m*reg;
  J = J+reg;
  grad = grad/m;
end

What’s important to note here is that this function takes in a single column vector of theta values. The reason for this is that we are using MATLAB/Octave’s inbuilt solvers, which take a single column vector argument. We want a column vector of theta values, and a column vector of gradients for those values. This means that we need to pack and unpack our theta weight matrices in order to make use of some of our other functions. Unpacking is simple, since we can just use Octave’s (:) notation to build up a single column vector (which we only need to do once). Packing the matrices is a little bit more complicated, and so there’s a function that handles it:

function THETAs = nnpack(_THETA, TOPOLOGY)
  % takes a column vector of theta values and a topology vector, and rebuilds
  % the weight matrices.
  % -- Sean Morrison, 2017
  L=size(TOPOLOGY,2);
  count=1;
  for i=2:L,
    n = TOPOLOGY(i);
    j = TOPOLOGY(i-1);
    _TMP = reshape(_THETA(count:(count+n*(j+1)-1)),n,j+1);
    THETAs{i-1} = _TMP;
    count = count+n*(j+1);
  end
end

The backpropagation implementation in nncostfunction should follow fairly closely from our derivation. We loop backwards through the layers, with a conditional statement that checks: 1) whether we’re at the output layer, 2) whether we’ve reached the first hidden layer (which takes the network input), and otherwise, 3) assumes we’re in some intermediate hidden layer. We initialize a vector for our gradient values, and build it from the bottom up so that each gradient lines up with the correct weight value.

Finally, there’s the training function that implements this costfunction routine:

function THETAs = nntrain(options,THETAs,Xs,INPUT,OUTPUT,TOPOLOGY,ACTFNS,errfn,lambda)
 % this function uses a solver to train the neural network;
 % fmincg has been chosen due to its more efficient use of memory than fminunc.
 % Since fmincg is not a default solver in Octave, it must be included in the
 % script's root folder.
 % -- Sean Morrison, 2017

  % define function handle for the cost function
  costfunction = @(P)nncostfunction(P,THETAs,Xs,INPUT,OUTPUT,TOPOLOGY,errfn,ACTFNS,lambda);

  % unroll theta matrices into a single vector that can be minimized
  _THETA = [];
  for i = 1:size(THETAs,1),
    _TMP = cell2mat(THETAs(i));
    _THETA = [_THETA; _TMP(:)];
  end

  % minimize cost function using fmincg
  [_THETA cost] = fmincg(costfunction, _THETA, options);

  % pack converged theta values
  THETAs = nnpack(_THETA, TOPOLOGY);

  disp('Network trained.')
  pause;
end

In this function, we set up our solver parameters, unroll our theta values into a single vector, and then call the fmincg solver. The solver will then train our network, after which we print out ‘Network trained.’, and repack our thetas. Once this has been done, our network is ready to be used for prediction.

Now for an AND gate test run:

[Figure: cost function output vs. solver iterations]

This is the cost function output charted as a function of solver iterations. Interestingly, a plateau is reached fairly quickly, which the solver then manages to overcome. I’m not sure if this is due to the fminunc solver that was used for this particular case, or something else. In this case, the solver has no trouble finding the global minimum, though this is by no means guaranteed; for more complex functions, the algorithm may instead find a local minimum depending on the initial starting weights, though it has been argued by Yann LeCun among others that this is not necessarily a problem if the local minimum still produces good performance. Next up is the output:

[Figure: network output for the AND gate test]

From the screenshot, we can see that for an input of [1;1], our network correctly predicts an output of 1. Likewise, for inputs of [1;0] and [0;0], the network predicts an output on the order of 1e-16, which is very, very close to zero. Success!

As always, if you want to see the full code, you can do so at my Github page. I haven’t quite tested all of the error functions yet, but I do know that the mean-squared-error and cross-entropy-error functions both seem to work. Ciao until next time!

Blade-element momentum theory: a brief overview, and analysis of an optimized propeller

So this is a quick write-up of a blade element solver I put together based on this code from an academic at Sydney University. The main goal of this was to look into an idea that my father had been mulling over, and to determine if it had any merit. Given that I have some experience working with and writing my own blade element solvers, I was roped into helping him out; I also had the ulterior motive of wanting to improve my own understanding of the topic.

Referring to the following diagram of a single blade element:

[Figure: blade element diagram]

We can see that for our propeller slice, our lift and drag functions are:

dL = C_L(\alpha) \frac{1}{2} \rho W^2 c(r) dr \ \ \ \ (1)

dD = C_D(\alpha) \frac{1}{2} \rho W^2 c(r) dr \ \ \ \ (2)

Where:

W^2 = \left[v_\infty(1+a)\right]^2 + \left[\omega r(1-b)\right]^2 \ \ \ \ (3)

Given that we want to resolve the forces on our blade into normal and tangential components in the disc plane, we can show that:

dT = dL\cos(\phi)-dD\sin(\phi) \ \ \ \ (4)

dQ = \left[dL\sin(\phi)+dD\cos(\phi)\right]r \ \ \ \ (5)

Where φ is:

\phi = \arctan\left(\frac{v_\infty(1+a)}{\omega r(1-b)}\right) \ \ \ \ (6)

And our CL and CD values are determined using the angle of attack alpha:

\alpha = \theta - \phi \ \ \ \ (7)

The difficulty in solving these equations is that we don’t know the axial induction factor a, and the swirl factor b, and we don’t have enough equations to solve for them. To get around this, we introduce Froude’s momentum theory, which gives us two equivalent expressions for dT and dQ that we can use to iterate for a and b:

dT = 4 \pi \rho r v_\infty^2(1+a)a dr \ \ \ \ (8)

dQ = 4 \pi \rho r^3 v_\infty(1+a)b\omega dr \ \ \ \ (9)

The idea is that we initialize a guess for both a and b, calculate dT/dr, dQ/dr by rearranging equations 4 and 5, and then use our momentum theory equations (equations 8 and 9, rearranged in terms of a and b) to calculate our new a and b values. We can then iterate by subbing in our new values until everything converges to within a satisfactory threshold.
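
The iteration scheme can be sketched in Python; the blade-element and momentum evaluations are stubbed out with a hypothetical update() function (a linear stand-in, not real aerodynamics), since only the fixed-point pattern matters here:

```python
def update(a, b):
    # hypothetical stand-in for: evaluate dT/dr and dQ/dr from the
    # blade-element equations, then solve the momentum equations
    # for new induction and swirl factors
    return 0.5 * a + 0.05 + 0.1 * b, 0.3 * b + 0.01

a, b = 0.1, 0.01   # initial guesses for the axial induction and swirl factors
relax = 0.5        # relaxation factor to damp oscillation between iterates
for _ in range(100):
    a_new, b_new = update(a, b)
    a += relax * (a_new - a)   # step partway towards the new value
    b += relax * (b_new - b)
# on convergence, (a, b) is a fixed point of update()
```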

This works after a fashion, but occasionally you run into values that won’t converge easily, even with relaxation factors. To get around this, I used a tactic from machine learning to leverage more powerful solution methods. We construct two cost functions of the form:

J_a = \frac{1}{2}(a_g-a_c)^2 \ \ \ \ (10)

J_b = \frac{1}{2}(b_g-b_c)^2 \ \ \ \ (11)

And find the gradients dJa/da, and dJb/db:

\frac{dJ_a}{da_g} = (a_g-a_c) \ \ \ \ (12)

\frac{dJ_b}{db_g} = (b_g-b_c) \ \ \ \ (13)

Where the subscript g denotes our initial guessed value, and subscript c denotes the iterated, calculated value. We can then use these in conjunction with a more powerful solver like fmincg, or fminunc, which will let us find a solution for values that have difficulty converging.

So that’s the theory, let’s have a look at the implementation. First, we calculate our constants and initialize memory for the operation:

element = BLADE(end)-BLADE(end-1); omega = (rpm/60)*2*pi; % calculate element size and angular velocity
K1 = 0.5*rho*blades*CHORD; K2 = 0.5*rho*blades*CHORD.*BLADE; % calculate equation constants
K3 = 4.0*pi*rho*v^2; K4 = K3*omega/v; % calculate equation constants
THETA = (pitch+BETA)*pi/180; % calculate theta along the blade
A0 = 0.1*ones(size(BLADE)); B0 = 0.01*ones(size(BLADE)); % initialize A0, B0

These values are all constants that we want to calculate outside of any loops we might need to implement. Next, we have a nested function that calculates our cost function and gradient:

function [J grad] = bemfuncmin(ABs)
A0 = ABs(1:size(BLADE,2)); B0 = ABs((size(BLADE,2)+1):end); % unpack the A0 and B0 vectors
A0 = A0'; B0 = B0'; % put into row vector form
[DtDr DqDr]=bemsolve(A0, B0); % solve system of equations using estimated A and B values
TEM1=DtDr./(K3*BLADE.*(1+A0)); TEM2=DqDr./(K4*BLADE.^3.*(1+A0)); % calculated A and B values from system of equations
ERRA = 0.5*(A0-TEM1).^2; ERRB = 0.5*(B0-TEM2).^2; % error function
dERRA_dA0 = (A0-TEM1).*(1+DtDr./K3./BLADE./(1+A0).^2); % derivative of the squared error function
dERRB_dB0 = (B0-TEM2); % derivative of the squared error function
J = sum([ERRA(:); ERRB(:)]); grad = [dERRA_dA0(:); dERRB_dB0(:)]; % convert back to column vectors
end

Clever readers will notice that this function is calling another function called bemsolve. This is our basic evaluation function; that is, if we know a and b, we can just evaluate equations 1-5 and be done with it. Conveniently, we also need to do this in order to evaluate our cost function and our gradient, so rather than writing the same code out twice, we’ll just write another nested function to handle it.

A second important point is that our a and b values are passed to bemfuncmin as a single vector. This is so that we can use MATLAB/Octave’s fminunc function to do the heavy lifting for us. We need to unpack this vector so that we can use our a and b values to evaluate bemsolve. Once we’ve done this and evaluated our cost function J and our gradient values, we return a single value for the cost function, and a single column gradient vector by repacking it.

Incidentally, this leads us to our bemsolve function:

function [DtDr DqDr] = bemsolve(A, B)
V0=v*(1+A); V2=omega*BLADE.*(1-B); VLOCSQU=V0.^2+V2.^2; % calculate velocities
PHI=atan2(V0,V2); ALPHA=THETA-PHI; % calculate angles
[CL CD]=liftfunc(ALPHA); % get lift and drag coefficients from our lift function
DtDr=K1.*VLOCSQU.*(CL.*cos(PHI)-CD.*sin(PHI)); % calculate thrust per element
DqDr=K2.*VLOCSQU.*(CD.*cos(PHI)+CL.*sin(PHI)); % calculate torque per element
end

As mentioned above, this is a fairly straightforward evaluation of equations 1-5. Importantly, this function calls our lift function, which returns a CL and CD for a given angle of attack. This lift function is passed to the BEM function as a handle, and should be vectorized so that we can get the CL and CD for each element of the blade.

Next, we set up our cost function for use with the fminunc solver:

costfunction = @(P)bemfuncmin(P); % establish cost function
options = optimset('GradObj', 'on'); % set options
ABs = [A0(:); B0(:)]; % pack A0, B0 into single column vector
[ABs cost] = fminunc(costfunction, ABs, options); % solve using fminunc

This is a fairly standard implementation that will look familiar to anyone who’s done this before for machine learning.

Finally, we unpack our converged vectors, and calculate our thrust, torque, and power:

A0 = ABs(1:size(BLADE,2)); B0 = ABs((size(BLADE,2)+1):end); % unpack the A0 and B0 vectors
A0 = A0'; B0 = B0'; % put into row vector form
[DT DQ] = bemsolve(A0, B0); % solve system using converged A0 and B0 values
thrust=sum(DT*element); torque=sum(DQ*element); power = torque*omega; % calculate thrust, torque, power

The idea of this implementation is that it’s vectorized as much as possible. Rather than stepping through elements one by one, we use MATLAB/Octave’s inbuilt vector handling to parallelize the process. My usual convention is to capitalize anything that’s a vector or matrix, and keep scalars in lowercase (the exception here is J). An example implementation that calls the above code would be:

BLADE = 0.08:(0.8-0.08)/10:0.8;
BETA = 65:-(65-25)/10:25;
CHORD = 0.1*ones(size(BLADE));
pitch = 0; v = 60; rpm = 2100; rho = 1.225; blades = 2;
[thrust torque power] = BEM(@NACA0012, pitch, CHORD, BETA, BLADE, v, rpm, rho, blades)

If you wanted to calculate the efficiency curves for a given propeller at multiple pitch settings, you would set up a nested loop like the following:

for i = 1:size(PITCH,2)
  p = PITCH(i);
  for j=1:size(VEL,2)
    v = VEL(j);
    [THRUST(i,j), TORQUE(i,j), POWER(i,j)] = BEM(@NACA0012, p, CHORD, BETA, BLADE, v, rpm, rho, blades);
  end
end

From there, you calculate the thrust and power coefficients, your advance ratio, and use them to compute your efficiency:

J = \frac{V_\infty}{nD} \ \ \ \ (14)

C_T = \frac{T}{\rho n^2 D^4} \ \ \ \ (15)

C_Q = \frac{Q}{\rho n^2 D^5} \ \ \ \ (16)

\eta_p = \frac{JC_T}{2 \pi C_Q} \ \ \ \ (17)

Where J is the advance ratio, D is the prop diameter, n is revolutions per second, η is efficiency, and CT and CQ are the thrust and torque coefficients, respectively.
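
A quick Python translation of these coefficient formulas, evaluated on made-up numbers rather than solver output (the function and argument names are mine; C_Q is taken as Q/(ρn²D⁵), the form for which η = J·C_T/(2π·C_Q) holds):

```python
import math

def prop_coefficients(thrust, torque, v_inf, rpm, diameter, rho=1.225):
    n = rpm / 60.0                             # revolutions per second
    J = v_inf / (n * diameter)                 # advance ratio
    CT = thrust / (rho * n**2 * diameter**4)   # thrust coefficient
    CQ = torque / (rho * n**2 * diameter**5)   # torque coefficient
    eta = J * CT / (2.0 * math.pi * CQ)        # propulsive efficiency
    return J, CT, CQ, eta

# illustrative inputs only, not results from the BEM solver
J, CT, CQ, eta = prop_coefficients(thrust=100.0, torque=20.0,
                                   v_inf=60.0, rpm=2100.0, diameter=1.6)
```

A handy sanity check: computed this way, η is algebraically identical to the direct definition T·v∞/(Q·ω).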

Now, for some sweet pics. The following are efficiency curves for a blade that was optimized by solving Goldstein’s equations, as outlined in this paper. The lift and drag coefficients were roughly taken from another paper with data for a NACA0012, from 0 to 180 degrees. Given that the NACA0012 is a symmetrical airfoil, we can flip the CL values about the x and y axes, and the CD values about the y-axis, to get the full envelope of the airfoil. Note that for these values, I haven’t taken Reynolds number into account. A more accurate implementation would do this, as well as correcting for 3D flow.

First off, we have our blade geometry:

[Figure: optimized NACA0012 propeller blade geometry]

Seems a bit sharp towards the tip, but otherwise believable (the chord changes to account for the change in Reynolds number over the span of the blade). Next up are the efficiency curves:

[Figure: optimized NACA0012 propeller efficiency curves]

This seems to be a fairly typical propeller efficiency plot. Interestingly, at zero degrees pitch, our propeller reaches a maximum efficiency of almost 100%. I’m a bit dubious of this, but I suspect that it’s due to the optimization algorithm also being based on BEMT. Finally, we have the power coefficient curves:

[Figure: optimized NACA0012 propeller power coefficient curves]

If you want to have a poke around with the code and give it a whirl for yourself, you can find it on my Github page linked here. Let me know if you find any mistakes, and I’ll look into fixing them.

Ciao!

Taking a look at Q-learning

So this is both my first article, and my first attempt at writing a tutorial for anything to do with machine learning. The motivation for this is partly for my own benefit as I’m still fairly new to machine learning in general, and because many of the ‘simple’ implementations that I found online were needlessly complicated. This script will give you a basic rundown of how Q-learning works, without requiring any kind of visualization classes beyond what Octave and MATLAB both offer. In all, it’s just a bit over 30 lines of code, and I’ll be walking the reader through the script as we go.

On that note, the first thing I’d encourage anyone reading this to do is to go out and download Octave here (or, for those of you on Linux, you should be able to install it through your distro’s package manager). It’s a free linear algebra and numerical computing suite that is syntactically compatible with MATLAB, making it a great tool for anyone interested in this type of work. Though it lacks the graphical Simulink environment that MATLAB provides, it’s nevertheless a great way to pick up machine learning and do numerical computing on a budget. It’s the best linear algebra package this side of free.

So what exactly is Q-learning? Q-learning is a form of reinforcement learning whereby an agent learns to perform a time-dependent task based on a reward schema. When the agent performs an action that we as the developer want it to perform, it gets a positive reward; likewise, if it performs an action we don’t want, it gets a negative reward. The tricky part is that the action that seems best right now may in fact turn out to be detrimental further down the line (or: eating ice cream is great until you get fat). Our algorithm needs to take this into account, hence the time-dependence aspect. Our agent should therefore explore its options over multiple trial runs; as it does so, the rewards it receives let it build up a function over state-action pairs that tells it the best action to take in a given state.

Now for the maths. The formula we will use to do this is:

Q(S_t,a_t) \leftarrow Q(S_t,a_t)+\alpha \left[R_{t+1} +\gamma \max_{a} Q(S_{t+1},a) -Q(S_t,a_t) \right]

Where Q(St, at) is the output of the Q-function for our state and action at the current time, Rt+1 is the reward our agent will receive in the following timestep for the action it chooses, max[Q(St+1)] is the maximum Q-value across all actions available to the agent in the next timestep, α is our learning rate, and γ is our discount factor, which controls how heavily future rewards are weighted.

For those unfamiliar with the notation: the arrow in this case means set, or update. It doesn’t really make sense to say 2 = 2+n (even though we write it this way in code), since 2 and 2+n aren’t the same, and therefore aren’t equal to one another. So in this case we use the arrow symbol to denote that we are taking the value of Q, adding some other quantity to it, and then setting the result of this operation as our new Q value. Our subscripts denote the timestep – that is, the state of the agent at the current time, and the possible future states (how far our agent is able to look ahead – generally only one timestep).
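
To make the arrow-update concrete, here is a single step of the rule worked through in plain Python (the numbers mirror the gridworld that follows: a reward of 100 for landing on the goal square, and α = γ = 0.2):

```python
alpha, gamma = 0.2, 0.2
Q_sa = 0.0        # current estimate Q(S_t, a_t), initialised to zero
reward = 100.0    # R_{t+1}: the action lands the agent on the goal square
max_next = 0.0    # max over a of Q(S_{t+1}, a); zero for unexplored states

# "Set" Q(S_t, a_t) to its old value plus alpha times the TD error
Q_sa = Q_sa + alpha * (reward + gamma * max_next - Q_sa)
print(Q_sa)  # 20.0
```

So after one visit, the state-action pair that led to the goal is worth 20; repeated visits push it further towards the true value, and the γ·max term gradually propagates that value backwards to earlier squares.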

To demonstrate this principle, we will create a small gridworld example, where we place our agent at a random position on a square grid. From there, our agent will explore until it reaches the lower right-hand corner of the grid, at which point it receives a reward, and a new episode begins. Throughout this process, a Q-matrix that stores the value of each action (moving either up, down, left, or right) in each state (the agent’s position on the grid) will be updated. Initially this matrix will only contain zeros, but as our agent gets rewarded for reaching the target square, this matrix will fill up with progressively better Q-values.

So our implementation of this is pretty straightforward, but it’s likely to be fairly challenging to anyone new to MATLAB or Octave since we’ll be making heavy use of the language’s in-built indexing functions. First off, we need to create our map, define our actions, and set an initial state:

% create map
x = 10; y = 10;
% actions -- up; down; left; right
A = [-1; 1; -y; y];
% initial state
S = 1;

x and y are the number of elements in the x and y directions respectively, and A is the action matrix. MATLAB and Octave have a useful feature called linear indexing (sometimes called single-valued indexing). In a nutshell, rather than having to know the x and y locations of our element, we can just use a single value z instead. If you take a 2D matrix and stack it column by column like this:

\begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{m1} & \cdots & A_{mn} \end{bmatrix} = \begin{bmatrix} A_{11} \\ \vdots \\ A_{m1} \\ \vdots  \\ A_{1n} \\ \vdots \\ A_{mn} \end{bmatrix}

you can count downwards to get any element with a single value. This is useful since it lets us define our state and actions with single values that can then be summed using St+1 = St + At. If we want to go up on our map, we move one place up the list (−1); if we want to go down, we move one place down the list (+1). Likewise, if we want to go left or right, we move ±y places along the list. The semicolons inside A indicate new rows, so our action matrix A is a single column vector.
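
Here's the same column-major bookkeeping sketched in Python, with 0-based indices for clarity (Octave itself is 1-based, but the move offsets of ±1 and ±y are identical; the helper names are mine):

```python
def linear_index(row, col, n_rows):
    # Stacking columns top-to-bottom puts element (row, col) at col*n_rows + row.
    return col * n_rows + row

def moves(z, n_rows):
    # Candidate neighbours of linear index z: up, down, left, right.
    return [z - 1, z + 1, z - n_rows, z + n_rows]

print(linear_index(0, 1, 10))  # 10: the top of the second column
print(moves(55, 10))           # [54, 56, 45, 65]
```

This is why the boundary checks later in the script matter: ±1 wraps between the bottom of one column and the top of the next unless we explicitly forbid it.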

Next, we set the following:

% reward matrix, utility matrix, and learning variables
R = zeros(x,y); R(end) = 100; Q = zeros(x*y,size(A,1));
alpha = 0.2; gamma = 0.2; epsilon = 0; episodes = 1000;

% position matrix (for viewing)
POS = zeros(x,y); POS(S) = 1; POS(end)=2;
imagesc(1:x,1:y,POS);
drawnow;

The first part of this creates a reward matrix that is the same size as our map. We then use linear indexing to find the last element (in this case, the 100th element) and set a reward of 100 (this is an arbitrary value; any positive number should do). Next we create the Q-matrix: given that we can hypothetically go up, down, left, or right at each square on the grid, we have 10×10 = 100 states with 4 actions each, i.e. 400 state-action pairs. We therefore create a zero matrix with x*y rows and size(A,1) columns, where size(A,1) is the number of rows in our action matrix.

Once we’ve done all this, we specify alpha, gamma, epsilon and the number of episodes we will train our agent for. Epsilon is a greed coefficient – if the agent is greedy, it will always take what it thinks is the best course of action. We can specify a value for epsilon where 0 < ε < 1, so that some percentage of the time, the agent will take an action other than what it believes to be the best action. This lets our agent explore and find paths that are potentially even better.
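
The ε-greedy selection rule described above can be sketched as a standalone function (Python here purely for illustration; the function name is mine, and the Octave version lives in the main loop further down):

```python
import random

def epsilon_greedy(qvals, epsilon):
    # With probability epsilon, pick any legal action (exploration);
    # otherwise pick a greedy action, breaking ties at random.
    # Illegal actions are marked with -inf, mirroring the Octave script.
    legal = [i for i, q in enumerate(qvals) if q != float("-inf")]
    if random.random() < epsilon:
        return random.choice(legal)
    best = max(qvals[i] for i in legal)
    return random.choice([i for i in legal if qvals[i] == best])
```

With epsilon = 0 (as in the script's defaults), the agent is fully greedy; setting it to, say, 0.1 makes the agent take a random legal action 10% of the time.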

Finally, we create a position matrix that shows 3 things: grid squares represented by zeros, our agent, represented by a 1, and our target square represented by a 2. You could also have a series of obstacles, represented by a negative reward in our R-matrix, and denoted with 3s in our position matrix, but this hasn’t been done for this example. We then create an image using this POS matrix, and tell Octave to draw it immediately.

The next part of our program is:

function qvals = action(S)
  % get neighboring elements, prevent agent from making illegal moves
  if S>1 & mod(S,y)~=1, up=Q(S,1); else, up=-inf; end
  if S<x*y & mod(S,y)~=0, down=Q(S,2); else, down=-inf; end
  if S-y>=1, left=Q(S,3); else, left=-inf; end
  if S+y<=x*y, right=Q(S,4); else, right=-inf; end
  qvals=[up down left right];
end

% vector of random starting positions
r = randi([1 x*y-1],1,episodes);

The first block of code is a function that returns the Q-values of the actions for a given state. That is, we pass it a position S on our grid, and the function returns the Q-values for moving up, down, left, or right from that position. We also check where the agent is: if it is next to a boundary (which we can determine using the modulo operation), we prevent it from stepping outside the grid (which would give us an indexing error) by setting the Q-values for those moves to -inf. One caveat: a local function defined inside an Octave/MATLAB script doesn’t share the script’s workspace, so Q, x, and y need to be made visible to it — either pass them in as arguments or declare them global.

The second block is a vector of random starting positions for our agent. For each episode, we will have the agent start in a new position by setting its state to S = r(n) where n is our episode counter. We do this by creating a 1 x episodes vector of random integer values in the range 1 to xy-1, to ensure that our agent doesn’t start on the end square.

Now for the real meat of our function:

for n = 1:episodes
  running = true;
  while running == true
    % if landed on final square, break and go to next episode
    if S==x*y,
      running=false; disp('Restart');
      break;
    end

    % evaluate Q-matrix
    qvals = action(S);
    m = max(qvals);
    k=rand;
    
    % epsilon-greedy action selection
    if k>=epsilon,
      idx = find(qvals(:)==m);
      i = idx(randi(size(idx,1)));
    else
      idx = find(qvals(:)~=-inf);
      i = idx(randi(size(idx,1)));
    end

    % update Q-matrix
    Q(S,i) = Q(S,i)+alpha*(R(S+A(i))+gamma*max(max(action(S+A(i))))-Q(S,i));

    % update position matrix
    S=S+A(i);
    POS=0*POS; POS(end)=2; POS(S)=1;
    imagesc(1:x,1:y,POS);
    drawnow;
  end
  % reinitialise the position of the agent to a random point on the grid
  S = r(n); POS = 0*POS; POS(S) = 1;
end

This is quite a chunk of code, so we’ll take it one step at a time. First of all, we will run this algorithm for the required number of episodes. For each episode, we have a while loop that only terminates when the agent has reached the goal, after which we reset its position and let it go again (the Q-matrix values are preserved between episodes). The first thing we do is check if our agent is on the final square, and if so, we terminate the while loop.

If the agent isn’t on the reward square, we evaluate the possible actions by calling the action function with the agent’s current state. We find the best action value, and generate a random value k between 0 and 1. We then compare k to epsilon to determine whether the agent takes the greedy option or explores. If k is greater than epsilon, we randomly choose among the actions with the maximum value (this handles both a single best action and ties between several equally good ones). If k is less than epsilon, we randomly choose any action whose value isn’t -inf.

Once an action has been chosen, we update the Q-matrix for the current state using the current Q-value, the next state’s reward (which is 0 unless the goal square is reached), and the maximum Q-value for the actions available from the next state.

Finally, we update the agent’s position by moving it to the new state. We do this by adding the action directly to the agent’s position, and redrawing our POS map with the agent’s new position. This will loop until the agent reaches the end square, after which the loop will terminate, and the agent’s position will be reinitialized at the next random starting position.

The full code for this implementation can be found here; feel free to play around with it and extend it. Hopefully this tutorial was instructive for anyone interested in this topic!

hello world

So this is my first ever post.

I thought I’d start a blog as part of my own self-learning and discovery — mainly as an outlet for the many hobbies and projects I take on that others might find interesting. In the process, I hope to learn new things as others point out and correct my own mistakes and misunderstandings!

So on that note, my second-ever post — a short write-up on a Q-learning script I put together for MATLAB/Octave — is already in the works. To the poor, lonely bastard who stumbled onto this page: stay tuned!
