Credit goes to arjunsreedharan.org

  1. Memory Allocators 101 - Write a simple memory allocator


    Code related to this article: github.com/arjun024/memalloc

    This article is about writing a simple memory allocator in C.
    We will implement malloc(), calloc(), realloc() and free().

    This is a beginner level article, so I will not spell out every detail.
    This memory allocator will be neither fast nor efficient, and we will not adjust allocated memory to align to a page boundary, but we will build a memory allocator that works. That’s it.

    If you want to see the code in full, take a look at my GitHub repo memalloc.

    Before we get into building the memory allocator, you need to be familiar with the memory layout of a program. A process runs within its own virtual address space that’s distinct from the virtual address spaces of other processes. This virtual address space typically comprises 5 sections:

    Text section: The part that contains the binary instructions to be executed by the processor.
    Data section: Contains non-zero initialized static data.
    BSS (Block Started by Symbol): Contains zero-initialized static data. Static data that the program leaves uninitialized is initialized to 0 and goes here.
    Heap: Contains the dynamically allocated data.
    Stack: Contains your automatic variables, function arguments, copy of base pointer etc.

    [Image: Memory layout]

    As you can see in the image, the stack and the heap grow in the opposite directions.
    Sometimes the data, bss and heap sections are collectively referred to as the “data segment”,
    the end of which is demarcated by a pointer named program break or brk.
    That is, brk points to the end of the heap.

    Now if we want to allocate more memory in the heap, we need to request the system to increment brk. Similarly, to release memory we need to request the system to decrement brk.

    Assuming we run Linux (or a Unix-like system), we can make use of the sbrk() call, which lets us manipulate the program break.

    Calling sbrk(0) gives the current address of program break.
    Calling sbrk(x) with a positive value increments brk by x bytes, as a result allocating memory.
    Calling sbrk(-x) with a negative value decrements brk by x bytes, as a result releasing memory.
    On failure, sbrk() returns (void*) -1.
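
    The four rules above can be tried out with a small sketch. The function name grow_and_shrink is made up for this example, and it assumes nothing else moves the program break between the calls (so no malloc() or printf() in between):

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() in <unistd.h> on glibc */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Grow the heap by `bytes`, then shrink it again.
 * Returns 1 if the program break moved up by `bytes` and came back to
 * where it started, 0 if sbrk() failed or the break ended up elsewhere.
 */
int grow_and_shrink(intptr_t bytes)
{
	assert(bytes > 0);

	void *before = sbrk(0);            /* current program break */
	void *old = sbrk(bytes);           /* on success: the PREVIOUS break */
	if (old == (void *) -1)
		return 0;

	void *grown = sbrk(0);
	if ((char *) grown - (char *) before != bytes)
		return 0;

	if (sbrk(-bytes) == (void *) -1)   /* release what we just took */
		return 0;

	return sbrk(0) == before;
}
```

    Note that a successful sbrk(x) hands back the old break, which is exactly the start of the newly allocated region.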

    To be honest, sbrk() is not our best buddy in 2015. There are better alternatives like mmap() available today. sbrk() is not really thread safe, and it can only grow or shrink the heap in LIFO order.
    If you do a man 2 sbrk on your macbook, it will tell you:

    The brk and sbrk functions are historical curiosities left over from earlier days before the advent of virtual memory management.

    However, the glibc implementation of malloc still uses sbrk() for allocating memory that’s not too big in size.[1]
    So, we will go ahead with sbrk() for our simple memory allocator.

    malloc()

    The malloc(size) function allocates size bytes of memory and returns a pointer to the allocated memory.
    Our simple malloc will look like:

    void *malloc(size_t size)
    {
    	void *block;
    	block = sbrk(size);
    	if (block == (void*) -1)
    		return NULL;
    	return block;
    }
    

    In the above code, we call sbrk() with the given size.
    On success, size bytes are allocated on the heap.
    That was easy. Wasn’t it?

    The tricky part is freeing this memory.
    The free(ptr) function frees the memory block pointed to by ptr, which must have been returned by a previous call to malloc(), calloc() or realloc().
    But to free a block of memory, the first order of business is to know the size of the memory block to be freed. In the current scheme of things, this is not possible as the size information is not stored anywhere. So, we will have to find a way to store the size of an allocated block somewhere.

    Moreover, we need to understand that the heap memory the operating system has provided is contiguous. So we can only release memory which is at the end of the heap. We can’t release a block of memory in the middle to the OS. Imagine your heap to be something like a long loaf of bread that you can stretch and shrink at one end, but you have to keep it in one piece.
    To address this issue of not being able to release memory that’s not at the end of the heap, we will make a distinction between freeing memory and releasing memory.
    From now on, freeing a block of memory does not necessarily mean we release memory back to OS. It just means that we keep the block marked as free. This block marked as free may be reused on a later malloc() call. Since memory not at the end of the heap can’t be released, this is the only way ahead for us.

    So now, we have two things to store for every block of allocated memory:
        1. size
        2. whether the block is free or not

    To store this information, we will add a header to every newly allocated memory block.
    The header will look something like this:

    struct header_t {
    	size_t size;
    	unsigned is_free;
    };
    

    The idea is simple. When a program requests size bytes of memory, we calculate total_size = header_size + size, and call sbrk(total_size). We use this memory space returned by sbrk() to fit in both the header and the actual memory block. The header is internally managed, and is kept completely hidden from the calling program.

    Now, each one of our memory blocks will look like:
    [Image: memory block with header]

    We can’t be completely sure the blocks of memory allocated by our malloc are contiguous. Imagine the calling program has a foreign sbrk(), or there’s a section of memory mmap()ed in between our memory blocks. We also need a way to traverse our blocks of memory (why traverse? we will get to know when we look at the implementation of free()). So to keep track of the memory allocated by our malloc, we will put them in a linked list. Our header will now look like:

    struct header_t {
    	size_t size;
    	unsigned is_free;
    	struct header_t *next;
    };
    

    and the linked list of memory blocks like this:

    [Image: linked list of memory blocks]

    Now, let’s wrap the entire header struct in a union along with a stub variable of size 16 bytes. Recall that the size of a union is at least the size of its largest member, so the union is never smaller than 16 bytes. When the struct itself fits in 16 bytes (as on a typical 32-bit system), the header is exactly 16 bytes long. The end of the header is where the actual memory block begins, so the memory provided to the caller stays aligned to 16 bytes as long as the header itself starts at an aligned address. Note that on a typical 64-bit system the struct occupies 24 bytes, so the union is 24 bytes as well and the memory block is no longer guaranteed to land on a 16-byte boundary.

    typedef char ALIGN[16];
    
    union header {
    	struct {
    		size_t size;
    		unsigned is_free;
    		union header *next;
    	} s;
    	ALIGN stub;
    };
    typedef union header header_t;
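
    To see what the union above actually buys us, here is a small sketch that compares it with a variant using C11’s _Alignas. The union aligned_header and the helper is_multiple_of_16() are hypothetical names for this example; the article’s allocator does not use them. This only concerns the size of the header; it says nothing about whether the address returned by sbrk() is itself aligned.

```c
#include <assert.h>
#include <stddef.h>

typedef char ALIGN[16];

/* Same layout as the article's header. */
union header {
	struct {
		size_t size;
		unsigned is_free;
		union header *next;
	} s;
	ALIGN stub;
};

/*
 * Hypothetical variant: _Alignas(16) on a member raises the alignment of
 * the whole union to 16, so its size is padded up to a multiple of 16.
 */
union aligned_header {
	struct {
		size_t size;
		unsigned is_free;
		union aligned_header *next;
	} s;
	_Alignas(16) char stub[16];
};

/* Checked at compile time: holds on both 32-bit and 64-bit targets. */
static_assert(sizeof(union aligned_header) % 16 == 0,
              "aligned_header must be a multiple of 16 bytes");

/* 1 if n is a multiple of 16, 0 otherwise. */
int is_multiple_of_16(size_t n)
{
	return n % 16 == 0;
}
```

    On a typical 64-bit system sizeof(union header) is 24 while sizeof(union aligned_header) is 32; on a typical 32-bit system both are 16.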
    

    We will have a head and tail pointer to keep track of the list.

    header_t *head, *tail;
    

    To prevent two or more threads from concurrently accessing memory, we will put a basic locking mechanism in place.

    We’ll have a global lock, and before every action on memory you have to acquire the lock, and once you are done you have to release the lock.

    pthread_mutex_t global_malloc_lock = PTHREAD_MUTEX_INITIALIZER;
    

    Our malloc is now modified to:

    void *malloc(size_t size)
    {
    	size_t total_size;
    	void *block;
    	header_t *header;
    	if (!size)
    		return NULL;
    	pthread_mutex_lock(&global_malloc_lock);
    	header = get_free_block(size);
    	if (header) {
    		header->s.is_free = 0;
    		pthread_mutex_unlock(&global_malloc_lock);
    		return (void*)(header + 1);
    	}
    	total_size = sizeof(header_t) + size;
    	block = sbrk(total_size);
    	if (block == (void*) -1) {
    		pthread_mutex_unlock(&global_malloc_lock);
    		return NULL;
    	}
    	header = block;
    	header->s.size = size;
    	header->s.is_free = 0;
    	header->s.next = NULL;
    	if (!head)
    		head = header;
    	if (tail)
    		tail->s.next = header;
    	tail = header;
    	pthread_mutex_unlock(&global_malloc_lock);
    	return (void*)(header + 1);
    }
    
    header_t *get_free_block(size_t size)
    {
    	header_t *curr = head;
    	while(curr) {
    		if (curr->s.is_free && curr->s.size >= size)
    			return curr;
    		curr = curr->s.next;
    	}
    	return NULL;
    }
    

    Let me explain the code:

    We check if the requested size is zero. If it is, then we return NULL.
    For a valid size, we first acquire the lock. Then we call get_free_block(), which traverses the linked list to see if there already exists a block of memory that is marked as free and can accommodate the given size. Here, we take a first-fit approach in searching the linked list.

    If a sufficiently large free block is found, we will simply mark that block as not-free, release the global lock, and then return a pointer to that block. In such a case, the header pointer will refer to the header part of the block of memory we just found by traversing the list. Remember, we have to hide the very existence of the header from an outside party. When we do (header + 1), it points to the byte right after the end of the header. This is incidentally also the first byte of the actual memory block, the one the caller is interested in. This is cast to (void*) and returned.

    If we have not found a sufficiently large free block, then we have to extend the heap by calling sbrk(). The heap has to be extended by a size that fits the requested size as well as a header. For that, we first compute the total size: total_size = sizeof(header_t) + size;. Now, we request the OS to increment the program break: sbrk(total_size).

    In the memory thus obtained from the OS, we first make space for the header. In C, a void* converts implicitly to any other object pointer type, so no cast is needed. That’s why we don’t explicitly do: header = (header_t *)block;
    We fill this header with the requested size (not the total size) and mark it as not-free. We update the next pointer, head and tail so as to reflect the new state of the linked list. As explained earlier, we hide the header from the caller and hence return (void*)(header + 1). We make sure we release the global lock as well.

    free()

    Now, we will look at what free() should do. free() has to first determine if the block-to-be-freed is at the end of the heap. If it is, we can release it to the OS. Otherwise, all we do is mark it ‘free’, hoping to reuse it later.

    void free(void *block)
    {
    	header_t *header, *tmp;
    	void *programbreak;
    
    	if (!block)
    		return;
    	pthread_mutex_lock(&global_malloc_lock);
    	header = (header_t*)block - 1;
    
    	programbreak = sbrk(0);
    	if ((char*)block + header->s.size == programbreak) {
    		if (head == tail) {
    			head = tail = NULL;
    		} else {
    			tmp = head;
    			while (tmp) {
    				if(tmp->s.next == tail) {
    					tmp->s.next = NULL;
    					tail = tmp;
    				}
    				tmp = tmp->s.next;
    			}
    		}
    		sbrk(0 - sizeof(header_t) - header->s.size);
    		pthread_mutex_unlock(&global_malloc_lock);
    		return;
    	}
    	header->s.is_free = 1;
    	pthread_mutex_unlock(&global_malloc_lock);
    }
    

    Here, first we get the header of the block we want to free. All we need to do is get a pointer that is behind the block by a distance equalling the size of the header. So, we cast block to a header pointer type and move it behind by 1 unit.
    header = (header_t*)block - 1;
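
    The two pointer moves, (header + 1) in malloc() and (header_t*)block - 1 in free(), are exact inverses of each other. A minimal sketch, with payload_of() and header_of() as made-up helper names:

```c
#include <assert.h>
#include <stddef.h>

typedef char ALIGN[16];

union header {
	struct {
		size_t size;
		unsigned is_free;
		union header *next;
	} s;
	ALIGN stub;
};
typedef union header header_t;

/* What malloc() hands out: the first byte after the header. */
void *payload_of(header_t *header)
{
	assert(header != NULL);
	return (void *)(header + 1);
}

/* What free() does: step back exactly one header from the payload. */
header_t *header_of(void *block)
{
	assert(block != NULL);
	return (header_t *)block - 1;
}
```

    Because the arithmetic is done on header_t pointers, "1 unit" means sizeof(header_t) bytes, whatever that happens to be on the target.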

    sbrk(0) gives the current value of program break. To check if the block to be freed is at the end of the heap, we first find the end of the current block. The end can be computed as (char*)block + header->s.size. This is then compared with the program break.

    If it is in fact at the end, then we can shrink the heap and release memory to the OS. We first reset our head and tail pointers to reflect the loss of the last block. Then the amount of memory to be released is calculated. This is the sum of the sizes of the header and the actual block: sizeof(header_t) + header->s.size. To release this much memory, we call sbrk() with the negative of this value.

    In the case the block is not the last one in the linked list, we simply set the is_free field of its header. This is the field checked by get_free_block() before actually calling sbrk() on a malloc().

    calloc()

    The calloc(num, nsize) function allocates memory for an array of num elements of nsize bytes each and returns a pointer to the allocated memory. Additionally, the memory is all set to zeroes.
    void *calloc(size_t num, size_t nsize)
    {
    	size_t size;
    	void *block;
    	if (!num || !nsize)
    		return NULL;
    	size = num * nsize;
    	/* check mul overflow */
    	if (nsize != size / num)
    		return NULL;
    	block = malloc(size);
    	if (!block)
    		return NULL;
    	memset(block, 0, size);
    	return block;
    }
    

    Here, we do a quick check for multiplicative overflow, then call our malloc(),
    and clear the allocated memory to all zeroes using memset().
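
    The overflow check can also be written so that nothing is multiplied before we know it is safe. This is an alternative phrasing of the same idea, not the code from the repo; mul_overflows() and checked_mul() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if num * nsize does not fit in a size_t, 0 otherwise.
 * num * nsize overflows exactly when nsize > SIZE_MAX / num.
 */
int mul_overflows(size_t num, size_t nsize)
{
	if (num == 0)
		return 0;
	return nsize > SIZE_MAX / num;
}

/* Multiply two sizes that the caller has already checked. */
size_t checked_mul(size_t num, size_t nsize)
{
	assert(!mul_overflows(num, nsize));
	return num * nsize;
}
```

    Unsigned overflow in C wraps around silently, which is why calloc() has to test for it explicitly.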

    realloc()

    realloc() changes the size of the given memory block to the size given.

    void *realloc(void *block, size_t size)
    {
    	header_t *header;
    	void *ret;
    	if (!block || !size)
    		return malloc(size);
    	header = (header_t*)block - 1;
    	if (header->s.size >= size)
    		return block;
    	ret = malloc(size);
    	if (ret) {
    		
    		memcpy(ret, block, header->s.size);
    		free(block);
    	}
    	return ret;
    }
    

    Here, we first get the block’s header and see if the block already has the size to accommodate the requested size. If it does, there’s nothing to be done.

    If the current block does not have the requested size, then we call malloc() to get a block of the requested size, and relocate contents to the new bigger block using memcpy(). The old memory block is then freed.

    Compiling and using our memory allocator.

    You can get the code from my github repository - memalloc.
    We’ll compile our memory allocator and then run a utility like ls using our memory allocator.

    To do that, we will first compile it as a library file.

    $ gcc -o memalloc.so -fPIC -shared memalloc.c
    

    The -fPIC option makes sure the compiled output is position-independent code, and -shared tells the linker to produce a shared object suitable for dynamic linking.

    On Linux, if you set the environment variable LD_PRELOAD to the path of a shared object, that file will be loaded before any other library. We could use this trick to load our compiled library file first, so that the later commands run in the shell will use our malloc(), free(), calloc() and realloc().

    $ export LD_PRELOAD=$PWD/memalloc.so
    

    Now,

    $ ls
    memalloc.c		memalloc.so
    

    Voila! That’s our memory allocator serving ls.
    Print some debug message in malloc() and see it for yourself if you don’t believe me.

    Thank you for reading. All comments are welcome. Please report bugs if you find any.

    Footnotes, References

    See a list of memory allocators:
    liballoc
    Doug Lea’s Memory Allocator.
    TCMalloc
    ptmalloc

    [1] The GNU C Library: Malloc Tunable Parameters
    OSDev - Memory allocation
    Memory Allocators 101 - James Golick

  2. 1:11 AM 9th August 2016








  3. JPEG 101 - How does JPEG work?


    We hear the term JPEG all the time. What really is JPEG? How does it work?

    image


    I will try to give a brief overview of what JPEG is and how it works.
    This is a beginner level article, so we will not dwell too much on the details.


    First, JPEG is not a file format. It’s a compression method.
    The term JPEG is an acronym for the Joint Photographic Experts Group, which created the standard.

    Most of the time, when you talk about a file in JPEG format, you are referring to the JFIF (JPEG File Interchange Format) wrapper.

    Let’s start from the beginning of an image’s life.

    You click a picture using your camera.
    The camera’s sensor is overlaid with a color filter array (CFA), usually a Bayer filter, consisting of a mosaic of a 2x2 matrix of red, green, blue and (again) green filters. The green photo-sensors are luminance-sensitive elements and the red and blue ones are chrominance-sensitive elements. Bayer used twice as many green elements as red or blue so as to mimic the physiology of the human eye. From the CFA, we receive image data that is one color per pixel. This is not suitable for JPEG compression. This image data is reconstructed to produce an RGB color triplet per pixel image.

    The raw image file thus obtained is a bitmap (2D array). It is very large.
    So, we need to compress this file, and that’s where JPEG comes into play.
    JPEG is a lossy compression technique, which means it uses approximations and partial data discarding to compress the content. Therefore it's irreversible.

    JPEG compression is based on the following 2 observations:

    Observation #1: Human eyes don’t see color (chrominance) quite as well as we do brightness (luminance).

    Observation #2: Human eyes can’t distinguish high frequency changes in image intensity.


    Step 1: Convert RGB to YCbCr color space

    Each pixel in your image is stored as an additive combination of Red, Green and Blue values. Each of these values can be in the range of 0 to 255. This color model is called the RGB model. Consider a pixel that is khaki in color. It will be stored as (240, 230, 140).

    Remember Observation #1 - Luminance is more important to the eventual perceptual quality of the image than color. So we convert from RGB color space to one where luminance is confined to a single channel. This color space is called YCbCr.

    Here, Y is the luminance component and Cb, Cr are the chrominance components. They are the blue and red differences respectively.
    Their values will be in the range 0 to 255.

    YCbCr values can be computed directly from RGB as follows:[1]
    Y = 0.299 R + 0.587 G + 0.114 B
    Cb = - 0.1687 R - 0.3313 G + 0.5 B + 128
    Cr = 0.5 R - 0.4187 G - 0.0813 B + 128
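
    The three formulas translate directly into code. A minimal sketch; struct ycbcr and rgb_to_ycbcr() are names invented for this example, and the results are kept as doubles rather than rounded to 8 bits:

```c
#include <assert.h>

struct ycbcr {
	double y, cb, cr;
};

/* Convert one 8-bit RGB pixel using the formulas quoted above. */
struct ycbcr rgb_to_ycbcr(int r, int g, int b)
{
	assert(r >= 0 && r <= 255);
	assert(g >= 0 && g <= 255);
	assert(b >= 0 && b <= 255);

	struct ycbcr out;
	out.y  =  0.299  * r + 0.587  * g + 0.114  * b;
	out.cb = -0.1687 * r - 0.3313 * g + 0.5    * b + 128.0;
	out.cr =  0.5    * r - 0.4187 * g - 0.0813 * b + 128.0;
	return out;
}
```

    For the khaki pixel (240, 230, 140) this gives roughly Y = 222.7, Cb = 81.3, Cr = 140.3. Any pure gray pixel ends up with Cb = Cr = 128, since it carries no color information.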


    Step 2: Downsampling

    Since chrominance is not very important, we can downsample and reduce the amount of color (CbCr components).
    Generally, color is reduced by a factor of 2 in both directions (vertical & horizontal) - that is, Y is sampled at each pixel, whereas Cb and Cr are sampled at every block of 2x2 pixels.
    Now for every 4 Y pixels, there will exist only 1 CbCr pixel.
    You won’t notice much of a change in the image, but a good amount of file size is reduced.
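
    Here is one way the 2x2 downsampling could look for a single chroma plane. Averaging the four samples is an assumption of this sketch (an encoder could just as well pick one of them), and downsample_2x2() is a made-up name:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Downsample one chroma plane by 2 in both directions.
 * `src` holds width*height samples in row-major order; `dst` must have
 * room for (width/2)*(height/2) samples. Width and height are assumed
 * to be even to keep the sketch short.
 */
void downsample_2x2(const unsigned char *src, size_t width, size_t height,
                    unsigned char *dst)
{
	assert(width % 2 == 0 && height % 2 == 0);

	for (size_t y = 0; y < height; y += 2) {
		for (size_t x = 0; x < width; x += 2) {
			unsigned sum = src[y * width + x]
			             + src[y * width + x + 1]
			             + src[(y + 1) * width + x]
			             + src[(y + 1) * width + x + 1];
			/* +2 rounds to nearest instead of truncating */
			dst[(y / 2) * (width / 2) + x / 2] =
				(unsigned char)((sum + 2) / 4);
		}
	}
}
```

    Applied to both Cb and Cr, this leaves us with half as many samples overall: 4 Y + 1 Cb + 1 Cr for every 2x2 block instead of 12.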

    In image editing software, you are generally asked what quality you want the image to be saved at. This setting mainly controls how coarsely the data is quantized in Step 4; some software also changes the amount of downsampling along with it.


    Step 3: Use Discrete Cosine Transform (DCT)

    Each of the three YCbCr components is compressed and encoded separately
    using the same method described here. For now, consider only one of these components. The other 2 components are processed exactly the same way.

    3 a. base images
    DCT is a method that expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies.

    For compression, cosine functions are used rather than sine functions because of a particular difference in their boundary behavior. A function like the brightness of an image need not take zero values on the boundary like sine does. So, it’s difficult to approximate such a signal by a linear combination of sines.

    See the following image[2]

    image

    These are 64 base images, that are built from cosine functions at different frequencies in the X and Y axes.

    The first base image, baseimg[0][0], will be full white.
    From baseimg[0][1] to baseimg[0][7], you can see the frequency increasing along the x-axis.
    From baseimg[1][0] to baseimg[7][0], you can see the frequency increasing along the y-axis.
    baseimg[7][7] will be totally checkered.


    3 b. sub-images

    The entire image we want to compress is divided into blocks of 8x8 pixels. Let’s call each of them a sub-image.
    This sub-image can be visualized as an 8x8 matrix.
    We are going to compress the full image one sub-image at a time.

    Consider an example. The values of the component under consideration are given in the following matrix:
    image
    Since we are going to use DCT, and cosine waves go from 1 to -1, we are going to center our values around zero. This means we shift the range from [0..255] to [-128..127]. So we subtract 128 from every value.

    Now our sub-image is shifted to:
    image

    Now, we have 2 things in our hand:

    1. The 8x8 sub-image to be compressed
    2. 64 base images

    Our task here is to transform the sub-image to a linear combination of these 64 base images.

    The sub-image can be converted to this frequency-domain representation
    using a normalized, two-dimensional type-II Discrete Cosine Transform (DCT).

    We can think of the sub-image as being comprised of a weighted set of these 64 base images merged together on top of each other.

    Therefore, subimage = C1f1 + C2f2 + C3f3 + … C64f64
    where Ci is some constant and fi are the base images.

    We can find each of these coefficients (Ci) using DCT (type II).
    If you want to know more about how DCT works, please refer here.

    I am going to use the following python code to find the DCT'ed matrix of my sub-image:

    import numpy as np
    from scipy.fftpack import dct
    
    def dct2D(x):
    	tmp = dct(x, type=2 ,norm='ortho').transpose()
    	return dct(tmp, type=2 ,norm='ortho').transpose()
    
    print(dct2D(np.array([
    	     [-64., -68., -71., -72., -80., -81., -81., -85.],
    	     [-67., -70., -75., -76., -80., -79., -76., -75.],
    	     [-61., -68., -75., -75., -79., -81., -80., -74.],
    	     [-60., -67., -65., -65., -66., -63., -63., -64.],
    	     [-57., -67., -58., -65., -59., -54., -40., -40.],
    	     [-45., -36., -26., -23., -21., -17., -18., -13.],
    	     [-33., -20., -20., -4., -6., 2., 0., 0.],
    	     [-21., -10., -3., 6., 9., 14., 13., 9.]
    	    ])))
    

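    Here, I am using the SciPy package, which provides the scipy.fftpack.dct function to compute the DCT of a 1D array. Given a 2D array, dct works along the last axis, that is, on each row. So to compute the DCT of a 2D array, we apply dct to the rows, transpose, apply dct again (now on what were originally the columns), and then undo the transposition.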

    And, I get:
    image

    This is the 8x8 table of coefficients, that represents the contribution of each base image to the sub-image.
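
    If you would rather see the transform spelled out than hidden behind a library call, here is a sketch of the same orthonormal 2D DCT-II written straight from its definition. It is a naive O(N^4) loop, nothing like what a real encoder uses. The names dct2d_8x8(), cos_pi_16() and COS_TABLE are made up, and tabulating the cosines is a choice of this sketch to avoid depending on the math library:

```c
#include <assert.h>

#define N 8

/* cos(j * pi / 16) for j = 0..8; other angles follow by symmetry. */
static const double COS_TABLE[9] = {
	1.0,
	0.9807852804032304,
	0.9238795325112867,
	0.8314696123025452,
	0.7071067811865476,
	0.5555702330196022,
	0.3826834323650898,
	0.1950903220161283,
	0.0
};

/* cos(m * pi / 16) for any non-negative integer m. */
static double cos_pi_16(int m)
{
	assert(m >= 0);
	m %= 32;                        /* the period 2*pi is 32 steps */
	if (m > 16)
		m = 32 - m;                 /* cos(2*pi - t) = cos(t) */
	if (m > 8)
		return -COS_TABLE[16 - m];  /* cos(pi - t) = -cos(t) */
	return COS_TABLE[m];
}

/* out[u][v] is the weight of the base image with frequencies (u, v). */
void dct2d_8x8(double in[N][N], double out[N][N])
{
	const double a0 = 0.35355339059327373; /* sqrt(1/8), for index 0 */
	const double ak = 0.5;                 /* sqrt(2/8), for the rest */

	for (int u = 0; u < N; u++) {
		for (int v = 0; v < N; v++) {
			double sum = 0.0;
			for (int x = 0; x < N; x++)
				for (int y = 0; y < N; y++)
					sum += in[x][y]
					     * cos_pi_16((2 * x + 1) * u)
					     * cos_pi_16((2 * y + 1) * v);
			out[u][v] = (u ? ak : a0) * (v ? ak : a0) * sum;
		}
	}
}
```

    A quick sanity check: a sub-image where every value is the same has no frequency content at all, so only out[0][0] is non-zero (it equals 8 times that value).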


    Step 4: Quantization

    We will now quantize the coefficient table we obtained using DCT. This is the real lossy part of the process.

    In the table of coefficients we got through DCT, the top-left cells refer to the low frequency part, and the bottom-right cells refer to the high frequency part.
    We know that high frequency part can be eliminated without much loss in the look of the image. (Remember Observation #2)

    So we now prepare an 8x8 quantization table. This table will have very small values at the top-left part and very high values towards the bottom-right part.

    Every value in the coefficient table is divided by the corresponding value in the quantization table and rounded to the nearest integer.

    Now, because of the high divisor in the bottom-right part, the divided values here become zero - thus eliminating the high frequency data.
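
    The divide-and-round step is short enough to sketch in full. quantize_8x8() is a hypothetical name, and the table is passed in as a parameter because its contents are up to the encoder:

```c
#include <assert.h>

#define N 8

/* Divide each coefficient by its table entry and round to nearest. */
void quantize_8x8(double coef[N][N], int table[N][N], int out[N][N])
{
	for (int u = 0; u < N; u++) {
		for (int v = 0; v < N; v++) {
			assert(table[u][v] > 0);
			double q = coef[u][v] / table[u][v];
			/* round half away from zero, without the math library */
			out[u][v] = (int)(q >= 0 ? q + 0.5 : q - 0.5);
		}
	}
}
```

    With a divisor of 99 in the bottom-right corner, any coefficient smaller than 49.5 in magnitude rounds to zero there, while a divisor of 16 in the top-left corner keeps much finer detail.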

    This Quantization table is up to the encoder, and therefore the table is kept in the image header so the image can be later decoded.

    Here’s a standard JPEG Quantization table:

    image

    And here’s our sub-image after quantization.
    (after dividing each value in our coefficient table by the corresponding value in the quantization table)

    image

    Notice that in the quantized output table, all values except the top-left 3x3 block are all zeroes. These are the high frequency data we eliminated. JPEG’s claim to fame is that with just these 9 values we can get almost the same image back.


    Step 5: Encoding

    We have now got the compressed output as a 2D array. We also know that a lot of its values are zeroes. So, we will find a better way to store the sub-image than storing it as a 2D array.

    We will store the values in a zigzag order. So the data will be:
    -24, -23, 19, 5, 4, 0, 0, 1, 0, 0, 0, 0, 1 followed by 51 zeroes.
    image

    Data of this pattern can be easily compressed by Run-Length Encoding (RLE) algorithm. The final output is encoded using a combination of RLE and Huffman encoding.
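
    The zigzag walk itself can be sketched as follows: we visit the anti-diagonals of the block one after another and flip direction each time. zigzag_8x8() and trailing_zeroes() are names made up for this example, and the actual RLE and Huffman stages are left out:

```c
#include <assert.h>

#define N 8

/* Walk an 8x8 block along its anti-diagonals, alternating direction. */
void zigzag_8x8(int block[N][N], int out[N * N])
{
	int k = 0;

	/* s is row + column, constant along one anti-diagonal */
	for (int s = 0; s <= 2 * (N - 1); s++) {
		int lo = (s < N) ? 0 : s - (N - 1);   /* smallest valid row */
		int hi = (s < N) ? s : N - 1;         /* largest valid row  */

		if (s % 2 == 0) {
			/* even diagonals run from bottom-left to top-right */
			for (int r = hi; r >= lo; r--)
				out[k++] = block[r][s - r];
		} else {
			/* odd diagonals run from top-right to bottom-left */
			for (int r = lo; r <= hi; r++)
				out[k++] = block[r][s - r];
		}
	}
	assert(k == N * N);
}

/* Number of zeroes at the tail of the scan: the run RLE collapses. */
int trailing_zeroes(const int *scan, int len)
{
	int n = 0;
	while (n < len && scan[len - 1 - n] == 0)
		n++;
	return n;
}
```

    If nothing outside the top-left 3x3 block is non-zero, the scan is guaranteed to end with at least 51 zeroes, because the last cell of that block, [2][2], is the 13th value visited.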


    Step 6: Add Header

    Though JPEG/JFIF files do not possess a formally defined header, it generally contains the following:

    * JPEG Start Of Image (SOI) marker (0xFFD8)
    * Application markers
    * Width in pixels
    * Height in pixels
    * Number of components (eg 3 for RGB)

    Your compressed file is ready !!!

    To decompress a JPEG compressed file, simply do the reverse.
    Use Discrete Cosine Transform-III to reverse DCT-II.


    Next, in JPEG 102 we will write from scratch a JPEG encoder in C, but that’s for another day.



    Footnotes

    1. JPEG File Interchange Format - Eric Hamilton, W3C

    2. Image credit: https://oku.edu.mie-u.ac.jp/~okumura

  4. 2:16 PM 17th June 2016








  5. Hide data inside pointers


    Code related to this article: hide-data-in-ptr

    When we write C code, pointers are everywhere. We can make a little extra use of pointers and sneak in some extra information in them. To pull this trick off, we exploit the natural alignment of data in memory.

    Data in memory is not stored at any arbitrary address. The processor reads memory in chunks the size of its word; thus, for efficiency reasons, the compiler places objects in memory at addresses that are multiples of their size in bytes. Therefore, on a 32-bit processor, a 4-byte int will normally reside at a memory address that is evenly divisible by 4.

    Here, I am going to assume a system where size of int and size of pointer are 4 bytes.

    Now let us consider a pointer to an int. As said above, the int can be located at memory addresses 0x1000 or 0x1004 or 0x1008, but never at 0x1001 or 0x1002 or 0x1003 or any other address that is not divisible by 4.
    Now, any binary number which is a multiple of 4 will end with 00.
    This essentially means that for any pointer to an int, its 2 lower order bits are always zero.

    Now we have 2 bits which communicate nothing. The trick here is to put our data into these 2 bits, use them whenever we want and then remove them before we make any memory access by dereferencing the pointer.

    Since bitwise operations on pointers don’t go well with the C standard, we will be storing the pointer as an unsigned int.

    The following is a naive snippet of the code for brevity. See github repo - hide-data-in-ptr for the full code.

    #include <assert.h>
    #include <stdio.h>
    
    /* Assumes int and pointers are both 4 bytes wide, as stated above. */
    void put_data(unsigned int *p, unsigned int data)
    {
    	assert(data < 4);
    	*p |= data;
    }
    
    unsigned int get_data(unsigned int p)
    {
    	return (p & 3);
    }
    
    void cleanse_pointer(unsigned int *p)
    {
    	*p &= ~3u;
    }
    
    int main(void)
    {
    	unsigned int x = 701;
    	unsigned int p = (unsigned int) &x;
    
    	printf("Original ptr: %u\n", p);
    
    	put_data(&p, 3);
    
    	printf("ptr with data: %u\n", p);
    	printf("data stored in ptr: %u\n", get_data(p));
    
    	cleanse_pointer(&p);
    
    	printf("Cleansed ptr: %u\n", p);
    	printf("Dereferencing cleansed ptr: %u\n", *(unsigned int*)p);
    
    	return 0;
    }
    

    This will give the following output:

    Original ptr:  3216722220
    ptr with data: 3216722223
    data stored in ptr: 3
    Cleansed ptr:  3216722220
    Dereferencing cleansed ptr: 701
    

    We can store any number that can be represented by 2 bits in the pointer. Using put_data(), the last 2 bits of the pointer are set to the data to be stored. This data is read back using get_data(), which masks off all bits except the last 2, thereby revealing our hidden data.

    cleanse_pointer() zeroes out the last 2 bits, making the pointer safe for dereferencing. Note that while some CPUs, like Intel’s, will let us access unaligned memory locations, certain others, like some ARM CPUs, will fault. So, always remember to restore the pointer to its aligned location before dereferencing.
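
    The snippet above only works where a pointer fits in an unsigned int. A sketch of the same trick that does not depend on the pointer width could use uintptr_t from <stdint.h> instead. The names tag_pointer(), tag_of() and pointer_of() are made up for this example and are not from the hide-data-in-ptr repo:

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)3)   /* the 2 low-order bits */

/* Pack a 2-bit value into the low bits of a 4-byte-aligned pointer. */
uintptr_t tag_pointer(void *p, unsigned data)
{
	uintptr_t bits = (uintptr_t)p;
	assert((bits & TAG_MASK) == 0);   /* pointer must be aligned */
	assert(data <= TAG_MASK);         /* data must fit in 2 bits */
	return bits | data;
}

/* Read the hidden 2-bit value back. */
unsigned tag_of(uintptr_t tagged)
{
	return (unsigned)(tagged & TAG_MASK);
}

/* Strip the hidden bits and recover a pointer that is safe to use. */
void *pointer_of(uintptr_t tagged)
{
	return (void *)(tagged & ~TAG_MASK);
}
```

    The first assert in tag_pointer() guards the one assumption the whole trick rests on: that the 2 low bits were zero to begin with.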



    Is this used anywhere in the real world?

    Yes, it is. See the implementation of Red Black Trees in the Linux kernel (link).

    The node of the tree is defined using:

    struct rb_node {
    	unsigned long  __rb_parent_color;
    	struct rb_node *rb_right;
    	struct rb_node *rb_left;
    } __attribute__((aligned(sizeof(long))));
    

    Here unsigned long __rb_parent_color stores:
    1. the address of the parent node
    2. the node’s color.

    The color is represented as 0 for Red and 1 for Black.
    Just like in the earlier example, this data is sneaked into the ‘useless’ bits of the parent pointer.

    Now see how the parent pointer and the color information are accessed:

    /* in rbtree.h */
    #define rb_parent(r)   ((struct rb_node *)((r)->__rb_parent_color & ~3))
    
    /* in rbtree_augmented.h */
    #define __rb_color(pc)     ((pc) & 1)
    #define rb_color(rb)       __rb_color((rb)->__rb_parent_color)
    
  6. 9:02 AM 15th December 2014








  7. Kernels 201 - Let’s write a Kernel with keyboard and screen support


    In my previous article Kernels 101 - Let’s write a Kernel
    I wrote how we can build a rudimentary x86 kernel that boots up using GRUB,
    runs in protected mode and prints a string on the screen.

    Today, we will extend that kernel to include a keyboard driver that can read the characters a-z and 0-9 from the keyboard and print them on screen.

    Source code used for this article is available at my Github repository - mkeykernel

    image

    We communicate with I/O devices using I/O ports. These ports are just specific addresses on the x86’s I/O bus, nothing more. The read/write operations from these ports are accomplished using specific instructions built into the processor.



    Reading from and Writing to ports

    read_port:
    	mov edx, [esp + 4]
    	in al, dx	
    	ret
    
    write_port:
    	mov   edx, [esp + 4]    
    	mov   al, [esp + 4 + 4]  
    	out   dx, al  
    	ret
    

    I/O ports are accessed using the in and out instructions that are part of the x86 instruction set.

    In read_port, the port number is taken as argument. When your function is called from C, the caller pushes all the arguments onto the stack. The argument is copied to the register edx using the stack pointer. The register dx is the lower 16 bits of edx. The in instruction here reads the port whose number is given by dx and puts the result in al. Register al is the lower 8 bits of eax. If you remember your college lessons, function return values are received through the eax register. Thus read_port lets us read I/O ports.

    write_port is very similar. Here we take 2 arguments: port number and the data to be written. The out instruction writes the data to the port.



    Interrupts

    Now, before we go ahead with writing any device driver, we need to understand how the processor gets to know that the device has performed an event.

    The easiest solution is polling - to keep checking the status of the device forever. This, for obvious reasons, is neither efficient nor practical. This is where interrupts come into the picture. An interrupt is a signal sent to the processor by the hardware or software indicating an event. With interrupts, we can avoid polling and act only when the specific interrupt we are interested in is triggered.

    A device or a chip called Programmable Interrupt Controller (PIC) is responsible for x86 being an interrupt driven architecture. It manages hardware interrupts and sends them to the appropriate system interrupt.

    When certain actions are performed on a hardware device, it sends a pulse called Interrupt Request (IRQ) along its specific interrupt line to the PIC chip. The PIC then translates the received IRQ into a system interrupt, and sends a message to interrupt the CPU from whatever it is doing. It is then the kernel’s job to handle these interrupts.

    Without a PIC, we would have to poll all the devices in the system to see if an event has occurred in any of them.

    Let’s take the case of a keyboard. The keyboard works through the I/O ports 0x60 and 0x64. Port 0x60 gives the data (pressed key) and port 0x64 gives the status. However, you have to know exactly when to read these ports.

    Interrupts come quite handy here. When a key is pressed, the keyboard gives a signal to the PIC along its interrupt line IRQ1. The PIC has an offset value stored during initialization of the PIC. It adds the input line number to this offset to form the Interrupt number. Then the processor looks up a certain data structure called the Interrupt Descriptor Table (IDT) to give the interrupt handler address corresponding to the interrupt number.

    Code at this address is then run, which handles the event.

    Setting up the IDT

    struct IDT_entry{
    	unsigned short int offset_lowerbits;
    	unsigned short int selector;
    	unsigned char zero;
    	unsigned char type_attr;
    	unsigned short int offset_higherbits;
    };
    
    struct IDT_entry IDT[IDT_SIZE];
    
    void idt_init(void)
    {
    	unsigned long keyboard_address;
    	unsigned long idt_address;
    	unsigned long idt_ptr[2];
    
    	/* populate IDT entry of keyboard's interrupt */
    	keyboard_address = (unsigned long)keyboard_handler; 
    	IDT[0x21].offset_lowerbits = keyboard_address & 0xffff;
    	IDT[0x21].selector = 0x08; /* KERNEL_CODE_SEGMENT_OFFSET */
    	IDT[0x21].zero = 0;
    	IDT[0x21].type_attr = 0x8e; /* INTERRUPT_GATE */
    	IDT[0x21].offset_higherbits = (keyboard_address & 0xffff0000) >> 16;
    	
    
    	/*     Ports
    	*	 PIC1	PIC2
    	*Command 0x20	0xA0
    	*Data	 0x21	0xA1
    	*/
    
    	/* ICW1 - begin initialization */
    	write_port(0x20 , 0x11);
    	write_port(0xA0 , 0x11);
    
    	/* ICW2 - remap offset address of IDT */
    	/*
    	* In x86 protected mode, we have to remap the PICs beyond 0x20 because
    	* Intel have designated the first 32 interrupts as "reserved" for cpu exceptions
    	*/
    	write_port(0x21 , 0x20);
    	write_port(0xA1 , 0x28);
    
    	/* ICW3 - setup cascading */
    	write_port(0x21 , 0x00);  
    	write_port(0xA1 , 0x00);  
    
    	/* ICW4 - environment info */
    	write_port(0x21 , 0x01);
    	write_port(0xA1 , 0x01);
    	/* Initialization finished */
    
    	/* mask interrupts */
    	write_port(0x21 , 0xff);
    	write_port(0xA1 , 0xff);
    
    	/* fill the IDT descriptor */
    	idt_address = (unsigned long)IDT ;
    	idt_ptr[0] = (sizeof (struct IDT_entry) * IDT_SIZE) + ((idt_address & 0xffff) << 16);
    	idt_ptr[1] = idt_address >> 16 ;
    
    	load_idt(idt_ptr);
    }
    

    We implement IDT as an array comprising structures IDT_entry. We’ll discuss how the keyboard interrupt is mapped to its handler later in the article. First, let’s see how the PICs work.

    Modern x86 systems have 2 PIC chips each having 8 input lines. Let’s call them PIC1 and PIC2. PIC1 receives IRQ0 to IRQ7 and PIC2 receives IRQ8 to IRQ15. PIC1 uses port 0x20 for Command and 0x21 for Data. PIC2 uses port 0xA0 for Command and 0xA1 for Data.

    The PICs are initialized using 8-bit command words known as Initialization command words (ICW). See this link for the exact bit-by-bit syntax of these commands.

    In protected mode, the first command you will need to give the two PICs is the initialize command ICW1 (0x11). This command makes the PIC wait for 3 more initialization words on the data port.

    These words tell each PIC:

    * Its vector offset. (ICW2)
    * How the PICs are wired as master/slave. (ICW3)
    * Additional information about the environment. (ICW4)

    The second initialization command is the ICW2, written to the data ports of each PIC. It sets the PIC’s offset value. This is the value to which we add the input line number to form the Interrupt number.

    PICs allow cascading of their outputs to inputs between each other. This is set up using ICW3, and each bit represents the cascading status for the corresponding IRQ. For now, we won’t use cascading and set all bits to zero.

    ICW4 sets the additional environmental parameters. We will just set the lowermost bit to tell the PICs we are running in the 80x86 mode.

    Tang ta dang !! PICs are now initialized.
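
    The order of the initialization words can be sketched in C as below. This is only an illustration that mirrors the values used in idt_init() above; the callback type port_writer and the recording helper record_port_write are hypothetical names introduced here so that the sequence can be exercised outside a kernel. In the kernel itself, write_port would play the role of the callback.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* One recorded port write: which port, which value. */
    struct port_write {
        uint16_t port;
        uint8_t value;
    };

    /* Hypothetical callback type; in the kernel this would be write_port(). */
    typedef void (*port_writer)(uint16_t port, uint8_t value);

    /* Sends ICW1..ICW4 to both PICs in the same order as idt_init(). */
    void pic_init_sequence(port_writer out, uint8_t offset1, uint8_t offset2)
    {
        out(0x20, 0x11);    /* ICW1 to PIC1 command port */
        out(0xA0, 0x11);    /* ICW1 to PIC2 command port */
        out(0x21, offset1); /* ICW2: vector offset of PIC1 */
        out(0xA1, offset2); /* ICW2: vector offset of PIC2 */
        out(0x21, 0x00);    /* ICW3: all zeroes, as in the article */
        out(0xA1, 0x00);
        out(0x21, 0x01);    /* ICW4: lowest bit set for 80x86 mode */
        out(0xA1, 0x01);
    }

    /* Hypothetical stand-in for the hardware: remembers every write. */
    struct port_write pic_log[16];
    size_t pic_log_len;

    void record_port_write(uint16_t port, uint8_t value)
    {
        if (pic_log_len < sizeof(pic_log) / sizeof(pic_log[0])) {
            pic_log[pic_log_len].port = port;
            pic_log[pic_log_len].value = value;
            pic_log_len++;
        }
    }
    ```

    Calling pic_init_sequence(record_port_write, 0x20, 0x28) leaves eight entries in pic_log, with the two ICW2 writes in the third and fourth positions.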



    Each PIC has an internal 8 bit register named Interrupt Mask Register (IMR). This register stores a bitmap of the IRQ lines going into the PIC. When a bit is set, the PIC ignores the request. This means we can enable and disable the nth IRQ line by making the value of the nth bit in the IMR as 0 and 1 respectively. Reading from the data port returns value in the IMR register, and writing to it sets the register. Here in our code, after initializing the PICs; we set all bits to 1 thereby disabling all IRQ lines. We will later enable the line corresponding to keyboard interrupt. As of now, let’s disable all the interrupts !!
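
    The bit manipulation on the IMR value can be sketched with two small helpers. The function names are hypothetical and are not part of the kernel source; they only compute the byte you would then write to the data port.

    ```c
    #include <stdint.h>

    /* Clear bit n: the PIC stops ignoring IRQ line n (n in 0..7). */
    uint8_t imr_enable_line(uint8_t imr, unsigned int n)
    {
        return (uint8_t)(imr & ~(1u << n));
    }

    /* Set bit n: the PIC ignores requests on IRQ line n (n in 0..7). */
    uint8_t imr_disable_line(uint8_t imr, unsigned int n)
    {
        return (uint8_t)(imr | (1u << n));
    }
    ```

    Starting from 0xff (everything disabled), enabling line 1 gives 0xfd, which is the value written later when the keyboard line is turned on.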

    Now if IRQ lines are enabled, our PICs can receive signals via IRQ lines and convert them to interrupt number by adding with the offset. Now, we need to populate the IDT such that the interrupt number for the keyboard is mapped to the address of the keyboard handler function we will write.

    Which interrupt number should the keyboard handler address be mapped against in the IDT?

    The keyboard uses IRQ1. This is the input line 1 of PIC1. We have initialized PIC1 to an offset 0x20 (see ICW2). To find the interrupt number, add 1 + 0x20, i.e. 0x21. So, the keyboard handler address has to be mapped against interrupt 0x21 in the IDT.
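
    The same arithmetic works for any IRQ line. Here is a minimal sketch, assuming the offsets 0x20 and 0x28 that were written as ICW2 in idt_init(); the helper name irq_to_vector is made up for this example.

    ```c
    #include <stdint.h>

    #define PIC1_OFFSET 0x20 /* ICW2 value given to PIC1 */
    #define PIC2_OFFSET 0x28 /* ICW2 value given to PIC2 */

    /* Maps IRQ0..IRQ15 to the interrupt number looked up in the IDT. */
    uint8_t irq_to_vector(unsigned int irq)
    {
        if (irq < 8)
            return (uint8_t)(PIC1_OFFSET + irq);   /* lines of PIC1 */
        return (uint8_t)(PIC2_OFFSET + (irq - 8)); /* lines of PIC2 */
    }
    ```

    For the keyboard, irq_to_vector(1) evaluates to 0x21.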

    So, the next task is to populate the IDT for the interrupt 0x21.
    We will map this interrupt to a function keyboard_handler which we will write in our assembly file.

    Each IDT entry consists of 64 bits. In the IDT entry for the interrupt, we do not store the entire address of the handler function together. We split it into 2 parts of 16 bits. The lower bits are stored in the first 16 bits of the IDT entry and the higher 16 bits are stored in the last 16 bits of the IDT entry. This is done to maintain compatibility with the 286. You can see Intel pulls shrewd kludges like these in so many places !!

    In the IDT entry, we also have to set the type, which says that this entry is an interrupt gate. We also need to give the kernel code segment offset. GRUB bootloader sets up a GDT for us. Each GDT entry is 8 bytes long, and the kernel code descriptor is the second segment; so its offset is 0x08 (More on this would be too much for this article). Interrupt gate is represented by 0x8e. The remaining 8 bits in the middle have to be filled with all zeroes. In this way, we have filled the IDT entry corresponding to the keyboard’s interrupt.
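
    To see the split and the reassembly side by side, here is a small sketch that can be compiled as an ordinary program. The struct mirrors IDT_entry with fixed-width types; the names idt_entry_sketch, make_interrupt_gate and gate_handler_address are hypothetical and exist only for this illustration.

    ```c
    #include <stdint.h>

    /* Same layout as IDT_entry above, with fixed-width types. */
    struct idt_entry_sketch {
        uint16_t offset_lowerbits;
        uint16_t selector;
        uint8_t zero;
        uint8_t type_attr;
        uint16_t offset_higherbits;
    };

    /* Builds an interrupt-gate entry for a 32-bit handler address. */
    struct idt_entry_sketch make_interrupt_gate(uint32_t handler, uint16_t selector)
    {
        struct idt_entry_sketch e;

        e.offset_lowerbits = (uint16_t)(handler & 0xffff);
        e.selector = selector;
        e.zero = 0;
        e.type_attr = 0x8e; /* interrupt gate */
        e.offset_higherbits = (uint16_t)(handler >> 16);
        return e;
    }

    /* Puts the two halves back together. */
    uint32_t gate_handler_address(struct idt_entry_sketch e)
    {
        return ((uint32_t)e.offset_higherbits << 16) | e.offset_lowerbits;
    }
    ```

    With a handler at 0x00101234 and selector 0x08, the lower half is 0x1234 and the higher half is 0x0010.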

    Once the required mappings are done in the IDT, we have to tell the CPU where the IDT is located.
    This is done via the lidt assembly instruction. lidt takes one operand. The operand must be a pointer to a descriptor structure that describes the IDT.

    The descriptor is quite straightforward. It contains the size of IDT in bytes and its address. I have used an array to pack the values. You may also populate it using a struct.
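
    For comparison, this is roughly what the struct version could look like. It is a sketch under a few assumptions: GCC or Clang (for the packed attribute), a little-endian machine, and 32-bit addresses. The names idt_descriptor_sketch, make_idt_descriptor and pack_idt_words are invented for this example; pack_idt_words repeats the array trick from idt_init() with explicit 32-bit words.

    ```c
    #include <stdint.h>
    #include <string.h>

    /* 6-byte descriptor: 16-bit size field followed by the 32-bit address. */
    struct idt_descriptor_sketch {
        uint16_t limit;
        uint32_t base;
    } __attribute__((packed));

    struct idt_descriptor_sketch make_idt_descriptor(uint32_t idt_address,
                                                     uint16_t size_in_bytes)
    {
        struct idt_descriptor_sketch d;

        d.limit = size_in_bytes;
        d.base = idt_address;
        return d;
    }

    /* The array packing used in idt_init(), written with 32-bit words. */
    void pack_idt_words(uint32_t words[2], uint32_t idt_address,
                        uint16_t size_in_bytes)
    {
        words[0] = (uint32_t)size_in_bytes + ((idt_address & 0xffff) << 16);
        words[1] = idt_address >> 16;
    }
    ```

    On a little-endian machine the first 6 bytes produced by both approaches are identical, so either can be handed to load_idt().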

    We have the pointer in the variable idt_ptr and then pass it on to lidt using the function load_idt().

    load_idt:
    	mov edx, [esp + 4]
    	lidt [edx]
    	sti
    	ret
    

    Additionally, the load_idt() function turns interrupts on using the sti instruction.

    Once the IDT is set up and loaded, we can turn on the keyboard’s IRQ line using the interrupt mask we discussed earlier.

    void kb_init(void)
    {
    	/* 0xFD is 11111101 - enables only IRQ1 (keyboard)*/
    	write_port(0x21 , 0xFD);
    }
    



    Keyboard interrupt handling function

    Well, now we have successfully mapped keyboard interrupts to the function keyboard_handler via IDT entry for interrupt 0x21.
    So, every time you press a key on your keyboard you can be sure this function is called.

    keyboard_handler:                 
    	call    keyboard_handler_main
    	iretd
    

    This function just calls another function written in C and returns using the iret class of instructions. We could have written our entire interrupt handling process here, however it’s much easier to write code in C than in assembly - so we take it there.
    iret/iretd should be used instead of ret when returning control from an interrupt handler to a program that was interrupted by an interrupt. This class of instructions pops the flags register that was pushed onto the stack when the interrupt call was made.

    void keyboard_handler_main(void) {
    	unsigned char status;
    	char keycode;
    
    	/* write EOI */
    	write_port(0x20, 0x20);
    
    	status = read_port(KEYBOARD_STATUS_PORT);
    	/* Lowest bit of status will be set if buffer is not empty */
    	if (status & 0x01) {
    		keycode = read_port(KEYBOARD_DATA_PORT);
    		if(keycode < 0)
    			return;
    		vidptr[current_loc++] = keyboard_map[keycode];
    		vidptr[current_loc++] = 0x07;	
    	}
    }
    

    We first signal EOI (End Of Interrupt acknowledgment) by writing it to the PIC’s command port. Only after this will the PIC allow further interrupt requests. We have to read 2 ports here - the data port 0x60 and the command/status port 0x64.

    We first read port 0x64 to get the status. If the lowest bit of the status is 0, it means the buffer is empty and there is no data to read. In other cases, we can read the data port 0x60. This port will give us a keycode of the key pressed. Each keycode corresponds to each key on the keyboard. We use a simple character array defined in the file keyboard_map.h to map the keycode to the corresponding character. This character is then printed on to the screen using the same technique we used in the previous article.

    In this article, for the sake of brevity, I am only handling lowercase a-z and digits 0-9. You can easily extend this to include special characters, ALT, SHIFT and CAPS LOCK. You can tell whether a key was pressed or released from the keycode read from the data port (a release carries the highest bit set, which is why the handler above ignores negative keycodes) and perform the desired action. You can also map any combination of keys to special functions such as shutdown.
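
    As a starting point for such an extension, here is a hedged sketch of how press and release could be told apart. It assumes the keyboard controller delivers "scancode set 1" bytes, in which a release is the press code with bit 7 set; that is consistent with the handler above skipping negative keycode values. The names shift_state, track_shift and the helpers are hypothetical and not part of mkeykernel.

    ```c
    #include <stdint.h>

    /* Assumption: scancode set 1, release = press code with bit 7 set. */
    #define KEY_RELEASE_BIT 0x80
    #define SCANCODE_LEFT_SHIFT 0x2a

    int is_key_release(uint8_t scancode)
    {
        return (scancode & KEY_RELEASE_BIT) != 0;
    }

    /* Strips the release bit so press and release share one key code. */
    uint8_t key_code_of(uint8_t scancode)
    {
        return (uint8_t)(scancode & 0x7f);
    }

    /* Hypothetical state for one modifier key. */
    struct shift_state {
        int held;
    };

    /* Remembers whether left SHIFT is currently held down. */
    void track_shift(struct shift_state *s, uint8_t scancode)
    {
        if (key_code_of(scancode) == SCANCODE_LEFT_SHIFT)
            s->held = !is_key_release(scancode);
    }
    ```

    A handler could call track_shift() on every byte it reads and pick an uppercase map while held is non-zero.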

    You can build the kernel, run it on a real machine or an emulator (QEMU) exactly the same way as in the earlier article (its repo).

    Start typing !!

    kernel running with keyboard support


    References and Thanks

    1. wiki.osdev.org
    2. osdever.net
  8. 10:26 PM 6th October 2014








  9. Kernels 101 – Let’s write a Kernel


    Hello World,

    Let us write a simple kernel which could be loaded with the GRUB bootloader on an x86 system. This kernel will display a message on the screen and then hang.

    One does simply write a kernel



    How does an x86 machine boot

    Before we think about writing a kernel, let’s see how the machine boots up and transfers control to the kernel:

    Most registers of the x86 CPU have well defined values after power-on. The Instruction Pointer (EIP) register holds the memory address for the instruction being executed by the processor. EIP is hardcoded to the value 0xFFFFFFF0. Thus, the x86 CPU is hardwired to begin execution at the physical address 0xFFFFFFF0. It is in fact, the last 16 bytes of the 32-bit address space. This memory address is called reset vector.

    Now, the chipset’s memory map makes sure that 0xFFFFFFF0 is mapped to a certain part of the BIOS, not to the RAM. Meanwhile, the BIOS copies itself to the RAM for faster access. This is called shadowing. The address 0xFFFFFFF0 will contain just a jump instruction to the address in memory where BIOS has copied itself.

    Thus, the BIOS code starts its execution. BIOS first searches for a bootable device in the configured boot device order. It checks for a certain magic number to determine whether the device is bootable: bytes 511 and 512 of the first sector must hold the value 0xAA55.
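
    The signature check is easy to express in C. This is only an illustration of the test, not BIOS code; the function name has_boot_signature is made up. Since x86 is little-endian, the 16-bit value 0xAA55 is stored as the byte 0x55 followed by 0xAA, at zero-based offsets 510 and 511.

    ```c
    #include <stdint.h>

    #define SECTOR_SIZE 512

    /* Returns non-zero if the sector ends with the boot signature 0xAA55. */
    int has_boot_signature(const uint8_t sector[SECTOR_SIZE])
    {
        return sector[510] == 0x55 && sector[511] == 0xAA;
    }
    ```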

    Once the BIOS has found a bootable device, it copies the contents of the device’s first sector into RAM starting from physical address 0x7c00; and then jumps into the address and executes the code just loaded. This code is called the bootloader.

    The bootloader then loads the kernel at the physical address 0x100000. The address 0x100000 is used as the start-address for all big kernels on x86 machines.

    All x86 processors begin in a simplistic 16-bit mode called real mode. The GRUB bootloader makes the switch to 32-bit protected mode by setting the lowest bit of CR0 register to 1. Thus the kernel loads in 32-bit protected mode.

    Do note that in case of linux kernel, GRUB detects linux boot protocol and loads linux kernel in real mode. Linux kernel itself makes the switch to protected mode.



    What all do we need?

    * An x86 computer (of course)
    * Linux
    * NASM assembler
    * gcc
    * ld (GNU Linker)
    * grub



    Source Code

    Source code is available at my Github repository - mkernel



    The entry point using assembly

    We like to write everything in C, but we cannot avoid a little bit of assembly. We will write a small file in x86 assembly-language that serves as the starting point for our kernel. All our assembly file will do is invoke an external function which we will write in C, and then halt the program flow.

    How do we make sure that this assembly code will serve as the starting point of the kernel?

    We will use a linker script that links the object files to produce the final kernel executable. (more explained later)  In this linker script, we will explicitly specify that we want our binary to be loaded at the address 0x100000. This address, as I have said earlier, is where the kernel is expected to be. Thus, the bootloader will take care of firing the kernel’s entry point.

    Here’s the assembly code:

    ;;kernel.asm
    bits 32			;nasm directive - 32 bit
    section .text
    
    global start
    extern kmain	        ;kmain is defined in the c file
    
    start:
      cli 			;block interrupts
      mov esp, stack_space	;set stack pointer
      call kmain
      hlt		 	;halt the CPU
    
    section .bss
    resb 8192		;8KB for stack
    stack_space:
    

    The first line, bits 32, is not an x86 assembly instruction. It’s a directive to the NASM assembler that specifies it should generate code to run on a processor operating in 32 bit mode. It is not strictly required in our example, but it is included here as it’s good practice to be explicit.

    The second line begins the text section (aka code section). This is where we put all our code.

    global is another NASM directive to set symbols from source code as global. By doing so, the linker knows where the symbol start is; which happens to be our entry point.

    kmain is our function that will be defined in our kernel.c file. extern declares that the function is defined elsewhere.

    Then, we have the start function, which calls the kmain function and halts the CPU using the hlt instruction. Interrupts can awake the CPU from an hlt instruction. So we disable interrupts beforehand using cli instruction. cli is short for clear-interrupts.

    We should ideally set aside some memory for the stack and point the stack pointer (esp) to it. However, it seems like GRUB does this for us and the stack pointer is already set at this point. But, just to be sure, we will allocate some space in the BSS section and point the stack pointer to it. We use the resb pseudo-instruction, which reserves the given number of bytes. The label that follows it points to the end of the reserved memory, which is where the stack should start because the x86 stack grows downward. Just before kmain is called, the stack pointer (esp) is made to point to this label using the mov instruction.

     

    The kernel in C

    In kernel.asm, we made a call to the function kmain(). So our C code will start executing at kmain():

    /*
    *  kernel.c
    */
    void kmain(void)
    {
    	const char *str = "my first kernel";
    	char *vidptr = (char*)0xb8000; 	//video mem begins here.
    	unsigned int i = 0;
    	unsigned int j = 0;
    
    	/* this loop clears the screen
    	* there are 25 lines each of 80 columns; each element takes 2 bytes */
    	while(j < 80 * 25 * 2) {
    		/* blank character */
    		vidptr[j] = ' ';
    		/* attribute-byte - light grey on black screen */
    		vidptr[j+1] = 0x07; 		
    		j = j + 2;
    	}
    
    	j = 0;
    
    	/* this loop writes the string to video memory */
    	while(str[j] != '\0') {
    		/* the character's ascii */
    		vidptr[i] = str[j];
    		/* attribute-byte: give character black bg and light grey fg */
    		vidptr[i+1] = 0x07;
    		++j;
    		i = i + 2;
    	}
    	return;
    }
    

    All our kernel will do is clear the screen and write to it the string “my first kernel”.

    First we make a pointer vidptr that points to the address 0xb8000. This address is the start of video memory in protected mode. The screen’s text memory is simply a chunk of memory in our address space. The memory mapped input/output for the screen starts at 0xb8000 and supports 25 lines, each of which contains 80 ASCII characters.

    Each character element in this text memory is represented by 16 bits (2 bytes), rather than 8 bits (1 byte) which we are used to.  The first byte should have the representation of the character as in ASCII. The second byte is the attribute-byte. This describes the formatting of the character including attributes such as color.

    To print the character s in green color on black background, we will store the character s in the first byte of the video memory address and the value 0x02 in the second byte.
    0 represents black background and 2 represents green foreground.


    Have a look at table below for different colors:

    0 - Black, 1 - Blue, 2 - Green, 3 - Cyan, 4 - Red, 5 - Magenta, 6 - Brown, 7 - Light Grey, 8 - Dark Grey, 9 - Light Blue, 10/a - Light Green, 11/b - Light Cyan, 12/c - Light Red, 13/d - Light Magenta, 14/e - Light Brown, 15/f – White.



    In our kernel, we will use light grey character on a black background. So our attribute-byte must have the value 0x07.
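
    The attribute byte and the position of a character cell can both be computed with a line of arithmetic. The helpers below are a sketch with made-up names (vga_attribute, vga_cell_offset); they follow the layout described above, with the background colour in the upper four bits, the foreground colour in the lower four, and 2 bytes per cell on an 80-column screen.

    ```c
    #include <stdint.h>

    #define VGA_COLUMNS 80

    /* Combines the colour numbers from the table into one attribute byte. */
    uint8_t vga_attribute(uint8_t fg, uint8_t bg)
    {
        return (uint8_t)((bg << 4) | (fg & 0x0f));
    }

    /* Byte offset of the cell at (row, col) from the start of video memory. */
    unsigned int vga_cell_offset(unsigned int row, unsigned int col)
    {
        return (row * VGA_COLUMNS + col) * 2;
    }
    ```

    Green on black gives vga_attribute(2, 0) == 0x02 and light grey on black gives 0x07; the last cell of the screen, row 24 and column 79, sits at offset 3998.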

    In the first while loop, the program writes the blank character with 0x07 attribute all over the 80 columns of the 25 lines. This thus clears the screen.

    In the second while loop, characters of the null terminated string “my first kernel” are written to the chunk of video memory with each character holding an attribute-byte of 0x07.

    This should display the string on the screen.



    The linking part

    We will assemble kernel.asm with NASM to an object file; and then using GCC we will compile kernel.c to another object file. Now, our job is to get these objects linked to an executable bootable kernel.

    For that, we use an explicit linker script, which can be passed as an argument to ld (our linker).

    /*
    *  link.ld
    */
    OUTPUT_FORMAT(elf32-i386)
    ENTRY(start)
    SECTIONS
     {
       . = 0x100000;
       .text : { *(.text) }
       .data : { *(.data) }
       .bss  : { *(.bss)  }
     }
    

    First, we set the output format of our output executable to be 32 bit Executable and Linkable Format (ELF). ELF is the standard binary file format for Unix-like systems on x86 architecture.

    ENTRY takes one argument. It specifies the symbol name that should be the entry point of our executable.

    SECTIONS is the most important part for us. Here, we define the layout of our executable. We could specify how the different sections are to be merged and at what location each of these is to be placed.

    Within the braces that follow the SECTIONS statement, the period character (.) represents the location counter.
    The location counter is always initialized to 0x0 at beginning of the SECTIONS block. It can be modified by assigning a new value to it.

    Remember, earlier I told you that kernel’s code should start at the address 0x100000. So, we set the location counter to 0x100000.

    Have a look at the next line .text : { *(.text) }

    The asterisk (*) is a wildcard character that matches any file name. The expression *(.text) thus means all .text input sections from all input files.

    So, the linker merges all text sections of the object files to the executable’s text section, at the address stored in the location counter. Thus, the code section of our executable begins at 0x100000.

    After the linker places the text output section, the value of the location counter will become
    0x100000 + the size of the text output section.

    Similarly, the data and bss sections are merged and placed at the then values of location-counter.



    Grub and Multiboot

    Now, we have all our files ready to build the kernel. But, since we like to boot our kernel with the GRUB bootloader, there is one step left.

    There is a standard for loading various x86 kernels using a boot loader, called the Multiboot specification.

    GRUB will only load our kernel if it complies with the Multiboot spec.

    According to the spec, the kernel must contain a header (known as Multiboot header) within its first 8 KiloBytes.

    Further, this Multiboot header must contain 3 fields that are 4 byte aligned, namely:

    • a magic field: containing the magic number 0x1BADB002, to identify the header.
    • a flags field: We will not care about this field. We will simply set it to zero.
    • a checksum field: the checksum field when added to the fields ‘magic’ and ‘flags’ must give zero.

    So our kernel.asm will become:

    ;;kernel.asm
    
    ;nasm directive - 32 bit
    bits 32
    section .text
            ;multiboot spec
            align 4
            dd 0x1BADB002            ;magic
            dd 0x00                  ;flags
            dd - (0x1BADB002 + 0x00) ;checksum. m+f+c should be zero
    
    global start
    extern kmain	        ;kmain is defined in the c file
    
    start:
      cli 			;block interrupts
      mov esp, stack_space	;set stack pointer
      call kmain
      hlt		 	;halt the CPU
    
    section .bss
    resb 8192		;8KB for stack
    stack_space:
    

    The dd defines a double word of size 4 bytes.
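
    The checksum rule can be verified with ordinary unsigned 32-bit arithmetic, which wraps around exactly as the assembler expression does. The function names below are hypothetical and serve only as a sketch of the rule that magic, flags and checksum must add up to zero.

    ```c
    #include <stdint.h>

    #define MULTIBOOT_MAGIC 0x1BADB002u

    /* Value that makes magic + flags + checksum wrap around to zero. */
    uint32_t multiboot_checksum(uint32_t flags)
    {
        return (uint32_t)(0u - (MULTIBOOT_MAGIC + flags));
    }

    /* Checks the two conditions a loader would look for in the header. */
    int multiboot_header_ok(uint32_t magic, uint32_t flags, uint32_t checksum)
    {
        return magic == MULTIBOOT_MAGIC &&
               (uint32_t)(magic + flags + checksum) == 0;
    }
    ```

    With flags set to zero, as in our kernel.asm, the checksum works out to 0xE4524FFE.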

    Building the kernel

    We will now create object files from kernel.asm and kernel.c and then link them using our linker script.

    nasm -f elf32 kernel.asm -o kasm.o
    

    will run the assembler to create the object file kasm.o in ELF-32 bit format.

    gcc -m32 -c kernel.c -o kc.o
    

    The -c option makes sure that after compiling, linking doesn’t implicitly happen.

    ld -m elf_i386 -T link.ld -o kernel kasm.o kc.o
    

    will run the linker with our linker script and generate the executable named kernel.



    Configure your grub and run your kernel

    GRUB requires your kernel to be of the name pattern kernel-<version>. So, rename the kernel. I renamed my kernel executable to kernel-701.

    Now place it in the /boot directory. You will require superuser privileges to do so.

    In your GRUB configuration file grub.cfg you should add an entry, something like:

    title myKernel
    	root (hd0,0)
    	kernel /boot/kernel-701 ro
    



    Don’t forget to remove the directive hiddenmenu if it exists.

    Reboot your computer, and you’ll get a list selection with the name of your kernel listed.

    Select it and you should see:

    image

    That’s your kernel!!



    PS:

    * It’s always advisable to get yourself a virtual machine for all kinds of kernel hacking.
    * To run this on grub2, which is the default bootloader for newer distros, your config should look like this:
    menuentry 'kernel 701' {
    	set root='hd0,msdos1'
    	multiboot /boot/kernel-701 ro
    }
    



    * Also, if you want to run the kernel on the qemu emulator instead of booting with GRUB, you can do so by:
    qemu-system-i386 -kernel kernel

    Also, see the next article in the Kernel series:
    Kernels 201 - Let’s write a Kernel with keyboard and screen support



    References and Thanks

    1. wiki.osdev.org
    2. osdever.net
    3. Multiboot spec
    4. Thanks to Rubén Laguna from comments for grub2 config
  10. 3:12 PM 14th April 2014








  11. Character literal is not a character in C !


    I was once writing a program similar to the following. But I made a little mistake which led me to something interesting.

    #include <stdio.h>
    int main()
    {
      char ch[256];
      scanf("%s", ch);
      if (ch == 'a') {
        printf("Your sentence begins with %c.\n", *ch);
      }
      return 0;
    }
    

    In this code, I am supposedly reading a string from stdin, checking if it begins with the character literal 'a' and if it does, print something.

    However, if you watch closely enough I missed a dereference operator (*) on line 6.
    It should have been if (*ch == 'a') {

    Not noticing the bug, I went on to compile the code.

    GCC threw the following warning:
    warning: comparison between pointer and integer

    $ gcc -o test test.c
    test.c:6:10: warning: comparison between pointer and integer ('char *' and 'int')
      if (ch == 'a') {
          ~~ ^  ~~~
    1 warning generated.
    

    Now I knew where to fix, but the warning caught my attention. It says I’m trying to compare a pointer and an integer. But where does the integer come from?

    I’m not using an integer anywhere in the code except for the return value of main(). In the line that throws the warning, I’m comparing a pointer (ch) to a character literal ('a').

    A little bit of investigation and I found out why.

    Before I tell you why, let me try to find the size of the character literal and compare it to that of an int and char.

    #include <stdio.h>
    int main()
    {
      printf("sizeof(char) %zu\n", sizeof(char));
      printf("sizeof(int) %zu\n", sizeof(int));
      printf("sizeof('a') %zu\n", sizeof('a'));
      return 0;
    }
    

    Here’s the output:

    sizeof(char) 1
    sizeof(int) 4
    sizeof('a') 4
    

    As you can see, the character literal doesn’t occupy the size of a char, it takes the size of an int.

    Here’s what the C standard says (§6.4.4.4):

        883 An integer character constant has type int.
        
         ...
        
        886 If an integer character constant contains a single
        character or escape sequence, its value is the one that 
        results when an object with type char whose value is that of 
        the single character or escape sequence is converted to 
        type int.
    



    Yes. In C, a character literal is an int not a char. (char is the smallest integer datatype)
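
    If your compiler supports C11, the type can also be inspected directly with a _Generic selection instead of inferring it from sizeof. This is a small sketch; the macro TYPE_NAME and the two functions are names invented for this example.

    ```c
    #include <string.h>

    /* Picks a string according to the type of the expression x (C11). */
    #define TYPE_NAME(x) _Generic((x), char: "char", int: "int", default: "other")

    /* Type of the character literal itself. */
    const char *type_of_char_literal(void)
    {
        return TYPE_NAME('a');
    }

    /* Type of a char object that was initialized from the literal. */
    const char *type_of_char_object(void)
    {
        char c = 'a';

        return TYPE_NAME(c);
    }
    ```

    Compiled as C, the literal selects the int branch while the variable selects the char branch.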

    PS:
    The same is not true for C++. See the output below.

    $ gcc -o size size.c && ./size
    sizeof(char) 1
    sizeof(int) 4
    sizeof('a') 4
    $ g++ -o size size.c && ./size
    sizeof(char) 1
    sizeof(int) 4
    sizeof('a') 1
    
  12. 1:31 AM 5th March 2014








  13. Simple php wrapper for OAuth requests


    Hi, I wrote a simple PHP wrapper for making OAuth requests to a URL.

    Here’s the code on github.

    Include the library file in your code first:

    require('oauthRequest.php');
    

    Now, you can use the oauthRequest method which returns the response text:

    $output = oauthRequest(<url>);
    
  14. 5:36 AM 24th February 2014








  15. Some jQuery bugs


    jQuery is probably the best thing to have happened to an otherwise lousy language, JavaScript. Here are just a couple of things that I feel don’t fare well with jQuery.

    XSS !!!

    Suppose you want to create an HTML element on the fly, you could use the $() method and pass to it a string that looks like HTML.
    If the string matches the regex for an HTML tag, then the string is internally passed to the $.parseHTML() method.

    So, I could create a div like:

    var $mainDiv = $("<div class='main-div'></div>");
    

    What if a script is passed as an event attribute:

    $("<img src=x onerror=alert(/hacked/)></img>");

    Here’s what happens:

    image

    The onerror event was fired just when the node was created. I could pass any malicious script here instead of the alert, which would then be run immediately. Busted!!

    What if I give a valid URL for the src attribute?
    A GET request is immediately sent to the URL.

    What if you are logged in, and given the src path meets the cookies’ restrictions,
    you are sending off your cookies as well.
    Busted again!!

     

    The .data() method:

    Now, I am going to attach some data to my body element:

    $("body").data({"my-fav": 7});
    

    Let me try see if the data’s set.

    $("body").data("my-fav");
    >>7
    

    Good.

    Well, 7 is no more my favorite number, I should change it to 5.
    Here, I go:

    $("body").data("my-fav", 5);
    

    Now let me check if it’s there:

    $("body").data("my-fav");
    >>7
    

    Oops !! It hasn’t changed.

    Let me have a look at all the data the node has:

    $("body").data();
     >>Object {my-fav: 7, myFav: 5}
    

    If I remove the hyphen and camelCase the key:

    $("body").data("myFav");
    >>5
    

    But "myFav" isn’t what I asked for !!

  16. 7:39 AM 16th February 2014








  17. Simple and free file storage for your website using Dropbox and Google App Engine


    Hi folks,

    I wanted to host some files on this website, which I host on tumblr.
    So, I decided to put them on Dropbox and write a simple Python app hosted on Google App Engine that serves them.
    Finally, I pointed one of my subdomains to the Python app - so I can host stuff in my domain like files.arjunsreedharan.org/test

    I have put the code on Github: arjun024/pystorage

    To use it for your website, all you need to do is specify your Dropbox user-id and the name of the folder in which you wish to store your content.
    Read on.

    What services do we use here

    image

    How we build our environment

    Dropbox

    • Create a dropbox account.
    • Create a folder inside your “Public” directory.
    • Create a test file inside that folder, Right-click and view its public url and find out your user-id.
      (public url will be of the syntax: https://dl.dropboxusercontent.com/u/<USERID>/<YOURFOLDER>/testfile)
    • Input user-id and folder-name as variable values in index.py.
    DROPBOX_USERID = "<USERID>"
    DROPBOX_FOLDER = "<YOURFOLDER>"
    

    Google App Engine

    • Create a Google App Engine account.
    • Register a unique app-id.
    • Now your application will run at <app-id>.appspot.com
    • Download Python and Google App Engine SDK
    • Download source files of pystorage project from github. (here)
    • Select “src” folder as the project folder in the GAE SDK.
    • Deploy it to App Engine.

    Example

    http://py-storage.appspot.com/test

    This serves the file ‘test’ that is located in the specified folder of my Dropbox’s Public directory.

    Access the files under your own domain

    • Let’s say you want to access the test file as <files.yourwebsite.com>/test
    • Login to your App Engine, set custom domain for your app as <files.yourwebsite.com>
    • In your website’s DNS settings: point CNAME record for <files.yourwebsite.com> to ghs.googlehosted.com
    • You’re done.
  18. 1:38 PM 15th January 2014








  19. How to find the size of an array in C without sizeof (aka The difference between arr and &arr)


    Hey folks, Long time no C.

    Generally in C, this is how we find the length of an array arr:

    n = sizeof(arr) / sizeof(arr[0]);
    


    Here we take the size of the array in bytes; then divide it by the size of an individual array element.


    What if I told you that we can get rid of sizeof and calculate the length in a cooler way? Like this:

    n = (&arr)[1] - arr;
    

    Alright, let’s see how that could be correct!

    Have you ever wondered what the difference between arr and &arr is?


    Well let’s check that out by printing the memory addresses of both:

    int arr[5] = {1, 2, 3, 4, 5};
    printf("Address of arr  is %p\n", (void*)arr);
    printf("Address of &arr is %p\n", (void*)&arr);
    

    and here’s the output:

    $ gcc -o a size.c
    $ ./a
    Address of arr  is 0x7fff57266870
    Address of &arr is 0x7fff57266870
    

    As you can see in the output, both arr and &arr point to the exact same memory location 0x7fff57266870.

    Now, let’s increment both the pointers by 1 and check their memory address.


    Here’s the code to check for the memory address of arr + 1 and &arr + 1 :

    int arr[5] = {1, 2, 3, 4, 5};
    printf("Address of arr      is %p\n", (void*)arr);
    printf("Address of &arr     is %p\n", (void*)&arr);
    printf("Address of arr + 1  is %p\n", (void*)(arr + 1));
    printf("Address of &arr + 1 is %p\n", (void*)(&arr + 1));
    

    and the output:

    $ gcc -o a size.c
    $ ./a
    Address of arr      is 0x7fff57266870
    Address of &arr     is 0x7fff57266870
    Address of arr + 1  is 0x7fff57266874
    Address of &arr + 1 is 0x7fff57266884
    

    We find that:

    (arr + 1) points to 874 which is 4 bytes away from arr, which points to 870 (I have removed the higher order bits of the address for brevity). An int on my machine takes up 4 bytes, so (arr + 1) points to the second element of the array.

    (&arr + 1) points to 884 which is 20 bytes away from arr (points to 870).
    (884 - 870 = 14 in hex = 20 in decimal)

    Taking the size of int into consideration, (&arr + 1) is 5 int-sizes away from the beginning of the array, and 5 is exactly the number of elements in the array. So, (&arr + 1) points to the memory address just past the end of the array.

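    Why are (arr + 1) and (&arr + 1) different even though arr and &arr point to the same location?
    The answer: while arr and &arr have the same value, they are of different types.
    In an expression, arr decays to the type int *, whereas &arr is of the type int (*)[5], a pointer to an array of 5 ints. Adding 1 to a pointer advances it by the size of the type it points to.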


    So, &arr points to the entire array, whereas arr points to the first element of the array.

    image


    This brings us to something useful - length of the array.


    *(&arr + 1) is itself an array, so it decays to an int * holding the address just past the end of the array, while arr gives the address of the first element.
    Pointer subtraction counts elements, not bytes, so subtracting the latter from the former gives the length of the array.

    n = *(&arr + 1) - arr;
    


    We can simplify this using array indexing (since x[1] is the same as *(x + 1)), and here you go:

    n = (&arr)[1] - arr;
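
To see the two approaches side by side, here is a small sketch. The macro and function names (LEN_SIZEOF, LEN_TRICK, demo_int_len, demo_double_len) are made up for illustration. One caveat worth stating: the C standard's rules for pointer arithmetic say a one-past-the-end pointer shall not be the operand of an evaluated unary *, so read strictly, the trick is not guaranteed behaviour even though it produces the expected result on common compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Length via sizeof: total bytes divided by bytes per element. */
#define LEN_SIZEOF(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Length via the pointer trick from the text. Only meaningful when `a`
 * is a real array, not a pointer. Illustrative only, see the caveat above.
 */
#define LEN_TRICK(a) ((size_t)((&(a))[1] - (a)))

size_t demo_int_len(void)
{
	int arr[5] = {1, 2, 3, 4, 5};

	/* Both approaches should agree on the element count. */
	assert(LEN_TRICK(arr) == LEN_SIZEOF(arr));
	return LEN_TRICK(arr);
}

size_t demo_double_len(void)
{
	/* The element type does not matter: subtraction counts elements. */
	double vals[3] = {0.5, 1.5, 2.5};

	return LEN_TRICK(vals);
}
```

Because pointer subtraction is scaled by the element size, the same macro yields the element count for both the int array and the double array.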
    

    Pretty cool, right? But please don’t actually use this in real code unless there’s a compelling reason. sizeof is an operator evaluated at compile time (except for C99 variable-length arrays), so it’s also pretty cool, even though it doesn’t look it!
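
As a rough illustration of what "evaluated at compile time" buys you, the sketch below uses the sizeof-based length in places where only a constant expression is allowed. It assumes a C11 compiler (for static_assert from <assert.h>); ARRAY_LEN, primes, squares and fill_squares are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static const int primes[] = {2, 3, 5, 7, 11};

/* Checked by the compiler; nothing is executed at run time for this. */
static_assert(ARRAY_LEN(primes) == 5, "primes should hold 5 elements");

/* A constant expression can also size another array at file scope. */
static int squares[ARRAY_LEN(primes)];

/* Fills squares[] from primes[] and returns how many slots it has. */
size_t fill_squares(void)
{
	for (size_t i = 0; i < ARRAY_LEN(primes); i++)
		squares[i] = primes[i] * primes[i];
	return ARRAY_LEN(squares);
}
```

The pointer-subtraction trick could not be used for the static_assert or for the size of squares, since it is not an integer constant expression.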


    PS:
    This works only for arrays, not when you take pointers (as in char *str for strings).

    void reverse_str(char *str)
    { 
      //wrong
      int strlength = (&str)[1] - str;
    }
    

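    In this case, &str is a pointer to the pointer variable str (a char **), so (&str)[1] is whatever happens to sit in memory right after str, not the end of the string. Remember: in C, arrays are not pointers.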
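
For completeness, here is one way the function above could get the length it needs. This is only a sketch, assuming str is a NUL-terminated, writable string; it uses strlen() from <string.h> instead of any array trick, since the array size is not recoverable from a char *.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reverses a NUL-terminated string in place. */
void reverse_str(char *str)
{
	assert(str != NULL);

	/* strlen() walks the string until '\0'; no array type is needed. */
	size_t len = strlen(str);

	for (size_t i = 0; i < len / 2; i++) {
		char tmp = str[i];
		str[i] = str[len - 1 - i];
		str[len - 1 - i] = tmp;
	}
}
```

Note that the argument must be a modifiable buffer such as char s[] = "hello"; passing a string literal directly would attempt to write to read-only storage.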



    Addendum:

    Here’s an interesting question and answer I found on Stack Overflow regarding the same topic.

    Q: Accessing the first address after an array seems to be undefined behavior. For example, if your array is located at the end of an address space, computing that address causes an overflow and the resulting size could be anything. How, then, can you access (&arr)[1]?

    A: C doesn’t allow access to memory beyond the end of the array. It does, however, allow a pointer to point at one element beyond the end of the array. The distinction is important.
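
The distinction in that answer shows up in a very common idiom: using a one-past-the-end pointer purely as a stopping point, never reading through it. A minimal sketch (sum_ints and sum_demo are names invented for this example):

```c
#include <assert.h>
#include <stddef.h>

/* Sums the ints in the half-open range [first, last). */
int sum_ints(const int *first, const int *last)
{
	assert(first != NULL && last != NULL);

	int total = 0;

	/* `last` is only compared against, never dereferenced. */
	for (const int *p = first; p != last; p++)
		total += *p;
	return total;
}

int sum_demo(void)
{
	int arr[5] = {1, 2, 3, 4, 5};

	/* arr + 5 points one element past the end, which C permits. */
	return sum_ints(arr, arr + 5);
}
```

Forming arr + 5 and comparing against it is fine; reading *(arr + 5) would not be.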

  20. 4:28 PM 7th December 2013








    Disclaimer: The views expressed here are solely those of the author in his private capacity and do not in any way represent the views of the author's employer or any organization associated with the author.










































































Simplicity is the ultimate sophistication.
©
Arjun Sreedharan 2013