Tuesday, 25 December 2012

What's that /boot/initramfs file?



Have you ever wondered how the kernel mounts the root "/" filesystem? When I was learning Linux and Unix a few years ago, this question used to haunt me. Why? In order to mount a filesystem, the kernel needs to have that filesystem's module loaded into memory. You can achieve this in two ways: either you compile the filesystem code into the kernel, i.e. statically link it, or you load the module dynamically when needed. And that's where the problem arises. How is it possible for the Linux kernel to load kernel modules even before mounting the root filesystem? (If you look at dmesg, the "mounting root filesystem" message appears towards the end.) FYI, kernel modules are usually stored in the /lib/modules/`uname -r`/ directory.
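
For instance, on most distributions the filesystem drivers sit under the kernel/fs subdirectory of that modules tree; a quick way to look at them (the exact layout may vary slightly between distributions):

ls /lib/modules/`uname -r`/kernel/fs/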


To solve this problem, Linux uses an initial RAM filesystem, or initramfs. The GRUB boot loader takes an "initrd" argument along with the location of the initramfs file. GRUB has enough knowledge of filesystems to find the kernel, initramfs and config files on the /boot filesystem, but it does not yet support LVM or software RAID. That's why you don't get to see a /boot partition on LVM or software RAID.


Anyway, so GRUB takes the initramfs file, uncompresses it, and lays it out in memory. The initramfs contains enough modules and binaries to load the root filesystem modules. When control passes to the kernel, it sees the initramfs as its root filesystem. It then executes the /init script (yes, that's not a typo) on the initramfs, which loads the modules required for mounting the real root filesystem, i.e. modules for LVM/software RAID and the filesystem itself. Once the root filesystem is mounted, the script switches the root over to it and control is passed on to the /sbin/init program. From then onwards, you all know what happens.


Are you not excited to see the contents of the initramfs? Yes, you can see them. Just execute the following commands:

mkdir /tmp/init
cp /boot/initramfs-`uname -r`.img /tmp/init/initramfs.gz
gunzip /tmp/init/initramfs.gz
cd /tmp/init
cpio -idmv </tmp/init/initramfs

And explore the miniature root filesystem!! :)
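
By the way, if your distribution builds its initramfs with dracut, you can also peek inside without extracting anything, since dracut ships an lsinitrd helper:

lsinitrd /boot/initramfs-`uname -r`.img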


Wednesday, 5 December 2012

How to create external Journal device in Linux with EXT4

Those of you who have worked on AIX must be smiling and thinking, "Is this really something to write about on a blog?". Well, that's fair for AIX, where an external journal device is an old story. On the other hand, I have seen very few (or close to no) external journal devices in use on production Linux systems.

For people who do not know what a journaling filesystem is, this should be comprehensive enough: the whole idea of a journaling filesystem is to keep a write-ahead log of all filesystem changes. In this way, creating/deleting/modifying a file becomes a transaction. Remember atomicity of transactions in DBMS? The all-or-nothing law? The same idea works here. So if something goes wrong, the filesystem can always roll back to the last known good state. However, all awesome things come with a cost. Here the cost you pay is filesystem write performance: any time you want to write data to a file, the filesystem writes to the journal first and then writes the actual data.

Here's why you might want to have an external journal device for your ext4 file system:

1. You avoid corrupting the journal along with the data, by keeping the journal somewhere other than the disk holding the original data
2. Since a journaling filesystem writes data twice, keeping a separate journal device, i.e. a partition or logical volume on an altogether separate physical disk, can give a significant performance boost

There's a downside though: You cannot have multiple EXT4 filesystems sharing the same journal device as of now (AIX folks have probably started laughing now).

So how do you actually create an external journal device? Here's how:

mke2fs -O journal_dev /dev/block_device_name

While creating a new filesystem, you can easily point to the newly created journal device like this:

mke2fs -t ext4 -J device=/dev/journal_device_name /dev/block_device_for_new_fs

To change journaling from internal to external for an existing filesystem, first unmount the filesystem. Then, execute the following:

tune2fs -O ^has_journal /dev/blk_dev_for_existing_fs
tune2fs -j -J device=/dev/journal_device_name /dev/blk_dev_for_existing_fs

There are a few things to remember though:
1. The size of the journal device should be at least 1024 times the block size of the filesystem. So a filesystem with a 4 KB block size needs a journal device of at least 4 MB.
2. The block size of the journal device should be the same as that of the actual filesystem (see the example below).
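
For example, here is a minimal sketch of the whole thing, assuming /dev/sdb1 is a partition on a separate disk reserved for the journal, /dev/sdc1 holds the data, and the filesystem uses a 4 KB block size (the device names are just placeholders):

mke2fs -O journal_dev -b 4096 /dev/sdb1
mke2fs -t ext4 -b 4096 -J device=/dev/sdb1 /dev/sdc1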

You can gain around a 40% performance boost with an external journal device (provided the journal device resides on a separate physical disk). Amazing, huh?

Sunday, 30 September 2012

Turbo charge your web servers using Squid HTTP Intercept mode

Now, this is of course nothing new, but I thought of sharing a basic configuration of the Squid proxy caching server to cache web requests. There is of course a lot more you can do with Squid; one example is load balancing web servers. One thing to remember though: if your web content is dynamic, i.e. contains things that change over a short time, you may not gain much of a performance boost. Also, this does not work with secure HTTP, i.e. HTTPS, as HTTPS was designed to prevent man-in-the-middle attacks.

So here it is:

eth0: 172.16.1.11/24 => client facing interface
eth1: 10.0.1.11/24 => web server facing interface

1. Install Squid

yum install squid

2. Before we start the service, we have to make a few changes in the config file to turn on the intercept mode. Open the /etc/squid/squid.conf file and set:

http_port 3128 intercept

This makes Squid listen on TCP port 3128 and turns on intercept mode, which lets Squid handle web traffic that has been transparently redirected to it.

3. Next, we have to set the cache directory and its size

cache_dir aufs /var/spool/squid 90 16 256

aufs is the asynchronous ufs storage scheme; it performs better than the plain ufs mode for file operations because it does disk I/O in separate threads. Alternatively, you may specify the diskd mode, which is similar but runs the disk I/O as a separate daemon and requires a little extra fine tuning. 90 is the size of the cache in MB, 16 is the number of first-level directories in the cache dir, and 256 is the number of second-level directories under each of those. You may double these numbers depending on the load of the web server.

For maximum performance, I have mounted /var/spool/squid as tmpfs to keep all of its contents in memory rather than on the hard disk.

My /etc/fstab has an entry like this

tmpfs /var/spool/squid tmpfs size=100m,rw,rootcontext="system_u:object_r:squid_cache_t:s0" 0 0                    

4. Set the cache memory size to be used

cache_mem 50 MB

Squid is intelligent enough to decide which content should go to the memory cache and which should stay in the disk cache.

5. Next, you have to set your router to route traffic for the web server through the Squid server so that web queries can take advantage of Squid. I used my Squid server as the router as well (this is what I call getting the most out of it ;)).

So here are the iptables settings I had to do

A. Enable routing:
 
sysctl -w net.ipv4.ip_forward=1
echo "net.ipv4.ip_forward = 1" >>/etc/sysctl.conf

B. To avoid looping, we have to tell iptables to accept any port 80 traffic that comes from our Squid server's own IP

iptables -t nat -A PREROUTING -s 10.0.1.11/32 -p tcp -m tcp --dport 80 -j ACCEPT

Then, redirect any other port 80 traffic arriving on eth0 to Squid's port 3128 on this box

iptables -t nat -A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 3128

While sending the web traffic out to the web server, masquerade it so that it appears to come from the Squid box

iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE

Enable forwarding

iptables -A FORWARD -i eth0 -j ACCEPT

service iptables save

6. Start the squid service now

service squid start

7. Now, we have to point our clients and the web server to the Squid server as their default gateway and test the network connectivity using the traceroute and ping commands.

Everything should be fine now and all web queries should go through squid.
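
To check that caching is actually kicking in, look at the X-Cache header that Squid adds to responses. From a client, something along these lines (curl assumed to be available, and web_server_address being whatever name or IP your clients use for the web server) should show a MISS on the first request and a HIT on the second:

curl -s -D - -o /dev/null http://web_server_address/ | grep -i x-cache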

Saturday, 22 September 2012

OpenLDAP + NFS + Automount = Complete Identity Solution

Well, we have lots of identity solutions these days. They are ready to be used out of the box with very few configuration changes. Having said that, be it MS Windows Active Directory, Red Hat Directory Server, or IBM Tivoli Identity Manager, they are all based on the rock-solid LDAP protocol. And I still see people using plain OpenLDAP in open source projects as well as in critical commercial environments.

I thought of setting up my own OpenLDAP server for my home lab, just for fun as well as to get more in-depth knowledge about it. As always, I felt like sharing the knowledge I gained and the issues that I came across.

I am using RHEL 6.2 on both the server and clients.

Setting up the server:

1. Install the required packages

yum install openldap*

2. cd /etc/openldap/slapd.d
   find ./ -type f | xargs grep "dc=my-domain,dc=com"

This will usually point to the ./cn=config/olcDatabase={2}bdb.ldif file.
Open that file and replace the domain name with yours in vi:

:%s/my\-domain/vmnet/g

3. Change the domain admin's user name from Manager to root so that the line looks like this

olcRootDN: cn=root,dc=vmnet,dc=com

4. Press CTRL+Z while in vi to suspend it and run slappasswd to set a new password for the domain admin, root in this case

5. Copy the password hash and type 'fg' to resume the vi session. Add a new line after the olcRootDN directive and put the password in like this

olcRootPW: {SSHA}wIEjnTE+CU6U1KsU5pGdcmEyqZ/jTsbt

6. At this point, you may check if the configs are fine by running the following command

slaptest -u 

-u is to ignore warnings about missing database files; that's fine for now as we are yet to create them

7. Now, we need to install the migrationtools package to migrate the existing users/groups etc. databases to LDAP

yum install migrationtools -y

8. cd to /usr/share/migrationtools/ and edit the following lines in the migrate_common.ph file to reflect the correct domain name

# Default DNS domain
$DEFAULT_MAIL_DOMAIN = "vmnet.com";

# Default base
$DEFAULT_BASE = "dc=vmnet,dc=com";

9. Run the migrate_all_offline.sh script to build LDAP DBs out of local users, groups etc.

10. Now, change the owner of the newly created files in /var/lib/ldap directory

chown -R ldap:ldap /var/lib/ldap/*

11. Start the slapd service

service slapd start
chkconfig --level 35 slapd on

12. Open up LDAP port 389 both TCP and UDP on iptables

iptables -I INPUT -m state --state NEW -m tcp -p tcp --dport 389 -j ACCEPT
iptables -I INPUT -m state --state NEW -m udp -p udp --dport 389 -j ACCEPT

13. At this point, you should be able to see the objects in the LDAP domain using slapcat command
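
You can also query the directory over the network with ldapsearch to make sure slapd is answering; for example, to look up one of the migrated users (replace testuser with an account that actually exists on your system):

ldapsearch -x -H ldap://localhost -b "dc=vmnet,dc=com" "(uid=testuser)"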


Setting up the client

1. Install the following packages

yum install pam_ldap nss-pam-ldapd -y

2. Run authconfig-tui, select LDAP for both User Information and Authentication, and select Next. You then have to provide the FQDN of your LDAP server and the domain name in the Base DN field.

===============================
Adding/removing/modifying LDAP objects
===============================

If you are not familiar with the LDIF file format, use slapcat or the migrate_passwd.pl script in the /usr/share/migrationtools directory to get an example.

Then you may execute one of the following:

ldapadd -x -a -W -D "cn=root,dc=vmnet,dc=com" </tmp/testuser.ldif
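
Modifying and deleting entries work the same way with ldapmodify and ldapdelete. For example (assuming the migrated users sit under ou=People, which is the migrationtools default, and testuser is just a placeholder):

ldapmodify -x -W -D "cn=root,dc=vmnet,dc=com" -f /tmp/modifyuser.ldif
ldapdelete -x -W -D "cn=root,dc=vmnet,dc=com" "uid=testuser,ou=People,dc=vmnet,dc=com"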

Or else, you may install phpLDAPadmin to administer the LDAP server through web

yum install httpd php php-ldap

============================================
Getting user's home directory automatically mounted on client
============================================

It's better to create a separate home directory tree for the LDAP users; /home/users is what I chose.

Share it through the NFS server:

/home/users   10.0.1.0/24(rw,no_root_squash,sync,no_wdelay)

Now, on the client side, configure autofs:

1. In the /etc/auto.master file, you may add the following

/home/users   /etc/auto.home

2. Create the /etc/auto.home file and add the following

*       -fstype=nfs     red.vmnet.com:/home/users/&

3. Create /home/users directory
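
To check the automount quickly, switch to one of the LDAP users on the client and see whether the NFS share gets mounted (testuser here is just a placeholder for one of your LDAP accounts):

su - testuser
mount | grep /home/users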

In this approach, there will not be any clash between a local user and an LDAP user logged in on the same client machine as they will have separate home directories. Otherwise, a local user would lose access to their home directory once an LDAP user's home directory got automounted on /home.

Now, you are highly likely to run into permission issues on the users' home directories if you have not already configured how IDs should be mapped. /etc/idmapd.conf on the client machine is what you need to concentrate on.

Edit this file so that it has the below directives/options

[General]
Domain = vmnet.com
[Translation]
LDAP_server = red.vmnet.com
LDAP_base = dc=vmnet,dc=com

Now, restart the rpcidmapd service

service rpcidmapd restart

You may ask the users to set up SSH keys, and they will then be able to log in to any LDAP client.

That's about it!!

Tuesday, 4 September 2012

Centralized logging system: rsyslog, logstash, Elasticsearch & kibana

So I have been a little bit busy with this voluntary work I do in my free time. I am associated with Wikimedia Foundation as a volunteer IT staff. People are very much open and helpful there.

A few months back, I was browsing through their ongoing projects hoping someone would need help with something I could contribute to. As I was new to their community, things were only half clear to me. I needed to work on a project which would help me understand their infrastructure, and it had to be simple enough for me. Of course, I did not want to dive into the most complicated project and then sit idle, looking at other people's scribbles on IRC.

One day I stumbled upon an interesting project. Its objective was to build a centralized logging system with good search capability. Although they use Nagios for alerting, if someone needed to search through logs, they had to log into that particular server and do a little grep or egrep against the logs. I thought this would be a perfect project for me: I would get to know the surroundings, plus it's relatively simple to set up something like this.

I was added to the project and I was its only member. Sweet!!

So I first started experimenting with various open source products. A few did just fine, a few did not scale at all. At last I found a perfect combination: Logstash, Elasticsearch, and Kibana.

Logstash is very useful and versatile. It's written in JRuby (Ruby on the JVM). You can specify inputs and outputs as well as filters. It supports various input types, one of them being syslog, which means you do not have to install a logging agent on every server and increase its overall load; the default rsyslog client will do just fine. Then comes the filtering part: after taking input, you can filter logs within Logstash itself. It's awesome, but it didn't serve any purpose for me as I wanted to index every log line. Next is the output part. Logstash can write logs to standard output (why would anyone want that?), but as with input, it supports multiple output types too, one of them being Elasticsearch.
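
Just to give an idea of how the pieces get wired together, a minimal logstash.conf for this kind of setup could look roughly like this (option names vary a bit between Logstash versions, so treat it as a sketch rather than a drop-in config):

input {
  syslog {
    type => "syslog"
    port => 514
  }
}

output {
  elasticsearch {
    host => "127.0.0.1"
  }
}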

Elasticsearch is a Java based log indexer. You can search through Elasticsearch indices using the Lucene search syntax for more complicated queries, but a simple wildcard search works too.
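
For instance, you can hit Elasticsearch's HTTP API directly to run a quick query, assuming it is listening on its default port 9200 on the same host:

curl 'http://localhost:9200/_search?q=message:error*&pretty=true'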

Next comes Kibana. It provides the web frontend for Elasticsearch. It's written in JavaScript and PHP and requires only one line to be edited for it to work out of the box.

As of now, I have configured all of them on one relatively large lab VM. There were several hitches in the beginning, but apart from that it all went pretty smoothly.

So here's what is happening: rsyslog on each server forwards its logs to Logstash, Logstash indexes them into Elasticsearch, and Kibana sits on top of Elasticsearch as the search frontend.

I had to make a little init script too for these services (logstash and elasticsearch). It's based on Ubuntu 10.04.3 LTS, but should work on CentOS/RedHat as well with a little bit modification.

Here's the script:

#! /bin/sh

export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

. /lib/lsb/init-functions

mode="reader"
name="logstash-$mode"
logstash_bin="/usr/bin/java -jar /logstash/logstash.jar"
logstash_conf="/logstash/logstash.conf"
logstash_log="/logstash/$name.log"
ls_pid_file="/var/run/$name.pid"
es_pid_file="/var/run/elasticsearch.pid"

NICE_LEVEL="-n 19"

# This is needed by ElasticSearch
export ES_HOME="/logstash/elasticsearch"

# sets max. open files limit to 65000
# otherwise, elasticsearch throws java.io.IOException: Too many open files

ulimit -n 65000









start () {

        command_es="/usr/bin/nice ${NICE_LEVEL} /logstash/elasticsearch/bin/elasticsearch"
        #command_ls="/usr/bin/nice ${NICE_LEVEL} ${logstash_bin} agent -f ${logstash_conf} -- web --backend elasticsearch:///?local --log ${logstash_log}"
        command_ls="/usr/bin/nice ${NICE_LEVEL} ${logstash_bin} agent -f ${logstash_conf} --log ${logstash_log}"


        log_daemon_msg "Starting" "elasticsearch"
        if start-stop-daemon --start -d "/logstash/elasticsearch" --quiet  --oknodo  -b --exec ${command_es}; then
                log_end_msg 0
                # I had to do this as -p option with elasticsearch gives wrong PID
                # The same with --pidfile option with start-stop-daemon
                sleep 1 # the process takes a moment to show up in ps below
                # don't know why I chose to grep for "sigar"; maybe it looks like cigar
                ps -elf | grep [e]lasticsearch | grep sigar | awk '{ print $4 }' >${es_pid_file}
        else
                log_end_msg 1
        fi


        log_daemon_msg "Starting" "logstash"
        if start-stop-daemon --start -d "/logstash/" --quiet --oknodo --pidfile "$ls_pid_file" -b -m --exec ${command_ls}; then
                log_end_msg 0
        else
                log_end_msg 1
        fi
}

stop () {
        start-stop-daemon --stop --quiet --oknodo --pidfile "$ls_pid_file"
        start-stop-daemon --stop --quiet --oknodo --pidfile "/var/run/elasticsearch.pid"
}

status () {
        status_of_proc -p $ls_pid_file "" "$name"
        status_of_proc -p ${es_pid_file} "" "elasticsearch"
}

case $1 in
        start)
        if status; then exit 0; fi
        start
        ;;

        stop)
        stop
        ;;

        reload)
        stop
        start
        ;;

        restart)
        stop
        start
        ;;

        status)
                status && exit 0 || exit $?
        ;;

        *)
        echo "Usage: $0 {start|stop|restart|reload|status}"
        exit 1
        ;;
esac

exit 0

The system is in the testing phase. We need to check how it scales to 2000+ servers; maybe we will have to think about load balancing too. But as of now, it really does a great job in terms of memory consumption, disk space, processing power, etc.

Once the whole system is ready to go live for production servers, I will definitely publish more detailed technical stuff. Crossing my fingers!!

Wednesday, 18 July 2012

Linux RPM to the rescue!

Red Hat and its variants, e.g. CentOS and Fedora, use RPM packages. Apart from simply installing and upgrading an RPM package, the rpm command can be very handy while troubleshooting.

Below are some interesting scenarios where RPM command does a nifty job.

1. You have an RPM file and you do not know anything about it other than its name and version number. Want to know more about the package? Try the below command:

# rpm -qip kernel-2.6*.rpm
warning: kernel-2.6.32-220.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
Name        : kernel                       Relocations: (not relocatable)
Version     : 2.6.32                            Vendor: Red Hat, Inc.
Release     : 220.el6                       Build Date: Wed 09 Nov 2011 07:39:52 PM IST
Install Date: (not installed)               Build Host: x86-004.build.bos.redhat.com
Group       : System Environment/Kernel     Source RPM: kernel-2.6.32-220.el6.src.rpm
Size        : 117131326                        License: GPLv2
Signature   : RSA/8, Thu 10 Nov 2011 12:33:33 AM IST, Key ID 199e2f91fd431d51
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://www.kernel.org/
Summary     : The Linux kernel
Description :
The kernel package contains the Linux kernel (vmlinuz), the core of any
Linux operating system.  The kernel handles the basic functions
of the operating system: memory allocation, process allocation, device
input and output, etc.

2. Sometimes you really need to know which files could be overwritten by a package update/upgrade, so that you can back them up beforehand. You can list the files provided by an RPM package, and the matching files already on the system from a previous installation can go into your backup. To list the files provided by an RPM package:

# rpm -qlp kernel-2.6*.rpm

3. Admins are not Einstein. We cannot remember each and every thing (oh come on! Even Einstein used to forget the way to his own house!!). But admins belong to that species which can always find a way around. Suppose one day you are struggling to remember the location of the config files for Samba. Google is your friend, but we are too lazy to bring up even a web browser. With the below command, you can see the config files for samba:

# rpm -qc samba
/etc/logrotate.d/samba
/etc/pam.d/samba
/etc/samba/smbusers

4. What could be worse than your junior admitting one day that he deleted some package-related files and does not remember exactly which files he deleted? How will you restore them? It's easy if you have a good backup, but what if the backup team is too busy doing other things and you are left alone? As long as the files come from an RPM, you can always restore them (well, almost!).

First, check what files are missing:

rpm -Va       (takes a really long time)

Then, once you confirm which exact files are missing, you can find which package provides them

rpm -qf /path/to/the/file

Now, reinstall that package on top of the existing installation to get those files back

rpm -Uvh --replacepkgs package-name.rpm

You should get the fresh files now.
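
To confirm nothing is missing any more, verify that package again; rpm -V should come back clean (or at least without the "missing" entries):

rpm -V package-name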


5. As you may know, RPM keeps a database of which packages are installed, when, etc. Sometimes the RPM DB gets corrupted and you get a warning message every time you try to install/upgrade an RPM package. How do you solve this? Simple.

# rm -f /var/lib/rpm/__db.*
# rpm -vv --rebuilddb


6. There are many reasons you may want to know which packages were installed on the system recently. RPM provides that information too, along with the timestamp of when each package was installed.

# rpm -qa --last | head
samba-3.5.10-114.el6                          Tue 17 Jul 2012 01:02:26 PM IST
mysql-server-5.1.52-1.el6_0.1                 Tue 10 Jul 2012 12:21:58 PM IST
perl-DBD-MySQL-4.013-3.el6                    Tue 10 Jul 2012 12:21:55 PM IST
mysql-5.1.52-1.el6_0.1                        Tue 10 Jul 2012 12:21:54 PM IST
setroubleshoot-server-3.0.38-2.1.el6          Wed 04 Jul 2012 12:55:03 PM IST
setroubleshoot-plugins-3.0.16-1.el6           Wed 04 Jul 2012 12:55:02 PM IST
python-slip-dbus-0.2.11-1.el6                 Wed 04 Jul 2012 12:55:01 PM IST
python-slip-0.2.11-1.el6                      Wed 04 Jul 2012 12:55:01 PM IST
python-decorator-3.0.1-3.1.el6                Wed 04 Jul 2012 12:55:00 PM IST


7. Have you ever wondered what an RPM file actually is? Well, it's basically a cpio archive with some package metadata in front. You can extract the contents of the RPM package into the current directory without even installing it:

# rpm2cpio kernel-*.rpm | cpio -idmv


Isn't RPM amazing?

Wednesday, 4 July 2012

Linux: Deploying NIS Master/Slave/Client - The easy way


A. Configuring Master Server

1. Install the required package

yum install ypserv

2. Set the NIS domain name

echo "NISDOMAIN=vmnet.com" >>/etc/sysconfig/network
ypdomainname vmnet.com

3. Start the service

service ypserv start
chkconfig --level 35 ypserv on

4. Check if the rpc ports have opened and everything is fine

rpcinfo -p | grep ypserv
rpcinfo -u red.vmnet.com ypserv

5. Change NOPUSH=true to NOPUSH=false in /var/yp/Makefile if you have slave servers

6. List down your slave servers in /var/yp/ypservers

echo "vlue.vmnet.com" >>/var/yp/ypservers

7. Now, run ypinit to configure this as master server

/usr/lib64/yp/ypinit -m


8. Start the ypxfrd daemon which is responsible for transferring the maps from master server to the slave servers

service ypxfrd start
chkconfig --level 35 ypxfrd on

9. Start the yppasswdd service, which lets NIS users change their passwords

service yppasswdd start
chkconfig --level 35 yppasswdd on

10. Export the /home directory for the NIS users through NFS

B. Configuring NIS Slave server:

1. Configure the NIS domain name

echo "NISDOMAIN=vmnet.com" >>/etc/sysconfig/network
ypdomainname vmnet.com

2. Configure it as the slave server

/usr/lib64/yp/ypinit -s red.vmnet.com

3. Start the yppasswdd service

4. Configure cron jobs to schedule map transfers from the master

0 */1 * * * /usr/lib64/yp/ypxfr_1perhour >/dev/null 2>&1
30 2 * * * /usr/lib64/yp/ypxfr_1perday >/dev/null 2>&1

C. Configuring Client

1. Install the client package

yum install ypbind

2. Configure the system to authenticate with NIS; it will automatically run ypinit to fetch the maps

authconfig-tui

3. Configure autofs to auto-mount home directories on user login.
Add something similar to the following in /etc/auto.master:

/exports /etc/auto.exports

Then prepare the map and the mount points, and restart autofs:

cp /etc/auto.net /etc/auto.exports
mkdir /exports
ln -s /exports/red/nishome /nishome
service autofs restart

The above works perfectly if the NIS users' home directories are set to /nishome, thus avoiding any conflict with local users' home directories in /home, so both a local user and an NIS user will be able to log in on the same client simultaneously.

***Assigning static ports to ypserv and rpc.ypxfrd:

1. Add the following lines in /etc/sysconfig/network

YPSERV_ARGS="-p 834"
YPXFRD_ARGS="-p 835"

2. Restart both the services
3. Open the TCP and UDP ports in iptables
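
Something along these lines should do, assuming you picked ports 834 and 835 as above (NIS clients also need to reach the portmapper on port 111):

iptables -I INPUT -m state --state NEW -m tcp -p tcp --dport 834 -j ACCEPT
iptables -I INPUT -m state --state NEW -m udp -p udp --dport 834 -j ACCEPT
iptables -I INPUT -m state --state NEW -m tcp -p tcp --dport 835 -j ACCEPT
iptables -I INPUT -m state --state NEW -m udp -p udp --dport 835 -j ACCEPT
iptables -I INPUT -m state --state NEW -m tcp -p tcp --dport 111 -j ACCEPT
iptables -I INPUT -m state --state NEW -m udp -p udp --dport 111 -j ACCEPT
service iptables save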