
Friday, 23 May 2014

GFS2 File System Administration

Q. What is the purpose of CLVM and GFS2 in a clustered environment, and what are the limitations of traditional file systems?
The Red Hat GFS2 file system is a native file system that interfaces directly with the Linux kernel file system interface (VFS layer). Red Hat supports the use of GFS2 file systems only as implemented in Red Hat Cluster Suite.

GFS2 is based on a 64-bit architecture, which can theoretically accommodate an 8 EB file system. However, the current supported maximum size of a GFS2 file system is 25 TB.
 
Important Note from Red Hat:
Although a GFS2 file system can be implemented in a standalone system or as part of a cluster configuration, for the Red Hat Enterprise Linux 6 release Red Hat does not support the use of GFS2 as a single-node file system. Red Hat does support a number of high-performance single-node file systems which are optimized for a single node and thus have generally lower overhead than a cluster file system. Red Hat recommends using these file systems in preference to GFS2 in cases where only a single node needs to mount the file system. Red Hat will continue to support single-node GFS2 file systems for mounting snapshots of cluster file systems (for example, for backup purposes).


Limitations of Traditional Filesystems (ext2/ext3/ext4) in a Clustered Environment

Step 1: On node1, create an ext4 filesystem on top of the shared storage managed by CLVM

[root@node1 ~]# mkfs.ext4 /dev/vgCLUSTER/clvolume1
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
25688 inodes, 102400 blocks
5120 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
13 block groups
8192 blocks per group, 8192 fragments per group
1976 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 37 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

Step 2: Mount the new filesystem on the /clvmnt mount point

[root@node1 ~]# mkdir /clvmnt
[root@node1 ~]# mount /dev/vgCLUSTER/clvolume1 /clvmnt
 
Step 3: On node2, directly mount the newly created filesystem on the mount point /clvmnt

[root@node2 ~]# mkdir /clvmnt
[root@node2 ~]# mount /dev/vgCLUSTER/clvolume1 /clvmnt

Step 4: On node1, create a test directory under the mount point

[root@node1 ~]# cd /clvmnt
[root@node1 clvmnt]# mkdir test-under-ext4-fs
[root@node1 clvmnt]# ls -l
total 14
drwx------. 2 root root 12288 Sep 21 12:53 lost+found
drwxr-xr-x. 2 root root  1024 Sep 21 12:54 test-under-ext4-fs

Step 5: On node2, verify whether the newly created directory is visible. It does not appear.

[root@node2 ~]# cd /clvmnt
[root@node2 clvmnt]# ls
lost+found


Step 6: On node2, unmount and remount the volume to reflect the actual content of the filesystem.
 
[root@node2 ~]# umount /clvmnt
[root@node2 ~]# mount /dev/vgCLUSTER/clvolume1 /clvmnt
[root@node2 ~]# ls -l /clvmnt
total 14
drwx------. 2 root root 12288 Sep 21 12:53 lost+found
drwxr-xr-x. 2 root root  1024 Sep 21 12:54 test-under-ext4-fs

Note:
There is no supported tool or command-line option to convert an ext3/ext4 file system to GFS/GFS2, or an existing shared GFS/GFS2 file system to ext3/ext4. The only way is to take a backup of the data residing on the ext3/ext4 file system, create a new GFS/GFS2 volume per the requirement, and restore the data to the GFS/GFS2 file system.

 
                                          Initial GFS2 Filesystem Configuration
 
Identifying the Required Information
The syntax to create a GFS2 filesystem is as below:
mkfs.gfs2 -p lock_dlm -t ClusterName:FSName -j NumberJournals BlockDevice
[root@node1 ~]# mkfs.gfs2 -h
Usage:
mkfs.gfs2 [options] <device> [ block-count ]
Options:
-b <bytes> Filesystem block size
-c <MB> Size of quota change file
-D Enable debugging code
-h Print this help, then exit
-J <MB> Size of journals
-j <num> Number of journals
-K Don’t try to discard unused blocks
-O Don’t ask for confirmation
-p <name> Name of the locking protocol
-q Don’t print anything
-r <MB> Resource Group Size
-t <name> Name of the lock table
-u <MB> Size of unlinked file
-V Print program version information, then exit
 
Before proceeding to create a GFS2 file system, we need to gather the below information, which is required to create a new GFS2 file system.
 
Identify Required Block Size: Default (4K) Blocks Are Preferred
As of the Red Hat Enterprise Linux 6 release, the mkfs.gfs2 command attempts to estimate an optimal block size based on device topology. In general, 4K blocks are the preferred block size because 4K is the default page size (memory) for Linux.
 
Identify the Number of Journals Required to Create the File System
GFS2 requires one journal for each node in the cluster that needs to mount the file system. For example, if you have a 16-node cluster but need to mount only the file system from two nodes, you need only two journals. If you need to mount from a third node, you can always add a journal with the gfs2_jadd command. With GFS2, you can add journals on the fly.
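For example, if a third node later needed to mount this file system, a journal could be added on the fly with gfs2_jadd while the file system is mounted. This is only a sketch, using the mount point that appears later in this article:

[root@node1 ~]# gfs2_jadd -j 1 /clvgfs     # add one more journal to the mounted GFS2 file system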
 
Identify Required Journal Size: Default (128MB) Is Usually Optimal
When you run the mkfs.gfs2 command to create a GFS2 file system, you may specify the size of the journals. If you do not specify a size, it will default to 128MB, which should be optimal for most applications.
 
Identify Required Size and Number of Resource Groups
When a GFS2 file system is created with the mkfs.gfs2 command, it divides the storage into uniform slices known as resource groups. It attempts to estimate an optimal resource group size (ranging from 32MB to 2GB). You can override the default with the -r option of the mkfs.gfs2 command.
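If the defaults need to be overridden, the block size, journal size and resource group size can all be passed to mkfs.gfs2 using the options shown in the help output above. The values below are purely illustrative, not a recommendation:

[root@node1 ~]# mkfs.gfs2 -p lock_dlm -t GFSCluster:gfs2fs -j 2 \
                -b 4096 -J 128 -r 256 /dev/vgCLUSTER/clvolume1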
 
Identify the Cluster name
Below is my current configuration, and we can see the cluster is named “GFSCluster”:
[root@node1 ~]# clustat
Cluster Status for GFSCluster @ Sat Sep 21 13:15:04 2013
Member Status: Quorate
 Member Name                         ID   Status
 ------ ----                         ---- ------
 node1                                  1 Online, Local, rgmanager
 node2                                  2 Online, rgmanager

 Service Name                 Owner (Last)                 State
 ------- ----                 ----- ------                 -----
 service:HAwebService         (node2)                      stopped
 
Identify Required Locking Protocol
We will use the “lock_dlm” protocol in this case.
 
Understand the GFS2 Node Locking Mechanism:
In order to get the best performance from a GFS2 file system, it helps to understand how it caches data: each node has its own page cache which may contain some portion of the on-disk data. The difference between a single-node file system and GFS2 is that a single-node file system has a single cache, whereas GFS2 has a separate cache on each node. In both cases, the latency to access cached data is of a similar order of magnitude, but the latency to access uncached data is much greater in GFS2 if another node has previously cached that same data. GFS2 uses a locking mechanism called glocks (pronounced gee-locks) to maintain the integrity of the cache between nodes. The glock subsystem provides a cache management function which is implemented using the distributed lock manager (DLM) as the underlying communication layer.
The glocks provide protection for the cache on a per-inode basis, so there is one lock per inode which is used for controlling the caching layer. If that glock is granted in shared mode (DLM lock mode: PR) then the data under that glock may be cached upon one or more nodes at the same time, so that all the nodes may have local access to the data. If the glock is granted in exclusive mode (DLM lock mode: EX) then only a single node may cache the data under that glock. This mode is used by all operations which modify the data (such as the write system call). If another node requests a glock which cannot be granted immediately, then the DLM sends a message to the node or nodes which currently hold the glocks blocking the new request to ask them to drop their locks. Dropping glocks can be (by the standards of most file system operations) a long process. Dropping a shared glock requires only that the cache be invalidated, which is relatively quick and proportional to the amount of cached data. Dropping an exclusive glock requires a log flush, and writing back any changed data to disk, followed by the invalidation as per the shared glock.
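As a practical aside (assuming the debugfs glock interface shipped with RHEL 6 kernels), the per-node glock state can be inspected once a GFS2 file system is mounted, which is useful when diagnosing cache contention between nodes. The path below assumes the lock table name GFSCluster:gfs2fs that is used later in this article:

[root@node1 ~]# mount -t debugfs none /sys/kernel/debug
[root@node1 ~]# head /sys/kernel/debug/gfs2/GFSCluster:gfs2fs/glocks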
 
                                             Configuration of the GFS2 File System
Step 1: Create the GFS2 file system using the following parameters:

cluster name: GFSCluster
filesystem name: gfs2fs
device path: /dev/vgCLUSTER/clvolume1
number of journals: 2
locking mechanism: lock_dlm
 
[root@node1 ~]# mkfs.gfs2 -j 2 -p lock_dlm -t GFSCluster:gfs2fs /dev/vgCLUSTER/clvolume1
This will destroy any data on /dev/vgCLUSTER/clvolume1.
It appears to contain: symbolic link to `../dm-6'
Are you sure you want to proceed? [y/n] y
Device: /dev/vgCLUSTER/clvolume1
Blocksize: 4096
Device Size 0.49 GB (128000 blocks)
Filesystem Size: 0.49 GB (127997 blocks)
Journals: 2
Resource Groups: 2
Locking Protocol: "lock_dlm"
Lock Table: "GFSCluster:gfs2fs"
UUID: ee264aa4-6ab9-18ed-c994-eb577c28399a

Note:
In the above output, Resource Groups is shown as 2, which means the total 500M volume is divided into two RGs of ~250M each. The RG size is important when you are planning for file system expansion: you should allocate at least the size of one RG whenever you want to grow the file system using gfs2_grow.
 
 
Step 2: Configure /etc/fstab to mount the newly created GFS2 filesystem automatically during boot
[root@node1 ~]# cat /etc/fstab
# Created by anaconda on Thu Sep 5 14:06:56 2013
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info

/dev/mapper/vg_gfsnode1-lv_root                            /          ext4     defaults     1  1
UUID=7263f27d-afbc-4a07-bf85-4354fa0651f9    /boot    ext4      defaults     1 2
/dev/mapper/vg_gfsnode1-lv_swap                          swap    swap    defaults     0 0
tmpfs                                                                      /dev/shm tmpfs    defaults     0 0
/dev/vgCLUSTER/clvolume1                                 /clvgfs     gfs2     defaults     0 0

[root@node1 ~]# mkdir /clvgfs


 
Step 3: Make sure the GFS2 and DLM modules are already loaded in the kernel; if not, load them with the modprobe command.

[root@node1 ~]# lsmod |grep gfs
gfs2 545168 2
dlm 148231 32 gfs2
configfs 29538 2 dlm
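If the modules had not been listed, they could be loaded manually; modprobe pulls in dlm automatically as a dependency of gfs2 (a sketch only, not needed on this node since lsmod already shows them):

[root@node1 ~]# modprobe gfs2
[root@node1 ~]# lsmod | grep gfs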

Step 4: Start the GFS2 service
 
[root@node1 ~]# service gfs2 restart
Mounting GFS2 filesystem (/clvgfs): invalid device path "/dev/vgCLUSTER/clvolume1"    [FAILED]

The service startup failed, so below is a little troubleshooting to fix the problem.

Troubleshooting check 1: lvs does not show the clustered volumes

[root@node1 ~]# lvs
connect() failed on local socket: Connection refused
Internal cluster locking initialisation failed.
WARNING: Falling back to local file-based locking.
Volume Groups with the clustered attribute will be inaccessible.
Skipping clustered volume group vgCLUSTER
Skipping volume group vgCLUSTER

  LV      VG          Attr        LSize Pool Origin Data%  Move Log Cpy%Sync Convert
  lv_root vg_gfsnode1 -wi-ao---   3.66g
  lv_swap vg_gfsnode1 -wi-ao---   3.84g
Troubleshooting check 2: The device path does not exist

[root@node1 ~]# ls "/dev/vgCLUSTER/clvolume1"
ls: cannot access /dev/vgCLUSTER/clvolume1: No such file or directory
Troubleshooting check 3: The PV belonging to the clustered volume group is not listed

[root@node1 ~]# pvs
connect() failed on local socket: Connection refused
Internal cluster locking initialisation failed.
WARNING: Falling back to local file-based locking.
Volume Groups with the clustered attribute will be inaccessible.
Skipping clustered volume group vgCLUSTER
Skipping volume group vgCLUSTER
  PV          VG            Fmt    Attr   PSize   PFree
  /dev/vda2   vg_gfsnode1   lvm2   a--    7.51g       0

Troubleshooting check 4: All of this was caused by clvmd not running, so just start it.
[root@node1 ~]# service clvmd start
Starting clvmd:
Activating VG(s):   1 logical volume(s) in volume group "vgCLUSTER" now active
  2 logical volume(s) in volume group "vg_gfsnode1" now active            [  OK  ]
Troubleshooting check 5: Now all the volumes and PVs are back online.
[root@node1 ~]# pvs
  PV                            VG            Fmt    Attr   PSize     PFree
  /dev/mapper/1IET_00010002p1   vgCLUSTER     lvm2   a--    780.00m   280.00m
  /dev/vda2                     vg_gfsnode1   lvm2   a--    7.51g     0

[root@node1 ~]# lvs
LV              VG                        Attr      LSize        Pool Origin Data% Move Log Cpy%Sync Convert
clvolume1   vgCLUSTER       -wi-a    500.00m
lv_root        vg_gfsnode1        -wi-ao   3.66g
lv_swap       vg_gfsnode1        -wi-ao   3.84g
 

[root@node1 ~]# chkconfig clvmd on

Start the GFS2 service again; now the service starts fine.
 
[root@node1 ~]# service gfs2 restart
Mounting GFS2 filesystem (/clvgfs): [ OK ]
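To make this mount persistent across reboots with the stock RHEL 6 init scripts (an assumption based on the fstab entry above), the gfs2 service can be enabled at boot in the same way clvmd was enabled:

[root@node1 ~]# chkconfig gfs2 on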

Step 5: On node1, check that the volume is mounted properly
[root@node1 ~]# df -h /clvgfs
Filesystem                                                Size      Used      Avail     Use%     Mounted on
/dev/mapper/vgCLUSTER-clvolume1    500M    259M    242M     52%       /clvgfs
 
Step 6: On node2, add the /etc/fstab entry, check that the GFS2 and DLM modules are loaded, start the gfs2 service, and verify the volume mount.

[root@node2 ~]# cat /etc/fstab
# /etc/fstab
# Created by anaconda on Thu Sep 5 14:20:11 2013
# Accessible filesystems, by reference, are maintained under ‘/dev/disk’
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info

/dev/mapper/vg_gfsnode1-lv_root                                  /                ext4              defaults        1 1
UUID=18493b92-f8aa-4107-8e63-eaa75a8c3f01          /boot         ext4              defaults        1 2
/dev/mapper/vg_gfsnode1-lv_swap                                swap         swap             defaults        0 0
/dev/vgCLUSTER/clvolume1                                        /clvgfs        gfs2              defaults       0 0

 
[root@node2 ~]# lsmod |grep gfs
gfs2 545168 0
dlm 148231 27 gfs2
configfs 29538 2 dlm

[root@node2 ~]# service gfs2 restart
Mounting GFS2 filesystem (/clvgfs): [ OK ]
 
[root@node2 ~]# df -Ph /clvgfs
Filesystem                                                  Size          Used       Avail      Use%     Mounted on
/dev/mapper/vgCLUSTER-clvolume1      500M        259M     242M      52%       /clvgfs
 
 
                                                   Verifying the Functioning of GFS2
Step 1: On node1, create a test directory and files under /clvgfs, then list them recursively
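The test data listed below was presumably created with something like the following (the names are taken from the listing; the exact commands are assumed):

[root@node1 ~]# mkdir /clvgfs/test-gfs-fs-data
[root@node1 ~]# touch /clvgfs/test-gfs-fs-data/{a,b,c}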
[root@node1 clvgfs]# ls -lR /clvgfs
/clvgfs:
total 8
drwxr-xr-x. 2 root root 3864 Sep 21 15:07 test-gfs-fs-data
/clvgfs/test-gfs-fs-data:
total 24
-rw-r--r--. 1 root root 0 Sep 21 15:07 a
-rw-r--r--. 1 root root 0 Sep 21 15:07 b
-rw-r--r--. 1 root root 0 Sep 21 15:07 c
 
 
Step 2: On node2, verify the content of /clvgfs without remounting
[root@node2 ~]# ls -lR /clvgfs


/clvgfs:
total 8
drwxr-xr-x. 2 root root 3864 Sep 21 15:07 test-gfs-fs-data
/clvgfs/test-gfs-fs-data:
total 24
-rw-r--r--. 1 root root 0 Sep 21 15:07 a
-rw-r--r--. 1 root root 0 Sep 21 15:07 b
-rw-r--r--. 1 root root 0 Sep 21 15:07 c

And we see them exactly as they were created, without a remount. That confirms that data visibility is real time on all the cluster nodes accessing the shared storage.



So far we have discussed the introduction and initial configuration of GFS2. Now we will go through the administration tasks related to GFS2 file systems. This does not cover every admin task related to GFS2, but it does cover the major day-to-day operations in a Red Hat clustered environment.

Below are the Tasks I will be discussing:
 
1. GFS2 File System Expansion
2. GFS2 Journal Addition
3. Suspending and Resuming Writing Activities on GFS2 file systems
4. Repairing GFS2 file system

                                                       GFS2 File System Expansion

Step 1: Check the current file system size, LV size, and VG free space


[root@node1 test-gfs-fs-data]# df -h /clvgfs
Filesystem                                                               Size      Used      Avail   Use%     Mounted on
/dev/mapper/vgCLUSTER-clvolume1                   500M    259M    242M   52%       /clvgfs

[root@node1 test-gfs-fs-data]# vgs
  VG            #PV   #LV   #SN   Attr     VSize     VFree
  vgCLUSTER       1     1     0   wz--nc   780.00m   280.00m
  vg_gfsnode1     1     2     0   wz--n-   7.51g     0

[root@node1 test-gfs-fs-data]# lvs
  LV          VG            Attr        LSize     Pool Origin Data%  Move Log Cpy%Sync Convert
  clvolume1   vgCLUSTER     -wi-ao---   500.00m
  lv_root     vg_gfsnode1   -wi-ao---   3.66g
  lv_swap     vg_gfsnode1   -wi-ao---   3.84g

Step 2: Extend the LV by 200MB
[root@node1 test-gfs-fs-data]# lvextend -L +200M /dev/mapper/vgCLUSTER-clvolume1
Extending logical volume clvolume1 to 700.00 MiB
Logical volume clvolume1 successfully resized
Step 3: Extend the GFS2 filesystem using the gfs2_grow command.
[root@node1 test-gfs-fs-data]# gfs2_grow /clvgfs
Error: The device has grown by less than one Resource Group (RG).
The device grew by 200MB. One RG is 249MB for this file system.
gfs2_grow complete.


Note:

Remember the point we discussed during the GFS2 filesystem creation: the command output showed Resource Groups = 2, which means the total 500M volume was divided into two RGs of ~250M each. We should allocate at least the size of one RG whenever we want to grow the filesystem using gfs2_grow.

[root@node1 test-gfs-fs-data]# df -h /clvgfs
Filesystem                                Size    Used    Avail   Use%   Mounted on
/dev/mapper/vgCLUSTER-clvolume1           500M    259M    242M    52%    /clvgfs

Since gfs2_grow failed, extend the LV by 50MB more, and then try gfs2_grow again.

[root@node1 test-gfs-fs-data]# lvextend -L +50M /dev/mapper/vgCLUSTER-clvolume1
Rounding size to boundary between physical extents: 52.00 MiB
Extending logical volume clvolume1 to 752.00 MiB
Logical volume clvolume1 successfully resized

[root@node1 test-gfs-fs-data]# gfs2_grow /clvgfs
FS: Mount Point: /clvgfs
FS: Device: /dev/dm-6
FS: Size: 127997 (0x1f3fd)
FS: RG size: 63988 (0xf9f4)
DEV: Size: 192512 (0x2f000)
The file system grew by 252MB.
gfs2_grow complete.
Now we see that the file system has been extended to roughly 750MB.

Step 4: On node2, check the filesystem size.

[root@node2 test-gfs-fs-data]# df -h /clvgfs
Filesystem                                Size    Used    Avail   Use%   Mounted on
/dev/mapper/vgCLUSTER-clvolume1           750M    259M    492M    35%    /clvgfs


                              We can confirm that the new filesystem size is visible on node2 as well.
                                           Repairing GFS2 Filesystems


Few points to remember:

1. We should not enable fsck at boot time in the /etc/fstab options (keep the sixth fstab field set to 0).
2. The fsck.gfs2 command must be run only on a file system that is unmounted from all nodes (see the dry-run sketch after this list).
3. Pressing Ctrl+C while fsck.gfs2 is running interrupts processing and displays a prompt asking whether you would like to abort the command, skip the rest of the current pass, or continue processing.
4. You can increase the level of verbosity by using the -v flag. Adding a second -v flag increases the level again.
5. You can decrease the level of verbosity by using the -q flag. Adding a second -q flag decreases the level again.
6. The -n option opens a file system as read-only and answers no to any queries automatically. The option provides a way of trying the command to reveal errors without actually allowing the fsck.gfs2 command to take effect.
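Before running the repair with -y as below, unmount the file system on every node; a read-only dry run with -n is a safe first check. This is only a sketch, reusing the mount point and device from the earlier steps:

[root@node1 ~]# umount /clvgfs
[root@node2 ~]# umount /clvgfs
[root@node1 ~]# fsck.gfs2 -n /dev/mapper/vgCLUSTER-clvolume1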

[root@node1 ~]# fsck.gfs2 -y /dev/mapper/vgCLUSTER-clvolume1
Initializing fsck
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Block 191984 (0x2edf0) bitmap says 0 (Free) but FSCK saw 1 (Data)
Metadata type is 1 (data)
Fixed.
RG #127997 (0x1f3fd) free count inconsistent: is 63984 should be 63983
Resource group counts updated
Pass5 complete
The statfs file is wrong:
Current statfs values:
blocks: 191956 (0x2edd4)
free: 92656 (0x169f0)
dinodes: 24 (0~18)
Calculated statfs values:
blocks: 191956 (0x2edd4)
free: 92655 (0x169ef)
dinodes: 24 (0~18)
The statfs file was fixed.
Writing changes to disk
gfs2_fsck complete 
