Networking / Cisco: MDS 9000 Port Locating Using Beacon

•June 9, 2009 • Leave a Comment

We all like blinking lights.  Well, maybe not all of us, but they are great for remote assistance at off-site datacenters.  Not all engineers are equal, and trying to talk someone through locating an interface and changing a cable can sometimes be as efficient as banging your head against the wall.

If a cable change or SFP swap is all that is needed, then beaconing can help out!  Below is a quick run-through of enabling and disabling beaconing, a.k.a. “blinky mode”.
Continue reading ‘Networking / Cisco: MDS 9000 Port Locating Using Beacon’

VMware: Reconnecting “Disconnected” VMWare ESX Server to Virtual Infrastructure (vpx,vmware-hostd)

•October 27, 2008 • 2 Comments

A lot of backups failed overnight, and this was my first clue to a problem in the virtual environment.  I checked the monitoring server and there were no down VMs or alerts coming from the hosts.  After logging into the management system through the Virtual Infrastructure Client, I saw that all the virtual machines were running, but about 30 showed “disconnected”.  These VMs were shown in a muted, dimmed color and were unmanageable.  There was also an alarm on the connection state of our second VMware ESX server.

Image of the Infrastructure Client showing VMs in disconnected state.  VM names were replaced with underlines.

View when the malfunctioning ESX server is selected

Below is a snippet of the log showing an error from vpxa:

[2008-10-27 16:41:39.482 'App' 3076446112 verbose] [SchedulePolling] Last stats polling used [0] ms
[2008-10-27 16:41:39.482 'App' 3076446112 verbose] [VpxaHalCnxHostagent] Creating temporary connect spec: localhost:443
[2008-10-27 16:41:39.483 'App' 3076446112 error] [VpxaHalCnxHostagent] Failed to discover namespace: Connection refused

[2008-10-27 16:41:39.483 'App' 3076446112 warning] [VpxaHalCnxHostagent] Could not resolve namespace for authenticating to host agent

Comparing the vmware-* process listing against a working VMware server showed that vmware-hostd was running there.  On the server having problems, it was not.

# ps -auxc | grep vmware
root      2179  0.0  0.0  4256  228 ?        S    Sep29   0:00 vmware-watchdog
root      2823  0.0  0.1  4268  444 ?        S    Sep29   0:00 vmware-watchdog
root      2907  0.0  0.2  4256  568 ?        S    Sep29   0:00 vmware-watchdog
root      5801  0.0  0.4  4204 1144 ?        S    16:41   0:00 vmware-watchdog

At first I could not determine which init.d script starts vmware-hostd.  Update: it is the /etc/init.d/mgmt-vmware script that calls vmware-hostd.

Continue reading ‘VMware: Reconnecting “Disconnected” VMWare ESX Server to Virtual Infrastructure (vpx,vmware-hostd)’

VMWare: ESX Server Partitioning

•October 31, 2008 • 1 Comment

I have had a few servers core dump and drop over 5 gigs of data into /var/core.  Previously, per “best practices”, a vendor recommended around 4 gigs for /var.  I originally upped that to 6 gigs, but after 2 servers had /var 100% utilized, I am revising that.  /var is still 6 gigs, but /var/core has been broken out into its own mount point.  15 gigs is a little high, but these servers have mirrored (RAID 1) 73 gig hard drives.  At least now, if a server core dumps, it will affect only that mount point.  I highly recommend doing this!

Below is how I am partitioning ESX servers during local installs from now on:

Mount Point   Size (MB)   Partition Type
/boot               250   Primary
/                 10240   Primary
swap               1600   Primary *max
/var               6142   Extended
/var/core         15360   Extended
/opt               2048   Extended
/home              2048   Extended
/tmp               1024   Extended
vmkcore             109   *max

Note: *max indicates that the partition is at the maximum size allowed by ESX v3.x.  The total size of this install is ~38 gigs (the sizes above add up to 38,821 MB).  If you have smaller drives, just decrease /var/core.  I would keep all the rest close to the same.
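
As a quick check of the total, the sizes in the table above can be summed with awk.  This is only a minimal sketch: the function name total_mb is made up for illustration, and the numbers are copied by hand from the table.

```shell
# Sum the partition sizes (in MB) listed in the table above.
# total_mb is an illustrative helper name, not part of any ESX tooling.
total_mb() {
  printf '%s\n' 250 10240 1600 6142 15360 2048 2048 1024 109 |
    awk '{ sum += $1 } END { print sum }'
}

total_mb   # prints 38821
```

Dividing 38821 by 1024 gives roughly 37.9, so a pair of 73 gig drives in RAID 1 still leaves plenty of room for a local VMFS volume.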

Update:  Here is an example of why /var/core now has its own filesystem

[user@vm user]$ df -h

Filesystem  Size   Used  Avail  Use%  Mounted on
/dev/sda2   9.9G   1.2G   8.3G   13%  /
/dev/sda1   244M    27M   205M   12%  /boot
/dev/sda8   2.0G   341M   1.6G   18%  /home
/dev/sda7   2.0G    81M   1.8G    5%  /opt
none        391M      0   391M    0%  /dev/shm
/dev/sda9  1012M    33M   928M    4%  /tmp
/dev/sda6   6.0G   327M   5.3G    6%  /var
/dev/sda5    15G    15G      0  100%  /var/core

Never lost vpxa or mgmt-vmware (hostd) this time!

Linux: Generating Strong Passwords Using random/urandom

•January 7, 2009 • 2 Comments

Occasionally, I find myself logged into a system that does not have a random password application installed, and I do not want to go to the trouble of downloading one.  Below is the easiest process that I have found to generate a pretty random password on any Linux variant.

To begin, straight from the Linux man page:

/dev/random
When  read,  the /dev/random device will only return random bytes within the estimated number of bits of noise in the entropy pool.  /dev/random should be suitable for uses that need very high quality randomness  such  as  one-time  pad  or key generation.  When the entropy pool is empty, reads from /dev/random will block until additional environmental noise is gathered.

/dev/urandom
A read from the /dev/urandom device will not block waiting for more entropy.  As a result, if there is not sufficient entropy in the entropy pool, the returned values are theoretically vulnerable to a cryptographic attack on the algorithms used by the driver.  Knowledge of how to do this is not available in the current non-classified literature, but it is theoretically possible that such an attack may exist.  If this is a concern in your application, use /dev/random instead.

So basically, per the man page, /dev/random gives the strongest and most random characters.  The only downside is the wait, unless you have a lot of noise or specific hardware to accelerate the process.

* I tested a cut and paste from this page and some of the lines did not work correctly due to either the CSS or WordPress doing something weird with the ‘ and ` symbols.  So if one of the strings does not work for you, try deleting the ‘ and adding it back in.

Creating four random passwords that are 10 characters long and contain no special characters:

$ cat /dev/urandom| tr -dc 'a-zA-Z0-9' | fold -w 10| head -n 4
z4w7RENNIn
ZOYg80cuQx
Kgm6IrS5wc
F741uiEXl6

Creating passwords which DO contain special characters and are 12 characters long.  The grep at the end might seem a little redundant, but depending on how short your character length is (using fold), urandom can produce strings with no special characters.  Grep drops those strings from the output.

$ cat /dev/urandom| tr -dc 'a-zA-Z0-9-_!@#$%^&*()_+{}|:<>?='|fold -w 12| head -n 4| grep -i '[!@#$%^&*()_+{}|:<>?=]'
a(PYY5oid#2Z
>s#e)C5Kl=kc
63r)WBt9Y)^J
2_a5RLJV<CZH
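
One side effect of the pipeline above is that grep runs after head has already picked four lines, so it can print fewer than four passwords when a candidate has no special character.  Below is a minimal sketch of a variant that filters first and counts afterwards.  The function name gen_passwords and its two arguments are hypothetical, LC_ALL=C is my own addition to keep tr and grep byte-oriented on UTF-8 locales, and the hyphen is added to the grep set so that it matches the tr set.

```shell
# Print COUNT passwords of LENGTH characters, each with at least one
# special character.  gen_passwords is an illustrative name only.
gen_passwords() {
  length=$1
  count=$2
  # Keep only the allowed characters from the random byte stream,
  # cut the stream into LENGTH-wide lines, drop lines that have no
  # special character, and stop after COUNT surviving lines.
  LC_ALL=C tr -dc 'a-zA-Z0-9!@#$%^&*()_+{}|:<>?=-' < /dev/urandom |
    fold -w "$length" |
    LC_ALL=C grep '[!@#$%^&*()_+{}|:<>?=-]' |
    head -n "$count"
}

gen_passwords 12 4
```

Because head is now the last stage, the requested number of passwords is always printed, and the earlier stages simply stop once head exits.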

Continue reading ‘Linux: Generating Strong Passwords Using random/urandom’

Linux: lsof Interaction With Networking

•February 3, 2009 • Leave a Comment

Most Linux and Unix people are already using lsof to determine what devices or files a process has open. Here, I will go over using lsof to pull networking-relevant information. We will be using the ‘-i’ option with lsof throughout this article. Below is a quote directly from the lsof man page.

“This option selects the listing of files any of whose Internet address matches the address specified in i. If no address is specified, this option selects the listing of all Internet and x.25 (HP-UX) network files. If -i4 or -i6 is specified with no following address, only files of the indicated IP version, IPv4 or IPv6, are displayed.”

List all processes communicating over port 80 (http), using any protocol:
root@laptop22:~# lsof -i :80

COMMAND     PID USER      FD TYPE DEVICE SIZE NODE NAME
apache2    5989 root      3u IPv6  20339      TCP  *:www (LISTEN)
apache2    6059 www-data  3u IPv6  20339      TCP  *:www (LISTEN)
apache2    6060 www-data  3u IPv6  20339      TCP  *:www (LISTEN)
apache2    6061 www-data  3u IPv6  20339      TCP  *:www (LISTEN)
apache2    6063 www-data  3u IPv6  20339      TCP  *:www (LISTEN)
apache2    6064 www-data  3u IPv6  20339      TCP  *:www (LISTEN)
TweetDeck 12074 usern221  9u IPv4 228736      TCP  172.16.1.100:46847->cup-www.apple.com:www (CLOSE_WAIT)
TweetDeck 12074 usern221 10u IPv4 228821      TCP  172.16.1.100:39635->ks357799.kimsufi.com:www (ESTABLISHED)
TweetDeck 12074 usern221 11u IPv4 229694      TCP  172.16.1.100:58130->gs01.gridserver.com:www (CLOSE_WAIT)
TweetDeck 12074 usern221 14u IPv4 464706      TCP  172.16.1.100:50025->s3.amazonaws.com:www (ESTABLISHED)
TweetDeck 12074 usern221 15u IPv4 464707      TCP  172.16.1.100:50026->s3.amazonaws.com:www (ESTABLISHED)
firefox   14945 usern221 43u IPv4 465318      TCP  172.16.1.100:50796->yw-in-f17.google.com:www (ESTABLISHED)

More from the man page:
“An Internet address is specified in the form (Items in square brackets are optional.):

[46][protocol][@hostname|hostaddr][:service|port]

where:
46        specifies the IP version, IPv4 or IPv6 that applies to the following address. ‘6’ may be specified only if the UNIX dialect supports IPv6. If neither ‘4’ nor ‘6’ is specified, the following address applies to all IP versions.
protocol  is a protocol name – TCP or UDP.
hostname  is an Internet host name. Unless a specific IP version is specified, open network files associated with host names of all versions will be selected.
hostaddr  is a numeric Internet IPv4 address in dot form; or an IPv6 numeric address in colon form, enclosed in brackets, if the UNIX dialect supports IPv6. When an IP version is selected, only its numeric addresses may be specified.
service   is an /etc/services name – e.g., smtp – or a list of them.
port      is a port number, or a list of them.”

Now let’s pass a few more parameters to lsof. Below, I am looking for connections listening on or bound to the SSH server on port 22 over TCP:
root@laptop22:~# lsof -i tcp:22

COMMAND     PID USER      FD TYPE DEVICE SIZE NODE NAME
sshd       4709 root      3u IPv6  15052      TCP  *:ssh (LISTEN)
sshd       4709 root      4u IPv4  15054      TCP  *:ssh (LISTEN)
ssh       10268 root      3u IPv4 557118      TCP  localhost:46828->localhost:ssh (ESTABLISHED)
sshd      10270 root      3r IPv4 557119      TCP  localhost:ssh->localhost:46828 (ESTABLISHED)
sshd      10347 usern221  3u IPv4 557119      TCP  localhost:ssh->localhost:46828 (ESTABLISHED)

Continue reading ‘Linux: lsof Interaction With Networking’

VMware: ESX Restart A Hung Virtual Machine

•February 9, 2009 • Leave a Comment

This is a last-resort post.  I recommend always starting with the GUI.  If that doesn’t work, use vmware-cmd to try to stop the VM.  If all else fails, follow the steps below.  First, you need to know which ESX server the VM is running on.  The easiest way is to click on the VM you are looking for and go to the Summary tab.  The ESX server it resides on will be in the “General” section under “Host”.  The other method is to connect to each of the ESX hosts and run the following:

You have to su to root for this command to work

[root@esxserver root]# vmware-cmd -l
/vmfs/volumes/4844ace-b448s7d4-74e-000055858ed/exchangeserv/exchangeserv.vmx
/vmfs/volumes/4844ace-b448s7d4-74e-000055858ed/sql01/sql01.vmx
/vmfs/volumes/4844ace-b448s7d4-74e-000055858ed/domaincontrol01/domaincontrol01.vmx
/vmfs/volumes/4844ace-b448s7d4-74e-000055858ed/tstxp04/tstxp04.vmx
/vmfs/volumes/4844ace-b448s7d4-74e-000055858ed/monitor5/monitor5.vmx

This command lists all the virtual machines registered on the individual ESX host. Also displayed is the full path to the vmx file, which is very useful when using vmware-cmd.

Now let’s pass -axfww to ps. This will give us the

Continue reading ‘VMware: ESX Restart A Hung Virtual Machine’

Linux/Security: Gathering Filesystem,Device, and Port Process Details With fuser

•March 5, 2009 • Leave a Comment

When troubleshooting a *nix box, a working knowledge of file system, network, and process utilities is a necessity. The main ones for me are mount, lsof, dd, ps, fsck, netstat, tcpdump, and fuser. All of these tools are very basic, but most admins seem to not know about or utilise fuser. All of fuser’s functionality can be accomplished by combining the other commands above. In the examples below, the same end results can be achieved by using kill and lsof together, but why not just use one tool?

To begin, we will be passing fuser the -m switch, which accepts either a mounted device or a file (system).

       -m     name specifies a file on a mounted file system or a block device
              that is mounted. All processes accessing files on that file
              system are listed. If a directory file is specified, it is
              automatically changed to name/. to use any file system that might
              be mounted on that directory

In its basic form, fuser will provide only the process id(s) (PID) that are currently utilising the specified file/device

root@testbox:~# fuser -m /media/disk/
/dev/sdb1:            5535c  5589c

The above example showed that processes 5535 and 5589 are currently utilizing something on the /media/disk file system (the trailing “c” means it is their current directory). Below is the same example, but specifying the actual device that is mounted on /media/disk.

root@testbox:~# fuser -m /dev/sdb1
/dev/sdb1:            5535c  5589c
Continue reading ‘Linux/Security: Gathering Filesystem,Device, and Port Process Details With fuser’

Linux/Networking/Security: TFTP Deamon Setup and Cisco Configuration Backup

•March 31, 2009 • Leave a Comment

This is just a quick walk-through on setting up the TFTP service on a Red Hat, CentOS, or Fedora system. In general, this process should transfer over to other Linux (not BSD!) derived distributions.

[root@tftpsrv ~]# yum install tftp-server
Resolving Dependencies
--> Running transaction check
---> Package tftp-server.i386 0:0.42-3.1.el5.centos set to be updated
--> Processing Dependency: xinetd for package: tftp-server
--> Running transaction check
---> Package xinetd.i386 2:2.3.14-10.el5 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

Continue reading ‘Linux/Networking/Security: TFTP Deamon Setup and Cisco Configuration Backup’

Networking / Colocation: Baetech ATS (Automatic Transfer Switch) testing for Cisco/single Powered Devices

•April 6, 2009 • Leave a Comment

Below details the testing of a Baetech ATS-11 series (ATS18A-30) ATS (Automatic Transfer Switch). The networking rack houses some devices that contain only one power supply. The main focus of testing this unit was to provide failover power capabilities to our stacked network switch cluster (Cisco 3750G). A few lower-amperage options were out there, but they would not work for our implementation. Cisco Redundant Power Systems were also evaluated, but were limited in their abilities. From what I remember, 6 devices could be plugged in, but the RPS would only be able to power 3 of the 6 devices. This would mean we would have to use 2-3 of these units to actually be fully backed up.

ATS18A-30
Continue reading ‘Networking / Colocation: Baetech ATS (Automatic Transfer Switch) testing for Cisco/single Powered Devices’

Linux, Unix, NAS, File Systems: Inodes (Part 1) – Checking Availability And High Level Overview

•April 10, 2009 • Leave a Comment

The inode count really tells you how many files can be created on a file system. Most people will never exceed the default limit set when the file system is created, or even know that one is set. I will eventually go into more detail concerning this topic here on the blog. The majority (not all) of file systems that are used on Linux and Unix do not support dynamic inode allocation. What this means is that if you exhaust the inode limit of a file system before the storage space, the remaining space will be unusable until some of the current files are removed.

So here is the really bad part. The inodes on ext2 and ext3 (the Linux default types) are statically set when the file system is formatted. You cannot go back and change the max inode setting. The exceptions to this that I know of are as follows:

- Reiser4
- VxFS
- XFS
- JFS
- WAFL (NetApp proprietary)
- ZFS

If you are running one of the above and hit a max inode issue, you can correct it.

I have been working with computers for over 15 years and have only run into this problem once. Luckily, it occurred on a NetApp NAS device that had the ability to increase this value on the live file system. The main killer here is tons of small files. In this case, the file system for that NFS share was 40 gigabytes in size and the default limit was ~1 million inodes. The quick fix for the issue was to increase this to 3 million.

As far as ext2 and ext3 go, the following shows how to query a file system for relevant inode information
Continue reading ‘Linux, Unix, NAS, File Systems: Inodes (Part 1) – Checking Availability And High Level Overview’

Linux/Unix/File Systems: Inodes (Part 2) – File Level Inode Information And Removal

•April 13, 2009 • Leave a Comment

Inodes “Part 1” went into locating filesystem level inode information. Here we will move from the main filesystem to the individual file. Besides reviewing how the inode record reflects ownership changes made with chown, file removal based on inode number will be covered.

Create a test file

user01@testsrv:~$ touch testfile

Below, “ls” is used to display the inode number (2009418) of the test file

user01@testsrv:~$ ls -i /home/user01/testfile
2009418 /home/user01/testfile
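
As a minimal sketch of removal by inode number, find can match a file with its -inum test.  Everything here is illustrative: the scratch directory comes from mktemp so nothing real is touched, and the variable names workdir and inode are my own.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
touch "$workdir/testfile"

# The inode number is the first field printed by "ls -i".
inode=$(ls -i "$workdir/testfile" | awk '{ print $1 }')

# Remove whichever file under workdir carries that inode number.
# -xdev keeps find on one file system, since inode numbers are only
# unique within a single file system.
find "$workdir" -xdev -inum "$inode" -exec rm -- {} +

ls -A "$workdir"   # prints nothing once the file is gone
rmdir "$workdir"
```

Matching on the inode instead of the name is mostly useful when a file name contains characters that are awkward to type or quote.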
Continue reading ‘Linux/Unix/File Systems: Inodes (Part 2) – File Level Inode Information And Removal’

DataDomain/NAS/Filesystems/Linux: Remove Files From A DataDomain’s /ddvar

•April 16, 2009 • Leave a Comment

A few days ago we received alerts for the /ddvar filesystem from one of our Data Domain units. Normally, I would not manually remove any files from this filesystem, since it is mainly used by the underlying OS and not for NAS storage. In this case, the problem was the “core” subdirectory. I tried to remove the files from a Windows machine, but due to permission issues, I was unable to do so. Even though I was able to modify the permissions of the file(s) through Windows (CIFS), I would still receive a permission denied error when trying to remove them. The quick solution here was to add my Linux box to the NFS access list, mount “/ddvar”, and remove the files as the root user. Below details the process that worked for me.

DataDomain Web Alert

Continue reading ‘DataDomain/NAS/Filesystems/Linux: Remove Files From A DataDomain’s /ddvar’

SAN/Networking/Linux: Using Multipath To Verify and Troubleshoot Connectivity To FC LUNs

•May 11, 2009 • Leave a Comment

A few days ago there was an error on one of our Cisco MDS 9120 Fibre Channel switches. The current environment at this datacenter consists of two Cisco MDS 9120 SAN switches with servers redundantly connected between the two. These switches are used to connect the servers to our Fibre Channel (FC) storage systems. In this case, the servers are generally mapped to an EMC, a NetApp, and two RamSans. Below is a basic example of what can be expected from multipathing in a Linux environment when connectivity is lost to one leg of the fiber network.

Continue reading ‘SAN/Networking/Linux: Using Multipath To Verify and Troubleshoot Connectivity To FC LUNs’

SAN / EMC: CX4 DAE (Drive Shelf) Information

•May 29, 2009 • 4 Comments

This will not get very detailed, but I figured I would share the following information.  Unhappy with the typical “each shelf has a 4 Gig interconnect” statement, I kept checking until there was a better answer.  So, anyone working with EMC SANs typically knows that every shelf is connected to each SP (Storage Processor – 2 per SAN), daisy chained in a specific loop, and assigned a shelf id.  Next is the LCC.

Continue reading ‘SAN / EMC: CX4 DAE (Drive Shelf) Information’

VMware/Linux/ESXi: Running ESX4i From Bootable USB

•June 1, 2009 • 3 Comments

I like running ESXi by booting from USB 2.0 memory sticks.  This makes things that much easier for my home lab, especially since I mainly use iSCSI VMFS datastores.  Not to mention that “in a pinch”, having ESXi on memory sticks can aid in disaster recovery (DR) scenarios for small businesses.  Of course, the requirement here is that the server MUST be able to boot from USB!  Also, get a big memory stick.  Each time an upgrade is performed to ESXi, the version being upgraded from is still stored on the memory stick in case a “roll back” is needed.  At least this is my understanding.  Larger memory sticks are pretty cheap now.  Below outlines the steps to create a bootable ESX4i memory stick.  The main reason for me writing this up is that the process turns out to be different from ESX3i.

Continue reading ‘VMware/Linux/ESXi: Running ESX4i From Bootable USB’

Linux / Security: Iptables CLI – List Rules Without DNS Resolution

•June 2, 2009 • Leave a Comment

This is quick and a little basic, but most people do not actually read the “man pages” or documentation.  The majority of the time, requests for access come in specifying IP addresses instead of hostnames (FQDN).  I actually prefer this, but when doing a typical “iptables -L”, reverse DNS is automatically checked for all IPs.

Most of the time I do not actually know the associated hostname, which makes it hard to confirm the rule without doing a DNS lookup of my own.  Below is the typical output of the command.
Continue reading ‘Linux / Security: Iptables CLI – List Rules Without DNS Resolution’

SAN / EMC: Clariion CX4 Solid State DAEs (Shelves)

•June 3, 2009 • Leave a Comment

Going over the solid state offerings for the EMC Clariion lines, Texas Memory RamSans came into the conversation.  This was due to the fact that we currently run 2 RamSans in our environment and consider them the highest tier of storage in our datacenters.  One is 128 gigs of solid state DRAM storage and the other is 2 terabytes of solid state Flash storage with a 64 gig DRAM cache.

Per the title, this is really about the EMC Clariion, not RamSans.  Since the RamSan 500 was fronted with the DRAM cache, and the EMC CX4 series contains cache as well, I was curious.  I already knew that each Storage Processor (SP) in the EMC has 4 gigs of cache, and that a LUN can only be active on one SP at a time.  Also, per a previous blog post, each DAE has a theoretical max throughput of 8 gigabits per second, or 4 gigabits if a single LUN stripes across the whole shelf.

Continue reading ‘SAN / EMC: Clariion CX4 Solid State DAEs (Shelves)’

SAN / Storage: Texas Memory RamSan 500

•June 4, 2009 • 1 Comment

After spending a year using a RamSan 400, which is a 128 gigabyte solid state DRAM system, I wondered how it could get better.  These things are pretty expensive and the only drawback I found with the 400 is the limited storage.  There’s not much that you can do with 128 gigs of storage!  Granted, it served its initial function perfectly: housing our main production database.

Continue reading ‘SAN / Storage: Texas Memory RamSan 500’