GNS3 – My Technical Test for Employers

I had an interesting requirement from a customer.

They asked me to implement a GNS3 server and design a 30 – 45 minute test that they could administer to candidates seeking a technical position within their organisation.

The position was for a 2nd line network engineer role.

The test was to be aimed at CCNP level or equivalent. However, I was to design it so that it would be easy enough for even a rusty CCNP-level engineer, yet hard enough to catch the complete brain dumpers and blaggers who serve only to dilute the IT industry with poor-quality skill sets.

The main purpose of this test is to speed up the HR process by filtering the stronger candidates from the weaker candidates.

The successful candidates would then go ahead and progress to a final interview.

Here is the GNS3 test that I created for the purpose – To be honest, even a highly proficient CCNA engineer could do this :-)

The .net files and pre-configuration files can be downloaded from here

————————————–

Configure the network as per the diagram and complete the tasks below.

  • Do NOT create any additional interfaces
  • Do NOT use any static routes or policy-based routes unless asked.
  • Ignore any duplex mismatch messages and do NOT modify any port’s speed or duplex configuration.

Task 1: Configure an 802.1q trunk between SW1 fa1/15 interface and SW2 fa1/15 interface.

Task 2: Configure a static ether-channel 802.1Q trunk between SW1 and SW2. The fa1/10 and fa1/11 interfaces on both switches should be members of the same LAG.

Task 3: Ensure all VLAN traffic successfully goes over the fa1/15 trunk and NOT the ether-channel trunk unless the fa1/15 trunk is down. Do NOT use backup interface to accomplish this.

 

Task 4: R1 and R2 should be put into VLAN 10 and should be able to ping each other’s fa0/0 interfaces. You must use the legacy vlan database command to create the VLAN.

Task 5: Ensure VLAN 20 traffic is never permitted to traverse the fa1/15 trunk should that trunk link become the active trunk link.

Task 6: Ensure SW1 has the highest probability of always being the root bridge for VLAN 10, even if another switch is introduced into the network.

 

Task 7: Configure OSPF area 0 between R1 and R2. OSPF hellos should be sent ONLY out of the interfaces on their shared connected subnet. Ensure R1’s loopback 0 interface can ping R2’s loopback 0 interface.

 

Task 8: Configure EIGRP 100 on the connected links between R1 & R3 and R2 & R3 ONLY. Ensure R3 REDISTRIBUTES its loopback 0 interface ONLY.

Task 9: Mutually redistribute between OSPF and EIGRP on R1 and R2. All routers (R1, R2, and R3) should be able to ping each other’s loopback 0 interfaces.

 

Task 10: Without using static routes or policy-based routing, ensure R2 is able to traceroute to R3’s loopback 0 interface over its directly connected link. Don’t worry about affecting the optimal routing of other routes.

Task 11: You must deny R1 from being able to telnet to R2 only if R1 sources the telnet request from its fa0/0 IP address.

 

Bonus Task: On R2, redistribute RIP into OSPF and RIP into EIGRP. Ensure you account for any potential loops. All routers should have full reachability to each other’s loopback 0 interfaces, including R4’s loopback 0 interface.

 

PDW – Project Definition Workshop

I am working on an opportunity which requires a strategic relationship with other solution providers to cover elements that we may not necessarily be able to handle ourselves.

The company we’ve partnered with uses a proven IBM methodology known as PDW to scope out the requirements of a project.

The customer may think they know what they need, but the PDW is a structured approach to identifying exactly what the customer really needs. It covers the following agenda in a very interactive format that involves a BIG room, lots of technical bodies with varied skill sets, Blu-Tack, A3 paper, sticking things on the wall and really getting everyone to put on their thinking caps.

 

1. Introductions
2. Objectives of the Meeting
3. Project Background and Mission Statement
4. Goals
5. Critical Success Factors
6. End of Project (or phase) Status
7. Non Technological Factors
8. System Design
9. Work Activities
10. Milestones
11. Risks and Assumptions
12. Immediate Actions
13. Summary and Review

It was an extremely useful exercise for both me and the prospect, who feels they have learned much more about their infrastructure and its issues.

I will indeed incorporate a lot of the elements contained in the PDW exercise into my own methodology of scoping and defining a project.

 

Lunch and Learn

I’ve started off a new initiative in my company – Lunch and Learn.

It’s simple.

Have the sales team in the board room where they can eat their lunch and I’ll talk about some of the ‘sweet spots’ in technology.

No PowerPoint slides, no technical jargon – Just a very high level overview of various topics with the aim of repackaging complex technical topics for a sales audience.

Today’s session: ‘The Cloud’

What benefits the ‘Cloud’ brings to the enterprise and some of the concerns enterprises have around ‘bursting into the cloud’.

I then talked about some of the new features in the NetScaler appliance – SDX, Cloud Bridge and Datastream.

The session went well and the Eureka moments for each of those sales folks were highly gratifying.

It was very interactive, highly Q&A orientated and quite an informal and entertaining way of keeping abreast of the world of technology.

Feedback has been very positive with a lot of the folks exclaiming that they ‘finally get it’.

It’s important that they do so they can go ahead and better identify opportunities.

I look forward to the next session.

Some emails I have received:

“I think it was great… learned a lot in a short time..”

“Yes thanks mate excellently explained, really ‘got it’ – Need more like this!”

Citrix NetScaler 9.3 Features

Getting my head around some of the new features in 9.3

Data Stream

This new feature allows the NetScaler to support MySQL and Microsoft SQL natively. Prior to this feature it was only possible to load balance across database servers that had identical content, using an LB Vserver in front of an estate of backend database servers. In the real world that was not always possible because of how organisations organically grew this particular part of their infrastructure. Now the NetScaler can parse SQL transactions and make decisions based on real SQL values rather than on the IP address, protocol and port used by traditional load-balancing logic.

The NetScaler can now content switch and load balance based on SELECT (reads), INSERT (writes) and other values. Understanding SQL natively allows the NetScaler to multiplex the TCP connections to the backend, as opposed to the previous one-to-one connection method, essentially offloading this work from the servers.
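As a rough illustration of the read/write switching idea, here is a minimal Python sketch. It is not NetScaler configuration: the pool names and the rule "SELECT goes to the read pool, everything else goes to the write pool" are assumptions made purely for the example.

```python
# Hypothetical pool names, used only for illustration.
READ_POOL = "db-read-pool"
WRITE_POOL = "db-write-pool"


def choose_pool(sql: str) -> str:
    """Pick a backend pool from the leading SQL verb of a statement.

    SELECT statements go to the read pool; anything else (INSERT,
    UPDATE, unknown or empty input) goes to the write pool, which is
    the safer default in this sketch.
    """
    stripped = sql.strip()
    verb = stripped.split(None, 1)[0].upper() if stripped else ""
    return READ_POOL if verb == "SELECT" else WRITE_POOL


if __name__ == "__main__":
    for statement in ("SELECT * FROM users", "insert into users values (1)"):
        print(statement, "->", choose_pool(statement))
```

The point is simply that the decision is made on the content of the SQL request rather than on IP address, protocol and port.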

Some additional notes:

  • MySQL Authentication is done on the NetScaler before any communication to the backend is opened
  • TCP least connections is recommended for DB load balancing
  • Content Switching can use non-ip addressed LB Vservers
  • Split tables across many DB Servers – NS can work in this environment as it understands SQL natively
  • Intelligent monitor can use MYSQL.RESPONSE to gauge latency per connection and redirect new transactions to a better performing pool of backend servers
  • Excellent white paper here
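
For anyone unfamiliar with the least connections method mentioned in the notes above, a minimal Python sketch follows. The server names and connection counts are made up, and the tie-break rule (alphabetical order) is an assumption for the example rather than NetScaler behaviour.

```python
from typing import Dict


def least_connections(active: Dict[str, int]) -> str:
    """Return the server with the fewest active connections.

    Names are sorted first so that ties are broken alphabetically
    (min() returns the first minimal element it encounters).
    """
    if not active:
        raise ValueError("no servers available")
    return min(sorted(active), key=lambda name: active[name])


if __name__ == "__main__":
    # Hypothetical connection counts per database server.
    print(least_connections({"db1": 12, "db2": 7, "db3": 9}))
```
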
AppFlow
AppFlow can be used to identify bottlenecks in real time between the request from the client to the VIP, VIP to the backend server, backend server to the VIP and the VIP to the client. It does this by exporting flow data to a collector using the IETF IPFIX standard (similar to NetFlow). Information such as the top 10 bandwidth-consuming VIPs can also be pulled from the AppFlow function.
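
To show the kind of report a collector could build from exported flow data, here is a small Python sketch that totals bytes per VIP and returns the top talkers. The record shape (VIP name, byte count) is a simplification assumed for the example and is not the actual IPFIX record layout.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_vips(flows: Iterable[Tuple[str, int]], n: int = 10) -> List[Tuple[str, int]]:
    """Sum bytes per VIP and return the n largest consumers."""
    totals = Counter()
    for vip, byte_count in flows:
        totals[vip] += byte_count
    return totals.most_common(n)


if __name__ == "__main__":
    # Hypothetical flow records: (VIP name, bytes in that flow).
    sample = [("vip-a", 500), ("vip-b", 1200), ("vip-a", 900)]
    print(top_vips(sample, n=2))
```
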
Cloud Bridge
Cloud bridge can be used to span the VLAN from an on-prem infrastructure to an off-prem infrastructure. Essentially it creates a L2 bridge between the customer’s datacentre and the ‘In-the-cloud’ provider’s front-end e.g. Amazon Cloud services.

Citrix Synergy 2011 – San Francisco

Heading up to San Francisco for Citrix Synergy 2011 next week (May 21st to May 27th 2011). I’ve scheduled myself in for a few labs and keynote presentations around Citrix NetScaler and Citrix XenServer. I attended the Synergy event last year in Berlin and it was great! – Cutting edge technical seminars and invaluable insight into the road map for Citrix and their products – I hope this year surpasses my expectations!

Change of Circumstances – New Job (sort of)

Having done several pieces of consultancy for a particular client in the past, they have now employed me, through my current employer, to work as part of their team permanently for two days a week up until Q4 of this year.

I get to be the Senior Network Engineer for one of the largest online gambling companies in Europe!

Don’t get me wrong, in my current role I do a lot of engineering but I’m also heavily involved in the technical presales / sales engineering, managerial and board-level element of the business.

This new change in circumstance will mean for two days of the week it will be nothing but pure nitty-gritty engineering.

Full on geek.

This opportunity also provides additional enhancement to my resume.

I get to have a real perspective, in a fairly senior and trusted position, on working in one of the most highly pressured IT environments – full throttle! My interaction with technologies and the decisions I make on the network could have the potential to increase the company’s revenue by millions or make the company lose millions in a matter of minutes – No pressure at all! ;-)

So what else?

Working for one of the largest online gambling companies in Europe means I get to play with some cool toys!

From an estate of firewalls, switches and load balancers to technologies such as BGP, Multi-link PPP and MPLS!

I will also be involved in the implementation of two new data centres, which is going to be huge – multi-million pounds’ worth of infrastructure.

My first day working as an employee rather than an external consultant saw me implement a number of firewall changes to provision a new service and manipulate the routing to traverse certain domains of their network.

Being at the forefront of managing and designing highly resilient networks from an end-user perspective, as opposed to an external contractor / consultant perspective, should take my professional development up another level.

Oh and for the three days I work for my current employer, the personal assistant (PA) to the Chief Executive Officer (CEO) has now been assigned to run and manage my diary on my behalf!

Exciting times ahead.

Bluecoat ProxySG Consultancy

Had an interesting consultancy session with a customer who are facing two main challenges with their Bluecoat ProxySG deployment across both their VPN and MPLS networks.

  • Users at two of the 10 sites are complaining of stale web content being delivered to them. It should also be noted that only some users out of the two ‘problematic’ sites are complaining of stale content, which makes this exercise more interesting.
  • Some of the sites have recently been migrated from a VPN network to an MPLS network. Due to the way that the Bluecoats have been set up, MPLS traffic engineering / Quality of Service is not being applied properly, as the Bluecoat appears to be masking the native ports for applications and protocols that it is optimising.

The first part of the exercise was to do a quick sanity check on each Bluecoat ProxySG device across the customer network.

The second part then lists recommendations and explanations for the issues that they are facing.

Site 1: Currently this device is running in a combined Explicit ADN and Transparent http proxy mode to fit the 2 separate requirements of the customer’s deployment.

This device is configured to receive http and https requests from a Cisco ASA firewall device via WCCP, which is used to redirect any web traffic from any of the remote sites (both MPLS and VPN).  Through the investigation of current sessions it is confirmed that both of these modes are working correctly.

This device is configured to be the Primary ADN manager for this deployment. Site 2 is configured as the backup ADN manager. As with all other devices in the ADN network the use of Secure Certificates is enabled.

This device is also configured as a Client Manager and has both Acceleration and URL filtering enabled with multiple subnets defined for network awareness.

Apparently the linked AV device is not currently online.

 

Site 2: This device is configured to perform as an inline http proxy and as an ADN peer to the Main site.  This is a supported configuration and is ideal for the situation it is deployed in, however there are some recommendations that can be made which are discussed in the later section of this report.

This device is also the Backup Manager in the ADN network, which will be utilised if the primary device goes offline.

It has been reported by some users that there are some issues with stale web content being served from this proxy. Following a check on the configuration of the devices it can be confirmed that the settings relating to the retention and serving of web objects are consistent with all other proxy devices in the network. The BlueCoat SG at this site is not the only Proxy in the chain for this deployment. To fully investigate issues of stale content there are further steps required.

 

Site 3: This device is also configured to perform as an inline http Proxy and an ADN peer to the Main site.  Once again there are some recommendations that can be made in relation to the configuration within the network. These are discussed in the later section of this report.

The Proxy SG device is configured to use Site 1 Blue Coat device as its primary ADN manager and Site 2 as its backup manager, which is consistent with the majority of the devices in the network.

Like the Site 2 device, it has been reported by some users that there are some issues with stale web content being served from this proxy. Following a check on the configuration of the devices it can be confirmed that the settings relating to the retention and serving of web objects are consistent with all other proxy devices in the network. However, like the Blue Coat SG at Site 2, it is not the only Proxy in the chain in this deployment. To fully investigate issues of stale content there are further steps needed.

 

Site 4: The Configuration of this device is different to the rest of the deployment.  There are a number of Static Bypass entries, which will effectively ensure the listed IP addresses are not proxied or accelerated via the Blue Coat Device.

This device also has an upstream proxy to an address in the 199 address range.

This Blue Coat is also inline, proxying and performing ADN on traffic; however, there are no reports of stale web content being sent to the clients at this site.

 

Site 5: This device is configured to Proxy and use ADN and has not been reported as serving stale content.  Looking at the configuration of the device it would appear that there is an issue with the configuration of the ADN network.  The primary manager is correctly configured to point to Site 1 however the setting for the backup manager is set to “self”.

 

Site 6: This device is a Mach5 (ADN) only license, which is suitable for the acceleration job it is required to perform.  There are only minor configuration changes needed here and these are listed in the general recommendations.  It is noted that this device is on a slightly later version of the SGOS firmware and this forms part of the general recommendations too.

 

Site 7: This device is a Mach5 (ADN) only license, which is suitable for the acceleration job it is required to perform. Once again there are only minor configuration changes needed here and these are listed in the general recommendations.

 

General Recommendations

  • Each of the devices is running an older version of the SGOS firmware (5.4.x.x). Blue Coat introduced major improvements in the ADN technology in version 5.5.x.x and therefore it is recommended that each of these devices be upgraded to version 5.5.4.1. Following this, a move to the new 64-bit operating system, SGOS version 6.x.x.x (current version is 6.1.2.1), will give capacity benefits on the core SG810. NOTE: if you are using IWA authentication, please ensure you update your BCAAA server when upgrading the Blue Coat infrastructure.
  • Eliminate acceleration of Terminal Services. The Services on the remote Blue Coats are set to intercept Terminal Services which will be affected by Negative Gain in this scenario. It is recommended that the Terminal Services service is set to bypass on all Blue Coats.
  • Enable iMap. It is recommended that if using full client Outlook to connect into an Exchange server then iMap service needs to be set to Intercept in the services configuration on all Blue Coat devices.
  • Upstream Proxy – for those sites that are full Proxy editions (all bar Site 6 and Site 7) it would be possible to upstream the local bluecoats directly to the Core SG810 in Site 1. The benefit of using an upstream proxy is that it will take load off the ASA and can also simplify the ASA rulebase. Depending upon the configuration of the ASA WCCP it may also give enhanced caching to the remote sites and reduce bandwidth on the internet connection.
  • Currently the Blue Coat ADN network is running in Explicit mode. In this configuration all acceleration traffic is sent directly to the Core SG810 using a proprietary port, 3305. The benefit of this is that the Core Blue Coat does not need to be inline. The side effect is that all traffic is encapsulated on this port and so cannot be classified by MPLS quality of service. To remedy this, the Blue Coat can be set to work in Translucent mode, which effectively changes the traffic to mimic the host protocol port (i.e. CIFS will show on 139 and 445 etc.). Even though the contents of the data stream will be the accelerated data, the MPLS router will see it on its native port and apply the correct QoS. Testing will need to be performed to ensure this is correctly applied by the MPLS network, and it is recommended that this happen at 1-3 of the smaller sites first.
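
The effect of Explicit versus Translucent mode on port-based QoS can be illustrated with a short Python sketch. Port 3305 and the CIFS ports 139 and 445 come from the recommendation above; the QoS class names and the web ports are assumptions invented for the example, and the real classification happens on the MPLS routers, not in Python.

```python
# Hypothetical port-to-class table standing in for an MPLS QoS policy.
QOS_BY_PORT = {139: "cifs", 445: "cifs", 80: "web", 443: "web"}
ADN_TUNNEL_PORT = 3305  # proprietary ADN port used in Explicit mode


def visible_port(app_port: int, adn_mode: str) -> int:
    """Return the port the MPLS router would see for an accelerated flow."""
    if adn_mode == "explicit":
        return ADN_TUNNEL_PORT  # everything is carried on the tunnel port
    if adn_mode == "translucent":
        return app_port  # traffic mimics the native application port
    raise ValueError("unknown ADN mode: " + adn_mode)


def qos_class(app_port: int, adn_mode: str) -> str:
    """Classify a flow by the port the router can actually see."""
    return QOS_BY_PORT.get(visible_port(app_port, adn_mode), "default")


if __name__ == "__main__":
    for mode in ("explicit", "translucent"):
        print("CIFS (445) in", mode, "mode ->", qos_class(445, mode))
```

In Explicit mode every accelerated flow lands in the default class because the router only ever sees port 3305, which is the behaviour the customer is complaining about.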

 

Individual Issues

  • Site 1 – In Maintenance > Upgrades there appears to be an issue with displaying the currently installed version of the SGOS firmware. It is obvious that the device is functioning correctly, however it is desirable that this reports correctly too. Recommend that the device be rebooted and checked. Should the problem persist, a support call should be raised with Blue Coat TAC.
  • Site 2 – Further testing/troubleshooting of the Stale Website issue to be carried out. Testing should include reports of the web sites involved: is it the same for all users in the office, and can it be tested from another site at the same time?
  • Site 3 – Further testing/troubleshooting of the Stale Website issue to be carried out. Testing should include reports of the web sites involved: is it the same for all users in the office, and can it be tested from another site at the same time?
  • Site 4 – Investigation as to the nature of the Static Bypass entries and why these are needed.
  • Site 5 – Altering of the Backup Manager setting to reflect the rest of the Blue Coats in the network.

 

Device Relocation Recommendation

The Bluecoat proxy SG deployed at Site 1 is currently configured as a node. In this configuration the device can only see traffic that is redirected using WCCP from the ASA firewall or devices explicitly connecting to the Bluecoat such as the Bluecoat proxy SG appliances deployed on remote sites.

In order to handle CIFS traffic it is recommended that a quad port Ethernet fail-open expansion module be purchased and installed into the Bluecoat Proxy SG at Site 1. Once installed it will then be possible to place the Proxy inline, giving it complete visibility of all traffic traversing the 8el and MPLS network.

The above diagram demonstrates the proposed deployment of the Bluecoat Proxy SG. This configuration will both simplify deployment and ensure any device required to perform WCCP would not be adversely affected by having to redirect CIFS traffic. After the proxy is deployed in this configuration all Web and CIFS traffic can now be processed ensuring optimization can be performed.

Pearls of Wisdom

Individuals contribute, but it’s teams that win.

It’s the best teams that win, not the most talented individuals.

It’s about marketing yourself so you can achieve the best you can.

You don’t hope for the best, you don’t pray for it, you visualise yourself doing it.

Without feedback you have no radar system.

The best way of reinforcing your learning is by teaching others how to do it as well or better than you.

What is GSLB?

Today I was asked the following question by a new member of our sales team:

“What’s GSLB?  In the context of advanced load balancing algorithms.”

Below is my response. Bear in mind that the sales member is not at all technical, so my response is more conceptual than technical.

——————————————————————————————–

Global Server Load Balancing allows a customer to achieve server load balancing between multiple sites irrespective of location.

So if a customer has two sites:

  • Site A – London
  • Site B – Tokyo

Example 1: London is the primary site and Tokyo is the backup site. If Site A fails, then GSLB will be the guy who allows Site B to take over / be the primary site.

Example 2: If one server in Site A fails, then any traffic to that server will be redirected to the same server in Site B and all other traffic to other servers (healthy servers) will continue down the course of Site A.

Example 3: You type www.google.com, and this site is hosted in America. GSLB will know that you are based in the UK and, as Google has a local data centre in the UK, will automatically direct you to www.google.co.uk. That means a quicker response, as you are requesting content which is closer to you, and therefore less latency and better performance when rendering web pages.

These are just a couple of examples of what GSLB does. Don’t forget GSLB is a protocol, so for each site, e.g. Google.com and Google.co.uk, you will have a GSLB-capable device such as a NetScaler at both locations, and both NetScalers will talk to each other constantly to be able to make these ‘GSLB’ decisions.

GSLB uses DNS (the Domain Name System, which translates www.google.com into an IP address, which is what a PC understands). So when you type www.google.com, GSLB, through DNS, will give you the IP address of the UK Google site and not the American site.

Let me know if you would like me to expand on anything if the above is not overly clear.
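
For the more technically minded, the DNS-based decision described above can be sketched in a few lines of Python. This is a conceptual toy rather than how a NetScaler implements GSLB: the IP addresses are documentation addresses, and the region labels, the health set and the "prefer the local healthy site, otherwise the primary" rule are assumptions for the example.

```python
from typing import Optional, Set

# Hypothetical sites; the IPs are reserved documentation addresses.
SITES = {
    "london": {"ip": "192.0.2.10", "region": "EU"},
    "tokyo": {"ip": "198.51.100.10", "region": "APAC"},
}


def resolve(client_region: str, healthy: Set[str], primary: str = "london") -> Optional[str]:
    """Return the IP a GSLB-style DNS answer would hand to the client.

    Preference order: a healthy site in the client's region, then the
    healthy primary site, then any other healthy site.
    """
    candidates = [name for name in SITES if name in healthy]
    if not candidates:
        return None  # nothing healthy to answer with
    for name in candidates:
        if SITES[name]["region"] == client_region:
            return SITES[name]["ip"]
    if primary in candidates:
        return SITES[primary]["ip"]
    return SITES[candidates[0]]["ip"]


if __name__ == "__main__":
    print(resolve("EU", {"london", "tokyo"}))  # local site is healthy
    print(resolve("EU", {"tokyo"}))            # London down, Tokyo takes over
```
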

Troubleshooting the impossible – Citrix XenServer High CPU Utilisation

I’m no Citrix expert, in fact I tend to shy away from the Windows and Citrix space altogether, but sometimes you have to roll up your sleeves and get your hands dirty and so I enter the Server world.

For the last two days one of our Citrix XenServers has been playing up. It only stabilises after a reboot, remains fairly stable for about a day and then requires another reboot to bring it back into stability.

Here are my findings from troubleshooting with a bottom-up approach. Again, I could be way off the mark, as troubleshooting networks tends to have a fair degree of logic to it but troubleshooting problematic servers seems to escape logic altogether!

———————————————————————————————————

Issues:

Troubleshooting XenServer 02 (XS02) and using XenServer 01 (XS01) as a frame of reference given its healthy status.

 

Failures

  • Friday 7th January 2011 – 05:14 approx as reported by hostmonitor
  • Saturday 8th January 2011 – 07:30 approx as reported by hostmonitor

 

Switchport

XS02 is connected to G2 on the Dell core switch

XS01 is connected to G1 on the Dell core switch

Show interface status on both ports showed that MDIX mode was enabled on XS01’s port but not on XS02’s. Both are set to auto; I’m not sure if this has any relevance to the issue at hand…

Show interface counters showed no port errors on XS02 / G2; in fact the error counters were 0 (zero), as with XS01.

There appears to be little evidence of switchport errors, as per the above and as per the ICMP tests conducted (results provided below). On both failure occasions there was 0% packet loss when pinging XS02 but a significant number of packets lost when pinging the virtual machines hosted on XS02. It is also worth paying attention to the max and average response time values for both XS02 and the VMs hosted on XS02.

 

Ping Results

XS02:                    206 sent / 206 received / 0% loss / Max 3ms / Avg 0ms

APPO2:                 227 sent / 147 received / 35% loss / Max 4496ms / Avg 1030ms

XAPP01:               233 sent / 154 received / 33% loss / Max 4096ms / Avg 1124ms

Act01:                   143 sent / 83 received / 39% loss / Max 4459ms / Avg 1050ms
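
The loss figures can be recomputed from the sent / received counters with a few lines of Python. This is just a convenience sketch; the counters are the ones listed above, and the recomputed percentages may not match the monitoring tool's reported figures exactly.

```python
def loss_percent(sent: int, received: int) -> float:
    """Packet loss as a percentage of packets sent."""
    if sent <= 0:
        raise ValueError("sent must be a positive number of packets")
    return 100.0 * (sent - received) / sent


# (sent, received) counters copied from the ping results above.
RESULTS = {
    "XS02": (206, 206),
    "APPO2": (227, 147),
    "XAPP01": (233, 154),
    "Act01": (143, 83),
}

if __name__ == "__main__":
    for host, (sent, received) in RESULTS.items():
        print("%-8s %5.1f%% loss" % (host, loss_percent(sent, received)))
```
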

 

XS02 XenCentre

1)      The Disk utilisation on XS02 VM seems to be consistently higher than that of XS01 – Not sure if this has any relevance (See appendix A for screenshots)

2)      CPU 0 on XS02 shot to 90-100% around 7.30am and stayed there, CPU 1 – 3 stayed within normal operation.

3)      All CPUs on XS01 stayed within normal operation – See Appendix B

4)      Shutting down the VMs on XS02 took a considerable amount of time, and even once they were in the shutdown state the CPU 0 utilisation did not change. This eliminates the VMs themselves as the source of the issue and suggests a potentially problematic disk in the RAID setup. This suspicion is given weight by the fact that rebooting XS02 brings CPU 0 back to expected levels of utilisation, which suggests that an error build-up on the disk or RAID is settled / cleared by the reboot.

 

My Comments

There appear to be strong signals suggesting disk drive problems, perhaps a faulty drive in the RAID cluster. It also appears that all the VMs on the disk use CPU 0, which would create a correlation between the potential disk issue and the CPU 0 over-utilisation. What is not clear at this stage is what triggers the disk problems to cause the huge utilisation on CPU 0…

 

Other Notes

  • Each XenServer has 4 CPUs
  • It appears we have loaded all VMs on CPU 0
  • Perhaps CPU 0 is overloaded or is getting knackered now?
  • The network interface of XS02 responds with no packet loss but the network interfaces of the VMs have packet loss. This could potentially be because the virtual NICs are on the disk, so a problem with the disk will mean a problem with the virtual NICs. Each VM has been assigned to CPU 0 whereas XS02 is assigned to CPU 0-3, which further adds weight to this possibility.

 

Further Testing

One of the VMs, Act01, has now been assigned to all 4 CPUs or VCPU.

If this issue occurs again, the same tests outlined at the beginning of this draft report will be conducted on XS02, 3 VMs and the Act01 VM that is now running on all four CPUs.

If Act01 performs significantly better under these conditions then we will need to assign all VMs to all 4 CPUs to cope with the issue when it occurs. This will allow our technical team to focus its efforts on further investigation of the underlying causes.

 

UPDATE 14th Jan 2011

The issue has surfaced once again and pinging ACT01 showed responses of up to 4000ms with a high number of packet drops. It should be noted though that, compared to the VMs using only one vCPU, there were a number of ICMP echo replies at <1ms. This suggests that assigning the VMs to multiple vCPUs does help, but only by a very small amount.

Issuing the ‘top’ command displays a list of processes with a corresponding CPU utilisation value. Interestingly, the ‘cdrommon’ process had a utilisation value of 98.3%.

Killing the cdrommon process then immediately stabilised the XenServer.
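
If you would rather not eyeball the output, the check can be scripted. The Python sketch below picks the busiest process out of a simplified, top-like listing. The sample text is illustrative only: the 98.3% figure for cdrommon comes from the observation above, while the column layout, PIDs and other rows are invented for the example and real ‘top’ output has more columns.

```python
from typing import Optional, Tuple

# Illustrative, simplified listing; not a capture from the real server.
SAMPLE = """\
  PID USER      %CPU COMMAND
 4321 root      98.3 cdrommon
 1200 root       0.7 xapi
    1 root       0.0 init
"""


def busiest_process(text: str) -> Optional[Tuple[str, float]]:
    """Return (command, cpu_percent) for the row with the highest %CPU."""
    lines = text.strip().splitlines()
    if not lines:
        return None
    header = lines[0].split()
    cpu_col = header.index("%CPU")
    cmd_col = header.index("COMMAND")
    best = None
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(cpu_col, cmd_col):
            continue  # skip short or malformed rows
        cpu = float(fields[cpu_col])
        if best is None or cpu > best[1]:
            best = (fields[cmd_col], cpu)
    return best


if __name__ == "__main__":
    print(busiest_process(SAMPLE))
```
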

It appears that my suspicion about the correlation between CPU 0 and the disk drive was correct; however, it is more specifically to do with the CD drive as opposed to the hard drive.

Bizarrely, it seems that the CD drive is trying to write to itself.

After some further diagnostics and information, it appears this is a very well known bug with Citrix XenServer 5.6 and the only workaround is to disable the CD-ROM drive in the BIOS [click here]

On a side note, I find it absolutely atrocious that Citrix have not better informed their customers and partners about this very critical issue. This will surely damage brand reputation and will be the sweet spot for the competition to pick up on.

I can see it now: ‘Mr. Customer, would you like a Citrix XenServer which is riddled with critical bugs, with no real solution or announcement to address the problem, or would you much prefer VMware, which is stable, reputable and has been stamped as proven technology?’

 
