Friday, June 9, 2017

AWS Lambda function to remove terminated instances from Octopus server

Octopus Deploy provides machine policies that allow you to configure your Octopus server to remove machines automatically, but it only removes a machine once it is considered unhealthy.  This is normally fine, but what if you want the cleanup to be near real-time, especially if you launch a lot of instances using Auto Scaling Groups?  Or what if you need to troubleshoot an instance, but you don't necessarily want it removed from Octopus when the health check fails?  A better and cleaner approach is to remove the machine from Octopus only when the instance is terminated in AWS.  The following steps will help guide you in creating a Lambda function that removes machines from Octopus when they are terminated in AWS.

NOTE!  This assumes the names of the machines in Octopus use the AWS instance ID. This post provides a process to do just that using Chef:  using-chef-to-automate-octopus-deployments

1. Create an IAM role to grant permissions to the Lambda

  Attach one of these managed policies to the role: 
       AWSLambdaVPCAccessExecutionRole (Lambda in a VPC)
       AWSLambdaBasicExecutionRole (non-VPC Lambda)

   Grant the IAM role permission to decrypt using KMS, restricted to your KMS key:
    kms:Decrypt
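
   Granting this might look like the following policy statement sketch (the key ARN is a placeholder for your own):

    {
      "Effect": "Allow",
      "Action": "kms:Decrypt",
      "Resource": "arn:aws:kms:us-east-1:123456789012:key/your-key-id"
    }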

   Edit the Trust Relationship to allow the Lambda to AssumeRole:
   
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}



2. Create the Lambda function

Runtime: Node.js 6.10

Code:  Here's a link to the function code, lambda-octopus-cleanup-terminated-machines. Copy the code into the Lambda.
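
If you just want the shape of the function, here is a minimal sketch of what the handler might look like (not the exact linked code). It assumes machine names in Octopus match the EC2 instance ID, and that the apikey (KMS-encrypted, base64) and octopusServer environment variables are set as described below:

'use strict';
const AWS = require('aws-sdk');
const https = require('https');

const kms = new AWS.KMS();

// Small helper to make an HTTPS request and collect the response body
function request(options, callback) {
    const req = https.request(options, (res) => {
        let body = '';
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => callback(null, body));
    });
    req.on('error', callback);
    req.end();
}

exports.handler = (event, context, callback) => {
    const instanceId = event.detail['instance-id'];
    // Decrypt the Octopus API key that was encrypted with the KMS key
    kms.decrypt({ CiphertextBlob: Buffer.from(process.env.apikey, 'base64') }, (err, data) => {
        if (err) return callback(err);
        const headers = { 'X-Octopus-ApiKey': data.Plaintext.toString('ascii') };
        const host = process.env.octopusServer;
        // Find the machine whose name matches the terminated instance ID
        request({ host: host, path: '/api/machines/all', headers: headers }, (err2, body) => {
            if (err2) return callback(err2);
            const machine = JSON.parse(body).find((m) => m.Name === instanceId);
            if (!machine) return callback(null, 'No machine named ' + instanceId);
            // Remove the machine from the Octopus server
            request({ host: host, path: '/api/machines/' + machine.Id, method: 'DELETE', headers: headers }, callback);
        });
    });
};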

Environment variables:
  apikey =  an API key from your Octopus server that has permissions to view and delete machines.  Encrypt the API key using the encryption helper on the Lambda setup page
  octopusServer =  <this is the DNS name for your server,  ie:  octopus.yourdomain.com>

Handler:  index.handler

Existing Role:  Select the IAM role created in the previous step

Advanced Settings:
   Timeout:  20 seconds  (we need to allow enough time for the Lambda to launch after being idle, as instances aren't terminated continuously)

   VPC:  Choose a VPC, Subnets, and Security Groups that will allow your Lambda access to communicate with your Octopus server

KMS Key: the key used to encrypt the Octopus API key

3. Create the trigger to execute the Lambda

To trigger the Lambda we will create a CloudWatch Events rule.
Event Pattern
Service Name: EC2
Event Type: EC2 Instance State-change Notification
Specific state(s):  Terminated
Any instance
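
If you prefer to paste the pattern directly, the equivalent JSON event pattern is:

{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {
    "state": ["terminated"]
  }
}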

That's it!  Now when an instance is terminated in AWS, the Lambda will run and remove the machine from Octopus.


Monday, February 8, 2016

Using Chef to automate Octopus Deployments

If you are using Octopus Deploy to deploy your .NET code to your Windows servers and are also using AWS Auto Scaling Groups, you may have come across some of the limitations of Octopus. Primarily, Octopus has no built-in process to deploy the current project release to a newly registered tentacle immediately upon registration. Tentacles can be registered as either 'polling' or 'listening' tentacles.  Polling tentacles periodically check with the Octopus server to determine whether a deployment is required, while a listening tentacle waits until a deployment is pushed from the Octopus server.  Neither approach deploys the release immediately.  This post provides a process that uses a Chef recipe to configure the Octopus tentacle in listening mode and then initiate a deployment of the latest project release from the Octopus server using an API call.

I must give credit to CodeKing and his article on using Octopus Deploy with AWS.  His script provides the steps necessary to query the Octopus server for the latest project release and to initiate the deployment. Here is his post:
http://www.codeproject.com/Articles/719801/AWS-Deployment-With-Octopus-Deploy

The process I will demonstrate relies on Chef to register the Octopus tentacle as well as to initiate the deployment of the latest project release to the server.  I'm not going to go into detail about using Chef; if you need help with Chef, please refer to the documentation at: https://docs.chef.io/

The recipe assumes the following:
  • Amazon's AWS CLI is installed on the server (this can be done using another chef recipe)
  • The server has access to S3 to download files to install Octopus Tentacle
  • The Octopus project is configured to increment version numbers
  • An Octopus account with an API key that has proper permissions to deploy the release.

The recipe will perform the following tasks:
  1. Download the Octopus Tentacle installer.
  2. Install Octopus Tentacle
  3. Register the Octopus Tentacle to the Octopus Server
  4. Deploy the latest project release from Octopus Server

1. Download the Octopus Tentacle installation from S3
It's not necessary to put the installer file on S3, but I prefer this approach as I know the file will always be available, versus relying on a web link that could potentially change.

Modify this script to use your S3 bucket
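
A minimal sketch of this recipe step (the bucket name, file name, and local path are placeholders; it assumes the AWS CLI is on the PATH, per the prerequisites above):

# Create a working folder and download the Tentacle installer from S3
directory 'C:\temp'

powershell_script 'download_octopus_tentacle' do
  code 'aws s3 cp s3://examplebucket/Octopus.Tentacle.msi C:\temp\Octopus.Tentacle.msi'
  not_if { ::File.exist?('C:\temp\Octopus.Tentacle.msi') }
end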




2. Install Octopus Tentacle
With the installer downloaded, we must now install Octopus on the server.
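
A sketch of the install step, assuming the windows cookbook's windows_package resource is available:

# Install the Octopus Tentacle MSI silently
windows_package 'Octopus Deploy Tentacle' do
  source 'C:\temp\Octopus.Tentacle.msi'
  installer_type :msi
  action :install
end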




3. Register the Octopus Tentacle to the Octopus Server
The next step is to configure and register the Octopus tentacle to the Octopus server. This will register the tentacle using the server hostname, which is later used for the deployment process.

Modify this script to use your Octopus server, API key, and role
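
A sketch of the registration step using Tentacle.exe's built-in commands (the server URL, API key, role, and environment are placeholders; --comms-style TentaclePassive registers the tentacle in listening mode):

tentacle = 'C:\Program Files\Octopus Deploy\Tentacle\Tentacle.exe'

powershell_script 'register_octopus_tentacle' do
  code <<-EOH
    & "#{tentacle}" create-instance --instance "Tentacle" --config "C:\\Octopus\\Tentacle.config" --console
    & "#{tentacle}" new-certificate --instance "Tentacle" --console
    & "#{tentacle}" configure --instance "Tentacle" --home "C:\\Octopus" --app "C:\\Octopus\\Applications" --port "10933" --console
    # Register using the server hostname, which step 4 relies on
    & "#{tentacle}" register-with --instance "Tentacle" --server "https://octopus.yourdomain.com" --apiKey "API-XXXXXXXX" --role "web-server" --environment "Development" --name "$env:COMPUTERNAME" --comms-style TentaclePassive --console
    & "#{tentacle}" service --instance "Tentacle" --install --start --console
  EOH
  not_if { ::File.exist?('C:\Octopus\Tentacle.config') }
end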



4. Deploy the latest release
The final step is to query the Octopus server for the latest release of the specified project and then make an API call to the Octopus server to initiate a deployment to the server.

Modify this script to use your Octopus server, project, and API key:
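
A sketch of the deployment step, adapted from CodeKing's approach: query the API for the project's newest release, then post a deployment scoped to this machine. The server, project slug, API key, and environment ID are placeholders:

powershell_script 'deploy_latest_release' do
  code <<-EOH
    $server  = 'https://octopus.yourdomain.com'
    $headers = @{ 'X-Octopus-ApiKey' = 'API-XXXXXXXX' }

    # Look up the project and its most recent release
    $project = Invoke-RestMethod "$server/api/projects/my-project" -Headers $headers
    $release = (Invoke-RestMethod "$server/api/projects/$($project.Id)/releases" -Headers $headers).Items[0]

    # Find this machine by hostname (the name it registered with in step 3)
    $machine = (Invoke-RestMethod "$server/api/machines/all" -Headers $headers) |
        Where-Object { $_.Name -eq $env:COMPUTERNAME }

    # Create a deployment of the release targeting only this machine
    $body = @{ ReleaseId = $release.Id; EnvironmentId = 'Environments-1'; SpecificMachineIds = @($machine.Id) } | ConvertTo-Json
    Invoke-RestMethod "$server/api/deployments" -Method Post -Headers $headers -Body $body
  EOH
end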



Running Chef - putting it all together
The above steps can be placed into a single Chef recipe or kept in separate recipes and run individually.  I prefer to put these in a single recipe and call it via the Chef run-list.

I currently use Chef-Solo and therefore make a call like this, specifying the run-list as well as the environment:
chef-solo -c c:/chef/solo.rb -j c:/chef/runlist.json -E development -L c:/chef/log.log -l info

Tuesday, January 5, 2016

AWS - using SQS to cleanup Active Directory of terminated instances

If your Amazon EC2 instances are part of an Auto Scaling Group and are required to be joined to a Windows domain, then maintaining a clean Active Directory environment may be an afterthought. EC2 instances can be terminated for a variety of reasons, and since they may terminate abruptly, their objects may not be removed from Active Directory.  The following steps will help you create a process utilizing Amazon's Simple Queue Service (SQS) to remove terminated instances from Active Directory and keep a cleaner Active Directory structure.

The following assumes:
  • Your servers are launched using an Auto Scaling Group and are auto joined to an Active Directory domain using the AWS Instance Id as its hostname.  Please see this post for details on how to accomplish this -  Auto Join EC2 instances to domain
  • An EC2 windows instance that has
    • An IAM role assigned to the instance
    • The AWS CLI installed on the instance
    • Access to the Active Directory domain

1. Create the SQS queue

Within AWS, create a new SQS queue.  Be sure to set the message retention period to a value greater than how often you plan to run the scheduled PowerShell script. We will set the permissions in a later step, after we've created the SNS topic.

2. Create the SNS topic

Create a new SNS topic in AWS and add a subscription to the SNS topic selecting 'Amazon SQS' as the endpoint, ie: arn:aws:sqs:us-east-1:123456789012:SQS-InstanceTerminations

3. Configure the SQS queue permissions

Return to the SQS queue created in the prior step and select the Permissions tab.  Add/modify the permissions to allow SQS:SendMessage from the SNS topic you just created.  Modify the policy below to use your SNS topic ARN and your SQS queue ARN as the resource.
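
A sketch of that queue policy (the account ID, region, and topic/queue names are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "SQS:SendMessage",
      "Resource": "arn:aws:sqs:us-east-1:123456789012:SQS-InstanceTerminations",
      "Condition": {
        "ArnEquals": { "aws:SourceArn": "arn:aws:sns:us-east-1:123456789012:SNS-InstanceTerminations" }
      }
    }
  ]
}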


4. Configure the notification for the Auto Scaling Group

Select your Auto Scaling Group and choose the 'Notifications' tab and then 'Create notification'.
For the notification choose the option 'terminate' and select the SNS topic created earlier.


5. Configure the IAM role

The EC2 instance that will run our PowerShell cleanup script requires permissions to access the SQS queue.  To allow this, configure a policy for the IAM role that is attached to the instance.  Modify the Resource ARN in the policy below to match your SQS ARN.
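
A sketch of that role policy (the queue ARN is a placeholder):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:GetQueueUrl"
      ],
      "Resource": "arn:aws:sqs:us-east-1:123456789012:SQS-InstanceTerminations"
    }
  ]
}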


6. Create the Powershell script to retrieve the SQS messages

PowerShell is used to retrieve the SQS messages for the terminated instances and then remove those servers from Active Directory.  Save the script on the server that will run the scheduled task.

Here is the script for the complete process. Modify this script to use your SQS queue name.
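
A sketch of what that script might look like (the queue name and region are placeholders; it assumes hostnames equal instance IDs per the setup above, and requires the AWS Tools for PowerShell and the ActiveDirectory module):

Import-Module AWSPowerShell
Import-Module ActiveDirectory

$region   = 'us-east-1'
$queueUrl = Get-SQSQueueUrl -QueueName 'SQS-InstanceTerminations' -Region $region

# Drain the queue, processing up to 10 messages at a time
while ($true) {
    $messages = Receive-SQSMessage -QueueUrl $queueUrl -MessageCount 10 -Region $region
    if (-not $messages) { break }
    foreach ($message in $messages) {
        # The SQS body is an SNS envelope; the Auto Scaling notification is in .Message
        $notification = ($message.Body | ConvertFrom-Json).Message | ConvertFrom-Json
        $instanceId = $notification.EC2InstanceId
        if ($instanceId) {
            # Hostname == instance ID (see the auto-join post), so remove it from AD
            Get-ADComputer -Filter "Name -eq '$instanceId'" |
                Remove-ADObject -Recursive -Confirm:$false
        }
        # Delete the message so it isn't processed again
        Remove-SQSMessage -QueueUrl $queueUrl -ReceiptHandle $message.ReceiptHandle -Region $region -Force
    }
}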



7. Create a scheduled task to run the Powershell script

To schedule the script, configure a scheduled task on the Windows EC2 instance to run "Powershell" with an optional argument.
The program path for PowerShell is: 'C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe'
The optional argument is the path to your script: 'C:\Scripts\ActiveDirectory-CleanUp.ps1'

NOTE: You must run the scheduled task using a Windows User account that has the appropriate user permissions to remove objects from Active Directory.

Tuesday, November 3, 2015

AWS - Bootstrap Windows EC2 instance with Chef-Solo


If you have an environment like ours, fairly small with few service layers, it may not make sense to provision a Chef server.  Luckily, we can still get the benefits of using Chef to configure our servers by using the included Chef-Solo.  Chef-Solo runs entirely locally on the instance, therefore we must include all required dependencies on the server, ie. cookbooks, run-lists, environments, etc.

In the following process I will demonstrate how to launch an AWS EC2 instance and have Chef-Solo configure it. I will not go into detail about how Chef works with recipes and cookbooks.

Here are the steps we will need to follow to start this process.
  1. Create S3 bucket to store Chef files, this includes your cookbooks
  2. Create IAM Role and define access to the S3 bucket
  3. Configure UserData to run PowerShell on initial launch
    • The UserData will do the following
      • Download the Chef-Client from an S3 Bucket
      • Download the Chef Cookbooks, Recipes, etc
      • Install the Chef-Client
      • Run Chef-solo
Let's get started!

1. Create an S3 bucket and upload the Chef MSI, cookbooks, run scripts, etc

Within the AWS console, create a new S3 bucket to store the Chef installer as well as all the required Chef cookbooks, environment files, run scripts, etc.  For this example we will use 'examplebucket' for the S3 bucket name; you will need to use your own unique bucket name.

2. Create an IAM role with a policy to allow Read only access to the S3 bucket

By creating an IAM role and assigning the role to the instance we can eliminate the need to use an IAM user account with access keys.  IAM roles utilize temporary credentials to grant access to AWS resources.

Within the AWS console create a new IAM role and Select Role Type: AWS Service Roles > Amazon EC2


Follow the prompts, clicking through until the role is finally created. With the role created, we must now create a new inline policy which will grant access to the S3 bucket.

Select the newly created Role and expand the 'Inline Policies' to create a new policy:


Choose the option to create a Custom Policy:



For the policy, we will grant ListBucket and GetObject restricted to the S3 bucket we created earlier. You must modify the bucket name in the policy.
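
A minimal sketch of such a policy (note that ListBucket applies to the bucket ARN, while GetObject applies to the objects within it):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::examplebucket",
        "arn:aws:s3:::examplebucket/*"
      ]
    }
  ]
}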

3. Launch a new instance and configure UserData

When launching a new EC2 instance, assign the previously created role to the instance.  Also, expand the Advanced details to provide 'UserData'.  The UserData allows you to run scripts when the instance is first launched.  Our instance will need to download and install Chef as well as execute Chef-Solo.

The UserData will utilize PowerShell to download and install Chef.  Here is the UserData to include in the instance launch.  This assumes the AMI you are launching has the AWS CLI available (the AWS-provided AMIs for Windows include this already).

I've provided comments in the code for clarification.
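
A sketch of that UserData (the bucket and file names, and the chef-solo install path, are placeholders; the chef-solo command matches the one shown earlier):

<powershell>
# Create a working folder for Chef
New-Item c:\chef -ItemType Directory -Force
Set-Location c:\chef

# Download the Chef client installer and the cookbook bundle from S3
Read-S3Object -BucketName examplebucket -Key chef-client.msi -File c:\chef\chef-client.msi
Read-S3Object -BucketName examplebucket -Key chef-repo.zip -File c:\chef\chef-repo.zip

# Install the Chef client silently and wait for it to finish
Start-Process msiexec.exe -ArgumentList '/qn /i c:\chef\chef-client.msi' -Wait

# Unpack the cookbooks, run-list, and environment files
Add-Type -AssemblyName System.IO.Compression.FileSystem
[System.IO.Compression.ZipFile]::ExtractToDirectory('c:\chef\chef-repo.zip', 'c:\chef')

# Run Chef-Solo, logging to c:\chef\log.log
& 'C:\opscode\chef\bin\chef-solo.bat' -c c:\chef\solo.rb -j c:\chef\runlist.json -E development -L c:\chef\log.log -l info
</powershell>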



Troubleshooting

The UserData is only run on instance launch and not on restarts.  It is initiated by the Ec2ConfigService, which records log information at:
C:\Program Files\Amazon\Ec2ConfigService\Logs\Ec2ConfigLog.txt

Chef-Solo will record log information as defined by the -L option; in this case we are creating the logs at:   C:\chef\log.log

I hope this helps you progress to using Chef and moving more towards Infrastructure as Code in your environment!

Saturday, September 12, 2015

AWS - Auto join EC2 Windows instance to Active Directory Domain


Some environments will require you to join your Windows servers to a domain.  The following will show the steps taken to automatically join a server to a Windows domain.  This assumes the following:
  • An existing AWS VPC with access to an S3 bucket
  • New instances are able to communicate with a domain controller

NOTE:  Amazon does offer its Directory Service with AD Connector, which will connect your VPC to your Active Directory, but this will show how you can do so without the AD Connector.

The steps:

  1. Create a PowerShell script to join a server to the domain
  2. Secure the credentials by converting the PowerShell script to an executable using PS2exe
  3. Create an S3 bucket and upload the exe file
  4. Create an IAM role with a policy to allow Read access to the S3 bucket
  5. Launch a new instance, assigning the IAM role and providing User Data which will run the required scripts at first launch

1. Create the PowerShell script

The PowerShell script will join the server to the domain.   We will use the Add-Computer cmdlet and a user account that has permissions to join computers to the domain. Here is the script; modify the username, password, and DomainName for your environment.
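
A minimal sketch of such a script (the account, password, and domain are placeholders; renaming the machine to its EC2 instance ID is an assumption based on the later posts, which expect the hostname to match the instance ID):

# JoinDomain.ps1 -- embeds credentials, which is why it is converted to an exe below
$username   = 'YOURDOMAIN\joinaccount'
$password   = ConvertTo-SecureString 'YourPassword' -AsPlainText -Force
$credential = New-Object System.Management.Automation.PSCredential($username, $password)

# Name the machine after its EC2 instance ID, then join the domain and restart
$instanceId = (Invoke-WebRequest -Uri 'http://169.254.169.254/latest/meta-data/instance-id' -UseBasicParsing).Content
Add-Computer -DomainName 'yourdomain.com' -NewName $instanceId -Credential $credential -Restart -Force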


Save the file as JoinDomain.ps1

2. Convert the PowerShell script to an executable file

To help secure the credentials we will convert the PowerShell script using PS2exe to an executable file. Download PS2exe from: PS2exe download

Extract the zip file to a folder and then run PS2exe.ps1 on the JoinDomain.ps1 script to convert it to an exe file. From a command prompt run the following:

c:\> .\ps2exe.ps1 -inputFile JoinDomain.ps1 -outputFile JoinDomain.exe

This will create the JoinDomain.exe file.

3. Create an S3 bucket and upload the exe file

Within the AWS console, create a new S3 bucket to store the JoinDomain.exe file.
For this example we will use 'examplebucket' for the bucket name; you will need to use your own unique bucket name.

With the bucket created, we can upload the JoinDomain.exe file to the bucket.


4. Create an IAM role with a policy to allow Read only access to the S3 bucket

By creating an IAM role and assigning the role to the instance we can eliminate the need to use an IAM user account with access keys.  IAM roles utilize temporary credentials to grant access.

Create an IAM role in the AWS console, and Select Role Type: AWS Service Roles > Amazon EC2


Follow the prompts, clicking next until the role is finally created. With the role created, we must now create a new inline policy which will grant access to the S3 bucket.

Select the newly created Role and expand the 'Inline Policies' to create a new policy:


Choose the option to create a Custom Policy:



For the policy, we grant ListBucket and GetObject restricted to the S3 bucket.  Here is the policy; you must modify the bucket name:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::examplebucket",
        "arn:aws:s3:::examplebucket/*"
      ]
    }
  ]
}



5.  Launch a new instance

Launch a new instance into the VPC.  We need to attach the IAM role to the instance as well as configure the Advanced > User Data.

The User Data is used to run scripts when the instance is first launched.  For our example, we will be downloading the JoinDomain.exe file from S3 and finally executing it.

First assign the IAM Role to the instance.

Next, expand the Advanced Details to show the User Data field.  Here we can provide some PowerShell commands to download the exe file and execute it.  Here is the UserData to include, modifying the S3 bucket location to your environment:




To join newly launched instances to a domain you need to make use of UserData, which allows you to run scripts during the initial startup of the launch.
In our case, the UserData runs a few PowerShell commands to download and execute the EXE that joins the domain.

<powershell>
Set-ExecutionPolicy unrestricted -Force
New-Item c:/temp -ItemType Directory -Force
set-location c:/temp
read-s3object -bucketname examplebucket -key JoinDomain.exe -file JoinDomain.exe
Invoke-Item C:/temp/JoinDomain.exe
</powershell>

Here's what it looks like in the AWS Console:




Follow the remaining steps to complete launching the instance. The instance will launch, download the exe, execute it, and restart.

Monday, February 9, 2015

AWS - autoscaling and self healing NAT instance

Having your AWS-hosted services maintain high availability is often a top priority, and sometimes it's not as straightforward as we would all like it to be.  Here I will describe how to create an "almost" highly available NAT server.

NOTE:  This configuration is not 100% highly available.  If you only have one NAT instance, you will have downtime until the newly created NAT instance is put into service.  For my use this was acceptable, as it was used for an email service; any outgoing emails would be queued while a replacement NAT launched.  The time it takes for a new NAT to be put into service is about 3 minutes.  That met our SLA and meant I wasn't being woken up in the middle of the night!

This configuration will restore service for a failed instance in approximately 3 minutes!

Amazon provides an example of how to configure NAT instances for high availability (see it here), but that configuration uses two NAT instances and only works if the instance is stopped and restarted.  The AWS example does not work for terminated instances!

When you create your NAT instance using an Auto Scaling group and launch configuration, the newly created instance will have a new network interface (ENI).  You must then update the route tables with the new ENI ID to direct traffic to the new NAT instance.  We can accomplish this by adding a few items to the launch configuration user data and properly configuring the role assigned to the instance.

Here are the steps to follow:

1.  Create a new Role (see example below).  Give it a useful name like:  NAT-update-route-table
   
  The role must grant DescribeNetworkInterfaces and ModifyNetworkInterfaceAttribute for all resources, because we don't have an ARN for the newly launched instance ahead of time.

  The role must also be allowed to modify the route table that is being used by your subnets.  The actions to allow are CreateRoute and ReplaceRoute; these we can restrict to our specific route tables using their ARNs.


2.  Create a Launch Configuration

   For the launch configuration:
  Select an AMI to use for your NAT.  I recommend using Amazon's community AMI for a NAT, do a search in the AMIs for "amzn-ami-vpc-nat"
  Assign the IAM Role created in the step above
  Assign the appropriate security groups, instance type, etc

  Finally, and most importantly, provide the User Data, which will configure and update the route tables with the new instance ENI.  See the full user data below.

I will walk through each step of the user data to explain what it does.  This example is for an Amazon Linux NAT instance, therefore we begin our script with:  #!/bin/bash

First step is to enable IP forwarding.

echo 1 > /proc/sys/net/ipv4/ip_forward

Next we must obtain the instance ID.  We can get this from the meta-data provided by Amazon using this URL: http://169.254.169.254/latest/meta-data/instance-id

We set the variable, my_instance_id:

my_instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

Next, we must obtain the ID of the network interface. With the instance ID we can now get the ENI ID by running this command, which sets the ID in the variable my_eni_id. (be sure to modify the region to your own)

my_eni_id=$(aws ec2 describe-network-interfaces --region us-east-1 --filters Name=attachment.instance-id,Values=${my_instance_id} Name=attachment.device-index,Values=0 --output text | grep NETWORKINTERFACES | cut -f5)

We can now update our Route Table with the new network interface ID (be sure to modify this to match your Route Table ID and region)

aws ec2 replace-route --route-table-id rtb-xxxxxxxx --destination-cidr-block 0.0.0.0/0 --network-interface-id ${my_eni_id} --region us-east-1

And finally, we change the source destination check for the network interface for the instance to work properly as a NAT device.

aws ec2 modify-network-interface-attribute --network-interface-id ${my_eni_id} --no-source-dest-check --region us-east-1


3.  Last, create the auto-scaling group utilizing the launch configuration.  The auto-scaling group should be configured as:
Desired = 1
Min = 1
Max = 1

  In the event your NAT instance is terminated, the auto scaling group will launch a new instance and update the route table with the new ENI ID.

There is one missing component to this setup, and that is creating a status check alarm.  The alarm should be configured to terminate the instance when it fails its status check; when the instance is terminated, the auto-scaling group will launch a new one.

(I have not yet created the code to create a new Status Check Alarm, this should be easily accomplished in the User Data.  I will hopefully find time to add to this post how to do this at a later time)


Here is the complete IAM Role Policy
(change the arn for the region and your route table)
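
A sketch of the policy described in step 1 (the account ID, region, and route table ID are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeNetworkInterfaces",
        "ec2:ModifyNetworkInterfaceAttribute"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateRoute",
        "ec2:ReplaceRoute"
      ],
      "Resource": "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-xxxxxxxx"
    }
  ]
}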


Here is the complete User Data to add to the launch configuration
(change the region and the route-table-id to match your environment)
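
Assembled from the steps above (the region and route table ID are placeholders):

#!/bin/bash
# Enable IP forwarding so the instance can act as a NAT
echo 1 > /proc/sys/net/ipv4/ip_forward

# Look up this instance's ID from the metadata service
my_instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

# Find the ID of the instance's primary network interface (ENI)
my_eni_id=$(aws ec2 describe-network-interfaces --region us-east-1 --filters Name=attachment.instance-id,Values=${my_instance_id} Name=attachment.device-index,Values=0 --output text | grep NETWORKINTERFACES | cut -f5)

# Point the default route at the new ENI
aws ec2 replace-route --route-table-id rtb-xxxxxxxx --destination-cidr-block 0.0.0.0/0 --network-interface-id ${my_eni_id} --region us-east-1

# Disable the source/destination check so the instance can forward traffic
aws ec2 modify-network-interface-attribute --network-interface-id ${my_eni_id} --no-source-dest-check --region us-east-1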

Friday, September 5, 2014

AWS - Auto schedule EC2 instance to start/stop

In order to reduce the costs of running test and dev EC2 instances at Amazon, I've created a simple script that will start or stop instances based on a schedule.  This is a PowerShell script that relies on the AWS modules being installed ("Import-Module AWSPowerShell").  Refer to the link to Configure PowerShell for AWS  (FYI - configure your default profile using AWS access keys as required)

The script looks for a tag, 'RunningSchedule', and parses the value within, which uses a cron-like format based on a 24-hour clock: H:H:D or H:H:D-D  (I didn't include a minutes field)

First 'H' = start time hour
Second 'H' = stop time hour
'D' is for day(s) to run, 1 = Monday, 2 = Tuesday, etc.  So if you want it to run Mon-Fri, enter 1-5
To disable the schedule completely, use 'Disabled' for the value.

For example, to have the server start at 8am, stop at 5pm, and run Mon-Fri, use:   8:17:1-5
To have the server run from 10am to 3pm on Wednesday, use: 10:15:3

The script will then either start or stop each server based on its schedule.
It will then send an email listing the servers that were started or stopped.

To schedule the script itself, I configured a scheduled task in Windows on a server.  Simply run the program "Powershell"; the path is 'C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe'
Include the optional argument with the path to your script:  'C:\Scripts\AWS-schedule-start-stop-instances.ps1'


Here is the script:
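
A minimal sketch of the logic (the region and SMTP settings are placeholders; it assumes a default AWS credential profile is configured):

Import-Module AWSPowerShell

$region = 'us-east-1'
$now  = Get-Date
$day  = [int]$now.DayOfWeek      # Sunday = 0, Monday = 1 ... Saturday = 6
$hour = $now.Hour
$started = @(); $stopped = @()

foreach ($instance in (Get-EC2Instance -Region $region).Instances) {
    $schedule = ($instance.Tags | Where-Object { $_.Key -eq 'RunningSchedule' }).Value
    if (-not $schedule -or $schedule -eq 'Disabled') { continue }

    # Parse H:H:D or H:H:D-D (start hour : stop hour : day or day range)
    $startHour, $stopHour, $days = $schedule -split ':'
    if ($days -match '-') { $from, $to = $days -split '-' } else { $from = $days; $to = $days }
    if ($day -lt [int]$from -or $day -gt [int]$to) { continue }

    if ($hour -ge [int]$startHour -and $hour -lt [int]$stopHour) {
        # Inside the running window: start the instance if it is stopped
        if ($instance.State.Name -eq 'stopped') {
            Start-EC2Instance -InstanceId $instance.InstanceId -Region $region
            $started += $instance.InstanceId
        }
    } elseif ($instance.State.Name -eq 'running') {
        # Outside the window: stop the instance if it is running
        Stop-EC2Instance -InstanceId $instance.InstanceId -Region $region
        $stopped += $instance.InstanceId
    }
}

# Email a summary of what was started or stopped
if ($started -or $stopped) {
    $body = "Started: $($started -join ', ')`nStopped: $($stopped -join ', ')"
    Send-MailMessage -SmtpServer 'smtp.yourdomain.com' -From 'aws@yourdomain.com' `
        -To 'ops@yourdomain.com' -Subject 'EC2 schedule run' -Body $body
}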