AWS GuardDuty Automation: Using Lambda to shut down a compromised instance

After getting a working CloudWatch Rule that actually generates SNS events for all medium and high GuardDuty alerts, the work was not done. SNS by itself is not enough: it still requires a human to go in and do something (stop the compromised instance).

AWS gives us the grand opportunity to automate so much of this! I looked around for examples to see if anyone had done this, found some bits and pieces on the web, and got a nice Python 2.7 script working: one where I was able to repeat a successful result again and again.

Here is the Lambda function on my GitHub that parses the instance-id from the GuardDuty event delivered by the CloudWatch Rule and then initiates a stop on that instance. Furthermore, it tags the instance by modifying the Name value like so:


So, if you use it with my CloudWatch Rule that alerts only on Medium and High events, you can have high confidence that it will only stop instances that are actually compromised, and not ones that are just being port scanned. 🙂 Be sure to add both SNS and the Lambda function as TARGETS to the CloudWatch Rule!
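For readers who want to see the shape of such a function, here is a minimal sketch. It is not the script from my GitHub repository. It assumes the finding arrives in the usual GuardDuty event layout (instance-id under detail.resource.instanceDetails.instanceId), and the QUARANTINE_NAME value is a made-up placeholder for whatever you want the Name tag to become.

```python
# Hypothetical tag value; the real script's naming scheme is not shown here.
QUARANTINE_NAME = "COMPROMISED-stopped-by-lambda"


def extract_target(event):
    """Return (instance_id, region) from a GuardDuty finding event, or (None, None)."""
    detail = event.get("detail", {})
    instance = detail.get("resource", {}).get("instanceDetails", {})
    instance_id = instance.get("instanceId")
    # Prefer the region inside the finding, fall back to the event envelope.
    region = detail.get("region") or event.get("region")
    if not instance_id:
        return None, None
    return instance_id, region


def lambda_handler(event, context):
    instance_id, region = extract_target(event)
    if instance_id is None:
        # Finding is not about an EC2 instance; nothing to stop.
        return {"stopped": None}

    import boto3  # imported lazily so extract_target can be tested offline

    ec2 = boto3.client("ec2", region_name=region)
    # Overwrite the Name tag so the instance stands out in the console.
    ec2.create_tags(
        Resources=[instance_id],
        Tags=[{"Key": "Name", "Value": QUARANTINE_NAME}],
    )
    ec2.stop_instances(InstanceIds=[instance_id])
    return {"stopped": instance_id}
```

The Lambda execution role would need permission for ec2:CreateTags and ec2:StopInstances for this sketch to work.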

This way, your automated security is working for you 24/7, and your remediation time for a compromised instance will usually be under 5 minutes! Plus, you’ll get an email or text.

TESTING – You can test the script like so: spin up an instance and grab its instance-id. Get the JSON text of a sample GuardDuty event. In Lambda, under [ Configure TEST events ], pick [ CloudWatch Logs ] as the event template, then paste your sample GuardDuty event into the text box, replacing the ‘instance-id’ field with the instance-id of the instance you just spun up. Replace the region data as well if you are operating in a different region than the sample alert, then click TEST.
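If you would rather prepare the test event with a script than edit the JSON by hand, something like the following sketch could do the substitution. The field paths are an assumption based on the usual GuardDuty finding layout, and the tiny sample dictionary stands in for a real sample finding.

```python
import copy
import json


def make_test_event(sample_event, instance_id, region):
    """Return a copy of a sample finding that points at your own test instance."""
    event = copy.deepcopy(sample_event)  # leave the original sample untouched
    event["region"] = region
    detail = event.setdefault("detail", {})
    detail["region"] = region
    instance = detail.setdefault("resource", {}).setdefault("instanceDetails", {})
    instance["instanceId"] = instance_id
    return event


# Stand-in for a real sample finding exported from GuardDuty.
sample = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "region": "us-east-1",
    "detail": {
        "region": "us-east-1",
        "severity": 5,
        "resource": {"instanceDetails": {"instanceId": "i-99999999"}},
    },
}

test_event = make_test_event(sample, "i-0123456789abcdef0", "us-west-2")
print(json.dumps(test_event, indent=2))  # paste this into the Lambda test box
```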

I hope this helps you!

Posted in AWS, Lambda, Uncategorized

Passed AWS Solutions Architect Pro Exam!

Very happy to share! Obviously, no specifics can be shared due to the exam NDA, but I can recommend topics you should study and give my thoughts here. I also had an ‘event’ in the middle of the exam where the testing computer I was on somehow disconnected from the internet and stopped me cold in my tracks halfway through, but I’ll talk about that more later.

As for studying: to start, I consider myself blessed to be doing a lot of work in AWS right now, and I get hours of hands-on time every day in my current role.

As always, a HUGE shout-out to Ryan Kroonenburg and the AWS Professional course. By far the BEST training material out there!  Material is spot on! Repetition is key, and Ryan is great to listen to just about anywhere.

I used published talks from AWS re:Invent 2017 on YouTube. I leveraged the vBooks PDF reader on my iPhone to help me listen to all of the AWS whitepapers.

It felt like I was hit particularly hard on OpsWorks, VPC Direct Connect Routing, Storage, IAM and DR. The exam seemed outdated; for instance, there were questions about IDS/IPS configuration that did not mention GuardDuty, which was announced six months ago, or the new EC2 instance types.

The Architect Professional Exam does a good job of measuring knowledge, but I found the quality of some of the questions to be in need of work. I get that they can’t be straightforward, but as an Architect in real life, choosing appropriate solutions for customers doesn’t match up well with some of the incomplete and intentionally elusive question material I had to navigate today. What I am saying is that, although the exam does a good job testing knowledge of which solutions are used where, or which not to use (anti-patterns), the exam does not really test how well you can actually put this stuff together and make it work. That’s a gap in my mind.

AWS itself is in a unique position to disrupt the multiple choice question (MCQ) format and allow the exam taker to be tested on actual skill, because AWS does not rely on hardware. Think about it… what if you could actually use AWS during the exam to build something? I believe the interview process for a Solutions Architect at AWS already incorporates this kind of thing: candidates are given instructions to build a small data center in AWS with some config specifics, and then they show the outcome. Why not have the exam mirror more of a hands-on style like this, in a proctored exam center where you have instructions and a time limit? MCQ makes sense for the Associate level, but I think AWS can do great things here and really raise the bar!

Ok – now I’ll tell you about the scary thing that happened in the middle of my exam. I was about halfway through, contemplating possible answers to a question, when all of a sudden (without any interaction from me) the screen went white and a little box popped up saying that this computer had lost its connection to the internet. The only option was an exit button. I went and got the proctor, and he actually had to exit the exam and restart it (a total of about five minutes), but… the state of the exam was saved! When I got back in, I was right where I left off!! Although I was happy, I was thrown off a little by this, and found myself rushing through the second half of the exam in case it happened again. I got back in the groove, slowed down a bit and finished! So, kudos to AWS for ensuring their exams save state during the test. (highly durable 🙂)

Last thought: I recommend picking up practice questions where you can. I actually bought the AWS practice questions from the AWS training site and found that it only allowed me a single attempt of 40 questions. Not a great value, but all practice questions help.

If I think of anything else, I’ll update this page. I hope something here was useful to you!


Posted in Uncategorized

AWS GuardDuty CloudWatch Hell

I feel it is important to share this with the community. I’ve fought with GuardDuty and CloudWatch to develop an alerting policy that works. In the midst of testing my policy, I found an error in the AWS documentation, which they have since acknowledged. This all started when I was writing a CloudWatch Rule that does not deliver a lot of noise. For a while, all I could get CloudWatch to do was fire on the default GuardDuty Rule:

  "source": [ "aws.guardduty" ],
  "detail-type": [ "GuardDuty Finding" ],

This was so noisy, I was getting alerted for every port scan of every EC2 instance as well as a variety of other events that were not actionable.

The AWS Admin Guide for GuardDuty outlines the severity types and the alert levels, expressed as floating-point decimals, associated with each. For instance, a Low severity finding falls within the 0.1 to 3.9 range. Whenever I tried to add severity to my CloudWatch Rule, I never got alerted! The guide gives this example CloudWatch Rule for defining severity levels:

aws events put-rule --name Test --event-pattern "{\"source\":
[\"aws.guardduty\"],\"detail-type\":[\"GuardDuty Finding\"],\"detail

Notice that the values for the “severity” key in this example (5 and 8) are written as floating-point decimals. This is where the ‘fun’ began, and why I could never get any of the sample alerts to work. When you go into the GuardDuty Console > Settings > Generate Sample alerts, in all of the samples that I generated (and I did this 25+ times), the ‘severity’ key value came in as an integer. It took some digging and two different AWS Support cases, and I ended up finding it based on the different rules they were having me create and try. AWS Support was able to verify my findings (I opened tickets with both the CloudWatch team and the GuardDuty team), and they said the same thing:

--- GuardDuty Team

Hello Chris,

Thank you for contacting AWS Premium Support. It was a pleasure talking to you today.

So I did my own tests to conclude my findings about this case and I would like to share the results with you below. While I also noticed that the sample GuardDuty (GD) findings only produced a severity with whole numbers such as 5, 8, 2 etc., I tested your rule anyway that included the severity as decimals such as 5.0, 8.0 and 2.0 and noticed that I did not get any notifications about the same via the SNS topic that I had setup with my CW rule.

aws events put-rule --name HensonGDRule --event-pattern "{\"source\":[\"aws.guardduty\"],\"detail-type\":[\"GuardDuty Finding\"],\"detail\":{\"severity\":[4.0,5.0,8.0]}}"

However, the moment I entered those values as whole numbers, such as 4,5 and 8, I started receiving alerts!

aws events put-rule --name HensonGDRule --event-pattern "{\"source\":[\"aws.guardduty\"],\"detail-type\":[\"GuardDuty Finding\"],\"detail\":{\"severity\":[4,5,8]}}"

--- CloudWatch Team

Hi Chris,

Yes, you are correct. All the sample findings that were generated had an integer severity therefore our rule did not match. Adding integer values for severity in the rule did make it work.

I understand that you would like to do further testing. I have set the case to "Pending Merchant Action". This way the case will auto-resolve in 5 days if there is no activity. Even if the case auto-resolves you can reopen it by adding a correspondence. I have added myself to the case and will keep track of it till it is fully resolved.

Let me know if you have any questions.

Best regards,


What is unknown, and what I still need to test, is whether every sample alert generates ‘severity’ as an integer. The CloudWatch Rule in my Git repo will work with both float and integer values. I have yet to see an alert come in that is a float… most of them look like this:

"severity":5, or "severity":8

I have yet to see a ‘severity’ of 5.5 . . . AWS Support is still looking into some things, so I will update this later on.
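One way to build a pattern that covers both forms is to generate the severity list rather than type it by hand. The sketch below is an illustration, not the rule from my Git repo: it assumes Medium and High findings fall in the 4.0 to 8.9 range and simply lists every integer and every one-decimal value in that range, so the pattern matches whether GuardDuty sends 5 or 5.0.

```python
import json


def severity_values(lowest=4, highest=8):
    """List each whole number plus its one-decimal float forms (4, 4.0, 4.1, ...)."""
    values = []
    for whole in range(lowest, highest + 1):
        values.append(whole)  # integer form, e.g. 5
        for tenth in range(10):
            # Build the float from text to avoid rounding noise like 4.300000001.
            values.append(float("{}.{}".format(whole, tenth)))
    return values


def build_event_pattern():
    return {
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
        "detail": {"severity": severity_values()},
    }


pattern_json = json.dumps(build_event_pattern())
print(pattern_json)
```

The printed JSON can be passed as the --event-pattern argument of aws events put-rule, or pasted into the rule editor in the console.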

UPDATED 5/29/2018: I received a new note on my support case confirming that AWS acknowledges the issue:

Hello Chris,

Thank you for your patience with this case.

To provide an update to you, this is actually a known issue to our internal service teams where GuardDuty findings, which are supposed to be formatted with decimals, are being passed as integers; you will probably see this if you were to try and export a finding directly from the GuardDuty console, you'll see the "Severity" element has an integer value. Because CloudWatch Events is a pattern-based trigger system, this means that technically the values in your pattern are not the same as the values GuardDuty is presenting which explains why your patterns may not be triggering.

The GuardDuty team is aware of the issue and they are pushing forward with a fix as soon as possible, though I can't disclose any affirmative ETA on this. However, as a workaround for now, please try passing a pattern like this:


    {
        "source": [
            "aws.guardduty"
        ],
        "detail-type": [
            "GuardDuty Finding"
        ],
        "detail": {
            "severity": [
                ...
            ]
        }
    }

I would like to apologize for the inconvenience caused to you due to this however, if you have any further questions or concerns, please feel free to get back to us and I'll be glad to look into it as well.

Thank you and have a good day!

Best regards,

Ketan S.

Amazon Web Services

It’s important to note that whenever they fix this, it could break the existing workaround, so I will keep on top of this.


Some good info unearthed during the investigation:

Posted in AWS, Uncategorized

GlueCon2018: AWS Security for DevOps by Chris Henson

Gratitude is what comes to mind when reflecting back on my speaking opportunity at GlueCon2018. Back in January this year, I came up with the topic of ‘AWS Security for DevOps’ as a way to introduce the concept of an IAM role, show some basic policies, and explain why the Principle of Least Privilege is needed when using apps inside AWS. I built the slide deck and prepared the talk without knowing if I was going to be selected to speak.

GlueCon was amazing! I did get selected to speak in the main hall used for the Keynotes (YAY! Thank you, Eric Norlin), and although the presentation was not filmed, you can download the deck here from the ‘about me’ page (at the bottom) and use the included MD5 hash to ensure you are getting the same file I put up.
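If you have not checked an MD5 hash before, a small sketch like this one does the comparison. The file name and hash passed to matches_published_hash are whatever you downloaded and whatever is published next to the deck; nothing here is specific to my file.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_md5):
    """Compare the file's digest with the published one, ignoring case and spaces."""
    return md5_of_file(path) == published_md5.strip().lower()
```

MD5 is fine for confirming you received the same file, though it is not considered strong protection against deliberate tampering.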




Posted in AWS, Gluecon2018

Gluecon2018 Keynote w/ Adrian Cockcroft + AWESOME!

Cool things happen when a Security person gets to attend a Developer Conference! In all seriousness, last January I planned to attend GlueCon this year because I feel development is a critical part of Security, and I want to understand development concepts in more depth so I can add value there.

Below are my notes on the Gluecon2018 Keynote w/ Adrian Cockcroft. Adrian works for Amazon Web Services as a Solutions Architect, and he was previously at Netflix as a key part of their team. The keynote was solid, and one of the best presentations on Cloud Architecture I have seen.


Adrian opened with the fact that Architects must ask the “awkward” questions of their customers: ‘What should your system do when it fails?’ [because it will fail] ‘If a permissions lookup fails, what should you do?’ ‘Do you have a real DR?’ ‘How do you know your system works?’ [what metrics tell you?] ‘How often do you fail the entire data center all at once?’ ‘How exactly does your system return to normal after a DR?’ Most customers don’t want to talk about failure scenarios, and it is an Architect’s job to bring them up and address them in the design. Adrian pointed out that some companies have “Availability Theatre”, where true failure scenarios are not part of the overall DR testing process, yet DR is touted as functional.

Next, Adrian moved on to talk about avoidable failure scenarios. He pointed out a SaaS company that forgot to renew their domain name, and when it expired, everything failed. He also pointed out SSL certificate expiry as a failure scenario. Aside from the obvious controls to avoid these types of failure scenarios, Adrian pointed out that you could program an alternative DNS name to which your API could fail over, so that DNS-dependent services could still function in the event of a domain name expiry. He mentioned that DNS is one of the weakest points of a large system and needs to be taken into consideration when we architect for chaos.

Transition to Chaos Architecture: 4 Layers

  1. Infrastructure Layer – No Single Point of Failure
  2. Switching and Interconnecting Layer – Data replication / traffic routing
  3. Application Layer – app Failures / Error handling
  4. Users / People Layer – Operator confusion, users not interpreting data properly and making changes based off of what they see vs. what is actually happening

To mitigate problems in the Users / People Layer: there is not enough emphasis on ‘people training’ and fire drills when it comes to reacting and responding to system failures. Implement training to help users / operators behave in a consistent way when certain failures occur. Also, implement ‘Game Days’ where failures are purposely introduced as a way of training.

To mitigate problems in the Application Layer: leverage Simian Army toolsets that introduce specific problems into various components of the application, to understand how your system reacts to these failures so they can be addressed. Adrian mentioned ChAP to automate this.

To mitigate problems in the Switching / Interconnecting and Infrastructure Layers: use Gremlin to run specific failures and experiments against your infrastructure to understand how it behaves in those scenarios. Adrian pointed out that we must not assume “the network is reliable”, and must architect for failures in the network domain when we design our systems and applications.

Adrian talked about how a ‘Chaos Engineering Team’ is like a ‘Security Red Team’: where a Security Red Team identifies weaknesses in Security, a Chaos Engineering Team identifies weaknesses in Availability.

Other critical points Adrian touched on:

The ‘Red Queen theory’, wherein as we evolve, the people and environments around us evolve as well. I took this as a warning not to “Architect in a Bubble”.

Read ‘The Safety Anarchist’ by Sidney Dekker.

Amazon is beginning to implement chaos tests for customer use via Aurora DB Cluster Fault Injection 

Recommended to get involved in Chaos Engineering Working Group 



Posted in AWS, Gluecon2018, Uncategorized

OSSEC / Auto-OSSEC Automation in AWS Linux – More GLUE!


OSSEC is a tricky devil to automate. What I mean by automate is: install the ossec-hids-server, install the ossec-hids agent, register the agent, and have the server recognize that registration without human prompts. If you’ve done this before, you know there are lots of manual steps. The smart folks over at BinaryDefense have added some automation to that process with their auto-ossec tool.

They really took a lot of work out of all of the manual steps needed to connect the client to the server, generate the key and exchange the key…

but… the process was still not as automated as I needed it to be. In AWS you don’t know in advance what the OSSEC server IP will be, and that IP needs to be passed to auto-ossec as an argument and placed in the ossec-hids-agent config file. Not to mention all of the repo adds and tweaks to ossec config files that must happen for ossec to even start properly.

I have written two scripts, located in my git repository, that automate the installation of the remaining pieces that auto-ossec does not; they are outfitted for AWS Linux.

The LinuxOssecServer script installs ossec-hids-server and the binarydefense auto-ossec listener on the AWS Linux EC2 instance that will be in the role of OSSEC server.

We leverage S3 as a storehouse for needed files:
The atomix file / script that you run to install the ossec repositories would go in s3://yourbucketname.

Also, a clone of the binarydefense repo would go in s3://yourbucketname.

You need to allow your EC2 instance access to S3 and to query other instances, so an EC2 instance role with access to S3 and EC2 is required.

The LinuxOssecClient script installs ossec-hids-agent and binarydefense auto-ossec, then automatically locates the IP of the AWS EC2 instance acting as the ossec server (via a pre-set tag), registers the agent, and starts services on AWS Linux. Same requirements as above for the role.

The line with ‘aws ec2 describe-instances’ must have the correct region, so put your region in there. For the public version of the code, the ossec server must have an AWS tag that matches the filter Name=tag:Role,Values=OssecMaster (that is, a tag with key Role and value OssecMaster) for the script to locate the IP address of the EC2 instance that is the ossec server, so when you start your OssecServer instance, be sure to add that tag.
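My scripts do this lookup with the AWS CLI, but the same tag filter can be sketched in Python with boto3 for anyone who prefers that. Treat this as an illustration under assumptions: it returns the private IP address, and it adds a running-state filter that the CLI line described above may not have.

```python
def ossec_server_filters(role_value="OssecMaster"):
    """Filters equivalent to Name=tag:Role,Values=OssecMaster, plus a state check."""
    return [
        {"Name": "tag:Role", "Values": [role_value]},
        # Assumption: only running instances are useful as the OSSEC server.
        {"Name": "instance-state-name", "Values": ["running"]},
    ]


def first_private_ip(response):
    """Pull the first private IP out of a describe_instances response."""
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            ip = instance.get("PrivateIpAddress")
            if ip:
                return ip
    return None


def find_ossec_server_ip(region):
    """Query EC2 in the given region for the tagged OSSEC server."""
    import boto3  # imported lazily so the helpers above can be tested offline

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_instances(Filters=ossec_server_filters())
    return first_private_ip(response)
```

As with the CLI version, the instance role needs ec2:DescribeInstances, and the region argument has to match where the server runs.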

You’ll notice some sleep commands I’ve put in the scripts. OSSEC initialization is a little buggy [see ref links 1 and 2 below], meaning that you have to restart the ossec-hids-server process on the server after the first agent attempts to register; once that is done, all the subsequent agents will register with no problem. I don’t know why this is, this behavior is lame, and I hated having to code around it. I need to come up with a better way than just sleeping the script during the first agent registrations and then running a restart after x minutes. Or maybe the next version of OSSEC will fix this so the first agent will register without a restart.

Ref 1 Issue where you have to restart OSSEC after first agent registers

Ref 2 Issue where you have to restart OSSEC after first agent registers

Also, don’t forget to configure your Security Groups correctly.

You’ll need 9654 TCP open on the OSSEC server for the auto-ossec listener.

You’ll need 1514 UDP open on the OSSEC server to accept agent keep-alive messages.
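The two rules can also be added from code. This is a rough sketch using boto3; the security group ID, the CIDR range that your agents live in, and the region are all placeholders you would supply yourself.

```python
def ossec_ingress_rules(agent_cidr):
    """Ingress permissions for the two ports the OSSEC server needs open."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": 9654,
            "ToPort": 9654,
            "IpRanges": [{"CidrIp": agent_cidr, "Description": "auto-ossec listener"}],
        },
        {
            "IpProtocol": "udp",
            "FromPort": 1514,
            "ToPort": 1514,
            "IpRanges": [{"CidrIp": agent_cidr, "Description": "OSSEC agent keep-alives"}],
        },
    ]


def open_ossec_ports(group_id, agent_cidr, region):
    """Apply the rules to the OSSEC server's security group."""
    import boto3  # imported lazily so the rule builder can be tested offline

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=ossec_ingress_rules(agent_cidr),
    )
```

Keeping agent_cidr as narrow as possible (for example, only the subnets your agents run in) avoids exposing the listener more widely than needed.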


Posted in Cloud Security, Cyber Security, Linux Security

Path to AWS Architect Professional – Storage Anti-Patterns


This post is a summary of my notes from reading the Storage Design Anti-Patterns addressed in this AWS Whitepaper.

“An anti-pattern is a common response to a recurring problem that is usually ineffective and risks being highly counterproductive”

S3 Anti-Patterns: 

Amazon S3 doesn’t suit all storage situations. The following list presents some storage needs for which you should consider other AWS storage options.

Storage Need:  File System: S3 uses a flat namespace and is not meant to be a POSIX-compliant file system. Instead, consider Amazon EFS for a file system.

Storage Need:  Structured Data with Query: S3 does not offer query capabilities to retrieve specific objects. When you use S3, you need to know the bucket name and key for the files you want to retrieve. Instead use, or pair S3 with: Amazon DynamoDB, Amazon RDS, or Amazon CloudSearch.

Storage Need:  Rapidly Changing Data: Use solutions that take read and write latencies into account, such as Amazon EFS, Amazon DynamoDB, Amazon RDS, or Amazon EBS.

Storage Need:  Archival Data: Data that requires infrequent read access and encrypted archival storage, and that can tolerate a long RTO, is ideal for Amazon Glacier.

Storage Need:  Dynamic Website Hosting: Although S3 is ideal for hosting static content, dynamic websites that depend on server-side scripting or database interaction are a better fit for Amazon EC2 or Amazon EFS.

Glacier Anti-Patterns: 

Amazon Glacier doesn’t suit all storage situations. The following list presents some storage needs for which you should consider other AWS storage options.

Storage Need:  Rapidly Changing Data: Look for a storage solution with lower read and write latencies, such as Amazon RDS, Amazon EFS, Amazon DynamoDB, or DBs running on Amazon EC2.

Storage Need:  Immediate Access: Data stored in Glacier is not available immediately; retrieval typically takes 3-5 hours. If you need to access your data immediately, Amazon S3 is a better choice.

Amazon EFS Anti-Patterns:

Amazon EFS doesn’t suit all storage situations. The following list presents some storage needs for which you should consider other AWS storage options.

Storage Need:  Archival Data: Data that requires infrequent read access and encrypted archival storage, and that can tolerate a long RTO, is ideal for Amazon Glacier.

Storage Need:  Relational Database Storage: In most cases, relational databases require storage that is mounted, accessed, and locked by a single node (EC2 instance, etc.). Instead use: Amazon DynamoDB, Amazon RDS.

Storage Need:  Temporary Storage: Consider using local instance store for items like buffers, caches, and queues.

Amazon EBS Anti-Patterns:

Amazon EBS doesn’t suit all storage situations. The following list presents some storage needs for which you should consider other AWS storage options.

Storage Need:  Temporary Storage: Consider using local instance store for items like buffers, caches, and queues.

Storage Need:  Multi-instance Storage: EBS volumes can only be attached to one EC2 instance at a time.  If you need multiple instances attached to a single data store, consider using Amazon EFS

Storage Need:  Highly Durable Storage: Instead use Amazon S3 or Amazon EFS. Amazon S3 Standard storage is designed for 99.999999999 percent (11 nines) annual durability per object. You can take a snapshot of EBS and that snapshot gets saved to S3, thereby providing the durability of S3.  Alternatively, Amazon EFS  is designed for high durability and high availability, with data stored in multiple Availability Zones within an AWS Region.

Storage Need:  Static Data or Web Content: If data is more static, Amazon S3 might represent a more cost-effective and scalable solution for storing  fixed information. Web content served out of Amazon EBS requires a web server running on Amazon EC2; in contrast, you can deliver web content directly out of Amazon S3 or from multiple EC2 instances using Amazon EFS.

Amazon EC2 Instance Store Anti-Patterns:

Amazon EC2 instance store doesn’t suit all storage situations. The following list presents some storage needs for which you should consider other AWS storage options.

Storage Need:  Persistent Storage: If you need disks that are similar to a disk drive and must persist beyond the life of the instance, EBS volumes, EFS file systems, or S3 are more appropriate.

Storage Need:  Relational Database Storage: In most cases, relational databases require storage that is mounted, accessed, and locked by a single node (EC2 instance, etc.). Instead use: Amazon DynamoDB, Amazon RDS.

Storage Need:  Shared Storage: Instance store can only be attached to one EC2 instance at a time. If you need multiple instances attached to a single data store, consider using Amazon EFS. If you need storage that can be detached from one instance and attached to a different instance, or if you need the ability to share data easily, Amazon EFS, Amazon S3, or Amazon EBS are better choices.

Storage Need:  Snapshots: If you need long-term durability, availability, and the ability to share point-in-time disk snapshots, EBS volumes with snapshots stored in S3 are a better choice.

Posted in AWS, AWS Certified Solutions Architect, Uncategorized