Scanning S3 Files with ClamAV and CDK

7 March 2022

In this post we look at how to trigger virus scanning of files as they are uploaded to S3, using AWS CDK and Lambda container images.

Introduction

The AWS Well-Architected Framework consists of six pillars (operational excellence, performance efficiency, security, reliability, cost optimisation, and sustainability), with cloud security being one of AWS's highest priorities. As an AWS user you benefit from AWS's own security practices, but security on AWS follows a shared responsibility model: AWS is responsible for “Security of the Cloud” and the user is responsible for “Security in the Cloud” including, you guessed it, the screening of S3 objects for malicious content.

With that in mind we will be looking at how to scan files uploaded to S3 utilising ClamAV, a multi-stage Dockerfile and Lambda container images.

Dockerfile

The multi-stage Dockerfile being used has been taken from Joseph Sutton's blog post “Using Serverless to Scan Files with ClamAV in a Lambda Container”.

To summarise, the Dockerfile has two stages. The first stage (layer-image) uses Amazon's amazonlinux:2 base image to fetch the ClamAV binaries, their shared libraries and the virus definitions, which is simpler than building them directly on the lambda/nodejs:14 image. The second stage builds the Lambda image from lambda/nodejs:14, adding the handler and copying the ClamAV binaries and virus definitions across from the layer image.

FROM --platform=linux/amd64 amazonlinux:2 AS layer-image

WORKDIR /home/build

RUN set -e

RUN echo "Prepping ClamAV"

RUN rm -rf bin
RUN rm -rf lib

RUN yum update -y
RUN amazon-linux-extras install epel -y
RUN yum install -y cpio yum-utils tar.x86_64 gzip zip

RUN yumdownloader -x \*i686 --archlist=x86_64 clamav
RUN rpm2cpio clamav-0*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-lib
RUN rpm2cpio clamav-lib*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-update
RUN rpm2cpio clamav-update*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 json-c
RUN rpm2cpio json-c*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 pcre2
RUN rpm2cpio pcre*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libtool-ltdl
RUN rpm2cpio libtool-ltdl*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libxml2
RUN rpm2cpio libxml2*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 bzip2-libs
RUN rpm2cpio bzip2-libs*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 xz-libs
RUN rpm2cpio xz-libs*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libprelude
RUN rpm2cpio libprelude*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 gnutls
RUN rpm2cpio gnutls*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 nettle
RUN rpm2cpio nettle*.rpm | cpio -vimd

RUN mkdir -p bin
RUN mkdir -p lib
RUN mkdir -p var/lib/clamav
RUN chmod -R 777 var/lib/clamav

COPY freshclam.conf .

RUN cp usr/bin/clamscan usr/bin/freshclam bin/.
RUN cp usr/lib64/* lib/.
RUN cp freshclam.conf bin/freshclam.conf

RUN yum install shadow-utils.x86_64 -y

RUN groupadd clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamupdate

RUN LD_LIBRARY_PATH=./lib ./bin/freshclam --config-file=bin/freshclam.conf

FROM --platform=linux/amd64 public.ecr.aws/lambda/nodejs:14

COPY --from=layer-image /home/build ./

COPY virus-scanner/index.js ./

CMD ["index.handler"]

The freshclam.conf file referenced is a scaled-down version containing only the configuration required for this example:

DatabaseMirror database.clamav.net
CompressLocalDatabase yes
ScriptedUpdates no
DatabaseDirectory /home/build/var/lib/clamav

More information about the freshclam.conf file can be found in the ClamAV documentation.

Lambda

Our Lambda is invoked when files are created in an S3 bucket and handles the scanning and processing of any that are found to be infected. There are a number of ways infected files can be handled: for example, tagging the file and adding IAM rules to prevent infected files being opened, or creating a quarantine bucket to move infected files to.
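As a rough illustration of the two alternatives just mentioned, the sketch below builds the request parameters that the AWS SDK v2 S3 client expects for putObjectTagging and copyObject. The function names, the scan-status tag and its infected value are assumptions made for this example, and the parameter types are declared locally so that the snippet stands on its own.

```typescript
// Local types mirroring the request shapes of the AWS SDK v2 S3 client,
// declared here so the sketch compiles without the SDK installed.
interface TaggingParams {
    Bucket: string;
    Key: string;
    Tagging: { TagSet: { Key: string; Value: string }[] };
}

interface CopyParams {
    Bucket: string;
    Key: string;
    CopySource: string;
}

// Option 1: tag the object in place. The tag name and value are assumptions;
// a policy would then deny reads of objects carrying this tag.
// Note that putObjectTagging replaces any tags already on the object.
function infectedTaggingParams(bucket: string, key: string): TaggingParams {
    return {
        Bucket: bucket,
        Key: key,
        Tagging: { TagSet: [{ Key: "scan-status", Value: "infected" }] },
    };
}

// Option 2: copy the object into a quarantine bucket (the original is deleted afterwards).
// CopySource has the form "<bucket>/<key>" with each key segment URL-encoded.
function quarantineCopyParams(sourceBucket: string, key: string, quarantineBucket: string): CopyParams {
    const encodedKey = key.split("/").map(encodeURIComponent).join("/");
    return {
        Bucket: quarantineBucket,
        Key: key,
        CopySource: `${sourceBucket}/${encodedKey}`,
    };
}
```

In a handler these would be passed to the client, for example s3.putObjectTagging(infectedTaggingParams(bucket, key)).promise(), and the function's role would need the matching tagging or write permissions in place of the delete permission.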

For this example we will be deleting a file if it is found to be infected:

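import { S3Event, S3EventRecord } from "aws-lambda";
import { S3 } from "aws-sdk";
import { execFileSync } from "child_process";
import { writeFileSync } from "fs";
import { basename } from "path";

const s3 = new S3();

export const handler = async (s3Event: S3Event): Promise<void> => {
    console.log("Received S3 event - handling virus scan", JSON.stringify(s3Event));

    for (const s3EventRecord of s3Event.Records) {
        await virusScan(s3EventRecord);
    }
};

const virusScan = async (s3EventRecord: S3EventRecord): Promise<void> => {
    // object keys in S3 events are URL-encoded, with spaces sent as '+'
    const objectKey = decodeURIComponent(s3EventRecord.s3.object.key.replace(/\+/g, " "));
    const sourceBucket = s3EventRecord.s3.bucket.name;

    const objectToScan = await s3.getObject({
        Bucket: sourceBucket,
        Key: objectKey
    }).promise();

    // write under /tmp using only the base name so that keys containing '/' still work
    const localPath = `/tmp/${basename(objectKey)}`;
    writeFileSync(localPath, objectToScan.Body as Buffer);

    try {
        // arguments are passed as an array so the file name is never interpreted by a shell
        execFileSync("./bin/clamscan", ["--database=./var/lib/clamav", localPath], {
            encoding: "utf8",
            stdio: "inherit",
        });
    } catch (error) {
        // the error status returned for an infected file is '1'
        if ((error as { status?: number }).status === 1) {
            await s3.deleteObject({
                Bucket: sourceBucket,
                Key: objectKey
            }).promise();
            console.error(`Virus found, ${objectKey} removed from ${sourceBucket}`);
        } else {
            // anything else is a failed scan rather than a detection, so surface it
            throw error;
        }
    }
};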
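The handler above relies on clamscan's exit codes: 0 means no virus was found, 1 means a virus was found, and 2 means an error occurred during the scan. A minimal sketch of making that mapping explicit is shown below; the ScanOutcome type and both helper names are hypothetical and not part of the original handler.

```typescript
type ScanOutcome = "clean" | "infected" | "error";

// Map a clamscan exit status to an outcome. Anything other than 0 or 1
// (including a missing status) is treated as a failed scan.
function interpretClamscanStatus(status: number | null | undefined): ScanOutcome {
    if (status === 0) return "clean";
    if (status === 1) return "infected";
    return "error";
}

// Errors thrown by execSync / execFileSync carry the child's exit code in `status`.
// Return it when present, or null for any other kind of error.
function statusFromExecError(error: unknown): number | null {
    if (typeof error === "object" && error !== null && "status" in error) {
        const status = (error as { status: unknown }).status;
        return typeof status === "number" ? status : null;
    }
    return null;
}
```

Inside the catch block, interpretClamscanStatus(statusFromExecError(error)) would then distinguish an infected file from a scan that failed for some other reason, such as a missing virus database.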

CDK Stack

Now that we have our Lambda and Dockerfile we can create a simple stack in CDK to manage the Lambda container image along with our bucket and its permissions:

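// imports assume AWS CDK v2 (aws-cdk-lib)
import { Duration, RemovalPolicy, Stack, StackProps } from "aws-cdk-lib";
import { DockerImageCode, DockerImageFunction } from "aws-cdk-lib/aws-lambda";
import { S3EventSource } from "aws-cdk-lib/aws-lambda-event-sources";
import { Bucket, EventType } from "aws-cdk-lib/aws-s3";
import { Construct } from "constructs";

export class VirusScanningStack extends Stack {
    constructor(scope: Construct, stackName: string, props?: StackProps) {
        super(scope, stackName, props);

        // the bucket is emptied and removed when the stack is destroyed
        const ourFileBucket = new Bucket(this, "our-file-bucket", {
            autoDeleteObjects: true,
            removalPolicy: RemovalPolicy.DESTROY
        });

        // builds the Dockerfile found in the given directory and deploys it as the function's image
        const virusScanningFunction = new DockerImageFunction(this, "virusScanUploadFunction", {
            code: DockerImageCode.fromImageAsset("./", {
                entrypoint: ["/lambda-entrypoint.sh"],
            }),
            timeout: Duration.minutes(2),
            memorySize: 2048
        });

        // only PUT uploads trigger a scan; multipart uploads emit a separate event type
        const s3PutEventSource = new S3EventSource(ourFileBucket, {
            events: [EventType.OBJECT_CREATED_PUT]
        });

        virusScanningFunction.addEventSource(s3PutEventSource);
        // the function reads objects to scan them and deletes any that are infected
        ourFileBucket.grantRead(virusScanningFunction);
        ourFileBucket.grantDelete(virusScanningFunction);
    }
}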

Testing the virus scan

Provided you are set up to use AWS CDK and are running Docker on your machine, you are good to deploy and start uploading some documents. The easiest and safest way to test how it processes an infected file is to use an EICAR test file. The file itself can be downloaded, but the simplest way to use it is to copy the following test string into a plain text file:

X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*

And that’s it. Once you’ve uploaded a file, the scan itself is going to take around 30 seconds to run.
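If you would rather generate the test file than copy and paste the string, a small Node script along the following lines should do it. This is only a sketch: the function name, the eicar-test.txt file name and the default output directory are arbitrary choices for illustration. Be aware that antivirus software on your own machine may flag or remove the file as soon as it is written.

```typescript
import { readFileSync, writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// The standard 68-character EICAR test string (the backslash is escaped for the string literal)
const EICAR_TEST_STRING = "X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*";

// Write the test string to <directory>/eicar-test.txt and return the full path
function writeEicarTestFile(directory: string = tmpdir()): string {
    const filePath = join(directory, "eicar-test.txt");
    writeFileSync(filePath, EICAR_TEST_STRING);
    return filePath;
}

const eicarPath = writeEicarTestFile();
console.log(`Wrote ${readFileSync(eicarPath).length} bytes to ${eicarPath}`);
```

The resulting file can then be uploaded to the bucket through the console or the AWS CLI to exercise the scan.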


If you're curious about AWS serverless cloud solutions in TypeScript leveraging AWS CDK and Lambdas as well as protecting your cloud applications and want to learn more, then check out our AWS Serverless Typescript and Cloud Application Security Training courses.

Article By
Emma Nash

Software Engineer