developing in development mode: Node & Webpack

I downloaded my app from Glitch to try to debug an issue, and I had to force it into development mode. Not being a node person, it took me a while to figure out that I was supposed to do this via an environment variable.

And in that process I found that the preferred way to set an environment variable with webpack is something like the stanza below (which worked great when added to my webpack.config.js).


plugins: [
  new webpack.DefinePlugin({
    'process.env.NODE_ENV': JSON.stringify('development')
  })
],

parameterized insert with Python’s Records

I hate ORMs in general. I also hate SQLAlchemy in particular. It seems like a major flaw in Python’s ecosystem that connection and session handling is so deeply bound to a high magic ORM. (I miss DBI)

Python has Records, which is a nice simple interface. But I couldn’t find anything that explained how to do inserts with it.

This was my first stab at it:

values = {'id':'joy', 'created_at':'Thu Feb 22 18:40:35 +0000 2018'}
insert_sql = "INSERT INTO likes (id, created_at) VALUES (:id, :created_at)"

db.query(insert_sql, values)

This gives an error:

sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) A value is required for bind parameter 'id' [SQL: 'INSERT INTO likes (id, created_at) VALUES (%(id)s, %(created_at)s)'] (Background on this error at: http://sqlalche.me/e/cd3x)

After some fairly frustrating reading of the source and Googling, I eventually came up with the following, which works:

db.query(insert_sql, **values)
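
Put together, a minimal end-to-end sketch (the connection URL is a placeholder for whatever your database actually is):

import records

# placeholder connection URL; point it at your own database
db = records.Database('postgres://user:pass@localhost/mydb')

values = {'id': 'joy', 'created_at': 'Thu Feb 22 18:40:35 +0000 2018'}
insert_sql = "INSERT INTO likes (id, created_at) VALUES (:id, :created_at)"

# the named bind parameters have to arrive as keyword arguments, hence **values
db.query(insert_sql, **values)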

Anatomy of a yak

What’s the best way to configure AWS spot instances?
Opinions are that GCloud is easier to use
Log in to GCloud. Install the gcloud SDK.
Pick “play with whatever the equivalent of AWS SQS is” as a hello world project
Find that Pub/Sub is the SQS equiv
Try to find the pricing for Pub/Sub — it’s based on bandwidth consumed not messages
Google for “pub/sub getting started”
yarn install dependencies
Google "yarn vs npm"brew install yarnError: /usr/local must be writable!sudo chown -R $(whoami) /usr/local
More Googling
Find out that /usr/local isn't chown'able under High Sierra
More Googling
sudo chown -R $(whoami) $(brew --prefix)/*
Error: /usr/local must be writable!
Fuck it yolo
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install yarn
Error: Your Xcode is too outdated.
Go pick up the kiddo from robot camp

Squaring the rectangle

This takes me back about 12 years, to when square thumbnails were new and nifty. Also, how is it that in 2016 image libraries still don’t DoTheRightThing when encountering EXIF orientation info? But nope, PIL/Pillow still has the “you made a thumbnail and now your photo is rotated” feature. Here’s a quick and ugly fix:


from __future__ import print_function
from PIL import Image, ExifTags


def square_thumb(img, thumb_size):
    THUMB_SIZE = (thumb_size, thumb_size)
    # un-rotate according to the EXIF Orientation tag, if one is present
    raw_exif = img._getexif() if hasattr(img, '_getexif') else None
    exif = dict((ExifTags.TAGS[k], v) for k, v in (raw_exif or {}).items()
                if k in ExifTags.TAGS)
    if exif.get('Orientation') == 3:
        img = img.rotate(180, expand=True)
    elif exif.get('Orientation') == 6:
        img = img.rotate(270, expand=True)
    elif exif.get('Orientation') == 8:
        img = img.rotate(90, expand=True)
    width, height = img.size
    # square it: crop the longer dimension down to the shorter one, centered
    if width > height:
        delta = width - height
        left = int(delta / 2)
        upper = 0
        right = height + left
        lower = height
    else:
        delta = height - width
        left = 0
        upper = int(delta / 2)
        right = width
        lower = width + upper
    img = img.crop((left, upper, right, lower))
    img.thumbnail(THUMB_SIZE, Image.ANTIALIAS)
    return img
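
A usage sketch (the file names here are just placeholders):

img = Image.open('photo.jpg')
thumb = square_thumb(img, 200)
thumb.save('photo_thumb.jpg')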


Read Google Cloud credentials from a file

I’m playing with calling a Google Cloud API. All the examples I can find assume you’re either running in a Google sandbox or you can futz with environment variables and arbitrary system paths.

The preferred approach is to fetch authorization credentials via a call to GoogleCredentials.get_application_default(), which either “does the right thing” or looks in GOOGLE_APPLICATION_CREDENTIALS (per the docs).

But I’m running this code on AWS Lambda.

And what I couldn’t find documented was how to use the same JSON file I’d downloaded to test locally in a deployed app. I found some references to a WELL_KNOWN_FILE_LOCATION that turned out to be a dead end, and some baroque examples of initializing a GoogleCredentials object manually.

So I started poking around the code, and I found from_stream. It takes a file name and does the right thing.

This is the right way to initialize Google Cloud credentials with your “Application Default Credentials” from an arbitrary file (and then, for example, pass them to the Cloud Vision API) in Python.

from oauth2client.client import GoogleCredentials
from googleapiclient import discovery

credentials = GoogleCredentials.from_stream('some_file.json')
return discovery.build('vision', 'v1', credentials=credentials,
                       discoveryServiceUrl=DISCOVERY_URL)

And given that I’m running it in a “serverless” environment on Lambda (let’s pretend!), I used a local file path:

credentials = GoogleCredentials.from_stream('./application_default_credentials.json')

Success! And now I’m still ignorant of the intricacies of Google Cloud key management, which was the goal.

pay per access pattern

Just to follow up on the last post and the line “it isn’t reasonable to scan, just like a real database”.

So Dynamo is a pay-per-access-pattern model. I don’t know that I’d encountered that before or seen anyone explain it like that. But the pricing model is per index, and the capacity usage is such that you need an index to access any reasonably sized dataset.

Pay-per-access-pattern is pretty much 180 degrees from what you want for indexing personal data for later random access.

I’m still chewing on it, but given the design goal of cheap-at-rest, and given Dynamo’s requirement to specify all data access patterns, I’m considering hand-rolling indexes, storing them on S3, and bypassing Dynamo entirely.

quoted for truth

I’ve gotten to the point of futzing with DyDb where I need to understand provisioned capacity:

In DynamoDB, you specify provisioned throughput requirements in terms of capacity units. Use the following guidelines to determine your provisioned throughput:

One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for items up to 4 KB in size. If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units. The total number of read capacity units required depends on the item size, and whether you want an eventually consistent or strongly consistent read.

One write capacity unit represents one write per second for items up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units. The total number of write capacity units required depends on the item size.

The default setting for ConsistentRead is false.

ReturnConsumedCapacity has the options INDEXES | TOTAL | NONE. Confusingly, INDEXES is the more thorough option, including

the aggregate ConsumedCapacity for the operation, together with ConsumedCapacity for each table and secondary index that was accessed.
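
In boto3 terms that looks something like this (just a sketch against the Table resource):

resp = table.scan(ReturnConsumedCapacity='INDEXES')
# ConsumedCapacity breaks out the table and each secondary index that was touched
print(resp['ConsumedCapacity'])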

You can use the Query and Scan operations to retrieve multiple consecutive items from a table or an index, in a single request. With these operations, DynamoDB uses the cumulative size of the processed items to calculate provisioned throughput. For example, if a Query operation retrieves 100 items that are 1 KB each, the read capacity calculation is not (100 × 4 KB) = 100 read capacity units, as if those items had been retrieved individually using GetItem or BatchGetItem. Instead, the total would be only 25 read capacity units, as shown following:

(100 * 1024 bytes = 100 KB) / 4 KB = 25 read capacity units
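
That arithmetic as a quick Python sketch (the helper name is mine, and I’m assuming the usual round-up-to-the-next-4-KB behavior):

import math

def query_read_capacity_units(total_item_bytes, strongly_consistent=True):
    # Query/Scan sums the sizes of the processed items, then rounds the
    # total up to the next 4 KB boundary
    units = int(math.ceil(total_item_bytes / 4096.0))
    # an eventually consistent read costs half as much
    return units if strongly_consistent else int(math.ceil(units / 2.0))

# 100 items of 1 KB each, fetched with a single Query:
print(query_read_capacity_units(100 * 1024))  # 25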

In summary. It isn’t reasonable to scan a large number of rows. Just like in a real database. Sigh.

Invalid type boto3.dynamodb.Attribute valid types: basestring

(actually, never mind, just don’t use the pagination interface with dynamodb; it makes everything harder and more inscrutable)

Invalid type for parameter FilterExpression, value:
 <boto3.dynamodb.conditions.AttributeExists object at 0x10729fb10>, 
type: <class 'boto3.dynamodb.conditions.AttributeExists'>, 
valid types: <type 'basestring'>

Well that’s annoying.

But that isn’t where our story starts.

Using boto3 to query DynamoDb to find, for example, all the records that have a latitude field, you might issue a scan like this:

from boto3.dynamodb.conditions import Attr

resp = table.scan(
    FilterExpression=Attr('lat').exists())

Except DynamoDb is capped at scanning only 1 MB of results per call.
When I issue the above scan against my db, it returns the first 2735 records (out of ~35,000).

If DyDB believes there are more rows matching your scan it will inject LastEvaluatedKey into the results, which you then pass as ExclusiveStartKey to your next scan call. With me?

Except boto3 has pagination built in to handle the bookkeeping, sweet!

Instead of

dynamodb = boto3.resource(...)
table = dynamodb.Table(table_name)
table.scan(FilterExpression=Attr('lat').exists())

we now need

dynamodb = boto3.client(...)
paginator = dynamodb.get_paginator('scan')
iter = paginator.paginate(
    TableName=table_name,
    FilterExpression=Attr('lat').exists()
)

Which brings us back to where we started this story:

Invalid type for parameter FilterExpression, value:
 <boto3.dynamodb.conditions.AttributeExists object at 0x10729fb10>, 
type: <class 'boto3.dynamodb.conditions.AttributeExists'>, 
valid types: <type 'basestring'>

After some digging (yay, endless twisty maze of metaprogramming!) and staring in disbelief at the documentation, it became clear that while table.scan(FilterExpression=...) takes an attribute builder object, DynamoDB.Paginator.Scan takes a raw DynamoDb “condition expression” string.

So you actually need

resp = paginator.paginate(
    TableName=config.dynamo_table,
    FilterExpression='attribute_exists(lat)'
)
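
Note that paginate() hands back an iterator of pages rather than items, so you still have to flatten things yourself, something like:

items = []
for page in resp:
    # the low-level client returns items in DynamoDB's attribute-value
    # format, e.g. {'lat': {'N': '37.77'}}
    items.extend(page.get('Items', []))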

Why the baffling inconsistency? No idea. But boy was it frustrating, and surprising from a library that seems as mature as boto. If I were doing this for anything other than my own gratification I would have bailed out at least an hour earlier and just rolled my own pagination.

Addendum. I tried for a while to see if I could get Attr(...) to just return a string representation of itself, without success 😦

Addendum 2. I keep running into fscking weirdisms with the pagination interface; I’m declaring it DOA. Rolling your own isn’t that hard.
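
For the record, rolling your own looks something like this (a sketch; the table and attribute names are the ones from the earlier examples):

from boto3.dynamodb.conditions import Attr

def scan_all(table, **scan_kwargs):
    # keep scanning until DynamoDB stops handing back LastEvaluatedKey
    items = []
    start_key = None
    while True:
        if start_key:
            scan_kwargs['ExclusiveStartKey'] = start_key
        resp = table.scan(**scan_kwargs)
        items.extend(resp.get('Items', []))
        start_key = resp.get('LastEvaluatedKey')
        if not start_key:
            break
    return items

# all the records that have a latitude field
rows = scan_all(table, FilterExpression=Attr('lat').exists())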