@mark-cooper
Last active July 30, 2025 04:31
Revisions

  1. mark-cooper revised this gist Jul 30, 2025. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion CSV_FUNCTION.go
    Original file line number Diff line number Diff line change
    @@ -42,7 +42,7 @@ err := parseManifest(ctx, manifest, func(entry ManifestEntry) bool {
    // do something for filename ...
    filename := fmt.Sprintf("export_%s.csv", extractFileID(e.DataFileS3Key))
    // preferably upload without writing to disk first ...
    if err := writeCSVFile(filename, csv); err != nil {
    if err := writeCSVFile(filename, csv); err != nil {
    log.Printf("Failed to write CSV for %s: %v", e.DataFileS3Key, err)
    return // probs need to handle an issue better
    }
  2. mark-cooper revised this gist Jul 30, 2025. 2 changed files with 27 additions and 23 deletions.
    47 changes: 26 additions & 21 deletions CSV_FUNCTION.go
    @@ -23,29 +23,34 @@ func parseManifest(ctx context.Context, manifestBody string, processEntry func(M
    // Something like:

    var wg sync.WaitGroup
    semaphore := make(chan struct{}, 10) // Limit to 10 goroutines at a time

    err := parseManifest(ctx, manifest, func(entry ManifestEntry) {
    wg.Add(1)
    go func(e ManifestEntry) {
    defer wg.Done()

    // download file and convert to csv (could potentially be separate operations ...?)
    csv, err := getExportDataFile(ctx, e.DataFileS3Key)
    if err != nil {
    log.Printf("Failed to get export data for %s: %v", e.DataFileS3Key, err)
    return // probs need to handle an issue better
    }
    err := parseManifest(ctx, manifest, func(entry ManifestEntry) bool {
    wg.Add(1)
    go func(e ManifestEntry) {
    defer wg.Done()
    semaphore <- struct{}{}
    defer func() { <-semaphore }()

    // download file and convert to csv (could potentially be separate operations ...?)
    csv, err := getExportDataFile(ctx, e.DataFileS3Key)
    if err != nil {
    log.Printf("Failed to get export data for %s: %v", e.DataFileS3Key, err)
    return // probs need to handle an issue better
    }

    // do something for filename ...
    filename := fmt.Sprintf("export_%s.csv", extractFileID(e.DataFileS3Key))
    // preferably upload without writing to disk first ...
    if err := writeCSVFile(filename, csv); err != nil {
    log.Printf("Failed to write CSV for %s: %v", e.DataFileS3Key, err)
    return // probs need to handle an issue better
    }

    log.Printf("Successfully processed %s", e.DataFileS3Key)
    }(entry)

    // do something for filename ...
    filename := fmt.Sprintf("export_%s.csv", extractFileID(e.DataFileS3Key))
    // preferably upload without writing to disk first ...
    if err := uploadCSV(filename, csv); err != nil {
    log.Printf("Failed to upload CSV for %s: %v", e.DataFileS3Key, err)
    return // probs need to handle an issue better
    }

    log.Printf("Successfully processed %s", e.DataFileS3Key)
    }(entry)
    return true
    })

    wg.Wait()
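On the "probs need to handle an issue better" notes: one option is to have each worker return an error and collect the failures instead of only logging them. A minimal sketch, assuming the download/convert/write steps are wrapped in a `process` function and the entries are already in a slice. `processAll`, `process` and `errSimulated` are illustrative names, not part of the gist. It also takes the semaphore slot before starting the goroutine, so no more than `limit` goroutines exist at once. `errors.Join` needs Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ManifestEntry mirrors the type used in the gist; only the field
// referenced there is included.
type ManifestEntry struct {
	DataFileS3Key string
}

// errSimulated is a stand-in failure used by the example in main.
var errSimulated = errors.New("simulated failure")

// processAll runs process for every entry with at most limit calls in
// flight and returns all failures joined into one error (nil if none).
// limit must be at least 1, otherwise the first send blocks forever.
func processAll(entries []ManifestEntry, limit int, process func(ManifestEntry) error) error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	semaphore := make(chan struct{}, limit)

	for _, entry := range entries {
		semaphore <- struct{}{} // acquire before spawning so goroutines do not pile up
		wg.Add(1)
		go func(e ManifestEntry) {
			defer wg.Done()
			defer func() { <-semaphore }()

			if err := process(e); err != nil {
				mu.Lock()
				errs = append(errs, fmt.Errorf("%s: %w", e.DataFileS3Key, err))
				mu.Unlock()
			}
		}(entry)
	}

	wg.Wait()
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	entries := []ManifestEntry{{DataFileS3Key: "a.json.gz"}, {DataFileS3Key: "b.json.gz"}}
	err := processAll(entries, 10, func(e ManifestEntry) error {
		if e.DataFileS3Key == "b.json.gz" {
			return errSimulated
		}
		return nil
	})
	fmt.Println(err)
}
```

The handler can then decide what a partial failure means (retry, fail the invocation, report the keys) because the keys of the failed files are in the returned error.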
    3 changes: 1 addition & 2 deletions FEEDBACK.md
    @@ -14,8 +14,7 @@ And presumably that exportArn is usable for what you're doing ...
    **Questions re: CSV function.**

    How many files and how large are the files in the worst case?
    Because there will be lots of copying into memory, we will need to limit how many goroutines can run at a time.
    May need to process export files by line, append to tmp file and upload when done ...
    Because there will be lots of copying into memory, we may need to process export files by line, append to a tmp file and upload when done ...
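
A rough sketch of the "by line, append to tmp file, upload when done" idea, so only one record is held in memory at a time. It assumes each line of the data file is a flat JSON object and that the CSV columns are known up front. Real export files are likely gzipped and nested, so the decoding step would need adapting. `streamToCSV` and `csvFromString` are made-up names, and the upload itself is left out.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log"
	"os"
	"strings"
)

// streamToCSV decodes one JSON object at a time from r and writes one CSV
// row per object to w, so memory use is bounded by the largest record.
// It returns the number of data rows written (the header is not counted).
func streamToCSV(r io.Reader, w io.Writer, columns []string) (int, error) {
	dec := json.NewDecoder(r)
	out := csv.NewWriter(w)

	if err := out.Write(columns); err != nil {
		return 0, fmt.Errorf("write header: %w", err)
	}

	rows := 0
	for {
		var record map[string]any
		if err := dec.Decode(&record); errors.Is(err, io.EOF) {
			break
		} else if err != nil {
			return rows, fmt.Errorf("decode record %d: %w", rows+1, err)
		}

		// Missing or null fields become empty cells.
		row := make([]string, len(columns))
		for i, col := range columns {
			if v, ok := record[col]; ok && v != nil {
				row[i] = fmt.Sprint(v)
			}
		}
		if err := out.Write(row); err != nil {
			return rows, fmt.Errorf("write row %d: %w", rows+1, err)
		}
		rows++
	}

	out.Flush()
	return rows, out.Error()
}

// csvFromString is an in-memory wrapper around streamToCSV, handy for
// tests and small inputs.
func csvFromString(input string, columns []string) (string, error) {
	var sb strings.Builder
	_, err := streamToCSV(strings.NewReader(input), &sb, columns)
	return sb.String(), err
}

func main() {
	input := strings.NewReader("{\"id\":\"1\",\"name\":\"a\"}\n{\"id\":\"2\"}\n")

	tmp, err := os.CreateTemp("", "export_*.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(tmp.Name())
	defer tmp.Close()

	rows, err := streamToCSV(input, tmp, []string{"id", "name"})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("wrote %d rows to %s; the upload would happen here", rows, tmp.Name())
}
```

Since `streamToCSV` only needs an `io.Reader` and an `io.Writer`, the same function could read straight from the S3 response body and write into an upload stream, which would skip the tmp file entirely.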

    **Template**

  3. mark-cooper revised this gist Jul 30, 2025. 1 changed file with 6 additions and 1 deletion.
    7 changes: 6 additions & 1 deletion FEEDBACK.md
    @@ -11,7 +11,12 @@ exportArn, err := exportTable(ctx, dynamodbClient, tableArn, exportBucket, prefi
    So it could do something with it, like push it to s3 and you read the export arn from the file we're expecting?
    And presumably that exportArn is usable for what you're doing ...

    Questions re: CSV function.
    **Questions re: CSV function.**

    How many files and how large are the files in the worst case?
    Because there will be lots of copying into memory, we will need to limit how many goroutines can run at a time.
    May need to process export files by line, append to tmp file and upload when done ...

    **Template**

    Missing the conditions stuff for IMAGE_URI. Also CSV function doesn't need the DDB envvar.
  4. mark-cooper revised this gist Jul 30, 2025. 1 changed file with 6 additions and 1 deletion.
    7 changes: 6 additions & 1 deletion FEEDBACK.md
    @@ -9,4 +9,9 @@ exportArn, err := exportTable(ctx, dynamodbClient, tableArn, exportBucket, prefi
    ```

    So it could do something with it, like push it to s3 and you read the export arn from the file we're expecting?
    And presumably that exportArn is usable for what you're doing ...
    And presumably that exportArn is usable for what you're doing ...

    Questions re: CSV function.
    How many files and how large are the files in the worst case?
    Because there will be lots of copying into memory, we will need to limit how many goroutines can run at a time.
    May need to process export files by line, append to tmp file and upload when done ...
  5. mark-cooper revised this gist Jul 30, 2025. 1 changed file with 8 additions and 8 deletions.
    16 changes: 8 additions & 8 deletions CSV_FUNCTION.go
    @@ -29,20 +29,20 @@ err := parseManifest(ctx, manifest, func(entry ManifestEntry) {
    go func(e ManifestEntry) {
    defer wg.Done()

    // download file and convert to csv (could potentially be separate operations ...?)
    // download file and convert to csv (could potentially be separate operations ...?)
    csv, err := getExportDataFile(ctx, e.DataFileS3Key)
    if err != nil {
    log.Printf("Failed to get export data for %s: %v", e.DataFileS3Key, err)
    return // probs need to handle an issue better
    }

    // do something for filename ...
    filename := fmt.Sprintf("export_%s.csv", extractFileID(e.DataFileS3Key))
    // preferably upload without writing to disk first ...
    if err := uploadCSV(filename, csv); err != nil {
    log.Printf("Failed to upload CSV for %s: %v", e.DataFileS3Key, err)
    return // probs need to handle an issue better
    }
    // do something for filename ...
    filename := fmt.Sprintf("export_%s.csv", extractFileID(e.DataFileS3Key))
    // preferably upload without writing to disk first ...
    if err := uploadCSV(filename, csv); err != nil {
    log.Printf("Failed to upload CSV for %s: %v", e.DataFileS3Key, err)
    return // probs need to handle an issue better
    }

    log.Printf("Successfully processed %s", e.DataFileS3Key)
    }(entry)
  6. mark-cooper revised this gist Jul 30, 2025. 1 changed file with 7 additions and 7 deletions.
    14 changes: 7 additions & 7 deletions CSV_FUNCTION.go
    @@ -36,13 +36,13 @@ err := parseManifest(ctx, manifest, func(entry ManifestEntry) {
    return // probs need to handle an issue better
    }

    // do something for filename ...
    filename := fmt.Sprintf("export_%s.csv", extractFileID(e.DataFileS3Key))
    // preferably upload without writing to disk first ...
    if err := uploadCSV(filename, csv); err != nil {
    log.Printf("Failed to upload CSV for %s: %v", e.DataFileS3Key, err)
    return // probs need to handle an issue better
    }
    // do something for filename ...
    filename := fmt.Sprintf("export_%s.csv", extractFileID(e.DataFileS3Key))
    // preferably upload without writing to disk first ...
    if err := uploadCSV(filename, csv); err != nil {
    log.Printf("Failed to upload CSV for %s: %v", e.DataFileS3Key, err)
    return // probs need to handle an issue better
    }

    log.Printf("Successfully processed %s", e.DataFileS3Key)
    }(entry)
  7. mark-cooper created this gist Jul 30, 2025.
    51 changes: 51 additions & 0 deletions CSV_FUNCTION.go
    @@ -0,0 +1,51 @@
    // Let's make `parseManifest` accept a callback that takes a `ManifestEntry` (very javascript =).
    // It just processes lines into manifest entries and hands them off.

    func parseManifest(ctx context.Context, manifestBody string, processEntry func(ManifestEntry)) error {
    dec := json.NewDecoder(strings.NewReader(manifestBody))

    for {
    var e ManifestEntry
    if err := dec.Decode(&e); err == io.EOF {
    break
    } else if err != nil {
    return fmt.Errorf("failed to decode manifest entry: %w", err)
    }

    processEntry(e)
    }

    return nil
    }

    // In handler, update parseManifest
    // Use a waitgroup and goroutines to process each file -> csv conversion and upload separately
    // Something like:

    var wg sync.WaitGroup

    err := parseManifest(ctx, manifest, func(entry ManifestEntry) {
    wg.Add(1)
    go func(e ManifestEntry) {
    defer wg.Done()

    // download file and convert to csv (could potentially be separate operations ...?)
    csv, err := getExportDataFile(ctx, e.DataFileS3Key)
    if err != nil {
    log.Printf("Failed to get export data for %s: %v", e.DataFileS3Key, err)
    return // probs need to handle an issue better
    }

    // do something for filename ...
    filename := fmt.Sprintf("export_%s.csv", extractFileID(e.DataFileS3Key))
    // preferably upload without writing to disk first ...
    if err := uploadCSV(filename, csv); err != nil {
    log.Printf("Failed to upload CSV for %s: %v", e.DataFileS3Key, err)
    return // probs need to handle an issue better
    }

    log.Printf("Successfully processed %s", e.DataFileS3Key)
    }(entry)
    })

    wg.Wait()
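
For trying the callback approach in isolation, here is a self-contained sketch of `parseManifest` with the types and imports filled in. The `dataFileS3Key` JSON tag is an assumption about the manifest format, the `ctx.Err()` check is an addition (the version above accepts `ctx` but never looks at it), and `collectKeys` is a made-up helper that just shows a callback in use. Unlike the snippet above, the error returned by `parseManifest` is actually checked.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log"
	"strings"
)

// ManifestEntry holds one line of the manifest. The JSON tag is an
// assumption; adjust it to match the real manifest file.
type ManifestEntry struct {
	DataFileS3Key string `json:"dataFileS3Key"`
}

// parseManifest decodes newline-delimited JSON entries and hands each one
// to processEntry. It stops early if ctx is cancelled.
func parseManifest(ctx context.Context, manifestBody string, processEntry func(ManifestEntry)) error {
	dec := json.NewDecoder(strings.NewReader(manifestBody))

	for {
		if err := ctx.Err(); err != nil {
			return err
		}

		var e ManifestEntry
		if err := dec.Decode(&e); errors.Is(err, io.EOF) {
			return nil
		} else if err != nil {
			return fmt.Errorf("failed to decode manifest entry: %w", err)
		}

		processEntry(e)
	}
}

// collectKeys gathers every data file key from a manifest body.
func collectKeys(manifestBody string) ([]string, error) {
	var keys []string
	err := parseManifest(context.Background(), manifestBody, func(e ManifestEntry) {
		keys = append(keys, e.DataFileS3Key)
	})
	return keys, err
}

func main() {
	manifest := "{\"dataFileS3Key\":\"exports/data/one.json.gz\"}\n" +
		"{\"dataFileS3Key\":\"exports/data/two.json.gz\"}\n"

	keys, err := collectKeys(manifest)
	if err != nil {
		log.Fatalf("parse manifest: %v", err)
	}
	fmt.Println(keys)
}
```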
    12 changes: 12 additions & 0 deletions FEEDBACK.md
    @@ -0,0 +1,12 @@
    I don't dislike the export arn stuff. It's likely ok.
    Maybe could use some extra validation around the date (be sure we're getting this right) and id (format?).

    The only alternative that springs to mind would be to do something with the checksum table export function.
    It has the arn:

    ```go
    exportArn, err := exportTable(ctx, dynamodbClient, tableArn, exportBucket, prefix)
    ```

    So it could do something with it, like push it to s3 and you read the export arn from the file we're expecting?
    And presumably that exportArn is usable for what you're doing ...
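
On the "extra validation" point, a format check could look something like the sketch below. The pattern is an assumption based on the shape of example export ARNs (`.../table/<name>/export/<14 digits>-<8 hex chars>`) and should be checked against real ARNs before relying on it. `parseExportARN` is a made-up helper. It only covers the id format; the date check depends on how the date is derived, which isn't shown here.

```go
package main

import (
	"fmt"
	"log"
	"regexp"
)

// exportARNPattern encodes an ASSUMED shape for DynamoDB export ARNs:
// a table ARN followed by /export/<14 digits>-<8 lowercase hex chars>.
// Verify it against real export ARNs before using it.
var exportARNPattern = regexp.MustCompile(
	`^arn:aws[a-z-]*:dynamodb:[a-z0-9-]+:[0-9]{12}:table/([A-Za-z0-9_.-]{3,255})/export/([0-9]{14}-[0-9a-f]{8})$`)

// parseExportARN returns the table name and export id if arn matches the
// assumed pattern, or an error describing the rejected value.
func parseExportARN(arn string) (table, exportID string, err error) {
	m := exportARNPattern.FindStringSubmatch(arn)
	if m == nil {
		return "", "", fmt.Errorf("unexpected export ARN format: %q", arn)
	}
	return m[1], m[2], nil
}

func main() {
	table, id, err := parseExportARN(
		"arn:aws:dynamodb:us-east-1:123456789012:table/ProductCatalog/export/01234567890123-a1b2c3d4")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(table, id)
}
```

Failing fast on a malformed ARN keeps a bad date or id from turning into a confusing "not found" error further down the line.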