Last active
July 30, 2025 04:31
Revisions
mark-cooper revised this gist
Jul 30, 2025. 1 changed file with 1 addition and 1 deletion.

@@ -42,7 +42,7 @@ err := parseManifest(ctx, manifest, func(entry ManifestEntry) bool {
		// do something for filename ...
		filename := fmt.Sprintf("export_%s.csv", extractFileID(e.DataFileS3Key))
		// preferably upload without writing to disk first ...
		if err := writeCSVFile(filename, csv); err != nil {
			log.Printf("Failed to write CSV for %s: %v", e.DataFileS3Key, err)
			return // probs need to handle an issue better
		}
mark-cooper revised this gist
Jul 30, 2025. 2 changed files with 27 additions and 23 deletions.

@@ -23,29 +23,34 @@ func parseManifest(ctx context.Context, manifestBody string, processEntry func(M
// Something like:
var wg sync.WaitGroup
semaphore := make(chan struct{}, 10) // Limit to 10 goroutines at a time

err := parseManifest(ctx, manifest, func(entry ManifestEntry) bool {
	wg.Add(1)
	go func(e ManifestEntry) {
		defer wg.Done()
		semaphore <- struct{}{}
		defer func() { <-semaphore }()

		// download file and convert to csv (could potentially be separate operations ...?)
		csv, err := getExportDataFile(ctx, e.DataFileS3Key)
		if err != nil {
			log.Printf("Failed to get export data for %s: %v", e.DataFileS3Key, err)
			return // probs need to handle an issue better
		}

		// do something for filename ...
		filename := fmt.Sprintf("export_%s.csv", extractFileID(e.DataFileS3Key))
		// preferably upload without writing to disk first ...
		if err := writeCSVFile(filename, csv); err != nil {
			log.Printf("Failed to write CSV for %s: %v", e.DataFileS3Key, err)
			return // probs need to handle an issue better
		}

		log.Printf("Successfully processed %s", e.DataFileS3Key)
	}(entry)

	return true
})

wg.Wait()

@@ -14,8 +14,7 @@ And presumably that exportArn is usable for what you're doing ...
**Questions re: CSV function.**
How many files and how large are the files in the worst case?
Because of lots of copying into memory, may need to process export files by line, append to tmp file and upload when done ...
**Template**
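A note on the semaphore revision above: it still starts one goroutine per manifest entry, and failures are only logged. A minimal sketch of one alternative, acquiring the semaphore before spawning and collecting errors, is below. `processAll`, the `work` callback, and the `dataFileS3Key` JSON tag are assumptions for illustration, not part of the gist, and `ctx` is left out to keep it short.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
	"sync"
)

// ManifestEntry holds the one field the notes rely on; the JSON tag is an assumption.
type ManifestEntry struct {
	DataFileS3Key string `json:"dataFileS3Key"`
}

// parseManifest decodes newline-delimited JSON and hands each entry to processEntry.
// Returning false from the callback stops parsing early.
func parseManifest(manifestBody string, processEntry func(ManifestEntry) bool) error {
	dec := json.NewDecoder(strings.NewReader(manifestBody))
	for {
		var e ManifestEntry
		if err := dec.Decode(&e); errors.Is(err, io.EOF) {
			return nil
		} else if err != nil {
			return fmt.Errorf("failed to decode manifest entry: %w", err)
		}
		if !processEntry(e) {
			return nil
		}
	}
}

// processAll (hypothetical) runs work for every entry with at most limit goroutines
// alive at once and returns all failures joined together (nil if none).
func processAll(manifestBody string, limit int, work func(ManifestEntry) error) error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	semaphore := make(chan struct{}, limit)

	parseErr := parseManifest(manifestBody, func(e ManifestEntry) bool {
		semaphore <- struct{}{} // acquire before spawning, so goroutines are bounded too
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-semaphore }()
			if err := work(e); err != nil {
				mu.Lock()
				errs = append(errs, fmt.Errorf("%s: %w", e.DataFileS3Key, err))
				mu.Unlock()
			}
		}()
		return true
	})
	wg.Wait()
	if parseErr != nil {
		errs = append(errs, parseErr)
	}
	return errors.Join(errs...)
}

func main() {
	manifest := `{"dataFileS3Key":"a.json.gz"}
{"dataFileS3Key":"b.json.gz"}`
	err := processAll(manifest, 10, func(e ManifestEntry) error {
		if e.DataFileS3Key == "b.json.gz" {
			return errors.New("simulated failure")
		}
		return nil
	})
	fmt.Println(err)
}
```

Blocking inside the callback also applies back-pressure to manifest parsing, which may or may not be what you want; `errors.Join` needs Go 1.20 or later.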
mark-cooper revised this gist
Jul 30, 2025. 1 changed file with 6 additions and 1 deletion.

@@ -11,7 +11,12 @@ exportArn, err := exportTable(ctx, dynamodbClient, tableArn, exportBucket, prefi
So it could do something with it, like push it to s3 and you read the export arn from the file we're expecting?
And presumably that exportArn is usable for what you're doing ...

**Questions re: CSV function.**

How many files and how large are the files in the worst case?
Because of lots of copying into memory, will need to limit how many goroutines can run at a time.
May need to process export files by line, append to tmp file and upload when done ...

**Template**

Missing the conditions stuff for IMAGE_URI. Also CSV function doesn't need the DDB envvar.
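The "process export files by line, append to tmp file" idea could look roughly like the sketch below. It is only an illustration: it assumes plain newline-delimited JSON objects and a fixed, hypothetical column list, whereas real export data files are gzipped and use DynamoDB-typed JSON, so the decoding step would differ. `convertToCSV` and `csvString` are made-up names.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"os"
	"strings"
)

// convertToCSV streams JSON objects from r and writes one CSV row per object to w.
// Only one record is held in memory at a time. columns fixes the field order.
func convertToCSV(r io.Reader, w io.Writer, columns []string) (int, error) {
	dec := json.NewDecoder(r)
	cw := csv.NewWriter(w)
	if err := cw.Write(columns); err != nil {
		return 0, err
	}
	rows := 0
	for {
		var rec map[string]any
		if err := dec.Decode(&rec); errors.Is(err, io.EOF) {
			break
		} else if err != nil {
			return rows, fmt.Errorf("decode record %d: %w", rows+1, err)
		}
		row := make([]string, len(columns))
		for i, col := range columns {
			if v, ok := rec[col]; ok && v != nil {
				row[i] = fmt.Sprint(v)
			}
		}
		if err := cw.Write(row); err != nil {
			return rows, err
		}
		rows++
	}
	cw.Flush()
	return rows, cw.Error()
}

// csvString is a convenience wrapper for small inputs.
func csvString(jsonLines string, columns []string) (string, error) {
	var sb strings.Builder
	_, err := convertToCSV(strings.NewReader(jsonLines), &sb, columns)
	return sb.String(), err
}

func main() {
	input := strings.NewReader(`{"id":"1","name":"a"}` + "\n" + `{"id":"2","name":"b, c"}`)

	// Append to a temp file; the upload step would read from it once conversion is done.
	tmp, err := os.CreateTemp("", "export_*.csv")
	if err != nil {
		fmt.Println("create temp file:", err)
		return
	}
	defer os.Remove(tmp.Name())
	defer tmp.Close()

	rows, err := convertToCSV(input, tmp, []string{"id", "name"})
	if err != nil {
		fmt.Println("convert:", err)
		return
	}
	fmt.Printf("wrote %d rows to %s\n", rows, tmp.Name())
}
```

Because the function only needs an `io.Reader` and an `io.Writer`, the same code would work with a gzip reader on the input side or a pipe into an uploader on the output side, which would avoid the temp file altogether.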
mark-cooper revised this gist
Jul 30, 2025. 1 changed file with 6 additions and 1 deletion.

@@ -9,4 +9,9 @@ exportArn, err := exportTable(ctx, dynamodbClient, tableArn, exportBucket, prefi
```
So it could do something with it, like push it to s3 and you read the export arn from the file we're expecting?
And presumably that exportArn is usable for what you're doing ...

Questions re: CSV function.

How many files and how large are the files in the worst case?
Because of lots of copying into memory, will need to limit how many goroutines can run at a time.
May need to process export files by line, append to tmp file and upload when done ...
mark-cooper revised this gist
Jul 30, 2025. 1 changed file with 8 additions and 8 deletions.

@@ -29,20 +29,20 @@ err := parseManifest(ctx, manifest, func(entry ManifestEntry) {
	go func(e ManifestEntry) {
		defer wg.Done()

		// download file and convert to csv (could potentially be separate operations ...?)
		csv, err := getExportDataFile(ctx, e.DataFileS3Key)
		if err != nil {
			log.Printf("Failed to get export data for %s: %v", e.DataFileS3Key, err)
			return // probs need to handle an issue better
		}

		// do something for filename ...
		filename := fmt.Sprintf("export_%s.csv", extractFileID(e.DataFileS3Key))
		// preferably upload without writing to disk first ...
		if err := uploadCSV(filename, csv); err != nil {
			log.Printf("Failed to upload CSV for %s: %v", e.DataFileS3Key, err)
			return // probs need to handle an issue better
		}

		log.Printf("Successfully processed %s", e.DataFileS3Key)
	}(entry)
mark-cooper revised this gist
Jul 30, 2025. 1 changed file with 7 additions and 7 deletions.

@@ -36,13 +36,13 @@ err := parseManifest(ctx, manifest, func(entry ManifestEntry) {
			return // probs need to handle an issue better
		}

		// do something for filename ...
		filename := fmt.Sprintf("export_%s.csv", extractFileID(e.DataFileS3Key))
		// preferably upload without writing to disk first ...
		if err := uploadCSV(filename, csv); err != nil {
			log.Printf("Failed to upload CSV for %s: %v", e.DataFileS3Key, err)
			return // probs need to handle an issue better
		}

		log.Printf("Successfully processed %s", e.DataFileS3Key)
	}(entry)
mark-cooper created this gist
Jul 30, 2025.

@@ -0,0 +1,51 @@
// Let's make `parseManifest` accept a callback that takes a `ManifestEntry` (very javascript =).
// It just processes lines into manifest entries and hands them off.
func parseManifest(ctx context.Context, manifestBody string, processEntry func(ManifestEntry)) error {
	dec := json.NewDecoder(strings.NewReader(manifestBody))

	for {
		var e ManifestEntry
		if err := dec.Decode(&e); err == io.EOF {
			break
		} else if err != nil {
			return fmt.Errorf("failed to decode manifest entry: %w", err)
		}

		processEntry(e)
	}

	return nil
}

// In handler, update parseManifest
// Use a waitgroup and goroutines to process each file -> csv conversion and upload separately
// Something like:
var wg sync.WaitGroup

err := parseManifest(ctx, manifest, func(entry ManifestEntry) {
	wg.Add(1)
	go func(e ManifestEntry) {
		defer wg.Done()

		// download file and convert to csv (could potentially be separate operations ...?)
		csv, err := getExportDataFile(ctx, e.DataFileS3Key)
		if err != nil {
			log.Printf("Failed to get export data for %s: %v", e.DataFileS3Key, err)
			return // probs need to handle an issue better
		}

		// do something for filename ...
		filename := fmt.Sprintf("export_%s.csv", extractFileID(e.DataFileS3Key))
		// preferably upload without writing to disk first ...
		if err := uploadCSV(filename, csv); err != nil {
			log.Printf("Failed to upload CSV for %s: %v", e.DataFileS3Key, err)
			return // probs need to handle an issue better
		}

		log.Printf("Successfully processed %s", e.DataFileS3Key)
	}(entry)
})

wg.Wait()

@@ -0,0 +1,12 @@
I don't dislike the export arn stuff. It's likely ok. Maybe could use some extra validation around the date (be sure we're getting this right) and id (format?).

The only alternative that springs to mind would be to do something with the checksum table export function. It has the arn:

```go
exportArn, err := exportTable(ctx, dynamodbClient, tableArn, exportBucket, prefix)
```

So it could do something with it, like push it to s3 and you read the export arn from the file we're expecting?
And presumably that exportArn is usable for what you're doing ...
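On the "extra validation around the date and id" point, something like the sketch below might be enough. It assumes export ARNs end in `/export/<14-digit epoch milliseconds>-<8 hex chars>`, which matches the examples I have seen but should be checked against a real ARN from the account. `parseExportArn` and the table name `Checksums` are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// exportArnRe encodes the ASSUMED export ARN layout:
// arn:<partition>:dynamodb:<region>:<account>:table/<table>/export/<14 digits>-<8 hex>
var exportArnRe = regexp.MustCompile(
	`^arn:aws[a-z-]*:dynamodb:[a-z0-9-]+:\d{12}:table/[A-Za-z0-9_.-]{3,255}/export/(\d{14})-([0-9a-f]{8})$`)

// parseExportArn (hypothetical) checks the ARN shape and returns the export start
// time and the short id, rejecting timestamps that lie in the future.
func parseExportArn(arn string) (time.Time, string, error) {
	m := exportArnRe.FindStringSubmatch(arn)
	if m == nil {
		return time.Time{}, "", fmt.Errorf("unexpected export ARN format: %q", arn)
	}
	ms, err := strconv.ParseInt(m[1], 10, 64)
	if err != nil {
		return time.Time{}, "", fmt.Errorf("bad export timestamp %q: %w", m[1], err)
	}
	ts := time.UnixMilli(ms).UTC()
	if ts.After(time.Now().Add(time.Hour)) {
		return time.Time{}, "", fmt.Errorf("export timestamp %s is in the future", ts.Format(time.RFC3339))
	}
	return ts, m[2], nil
}

func main() {
	arn := "arn:aws:dynamodb:us-east-1:123456789012:table/Checksums/export/01615234235405-1d9b0dc4"
	ts, id, err := parseExportArn(arn)
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(ts.Format(time.RFC3339), id)
}
```

If the ARN is being assembled from a date and an id rather than read back from `exportTable`, running the assembled string through a check like this before use would at least catch formatting slips early.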