Last active: October 20, 2025 21:57
## Revisions
**0xdevalias revised this gist** · Oct 20, 2025 · 1 changed file with 8 additions and 0 deletions.
@@ -508,6 +508,14 @@

    > - Answer questions about your code's architecture and logic
    > - Execute and fix tests, lint, and other commands
    > - Search through git history, resolve merge conflicts, and create commits and PRs
- https://github.com/anthropics/claude-cookbooks/tree/main
  - > Claude Cookbooks
  - > A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.
  - > The Claude Cookbooks provide code and guides designed to help developers build with Claude, offering copy-able code snippets that you can easily integrate into your own projects.
- https://github.com/anthropics/skills
  - > Skills
  - > Public repository for Skills
  - > Skills are folders of instructions, scripts, and resources that Claude loads dynamically to improve performance on specialized tasks. Skills teach Claude how to complete specific tasks in a repeatable way, whether that's creating documents with your company's brand guidelines, analyzing data using your organization's specific workflows, or automating personal tasks.

### OpenAI Codex CLI
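The Cookbooks above are built around the Anthropic SDKs; for flavour, a minimal sketch of the kind of copy-able snippet they contain (the model ID here is an assumption; check the current docs for valid IDs):

```python
# Minimal sketch of calling Claude via the Anthropic Python SDK
# (pip install anthropic). The model ID is an assumption; verify
# against the current docs before use.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-5",  # assumed/current model ID
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarise what Claude Skills are."}],
)
print(message.content[0].text)
```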
**0xdevalias revised this gist** · Oct 20, 2025 · 1 changed file with 47 additions and 5 deletions.
@@ -1,11 +1,12 @@

# AI / ML Toolkit

Some notes on AI / ML tools that seem interesting/useful (largely aiming to focus on open source tools)

## Table of Contents

<!-- TOC start (generated with https://bitdowntoc.derlin.ch/) -->

- [See Also](#see-also)
  - [My Other Related Deepdive Gist's and Projects](#my-other-related-deepdive-gists-and-projects)
- [OpenRouter](#openrouter)
- [ollama](#ollama)
- [llama.cpp](#llamacpp)

@@ -59,7 +60,9 @@

- [Audio Separation / Stem Splitting / Sound Demixing / Music Source Separation](#audio-separation--stem-splitting--sound-demixing--music-source-separation)
- [Audio Super Resolution](#audio-super-resolution)
- [Unsorted](#unsorted-3)
- [See Also](#see-also-1)
- [Infrastructure and Hosting (Cloud GPUs, etc)](#infrastructure-and-hosting-cloud-gpus-etc)
  - [Runpod](#runpod)
- [Vector Databases/Search, Similarity Search, Clustering, etc](#vector-databasessearch-similarity-search-clustering-etc)
  - [Faiss](#faiss)
- [Benchmarks / Leaderboards](#benchmarks--leaderboards)

@@ -70,8 +73,12 @@

- [Unsorted](#unsorted-5)

<!-- TOC end -->

## See Also

### My Other Related Deepdive Gist's and Projects

- https://github.com/0xdevalias
- https://gist.github.com/0xdevalias
- [AI Agent Rule / Instruction / Context files / etc (0xdevalias' gist)](https://gist.github.com/0xdevalias/f40bc5a6f84c4c5ad862e314894b2fa6#ai-agent-rule--instruction--context-files--etc)
- [Model Context Protocol (MCP) Tools (0xdevalias' gist)](https://gist.github.com/0xdevalias/86404c0a472e93109507a483a6cc6065#model-context-protocol-mcp-tools)
- [AI Agent Swarm Musings (0xdevalias' gist)](https://gist.github.com/0xdevalias/4ce1ecd18b3a20ea6a9e58b1a2881875#ai-agent-swarm-musings)

@@ -1685,6 +1692,41 @@

- [Singing Voice Synthesizers (eg. Vocaloid, etc) (0xdevalias' gist)](https://gist.github.com/0xdevalias/0b64b25d72cbbc784042a9fdff713129#singing-voice-synthesizers-eg-vocaloid-etc)
- [Generating Synth Patches with AI (0xdevalias' gist)](https://gist.github.com/0xdevalias/5a06349b376d01b2a76ad27a86b08c1b#generating-synth-patches-with-ai)

## Infrastructure and Hosting (Cloud GPUs, etc)

TODO: Fill this section out with more details.

### Runpod

- https://www.runpod.io/
  - > Runpod
  - > AI infrastructure developers trust
  - > Everything you need to train, deploy, and scale AI all in one place.
- https://www.runpod.io/pricing
- https://www.runpod.io/changelog
  - > Runpod Changelog
  - > Release notes on what's new, improved, and fixed
- https://www.runpod.io/product/runpod-hub
  - > Runpod Hub
  - > The fastest way to fork and deploy open-source AI.
  - > Customize, launch, and contribute to open-source packages – from GitHub to production
- https://www.runpod.io/product/cloud-gpus
  - > Cloud GPUs
  - > High-performance GPUs on demand.
  - > Run AI, ML, and HPC workloads on powerful cloud GPUs – without limits or wasted spend.
- https://www.runpod.io/product/serverless
  - > Serverless
  - > Dedicated Serverless GPU API endpoints
  - > Skip the infrastructure headaches. Our auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating.
- https://www.runpod.io/product/instant-clusters
  - > Instant Clusters
  - > 3,200 Gbps Infiniband GPU Clusters On-Demand
  - > Launch high-performance multi-node GPU clusters for AI, ML, LLMs, and HPC workloads – fully optimized, rapidly deployed, and cost-effective.
- https://docs.runpod.io/overview
  - > Docs
  - > Explore our guides and examples to deploy your AI/ML application on Runpod.
  - > Runpod is a cloud computing platform built for AI, machine learning, and general compute needs. Whether you're running deep learning models, training AI, or deploying cloud-based applications, Runpod provides scalable, high-performance GPU and CPU resources to power your workloads.

## Vector Databases/Search, Similarity Search, Clustering, etc

- See Also:
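On the Runpod Serverless product added in this revision: invoking a deployed endpoint is a single HTTP call. A hedged sketch (the endpoint ID and input payload schema are placeholders specific to your worker; the `/runsync` route is per Runpod's serverless docs, verify there):

```python
# Hedged sketch: calling a Runpod Serverless endpoint over HTTP.
# The endpoint ID and input schema are placeholders; verify the
# /runsync route and payload shape against docs.runpod.io.
import os
import requests

ENDPOINT_ID = "your-endpoint-id"  # hypothetical placeholder
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    json={"input": {"prompt": "Hello from Runpod"}},  # schema depends on your worker
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```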
**0xdevalias revised this gist** · Sep 12, 2025 · 1 changed file with 1 addition and 0 deletions.
@@ -73,6 +73,7 @@

## Some of my other related gists

- [AI Agent Rule / Instruction / Context files / etc (0xdevalias' gist)](https://gist.github.com/0xdevalias/f40bc5a6f84c4c5ad862e314894b2fa6#ai-agent-rule--instruction--context-files--etc)
- [Model Context Protocol (MCP) Tools (0xdevalias' gist)](https://gist.github.com/0xdevalias/86404c0a472e93109507a483a6cc6065#model-context-protocol-mcp-tools)
- [AI Agent Swarm Musings (0xdevalias' gist)](https://gist.github.com/0xdevalias/4ce1ecd18b3a20ea6a9e58b1a2881875#ai-agent-swarm-musings)
- [ChatGPT / AI Rental Property Plugins/Agents (0xdevalias' gist)](https://gist.github.com/0xdevalias/18e666bc319b2e08f90e52bb5cb53538#chatgpt--ai-rental-property-pluginsagents)
**0xdevalias revised this gist** · Sep 12, 2025 · 1 changed file with 2 additions and 0 deletions.
@@ -240,6 +240,8 @@

### Model Context Protocol (MCP)

- See Also:
  - [Model Context Protocol (MCP) Tools (0xdevalias' gist)](https://gist.github.com/0xdevalias/86404c0a472e93109507a483a6cc6065#model-context-protocol-mcp-tools)
- https://github.com/modelcontextprotocol
  - > Model Context Protocol
  - > A protocol for seamless integration between LLM applications and external data sources
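For a feel of what that integration looks like in practice, a minimal hedged server sketch using the official MCP Python SDK's FastMCP helper (server and tool names are illustrative):

```python
# Minimal MCP server sketch using the official Python SDK's FastMCP helper
# (pip install mcp). Server/tool names here are illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    # Serves over stdio by default, so an MCP client (e.g. Claude Desktop)
    # can launch it as a subprocess and call the `add` tool.
    mcp.run()
```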
**0xdevalias revised this gist** · Jul 27, 2025 · 1 changed file with 2 additions and 0 deletions.
@@ -235,6 +235,8 @@

- https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/
  - > Announcing the Agent2Agent Protocol (A2A)
  - > A2A is an open protocol that complements Anthropic's Model Context Protocol (MCP), which provides helpful tools and context to agents. Drawing on Google's internal expertise in scaling agentic systems, we designed the A2A protocol to address the challenges we identified in deploying large-scale, multi-agent systems for our customers. A2A empowers developers to build agents capable of connecting with any other agent built using the protocol and offers users the flexibility to combine agents from various providers. Critically, businesses benefit from a standardized method for managing their agents across diverse platforms and cloud environments. We believe this universal interoperability is essential for fully realizing the potential of collaborative AI agents.
- https://developers.googleblog.com/en/google-cloud-donates-a2a-to-linux-foundation/
  - > Google Cloud donates A2A to Linux Foundation

### Model Context Protocol (MCP)
**0xdevalias revised this gist** · Jun 12, 2025 · 1 changed file with 3 additions and 0 deletions.
@@ -1682,6 +1682,8 @@

## Vector Databases/Search, Similarity Search, Clustering, etc

- See Also:
  - [Vector Embedding Databases (0xdevalias' gist - subsection)](https://gist.github.com/0xdevalias/31c6574891db3e36f15069b859065267#vector-embedding-databases)
- TODO: add more things here

### Faiss

@@ -1696,6 +1698,7 @@

- See also:
  - [Agent Benchmarks / Leaderboards](#agent-benchmarks--leaderboards)
  - [Code Embeddings - Benchmarks / Leaderboards / etc (0xdevalias' gist - subsection)](https://gist.github.com/0xdevalias/31c6574891db3e36f15069b859065267#benchmarks--leaderboards--etc)
- https://chat.lmsys.org/
  - > LMSYS Chatbot Arena Leaderboard
- https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
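On the Faiss side of this revision, the core flow is tiny; a minimal sketch (index type, dimensions, and sizes are illustrative):

```python
# Minimal Faiss sketch: build a flat L2 index and run a k-NN search.
# (pip install faiss-cpu). Dimensions/sizes are illustrative.
import faiss
import numpy as np

d = 64  # vector dimensionality
xb = np.random.random((1000, d)).astype("float32")  # database vectors
xq = np.random.random((5, d)).astype("float32")     # query vectors

index = faiss.IndexFlatL2(d)  # exact (brute-force) L2 index
index.add(xb)

distances, ids = index.search(xq, 4)  # 4 nearest neighbours per query
print(ids)
```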
**0xdevalias revised this gist** · May 24, 2025 · 1 changed file with 5 additions and 5 deletions.
@@ -1583,6 +1583,11 @@

  - > Datasets
- https://paperswithcode.com/task/music-source-separation#papers-list
  - > Papers
- https://transactions.ismir.net/search?q=Music%20Source%20Separation
  - > Searching for: Music Source Separation
- https://transactions.ismir.net/articles/10.5334/tismir.171
  - > The Sound Demixing Challenge 2023 – Music Demixing Track
  - > This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding. We describe the methods that achieved the highest scores in the competition. Moreover, we present a direct comparison with the previous edition of the challenge (the Music Demixing Challenge 2021): the best performing system achieved an improvement of over 1.6dB in signal-to-distortion ratio over the winner of the previous competition, when evaluated on MDXDB21. Besides relying on the signal-to-distortion ratio as objective metric, we also performed a listening test with renowned producers and musicians to study the perceptual quality of the systems and report here the results. Finally, we provide our insights into the organization of the competition and our prospects for future editions.
- https://docs.google.com/document/d/17fjNvJzj8ZGSer7c7OFe_CNfUKbAxEh_OBv94ZdRG5c/edit?tab=t.0#heading=h.roiuj54hzww3
  - > Instrumental, vocal & other stems separation & mix/master guide - UVR/MDX/Demucs/GSEP & others
- https://ultimatevocalremover.com/

@@ -1650,11 +1655,6 @@

  - > Important: As I am no longer working at Meta, this repository is not maintained anymore. I've created a fork at github.com/adefossez/demucs. Note that this project is not actively maintained anymore and only important bug fixes will be processed on the new repo.
- https://github.com/adefossez/demucs
  - > This is the officially maintained Demucs now that I (Alexandre Défossez) have left Meta to join Kyutai. Note that I'm not actively working on Demucs anymore, so expect slow replies and no new feature for now.

### Audio Super Resolution
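The TISMIR paper above scores systems by signal-to-distortion ratio (SDR). Stripped of the windowed/multi-stem machinery used in the actual evaluation, the core metric is just an energy ratio in decibels; a simplified sketch:

```python
# Simplified SDR (signal-to-distortion ratio) sketch, in dB.
# Real MDX/SDX evaluations use windowed / multi-stem variants (e.g. museval);
# this only shows the core energy-ratio definition.
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-10) -> float:
    """SDR = 10 * log10(||ref||^2 / ||ref - est||^2)."""
    signal_power = np.sum(reference ** 2)
    distortion_power = np.sum((reference - estimate) ** 2)
    return 10 * np.log10((signal_power + eps) / (distortion_power + eps))

ref = np.sin(np.linspace(0, 100, 44100))   # toy "ground-truth" stem
est = ref + 0.01 * np.random.randn(44100)  # toy separated estimate
print(f"SDR: {sdr(ref, est):.2f} dB")
```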
**0xdevalias revised this gist** · May 24, 2025 · 1 changed file with 147 additions and 0 deletions.
@@ -56,6 +56,7 @@

- [Stable Audio](#stable-audio)
- [AudioCraft: MusicGen, AudioGen, etc](#audiocraft-musicgen-audiogen-etc)
- [Neural Audio Codecs](#neural-audio-codecs)
- [Audio Separation / Stem Splitting / Sound Demixing / Music Source Separation](#audio-separation--stem-splitting--sound-demixing--music-source-separation)
- [Audio Super Resolution](#audio-super-resolution)
- [Unsorted](#unsorted-3)
- [See Also](#see-also)

@@ -1509,6 +1510,152 @@

  - > High Fidelity Neural Audio Compression
  - > We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural networks. It consists in a streaming encoder-decoder architecture with quantized latent space trained in an end-to-end fashion. We simplify and speed-up the training by using a single multiscale spectrogram adversary that efficiently reduces artifacts and produce high-quality samples. We introduce a novel loss balancer mechanism to stabilize training: the weight of a loss now defines the fraction of the overall gradient it should represent, thus decoupling the choice of this hyper-parameter from the typical scale of the loss. Finally, we study how lightweight Transformer models can be used to further compress the obtained representation by up to 40%, while staying faster than real time. We provide a detailed description of the key design choices of the proposed model including: training objective, architectural changes and a study of various perceptual loss functions. We present an extensive subjective evaluation (MUSHRA tests) together with an ablation study for a range of bandwidths and audio domains, including speech, noisy-reverberant speech, and music. Our approach is superior to the baselines methods across all evaluated settings, considering both 24 kHz monophonic and 48 kHz stereophonic audio.

### Audio Separation / Stem Splitting / Sound Demixing / Music Source Separation

- https://mvsep.com/
  - > Music & Voice Separation
    > MVSEP performs separation of audio on voice and music parts
- https://mvsep.com/en/quality_checker
  - > Quality Checker
  - > On this page you can find a tools for checking the quality of models for splitting tracks into different stems like vocal, bass, drums etc. As well as a table of the last performed checks.
- https://mvsep.com/en/quality
  - > Comparison of algorithms quality
  - > There are a lot of algorithms at MVSep now. Which algorithm to choose?
    > - If you need good isolated vocals or instrumental then use one of: Ultimate Vocal Remover HQ, MDX-B, Demucs3 (Model B)
    > - If you need good bass, drums, other: Demucs3 (Model B)
    >
    > For comparsion of algorithm we use SDR (signal-to-distortion ratio) metric. The larger the metric the better the result of algorithm.
- https://mvsep.com/quality_checker/synth_leaderboard
  - > Synth Leaderboard (Full)
- https://mvsep.com/quality_checker/synth_leaderboard?ensemble=0
  - > Synth Leaderboard (Single Models)
- https://mvsep.com/quality_checker/synth_leaderboard?ensemble=1
  - > Synth Leaderboard (Ensemble)
- https://mvsep.com/quality_checker/multisong_leaderboard
  - > Multisong Leaderboard
- https://mvsep.com/quality_checker/multisong_leaderboard?sort=instrum
  - > Multisong Leaderboard (Instrum)
- https://mvsep.com/quality_checker/multisong_leaderboard?sort=vocals
  - > Multisong Leaderboard (Vocals)
- https://mvsep.com/quality_checker/multisong_leaderboard?sort=bass
  - > Multisong Leaderboard (Bass)
- https://mvsep.com/quality_checker/multisong_leaderboard?sort=drums
  - > Multisong Leaderboard (Drums)
- https://mvsep.com/quality_checker/multisong_leaderboard?sort=other
  - > Multisong Leaderboard (Other)
- https://mvsep.com/quality_checker/other_leaderboards
  - > Other Leaderboards
  - Piano
  - Lead/Back Vocals
  - Guitar
  - Medley Vox
  - Strings
  - Wind
  - DNR v3 Test
  - Super Resolution Checker for Music
  - Drums Separation (5 stems)
  - Male/Female vocals separation
- https://paperswithcode.com/task/music-source-separation
  - > Music Source Separation
  - > Music source separation is the task of decomposing music into its constitutive components, e. g., yielding separated stems for the vocals, bass, and drums.
- https://paperswithcode.com/task/music-source-separation#benchmarks
  - > Benchmarks
    >
    > These leaderboards are used to track progress in Music Source Separation
    >
    > - MUSDB18
    > - MUSDB18-HQ
    > - Slakh2100
  - https://paperswithcode.com/sota/music-source-separation-on-musdb18
    - > Music Source Separation on MUSDB18
    - > Leaderboard
  - https://paperswithcode.com/sota/music-source-separation-on-musdb18-hq
    - > Music Source Separation on MUSDB18-HQ
    - > Leaderboard
  - https://paperswithcode.com/sota/music-source-separation-on-slakh2100
    - > Music Source Separation on Slakh2100
    - > Leaderboard
- https://paperswithcode.com/task/music-source-separation#task-libraries
  - > Libraries
    >
    > Use these libraries to find Music Source Separation models and implementations
- https://paperswithcode.com/task/music-source-separation#datasets
  - > Datasets
- https://paperswithcode.com/task/music-source-separation#papers-list
  - > Papers
- https://docs.google.com/document/d/17fjNvJzj8ZGSer7c7OFe_CNfUKbAxEh_OBv94ZdRG5c/edit?tab=t.0#heading=h.roiuj54hzww3
  - > Instrumental, vocal & other stems separation & mix/master guide - UVR/MDX/Demucs/GSEP & others
- https://ultimatevocalremover.com/
  - > Ultimate Vocal Remover v5
- https://github.com/Anjok07/ultimatevocalremovergui
  - > Ultimate Vocal Remover GUI
  - > GUI for a Vocal Remover that uses Deep Neural Networks.
  - > This application uses state-of-the-art source separation models to remove vocals from audio files. UVR's core developers trained all of the models provided in this package (except for the Demucs v3 and v4 4-stem models).
- https://github.com/Anjok07/ultimatevocalremovergui/issues/344
  - > The answer to the most asked question: What is the model which provides the best results? \[Read this, very important info inside!\]
- https://github.com/Anjok07/ultimatevocalremovergui/issues/430
  - > BSRNN code is release
  - > Band-Split RNN code is release : yoongi43/music_source_separation
- https://github.com/yoongi43/music_source_separation
  - > Models
    > - Band-split rnn
    > - Band-split Conformer
  - > Would love to see this added, as it's currently top of the MUSDB18 benchmarks for vocals:
    >
    > - https://paperswithcode.com/sota/music-source-separation-on-musdb18
    >   - **SDR (vocals):** 10.47
    > - https://paperswithcode.com/sota/music-source-separation-on-musdb18-hq
    >   - **SDR (vocals):** 10.47
    >
    > **See also:**
    >
    > - https://github.com/amanteur/BandSplitRNN-Pytorch
    >   - > Unofficial PyTorch implementation of Music Source Separation with Band-split RNN
    > - https://github.com/crlandsc/music-demixing-with-band-split-rnn
    >   - > An unofficial PyTorch implementation of Music Source Separation with Band-split RNN for MDX-23 ("Label Noise" Track)
- https://buymeacoffee.com/uvr5/vip-model-download-instructions
  - > VIP Model Download Instructions:
    >
    > - Make you have UVR v5.4 installed
    > - Click the "Settings" button (it's the wrench icon to the left of the conversion button)
    > - Go to the "Download Center" tab
    > - Click the button with the key icon at the bottom.
    > - Input the following and make sure "VIP" is in all caps -
    >   - User Code: VIP
    >   - Download Code: `02aeb35c203ed0a9`
    >
    > Now you will see the VIP models available for download!
- https://github.com/stemrollerapp/stemroller
  - > StemRoller
    >
    > StemRoller is the first free app which enables you to separate vocal and instrumental stems from any song with a single click! StemRoller uses Facebook's state-of-the-art Demucs algorithm for demixing songs and integrates search results from YouTube.
    >
    > Simply type the name/artist of any song into the search bar and click the Split button that appears in the results! You'll need to wait several minutes for splitting to complete. Once stems have been extracted, you'll see an Open button next to the song - click that to access your stems!
- https://audiostrip.co.uk/#isolate
  - > AudioStrip
  - > Near Perfect Instrumental And Vocal Isolation For Free!
- https://www.lalal.ai/
  - > LALAL.AI
  - > Extract vocal, accompaniment and various instruments from any audio and video
  - > A next-generation vocal remover and music source separation service for fast, easy and precise stem extraction. Remove vocal, instrumental, drums, bass, piano, electric guitar, acoustic guitar, and synthesizer tracks without quality loss.
- https://www.lalal.ai/apps-and-plugins/
  - > AI Vocal Remover App & Plugins
    > Enhance your audio and video editing with the powerful AI tools available across multiple platforms. Extract 10 stems and remove noise on iOS, Android, Windows, macOS, and Linux.
- https://vocalremover.org/
  - > Vocal Remover and Isolation
  - > Separate voice from music out of a song free with powerful AI algorithms
- https://github.com/facebookresearch/demucs
  - > Demucs Music Source Separation
  - > Code for the paper Hybrid Spectrogram and Waveform Source Separation
  - > Important: As I am no longer working at Meta, this repository is not maintained anymore. I've created a fork at github.com/adefossez/demucs. Note that this project is not actively maintained anymore and only important bug fixes will be processed on the new repo.
- https://github.com/adefossez/demucs
  - > This is the officially maintained Demucs now that I (Alexandre Défossez) have left Meta to join Kyutai. Note that I'm not actively working on Demucs anymore, so expect slow replies and no new feature for now.
- https://transactions.ismir.net/search?q=Music%20Source%20Separation
  - > Searching for: Music Source Separation
- https://transactions.ismir.net/articles/10.5334/tismir.171
  - > The Sound Demixing Challenge 2023 – Music Demixing Track
  - > This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding. We describe the methods that achieved the highest scores in the competition. Moreover, we present a direct comparison with the previous edition of the challenge (the Music Demixing Challenge 2021): the best performing system achieved an improvement of over 1.6dB in signal-to-distortion ratio over the winner of the previous competition, when evaluated on MDXDB21. Besides relying on the signal-to-distortion ratio as objective metric, we also performed a listening test with renowned producers and musicians to study the perceptual quality of the systems and report here the results. Finally, we provide our insights into the organization of the competition and our prospects for future editions.

### Audio Super Resolution

- https://github.com/haoheliu/versatile_audio_super_resolution
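For Demucs specifically, stem splitting can be driven from Python as well as the CLI; a hedged sketch using the `demucs.api` module introduced in v4 (model name and file paths are illustrative, and API details may differ across versions):

```python
# Hedged sketch: splitting a song into stems with the Demucs Python API
# (pip install demucs; demucs.api landed in the v4.x line). Model name
# and paths are illustrative; check the repo's docs for your version.
from demucs.api import Separator, save_audio

separator = Separator(model="htdemucs")  # default 4-stem hybrid transformer model

# Returns the original waveform plus a dict of stem name -> tensor
origin, separated = separator.separate_audio_file("song.mp3")

for stem_name, waveform in separated.items():
    save_audio(waveform, f"{stem_name}.wav", samplerate=separator.samplerate)
```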
**0xdevalias revised this gist** · Apr 26, 2025 · 1 changed file with 1 addition and 0 deletions.
@@ -71,6 +71,7 @@

## Some of my other related gists

- [AI Agent Rule / Instruction / Context files / etc (0xdevalias' gist)](https://gist.github.com/0xdevalias/f40bc5a6f84c4c5ad862e314894b2fa6#ai-agent-rule--instruction--context-files--etc)
- [AI Agent Swarm Musings (0xdevalias' gist)](https://gist.github.com/0xdevalias/4ce1ecd18b3a20ea6a9e58b1a2881875#ai-agent-swarm-musings)
- [ChatGPT / AI Rental Property Plugins/Agents (0xdevalias' gist)](https://gist.github.com/0xdevalias/18e666bc319b2e08f90e52bb5cb53538#chatgpt--ai-rental-property-pluginsagents)
**0xdevalias revised this gist** · Apr 20, 2025 · 1 changed file with 50 additions and 0 deletions.
@@ -8,6 +8,8 @@

- [Some of my other related gists](#some-of-my-other-related-gists)
- [OpenRouter](#openrouter)
- [ollama](#ollama)
- [llama.cpp](#llamacpp)
  - [node-llama-cpp](#node-llama-cpp)
- [vLLM](#vllm)
- [LiteLLM](#litellm)
- [Protocols / Standards / etc](#protocols--standards--etc)

@@ -131,6 +133,54 @@

- https://ollama.ai/blog/python-javascript-libraries
  - > Python & JavaScript Libraries

## llama.cpp

- https://github.com/ggml-org/llama.cpp
  - > `llama.cpp`
  - > Inference of Meta's LLaMA model (and others) in pure C/C++
  - > LLM inference in C/C++
  - > The main goal of `llama.cpp` is to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware - locally and in the cloud.
    >
    > - Plain C/C++ implementation without any dependencies
    > - Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
    > - AVX, AVX2, AVX512 and AMX support for x86 architectures
    > - 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
    > - Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP and Moore Threads MTT GPUs via MUSA)
    > - Vulkan and SYCL backend support
    > - CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity
- https://github.com/ggml-org/llama.cpp#supported-backends
  - > Supported Backends
- https://github.com/ggml-org/llama.cpp#llama-cli
  - > `llama-cli`
  - > A CLI tool for accessing and experimenting with most of llama.cpp's functionality.
- https://github.com/ggml-org/llama.cpp#llama-server
  - > `llama-server`
  - > A lightweight, OpenAI API compatible, HTTP server for serving LLMs.
- https://github.com/ggml-org/llama.cpp#llama-perplexity
  - > `llama-perplexity`
  - > A tool for measuring the perplexity (and other quality metrics) of a model over a given text.
- https://github.com/ggml-org/llama.cpp#llama-bench
  - > `llama-bench`
  - > Benchmark the performance of the inference for various parameters.
- https://github.com/ggml-org/llama.cpp#llama-run
  - > `llama-run`
  - > A comprehensive example for running llama.cpp models. Useful for inferencing. Used with RamaLama
- https://github.com/ggml-org/llama.cpp#llama-simple
  - > `llama-simple`
  - > A minimal example for implementing apps with llama.cpp. Useful for developers.

### node-llama-cpp

- https://github.com/withcatai/node-llama-cpp
  - > `node-llama-cpp`
    > Run AI models locally on your machine
  - > Pre-built bindings are provided with a fallback to building from source with cmake
  - > Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema on the model output on the generation level
- https://node-llama-cpp.withcat.ai/
  - > node-llama-cpp
  - > Run AI models locally on your machine
  - > node.js bindings for llama.cpp, and much more

## vLLM

- https://github.com/vllm-project/vllm
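Since the `llama-server` binary added in this revision exposes an OpenAI-compatible HTTP API, any OpenAI client can point at it. A minimal sketch (assumes something like `llama-server -m model.gguf --port 8080` is already running; the model name is illustrative, as the server serves whatever model it was started with):

```python
# Hedged sketch: chatting with a local `llama-server` instance through
# the OpenAI Python client, via llama-server's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama-server's default port
    api_key="sk-no-key-required",         # key is ignored by default
)

response = client.chat.completions.create(
    model="local-model",  # illustrative; llama-server serves its loaded model
    messages=[{"role": "user", "content": "Explain GGUF quantization in one line."}],
)
print(response.choices[0].message.content)
```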
**0xdevalias revised this gist** · Apr 20, 2025 · 1 changed file with 8 additions and 0 deletions.
@@ -471,6 +471,14 @@

    > This guide walks you through connecting OpenAI Codex to LiteLLM.
- https://github.com/BerriAI/litellm/discussions/10156
  - > LiteLLM / OpenAI Codex Discussion and Support
- https://github.com/ymichael/open-codex
  - > Open Codex CLI
  - > **Important Note**: This is a fork of the [original OpenAI Codex CLI](https://github.com/openai/codex) with expanded model support and changed installation instructions. The main differences in this fork are:
    >
    > - Support for multiple AI providers (OpenAI, Gemini, OpenRouter, Ollama)
    > - Uses the [Chat Completion API instead of the Responses API](https://platform.openai.com/docs/guides/responses-vs-chat-completions) which allows us to support any openai compatible provider and model.
    > - All other functionality remains similar to the original project
    > - You can install this fork globally with `npm i -g open-codex`
- https://github.com/lolrazh/codex
  - > OpenAI Codex CLI (Open Responses Fork)
  - > A fork of [openai/codex](https://github.com/openai/codex) integrated with [Julep's Open Responses API](https://docs.julep.ai/responses/quickstart)
**0xdevalias revised this gist** · Apr 20, 2025 · 1 changed file with 8 additions and 0 deletions.
@@ -466,6 +466,14 @@

    > We're excited to launch a $1 million initiative supporting open source projects to use Codex CLI and OpenAI models. Applications will be reviewed on an ongoing basis, with projects receiving grants in increments of $25,000 in API credits.
    >
    > If you're interested in participating, please fill out the form below.
- https://docs.litellm.ai/docs/tutorials/openai_codex
  - > Using LiteLLM with OpenAI Codex
    > This guide walks you through connecting OpenAI Codex to LiteLLM.
- https://github.com/BerriAI/litellm/discussions/10156
  - > LiteLLM / OpenAI Codex Discussion and Support
- https://github.com/lolrazh/codex
  - > OpenAI Codex CLI (Open Responses Fork)
  - > A fork of [openai/codex](https://github.com/openai/codex) integrated with [Julep's Open Responses API](https://docs.julep.ai/responses/quickstart)

### aider
**0xdevalias revised this gist** · Apr 20, 2025 · 1 changed file with 2 additions and 0 deletions.
@@ -693,6 +693,8 @@

### Continue - Custom AI Code Assistant

- https://github.com/continuedev/continue
  - > Continue enables developers to create, share, and use custom AI code assistants with our open-source VS Code and JetBrains extensions and hub of models, rules, prompts, docs, and other building blocks
- https://www.continue.dev/
  - > Amplified developers, AI-native development
  - > Create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks
**0xdevalias revised this gist** · Apr 20, 2025 · 1 changed file with 66 additions and 0 deletions.
@@ -27,6 +27,9 @@

- [OpenAI Codex CLI](#openai-codex-cli)
- [aider](#aider)
- [llm](#llm)
- [Continue - Custom AI Code Assistant](#continue---custom-ai-code-assistant)
- [Cline](#cline)
- [Roo-Code (formerly Roo Cline)](#roo-code-formerly-roo-cline)
- [Autogen / FLAML / etc](#autogen--flaml--etc)
- [OpenHands (formerly OpenDevin)](#openhands-formerly-opendevin)
- [SWE-agent](#swe-agent)

@@ -688,6 +691,69 @@

- https://simonwillison.net/2024/Mar/26/llm-cmd/
  - > I just released a neat new plugin for my LLM command-line tool: `llm-cmd`. It lets you run a command to generate a further terminal command, review and edit that command, then hit `<enter>` to execute it or `<ctrl-c>` to cancel.

### Continue - Custom AI Code Assistant

- https://www.continue.dev/
  - > Amplified developers, AI-native development
  - > Create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks
  - > Make your own custom AI code assistants
- https://www.continue.dev/amplified
  - > Amplified developers, AI-enhanced development
  - > We've worked together with dozens of engineers, platform teams, and others to sketch a path toward a future where developers are amplified, not automated. You can read, suggest edits, and add support for the open document at amplified.dev.
- https://amplified.dev/
  - > We believe in a future where developers are amplified, not automated
- https://docs.continue.dev/
  - > Continue enables developers to create, share, and use custom AI code assistants with our open-source VS Code and JetBrains extensions and hub of models, rules, prompts, docs, and other building blocks
- https://docs.continue.dev/reference
  - > `config.yaml` Reference
  - > Continue hub assistants are defined using the `config.yaml` specification. Assistants can be loaded from the Hub or locally
- https://hub.continue.dev/explore/assistants
  - > Assistants
    > Custom AI code assistants are configurations of building blocks that enable you to receive assistance tailored to your specific use cases
  - > Explore custom AI code assistants
- https://docs.continue.dev/hub/assistants/intro
  - > Introduction to Assistants
  - > Custom AI code assistants are configurations of building blocks that enable a coding experience tailored to your specific use cases.
    >
    > `config.yaml` is a format for defining custom AI code assistants. An assistant has some top-level properties (e.g. name, version), but otherwise consists of composable lists of blocks such as models and rules, which are the atomic building blocks of an assistant.
    >
    > The `config.yaml` is parsed by the open-source Continue IDE extensions to create custom assistant experiences. When you log in to hub.continue.dev, your assistants will automatically be synced with the IDE extensions.
- https://blog.continue.dev/
  - > The custom AI code assistant blog
- https://blog.continue.dev/continue-1-0/
  - > Continue 1.0
  - > With Continue 1.0, our community has helped us define seven building blocks so far: [models](https://hub.continue.dev/explore/models), [rules](https://hub.continue.dev/explore/rules), [context](https://hub.continue.dev/explore/context), [docs](https://hub.continue.dev/explore/docs), [prompts](https://hub.continue.dev/explore/prompts), [data](https://hub.continue.dev/explore/data), and [MCP](https://hub.continue.dev/explore/mcp). These blocks will evolve over time, and new ones will emerge, as developers determine the customizations they want and need.
  - > With Continue 1.0, we are standardizing on [config.yaml](https://docs.continue.dev/yaml-reference) as our packaging format. Our approach to packaging configurations is through an open format that we plan to evolve over time.
- https://blog.continue.dev/transforming-code-search-with-voyage-ai-why-your-continue-assistant-needs-better-embeddings-and-reranking/
  - > Transforming Code Search with Voyage AI: Why Your Continue Assistant Needs Better Embedding Models and Rerankers

### Cline

- https://github.com/cline/cline
  - > Cline
  - > Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.

### Roo-Code (formerly Roo Cline)

- https://roocode.com/
  - > Your AI-Powered Dev Team, Right in Your Editor.
  - > Supercharge your editor with AI that understands your codebase, streamlines development, and helps you write, refactor, and debug with ease.
- https://roocode.com/evals
  - > Evals
  - > Roo Code tests each frontier model against a suite of hundreds of exercises across 5 programming languages with varying difficulty. These results can help you find the right price-to-intelligence ratio for your use case.
- https://github.com/RooVetGit/Roo-Code
  - > Roo Code (prev. Roo Cline) gives you a whole dev team of AI agents in your code editor.
  - > Roo Code is an AI-powered autonomous coding agent that lives in your editor. It can:
    >
    > - Communicate in natural language
    > - Read and write files directly in your workspace
    > - Run terminal commands
    > - Automate browser actions
    > - Integrate with any OpenAI-compatible or custom API/model
    > - Adapt its "personality" and capabilities through Custom Modes
    >
    > Whether you're seeking a flexible coding partner, a system architect, or specialized roles like a QA engineer or product manager, Roo Code can help you build software more efficiently.

### Autogen / FLAML / etc

- https://github.com/microsoft/autogen
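On the Continue `config.yaml` format described in this revision, a hedged sketch of its shape (field names follow the docs.continue.dev reference; every value here is illustrative, not a canonical config):

```yaml
# Hedged sketch of a Continue assistant config.yaml; field names follow
# the docs.continue.dev reference, but all values are illustrative.
name: my-assistant
version: 0.0.1
schema: v1
models:
  - name: GPT-4o
    provider: openai
    model: gpt-4o
    apiKey: ${{ secrets.OPENAI_API_KEY }}
    roles:
      - chat
      - edit
rules:
  - Prefer small, focused pull requests
```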
**0xdevalias revised this gist** · Apr 20, 2025 · 1 changed file with 2 additions and 2 deletions.
@@ -18,7 +18,7 @@

- [Vercel AI SDK / Toolkit](#vercel-ai-sdk--toolkit)
- [GenKit](#genkit)
- [LangChain, LangServe, LangSmith, LangFlow, LangGraph, etc](#langchain-langserve-langsmith-langflow-langgraph-etc)
- [AI Agents / Assistants / etc](#ai-agents--assistants--etc)
  - [Agent Benchmarks / Leaderboards](#agent-benchmarks--leaderboards)
  - [OpenAI Assistants / ChatGPT custom GPTs](#openai-assistants--chatgpt-custom-gpts)
  - [OpenGPTs](#opengpts)

@@ -323,7 +323,7 @@

- https://docs.langflow.org/guides/langfuse_integration
  - > Integrating Langfuse with Langflow

## AI Agents / Assistants / etc

### Agent Benchmarks / Leaderboards
**0xdevalias revised this gist** · Apr 20, 2025 · 1 changed file with 111 additions and 39 deletions.
@@ -10,10 +10,14 @@

- [ollama](#ollama)
- [vLLM](#vllm)
- [LiteLLM](#litellm)
- [Protocols / Standards / etc](#protocols--standards--etc)
  - [Agent2Agent Protocol (A2A)](#agent2agent-protocol-a2a)
  - [Model Context Protocol (MCP)](#model-context-protocol-mcp)
- [SDKs / Toolkits / etc](#sdks--toolkits--etc)
  - [Google Agent Development Kit (ADK)](#google-agent-development-kit-adk)
  - [Vercel AI SDK / Toolkit](#vercel-ai-sdk--toolkit)
  - [GenKit](#genkit)
- [LangChain, LangServe, LangSmith, LangFlow, LangGraph, etc](#langchain-langserve-langsmith-langflow-langgraph-etc)
- [AI Agents / etc](#ai-agents--etc)
  - [Agent Benchmarks / Leaderboards](#agent-benchmarks--leaderboards)
  - [OpenAI Assistants / ChatGPT custom GPTs](#openai-assistants--chatgpt-custom-gpts)

@@ -158,7 +162,112 @@

  - > LiteLLM
  - > Call all LLM APIs using the OpenAI format (Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, Groq etc.)

## Protocols / Standards / etc

### Agent2Agent Protocol (A2A)

- https://google.github.io/A2A/#/
- https://github.com/google/A2A
  - > A2A protocol
  - > An open protocol enabling communication and interoperability between opaque agentic applications.
- https://github.com/google/A2A/tree/main/samples
  - https://github.com/google/A2A/blob/main/samples/python/agents/google_adk/README.md
    - > Google Agent Development Kit (ADK)
  - https://github.com/google/A2A/blob/main/samples/python/agents/langgraph/README.md
    - > LangGraph
  - https://github.com/google/A2A/blob/main/samples/js/src/agents/README.md
    - Firebase GenKit
- https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/
  - > Announcing the Agent2Agent Protocol (A2A)
  - > A2A is an open protocol that complements Anthropic's Model Context Protocol (MCP), which provides helpful tools and context to agents. Drawing on Google's internal expertise in scaling agentic systems, we designed the A2A protocol to address the challenges we identified in deploying large-scale, multi-agent systems for our customers. A2A empowers developers to build agents capable of connecting with any other agent built using the protocol and offers users the flexibility to combine agents from various providers. Critically, businesses benefit from a standardized method for managing their agents across diverse platforms and cloud environments. We believe this universal interoperability is essential for fully realizing the potential of collaborative AI agents.

### Model Context Protocol (MCP)

- https://github.com/modelcontextprotocol
  - > Model Context Protocol
  - > A protocol for seamless integration between LLM applications and external data sources
- https://modelcontextprotocol.io/introduction
  - > Introduction
  - > Get started with the Model Context Protocol (MCP)
  - > MCP is an open protocol that standardizes how applications provide context to LLMs.
- https://docs.anthropic.com/en/docs/agents-and-tools/mcp
  - > Agents and tools
    > Model Context Protocol (MCP)
  - > MCP is an open protocol that standardizes how applications provide context to LLMs.
- https://www.anthropic.com/news/model-context-protocol
  - > Introducing the Model Context Protocol
  - > Today, we're open-sourcing the Model Context Protocol (MCP), a new standard for connecting AI assistants to the systems where data lives, including content repositories, business tools, and development environments. Its aim is to help frontier models produce better, more relevant responses.
  - > It provides a universal, open standard for connecting AI systems with data sources, replacing fragmented integrations with a single protocol. The result is a simpler, more reliable way to give AI systems access to the data they need.

## SDKs / Toolkits / etc

### Google Agent Development Kit (ADK)

- https://developers.googleblog.com/en/agent-development-kit-easy-to-build-multi-agent-applications/
  - > Agent Development Kit: Making it easy to build multi-agent applications
- https://google.github.io/adk-docs/
  - > Agent Development Kit
    > An open-source AI agent framework integrated with Gemini and Google
  - > What is Agent Development Kit?
    > Agent Development Kit (ADK) is a flexible and modular framework for developing and deploying AI agents. ADK can be used with popular LLMs and open-source generative AI tools and is designed with a focus on tight integration with the Google ecosystem and Gemini models. ADK makes it easy to get started with simple agents powered by Gemini models and Google AI tools while providing the control and structure needed for more complex agent architectures and orchestration.
- https://github.com/google/adk-docs
  - > adk-docs
- https://cloud.google.com/vertex-ai/generative-ai/docs/agent-development-kit/quickstart
  - > Vertex AI: Quickstart: Build an agent with the Agent Development Kit
  - > This quickstart guides you through setting up your Google Cloud project, installing the Agent Development Kit (ADK), setting up a basic agent, and running its developer user interface.
- https://github.com/google/adk-python
  - > Agent Development Kit (ADK)
  - > An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
- https://github.com/google/adk-samples
  - > Agent Development Kit (ADK) Samples
  - > A collection of sample agents built with Agent Development Kit (ADK)
  - > Welcome to the Sample Agents repository! This collection provides ready-to-use agents built on top of the Agent Development Kit, designed to accelerate your development process. These agents cover a range of common use cases and complexities, from simple conversational bots to complex multi-agent workflows.

### Vercel AI SDK / Toolkit

- https://github.com/vercel/ai
  - > AI SDK
    > The AI SDK is a TypeScript toolkit designed to help you build AI-powered applications using popular frameworks like Next.js, React, Svelte, Vue and runtimes like Node.js.
- https://github.com/vercel/ai#ai-sdk-core
  - > AI SDK Core
    > The AI SDK Core module provides a unified API to interact with model providers like OpenAI, Anthropic, Google, and more.
    >
    > You will then install the model provider of your choice.
- https://github.com/vercel/ai#ai-sdk-ui
  - > AI SDK UI
    > The AI SDK UI module provides a set of hooks that help you build chatbots and generative user interfaces. These hooks are framework agnostic, so they can be used in Next.js, React, Svelte, and Vue.
    >
    > You need to install the package for your framework
- https://sdk.vercel.ai/
  - > The AI Toolkit for TypeScript
    > From the creators of Next.js, the AI SDK is a free open-source library that gives you the tools you need to build AI-powered products.
- https://sdk.vercel.ai/docs/introduction
  - > AI SDK
    > The AI SDK is the TypeScript toolkit designed to help developers build AI-powered applications and agents with React, Next.js, Vue, Svelte, Node.js, and more.
- https://sdk.vercel.ai/docs/reference
  - > API Reference
- https://vercel.com/templates?type=ai
  - > Find your Template
    > Jumpstart your app development process with pre-built solutions from Vercel and our community.

### GenKit

- https://github.com/firebase/genkit
  - > Genkit is a framework for building AI-powered applications. It provides open source libraries for Node.js and Go, along with tools to help you debug and iterate quickly.
  - > Genkit is built for developers seeking to add generative AI to their apps with Node.js or Go, and can run anywhere these runtimes are supported. It's designed around a plugin architecture that can work with any generative model API or vector database, with many integrations already available.
  - > While developed by the Firebase team, Genkit can be used independently of Firebase or Google Cloud services.
- https://firebase.google.com/docs/genkit
  - > Genkit - JS
  - > Genkit is an open-source TypeScript toolkit designed to help you build AI-powered features in web and mobile apps.
    >
    > It offers a unified interface for integrating AI models from Google, OpenAI, Anthropic, Ollama, and more, so you can explore and choose the best models for your needs. Genkit simplifies AI development with streamlined APIs for multimodal content generation, structured data generation, tool calling, human-in-the-loop, and other advanced capabilities.
    >
    > Whether you're building chatbots, intelligent agents, workflow automations, or recommendation systems, Genkit handles the complexity of AI integration so you can focus on creating incredible user experiences.
- https://firebase.google.com/docs/genkit-go/get-started-go
  - > GenKit - Go
  - > Get started with Genkit using Go

### LangChain, LangServe, LangSmith, LangFlow, LangGraph, etc

- https://github.com/langchain-ai/langchain
  - > Building applications with LLMs through composability

@@ -214,43 +323,6 @@

- https://docs.langflow.org/guides/langfuse_integration
  - > Integrating Langfuse with Langflow

## AI Agents / etc

### Agent Benchmarks / Leaderboards
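As a feel for the "code-first" style adk-python advertises in this revision, a hedged sketch of a single tool-using agent (the model ID and toy tool are illustrative; see the ADK quickstart for the canonical version):

```python
# Hedged sketch of a minimal agent with Google's adk-python
# (pip install google-adk). Model ID and the tool are illustrative;
# the ADK quickstart has the canonical example.
from google.adk.agents import Agent

def get_weather(city: str) -> dict:
    """Toy tool: pretend to look up the weather for a city."""
    return {"city": city, "forecast": "sunny", "temp_c": 22}

root_agent = Agent(
    name="weather_agent",
    model="gemini-2.0-flash",  # assumed model ID; check current docs
    description="Answers simple weather questions.",
    instruction="Use the get_weather tool to answer weather questions.",
    tools=[get_weather],
)
# Run via the ADK dev UI (`adk web`) or `adk run` from the project directory.
```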
**0xdevalias revised this gist** · Apr 20, 2025 · 1 changed file with 67 additions and 10 deletions.
@@ -6,8 +6,11 @@

- [Some of my other related gists](#some-of-my-other-related-gists)
- [OpenRouter](#openrouter)
- [ollama](#ollama)
- [vLLM](#vllm)
- [LiteLLM](#litellm)
- [LangChain, LangServe, LangSmith, LangFlow, LangGraph, etc](#langchain-langserve-langsmith-langflow-langgraph-etc)
- [Protocols / Standards / etc](#protocols--standards--etc)
  - [Agent2Agent Protocol (A2A)](#agent2agent-protocol-a2a)
  - [Model Context Protocol (MCP)](#model-context-protocol-mcp)

@@ -62,6 +65,25 @@

- [AI Agent Swarm Musings (0xdevalias' gist)](https://gist.github.com/0xdevalias/4ce1ecd18b3a20ea6a9e58b1a2881875#ai-agent-swarm-musings)
- [ChatGPT / AI Rental Property Plugins/Agents (0xdevalias' gist)](https://gist.github.com/0xdevalias/18e666bc319b2e08f90e52bb5cb53538#chatgpt--ai-rental-property-pluginsagents)

## OpenRouter

- https://openrouter.ai/
  - > The Unified Interface For LLMs
  - > Better prices, better uptime, no subscription.
- https://openrouter.ai/rankings
  - > Rankings
- https://openrouter.ai/models
  - > Models
- https://openrouter.ai/chat
  - > Chat
- https://openrouter.ai/docs/
  - > Docs - Quickstart
  - > Get started with OpenRouter
  - > OpenRouter provides a unified API that gives you access to hundreds of AI models through a single endpoint, while automatically handling fallbacks and selecting the most cost-effective options. Get started with just a few lines of code using your preferred SDK or framework.
- https://openrouter.ai/docs/community/frameworks
  - > Frameworks
  - > Using OpenRouter with Frameworks

## ollama

- https://github.com/ollama/ollama

@@ -102,7 +124,41 @@

- https://ollama.ai/blog/python-javascript-libraries
  - > Python & JavaScript Libraries

## vLLM

- https://github.com/vllm-project/vllm
  - > A high-throughput and memory-efficient inference and serving engine for LLMs
  - > vLLM is a fast and easy-to-use library for LLM inference and serving.
- https://blog.vllm.ai/
- https://blog.vllm.ai/2023/06/20/vllm.html
  - > vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention
- https://blog.vllm.ai/2023/11/14/notes-vllm-vs-deepspeed.html
  - > Notes on vLLM v.s. DeepSpeed-FastGen

## LiteLLM

- https://www.litellm.ai/
  - > LiteLLM
  - > LLM Gateway to provide model access, fallbacks and spend tracking across 100+ LLMs. All in the OpenAI format.
- https://docs.litellm.ai/
  - > LiteLLM - Getting Started
  - > You can use litellm through either:
    >
    > - LiteLLM Proxy Server - Server (LLM Gateway) to call 100+ LLMs, load balance, cost tracking across projects
    > - LiteLLM python SDK - Python Client to call 100+ LLMs, load balance, cost tracking
- https://docs.litellm.ai/#litellm-python-sdk
  - > LiteLLM Python SDK
- https://docs.litellm.ai/#litellm-proxy-server-llm-gateway
  - > LiteLLM Proxy Server (LLM Gateway)
- https://docs.litellm.ai/docs/hosted
  - > Hosted LiteLLM Proxy
- https://models.litellm.ai/
  - > LLM Model Cost Map
- https://github.com/BerriAI/litellm
  - > LiteLLM
  - > Call all LLM APIs using the OpenAI format (Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, Groq etc.)

## LangChain, LangServe, LangSmith, LangFlow, LangGraph, etc

- https://github.com/langchain-ai/langchain
  - > Building applications with LLMs through composability

@@ -141,6 +197,15 @@

  - > Examples for LangFlow
- https://github.com/logspace-ai/langflow-embedded-chat
  - > The Langflow Embedded Chat is a powerful web component that enables seamless communication with the Langflow. This widget provides a chat interface, allowing you to integrate Langflow into your web applications effortlessly.
- https://www.langchain.com/langgraph
  - > LangGraph
  - > Balance agent control with agency
  - > Gain control with LangGraph to design agents that reliably handle complex tasks. Build and scale agentic applications with LangGraph Platform.
- https://github.com/langchain-ai/langgraph
  - > LangGraph – used by Replit, Uber, LinkedIn, GitLab and more – is a low-level orchestration framework for building controllable agents. While langchain provides integrations and composable components to streamline LLM application development, the LangGraph library enables agent orchestration – offering customizable architectures, long-term memory, and human-in-the-loop to reliably handle complex tasks.
- https://langchain-ai.github.io/langgraph/
- https://langchain-ai.github.io/langgraphjs/tutorials/quickstart/
  - > LangGraph.js - Quickstart
- https://github.com/langfuse/langfuse
  - > Langfuse is the open source LLM engineering platform
- https://langfuse.com/

@@ -1780,14 +1845,6 @@

  - > https://chat.lmsys.org/
  - > Arena has collected over 100K human votes from side-by-side LLM battles to compile an online LLM Elo leaderboard
- https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
- https://github.com/philipturner/metal-benchmarks
  - > Apple GPU microarchitecture
  - > This document thoroughly explains the Apple GPU microarchitecture, focusing on its GPGPU performance. Details include latencies for each ALU assembly instruction, cache sizes, and the number of unique instruction pipelines. This document enables evidence-based reasoning about performance on the Apple GPU, helping people diagnose bottlenecks in real-world software. It also compares Apple silicon to generations of AMD and Nvidia microarchitectures, showing where it might exhibit different performance patterns. Finally, the document examines how Apple's design choices improve power efficiency compared to other vendors.
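Since OpenRouter and LiteLLM (both added in this revision) converge on the OpenAI request format, one sketch shows the shared pattern; here via the LiteLLM Python SDK (model names are illustrative, and provider API keys are read from the environment):

```python
# Hedged sketch: LiteLLM's unified completion() call (pip install litellm).
# Model names are illustrative; LiteLLM routes on the "provider/model"
# prefix and reads provider API keys from the environment.
from litellm import completion

messages = [{"role": "user", "content": "One-line summary of PagedAttention?"}]

# Same call shape regardless of provider:
resp_openai = completion(model="gpt-4o-mini", messages=messages)
resp_claude = completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)

print(resp_openai.choices[0].message.content)
print(resp_claude.choices[0].message.content)
```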
0xdevalias revised this gist
Apr 20, 2025 . 1 changed file with 40 additions and 0 deletions.
@@ -8,6 +8,9 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

- [Some of my other related gists](#some-of-my-other-related-gists)
- [ollama](#ollama)
- [LangChain, LangServe, LangSmith, LangFlow, etc](#langchain-langserve-langsmith-langflow-etc)
- [Protocols / Standards / etc](#protocols--standards--etc)
- [Agent2Agent Protocol (A2A)](#agent2agent-protocol-a2a)
- [Model Context Protocol (MCP)](#model-context-protocol-mcp)
- [AI Agents / etc](#ai-agents--etc)
- [Agent Benchmarks / Leaderboards](#agent-benchmarks--leaderboards)
- [OpenAI Assistants / ChatGPT custom GPTs](#openai-assistants--chatgpt-custom-gpts)

@@ -146,6 +149,43 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

- https://docs.langflow.org/guides/langfuse_integration
  - > Integrating Langfuse with Langflow

## Protocols / Standards / etc

### Agent2Agent Protocol (A2A)

- https://google.github.io/A2A/#/
- https://github.com/google/A2A
  - > A2A protocol
  - > An open protocol enabling communication and interoperability between opaque agentic applications.
  - https://github.com/google/A2A/tree/main/samples
    - https://github.com/google/A2A/blob/main/samples/python/agents/google_adk/README.md
      - > Google Agent Development Kit (ADK)
    - https://github.com/google/A2A/blob/main/samples/python/agents/langgraph/README.md
      - > LangGraph
    - https://github.com/google/A2A/blob/main/samples/js/src/agents/README.md
      - Firebase GenKit
- https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/
  - > Announcing the Agent2Agent Protocol (A2A)
  - > A2A is an open protocol that complements Anthropic's Model Context Protocol (MCP), which provides helpful tools and context to agents. Drawing on Google's internal expertise in scaling agentic systems, we designed the A2A protocol to address the challenges we identified in deploying large-scale, multi-agent systems for our customers. A2A empowers developers to build agents capable of connecting with any other agent built using the protocol and offers users the flexibility to combine agents from various providers. Critically, businesses benefit from a standardized method for managing their agents across diverse platforms and cloud environments. We believe this universal interoperability is essential for fully realizing the potential of collaborative AI agents.

### Model Context Protocol (MCP)

- https://github.com/modelcontextprotocol
  - > Model Context Protocol
  - > A protocol for seamless integration between LLM applications and external data sources
- https://modelcontextprotocol.io/introduction
  - > Introduction
  - > Get started with the Model Context Protocol (MCP)
  - > MCP is an open protocol that standardizes how applications provide context to LLMs.
- https://docs.anthropic.com/en/docs/agents-and-tools/mcp
  - > Agents and tools > Model Context Protocol (MCP)
  - > MCP is an open protocol that standardizes how applications provide context to LLMs.
- https://www.anthropic.com/news/model-context-protocol
  - > Introducing the Model Context Protocol
  - > Today, we're open-sourcing the Model Context Protocol (MCP), a new standard for connecting AI assistants to the systems where data lives, including content repositories, business tools, and development environments. Its aim is to help frontier models produce better, more relevant responses.
  - > It provides a universal, open standard for connecting AI systems with data sources, replacing fragmented integrations with a single protocol. The result is a simpler, more reliable way to give AI systems access to the data they need.

## AI Agents / etc

### Agent Benchmarks / Leaderboards
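To make the MCP links above a little more concrete, here's a minimal server sketch, assuming the official `mcp` Python SDK and its `FastMCP` helper; the tool/resource names are purely illustrative:

```python
from mcp.server.fastmcp import FastMCP

# A tiny MCP server exposing one tool and one templated resource.
mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

@mcp.resource("greeting://{name}")
def greeting(name: str) -> str:
    """A templated resource the client can read."""
    return f"Hello, {name}!"

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```

An MCP-capable client (e.g. Claude Desktop) would then be configured to launch this script, after which it could call the `add` tool or read `greeting://world`.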
0xdevalias revised this gist
Apr 20, 2025 . 1 changed file with 520 additions and 520 deletions.
0xdevalias revised this gist
Apr 20, 2025 . 1 changed file with 56 additions and 2 deletions.
@@ -25,7 +25,9 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

- [Agent Benchmarks / Leaderboards](#agent-benchmarks--leaderboards)
- [OpenAI Assistants / ChatGPT custom GPTs](#openai-assistants--chatgpt-custom-gpts)
- [OpenGPTs](#opengpts)
- [GitHub Copilot Agent / Workspace / CLI](#github-copilot-agent--workspace--cli)
- [Claude Code](#claude-code)
- [OpenAI Codex CLI](#openai-codex-cli)
- [aider](#aider)
- [llm](#llm)
- [Autogen / FLAML / etc](#autogen--flaml--etc)

@@ -509,8 +511,13 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

> - The retrieval algorithm you use
> - The chat history database you use

### GitHub Copilot Agent / Workspace / CLI

- https://github.com/features/copilot
  - > GitHub Copilot
- https://code.visualstudio.com/docs/copilot/chat/chat-agent-mode
  - > Use agent mode in VS Code
    > With chat agent mode in Visual Studio Code, you can use natural language to define a high-level task and to start an agentic code editing session to accomplish that task. In agent mode, Copilot autonomously plans the work needed and determines the relevant files and context. It then makes edits to your codebase and invokes tools to accomplish the request you made. Agent mode monitors the outcome of edits and tools and iterates to resolve any issues that arise.
- https://githubnext.com/projects/copilot-workspace
  - > Copilot Workspace
    > A Copilot-native dev environment, designed for everyday tasks.

@@ -519,6 +526,53 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

- https://github.blog/2024-04-29-github-copilot-workspace/
  - > GitHub Copilot Workspace: Welcome to the Copilot-native developer environment
    > We're redefining the developer environment with GitHub Copilot Workspace - where any developer can go from idea, to code, to software all in natural language.
- https://githubnext.com/projects/copilot-cli/
  - > Copilot for CLI
    > Ever having trouble remembering shell commands and flags for this or that? Ever wish you could just say what you want the shell to do? Don't worry: we're building GitHub Copilot assistance right into your terminal.
- https://docs.github.com/en/copilot/using-github-copilot/using-github-copilot-in-the-command-line
  - > Using GitHub Copilot in the command line
    > You can use Copilot with the GitHub CLI to get suggestions and explanations for the command line.
- https://docs.github.com/en/copilot/responsible-use-of-github-copilot-features/responsible-use-of-github-copilot-in-the-cli
  - > Responsible use of GitHub Copilot in the CLI
    > Learn how to use GitHub Copilot in the CLI responsibly by understanding its purposes, capabilities, and limitations.

### Claude Code

- https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview
  - > Claude Code overview
  - > Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster through natural language commands. By integrating directly with your development environment, Claude Code streamlines your workflow without requiring additional servers or complex setup.
- https://github.com/anthropics/claude-code
  - > Claude Code (Research Preview)
  - > Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.
    >
    > Some of its key capabilities include:
    >
    > - Edit files and fix bugs across your codebase
    > - Answer questions about your code's architecture and logic
    > - Execute and fix tests, lint, and other commands
    > - Search through git history, resolve merge conflicts, and create commits and PRs

### OpenAI Codex CLI

- https://github.com/openai/codex
  - > OpenAI Codex CLI
  - > Lightweight coding agent that runs in your terminal
- https://help.openai.com/en/articles/11096431-openai-codex-cli-getting-started
  - > OpenAI Codex CLI - Getting Started
  - > OpenAI Codex CLI is an open-source command-line tool that brings the power of our latest reasoning models directly to your terminal. It acts as a lightweight coding agent that can read, modify, and run code on your local machine to help you build features faster, squash bugs, and understand unfamiliar code. Because the CLI runs locally, your source code never leaves your environment unless you choose to share it.
- https://openai.com/index/introducing-o3-and-o4-mini/#codex-cli-frontier-reasoning-in-the-terminal
  - > ## Codex CLI: frontier reasoning in the terminal
    >
    > We're also sharing a new experiment: Codex CLI, a lightweight coding agent you can run from your terminal. It works directly on your computer and is designed to maximize the reasoning capabilities of models like o3 and o4-mini, with upcoming support for additional API models like [GPT-4.1](https://openai.com/index/gpt-4-1/).
    >
    > You can get the benefits of multimodal reasoning from the command line by passing screenshots or low fidelity sketches to the model, combined with access to your code locally. We think of it as a minimal interface to connect our models to users and their computers. Codex CLI is fully open-source at [github.com/openai/codex](http://github.com/openai/codex) today.
    >
    > Alongside, we are launching a $1 million initiative to support projects using Codex CLI and OpenAI models. We will evaluate and accept applications for grants in increments of $25,000 USD in the form of API credits. Proposals can be submitted [here](https://openai.com/form/codex-open-source-fund/).
- https://openai.com/form/codex-open-source-fund/
  - > Codex open source fund
    > We're excited to launch a $1 million initiative supporting open source projects to use Codex CLI and OpenAI models. Applications will be reviewed on an ongoing basis, with projects receiving grants in increments of $25,000 in API credits.
    >
    > If you're interested in participating, please fill out the form below.

### aider
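One nice property of these terminal agents is that they can be scripted as well as used interactively. A sketch of driving Claude Code non-interactively from Python, assuming its documented headless/print mode (`claude -p`); treat the exact flags as assumptions and check `claude --help` / `codex --help` for your installed versions:

```python
import subprocess

# Run Claude Code in headless "print" mode: send one prompt, capture the reply.
result = subprocess.run(
    ["claude", "-p", "Summarise what this repository does in two sentences."],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```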
0xdevalias revised this gist
Apr 20, 2025 . 1 changed file with 27 additions and 1 deletion.
@@ -443,10 +443,15 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

- https://github.com/THUDM/AgentBench
  - > A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
  - https://llmbench.ai/agent
- https://aider.chat/docs/leaderboards/
  - > Aider LLM Leaderboards
- https://github.com/princeton-nlp/SWE-bench
  - > [ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
  - > SWE-bench is a benchmark for evaluating large language models on real world software issues collected from GitHub. Given a codebase and an issue, a language model is tasked with generating a patch that resolves the described problem.
  - https://www.swebench.com/
- https://openai.com/index/introducing-swe-bench-verified/
  - > Introducing SWE-bench Verified
    > We're releasing a human-validated subset of SWE-bench that more reliably evaluates AI models' ability to solve real-world software issues.
- https://www.swebench.com/lite.html
  - > SWE-bench Lite
    > A Canonical Subset for Efficient Evaluation of Language Models as Software Engineers

@@ -460,6 +465,27 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

- > This is a Dockerfile based solution of the SWE-Bench evaluation framework.
  >
  > The solution is designed so that each "testbed" for testing a version of a repository is built in a separate Docker image. Each test is then run in its own Docker container. This approach ensures more stable test results because the environment is completely isolated and is reset for each test. Since the Docker container can be recreated each time, there's no need for reinstallation, speeding up the benchmark process.
- https://multi-swe-bench.github.io/
  - > Multi-SWE-bench
  - > A Multilingual Benchmark for Issue Resolving
- https://github.com/multi-swe-bench/multi-swe-bench
  - > Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
  - > We are extremely delighted to release Multi-SWE-bench! Multi-SWE-bench addresses the lack of multilingual benchmarks for evaluating LLMs in real-world code issue resolution. Unlike existing Python-centric benchmarks (e.g., SWE-bench), our framework spans 7 languages (i.e., Java, TypeScript, JavaScript, Go, Rust, C, and C++) with 1,632 high-quality instances, curated from 2,456 candidates by 68 expert annotators for reliability.
  - https://arxiv.org/abs/2504.02605
    - > Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving (April, 2025)
- https://liveswebench.ai/
  - > LiveSWEBench
  - > A Challenging, Contamination-Free Benchmark for AI Software Engineers
- https://github.com/livebench/liveswebench
  - > LiveSWEBench
  - > LiveSWEBench is a benchmark for evaluating the utility of AI coding assistants in real-world software engineering tasks, at varying levels of developer involvement. Given a real-world codebase and issue, we investigate the following questions:
    >
    > - How useful are AI coding assistants at completing tasks with no developer involvement?
    > - How useful are AI coding assistants at completing tasks with some developer involvement (i.e. writing prompts)?
    > - How useful are AI coding assistants at aiding in the completion of tasks with high developer involvement (i.e. writing code)?
- https://livebench.ai/
  - > LiveBench
  - > A Challenging, Contamination-Free LLM Benchmark

### OpenAI Assistants / ChatGPT custom GPTs

@@ -824,7 +850,7 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

### SWE-agent

- https://github.com/SWE-agent/SWE-agent
  - > SWE-agent: Agent Computer Interfaces Enable Software Engineering Language Models
  - > SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.
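The SWE-bench variants above are also published as Hugging Face datasets, which makes it easy to poke at the task format directly. A sketch using the `datasets` library; the dataset id and field names below match the princeton-nlp releases as I understand them, but check the dataset card:

```python
from datasets import load_dataset

# SWE-bench Lite: 300 curated GitHub issues, each paired with a repo snapshot.
swebench_lite = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")
print(len(swebench_lite))

example = swebench_lite[0]
print(example["repo"], example["instance_id"])
print(example["problem_statement"][:300])  # the issue text the agent must resolve
```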
0xdevalias revised this gist
Apr 20, 2025 . 1 changed file with 247 additions and 218 deletions.
@@ -4,48 +4,52 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

## Table of Contents

<!-- TOC start (generated with https://bitdowntoc.derlin.ch/) -->

- [Some of my other related gists](#some-of-my-other-related-gists)
- [Image Generation](#image-generation)
  - [Automatic1111 (Stable Diffusion WebUI)](#automatic1111-stable-diffusion-webui)
  - [ComfyUI](#comfyui)
  - [Unsorted](#unsorted)
- [Song / Audio Generation](#song--audio-generation)
  - [Udio](#udio)
  - [Suno](#suno)
  - [Stable Audio](#stable-audio)
  - [AudioCraft: MusicGen, AudioGen, etc](#audiocraft-musicgen-audiogen-etc)
  - [Neural Audio Codecs](#neural-audio-codecs)
  - [Audio Super Resolution](#audio-super-resolution)
  - [Unsorted](#unsorted-1)
- [See Also](#see-also)
- [ollama](#ollama)
- [LangChain, LangServe, LangSmith, LangFlow, etc](#langchain-langserve-langsmith-langflow-etc)
- [AI Agents / etc](#ai-agents--etc)
  - [Agent Benchmarks / Leaderboards](#agent-benchmarks--leaderboards)
  - [OpenAI Assistants / ChatGPT custom GPTs](#openai-assistants--chatgpt-custom-gpts)
  - [OpenGPTs](#opengpts)
  - [GitHub Copilot Workspace](#github-copilot-workspace)
  - [aider](#aider)
  - [llm](#llm)
  - [Autogen / FLAML / etc](#autogen--flaml--etc)
  - [OpenHands (formerly OpenDevin)](#openhands-formerly-opendevin)
  - [SWE-agent](#swe-agent)
  - [ChatDev](#chatdev)
  - [AutoCoder](#autocoder)
  - [OpenCodeInterpreter](#opencodeinterpreter)
  - [OpenInterpreter](#openinterpreter)
  - [Unsorted](#unsorted-2)
- [Code Generation / Execution](#code-generation--execution)
  - [Code Leaderboards / Benchmarks](#code-leaderboards--benchmarks)
- [Vision / Multimodal](#vision--multimodal)
  - [OpenAI](#openai)
  - [LLaVA / etc](#llava--etc)
  - [Unsorted](#unsorted-3)
- [Vector Databases/Search, Similarity Search, Clustering, etc](#vector-databasessearch-similarity-search-clustering-etc)
  - [Faiss](#faiss)
- [Benchmarks / Leaderboards](#benchmarks--leaderboards)
- [Prompts / Prompt Engineering / etc](#prompts--prompt-engineering--etc)
- [Other Useful Tools / Libraries / etc](#other-useful-tools--libraries--etc)
- [Unsorted](#unsorted-4)
- [Node-based UI's, Graph Execution, Flow Based Programming, etc](#node-based-uis-graph-execution-flow-based-programming-etc)
  - [Unsorted](#unsorted-5)

<!-- TOC end -->

## Some of my other related gists

@@ -479,97 +483,7 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

> - The retrieval algorithm you use
> - The chat history database you use

### GitHub Copilot Workspace

- https://githubnext.com/projects/copilot-workspace
  - > Copilot Workspace

@@ -579,52 +493,9 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

- https://github.blog/2024-04-29-github-copilot-workspace/
  - > GitHub Copilot Workspace: Welcome to the Copilot-native developer environment
    > We're redefining the developer environment with GitHub Copilot Workspace - where any developer can go from idea, to code, to software all in natural language.

### aider

- https://github.com/paul-gauthier/aider
  - > aider is AI pair programming in your terminal
    > Aider is a command line tool that lets you pair program with GPT-3.5/GPT-4, to edit code stored in your local git repository.
    > Aider will directly edit the code in your local source files, and git commit the changes with sensible commit messages. You can start a new project or work with an existing git repo. Aider is unique in that it lets you ask for changes to pre-existing, larger codebases.

@@ -634,9 +505,9 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

  - > Building a better repository map with tree sitter
- https://aider.chat/2023/12/21/unified-diffs.html
  - > Unified diffs make GPT-4 Turbo 3X less lazy

### llm

- https://github.com/simonw/llm
  - > Access large language models from the command-line
  - https://llm.datasette.io/

@@ -845,9 +716,91 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

    > The new release adds several command-line tools for working with embeddings, plus a new Python API for working with embeddings in your own code.
    > It also adds support for installing additional embedding models via plugins.
- https://simonwillison.net/2024/Mar/26/llm-cmd/
  - > I just released a neat new plugin for my LLM command-line tool: `llm-cmd`. It lets you run a command to generate a further terminal command, review and edit that command, then hit `<enter>` to execute it or `<ctrl-c>` to cancel.

### Autogen / FLAML / etc

- https://github.com/microsoft/autogen
  - > Enable Next-Gen Large Language Model Applications.
  - > AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.
  - > - AutoGen enables building next-gen LLM applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation, and optimization of a complex LLM workflow. It maximizes the performance of LLM models and overcomes their weaknesses.
    > - It supports diverse conversation patterns for complex workflows. With customizable and conversable agents, developers can use AutoGen to build a wide range of conversation patterns concerning conversation autonomy, the number of agents, and agent conversation topology.
    > - It provides a collection of working systems with different complexities. These systems span a wide range of applications from various domains and complexities. This demonstrates how AutoGen can easily support diverse conversation patterns.
    > - AutoGen provides enhanced LLM inference. It offers utilities like API unification and caching, and advanced usage patterns, such as error handling, multi-config inference, context programming, etc.
  - Roadmap: https://github.com/orgs/microsoft/projects/989/views/3
  - https://github.com/microsoft/autogen#multi-agent-conversation-framework
    - > Autogen enables the next-gen LLM applications with a generic multi-agent conversation framework. It offers customizable and conversable agents that integrate LLMs, tools, and humans. By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code.
- https://microsoft.github.io/autogen/blog/
  - https://microsoft.github.io/autogen/blog/2023/12/01/AutoGenAssistant/
    - > AutoGen Assistant: Interactively Explore Multi-Agent Workflows
    - > To help you rapidly prototype multi-agent solutions for your tasks, we are introducing AutoGen Assistant, an interface powered by AutoGen.
      > It allows you to:
      >
      > - Declaratively define and modify agents and multi-agent workflows through a point and click, drag and drop interface (e.g., you can select the parameters of two agents that will communicate to solve your task).
      > - Use our UI to create chat sessions with the specified agents and view results (e.g., view chat history, generated files, and time taken).
      > - Explicitly add skills to your agents and accomplish more tasks.
      > - Publish your sessions to a local gallery.
      > - AutoGen Assistant is open source, give it a try!
    - > we are thrilled to introduce a new user-friendly interface: the AutoGen Assistant. Built upon the leading foundation of AutoGen and robust, modern web technologies like React.
    - > With the AutoGen Assistant, users can rapidly create, manage, and interact with agents that can learn, adapt, and collaborate. As we release this interface into the open-source community, our ambition is not only to enhance productivity but to inspire a level of personalized interaction between humans and agents.
    - > We recommend using a virtual environment (e.g., `conda`) to avoid conflicts with existing Python packages. With Python 3.10 or newer active in your virtual environment, use `pip` to install AutoGen Assistant: `pip install autogenra`
    - > Once installed, run the web UI by entering the following in your terminal: `autogenra ui --port 8081`. This will start the application on the specified port. Open your web browser and go to `http://localhost:8081/` to begin using AutoGen Assistant.
    - > The AutoGen Assistant proposes some high-level concepts that help compose agents to solve tasks.
      >
      > - **Agent Workflow:** An agent workflow is a specification of a set of agents that can work together to accomplish a task. The simplest version of this is a setup with two agents - a user proxy agent (that represents a user i.e. it compiles code and prints result) and an assistant that can address task requests (e.g., generating plans, writing code, evaluating responses, proposing error recovery steps, etc.). A more complex flow could be a group chat where even more agents work towards a solution.
      > - **Session:** A session refers to a period of continuous interaction or engagement with an agent workflow, typically characterized by a sequence of activities or operations aimed at achieving specific objectives. It includes the agent workflow configuration, the interactions between the user and the agents. A session can be "published" to a "gallery".
      > - **Skills:** Skills are functions (e.g., Python functions) that describe how to solve a task. In general, a good skill has a descriptive name (e.g. `generate_images`), extensive docstrings and good defaults (e.g., writing out files to disk for persistence and reuse). You can add new skills to the AutoGen Assistant via the provided UI. At inference time, these skills are made available to the assistant agent as they address your tasks.
      >
      > AutoGen Assistant comes with 3 example skills: `fetch_profile`, `find_papers`, `generate_images`. Please feel free to review the repo to learn more about how they work.
    - > While the AutoGen Assistant is a web interface, it is powered by an underlying python API that is reusable and modular. Importantly, we have implemented an API where agent workflows can be declaratively specified (in JSON), loaded and run.
  - https://microsoft.github.io/autogen/blog/2023/11/26/Agent-AutoBuild/
    - > Agent AutoBuild - Automatically Building Multi-agent Systems
    - > Introducing `AutoBuild`, building multi-agent system automatically, fast, and easily for complex tasks with minimal user prompt required, powered by a new designed class `AgentBuilder`. `AgentBuilder` also supports open-source LLMs by leveraging `vLLM` and `FastChat`.
    - > In this blog, we introduce `AutoBuild`, a pipeline that can automatically build multi-agent systems for complex tasks. Specifically, we design a new class called `AgentBuilder`, which will complete the generation of participant expert agents and the construction of group chat automatically after the user provides descriptions of a building task and an execution task.
    - > AutoBuild supports open-source LLM by vLLM and FastChat.
    - > OpenAI Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. AutoBuild also supports the assistant API by adding `use_oai_assistant=True` to `build()`.
  - https://microsoft.github.io/autogen/blog/2023/11/20/AgentEval/
    - > How to Assess Utility of LLM-powered Applications?
    - > As a developer of an LLM-powered application, how can you assess the utility it brings to end users while helping them with their tasks?
    - > We introduce AgentEval - the first version of the framework to assess the utility of any LLM-powered application crafted to assist users in specific tasks. AgentEval aims to simplify the evaluation process by automatically proposing a set of criteria tailored to the unique purpose of your application. This allows for a comprehensive assessment, quantifying the utility of your application against the suggested criteria.
  - https://microsoft.github.io/autogen/blog/2023/11/13/OAI-assistants/
    - > AutoGen Meets GPTs
    - > OpenAI assistants are now integrated into AutoGen via GPTAssistantAgent. This enables multiple OpenAI assistants, which form the backend of the now popular GPTs, to collaborate and tackle complex tasks.
  - https://microsoft.github.io/autogen/blog/2023/11/09/EcoAssistant/
    - > EcoAssistant - Using LLM Assistants More Accurately and Affordably
    - > TL;DR:
      > - Introducing the EcoAssistant, which is designed to solve user queries more accurately and affordably.
      > - We show how to let the LLM assistant agent leverage external API to solve user query.
      > - We show how to reduce the cost of using GPT models via Assistant Hierarchy.
      > - We show how to leverage the idea of Retrieval-augmented Generation (RAG) to improve the success rate via Solution Demonstration.
  - https://microsoft.github.io/autogen/blog/2023/11/06/LMM-Agent/
    - > Multimodal with GPT-4V and LLaVA
    - > This blog post and the latest AutoGen update concentrate on visual comprehension. Users can input images, pose questions about them, and receive text-based responses from these LMMs. We support the `gpt-4-vision-preview` model from OpenAI and LLaVA model from Microsoft now.
  - https://microsoft.github.io/autogen/blog/2023/10/26/TeachableAgent/
    - > AutoGen's TeachableAgent
    - > We introduce `TeachableAgent` (which uses `TextAnalyzerAgent`) so that users can teach their LLM-based assistants new facts, preferences, and skills.
  - https://microsoft.github.io/autogen/blog/2023/10/18/RetrieveChat/
    - > Retrieval-Augmented Generation (RAG) Applications with AutoGen
    - > TL;DR:
      > - We introduce RetrieveUserProxyAgent and RetrieveAssistantAgent, RAG agents of AutoGen that allows retrieval-augmented generation, and its basic usage.
      > - We showcase customizations of RAG agents, such as customizing the embedding function, the text split function and vector database.
      > - We also showcase two advanced usage of RAG agents, integrating with group chat and building a Chat application with Gradio.
- https://github.com/microsoft/FLAML
  - > A Fast Library for Automated Machine Learning & Tuning
  - > FLAML is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflow based on large language models, machine learning models, etc. and optimizes their performance.
    > - FLAML enables building next-gen GPT-X applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation and optimization of a complex GPT-X workflow. It maximizes the performance of GPT-X models and augments their weakness.
    > - For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend. Users can find their desired customizability from a smooth range.
    > - It supports fast and economical automatic tuning (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations), capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping.
  - > Heads-up: We have migrated [AutoGen](https://microsoft.github.io/autogen/) into a dedicated [github repository](https://github.com/microsoft/autogen). Alongside this move, we have also launched a dedicated [Discord](https://discord.gg/pAbnFJrkgZ) server and a [website](https://microsoft.github.io/autogen/) for comprehensive documentation.

### OpenHands (formerly OpenDevin)

- https://www.all-hands.dev/
  - > Open Source Agents for Developers
- https://github.com/All-Hands-AI
  - > All Hands AI
    > We build AI software development agents for everyone, in the open.
- https://github.com/All-Hands-AI/OpenHands
  - > OpenHands: Code Less, Make More
- https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/
  - > Introducing OpenDevin CodeAct 1.0, a new State-of-the-art in Coding Agents
  - > today we introduce a new state-of-the-art coding agent, OpenDevin CodeAct 1.0, which achieves 21% solve rate on SWE-Bench Lite unassisted, a 17% relative improvement above the previous state-of-the-art posted by SWE-Agent. OpenDevin CodeAct 1.0 is now the default in OpenDevin v0.5

@@ -857,8 +810,127 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

  - > SWE-Bench is a great benchmark that tests the ability of coding agents to solve real-world github issues on a number of popular repositories. However, due in part to its realism the process of evaluating on SWE-Bench can initially seem daunting.
  - > To help make it easy to perform this process in an efficient, stable, and reproducible manner, the OpenDevin team containerized the evaluation environment. This preparation involves setting up all necessary testbeds (codebases at various versions) and their respective conda environments in advance.
    > For each task instance, we initiate a sandbox container where the testbed is pre-configured, ensuring a ready-to-use setup for the agent
  - > This supports both SWE-Bench-Lite (a smaller benchmark of 300 issues that is more conducive to quick benchmarking) and SWE-Bench (the full dataset of 2,294 issues, work-in-progress). With our evaluation pipeline, we obtained a replicated SWE-agent resolve score of 17.3% (52 out of 300 test instances) on SWE-Bench-Lite using the released SWE-agent patch predictions, which differs by 2 from the originally reported 18.0% (54 out of 300).
- https://github.com/All-Hands-AI/OpenHands/issues/742
  - > Explore using stack graphs for better code search / navigation / context / repo map / etc
- https://github.com/All-Hands-AI/openhands-aci
  - > Agent-Computer Interface (ACI) for OpenHands
  - > An Agent-Computer Interface (ACI) designed for software development agents, OpenHands. This package provides essential tools and interfaces for AI agents to interact with computer systems for software development tasks.
- https://github.com/All-Hands-AI/open-operator
  - > Open Operator
  - > Open-source resources on agents for computer use.
  - > What will it take to make a versatile computer use agent that can safely and effectively handle any task?
    >
    > This is a collection of resources and ideas towards this goal.

### SWE-agent

- https://github.com/princeton-nlp/SWE-agent
  - > SWE-agent: Agent Computer Interfaces Enable Software Engineering Language Models
  - > SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.
    >
    > On SWE-bench, SWE-agent resolves 12.29% of issues, achieving the state-of-the-art performance on the full test set.
  - > Agent-Computer Interface (ACI)
    > We accomplish these results by designing simple LM-centric commands and feedback formats to make it easier for the LM to browse the repository, view, edit and execute code files. We call this an Agent-Computer Interface (ACI) and build the SWE-agent repository to make it easy to iterate on ACI design for repository-level coding agents.
    >
    > Just like how typical language models require good prompt engineering, good ACI design leads to much better results when using agents. As we show in our paper, a baseline agent without a well-tuned ACI does much worse than SWE-agent.

### ChatDev

- https://github.com/OpenBMB/ChatDev
  - > Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)
  - > Communicative Agents for Software Development
  - > ChatDev stands as a virtual software company that operates through various intelligent agents holding different roles, including Chief Executive Officer, Chief Product Officer, Chief Technology Officer, programmer, reviewer, tester, art designer. These agents form a multi-agent organizational structure and are united by a mission to "revolutionize the digital world through programming." The agents within ChatDev collaborate by participating in specialized functional seminars, including tasks such as designing, coding, testing, and documenting.
    > The primary objective of ChatDev is to offer an easy-to-use, highly customizable and extendable framework, which is based on large language models (LLMs) and serves as an ideal scenario for studying collective intelligence.
  - https://github.com/OpenBMB/ChatDev#-news
    - > November 15th, 2023: We launched ChatDev as a SaaS platform that enables software developers and innovative entrepreneurs to build software efficiently at a very low cost and barrier to entry. Try it out at https://chatdev.modelbest.cn/
    - > November 2nd, 2023: ChatDev is now supported with a new feature: incremental development, which allows agents to develop upon existing codes. Try `--config "incremental" --path "[source_code_directory_path]"` to start it.
    - > October 26th, 2023: ChatDev is now supported with Docker for safe execution (thanks to contribution from ManindraDeMel). Please see [Docker Start Guide](https://github.com/OpenBMB/ChatDev/blob/main/wiki.md#docker-start).
    - > September 25th, 2023: The Git mode is now available, enabling the programmer to utilize Git for version control. To enable this feature, simply set `"git_management"` to `"True"` in `ChatChainConfig.json`. [See guide](https://github.com/OpenBMB/ChatDev/blob/main/wiki.md#git-mode).
    - > September 20th, 2023: The Human-Agent-Interaction mode is now available! You can get involved with the ChatDev team by playing the role of reviewer and making suggestions to the programmer; try `python3 run.py --task [description_of_your_idea] --config "Human"`. See [guide](https://github.com/OpenBMB/ChatDev/blob/main/wiki.md#human-agent-interaction) and [example](https://github.com/OpenBMB/ChatDev/blob/main/WareHouse/Gomoku_HumanAgentInteraction_20230920135038).
    - > September 1st, 2023: The Art mode is available now! You can activate the designer agent to generate images used in the software; try `python3 run.py --task [description_of_your_idea] --config "Art"`. See [guide](https://github.com/OpenBMB/ChatDev/blob/main/wiki.md#art) and [example](https://github.com/OpenBMB/ChatDev/blob/main/WareHouse/gomokugameArtExample_THUNLP_20230831122822).
  - https://chatdev.modelbest.cn/

### AutoCoder

- https://github.com/bin123apple/AutoCoder
  - > AutoCoder
  - > We introduced a new model designed for the Code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024). (90.9% vs 90.2%).
    >
    > Additionally, compared to previous open-source models, AutoCoder offers a new feature: it can automatically install the required packages and attempt to run the code until it deems there are no issues, whenever the user wishes to execute the code.
  - https://arxiv.org/abs/2405.14906
    - > AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct
    - > We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the Human Eval benchmark test (90.9% vs. 90.2%). In addition, AutoCoder offers a more versatile code interpreter compared to GPT-4 Turbo and GPT-4o. Its code interpreter can install external packages instead of limiting to built-in packages. AutoCoder's training data is a multi-turn dialogue dataset created by a system combining agent interaction and external code execution verification, a method we term AIEV-Instruct (Instruction Tuning with Agent-Interaction and Execution-Verified). Compared to previous large-scale code dataset generation methods, AIEV-Instruct reduces dependence on proprietary large models and provides execution-validated code dataset.
### OpenCodeInterpreter

- https://github.com/OpenCodeInterpreter/OpenCodeInterpreter
  - > OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
  - > OpenCodeInterpreter is a suite of open-source code generation systems aimed at bridging the gap between large language models and sophisticated proprietary systems like the GPT-4 Code Interpreter. It significantly enhances code generation capabilities by integrating execution and iterative refinement functionalities.
  - https://opencodeinterpreter.github.io/
  - https://arxiv.org/abs/2402.14658
    - > OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
    - > The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter. To address this, we introduce OpenCodeInterpreter, a family of open-source code systems designed for generating, executing, and iteratively refining code. Supported by Code-Feedback, a dataset featuring 68K multi-turn interactions, OpenCodeInterpreter integrates execution and human feedback for dynamic code refinement. Our comprehensive evaluation of OpenCodeInterpreter across key benchmarks such as HumanEval, MBPP, and their enhanced versions from EvalPlus reveals its exceptional performance. Notably, OpenCodeInterpreter-33B achieves an accuracy of 83.2 (76.4) on the average (and plus versions) of HumanEval and MBPP, closely rivaling GPT-4's 84.2 (76.2) and further elevates to 91.6 (84.6) with synthesized human feedback from GPT-4. OpenCodeInterpreter bridges the gap between open-source code generation models and proprietary systems like GPT-4 Code Interpreter.

### OpenInterpreter

- https://github.com/KillianLucas/open-interpreter
  - > OpenInterpreter
    > A natural language interface for computers
  - > Open Interpreter lets LLMs run code (Python, Javascript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running `$ interpreter` after installing.
    >
    > This provides a natural-language interface to your computer's general-purpose capabilities
  - https://openinterpreter.com/
  - https://docs.openinterpreter.com/introduction
  - https://github.com/KillianLucas/open-interpreter-docs
    - > Documentation site for the Open Interpreter project
  - https://changes.openinterpreter.com/
- https://github.com/KillianLucas/open-procedures
  - > Tiny, structured coding tutorials that can be searched semantically
  - > Open Procedures is an open-source project offering tiny, structured coding tutorials that can be searched semantically. It was created to help code-interpreting language models complete tasks by fetching relevant and up-to-date code snippets.
  - https://open-procedures.replit.app/

### Unsorted

- https://github.com/holmeswww/agentkit
  - > AgentKit: Flow Engineering with Graphs, not Coding
  - > An intuitive LLM prompting framework for multifunctional agents, by explicitly constructing a complex "thought process" from simple natural language prompts.
  - > AgentKit offers a unified framework for explicitly constructing a complex human "thought process" from simple natural language prompts. The user puts together chains of nodes, like stacking LEGO pieces. The chains of nodes can be designed to explicitly enforce a naturally structured "thought process".
    >
    > Different arrangements of nodes could represent different functionalities, allowing the user to integrate various functionalities to build multifunctional agents.
    >
    > A basic agent could be implemented as simple as a list of prompts for the subtasks and therefore could be designed and tuned by someone without any programming experience.
- https://github.com/CopilotKit/CopilotKit
  - > CopilotKit
  - > A framework for building custom AI Copilots: in-app AI chatbots, in-app AI Agents, & AI-powered Textareas.
  - > The Open-Source Copilot Framework
    > Build, deploy, and operate fully custom AI Copilots.
    > in-app AI chatbots, AI agents, and AI Textareas
  - https://www.copilotkit.ai/
  - https://github.com/CopilotKit/demo-todo
    - > This is a demo that showcases using CopilotKit to build a simple Todo app.
    - https://todo-demo-phi.vercel.app/
- https://github.com/OpenBMB/AgentVerse
  - > AgentVerse is designed to facilitate the deployment of multiple LLM-based agents in various applications, which primarily provides two frameworks: task-solving and simulation
  - > Task-solving: This framework assembles multiple agents as an automatic multi-agent system (AgentVerse-Tasksolving, Multi-agent as system) to collaboratively accomplish the corresponding tasks. Applications: software development system, consulting system, etc.
  - > Simulation: This framework allows users to set up custom environments to observe behaviors among, or interact with, multiple agents. Applications: game, social behavior research of LLM-based agents, etc.
  - https://arxiv.org/abs/2308.10848
    - > AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
    - > Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks. However, in real-world scenarios, cooperation among individuals is often required to enhance the efficiency and effectiveness of task accomplishment. Hence, inspired by human group dynamics, we propose a multi-agent framework that can collaboratively and dynamically adjust its composition as a greater-than-the-sum-of-its-parts system. Our experiments demonstrate that the framework can effectively deploy multi-agent groups that outperform a single agent. Furthermore, we delve into the emergence of social behaviors among individual agents within a group during collaborative task accomplishment. In view of these behaviors, we discuss some possible strategies to leverage positive ones and mitigate negative ones for improving the collaborative potential of multi-agent groups.
- https://developer.nvidia.com/blog/building-your-first-llm-agent-application/
  - > Building Your First LLM Agent Application
- https://gpt.chatcody.com/
  - > ChatGPT GitHub Empowered assistant
    > Designed for comprehensive repository interaction - from code contributions to read/write operations, reviews and advanced task automation.
  - https://chat.openai.com/g/g-jSqTyHBbh-chatcody-github-gitlab-assistant
- https://dosu.dev/
  - > Dosu is an AI teammate that lives in your GitHub repo, helping you respond to issues, triage bugs, and build better documentation.
  - > How much does Dosu cost?
    > Auto-labeling and backlog grooming are completely free! For Q&A and debugging, Dosu is free for 25 tickets per month. After that, paid plans start at $20 per month. A detailed pricing page is coming soon.
    >
    > At Dosu, we are strong advocates of OSS.
    > If you maintain a project that is FOSS, part of the Cloud Native Computing Foundation (CNCF), or the Apache Software Foundation (ASF), please reach out to [email protected] about special free-tier plans
- https://github.com/NL2Code/CodeR
  - > CodeR
  - > GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within code Repository. On SWE-bench lite, CodeR is able to solve 28% of issues, in the case of submitting only once for each issue. We examine the performance impact of each design of CodeR and offer insights to advance this research direction.
- https://github.com/stitionai/devika
  - > Devika - Agentic AI Software Engineer
  - > Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. Devika aims to be a competitive open-source alternative to Devin by Cognition AI.

@@ -916,10 +988,6 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

- See also:
  - [AI Agents / etc](#ai-agents--etc)

### Code Leaderboards / Benchmarks

- https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard

@@ -952,45 +1020,6 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

    > - Searching Needle Function: Search a function given its description.
    > - RepoQA is still under development... More types of QA tasks are coming soon... Stay tuned!

## Vision / Multimodal

### OpenAI
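For a feel of the two-agent pattern that several of the frameworks above build on, here's a minimal sketch using the classic `pyautogen` API (`AssistantAgent` + `UserProxyAgent`); the LLM config values are placeholders, and newer AutoGen releases have since reworked this API:

```python
import autogen

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "sk-..."}]}

# The assistant proposes plans and writes code...
assistant = autogen.AssistantAgent("assistant", llm_config=llm_config)

# ...while the user proxy executes that code locally and feeds results back.
user_proxy = autogen.UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",  # fully automated, no human in the loop
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

user_proxy.initiate_chat(
    assistant,
    message="Write and run a Python script that prints the first 10 Fibonacci numbers.",
)
```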
0xdevalias revised this gist
Jun 10, 2024 . 1 changed file with 15 additions and 0 deletions.
@@ -152,6 +152,21 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

### Stable Audio

- https://arxiv.org/abs/2404.10301
  - > Long-form music generation with latent diffusion (2024)
  - > Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure. We show that by training a generative model on long temporal contexts it is possible to produce long-form music of up to 4m45s. Our model consists of a diffusion-transformer operating on a highly downsampled continuous latent representation (latent rate of 21.5Hz). It obtains state-of-the-art generations according to metrics on audio quality and prompt alignment, and subjective tests reveal that it produces full-length music with coherent structure.
  - https://stability-ai.github.io/stable-audio-2-demo/
    - > stable-audio-2-demo
    - > Additional creative capabilities
      > Audio-to-audio: With diffusion models it is possible to perform some degree of style-transfer by initializing the noise with audio during sampling. This capability can be used to modify the aesthetics of an existing recording based on a given text prompt, whilst maintaining the reference audio's structure (e.g., a beatbox recording could be style-transfered to produce realistic-sounding drums). As a result, our model can be influenced by not only text prompts but also audio inputs, enhancing its controllability and expressiveness. We noted that when initialized with voice recordings (such as beatbox or onomatopoeias), there is a sensation of control akin to an instrument.
    - > Memorization analysis
      > Recent works examined the potential of generative models to memorize training data, especially for repeated elements in the training set. Further, musicLM conducted a memorization analysis to address concerns on the potential misappropriation of creative content. Adhering to principles of responsible model development, we also run a comprehensive study on memorization.
      >
      > Considering the increased probability of memorizing repeated music within the dataset, we start by studying if our training set contains repeated data. We embed all our training data using the LAION-CLAP audio encoder to select audios that are close in this space based on a manually set threshold. The threshold is set such that the selected audios correspond to exact replicas. With this process, we identify 5566 repeated audios in our training set.
      >
      > We compare our model's generations against the training set in LAION-CLAP space. Generations are from 5566 prompts within the repeated training data (in-distribution), and 586 prompts from the Song Describer Dataset (no-singing, out-of-distribution). We then identify the top-50 generated music that is closest to the training data and listen.
      >
      > We extensively listened to potential memorization candidates, and could not find memorization.
- https://www.stableaudio.com/
  - > Stable Audio
    > Create music with AI
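A quick back-of-envelope check on the quoted numbers: at a latent rate of 21.5 Hz, a full 4m45s generation means the diffusion-transformer is modelling a sequence of roughly six thousand latent frames end to end:

```python
latent_rate_hz = 21.5               # from the paper: highly downsampled latent rate
duration_s = 4 * 60 + 45            # 4m45s = 285 seconds
print(latent_rate_hz * duration_s)  # ~6127 latent frames end-to-end
```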
0xdevalias revised this gist
Jun 10, 2024 . 1 changed file with 53 additions and 3 deletions.
@@ -38,7 +38,8 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

- [OpenAI](#openai)
- [LLaVA / etc](#llava--etc)
- [Unsorted](#unsorted-4)
- [Vector Databases/Search, Similarity Search, Clustering, etc](#vector-databasessearch-similarity-search-clustering-etc)
- [Faiss](#faiss)
- [Benchmarks / Leaderboards](#benchmarks--leaderboards)
- [Prompts / Prompt Engineering / etc](#prompts--prompt-engineering--etc)
- [Other Useful Tools / Libraries / etc](#other-useful-tools--libraries--etc)

@@ -563,6 +564,14 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

- https://github.blog/2024-04-29-github-copilot-workspace/
  - > GitHub Copilot Workspace: Welcome to the Copilot-native developer environment
    > We're redefining the developer environment with GitHub Copilot Workspace - where any developer can go from idea, to code, to software all in natural language.
- https://github.com/holmeswww/agentkit
  - > AgentKit: Flow Engineering with Graphs, not Coding
  - > An intuitive LLM prompting framework for multifunctional agents, by explicitly constructing a complex "thought process" from simple natural language prompts.
  - > AgentKit offers a unified framework for explicitly constructing a complex human "thought process" from simple natural language prompts. The user puts together chains of nodes, like stacking LEGO pieces. The chains of nodes can be designed to explicitly enforce a naturally structured "thought process".
    >
    > Different arrangements of nodes could represent different functionalities, allowing the user to integrate various functionalities to build multifunctional agents.
    >
    > A basic agent could be implemented as simple as a list of prompts for the subtasks and therefore could be designed and tuned by someone without any programming experience.
- https://github.com/CopilotKit/CopilotKit
  - > CopilotKit
  - > A framework for building custom AI Copilots: in-app AI chatbots, in-app AI Agents, & AI-powered Textareas.

@@ -1088,9 +1097,17 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

  - > Lightweight GPT-4 Vision processing over the Webcam
  - > WebcamGPT-Vision is a lightweight web application that enables users to process images from their webcam using OpenAI's GPT-4 Vision API. The application captures images from the user's webcam, sends them to the GPT-4 Vision API, and displays the descriptive results.

## Vector Databases/Search, Similarity Search, Clustering, etc

- TODO: add more things here

### Faiss

- https://github.com/facebookresearch/faiss
  - > Faiss
  - > A library for efficient similarity search and clustering of dense vectors.
  - > Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed primarily at Meta's Fundamental AI Research group.
  - https://faiss.ai/

## Benchmarks / Leaderboards

@@ -1471,6 +1488,20 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

- https://github.com/pytorch/torchtune#llama3
  - > torchtune supports fine-tuning for the Llama3 8B models with support for 70B on its way. We currently support LoRA, QLoRA and Full-finetune on a single GPU as well as LoRA and Full fine-tune on multiple devices.
  - https://pytorch.org/blog/torchtune-fine-tune-llms/
- https://llama.meta.com/llama3/
  - > Meta Llama 3
    > Now available with both 8B and 70B pretrained and instruction-tuned versions to support a wide range of applications
- https://github.com/meta-llama/llama3
  - > Meta Llama 3
  - > The official Meta Llama 3 GitHub site
  - > We are unlocking the power of large language models. Our latest version of Llama is now accessible to individuals, creators, researchers, and businesses of all sizes so that they can experiment, innovate, and scale their ideas responsibly.
    >
    > This release includes model weights and starting code for pre-trained and instruction-tuned Llama 3 language models, including sizes of 8B to 70B parameters.
    >
    > This repository is a minimal example of loading Llama 3 models and running inference. For more detailed examples, see llama-recipes.
- https://github.com/meta-llama/llama-recipes
  - > Llama Recipes: Examples to get started using the Llama models from Meta
  - > Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama3 for WhatsApp & Messenger.
- https://zapier.com/blog/train-chatgpt-to-write-like-you/
  - > How to train ChatGPT to write like you
- https://github.com/EleutherAI/gpt-neox

@@ -1490,6 +1521,25 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus

    > - Predefined configurations for popular architectures including Pythia, PaLM, Falcon, and LLaMA 1 & 2
    > - Curriculum Learning
    > - Easy connections with the open source ecosystem, including Hugging Face's [tokenizers](https://github.com/huggingface/tokenizers) and [transformers](https://github.com/huggingface/transformers/) libraries, logging via [WandB](https://wandb.ai/site), and evaluation via our [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
- https://microsoft.github.io/promptflow/
  - > Prompt flow
    > Prompt flow is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.
    >
    > With prompt flow, you will be able to:
    >
    > - Create flows that link LLMs, prompts, Python code and other tools together in an executable workflow.
    > - Debug and iterate your flows, especially the interaction with LLMs with ease.
    > - Evaluate your flows, calculate quality and performance metrics with larger datasets.
    > - Integrate the testing and evaluation into your CI/CD system to ensure quality of your flow.
    > - Deploy your flows to the serving platform you choose or integrate into your app's code base easily.
    > - (Optional but highly recommended) Collaborate with your team by leveraging the cloud version of Prompt flow in Azure AI.
@@ -1490,6 +1521,25 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus
  > - Predefined configurations for popular architectures including Pythia, PaLM, Falcon, and LLaMA 1 & 2
  > - Curriculum Learning
  > - Easy connections with the open source ecosystem, including Hugging Face's [tokenizers](https://github.com/huggingface/tokenizers) and [transformers](https://github.com/huggingface/transformers/) libraries, logging via [WandB](https://wandb.ai/site), and evaluation via our [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
- https://microsoft.github.io/promptflow/
  - > Prompt flow
    > Prompt flow is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.
    >
    > With prompt flow, you will be able to:
    >
    > - Create flows that link LLMs, prompts, Python code and other tools together in an executable workflow.
    > - Debug and iterate your flows, especially the interaction with LLMs with ease.
    > - Evaluate your flows, calculate quality and performance metrics with larger datasets.
    > - Integrate the testing and evaluation into your CI/CD system to ensure quality of your flow.
    > - Deploy your flows to the serving platform you choose or integrate into your app's code base easily.
    > - (Optional but highly recommended) Collaborate with your team by leveraging the cloud version of Prompt flow in Azure AI.
  - https://microsoft.github.io/promptflow/concepts/concept-flows.html
    - > Flows
    - > While how LLMs work may be elusive to many developers, how LLM apps work is not - they essentially involve a series of calls to external services such as LLMs/databases/search engines, or intermediate data processing, all glued together.
  - https://microsoft.github.io/promptflow/reference/index.html
    - > Reference
  - https://github.com/microsoft/autogen/tree/main/samples/apps/promptflow-autogen
    - > Promptflow Autogen Example
- https://github.com/stanfordnlp/dspy
  - > DSPy: The framework for programming - not prompting - foundation models
  - > DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used one or more times within a pipeline. To use LMs to build a complex system without DSPy, you generally have to: (1) break the problem down into steps, (2) prompt your LM well until each step works well in isolation, (3) tweak the steps to work well together, (4) generate synthetic examples to tune each step, and (5) use these examples to finetune smaller LMs to cut costs. Currently, this is hard and messy: every time you change your pipeline, your LM, or your data, all prompts (or finetuning steps) may need to change.
0xdevalias revised this gist
Jun 10, 2024 . 1 changed file with 1 addition and 0 deletions.
@@ -1103,6 +1103,7 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus
- https://huggingface.co/spaces/openlifescienceai/open_medical_llm_leaderboard
  - > Open Medical-LLM Leaderboard
  - https://huggingface.co/blog/leaderboard-medicalllm
    - > The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare
- https://github.com/EleutherAI/lm-evaluation-harness
  - > Language Model Evaluation Harness
  - > A framework for few-shot evaluation of language models.
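For the evaluation harness just above, a minimal sketch of the v0.4-style Python API (the model and task named here are illustrative; the harness is also commonly driven via its `lm_eval` CLI):

```python
import lm_eval

# Evaluate a small Hugging Face model on a single task, zero-shot.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["lambada_openai"],
    num_fewshot=0,
)
print(results["results"])  # per-task metrics keyed by task name
```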
0xdevalias revised this gist
Jun 10, 2024 . 1 changed file with 20 additions and 0 deletions.
@@ -1448,6 +1448,12 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus
## Unsorted

- https://github.com/google-gemini/cookbook
  - > Gemini API Cookbook
  - > A collection of guides and examples for the Gemini API.
  - > This is a collection of guides and examples for the Gemini API, including quickstart tutorials for writing prompts and using different features of the API, and examples of things you can build.
- https://ai.google.dev/gemini-api/docs
  - > Get started with Gemini API
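As a taste of the Gemini API quickstarts mentioned above, a minimal sketch using the `google-generativeai` Python SDK (the model name and prompt are illustrative, and `GOOGLE_API_KEY` is assumed to be set in the environment):

```python
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Single-turn text generation against a Gemini model.
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Summarise what a neural audio codec does.")
print(response.text)
```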
- https://github.com/NaturalNode/natural/
  - > Natural
  - > general natural language facilities for node
@@ -1483,6 +1489,20 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus
  > - Predefined configurations for popular architectures including Pythia, PaLM, Falcon, and LLaMA 1 & 2
  > - Curriculum Learning
  > - Easy connections with the open source ecosystem, including Hugging Face's [tokenizers](https://github.com/huggingface/tokenizers) and [transformers](https://github.com/huggingface/transformers/) libraries, logging via [WandB](https://wandb.ai/site), and evaluation via our [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
- https://github.com/stanfordnlp/dspy
  - > DSPy: The framework for programming - not prompting - foundation models
  - > DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used one or more times within a pipeline. To use LMs to build a complex system without DSPy, you generally have to: (1) break the problem down into steps, (2) prompt your LM well until each step works well in isolation, (3) tweak the steps to work well together, (4) generate synthetic examples to tune each step, and (5) use these examples to finetune smaller LMs to cut costs. Currently, this is hard and messy: every time you change your pipeline, your LM, or your data, all prompts (or finetuning steps) may need to change.
    >
    > To make this more systematic and much more powerful, DSPy does two things. First, it separates the flow of your program (modules) from the parameters (LM prompts and weights) of each step. Second, DSPy introduces new optimizers, which are LM-driven algorithms that can tune the prompts and/or the weights of your LM calls, given a metric you want to maximize.
    >
    > DSPy can routinely teach powerful models like GPT-3.5 or GPT-4 and local models like T5-base or Llama2-13b to be much more reliable at tasks, i.e. having higher quality and/or avoiding specific failure patterns. DSPy optimizers will "compile" the same program into different instructions, few-shot prompts, and/or weight updates (finetunes) for each LM. This is a new paradigm in which LMs and their prompts fade into the background as optimizable pieces of a larger system that can learn from data. tldr; less prompting, higher scores, and a more systematic approach to solving hard tasks with LMs.
  - https://dspy-docs.vercel.app/
    - > DSPy - Programming - not prompting - Language Models
    - > The Way of DSPy
      >
      > - Systematic Optimization: Choose from a range of optimizers to enhance your program. Whether it's generating refined instructions, or fine-tuning weights, DSPy's optimizers are engineered to maximize efficiency and effectiveness.
      > - Modular Approach: With DSPy, you can build your system using predefined modules, replacing intricate prompting techniques with straightforward, effective solutions.
      > - Cross-LM Compatibility: Whether you're working with powerhouse models like GPT-3.5 or GPT-4, or local models such as T5-base or Llama2-13b, DSPy seamlessly integrates and enhances their performance in your system.
- https://github.com/sgl-project/sglang
  - > SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
  - https://lmsys.org/blog/2024-01-17-sglang/
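Connecting the DSPy description above to code, a minimal sketch in the 2024-era API (`pip install dspy-ai`; the model choice, signature, and question are all illustrative, and `OPENAI_API_KEY` is assumed to be set):

```python
import dspy

# Configure the default LM used by all DSPy modules in this program.
lm = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=lm)

class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

# A module built from the signature; DSPy optimizers can later "compile"
# this same program into tuned instructions / few-shot prompts per LM.
qa = dspy.Predict(BasicQA)
print(qa(question="What is the capital of France?").answer)
```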
0xdevalias revised this gist
Jun 10, 2024 . 1 changed file with 10 additions and 0 deletions.
@@ -186,6 +186,16 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus
- https://github.com/Stability-AI/stable-audio-tools#fine-tuning
  - > Fine-tuning
    > Fine-tuning a model involves continuing a training run from a pre-trained checkpoint.
- https://github.com/diontimmer/audio-diffusion-gradio
  - > audio-diffusion-gradio
  - > Decked-out gradio client for audio diffusion, mainly stable-audio-tools.
  - > The Audio Diffusion Gradio Interface is a user-friendly graphical user interface (GUI) made in Gradio that simplifies the process of working with audio diffusion models, autoencoders, diffusion autoencoders, and various models trainable using the stable-audio-tools package. This interface not only streamlines your audio diffusion tasks but also provides a modular extension system, enabling users to easily integrate additional functionalities.
- https://github.com/lks-ai/ComfyUI-StableAudioSampler
  - > ComfyUI-StableAudioSampler
    > The New Stable Audio Open 1.0 Sampler In a ComfyUI Node. Make some beats!
- https://huggingface.co/spaces/ameerazam08/stableaudio-open-1.0
  - > Stable Audio Multiplayer Live
    > Generate audio with text, share and learn from others how to best prompt this new model
0xdevalias revised this gist
Jun 10, 2024 . 1 changed file with 48 additions and 35 deletions.
@@ -14,6 +14,9 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus
- [Udio](#udio)
- [Suno](#suno)
- [Stable Audio](#stable-audio)
- [AudioCraft: MusicGen, AudioGen, etc](#audiocraft-musicgen-audiogen-etc)
- [Neural Audio Codecs](#neural-audio-codecs)
- [Audio Super Resolution](#audio-super-resolution)
- [Unsorted](#unsorted-1)
- [See Also](#see-also)
- [ollama](#ollama)
@@ -184,14 +187,39 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus
  - > Fine-tuning
    > Fine-tuning a model involves continuing a training run from a pre-trained checkpoint.

### AudioCraft: MusicGen, AudioGen, etc

- https://github.com/facebookresearch/audiocraft
  - > Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
- https://github.com/facebookresearch/audiocraft#models
  - > At the moment, AudioCraft contains the training code and inference code for:
  - > MusicGen: A state-of-the-art controllable text-to-music model.
    - https://github.com/facebookresearch/audiocraft/blob/main/docs/MUSICGEN.md
      - > MusicGen: Simple and Controllable Music Generation
        > AudioCraft provides the code and models for MusicGen, a simple and controllable model for music generation. MusicGen is a single stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict them in parallel, thus having only 50 auto-regressive steps per second of audio.
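To make the MusicGen entry above concrete, a small text-to-music sketch following the audiocraft README (the checkpoint, prompt, and duration are illustrative):

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained MusicGen checkpoint and generate from a text prompt.
model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)  # seconds of audio to generate

wavs = model.generate(["lo-fi hip hop beat with warm pads"])
for i, wav in enumerate(wavs):
    # Writes musicgen_0.wav etc., with loudness normalisation.
    audio_write(f"musicgen_{i}", wav.cpu(), model.sample_rate, strategy="loudness")
```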
  - > AudioGen: A state-of-the-art text-to-sound model.
    - https://github.com/facebookresearch/audiocraft/blob/main/docs/AUDIOGEN.md
      - > AudioGen: Textually-guided audio generation
        > AudioCraft provides the code and a model re-implementing AudioGen, a textually-guided audio generation model that performs text-to-sound generation.
        >
        > The provided AudioGen reimplementation follows the LM model architecture introduced in MusicGen and is a single stage auto-regressive Transformer model trained over a 16kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. This model variant reaches similar audio quality than the original implementation introduced in the AudioGen publication while providing faster generation speed given the smaller frame rate.
  - > EnCodec: A state-of-the-art high fidelity neural audio codec.
    - https://github.com/facebookresearch/audiocraft/blob/main/docs/ENCODEC.md
      - > EnCodec: High Fidelity Neural Audio Compression
        > AudioCraft provides the training code for EnCodec, a state-of-the-art deep learning based audio codec supporting both mono and stereo audio, presented in the High Fidelity Neural Audio Compression paper.
  - > Multi Band Diffusion: An EnCodec compatible decoder using diffusion.
    - https://github.com/facebookresearch/audiocraft/blob/main/docs/MBD.md
      - > MultiBand Diffusion
        > AudioCraft provides the code and models for MultiBand Diffusion, From Discrete Tokens to High Fidelity Audio using MultiBand Diffusion. MultiBand diffusion is a collection of 4 models that can decode tokens from EnCodec tokenizer into waveform audio.
  - > MAGNeT: A state-of-the-art non-autoregressive model for text-to-music and text-to-sound.
    - https://github.com/facebookresearch/audiocraft/blob/main/docs/MAGNET.md
      - > MAGNeT: Masked Audio Generation using a Single Non-Autoregressive Transformer
        > AudioCraft provides the code and models for MAGNeT, Masked Audio Generation using a Single Non-Autoregressive Transformer.
        >
        > MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples conditioned on text descriptions. It is a masked generative non-autoregressive Transformer trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike prior work on masked generative audio Transformers, such as SoundStorm and VampNet, MAGNeT doesn't require semantic token conditioning, model cascading or audio prompting, and employs a full text-to-audio using a single non-autoregressive Transformer.

### Neural Audio Codecs

- https://haoheliu.github.io/SemantiCodec/
  - > SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
  - > Highlights
@@ -261,34 +289,19 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus
- https://arxiv.org/abs/2210.13438
  - > High Fidelity Neural Audio Compression
  - > We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural networks. It consists in a streaming encoder-decoder architecture with quantized latent space trained in an end-to-end fashion. We simplify and speed-up the training by using a single multiscale spectrogram adversary that efficiently reduces artifacts and produce high-quality samples. We introduce a novel loss balancer mechanism to stabilize training: the weight of a loss now defines the fraction of the overall gradient it should represent, thus decoupling the choice of this hyper-parameter from the typical scale of the loss. Finally, we study how lightweight Transformer models can be used to further compress the obtained representation by up to 40%, while staying faster than real time. We provide a detailed description of the key design choices of the proposed model including: training objective, architectural changes and a study of various perceptual loss functions. We present an extensive subjective evaluation (MUSHRA tests) together with an ablation study for a range of bandwidths and audio domains, including speech, noisy-reverberant speech, and music. Our approach is superior to the baselines methods across all evaluated settings, considering both 24 kHz monophonic and 48 kHz stereophonic audio.

### Audio Super Resolution

- https://github.com/haoheliu/versatile_audio_super_resolution
  - > AudioSR: Versatile Audio Super-resolution at Scale
  - > Versatile audio super resolution (any -> 48kHz) with AudioSR.
  - > Pass your audio in, AudioSR will make it high fidelity!
    >
    > Work on all types of audio (e.g., music, speech, dog, raining, ...) & all sampling rates.
- https://replicate.com/nateraw/audio-super-resolution
  - > AudioSR: Versatile Audio Super-resolution at Scale

### Unsorted

- https://cassetteai.com/
  - > Cassette is your Copilot for AI Music Generation.
0xdevalias revised this gist
Jun 10, 2024 . 1 changed file with 44 additions and 5 deletions.
@@ -11,8 +11,9 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus
- [ComfyUI](#comfyui)
- [Unsorted](#unsorted)
- [Song / Audio Generation](#song--audio-generation)
  - [Udio](#udio)
  - [Suno](#suno)
  - [Stable Audio](#stable-audio)
  - [Unsorted](#unsorted-1)
- [See Also](#see-also)
- [ollama](#ollama)
@@ -132,6 +133,11 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus
## Song / Audio Generation

### Udio

- https://www.udio.com/
  - > Udio | Make your music

### Suno

- https://www.suno.ai/
@@ -140,10 +146,43 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus
- https://github.com/suno-ai/bark
  - > Text-Prompted Generative Audio Model

### Stable Audio

- https://www.stableaudio.com/
  - > Stable Audio
    > Create music with AI
  - https://www.stableaudio.com/user-guide/text-to-audio
    - > Text-to-audio
  - https://www.stableaudio.com/user-guide/audio-to-audio
    - > Audio-to-audio
  - https://www.stableaudio.com/user-guide/model-2
    - > Stable Audio 2.0 Model
    - > Our groundbreaking Stable Audio AudioSparx 2.0 model has been designed to generate full tracks with coherent structure at 3 minutes and 10 seconds. Our new model is available for everyone to generate full tracks on our Stable Audio product.
    - > Key features:
      >
      > - Stable Audio 2.0 sets a new standard in AI generated audio, producing high-quality, full tracks with coherent musical structure up to three minutes in length at 44.1KHz stereo.
      > - The new model introduces audio-to-audio generation by allowing users to upload and transform samples using natural language prompts.
      > - Stable Audio 2.0 was exclusively trained on a licensed dataset from the AudioSparx music library, honoring opt-out requests and ensuring fair compensation for creators.
- https://stability.ai/news?tags=Audio
  - https://stability.ai/news/stable-audio-using-ai-to-generate-music
    - > Announcing Stable Audio, a product for music & sound generation
  - https://stability.ai/news/stable-audio-2-0
    - > Introducing Stable Audio 2.0
  - https://stability.ai/news/introducing-stable-audio-open
    - > Introducing Stable Audio Open - An Open Source Model for Audio Samples and Sound Design
    - > Key Takeaways:
      > - Stable Audio Open is an open source text-to-audio model for generating up to 47 seconds of samples and sound effects.
      > - Users can create drum beats, instrument riffs, ambient sounds, foley and production elements.
      > - The model enables audio variations and style transfer of audio samples.
- https://huggingface.co/stabilityai/stable-audio-open-1.0
  - > Stable Audio Open 1.0 generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts. It comprises three components: an autoencoder that compresses waveforms into a manageable sequence length, a T5-based text embedding for text conditioning, and a transformer-based diffusion (DiT) model that operates in the latent space of the autoencoder.
  - > This model is made to be used with the `stable-audio-tools` library for inference
- https://github.com/Stability-AI/stable-audio-tools
  - > stable-audio-tools
    > Training and inference code for audio generation models
  - https://github.com/Stability-AI/stable-audio-tools#fine-tuning
    - > Fine-tuning
      > Fine-tuning a model involves continuing a training run from a pre-trained checkpoint.
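A rough inference sketch for Stable Audio Open via `stable-audio-tools`, loosely following the Hugging Face model card (the prompt and sampler settings are illustrative, and the gated weights must be downloaded from Hugging Face first):

```python
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
model = model.to(device)

# Text prompt plus the timing conditioning the model expects.
conditioning = [{"prompt": "128 BPM tech house drum loop", "seconds_start": 0, "seconds_total": 30}]

# Text-conditioned latent diffusion, decoded back to a waveform.
output = generate_diffusion_cond(
    model,
    steps=100,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=model_config["sample_size"],
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device=device,
)

# Collapse batch, peak-normalise, and write out as 16-bit PCM.
output = rearrange(output, "b d n -> d (b n)")
output = output.to(torch.float32).div(output.abs().max()).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, model_config["sample_rate"])
```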
### Unsorted
0xdevalias revised this gist
Jun 9, 2024 . 1 changed file with 17 additions and 0 deletions.
@@ -190,6 +190,22 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus
  - > A JAX Implementation of the Descript Audio Codec
  - > Descript Audio Codec (.dac) is a high-fidelity general neural audio codec introduced in the paper "High-Fidelity Audio Compression with Improved RVQGAN".
    > This repository is an unofficial JAX implementation of the PyTorch-based DAC and has no affiliation with Descript.
- https://github.com/AudiogenAI/agc
  - > Audiogen Codec (agc)
    > We are announcing the open source release of Audiogen Codec (agc). A low compression 48khz stereo neural audio codec for general audio, optimizing for audio fidelity.
    >
    > It comes in two flavors:
    >
    > - agc-continuous: KL regularized, 32 channels, 100hz.
    > - agc-discrete: 24 stages of residual vector quantization, 50hz.
    >
    > AGC (Audiogen Codec) is a convolutional autoencoder based on the DAC architecture, which holds SOTA. We found that training with EMA and adding a perceptual loss term with CLAP features improved performance. These codecs, being low compression, outperform Meta's EnCodec and DAC on general audio as validated from internal blind ELO games.
    >
    > We trained (relatively) very low compression codecs in the pursuit of solving a core issue regarding general music and audio generation: low acoustic quality and audible artifacts, which hinder industry use for these models. Our hope is to encourage researchers to build hierarchical generative audio models that can efficiently use high sequence length representations without sacrificing semantic abilities.
    >
    > This codec will power Audiogen's upcoming models. Stay tuned!
  - https://audiogen.notion.site/Audiogen-Codec-Examples-546fe64596f54e20be61deae1c674f20
    - > Audiogen Codec Examples
- https://github.com/facebookresearch/encodec
  - > EnCodec: High Fidelity Neural Audio Compression
  - > State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
@@ -234,6 +250,7 @@ Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus
    > AudioCraft provides the code and models for MAGNeT, Masked Audio Generation using a Single Non-Autoregressive Transformer.
    >
    > MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples conditioned on text descriptions. It is a masked generative non-autoregressive Transformer trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike prior work on masked generative audio Transformers, such as SoundStorm and VampNet, MAGNeT doesn't require semantic token conditioning, model cascading or audio prompting, and employs a full text-to-audio using a single non-autoregressive Transformer.
- https://cassetteai.com/
  - > Cassette is your Copilot for AI Music Generation.
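For the EnCodec entry above, a minimal encode sketch adapted from the repo's README (the input file name is illustrative):

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# 24 kHz mono model, targeting 6 kbps.
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)

wav, sr = torchaudio.load("test.wav")
wav = convert_audio(wav, sr, model.sample_rate, model.channels)
wav = wav.unsqueeze(0)  # add batch dimension

with torch.no_grad():
    encoded_frames = model.encode(wav)

# Discrete codes with shape [B, n_q, T].
codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1)
print(codes.shape)
```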