Transcriptor

https://github.com/aphananthe42/transcriptor/assets/68156481/d6fe689d-ee05-4809-8460-de8522bdd66e

**The recorded content of the audio data**
ใ€Œๆœฌๆ—ฅใฏใ”ๆฅๅ ดใ„ใŸใ ใใพใ—ใฆใ€่ช ใซใ‚ใ‚ŠใŒใจใ†ใ”ใ–ใ„ใพใ™ใ€‚  ้–‹ๆผ”ใซๅ…ˆ็ซ‹ใกใพใ—ใฆใ€ใŠๅฎขๆง˜ใซใŠ้ก˜ใ„็”ณใ—ไธŠใ’ใพใ™ใ€‚ๆบๅธฏ้›ป่ฉฑใชใฉใ€้Ÿณใฎๅ‡บใ‚‹ใ‚‚ใฎใฎ้›ปๆบใฏใŠๅˆ‡ใ‚Šใใ ใ•ใ„ใ€‚ใพใŸ่จฑๅฏใฎใชใ„้Œฒ้Ÿณใƒปๆ’ฎๅฝฑใฏใ”้ ๆ…ฎใใ ใ•ใ„ใ€‚็š†ๆง˜ใฎใ”ๅ”ๅŠ›ใ‚’ใ‚ˆใ‚ใ—ใใŠ้ก˜ใ„ใ„ใŸใ—ใพใ™ใ€‚ใ€

Overview

This is a CLI tool for transcribing and summarizing audio data. It can also distinguish speakers and output the transcription separately for each speaker, useful for meeting minutes.

Requirement

  • Deno >= 1.41.0
  • Amazon S3 Bucket
  • AWS IAM Access key pair
  • OpenAI API key

Technologies

  • Deno
  • TypeScript
  • Amazon S3
  • Amazon Transcribe

System

Screenshot 2024-03-07 at 0 54 05

  1. Transcriptor put audio data to S3.
  2. AmazonTranscribe read audio data from S3.
  3. AmazonTranscribe output transcription result to S3.(same bucket as the one where the audio data is stored.)
  4. Transcriptor get transcription result from S3.
  5. Transcriptor summarize transcription result via OpenAI API.

Usage

0. Clone this repository

$ git clone https://github.com/aphananthe42/transcriptor

1. Create .env file and fill in environment variables as per the following example.

$ touch .env
# or add the following if .env already exists.
AWS_ACCESS_KEY_ID="YOUR_AWS_IAM_ACCESS_KEY_ID"
AWS_SECRET_ACCESS_KEY="YOUR_AWS_IAM_SECRET_ACCESS_KEY"
AWS_REGION="your-aws-region"
TRANSCRIPTOR_S3_BUCKET_NAME='your-s3-bucket-name'

OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
OPENAI_GPT_MODEL="your-prefer-gpt-model(ex. gpt-3.5-turbo)"
TRANSCRIPTOR_SYSTEM_PROMPT="system prompt for summarizing with GPT"

2. Run script like below.

deno run --allow-env --allow-sys --allow-read --allow-net src/app.ts --file='path/to/your/audio/data/to/summarize

Argument options

--lang='ja-JP'
// The language spoken in the audio file.
// default: 'ja-JP'

--model='gpt-3.5-turbo'
// The name of the GPT model used for summarizing.
// default: 'gpt-3.5-turbo'

--speakerCount='4'
// The number of speakers in the audio data.
// default: 1

License

MIT License