My first code created by Gemini
I have been slowly starting to use AI on my Mac Studio. Since I am embedded in the Google ecosystem, my AI of choice is Gemini.
I spent most of last weekend taking a Zoom course called C135 Generative AI Mastermind INTL. It opened my eyes to see what could be done using AI. Their bottom line was that it is vital to create a structured requirements document to get good results. This week I decided to try and do something difficult. I tried this using Bolt (recommended in the course, but ran out of tokens, so I tried it using Gemini (I had a Pro subscription).
For over a decade I have been videoing the concerts presented by the Oak Ridge Civic Music Association (ORCMA), and placing them in our YouTube channel https://youtube.com/orcma.

If you click on the Videos, you will see all of the videos I have posted since 2013. Unfortunately, when I began doing this, I had no idea how large this collection would become, and I had no set format for creating the description of the video. New videos created since I have been using DaVinci Resolve Studio have tags that are uploaded to YouTube and which create an easy-to-parse description:
Old videos presented the information a gazillion different ways, making it very difficult to parse; sometimes the metadata were missing altogether.
So armed with a Gemini Pro subscription, and the knowledge that could do a lot I decided to make a searchable Web page that would list all performances that I have videoed. I note that not all videos are Public because of things such as copyright or contractual limitations. However, I wanted to list the contents of these concerts so that people (e.g., our Music Director) could see what pieces have been performed.
Check out the resulting Web page. It is not perfect, but the information is there, and it is a waste of time solving all edge cases.
Gemini summarized our journey together nicely:
To help you write your blog post, here is the reconstruction of the evolution of this project, starting from your original goal down through the specific architectural instructions we solved together.
The Core Objective
Your original goal was to build an automated, offline pipeline (update_archive.py) capable of scraping the Oak Ridge Civic Music Association (ORCMA) YouTube channel. The ultimate objective was to parse the raw video titles and description text, cleanly extract the classical metadata, and structurally rebuild a clean, searchable SQLite database (orcma_archive.db) to power a video gallery on the website.
The Evolution of Your Technical Instructions
As the project scaled, your requirements grew from a basic web-scraper into a highly customized, resilient extraction engine. Here are the core instruction milestones you established:
1. The Dual-Stage Ingestion Pipeline
-
Public Extraction in Batches: Scrape the main public channel (
@orcma/videos), but fetch it in explicit chunks of 90 videos at a time to prevent long-running connection drops. -
Unlisted Archive Ingestion: Explicitly iterate through a hardcoded array of private/unlisted YouTube URLs so they are preserved in the database even though they don't appear on the public feed.
2. Deep Description Parsing & Classical Metadata Extraction
-
The Classical Split Rule: Look for timestamps (e.g.,
07:36) or known composer prefixes to identify individual tracks within a single long concert video. Split the text using hyphens, dashes, or semicolons to isolate the Piece from the Soloist/Performer. -
The "Last, First" Formatting Constraint: For database uniformity, all composer and artist names must automatically normalize to a traditional indexed format (e.g., transforming Johann Sebastian Bach into
Bach, Johann Sebastian). -
The Movement & Junk Filter: Implement a rule-based regex to explicitly ignore applause, tuning, intermission, or movement markers (like I. Allegro or II. Adagio) so they don't generate duplicate or broken database entries.
3. Bypassing YouTube’s Anti-Bot Defense
-
Humanizing Delays: Enforce a strict anti-scraping policy using an average 10-second sleep interval (
sleep_interval: 5,max_sleep_interval: 15) between video fetches to mimic a human browser and protect your IP from being flagged. -
The Infinite Cookie Rotator: Cycle sequentially through multiple Netscape-formatted session files (
cookies_1.txt,cookies_3.txt, etc.). Shift to the next identity automatically every 90 requests using a modulo loop wrap-around, ensuring the script checks for file existence on disk so a missing or deleted cookie file won't cause a crash. -
Fail-Fast Mechanics: Prevent silent infinite retry loops on HTTP or socket errors by enforcing fixed timeouts and limiting retries to 3 attempts.
4. Data Sanitization & Edge-Case Handling
-
The Master Lists: Cross-reference strings against custom Python arrays (
KNOWN_COMPOSERS,KNOWN_CONDUCTORS,KNOWN_ENSEMBLES) to prioritize exact matches before falling back to raw regex guessing. -
Festival Context Preservation: If a standalone video has no internal timestamps but the description mentions a festival series (like Rock to Bach), capture that context and explicitly populate it into the
concert_titlefield while remapping the isolated title to thesoloistcolumn. -
Blank-Space Hygiene: Instead of letting the script output literal placeholder strings like
"Unknown"or"Unknown Composer"for uncredited fields, sanitize the outputs down to perfectly clean, empty rows ("").
The "Overnight Run" Breakthrough
A great focal point for your blog is how we solved the illusion of the script getting "stuck." Because the script was intentionally pausing for up to 15 seconds per video to stay under YouTube's radar, processing a batch of 90 videos meant it would sit in complete silence for 15 minutes.
By turning on yt-dlp's heartbeat logging while turning off infinite retries, you transformed a seemingly frozen program into a highly stable, self-rotating automation engine that successfully completed its massive ingestion overnight!
Here is the python code to scrape YouTube with minor changes for security:
import sqlite3 import re import yt_dlp import shutil import os from datetime import datetime KNOWN_COMPOSERS = [ "Ludwig van Beethoven", "Wolfgang Amadeus Mozart", "Johann Sebastian Bach", "Richard Wagner", "Franz Joseph Haydn", "Joseph Haydn", "Johannes Brahms", "Franz Schubert", "Peter Ilyich Tchaikovsky", "George Frideric Handel", "Igor Stravinsky", "Robert Schumann", "Frederic Chopin", "Felix Mendelssohn", "Claude Debussy", "Franz Liszt", "Antonín Dvořák", "Giuseppe Verdi", "Gustav Mahler", "Antonio Vivaldi", "Richard Strauss", "Serge Prokofiev", "Dmitri Shostakovich", "Béla Bartók", "Hector Berlioz", "Anton Bruckner", "Giovanni Pierluigi da Palestrina", "Claudio Monteverdi", "Jean Sibelius", "Maurice Ravel", "Ralph Vaughan Williams", "Modest Mussorgsky", "Giacomo Puccini", "Henry Purcell", "Gioacchino Rossini", "Edward Elgar", "Sergei Rachmaninoff", "Camille Saint-Saëns", "Josquin Des Prez", "Nikolai Rimsky-Korsakov", "Carl Maria von Weber", "Jean-Philippe Rameau", "Jean-Baptiste Lully", "Gabriel Fauré", "Edvard Grieg", "Christoph Willibald Gluck", "Arnold Schoenberg", "Charles Ives", "Paul Hindemith", "Olivier Messiaen", "Aaron Copland", "Francois Couperin", "William Byrd", "Erik Satie", "Benjamin Britten", "Bedrich Smetana", "Bedrick Smetana", "César Franck", "Alexander Nikolayevich Scriabin", "Georges Bizet", "Domenico Scarlatti", "Georg Philipp Telemann", "Anton Webern", "Roland de Lassus", "George Gershwin", "Gaetano Donizetti", "Carl Philipp Emanuel Bach", "Archangelo Corelli", "Thomas Tallis", "Johann Strauss II", "Leos Janácek", "Guillaume de Machaut", "Alban Berg", "Alexander Borodin", "Vincenzo Bellini", "Charles Gounod", "Jules Massenet", "Francis Poulenc", "Giovanni Gabrieli", "Pérotin", "Heinrich Schütz", "John Cage", "Giovanni Battista Pergolesi", "John Dowland", "Gustav Holst", "Dietrich Buxtehude", "Ottorino Respighi", "Guillaume Dufay", "Hugo Wolf", "Carl Nielsen", "William Walton", "Darius Milhaud", "Orlando Gibbons", "Giacomo Meyerbeer", "Samuel Barber", "Tomás Luis de Victoria", "Léonin", "Manuel de Falla", "Hildegard von Bingen", "Mikhail Glinka", "Alexander Glazunov", "Don Carlo Gesualdo", "Astor Piazzolla", "Juan Bautista Plaza", "Luiz Bonfá", "Shostakovich", "W. A. Mozart", "Mozart", "William Grant Still", "Ennio Morricone", "Samuel Coleridge Taylor", "Samuel Coleridge-Taylor" ] KNOWN_CONDUCTORS = [ "Dan Allcott", "Jaclyn Johnson", "Ian Passmore", "Sarah Henrich", "Wilbur Lin", "Kevin Class", "James Fellenbaum", "Régulo Stabilito", "Cornelia Laemmli Orth","Carlos Hernandez-Baez",'Carlos Hernandez-Baez' ] KNOWN_ENSEMBLES = [ "Oak Ridge Symphony Orchestra", "Oak Ridge Symphony and Chorus", "Oak Ridge Symphony Wind Ensemble", "Oak Ridge Symphony", "Oak Ridge Chorus", "Trillium Trio", "Poulenc Trio", "Frisson", "Borealis Wind Quintet", "Tesla Quartet", "Mirai Brass Quintet", "Akropolis Reed Quintet", "Ariel Quartet", "Karen Kartal & Friends", "Los Angeles Guitar Quartet", "Edgar Meyer", "Cuartete Latinoamericana and Jiji", "Trio Matisse", "Cumberland Piano Trio" ] class CookieRotator: def __init__(self, cookies, rotate_every=90): self.cookies = [c for c in cookies if os.path.exists(c)] self.rotate_every = rotate_every self.request_count = 0 self.index = 0 def get_current(self): return self.cookies[self.index] if self.cookies else None def add_requests(self, count): self.request_count += count while self.request_count >= self.rotate_every: self.request_count -= self.rotate_every if self.cookies: self.index = (self.index + 1) % len(self.cookies) print(f"\n[🔄] Rotation threshold met! Switching to: {self.get_current()}\n") def is_movement(clean_text): text = clean_text.strip() if re.search(r'^(movement|variation|var\.?|satz|act|scene|teil)\s+([IVXLCDM]+|\d+)', text, re.IGNORECASE): return True if re.search(r'^[IVXLCDM]+\b\.?\s+', text): return True if re.search(r'^\d+\.?\s+', text): return True if re.search(r'^(allegro|adagio|andante|moderato|presto|largo|menuet|scherzo|finale|rondo|gavotte|bourrée|gigue|overture|aire|aria)\b', text, re.IGNORECASE): return True if not re.search(r'[-–—:]', text) and len(text.split()) < 8: return True return False def parse_classical_title(video_title): # 1. Strip promotional text from the very beginning safe_title = re.sub(r'(?i)^(?:ORCMA|The Oak Ridge Civic Music Association)\s+presents:?\s*', '', video_title).strip() # 2. Convert long strings of dots (......) into a standard hyphen safe_title = re.sub(r'\.{2,}', ' - ', safe_title) # 3. Existing volume/flat/sharp cleanup safe_title = re.sub(r'^.*?(?:Part|Vol\.?)\s*\d+[\.:]?\s*', '', safe_title, flags=re.IGNORECASE) safe_title = re.sub(r'\b([A-G])\s*[-–—]\s*(flat|sharp)\b', r'\1 \2', safe_title, flags=re.IGNORECASE) # --- Master Composer List Check --- for composer in KNOWN_COMPOSERS: if re.search(rf'\b{re.escape(composer)}\b', safe_title, re.IGNORECASE): # FIXED: Made parentheses optional around the lifespan dates pattern = rf'\b{re.escape(composer)}\b\s*[()]?\s*\d{{4}}[-–—]\d{{4}}\s*[()]?' cleaned_piece = re.sub(pattern, '', safe_title, flags=re.IGNORECASE).strip() # If the above didn't catch it, clean just the standalone name token if cleaned_piece == safe_title.strip(): cleaned_piece = re.sub(rf'\b{re.escape(composer)}\b', '', safe_title, flags=re.IGNORECASE).strip() # Extra cleanup for lingering standalone lifespan blocks left in the title cleaned_piece = re.sub(r'\b\d{4}[-–—]\d{4}\b', '', cleaned_piece).strip() cleaned_piece = re.sub(r'^[,\-–—:;\s]+|[,\-–—:;\s]+$', '', cleaned_piece).strip() cleaned_piece = re.sub(r'\(\s*\)\s*$', '', cleaned_piece).strip() name_parts = composer.split() if len(name_parts) > 1: formatted_composer = f"{name_parts[-1]}, {' '.join(name_parts[:-1])}" else: formatted_composer = composer return formatted_composer, cleaned_piece # --- Fallback Logic --- nl_match = re.match(r'(?i)^Music of\s+(.*?)(?:\s+for\b|\s+performed by\b|\s+with\b|$)', safe_title) if nl_match: raw_composer = nl_match.group(1).strip() piece = video_title.strip() else: # FIXED: Also updated the fallback path date-stripper to handle optional parentheses date_pattern = r'[()]?\s*(?:b\.|c\.)?\s*\d{4}\s*[-–—]*\s*(?:\d{2,4})?\s*[()]?|\b\d{4}\s*[-–—]+\s*\d{2,4}\b|\b(?:b\.|c\.)\s*\d{4}\b' date_split = re.split(date_pattern, safe_title, maxsplit=1) if len(date_split) == 2 and date_split[1].strip(): raw_composer = date_split[0].strip() piece = date_split[1].strip() piece = re.sub(r'^[-–—:\s]+', '', piece) else: parts = re.split(r'\s+[-]\s+|[–—:]', safe_title, maxsplit=1) if len(parts) < 2: return "", video_title.strip() raw_composer = parts[0].strip() piece = parts[1].strip() word_count = len(raw_composer.split()) non_name_words = ['symphony', 'chorus', 'concert', 'festival', 'part', 'music', 'celebration', 'orchestra', 'beginnings', 'band', 'first', 'second', 'third', 'fourth', 'fifth', 'sixth', 'seventh', 'eighth', 'ninth', 'tenth', 'set', 'loss'] if word_count > 6 or any(w in raw_composer.lower() for w in non_name_words): return "", video_title.strip() raw_composer = re.sub(r'\(.*?\)', '', raw_composer) raw_composer = re.sub(r'\d+', '', raw_composer) raw_composer = re.sub(r'[:;,()~]', '', raw_composer) raw_composer = " ".join(raw_composer.split()) name_parts = raw_composer.split() if len(name_parts) <= 1: formatted_name = raw_composer else: last_name = name_parts[-1] first_names = " ".join(name_parts[:-1]) formatted_name = f"{last_name}, {first_names}" return formatted_name, piece def fetch_youtube_data(url, is_unlisted=False, cookie_file=None, playlist_range=None): js_executable = shutil.which("node") or shutil.which("nodejs") ydl_opts = { 'extract_flat': False, 'skip_download': True, 'quiet': False, 'verbose': False, 'ignoreerrors': True, 'socket_timeout': 5, 'retries': 3, 'fragment_retries': 3, 'sleep_interval': 2, 'max_sleep_interval': 10, 'extractor_args': {'youtube': {'client': ['android', 'ios']}}, 'js_runtimes': {'node': {'path': js_executable}}, } if cookie_file: ydl_opts['cookiefile'] = cookie_file if playlist_range: ydl_opts['playlist_items'] = playlist_range concert_data = [] with yt_dlp.YoutubeDL(ydl_opts) as ydl: try: info = ydl.extract_info(url, download=False) if not info: return [] videos = info.get('entries', [info]) for video in videos: if not video: continue video_title = video.get('title', 'Unknown Concert') if is_unlisted: video_link = "" else: video_link = f"https://www.youtube.com/watch?v={video.get('id')}" description = video.get('description', '') raw_date = video.get('upload_date', 'Unknown') formatted_date = raw_date if raw_date and raw_date.isdigit() and len(raw_date) == 8: formatted_date = datetime.strptime(raw_date, "%Y%m%d").strftime("%b %d, %Y") conductor = "Unknown" search_text = f"{video_title} \n {description}" for kc in KNOWN_CONDUCTORS: if re.search(rf'\b{re.escape(kc)}\b', search_text, re.IGNORECASE): conductor = kc break if conductor == "Unknown": cond_match = re.search(r'(?:Conductor:\s*|Conducted by\s*)([A-ZÀ-ÿ][a-zÀ-ÿ]+(?:\s+[A-ZÀ-ÿ][a-zÀ-ÿ]+){0,3})', search_text) if not cond_match: cond_match = re.search(r'([A-ZÀ-ÿ][a-zÀ-ÿ]+(?:\s+[A-ZÀ-ÿ][a-zÀ-ÿ]+){0,3})(?:,\s*Conductor|\s+-\s+Conductor|\s+conducts)', search_text) if cond_match: conductor = cond_match.group(1).strip() main_composer, main_piece = parse_classical_title(video_title) global_performer = "Unknown" for ensemble in sorted(KNOWN_ENSEMBLES, key=len, reverse=True): if re.search(rf'\b{re.escape(ensemble)}\b', search_text, re.IGNORECASE): global_performer = ensemble break if global_performer == "Unknown": perf_match = re.search(r'(?i)(?:performed by|with)\s+([^–—\-;:,(\n]+)', video_title) if perf_match: global_performer = perf_match.group(1).strip() elif re.search(r'(?i)\b(trio|quartet|quintet|ensemble|orchestra|chorus|band|winds|brass)\b', video_title) or (re.search(r'(?i)\bsymphony\b', video_title) and not re.search(r'(?i)\bsymphony\s+(no\.?|#|\d+)', video_title)): gp_split = re.split(r'(?i)[-–—:;,]|\b(?:at|in|plays|part|performs|presents)\b', video_title)[0].strip() gp_split = re.sub(r'(?i)^(ORCMA|Oak Ridge Civic Music Association)\s*(presents)?\s*', '', gp_split).strip() if len(gp_split.split()) <= 10: global_performer = gp_split tracks_found = 0 if description: lines = description.split('\n') for i, line in enumerate(lines): line = line.strip() if not line: continue if re.match(r'(?i)^\d{1,2}:\d{2}\s*[ap]\.?m\.?', line): continue clean_text = line is_track_line = False if re.match(r'^\d{1,2}:\d{2}(:\d{2})?\s*[-–—]*\s*', line): clean_text = re.sub(r'^\d{1,2}:\d{2}(:\d{2})?\s*[-–—]*\s*', '', line).strip() is_track_line = True else: for composer in KNOWN_COMPOSERS: if re.match(rf'^{re.escape(composer)}\s*[:\-–—;]\s+', line, re.IGNORECASE): is_track_line = True break if not is_track_line: continue date_pattern = r'\(\s*(?:b\.|c\.)?\s*\d{4}\s*[-–—]*\s*(?:\d{2,4})?\s*\)|\b\d{4}\s*[-–—]+\s*\d{2,4}\b|\b(?:b\.|c\.)\s*\d{4}\b' temp_no_dates = re.sub(date_pattern, '', clean_text) if not re.search(r'[-–—:]', temp_no_dates) and (i + 1 < len(lines)): next_line = lines[i+1].strip() if next_line and not re.match(r'^\d{1,2}:\d{2}', next_line): clean_text = clean_text + " - " + next_line junk_keywords = ['start', 'musicians', 'program notes', 'applause', 'intermission', 'tuning', 'concert', 'break'] if any(junk in clean_text.lower() for junk in junk_keywords): continue if is_movement(clean_text): continue composer, piece_and_soloist = parse_classical_title(clean_text) parts = re.split(r'\s+[-–—;]\s+', piece_and_soloist, maxsplit=1) if len(parts) == 2: piece = parts[0].strip() potential_soloist = parts[1].strip() # If the text after the hyphen is a tempo marking, do not treat it as a person tempos = r'(?i)\b(presto|allegro|andante|adagio|vivace|largo|moderato|rondo|gigue|finale)\b' if re.match(tempos, potential_soloist): piece = f"{piece} - {potential_soloist}" potential_soloist = "" if len(parts) == 2: piece = parts[0].strip() potential_soloist = parts[1].strip() is_conductor = False matched_kc = None for kc in KNOWN_CONDUCTORS: if kc.lower() in potential_soloist.lower(): is_conductor = True matched_kc = kc break if not is_conductor and "conductor" in potential_soloist.lower(): is_conductor = True if is_conductor: if matched_kc: leftover = re.sub(re.escape(matched_kc), '', potential_soloist, flags=re.IGNORECASE).strip() leftover = re.sub(r'(?i)\bconductor\b', '', leftover).strip() leftover = re.sub(r'^[,;\-\s]+|[,;\-\s]+$', '', leftover).strip() soloist = leftover if leftover else "Unknown" else: clean_str = re.sub(r'(?i)[,\-\s]*conductor[,\-\s]*', '', potential_soloist).strip() if len(clean_str.split()) <= 4: soloist = "Unknown" else: soloist = potential_soloist else: soloist = potential_soloist else: piece = piece_and_soloist.strip() soloist = "Unknown" if soloist == "Unknown" and global_performer != "Unknown": if not re.search(r'(?i)\b(orchestra|chorus|symphony|band)\b', global_performer): soloist = global_performer concert_data.append({ 'composer': composer, 'piece': piece, 'soloist': soloist, 'conductor': conductor, 'concert_title': video_title, 'link': video_link, 'date': formatted_date }) tracks_found += 1 if tracks_found == 0: fallback_soloist = global_performer if fallback_soloist == "Unknown": perf_match = re.search(r'(?i)performed by\s+(.*)', video_title) if perf_match: fallback_soloist = re.split(r'[-–—]', perf_match.group(1))[0].strip() # NEW: Catch festival name from the description if it's missing in the title final_concert_title = video_title if description and "Rock to Bach" in description: # Extract the specific year variant if it exists, e.g., "Rock to Bach 2020" festival_match = re.search(r'(?i)(Rock to Bach(?:\s+\d{4})?)', description) if festival_match: final_concert_title = festival_match.group(1).strip() concert_data.append({ 'composer': main_composer, 'piece': main_piece, 'soloist': fallback_soloist, 'conductor': conductor, 'concert_title': final_concert_title, 'link': video_link, 'date': formatted_date }) except Exception as e: print(f"An error occurred while connecting to YouTube: {e}") return [] return concert_data def update_archive(db_path, raw_concert_data): conn = sqlite3.connect(db_path) cursor = conn.cursor() cursor.execute(''' CREATE TABLE IF NOT EXISTS archive ( id INTEGER PRIMARY KEY AUTOINCREMENT, composer TEXT, piece TEXT, soloist TEXT, conductor TEXT, date TEXT, concert_title TEXT, link TEXT ) ''') cursor.execute("DELETE FROM archive") try: cursor.execute("DELETE FROM sqlite_sequence WHERE name='archive'") except sqlite3.OperationalError: pass seen_tracks = set() for entry in raw_concert_data: # Pull values, defaulting to a blank string if missing composer = entry.get('composer', '').strip() piece = entry.get('piece', '').strip() soloist = entry.get('soloist', '').strip() conductor = entry.get('conductor', '').strip() date = entry.get('date', '').strip() concert_title = entry.get('concert_title', '').strip() link = entry.get('link', '').strip() # Sanitize any explicit "Unknown" strings generated during earlier stages if composer == "Unknown": composer = "" if piece == "Unknown": piece = "" if soloist == "Unknown": soloist = "" if conductor == "Unknown": conductor = "" if date == "Unknown": date = "" if concert_title == "Unknown": concert_title = "" if "Rock to Bach" in concert_title or "Rock to Bach" in piece: composer = "Various Artists" if "—" in piece: parts = piece.split("—", 1) soloist = parts[1].strip() if len(parts) > 1 else parts[0].strip() piece = parts[0].strip() elif " - " in piece: parts = piece.split(" - ", 1) soloist = parts[1].strip() if len(parts) > 1 else parts[0].strip() piece = parts[0].strip() else: # NEW fallback: If there is no delimiter to split on, # the "piece" text is almost certainly the artist's name. if soloist == "Unknown" or soloist == "": soloist = piece piece = "Performance" # Or leave blank if preferred if piece == "Unknown" or piece == "": piece = concert_title track_signature = (composer, piece, soloist, conductor, concert_title) if track_signature in seen_tracks: continue seen_tracks.add(track_signature) cursor.execute(''' INSERT INTO archive (composer, piece, soloist, conductor, date, concert_title, link) VALUES (?, ?, ?, ?, ?, ?, ?) ''', (composer, piece, soloist, conductor, date, concert_title, link)) conn.commit() conn.close() print("Database successfully rebuilt and saved!") if __name__ == "__main__": my_db_file = "orcma_archive.db" rotator = CookieRotator( ['cookies_1.txt', 'cookies_3.txt', 'cookies_4.txt'], rotate_every=90 ) print("\n--- STAGE 1: Fetching Public Channel in Batches ---") youtube_playlist_url = "https://www.youtube.com/@orcma/videos" public_data = [] batch_size = 90 start_index = 1 while True: end_index = start_index + batch_size - 1 playlist_range = f"{start_index}-{end_index}" current_cookie = rotator.get_current() print(f"Scraping channel videos {playlist_range} using {current_cookie}...") batch = fetch_youtube_data( youtube_playlist_url, is_unlisted=False, cookie_file=current_cookie, playlist_range=playlist_range ) if not batch: break public_data.extend(batch) rotator.add_requests(len(batch)) if len(batch) < batch_size: break start_index += batch_size print(f"SUCCESS: Found {len(public_data)} public tracks.\n") if len(public_data) == 0: print("CRITICAL ERROR: Failed to fetch public videos.") print("All cookies may be blocked, or the connection dropped.") else: print("--- STAGE 2: Fetching Unlisted Archive ---") UNLISTED_VIDEOS = [ "https://youtu.be/example1", "https://youtu.be/example2" ] unlisted_data = [] for unlisted_url in UNLISTED_VIDEOS: current_cookie = rotator.get_current() tracks = fetch_youtube_data( unlisted_url, is_unlisted=True, cookie_file=current_cookie ) if tracks: unlisted_data.extend(tracks) rotator.add_requests(1) print(f"SUCCESS: Found {len(unlisted_data)} unlisted tracks.\n") print("--- STAGE 3: Rebuilding Database ---") all_data = public_data + unlisted_data update_archive(my_db_file, all_data)
And the code for the Web page display:
<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>ORCMA Digital Archive</title> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.2/dist/css/bootstrap.min.css" rel="stylesheet"> <style> body { background-color: #f8f9fa; padding-top: 40px; } .table-hover tbody tr:hover { background-color: #f1f3f5; } </style> </head> <body> <div class="container-fluid px-5"> <h1 class="mb-4 text-center">ORCMA Digital Archive</h1> <p class="text-center text-muted mb-4">Database Last Updated: {{ last_updated }}</p> <div class="row justify-content-center mb-4"> <div class="col-md-6"> <form action="/" method="GET" class="d-flex"> <input type="text" name="q" class="form-control me-2" placeholder="Search composers, pieces, performers..." value="{{ search_query }}"> <button type="submit" class="btn btn-primary">Search</button> <a href="/" class="btn btn-outline-secondary ms-2">Clear</a> </form> </div> </div> <div class="card shadow-sm"> <div class="card-body p-0"> <div class="table-responsive"> <table class="table table-striped table-hover mb-0"> <thead class="table-dark"> <tr> <th>Composer</th> <th>Piece</th> <th>Performer</th> <th>Conductor</th> <th>Date</th> <th>Concert Title</th> <th>Link</th> </tr> </thead> <tbody> {% for row in performances %} <tr> <td class="fw-bold">{{ row['composer'] }}</td> <td>{{ row['piece'] }}</td> <td>{{ row['soloist'] if row['soloist'] else '<span class="text-muted">-</span>'|safe }}</td> <td>{{ row['conductor'] }}</td> <td>{{ row['date'] }}</td> <td class="small">{{ row['concert_title'] }}</td> <td> {% if row['link'] %} <a href="{{ row['link'] }}" target="_blank" class="btn btn-sm btn-danger">Watch</a> {% endif %} </td> </tr> {% else %} <tr> <td colspan="7" class="text-center py-4 text-muted">No performances found.</td> </tr> {% endfor %} </tbody> </table> </div> </div> </div> </div> </body> </html>