Two new corpora are now available via the Brigham Young University collection.
Here’s more information from BYU:
The TV Corpus contains 325 million words of data in 75,000 TV episodes from the 1950s to the current time.
The Movies Corpus contains 200 million words of data in more than 25,000 movies from the 1930s to the current time.
All of the 75,000 TV episodes and 25,000+ movies are linked to their IMDb entries, which means that you can create Virtual Corpora using extensive metadata: year, country, series, rating, genre, plot summary, etc.
Both corpora allow you to look at variation over time (1950s-1970s to 1990s-2010s) and variation between dialects (e.g. American and British English). In this sense, they are related to many other corpora of English that we have created, which offer unparalleled insight into variation in English.
You can find the corpora at https://corpus.byu.edu/corpora.asp. To use them, you will need to register using your university email account.