In this story, we will take a closer look at LangChain's LLMChain class. According to LangChain's documentation, LLMChain allows you to define a prompt template and then send a list of key-value pairs to that prompt template for the large language model to process.
You would typically use LLMChain with a workflow like this:
The steps are as follows:
key_value_list = [
    {'content': "Across the UK, fossil fuel companies' broken promises have left scarred and polluted landscapes, and no one has been held accountable.",
     'title': "As the toxic legacy of opencast mining in Wales"},
    {'content': "The solution is not more fields but better, more compact, cruelty-free and pollution-free factories",
     'title': "We think farming is good and factories are bad"},
]
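To make the mechanism concrete: for each dictionary in the key-value list, LLMChain fills the placeholders of the prompt template and sends the resulting prompt to the model. A minimal sketch of just the template-filling step (no API call, using Python's built-in str.format; the sample records are illustrative):

```python
# Sketch of the template-filling step that LLMChain performs for each
# dictionary in the key-value list (the actual LLM call is omitted here).
prompt_template = ("Please tell me the sentiment of {content} "
                   "with this title: {title}?")

key_value_list = [
    {'content': "Fossil fuel companies' broken promises have left polluted landscapes.",
     'title': "The toxic legacy of opencast mining in Wales"},
]

# LLMChain.apply() effectively formats the template once per input dictionary.
filled_prompts = [prompt_template.format(**kv) for kv in key_value_list]
print(filled_prompts[0])
```

The real chain would then send each filled prompt to the model and collect the responses.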
To give you a better idea of how you can use LLMChain, we created a simple script which applies multiple prompt templates to The Guardian's (a British newspaper) RSS feeds. RSS feeds contain lists of text-based records which can be converted into a key-value list.
The purpose of this script is to obtain the sentiment and keywords of a set of articles and to categorize the articles on the basis of a pre-defined set of categories.
The script then counts the sentiments, categories and keywords found on the columnist’s RSS feed:
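The counting is done with Python's collections.Counter, which the script uses for both sentiments and categories. A small self-contained illustration with made-up sentiment labels:

```python
from collections import Counter

# Hypothetical per-article sentiment labels, standing in for what the
# LLM prompts would return for one columnist's feed.
sentiments = ['negative', 'very negative', 'negative', 'neutral']

sentiment_counter = Counter()
for s in sentiments:
    sentiment_counter[s] += 1

# most_common() sorts the tallies by count, descending.
print(sentiment_counter.most_common())
```

The same pattern works for categories, where `Counter.update()` can ingest a whole list of labels at once.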
The model we used is gpt-3.5-turbo.
The script can be viewed on GitHub:
http://github.com/gilfernandes/llm_chain_out/blob/main/langchain_llm_chain_extract.py
The command line script performs the following operations:
for url in sys.argv[1:]:
    process_url(url)

def process_url(url):
    """
    Extracts the content of each RSS feed. Sends the content of each RSS feed
    to the LLMChain to apply the prompts to the extracted records. Creates a
    data set for each RSS feed which combines the output of the LLM and
    generates an HTML and Excel file out of it.
    :param url: The url of the RSS feed, e.g: http://www.theguardian.com/profile/georgemonbiot/rss
    """
    print(f"Processing {url}")
    zipped_results = []
    llm_responses = []
    input_list = extract_rss(url)
    for prompt_template in prompt_templates:
        llm_responses.append(process_llm(input_list, prompt_template))
    sentiment_counter = Counter()
    categories_counter = Counter()
    for zipped in zip(input_list, *llm_responses):
        sentiment = {'sentiment': zipped[1]['text']}
        categorized_sentiment = categorize_sentiment(zipped[1]['text'])
        sentiment_counter[categorized_sentiment] += 1
        sentiment_category = {'sentiment_category': categorized_sentiment}
        keywords = {'keywords': zipped[2]['text']}
        raw_categories = zipped[3]['text']
        classification = {'classification': raw_categories}
        sanitized_topics = sanitize_categories(raw_categories)
        categories_counter.update(sanitized_topics)
        sanitized_categories = {'topics': ", ".join(sanitized_topics)}
        full_record = {
            **zipped[0], **sentiment, **keywords,
            **sentiment_category, **classification, **sanitized_categories
        }
        zipped_results.append(full_record)
    result_df = pd.DataFrame(zipped_results)
    title = url.replace(":", "_").replace("/", "_")
    serialize_results(url, result_df, title, sentiment_counter, categories_counter)
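The per-article record is assembled with dictionary unpacking: the original RSS fields are merged with every LLM-derived field into one flat dictionary, with later dictionaries winning if a key clashes. A minimal illustration with made-up values:

```python
# How a per-article record can be assembled via dictionary unpacking:
# the RSS fields and the LLM-derived fields merge into one flat dict.
article = {'content': 'Sample content', 'title': 'Sample title'}
sentiment = {'sentiment': 'negative'}
keywords = {'keywords': 'coal, pollution'}

full_record = {**article, **sentiment, **keywords}
print(sorted(full_record.keys()))
```

A list of such records is exactly what `pd.DataFrame(...)` expects, which is why the script can build the output table in one call.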
def extract_rss(url):
    """
    Extracts the content and title from the URL.
    :param url The url of the RSS feed, e.g: http://www.theguardian.com/profile/georgemonbiot/rss
    """
    response = requests.get(url)
    tree = ElementTree.fromstring(response.content)
    contents = []
    for child in tree:
        if child.tag == 'channel':
            for channel_child in child:
                if channel_child.tag == 'item':
                    contents.append({'content': channel_child[2].text,
                                     'title': channel_child[0].text})
    return contents
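Note that this extraction indexes the children of each `<item>` positionally, so it relies on the title being the first child and the content-bearing element the third. A self-contained sketch with a tiny hand-written RSS document (standing in for the Guardian feed) shows the same traversal:

```python
from xml.etree import ElementTree

# A tiny hand-written RSS fragment; the traversal assumes <title> is the
# first child of <item> and the content element is the third, as the
# script does for the real feed.
rss = """<rss><channel>
<item><title>Sample title</title><link>http://example.com</link>
<description>Sample content</description></item>
</channel></rss>"""

tree = ElementTree.fromstring(rss)
contents = []
for child in tree:
    if child.tag == 'channel':
        for channel_child in child:
            if channel_child.tag == 'item':
                contents.append({'content': channel_child[2].text,
                                 'title': channel_child[0].text})
print(contents)
```

Positional indexing is brittle if the feed ever reorders its elements; looking children up by tag name would be the more defensive choice.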
prompt_templates = [
    ("Please tell me the sentiment of {content} with this title: {title}? "
     "Is it very positive, positive, very negative, negative or neutral? "
     "Please answer using these expressions: 'very positive', 'positive', "
     "'very negative', 'negative' or 'neutral'"),
    "Please extract the most relevant keywords from {content} with title: {title}. "
    "Use the prefix 'Keywords:' before the list of keywords.",
    "Please categorize the following content {content} with title {title} "
    "using these categories: " + ",".join(accepted_categories)
]

model = 'gpt-3.5-turbo'

def process_llm(input_list: list, prompt_template):
    """
    Creates an LLMChain object with a specific model.
    :param input_list a list of dictionaries with the content and title of each article
    :param prompt_template A single prompt template with content and title parameters
    """
    llm = ChatOpenAI(temperature=0, model=model)
    # You could also use another model. text-davinci-003 is more expensive than gpt-3.5-turbo.
    # llm = OpenAI(temperature=0, model='text-davinci-003')
    llm_chain = LLMChain(
        llm=llm,
        prompt=PromptTemplate.from_template(prompt_template)
    )
    return llm_chain.apply(input_list)
def sanitize_categories(text):
    text = text.lower()
    sanitized = []
    for cat in accepted_categories:
        if cat in text:
            sanitized.append(cat)
    return sanitized

def sanitize_keywords(text):
    text = text.lower()
    text = text.replace("keywords:", "").strip()
    sanitized = [re.sub(r"\.$", "", s.strip()) for s in text.split(",")]
    return sanitized

def categorize_sentiment(text):
    text = text.lower()
    if 'very negative' in text:
        return 'very negative'
    elif 'negative' in text:
        return 'negative'
    elif 'very positive' in text:
        return 'very positive'
    elif 'positive' in text:
        return 'positive'
    return 'neutral'
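The order of the checks in the sentiment bucketing matters: 'negative' is a substring of 'very negative' (and likewise for the positive pair), so the stronger label must be tested first. A self-contained demonstration of the same logic:

```python
# Self-contained copy of the sentiment bucketing logic, to show why
# 'very negative' must be checked before 'negative': the latter is a
# substring of the former, so reversing the order would misclassify.
def categorize_sentiment(text):
    text = text.lower()
    if 'very negative' in text:
        return 'very negative'
    elif 'negative' in text:
        return 'negative'
    elif 'very positive' in text:
        return 'very positive'
    elif 'positive' in text:
        return 'positive'
    return 'neutral'

print(categorize_sentiment("The sentiment is Very Negative."))
print(categorize_sentiment("Overall a positive piece"))
print(categorize_sentiment("Hard to say"))
```

This substring matching also makes the function tolerant of free-form LLM answers such as "The sentiment is negative."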
def serialize_results(url, result_df, title, sentiment_counter, categories_counter):
    """
    Converts the results into an Excel sheet and an HTML page.
    The HTML page also contains the counter information.
    :param url The RSS feed url
    :param result_df The combined raw data with the LLM output
    :param title The RSS feed URL with some modified characters
    :param sentiment_counter The counter with the sentiment information
    :param categories_counter The counter with the counted categories
    """
    result_df.to_excel(target_folder / f"{title}.xlsx")
    html_file = target_folder / f"{title}.html"
    html_content = result_df.to_html(escape=False)
    # Make sure the file is written with UTF-8
    with open(html_file, "w", encoding="utf-8") as file:
        file.write(html_content)
    sentiment_html = generate_sentiment_table(sentiment_counter, "Sentiment")
    categories_html = generate_sentiment_table(categories_counter, "Category")
    with open(html_file, encoding="utf8") as f:
        content = f"""
{re.sub(r'.+?theguardian.com/profile/', '', url)}
Sentiment Count
{sentiment_html}
Categories Count
{categories_html}
{f.read()}
"""
    content = content.replace('class="dataframe"',
                              'class="table table-striped table-hover dataframe"')
    with open(html_file, "w", encoding="utf8") as f:
        f.write(content)
This is the example script output for the following columnists:
LangChain's LLMChain provides a very convenient way to interact with an LLM when you have list-based input to which you want to apply a pre-defined LLM prompt with parameters.
Gil Fernandes, Onepoint Consulting
Sign up here