
Building Useful Agents 01

This is a proof of concept. Hunting through multiple pages of content while sites throw up random JavaScript-triggered pages is very annoying. This agent requires a file called show_list.txt: a basic list of the shows you want it to find, written as Perl regular expressions if you are so inclined.

e.g.
3 Percent
^Aftermath S0
Agent X
Aquarius (US)*
Ash vs Evil Dead
Better Call Saul
Blindspot S0
DC[s]* Legends
Elementary S0

The file is read into the script and all dots and spaces are replaced with \W* so the patterns cope with separators like ., _ and -.
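As a sketch of that substitution (the same sed expression the script uses below), a show name like "Better Call Saul" becomes a pattern that also matches dotted or underscored release names:

```shell
# Replace each dot or space with \W* so the pattern matches
# "Better.Call.Saul", "Better_Call_Saul", "Better Call Saul", etc.
echo "Better Call Saul" | sed 's_[.\ ]_\\W*_g'
# prints: Better\W*Call\W*Saul
```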

The script then loads and parses the website, looking for lines that match the expressions, and creates an HTML page with links.
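The matching itself is just grep with Perl regexes (grep -iP, so GNU grep built with PCRE support is assumed). A minimal sketch using a hypothetical page snippet and two patterns:

```shell
# Hypothetical HTML snippet; the real script curls the site instead.
PAGE='<a href="/ep/123" title="Better.Call.Saul.S03E01">download</a>
<a href="/ep/124" title="Some.Other.Show.S01E01">download</a>'

# Patterns joined with | exactly as the script builds $SHOW_LIST
SHOWS='Better\W*Call\W*Saul|Blindspot\W*S0'

# Keep only lines that name a wanted show AND carry a download/magnet link
echo "$PAGE" | grep -iP "$SHOWS" | grep -iP "download|magnet"
```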

The last part of the script also looks up the latest version of each file I currently have, so I can decide whether I need a new version or not.
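That lookup is a find piped through grep and sort (shown further down in the script); a minimal sketch with hypothetical filenames standing in for the find output:

```shell
# Hypothetical completed-download names; the script builds $FILES
# with find over the download directory instead.
FILES='Better.Call.Saul.S03E01.mkv
Better.Call.Saul.S03E02.mkv
Blindspot.S02E10.mkv'

PATTERN='Better\W*Call\W*Saul'

# Normalise separators to spaces, keep matches for this show, and
# reverse-sort so head -1 returns the newest episode on disk.
echo "$FILES" | sed 's/[.-]/ /g' | grep -iP "$PATTERN" | sort -rf | head -1
# prints: Better Call Saul S03E02 mkv
```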

Now I can run this agent to the page depth I define and it will return matches based on the shows in show_list.txt.

At a page depth of 1 it runs in about 5 seconds; at a page depth of 4, about 10 seconds. That is far more efficient than manually checking those pages myself.

This can be extended (and there are other tools that have been around for a long time), but I find I usually learn more by building it myself.

#!/bin/bash
#useful sites:
# Multi line grep http://stackoverflow.com/questions/2686147/how-to-find-patterns-across-multiple-lines-using-grep
# command: pcregrep -M '<tr name="hover" class="forum_header_border">.*(\n|.)*?</tr>' homepage.txt
# Helpful site for regex https://regex101.com/

if [ $# -eq 0 ]; then
    echo "No arguments supplied - defaulting to 4 pages"
    PAGES=4
else
    PAGES=$1
fi

echo $PAGES

SHOW_LIST=""
COUNT=0
while IFS='' read -r line || [[ -n "$line" ]]; do
    echo -n "Text read from file: $line"
    SHOW_LIST1=$(echo "$line" | sed 's_[.\ ]_\\W*_g')
    echo " --> $SHOW_LIST1"
    if [ $COUNT -eq 0 ]; then
        SHOW_LIST=$SHOW_LIST1
    else
        SHOW_LIST="$SHOW_LIST|$SHOW_LIST1"
        # echo $SHOW_LIST
    fi
    ((COUNT++))
done < "show_list.txt"
echo $SHOW_LIST

SHOWS=$SHOW_LIST
URL="https://aaaaa.ag/page_"
RESULT=""
for i in $(seq 0 $PAGES); do
    if [ $i -eq 0 ]; then
        RESPONSE=$(curl -s "https://aaaaa.ag/")
        RESULT=$(echo "$RESPONSE" | grep -iP "$SHOWS" | grep -iP "download|magnet")
        echo result="$RESULT"
        echo "-------------"
    else
        URL1=$URL$i
        echo "$URL1"
        PAGE="page_$i.txt"
        echo "$PAGE"
        # RESPONSE=$(cat $PAGE)
        RESPONSE=$(curl -s "$URL1")
        #RESULT1=$(echo "$RESPONSE" | grep -iP "$SHOWS" | grep -i download | grep -v 720p)
        RESULT1=$(echo "$RESPONSE" | grep -iP "$SHOWS" | grep -iP "download|magnet")
        echo result1="$RESULT1"
        RESULT="$RESULT$RESULT1"
        echo -e "\nresult=$RESULT"
        echo "-------------"
    fi
done

echo
echo "****** FINAL ******"

echo "$RESULT"
echo "$RESULT" > result.txt

# Crude HTML cleanup: expose the title text and add a paragraph break after each link
OUTPUT=$(echo "$RESULT" | sed s/title=\"/\>/g | sed s/\"\>\</\</g | sed s/a\>/a\>\<p\>/g)
echo
echo
echo "$(date) <BR>PAGES=$PAGES<P>" > _show_list.html
echo $OUTPUT >> _show_list.html

# ---- you will need to customise this for your system -----
echo "<P><P> File List <P><P>" >> _show_list.html
echo "<table border=0>" >> _show_list.html

FILES=$(find /data01/download/complete/ -maxdepth 5 -printf "%f\n")
#echo "$FILES"

for f in $(echo $SHOW_LIST | sed 's/|/ /g'); do
    NICE_FILE=$(echo "$f" | sed 's/\\W\*/ /g')
    echo "<TR><TD>" >> _show_list.html
    echo -n "$NICE_FILE</TD>" >> _show_list.html
    FILE_NAME=$(echo "$FILES" | sed 's/[.-]/ /g' | grep -iP "$f" | sort -rf | head -1)
    echo "<TD> $FILE_NAME </TD></TR>" >> _show_list.html
done
echo "</table>" >> _show_list.html
#echo "<P><P><P>ALL FILES<P><P>$FILES" >> _show_list.html

cp _show_list.html /data01/download/complete/_showlist.html

echo "**** COMPLETE ****"
