Commit 2c70706

New code commits for publication of 4th edition of my Java AI book
1 parent e736fe1 commit 2c70706

71 files changed: +11888 -33 lines changed

Makefile

Lines changed: 14 additions & 0 deletions
@@ -1,3 +1,17 @@
 clean:
 	rm -r -f out
 	(cd clojure_examples; lein clean; rm -r -f lib/* classes)
+
+mapreduce_example:
+	rm -r -f mr_temp
+	mkdir -p mr_temp/nlp/com/knowledgebooks/mapreduce
+	mkdir -p mr_temp/nlp/com/knowledgebooks/nlp/util
+	cp src/nlp/com/knowledgebooks/mapreduce/NameFinder.java mr_temp/nlp/com/knowledgebooks/mapreduce/
+	cp src/nlp/com/knowledgebooks/nlp/util/ScoredList.java mr_temp/nlp/com/knowledgebooks/nlp/util/
+	cp src/nlp/com/knowledgebooks/nlp/util/Tokenizer.java mr_temp/nlp/com/knowledgebooks/nlp/util/
+	cp src/nlp/com/knowledgebooks/nlp/ExtractNames.java mr_temp/nlp/com/knowledgebooks/nlp/
+	mkdir -p mr_temp/test_data
+	cp test_data/propername.ser mr_temp/test_data/
+	(cd mr_temp; jar xvf ../lib/hadoop-core-1.1.2.jar)
+	(cd mr_temp; javac nlp/com/knowledgebooks/mapreduce/NameFinder.java)
+	(cd mr_temp; jar cvf ../namefinder.jar .)

README

Lines changed: 5 additions & 2 deletions
@@ -1,3 +1,6 @@
-I am currently writing the 4th edition of my "Practical Artificial Intelligence Programming with Java" book.
+# Code examples for the 4th edition of "Practical Artificial Intelligence Programming with Java"
 
-This git repo currently has the code examples for the 3rd edition and over the next few months they will be updated for the 4th edition.
+All code examples can be used either under the LGPL version 3 license or the Apache 2 license.
+
+You can buy a copy of the book (includes PDF, Kindle, and iPad/iPhone formats
+at

clojure_examples/README

Lines changed: 10 additions & 6 deletions
@@ -1,13 +1,17 @@
-# clojure_examples
+# clojure_examples for the book "Practical Artificial Intelligence Programming with Java"
 
-FIXME: write description
+This directory contains Clojure wrappers for some of the Java example programs in the book.
 
-## Usage
+## Getting started with the Clojure examples
 
-FIXME: write
+The easiest way to make sure everything is set up to run correctly is to try:
+
+lein test
+
+in order to run the unit tests. The source code for the unit tests show how to call the Clojure wrappers.
 
 ## License
 
-Copyright (C) 2012 FIXME
+Copyright (C) 2012 Mark Watson
 
-Distributed under the Eclipse Public License, the same as Clojure.
+Distributed under both the LGPL 3.0 and the Apache 2 licenses - pick the license that works best for you..

google_book_ngram_data/README.txt

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+# This directory accompanies the Chapter on Data Science
+
+You will need to run the script best_ngrams.rb 5 times, setting the variable $$match$$ to:
+
+1gram
+2gram
+3gram
+4gram
+5gram
+
+And Adjusting the value of $$CUTOFF$$.
+
+Also, on the leased Linux server I used, I was putting the best ngram data (best in the sense that I only kept ngrams with a use count greater than $$CUTOFF$$) in my home directory "/home/markw" - you will want to change the target directory for your system.
+
+~~~~~~~~
+match = "3gram"
+CUTOFF = 500
+
+$words = "====="
+$count = 0
+
+$out = File.new("/home/markw/#{match}.txt", 'w')
+
+File.new("ngrams_uris.txt").lines.each do |line|
+  if line.index("<a href='") && line.index(match)
+~~~~~~~~
+
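
The README above keeps only n-grams whose total use count exceeds the cutoff. A minimal Ruby sketch of that filtering idea follows. It is not the script's own method: it assumes tab-separated rows of the form `ngram`, `year`, `match_count`, `volume_count` (inferred from the script reading `tokens[0]` and `tokens[2]`), and it swaps the script's streaming comparison of consecutive rows for an in-memory Hash. The function name `best_ngrams` and the sample rows are hypothetical.

```ruby
# Sketch only: sum the per-year counts for each n-gram and keep the
# totals that are strictly greater than the cutoff.
# Assumed row layout (not confirmed by the commit):
#   ngram \t year \t match_count \t volume_count
def best_ngrams(lines, cutoff)
  totals = Hash.new(0)
  lines.each do |line|
    tokens = line.chomp.split("\t")
    next if tokens.length < 3 # skip rows without a count column
    totals[tokens[0].downcase] += tokens[2].to_i
  end
  totals.select { |_ngram, count| count > cutoff }
end

# Hypothetical sample rows, made up for illustration.
sample = [
  "new york city\t1990\t300\t12",
  "new york city\t1991\t250\t10",
  "rare phrase here\t1990\t5\t1"
]

best_ngrams(sample, 500).each { |ngram, count| puts "#{ngram}\t#{count}" }
```

A Hash holds every distinct n-gram in memory, which is fine for a small sample. The script instead compares each row with the previous one, which suits very large files whose rows for the same n-gram are adjacent.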

google_book_ngram_data/best_ngrams.rb

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+# new: gunzip files one at a time:
+
+match = "3gram"
+CUTOFF = 500
+
+$words = "====="
+$count = 0
+
+$out = File.new("/home/markw/#{match}.txt", 'w')
+
+File.new("ngrams_uris.txt").lines.each do |line|
+  if line.index("<a href='") && line.index(match)
+    uri = line[9...line.index("'>")]
+    puts "|#{uri}|"
+    `wget #{uri}`
+    sleep 60
+
+    Dir.entries(".").each do |fn|
+      if fn.index(match) && fn.index(".gz")
+        file_root = fn[0..-4]
+        puts `gunzip #{fn}`
+        sleep 20
+        puts `ls -lh #{file_root}*`
+        count = 0
+        File.new(file_root).each_line.each do |line|
+          count += 1
+          tokens = line.split("\t")
+          if tokens[1].size > 0
+            #puts tokens.join("|")
+            words = tokens[0].downcase.split.collect do |w|
+              index = w.index("_")
+              if w.length<2 || w[0]=="_" || w[0]=="(" || w[0]==")" || w[0]=="." || w[0]=="'"
+                "^"
+              elsif index
+                w[0...index]
+              else
+                w
+              end
+            end.join(' ')
+            if $words == words
+              $count += tokens[2].to_i if !words.index("^")
+            else
+              $out.puts "#{$words}\t#{$count}" if $words != "=====" && !$words.index("^") && $count > 20 && !$words.index(",") && !$words.index(".") && !$words.index(";") && !$words.index(":") && !$words.index("!") && $words[0]!="0" && $words[0].to_i==0 if $count > CUTOFF
+              $words = words
+              $count = tokens[2].to_i
+            end
+          end
+        end
+        puts "count=#{count} for #{file_root}"
+        puts `rm -r -f *#{file_root}*`
+      end
+    end
+  end
+end
+$out.close
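
The word-normalization step inside the script can be read in isolation. The sketch below restates that logic as a standalone function so it can be tried without downloading any data. The function name `normalize_ngram` and the sample inputs are hypothetical. It assumes Ruby 1.9 or later, where `w[0]` returns a one-character String.

```ruby
# Sketch only: lowercase an n-gram, strip a trailing "_TAG" suffix from
# each word, and replace words the script treats as noise with "^".
# A word counts as noise when it is shorter than two characters or
# starts with "_", "(", ")", "." or "'".
NOISE_PREFIXES = ["_", "(", ")", ".", "'"].freeze

def normalize_ngram(raw)
  raw.downcase.split.map do |w|
    index = w.index("_")
    if w.length < 2 || NOISE_PREFIXES.include?(w[0])
      "^" # marker that later causes the whole n-gram to be dropped
    elsif index
      w[0...index] # keep the text before the first underscore
    else
      w
    end
  end.join(" ")
end

puts normalize_ngram("The_DET Quick_ADJ fox") # prints "the quick fox"
puts normalize_ngram("a cat")                 # prints "^ cat"
```

Because any n-gram containing `^` is skipped when the script writes its output, single-character words such as "a" cause the whole n-gram to be dropped.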
